TonY Portal shows job status as RUNNING even after it has finished #193

Open
erwa opened this issue Feb 26, 2019 · 1 comment

erwa commented Feb 26, 2019

With the latest TonY code, I've noticed that the TonY Portal shows applications as RUNNING even after they have finished. Consequently, when I click on the "<application_id>" link, it returns an error "Cannot display events because job is still running". However, the tony-final.xml and RM links still work.

@erwa erwa self-assigned this Feb 26, 2019
@erwa erwa changed the title TonY History Server shows job status as RUNNING even after it has finished TonY Portal shows job status as RUNNING even after it has finished May 2, 2019

erwa commented Jul 26, 2019

@StanfordMCP, some thoughts on this:

  • If the job finishes normally, the .jhist.inprogress file should get renamed to .jhist. This happens at the end of ApplicationMaster.run(): eventHandler.stop() is called, which in turn calls moveInProgressToFinal().
  • What might be happening: when the job starts running, the TonY Portal loads it into its in-memory cache as a RUNNING job, and when the job finishes, the cached state isn't updated to SUCCEEDED/FAILED until the next time the HistoryFileMover runs. One way to speed this up would be for the job to call some "I'm finished" endpoint in the TonY Portal, so the Portal knows it can process that job immediately and update its in-memory state.
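
To make the second bullet concrete, here is a rough sketch of what the Portal side of an "I'm finished" notification could look like. Everything in it is hypothetical: PortalJobCache, jobStarted(), jobFinished(), and finalName() are names made up for illustration and are not taken from the TonY codebase, and the HTTP layer is left out entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the Portal's in-memory job cache; not TonY code. */
final class PortalJobCache {
    static final String IN_PROGRESS_SUFFIX = ".inprogress";

    // application id -> last known status ("RUNNING", "SUCCEEDED", "FAILED", ...)
    private final Map<String, String> statusByAppId = new ConcurrentHashMap<>();

    /** Strips the in-progress suffix, e.g. "x.jhist.inprogress" becomes "x.jhist". */
    static String finalName(String fileName) {
        return fileName.endsWith(IN_PROGRESS_SUFFIX)
            ? fileName.substring(0, fileName.length() - IN_PROGRESS_SUFFIX.length())
            : fileName;
    }

    /** Called when the Portal first sees an in-progress history file. */
    void jobStarted(String appId) {
        statusByAppId.put(appId, "RUNNING");
    }

    /**
     * Body of the assumed "I'm finished" handler. Only a job currently cached
     * as RUNNING is updated; returns true if the cached state changed.
     */
    boolean jobFinished(String appId, String finalStatus) {
        return statusByAppId.replace(appId, "RUNNING", finalStatus);
    }

    /** Returns the cached status, or null if the job is unknown. */
    String status(String appId) {
        return statusByAppId.get(appId);
    }
}
```

With something like this, a finished job would flip from RUNNING to its final status as soon as the notification arrives, and the HistoryFileMover would only be needed as a fallback for jobs that never manage to send it.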

Jobs that get KILLED in the middle are trickier. As far as I know, if you kill an application (e.g. with "yarn application -kill" from the CLI, or from the UI), the application doesn't get an opportunity to do a graceful shutdown, such as changing the history file to a finished state. (@hungj, I think this would be a nice feature to add in YARN.) As a result, the history file might remain as .jhist.inprogress forever, and the TonY Portal will forever show the job as RUNNING. Currently, the job files will eventually be cleaned up by the HistoryFilePurger once they hit the retention period (default 30 days). However, it would be nice if the HistoryFileMover could also periodically check in-progress jobs by querying the RM to detect KILLED jobs. If a job has a .jhist.inprogress file but the RM says the job has already been KILLED, then the HistoryFileMover can go ahead and move the files and update its in-memory state.
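
The RM check proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: InProgressReconciler and findKilled() are hypothetical names, and the RM is stood in for by a plain lookup function from application ID to state string. In a real implementation that lookup would presumably go through the YARN client API (for example YarnClient.getApplicationReport(...).getYarnApplicationState()), which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of the proposed KILLED-job check; not TonY code. */
final class InProgressReconciler {
    /**
     * Given the application IDs that still have a .jhist.inprogress file,
     * returns those the RM reports as KILLED. The rmStateLookup function is
     * an assumed stand-in for a real RM query and may return null for
     * applications the RM no longer knows about.
     */
    static List<String> findKilled(List<String> inProgressAppIds,
                                   Function<String, String> rmStateLookup) {
        List<String> killed = new ArrayList<>();
        for (String appId : inProgressAppIds) {
            // Constant-first equals() is null-safe for unknown applications.
            if ("KILLED".equals(rmStateLookup.apply(appId))) {
                killed.add(appId);
            }
        }
        return killed;
    }
}
```

The HistoryFileMover could then move the history files for each returned ID and update the Portal's in-memory state, the same way it does for jobs that finished normally.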

Another thing to check, if you see this issue coming up for normally-terminated jobs, is what version of TonY they're using. Also, check whether, after 5 minutes (the default history file mover check interval), the history files have been moved to the finished/ directory and the job's state has been updated in the TonY Portal.
