# Time-averaged field/moment output #57
It should be supported by just setting the corresponding parameters:

```c++
// ======================================================================
// OutputFieldsCParams

struct OutputFieldsCParams
{
  const char *data_dir = {"."};

  int pfield_step = 0;
  int pfield_first = 0;

  int tfield_step = 0;
  int tfield_first = 0;
  int tfield_length = 1000000;
  int tfield_every = 1;

  Int3 rn = {};
  Int3 rx = {1000000, 1000000, 100000};
};
```

I think the way this works (you can look at [...]). So, as an example, [...].

Having said this, I haven't used this in a while, so there's definitely a chance that something may not work as expected, so let me know if you see any issues. Also, for what it's worth, I'm in the process of simplifying that whole area in a way that should make it more straightforward to implement the output directly yourself in the case, making it easier to adapt it to specific needs.
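To make the parameter names above concrete, here is a small sketch of how the t-field gating *might* work. The semantics are assumed from the parameter names in the struct, not taken from the PSC source, so treat this as illustration only:

```cpp
#include <cassert>

// Hypothetical sketch: an averaged output is written every tfield_step
// steps, the average spans the tfield_length steps leading up to each
// output, and every tfield_every-th step inside that span is sampled.
struct TfieldParams
{
  int tfield_step = 0;         // interval between averaged outputs (0 = off)
  int tfield_first = 0;        // first step at which output may happen
  int tfield_length = 1000000; // number of steps the average spans
  int tfield_every = 1;        // sample every n-th step inside that span
};

// True if the fields at `step` would be added into the running average.
bool accumulates(const TfieldParams& p, int step)
{
  if (p.tfield_step <= 0 || step < p.tfield_first)
    return false;
  // Next output step at or after `step`.
  int n = (step - p.tfield_first + p.tfield_step - 1) / p.tfield_step;
  int next_out = p.tfield_first + n * p.tfield_step;
  if (next_out - step >= p.tfield_length)
    return false; // not yet inside the averaging window
  return (step - p.tfield_first) % p.tfield_every == 0;
}
```

With `tfield_step = 2000`, `tfield_every = 100`, and `tfield_length = 200`, for example, only steps 1900 and 2000 would feed the average written at step 2000.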
---

The code you've shown above looks different from what's in my psc_flatfoil_xz.cxx:

```c++
// -- output fields
// -- output particles
```

I am up to date on the branch; is this located in another file?
---

Yeah, sorry, I wasn't clear: the above is from [...]. So here, next to

```c++
outf_params.pfield_step = 2000;
```

you can add

```c++
outf_params.tfield_step = 2000;
outf_params.tfield_every = 100;
```

or something like that.
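For completeness, a minimal sketch of the resulting setup. The field names come from the `OutputFieldsCParams` struct quoted earlier; the helper function and surrounding context are hypothetical:

```cpp
#include <cassert>

// Trimmed copy of the OutputFieldsCParams fields relevant here.
struct OutputFieldsCParams
{
  int pfield_step = 0;  // instantaneous field output interval
  int tfield_step = 0;  // averaged field output interval
  int tfield_every = 1; // sample every n-th step into the average
};

// Hypothetical helper mirroring the case setup discussed above:
// instantaneous fields every 2000 steps, plus averaged fields every
// 2000 steps built from every 100th step.
OutputFieldsCParams make_outf_params()
{
  OutputFieldsCParams outf_params;
  outf_params.pfield_step = 2000;
  outf_params.tfield_step = 2000;
  outf_params.tfield_every = 100;
  return outf_params;
}
```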
---

Ahh, that makes perfect sense. Thanks for the quick replies, Kai!
---

After trying to use the tfields, I find I get a segfault right as the tfd-000000_h5 file is created (nothing is written into it). This occurs right after field balancing, before the first time step. The log file is located (and should be accessible) at /gpfs/alpine/proj-shared/ast147/jmatteuc/flatfoil-summit-547640/flatfoil_summit004.547640
---

Can you try again with the latest master? I managed to reproduce a crash that was likely it, but maybe that wasn't the only problem.
---

Hmmm, just tried, but I got another crash. This time the output file only contains this:

```
[c03n17:140751] *** An error occurred in MPI_Alltoallv

Sender: LSF System lsfadmin@batch5
Job <flatfoil_summit004> was submitted from host by user in cluster at Thu Aug 8 10:49:34 2019
The output (if any) is above this job summary.
```

I am running the small case on GPUs with tfd and pfd on.
---

Can you attach the [...]?
---

I've narrowed it down: it still breaks on 4 GPUs on 1 node.
---

Well, I still haven't been able to reproduce it. I ran the current master with your [...].

The error you're seeing indicates that it happens while redistributing field data for writing output, but it's not very specific beyond that. Can you point me to where the log file is? I see other runs in your dir on Summit, but those other runs don't seem to have the logs that go with them.
---

Hmmm, I'm pretty stumped. I just cloned a whole new version, added just the two lines for the tfields, and ran it, getting the same error. All the files should be here: /gpfs/alpine/proj-shared/ast147/jmatteuc/flatfoil-summit-556325

Also, just a heads up: running for half an hour and only getting to 3500 steps seems a bit slower than normal. I think something changed with the heating (or injection, but I'm pretty sure heating) that's now taking an inordinate amount of time. Thanks!
---

Okay, I can reproduce it now. Somehow it seems that the problem doesn't happen for a Debug build (and I actually tried both yesterday, but I think I ended up confusing myself having both a [...]).

On the heating performance, I saw the other issue. It sounds like it's related to the change where I now generate unique random numbers for each CUDA thread, rather than just 1024 of them, but that shouldn't have a lasting impact on performance, so I'll need to look at that next.
---

So unfortunately, I'm still stumped about what's happening. You can work around the issue by adding [...].

To give a bit of background: normally the release flags contain [...]. There is one exception to this logic, though I don't think I've done anything like this: if you write

```c++
// ...
assert(i++ < 100);
// ...
```

having [...].

Anyway, there are upcoming changes to the workings of the output, so maybe they'll happen to make a difference anyway. Hopefully the workaround will be enough to allow you to move on for now.
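For context on why a side-effecting assert matters (this is standard C/C++ behavior, not specific to this code base): when `NDEBUG` is defined, as is typical for optimized builds, `assert` from `<cassert>` expands to nothing, so any side effect inside it silently disappears. A minimal sketch:

```cpp
#include <cassert>

// Returns the value of i after the side-effecting assert. In an
// assert-enabled build the increment runs and this returns 1; compiled
// with -DNDEBUG the whole assert expands to ((void)0), the increment
// never happens, and it returns 0, so Debug and Release builds would
// genuinely behave differently.
int after_side_effect_assert()
{
  int i = 0;
  assert(i++ < 100); // the side effect lives inside the assert
  return i;
}
```

That is the kind of Debug/Release divergence described above, though as noted, nothing like this appears to actually be in the code.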
---

Ok, thanks Kai. I think for now this will suffice, and it's not the worst deal to just output p-fields. The delay in the heating operator is pretty crippling right now; it kinda disables the ability to do any bigger runs, so I'd say that's the top-priority bug now. Thanks again.
---

Is this in the code? If so, how do we turn it on? Sorry if I'm missing something obvious that we discussed already!