Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LCM: Time evaluator gives negative deltaTime after STKDiscretization::writeSolution #2

Open
JustinClough opened this issue Aug 29, 2019 · 14 comments

Comments

@JustinClough
Copy link
Contributor

The value of deltaTime(0) is negative after STKDiscretization::writeSolution(...) is called. I think this may be from workset.current_time being set to zero after the write here.

This is an issue for me as the creep model uses this deltaTime(0) value to compute its state and I happen to be using Gmsh meshes which are built into STK.

I made a commit on a separate branch that shows which other tests could be affected by this. I just added a simple Teuchos check for negative deltaTime(0) values.

Any ideas on a solution or temporary workaround?

@bartgol
Copy link
Contributor

bartgol commented Aug 29, 2019

I have not enough knowledge of LCM (and I think this is LCM-specific, right?). I don't see where we set current_time to zero though. The only place where we change it seems to be inside the Application, and we set it to either itself or the value we get for Time from the parameter library. Perhaps there's some issue with how Time is inited/handled in the parameter library?

@lxmota
Copy link
Collaborator

lxmota commented Aug 29, 2019

I don't see where it's set to zero either. I agree with @bartgol that this may be an issue of how time is initialized in the parameter library.

@ikalash
Copy link
Collaborator

ikalash commented Aug 29, 2019

Didn't @calleman21 put this logic setting the parameter to 0 at some point to hack something within the thermo-mechanics problems?

@lxmota
Copy link
Collaborator

lxmota commented Aug 29, 2019

@ikalash Ah yes, I remember something like that. I don't recall that the time was set to zero, though, although I could be wrong about that. In any case, I'll try to look for that logic, and see if something can be done about it.

@ikalash
Copy link
Collaborator

ikalash commented Sep 8, 2019

OK, I've had a chance to look at this by playing around in your branch @JustinClough .

The problem you bring up does not have anything to do with Coleman's changes awhile ago, as I was originally hypothesizing.

The situation you bring up seems to occur in two situations:

1.) The LOCA stepper for some reason goes past the specified "Max Time" in the parameter list, then has to go back. This happens for SurfaceElement_SERIAL, for example, which you can see in the CDash: http://cdash.sandia.gov/CDash-2-3-0/testDetails.php?test=4895087&build=88800 . If you search for "writeSolution", you'll find:

STKDiscretization::writeSolution: writing time 0 to index 1 in file out_tpetra.e
STKDiscretization::writeSolution: writing time 1.00289034e+00 to index 2 in file out_tpetra.e
STKDiscretization::writeSolution: writing time 1.00000000e+00 with label 2.00000000e+00 to index 3 in file out_tpetra.e

You can see the second time is greater than the third. I don't know all the magic inner workings of LOCA (@agsalin or @rppawlo would be able to say more about it) but I think this sort of situation can happen due to the selection of the Predictor->Method in the LOCA parameterlist. The SurfaceElement problem has it set to "Tangent". If you remove the predictor or set it to "Constant" or "Secant" the issue will go away.

2.) The second situation is the one that affects most of the tests that fail in your branch. The issue is the "IP to Nodal Field" or "Project IP to Nodal Field" responses. Those responses do a solve after a computation completes, and the value of "Time" that the responses get is 0 rather than the actual value. The deltaTime computation gets called, and you end up with a negative deltaTime. I'd have to dig further to figure out why Time isn't updated in the responses. @lxmota may have some ideas. My guess is the value did not matter for those responses, and the deltaTime there is also irrelevant, so no one caught this.

Does your case fall into either of the two scenarios above, Justin?

Also, since you were saying you're not sure where the Time_old gets set, it is here:

https://github.com/SNLComputation/Albany/blob/master/src/Albany_StateManager.cpp#L882

ikalash referenced this issue in sandialabs/Albany Sep 9, 2019
in the case of the IP to Nodal Field or Project IP to Nodal Field
response).
ikalash referenced this issue in sandialabs/Albany Sep 9, 2019
that was showing up with the IP to Nodal Field and Project IP to Nodal Field
respones.

This fix cannot yet go in master, as it involves uncommenting fixTime() call,
which I think is needed for some LCM problem - need to talk to Alejandro.

There are still some failures due to a negative time issue - need
to figure out what is going on with those.
@ikalash
Copy link
Collaborator

ikalash commented Sep 9, 2019

Update: I found the issue for the IP to Nodal Field and Project IP to Nodal Field, and pushed a fix to your branch, @JustinClough . @lxmota , it looks like the following call to fixTime() messes up the time passed to these responses for some problems run quasi-statically: https://github.com/SNLComputation/Albany/blob/master/src/Albany_Application.cpp#L2835 . The fix over-writes the current time with 0. Perhaps you could look into this sometime? I don't want to mess with fixTime() in master b/c if I remember correctly, you introduced it b/c it was needed for some problems (perhaps even Schwarz?).

Unfortunately there are still some failures due to some third unknown at this time issue:

The following tests FAILED:
        145 - HydrogenKfieldBC (Failed)
        168 - MechanicsPorePressureLocalized_Serial (Failed)
        169 - MechanicsPorePressureParallelFlow_Serial (Failed)
        170 - MechanicsRigidBody (Failed)
        171 - MechanicsWithHelium_JustMechanics (Failed)
        172 - MechanicsWithHelium_MechanicsAndHydrogen (Failed)
        178 - MechanicsWithHydrogenOrthogonal_SERIAL (Failed)

I'll try to figure out what is going on with these. I'm curious if the problem you are running is affected by the third (unknown) issue, or one of the 2 I mentioned above (1.) should be fixable by changing the predictor method, 2.) should be fixed now in your branch).

@rppawlo
Copy link
Contributor

rppawlo commented Sep 9, 2019

adding @etphipp for loca

ikalash referenced this issue in sandialabs/Albany Sep 9, 2019
@etphipp
Copy link
Contributor

etphipp commented Sep 9, 2019

Regarding LOCA, if you use pseudo-arclength continuation, then Newton steps are constrained to be orthogonal to the predictor direction, which may include changes to the parameter throughout the nonlinear solve for a step (this is how it gets around turning points). When you reach a bound for the parameter, LOCA does its best to try and pick a step size so the converged solution will hit the bound, but because of the pseudo-arclength constraint described above, it may go past the bound slightly, in which case LOCA does one extra step of natural continuation (where the parameter is fixed). When the parameter is time, this can cause a negative delta-T.

If you know your system does not have bifurcations, the simplest solution is to just use natural continuation. This is similar to using the constant predictor as described above, but results in a simpler nonlinear system. If you really need arclength continuation, you have to accept that there might be steps with negative time steps.

@ikalash
Copy link
Collaborator

ikalash commented Sep 9, 2019

Thanks for the explanation @etphipp . I did notice when I was playing around with it that for the cases where 1.) was the issue, changing to natural continuation fixed the problem.

@lxmota
Copy link
Collaborator

lxmota commented Sep 9, 2019

@ikalash My recollection is that the fixTime function was already there, and I only wrote a special case for Schwarz. Since Justin is not using Schwarz, he should not be affected if there is an error in that logic. Git blame says that I modified this function last, but that was probably just formatting, as I always apply clang-format liberally. In fact, to this day I'm not sure what the purpose if this function is, other that what it seems to do from the logic in the code (use the current time or a supplied value of time if there is one in the parameter list). I only remember that it was messing up the time for Schwarz, and thus the logic for it.

@ikalash
Copy link
Collaborator

ikalash commented Sep 9, 2019

Thanks @lxmota . It could very well be that git blame is misleading, as you say. Another question: were you the original author of setCurrentTime() and getCurrentTime()? That's another routine that leads to part of the problem (seems there is a missing setCurrenTime() call in the Albany::PiroObserver class that is needed given this logic in Albany::ModelEvaluator: https://github.com/SNLComputation/Albany/blob/master/src/Albany_ModelEvaluator.cpp#L691 ). git blame says it was Luca but I seem to remember that the routines were added for Schwarz - perhaps my memory is failing me (again)?

@bartgol
Copy link
Contributor

bartgol commented Sep 10, 2019

There used to be two classes, PiroObserver and PiroObserverT. Could be that during the unification during the Thyra refactor I lost some details.

@JustinClough
Copy link
Contributor Author

Sorry for the delayed response. As far as I know, my situation is the second one you mentioned:

2.) The second situation is the one that affects most of the tests that fail in your branch. The issue is the "IP to Nodal Field" or "Project IP to Nodal Field" responses. Those responses do a solve after a computation completes, and the value of "Time" that the responses get is 0 rather than the actual value. The deltaTime computation gets called, and you end up with a negative deltaTime. I'd have to dig further to figure out why Time isn't updated in the responses. lxmota may have some ideas. My guess is the value did not matter for those responses, and the deltaTime there is also irrelevant, so no one caught this.

@ikalash
Copy link
Collaborator

ikalash commented Sep 18, 2019

No worries. I haven't had time to look at this more since my last comment.

In this case it should be fixed in your branch - please give it a try. You could also turn off the response to see if it makes the issue go away to double check that that is indeed it.

I think you found something interesting and the behavior in master where the dt can be negative should not happen. I think putting an assert to prevent dt from being negative like you did in your branch, and fixing the cases where it happens would be worthwhile in master. Does anyone disagree? I'd still like to understand what is going on with the third case I mentioned that isn't covered by 1.) and 2.). Also my fixes in Justin's branch break a couple of tests for some reason and I'd like to understand that too - something I did must be a bit off.

Anyway, lots to do, but hopefully with a temporary fix in place, it's not super urgent.

@lxmota lxmota transferred this issue from sandialabs/Albany Apr 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants