-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle idempotent retry after error during initializing phase #1406
Conversation
…pPhase enum Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
Codecov Report
@@ Coverage Diff @@
## main #1406 +/- ##
=======================================
Coverage 99.98% 99.99%
=======================================
Files 321 321
Lines 22960 23026 +66
=======================================
+ Hits 22956 23024 +68
+ Misses 2 1 -1
+ Partials 2 1 -1
|
Validated with raw blockchain transactions:
|
_, err = am.operations.RunOperation(ctx, opActivatePool(op, pool)) | ||
_, err = am.operations.RunOperation(ctx, opActivatePool(op, pool), | ||
false, // TODO: this operation should be made idempotent, but cannot inherit this from the TX per our normal semantics | ||
// as the transaction is only on the submitting side and this is triggered on all parties. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this saying we insert the operation but not the transaction on the non-submitter members? That seems odd/incorrect.
FWIW activation is an idempotent operation at the connector level (so shouldn't be problematic if we activate multiple times) - not sure if that changes how we want to handle this path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my reading of the code - would appreciate if you can confirm for me.
It was based on reading this:
firefly/internal/definitions/handler_tokenpool.go
Lines 99 to 107 in 7067eb5
// Message will remain unconfirmed, but plugin will be notified to activate the pool | |
// This will ultimately trigger a pool creation event and a rewind | |
state.AddPreFinalize(func(ctx context.Context) error { | |
if err := dh.assets.ActivateTokenPool(ctx, pool); err != nil { | |
log.L(ctx).Errorf("Failed to activate token pool '%s': %s", pool.ID, err) | |
return err | |
} | |
return nil | |
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we do eventually persist the token pool TX, after the activation completes:
firefly/internal/events/token_pool_created.go
Line 149 in 7067eb5
return em.confirmPool(ctx, existingPool, pool.Event) |
So in the happy path, you do end up with a TX on all members, and the activation operations end up showing underneath it.
But it feels like we should probably persist the TX earlier, before we start creating operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that something you'd like to raise a separate issue for @awrichar ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…r creating the TX Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
I extended the fix after testing, as I found in the tokens scenarios if I attempt to recreate the error by having the Then when we came back round We actually needed to know _resubmitted zero, because no operations exist (vs. all operations reached So I added a third |
Completed E2E testing (noting it's important to also have hyperledger/firefly-tokens-erc20-erc721#137 applied). |
internal/assets/operations.go
Outdated
} | ||
switch data.Transfer.Type { | ||
case core.TokenTransferTypeMint: | ||
return nil, false, plugin.MintTokens(ctx, op.NamespacedIDString(), data.Pool.Locator, data.Transfer, data.Pool.Methods) | ||
return nil, core.OpPhaseInitializing, plugin.MintTokens(ctx, op.NamespacedIDString(), data.Pool.Locator, data.Transfer, data.Pool.Methods) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these transfer operations all considered "initializing"? If we get a successful return from the plugin, doesn't that mean the transaction has made it all the way through the token connector to the blockchain connector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the phase we're in as a result of the error in this layer of the code, rather than the operation state.
e.g. this error occurred before we submitted the operation to the connector, so the calling code can decide if that means we don't store the operation at all, or we store it in state Initializing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not analyzed very scenario to know if the error path is such, that it could never succeed.
If I had done that (quite large) piece of work everywhere, I could have introduced a phase of OpPhaseRejected
to mean - "this is never going to work, tell them it's broken... without even storing the operation at all".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - so the phase is more meaningful in an error return case in order to determine how to respond to the error (not so much in a success case)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed in some managers you used operations.ErrTernary
, but not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - yes, this should be operations.ErrTernary
almost certainly - I'd thought this was an error return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok - added a commit for that
approval: approval, | ||
mgr: am, | ||
approval: approval, | ||
idempotentSubmit: approval.IdempotencyKey != "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we really need to store this separate bool here, as we can always compute it from the approval
above at the point we need it (but I'm fine either way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have to do more investigation to convince myself of that.. and if I did so, I'd probably want to rename approval
to approvalInput
(rather than the implying it's the actual approval object).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So... if it's ok, think I'd like to leave it as is for this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But yes, the thing we store is the input DTO, so possible it's named incorrectly.
if resubmitErr != nil { | ||
// Error doing resubmit, return the new error | ||
return false, resubmitErr | ||
} | ||
if len(resubmitted) > 0 { | ||
if total == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic feels slightly confusing to me. I guess what we're trying to differentiate is these 3 cases:
- no operations have ever been started for this transaction
- some operations were previously initialized for this transaction, but we think they weren't fully submitted, so we've now retried them
- some operations were previously initialized for this transaction, but we think they're all currently in flight, so we can't do anything but continue waiting
We're expressing these 3 states with a combination of two integer returns... wonder if there's a cleaner way to express it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - agreed.
I think pushing all this into a combined operation+transaction manager as a function that has plug points for the differentiation, rather than the whole of this section being boilerplate copied in lots of places would be good.
Choice is whether this little bit of extension to the boilerplate is the point to take on the bigger refactor.
I could merge the SubmitNewTransaction
and ResubmitOperations
functions to a new function, contains this logic in an opinionated way and take in the idempotencyKey
out of the input-request, and return:
- It's all done now - please return without error to the user
- Nothing was done, here's the existing transaction ID
- We've never heard of this idempotency key, here's a shiny new TX
- This is already submitted completely, 409 to your user
- Something else went wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, point taken - it may not be worth continuing to iterate on this interface until we have the cycles to do a larger refactor. We should capture that next step as a task though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raised #1410
Signed-off-by: Peter Broadhurst <peter.broadhurst@kaleido.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think all remaining feedback has been captured in new issues.
Currently if an error occurs from a blockchain connector (EVMConnect for example) on submission of a transaction, then the operation goes immediately into
Failed
status - even if anidempotencyKey
was supplied.This means that a subsequent attempt to re-submit the transaction fails with:
FF10431: Idempotency key '...' already used for transaction '...'
This is not correct per the architecture described in https://hyperledger.github.io/firefly/head/reference/idempotency.html
Because the error happened during the submission of the idempotent transaction identifier into its FireFly Transaction Manager (FFTM) database, it is safe for a future retry to attempt to insert it again.
At that later point, the connector can return a
409
if it's already received that transaction, or it can successfully accept it into it's DB and process it.This PR makes the following changes to move us towards the architecture defined behavior in this scenario:
RunOperation
used to returncomplete
as abool
, a newOpPhase
has been introduced:OpPhaseComplete
- good or bad, the operation is completeOpPhasePending
- the operation has been accepted, and it's completion will happen asynchronouslyOpPhaseInitializing
- the operation has not transitioned out of initializing yetidempotentSubmit
parameter is addedtrue
for all transactions whereidempotencyKey != ""
true
a transaction will stay inInitializing
(not move toFailed
) if:phase
returned fromRunOperation
isOpPhaseInitializing
These changes thus ensure that the
Initializing
status is held on to longer than is currently true.Before:
Initializing
was only the status until the first attempt to submit the transaction was initiatedAfter:
Initializing
remains the status until the connector receives the idempotent submission of the TX