diff --git a/rfcs/2022-04-20-smp-conf-timeout-recovery.md b/rfcs/2022-04-20-smp-conf-timeout-recovery.md
new file mode 100644
index 000000000..7c7f84caa
--- /dev/null
+++ b/rfcs/2022-04-20-smp-conf-timeout-recovery.md
@@ -0,0 +1,21 @@
+# SMP confirmation timeout recovery
+
+## Problem
+
+When sending an SMP confirmation, a network timeout can lead to the following race condition:
+- the server receives the confirmation, but the joining party fails to receive the server's response;
+- the joining party deletes the connection, together with the credentials sent in the confirmation for securing the queue;
+- the initiating party receives the confirmation from the server and secures the queue with those credentials;
+- on a subsequent attempt to join via the same invitation link, the joining party generates new credentials and fails authorization.
+
+This leaves the joining party permanently unable to join via that invitation link and complete the connection.
+
+## Solution
+
+A possible solution is to keep the same credentials and try to reuse them on subsequent attempts:
+- the joining party has to remember the invitation link when saving the connection;
+- if the SMP confirmation fails due to a network timeout, the joining party doesn't delete the connection and keeps the credentials;
+- when joining, the joining party checks whether the invitation link was already used for a connection; if yes:
+  - the joining party tries to send the SMP confirmation with the same credentials;
+  - if this SMP confirmation fails with an authorization error (for example, due to the race condition explained above), the joining party tries to send a HELLO message;
+  - if the HELLO message fails with an authorization error (which can happen if the connection was deleted or secured with different credentials), recovery is no longer possible and the connection can be deleted.
diff --git a/src/Simplex/Messaging/Agent.hs b/src/Simplex/Messaging/Agent.hs
index 4ed89185a..39e351f99 100644
--- a/src/Simplex/Messaging/Agent.hs
+++ b/src/Simplex/Messaging/Agent.hs
@@ -278,6 +278,7 @@ joinConn c connId (CRInvitationUri (ConnReqUriData _ agentVRange (qUri :| _)) e2
           void $ enqueueMessage c connId' sq HELLO
           pure connId'
         Left e -> do
+          -- TODO recovery for failure on network timeout, see rfcs/2022-04-20-smp-conf-timeout-recovery.md
           withStore (`deleteConn` connId')
           throwError e
     _ -> throwError $ AGENT A_VERSION
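
The recovery steps proposed in the RFC can be sketched as a small decision function. This is only an illustration of the intended flow, not the agent's implementation: every name below (`SendResult`, `Outcome`, `credentialsFor`, `retryJoin`) is hypothetical and does not exist in `Simplex.Messaging.Agent`. The RFC does not say what should happen when HELLO itself times out; the sketch assumes the connection is kept for another retry in that case.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical sketch: none of these names exist in simplexmq.
type InvitationLink = String
type Credentials = String

-- | Result of one send attempt, as seen by the joining party.
data SendResult = SendOk | SendTimeout | SendAuthError
  deriving (Eq, Show)

-- | What the joining party does with the stored connection afterwards.
data Outcome
  = Joined           -- the server accepted the confirmation or HELLO
  | KeepForRetry     -- network timeout: keep the connection and credentials
  | DeleteConnection -- recovery is no longer possible
  deriving (Eq, Show)

-- | Reuse the credentials saved for this invitation link, if any;
-- otherwise use the freshly generated ones.
credentialsFor :: [(InvitationLink, Credentials)] -> InvitationLink -> Credentials -> Credentials
credentialsFor saved link fresh = fromMaybe fresh (lookup link saved)

-- | Join attempt that reuses stored credentials. The first action sends
-- the SMP confirmation, the second sends HELLO. HELLO is attempted only
-- when the confirmation fails with an authorization error.
retryJoin :: Monad m => m SendResult -> m SendResult -> m Outcome
retryJoin sendConfirmation sendHello = do
  r <- sendConfirmation
  case r of
    SendOk -> pure Joined
    SendTimeout -> pure KeepForRetry
    SendAuthError -> do
      h <- sendHello
      pure $ case h of
        SendOk -> Joined
        -- Assumption: the RFC leaves this case open.
        SendTimeout -> KeepForRetry
        SendAuthError -> DeleteConnection
```

Taking the two send actions as parameters keeps the decision logic separate from the network code, so it can be exercised in any monad. For example, in `Maybe`, `retryJoin (Just SendAuthError) (Just SendAuthError)` evaluates to `Just DeleteConnection`.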