Workflow intermittently executing twice (or blocks executing twice)

This is an urgent issue for us. I reported it to Retool Support at the end of November and received a reply yesterday stating that Support no longer deals with technical defects in the Workflows product, and that community support is the correct channel. This seems odd for something we pay for, but I'm willing to follow the instruction.

I'm investigating a recurring, intermittent issue: when a workflow is triggered, its blocks sometimes run twice, which causes failures (e.g. attempting to insert the same data twice).

For the sake of example, in a workflow triggered by a webhook, we receive an array called subscriptions, which has two objects in it, the values of which will be inserted into a database table.

The blocks involved only have single connections, so are structured as startTrigger > payload > createSubscriptions

This has a startTrigger block (webhook trigger) linked to a JavaScript block (called payload) which contains:

console.log(startTrigger.data)
return startTrigger.data

This in turn triggers the createSubscriptions block, which is a loop block.

In the Run history, when selecting payload we see:

--- Running query: payload --- Th 11/30/2023 14:15:42
the logged data
--- Successfully finished running query: payload --- Th 11/30/2023 14:15:42
--- Running query: payload --- Th 11/30/2023 14:15:46
the logged data
--- Successfully finished running query: payload --- Th 11/30/2023 14:15:46

This indicates that the block ran twice, 4 seconds apart. It should have run only once.

Then when selecting the createSubscriptions entry in the run history we see:

--- Running query: createSubscriptions --- Th 11/30/2023 14:15:43
--- Running triggered block: query11 --- Th 11/30/2023 14:15:43
--- Running triggered block: query11 --- Th 11/30/2023 14:15:43
--- Successfully finished running triggered block: query11 --- Th 11/30/2023 14:15:44
--- Successfully finished running triggered block: query11 --- Th 11/30/2023 14:15:44
--- Successfully finished running query: createSubscriptions --- Th 11/30/2023 14:15:44
--- Running query: createSubscriptions --- Th 11/30/2023 14:15:47
--- Running triggered block: query11 --- Th 11/30/2023 14:15:47
--- Running triggered block: query11 --- Th 11/30/2023 14:15:47
Error evaluating query11: Duplicate entry '810972e8-649d-4cc3-8883-d1f90a38fc68' for key 'PRIMARY' Th 11/30/2023 14:15:47
--- Failed running triggered block: query11 --- Th 11/30/2023 14:15:47
Error evaluating createSubscriptions: Error: Duplicate entry '810972e8' for key 'PRIMARY' (line 6) Th 11/30/2023 14:15:47
--- Failed running query: createSubscriptions --- Th 11/30/2023 14:15:47
Error evaluating query11: Duplicate entry '8fc3f0d1' for key 'PRIMARY' Th 11/30/2023 14:15:47
--- Failed running triggered block: query11 --- Th 11/30/2023 14:15:47

Here we can see that createSubscriptions ran twice, I assume because payload ran twice, but I don't know why payload executed twice from the webhook trigger.

query11 is the loop lambda, and it ran twice (because there were two objects in the array) to insert the data into the database. Then a few seconds afterwards, it ran again, presumably because the payload block executed twice.

This is causing the workflow to fail (because the same entries cannot be inserted twice).

Why is the workflow or workflow blocks executing twice?

The previous 28 runs of this workflow do not exhibit this behaviour, but the 29th run back showed the same problem. There is no retry logic on any of the blocks. This is happening regularly enough that I'm considering moving our workflows from Retool to AWS Step Functions for reliability.

A couple things to look at maybe?

  • Is your payload by any chance a loop?
    • Does it have any kind of retry options set?
  • It looks like your startTrigger is an array and that's where the IDs originate. Have you confirmed they aren't duplicated in there? I realize it looks like everything is running twice, but it's worth double-checking to ensure that's the cause of the error.
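
The duplicate check suggested above could be done in a temporary JavaScript block. This is only a minimal sketch: it assumes each object in the subscriptions array carries its primary key in a field called `id` (the thread does not show the real field name), and `findDuplicateIds` is a hypothetical helper, not part of Retool.

```javascript
// Returns the ids that occur more than once in the incoming array.
// Assumption: each subscription object has an `id` field (hypothetical name).
function findDuplicateIds(subscriptions) {
  const seen = new Set();
  const duplicates = new Set();
  for (const { id } of subscriptions) {
    if (seen.has(id)) duplicates.add(id);
    seen.add(id);
  }
  return [...duplicates];
}

// Sample payload shaped like the webhook body described in the thread.
const sample = { subscriptions: [{ id: "810972e8" }, { id: "8fc3f0d1" }] };
console.log(findDuplicateIds(sample.subscriptions)); // prints []
```

An empty result means the payload itself is clean, so a duplicate-key error would have to come from the insert running more than once.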

Thanks @MikeCB

payload has no retry options set and contains no loop code. The block's entire JS code is:

console.log(startTrigger.data)
return startTrigger.data

Looking at the startTrigger shows the array received only contained the two objects expected (no duplication).

Selecting the startTrigger in the Run History also shows it ran twice.


--- Running query: startTrigger --- Th 11/30/2023 14:15:42
--- Successfully finished running query: startTrigger --- Th 11/30/2023 14:15:42
--- Running query: startTrigger --- Th 11/30/2023 14:15:46
--- Successfully finished running query: startTrigger ---

Selecting the Global (all blocks) item in the run history and looking for --- Running query: startTrigger --- shows two entries.
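
For anyone wanting to scan many runs for the same symptom, the search described above can be sketched in JavaScript. This assumes the run history has been copied out as plain text in the format quoted in this thread; `countBlockStarts` is a hypothetical helper, and the sample log is assembled from lines quoted above.

```javascript
// Counts how many times each block started, based on the
// "--- Running query: <name> ---" lines shown in the Run history.
function countBlockStarts(logText) {
  const counts = {};
  const pattern = /--- Running query: (\S+) ---/g;
  for (const match of logText.matchAll(pattern)) {
    const name = match[1];
    counts[name] = (counts[name] || 0) + 1;
  }
  return counts;
}

// Sample assembled from log lines quoted in the thread.
const log = [
  "--- Running query: startTrigger --- Th 11/30/2023 14:15:42",
  "--- Successfully finished running query: startTrigger --- Th 11/30/2023 14:15:42",
  "--- Running query: payload --- Th 11/30/2023 14:15:42",
  "--- Running query: startTrigger --- Th 11/30/2023 14:15:46",
  "--- Running query: payload --- Th 11/30/2023 14:15:46",
].join("\n");

// Blocks that started more than once within a single run.
const repeated = Object.entries(countBlockStarts(log)).filter(([, n]) => n > 1);
console.log(repeated);
```

Any block listed in `repeated` started more than once in what should have been a single execution.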

That's really strange, particularly for the startTrigger. Maybe try exporting your workflow as a JSON file and having a look at the code? See if you have a second startTrigger block? It might show up in your Outline view if it's there, but I'm really not sure. I've had flows break before and the only way to fix them was to re-create them, like something became corrupted and the entire thing stopped working as expected, but I've never seen a startTrigger fire twice.

I did check the JSON export of the workflow, there is only one object in the blockData array with "pluginId": "startTrigger".
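
The same check can be scripted rather than done by eye. A minimal sketch, assuming the export has already been parsed into an object; `countBlocksByPluginId` is a hypothetical helper and the object below is a trimmed-down stand-in that keeps only the fields the check uses.

```javascript
// Counts blocks per pluginId in an exported workflow object.
function countBlocksByPluginId(workflow) {
  const counts = {};
  for (const block of workflow.blockData) {
    counts[block.pluginId] = (counts[block.pluginId] || 0) + 1;
  }
  return counts;
}

// Trimmed-down stand-in for the export; only the fields used here.
const exported = {
  blockData: [
    { pluginId: "startTrigger", blockType: "webhook" },
    { pluginId: "payload", blockType: "default" },
    { pluginId: "createSubscriptions", blockType: "forEach" },
    { pluginId: "query11", blockType: "default" },
  ],
};

console.log(countBlocksByPluginId(exported).startTrigger); // prints 1
```

With a real export file, the object would come from `JSON.parse` on the file contents.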

And what is concerning me is that it happens once every 30-40 executions, which makes me think it's the underlying engine at fault, rather than the workflow definition.

JSON is (with UUIDs redacted):

{
  "name": "Subscriptions",
  "description": null,
  "organizationId": 29089A,
  "isEnabled": true,
  "crontab": null,
  "timezone": "Europe/London",
  "blockData": [
    {
      "top": 48,
      "left": 48,
      "uuid": "redacted",
      "pluginId": "startTrigger",
      "blockType": "webhook",
      "editorType": "JavascriptQuery",
      "environment": "production",
      "isMinimized": false,
      "resourceName": "webhook",
      "responseHeight": 702,
      "retryPanelOpen": false,
      "responsePanelState": "open",
      "incomingOnSuccessEdges": []
    },
    {
      "top": 48,
      "left": 560,
      "uuid": "redacted",
      "isStart": true,
      "pluginId": "payload",
      "blockType": "default",
      "editorType": "JavascriptQuery",
      "environment": "production",
      "isMinimized": false,
      "resourceName": "JavascriptQuery",
      "retryPanelOpen": false,
      "incomingOnSuccessEdges": ["createSubscription-uuid"]
    },
    {
      "top": 1264,
      "left": 1584,
      "uuid": "redacted",
      "options": {
        "to": "redacted",
        "from": "redacted",
        "forEachUIType": "code"
      },
      "pluginId": "createSubscriptions",
      "blockType": "forEach",
      "editorType": "JavascriptQuery",
      "environment": "production",
      "isMinimized": false,
      "resourceName": "JavascriptQuery",
      "retryPanelOpen": false,
      "incomingOnSuccessEdges": ["redacted"]
    },
    {
      "top": 352,
      "left": 48,
      "uuid": "redacted",
      "pluginId": "query11",
      "blockType": "default",
      "editorType": "SqlQueryUnified",
      "environment": "production",
      "resourceName": "redacted",
      "incomingPorts": [],
      "incomingOnSuccessEdges": []
    }
  ],
  "triggerWebhooks": [
    {
      "name": "startTrigger",
      "uuid": "startTrigger",
      "inputSchema": {
        "properties": [
          { "name": "subscriptions", "type": "array", "required": false }
        ]
      },
      "useHeaderApiKey": false,
      "exampleInputJSON": ""
    }
  ],
  "customLibraries": [],
  "createdBy": 391070,
  "protected": false,
  "javascriptLanguageConfigurationSaveId": null,
  "pythonLanguageConfigurationSaveId": null,
  "setupScripts": {
    "python": { "codeString": "" }
  }
}

Very strange - the only thing that looks a bit odd to me is that your inputSchema for the webhook trigger (at the bottom) has an actual schema. I've never even seen that as an option!

You may be right, it seems like something outside of your actual flow is causing this. Though not an ideal solution, what about adding a DB step prior to your insert to check for the existence of the ID and, if it's already there, do nothing? That could be done via an if block that routes the duplicate run into a dead end. You might also be able to do it via the query itself: if you go to the settings and set finally to continue, the query gets an extra onError connector. You could dead-end that (or use it to log/notify), while the first, successful query would continue on the intended path.
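
If the error path is used to log or notify, it may help to separate the known duplicate-key failure from any other error. The sketch below is only an illustration: it assumes the error message is available to a JavaScript block as a string (the thread does not confirm how Retool exposes it), and `duplicatedKey` and `classifyInsertError` are hypothetical helpers. The pattern is taken from the messages in the run history above.

```javascript
// Matches the message seen in the run history, e.g.
// "Duplicate entry '810972e8' for key 'PRIMARY'".
const DUPLICATE_KEY_PATTERN = /Duplicate entry '([^']+)' for key 'PRIMARY'/;

// Hypothetical helper: returns the duplicated key, or null for any other error.
function duplicatedKey(errorMessage) {
  const match = DUPLICATE_KEY_PATTERN.exec(String(errorMessage));
  return match ? match[1] : null;
}

// Hypothetical routing decision for the error path.
function classifyInsertError(errorMessage) {
  return duplicatedKey(errorMessage) !== null ? "ignore-duplicate" : "raise";
}

console.log(classifyInsertError("Error: Duplicate entry '810972e8' for key 'PRIMARY' (line 6)")); // prints ignore-duplicate
console.log(classifyInsertError("Connection refused")); // prints raise
```

Only the duplicate-key case is dead-ended here; anything else is still surfaced, so a genuine failure is not silently swallowed.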

Thank you @MikeCB

I think because the executions are seconds apart, there's still a chance that the first one will be running, and not having done the INSERTs, when the block attempts to check if the IDs are already in the DB (essentially trying to detect a race condition).
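
The race described above can be illustrated with a small, synchronous simulation. This is a sketch under stated assumptions, not Retool's behaviour: the table is an in-memory stand-in, and all function names are hypothetical. It contrasts two executions whose checks both happen before either insert with two executions that run strictly one after the other.

```javascript
// In-memory stand-in for a table with a primary key (hypothetical, not Retool's API).
function makeTable() {
  const rows = new Set();
  return {
    exists: (id) => rows.has(id),
    insert: (id) => {
      if (rows.has(id)) throw new Error(`Duplicate entry '${id}' for key 'PRIMARY'`);
      rows.add(id);
    },
  };
}

// Performs the insert step of one execution, given what its earlier check saw.
function insertUnlessSeen(table, id, sawRow) {
  if (sawRow) return "skipped";
  try {
    table.insert(id);
    return "inserted";
  } catch (err) {
    return `failed: ${err.message}`;
  }
}

// Both executions run their existence check before either one inserts.
function simulateOverlappingRuns(id) {
  const table = makeTable();
  const firstSawRow = table.exists(id);
  const secondSawRow = table.exists(id);
  return [
    insertUnlessSeen(table, id, firstSawRow),
    insertUnlessSeen(table, id, secondSawRow),
  ];
}

// The second execution starts only after the first has finished.
function simulateSequentialRuns(id) {
  const table = makeTable();
  const first = insertUnlessSeen(table, id, table.exists(id));
  const second = insertUnlessSeen(table, id, table.exists(id));
  return [first, second];
}

console.log(simulateOverlappingRuns("810972e8"));
console.log(simulateSequentialRuns("810972e8"));
```

In the overlapping case the second execution still hits the duplicate-key error despite the pre-check, which is the concern raised above. If the database is MySQL, as the error format suggests, statements such as `INSERT IGNORE` or `INSERT ... ON DUPLICATE KEY UPDATE` let the database resolve the duplicate atomically instead of relying on a separate check.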

I'm also concerned about increasing our usage of Retool Workflows when this type of issue is present in the underlying execution engine. Trying to add protective queries to workflows doesn't seem like a viable option for adoption.

I had assumed that the inputSchema was derived from the Test JSON Parameters in the startTrigger block.

Completely agree re: concerns about using Retool Workflows going forward if they are no longer maintaining them or addressing bugs. I have been moving things from PA to Retool Workflows because I find them more flexible, but if support is no longer... supporting them, it might be time to start looking for alternatives.

I see what you're saying about the pre-check for the ID, but you could still potentially use the Settings/Finally/Continue option to handle the bug - hopefully only until it's properly addressed. I'm with you, I don't like having to put in extra code to handle bugs, doesn't feel very sustainable. If you ever figure out what was causing it, please post an update!

Hi Nick — engineer from the Workflows team here. I'll reach out to you via email for more details.

If you are running into this issue, please double-check whether any of your global error handler blocks have parents, and remove all of those connections if so. We are actively working on fixing a bug where certain executions that have global error handler blocks with parents behave incorrectly.

We have also disabled this type of connection in the editor so they cannot be created in the future. Having a parent block connected to a global error handler block has no benefit since the global error handler will not be triggered when the parent block has been run successfully.
