Pipe '|' characters in Agent output when using markdown tables

James_Tuxbury · July 24, 2025, 2:53pm

Hello retool forum,

I am seeing an issue with the output from Agents. The agents tend to use markdown tables in their output, but when the data contains pipe '|' characters it breaks. Has anyone had any success getting the agent to correctly handle these characters?

I have tried:

- Adding a line in the instructions about formatting the data for markdown.
- Having output instructions in a vector telling it to escape the pipe characters in markdown.
- Adding the instruction to escape the pipe characters to the description of the tool itself.

However, it continues to output broken Markdown tables. I was able to get it to escape the pipe characters correctly once by asking it directly in the chat to provide the same output with the pipes escaped. But in subsequent tests, when I asked it to do the same thing, it simply dropped the pipe character and everything after it in the data.

bobthebear · July 24, 2025, 7:59pm

what do you mean by 'it breaks'? does the run fail or does it not display correctly?

you could try putting the following in your prompt:

"Always use the HTML entity number for any non-alphanumeric characters in your output. Do not use the actual characters.
Example:
- Instead of `|`, use `&#124;`
- Instead of `©` , use `&#169;`
- Instead of `™` , use `&#8482;`
- Instead of `&` , use `&#38;`"

depending on how your prompt is designed there are places and ways of adding the above that are better than others:

If you have a highly structured prompt with defined sections

### Instructions and Constraints:
1. Always represent all non-alphanumeric characters using their HTML Entity Number (e.g., &#38; for &, &#60; for <) in your output. Do not use the actual character.
2. ... (other rules)

### Task:
[Your main task description...]

If you start out with assigning a role you can try one of the following:

You are a [your stuff here]

IMPORTANT: Always use the HTML entity number for any non-alphanumeric characters in your output. Do not use the actual characters.
Example:
- Instead of `|`, use `&#124;`
- Instead of `©` , use `&#169;`
- Instead of `™` , use `&#8482;`
- Instead of `&` , use `&#38;`

Analyze and understand the instructions in full before following in order:
[instructions/steps to accomplish something]

IMPORTANT: Always use the HTML entity number for any non-alphanumeric characters in your output. Do not use the actual characters.
Example:
- Instead of `|`, use `&#124;`
- Instead of `©` , use `&#169;`
- Instead of `™` , use `&#8482;`
- Instead of `&` , use `&#38;`

You are a [your stuff here]

Analyze and understand the following instructions in full before following in order:
[instructions/steps to accomplish something]

James_Tuxbury · July 24, 2025, 8:25pm

Thanks for the reply @bobthebear. I will try the HTML entity and get back to you. This is what I mean about the markdown breaking

In this example we have the listing skus: 08213-10W30 | 2 and A0008 | 4 which when put into markdown ends up shifting the data over a column.
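
To make the column shift concrete, here is a minimal Python sketch of how a renderer splits a table row on every unescaped pipe. The column names (`SKU`, `Qty`, `Price`), the numeric values, and the `split_row` helper are assumptions for illustration only; they are not taken from the actual table or from Retool's renderer.

```python
import re


def split_row(row: str) -> list[str]:
    """Split a markdown table row on unescaped pipes, roughly as a renderer would."""
    inner = row.strip()
    if inner.startswith("|"):
        inner = inner[1:]
    # Drop the closing pipe unless it is an escaped content pipe.
    if inner.endswith("|") and not inner.endswith("\\|"):
        inner = inner[:-1]
    # A pipe preceded by a backslash is treated as cell content, not a separator.
    return [cell.strip() for cell in re.split(r"(?<!\\)\|", inner)]


# Hypothetical three-column table: one SKU column and two numeric columns.
header = "| SKU | Qty | Price |"
broken = "| 08213-10W30 | 2 | 5 | 9.99 |"      # SKU contains a raw pipe
escaped = "| 08213-10W30 \\| 2 | 5 | 9.99 |"   # same SKU with the pipe escaped

print(len(split_row(header)))   # 3 cells
print(len(split_row(broken)))   # 4 cells, so every value shifts one column right
print(len(split_row(escaped)))  # 3 cells again
```

The raw pipe inside the SKU produces one more cell than the header has columns, which is why the remaining values slide over by one.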

bobthebear · July 25, 2025, 1:37am

ahhh I see, the HTML entity might not work correctly in this context, or it might be sort of iffy. technically it should work? but what the Agent does is sorta up to it..... a more targeted solution might be better here since we can guarantee we won't modify any of the actual data, just its formatting. occasionally, in complex prompts with lots of small details, some of the small things can get skipped. you might also have to deal with hallucinations when it thinks a rule must be applied somewhere but can't figure out how, so it forces it (like cutting and gluing random puzzle pieces from different puzzles together and using them as a replacement for a lost piece.... you technically finished the puzzle... but did you?)

since this is a rendering issue, not a model response issue (technically it's outputting correct Markdown, the | is just being read as a column separator instead of part of a column value), instead of trying to stick a formatting/escaping instruction in the prompt it might be more reliable to give the agent a tool named 'process_response' or something and pass along the full response.

The Simple Solution (not flexible)

.... ok so I was gonna just leave it at that, but then I realized I could actually make use of something like this. There's actually a really interesting problem here since we don't know what column(s) can have a | in their value. I eventually noticed your first column is for SKUs and the other 2 are numerical.... so the solution is simpler if we know for a fact only the left column can contain spare |'s.
markdown_table_processor_simple.csv (13.3 KB)

"""
processes markdown text that may contain tables with problematic pipe
characters within cell content. It intelligently distinguishes between structural
pipes (table separators) and content pipes (data within cells) to preserve table
structure while escaping problematic pipes as HTML entities.

Args:
    text (str): Input markdown text that may contain one or more tables with 
               pipe characters in cell content
    
Returns:
    str: Processed markdown text with:
         - Pipe characters in cell content escaped as HTML entities (&#124;)
         - Cell values wrapped in HTML <code> tags for safe rendering
         - Table structure preserved and properly formatted
         - Non-table content left unchanged

Example:
    >>> markdown_text = '''
    ... # My Document
    ... 
    ... | Product | SKU | Price |
    ... |---------|-----|-------|
    ... | Widget | ABC|123 | $29.99 |
    ... 
    ... Some other content.
    ... '''
    >>> result = process_markdown_with_pipe_escaping(markdown_text)
    >>> print(result)
    # My Document
    
    |<code>Product</code>|<code>SKU</code>|<code>Price</code>|
    |<code>---------</code>|<code>-----</code>|<code>-------</code>|
    |<code>Widget</code>|<code>ABC&#124;123</code>|<code>$29.99</code>|
    
    Some other content.
"""

yay this is neat. However, it merges "columns" from left to right assuming the 1st column will most likely be the issue....
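
For anyone who doesn't want to open the attachment, the left-to-right merge idea can be sketched roughly like this. This is not the code from the attached file, just a minimal illustration: `fix_row` and `fix_table` are made-up names, it assumes the first line of the table is the header, and it skips the `<code>` wrapping shown in the docstring above.

```python
def fix_row(row: str, expected_cols: int) -> str:
    """Fold any excess cells into the first column, escaping the pipes between them."""
    cells = row.strip().strip("|").split("|")
    extra = len(cells) - expected_cols
    if extra <= 0:
        return row  # nothing to repair
    # Assumption: only the leftmost column can contain stray pipes.
    first = "&#124;".join(cells[: extra + 1])
    return "|" + "|".join([first] + cells[extra + 1:]) + "|"


def fix_table(table: str) -> str:
    """Repair every row of a single table, using the header row as the column count."""
    lines = table.strip().splitlines()
    expected = len(lines[0].strip().strip("|").split("|"))
    return "\n".join(fix_row(line, expected) for line in lines)


# Hypothetical column names and numbers; only the SKU comes from the thread.
table = """
| SKU | Qty | Price |
|-----|-----|-------|
| 08213-10W30 | 2 | 5 | 9.99 |
"""
print(fix_table(table))
```

The header and separator rows already have the expected number of cells, so they pass through unchanged; only the data row gets its first two cells joined with `&#124;`.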

Hmmmmmm

so why am I still typing? lol

well, I thought about this solution and I decided that since you used the wording

In this example

that perhaps this was just the simplest example to show the problem with. You could have many more columns that all could possibly contain extra |s, and it might be a column in the middle of the table, and the | might occur near the beginning or end of a column's value. so I explored a few options
comprehensive_comparison.csv (8.7 KB)

Spatial Analysis

Context-based pipe classification
- Spatial Analysis examines the positional context around pipe characters to determine their purpose. It analyzes character distances, groupings, and patterns to classify pipes as either structural (table separators) or content (data within cells).
Implementation:
- Scored each cell based on fragment likelihood (length, capitalization, position)
- Used greedy algorithm to iteratively merge highest-scoring adjacent pairs
- Applied heuristics like "short cells are fragments" and "middle cells more likely to split"
- Made quick local decisions without considering global optimality
Pros:
Context-Aware: Considers surrounding characters and spacing
Pattern Recognition: Excellent for emails, URLs, file paths
Distance Analysis: Can identify unusual spacing patterns
Character Classification: Distinguishes word boundaries and types
Cons:
Complex Logic: Many heuristic rules to maintain
False Positives: May misclassify legitimate structural pipes
Performance: Multiple character analysis passes
Edge Cases: Struggles with inconsistent spacing in tables

Greedy Fragment Merging

Heuristic-based sequential merging
- A greedy algorithm that analyzes individual cells for "merge likelihood" and makes locally optimal decisions to combine cells that appear to be fragments.
Implementation:
- Scored each cell based on fragment likelihood (length, capitalization, position)
- Used greedy algorithm to iteratively merge highest-scoring adjacent pairs
- Applied heuristics like "short cells are fragments" and "middle cells more likely to split"
- Made quick local decisions without considering global optimality
Pros:
Fast Performance: O(n) greedy decisions
Simple Logic: Clear heuristics easy to understand
Good for Common Cases: Works well for obvious fragments
Low Memory: No combinatorial explosion
Cons:
Locally Optimal: May miss globally better solutions
Order Dependent: Results can vary based on processing order
Limited Lookahead: Can't consider complex multi-cell patterns
Heuristic Bias: May consistently make suboptimal choices
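
A rough sketch of the greedy idea, under heavy assumptions: the scoring function here is a made-up stand-in (shorter adjacent cells are treated as more likely to be fragments of one value), not the scoring from the attached files, and `fragment_score` and `greedy_merge` are hypothetical names.

```python
def fragment_score(left: str, right: str) -> float:
    """Hypothetical heuristic: short neighbouring cells are likelier to be one split value."""
    return 1.0 / (1 + len(left.strip()) + len(right.strip()))


def greedy_merge(cells: list[str], expected_cols: int) -> list[str]:
    """Merge the highest-scoring adjacent pair until the row has the expected width."""
    cells = list(cells)
    while len(cells) > expected_cols:
        best = max(
            range(len(cells) - 1),
            key=lambda i: fragment_score(cells[i], cells[i + 1]),
        )
        # Local decision only: no lookahead, no comparison against other merge plans.
        cells[best:best + 2] = [cells[best] + "&#124;" + cells[best + 1]]
    return cells


print(greedy_merge(["Widget", "ABC", "123", "$29.99"], 3))
print(greedy_merge(["08213-10W30", "2", "5", "9.99"], 3))
```

With this particular heuristic the first row comes out right (`ABC` and `123` are merged), but the second row merges the two short numeric cells instead of the SKU and its fragment, which is a small demonstration of the heuristic bias listed in the cons.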

Pattern Matching

Exhaustive combinatorial optimization
- generates all possible ways to merge excess cells, scores each combination using semantic analysis, and selects the globally optimal result.
Implementation:
- Generated all mathematically possible merge combinations using itertools.combinations
- Applied comprehensive scoring based on content semantics (SKUs, emails, prices)
- Used column position analysis (first=names, last=status, middle=data)
- Selected globally optimal solution through exhaustive evaluation
Pros:
Globally Optimal: Finds the best possible solution
Semantic Intelligence: Understands content patterns (SKUs, emails)
Comprehensive Scoring: Multiple evaluation criteria
Robust: Handles complex edge cases reliably
Cons:
Performance Cost: Exponential complexity for many excess cells
Memory Usage: Stores all combinations
Complex Scoring: Many interacting scoring rules
Overkill: May be excessive for simple cases
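
A minimal sketch of the exhaustive approach, assuming a much simpler scoring rule than the semantic scoring described above: it only checks whether each resulting column looks numeric where a numeric column is expected. `NUMERIC`, `score`, `best_merge`, and the `numeric_cols` parameter are all hypothetical names for illustration, not the attached implementation.

```python
import re
from itertools import combinations

# Hypothetical notion of "numeric": optional dollar sign, digits, optional decimals.
NUMERIC = re.compile(r"^\$?\d+(\.\d+)?$")


def score(candidate: list[str], numeric_cols: set[int]) -> int:
    """Count columns whose numeric/non-numeric look matches what is expected there."""
    return sum(
        bool(NUMERIC.match(value.strip())) == (idx in numeric_cols)
        for idx, value in enumerate(candidate)
    )


def best_merge(cells: list[str], expected_cols: int, numeric_cols: set[int]) -> list[str]:
    """Try every way of keeping expected_cols - 1 separators and return the best-scoring row."""
    n = len(cells)
    if n <= expected_cols:
        return list(cells)
    best, best_score = list(cells), -1
    # Each combination picks which cell boundaries stay structural separators.
    for cuts in combinations(range(1, n), expected_cols - 1):
        bounds = (0, *cuts, n)
        candidate = ["&#124;".join(cells[a:b]) for a, b in zip(bounds, bounds[1:])]
        current = score(candidate, numeric_cols)
        if current > best_score:
            best, best_score = candidate, current
    return best


# Columns 1 and 2 are assumed numeric, as in the SKU table from the thread.
print(best_merge(["08213-10W30", "2", "5", "9.99"], 3, {1, 2}))
```

Because every candidate split is scored, the row that the greedy heuristic can get wrong is resolved to the SKU plus two numeric columns here, at the cost of evaluating every combination of separators.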

Simple Merge

Basic left to right merge
Implementation:
- Identified tables and expected column counts through structure analysis
- When excess cells detected, merged all extra cells into the leftmost column
- Preserved rightmost cells assuming they're more likely to be correct
- Applied simple pipe escaping without complex decision logic

Best Solution (IMO): Pattern Matching

your example seems to be some sort of financial data, so I assumed accuracy is the most important factor here.
here's a full implementation:
markdown_table_processor_fancypants.csv (20.1 KB)
test_markdown_processor.csv (6.3 KB)
demo_usage.csv (2.6 KB)

I didn't wanna type this next one out, so yay Claude!

Note:
sorry for the .csv file type but it doesn't allow me to upload .txt for some reason (I fully understand not allowing code files, but txt files??). Anyway, just rename them with either .py or .txt if you want to use them directly or u can just open the csv in notepad and copy/paste to wherever

James_Tuxbury · July 25, 2025, 2:53pm

Thanks for the very detailed reply @bobthebear

I will get back to you once I try this out.

Jack_T · August 5, 2025, 5:37pm

Thank you so much for the detailed write up @bobthebear!

Hi @James_Tuxbury, I am curious if this issue occurs across LLM model selection. What model are you using when you get this issue and have you changed models across providers?

If we can isolate this to how Retool is formatting the data response from the model, then we should be able to figure out why the pipe character is breaking tables and why it can't be easily replaced with another character in the model's instructions.

James_Tuxbury · August 8, 2025, 2:27pm

Hello and thanks for the reply.

So far I have been using GPT 4o mini mostly. I briefly tried Claude but ran into the token limits more frequently. I will test some more models and let you know what happens. My underlying data has an unusually high number of pipe characters in it, so I'm seeing this issue a good bit because the agent always outputs the tabular data in markdown. It seems to do a good job of following most of the instructions, but for whatever reason will not follow the instruction to escape the pipe characters.

James_Tuxbury · August 14, 2025, 10:04pm

Hello @Jack_T

I just tried this again using Claude instead and it actually worked fine, with the pipe characters escaped, without having to edit the prompt or anything else. It seems like this is an LLM issue and not a Retool one.

Jack_T · August 14, 2025, 11:49pm

That is so great to hear @James_Tuxbury!

Very odd, glad it seems to not be Retool specific.

bobthebear · August 15, 2025, 12:35pm

I gotta learn to start small. I'm really happy you found how to fix it reliably!!

James_Tuxbury · August 15, 2025, 1:43pm

Thanks and I appreciate your effort @bobthebear !

James_Tuxbury · August 19, 2025, 9:33pm

Quick update @Jack_T

I changed the model to gpt 5 since retool supports it now and it seems to be able to correctly escape the pipe characters as well.

Jack_T · August 20, 2025, 5:09pm

Great news to hear @James_Tuxbury!

I am glad all that capital OpenAI has put into their new model has allowed them to overcome this pipe issue
