Extract data from markdown table

Does anyone have a good suggestion for extracting data formatted in a markdown table into a Retool table or database?

I'm working on an AI-assisted workflow that analyzes large bodies of text and produces output for review. My workflow sends a series of yes/no questions to OpenAI along with specific vectors, and it generates a series of answers formatted into a markdown table with the question, answer, reasoning, etc.

I want to extract these answers to a table/database as individual records for further processing and to append them with more metadata about the job, vectors used in the response, etc.

I've tried the entity extraction AI action, but it doesn't work for the length and nuance of the responses I'm generating. I'm hoping that by spitting out the responses in a consistent, accurate markdown format, I can simply transform the generated results into a real data source. But not sure how...

Any guidance is greatly appreciated! Thanks!

Hi @hstan,

Sure, I have 3 suggestions for you.

Option 1 is to use a regular expression to extract the data from the markdown table. The Python code below extracts the data from a markdown table in the following format:

| Question | Answer | Reasoning |
|---|---|---|
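| What is the capital of France? | Paris | The capital of France is Paris. |

import re

def extract_markdown_table(markdown_table):
  """Extracts the data from a markdown table.

  Args:
    markdown_table: A string containing the markdown table.

  Returns:
    A list of dictionaries, where each dictionary represents a row in the table.
  """

  # Match each table line: optional indentation, then a leading and trailing pipe.
  row_regex = re.compile(r'^[ \t]*\|(.+)\|[ \t\r]*$', re.MULTILINE)
  # The separator row (|---|:---:|) contains only dashes, colons, pipes and spaces.
  separator_regex = re.compile(r'^[ \t|:-]*-[ \t|:-]*$')

  # Split every non-separator line into stripped cell values.
  # Note: cells containing an escaped pipe (\|) are not handled.
  rows = []
  for match in row_regex.finditer(markdown_table):
    if separator_regex.match(match.group(1)):
      continue
    rows.append([cell.strip() for cell in match.group(1).split('|')])

  # The first row holds the headers; pair them with each remaining row.
  if not rows:
    return []
  headers = rows[0]
  return [dict(zip(headers, row)) for row in rows[1:]]

# Extract the data from the markdown table.
# markdown_table is the string holding the table, e.g. the AI response text.
data = extract_markdown_table(markdown_table)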

Once you have extracted the data from the markdown table, you can insert it into a Retool table or database using the Retool API.

Option 2 is to use a Markdown parser to parse the markdown table and extract the data. There are many different Markdown parsers available, such as Python Markdown and Remark.

The following Python code uses the Python Markdown parser to extract the data from a markdown table:

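import markdown
from html.parser import HTMLParser

class TableCellParser(HTMLParser):
  """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

  def __init__(self):
    super().__init__()
    self.rows = []
    self.in_cell = False

  def handle_starttag(self, tag, attrs):
    if tag == 'tr':
      self.rows.append([])
    elif tag in ('th', 'td') and self.rows:
      self.in_cell = True
      self.rows[-1].append('')

  def handle_endtag(self, tag):
    if tag in ('th', 'td'):
      self.in_cell = False

  def handle_data(self, data):
    if self.in_cell:
      self.rows[-1][-1] += data

def extract_markdown_table(markdown_table):
  """Extracts the data from a markdown table.

  Args:
    markdown_table: A string containing the markdown table.

  Returns:
    A list of dictionaries, where each dictionary represents a row in the table.
  """

  # markdown.markdown() returns an HTML string. The 'tables' extension must be
  # enabled, otherwise pipe tables are not rendered as <table> elements.
  html = markdown.markdown(markdown_table, extensions=['tables'])

  # Read the cell text back out of the generated HTML.
  parser = TableCellParser()
  parser.feed(html)
  parser.close()
  if not parser.rows:
    return []

  # The first row holds the headers; pair them with each remaining row.
  headers = [cell.strip() for cell in parser.rows[0]]
  return [dict(zip(headers, (cell.strip() for cell in row))) for row in parser.rows[1:]]

# Extract the data from the markdown table.
# markdown_table is the string holding the table, e.g. the AI response text.
data = extract_markdown_table(markdown_table)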

Once you have extracted the data from the markdown table, you can insert it into a Retool table or database using the Retool API.
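For the insert step itself, a minimal sketch might look like the following. It uses Python's built-in sqlite3 module purely as a stand-in for whatever database sits behind your Retool app. The table name `answers`, its column names, and the sample record are all assumptions for illustration and are not part of any Retool API.

```python
import sqlite3

# Hypothetical records, shaped like the output of extract_markdown_table().
records = [
    {
        "Question": "What is the capital of France?",
        "Answer": "Paris",
        "Reasoning": "The capital of France is Paris.",
    },
]

# In-memory SQLite database, used only as a stand-in for the real target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (question TEXT, answer TEXT, reasoning TEXT)")

# Named placeholders let each dictionary be passed directly as parameters,
# and keep the cell text out of the SQL string itself.
conn.executemany(
    "INSERT INTO answers (question, answer, reasoning) "
    "VALUES (:Question, :Answer, :Reasoning)",
    records,
)
conn.commit()

# Read the rows back to confirm what was stored.
stored = conn.execute("SELECT question, answer, reasoning FROM answers").fetchall()
```

Using parameterized queries rather than string formatting matters here because the reasoning column is free text generated by the model and may contain quotes.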

Option 3 is to use a Retool plugin to extract data from markdown tables. One popular plugin is the Retool Markdown Table plugin, which extracts data from markdown tables into a Retool table or database. To use it, add the plugin to your Retool app and configure it to extract the data from the markdown table.

Once you have configured the plugin, you can insert the extracted data into a Retool table or database using the plugin's built-in functionality.

I hope this helps!

:grinning:

Patrick

Thanks for the awesome feedback!

What I ended up doing was asking OpenAI to return the results in JSON format with specific keys instead of markdown, and then used {{ JSON.parse(query.data) }} to format the results into a query I fed into a Retool table. It works!
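For anyone taking the same route outside of a Retool expression, here is a rough Python sketch of the parsing step. The key names, the sample response, and the `job_id` metadata field are assumptions for illustration; the real keys are whatever the prompt asks the model to return.

```python
import json

# Keys the prompt asks the model to include in every answer (assumed names).
REQUIRED_KEYS = {"question", "answer", "reasoning"}

def parse_answers(response_text, metadata):
    """Parses a JSON array of answers and attaches job metadata to each record."""
    records = json.loads(response_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of answer objects")

    parsed = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {index} is not a JSON object")
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"record {index} is missing keys: {sorted(missing)}")
        # Answer fields take precedence if a metadata key has the same name.
        parsed.append({**metadata, **record})
    return parsed

# Hypothetical model response and job metadata.
sample_response = (
    '[{"question": "Is the contract signed?", "answer": "yes", '
    '"reasoning": "Page 3 carries both signatures."}]'
)
rows = parse_answers(sample_response, {"job_id": "job-001"})
```

Checking the keys up front means a malformed response fails loudly before anything reaches the table, which is easier to debug than a row with missing columns.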
