AI as interface to Database

Continuing the discussion from Open AI querying database:

I was wondering the same thing as Andrew described in the post above (now closed).

I admit I haven't read up on Vectors, which are mentioned in the answer, but that doesn't seem to be what I want.

My expectation (and I assume Andrew's) is that the AI should be able to fulfill the role of a "smart" person, who acts as a go-between, extracting data from the database according to requests from a user.

The AI should be able to query the database, acquire all the table information and relationships between them. It should then be able to accept a common language enquiry from a user, and generate the corresponding response from the available data.

Obviously the enquiry would have to contain valid keywords matching the keys/data in the tables. Then the AI should be able to take care of the more "technical" side of creating the right queries and handling the response from the database.

E.g. we ask... "Show me all the bookings for 2025"
The AI understands that "bookings" information is in the Bookings table and that 2025 is a year, hence stored under the key "Year".
Being smart the AI asks "Sure - what would you like to know about those bookings?"
We reply "Show me the start date and the customer who made the booking."
So the AI figures out it needs to find keys like "start date" and "customer". It can query the bookings table accordingly to find the start dates, and resolve customer_Id to customer_name via a JOIN or a separate query to the customer table.
And so on... add columns, create summations, add or remove data...
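
To make this concrete, the kind of SQL the AI might end up generating for that exchange could look like the sketch below (the bookings/customers schema is purely hypothetical, inferred from the example):

-- Hypothetical schema, inferred from the example above:
--   bookings(booking_id, year, start_date, customer_id)
--   customers(customer_id, customer_name)
SELECT
  b.start_date,
  c.customer_name
FROM bookings AS b
JOIN customers AS c
  ON c.customer_id = b.customer_id
WHERE b.year = 2025
ORDER BY b.start_date;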

Isn't that something an AI should be able to do?

Hey there @davblo,

I think this is definitely doable.

I remember @kent's demo showing exactly what you're describing, and he shared the workflow and the app there. His demo and explanation can be found here: https://youtu.be/e7Bs5qiDqpk?si=pYoMYsm4G2MiFBc6&t=1562

So basically you would create a couple of tools:

  • text-to-SQL, which would do what you mention here:

It should then be able to accept a common language enquiry from a user, and generate the corresponding response from the available data.

  • This tool is definitely a workflow, where you pass your database schema within the context window, and potentially some examples of the data (although I still need to understand the best practices for prompt engineering with examples: is it better in the system context, within the prompt, or something else entirely?)

As an example, if you're using Postgres you can query

SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_schema, table_name, ordinal_position;

to get the schema, and

SELECT
  tc.table_name AS source_table,
  kcu.column_name AS source_column,
  ccu.table_name AS target_table,
  ccu.column_name AS target_column,
  rc.update_rule,
  rc.delete_rule
FROM 
  information_schema.table_constraints AS tc
JOIN 
  information_schema.key_column_usage AS kcu
  ON tc.constraint_name = kcu.constraint_name
  AND tc.table_schema = kcu.table_schema
JOIN 
  information_schema.referential_constraints AS rc
  ON tc.constraint_name = rc.constraint_name
  AND tc.table_schema = rc.constraint_schema
JOIN 
  information_schema.constraint_column_usage AS ccu
  ON rc.unique_constraint_name = ccu.constraint_name
  AND rc.unique_constraint_schema = ccu.constraint_schema
WHERE 
  tc.constraint_type = 'FOREIGN KEY'
  AND tc.table_schema = 'public'
ORDER BY 
  source_table, source_column;

to get foreign key relationships.

This should provide a good starting point for the LLM to draft a SQL query that pulls the results you're looking for. The challenge will be in providing enough context around your tables' columns so that it can really infer from natural language what the user is searching for.
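
One way to provide that extra context, at least in Postgres, is to attach comments to your tables and columns in the database itself and include them in the schema you pass to the model. A minimal sketch (the bookings table and the comment texts are hypothetical):

-- Hypothetical example: document the schema in the database itself
COMMENT ON TABLE bookings IS 'One row per customer booking';
COMMENT ON COLUMN bookings.start_date IS 'First day of the booking';

-- Read the comments back out alongside the column list
SELECT
  c.table_name,
  c.column_name,
  c.data_type,
  col_description(format('%I.%I', c.table_schema, c.table_name)::regclass, c.ordinal_position) AS column_comment
FROM information_schema.columns AS c
WHERE c.table_schema = 'public'
ORDER BY c.table_name, c.ordinal_position;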

I think that with Retool's evals and data sets you can further improve this. Or maybe not?? @kent, sorry if I double tag you, but are evals related only to question -> tool, or can they go deeper, i.e. question -> expected output?

Thanks!

Get User Select Tool (Custom Tool): [screenshot]

Get User Tool (Workflow): [screenshot]

runPostgresQuery (Function tool): [screenshot]


This Agent will first generate a SQL command to find a user with the given attribute and value, then it will run the generated command.



Wow, thanks - that'll take some time to digest! 🙂

Meanwhile, I'm using MariaDB rather than Postgres, so I'm starting from Miguel's interesting suggestion about retrieving foreign key details - I think that's useful in its own right as reference material about a database. So here is the equivalent for my MariaDB (thanks to ChatGPT)...

SELECT
  kcu.TABLE_NAME AS source_table,
  kcu.COLUMN_NAME AS source_column,
  kcu.REFERENCED_TABLE_NAME AS target_table,
  kcu.REFERENCED_COLUMN_NAME AS target_column,
  rc.UPDATE_RULE AS update_rule,
  rc.DELETE_RULE AS delete_rule
FROM
  information_schema.KEY_COLUMN_USAGE AS kcu
JOIN
  information_schema.REFERENTIAL_CONSTRAINTS AS rc
  ON kcu.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
  AND kcu.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
WHERE
  kcu.REFERENCED_TABLE_NAME IS NOT NULL
  AND kcu.TABLE_SCHEMA = 'your_database_name'
ORDER BY
  source_table, source_column;

Still on the slight digression: I now have an improved query which shows all columns in all tables, with information about constraints and foreign key references.

I put the result in a table with rows grouped on table_name, and I find it a quick and easy way to browse the database schema, e.g. to check the cross references in case I've forgotten parts I created months ago, and to check whether I forgot to set any constraints.

As before it is specific to MariaDB, but if you show it to ChatGPT it will be more than happy to convert it to your particular flavour of database.

SELECT
  isc.TABLE_NAME AS table_name,
  isc.COLUMN_NAME AS column_name,
  isc.DATA_TYPE AS data_type,
  iskcu.CONSTRAINT_NAME AS constraint_name,
  iskcu.REFERENCED_TABLE_NAME AS target_table,
  iskcu.REFERENCED_COLUMN_NAME AS target_column,
  isrc.UPDATE_RULE AS update_rule,
  isrc.DELETE_RULE AS delete_rule
FROM
  information_schema.COLUMNS AS isc
LEFT JOIN
  information_schema.KEY_COLUMN_USAGE AS iskcu
  ON iskcu.TABLE_NAME = isc.TABLE_NAME
  AND iskcu.COLUMN_NAME = isc.COLUMN_NAME
LEFT JOIN
  information_schema.REFERENTIAL_CONSTRAINTS AS isrc
  ON iskcu.CONSTRAINT_NAME = isrc.CONSTRAINT_NAME
  AND iskcu.CONSTRAINT_SCHEMA = isrc.CONSTRAINT_SCHEMA
WHERE
  isc.TABLE_SCHEMA = '<your-database-name>'
ORDER BY
  isc.TABLE_NAME, isc.COLUMN_NAME, isc.DATA_TYPE;

Back to the main point...

Yes, Kent's presentation was certainly relevant to this.

It will be a while before I understand enough to be able to delve into such things myself, but it's good to know the possibilities are there. And I'll be watching out for more examples of how it can work in practice.


If you want to convert mine to MariaDB, just so you have another option to test out: anywhere you see POSTGRESQL, change it to MariaDB.

We're actually using almost the same data about the database; the only difference is that I only pass along data from information_schema.COLUMNS. I noticed this info already contains which table each column is in, so there's no need to pass along information_schema.TABLES as well.

The real problem in both scenarios will be the context size: as your DB grows, you can easily end up with a context too large for the AI model. If you post-process the results even further and remove any columns from tables that won't be needed, you could save some space. For example, I'm not entirely sure foreign key references or constraints are needed; you can get the same thing across by sending the model examples of invalid input, and as long as your columns have relevant names the model should be able to figure out the connections.

For instance, if one table has columns (user_id, user_email, user_name) and another table has columns (user_id, street_address, zip_code, state), the model should be able to tell that user_id in both tables refers to the same thing and is some sort of foreign key (it can even infer types by analyzing the column data).
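
As a sketch of that post-processing idea, you could trim the schema dump to just the tables a given question touches before it goes into the prompt (the table names here are hypothetical, matching the user/address example above):

SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('users', 'addresses')  -- keep only the tables relevant to the question
ORDER BY TABLE_NAME, ORDINAL_POSITION;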

If you want to take this a step further: take your generated MariaDB command, add EXPLAIN EXTENDED to the beginning of it, run that on the database, then give the results to the model with the instruction to use them to improve the command.
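
For instance (a sketch against the hypothetical users/addresses tables above; in MariaDB, running SHOW WARNINGS right after EXPLAIN EXTENDED returns the optimizer's rewritten form of the query, which makes useful feedback for the model):

EXPLAIN EXTENDED
SELECT u.user_name, a.state
FROM users AS u
JOIN addresses AS a ON a.user_id = u.user_id;

-- The optimizer's rewritten query shows up as a note here:
SHOW WARNINGS;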
Generate User Select for POSTGRESQL.json (71.8 KB)


Just to add my two cents - I've had success doing similar things via AI-driven workflows, so I'm confident that agents would be able to do it even better. The key is typically exposing your database schema to the LLM, either as part of its system prompt or as a vector. With that context, the agent should easily be able to write and execute SQL queries against your resource.

I'm happy to answer any specific questions, though!


ohhhhhhhhhhh the database schema as a vector is a neat idea!! Hadn't thought of that!
