Hello,
I am using the 'Extract Entity from Text' AI action in a workflow. In this block, the objective is to extract the full name of someone from a paragraph of text. Unlike the "generate text" action, this one doesn't have a field for a system prompt. I have tried putting guidance to the model into the "input" field, like this:
Debtor Information: {{ value.debtor_paragraph }}
SYSTEM INSTRUCTIONS:
Return just the full name. If you see two names like "John Paul Smith and Mary Scott", just return the first name "John Paul Smith." ... Do not return results like "The Estate of John Paul Smith", instead just return "John Paul Smith"...
I am not confident this approach is working very well, as sometimes the entities extracted contain the very errors I attempt to guide against in the system prompt. I am able to deal with these later in a subsequent step in the workflow that cleans up the extracted entities in JavaScript, but I am wondering if there is any guidance on how to improve the performance of entity extraction. The documentation does not say anything about system prompts.
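For what it's worth, the cleanup step described above can be fairly compact. This is just a sketch of one way to do it; `cleanFullName` and its input are hypothetical names, and the rules simply mirror the two failure cases mentioned here (the "X and Y" case and the "The Estate of X" case):

```javascript
// Hypothetical post-processing for the extracted entity. "extracted" stands
// in for whatever variable holds the AI action's output in the workflow.
function cleanFullName(extracted) {
  let name = extracted.trim();
  // Strip wrappers like "The Estate of John Paul Smith"
  name = name.replace(/^the\s+estate\s+of\s+/i, '');
  // If two names are joined ("John Paul Smith and Mary Scott"), keep the first
  name = name.split(/\s+and\s+/i)[0];
  // Drop a trailing period the model sometimes adds
  return name.replace(/\.$/, '');
}

cleanFullName('The Estate of John Paul Smith and Mary Scott.');
// returns 'John Paul Smith'
```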
Also, the way it works, the only way to tell the system about the entity you want to extract is in the label in the 'entities to extract' input field. That's fine if you're extracting something where the name of the entity is obvious and self-descriptive, like 'telephone_number', but imagine if you wanted to extract, say, account_number or invoice_number -- you might want to guide the system that invoice numbers always begin with INV- or that account numbers will always be 10 characters long, etc. Again, just curious for advice on how to improve performance of entity extraction. Thanks.
OK, but if you (not specifically you fenix, but whoever's reading this) are considering using or integrating AI, please read over the page just once. They took the time to help keep us safe, and to help make sure some underpaid content writer gets to keep their job.
I mention this because of first and last names. They're considered PII (Personally Identifiable Information), and sharing them under certain circumstances and/or in specific countries is a BIG federal no-no. In general, though, we don't want the model to read a name and accidentally associate it with something unrelated; more likely, it'd just degrade model accuracy, possibly compounding with every fine-tuning and ending in hallucinations, lies, and GPT-2, lol.
A strategy to get around this is to substitute every name with an ID (like a hash map would, or a UUID/GUID), then store that mapping in a Retool DB or something.
OK, so the secret to a label like account_number or something would actually be regex =).
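For example, a small validation pass after extraction could encode the constraints from the question above. The patterns and function name here are just illustrative assumptions (invoice numbers start with INV-, account numbers are exactly 10 characters):

```javascript
// Hypothetical per-label validation patterns mirroring the examples
// discussed in this thread.
const patterns = {
  invoice_number: /^INV-\d+$/,      // must start with "INV-"
  account_number: /^[A-Z0-9]{10}$/, // must be exactly 10 characters
};

function isValidEntity(label, value) {
  const pattern = patterns[label];
  // Labels with no pattern defined pass through unchecked
  return pattern ? pattern.test(value.trim()) : true;
}

isValidEntity('invoice_number', 'INV-00417'); // true
isValidEntity('account_number', 'ABC123');    // false (not 10 characters)
```

Entities that fail the check could then be flagged or re-run rather than passed downstream.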
I can definitely make a feature request for another input field that could be used to provide additional context to better define the 'entity to extract'!
Currently it works for simple examples, but it could be much more refined with additional context, as you mentioned in your example with prefixes such as INV- or defining the length of a number, etc.
I believe that under the hood the whole 'input' field is sent to the LLM, which then tries to parse out instructions in the opening paragraph and apply those rules/logic to the rest of the information in the 'input' field.
AI tools are new, so we appreciate the feedback as we try to improve and fine-tune the UI so users can best set up their payload to be properly utilized by the models!