To better learn Custom Components, I wanted to see if I could integrate an existing semantic classifier project I've been working on. I haven’t seen any posts discussing running ML models in a custom component, so I figured I’d share.
This kind of text classification task is similar to what you might use for a chatbot, support ticket routing, or applications where you need to generalize the meaning of short text.
This project implements three classifiers:
- Semantic - Uses a small language model (SLM) running with transformers.js to compare semantic meaning by similarity in vector space
- Keyword - Uses nlp.js to tokenize/stem input and look for matches
- LLM - Uses the OpenAI API to generate a classification
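The Keyword classifier's idea can be sketched without nlp.js. This is a minimal sketch and not the project's code. The crude `stem` function below stands in for nlp.js's real tokenizer and stemmer. `Intent`, `classifyByKeywords`, and the hit-count score are hypothetical names and choices made for illustration.

```ts
// Hypothetical intent shape, mirroring the intents list used by the component
interface Intent {
  name: string
  keywords: string[]
  examples: string[]
}

interface KeywordMatch {
  intent: string | null
  score: number // number of keyword hits for the winning intent
}

// Lowercase and split on anything that is not a letter or digit
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0)
}

// Crude suffix-stripping stemmer: a stand-in for nlp.js, NOT its real algorithm
function stem(token: string): string {
  if (token.length > 4 && token.endsWith('es')) return token.slice(0, -2) // taxes -> tax
  if (token.length > 3 && token.endsWith('s') && !token.endsWith('ss')) return token.slice(0, -1)
  return token
}

// Stem every token and pad with spaces so phrase matching respects word boundaries
function normalize(text: string): string {
  return ' ' + tokenize(text).map(stem).join(' ') + ' '
}

function classifyByKeywords(message: string, intents: Intent[]): KeywordMatch {
  const haystack = normalize(message)
  let best: KeywordMatch = { intent: null, score: 0 }
  for (const intent of intents) {
    // Count how many of this intent's stemmed keywords (or phrases) occur in the message
    const hits = intent.keywords.filter((kw) => haystack.includes(normalize(kw))).length
    if (hits > best.score) best = { intent: intent.name, score: hits }
  }
  return best
}
```

Because both the message and the keywords pass through the same stemmer, "Property taxes are too high" matches `tax`, `taxes`, and `property tax` alike.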
The neat part is that each classifier runs client-side in the browser, most importantly the SLM. Hugging Face's transformers.js library lets us load a remote model into memory and use it in our app.
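Loading the model is the expensive step, so it should happen only once per page. One way to guarantee that is to memoize the loader's promise. This sketch assumes the loader returns a promise, as transformers.js's `pipeline()` does. `createLazyLoader` is a hypothetical helper and is not part of transformers.js.

```ts
// Wrap an expensive async loader so it runs at most once.
// Concurrent callers share the same in-flight promise.
function createLazyLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null
  return () => {
    if (cached === null) {
      cached = load().catch((err) => {
        cached = null // forget the failed attempt so the next call can retry
        throw err
      })
    }
    return cached
  }
}

// Hypothetical usage with transformers.js:
// const getEmbedder = createLazyLoader(() =>
//   pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', { quantized: true }))
// const embedder = await getEmbedder() // first call downloads; later calls reuse it
```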
This project uses `Xenova/all-MiniLM-L6-v2`, a pre-trained sentence-transformers (SBERT) text embedding model converted to ONNX weights. The ONNX format lets it run efficiently in the browser using the WASM/WebGPU backends. Many different models could be used, but this one performs very well for its size; the q8 quantized version weighs in at only 22 MB!
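Much of that size saving comes from quantization. Storing each weight as an 8-bit integer instead of a 32-bit float cuts weight storage to roughly a quarter. The sketch below shows the idea with a simplified symmetric, per-tensor int8 scheme. The actual ONNX export may use a different variant, such as per-channel scales or uint8 with a zero point. `quantizeInt8` and `dequantizeInt8` are illustrative names.

```ts
// Simplified symmetric per-tensor int8 quantization: one scale for the whole tensor.
// Maps the largest-magnitude weight to +/-127 and rounds everything else.
function quantizeInt8(weights: Float32Array): { q: Int8Array; scale: number } {
  let maxAbs = 0
  for (let i = 0; i < weights.length; i++) maxAbs = Math.max(maxAbs, Math.abs(weights[i]))
  const scale = maxAbs === 0 ? 1 : maxAbs / 127
  const q = new Int8Array(weights.length) // 1 byte per weight instead of 4
  for (let i = 0; i < weights.length; i++) {
    q[i] = Math.round(weights[i] / scale)
  }
  return { q, scale }
}

// Recover approximate float weights; the error per weight is at most scale / 2
function dequantizeInt8(q: Int8Array, scale: number): Float32Array {
  const out = new Float32Array(q.length)
  for (let i = 0; i < q.length; i++) out[i] = q[i] * scale
  return out
}
```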
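The actual pipeline code is in `semanticClassifier.ts`, but stripped down to its simplest form, the pipeline can be distilled to:

```ts
import { pipeline } from '@xenova/transformers'

const MODEL_NAME = 'Xenova/all-MiniLM-L6-v2'

// Load the quantized ONNX weights once; the pipeline is reused for every call
const embeddingPipeline = await pipeline('feature-extraction', MODEL_NAME, {
  quantized: true,
})

// Mean-pool the token vectors into one sentence vector and L2-normalize it
const params = { pooling: 'mean', normalize: true } as const

// Cosine similarity between two equal-length vectors
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

const textA = 'taxes'
const textB = 'irs'
const embeddingA = await embeddingPipeline(textA, params)
const embeddingB = await embeddingPipeline(textB, params)
// Each result is a Tensor; its .data field holds the raw Float32Array
const similarity = cosine(Array.from(embeddingA.data), Array.from(embeddingB.data))
```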
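The `pooling: 'mean'` and `normalize: true` options collapse the model's per-token vectors into a single unit-length sentence vector. The sketch below roughly shows what those two steps compute. It assumes the token embeddings arrive as plain arrays with a 0/1 attention mask; the library does the equivalent internally on tensors. `meanPoolAndNormalize` and `dot` are hypothetical helpers.

```ts
// Average the token vectors (skipping padding via the attention mask), then L2-normalize
function meanPoolAndNormalize(tokenEmbeddings: number[][], attentionMask: number[]): number[] {
  const dim = tokenEmbeddings[0].length
  const pooled = new Array<number>(dim).fill(0)
  let count = 0
  for (let t = 0; t < tokenEmbeddings.length; t++) {
    if (attentionMask[t] === 0) continue // padding tokens do not contribute
    count++
    for (let d = 0; d < dim; d++) pooled[d] += tokenEmbeddings[t][d]
  }
  for (let d = 0; d < dim; d++) pooled[d] /= Math.max(count, 1)
  const norm = Math.sqrt(pooled.reduce((s, x) => s + x * x, 0)) || 1
  return pooled.map((x) => x / norm)
}

// For unit-length vectors, cosine similarity reduces to a plain dot product
function dot(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + x * b[i], 0)
}
```

Because the pipeline already normalizes, the denominator in a cosine function is always 1 for these embeddings. That makes the dot product a cheaper equivalent.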
For example, you might have a list of topics you want to match incoming SMS messages to. You can provide a list of `intents`, from which the model will generate embeddings:
```ts
[
  {
    name: 'Taxes',
    keywords: [
      'tax',
      'taxes',
      'taxation',
      'income tax',
      'tax rate',
      'tax policy',
      'tax reform',
      'tax cuts',
      'tax increase',
      'tax burden',
      'corporate tax',
      'property tax'
    ],
    examples: [
      'We need to lower taxes for the middle class',
      'Corporate tax rates are too low',
      'How will the new tax policy affect small businesses?',
      'Tax reform should focus on simplifying the code',
      'Property taxes in my area are becoming unaffordable'
    ]
  },
  {
    name: 'Immigration',
    keywords: [
      'immigration',
      'immigrant',
      'border',
      'visa',
      'asylum',
      'refugee',
      'deportation',
      'citizenship',
      'naturalization',
      'undocumented',
      'border security',
      'migration'
    ],
    examples: [
      'Our immigration system needs comprehensive reform',
      'Border security should be a top priority',
      'Dreamers deserve a path to citizenship',
      'How do we balance humane treatment with border enforcement?',
      'Legal immigration pathways should be streamlined'
    ]
  },
  {
    name: 'Economy',
    keywords: [
      'economy',
      'economic',
      'jobs',
      'unemployment',
      'inflation',
      'recession',
      'growth',
      'gdp',
      'stimulus',
      'wages',
      'labor',
      'trade',
      'interest rates',
      'deficit'
    ],
    examples: [
      'Inflation is hurting working families',
      'We need policies that create good-paying jobs',
      'The trade deficit is damaging our manufacturing sector',
      'Economic growth should benefit everyone, not just the wealthy',
      'How will rising interest rates affect the housing market?'
    ]
  },
  {
    name: 'Healthcare',
    keywords: [
      'healthcare',
      'health care',
      'medical',
      'insurance',
      'medicare',
      'medicaid',
      'prescription',
      'drugs',
      'hospital',
      'affordable care',
      'coverage',
      'pre-existing conditions',
      'single-payer',
      'universal healthcare'
    ],
    examples: [
      'Healthcare should be a right, not a privilege',
      'Prescription drug prices are out of control',
      'Insurance companies have too much power over medical decisions',
      'Medicare should be expanded to cover everyone',
      'We need to protect coverage for pre-existing conditions'
    ]
  },
  {
    name: 'Environment',
    keywords: [
      'environment',
      'climate change',
      'global warming',
      'pollution',
      'emissions',
      'carbon',
      'clean energy',
      'renewable',
      'sustainability',
      'conservation',
      'green new deal',
      'fossil fuels',
      'regulations'
    ],
    examples: [
      'Climate change is the biggest threat to our future',
      'We need to transition to renewable energy',
      'Environmental regulations are hurting businesses',
      'What policies will actually reduce carbon emissions?',
      'Conservation efforts must balance economic concerns'
    ]
  },
  {
    name: 'Education',
    keywords: [
      'education',
      'school',
      'college',
      'university',
      'student',
      'tuition',
      'loan',
      'debt',
      'teacher',
      'curriculum',
      'funding',
      'public education',
      'private school',
      'charter school',
      'standards',
      'testing'
    ],
    examples: [
      'Student loan debt is crippling an entire generation',
      'Teachers deserve higher salaries',
      'Public schools need more funding',
      'College should be affordable for everyone',
      'Education standards should be set locally, not federally'
    ]
  }
]
```
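Given intents like these, one plausible way to turn them into a classifier is to embed every keyword and example once, up front. Each incoming message is then scored against those stored vectors. The sketch assumes a hypothetical `embed` function that wraps the pipeline call shown earlier and returns `Array.from(output.data)`. It scores each intent by its single closest keyword or example. `semanticClassifier.ts` may aggregate differently, for instance by averaging, and `buildIntentIndex`/`classifySemantic` are illustrative names.

```ts
interface Intent {
  name: string
  keywords: string[]
  examples: string[]
}

// Hypothetical wrapper around the embedding pipeline
type Embed = (text: string) => Promise<number[]>

interface IntentIndex {
  name: string
  vectors: number[][]
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb))
}

// Embed every keyword and example once, so classification only has to embed the message
async function buildIntentIndex(intents: Intent[], embed: Embed): Promise<IntentIndex[]> {
  const index: IntentIndex[] = []
  for (const intent of intents) {
    const texts = [...intent.keywords, ...intent.examples]
    const vectors = await Promise.all(texts.map((t) => embed(t)))
    index.push({ name: intent.name, vectors })
  }
  return index
}

// Score each intent by its closest stored vector and return the best intent overall
async function classifySemantic(message: string, index: IntentIndex[], embed: Embed) {
  const v = await embed(message)
  let best: { intent: string | null; score: number } = { intent: null, score: -Infinity }
  for (const entry of index) {
    for (const u of entry.vectors) {
      const s = cosine(v, u)
      if (s > best.score) best = { intent: entry.name, score: s }
    }
  }
  return best
}
```

Precomputing the index means the per-message cost is a single embedding plus some dot products. This fits the fast per-classification times once the model is loaded.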
The component first tries to classify the input message with the Semantic Classifier. It uses the SLM locally to generate an embedding of the message text, then calculates the cosine similarity between that embedding and the intent embeddings in vector space. If the result doesn't exceed the `semanticThreshold` value, it falls back to the Keyword Classifier. If that result doesn't exceed the `keywordThreshold`, it then falls back to the LLM Classifier and returns the strongest match of the three.
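The fallback order described above can be expressed as a small cascade. This sketch rests on two assumptions. First, the three classifiers are hypothetical async functions that each return an intent and a score. Second, the scores are on a comparable scale, so picking "the strongest match" is meaningful. Only `semanticThreshold` and `keywordThreshold` come from the component; `classifyWithFallback` and the other names are illustrative.

```ts
interface Classification {
  intent: string | null
  score: number
  source: 'semantic' | 'keyword' | 'llm'
}

type Classifier = (message: string) => Promise<Classification>

interface CascadeOptions {
  semantic: Classifier
  keyword: Classifier
  llm: Classifier
  semanticThreshold: number
  keywordThreshold: number
}

async function classifyWithFallback(message: string, opts: CascadeOptions): Promise<Classification> {
  // 1. Local SLM first: cheapest once the model is loaded
  const semantic = await opts.semantic(message)
  if (semantic.score > opts.semanticThreshold) return semantic

  // 2. Keyword matching
  const keyword = await opts.keyword(message)
  if (keyword.score > opts.keywordThreshold) return keyword

  // 3. LLM as a last resort, then return whichever of the three scored highest
  const llm = await opts.llm(message)
  return [semantic, keyword, llm].reduce((a, b) => (b.score > a.score ? b : a))
}
```

Ordering the classifiers from cheapest to most expensive means the network call to the LLM is made only when both local classifiers are unsure.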
Profiling with `performance.now()`, the HF model takes 2-5 s to load, depending on platform, and 25-50 ms to generate a classification.
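A small wrapper around `performance.now()` is enough to take measurements like these. `timed` is a hypothetical helper. `performance.now()` returns a high-resolution timestamp in milliseconds and is available in browsers and Node 16+.

```ts
// Time an async operation and log the elapsed milliseconds
async function timed<T>(label: string, fn: () => Promise<T>): Promise<{ result: T; ms: number }> {
  const start = performance.now()
  const result = await fn()
  const ms = performance.now() - start
  console.log(`${label}: ${ms.toFixed(1)} ms`)
  return { result, ms }
}

// Hypothetical usage, separating model load time from per-message inference time:
// const { result: embedder } = await timed('load model', () =>
//   pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', { quantized: true }))
// await timed('classify', () => embedder('Property taxes are too high', { pooling: 'mean', normalize: true }))
```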
I think small on-device, task-specific ML models have a lot of potential and it was cool to get one running in Retool.
GitHub link: https://github.com/trozzelle/retool-semantic-classifier