How Lumio's knowledge engine
actually works.

A deep dive into content sync, TF-IDF vector indexing, the query pipeline, FAQ management, and topic routing. Everything that powers Lumio's zero-hallucination answers.

1

Content Sync

Lumio reads all published WordPress content and builds a structured knowledge base from it. The sync process runs in batches to stay safe on any hosting environment.

📄
Posts & Pages
All published posts, pages, and custom post types are indexed with their full content and metadata.
🛒
WooCommerce Products
Price, SKU, stock status, weight, dimensions, categories, and full product descriptions.
📋
ACF & Custom Fields
Advanced Custom Fields, custom meta values, and custom taxonomies are all captured.
🛡️
Shared hosting safe. Content is synced in batches of 200 posts per batch. This prevents memory exhaustion and timeout errors even on budget shared hosting plans.
// Batch processing configuration
batch_size: 200 // posts per batch
post_types: ["post", "page", "product", "custom_type"]
fields: ["title", "content", "excerpt", "acf_*", "meta_*"]
woo_fields: ["price", "sku", "stock", "weight", "dimensions"]
2

TF-IDF Vector Index

After content sync completes, Lumio builds a TF-IDF (Term Frequency-Inverse Document Frequency) vector index. This mathematical model converts your content into searchable vectors that enable semantic matching.

TF-IDF Index Build Pipeline
1
Tokenization
Content is split into individual terms. Stop words (the, is, at) are removed. Remaining terms are normalized to lowercase.
2
TF Calculation
Term Frequency measures how often each word appears in a document. Higher frequency = more relevant to that document's topic.
TF(t,d) = count(t in d) / total_terms(d)
3
IDF Calculation
Inverse Document Frequency penalizes common words. Terms appearing in many documents get lower weight; rare, distinctive terms get higher weight.
IDF(t) = log(total_docs / docs_containing(t))
4
Vector Generation & Storage
Each document becomes a weighted vector. The complete index is stored as a JSON file in the plugin's data/ folder.
wp-content/plugins/lumio-ai/data/tfidf-index.json
Result: A compact, fast-loading vector index that enables cosine similarity search without any external vector database or cloud service.
📁
No external dependencies. The TF-IDF index is stored as a local JSON file. No vector database, no cloud embedding API, no additional costs. The index loads into memory on demand for fast querying.
3

Query Pipeline

When a visitor asks a question, Lumio runs it through a multi-stage pipeline to find the best possible answer from your content. Here is every stage in order:

A

Keyword Search with Synonym Expansion

The query is scanned for keywords and expanded with synonyms. This catches common rephrasings visitors use.

B

Entity Detection

Lumio recognizes product names, page titles, and category names mentioned in the query. Detected entities get a scoring boost.

C

TF-IDF Vector Search

The query is vectorized and compared against the document index using cosine similarity. This catches conceptual matches beyond exact keywords.

D

FAQ Table Lookup

Your manually curated FAQ pairs are checked. If a match is found, the FAQ answer is used directly, providing precise control over critical responses.

E

Best Excerpt Selection & AI Generation

The highest-scoring content excerpt is selected and sent to the AI model (Groq or OpenAI) along with the system prompt. The AI generates a natural-language answer grounded exclusively in your content.

Synonym Expansion Map

Built-in synonym mappings ensure visitors find answers regardless of how they phrase the question:

Visitor SaysAlso MatchesUse Case
refundreturn, money backReturn policies
shippingdelivery, dispatchShipping info pages
costprice, pricing, feeProduct pricing
hoursschedule, open, timingBusiness hours
contactreach, email, phoneContact pages
4

FAQ Management

FAQs give you precise control over how Lumio answers specific questions. They take priority over auto-generated content matches, ensuring critical information is always accurate.

✍️
Manual Q&A Pairs
Add question-answer pairs directly in the WordPress admin. Perfect for policies, hours, and common inquiries.
📂
CSV Import / Export
Bulk manage FAQs with CSV files. Import hundreds of Q&A pairs at once, or export for backup and editing.
🤖
AI Site Analyzer
PRO Lumio scans your site and auto-generates FAQ suggestions ranked by importance.
// Example CSV format for FAQ import
question,answer
"What is your return policy?","30-day returns on all items..."
"Do you ship internationally?","Yes, we ship to 40+ countries..."
"What payment methods accepted?","Visa, Mastercard, PayPal..."
5

Topic Map & Prompt Routing

The Topic Map lets you define keyword-to-topic routing rules. When a visitor's question matches a topic, Lumio prepends a topic-specific prompt prefix to guide the AI's response tone and depth.

Topic Routing Example
🔍
Visitor query: "How do I return a damaged item?"
Keywords detected: return, damaged
🎯
Topic matched: "Returns & Refunds"
Prompt prefix: "Be empathetic and helpful. Provide step-by-step return instructions. Always include the returns page link."
AI responds with topic-aware context
The response uses the topic-specific tone and always includes the configured returns page URL.
6

Configuration & Maintenance

Fine-tune the knowledge engine and keep your index fresh with these configuration options:

Min content length: 5 chars Clear & rebuild anytime Auto-sync on publish Manual re-index button
⚙️
Minimum content length filter: Posts shorter than 5 characters (default) are excluded from the index. This prevents empty drafts and placeholder content from polluting search results. You can adjust this threshold in Settings.
🔄
Clear and rebuild: You can clear the entire TF-IDF index and rebuild it from scratch at any time. Go to Lumio AI → Settings → Knowledge Base and click "Rebuild Index". This is useful after major content changes or bulk imports.
💡
After large content changes (bulk product import, migration, etc.), always rebuild your index. The chatbot can only answer from what is in the synced index, not directly from the database.