Knowledge Base — How Lumio AI Works Under the Hood

1

Content Sync

Lumio reads all published WordPress content and builds a structured knowledge base from it. The sync process runs in batches to stay safe on any hosting environment.

📄

Posts & Pages

All published posts, pages, and custom post types are indexed with their full content and metadata.

🛒

WooCommerce Products

Price, SKU, stock status, weight, dimensions, categories, and full product descriptions.

📋

ACF & Custom Fields

Advanced Custom Fields, custom meta values, and custom taxonomies are all captured.

🛡️

Shared hosting safe. Content is synced in batches of 200 posts. Keeping each batch small helps avoid memory exhaustion and timeout errors, even on budget shared hosting plans.

// Batch processing configuration
batch_size: 200 // posts per batch
post_types: ["post", "page", "product", "custom_type"]
fields: ["title", "content", "excerpt", "acf_*", "meta_*"]
woo_fields: ["price", "sku", "stock", "weight", "dimensions"]
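
As a rough illustration of the batching idea, here is a minimal Python sketch. It is not Lumio's actual code (the plugin runs inside WordPress); the function name `iter_batches` and the sample post IDs are hypothetical.

```python
from typing import Iterator, List, Sequence

BATCH_SIZE = 200  # matches the batch_size shown above


def iter_batches(post_ids: Sequence[int], batch_size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive slices of at most batch_size post IDs."""
    for start in range(0, len(post_ids), batch_size):
        yield list(post_ids[start:start + batch_size])


# 450 hypothetical post IDs are processed as three batches: 200, 200, 50.
batch_lengths = [len(batch) for batch in iter_batches(range(1, 451))]
print(batch_lengths)  # [200, 200, 50]
```

Because only one batch is held in memory at a time, peak memory depends on the batch size rather than on the total number of posts.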

2

TF-IDF Vector Index

After content sync completes, Lumio builds a TF-IDF (Term Frequency-Inverse Document Frequency) vector index. This statistical model converts your content into weighted term vectors, so documents can be ranked by how closely their distinctive terms match a query.

TF-IDF Index Build Pipeline

1

Tokenization

Content is split into individual terms. Stop words (the, is, at) are removed. Remaining terms are normalized to lowercase.
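
The tokenization step could look roughly like the Python sketch below. The stop-word list and the `tokenize` name are assumptions for illustration; this version also lowercases before filtering so that "The" and "the" are treated alike.

```python
import re
from typing import List

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "is", "at", "a", "an", "of", "and"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    terms = re.findall(r"[a-z0-9]+", text.lower())
    return [term for term in terms if term not in STOP_WORDS]


print(tokenize("The store is open at 9 AM"))  # ['store', 'open', '9', 'am']
```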

2

TF Calculation

Term Frequency measures how often each word appears in a document. A higher frequency suggests the term is more central to that document's topic.

TF(t,d) = count(t in d) / total_terms(d)
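
A minimal Python sketch of this formula, assuming the document has already been tokenized into a list of terms (the `term_frequency` name is hypothetical):

```python
from collections import Counter
from typing import Dict, List


def term_frequency(terms: List[str]) -> Dict[str, float]:
    """TF(t, d) = count(t in d) / total_terms(d) for every term in one document."""
    total = len(terms)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(terms).items()}


tf = term_frequency(["return", "policy", "return", "items"])
print(tf["return"])  # "return" is 2 of 4 terms, so its TF is 0.5
```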

3

IDF Calculation

Inverse Document Frequency penalizes common words. Terms appearing in many documents get lower weight; rare, distinctive terms get higher weight.

IDF(t) = log(total_docs / docs_containing(t))
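
The formula can be sketched in Python as follows. The natural logarithm is an assumption, since the log base is not stated here, and `inverse_document_frequency` is a hypothetical name.

```python
import math
from typing import Dict, List


def inverse_document_frequency(docs: List[List[str]]) -> Dict[str, float]:
    """IDF(t) = log(total_docs / docs_containing(t)), using the natural log."""
    total_docs = len(docs)
    doc_counts: Dict[str, int] = {}
    for terms in docs:
        for term in set(terms):  # count each term once per document
            doc_counts[term] = doc_counts.get(term, 0) + 1
    return {term: math.log(total_docs / n) for term, n in doc_counts.items()}


docs = [["shipping", "policy"], ["return", "policy"], ["policy", "hours"]]
idf = inverse_document_frequency(docs)
print(idf["policy"])  # 0.0, because "policy" appears in every document
```

With this unsmoothed form, a term that appears in every document gets a weight of exactly zero, so it no longer influences ranking.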

4

Vector Generation & Storage

Each document becomes a weighted vector. The complete index is stored as a JSON file in the plugin's data/ folder.

wp-content/plugins/lumio-ai/data/tfidf-index.json
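
To make the vector step concrete, the sketch below multiplies precomputed TF and IDF values into sparse vectors and serializes them to JSON. The numbers, post IDs, and JSON layout are hypothetical; the actual structure of Lumio's index file is not described here.

```python
import json
from typing import Dict


def weight_vector(tf: Dict[str, float], idf: Dict[str, float]) -> Dict[str, float]:
    """Multiply each term's TF by its IDF to get one sparse document vector."""
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}


# Hypothetical precomputed values for two documents keyed by post ID.
idf = {"return": 0.7, "shipping": 0.7, "policy": 0.0}
tf_by_doc = {
    "12": {"return": 0.5, "policy": 0.5},
    "34": {"shipping": 0.5, "policy": 0.5},
}

index = {doc_id: weight_vector(tf, idf) for doc_id, tf in tf_by_doc.items()}

# Serialize to JSON text; a plugin would write this string to its data file.
serialized = json.dumps(index, sort_keys=True)
print(serialized)
```

Storing only the terms that occur in each document keeps the file small, because most terms in the vocabulary never appear in any given post.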

✅

Result: A compact, fast-loading vector index that enables cosine similarity search without any external vector database or cloud service.

📁

No external dependencies. The TF-IDF index is stored as a local JSON file. No vector database, no cloud embedding API, no additional costs. The index loads into memory on demand for fast querying.

3

Query Pipeline

When a visitor asks a question, Lumio runs it through a multi-stage pipeline to find the best possible answer from your content. Here is every stage in order:

A

Keyword Search with Synonym Expansion

The query is scanned for keywords and expanded with synonyms. This catches common rephrasings visitors use.

B

Entity Detection

Lumio recognizes product names, page titles, and category names mentioned in the query. Detected entities get a scoring boost.
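
One simple way to picture entity detection is a case-insensitive name match followed by a score boost. The Python sketch below is an assumption-heavy illustration: the matching rule, the `ENTITY_BOOST` value, and both function names are hypothetical, as the real scoring details are not documented here.

```python
from typing import Dict, List

ENTITY_BOOST = 0.25  # hypothetical additive boost; the real value is not documented


def detect_entities(query: str, known_names: List[str]) -> List[str]:
    """Return the known names that appear in the query, ignoring case."""
    lowered = query.lower()
    return [name for name in known_names if name.lower() in lowered]


def apply_boost(scores: Dict[str, float], detected: List[str]) -> Dict[str, float]:
    """Add a fixed boost to every document whose name was detected."""
    return {
        name: score + (ENTITY_BOOST if name in detected else 0.0)
        for name, score in scores.items()
    }


names = ["Blue Widget", "Returns Policy"]
detected = detect_entities("How much is the blue widget?", names)
boosted = apply_boost({"Blue Widget": 0.4, "Returns Policy": 0.4}, detected)
print(detected, boosted)
```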

C

TF-IDF Vector Search

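The query is vectorized and compared against the document index using cosine similarity. Documents are ranked by weighted term overlap, so pages that share the query's rare, distinctive terms score highest even when the exact phrase does not appear.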
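
Cosine similarity over sparse term vectors can be sketched in a few lines of Python. The vectors and weights below are made-up examples, not values from a real index.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


query = {"return": 1.0, "policy": 0.5}
doc_returns = {"return": 0.8, "policy": 0.4, "refund": 0.2}
doc_shipping = {"shipping": 0.9, "delivery": 0.3}

# The returns page shares terms with the query; the shipping page shares none.
print(cosine_similarity(query, doc_returns) > cosine_similarity(query, doc_shipping))  # True
```

Because the score is normalized by vector length, a long page does not outrank a short one simply by containing more words.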

D

FAQ Table Lookup

Your manually curated FAQ pairs are checked. If a match is found, the FAQ answer is used directly, providing precise control over critical responses.
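
How a FAQ match is decided is not specified here. As one possible approach, the Python sketch below treats two questions as matching when they are identical after lowercasing and stripping punctuation; `normalize` and `faq_lookup` are hypothetical names.

```python
import re
from typing import Dict, Optional


def normalize(question: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", question.lower())
    return " ".join(cleaned.split())


def faq_lookup(query: str, faqs: Dict[str, str]) -> Optional[str]:
    """Return the curated answer when the normalized question matches exactly."""
    table = {normalize(q): answer for q, answer in faqs.items()}
    return table.get(normalize(query))


faqs = {"What is your return policy?": "30-day returns on all items..."}
print(faq_lookup("what is your return policy", faqs))  # 30-day returns on all items...
print(faq_lookup("Do you sell gift cards?", faqs))     # None
```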

E

Best Excerpt Selection & AI Generation

The highest-scoring content excerpt is selected and sent to the AI model (Groq or OpenAI) along with the system prompt. The AI generates a natural-language answer grounded exclusively in your content.

Synonym Expansion Map

Built-in synonym mappings ensure visitors find answers regardless of how they phrase the question:

Visitor Says	Also Matches	Use Case
`refund`	`return`, `money back`	Return policies
`shipping`	`delivery`, `dispatch`	Shipping info pages
`cost`	`price`, `pricing`, `fee`	Product pricing
`hours`	`schedule`, `open`, `timing`	Business hours
`contact`	`reach`, `email`, `phone`	Contact pages
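
The table above can be represented as a simple lookup. In this Python sketch the mappings are copied from the table, while the dictionary layout and the `expand_query` function are assumptions. Expansion runs in one direction only, from the "Visitor Says" column to the "Also Matches" column.

```python
from typing import Dict, List, Set

# Mappings copied from the table above; the data structure itself is an assumption.
SYNONYMS: Dict[str, List[str]] = {
    "refund": ["return", "money back"],
    "shipping": ["delivery", "dispatch"],
    "cost": ["price", "pricing", "fee"],
    "hours": ["schedule", "open", "timing"],
    "contact": ["reach", "email", "phone"],
}


def expand_query(terms: List[str]) -> Set[str]:
    """Return the query terms plus every synonym listed for them."""
    expanded = set(terms)
    for term in terms:
        expanded.update(SYNONYMS.get(term, []))
    return expanded


print(sorted(expand_query(["shipping", "cost"])))
# ['cost', 'delivery', 'dispatch', 'fee', 'price', 'pricing', 'shipping']
```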

4

FAQ Management

FAQs give you precise control over how Lumio answers specific questions. They take priority over auto-generated content matches, ensuring critical information is always accurate.

✍️

Manual Q&A Pairs

Add question-answer pairs directly in the WordPress admin. Perfect for policies, hours, and common inquiries.

📂

CSV Import / Export

Bulk manage FAQs with CSV files. Import hundreds of Q&A pairs at once, or export for backup and editing.

🤖

AI Site Analyzer

PRO feature: Lumio scans your site and auto-generates FAQ suggestions ranked by importance.

// Example CSV format for FAQ import
question,answer
"What is your return policy?","30-day returns on all items..."
"Do you ship internationally?","Yes, we ship to 40+ countries..."
"What payment methods accepted?","Visa, Mastercard, PayPal..."

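Before importing, it can help to check that a CSV file parses cleanly. The Python sketch below reads the two-column format shown above using the standard `csv` module; the `parse_faq_csv` function and its validation rules are hypothetical and not part of Lumio.

```python
import csv
import io
from typing import Dict, List

# Sample rows from the example above, without the leading comment line.
SAMPLE_CSV = '''question,answer
"What is your return policy?","30-day returns on all items..."
"Do you ship internationally?","Yes, we ship to 40+ countries..."
'''


def parse_faq_csv(text: str) -> List[Dict[str, str]]:
    """Parse FAQ rows and reject files missing the expected header columns."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or not {"question", "answer"} <= set(reader.fieldnames):
        raise ValueError("CSV must have 'question' and 'answer' columns")
    return [
        {"question": row["question"].strip(), "answer": row["answer"].strip()}
        for row in reader
        if row["question"] and row["answer"]  # skip incomplete rows
    ]


rows = parse_faq_csv(SAMPLE_CSV)
print(len(rows), rows[1]["question"])  # 2 Do you ship internationally?
```

Quoting each field, as the example does, lets answers contain commas without breaking the column layout.
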
5

Topic Map & Prompt Routing

The Topic Map lets you define keyword-to-topic routing rules. When a visitor's question matches a topic, Lumio prepends a topic-specific prompt prefix to guide the AI's response tone and depth.

Topic Routing Example

🔍

Visitor query: "How do I return a damaged item?"

Keywords detected: return, damaged

🎯

Topic matched: "Returns & Refunds"

Prompt prefix: "Be empathetic and helpful. Provide step-by-step return instructions. Always include the returns page link."

✅

AI responds with topic-aware context

The response uses the topic-specific tone and always includes the configured returns page URL.
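
The routing flow above can be sketched as a keyword lookup followed by prompt assembly. In this Python illustration, the `Topic` structure, the "most keyword hits wins" rule, and the extra `refund` keyword are assumptions; only the topic name and prompt prefix come from the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Topic:
    name: str
    keywords: List[str]
    prompt_prefix: str


TOPICS = [
    Topic(
        name="Returns & Refunds",
        keywords=["return", "refund", "damaged"],
        prompt_prefix=(
            "Be empathetic and helpful. Provide step-by-step return "
            "instructions. Always include the returns page link."
        ),
    ),
]


def match_topic(query: str, topics: List[Topic]) -> Optional[Topic]:
    """Pick the topic with the most keyword hits in the query, if any."""
    words = set(query.lower().replace("?", " ").split())
    best, best_hits = None, 0
    for topic in topics:
        hits = sum(1 for keyword in topic.keywords if keyword in words)
        if hits > best_hits:
            best, best_hits = topic, hits
    return best


def build_prompt(query: str, base_prompt: str, topics: List[Topic]) -> str:
    """Prepend the matched topic's prefix to the base system prompt."""
    topic = match_topic(query, topics)
    return f"{topic.prompt_prefix}\n{base_prompt}" if topic else base_prompt


matched = match_topic("How do I return a damaged item?", TOPICS)
print(matched.name if matched else None)  # Returns & Refunds
```

Queries that match no topic fall through to the base prompt unchanged.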

6

Configuration & Maintenance

Fine-tune the knowledge engine and keep your index fresh with these configuration options:

Min content length: 5 chars · Clear & rebuild anytime · Auto-sync on publish · Manual re-index button

⚙️

Minimum content length filter: Posts shorter than 5 characters (default) are excluded from the index. This prevents empty drafts and placeholder content from polluting search results. You can adjust this threshold in Settings.
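
The effect of the filter can be illustrated with a short Python sketch. Trimming whitespace before measuring is an assumption, as is the `filter_indexable` name; the threshold of 5 is the default described above.

```python
from typing import Dict, List

MIN_CONTENT_LENGTH = 5  # default threshold described above, in characters


def filter_indexable(
    posts: List[Dict[str, str]], min_length: int = MIN_CONTENT_LENGTH
) -> List[Dict[str, str]]:
    """Keep posts whose trimmed content has at least min_length characters."""
    return [post for post in posts if len(post["content"].strip()) >= min_length]


posts = [
    {"title": "Empty draft", "content": ""},
    {"title": "Placeholder", "content": "TBD"},
    {"title": "Shipping", "content": "We ship worldwide."},
]
print([post["title"] for post in filter_indexable(posts)])  # ['Shipping']
```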

🔄

Clear and rebuild: You can clear the entire TF-IDF index and rebuild it from scratch at any time. Go to Lumio AI → Settings → Knowledge Base and click "Rebuild Index". This is useful after major content changes or bulk imports.

💡

After large content changes (bulk product import, migration, etc.), always rebuild your index. The chatbot can only answer from what is in the synced index, not directly from the database.
