Knowledge Base Pack

The Knowledge Base skill pack enables building intelligent document search and retrieval systems with both semantic (vector) and keyword-based (full-text) search.

Included Services

Qdrant

Vector database for semantic search

PostgreSQL

Relational database for structured data

Meilisearch

Fast typo-tolerant full-text search

Skills Provided

Qdrant Memory

Capabilities:

Store document embeddings
Semantic similarity search
Metadata filtering
Hybrid search with filters
Multi-collection management

Example Usage:

# Create a documents collection
curl -X PUT "http://qdrant:6333/collections/documents" \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": {"size": 1536, "distance": "Cosine"},
    "optimizers_config": {"default_segment_number": 2}
  }'

# Store document chunks with embeddings
curl -X PUT "http://qdrant:6333/collections/documents/points" \
  -H "Content-Type: application/json" \
  -d '{
    "points": [{
      "id": 1,
      "vector": [0.05, 0.61, 0.76, ...],
      "payload": {
        "document_id": "doc-123",
        "chunk_index": 0,
        "text": "Machine learning is a subset of artificial intelligence...",
        "source": "ml-handbook.pdf",
        "page": 5,
        "created_at": "2025-01-15T10:30:00Z"
      }
    }]
  }'

# Semantic search
curl -X POST "http://qdrant:6333/collections/documents/points/search" \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.2, 0.1, 0.9, ...],
    "limit": 10,
    "with_payload": true,
    "filter": {
      "must": [
        {"key": "source", "match": {"value": "ml-handbook.pdf"}}
      ]
    }
  }'

PostgreSQL Query

Capabilities:

Store document metadata
Track document versions
User permissions and access control
Full ACID transactions
Complex relational queries

Example Usage:

-- Create documents table
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title VARCHAR(255) NOT NULL,
  content_type VARCHAR(50),
  file_path TEXT,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW(),
  created_by UUID,
  tags TEXT[],
  metadata JSONB
);

-- Create chunks table (for RAG)
CREATE TABLE document_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
  chunk_index INTEGER,
  text TEXT NOT NULL,
  qdrant_id BIGINT,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Insert a document
INSERT INTO documents (title, content_type, file_path, tags)
VALUES (
  'Machine Learning Handbook',
  'application/pdf',
  '/data/documents/ml-handbook.pdf',
  ARRAY['ai', 'ml', 'handbook']
);

-- Search by tags
SELECT * FROM documents
WHERE 'ai' = ANY(tags)
ORDER BY created_at DESC
LIMIT 10;

Meilisearch Index

Capabilities:

Lightning-fast full-text search
Typo-tolerant search
Faceted filtering
Highlighting and snippets
Instant search as-you-type
Ranking customization

Example Usage:

# Create an index
curl -X POST "http://meilisearch:7700/indexes" \
  -H "Authorization: Bearer $MEILISEARCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "uid": "documents",
    "primaryKey": "id"
  }'

# Configure searchable attributes
curl -X PATCH "http://meilisearch:7700/indexes/documents/settings" \
  -H "Authorization: Bearer $MEILISEARCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "searchableAttributes": ["title", "content", "tags"],
    "filterableAttributes": ["content_type", "created_at", "tags"],
    "sortableAttributes": ["created_at", "title"]
  }'

# Add documents
curl -X POST "http://meilisearch:7700/indexes/documents/documents" \
  -H "Authorization: Bearer $MEILISEARCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "id": "doc-123",
      "title": "Machine Learning Handbook",
      "content": "A comprehensive guide to machine learning...",
      "content_type": "pdf",
      "tags": ["ai", "ml"],
      "created_at": 1705315800
    }
  ]'

# Search (typo-tolerant)
curl -X POST "http://meilisearch:7700/indexes/documents/search" \
  -H "Authorization: Bearer $MEILISEARCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "q": "machne lerning",
    "limit": 20,
    "attributesToHighlight": ["title", "content"],
    "filter": "tags = ai"
  }'

Use Cases

RAG (Retrieval-Augmented Generation)

Build a complete RAG system:

Ingest documents
- Upload PDFs, markdown, etc.
- Store metadata in PostgreSQL
- Index full text in Meilisearch
Chunk and embed
- Split documents into chunks
- Generate embeddings with Ollama
- Store vectors in Qdrant
Search
- Full-text search with Meilisearch
- Semantic search with Qdrant
- Combine results for hybrid search
Generate answers
- Retrieve relevant chunks
- Pass to LLM for generation
- Cite sources from PostgreSQL
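
The "split documents into chunks" step above can be sketched as a fixed-size character chunker with overlap. This is a minimal illustration, not the pack's own implementation: `chunk_text`, `chunk_size`, and `overlap` are hypothetical names, and character-based splitting is assumed for simplicity (token- or sentence-aware splitting is often a better fit for real documents).

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text near a chunk
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # otherwise the tail would be emitted again as a shorter chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each returned chunk would then be embedded and stored as one Qdrant point, with its position in the list serving as `chunk_index`.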

Document Management System

Build an intelligent DMS:

# Upload document
1. Store file in MinIO (from Video Creator pack)
2. Extract text content
3. Insert metadata in PostgreSQL
4. Index in Meilisearch
5. Generate embeddings and store in Qdrant

# Search documents
- Full-text: Meilisearch
- Semantic: Qdrant
- Metadata filters: PostgreSQL

Knowledge Graph

Connect documents with relationships:

-- Create relationships table
CREATE TABLE document_relationships (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  source_doc_id UUID REFERENCES documents(id),
  target_doc_id UUID REFERENCES documents(id),
  relationship_type VARCHAR(50),
  confidence FLOAT,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Find related documents
SELECT d.*
FROM documents d
JOIN document_relationships r ON d.id = r.target_doc_id
WHERE r.source_doc_id = $1
  AND r.relationship_type = 'similar'
  AND r.confidence > 0.8
ORDER BY r.confidence DESC;

Wiki / Documentation Site

Build a searchable wiki:

Store pages in PostgreSQL
Index content in Meilisearch
Generate embeddings for “related pages”
Use Qdrant for semantic “see also” suggestions
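
To make the "related pages" idea concrete, here is a small sketch of what a cosine-similarity ranking computes. In practice Qdrant performs this server-side over the stored vectors; the function names and the toy two-dimensional embeddings used to exercise it are assumptions for illustration only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def related_pages(page_id: str, embeddings: dict[str, list[float]],
                  top_k: int = 3) -> list[tuple[str, float]]:
    """Rank every other page by similarity to `page_id`, best first."""
    query = embeddings[page_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != page_id
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

Because cosine similarity ignores vector length, two pages whose embeddings point in the same direction score 1.0 even if one vector is a scaled copy of the other.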

Example RAG Pipeline

Complete document processing pipeline:

#!/bin/bash
# RAG Pipeline: Ingest Document

DOC_PATH="/data/uploads/handbook.pdf"
DOC_ID=$(uuidgen)

# 1. Extract text from PDF
TEXT=$(pdftotext "$DOC_PATH" -)

# 2. Store metadata in PostgreSQL
psql -h postgresql -U postgres -d knowledge_base <<EOF
INSERT INTO documents (id, title, content_type, file_path)
VALUES (
  '$DOC_ID',
  'ML Handbook',
  'application/pdf',
  '$DOC_PATH'
);
EOF

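# 3. Index in Meilisearch (jq reads the text from stdin and JSON-escapes it)
printf '%s' "$TEXT" \
  | jq -Rs --arg id "$DOC_ID" '[{id: $id, title: "ML Handbook", content: .}]' \
  | curl -X POST "http://meilisearch:7700/indexes/documents/documents" \
      -H "Authorization: Bearer $MEILISEARCH_API_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @-

# 4. Chunk text: split writes ~1000-byte files instead of printing to stdout
CHUNK_DIR=$(mktemp -d)
trap 'rm -rf "$CHUNK_DIR"' EXIT
printf '%s' "$TEXT" | split -a 4 -b 1000 - "$CHUNK_DIR/chunk_"

# 5. Generate embeddings and store in Qdrant
# NOTE: nomic-embed-text returns 768-dimensional vectors, so the target
# collection must be created with "size": 768 rather than 1536.
CHUNK_IDX=0
for CHUNK_FILE in "$CHUNK_DIR"/chunk_*; do
  # Generate embedding with Ollama (jq builds the request so quotes and
  # newlines in the chunk cannot break the JSON)
  EMBEDDING=$(jq -Rs '{model: "nomic-embed-text", input: [.]}' "$CHUNK_FILE" \
    | curl -s -X POST "http://ollama:11434/api/embed" \
        -H "Content-Type: application/json" \
        --data-binary @- \
    | jq -c '.embeddings[0]')

  # Derive a stable 52-bit point ID from the document ID and chunk index.
  # $RANDOM only yields 0-32767, so it can collide and overwrite points.
  POINT_ID=$((16#$(printf '%s:%s' "$DOC_ID" "$CHUNK_IDX" | sha256sum | cut -c1-13)))

  # Store in Qdrant
  jq -Rs --argjson id "$POINT_ID" --argjson vector "$EMBEDDING" \
      --arg doc_id "$DOC_ID" --argjson idx "$CHUNK_IDX" \
      '{points: [{id: $id, vector: $vector,
                  payload: {document_id: $doc_id, chunk_index: $idx, text: .}}]}' \
      "$CHUNK_FILE" \
    | curl -X PUT "http://qdrant:6333/collections/documents/points" \
        -H "Content-Type: application/json" \
        --data-binary @-

  CHUNK_IDX=$((CHUNK_IDX + 1))
done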

echo "Document $DOC_ID ingested successfully"

Configuration

Environment Variables

# Qdrant
QDRANT_HOST=qdrant
QDRANT_PORT=6333

# PostgreSQL
POSTGRES_HOST=postgresql
POSTGRES_PORT=5432
POSTGRES_DB=knowledge_base
POSTGRES_USER=postgres
POSTGRES_PASSWORD=<generated>

# Meilisearch
MEILISEARCH_HOST=meilisearch
MEILISEARCH_PORT=7700
MEILISEARCH_API_KEY=<generated>

Collection Patterns

Recommended structures:

Qdrant Collections:

documents - Document chunk embeddings
questions - FAQ embeddings
code_snippets - Code example embeddings

PostgreSQL Tables:

documents - Document metadata
document_chunks - Chunk text and pointers
users - User accounts
permissions - Access control

Meilisearch Indexes:

documents - Full document content
users - User search
tags - Tag autocomplete

Memory Requirements

Qdrant: ~512 MB base + vector data
PostgreSQL: ~256 MB base + table data
Meilisearch: ~512 MB base + index data

Total: ~1.3 GB base across the three services; budget at least 2 GB for headroom (scales with data)
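
The "+ vector data" term for Qdrant can be estimated up front. The sketch below uses a rough rule of thumb: 4 bytes per float32 component plus about 50% overhead for indexes and bookkeeping. The function name and the overhead factor are assumptions for planning purposes, not measured figures for this pack.

```python
def estimate_vector_memory_bytes(num_vectors: int, dimensions: int,
                                 bytes_per_value: int = 4,
                                 overhead_factor: float = 1.5) -> int:
    """Approximate RAM needed to hold float32 vectors in memory.

    overhead_factor is an assumed multiplier covering index structures
    and metadata; tune it against real measurements.
    """
    if num_vectors < 0 or dimensions <= 0:
        raise ValueError("num_vectors must be >= 0 and dimensions > 0")
    return int(num_vectors * dimensions * bytes_per_value * overhead_factor)


def to_gigabytes(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 10**9
```

Under these assumptions, one million 1536-dimensional vectors come to roughly 9.2 GB, while the same number of 768-dimensional vectors needs about half that, so the embedding model's dimension is usually the main lever on memory.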

Performance Tips

Qdrant

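Match the collection's vector size to your embedding model (e.g. 1536 for OpenAI text-embedding-3-small, 768 for nomic-embed-text, 384 for small models such as all-MiniLM-L6-v2)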
Create payload indexes on filtered fields
Batch operations when ingesting large datasets
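
As an illustration of the batching tip, the sketch below groups points into fixed-size batches and wraps each one in the same `{"points": [...]}` body used by the upsert example earlier. The helper names and the default batch size of 100 are assumptions; a suitable size depends on vector dimension and payload size.

```python
from typing import Iterable, Iterator


def batched(items: Iterable[dict], batch_size: int = 100) -> Iterator[list[dict]]:
    """Yield successive lists of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: list[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter batch
        yield batch


def build_upsert_bodies(points: Iterable[dict], batch_size: int = 100) -> list[dict]:
    """Build one upsert request body per batch of points."""
    return [{"points": batch} for batch in batched(points, batch_size)]
```

Each body can then be sent with a single `PUT /collections/documents/points` request, replacing one HTTP round trip per point with one per batch.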

PostgreSQL

Create indexes on frequently queried columns
Use JSONB for flexible metadata storage
Enable connection pooling (pgBouncer)

Meilisearch

Configure searchableAttributes to only necessary fields
Use filterableAttributes for faceted search
Set appropriate ranking rules for your use case

Hybrid Search Strategy

Combine all three for best results:

// `meilisearch`, `qdrant`, `generateEmbedding`, `mergeResults`, and
// `fetchMetadata` are placeholders for your own clients and helpers.
async function hybridSearch(query) {
  // 1. Keyword search (fast, exact matches)
  const keywordResults = await meilisearch.search(query);

  // 2. Semantic search (understands meaning)
  const embedding = await generateEmbedding(query);
  const semanticResults = await qdrant.search(embedding);

  // 3. Merge and rank results
  const combined = mergeResults(keywordResults, semanticResults);

  // 4. Fetch full metadata from PostgreSQL
  return fetchMetadata(combined);
}
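
The snippet above leaves `mergeResults` undefined. One common way to fill it in is reciprocal rank fusion (RRF), which combines ranked lists using only each document's position, so the differently scaled scores from Meilisearch and Qdrant never need to be compared directly. The sketch below is one possible approach rather than the pack's own merging logic; the function name is hypothetical and `k = 60` is a conventional default, not a tuned value.

```python
def reciprocal_rank_fusion(result_lists: list[list[str]],
                           k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    with rank starting at 1; the contributions are summed.
    """
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for stability.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Documents that appear in both the keyword and the semantic list accumulate two contributions, so they tend to rise above documents found by only one of the searches.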

Next Steps

Local AI Pack

Research Agent Pack
