# RAG Pipeline Documentation

End-to-end flow for document uploads: storage, chunking, embedding into pgvector, per-question retrieval, and model calls with the retrieved context attached.
## Overview

Users can:

- Upload PDF, TXT, or Markdown files
- Ask questions scoped to their document library
- Get answers grounded in retrieved chunks that feed the prompt
## Architecture

    Document Upload → Storage → Vectorization → Embeddings DB
                                                      ↓
    User Question → Embed Query → Similarity Search → Context Injection
                                                      ↓
                                                 AI Response
## Upload Flow

### 1. File Upload (`/api/upload`)

    // Accepts: PDF, TXT, MD files (max 10 MB)
    // Stores in Supabase Storage: uploads/{user_id}/(unknown)
    // Creates a document record with status: 'processing'
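The accepted-type and size limits above can be expressed as a small guard. This is a sketch; the constant and function names are illustrative, not the actual route code:

```typescript
// Hypothetical upload guard mirroring the documented limits:
// 10 MB max, and only PDF, plain-text, or Markdown files.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;
const ALLOWED_MIME_TYPES = new Set([
  "application/pdf",
  "text/plain",
  "text/markdown",
]);

function isAcceptedUpload(mimeType: string, sizeBytes: number): boolean {
  return ALLOWED_MIME_TYPES.has(mimeType) && sizeBytes <= MAX_UPLOAD_BYTES;
}
```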
### 2. Vectorization (`/api/rag`)

Triggered after upload:

    // 1. Download the file from Storage
    // 2. Extract text (PDF via pdf2json)
    // 3. Split into chunks (1000 chars, 200 overlap)
    // 4. Generate embeddings via OpenAI
    // 5. Store in the document_embeddings table
    // 6. Update document status: 'completed'
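Step 3 above (fixed-size chunking with overlap) can be sketched as follows. This is a simplification under the documented defaults; the real splitter may also respect sentence or heading boundaries:

```typescript
// Minimal sketch of character-based chunking with overlap, matching the
// documented defaults (1000-char chunks, 200-char overlap). Illustrative only.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // advance 800 chars per chunk
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```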
## Configuration

All RAG settings live in `src/config/rag.ts`.

### Chunking Settings

    chunking: {
      chunkSize: 1000,    // Characters per chunk
      chunkOverlap: 200,  // Overlap between adjacent chunks
      maxChunks: 10000,   // Safety limit
      minChunkSize: 100,  // Minimum chunk size
    }
### Embedding Settings

    embedding: {
      model: "text-embedding-3-small",
      dimensions: 1536,
      batchSize: 20,
    }
### Retrieval Settings

    retrieval: {
      defaultTopK: 5,          // Chunks to return
      matchThreshold: 0.0,     // Minimum similarity score
      listQueryMultiplier: 4,  // Retrieve more chunks for list queries
    }
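Taken together, the fragments above suggest `src/config/rag.ts` exports a single object. The exact layout and the `as const` are assumptions; the `RAG_CONFIG` name matches how the config is referenced in the chat route:

```typescript
// Hypothetical shape of src/config/rag.ts, combining the fragments above.
export const RAG_CONFIG = {
  chunking: {
    chunkSize: 1000,
    chunkOverlap: 200,
    maxChunks: 10000,
    minChunkSize: 100,
  },
  embedding: {
    model: "text-embedding-3-small",
    dimensions: 1536,
    batchSize: 20,
  },
  retrieval: {
    defaultTopK: 5,
    matchThreshold: 0.0,
    listQueryMultiplier: 4,
  },
} as const;
```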
## Retrieval Process

### 1. Query Classification

Queries are classified to optimize retrieval:

    // List queries: "what are", "list", "enumerate"
    //   → Retrieve more chunks (topK × 4)
    // Meta queries: "summarize", "overview"
    //   → Use a lower similarity threshold
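The heuristics above can be sketched as a tiny classifier. The patterns and return values are illustrative, not the actual implementation:

```typescript
// Illustrative query classifier based on the documented trigger phrases.
type QueryKind = "list" | "meta" | "standard";

function classifyQuery(query: string): QueryKind {
  const q = query.toLowerCase();
  if (/\b(what are|list|enumerate)\b/.test(q)) return "list"; // widen topK
  if (/\b(summarize|overview)\b/.test(q)) return "meta";      // lower threshold
  return "standard";
}
```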
### 2. Similarity Search

    // 1. Embed the user question
    // 2. Try the RPC-based pgvector search
    // 3. Fall back to JS-based similarity if the RPC fails
    // 4. Apply boosts for structured content
    // 5. Diversify results across document sections
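The JS-based fallback in step 3 presumably scores chunks with cosine similarity in application code. A minimal sketch (not the actual implementation):

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; higher means more similar.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```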
### 3. Context Injection

Retrieved chunks are injected into the system prompt:

    const systemPrompt = `
    You have access to the following context from uploaded documents.
    Use this information to answer questions accurately.

    Context from document:
    ${relevantChunks.map(c => c.content).join('\n\n---\n\n')}
    `;
## Chat API Integration

When documents are attached to a chat:

    // In /api/chat/route.ts
    if (documentIds && documentIds.length > 0) {
      const relevantChunks = await retrieveRelevantChunks(
        userQuestion,
        documentIds,
        RAG_CONFIG.retrieval.defaultTopK
      );
      if (relevantChunks.length > 0) {
        const contextText = relevantChunks
          .map(c => c.content)
          .join('\n\n---\n\n');
        systemPrompt = buildRagSystemPrompt(contextText);
      }
    }
## Supported File Types

| Type     | MIME Type       | Extraction  |
|----------|-----------------|-------------|
| PDF      | application/pdf | pdf2json    |
| Text     | text/plain      | Direct read |
| Markdown | text/markdown   | Direct read |
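The table above implies a dispatch on MIME type during vectorization. A hedged sketch: the function name and error handling are assumptions, and the pdf2json call itself is omitted:

```typescript
// Illustrative text-extraction dispatch over the supported MIME types.
function extractText(mimeType: string, data: Buffer): string {
  switch (mimeType) {
    case "text/plain":
    case "text/markdown":
      return data.toString("utf-8"); // direct read, per the table
    case "application/pdf":
      // The docs name pdf2json for PDFs; its invocation is not sketched here.
      throw new Error("PDF extraction requires pdf2json");
    default:
      throw new Error(`Unsupported file type: ${mimeType}`);
  }
}
```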
## Best Practices

### Document Preparation

- **Use clear headings** - helps chunking find natural boundaries
- **Keep paragraphs focused** - improves semantic matching
- **Include key terms** - improves retrieval accuracy

### Query Tips

- **Be specific** - "What are the 5 key features?" beats "features?"
- **Reference the document** - e.g. "According to the document..."
- **Ask follow-ups** - build on previous context
## Troubleshooting

### No Results Found

- Check that the document status is 'completed'
- Verify embeddings exist in the database
- Lower `matchThreshold` in the config

### Irrelevant Results

- Increase `defaultTopK` for more context
- Enable query diversification
- Check chunk size settings

### Slow Retrieval

- Ensure a pgvector index exists
- Reduce `batchSize` for embedding
- Use the RPC function instead of direct SQL