Building RAG Apps
Create AI applications powered by your documents and text
RAG Flow Overview

Three-step retrieval-augmented generation:

User Question: "What are the Q3 revenue figures?"

1. Semantic Search (POST /seeds/query): find relevant document/text chunks using vector similarity.
2. Generate Context (POST /seeds/generate-context): compile matching seeds into LLM-ready context.
3. LLM Generation (OpenAI / Claude): send the context and question to the LLM for a grounded response.

AI Response: "Based on the Q3 report, revenue was $12.5M..."
Generate Context
Compile seeds into LLM-ready context
```typescript
const response = await fetch(`${baseUrl}/seeds/generate-context?${params}`, {
  method: 'POST',
  headers: { ...headers, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    seedIds: ['seed-123', 'seed-456'],
    model: 'gpt-4'
  })
});

const context = await response.json();
// {
//   content: "Document: Q3 Report\n\nThe quarterly...",
//   totalTokens: 2500,
//   seedCount: 2,
//   seeds: [{ id: "...", title: "...", tokens: 1500 }]
// }
```

Complete RAG with OpenAI
```typescript
import OpenAI from 'openai';

const openai = new OpenAI();

async function ragQuery(question: string): Promise<string> {
  // 1. Search for relevant chunks
  const searchRes = await fetch(`${baseUrl}/seeds/query?${params}`, {
    method: 'POST',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: question, limit: 5, threshold: 0.7 })
  });
  const { results } = await searchRes.json();

  if (results.length === 0) {
    return "I couldn't find relevant information.";
  }

  // 2. Generate context (deduplicate seed IDs across the search hits)
  const seedIds = [...new Set(results.map((r: any) => r.seedId))];
  const contextRes = await fetch(`${baseUrl}/seeds/generate-context?${params}`, {
    method: 'POST',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify({ seedIds, model: 'gpt-4' })
  });
  const context = await contextRes.json();

  // 3. Generate with OpenAI
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: `Answer based on this context:\n${context.content}`
      },
      { role: 'user', content: question }
    ]
  });

  return completion.choices[0].message.content || '';
}
```

Complete RAG with Claude
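The next example factors steps 1 and 2 into `searchSeeds` and `generateContext` helpers. A minimal sketch of what those might look like, assuming the same request shapes as the OpenAI example above; `baseUrl`, `params`, and `headers` are placeholder configuration, and the `SearchResult` shape beyond `seedId` is an assumption:

```typescript
// Placeholder configuration; assumed to match the earlier examples.
const baseUrl = 'https://api.example.com';
const params = 'externalUserId=user-123';
const headers = { Authorization: 'Bearer YOUR_API_KEY' };

// Only `seedId` is relied on; other result fields are left open.
interface SearchResult {
  seedId: string;
  [key: string]: unknown;
}

// Step 1: semantic search over seeds
async function searchSeeds(query: string): Promise<SearchResult[]> {
  const res = await fetch(`${baseUrl}/seeds/query?${params}`, {
    method: 'POST',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, limit: 5, threshold: 0.7 })
  });
  const { results } = await res.json();
  return results;
}

// Step 2: compile the matching seeds into LLM-ready context
async function generateContext(
  results: SearchResult[]
): Promise<{ content: string }> {
  const seedIds = [...new Set(results.map((r) => r.seedId))]; // deduplicate
  const res = await fetch(`${baseUrl}/seeds/generate-context?${params}`, {
    method: 'POST',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify({ seedIds, model: 'gpt-4' })
  });
  return res.json();
}
```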
```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

async function ragQuery(question: string): Promise<string> {
  // 1. Search + 2. Generate context (same as above)
  const results = await searchSeeds(question);
  const context = await generateContext(results);

  // 3. Generate with Claude
  const message = await anthropic.messages.create({
    model: 'claude-3-opus-20240229',
    max_tokens: 1024,
    system: `Answer based on this context:\n${context.content}`,
    messages: [{ role: 'user', content: question }]
  });

  // Content blocks are typed; only text blocks carry `.text`
  const block = message.content[0];
  return block.type === 'text' ? block.text : '';
}
```

Chatbot with History
```typescript
import OpenAI from 'openai';

class RAGChatbot {
  private openai = new OpenAI();
  private history: OpenAI.Chat.ChatCompletionMessageParam[] = [];

  async chat(userMessage: string): Promise<string> {
    this.history.push({ role: 'user', content: userMessage });

    // Search and get context
    const results = await searchSeeds(userMessage);
    const context = results.length > 0
      ? await generateContext(results)
      : { content: '' };

    // Generate response with history
    const completion = await this.openai.chat.completions.create({
      model: 'gpt-4',
      messages: [
        {
          role: 'system',
          content: `You have access to user documents.
${context.content ? `Context:\n${context.content}` : ''}`
        },
        ...this.history
      ]
    });

    const response = completion.choices[0].message.content || '';
    this.history.push({ role: 'assistant', content: response });

    // Keep only the last 20 messages
    if (this.history.length > 20) {
      this.history = this.history.slice(-20);
    }

    return response;
  }
}

const bot = new RAGChatbot();
console.log(await bot.chat('What is in my documents?'));
console.log(await bot.chat('Tell me more about the revenue'));
```

Best Practices
Token Management
- Check totalTokens before sending to the LLM
- GPT-4 Turbo: 128K context; Claude 3: 200K context
- Leave room for response tokens (1000-4000)
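One way to enforce this budget: compare the context's totalTokens (plus a rough estimate for the question and a reserve for the reply) against the model's window before calling the LLM. A sketch; the model names used as keys are illustrative, and the 4-characters-per-token estimate is a common heuristic, not an exact count:

```typescript
// Rough token budget check before sending context to an LLM.
const MODEL_LIMITS: Record<string, number> = {
  'gpt-4-turbo': 128_000,
  'claude-3-opus': 200_000,
};

// Crude estimate: ~4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function fitsBudget(
  model: string,
  contextTokens: number,  // `totalTokens` from /seeds/generate-context
  question: string,
  responseReserve = 4000  // leave room for the reply (1000-4000)
): boolean {
  const limit = MODEL_LIMITS[model];
  if (limit === undefined) throw new Error(`Unknown model: ${model}`);
  return contextTokens + estimateTokens(question) + responseReserve <= limit;
}
```

If the check fails, drop the lowest-scoring seeds and regenerate the context rather than truncating it mid-document.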
Search Optimization
- Start with threshold 0.7, adjust based on results
- Use 3-5 results for focused answers
- Filter by bundles for domain-specific queries
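The threshold advice can be automated: retry the search with a relaxed threshold when the strict pass returns nothing. A sketch, assuming the /seeds/query request shape shown earlier; `baseUrl`, `params`, and `headers` are placeholder configuration:

```typescript
// Placeholder configuration; assumed to match the earlier examples.
const baseUrl = 'https://api.example.com';
const params = 'externalUserId=user-123';
const headers = { Authorization: 'Bearer YOUR_API_KEY' };

// Retry the search with a relaxed threshold when the strict pass is empty.
async function searchWithFallback(
  query: string,
  thresholds: number[] = [0.7, 0.5]  // start strict, relax if empty
): Promise<any[]> {
  for (const threshold of thresholds) {
    const res = await fetch(`${baseUrl}/seeds/query?${params}`, {
      method: 'POST',
      headers: { ...headers, 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, limit: 5, threshold })
    });
    const { results } = await res.json();
    if (results.length > 0) return results;
  }
  return []; // nothing matched even at the loosest threshold
}
```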
Security
- Never expose API keys in frontend code
- Use environment variables for credentials
- Use externalUserId for proper data isolation
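A minimal sketch of the credentials advice: read the key from the environment on the server and fail fast if it is missing. The variable name SEEDS_API_KEY is hypothetical; use whatever your deployment defines:

```typescript
// Server-side only: never ship this to browser code.
function getApiKey(): string {
  const key = process.env.SEEDS_API_KEY; // hypothetical variable name
  if (!key) {
    throw new Error('SEEDS_API_KEY is not set');
  }
  return key;
}
```

Build your request headers from `getApiKey()` at startup so a missing credential surfaces immediately instead of as a failed API call later.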