Building RAG Apps

Create AI applications powered by your documents and text

RAG Flow Overview

Three-step retrieval-augmented generation

A user question flows through three steps to produce an AI response:

  1. Semantic Search: POST /seeds/query
  2. Generate Context: POST /seeds/generate-context
  3. LLM Generation: OpenAI or Claude

Generate Context

Compile seeds into LLM-ready context

// Assumes baseUrl, headers, and params come from your API client setup
const response = await fetch(`${baseUrl}/seeds/generate-context?${params}`, {
  method: 'POST',
  headers: { ...headers, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    seedIds: ['seed-123', 'seed-456'],
    model: 'gpt-4'
  })
});

const context = await response.json();
// {
//   content: "Document: Q3 Report\n\nThe quarterly...",
//   totalTokens: 2500,
//   seedCount: 2,
//   seeds: [{ id: "...", title: "...", tokens: 1500 }]
// }

Complete RAG with OpenAI

import OpenAI from 'openai';

const openai = new OpenAI();

async function ragQuery(question: string): Promise<string> {
  // 1. Search for relevant chunks
  const searchRes = await fetch(`${baseUrl}/seeds/query?${params}`, {
    method: 'POST',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: question, limit: 5, threshold: 0.7 })
  });
  const { results } = await searchRes.json();

  if (results.length === 0) {
    return "I couldn't find relevant information.";
  }

  // 2. Generate context
  const seedIds = [...new Set(results.map((r: any) => r.seedId))];
  const contextRes = await fetch(`${baseUrl}/seeds/generate-context?${params}`, {
    method: 'POST',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify({ seedIds, model: 'gpt-4' })
  });
  const context = await contextRes.json();

  // 3. Generate with OpenAI
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: `Answer based on this context:\n${context.content}`
      },
      { role: 'user', content: question }
    ]
  });

  return completion.choices[0].message.content || '';
}
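
The Claude and chatbot examples below call two helpers, searchSeeds and generateContext. A minimal sketch of those helpers, factoring steps 1 and 2 out of the OpenAI example above (the exact signatures are assumptions):

async function searchSeeds(query: string): Promise<{ seedId: string }[]> {
  const res = await fetch(`${baseUrl}/seeds/query?${params}`, {
    method: 'POST',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, limit: 5, threshold: 0.7 })
  });
  const { results } = await res.json();
  return results;
}

async function generateContext(results: { seedId: string }[]) {
  // Deduplicate seed IDs: several matching chunks can belong to one seed
  const seedIds = [...new Set(results.map((r) => r.seedId))];
  const res = await fetch(`${baseUrl}/seeds/generate-context?${params}`, {
    method: 'POST',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify({ seedIds, model: 'gpt-4' })
  });
  return res.json();
}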

Complete RAG with Claude

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

async function ragQuery(question: string): Promise<string> {
  // 1. Search + 2. Generate context (using the helpers sketched above)
  const results = await searchSeeds(question);
  const context = await generateContext(results);

  // 3. Generate with Claude
  const message = await anthropic.messages.create({
    model: 'claude-3-opus-20240229',
    max_tokens: 1024,
    system: `Answer based on this context:\n${context.content}`,
    messages: [{ role: 'user', content: question }]
  });

  // Claude returns a list of content blocks; pick out the text block
  const block = message.content[0];
  return block.type === 'text' ? block.text : '';
}

Chatbot with History

class RAGChatbot {
  private openai = new OpenAI();
  private history: { role: 'user' | 'assistant'; content: string }[] = [];

  async chat(userMessage: string): Promise<string> {
    this.history.push({ role: 'user', content: userMessage });

    // Search and get context
    const results = await searchSeeds(userMessage);
    const context = results.length > 0
      ? await generateContext(results)
      : { content: '' };

    // Generate response with history
    const completion = await this.openai.chat.completions.create({
      model: 'gpt-4',
      messages: [
        {
          role: 'system',
          content: `You have access to user documents.
${context.content ? `Context:\n${context.content}` : ''}`
        },
        ...this.history
      ]
    });

    const response = completion.choices[0].message.content || '';
    this.history.push({ role: 'assistant', content: response });

    // Keep last 20 messages
    if (this.history.length > 20) {
      this.history = this.history.slice(-20);
    }

    return response;
  }
}

const bot = new RAGChatbot();
console.log(await bot.chat('What is in my documents?'));
console.log(await bot.chat('Tell me more about the revenue'));

Best Practices

Token Management

  • Check totalTokens before sending to the LLM (see the sketch after this list)
  • GPT-4 Turbo has a 128K-token context window; Claude 3 has 200K
  • Leave room for response tokens (1000-4000)
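
A minimal budget check before calling the LLM, assuming the context object returned by POST /seeds/generate-context; the window and reserve sizes here are illustrative, so confirm them against your model's documentation:

const CONTEXT_WINDOW = 128_000;   // e.g. GPT-4 Turbo
const RESPONSE_RESERVE = 4_000;   // room for the completion
const budget = CONTEXT_WINDOW - RESPONSE_RESERVE;

if (context.totalTokens > budget) {
  // Keep the smallest seeds until the budget is spent, then regenerate
  let used = 0;
  const trimmedIds = [...context.seeds]
    .sort((a, b) => a.tokens - b.tokens)
    .filter((s) => (used += s.tokens) <= budget)
    .map((s) => s.id);
  // Re-call POST /seeds/generate-context with trimmedIds
}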

Search Optimization

  • Start with a threshold of 0.7 and adjust based on result quality
  • Use 3-5 results for focused answers
  • Filter by bundles for domain-specific queries (see the sketch after this list)
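
A sketch of a bundle-scoped query body; the bundleIds field name is an assumption, so check the API reference for the exact parameter:

const searchRes = await fetch(`${baseUrl}/seeds/query?${params}`, {
  method: 'POST',
  headers: { ...headers, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: question,
    limit: 5,
    threshold: 0.7,
    bundleIds: ['bundle-finance']  // assumed field; restricts search to one domain
  })
});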

Security

  • Never expose API keys in frontend code
  • Use environment variables for credentials (see the sketch after this list)
  • Use externalUserId for proper data isolation
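
Credentials belong on the server. A sketch of the setup assumed by the examples above; the SEEDS_API_KEY variable name, the Authorization header shape, and passing externalUserId as a query parameter are assumptions to adapt to your deployment:

// Server-side only: never ship the API key to the browser
const headers = { Authorization: `Bearer ${process.env.SEEDS_API_KEY}` };  // assumed env var name
const params = new URLSearchParams({
  externalUserId: currentUser.id  // hypothetical user object; isolates each user's data
});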