LLM Provider Setup Guide

This guide explains how to configure and switch between different LLM providers in the Poolula Platform chatbot.

Overview

The chatbot supports three LLM providers:

- Anthropic Claude (default) - Best quality, native tool calling, requires API key
- OpenAI GPT (optional) - Alternative provider, native tool calling, requires API key
- Ollama (optional) - Local models, privacy-focused, free, prompt-based tool calling

Provider Setup

Anthropic Claude (Default)

Prerequisites:

- Anthropic API key from https://console.anthropic.com/

Installation:

# Already included in base RAG dependencies
uv sync --group rag

Configuration:

# .env
LLM_PROVIDER=anthropic  # This is the default
ANTHROPIC_API_KEY=sk-ant-...
# Optional: Override default model
# ANTHROPIC_MODEL=claude-sonnet-4-20250514

Usage:

from apps.chatbot.rag_system import RAGSystem
from apps.chatbot.config import Config

config = Config()
rag = RAGSystem(config)
response, sources = rag.query("What is our EIN number?")


OpenAI GPT

Prerequisites:

- OpenAI API key from https://platform.openai.com/

Installation:

# Install OpenAI provider dependencies
uv sync --group openai

Configuration:

# .env
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
# Optional: Override default model
# OPENAI_MODEL=gpt-4o  # or gpt-4o-mini for lower cost

Usage:

from apps.chatbot.rag_system import RAGSystem
from apps.chatbot.config import Config

config = Config()  # Will use OPENAI provider from .env
rag = RAGSystem(config)
response, sources = rag.query("What properties do we own?")

Cost Comparison:

- GPT-4o: ~$2.50/M input tokens, ~$10/M output tokens
- Claude Sonnet: ~$3/M input tokens, ~$15/M output tokens
- Typical query: ~2K tokens total
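As a sanity check on these rates, per-query cost for a ~2K-token query can be estimated directly (the 1,500-input / 500-output split is an illustrative assumption; real splits depend on how much retrieved context goes into the prompt):

```python
# Prices are USD per million tokens, from the comparison above.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "claude-sonnet": {"input": 3.00, "output": 15.00},
}


def query_cost(model: str, input_tokens: int = 1500, output_tokens: int = 500) -> float:
    """Estimated USD cost of a single query at the given token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


print(f"GPT-4o:        ${query_cost('gpt-4o'):.4f} per query")
print(f"Claude Sonnet: ${query_cost('claude-sonnet'):.4f} per query")
```

At these assumptions, a single query costs roughly a cent on either API; output tokens dominate because they are priced several times higher than input tokens.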


Ollama (Local Models)

Prerequisites:

- Ollama installed: https://ollama.ai/download
- Sufficient RAM (16GB+ for 7B models)

Installation:

# 1. Install Ollama from https://ollama.ai
# 2. Pull a model
ollama pull llama3.1:8b-instruct-q4_K_M

# 3. Install local provider dependencies (requests already in base)
uv sync --group local

Configuration:

# .env
LLM_PROVIDER=ollama
LOCAL_MODEL_PATH=llama3.1:8b-instruct-q4_K_M
# Optional: Override Ollama URL
# LOCAL_MODEL_URL=http://localhost:11434

Usage:

from apps.chatbot.rag_system import RAGSystem
from apps.chatbot.config import Config

config = Config()  # Will use Ollama from .env
rag = RAGSystem(config)
response, sources = rag.query("List our business documents")

Recommended Models:

| Model | Size (Q4) | Context | Speed (CPU) | Notes |
|-------|-----------|---------|-------------|-------|
| llama3.1:8b-instruct-q4_K_M | 4.7GB | 128K | Medium | Best balance |
| mistral:7b-instruct-q4_0 | 4.1GB | 32K | Fast | Concise responses |
| qwen2.5:7b-instruct-q4_K_M | 4.4GB | 128K | Medium | Strong reasoning |

Limitations:

- ⚠️ Slower than API providers (5-15s on CPU vs 1-3s)
- ⚠️ Tool calling via prompt engineering (less reliable)
- ⚠️ May need prompt tuning per model
- ✅ Free and private (no data leaves your machine)
- ✅ Offline capable


Switching Providers

You can switch providers by changing the LLM_PROVIDER environment variable:

# Test with different providers
export LLM_PROVIDER=anthropic
python scripts/test_query.py "What is our property address?"

export LLM_PROVIDER=openai
python scripts/test_query.py "What is our property address?"

export LLM_PROVIDER=ollama
python scripts/test_query.py "What is our property address?"
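The same switch can be made programmatically in a test harness, since the provider is read from the environment at configuration time (the validation helper below is an illustrative sketch, not part of the chatbot):

```python
import os

SUPPORTED = {"anthropic", "openai", "ollama"}


def select_provider(name: str) -> str:
    """Validate and export LLM_PROVIDER before constructing the RAG system."""
    if name not in SUPPORTED:
        raise ValueError(f"Unknown provider {name!r}; expected one of {sorted(SUPPORTED)}")
    os.environ["LLM_PROVIDER"] = name
    return name


for provider in ("anthropic", "openai", "ollama"):
    select_provider(provider)
    # config = Config(); rag = RAGSystem(config)  # would now use `provider`
    print("Would query via:", os.environ["LLM_PROVIDER"])
```

Because Config reads the variable at construction time, each new Config instance picks up whatever provider was most recently selected.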

Provider Comparison

Tool Calling Support

| Provider | Native Tools | Reliability | Notes |
|----------|--------------|-------------|-------|
| Anthropic | ✅ Yes | Excellent | Production-ready |
| OpenAI | ✅ Yes | Excellent | Production-ready |
| Ollama | ❌ Prompt-based | Good | May miss complex tool uses |

Latency & Cost

| Provider | P50 Latency | Cost per 1K queries | Privacy |
|----------|-------------|---------------------|---------|
| Anthropic | 1-3s | ~$6 | Data sent to API |
| OpenAI | 1-4s | ~$5 | Data sent to API |
| Ollama (CPU) | 5-15s | Free | Fully local |
| Ollama (GPU) | 1-4s | Free | Fully local |

Use Cases

Use Anthropic when:

- You need the best quality responses
- Budget is not the primary concern
- Multi-round tool calling is critical

Use OpenAI when:

- You want cost optimization
- You have existing OpenAI credits
- Quality comparable to Anthropic is sufficient

Use Ollama when:

- Privacy is critical (medical, legal, sensitive data)
- You want zero ongoing costs
- You have decent hardware (16GB+ RAM)
- You're learning/experimenting with local LLMs
- Offline capability is required

Troubleshooting

OpenAI Provider

Error: "OpenAI provider requires the 'openai' package"

uv sync --group openai

Error: "OPENAI_API_KEY is required"

echo "OPENAI_API_KEY=sk-..." >> .env

Ollama Provider

Error: "Failed to connect to Ollama"

# Check if Ollama is running
ollama list

# Start Ollama if needed (it usually runs automatically)
# On macOS: Open Ollama app
# On Linux: systemctl start ollama

Error: "LOCAL_MODEL_PATH is required"

# Pull a model first
ollama pull llama3.1:8b-instruct-q4_K_M

# Then set in .env
echo "LOCAL_MODEL_PATH=llama3.1:8b-instruct-q4_K_M" >> .env

Slow responses:

# Check CPU threads (macOS/Linux)
export OLLAMA_NUM_THREADS=4  # Adjust for your CPU

# Consider using a smaller model
ollama pull qwen2.5:3b-instruct-q4_K_M

Next Steps

  • Evaluate providers: Use scripts/evaluate_chatbot.py to compare quality
  • Monitor costs: Track API usage for Anthropic/OpenAI
  • Experiment: Try different models with Ollama to find the best fit

For detailed implementation information, see docs/planning/2025-12-03-llm-agnosticism-plan.md.