LLM Provider Architecture¶
Implementation Status
Current State: This document describes the planned architecture for multi-provider LLM support. The abstraction layer design is documented, but the current implementation (Phase 2) is tightly coupled to Anthropic Claude.
Status:

- 🚧 Provider abstraction layer - Designed (Phase 6-7)
- ✅ Anthropic Claude integration - Operational
- 🚧 OpenAI provider - Planned (Phase 6-7)
- 🚧 Ollama local models - Planned (Phase 6-7)
Migration Plan: See docs/planning/2025-12-03-llm-agnosticism-plan.md for detailed decoupling strategy.
Overview¶
The Poolula Platform chatbot is designed to use a provider-agnostic abstraction layer to support multiple Large Language Model (LLM) backends. This design will allow seamless switching between cloud-based providers (Anthropic, OpenAI) and local models (Ollama) without changing business logic.
Architecture Principles¶
- Provider Independence: Core RAG logic doesn't depend on specific LLM APIs
- Unified Interface: All providers implement the same LLMProvider base class
- Easy Extensibility: Adding new providers requires only implementing the interface
- Configuration-Based Switching: Change providers via environment variable (LLM_PROVIDER)
Provider Abstraction Layer¶
Base Interface¶
File: apps/chatbot/llm_providers/base.py
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

class LLMProvider(ABC):
    """Abstract base class for LLM providers"""

    @abstractmethod
    def generate(
        self,
        messages: List[LLMMessage],
        system_prompt: Optional[str] = None,
        tools: Optional[List[Dict]] = None,
        temperature: float = 0.0,
        max_tokens: int = 800,
        **kwargs
    ) -> LLMResponse:
        """Generate a response from the LLM"""
        pass

    @abstractmethod
    def translate_tool_definition(self, tool_def: Dict[str, Any]) -> Dict[str, Any]:
        """Translate from internal tool format to provider-specific format"""
        pass

    @property
    @abstractmethod
    def provider_name(self) -> str:
        """Return provider name for logging/debugging"""
        pass

    @property
    @abstractmethod
    def supports_native_tool_calling(self) -> bool:
        """Whether the provider has native tool calling support"""
        pass
Data Models¶
LLMMessage: Provider-agnostic message format
LLMToolCall: Provider-agnostic tool call representation
LLMResponse: Unified response format
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class LLMResponse:
    text: Optional[str] = None
    tool_calls: Optional[List[LLMToolCall]] = None
    stop_reason: str = "complete"
    raw_response: Any = None
    content: List[Any] = field(default_factory=list)
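The other two data models are not shown in this document; a minimal sketch consistent with how they are used elsewhere on this page (exact field names are assumptions):

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class LLMMessage:
    """Provider-agnostic chat message."""
    role: str      # "user" or "assistant"
    content: Any   # plain text, or structured tool-result content

@dataclass
class LLMToolCall:
    """Provider-agnostic representation of a tool invocation."""
    id: str        # provider-assigned call ID (or synthesized for prompt-based providers)
    name: str      # tool name, e.g. "query_database"
    parameters: Dict[str, Any] = field(default_factory=dict)
```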
Supported Providers¶
1. Anthropic Claude (Default)¶
File: apps/chatbot/llm_providers/anthropic_provider.py
Characteristics:

- Native Tool Calling: Yes (function calling API)
- Models: claude-sonnet-4-20250514 (default), claude-opus-4-5-20251101
- Cost: ~$3/M input tokens, ~$15/M output tokens
- Latency: 1-3s typical
- Best For: Production use, highest quality responses
Configuration:
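A likely environment-variable configuration, inferred from the variable names used by the provider factory in this document (values are placeholders, not defaults to rely on):

```shell
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
```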
Key Features:

- Direct integration with the Anthropic SDK
- Multi-round tool calling support
- Tool definitions pass through directly (no translation needed)
- Excellent at following system prompts
2. OpenAI GPT¶
File: apps/chatbot/llm_providers/openai_provider.py
Characteristics:

- Native Tool Calling: Yes (function calling API)
- Models: gpt-4o (default), gpt-4o-mini
- Cost: ~$2.50/M input tokens, ~$10/M output tokens (17% cheaper than Claude)
- Latency: 1-4s typical
- Best For: Cost-effective alternative to Claude
Configuration:
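A likely environment-variable configuration, inferred from the variable names used by the provider factory in this document (values are placeholders, not defaults to rely on):

```shell
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
```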
Key Features:

- Translates between Anthropic and OpenAI tool formats
- Slightly different response phrasing (affects keyword matching scores)
- Compatible with Azure OpenAI endpoints (future enhancement)
Tool Format Translation:
# Anthropic format (internal)
{
    "name": "query_database",
    "description": "Query the business database",
    "input_schema": {"type": "object", "properties": {...}}
}

# OpenAI format (translated)
{
    "type": "function",
    "function": {
        "name": "query_database",
        "description": "Query the business database",
        "parameters": {"type": "object", "properties": {...}}
    }
}
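The translation above can be sketched as a small pure function (an illustration of the mapping, not the actual `translate_tool_definition` implementation, which may handle more cases):

```python
from typing import Any, Dict

def translate_tool_to_openai(tool_def: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an Anthropic-style tool definition to OpenAI's function format."""
    return {
        "type": "function",
        "function": {
            "name": tool_def["name"],
            "description": tool_def.get("description", ""),
            # Anthropic names the JSON schema "input_schema"; OpenAI names it "parameters"
            "parameters": tool_def.get(
                "input_schema", {"type": "object", "properties": {}}
            ),
        },
    }
```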
3. Ollama (Local Models)¶
File: apps/chatbot/llm_providers/ollama_provider.py
Characteristics:

- Native Tool Calling: No (uses prompt-based approach)
- Models: llama3.1:8b-instruct-q4_K_M (recommended), mistral:7b-instruct-q4_0, qwen2.5:7b-instruct-q4_K_M
- Cost: Free (runs locally)
- Latency: 5-15s on CPU, 1-4s on GPU
- Best For: Learning, privacy-focused use, offline capability
Configuration:
LLM_PROVIDER=ollama
LOCAL_MODEL_PATH=llama3.1:8b-instruct-q4_K_M
LOCAL_MODEL_URL=http://localhost:11434
Key Features:

- Connects to Ollama via HTTP API
- Prompt-based tool calling (instructs the model to output JSON)
- Free and private (no data leaves the machine)
- Requires Ollama installed separately
Prompt-Based Tool Calling:
Since local models lack native tool calling APIs, we inject tool descriptions into the system prompt:
def _build_tool_prompt(self, tools: List[Dict]) -> str:
    """Build tool usage instructions for prompt engineering"""
    tool_descriptions = [f"- **{tool['name']}**: {tool['description']}" for tool in tools]
    return f"""Available Tools:
{chr(10).join(tool_descriptions)}

To use a tool, respond with ONLY JSON: {{"tool": "tool_name", "parameters": {{...}}}}"""
The model responds with JSON, which we parse back into LLMToolCall objects.
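The parsing step can be sketched as follows (a simplified illustration; the actual provider likely does more robust extraction and error reporting):

```python
import json
import re
from typing import Any, Dict, Optional

def parse_tool_call(reply: str) -> Optional[Dict[str, Any]]:
    """Extract a {"tool": ..., "parameters": ...} object from a model reply.

    Returns None when the reply is plain text (no tool requested)
    or when the JSON is malformed.
    """
    # Grab the outermost JSON-looking span; models often wrap it in prose.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and "tool" in data:
        return data
    return None
```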
Provider Factory¶
File: apps/chatbot/rag_system.py (line 48)
def _create_llm_provider(self, config) -> LLMProvider:
    """Factory method for creating LLM providers based on configuration"""
    provider_type = config.LLM_PROVIDER.lower()

    if provider_type == "anthropic":
        return AnthropicProvider(
            api_key=config.ANTHROPIC_API_KEY,
            model=config.ANTHROPIC_MODEL
        )
    elif provider_type == "openai":
        from .llm_providers import OpenAIProvider
        if OpenAIProvider is None:
            raise ImportError("Install with: uv sync --group openai")
        return OpenAIProvider(
            api_key=config.OPENAI_API_KEY,
            model=config.OPENAI_MODEL
        )
    elif provider_type == "ollama":
        from .llm_providers import OllamaProvider
        if OllamaProvider is None:
            raise ImportError("Install with: uv sync --group local")
        return OllamaProvider(
            model=config.LOCAL_MODEL_PATH,
            base_url=config.LOCAL_MODEL_URL
        )
    else:
        raise ValueError(f"Unknown LLM provider: {config.LLM_PROVIDER}")
Provider Comparison¶
How to Run Comparison¶
# Compare all providers
python scripts/evaluate_providers.py --providers anthropic openai ollama
# Compare specific providers
python scripts/evaluate_providers.py --providers anthropic openai
# View results
cat data/provider_comparison.json
See docs/evaluation/provider-comparison.md for detailed guide.
Baseline Performance (To Be Established)¶
Run the comparison script to establish baseline scores for each provider. Results will include:
- Overall Scores: Average across 15 golden questions
- Component Breakdown: Tool usage, response quality, error handling
- Category Performance: Scores by question category (property_info, compliance, etc.)
- Per-Question Analysis: Where providers diverge
Expected Results:
Based on implementation characteristics, we anticipate:
| Provider | Tool Usage | Response Quality | Error Handling | Overall (Expected) |
|---|---|---|---|---|
| Anthropic | 90-95% | 85-90% | 95-100% | 85-90% |
| OpenAI | 90-95% | 80-85% | 95-100% | 82-87% |
| Ollama | 85-90% | 60-70% | 90-95% | 70-75% |
Key Factors:

- Tool usage is high across all providers (the abstraction works well)
- Response quality varies due to model size and keyword matching
- Local models trade quality for privacy/cost benefits
To populate actual results:
1. Run: python scripts/evaluate_providers.py --providers anthropic openai ollama
2. Update this section with real scores
3. Add insights about provider-specific strengths/weaknesses
Decision Matrix: Choosing a Provider¶
Use Anthropic When:¶
- ✅ You need the best quality responses
- ✅ Budget is not the primary concern (~$6/1K queries)
- ✅ Multi-round tool calling is critical
- ✅ Production deployment with high-stakes queries
Use OpenAI When:¶
- ✅ You want cost optimization (~$5/1K queries, 17% savings)
- ✅ You have existing OpenAI credits or infrastructure
- ✅ Quality requirements are 80-85% vs Claude's 85-90%
- ✅ You need comparable performance to Anthropic
Use Ollama When:¶
- ✅ Privacy is critical (medical, legal, sensitive data)
- ✅ You want zero ongoing costs (free)
- ✅ You have decent hardware (16GB+ RAM for CPU, or GPU)
- ✅ You're learning/experimenting with local LLMs
- ✅ Offline capability is required
- ✅ Acceptable quality threshold is 70-75%
Implementation Details¶
Multi-Round Tool Calling¶
The AIGenerator class handles multi-round tool calling in a provider-agnostic way:
def generate_response(self, query, conversation_history, tools, tool_manager):
    """Generate response using provider-agnostic interface"""
    messages = [LLMMessage(role="user", content=query)]

    # Multi-round tool calling loop
    max_rounds = 2
    for round_num in range(max_rounds):
        response = self.provider.generate(
            messages=messages,
            system_prompt=system_prompt,
            tools=provider_tools,
            temperature=0,
            max_tokens=800
        )

        # Check for tool usage
        if response.stop_reason == "tool_use" and response.tool_calls:
            # Execute tools and append the results, then continue the loop
            tool_results = self._execute_tools(response.tool_calls, tool_manager)
            messages.append(LLMMessage(role="assistant", content=response.content))
            messages.append(LLMMessage(role="user", content=tool_results))
            continue
        else:
            # Return final response
            return response.text

    # Tool budget exhausted: fall back to whatever text the last response carried
    return response.text
This works identically across all providers, regardless of native vs prompt-based tool calling.
Error Handling¶
Each provider implements graceful error handling:
- Anthropic: API errors wrapped with retry logic
- OpenAI: API errors wrapped with retry logic
- Ollama: Connection errors if Ollama not running, clear error messages
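The retry logic for the cloud providers could look roughly like this (a sketch under assumptions; the helper name and backoff schedule are illustrative, not the actual implementation):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Run an API call, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # e.g. 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")  # loop always returns or raises
```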
Testing & Validation¶
Unit Tests¶
File: tests/chatbot/test_ai_generator_integration.py
- 12 tests for AI generator with provider abstraction
- All tests pass with AnthropicProvider
- Mock provider responses for deterministic testing
Integration Tests¶
File: tests/chatbot/test_rag_system.py
- 31/37 tests passing (6 failures unrelated to provider abstraction)
- Tests RAG system with provider factory
Evaluation Harness¶
File: scripts/evaluate_chatbot.py
- 15 golden questions covering 8 categories
- 3-component scoring: tool usage (40%), response quality (40%), error handling (20%)
- Provider-agnostic (tests RAG system, not specific provider)
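The 40/40/20 weighting above combines into an overall score straightforwardly (function name is illustrative, not part of the harness):

```python
def overall_score(tool_usage: float, response_quality: float, error_handling: float) -> float:
    """Combine component scores (each in 0-1) using the harness weights: 40/40/20."""
    return 0.40 * tool_usage + 0.40 * response_quality + 0.20 * error_handling
```

For example, a run with strong tool use (0.95), decent quality (0.85), and perfect error handling (1.0) scores about 0.92 overall.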
Multi-Provider Comparison: scripts/evaluate_providers.py
Future Enhancements¶
Short-Term (Next 6 Months)¶
- Azure OpenAI Support: Add Azure endpoints to OpenAI provider
- Streaming Responses: Add streaming interface to base provider
- Token Usage Tracking: Log tokens per query for cost analysis
- Provider-Specific Prompt Tuning: Optimize system prompts per provider
Medium-Term (6-12 Months)¶
- Local Model Fine-Tuning: Fine-tune Llama on Poolula-specific data
- Hybrid Approach: Use cheap provider for simple queries, expensive for complex
- Fallback Chain: Try OpenAI if Anthropic fails, Ollama if both fail
- Custom Evaluation Metrics: LLM-as-judge scoring, semantic similarity
Long-Term (12+ Months)¶
- Provider Auto-Selection: ML model predicts best provider per query
- Cost Optimization: Automatically route queries based on cost/quality trade-off
- Multi-Provider Ensembling: Combine responses from multiple providers
- Local Model Scaling: Support larger local models (70B+) with quantization
Related Documentation¶
- Setup Guide: docs/workflows/llm-provider-setup.md - How to configure each provider
- Comparison Guide: docs/evaluation/provider-comparison.md - How to run and interpret comparisons
- Qualitative Checklist: docs/evaluation/provider_checklist.md - Manual testing guide
- Planning Document: docs/planning/2025-12-03-llm-agnosticism-plan.md - Original implementation plan
- Project Overview: CLAUDE.md - High-level project context
Revision History¶
- 2025-12-04: Initial architecture document with provider abstraction layer
- Future: Update with actual baseline comparison results after running evaluation