System Design¶
Poolula Platform uses a modular architecture that separates concerns while maintaining simplicity for a single-developer project.
Architecture Principles¶
- Data-First - Database as single source of truth
- Modular Monolith - Clear module boundaries without microservices complexity
- Type Safety - Pydantic models throughout
- Provenance Tracking - Every data point records its source
- Progressive Disclosure - Simple defaults with detailed options available
High-Level Architecture¶
```mermaid
graph TB
    subgraph "Data Sources"
        A1[Airbnb CSV]
        A2[Bank Statements]
        A3[Expense CSV]
        A4[Manual Entry]
    end
    subgraph "Import Layer"
        B1[import_airbnb_transactions.py]
        B2[import_expenses.py]
        B3[seed_database.py]
        B4[ingest_documents.py]
    end
    subgraph "Data Layer"
        C1[(SQLite Database<br/>5 Tables)]
        C2[(ChromaDB<br/>Vector Store)]
    end
    subgraph "Service Layer"
        D1[FastAPI REST API]
        D2[RAG System]
        D3[Tool Manager]
    end
    subgraph "Interface Layer"
        E1[Frontend<br/>Vanilla JS]
        E2[CLI Scripts]
        E3[Jupyter Notebooks]
    end
    subgraph "External Services"
        F1[Claude API<br/>Anthropic]
    end
    A1 --> B1
    A2 --> B2
    A3 --> B2
    A4 --> B3
    B1 --> C1
    B2 --> C1
    B3 --> C1
    B4 --> C2
    C1 --> D1
    C2 --> D2
    D2 --> D3
    D3 --> F1
    D1 --> E1
    D1 --> E2
    D1 --> E3
    style C1 fill:#4A7C59
    style C2 fill:#2C5282
    style D2 fill:#2C5282
    style F1 fill:#E07A5F
```
Core Components¶
1. Database Layer¶
SQLite Database - Single source of truth for structured data
Tables:
```mermaid
erDiagram
    Property ||--o{ Transaction : has
    Property ||--o{ Document : has
    Property ||--o{ Obligation : has
    Transaction ||--|| Provenance : tracks
    Property {
        uuid id PK
        string address
        date acquisition_date
        decimal purchase_price_total
        decimal land_basis
        decimal building_basis
        decimal ffe_basis
        date placed_in_service
        string status
    }
    Transaction {
        uuid id PK
        uuid property_id FK
        date transaction_date
        decimal amount
        string category
        string transaction_type
        string description
        string source_account
    }
    Document {
        uuid id PK
        uuid property_id FK
        string filename
        string doc_type
        date effective_date
        string version
        string confidentiality
        string storage_path
    }
    Obligation {
        uuid id PK
        uuid property_id FK
        string obligation_type
        date due_date
        string status
        string description
        string recurrence
    }
    Provenance {
        uuid id PK
        uuid transaction_id FK
        string source_type
        string source_id
        float confidence
        string notes
    }
```
Design Decisions:
- SQLite for development/small deployments (easy backup, no server)
- UUIDs for primary keys (distributed system ready)
- Soft deletes via `deleted_at` timestamp (audit trail preservation)
- Computed properties in SQLModel for derived values (e.g., `total_basis`)
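The computed-property pattern for derived values like `total_basis` can be sketched as follows. A plain dataclass is used here to keep the example self-contained; the same `@property` works on the SQLModel `Property` model (field names follow the Property table above):

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Property:
    # Basis components stored as columns (see the Property table above)
    land_basis: Decimal
    building_basis: Decimal
    ffe_basis: Decimal

    @property
    def total_basis(self) -> Decimal:
        # Derived value: never stored, so it can never drift out of
        # sync with its components
        return self.land_basis + self.building_basis + self.ffe_basis
```

Because the value is computed on access rather than persisted, there is no second copy to keep consistent when a basis component changes.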
2. Vector Store Layer¶
ChromaDB - Semantic search for business documents
Collections:
- document_catalog - Document metadata (titles, types, dates)
- document_content - Chunked document content with embeddings
Embedding Strategy:
- ONNXMiniLM_L6_V2 (ChromaDB default)
- No external model dependencies (self-contained)
- Chunk size: 800 characters
- Chunk overlap: 200 characters
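The chunking strategy above amounts to a sliding window over the raw text. A minimal sketch (the real chunker may also respect sentence or paragraph boundaries):

```python
def chunk_text(text: str, size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks
    share `overlap` characters, so context at a boundary appears
    in both neighbors."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    step = size - overlap  # advance 600 chars per chunk with the defaults
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
        if start + size >= len(text):
            break  # last chunk reached the end of the text
    return chunks
```

With the defaults, a 2,000-character document yields three chunks, and each chunk's last 200 characters reappear at the start of the next one.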
3. Service Layer¶
FastAPI REST API¶
Responsibilities:
- HTTP request handling
- Input validation (Pydantic models)
- Business logic orchestration
- Response formatting
- CORS handling
Endpoints:
- `/api/query` - Chatbot queries
- `/api/v1/properties` - Property CRUD
- `/api/v1/transactions` - Transaction CRUD
- `/api/v1/documents` - Document metadata
- `/api/v1/obligations` - Obligation CRUD
- `/health` - Health checks
RAG System¶
Components:
```mermaid
graph LR
    A[User Query] --> B[RAG System]
    B --> C[Tool Manager]
    C --> D{Tool Selection}
    D -->|Structured Data| E[Database Query Tool]
    D -->|Document Search| F[Document Search Tool]
    D -->|List Docs| G[List Documents Tool]
    E --> H[SQLite]
    F --> I[ChromaDB]
    G --> I
    H --> J[Results]
    I --> J
    J --> K[Claude API]
    K --> L[Final Answer]
    style B fill:#2C5282
    style K fill:#E07A5F
```
Tool System:
- query_database - SQL SELECT queries (properties, transactions, documents, obligations)
- search_document_content - Semantic search in documents
- list_business_documents - List available documents
Multi-Round Tool Calling:
- Maximum 2 rounds per query
- Sequential reasoning (use first tool results to inform second tool)
- Example: Query properties → Search specific property documents
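The round-capped loop can be sketched generically. Here `model_call` and `execute_tool` stand in for the Claude API and the Tool Manager; their signatures and the message shapes are assumptions made for illustration, not the actual client API:

```python
MAX_ROUNDS = 2  # maximum tool-use rounds per query


def run_with_tools(model_call, execute_tool, user_query: str) -> str:
    """Drive up to MAX_ROUNDS of tool use, feeding each tool result
    back to the model so the second round can build on the first."""
    messages = [{"role": "user", "content": user_query}]
    for _ in range(MAX_ROUNDS):
        reply = model_call(messages)
        if reply.get("tool") is None:
            return reply["text"]  # model answered without needing a tool
        # Model requested a tool: run it and append the result
        result = execute_tool(reply["tool"], reply.get("input", {}))
        messages.append({"role": "tool", "content": result})
    # Round budget exhausted: request a final answer with tools disabled
    return model_call(messages, tools_enabled=False)["text"]
```

The cap keeps latency and API cost bounded while still allowing the "query properties, then search that property's documents" sequence from the example above.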
4. Interface Layer¶
Frontend (Vanilla JS)¶
Why Vanilla JS?
- No build step required
- Fast page loads
- Easy to understand and modify
- No framework lock-in
Components:
- Chat interface with markdown rendering (Marked.js)
- 4 persona-based help sections
- Document upload (drag & drop)
- Resources sidebar
CLI Scripts¶
Purpose:
- Data import automation
- Document ingestion
- Database seeding
- Interactive chatbot (for terminal users)
Key Scripts:
- `cli.py` - Interactive chatbot CLI
- `import_airbnb_transactions.py` - Import Airbnb data
- `ingest_documents.py` - Process and embed documents
- `seed_obligations.py` - Create compliance deadlines
Data Flow¶
Query Processing Flow¶
```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant RAG
    participant Tools
    participant DB
    participant ChromaDB
    participant Claude
    User->>Frontend: Ask question
    Frontend->>API: POST /api/query
    API->>RAG: Process query
    RAG->>Claude: Determine tools needed
    Claude->>RAG: Use query_database tool
    RAG->>Tools: Execute database query
    Tools->>DB: SQL SELECT
    DB->>Tools: Results
    Tools->>RAG: Formatted results
    RAG->>Claude: Generate response with data
    Claude->>RAG: Final answer
    RAG->>API: Response + sources
    API->>Frontend: JSON response
    Frontend->>User: Display answer + sources
```
Document Ingestion Flow¶
```mermaid
sequenceDiagram
    participant Script
    participant Processor
    participant ChromaDB
    participant Metadata
    Script->>Metadata: Load metadata CSV
    Script->>Processor: Process document
    Processor->>Processor: Extract text (PDF/DOCX)
    Processor->>Processor: Chunk content (800 chars)
    Processor->>ChromaDB: Add document metadata
    Processor->>ChromaDB: Add content chunks
    ChromaDB->>ChromaDB: Generate embeddings
    ChromaDB->>Script: Success
```
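Before the "Add content chunks" step, each chunk needs a stable id and a metadata record in the shape ChromaDB's `collection.add(ids=..., documents=..., metadatas=...)` expects. A sketch (the metadata field names here are assumptions, not the actual schema):

```python
def build_chunk_records(
    doc_id: str, filename: str, chunks: list[str]
) -> tuple[list[str], list[str], list[dict]]:
    """Prepare parallel ids/documents/metadatas lists for one document.

    Chunk ids are derived from the document id so re-ingesting the
    same document overwrites its old chunks instead of duplicating them.
    """
    ids = [f"{doc_id}-{i}" for i in range(len(chunks))]
    metadatas = [
        {"filename": filename, "chunk_index": i}
        for i in range(len(chunks))
    ]
    return ids, chunks, metadatas
```

Embeddings are not computed here: ChromaDB generates them itself on `add`, which is the "Generate embeddings" step in the diagram.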
Transaction Import Flow¶
```mermaid
sequenceDiagram
    participant CSV
    participant Script
    participant Parser
    participant DB
    participant Provenance
    CSV->>Script: Airbnb export
    Script->>Parser: Parse CSV rows
    Parser->>Parser: Apply accrual accounting
    Parser->>Parser: Categorize transactions
    Parser->>DB: Insert transactions
    DB->>Provenance: Record source (CSV)
    Provenance->>DB: Link provenance
    DB->>Script: Success + summary
Design Patterns¶
1. Provenance Tracking¶
Every transaction records its source:
```python
from typing import Optional
from uuid import UUID, uuid4

from sqlmodel import Field, SQLModel


class Provenance(SQLModel, table=True):
    id: UUID = Field(default_factory=uuid4, primary_key=True)
    transaction_id: UUID = Field(foreign_key="transaction.id")
    source_type: SourceType  # enum: CSV_IMPORT, MANUAL_ENTRY, etc.
    source_id: str           # filename, user ID, etc.
    confidence: float = 1.0
    notes: Optional[str] = None
```
Benefits:
- Audit trail for all data
- Identify data quality issues
- Support data lineage queries
2. Soft Deletes¶
Records are never hard-deleted:
```python
from datetime import datetime
from typing import Optional

from sqlmodel import Field, SQLModel


class BaseModel(SQLModel):
    created_at: datetime = Field(default_factory=datetime.utcnow)
    updated_at: datetime = Field(default_factory=datetime.utcnow)
    deleted_at: Optional[datetime] = None
```
Benefits:
- Preserve history
- Enable "undo" functionality
- Maintain referential integrity
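The read-path discipline this implies can be sketched with plain dicts standing in for ORM rows (in the codebase the equivalent filter would live in the query layer):

```python
from datetime import datetime, timezone


def soft_delete(record: dict) -> None:
    # Mark the record as deleted instead of removing the row,
    # preserving history and referential integrity
    record["deleted_at"] = datetime.now(timezone.utc)


def restore(record: dict) -> None:
    # "Undo" is just clearing the tombstone
    record["deleted_at"] = None


def active(records: list[dict]) -> list[dict]:
    # Every read path must exclude soft-deleted rows
    return [r for r in records if r.get("deleted_at") is None]
```

The trade-off is that every query must apply the `deleted_at IS NULL` filter; centralizing it in one place keeps individual call sites from forgetting it.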
3. Type Safety¶
Pydantic models throughout:
```python
from typing import Optional

from pydantic import BaseModel


class QueryRequest(BaseModel):
    query: str
    session_id: Optional[str] = None


class QueryResponse(BaseModel):
    answer: str
    sources: list
    session_id: str
```
Benefits:
- Automatic validation
- IDE autocomplete
- Self-documenting code
4. Tool-Based Architecture¶
AI uses tools instead of direct DB access:
```python
tools = [
    {
        "name": "query_database",
        "description": "Query structured data",
        "input_schema": {...}
    },
    {
        "name": "search_document_content",
        "description": "Search documents",
        "input_schema": {...}
    }
]
```
Benefits:
- Safety (no SQL injection, no writes)
- Flexibility (add new tools easily)
- Transparency (see what AI is doing)
Scalability Considerations¶
Current (Phase 1)¶
- Database: SQLite (single file)
- Users: Single user / small team
- Documents: 100s of documents
- Transactions: 10,000s of transactions
Future (Production)¶
- Database: PostgreSQL (multi-user support)
- Caching: Redis for session management
- Load Balancing: Multiple API instances
- Authentication: OAuth2 / JWT
- Rate Limiting: Per-user limits
Security Considerations¶
Current State¶
- Local-only deployment
- No authentication (development mode)
- Read-only AI queries (no database writes)
- Local file storage (documents not in cloud)
Production Requirements¶
- Authentication: User login required
- Authorization: Role-based access control (RBAC)
- HTTPS: TLS encryption for all traffic
- API Keys: Secure API key management
- Input Validation: Strict validation on all inputs
- Rate Limiting: Prevent abuse
Technology Choices¶
Why Python 3.13?¶
- Latest language features
- Strong typing support
- Excellent AI/ML ecosystem
- FastAPI performance
Why SQLite?¶
- Zero configuration
- Single file database
- Perfect for development
- Easy backups (copy file)
- Upgrade path to PostgreSQL
Why FastAPI?¶
- High performance (async support)
- Auto-generated API docs
- Type hints throughout
- Modern Python (Pydantic v2)
Why ChromaDB?¶
- Easy local deployment
- Built-in embedding functions
- No external dependencies
- Upgrade path to cloud vector stores
Why Vanilla JS?¶
- No build step
- Fast loading
- Easy to modify
- No framework lock-in
Next Steps¶
- Data Models - Detailed schema documentation
- API Design - API architecture details
- Testing Guide - Testing and development workflow
Questions about the architecture? → FAQ