Getting Started with Citation Compass
Welcome! Citation Compass is your toolkit for exploring academic citation networks through machine learning, graph analysis, and interactive visualization. Whether you're discovering research connections, analyzing citation patterns, or building reading lists, you're in the right place.
New here? Start with Demo Mode
No database setup required! Try Demo Mode first to explore features with curated datasets, then scale up to your own research collections.
Quick Start (3 Steps)
1. Install
# Clone and install
git clone https://github.com/dagny099/citation-compass.git
cd citation-compass
pip install -e ".[all]"
Virtual Environment Recommended
Using a virtual environment? Great idea! Run python -m venv venv and source venv/bin/activate first.
2. Configure (Optional for Demo Mode)
For demo mode, skip this step and go straight to launching!
For production use with your own data:
# Copy template and add your Neo4j credentials
cp .env.example .env
# Edit .env with your Neo4j database details:
# NEO4J_URI=neo4j+s://your-database-url
# NEO4J_USER=neo4j
# NEO4J_PASSWORD=your-password
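Before launching, it can help to sanity-check the file. The sketch below is a minimal, hypothetical helper (not part of Citation Compass) that parses simple KEY=VALUE lines and flags missing keys or an unexpected URI scheme. The simplified parsing rules are an assumption; the scheme list is the set accepted by the official Neo4j drivers.

```python
# Hypothetical .env sanity check; not part of Citation Compass itself.
# Assumption: .env holds simple KEY=VALUE lines with optional '#' comments.
from pathlib import Path

REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")
# URI schemes accepted by the official Neo4j drivers.
VALID_SCHEMES = ("neo4j", "neo4j+s", "neo4j+ssc", "bolt", "bolt+s", "bolt+ssc")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_neo4j_config(values):
    """Return a list of problems; an empty list means the config looks sane."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    uri = values.get("NEO4J_URI", "")
    if uri:
        scheme = uri.split("://", 1)[0] if "://" in uri else ""
        if scheme not in VALID_SCHEMES:
            problems.append(f"unexpected URI scheme in {uri!r}")
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        issues = check_neo4j_config(parse_env(env_path.read_text()))
        print("OK" if not issues else "\n".join(issues))
    else:
        print(".env not found; copy .env.example first")
```

This only checks the shape of the settings; whether the credentials actually work is still up to the connection test under Verify Your Setup.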
Free Neo4j Database
Don't have a database yet? Neo4j AuraDB offers free cloud instances that are perfect for getting started. Note: free instances pause after 30 days of inactivity; see our Neo4j Health Monitoring guide for keeping them alive!
3. Launch
streamlit run app.py
Your browser will open to http://localhost:8501 with the Citation Compass dashboard!
Your First Analysis (Demo Mode)
The fastest way to understand what Citation Compass can do:
- Navigate to Demo Datasets in the sidebar
- Select "complete_demo" (13 high-impact papers across AI, neuroscience, physics)
- Click "Load Dataset" and explore:
- ML Predictions: Generate citation recommendations using synthetic embeddings
- Network Analysis: Detect research communities with graph algorithms
- Interactive Visualizations: Click nodes to explore paper details
- Export Results: Generate reports in LaTeX, CSV, or JSON
Demo mode provides the full platform experience, with no database required!
What You Can Do
🧠 ML-Powered Citation Prediction
Discover hidden connections between papers using TransE embeddings. TransE treats each citation as a translation in embedding space (citing paper + "cites" relation ≈ cited paper), so papers that cite similar work end up close together. Generate predictions with confidence scores, then validate them against your research intuition.
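To make the translation idea concrete, here is a minimal sketch with hand-picked 2-D vectors (real embeddings are learned from the citation graph and have many more dimensions). It ranks candidate papers by the TransE distance ||head + relation - tail||, where a lower distance means a more plausible citation. The paper names, vectors, and the `rank_candidates` helper are hypothetical, and how the app converts distances into confidence scores is not described on this page.

```python
# Minimal TransE-style link prediction sketch with hypothetical toy embeddings.
# TransE models a triple (head, relation, tail) as head + relation ≈ tail.
import math


def transe_distance(head, relation, tail):
    """L2 distance between head + relation and tail (lower = more plausible)."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_candidates(source, relation, entities, exclude=()):
    """Rank candidate papers for (source, CITES, ?) by ascending TransE distance."""
    head = entities[source]
    scored = [
        (paper, transe_distance(head, relation, vec))
        for paper, vec in entities.items()
        if paper != source and paper not in exclude
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical toy embeddings; real ones are learned during training.
entities = {
    "paper_a": [0.0, 0.0],
    "paper_b": [1.0, 0.1],
    "paper_c": [0.9, -0.1],
    "paper_d": [-1.0, 2.0],
}
cites = [1.0, 0.0]  # hypothetical embedding of the CITES relation

for paper, dist in rank_candidates("paper_a", cites, entities):
    print(f"{paper}: distance={dist:.3f}")
```

Here paper_a + CITES lands at (1.0, 0.0), so paper_b and paper_c rank far ahead of paper_d.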
🕸️ Network Analysis
Explore citation networks with:
- Community detection (Louvain, Label Propagation algorithms)
- Centrality measures (PageRank, betweenness, eigenvector)
- Temporal analysis (track citation trends over time)
- Path analysis (find connections between distant papers)
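As an illustration of one of the centrality measures above, the sketch below computes PageRank by power iteration over a toy citation list. It is a from-scratch teaching version, not the implementation the platform uses; the damping factor of 0.85, spreading the rank of papers with no outgoing citations uniformly, and the example papers are all assumptions.

```python
# Minimal PageRank sketch via power iteration (teaching version, not the app's code).
# Assumptions: edges point citing -> cited, damping 0.85, and papers with no
# outgoing citations ("dangling" nodes) share their rank uniformly.


def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    size = len(nodes)
    rank = {node: 1.0 / size for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes is redistributed to every node.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / size + damping * dangling / size
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy network: three papers cite "foundational"; p3 also cites p1.
citations = [("p1", "foundational"), ("p2", "foundational"),
             ("p3", "foundational"), ("p3", "p1")]
scores = pagerank(citations)
for paper, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{paper}: {score:.3f}")
```

Because edges point from the citing paper to the cited one, rank flows toward heavily cited work, so "foundational" comes out on top and p1 edges out p2 and p3.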
Interactive Visualization
The Streamlit interface provides:
- Clickable network graphs with paper details on demand
- Real-time progress tracking for data imports
- Embedding space explorer for visualizing paper relationships
- Multi-format export (LaTeX tables, academic reports, CSV)
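For a sense of what multi-format export involves, here is a standard-library sketch that writes the same result rows as CSV, JSON, and a LaTeX tabular. The row fields, the helper names, and the partial LaTeX escaping (it does not handle backslashes, ~ or ^) are assumptions, not the app's export code.

```python
# Hypothetical export sketch using only the standard library.
import csv
import io
import json

rows = [
    {"paper_id": "p1", "title": "Attention & Memory", "score": 0.91},
    {"paper_id": "p2", "title": "Graph_Networks", "score": 0.78},
]


def to_csv(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    return json.dumps(rows, indent=2)


# Partial escaping only: enough for common characters in titles and IDs.
LATEX_SPECIAL = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
                 "_": r"\_", "{": r"\{", "}": r"\}"}


def latex_escape(text):
    return "".join(LATEX_SPECIAL.get(ch, ch) for ch in str(text))


def to_latex(rows):
    cols = list(rows[0])
    lines = [r"\begin{tabular}{" + "l" * len(cols) + "}", r"\hline",
             " & ".join(latex_escape(c) for c in cols) + r" \\", r"\hline"]
    for row in rows:
        lines.append(" & ".join(latex_escape(row[c]) for c in cols) + r" \\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)


print(to_latex(rows))
```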
Research Notebooks
Four comprehensive Jupyter notebooks guide you through:
1. Comprehensive Exploration - Data discovery and network analysis
2. Model Training Pipeline - Train custom TransE models on your data
3. Prediction Evaluation - Validate model performance with MRR, Hits@K metrics
4. Narrative Presentation - Generate publication-ready visualizations
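MRR and Hits@K are both computed from the rank at which the true cited paper appears among the model's candidates: MRR averages 1/rank over all test citations, and Hits@K is the fraction ranked within the top K. A minimal sketch with made-up ranks (how the notebooks build the candidate lists is not shown here):

```python
# Minimal sketch of the ranking metrics; the ranks below are hypothetical.


def mean_reciprocal_rank(ranks):
    """MRR = average of 1/rank over all test queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Hits@K = fraction of test queries whose true answer ranked in the top K."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the held-out cited paper for five test queries.
ranks = [1, 3, 2, 10, 1]
print(f"MRR     = {mean_reciprocal_rank(ranks):.3f}")
print(f"Hits@1  = {hits_at_k(ranks, 1):.2f}")
print(f"Hits@10 = {hits_at_k(ranks, 10):.2f}")
```

For these ranks, MRR = (1 + 1/3 + 1/2 + 1/10 + 1) / 5 ≈ 0.587, Hits@1 = 0.40, and Hits@10 = 1.00.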
System Requirements
Minimum:
- Python 3.8+ (3.10+ recommended)
- 4GB RAM (8GB+ recommended)
- 2GB free disk space
For Large Datasets:
- 16GB+ RAM for networks with 100K+ papers
- SSD recommended for database operations
- Optional: CUDA-compatible GPU for faster model training
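A quick way to check your interpreter against the Python requirement above. The helper is hypothetical and simply encodes the version thresholds listed on this page.

```python
# Hypothetical version check mirroring the thresholds listed above.
import sys


def python_status(version_info=sys.version_info):
    major_minor = tuple(version_info[:2])
    if major_minor < (3, 8):
        return "unsupported"
    if major_minor < (3, 10):
        return "supported (3.10+ recommended)"
    return "recommended"


print(f"Python {sys.version.split()[0]}: {python_status()}")
```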
Supported Platforms: macOS, Linux, Windows (WSL recommended)
Installation Options
Choose the profile that fits your needs:
- Full (pip install -e ".[all]"): everything you need for citation analysis. Includes ML models, analytics, web interface, and notebook support.
- ML only: just the machine learning components. Includes TransE models, prediction engine, and embeddings.
- Web only (pip install -e ".[web]"): interactive dashboard without ML. Includes Streamlit app, network visualization, and data import.
Database Setup
Option 1: Demo Mode (No Database)
Perfect for learning and testing!
No setup required: just launch streamlit run app.py and load a demo dataset. Full functionality with synthetic data.
Option 2: Neo4j AuraDB (Cloud, Free Tier)
Best for getting started with your own data
- Create an account at Neo4j AuraDB
- Create a free database instance
- Download the credentials (URI, username, password)
- Add them to your .env file (NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD)
- Run database setup:
Free Tier Limits
AuraDB free instances pause after 30 days of inactivity. Check out our Neo4j Health Monitoring guide for an automated solution!
Option 3: Local Neo4j (Docker)
For advanced users who want full control
docker run \
--name neo4j \
-p7474:7474 -p7687:7687 \
-d \
-v $HOME/neo4j/data:/data \
--env NEO4J_AUTH=neo4j/your-password \
neo4j:latest
Then update .env with NEO4J_URI=neo4j://localhost:7687 and set NEO4J_PASSWORD to the password you passed in NEO4J_AUTH.
Verify Your Setup
# Test basic functionality
python -c "
from src.services.ml_service import get_ml_service
from src.database.connection import Neo4jConnection
print('✅ ML Service:', get_ml_service().health_check()['status'])
print('✅ Database:', 'connected' if Neo4jConnection().test_connection() else 'check config')
"
# Run test suite
python -m pytest tests/ -v
Common Workflows
Research Discovery
Find related papers you might have missed
- Load your dataset (demo or imported)
- Navigate to ML Predictions
- Enter a paper ID or search by title
- Generate predictions with confidence scores
- Export recommended reading list
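The last two steps amount to filtering and sorting scored candidates. The sketch below shows one plausible way to do that; the field names, the 0.5 confidence threshold, and the rule of skipping papers the source already cites are assumptions, not a description of the app's behaviour.

```python
# Hypothetical sketch: turn scored predictions into a reading list.


def build_reading_list(predictions, already_cited, min_confidence=0.5, top_n=10):
    """Keep new, confident candidates and return the top_n by confidence."""
    fresh = [
        p for p in predictions
        if p["paper_id"] not in already_cited and p["confidence"] >= min_confidence
    ]
    fresh.sort(key=lambda p: p["confidence"], reverse=True)
    return fresh[:top_n]


# Made-up predictions for a single source paper.
predictions = [
    {"paper_id": "p7", "title": "Toy paper 7", "confidence": 0.82},
    {"paper_id": "p2", "title": "Toy paper 2", "confidence": 0.95},
    {"paper_id": "p9", "title": "Toy paper 9", "confidence": 0.41},
    {"paper_id": "p4", "title": "Toy paper 4", "confidence": 0.67},
]

# p2 is already cited by the source paper, so it is not a "missed" connection.
for item in build_reading_list(predictions, already_cited={"p2"}):
    print(f'{item["paper_id"]}\t{item["confidence"]:.2f}\t{item["title"]}')
```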
🕸️ Network Exploration
Understand citation communities
- Go to Enhanced Visualizations
- View interactive network graph
- Run community detection (try Louvain algorithm)
- Explore cross-field connections
- Generate LaTeX report for publication
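For intuition about what community detection does, here is a minimal label propagation sketch (Label Propagation is one of the two algorithms listed under Network Analysis, and it is simpler than Louvain). Citations are treated as undirected links, and each paper repeatedly adopts the most common label among its neighbours until nothing changes. The seed, tie-breaking rule, and toy graph are assumptions.

```python
# Minimal label propagation sketch (not Louvain, and not the app's implementation).
import random
from collections import Counter, defaultdict


def label_propagation(edges, max_rounds=100, seed=0):
    rng = random.Random(seed)
    neighbours = defaultdict(set)
    for a, b in edges:  # treat citations as undirected links
        neighbours[a].add(b)
        neighbours[b].add(a)
    labels = {node: node for node in neighbours}  # each paper starts alone
    nodes = sorted(neighbours)
    for _ in range(max_rounds):
        rng.shuffle(nodes)  # visit papers in random order each round
        changed = False
        for node in nodes:
            counts = Counter(labels[n] for n in neighbours[node])
            best = max(counts.values())
            candidates = sorted(label for label, c in counts.items() if c == best)
            if labels[node] in candidates:
                continue  # keep the current label if it is already a top choice
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break
    return labels


# Hypothetical toy network: two tightly knit groups joined by one citation.
citations = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "x"),
             ("x", "y"), ("y", "z"), ("x", "z")]
communities = defaultdict(list)
for paper, label in label_propagation(citations).items():
    communities[label].append(paper)
print(sorted(sorted(members) for members in communities.values()))
```

Label propagation is fast but its result can depend on visiting order, which is one reason tools often suggest Louvain for more stable communities.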
Model Training
Train custom embeddings on your data
- Import your citation network (via search or file upload)
- Open Jupyter: jupyter notebook notebooks/
- Run 02_model_training_pipeline.ipynb
- Evaluate with 03_prediction_evaluation.ipynb
- Use the trained model in the Streamlit app
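To see what the training notebook is optimizing, here is a tiny pure-Python TransE trainer. It is a teaching sketch under stated assumptions (a single CITES relation, squared L2 distance, margin ranking loss, plain SGD, random tail corruption for negatives, unit-norm entity vectors), not the notebook's implementation, and it does not produce the transe_citation_model.pt file.

```python
# Teaching sketch of TransE training; hyperparameters and loss variant are assumptions.
import random


def train_transe(citations, dim=8, margin=1.0, lr=0.05, epochs=200, seed=0):
    rng = random.Random(seed)
    papers = sorted({p for edge in citations for p in edge})

    def normalize(v):
        norm = sum(x * x for x in v) ** 0.5 or 1.0
        return [x / norm for x in v]

    ent = {p: normalize([rng.uniform(-1, 1) for _ in range(dim)]) for p in papers}
    rel = [rng.uniform(-1, 1) / dim ** 0.5 for _ in range(dim)]  # CITES relation

    def dist(h, t):  # squared L2 distance of h + rel from t
        return sum((a + r - b) ** 2 for a, r, b in zip(ent[h], rel, ent[t]))

    known = set(citations)
    history = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for h, t in rng.sample(citations, len(citations)):
            t_neg = rng.choice(papers)  # corrupt the tail to get a negative
            if (h, t_neg) in known:
                continue  # skip "negatives" that are actually real citations
            loss = margin + dist(h, t) - dist(h, t_neg)
            if loss <= 0:
                continue  # margin already satisfied, no update
            epoch_loss += loss
            # d/dh ||h + r - t||^2 = 2(h + r - t); tails get the opposite sign.
            g_pos = [2 * (a + r - b) for a, r, b in zip(ent[h], rel, ent[t])]
            g_neg = [2 * (a + r - b) for a, r, b in zip(ent[h], rel, ent[t_neg])]
            for i in range(dim):
                ent[h][i] -= lr * (g_pos[i] - g_neg[i])
                rel[i] -= lr * (g_pos[i] - g_neg[i])
                ent[t][i] += lr * g_pos[i]
                ent[t_neg][i] -= lr * g_neg[i]
            for p in (h, t, t_neg):
                ent[p] = normalize(ent[p])  # keep entities on the unit sphere
        history.append(epoch_loss)
    return ent, rel, history


# Hypothetical toy citation list.
citations = [("p1", "hub"), ("p2", "hub"), ("p3", "hub"), ("p4", "p1"), ("p5", "p2")]
entities, relation, losses = train_transe(citations)
print(f"summed hinge loss: first epoch {losses[0]:.2f}, last epoch {losses[-1]:.2f}")
```

A real pipeline would add mini-batching, a held-out split for the MRR and Hits@K evaluation, and a framework such as PyTorch, but the scoring and loss follow the same pattern.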
Troubleshooting
Import errors when running Python code
Ensure you installed in editable mode with the -e flag: pip install -e ".[all]"
Can't connect to Neo4j
Check that your .env file has the correct credentials, then test the connection (for example with the snippet under Verify Your Setup). AuraDB URIs use neo4j+s:// (secure connection).
Streamlit won't start
Verify installation: streamlit --version
If missing, reinstall: pip install -e ".[web]"
ML predictions show errors
Check that the model files exist. You should see transe_citation_model.pt, entity_mapping.pkl, and training_metadata.pkl. If any are missing, train models using the notebook pipeline or use demo mode.
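If you prefer to check from Python, this hypothetical helper reports which of the three expected files are missing from a directory. The file names come from this page; the "models" directory used below is a guess, so point it at wherever your checkout keeps the model files.

```python
# Hypothetical helper: list which expected model artifacts are missing.
from pathlib import Path

EXPECTED_MODEL_FILES = ("transe_citation_model.pt", "entity_mapping.pkl",
                        "training_metadata.pkl")


def missing_model_files(model_dir):
    directory = Path(model_dir)
    return [name for name in EXPECTED_MODEL_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("models")  # assumed location; adjust as needed
    print("All model files present" if not missing else f"Missing: {', '.join(missing)}")
```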
Next Steps
Explore the Interface:
- User Guide - Complete walkthrough of all features
- Demo Datasets - Details on curated demo collections
- Interactive Features - Clickable nodes, real-time progress
Scale Up:
- Data Import - Import your research collections
- Notebook Pipeline - Advanced analysis workflows
- ML Predictions - Train custom models
Extend & Customize:
- Developer Guide - System architecture and design decisions
- API Reference - Programmatic access to all features
- Resources - Helpful guides for common tasks
Welcome to Citation Compass, and happy exploring! 🧭✨