📚 Citation Compass - Streamlit App

A comprehensive web application for academic citation analysis powered by machine learning.

🚀 Features

🤖 ML Predictions

  • Citation Prediction: Use our trained TransE model to predict which papers are most likely to cite a given paper
  • Confidence Scoring: Get probability-like confidence scores for each prediction
  • Interactive Results: Explore predicted papers with detailed information and export results
  • Paper Search: Find papers by title, author, or direct paper ID

🧭 Embedding Explorer

  • Vector Space Exploration: Dive deep into learned paper embeddings
  • Similarity Analysis: Compare papers and find semantically similar research
  • Dimensionality Reduction: Visualize embeddings in 2D/3D space using PCA and t-SNE
  • Embedding Statistics: Analyze embedding properties and distributions

📊 Enhanced Visualizations

  • Network Visualization: Interactive citation network graphs with prediction overlays
  • Advanced Charts: Multi-dimensional analysis with customizable visualizations
  • Export Capabilities: High-quality outputs in multiple formats (PNG, SVG, PDF)
  • Real-time Updates: Dynamic visualization updates based on ML predictions

📔 Interactive Analytics Pipeline

  • Interactive Analysis: Jupyter-style notebook execution within Streamlit
  • Advanced Analytics: Network analysis, community detection, temporal trends
  • Batch Processing: Large-scale citation analysis and reporting
  • Custom Workflows: User-defined analytical pipelines with export capabilities

📈 Advanced Analytics (New)

  • Network Analysis: Centrality measures, community detection, path analysis
  • Temporal Analysis: Citation trends, growth patterns, impact over time
  • Author Analytics: Collaboration networks, influence metrics, career trajectories
  • Performance Metrics: System health, prediction accuracy, cache efficiency
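To make the temporal analysis concrete, here is a minimal sketch of per-year citation counts and year-over-year growth. It assumes the input is simply the list of years in which the citing papers were published; `citations_per_year` and `year_over_year_growth` are hypothetical helpers, not the API of `src/analytics/temporal_analysis.py`.

```python
from collections import Counter


def citations_per_year(citation_years):
    """Count citations received per year (hypothetical helper).

    `citation_years` is an iterable of publication years of the citing
    papers; this input format is an assumption for illustration.
    """
    counts = Counter(citation_years)
    if not counts:
        return {}
    # Fill gaps so years with zero citations still appear in the series.
    first, last = min(counts), max(counts)
    return {year: counts.get(year, 0) for year in range(first, last + 1)}


def year_over_year_growth(series):
    """Relative change between consecutive years; None when the prior year is 0."""
    years = sorted(series)
    growth = {}
    for prev, curr in zip(years, years[1:]):
        before = series[prev]
        growth[curr] = None if before == 0 else (series[curr] - before) / before
    return growth
```

Returning `None` after a zero-citation year avoids a division by zero; a fuller implementation might report absolute changes for those years instead.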

🛠️ Installation & Setup

Prerequisites

  • Python 3.8+
  • PyTorch (for ML models)
  • Streamlit
  • Required Python packages (see requirements)

Quick Start

  1. Install Dependencies:

    pip install streamlit torch plotly scikit-learn pandas numpy
    

  2. Run the Application:

    streamlit run app.py
    

  3. Open Browser: Navigate to http://localhost:8501

Configuration

The app automatically detects and loads:

  • TransE Model: Locally trained model from the models/ directory
  • Entity Mapping: Paper ID to model entity mappings
  • API Configuration: Semantic Scholar API settings
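The detection step can be pictured as a directory scan. The sketch below is hypothetical: the function name and the file suffixes (`.pt`/`.pth` for model weights, `.json`/`.pkl` for entity mappings) are assumptions, because this README does not name the actual files.

```python
from pathlib import Path


def find_model_artifacts(models_dir="models",
                         model_suffixes=(".pt", ".pth"),
                         mapping_suffixes=(".json", ".pkl")):
    """Return (model_files, mapping_files) found directly under `models_dir`.

    Hypothetical helper: the suffixes are illustrative assumptions and
    should be adjusted to whatever the training run actually writes.
    """
    root = Path(models_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Model directory not found: {root.resolve()}")
    files = sorted(p for p in root.iterdir() if p.is_file())
    model_files = [p for p in files if p.suffix in model_suffixes]
    mapping_files = [p for p in files if p.suffix in mapping_suffixes]
    return model_files, mapping_files
```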

🎯 How to Use

ML Predictions Page

  1. Input Paper:
      • Enter a paper ID directly
      • Search by title or keywords
      • Browse the search results and select a paper

  2. Configure Predictions:
      • Set the number of predictions (1-50)
      • Adjust the confidence threshold
      • Check the model health status

  3. View Results:
      • Interactive results table with confidence scores
      • Confidence distribution charts
      • Detailed paper information
      • Export results as CSV
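Configuring and exporting predictions boils down to filtering candidates by the confidence threshold, keeping the top N, and serializing the rows. A minimal sketch, assuming predictions arrive as a dict of paper ID to confidence; `select_predictions` and `predictions_to_csv` are hypothetical names, not the app's own functions.

```python
import csv
import io


def select_predictions(scores, top_k=10, min_confidence=0.0):
    """Keep the top-k candidates whose confidence meets the threshold.

    Hypothetical helper. `scores` maps paper IDs to confidences in [0, 1];
    the 1-50 bound mirrors the range offered in the UI.
    """
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    kept = [(pid, conf) for pid, conf in scores.items() if conf >= min_confidence]
    # Highest confidence first; ties broken by paper ID for stable output.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept[:top_k]


def predictions_to_csv(predictions):
    """Serialize (paper_id, confidence) rows to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["paper_id", "confidence"])
    for paper_id, confidence in predictions:
        writer.writerow([paper_id, f"{confidence:.4f}"])
    return buffer.getvalue()
```

In a Streamlit page, the returned CSV text could be passed as the `data` argument of `st.download_button`.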

Embedding Explorer Page

  1. Individual Embeddings:
      • Enter a paper ID to get its embedding vector
      • View embedding statistics and distributions
      • Visualize embedding dimensions

  2. Compare Papers:
      • Enter multiple paper IDs (one per line)
      • View the cosine similarity matrix
      • Analyze pairwise relationships

  3. Visualization:
      • Plot 3+ papers in a reduced-dimensional space
      • Choose PCA or t-SNE reduction
      • Explore in 2D or 3D
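The similarity matrix compares every pair of embeddings by cosine similarity, cos(a, b) = a·b / (‖a‖ ‖b‖). A pure-Python sketch, assuming embeddings are plain lists of floats keyed by paper ID (the function names are hypothetical); for the model's 128-dimensional vectors, a vectorized routine such as scikit-learn's `sklearn.metrics.pairwise.cosine_similarity` would be the practical choice.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """Pairwise cosine similarities for a dict of paper_id -> embedding.

    Hypothetical helper. Returns (paper_ids, matrix) where matrix[i][j]
    compares paper_ids[i] with paper_ids[j]; the diagonal is 1.0 up to
    floating-point rounding.
    """
    ids = list(embeddings)
    matrix = [[cosine_similarity(embeddings[i], embeddings[j]) for j in ids]
              for i in ids]
    return ids, matrix
```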

Enhanced Visualizations Page

  1. Network Graphs:
      • Interactive citation network visualization
      • Overlay ML predictions on network structure
      • Customize node sizes, colors, and layout algorithms
      • Export high-quality visualizations

  2. Advanced Charts:
      • Multi-dimensional scatter plots with prediction confidence
      • Time-series analysis of citation patterns
      • Distribution analyses and statistical summaries
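Overlaying predictions means drawing predicted links alongside the observed citations while keeping them visually distinct. A minimal sketch under assumed input formats; `build_overlay_edges`, `edge_style`, and the 0.5 default threshold are hypothetical, and the `dash`/`width` keys mirror what Plotly accepts in a trace's `line` settings.

```python
def build_overlay_edges(citations, predictions, min_confidence=0.5):
    """Merge observed citation edges with predicted ones (hypothetical helper).

    citations:   iterable of (source_id, target_id) pairs already in the graph
    predictions: iterable of (source_id, target_id, confidence) triples
    Predicted edges that duplicate an observed citation are skipped so the
    overlay only highlights new links.
    """
    observed = set(citations)
    edges = [
        {"source": s, "target": t, "kind": "observed", "confidence": None}
        for s, t in sorted(observed)
    ]
    for source, target, confidence in predictions:
        if confidence >= min_confidence and (source, target) not in observed:
            edges.append({"source": source, "target": target,
                          "kind": "predicted", "confidence": confidence})
    return edges


def edge_style(edge):
    """Hypothetical styling rule: solid for observed, dashed and
    confidence-weighted for predicted edges."""
    if edge["kind"] == "observed":
        return {"dash": "solid", "width": 1.0}
    return {"dash": "dash", "width": 1.0 + 3.0 * edge["confidence"]}
```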

Interactive Analytics Pipeline

  1. Interactive Analysis:
      • Execute pre-built analytical notebooks
      • Customize parameters and data ranges
      • Real-time results with progress indicators

  2. Custom Workflows:
      • Create custom analytical pipelines
      • Combine multiple analysis types
      • Export comprehensive reports

  3. Advanced Analytics:
      • Network centrality analysis
      • Community detection in citation networks
      • Temporal trend analysis
      • Performance benchmarking
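Network centrality can be as simple as counting citations. The sketch below computes normalized in-degree centrality (distinct citing papers divided by n - 1) over a list of citation pairs; the function names are hypothetical, and NetworkX's `in_degree_centrality` offers a comparable ready-made measure if that library is available.

```python
from collections import defaultdict


def in_degree_centrality(edges):
    """Normalized in-degree centrality for a directed citation graph.

    Hypothetical helper. edges: iterable of (citing_id, cited_id) pairs.
    Self-citations are ignored; scores are divided by (n - 1).
    """
    nodes = set()
    citers = defaultdict(set)
    for citing, cited in edges:
        nodes.update((citing, cited))
        if citing != cited:
            citers[cited].add(citing)
    n = len(nodes)
    if n <= 1:
        return {node: 0.0 for node in nodes}
    return {node: len(citers[node]) / (n - 1) for node in nodes}


def most_central(centrality, k=3):
    """Top-k papers by centrality, ties broken by paper ID."""
    return sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```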

🧠 About the ML Model

TransE Architecture

  • Model Type: Translating Embeddings for Knowledge Graphs
  • Embedding Dimension: 128
  • Training Data: Academic citation networks
  • Entities: 10,000+ computer science papers
  • Prediction Logic: source + relation ≈ target
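Under the TransE rule, a candidate target t is plausible for a source h and relation r when the distance ‖h + r - t‖ is small. A minimal sketch, assuming embeddings are plain Python lists and a single citation relation vector; the real service presumably operates on 128-dimensional PyTorch tensors, and `transe_distance`/`rank_targets` are hypothetical names.

```python
import math


def transe_distance(head, relation, tail, norm=2):
    """TransE dissimilarity ||head + relation - tail|| under the L1 or L2 norm."""
    if not len(head) == len(relation) == len(tail):
        raise ValueError("all vectors must have the same dimension")
    diffs = [h + r - t for h, r, t in zip(head, relation, tail)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    if norm == 2:
        return math.sqrt(sum(d * d for d in diffs))
    raise ValueError("norm must be 1 or 2")


def rank_targets(source, relation, candidates, top_k=5, norm=2):
    """Rank candidates by how closely source + relation lands on them.

    Hypothetical helper. candidates maps paper_id -> embedding; a smaller
    distance means a more plausible citation link.
    """
    scored = [(paper_id, transe_distance(source, relation, vector, norm))
              for paper_id, vector in candidates.items()]
    scored.sort(key=lambda item: (item[1], item[0]))
    return scored[:top_k]
```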

Performance Metrics

  • Training Loss: ~0.15
  • Prediction Speed: <100ms per query
  • Cache Hit Rate: 90%+ for repeated queries
  • Confidence Calibration: Probability-like scores from distance metrics
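How the app converts distances into probability-like scores is not specified here. One plausible mapping, shown purely as an assumption, is a softmax over negative distances with a temperature; `distances_to_confidence` is a hypothetical name.

```python
import math


def distances_to_confidence(distances, temperature=1.0):
    """Map TransE distances to probability-like scores via a softmax.

    Hypothetical helper; one plausible mapping, not necessarily the app's.
    Smaller distances get larger scores, and the scores sum to 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not distances:
        return {}
    logits = {pid: -d / temperature for pid, d in distances.items()}
    # Subtract the largest logit before exponentiating for numerical stability.
    top = max(logits.values())
    weights = {pid: math.exp(v - top) for pid, v in logits.items()}
    total = sum(weights.values())
    return {pid: w / total for pid, w in weights.items()}
```

Because the scores sum to 1 across the candidates considered, they express relative plausibility within one query rather than absolute citation probabilities.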

๐Ÿ—๏ธ Architecture

Service Layer

โ”œโ”€โ”€ ML Service (src/services/ml_service.py)
โ”‚   โ”œโ”€โ”€ TransE Model Loading
โ”‚   โ”œโ”€โ”€ Prediction Generation
โ”‚   โ”œโ”€โ”€ Embedding Extraction
โ”‚   โ””โ”€โ”€ Intelligent Caching
โ”‚
โ”œโ”€โ”€ API Client (src/data/unified_api_client.py)
โ”‚   โ”œโ”€โ”€ Semantic Scholar Integration
โ”‚   โ”œโ”€โ”€ Rate Limiting
โ”‚   โ”œโ”€โ”€ Response Caching
โ”‚   โ””โ”€โ”€ Error Handling
โ”‚
โ””โ”€โ”€ Data Models (src/models/)
    โ”œโ”€โ”€ ML Models (PaperEmbedding, CitationPrediction)
    โ”œโ”€โ”€ Network Models (NetworkNode, NetworkEdge)
    โ””โ”€โ”€ API Models (APIResponse, SearchRequest)
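The fields of the data models are not documented in this README. As an illustration only, the two ML models could be plain dataclasses like the following; every field name here is an assumption, not the definition in `src/models/`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PaperEmbedding:
    """Hypothetical shape of an embedding record (fields are assumptions)."""
    paper_id: str
    vector: Tuple[float, ...]
    model_name: str = "TransE"

    @property
    def dimension(self) -> int:
        return len(self.vector)


@dataclass(frozen=True)
class CitationPrediction:
    """Hypothetical shape of one prediction (fields are assumptions)."""
    source_paper_id: str
    target_paper_id: str
    confidence: float
    target_title: Optional[str] = None

    def __post_init__(self):
        # Reject confidences outside the probability-like [0, 1] range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")
```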

Streamlit Pages

├── app.py (Main Application)
├── src/streamlit_app/pages/
│   ├── ML_Predictions.py          # Citation prediction interface
│   ├── Embedding_Explorer.py      # Vector space exploration
│   ├── Enhanced_Visualizations.py # Network graphs & charts
│   └── Notebook_Pipeline.py       # Interactive analytics pipeline
└── .streamlit/
    └── config.toml

Advanced Analytics Architecture

├── src/analytics/ (New)
│   ├── __init__.py
│   ├── network_analysis.py       # Graph metrics & community detection
│   ├── temporal_analysis.py      # Time-series citation analysis
│   ├── performance_metrics.py    # System performance analysis
│   └── export_engine.py          # Multi-format export capabilities
│
├── src/services/
│   ├── ml_service.py             # Existing ML service
│   └── analytics_service.py      # New analytics orchestration
│
└── notebooks/ (New)
    ├── 01_network_exploration.ipynb
    ├── 02_citation_analysis.ipynb
    └── 03_performance_benchmarks.ipynb

🔧 Configuration

Environment Variables

  • SEMANTIC_SCHOLAR_API_KEY: Optional API key for higher rate limits
  • NEO4J_URI: Neo4j database connection (if using database features)
  • NEO4J_USER: Database username
  • NEO4J_PASSWORD: Database password
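A sketch of reading these variables at startup, assuming missing or blank values should be treated as unset and that database features need all three NEO4J_* settings; `AppSettings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppSettings:
    """Hypothetical settings container for the documented variables."""
    semantic_scholar_api_key: Optional[str]
    neo4j_uri: Optional[str]
    neo4j_user: Optional[str]
    neo4j_password: Optional[str]

    @property
    def neo4j_enabled(self) -> bool:
        # Assumption: database features need all three connection settings.
        return all((self.neo4j_uri, self.neo4j_user, self.neo4j_password))


def load_settings(environ: Mapping[str, str] = os.environ) -> AppSettings:
    """Read the environment variables; missing or blank ones become None."""
    def get(name):
        value = environ.get(name, "").strip()
        return value or None

    return AppSettings(
        semantic_scholar_api_key=get("SEMANTIC_SCHOLAR_API_KEY"),
        neo4j_uri=get("NEO4J_URI"),
        neo4j_user=get("NEO4J_USER"),
        neo4j_password=get("NEO4J_PASSWORD"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test.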

Streamlit Configuration

  • Port: 8501 (default)
  • Theme: Custom academic theme
  • Caching: Enabled for ML models and API responses
  • Error Handling: Detailed error messages in development

📊 Performance Optimizations

Caching Strategy

  • Model Loading: Models cached on first load
  • Predictions: LRU cache with TTL expiration
  • API Responses: Response caching with rate limiting
  • Embeddings: In-memory caching of frequently accessed embeddings
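An LRU cache with TTL expiration can be sketched with `collections.OrderedDict`: recently used keys move to the end, the oldest key is evicted when the cache is full, and entries older than the TTL count as misses. The class name and defaults are hypothetical; the hit/miss counters show one way a cache hit rate could be measured. Streamlit's built-in `st.cache_data` also accepts `ttl` and `max_entries` arguments if a hand-rolled cache is unnecessary.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Hypothetical LRU cache whose entries also expire after `ttl` seconds."""

    def __init__(self, maxsize=128, ttl=300.0, clock=time.monotonic):
        if maxsize < 1:
            raise ValueError("maxsize must be at least 1")
        self.maxsize = maxsize
        self.ttl = ttl
        self._clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            # Missing or expired: drop any stale entry and count a miss.
            self._data.pop(key, None)
            self.misses += 1
            return default
        self._data.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return entry[1]

    def set(self, key, value):
        self._data[key] = (self._clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```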

Scalability Features

  • Lazy Loading: Components loaded on-demand
  • Batch Processing: Efficient handling of multiple predictions
  • Memory Management: Automatic cache eviction
  • Error Recovery: Graceful handling of service failures
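Batch processing with graceful failure handling can be sketched as chunking the work and isolating errors per chunk, so one failing call does not abort the whole run. The helper name and default batch size are assumptions.

```python
def run_in_batches(items, process_batch, batch_size=32):
    """Apply `process_batch` to fixed-size chunks, isolating failures per batch.

    Hypothetical helper. Returns (results, failures): results holds the
    outputs of successful batches in order; failures lists (batch, exception)
    pairs so callers can retry or report them.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    items = list(items)
    results, failures = [], []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        try:
            results.extend(process_batch(batch))
        except Exception as exc:  # keep going; record the failed batch
            failures.append((batch, exc))
    return results, failures
```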

๐Ÿ› Troubleshooting

Common Issues

  1. Model Not Found:
      • Ensure the models/ directory contains the locally trained model files
      • Check file permissions and paths

  2. Paper Not in Model:
      • The model was trained on a specific dataset (computer science papers)
      • Try papers from major CS venues (ICML, NeurIPS, etc.)

  3. Slow Performance:
      • The first prediction takes longer (model loading)
      • Subsequent predictions are cached
      • Consider a GPU for large-scale usage

  4. API Rate Limits:
      • Built-in rate limiting helps avoid 429 (Too Many Requests) errors
      • Consider an API key for higher limits
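Client-side rate limiting of this kind can be sketched as a rolling-window limiter that sleeps until a request slot frees up. The class name is hypothetical, and the actual Semantic Scholar limits are not stated in this README, so `max_calls` and `period` are placeholders.

```python
import time
from collections import deque


class RateLimiter:
    """Hypothetical limiter: at most `max_calls` requests per rolling `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        if max_calls < 1 or period <= 0:
            raise ValueError("max_calls must be >= 1 and period > 0")
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent requests

    def acquire(self):
        """Block until another request may be sent, then record it."""
        while True:
            now = self._clock()
            # Forget requests that have left the rolling window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest request falls out of the window.
            self._sleep(self.period - (now - self._calls[0]))
```

Calling `limiter.acquire()` before each API request spaces the calls out; injecting `clock` and `sleep` keeps the limiter testable without real delays.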

Debug Mode

    # Run with debug logging
    STREAMLIT_LOGGER_LEVEL=debug streamlit run app.py

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

📄 License

This project is part of Citation Compass and follows the same licensing terms.


Built with ❤️ using Streamlit, PyTorch, and Machine Learning