Get Started

This guide will get you running the Naruto character network pipeline in under 10 minutes.


Prerequisites

  • Python 3.10+ (tested on 3.10, 3.11, 3.12)
  • Neo4j Desktop (optional, for graph database visualization)
  • Git (for cloning the repository)

Installation

1. Clone the Repository

git clone https://github.com/dagny099/naruto-network-graph.git
cd naruto-network-graph

2. Create a Virtual Environment

# macOS / Linux
python3 -m venv .venv
source .venv/bin/activate

# Windows
python -m venv .venv
.venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Install the naruto_net Package (Editable Mode)

This makes the subtitle processing modules available for import:

pip install -e .

Quick Test: Process a Sample Episode

The repository includes a validated test case (Episode 014 of Naruto Shippuden) to verify the pipeline works.

Run the End-to-End Pipeline

python scripts/00_ass_ingest_subset.py

What this does:

  1. Parses .ass subtitle files
  2. Segments dialogue into scenes (a gap of more than 3 seconds between dialogue lines starts a new scene)
  3. Detects character mentions using alias matching
  4. Builds co-appearance edges

Expected output:

Processing Episode 014...
✓ 160 dialogue events parsed
✓ 55 scenes segmented
✓ 8 characters detected
✓ 5 co-appearance edges created

Output files:
  data/processed/edges.csv
  data/processed/scene_character.csv

Pipeline Validated

If you see the above output, the subtitle processing pipeline is working correctly!
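To make steps 1 and 2 of the pipeline (parsing and scene segmentation) more concrete, here is a minimal standalone sketch. It is not the code in naruto_net: the function names parse_dialogue and segment_scenes are hypothetical, the field order is that of a standard ASS "Dialogue:" line, and the gap is assumed to be measured from the end of one line to the start of the next.

```python
import re

# Assumed field order of a standard ASS "Dialogue:" line:
# Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
TAG_RE = re.compile(r"\{[^}]*\}")  # override blocks such as {\i1} or {\pos(10,20)}


def to_seconds(stamp):
    """Convert an ASS timestamp (H:MM:SS.cc) to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_dialogue(line):
    """Return (start, end, text) for a Dialogue line, or None for any other line."""
    if not line.startswith("Dialogue:"):
        return None
    # The text field may itself contain commas, so split at most 9 times.
    fields = line[len("Dialogue:"):].strip().split(",", 9)
    if len(fields) < 10:
        return None
    text = TAG_RE.sub("", fields[9]).replace("\\N", " ").replace("\\n", " ")
    return to_seconds(fields[1]), to_seconds(fields[2]), " ".join(text.split())


def segment_scenes(events, max_gap=3.0):
    """Group events into scenes; a silence longer than max_gap starts a new scene."""
    scenes = []
    for event in sorted(events):
        if scenes and event[0] - max(e[1] for e in scenes[-1]) <= max_gap:
            scenes[-1].append(event)
        else:
            scenes.append([event])
    return scenes


# Illustrative lines, not taken from the Episode 014 fixture.
lines = [
    r"Dialogue: 0,0:00:01.00,0:00:02.50,Default,,0,0,0,,{\i1}Naruto!{\i0}",
    r"Dialogue: 0,0:00:03.00,0:00:04.00,Default,,0,0,0,,Where is\NJiraiya?",
    r"Dialogue: 0,0:00:10.00,0:00:12.00,Default,,0,0,0,,Lady Tsunade, a message.",
]
events = [e for e in map(parse_dialogue, lines) if e is not None]
scenes = segment_scenes(events)
print(len(scenes), "scenes")
```

The first two lines are half a second apart and fall into one scene; the six-second silence before the third line opens a second scene.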


Explore the Data

View Character Networks (CSV)

The pipeline outputs two key files:

data/processed/edges.csv — Character co-appearance edges

character_a     character_b  weight  arc             episodes
Naruto Uzumaki  Tsunade      2       Pain's Assault  [14]
Naruto Uzumaki  Jiraiya      2       Pain's Assault  [14]

data/processed/scene_character.csv — Scene-level character presence

scene_id  character       episode
014_001   Naruto Uzumaki  14
014_001   Tsunade         14
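The relationship between the two files can be sketched as follows: every pair of characters present in the same scene contributes to an edge. This is an illustration only. It assumes the files are comma-separated with the column names shown above and that weight counts shared scenes; the rows for scene 014_002 are made up for the example.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations

# Sample rows in the scene_character.csv layout (assumed comma-separated).
# The 014_002 rows are invented for illustration.
sample = io.StringIO(
    "scene_id,character,episode\n"
    "014_001,Naruto Uzumaki,14\n"
    "014_001,Tsunade,14\n"
    "014_002,Naruto Uzumaki,14\n"
    "014_002,Tsunade,14\n"
    "014_002,Jiraiya,14\n"
)

# Collect the set of characters present in each scene.
cast_by_scene = defaultdict(set)
for row in csv.DictReader(sample):
    cast_by_scene[row["scene_id"]].add(row["character"])

# Every unordered pair sharing a scene adds 1 to that pair's weight.
edge_weights = Counter()
for cast in cast_by_scene.values():
    for pair in combinations(sorted(cast), 2):
        edge_weights[pair] += 1

for (a, b), weight in edge_weights.most_common():
    print(a, "|", b, "|", weight)
```

Sorting each cast before pairing keeps (A, B) and (B, A) from being counted as separate edges.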

Import into Neo4j (Optional)

If you want to visualize the graph database:

  1. Install Neo4j Desktop from neo4j.com/download
  2. Create a new database (version 5.x recommended)
  3. Import characters:
// Paste contents of outputs/cypher_import_characters.cypher
// into Neo4j Browser
  4. Import edges:
// Paste contents of outputs/cypher_import_edges.cypher
// into Neo4j Browser
  5. Verify import:
MATCH (n:Character)
RETURN count(n) AS total_characters;
// Expected: 87 characters

See Architecture for the complete Neo4j schema.


Run Tests

The project includes 4 test suites to verify correctness:

pytest tests/ -v

Test coverage:

  • ✓ .ass file parsing against known-good fixture
  • ✓ ASS tag stripping and text normalization
  • ✓ Scene segmentation produces complete mapping
  • ✓ Character alias word-boundary matching
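As a rough picture of what word-boundary alias matching guards against, the sketch below shows that a short alias such as "Sai" should not fire inside the word "said". The alias table and the function names are hypothetical; the project's real aliases and matching code live in the naruto_net package.

```python
import re

# Hypothetical alias table, for illustration only.
ALIASES = {
    "Naruto Uzumaki": ["Naruto", "Uzumaki"],
    "Sai": ["Sai"],
    "Tsunade": ["Tsunade", "Lady Tsunade", "Fifth Hokage"],
}


def compile_aliases(aliases):
    """Build one case-insensitive, word-bounded pattern per canonical name."""
    patterns = {}
    for name, names in aliases.items():
        # Longest alias first so multi-word aliases are tried before their parts.
        ordered = sorted(names, key=len, reverse=True)
        alternatives = "|".join(re.escape(alias) for alias in ordered)
        patterns[name] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return patterns


def detect_characters(text, patterns):
    """Return the canonical names whose aliases occur as whole words in text."""
    return {name for name, pattern in patterns.items() if pattern.search(text)}


patterns = compile_aliases(ALIASES)
print(detect_characters("He said Naruto left.", patterns))
print(detect_characters("Lady Tsunade is here, Sai.", patterns))
```

Without the \b anchors, the case-insensitive pattern for "Sai" would match the first three letters of "said" and add a false mention.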

Common First Steps

Process Your Own Subtitle Files

  1. Place .ass files in data/naruto-subtitle-files/
  2. Edit scripts/00_ass_ingest_subset.py to specify episode numbers
  3. Run the script:
python scripts/00_ass_ingest_subset.py

Explore Character Metadata

The repository includes hand-curated character lists for all 3 arcs:

  • data/chunin_exams_characters.csv (50 characters)
  • data/sasuke_retrieval_characters.csv (50 characters)
  • data/pains_assault_characters.csv (50 characters)

Open these in Excel, Google Sheets, or pandas to see character metadata:

import pandas as pd

df = pd.read_csv('data/chunin_exams_characters.csv')
print(df[['name', 'affiliation_primary', 'role_type']].head(10))

Run Analytical Queries

If you've imported data into Neo4j, run the validation queries:

# Open outputs/analytical_queries.cypher
# Copy queries into Neo4j Browser

Examples:

  • Q1: Top 10 most connected characters
  • Q4: Community detection using Louvain algorithm
  • Q7: Characters appearing in all 3 arcs
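If Neo4j is not set up yet, a rough offline analogue of Q1 can be computed straight from the edge list. This sketch is not the query in outputs/analytical_queries.cypher; it assumes edges.csv is comma-separated with the columns shown earlier, and it reads "most connected" as the number of distinct edges per character.

```python
import csv
import io
from collections import Counter

# Sample rows in the edges.csv layout (assumed comma-separated).
sample = io.StringIO(
    "character_a,character_b,weight,arc,episodes\n"
    "Naruto Uzumaki,Tsunade,2,Pain's Assault,[14]\n"
    "Naruto Uzumaki,Jiraiya,2,Pain's Assault,[14]\n"
)

degree = Counter()    # number of edges touching each character
strength = Counter()  # sum of edge weights touching each character
for row in csv.DictReader(sample):
    for name in (row["character_a"], row["character_b"]):
        degree[name] += 1
        strength[name] += int(row["weight"])

top = degree.most_common(10)
for name, count in top:
    print(name, count, strength[name])
```

To run it on real output, replace the sample with open("data/processed/edges.csv", newline="", encoding="utf-8").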

Troubleshooting

Port Already in Use

If the default port is already taken when you start a local server (e.g., for visualization), bind to a different one:

# Try an alternate port
mkdocs serve -a 127.0.0.1:8051

Module Not Found: naruto_net

Make sure you installed the package in editable mode:

pip install -e .

Subtitle Files Not Found

The .ass subtitle files are not included in the repository for copyright reasons, so you'll need to source them separately. See FAQ for guidance.

Encoding Errors

If you encounter encoding issues with .ass files:

# In src/naruto_net/io/subtitles.py, line 42:
# Try 'utf-8-sig' instead of 'utf-8'
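If subtitle files come from mixed sources, a more tolerant reader is one possible alternative to editing the encoding by hand. The sketch below is an assumption-laden example, not the code in src/naruto_net/io/subtitles.py: read_subtitle_text is a hypothetical helper, and the cp1252 fallback is a guess that suits Western European legacy files only.

```python
import codecs
import tempfile
from pathlib import Path


def read_subtitle_text(path):
    """Decode a subtitle file, tolerating a BOM and a legacy single-byte encoding."""
    raw = Path(path).read_bytes()
    # UTF-16 files are only recognized by their byte-order mark.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")
    try:
        # 'utf-8-sig' reads plain UTF-8 too and drops a leading BOM if present.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Last resort: assumed legacy encoding; undecodable bytes become U+FFFD.
        return raw.decode("cp1252", errors="replace")


# Quick check with throwaway files in three encodings.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "bom.ass": "[Script Info]".encode("utf-8-sig"),
        "utf16.ass": "[Events]".encode("utf-16"),
        "legacy.ass": "Café".encode("cp1252"),
    }
    decoded = {}
    for name, payload in samples.items():
        target = Path(tmp) / name
        target.write_bytes(payload)
        decoded[name] = read_subtitle_text(target)

print(decoded)
```

Checking for a UTF-16 BOM first matters because arbitrary bytes can sometimes decode as UTF-16 without raising an error, which would silently produce garbage.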

Next Steps

  • Understand the Methodology

    See how subtitle files become character networks step-by-step.

    How It Works

  • Review the Architecture

    Learn about the Neo4j schema, multi-label design, and relationship structure.

    Architecture

  • Explore the API

    Dive into the naruto_net package for custom analysis.

    API Reference


Questions?

Check the FAQ or open an issue on GitHub.