Get Started¶

This guide will get you running the Naruto character network pipeline in under 10 minutes.

Prerequisites¶

Python 3.10+ (tested on 3.10, 3.11, 3.12)
Neo4j Desktop (optional, for graph database visualization)
Git (for cloning the repository)

Installation¶

1. Clone the Repository¶

git clone https://github.com/dagny099/naruto-network-graph.git
cd naruto-network-graph

2. Create a Virtual Environment¶

macOS/Linux:

python3 -m venv .venv
source .venv/bin/activate

Windows:

python -m venv .venv
.venv\Scripts\activate

3. Install Dependencies¶

pip install -r requirements.txt

4. Install the `naruto_net` Package (Editable Mode)¶

This makes the subtitle processing modules available for import:

pip install -e .

Quick Test: Process a Sample Episode¶

The repository includes a validated test case (Episode 014 of Naruto Shippuden) to verify the pipeline works.

Run the End-to-End Pipeline¶

python scripts/00_ass_ingest_subset.py

What this does:

Parses .ass subtitle files
Segments dialogue into scenes (a gap of more than 3 seconds between dialogue lines starts a new scene)
Detects character mentions using alias matching
Builds co-appearance edges
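
The segmentation step can be illustrated with a minimal sketch. It is not the code in `naruto_net`: the `DialogueEvent` class and `segment_scenes` function are hypothetical names, and measuring the gap from the end of one line to the start of the next is an assumption about how the 3-second rule is applied.

```python
from dataclasses import dataclass


@dataclass
class DialogueEvent:
    """Hypothetical container for one subtitle line (times in seconds)."""
    start: float
    end: float
    text: str


def segment_scenes(events, max_gap=3.0):
    """Group events into scenes; a silence longer than max_gap starts a new scene."""
    scenes = []
    for event in sorted(events, key=lambda e: e.start):
        # Assumption: the gap runs from the previous line's end to this line's start.
        if scenes and event.start - scenes[-1][-1].end <= max_gap:
            scenes[-1].append(event)
        else:
            scenes.append([event])
    return scenes


# Made-up dialogue, not taken from Episode 014.
events = [
    DialogueEvent(0.0, 2.0, "Naruto, wait!"),
    DialogueEvent(2.5, 4.0, "Not now, Sakura."),
    DialogueEvent(9.0, 11.0, "Lady Tsunade, a report has arrived."),
]
scenes = segment_scenes(events)
print([len(scene) for scene in scenes])  # [2, 1]
```

The third line starts 5 seconds after the second one ends, so it opens a new scene.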

Expected output:

Processing Episode 014...
✓ 160 dialogue events parsed
✓ 55 scenes segmented
✓ 8 characters detected
✓ 5 co-appearance edges created

Output files:
  data/processed/edges.csv
  data/processed/scene_character.csv

Pipeline Validated

If you see the above output, the subtitle processing pipeline is working correctly!

Explore the Data¶

View Character Networks (CSV)¶

The pipeline outputs two key files:

data/processed/edges.csv — Character co-appearance edges

character_a	character_b	weight	arc	episodes
Naruto Uzumaki	Tsunade	2	Pain's Assault	[14]
Naruto Uzumaki	Jiraiya	2	Pain's Assault	[14]

data/processed/scene_character.csv — Scene-level character presence

scene_id	character	episode
014_001	Naruto Uzumaki	14
014_001	Tsunade	14
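
To see how the two files relate, here is a rough sketch that derives edge weights from scene-level rows. It assumes an edge's weight is the number of scenes two characters share, which the column names suggest but this guide does not state. `build_edges` is a hypothetical helper, and the scene IDs other than 014_001 are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def build_edges(rows):
    """Count, for every pair of characters, the scenes in which both appear.

    rows: iterable of (scene_id, character) tuples.
    Returns a Counter keyed by alphabetically sorted (character_a, character_b).
    """
    by_scene = {}
    for scene_id, character in rows:
        by_scene.setdefault(scene_id, set()).add(character)

    weights = Counter()
    for characters in by_scene.values():
        # Sorting makes (A, B) and (B, A) collapse into one undirected edge.
        for pair in combinations(sorted(characters), 2):
            weights[pair] += 1
    return weights


rows = [
    ("014_001", "Naruto Uzumaki"),
    ("014_001", "Tsunade"),
    ("014_002", "Naruto Uzumaki"),  # hypothetical scene
    ("014_002", "Tsunade"),
    ("014_003", "Jiraiya"),         # a scene with one character yields no edge
]
edges = build_edges(rows)
print(edges[("Naruto Uzumaki", "Tsunade")])  # 2
```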

Import into Neo4j (Optional)¶

If you want to visualize the graph database:

1. Install Neo4j Desktop from neo4j.com/download
2. Create a new database (version 5.x recommended)
3. Import characters:

// Paste contents of outputs/cypher_import_characters.cypher
// into Neo4j Browser

4. Import edges:

// Paste contents of outputs/cypher_import_edges.cypher
// into Neo4j Browser

5. Verify import:

MATCH (n:Character)
RETURN count(n) AS total_characters;
// Expected: 87 characters

See Architecture for the complete Neo4j schema.

Run Tests¶

The project includes 4 test suites to verify correctness:

pytest tests/ -v

Test coverage:

✓ .ass file parsing against known-good fixture
✓ ASS tag stripping and text normalization
✓ Scene segmentation produces complete mapping
✓ Character alias word-boundary matching
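
Two of the behaviors listed above, tag stripping and word-boundary alias matching, can be sketched with the standard `re` module. This is an illustration under stated assumptions, not the package's implementation: `clean_text`, `find_characters`, and the alias table are hypothetical.

```python
import re

# ASS override blocks look like {\i1} or {\pos(10,20)}.
ASS_TAG = re.compile(r"\{[^}]*\}")


def clean_text(raw):
    """Remove override tags and turn ASS line-break/space escapes into spaces."""
    text = ASS_TAG.sub("", raw)
    text = text.replace("\\N", " ").replace("\\n", " ").replace("\\h", " ")
    return " ".join(text.split())  # collapse repeated whitespace


def find_characters(text, aliases):
    """Return canonical names whose alias occurs as a whole word in text."""
    found = set()
    for alias, canonical in aliases.items():
        # \b stops "Sai" from matching inside "said".
        if re.search(rf"\b{re.escape(alias)}\b", text, flags=re.IGNORECASE):
            found.add(canonical)
    return found


# Hypothetical alias table; the real one lives in the repository's data files.
aliases = {"Naruto": "Naruto Uzumaki", "Sai": "Sai", "Granny Tsunade": "Tsunade"}
line = clean_text(r"{\i1}Naruto said\Nhe would wait.{\i0}")
print(line)                            # Naruto said he would wait.
print(find_characters(line, aliases))  # {'Naruto Uzumaki'}
```

A plain substring search would report Sai here because of the word "said"; the word-boundary pattern does not.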

Common First Steps¶

Process Your Own Subtitle Files¶

1. Place .ass files in data/naruto-subtitle-files/
2. Edit scripts/00_ass_ingest_subset.py to specify episode numbers
3. Run the script:

python scripts/00_ass_ingest_subset.py

Explore Character Metadata¶

The repository includes hand-curated character lists for all 3 arcs:

data/chunin_exams_characters.csv (50 characters)
data/sasuke_retrieval_characters.csv (50 characters)
data/pains_assault_characters.csv (50 characters)

Open these in Excel, Google Sheets, or pandas to see character metadata:

import pandas as pd

df = pd.read_csv('data/chunin_exams_characters.csv')
print(df[['name', 'affiliation_primary', 'role_type']].head(10))

Run Analytical Queries¶

If you've imported data into Neo4j, run the validation queries:

# Open outputs/analytical_queries.cypher
# Copy queries into Neo4j Browser

Examples:

Q1: Top 10 most connected characters
Q4: Community detection using Louvain algorithm
Q7: Characters appearing in all 3 arcs
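
If Neo4j is not installed, a ranking in the spirit of Q1 can be approximated directly from edge rows. The sketch below is not the Cypher query from outputs/analytical_queries.cypher; it assumes "most connected" means the highest summed edge weight, and `weighted_degree` is a hypothetical helper. The two edges are the sample rows of edges.csv shown earlier.

```python
from collections import Counter


def weighted_degree(edges):
    """Sum edge weights per character for an undirected edge list."""
    degree = Counter()
    for character_a, character_b, weight in edges:
        degree[character_a] += weight
        degree[character_b] += weight
    return degree


edges = [
    ("Naruto Uzumaki", "Tsunade", 2),
    ("Naruto Uzumaki", "Jiraiya", 2),
]
top = weighted_degree(edges).most_common(10)
print(top[0])  # ('Naruto Uzumaki', 4)
```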

Troubleshooting¶

Port Already in Use¶

If a local server (e.g., for visualization) fails to start because its port is already taken, pass a different address. For example, with MkDocs:

# Try an alternate port
mkdocs serve -a 127.0.0.1:8051

Module Not Found: `naruto_net`¶

Make sure you installed the package in editable mode:

pip install -e .
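
If the error persists, it may help to confirm which interpreter is active and whether it can see the package. The following is a small diagnostic sketch using only the standard library; `is_importable` is a hypothetical helper, not part of `naruto_net`.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the active interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


# The path should point inside .venv if the virtual environment is active.
print(sys.executable)
print(is_importable("naruto_net"))
```

If the printed path is outside .venv, activate the virtual environment and rerun the install command.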

Subtitle Files Not Found¶

The .ass subtitle files are not included in the repository for copyright reasons, so you'll need to source them separately. See the FAQ for guidance.

Encoding Errors¶

If you encounter encoding issues with .ass files:

# In src/naruto_net/io/subtitles.py, line 42:
# Try 'utf-8-sig' instead of 'utf-8'
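
The reason `utf-8-sig` helps is that it strips a leading byte order mark (BOM), which some subtitle editors write. A more defensive reader could look like the sketch below. It is an assumption-laden example rather than the code in subtitles.py: `read_subtitle_text` and the `cp1252` fallback are hypothetical choices.

```python
import codecs
import tempfile
from pathlib import Path


def read_subtitle_text(path, fallback="cp1252"):
    """Decode a subtitle file, tolerating BOMs and one legacy encoding."""
    data = Path(path).read_bytes()
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # the BOM selects the byte order
    try:
        return data.decode("utf-8-sig")  # drops a UTF-8 BOM if one is present
    except UnicodeDecodeError:
        # Assumed fallback; undecodable bytes become U+FFFD instead of raising.
        return data.decode(fallback, errors="replace")


# Demonstration with a temporary file that starts with a UTF-8 BOM.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.ass"
    sample.write_bytes(codecs.BOM_UTF8 + "Dialogue: café".encode("utf-8"))
    text = read_subtitle_text(sample)

print(repr(text))  # 'Dialogue: café'
```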

Next Steps¶

Understand the Methodology: See how subtitle files become character networks step-by-step. Read: How It Works

Review the Architecture: Learn about the Neo4j schema, multi-label design, and relationship structure. Read: Architecture

Explore the API: Dive into the naruto_net package for custom analysis. Read: API Reference

Questions?¶

Check the FAQ or open an issue on GitHub.