Get Started
This guide will get you running the Naruto character network pipeline in under 10 minutes.
Prerequisites
- Python 3.10+ (tested on 3.10, 3.11, 3.12)
- Neo4j Desktop (optional, for graph database visualization)
- Git (for cloning the repository)
Installation
1. Clone the Repository
2. Create a Virtual Environment
3. Install Dependencies
4. Install the naruto_net Package (Editable Mode)
This makes the subtitle processing modules available for import:
Quick Test: Process a Sample Episode
The repository includes a validated test case (Episode 014 of Naruto Shippuden) to verify the pipeline works.
Run the End-to-End Pipeline
What this does:
- Parses `.ass` subtitle files
- Segments dialogue into scenes (a gap of more than 3 seconds between dialogue lines starts a new scene)
- Detects character mentions using alias matching
- Builds co-appearance edges
Expected output:
```text
Processing Episode 014...
✓ 160 dialogue events parsed
✓ 55 scenes segmented
✓ 8 characters detected
✓ 5 co-appearance edges created
Output files:
data/processed/edges.csv
data/processed/scene_character.csv
```
Pipeline Validated
If you see the above output, the subtitle processing pipeline is working correctly!
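The scene segmentation and co-appearance steps listed under "What this does" can be sketched in plain Python. This is an illustrative sketch, not the `naruto_net` implementation. It makes several assumptions:

- The function names are hypothetical.
- Dialogue events are `(start_seconds, end_seconds, text)` tuples sorted by start time.
- A gap is measured from the end of one line to the start of the next.
- An edge's weight is the number of scenes a pair of characters shares.

```python
from collections import Counter
from itertools import combinations

GAP_SECONDS = 3.0  # a pause longer than this starts a new scene


def segment_scenes(events, gap=GAP_SECONDS):
    """Split (start, end, text) events into scenes wherever the pause exceeds `gap`.

    Assumes events are sorted by start time; the pause is measured from the
    latest end time seen so far to the next event's start.
    """
    scenes, current, last_end = [], [], None
    for start, end, text in events:
        if current and start - last_end > gap:
            scenes.append(current)
            current = []
        current.append((start, end, text))
        last_end = end if last_end is None else max(last_end, end)
    if current:
        scenes.append(current)
    return scenes


def co_appearance_edges(scene_characters):
    """Count, for each unordered character pair, how many scenes they share."""
    weights = Counter()
    for characters in scene_characters:
        for a, b in combinations(sorted(set(characters)), 2):
            weights[(a, b)] += 1
    return weights
```

For example, events at 0 to 2 s, 2.5 to 4 s, and 10 to 11 s give two scenes, because only the 6-second pause exceeds the threshold. Measuring from the previous line's end rather than its start keeps long lines from opening new scenes on their own. If the pipeline measures start-to-start instead, scene counts can differ.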
Explore the Data
View Character Networks (CSV)
The pipeline outputs two key files:
data/processed/edges.csv — Character co-appearance edges
| character_a | character_b | weight | arc | episodes |
|---|---|---|---|---|
| Naruto Uzumaki | Tsunade | 2 | Pain's Assault | [14] |
| Naruto Uzumaki | Jiraiya | 2 | Pain's Assault | [14] |
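When loaded with pandas, the `episodes` column shown as `[14]` is read as a string by default. The sketch below assumes the column holds a list literal and converts it with `ast.literal_eval`. So that it runs standalone, it reads two rows copied from the table above through `io.StringIO`. With real output you would pass `'data/processed/edges.csv'` instead.

```python
import ast
import io

import pandas as pd

# Two rows copied from the table above; use the real file path in practice
SAMPLE = """character_a,character_b,weight,arc,episodes
Naruto Uzumaki,Tsunade,2,Pain's Assault,[14]
Naruto Uzumaki,Jiraiya,2,Pain's Assault,[14]
"""

# Assumption: `episodes` is stored as a list literal such as "[14]"
edges = pd.read_csv(io.StringIO(SAMPLE), converters={"episodes": ast.literal_eval})

# Strongest ties first
print(edges.sort_values("weight", ascending=False).head())
```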
data/processed/scene_character.csv — Scene-level character presence
| scene_id | character | episode |
|---|---|---|
| 014_001 | Naruto Uzumaki | 14 |
| 014_001 | Tsunade | 14 |
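The two files are related if an edge's weight is the number of scenes in which both characters appear. That definition matches the co-appearance description above but has not been checked against the pipeline code. Under that assumption, the pairs can be recomputed from `scene_character.csv` with a pandas self-merge, which makes a useful cross-check after processing your own episodes. The sample rows below are illustrative, not real pipeline output.

```python
import io

import pandas as pd

# Illustrative rows in the scene_character.csv layout; use the real path in practice
SAMPLE = """scene_id,character,episode
014_001,Naruto Uzumaki,14
014_001,Tsunade,14
014_002,Naruto Uzumaki,14
014_002,Tsunade,14
014_002,Jiraiya,14
"""

# Keep scene IDs such as "014_001" as strings (preserves leading zeros)
presence = pd.read_csv(io.StringIO(SAMPLE), dtype={"scene_id": str})
presence = presence.drop_duplicates(["scene_id", "character"])

# Pair every character with every other character in the same scene
pairs = presence.merge(presence, on="scene_id", suffixes=("_a", "_b"))
pairs = pairs[pairs["character_a"] < pairs["character_b"]]  # each unordered pair once

recomputed = (
    pairs.groupby(["character_a", "character_b"])
    .size()
    .reset_index(name="weight")
)
print(recomputed)
```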
Import into Neo4j (Optional)
If you want to visualize the graph database:
- Install Neo4j Desktop from neo4j.com/download
- Create a new database (version 5.x recommended)
- Import characters:
- Import edges:
- Verify import:
See Architecture for the complete Neo4j schema.
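If you prefer to script the import, the official `neo4j` Python driver (5.x) can send `edges.csv` to the database in a single `UNWIND` batch. This is a sketch under assumptions: the `Character` label, the `CO_APPEARS_WITH` relationship type, and the property names are placeholders. Align them with the schema on the Architecture page before running it.

```python
import csv

# Hypothetical Cypher: `Character` and `CO_APPEARS_WITH` are placeholder names;
# replace them with the labels and relationship type from the Architecture page.
EDGE_QUERY = """
UNWIND $rows AS row
MERGE (a:Character {name: row.character_a})
MERGE (b:Character {name: row.character_b})
MERGE (a)-[r:CO_APPEARS_WITH {arc: row.arc}]-(b)
SET r.weight = row.weight
"""


def edge_rows(path):
    """Read edges.csv into parameter dicts for one UNWIND batch."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            {
                "character_a": rec["character_a"],
                "character_b": rec["character_b"],
                "arc": rec["arc"],
                "weight": int(rec["weight"]),
            }
            for rec in csv.DictReader(f)
        ]


def load_edges(uri, user, password, rows):
    """Send all rows to Neo4j using the official `neo4j` driver (pip install neo4j)."""
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        driver.verify_connectivity()
        driver.execute_query(EDGE_QUERY, rows=rows)
```

For a default local Neo4j Desktop database, the call would look like `load_edges("bolt://localhost:7687", "neo4j", "<your password>", edge_rows("data/processed/edges.csv"))`. Because the query uses `MERGE`, re-running it updates weights rather than duplicating nodes or relationships.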
Run Tests
The project includes 4 test suites to verify correctness:
Test coverage:
- ✓ `.ass` file parsing against a known-good fixture
- ✓ ASS tag stripping and text normalization
- ✓ Scene segmentation produces complete mapping
- ✓ Character alias word-boundary matching
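The behaviours these suites check can be illustrated with a small, self-contained sketch. It is not the `naruto_net` API. The function names are hypothetical, and the sketch assumes the standard ASS `[Events]` field order (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text). It also treats alias matching as a case-sensitive, whole-word regular-expression search.

```python
import re

TAG_RE = re.compile(r"\{[^}]*\}")  # override blocks such as {\i1} or {\pos(10,20)}


def ass_time_to_seconds(stamp):
    """Convert an ASS timestamp 'H:MM:SS.cc' to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def clean_text(text):
    """Strip override tags and normalize ASS line-break escapes and whitespace."""
    text = TAG_RE.sub("", text)
    text = text.replace("\\N", " ").replace("\\n", " ").replace("\\h", " ")
    return " ".join(text.split())


def parse_dialogue(line):
    """Parse one 'Dialogue:' line into (start, end, text), or return None."""
    if not line.startswith("Dialogue:"):
        return None
    # Split into at most 10 fields so commas inside the Text field survive
    fields = line[len("Dialogue:"):].strip().split(",", 9)
    start, end, text = fields[1], fields[2], fields[9]
    return ass_time_to_seconds(start), ass_time_to_seconds(end), clean_text(text)


def mentions(text, aliases):
    """Return canonical names whose aliases occur as whole words in `text`.

    `aliases` maps a canonical name to a list of alias strings (hypothetical format).
    """
    found = set()
    for name, names in aliases.items():
        for alias in names:
            if re.search(r"\b" + re.escape(alias) + r"\b", text):
                found.add(name)
                break
    return found
```

The word-boundary check is what keeps a short alias such as `Sai` from matching inside `said`.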
Common First Steps
Process Your Own Subtitle Files
- Place `.ass` files in `data/naruto-subtitle-files/`
- Edit `scripts/00_ass_ingest_subset.py` to specify episode numbers
- Run the script:
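Before running the script, you can confirm that every episode you plan to process has a subtitle file in place. The helper below is hypothetical. It assumes each filename contains the episode number as a standalone number, with leading zeros allowed (e.g. `Naruto Shippuden - 014.ass`). Adapt the regular expression if your files are named differently.

```python
import re
from pathlib import Path

SUBTITLE_DIR = Path("data/naruto-subtitle-files")


def find_episode_files(episodes, directory=SUBTITLE_DIR):
    """Map each episode number to the .ass files whose name contains it.

    Hypothetical convention: the number appears as a standalone number
    (leading zeros allowed), so episode 14 matches '014' but not '140'.
    """
    files = sorted(Path(directory).glob("*.ass"))
    found = {}
    for episode in episodes:
        pattern = re.compile(rf"(?<!\d)0*{episode}(?!\d)")
        found[episode] = [f for f in files if pattern.search(f.stem)]
    return found


def report_missing(episodes, directory=SUBTITLE_DIR):
    """Return the episodes that have no matching subtitle file."""
    return [ep for ep, paths in find_episode_files(episodes, directory).items() if not paths]
```

For example, `report_missing([14, 15])` returns the episode numbers that have no matching file in `data/naruto-subtitle-files/`.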
Explore Character Metadata
The repository includes hand-curated character lists for all 3 arcs:
- `data/chunin_exams_characters.csv` (50 characters)
- `data/sasuke_retrieval_characters.csv` (50 characters)
- `data/pains_assault_characters.csv` (50 characters)
Open these in Excel, Google Sheets, or pandas to see character metadata:
```python
import pandas as pd

# Load the hand-curated Chunin Exams character list and preview key metadata columns
df = pd.read_csv('data/chunin_exams_characters.csv')
print(df[['name', 'affiliation_primary', 'role_type']].head(10))
```
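To compare the arcs side by side, the three lists can be stacked into a single DataFrame tagged with an `arc` column. This sketch assumes all three files share the columns used above. The arc labels are simply the dictionary keys chosen here.

```python
import pandas as pd

ARC_FILES = {
    "Chunin Exams": "data/chunin_exams_characters.csv",
    "Sasuke Retrieval": "data/sasuke_retrieval_characters.csv",
    "Pain's Assault": "data/pains_assault_characters.csv",
}


def load_all_arcs(arc_files=ARC_FILES):
    """Stack the per-arc character lists, tagging each row with its arc label."""
    frames = [pd.read_csv(path).assign(arc=arc) for arc, path in arc_files.items()]
    return pd.concat(frames, ignore_index=True)
```

With the CSVs in place, run `combined = load_all_arcs()` and then `pd.crosstab(combined["role_type"], combined["arc"])` to tabulate role types per arc.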
Run Analytical Queries
If you've imported data into Neo4j, run the validation queries:
Examples:
- Q1: Top 10 most connected characters
- Q4: Community detection using Louvain algorithm
- Q7: Characters appearing in all 3 arcs
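Without Neo4j, rough analogues of Q1 and Q7 can be computed from the CSV outputs with pandas. These are not the repository's validation queries. "Most connected" is approximated as weighted degree (the sum of edge weights touching a character) over an `edges.csv`-style DataFrame. "Appearing in all 3 arcs" is approximated as the intersection of the `name` column across the three character files.

```python
import pandas as pd


def top_connected(edges, n=10):
    """Rank characters by weighted degree (sum of weights of their edges)."""
    degree = pd.concat(
        [
            edges[["character_a", "weight"]].rename(columns={"character_a": "character"}),
            edges[["character_b", "weight"]].rename(columns={"character_b": "character"}),
        ]
    )
    return degree.groupby("character")["weight"].sum().nlargest(n)


def in_every_arc(frames):
    """Return the names that appear in every one of the given character DataFrames."""
    name_sets = [set(df["name"]) for df in frames]
    return sorted(set.intersection(*name_sets)) if name_sets else []
```

Q4 has no one-line pandas equivalent. To experiment outside Neo4j, recent NetworkX versions provide `networkx.community.louvain_communities`, which accepts a weighted graph built from the same edge list.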
Troubleshooting
Port Already in Use
If running a local server (e.g., for visualization):
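To check whether a port is already taken before starting a server, Python's standard `socket` module is enough. The helpers below are illustrative. The port to check depends on the server you run; Neo4j Browser, for instance, defaults to 7474.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def first_free_port(start, host="127.0.0.1", attempts=20):
    """Return the first port at or above `start` with nothing listening on it."""
    for port in range(start, start + attempts):
        if not port_in_use(port, host):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")
```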
Module Not Found: naruto_net
Make sure you installed the package in editable mode:
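If the error persists, a quick diagnostic can show which interpreter is running and whether it can see the package. This sketch uses only the standard library (`importlib.util.find_spec` and `sys.executable`). A "not importable" result for `naruto_net` usually means one of two things: the virtual environment is not active, or the editable install went into a different environment.

```python
import importlib.util
import sys


def diagnose(name="naruto_net"):
    """Return a short report on whether `name` is importable from this interpreter."""
    lines = [f"Interpreter: {sys.executable}"]  # should live inside your virtual environment
    spec = importlib.util.find_spec(name)
    if spec is None:
        lines.append(f"{name}: not importable; activate the venv and reinstall in editable mode")
    else:
        lines.append(f"{name}: found at {spec.origin}")
    return "\n".join(lines)


print(diagnose())
```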
Subtitle Files Not Found
For copyright reasons, the `.ass` subtitle files are not included in the repository, so you'll need to source them separately. See the FAQ for guidance.
Encoding Errors
If you encounter encoding issues with .ass files:
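A common cause is a subtitle file saved in a legacy encoding or as UTF-16. The sketch below shows one way to decode such files; it is not the pipeline's own loader. It assumes this fallback order:

1. UTF-16, only when a UTF-16 byte-order mark is present
2. UTF-8, with or without a BOM
3. cp1252
4. latin-1

```python
import codecs
from pathlib import Path


def decode_ass_bytes(raw):
    """Decode subtitle bytes using an assumed fallback order of common encodings."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")  # the BOM tells the codec the byte order
    for encoding in ("utf-8-sig", "cp1252"):  # utf-8-sig also accepts BOM-less UTF-8
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")  # maps every byte, so it never raises (may mis-render)


def read_ass_text(path):
    """Read an .ass file from disk and decode it with decode_ass_bytes."""
    return decode_ass_bytes(Path(path).read_bytes())
```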
Next Steps
- Understand the Methodology: see how subtitle files become character networks step by step.
- Review the Architecture: learn about the Neo4j schema, multi-label design, and relationship structure.
- Explore the API: dive into the `naruto_net` package for custom analysis.