Naruto Character Network Analysis
Understanding narrative structure through the lens of network science
This project analyzes character relationships across key Naruto story arcs using subtitle-based co-appearance networks. Inspired by the viral "Network of Thrones" analysis, this work adapts proven network science methods to anime storytelling—answering questions about character balance, narrative centrality, and ensemble dynamics.
What This Project Does
This analysis examines three S-tier story arcs from Naruto:
- Chunin Exams (~30 episodes)
- Sasuke Retrieval Mission (~22 episodes)
- Pain's Assault (~18 episodes)
By tracking when characters appear together in scenes (detected via subtitle timing and dialogue), we build weighted networks that reveal:
✓ Which characters are most central to each arc
✓ How character importance shifts over time
✓ Whether the story maintains ensemble balance or over-centralizes on the protagonist
✓ How communities form (do detected groups match villages/teams?)
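As a rough illustration of the co-appearance idea, the sketch below counts how many scenes each pair of characters shares and uses that count as the edge weight. The `co_appearance_weights` function and the sample scenes are hypothetical and are not taken from the project's `naruto_net` package.

```python
from collections import Counter
from itertools import combinations


def co_appearance_weights(scenes):
    """Count, for each unordered character pair, how many scenes they share."""
    weights = Counter()
    for cast in scenes:
        # set() ignores repeated mentions within a scene;
        # sorted() gives every pair one canonical ordering.
        for a, b in combinations(sorted(set(cast)), 2):
            weights[(a, b)] += 1
    return weights


# Made-up scenes for illustration: each lists the characters detected in it.
scenes = [
    ["Naruto", "Sasuke", "Sakura"],
    ["Naruto", "Sasuke"],
    ["Naruto", "Kakashi"],
]
weights = co_appearance_weights(scenes)
for (a, b), w in sorted(weights.items()):
    print(f"{a} - {b}: {w}")
```

The resulting `(character, character) -> count` mapping is one plausible input for a weighted graph library such as NetworkX.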
Why Network Analysis?
Network science provides mathematical tools to quantify intuitions fans already have. When we say "Sakura becomes irrelevant after the Chunin Exams" or "Naruto takes over the story in Shippuden," network metrics such as degree centrality and betweenness centrality, together with information-theoretic measures such as Shannon entropy, let us test those claims with data.
This approach has been successfully applied to:
- Game of Thrones novels (revealing Tyrion as the most connected character)
- Harry Potter books (tracking friendship evolution)
- Les Misérables (the classic example of scene co-appearance networks)
To our knowledge, it has not yet been applied systematically to anime.
Key Questions We Answer
1. Character Balance Entropy
"At what point did side characters become irrelevant?"
Using Shannon entropy to measure how evenly "screen time" (scene appearances) is distributed across the cast, we can identify when the narrative shifts from ensemble to protagonist-dominated.
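A minimal sketch of this measure, assuming scene appearances have already been counted per character. Shannon entropy is H = -Σ p_i log2(p_i), where p_i is a character's share of all appearances. Dividing by log2(n) for n characters gives a 0 to 1 balance score. The normalization step and the sample counts are assumptions for illustration, not the project's published method or data.

```python
import math


def balance_entropy(appearances):
    """Return (entropy in bits, entropy normalized to [0, 1]) for a
    mapping of character name -> number of scene appearances."""
    counts = [c for c in appearances.values() if c > 0]
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts)
    # Maximum entropy is log2(n), reached when all characters appear equally often.
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return entropy, entropy / max_entropy


# Made-up counts: an evenly shared cast versus a protagonist-dominated one.
ensemble = {"Naruto": 10, "Sasuke": 10, "Sakura": 10, "Kakashi": 10}
dominated = {"Naruto": 37, "Sasuke": 1, "Sakura": 1, "Kakashi": 1}

print(balance_entropy(ensemble))
print(balance_entropy(dominated))
```

A normalized score near 1 indicates an ensemble cast, while a score falling toward 0 indicates that appearances are concentrating on a few characters.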
2. Community Detection
"Do arcs form natural communities matching geography and allegiances?"
Graph algorithms can detect character clusters without being told who belongs where. We test whether detected communities align with villages (Konoha, Sand, Sound) and teams (Team 7, Sound Four, Akatsuki).
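The sketch below shows one way such a test could look, using NetworkX's `greedy_modularity_communities` on a toy weighted graph. The choice of algorithm is an assumption (the project does not state which one it uses), and the edge weights are invented for illustration.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy co-appearance graph with invented weights: two tightly knit
# groups joined by a single weak edge.
G = nx.Graph()
G.add_weighted_edges_from([
    ("Naruto", "Sasuke", 5), ("Sasuke", "Sakura", 5), ("Naruto", "Sakura", 5),
    ("Gaara", "Temari", 5), ("Temari", "Kankuro", 5), ("Gaara", "Kankuro", 5),
    ("Naruto", "Gaara", 1),
])

# The algorithm sees only the graph, not team or village labels.
communities = greedy_modularity_communities(G, weight="weight")

# Compare detected groups against known affiliations.
known_teams = {
    "Team 7": {"Naruto", "Sasuke", "Sakura"},
    "Sand Siblings": {"Gaara", "Temari", "Kankuro"},
}
for team, members in known_teams.items():
    matched = any(set(c) == members for c in communities)
    print(f"{team}: {'recovered' if matched else 'not recovered'}")
```

On real data the match will rarely be exact, so an overlap score between detected and known groups would likely be more informative than strict equality.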
3. Naruto's Centralization
"When did Naruto become too central to the story?"
Tracking Naruto's degree centrality across arcs reveals whether his importance grows organically or if the narrative over-focuses on him at the expense of other characters.
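As a sketch of the comparison, degree centrality can be computed per arc as the fraction of other characters a given character shares at least one scene with, which is the same normalization NetworkX's `degree_centrality` uses. The edge lists below are toy examples, not project results.

```python
def degree_centrality(edges, node):
    """Fraction of the other characters that `node` is directly linked to."""
    nodes = {name for pair in edges for name in pair}
    if node not in nodes or len(nodes) < 2:
        return 0.0
    neighbors = {b if a == node else a for a, b in edges if node in (a, b)}
    return len(neighbors) / (len(nodes) - 1)


# Invented edge lists standing in for two arcs.
toy_arcs = {
    "toy_arc_1": [("Naruto", "Sasuke"), ("Naruto", "Sakura"),
                  ("Sasuke", "Sakura"), ("Gaara", "Temari")],
    "toy_arc_2": [("Naruto", "Pain"), ("Naruto", "Konan"),
                  ("Naruto", "Tsunade"), ("Naruto", "Kakashi")],
}
trend = {arc: degree_centrality(edges, "Naruto") for arc, edges in toy_arcs.items()}
print(trend)
```

A rising value across arcs would be consistent with growing protagonist focus, although cast size differs between arcs and should be considered when comparing values.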
What Makes This Approach Different
Subtitle-based precision: Unlike wiki scraping or manual annotation, we extract co-appearances directly from subtitle timing and dialogue text. This provides precisely timestamped dialogue lines from which scene boundaries and character mentions are derived.
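To make the idea concrete, here is a minimal sketch of reading `Dialogue:` lines from an .ass file and grouping them into scenes. It assumes the common [Events] field order (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text). The gap-based `split_scenes` heuristic and its 5-second threshold are assumptions for illustration, not the project's actual scene detection.

```python
import re
from typing import NamedTuple


class Event(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    name: str     # speaker field; may be empty in many subtitle files
    text: str


def to_seconds(stamp):
    """Convert an ASS timestamp such as 0:01:23.45 to seconds."""
    h, m, s = stamp.strip().split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


def parse_dialogue(line):
    """Parse one 'Dialogue:' line, or return None for any other line."""
    if not line.startswith("Dialogue:"):
        return None
    # Text is the 10th field and may itself contain commas, hence maxsplit=9.
    fields = line[len("Dialogue:"):].split(",", 9)
    if len(fields) < 10:
        return None
    # Drop override tags like {\i1} and turn \N line breaks into spaces.
    text = re.sub(r"\{[^}]*\}", "", fields[9]).replace(r"\N", " ").strip()
    return Event(to_seconds(fields[1]), to_seconds(fields[2]), fields[4].strip(), text)


def split_scenes(events, max_gap=5.0):
    """Start a new scene whenever the silence between lines exceeds max_gap seconds."""
    scenes, current, last_end = [], [], None
    for ev in sorted(events, key=lambda e: e.start):
        if last_end is not None and ev.start - last_end > max_gap:
            scenes.append(current)
            current = []
        current.append(ev)
        last_end = ev.end if last_end is None else max(last_end, ev.end)
    if current:
        scenes.append(current)
    return scenes


sample = r"Dialogue: 0,0:01:23.45,0:01:25.67,Default,Naruto,0,0,0,,{\i1}Hey,\NSasuke!"
print(parse_dialogue(sample))
```

Character mentions could then be found by matching known character names against each event's `name` and `text` fields within a scene.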
Arc-specific comparison: Rather than analyzing the entire series as one monolithic network, we treat each story arc as a discrete unit—allowing us to compare narrative structure across different phases of the story.
Open methodology: All data processing steps, from subtitle parsing to edge construction, are documented and reproducible. The pipeline is designed to work for any anime with subtitle files.
Dataset release: Full character networks (nodes, edges, metadata) are published for others to explore, validate, or extend.
Get Started
- Quick Start: Install dependencies and run the subtitle processing pipeline in 5 minutes.
- How It Works: See the step-by-step flow from subtitle files to character networks.
- API Reference: Explore the naruto_net Python package for subtitle parsing and network construction.
- View on GitHub: Access the full codebase, data files, and Neo4j import scripts.
Current Status
Phase: Subtitle pipeline complete, Neo4j import in progress
Data Collected:
- 87 unique characters across 3 arcs
- 36 hand-coded canonical relationships (validation set)
- 426 Shippuden subtitle files (.ass format)
- End-to-end pipeline validated on sample episodes
Next Steps:
- Complete Neo4j graph database import
- Run full subtitle extraction across all 3 target arcs
- Calculate network centrality metrics
- Begin interactive visualization design
About
This project is built by Barbara Hidalgo-Sotelo, a data scientist and AI consultant with expertise in network analysis and visualization. It serves as both a technical portfolio piece and a contribution to the anime fan community—demonstrating how network science can illuminate storytelling structure.
Inspired by: Beveridge & Shan's "Network of Thrones" (2016)
Tech Stack: Python, Neo4j, NetworkX, D3.js
License: MIT (code), CC BY 4.0 (data)