Naruto Character Network Analysis

Understanding narrative structure through the lens of network science

This project analyzes character relationships across key Naruto story arcs using subtitle-based co-appearance networks. Inspired by the viral "Network of Thrones" analysis, it adapts established network science methods to anime storytelling in order to answer questions about character balance, narrative centrality, and ensemble dynamics.


What This Project Does

This analysis examines three S-tier story arcs from Naruto:

  • Chunin Exams (~30 episodes)
  • Sasuke Retrieval Mission (~22 episodes)
  • Pain's Assault (~18 episodes)

By tracking when characters appear together in scenes (detected via subtitle timing and dialogue), we build weighted networks that reveal:

✓ Which characters are most central to each arc
✓ How character importance shifts over time
✓ Whether the story maintains ensemble balance or over-centralizes on the protagonist
✓ How communities form (do detected groups match villages/teams?)
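As a rough illustration of how scene co-appearances become a weighted network, the sketch below counts how many scenes each pair of characters shares. The function name `build_coappearance_edges` and the sample scenes are hypothetical, invented for this example; they are not taken from the project's pipeline.

```python
from collections import Counter
from itertools import combinations


def build_coappearance_edges(scenes):
    """Count how often each unordered pair of characters shares a scene."""
    weights = Counter()
    for cast in scenes:
        # Deduplicate and sort so (A, B) and (B, A) map to the same key.
        for a, b in combinations(sorted(set(cast)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical scenes, invented for illustration only.
scenes = [
    ["Naruto", "Sasuke", "Sakura"],
    ["Naruto", "Sasuke"],
    ["Sakura", "Ino"],
]
edges = build_coappearance_edges(scenes)
```

Each key is a character pair and each value is the edge weight (number of shared scenes). The resulting triples can be loaded into a NetworkX graph with `Graph.add_weighted_edges_from`.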


Why Network Analysis?

Network science provides mathematical tools to quantify intuitions fans already have. When we say "Sakura becomes irrelevant after the Chunin Exams" or "Naruto takes over the story in Shippuden," network metrics like degree centrality, betweenness, and Shannon entropy let us test those claims with data.

This approach has been successfully applied to:

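  • A Song of Ice and Fire, the novels behind Game of Thrones (the "Network of Thrones" study of A Storm of Swords found Tyrion to be the most central character by several measures)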
  • Harry Potter books (tracking friendship evolution)
  • Les Misérables (the classic example of scene co-appearance networks)

To our knowledge, it has not yet been applied systematically to anime. This project aims to change that.


Key Questions We Answer

1. Character Balance Entropy

"At what point did side characters become irrelevant?"

Using Shannon entropy to measure how evenly "screen time" (scene appearances) is distributed across the cast, we can identify when the narrative shifts from ensemble to protagonist-dominated.
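A minimal sketch of the entropy calculation, assuming scene appearances have already been counted per character. The function names and the two sample casts are hypothetical; normalizing by the maximum possible entropy is one common way to compare arcs with different cast sizes, not necessarily the choice made in this project.

```python
import math


def balance_entropy(appearances):
    """Shannon entropy (in bits) of each character's share of scene appearances."""
    total = sum(appearances.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in appearances.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def normalized_entropy(appearances):
    """Entropy divided by its maximum, log2(cast size); 1.0 means perfectly even."""
    n = sum(1 for count in appearances.values() if count > 0)
    if n < 2:
        return 0.0
    return balance_entropy(appearances) / math.log2(n)


# Hypothetical appearance counts, invented for illustration only.
even_cast = {"A": 10, "B": 10, "C": 10, "D": 10}
skewed_cast = {"A": 37, "B": 1, "C": 1, "D": 1}

even_score = balance_entropy(even_cast)
skewed_score = balance_entropy(skewed_cast)
```

The evenly distributed cast reaches the maximum of 2 bits for four characters, while the protagonist-dominated cast scores much lower. A drop in this value from one arc to the next is the kind of shift the question above is asking about.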

2. Community Detection

"Do arcs form natural communities matching geography and allegiances?"

Graph algorithms can detect character clusters without being told who belongs where. We test whether detected communities align with villages (Konoha, Sand, Sound) and teams (Team 7, Sound Four, Akatsuki).
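The sketch below shows one way such a test could look, using NetworkX's greedy modularity algorithm. The choice of algorithm, the edge weights, and the `village` lookup table are all assumptions made for illustration; the project may use a different community detection method.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical weighted co-appearance edges, invented for illustration only.
G = nx.Graph()
G.add_weighted_edges_from([
    ("Naruto", "Sasuke", 5), ("Naruto", "Sakura", 4), ("Sasuke", "Sakura", 4),
    ("Gaara", "Temari", 5), ("Gaara", "Kankuro", 4), ("Temari", "Kankuro", 4),
    ("Naruto", "Gaara", 1),  # a single weak tie between the two groups
])

# Hypothetical ground truth used only to label the detected groups.
village = {
    "Naruto": "Konoha", "Sasuke": "Konoha", "Sakura": "Konoha",
    "Gaara": "Sand", "Temari": "Sand", "Kankuro": "Sand",
}

# The algorithm sees only the graph, never the village labels.
communities = greedy_modularity_communities(G, weight="weight")
detected = {frozenset(group) for group in communities}

# For each detected community, list which villages its members come from.
villages_per_community = [
    {village[name] for name in group} for group in communities
]
```

If every entry in `villages_per_community` contains a single village, the detected communities line up with the known affiliations for this toy graph. Real arcs will be messier, so an agreement score over all characters would be a natural next step.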

3. Naruto's Centralization

"When did Naruto become too central to the story?"

Tracking Naruto's degree centrality across arcs reveals whether his importance grows organically or if the narrative over-focuses on him at the expense of other characters.
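As a sketch of this comparison, the example below computes one character's degree centrality in each arc with NetworkX. The arc names, edge lists, and the helper `centrality_by_arc` are hypothetical and exist only to show the shape of the calculation.

```python
import networkx as nx

# Hypothetical per-arc edge lists, invented for illustration only.
arc_edges = {
    "Arc A": [("Naruto", "Sasuke"), ("Naruto", "Sakura"),
              ("Sasuke", "Sakura"), ("Sakura", "Ino")],
    "Arc B": [("Naruto", "Sasuke"), ("Naruto", "Sakura"),
              ("Naruto", "Ino"), ("Naruto", "Gaara")],
}


def centrality_by_arc(arc_edges, character):
    """Return the character's degree centrality in each arc's network."""
    result = {}
    for arc, edges in arc_edges.items():
        G = nx.Graph(edges)
        # degree_centrality divides each degree by (n - 1), so arcs with
        # different cast sizes remain comparable.
        result[arc] = nx.degree_centrality(G).get(character, 0.0)
    return result


trend = centrality_by_arc(arc_edges, "Naruto")
```

In this toy data Naruto is linked to two of three other characters in the first arc and to all four others in the second, so his centrality rises between arcs. Whether such a rise counts as "too central" is a judgment the metric alone cannot make; pairing it with the entropy measure gives a fuller picture.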


What Makes This Approach Different

Subtitle-based precision: Unlike wiki scraping or manual annotation, we extract co-appearances directly from subtitle timing and dialogue text. Subtitle timestamps give precisely timed dialogue, from which we infer scene boundaries and character mentions.
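To make the idea concrete, here is a minimal sketch of reading `.ass` dialogue events and grouping them into scenes. It assumes the standard ten-field event order (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text), treats a silence longer than `max_gap` seconds as a scene break, and detects characters by name mentions in the dialogue. The 5-second threshold, the helper names, and the sample lines are hypothetical and not taken from the project's pipeline.

```python
import re


def parse_time(stamp):
    """Convert an ASS timestamp (H:MM:SS.cc) to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_dialogue(ass_text):
    """Yield (start, end, text) per Dialogue event, assuming standard field order."""
    for line in ass_text.splitlines():
        if not line.startswith("Dialogue:"):
            continue
        # The text field may contain commas, so split at most 9 times.
        fields = line[len("Dialogue:"):].strip().split(",", 9)
        if len(fields) < 10:
            continue
        # Strip override tags such as {\i1} and ASS line breaks (\N).
        text = re.sub(r"\{[^}]*\}", "", fields[9]).replace("\\N", " ")
        yield parse_time(fields[1]), parse_time(fields[2]), text


def split_scenes(events, max_gap=5.0):
    """Start a new scene whenever the silence between lines exceeds max_gap."""
    scenes, current, last_end = [], [], None
    for start, end, text in events:
        if current and start - last_end > max_gap:
            scenes.append(current)
            current = []
        current.append(text)
        last_end = end
    if current:
        scenes.append(current)
    return scenes


def characters_in_scene(lines, names):
    """Return the known character names mentioned anywhere in the scene's lines."""
    joined = " ".join(lines)
    return {n for n in names if re.search(rf"\b{re.escape(n)}\b", joined)}


# Hypothetical subtitle excerpt, invented for illustration only.
SAMPLE = r"""[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:01.00,0:00:03.00,Default,,0,0,0,,Sasuke, wait!
Dialogue: 0,0:00:03.50,0:00:05.00,Default,,0,0,0,,{\i1}Naruto...{\i0}
Dialogue: 0,0:00:20.00,0:00:22.00,Default,,0,0,0,,Sakura, are you hurt?
"""

NAMES = ["Naruto", "Sasuke", "Sakura"]
scenes = split_scenes(parse_dialogue(SAMPLE))
casts = [characters_in_scene(scene, NAMES) for scene in scenes]
```

One caveat of this simplified approach: a name mentioned in dialogue does not guarantee the character is on screen, and a character who is present but never named will be missed. A fuller pipeline would need additional signals, such as the speaker field when subtitle authors fill it in.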

Arc-specific comparison: Rather than analyzing the entire series as one monolithic network, we treat each story arc as a discrete unit—allowing us to compare narrative structure across different phases of the story.

Open methodology: All data processing steps, from subtitle parsing to edge construction, are documented and reproducible. The pipeline is designed to work for any anime with subtitle files.

Dataset release: Full character networks (nodes, edges, metadata) are published for others to explore, validate, or extend.


Get Started

  • Quick Start: Install dependencies and run the subtitle processing pipeline in 5 minutes. (Get Started)
  • How It Works: See the step-by-step flow from subtitle files to character networks. (Methodology)
  • API Reference: Explore the naruto_net Python package for subtitle parsing and network construction. (API Docs)
  • View on GitHub: Access the full codebase, data files, and Neo4j import scripts. (GitHub Repository)


Current Status

Phase: Subtitle pipeline complete, Neo4j import in progress

Data Collected:

  • 87 unique characters across 3 arcs
  • 36 hand-coded canonical relationships (validation set)
  • 426 Shippuden subtitle files (.ass format)
  • End-to-end pipeline validated on sample episodes

Next Steps:

  • Complete Neo4j graph database import
  • Run full subtitle extraction across all 3 target arcs
  • Calculate network centrality metrics
  • Begin interactive visualization design

About

This project is built by Barbara Hidalgo-Sotelo, a data scientist and AI consultant with expertise in network analysis and visualization. It serves as both a technical portfolio piece and a contribution to the anime fan community—demonstrating how network science can illuminate storytelling structure.

Inspired by: Beveridge & Shan's "Network of Thrones" (2016)
Tech Stack: Python, Neo4j, NetworkX, D3.js
License: MIT (code), CC BY 4.0 (data)