Troubleshooting¶
Common issues and solutions for working with the Naruto character network project.
Installation Issues¶
ModuleNotFoundError: No module named 'naruto_net'¶
Cause: The package wasn't installed in editable mode.
Solution: Install the package in editable mode from the project root (pip install -e .).
Verify installation:
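One way to check from Python whether the package can be found is sketched below. It uses only the standard library; the helper name `is_installed` is illustrative and not part of the project.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # 'naruto_net' is the package name taken from the error message above
    if is_installed("naruto_net"):
        print("naruto_net is importable")
    else:
        print("naruto_net not found; run `pip install -e .` from the project root")
```

If this reports the package as missing inside an activated virtual environment, the editable install most likely went into a different interpreter.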
pip install -r requirements.txt fails with dependency conflicts¶
Cause: Conflicting package versions or Python version mismatch.
Solution 1: Use a fresh virtual environment
rm -rf .venv
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
Solution 2: Install specific versions manually
Tests fail with FileNotFoundError for fixtures¶
Cause: Running tests from wrong directory.
Solution: Always run tests from the project root, not from a subdirectory, so that relative fixture paths resolve.
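Independently of where the test runner is launched, fixture paths can be anchored to the test module itself. The sketch below assumes a hypothetical layout in which fixtures sit in a `fixtures` folder next to the test file; the project's actual layout may differ.

```python
from pathlib import Path


def fixture_path(test_file: str, name: str) -> Path:
    """Build a fixture path anchored at the test module's own directory.

    Assumes (hypothetically) that fixtures live in a "fixtures" folder
    beside the test module.
    """
    return Path(test_file).resolve().parent / "fixtures" / name
```

Inside a test module this would be called as `fixture_path(__file__, "sample.ass")`, which no longer depends on the current working directory.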
Subtitle Processing Issues¶
UnicodeDecodeError when parsing .ass files¶
Cause: Subtitle file uses non-UTF-8 encoding (common with older fansubs).
Solution 1: Specify encoding explicitly
Edit src/naruto_net/io/subtitles.py, line ~42:
Solution 2: Convert file to UTF-8
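A minimal sketch of the conversion in Python, assuming the source encoding is already known. The helper name `convert_to_utf8` and the example encoding are illustrative, not taken from the project.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Re-encode a text file from source_encoding to UTF-8."""
    # Decoding raises UnicodeDecodeError if source_encoding is wrong,
    # which is safer than silently writing garbled text.
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```

For example, `convert_to_utf8("subtitle.ass", "subtitle_utf8.ass", "cp1252")` would handle a Windows-1252 file. Writing to a new file keeps the original intact in case the guess was wrong.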
Solution 3: Auto-detect encoding
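import chardet

# Read raw bytes so chardet can guess the encoding
with open('subtitle.ass', 'rb') as f:
    result = chardet.detect(f.read())

# chardet reports None when it cannot guess; fall back to UTF-8
encoding = result['encoding'] or 'utf-8'

# Then use the detected encoding
with open('subtitle.ass', 'r', encoding=encoding) as f:
    text = f.read()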
No characters detected despite valid subtitles¶
Cause 1: Character not in alias JSON
Solution: Add missing character to data/character_aliases.json
Cause 2: Word boundary issues
Solution: Check if alias contains special characters
# Example: "Nine-Tails" needs hyphen in regex
# Already handled in detect/mentions.py via re.escape()
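To illustrate why escaping and word boundaries matter, here is a small sketch of whole-word alias matching. It is not the code in detect/mentions.py; `build_alias_pattern` is a hypothetical helper written for this example.

```python
import re


def build_alias_pattern(aliases):
    """Compile one case-insensitive regex matching any alias as a whole word."""
    # Longest aliases first so "Naruto Uzumaki" wins over "Naruto"
    ordered = sorted(aliases, key=len, reverse=True)
    # re.escape makes characters such as "-" or "." match literally
    alternation = "|".join(re.escape(alias) for alias in ordered)
    # Lookarounds act as word boundaries even when an alias starts or
    # ends with punctuation, where a plain \b would not behave as expected
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


pattern = build_alias_pattern(["Nine-Tails", "Naruto", "Naruto Uzumaki"])
matches = pattern.findall("Naruto Uzumaki faced the Nine-Tails.")
print(matches)
```

If an alias is never detected, testing it against a sample line with a pattern like this helps separate a boundary problem from a missing alias entry.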
Debug: Print all detected mentions
from naruto_net.detect.mentions import detect_characters
mentions = detect_characters(text, alias_dict)
print(f"Detected: {mentions}")
Scene segmentation produces too many/few scenes¶
Cause: Gap threshold doesn't match subtitle pacing.
Solution: Adjust threshold in src/naruto_net/segment/scenes.py
# Default: 3.0 seconds
SCENE_GAP_THRESHOLD = 3.0
# For slower-paced shows
SCENE_GAP_THRESHOLD = 5.0 # Stricter (fewer scenes)
# For fast-paced action
SCENE_GAP_THRESHOLD = 2.0 # Looser (more scenes)
Debug: Export gap distribution
import numpy as np

# Keep zero-length gaps; skip only scenes with no recorded gap
gaps = [scene.gap_after for scene in scenes if scene.gap_after is not None]
print(f"Mean gap: {np.mean(gaps):.2f}s")
print(f"Median gap: {np.median(gaps):.2f}s")
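The effect of the threshold on scene count can be seen in a toy version of gap-based segmentation. This is a sketch under assumptions, not the logic in segment/scenes.py: events are plain (start, end) tuples in seconds, already sorted by start time, and `segment_by_gap` is a hypothetical name.

```python
def segment_by_gap(events, threshold=3.0):
    """Group (start, end) times into scenes, starting a new scene whenever
    the silence between consecutive lines exceeds `threshold` seconds."""
    scenes = []
    current = []
    previous_end = None
    for start, end in events:
        if previous_end is not None and start - previous_end > threshold:
            scenes.append(current)
            current = []
        current.append((start, end))
        previous_end = end
    if current:
        scenes.append(current)
    return scenes


# Gaps between consecutive lines here are 0.5 s, 4.0 s and 2.5 s
events = [(0.0, 2.0), (2.5, 4.0), (8.0, 9.0), (11.5, 13.0)]
for threshold in (2.0, 3.0, 5.0):
    print(threshold, len(segment_by_gap(events, threshold)))
```

Raising the threshold merges scenes and lowering it splits them, so the median gap from the debug output is a reasonable starting point when tuning.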
Neo4j Issues¶
Neo4j import script fails with "Invalid syntax"¶
Cause: Pasting multi-line Cypher with extra whitespace.
Solution: Use LOAD CSV or batching instead of raw paste
Option 1: Load from CSV (recommended)
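LOAD CSV WITH HEADERS FROM 'file:///edges.csv' AS row
// MERGE reuses existing Character nodes instead of creating duplicates per row
MERGE (a:Character {name: row.character_a})
MERGE (b:Character {name: row.character_b})
CREATE (a)-[:CONNECTED {
  weight: toInteger(row.weight),
  arc: row.arc
}]->(b);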
Option 2: Use UNWIND for batching
UNWIND [
{a: 'Naruto', b: 'Sasuke', w: 5},
{a: 'Naruto', b: 'Sakura', w: 3}
] AS edge
MATCH (a:Character {name: edge.a})
MATCH (b:Character {name: edge.b})
CREATE (a)-[:CONNECTED {weight: edge.w}]->(b);
"Cannot merge node using null property value"¶
Cause: Missing required property in MERGE statement.
Solution: Ensure character_id and name are always set
// ❌ Fails if character_id is NULL
MERGE (c:Character {character_id: NULL})
// ✓ Correct
MERGE (c:Character {character_id: 1, name: "Naruto Uzumaki"})
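Rows with missing properties can be filtered out before they reach Cypher. The sketch below assumes each character is a plain dict; `split_valid_rows` is a hypothetical helper, not part of the project.

```python
def split_valid_rows(rows, required=("character_id", "name")):
    """Separate rows that have every required property from rows that do not."""
    valid, rejected = [], []
    for row in rows:
        # Treat both a missing key and an empty string as "not set"
        if all(row.get(key) not in (None, "") for key in required):
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected


rows = [
    {"character_id": 1, "name": "Naruto Uzumaki"},
    {"character_id": None, "name": "Unknown"},
    {"name": "Sakura Haruno"},
]
valid, rejected = split_valid_rows(rows)
print(f"{len(valid)} valid, {len(rejected)} rejected")
```

Logging the rejected rows usually points straight at the upstream step that dropped the ID.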
Neo4j query returns no results despite data being imported¶
Cause 1: Label mismatch (case-sensitive)
// ❌ Wrong label
MATCH (c:chuninexams) RETURN c // Returns 0
// ✓ Correct
MATCH (c:ChūninExams) RETURN c // Returns results
Cause 2: Relationship direction
// ❌ May miss bidirectional edges
MATCH (a)-[:CONNECTED]->(b) RETURN a, b
// ✓ Correct (undirected)
MATCH (a)-[:CONNECTED]-(b) RETURN a, b
Debug: Check what exists
// Count all nodes
MATCH (n) RETURN labels(n), count(n);
// Sample relationships
MATCH (a)-[r]->(b) RETURN type(r), count(r) LIMIT 10;
"Heap space" error when running large queries¶
Cause: Neo4j default memory limits too low.
Solution: Increase heap size in neo4j.conf
# In Neo4j Desktop: Settings → Database Settings
# Neo4j 4.x setting names (5.x renames them to server.memory.heap.*)
dbms.memory.heap.initial_size=1G
dbms.memory.heap.max_size=4G
Alternative: Use LIMIT and batching
Pipeline Execution Issues¶
Script runs but produces no output files¶
Cause: Output directory doesn't exist.
Solution: Create directories first
Or add to script:
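A minimal sketch of creating the output directory from inside a script, using only the standard library. The helper name `ensure_parent_dir` is illustrative.

```python
from pathlib import Path


def ensure_parent_dir(output_file) -> Path:
    """Create the parent directory of output_file if it is missing."""
    path = Path(output_file)
    # parents=True builds intermediate folders; exist_ok=True makes reruns safe
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

A script would then write through the returned path, for example `df.to_csv(ensure_parent_dir("data/intermediate/events.csv"))`.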
"Permission denied" when writing output files¶
Cause: File is open in another program (Excel, Neo4j) or wrong permissions.
Solution 1: Close all programs using the file
Solution 2: Write to different filename
Solution 3: Check file permissions
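Solution 2 can be automated: if the target file is locked or read-only, write to a timestamped sibling instead. This is a sketch with hypothetical helper names, shown for plain text output. The same try/except shape would wrap a DataFrame export.

```python
from datetime import datetime
from pathlib import Path


def timestamped_name(path, now=None) -> Path:
    """Return a sibling path with a timestamp inserted before the suffix."""
    path = Path(path)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return path.with_name(f"{path.stem}_{stamp}{path.suffix}")


def write_text_with_fallback(path, text) -> Path:
    """Write text to path; on PermissionError, write to a timestamped sibling."""
    target = Path(path)
    try:
        target.write_text(text, encoding="utf-8")
    except PermissionError:
        target = timestamped_name(target)
        target.write_text(text, encoding="utf-8")
    return target
```

The function returns the path it actually wrote to, so downstream steps can pick up the fallback file.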
Pipeline runs extremely slowly¶
Cause: Processing too many episodes or inefficient I/O.
Solution 1: Process episodes in batches
# Instead of all at once
episodes = list(range(1, 500))  # ❌ Slow
# Batch processing: 50 episodes at a time, clamped to the same upper bound
for batch_start in range(1, 500, 50):
    batch = list(range(batch_start, min(batch_start + 50, 500)))
    process_episodes(batch)
Solution 2: Use Parquet instead of CSV for intermediate files
# Faster writes
df.to_parquet('data/intermediate/events.parquet')
# Faster reads
df = pd.read_parquet('data/intermediate/events.parquet')
Solution 3: Enable progress bars
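Progress bars do not make the pipeline faster, but they show whether it is stalled or merely slow. The sketch below assumes the third-party tqdm package and falls back to a plain loop when it is not installed; `process_with_progress` and `process_one` are hypothetical names.

```python
try:
    from tqdm import tqdm  # third-party: pip install tqdm
except ImportError:
    def tqdm(iterable, **kwargs):
        """Fallback when tqdm is not installed: iterate without a bar."""
        return iterable


def process_with_progress(episodes, process_one):
    """Apply process_one to each episode while showing a progress bar."""
    results = []
    for episode in tqdm(episodes, desc="Episodes", unit="ep"):
        results.append(process_one(episode))
    return results
```

The per-episode rate that tqdm prints also gives a quick estimate of how long a full run will take.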
Testing Issues¶
Tests pass locally but fail in CI¶
Cause: Path separators differ (Windows vs Linux).
Solution: Use pathlib.Path
from pathlib import Path
# ❌ Platform-specific
file_path = 'data/intermediate\\events.csv'
# ✓ Cross-platform
file_path = Path('data') / 'intermediate' / 'events.csv'
AssertionError in CSV comparison tests¶
Cause: Floating point precision or column order mismatch.
Solution: Use pandas.testing.assert_frame_equal with tolerances
import pandas as pd
from pandas.testing import assert_frame_equal
# ❌ Too strict
assert df.equals(expected_df)
# ✓ Tolerant comparison
assert_frame_equal(
    df,
    expected_df,
    check_dtype=False,  # Ignore int64 vs int32
    check_exact=False,  # Allow float tolerance
    atol=1e-6,          # Absolute tolerance
    check_like=True,    # Ignore row and column order
)
Documentation (MkDocs) Issues¶
mkdocs serve fails with "Port 8000 already in use"¶
Cause: Another process is using the default port.
Solution: Use alternate port
Find what's using port 8000:
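Identifying the owning process is OS-specific. As a portable complement, the sketch below only checks whether a local port has a listener and finds the next free one. It uses the standard library, and the helper names are illustrative.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 8000, attempts: int = 20) -> int:
    """Return the first port at or above start with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")
```

The result can be passed to MkDocs through its dev-addr option, for example `mkdocs serve -a 127.0.0.1:8001`.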
Mermaid diagrams don't render¶
Cause: pymdownx.superfences not configured for Mermaid.
Solution: Check mkdocs.yml has:
markdown_extensions:
  - pymdownx.superfences:
      custom_fences:
        - name: mermaid
          class: mermaid
          format: !!python/name:pymdownx.superfences.fence_code_format
mkdocstrings fails to find Python modules¶
Cause: src/ not in Python path.
Solution 1: Install package in editable mode
Solution 2: Set PYTHONPATH
Solution 3: Configure mkdocstrings paths in mkdocs.yml
Material theme not loading custom colors¶
Cause: CSS file not created or path mismatch.
Solution: Verify file exists
And mkdocs.yml has:
Performance Optimization¶
Speed up subtitle parsing¶
# Use generator pattern for large files
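def parse_subtitles_streaming(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            if line.startswith('Dialogue:'):
                yield parse_line(line)

# Iterate lazily instead of loading all events into memory
# (wrapping the generator in list() would defeat the purpose)
for event in parse_subtitles_streaming('large_file.ass'):
    process(event)  # placeholder for per-event handling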
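The snippet above relies on a `parse_line` helper that is not shown. A minimal sketch follows, assuming the standard ASS v4+ field order (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text). The project's own parser may differ, and a file whose [Events] Format line reorders fields would need a different mapping.

```python
from typing import NamedTuple


class DialogueEvent(NamedTuple):
    start: str
    end: str
    speaker: str
    text: str


def parse_line(line: str) -> DialogueEvent:
    """Parse one 'Dialogue:' line using the assumed standard field order."""
    _, _, payload = line.partition("Dialogue:")
    # maxsplit=9 keeps commas inside the subtitle text intact
    fields = payload.strip().split(",", 9)
    if len(fields) != 10:
        raise ValueError(f"unexpected Dialogue line: {line!r}")
    return DialogueEvent(
        start=fields[1], end=fields[2], speaker=fields[4], text=fields[9]
    )
```

Timestamps are kept as strings here so the parse step stays cheap; converting them to seconds can happen later, only for events that are actually used.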
Reduce Neo4j import time¶
// Use CREATE instead of MERGE when IDs are unique
CREATE (c:Character {character_id: 1, name: "Naruto"})
// Batch imports with UNWIND
UNWIND $batch AS row
CREATE (c:Character {character_id: row.id, name: row.name})
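On the Python side, the `$batch` parameter needs rows grouped into fixed-size chunks. The sketch below shows only the chunking and leaves the driver call as a comment. `batched` is a hypothetical helper, and the commented `session.run` line assumes the official neo4j Python driver.

```python
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


CREATE_CHARACTERS = (
    "UNWIND $batch AS row "
    "CREATE (c:Character {character_id: row.id, name: row.name})"
)

rows = [{"id": i, "name": f"Character {i}"} for i in range(1, 6)]
for chunk in batched(rows, 2):
    # With the official driver this would be roughly:
    # session.run(CREATE_CHARACTERS, batch=chunk)
    print(len(chunk), "rows ready for one UNWIND call")
```

Batches of a few thousand rows are a common starting point; very large batches move the memory pressure from the client to the Neo4j heap.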
Getting Help¶
If your issue isn't covered here:
- Check existing issues: GitHub Issues
- Search documentation: Use site search (top right)
- Enable debug logging:
- Ask on GitHub Discussions: Discussions
- Email: barbs@balex.com
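For the debug-logging step in the list above, a minimal sketch using the standard logging module. It assumes the project's modules log through loggers named under `naruto_net`, which this page does not confirm; `enable_debug_logging` is a hypothetical helper.

```python
import logging


def enable_debug_logging(logger_name: str = "naruto_net") -> logging.Logger:
    """Send DEBUG-level messages from the named logger to stderr."""
    # basicConfig attaches a stderr handler to the root logger (no-op if one exists)
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    return logger
```

Calling this at the top of a pipeline script, before any processing starts, captures the output worth attaching to a bug report.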
Reporting Bugs¶
When opening a GitHub issue, include:
- Python version: python --version
- Operating system: macOS, Windows, Linux
- Error message: Full traceback
- Steps to reproduce:
# Example
git clone https://github.com/dagny099/naruto-network-graph.git
cd naruto-network-graph
pip install -r requirements.txt
python scripts/00_ass_ingest_subset.py
# Error occurs here
- Expected vs actual behavior