# DeepWiki Local

Turn your folders and repos into a browsable "wiki" with search, graphs, and Q&A.

## Status: Steps 0-3 Complete ✅

This implementation includes the foundation of the DeepWiki pipeline:

- **Step 0**: Core data structures for files, documents, symbols, and chunks
- **Step 1**: File discovery with ignore patterns and fingerprinting
- **Step 2**: Symbol extraction using tree-sitter for Python, Rust, TypeScript
- **Step 3**: Document chunking by semantic units (functions, sections)

## Quick Start

```bash
# Build and run
cargo build
cargo run

# Run tests
cargo test
```

## What It Does

```
1. Discovers files in your project (respects .gitignore)
   └─► 273 files found, 21 skipped

2. Parses files to extract symbols and imports
   └─► Functions, classes, imports identified

3. Chunks documents into searchable pieces
   └─► Per-function chunks for code, per-section for docs
```

## Example Output

```
=== DeepWiki Local - Steps 0-3 ===

Step 1: Discovery
Scanning directory: .
Discovery complete: 273 files found, 21 skipped

Step 2: Parsing
Parsed: example/orders.py (4 symbols)
  - class OrderService
  - function create_order
  - function get_order
  - function list_orders

Step 3: Chunking
Created 4 chunks from example/orders.py
  Chunk 1: lines 5-24 (function create_order)
  Chunk 2: lines 26-28 (function get_order)
```

## Features

### Discovery

- ✅ Gitignore-aware file walking
- ✅ Smart ignore patterns (node_modules, target, .git, etc.)
- ✅ BLAKE3 fingerprinting for change detection
- ✅ Size filtering (max 2 MB per file)

### Parsing

- ✅ Tree-sitter-based symbol extraction
- ✅ Python: functions, classes, imports
- ✅ Rust: functions, structs, use declarations
- ✅ TypeScript/JavaScript: functions, classes, ES6 imports
- ✅ JSON: package.json scripts and dependencies
- ✅ Secret redaction (API keys, tokens)

### Chunking

- ✅ Code: one chunk per symbol (function/class)
- ✅ Markdown: one chunk per heading section
- ✅ Line ranges and headings preserved

## Architecture

```
src/
├── main.rs       # Pipeline orchestration
├── types.rs      # Data structures (FileRecord, Document, Symbol, Chunk)
├── discover.rs   # File discovery with ignore patterns
├── parser.rs     # Tree-sitter parsing and symbol extraction
└── chunker.rs    # Document chunking strategies
```

## Documentation

- **[IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md)** - Quick overview of what's implemented
- **[README_STEPS_0_3.md](README_STEPS_0_3.md)** - Detailed documentation with examples

## Dependencies

```toml
blake3 = "1.8.2"      # Fast hashing
ignore = "0.4"        # Gitignore support
tree-sitter = "0.24"  # Language parsing
serde_json = "1.0"    # JSON parsing
anyhow = "1.0"        # Error handling
```

## Testing

All tests passing (6/6):

- Pattern matching for ignore rules
- Secret redaction
- Import parsing (Python, Rust)
- Markdown and code chunking

## Next Steps (Steps 4-7)

- **Step 4**: BM25 keyword indexing with Tantivy
- **Step 5**: Vector embeddings with ONNX
- **Step 6**: Symbol graph building
- **Step 7**: Wiki page synthesis

## Design Philosophy

1. **Fast**: BLAKE3 hashing, tree-sitter parsing, incremental updates
2. **Local-first**: No cloud dependencies, runs offline
3. **Language-agnostic**: Tree-sitter grammars exist for 40+ languages, so support can grow beyond the languages parsed today
4. **Precise**: Citations to exact `file:start-end` line ranges

## Performance

- Discovery: ~50 ms for 273 files
- Parsing: ~20 ms for 5 files
- Chunking: <1 ms per document

## Example Use Cases

Once complete, DeepWiki will answer:

- "How do I run this project?" → README.md:19-28
- "Where is create_order defined?" → api/orders.py:12-27
- "What calls this function?" → Graph analysis
- "Generate a flow diagram for checkout" → Synthesized from symbols

## License

[Specify your license]

## Contributing

This is an early-stage implementation. Contributions welcome!
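
For contributors getting oriented, the per-heading Markdown chunking listed under Features → Chunking can be sketched in std-only Rust as below. This is a hypothetical illustration, not the code in `src/chunker.rs`: the `MdChunk` type and `chunk_markdown` function are invented for this example, only ATX headings (`#` through `######` at the start of a line) are recognized, and `#` lines inside fenced code blocks are not treated as headings.

```rust
/// Hypothetical chunk type for illustration only; the project's real
/// `Chunk` type lives in `src/types.rs`.
#[derive(Debug, Clone, PartialEq)]
pub struct MdChunk {
    /// Heading text, or `None` for content before the first heading.
    pub heading: Option<String>,
    /// 1-based, inclusive line range.
    pub start_line: usize,
    pub end_line: usize,
    /// Raw section text, including the heading line.
    pub text: String,
}

impl MdChunk {
    fn new(heading: Option<String>, line_no: usize, line: &str) -> Self {
        MdChunk { heading, start_line: line_no, end_line: line_no, text: line.to_string() }
    }
}

/// Returns the heading text if `line` starts with 1-6 `#` characters
/// followed by whitespace or end of line (an ATX heading).
fn atx_heading(line: &str) -> Option<String> {
    let hashes = line.chars().take_while(|&c| c == '#').count();
    if hashes == 0 || hashes > 6 {
        return None;
    }
    // `#` is one byte, so slicing at `hashes` is a valid char boundary.
    let rest = &line[hashes..];
    if rest.is_empty() || rest.starts_with(' ') || rest.starts_with('\t') {
        Some(rest.trim().to_string())
    } else {
        None
    }
}

/// Splits Markdown into one chunk per heading section.
pub fn chunk_markdown(source: &str) -> Vec<MdChunk> {
    let mut chunks = Vec::new();
    let mut current: Option<MdChunk> = None;
    let mut in_fence = false;

    for (idx, line) in source.lines().enumerate() {
        let line_no = idx + 1;
        // Toggle fence state on ``` lines so `# comment` lines in code are skipped.
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
        }
        let heading = if in_fence { None } else { atx_heading(line) };

        if let Some(text) = heading {
            // A new heading closes the current section and opens another.
            if let Some(done) = current.take() {
                chunks.push(done);
            }
            current = Some(MdChunk::new(Some(text), line_no, line));
        } else if let Some(chunk) = current.as_mut() {
            // Ordinary line: extend the open section.
            chunk.end_line = line_no;
            chunk.text.push('\n');
            chunk.text.push_str(line);
        } else {
            // Content before the first heading becomes a preamble chunk.
            current = Some(MdChunk::new(None, line_no, line));
        }
    }
    if let Some(done) = current {
        chunks.push(done);
    }
    // Drop sections that contain only whitespace (e.g. leading blank lines).
    chunks.retain(|c| !c.text.trim().is_empty());
    chunks
}
```

Line numbers are 1-based and inclusive, so each chunk maps directly onto a citation such as `README.md:19-28`. Setext headings (underlined with `===` or `---`) are out of scope for this sketch.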