# DeepWiki Local

Turn your folders and repos into a browsable "wiki" with search, graphs, and Q&A.

## Status: Steps 0-3 Complete ✅

This implementation covers the foundation of the DeepWiki pipeline:

- **Step 0**: Core data structures for files, documents, symbols, and chunks
- **Step 1**: File discovery with ignore patterns and fingerprinting
- **Step 2**: Symbol extraction using tree-sitter for Python, Rust, and TypeScript
- **Step 3**: Document chunking by semantic units (functions, sections)

## Quick Start

```bash
# Build and run
cargo build
cargo run

# Run tests
cargo test
```

## What It Does

```
1. Discovers files in your project (respects .gitignore)
   └─► 273 files found, 21 skipped

2. Parses files to extract symbols and imports
   └─► Functions, classes, imports identified

3. Chunks documents into searchable pieces
   └─► Per-function chunks for code, per-section for docs
```

## Example Output

```
=== DeepWiki Local - Steps 0-3 ===

Step 1: Discovery
Scanning directory: .
Discovery complete: 273 files found, 21 skipped

Step 2: Parsing
Parsed: example/orders.py (4 symbols)
- class OrderService
- function create_order
- function get_order
- function list_orders

Step 3: Chunking
Created 4 chunks from example/orders.py
Chunk 1: lines 5-24 (function create_order)
Chunk 2: lines 26-28 (function get_order)
```

## Features

### Discovery

- ✅ Gitignore-aware file walking
- ✅ Default ignore patterns for dependency, build, and VCS directories (node_modules, target, .git, etc.)
- ✅ BLAKE3 fingerprinting for change detection
- ✅ Size filtering (max 2MB per file)
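
Discovery can be approximated with the standard library alone. The sketch below illustrates the idea under stated assumptions and is not the code in `discover.rs`: instead of honoring `.gitignore` it skips a hypothetical hard-coded list of directory names, it enforces the 2MB cap, and it uses std's `DefaultHasher` as a stand-in for BLAKE3 (that hasher is not cryptographic and not guaranteed stable across Rust releases, so it is unsuitable for persisted fingerprints). `discover`, `DiscoveredFile`, `SKIP_DIRS`, and `MAX_FILE_SIZE` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical cap mirroring the README's "max 2MB per file".
const MAX_FILE_SIZE: u64 = 2 * 1024 * 1024;

/// Hypothetical directory names to skip; the real tool also reads .gitignore.
const SKIP_DIRS: &[&str] = &["node_modules", "target", ".git"];

#[derive(Debug)]
pub struct DiscoveredFile {
    pub path: PathBuf,
    pub size: u64,
    /// Stand-in fingerprint; DeepWiki itself uses BLAKE3.
    pub fingerprint: u64,
}

/// Walks `root` depth-first and returns the kept files plus a count of
/// skipped entries (ignored directories and oversized files).
pub fn discover(root: &Path) -> io::Result<(Vec<DiscoveredFile>, usize)> {
    let mut found = Vec::new();
    let mut skipped = 0;
    let mut stack = vec![root.to_path_buf()];

    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            let file_type = entry.file_type()?;

            if file_type.is_dir() {
                // Do not descend into ignored directories at all.
                let name = entry.file_name();
                let ignored = name
                    .to_str()
                    .map_or(false, |n| SKIP_DIRS.iter().any(|d| *d == n));
                if ignored {
                    skipped += 1;
                } else {
                    stack.push(path);
                }
            } else if file_type.is_file() {
                let size = entry.metadata()?.len();
                if size > MAX_FILE_SIZE {
                    skipped += 1;
                    continue;
                }
                // Hash the contents so unchanged files can be detected later.
                let mut hasher = DefaultHasher::new();
                fs::read(&path)?.hash(&mut hasher);
                found.push(DiscoveredFile { path, size, fingerprint: hasher.finish() });
            }
            // Symlinks and other entry types are ignored in this sketch.
        }
    }
    found.sort_by(|a, b| a.path.cmp(&b.path)); // deterministic order
    Ok((found, skipped))
}
```

In the project itself, the `ignore` and `blake3` crates listed under Dependencies fill the roles of the manual directory stack and `DefaultHasher`.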

### Parsing

- ✅ Tree-sitter based symbol extraction
- ✅ Python: functions, classes, imports
- ✅ Rust: functions, structs, use declarations
- ✅ TypeScript/JavaScript: functions, classes, ES6 imports
- ✅ JSON: package.json scripts and dependencies
- ✅ Secret redaction (API keys, tokens)
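
Secret redaction can be approximated with plain string scanning. This sketch assumes a simple heuristic that may differ from what `parser.rs` does: on any `key = value` or `key: value` line whose key contains a hypothetical marker such as `api_key`, `token`, `secret`, or `password`, the value is replaced with `[REDACTED]` while the key stays readable. `redact_secrets`, `redact_line`, and `SECRET_MARKERS` are hypothetical names, and the heuristic will produce false positives (for example `token_count = 5`).

```rust
/// Hypothetical key fragments that mark a `key = value` line as secret-bearing.
const SECRET_MARKERS: &[&str] = &["api_key", "apikey", "token", "secret", "password"];

/// Redacts the value of every line that looks like `key = value` or
/// `key: value` where the key contains a secret marker.
/// Note: `lines()` + `join` drops a trailing newline.
pub fn redact_secrets(text: &str) -> String {
    text.lines().map(redact_line).collect::<Vec<_>>().join("\n")
}

fn redact_line(line: &str) -> String {
    // Split at the first '=' or ':'; lines without one are left alone.
    let pos = match line.find(|c: char| c == '=' || c == ':') {
        Some(pos) => pos,
        None => return line.to_string(),
    };
    let key = line[..pos].to_ascii_lowercase();
    let rest = &line[pos + 1..];
    let value = rest.trim_start();
    // Skip lines with an empty value, e.g. `def get_token(self):`.
    if value.is_empty() || !SECRET_MARKERS.iter().any(|m| key.contains(*m)) {
        return line.to_string();
    }
    // Keep the key, separator, and spacing; replace only the value.
    let value_start = pos + 1 + (rest.len() - value.len());
    format!("{}[REDACTED]", &line[..value_start])
}
```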

### Chunking

- ✅ Code: one chunk per symbol (function/class)
- ✅ Markdown: one chunk per heading section
- ✅ Line ranges and headings preserved
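
Per-heading Markdown chunking can be sketched in a few lines. This is a minimal sketch under assumptions, not the code in `chunker.rs`: a heading is any line starting with `#`, each chunk runs from its heading to the line before the next heading, line numbers are 1-based and inclusive, and non-empty text before the first heading becomes a chunk with no heading. Fenced code blocks are not special-cased, so a `#` comment inside a fence would start a new chunk. `MdChunk` and `chunk_markdown` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub struct MdChunk {
    pub heading: Option<String>,
    pub start_line: usize, // 1-based, inclusive
    pub end_line: usize,   // 1-based, inclusive
    pub text: String,
}

pub fn chunk_markdown(source: &str) -> Vec<MdChunk> {
    let lines: Vec<&str> = source.lines().collect();
    // 0-based indices where a section begins: every heading line...
    let mut starts: Vec<usize> = lines
        .iter()
        .enumerate()
        .filter(|(_, l)| l.starts_with('#'))
        .map(|(i, _)| i)
        .collect();
    // ...plus line 0, so any preamble before the first heading is kept.
    if starts.first() != Some(&0) {
        starts.insert(0, 0);
    }

    let mut chunks = Vec::new();
    for (k, &start) in starts.iter().enumerate() {
        // Exclusive 0-based end == inclusive 1-based last line.
        let end = starts.get(k + 1).copied().unwrap_or(lines.len());
        let text = lines[start..end].join("\n");
        if text.trim().is_empty() {
            continue; // skip an empty preamble
        }
        let heading = lines[start]
            .strip_prefix('#')
            .map(|h| h.trim_start_matches('#').trim().to_string());
        chunks.push(MdChunk { heading, start_line: start + 1, end_line: end, text });
    }
    chunks
}
```

Keeping the line range on each chunk is what later allows an answer to cite `file:start-end`.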

## Architecture

```
src/
├── main.rs      # Pipeline orchestration
├── types.rs     # Data structures (FileRecord, Document, Symbol, Chunk)
├── discover.rs  # File discovery with ignore patterns
├── parser.rs    # Tree-sitter parsing and symbol extraction
└── chunker.rs   # Document chunking strategies
```
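
The four types named for `types.rs` could look roughly like the sketch below. Every field, the `SymbolKind` enum, and the `citation` and `chunk_by_symbols` helpers are assumptions for illustration, not the project's definitions. `chunk_by_symbols` applies the "one chunk per symbol" rule from Features (skipping imports), and `citation` shows how a chunk's line range could back the `file:line-line` citations described under Design Philosophy.

```rust
use std::path::PathBuf;

/// A discovered file plus its fingerprint (hypothetical fields).
#[derive(Debug, Clone)]
pub struct FileRecord {
    pub path: PathBuf,
    pub size: u64,
    pub fingerprint: String, // e.g. a hex-encoded hash
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SymbolKind {
    Function,
    Class,
    Struct,
    Import,
}

#[derive(Debug, Clone)]
pub struct Symbol {
    pub name: String,
    pub kind: SymbolKind,
    pub start_line: usize, // 1-based, inclusive
    pub end_line: usize,   // 1-based, inclusive
}

/// A parsed file: its record, raw text, and extracted symbols.
#[derive(Debug, Clone)]
pub struct Document {
    pub file: FileRecord,
    pub content: String,
    pub symbols: Vec<Symbol>,
}

/// A searchable slice of a document.
#[derive(Debug, Clone)]
pub struct Chunk {
    pub path: PathBuf,
    pub start_line: usize,
    pub end_line: usize,
    pub heading: Option<String>,
    pub text: String,
}

impl Chunk {
    /// Formats a `path:start-end` citation.
    pub fn citation(&self) -> String {
        format!("{}:{}-{}", self.path.display(), self.start_line, self.end_line)
    }
}

/// One chunk per non-import symbol; symbols with invalid ranges are skipped.
pub fn chunk_by_symbols(doc: &Document) -> Vec<Chunk> {
    let lines: Vec<&str> = doc.content.lines().collect();
    doc.symbols
        .iter()
        .filter(|s| s.kind != SymbolKind::Import)
        .filter(|s| s.start_line >= 1 && s.start_line <= s.end_line && s.end_line <= lines.len())
        .map(|s| Chunk {
            path: doc.file.path.clone(),
            start_line: s.start_line,
            end_line: s.end_line,
            heading: Some(s.name.clone()),
            text: lines[s.start_line - 1..s.end_line].join("\n"),
        })
        .collect()
}
```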

## Documentation

- **[IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md)** - Quick overview of what's implemented
- **[README_STEPS_0_3.md](README_STEPS_0_3.md)** - Detailed documentation with examples

## Dependencies

```toml
[dependencies]
blake3 = "1.8.2"      # Fast hashing
ignore = "0.4"        # Gitignore support
tree-sitter = "0.24"  # Language parsing
serde_json = "1.0"    # JSON parsing
anyhow = "1.0"        # Error handling
```

## Testing

All 6 tests pass, covering:

- Pattern matching for ignore rules
- Secret redaction
- Import parsing (Python, Rust)
- Markdown and code chunking
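
Ignore-rule matching of the kind these tests cover can be sketched as a small wildcard matcher. `glob_match` and `is_ignored` are hypothetical helpers, not the crate's code, and they do not implement full gitignore semantics: only `*` (any run of characters) and `?` (one character) are supported, each path component is checked separately, and there is no negation, anchoring, or `**`.

```rust
/// Greedy wildcard match with backtracking to the most recent '*'.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Position of the last '*' seen and the text index it currently covers up to.
    let mut star: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ti)); // first try letting '*' match nothing
            pi += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last '*' absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Any pattern left over must consist only of '*'.
    p[pi..].iter().all(|&c| c == '*')
}

/// True if any '/'-separated component of `path` matches any pattern.
pub fn is_ignored(path: &str, patterns: &[&str]) -> bool {
    path.split('/')
        .any(|component| patterns.iter().any(|p| glob_match(p, component)))
}
```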

## Next Steps (Steps 4-7)

- **Step 4**: BM25 keyword indexing with Tantivy
- **Step 5**: Vector embeddings with ONNX
- **Step 6**: Symbol graph building
- **Step 7**: Wiki page synthesis
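
For context on Step 4: BM25 scores a chunk by summing, over the query terms, an IDF weight times a saturating term frequency that is normalized by chunk length relative to the average. The sketch below computes that formula over in-memory token lists with the common defaults k1 = 1.2 and b = 0.75 and a Lucene-style IDF. It only illustrates the scoring and is not Tantivy's API; `bm25_scores` is a hypothetical name, and tokenization is assumed to have happened already.

```rust
use std::collections::HashMap;

/// Common BM25 defaults (assumed here, not read from any config).
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Scores each tokenized chunk in `docs` against `query` with BM25.
pub fn bm25_scores<'a>(docs: &[Vec<&'a str>], query: &[&'a str]) -> Vec<f64> {
    if docs.is_empty() {
        return Vec::new();
    }
    let n = docs.len() as f64;
    let avg_len = docs.iter().map(|d| d.len()).sum::<usize>() as f64 / n;

    // Document frequency: number of chunks containing each query term.
    let mut df: HashMap<&'a str, f64> = HashMap::new();
    for term in query {
        let count = docs.iter().filter(|d| d.contains(term)).count();
        df.insert(*term, count as f64);
    }

    docs.iter()
        .map(|doc| {
            let len = doc.len() as f64;
            query
                .iter()
                .map(|term| {
                    // Term frequency of this query term in this chunk.
                    let tf = doc.iter().filter(|t| *t == term).count() as f64;
                    if tf == 0.0 {
                        return 0.0;
                    }
                    let n_t = df[term];
                    // Lucene-style IDF, which stays positive even for common terms.
                    let idf = (1.0 + (n - n_t + 0.5) / (n_t + 0.5)).ln();
                    // Length normalization: longer chunks need more hits to score as high.
                    let norm = K1 * (1.0 - B + B * len / avg_len);
                    idf * tf * (K1 + 1.0) / (tf + norm)
                })
                .sum::<f64>()
        })
        .collect()
}
```

With equal term frequency, a shorter chunk scores higher, which favors tight per-function chunks over long files.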

## Design Philosophy

1. **Fast**: BLAKE3 hashing, tree-sitter parsing, incremental updates
2. **Local-first**: No cloud dependencies; runs offline
3. **Language-agnostic**: Built on tree-sitter, which has grammars for 40+ languages (Python, Rust, and TypeScript/JavaScript are supported so far)
4. **Precise**: Citations point to exact `file:start-end` line ranges
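
One way incremental updates can follow from fingerprinting: compare each file's current fingerprint with the one recorded on the previous run, and re-parse and re-chunk only files that are new or changed, dropping data for files that disappeared. This minimal sketch assumes previous fingerprints are available as a map from path to fingerprint string; the README does not say how (or whether) they are persisted, and `plan_update` and `UpdatePlan` are hypothetical names.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Default, PartialEq)]
pub struct UpdatePlan {
    pub added: Vec<String>,   // new files: parse and chunk
    pub changed: Vec<String>, // fingerprint differs: re-parse and re-chunk
    pub removed: Vec<String>, // gone: drop their symbols and chunks
    pub unchanged: usize,     // reuse previous results
}

/// Diffs two path -> fingerprint maps. BTreeMap keeps the output sorted.
pub fn plan_update(
    previous: &BTreeMap<String, String>,
    current: &BTreeMap<String, String>,
) -> UpdatePlan {
    let mut plan = UpdatePlan::default();
    for (path, fp) in current {
        match previous.get(path) {
            None => plan.added.push(path.clone()),
            Some(old) if old != fp => plan.changed.push(path.clone()),
            Some(_) => plan.unchanged += 1,
        }
    }
    for path in previous.keys() {
        if !current.contains_key(path) {
            plan.removed.push(path.clone());
        }
    }
    plan
}
```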

## Performance

- Discovery: ~50ms for 273 files
- Parsing: ~20ms for 5 files
- Chunking: <1ms per document

## Example Use Cases

Once all steps are complete, DeepWiki should be able to answer questions like:

- "How do I run this project?" → `README.md:19-28`
- "Where is create_order defined?" → `api/orders.py:12-27`
- "What calls this function?" → Graph analysis
- "Generate a flow diagram for checkout" → Synthesized from symbols
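
The "where is it defined" and "what calls it" questions reduce to lookups over extracted symbols and, once Step 6 exists, call edges. This minimal in-memory sketch assumes the call edges are already known; the README does not describe how the symbol graph will be built. `SymbolIndex` and its methods are hypothetical, and the caller names in the usage are made up.

```rust
use std::collections::HashMap;

/// Hypothetical index over symbol definitions and reverse call edges.
#[derive(Default)]
pub struct SymbolIndex {
    /// Symbol name -> "path:start-end" citations (a name may be defined more than once).
    definitions: HashMap<String, Vec<String>>,
    /// Callee name -> names of functions that call it (reverse edges).
    callers: HashMap<String, Vec<String>>,
}

impl SymbolIndex {
    pub fn define(&mut self, name: &str, path: &str, start: usize, end: usize) {
        self.definitions
            .entry(name.to_string())
            .or_default()
            .push(format!("{}:{}-{}", path, start, end));
    }

    /// Records that `caller` calls `callee`, ignoring duplicate edges.
    pub fn add_call(&mut self, caller: &str, callee: &str) {
        let list = self.callers.entry(callee.to_string()).or_default();
        if !list.iter().any(|c| c == caller) {
            list.push(caller.to_string());
        }
    }

    pub fn where_defined(&self, name: &str) -> &[String] {
        self.definitions.get(name).map(|v| v.as_slice()).unwrap_or(&[])
    }

    pub fn callers_of(&self, name: &str) -> &[String] {
        self.callers.get(name).map(|v| v.as_slice()).unwrap_or(&[])
    }
}
```

Storing reverse edges keyed by callee makes "what calls this function?" a single map lookup rather than a scan of every function body.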

## License

[Specify your license]

## Contributing

This is an early-stage implementation. Contributions welcome!