Automatically convert documentation websites, GitHub repositories, and PDFs into Claude AI skills in minutes.
📋 View Development Roadmap & Tasks - 134 tasks across 10 categories, pick any to contribute!
Skill Seeker is an automated tool that transforms documentation websites, GitHub repositories, and PDF files into production-ready Claude AI skills. Instead of manually reading and summarizing documentation, Skill Seeker:
Packages everything into a .zip file for Claude

**Result:** Get comprehensive Claude skills for any framework, API, or tool in 20-40 minutes instead of hours of manual work.
Skill Seekers is now published on the Python Package Index (PyPI)! Install with a single command:
pip install skill-seekers
Get started in seconds. No cloning, no setup - just install and run. See installation options below.
# Install from PyPI (easiest method!)
pip install skill-seekers
# Use the unified CLI
skill-seekers scrape --config configs/react.json
skill-seekers github --repo facebook/react
skill-seekers enhance output/react/
skill-seekers package output/react/
Time: ~25 minutes | Quality: Production-ready | Cost: Free
📖 New to Skill Seekers? Check out our Quick Start Guide or Bulletproof Guide
# Install with uv (fast, modern alternative)
uv tool install skill-seekers
# Or run directly without installing
uv tool run --from skill-seekers skill-seekers scrape --config https://raw.githubusercontent.com/yusufkaraaslan/Skill_Seekers/main/configs/react.json
# Unified CLI - simple commands
skill-seekers scrape --config configs/react.json
skill-seekers github --repo facebook/react
skill-seekers package output/react/
Time: ~25 minutes | Quality: Production-ready | Cost: Free
# Clone and install in editable mode
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
pip install -e .
# Use the unified CLI
skill-seekers scrape --config configs/react.json
# One-time setup (5 minutes)
./setup_mcp.sh
# Then in Claude Code, just ask:
"Generate a React skill from https://react.dev/"
"Scrape PDF at docs/manual.pdf and create skill"
Time: Automated | Quality: Production-ready | Cost: Free
# Install dependencies
pip3 install requests beautifulsoup4
# Run scripts directly (old method)
python3 src/skill_seekers/cli/doc_scraper.py --config configs/react.json
# Upload output/react.zip to Claude - Done!
Time: ~25 minutes | Quality: Production-ready | Cost: Free
# Scrape documentation website
skill-seekers scrape --config configs/react.json
# Quick scrape without config
skill-seekers scrape --url https://react.dev --name react
# With async mode (3x faster)
skill-seekers scrape --config configs/godot.json --async --workers 8
# Basic PDF extraction
skill-seekers pdf --pdf docs/manual.pdf --name myskill
# Advanced features: table extraction, parallel processing across 8 CPU cores
skill-seekers pdf --pdf docs/manual.pdf --name myskill \
  --extract-tables \
  --parallel \
  --workers 8
# Scanned PDFs (requires: pip install pytesseract Pillow)
skill-seekers pdf --pdf docs/scanned.pdf --name myskill --ocr
# Password-protected PDFs
skill-seekers pdf --pdf docs/encrypted.pdf --name myskill --password mypassword
Time: ~5-15 minutes (or 2-5 minutes with parallel) | Quality: Production-ready | Cost: Free
# Basic repository scraping
skill-seekers github --repo facebook/react
# Using a config file
skill-seekers github --config configs/react_github.json
# With authentication (higher rate limits)
export GITHUB_TOKEN=ghp_your_token_here
skill-seekers github --repo facebook/react
# Customize what to include: GitHub Issues (capped at 100), CHANGELOG.md, Releases
skill-seekers github --repo django/django \
  --include-issues \
  --max-issues 100 \
  --include-changelog \
  --include-releases
Time: ~5-10 minutes | Quality: Production-ready | Cost: Free
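As a rough illustration of why setting `GITHUB_TOKEN` raises your rate limits: authenticated GitHub API requests carry an `Authorization` header, which lifts the cap from 60 to 5,000 requests/hour. This is a hypothetical sketch of the pattern, not Skill Seekers' actual internals:

```python
import os
import urllib.request

def build_github_request(url: str) -> urllib.request.Request:
    """Build a GitHub API request, attaching GITHUB_TOKEN if available."""
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests get the higher rate limit
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)

req = build_github_request("https://api.github.com/repos/facebook/react")
```

Without the token the request still works, just with the much lower anonymous limit.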
The Problem: Documentation and code often drift apart. Docs might be outdated, missing features that exist in code, or documenting features that were removed.
The Solution: Combine documentation + GitHub + PDF into one unified skill that shows BOTH what's documented AND what actually exists, with clear warnings about discrepancies.
# Use existing unified configs
skill-seekers unified --config configs/react_unified.json
skill-seekers unified --config configs/django_unified.json
# Or create unified config (mix documentation + GitHub)
cat > configs/myframework_unified.json << 'EOF'
{
"name": "myframework",
"description": "Complete framework knowledge from docs + code",
"merge_mode": "rule-based",
"sources": [
{
"type": "documentation",
"base_url": "https://docs.myframework.com/",
"extract_api": true,
"max_pages": 200
},
{
"type": "github",
"repo": "owner/myframework",
"include_code": true,
"code_analysis_depth": "surface"
}
]
}
EOF
# Run unified scraper
skill-seekers unified --config configs/myframework_unified.json
# Package and upload
skill-seekers package output/myframework/
# Upload output/myframework.zip to Claude - Done!
Time: ~30-45 minutes | Quality: Production-ready with conflict detection | Cost: Free
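At its core, the conflict detection that unified scraping performs boils down to comparing documented signatures against what the code actually defines. A minimal illustrative sketch (not the tool's actual implementation), using Python's `inspect` module:

```python
import inspect

# The "real" implementation, as found in the codebase
def move_local_x(delta: float, snap: bool = False) -> None:
    pass

documented = "(delta: float)"                    # what the docs claim
actual = str(inspect.signature(move_local_x))    # what the code says

conflict = documented != actual
report = (
    f"⚠️ Conflict: documented {documented} vs actual {actual}"
    if conflict
    else "✅ Signatures match"
)
```

Here the comparison flags that the implementation grew a `snap` parameter the docs never mention.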
What Makes It Special:
✅ Conflict Detection - Automatically finds 4 types of discrepancies:
✅ Transparent Reporting - Shows both versions side-by-side:
#### `move_local_x(delta: float)`
⚠️ **Conflict**: Documentation signature differs from implementation
**Documentation says:**

```python
def move_local_x(delta: float)
```

**Code implementation:**

```python
def move_local_x(delta: float, snap: bool = False) -> None
```

✅ **Advantages:**

- **Identifies documentation gaps** - Find outdated or missing docs automatically
- **Catches code changes** - Know when APIs change without docs being updated
- **Single source of truth** - One skill showing intent (docs) AND reality (code)
- **Actionable insights** - Get suggestions for fixing each conflict
- **Development aid** - See what's actually in the codebase vs what's documented

**Example Unified Configs:**

- `configs/react_unified.json` - React docs + GitHub repo
- `configs/django_unified.json` - Django docs + GitHub repo
- `configs/fastapi_unified.json` - FastAPI docs + GitHub repo

**Full Guide:** See [docs/UNIFIED_SCRAPING.md](docs/UNIFIED_SCRAPING.md) for complete documentation.

## How It Works

```mermaid
graph LR
    A[Documentation Website] --> B[Skill Seeker]
    B --> C[Scraper]
    B --> D[AI Enhancement]
    B --> E[Packager]
    C --> F[Organized References]
    D --> F
    F --> E
    E --> G[Claude Skill .zip]
    G --> H[Upload to Claude AI]
```
Before you start, make sure you have:
python3 --version
git --version

First time user? → Start Here: Bulletproof Quick Start Guide 🎯
This guide walks you through EVERYTHING step-by-step (Python install, git clone, first skill creation).
Use Skill Seeker directly from Claude Code with natural language!
# Clone repository
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
# One-time setup (5 minutes)
./setup_mcp.sh
# Restart Claude Code, then just ask:
In Claude Code:
"List all available configs"
"Generate config for Tailwind at https://tailwindcss.com/docs"
"Scrape docs using configs/react.json"
"Package skill at output/react/"
Benefits:
Full guides:
# Clone repository
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate # macOS/Linux
# OR on Windows: venv\Scripts\activate
# Install dependencies
pip install requests beautifulsoup4 pytest
# Save dependencies
pip freeze > requirements.txt
# Optional: Install anthropic for API-based enhancement (not needed for LOCAL enhancement)
# pip install anthropic
Always activate the virtual environment before using Skill Seeker:
source venv/bin/activate # Run this each time you start a new terminal session
# Make sure venv is activated (you should see (venv) in your prompt)
source venv/bin/activate
# Optional: Estimate pages first (fast, 1-2 minutes)
skill-seekers estimate configs/godot.json
# Use Godot preset
skill-seekers scrape --config configs/godot.json
# Use React preset
skill-seekers scrape --config configs/react.json
# See all presets
ls configs/
skill-seekers scrape --interactive
skill-seekers scrape \
--name react \
--url https://react.dev/ \
--description "React framework for UIs"
Once your skill is packaged, you need to upload it to Claude:
# Set your API key (one-time)
export ANTHROPIC_API_KEY=sk-ant-...
# Package and upload automatically
skill-seekers package output/react/ --upload
# OR upload existing .zip
skill-seekers upload output/react.zip
Benefits:
Requirements:
# Package skill
skill-seekers package output/react/
# This will:
# 1. Create output/react.zip
# 2. Open the output/ folder automatically
# 3. Show upload instructions
# Then manually upload:
# - Go to https://claude.ai/skills
# - Click "Upload Skill"
# - Select output/react.zip
# - Done!
Benefits:
In Claude Code, just ask:
"Package and upload the React skill"

# With API key set:
# - Packages the skill
# - Uploads to Claude automatically
# - Done! ✅

# Without API key:
# - Packages the skill
# - Shows where to find the .zip
# - Provides manual upload instructions
Benefits:
doc-to-skill/
├── cli/
│   ├── doc_scraper.py       # Main scraping tool
│   ├── package_skill.py     # Package to .zip
│   ├── upload_skill.py      # Auto-upload (API)
│   └── enhance_skill.py     # AI enhancement
├── mcp/                     # MCP server for Claude Code
│   └── server.py            # 9 MCP tools
├── configs/                 # Preset configurations
│   ├── godot.json           # Godot Engine
│   ├── react.json           # React
│   ├── vue.json             # Vue.js
│   ├── django.json          # Django
│   └── fastapi.json         # FastAPI
└── output/                  # All output (auto-created)
    ├── godot_data/          # Scraped data
    ├── godot/               # Built skill
    └── godot.zip            # Packaged skill
skill-seekers estimate configs/react.json
# Output:
📊 ESTIMATION RESULTS
✅ Pages Discovered: 180
📈 Estimated Total: 230
⏱️ Time Elapsed: 1.2 minutes
💡 Recommended max_pages: 280
Benefits:
Get a recommended max_pages setting before committing to a long scrape

skill-seekers scrape --config configs/godot.json
# If data exists:
✓ Found existing data: 245 pages
Use existing data? (y/n): y
⏭️ Skipping scrape, using existing data
Automatic pattern extraction:
Enhanced SKILL.md:
Automatically infers categories from:
# Automatically detects:
- Python (def, import, from)
- JavaScript (const, let, =>)
- GDScript (func, var, extends)
- C++ (#include, int main)
- And more...
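The keyword-based detection listed above can be sketched in a few lines. This is an illustrative approximation, not the tool's actual detector, and the marker lists are assumptions:

```python
# Each language is scored by how many of its marker strings appear
LANG_MARKERS = {
    "python": ("def ", "import ", "from "),
    "javascript": ("const ", "let ", "=>"),
    "gdscript": ("func ", "extends ", "var "),
    "cpp": ("#include", "int main"),
}

def detect_language(code: str) -> str:
    scores = {
        lang: sum(marker in code for marker in markers)
        for lang, markers in LANG_MARKERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

A real detector would also weigh marker specificity (`var` appears in several languages), but scoring by marker hits is the basic idea.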
# Scrape once
skill-seekers scrape --config configs/react.json
# Later, just rebuild (instant)
skill-seekers scrape --config configs/react.json --skip-scrape
# Enable async mode with 8 workers (recommended for large docs)
skill-seekers scrape --config configs/react.json --async --workers 8
# Small docs (~100-500 pages)
skill-seekers scrape --config configs/mydocs.json --async --workers 4
# Large docs (2000+ pages) with no rate limiting
skill-seekers scrape --config configs/largedocs.json --async --workers 8 --no-rate-limit
Performance Comparison:
When to use:
See full guide: ASYNC_SUPPORT.md
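The `--async --workers N` behavior described above follows a standard bounded-concurrency pattern: up to N requests in flight at once, gated by a semaphore. A self-contained sketch of that pattern (the `fetch` function here is a stand-in for the real HTTP request, not Skill Seekers' code):

```python
import asyncio

async def fetch(url: str) -> str:
    await asyncio.sleep(0)  # placeholder for real network I/O
    return f"<html>{url}</html>"

async def scrape_all(urls, workers: int = 8):
    sem = asyncio.Semaphore(workers)  # at most `workers` requests in flight

    async def bounded(url):
        async with sem:
            return await fetch(url)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(u) for u in urls))

urls = [f"https://docs.example.com/p{i}" for i in range(20)]
pages = asyncio.run(scrape_all(urls, workers=4))
```

With real network latency, raising `workers` overlaps the waits, which is where the 2-3x speedup comes from.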
# Option 1: During scraping (API-based, requires API key)
pip3 install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
skill-seekers scrape --config configs/react.json --enhance
# Option 2: During scraping (LOCAL, no API key - uses Claude Code Max)
skill-seekers scrape --config configs/react.json --enhance-local
# Option 3: After scraping (API-based, standalone)
skill-seekers enhance output/react/
# Option 4: After scraping (LOCAL, no API key, standalone)
skill-seekers enhance output/react/
What it does:
LOCAL Enhancement (Recommended):
For massive documentation sites like Godot (40K pages), AWS, or Microsoft Docs:
# 1. Estimate first (discover page count)
skill-seekers estimate configs/godot.json
# 2. Auto-split into focused sub-skills
python3 -m skill_seekers.cli.split_config configs/godot.json --strategy router
# Creates:
# - godot-scripting.json (5K pages)
# - godot-2d.json (8K pages)
# - godot-3d.json (10K pages)
# - godot-physics.json (6K pages)
# - godot-shaders.json (11K pages)
# 3. Scrape all in parallel (4-8 hours instead of 20-40!)
for config in configs/godot-*.json; do
skill-seekers scrape --config $config &
done
wait
# 4. Generate intelligent router/hub skill
python3 -m skill_seekers.cli.generate_router configs/godot-*.json
# 5. Package all skills
python3 -m skill_seekers.cli.package_multi output/godot*/
# 6. Upload all .zip files to Claude
# Users just ask questions naturally!
# Router automatically directs to the right sub-skill!
Split Strategies:
Benefits:
Configuration:
{
"name": "godot",
"max_pages": 40000,
"split_strategy": "router",
"split_config": {
"target_pages_per_skill": 5000,
"create_router": true,
"split_by_categories": ["scripting", "2d", "3d", "physics"]
}
}
Full Guide: Large Documentation Guide
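Conceptually, the router split strategy buckets discovered URLs into per-category sub-skills. A hedged sketch of that bucketing step, with illustrative category keywords (the real splitter is driven by `split_config`):

```python
# Hypothetical category -> URL-keyword mapping
CATEGORIES = {
    "scripting": ["gdscript", "script"],
    "2d": ["/2d/", "sprite"],
    "3d": ["/3d/", "mesh"],
    "physics": ["physics", "collision"],
}

def split_urls(urls):
    """Assign each URL to the first category whose keyword matches."""
    buckets = {name: [] for name in CATEGORIES}
    buckets["misc"] = []
    for url in urls:
        for name, keywords in CATEGORIES.items():
            if any(k in url for k in keywords):
                buckets[name].append(url)
                break
        else:
            buckets["misc"].append(url)
    return buckets
```

Each non-empty bucket then becomes its own config (e.g. `godot-2d.json`), and the router skill maps questions back to the right bucket.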
Never lose progress on long-running scrapes:
# Enable in config
{
"checkpoint": {
"enabled": true,
"interval": 1000 // Save every 1000 pages
}
}
# If scrape is interrupted (Ctrl+C or crash)
skill-seekers scrape --config configs/godot.json --resume
# Resume from last checkpoint
✅ Resuming from checkpoint (12,450 pages scraped)
⏭️ Skipping 12,450 already-scraped pages
🔄 Continuing from where we left off...
# Start fresh (clear checkpoint)
skill-seekers scrape --config configs/godot.json --fresh
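The checkpoint/resume cycle above can be sketched as: persist the set of scraped URLs periodically, and on `--resume` skip anything already in that set. The file layout here is an assumption for illustration, not the tool's actual checkpoint format:

```python
import json
import tempfile
from pathlib import Path

def save_checkpoint(path: Path, scraped: set) -> None:
    path.write_text(json.dumps(sorted(scraped)))

def load_checkpoint(path: Path) -> set:
    return set(json.loads(path.read_text())) if path.exists() else set()

ckpt = Path(tempfile.mkdtemp()) / "checkpoint.json"
done = load_checkpoint(ckpt)                 # empty on a fresh run
for url in ["https://a", "https://b", "https://c"]:
    if url in done:
        continue                             # resume: skip already-scraped pages
    done.add(url)                            # ... scrape the page here ...
save_checkpoint(ckpt, done)

resumed = load_checkpoint(ckpt)              # what a --resume run would see
```

`--fresh` then just amounts to deleting the checkpoint file before starting.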
Benefits:
Pick up long scrapes with the --resume flag

# 1. Scrape + Build + AI Enhancement (LOCAL, no API key)
skill-seekers scrape --config configs/godot.json --enhance-local
# 2. Wait for new terminal to close (enhancement completes)
# Check the enhanced SKILL.md:
cat output/godot/SKILL.md
# 3. Package
skill-seekers package output/godot/
# 4. Done! You have godot.zip with excellent SKILL.md
Time: 20-40 minutes (scraping) + 60 seconds (enhancement) = ~21-41 minutes
# 1. Use cached data + Local Enhancement
skill-seekers scrape --config configs/godot.json --skip-scrape
skill-seekers enhance output/godot/
# 2. Package
skill-seekers package output/godot/
# 3. Done!
Time: 1-3 minutes (build) + 60 seconds (enhancement) = ~2-4 minutes total
# 1. Scrape + Build (no enhancement)
skill-seekers scrape --config configs/godot.json
# 2. Package
skill-seekers package output/godot/
# 3. Done! (SKILL.md will be basic template)
Time: 20-40 minutes
Note: SKILL.md will be generic - enhancement strongly recommended!
| Config | Framework | Description |
|---|---|---|
| `godot.json` | Godot Engine | Game development |
| `react.json` | React | UI framework |
| `vue.json` | Vue.js | Progressive framework |
| `django.json` | Django | Python web framework |
| `fastapi.json` | FastAPI | Modern Python API |
| `ansible-core.json` | Ansible Core 2.19 | Automation & configuration |
# Godot
skill-seekers scrape --config configs/godot.json
# React
skill-seekers scrape --config configs/react.json
# Vue
skill-seekers scrape --config configs/vue.json
# Django
skill-seekers scrape --config configs/django.json
# FastAPI
skill-seekers scrape --config configs/fastapi.json
# Ansible
skill-seekers scrape --config configs/ansible-core.json
skill-seekers scrape --interactive
# Follow prompts, it will create the config for you
# Copy a preset
cp configs/react.json configs/myframework.json
# Edit it
nano configs/myframework.json
# Use it
skill-seekers scrape --config configs/myframework.json
{
"name": "myframework",
"description": "When to use this skill",
"base_url": "https://docs.myframework.com/",
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": ["/docs", "/guide"],
"exclude": ["/blog", "/about"]
},
"categories": {
"getting_started": ["intro", "quickstart"],
"api": ["api", "reference"]
},
"rate_limit": 0.5,
"max_pages": 500
}
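The `url_patterns` section above controls which discovered links get scraped. A sketch of the assumed semantics (excludes win, then at least one include must match the URL path):

```python
from urllib.parse import urlparse

# Patterns from the example config above
INCLUDE = ["/docs", "/guide"]
EXCLUDE = ["/blog", "/about"]

def should_scrape(url: str) -> bool:
    # Match on the path only, so a hostname like docs.myframework.com
    # doesn't accidentally satisfy the "/docs" include pattern
    path = urlparse(url).path
    if any(pat in path for pat in EXCLUDE):
        return False
    return any(pat in path for pat in INCLUDE)
```

Whether the real scraper matches substrings or prefixes is an implementation detail; the exclude-then-include precedence is the part worth understanding when a config scrapes too much or too little.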
output/ ├── godot_data/ # Scraped raw data │ ├── pages/ # JSON files (one per page) │ └── summary.json # Overview │ └── godot/ # The skill ├── SKILL.md # Enhanced with real examples ├── references/ # Categorized docs │ ├── index.md │ ├── getting_started.md │ ├── scripting.md │ └── ... ├── scripts/ # Empty (add your own) └── assets/ # Empty (add your own)
# Interactive mode
skill-seekers scrape --interactive
# Use config file
skill-seekers scrape --config configs/godot.json
# Quick mode
skill-seekers scrape --name react --url https://react.dev/
# Skip scraping (use existing data)
skill-seekers scrape --config configs/godot.json --skip-scrape
# With description
skill-seekers scrape \
--name react \
--url https://react.dev/ \
--description "React framework for building UIs"
Edit max_pages in config to test:
{
"max_pages": 20 // Test with just 20 pages
}
# Scrape once
skill-seekers scrape --config configs/react.json
# Rebuild multiple times (instant)
skill-seekers scrape --config configs/react.json --skip-scrape
skill-seekers scrape --config configs/react.json --skip-scrape
# Test in Python
from bs4 import BeautifulSoup
import requests
url = "https://docs.example.com/page"
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# Try different selectors
print(soup.select_one('article'))
print(soup.select_one('main'))
print(soup.select_one('div[role="main"]'))
# After building, check:
cat output/godot/SKILL.md # Should have real examples
cat output/godot/references/index.md # Categories
Check your main_content selector - common values: article, main, div[role="main"]

# Force re-scrape
rm -rf output/myframework_data/
skill-seekers scrape --config configs/myframework.json
Edit the config categories section with better keywords.
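To see why keyword choice matters, here is a sketch of how the `categories` section plausibly maps pages to categories (assumed behavior: first category whose keyword appears in the page URL or title wins):

```python
def categorize(url: str, title: str, categories: dict) -> str:
    """Return the first matching category name, or 'other'."""
    text = (url + " " + title).lower()
    for name, keywords in categories.items():
        if any(k in text for k in keywords):
            return name
    return "other"

# Categories from the example config above
categories = {
    "getting_started": ["intro", "quickstart"],
    "api": ["api", "reference"],
}
```

If too many pages land in a catch-all bucket, broaden the keywords; if pages land in the wrong bucket, make earlier categories' keywords more specific.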
# Delete old data
rm -rf output/godot_data/
# Re-scrape
skill-seekers scrape --config configs/godot.json
| Task | Time | Notes |
|---|---|---|
| Scraping (sync) | 15-45 min | First time only, thread-based |
| Scraping (async) | 5-15 min | 2-3x faster with --async flag |
| Building | 1-3 min | Fast! |
| Re-building | <1 min | With --skip-scrape |
| Packaging | 5-10 sec | Final zip |
One tool does everything:
Simple structure:
- doc_scraper.py - The tool
- configs/ - Presets
- output/ - Everything else

Better output:
# Try Godot
skill-seekers scrape --config configs/godot.json
# Try React
skill-seekers scrape --config configs/react.json
# Or go interactive
skill-seekers scrape --interactive
MIT License - see LICENSE file for details
Happy skill building! 🚀