DuckDB Expert
Expert skill for large-scale data analysis with DuckDB. Covers DuckDB's architecture, SQL syntax, performance optimization, and efficient reading and writing of file formats such as CSV, Parquet, and JSON. Supports memory-efficient analysis, optimization of complex queries, and data pipeline construction. Use when analyzing large datasets, querying CSV/Parquet/JSON files directly, building data pipelines, or optimizing analytical queries with DuckDB.
No API Required
1. Overview
DuckDB is an embedded, column-oriented database optimized for OLAP (online analytical processing). This skill supports efficient data analysis with DuckDB, processing of large datasets, and performance optimization.
2. Prerequisites
- Python 3.8+ with the duckdb package installed
- Optional: pandas, polars, pyarrow for DataFrame integration
# Install DuckDB (required)
pip install duckdb
# Optional: Install DataFrame libraries
pip install pandas polars pyarrow
3. Quick Start
1. Connect to DuckDB (in-memory or persistent)
2. Load/query data files directly (CSV, Parquet, JSON)
3. Transform and analyze using SQL
4. Export results (DataFrame, file, or database)
4. How It Works
DuckDB runs in-process: you open an in-memory or persistent connection, query CSV, Parquet, or JSON files directly with SQL (no separate import step), transform and analyze the data, and export the results to a DataFrame, a file, or a database table.
5. Usage Examples
- Query large CSV, Parquet, or JSON files directly
- Analyze datasets too large to fit in memory
- Build data transformations and ETL pipelines with SQL
- Perform fast data processing integrated with pandas/Polars
- Build a file-based data warehouse
- Optimize the performance of complex analytical queries
6. Understanding the Output
- Data Profile Report: Markdown/JSON report with schema, statistics, quality metrics
- Query Results: pandas/Polars DataFrame, Arrow table, or raw tuples
- Transformed Files: Parquet, CSV, or JSON with optional compression/partitioning
7. Tips & Best Practices
- Begin with the smallest realistic sample input so you can validate the workflow before scaling up.
- Keep skills/duckdb-expert/SKILL.md open while working; it remains the authoritative source for the full procedure.
- Review the most relevant reference files first instead of scanning every guide: duckdb_functions_reference.md, file_formats_guide.md, performance_tuning_guide.md.
- Run helper scripts on test data before using them on final assets or production-bound inputs: etl_pipeline.py, duckdb_analyzer.py.
- Preserve intermediate outputs so you can explain assumptions, diffs, and follow-up actions clearly.
8. Combining with Other Skills
- Combine this skill with adjacent skills in the same category when the work spans planning, implementation, and review.
- Browse the broader category index for neighboring workflows.
- Use the English skill catalog when you need to chain this workflow into a larger end-to-end process.
9. Troubleshooting
- Re-check prerequisites first: missing runtime dependencies and unsupported file formats are the most common failures.
- If a helper script is involved, run it with a minimal sample input before applying it to a full dataset or repository.
- Compare your input shape against the reference files to confirm expected fields, sections, or metadata are present.
- Confirm the expected Python version and required packages are installed in the active environment.
- When output looks incomplete, inspect the script arguments and rerun with explicit input/output paths.
10. Reference
References:
skills/duckdb-expert/references/duckdb_functions_reference.md
skills/duckdb-expert/references/file_formats_guide.md
skills/duckdb-expert/references/performance_tuning_guide.md
Scripts:
skills/duckdb-expert/scripts/duckdb_analyzer.py
skills/duckdb-expert/scripts/etl_pipeline.py