DuckDB Expert

DuckDBを使用した大規模データ分析の専門スキル。DuckDBのアーキテクチャ、SQL構文、パフォーマンス最適化、各種ファイル形式（CSV、Parquet、JSON等）の効率的な読み込み・書き出しを熟知。メモリ効率の良い分析、複雑なクエリの最適化、データパイプライン構築を支援。Use when analyzing large datasets, querying CSV/Parquet/JSON files directly, building data pipelines, or optimizing analytical queries with DuckDB.

No API Required

Download Skill Package (.skill) View Source on GitHub

Table of Contents

1. Overview
2. Prerequisites
3. Quick Start
4. How It Works
5. Usage Examples
6. Understanding the Output
7. Tips & Best Practices
8. Combining with Other Skills
9. Troubleshooting
10. Reference

1. Overview

DuckDBは、OLAP（オンライン分析処理）に最適化された組み込み型の列指向データベースです。このスキルは、DuckDBを使用した効率的なデータ分析、大規模データセットの処理、パフォーマンス最適化を支援します。

2. Prerequisites

Python 3.8+ with duckdb package installed
Optional: pandas, polars, pyarrow for DataFrame integration

# Install DuckDB (required)
pip install duckdb

# Optional: Install DataFrame libraries
pip install pandas polars pyarrow

3. Quick Start

Connect to DuckDB (in-memory or persistent)
Load/query data files directly (CSV, Parquet, JSON)
Transform and analyze using SQL
Export results (DataFrame, file, or database)

4. How It Works

Connect to DuckDB (in-memory or persistent)
Load/query data files directly (CSV, Parquet, JSON)
Transform and analyze using SQL
Export results (DataFrame, file, or database)

5. Usage Examples

大規模なCSV、Parquet、JSONファイルを直接クエリしたい
メモリに収まらない大きなデータセットを分析したい
SQLを使ってデータ変換やETLパイプラインを構築したい
pandas/Polarsと連携した高速なデータ処理を行いたい
ファイルベースのデータウェアハウスを構築したい
複雑な分析クエリのパフォーマンスを最適化したい

6. Understanding the Output

Data Profile Report: Markdown/JSON report with schema, statistics, quality metrics
Query Results: pandas/Polars DataFrame, Arrow table, or raw tuples
Transformed Files: Parquet, CSV, or JSON with optional compression/partitioning

7. Tips & Best Practices

Begin with the smallest realistic sample input so you can validate the workflow before scaling up.
Keep skills/duckdb-expert/SKILL.md open while working; it remains the authoritative source for the full procedure.
Review the most relevant reference files first instead of scanning every guide: duckdb_functions_reference.md, file_formats_guide.md, performance_tuning_guide.md.
Run helper scripts on test data before using them on final assets or production-bound inputs: etl_pipeline.py, duckdb_analyzer.py.
Preserve intermediate outputs so you can explain assumptions, diffs, and follow-up actions clearly.

8. Combining with Other Skills

Combine this skill with adjacent skills in the same category when the work spans planning, implementation, and review.
Browse the broader category for neighboring workflows: category index.
Use the English skill catalog when you need to chain this workflow into a larger end-to-end process.

9. Troubleshooting

Re-check prerequisites first: missing runtime dependencies and unsupported file formats are the most common failures.
If a helper script is involved, run it with a minimal sample input before applying it to a full dataset or repository.
Compare your input shape against the reference files to confirm expected fields, sections, or metadata are present.
Confirm the expected Python version and required packages are installed in the active environment.
When output looks incomplete, inspect the script arguments and rerun with explicit input/output paths.

10. Reference

References:

skills/duckdb-expert/references/duckdb_functions_reference.md
skills/duckdb-expert/references/file_formats_guide.md
skills/duckdb-expert/references/performance_tuning_guide.md

Scripts:

skills/duckdb-expert/scripts/duckdb_analyzer.py
skills/duckdb-expert/scripts/etl_pipeline.py