Introduction
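Parquet Tools is a command-line utility for inspecting, converting, and analyzing Parquet files. This guide covers installation methods and common usage patterns. Note that more than one program is distributed under the name parquet-tools, and they do not share the same subcommands: show, inspect, and csv belong to the Python package installed with pip, while rowcount, head, and cat belong to the Java-based tool. Run parquet-tools --help to confirm which commands your installation supports.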
Installation
Using Homebrew
# Install Parquet Tools using Homebrew
brew install parquet-tools
Using pip
# Install Parquet Tools using pip
pip install parquet-tools
Basic Usage
File Inspection
# Show file contents as a table
parquet-tools show FILE.parquet
# Inspect file schema and metadata
parquet-tools inspect FILE.parquet
# Count rows in a file
parquet-tools rowcount buildings_8563.parquet
Data Viewing
# View first few rows
parquet-tools head buildings_8563.parquet
# View all rows in JSON format
parquet-tools cat --json buildings_8563.parquet
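Printing every row of a large file can flood a terminal. The sketch below uses a hypothetical helper named peek_jsonl, which is not part of Parquet Tools, to echo only the first few records of a stream and then report the total. It assumes the JSON output holds one record per line, which is worth confirming for your build.

```shell
# peek_jsonl is a hypothetical helper, not a Parquet Tools command.
# It reads JSON Lines from stdin, prints the first N records (default 5),
# and finishes with a total record count.
peek_jsonl() {
  n="${1:-5}"
  awk -v n="$n" '
    NF == 0 { next }            # skip blank lines
    { total++ }
    total <= n { print }        # echo only the first n records
    END { printf "records: %d\n", total }
  '
}

# Usage (assumes your build supports cat --json):
# parquet-tools cat --json buildings_8563.parquet | peek_jsonl 3
```

Because awk processes the stream line by line, the helper holds only one record in memory at a time.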
File Conversion
# Convert Parquet to CSV
parquet-tools csv FILE.parquet > FILE.csv
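After a conversion it can be useful to confirm that no rows were lost. The following is a minimal sketch using a hypothetical helper, csv_data_rows, that counts the data rows in the resulting CSV. It assumes the first line is a header and that no quoted field contains a line break; if either assumption fails, the count will be off.

```shell
# csv_data_rows is a hypothetical helper, not a Parquet Tools command.
# It prints the number of lines in a CSV file minus one header line.
# Assumption: no quoted field spans multiple lines.
csv_data_rows() {
  awk 'END { print (NR > 0 ? NR - 1 : 0) }' "$1"
}

# Usage:
# parquet-tools csv FILE.parquet > FILE.csv
# csv_data_rows FILE.csv
```

The printed number can then be compared against the row count reported for the original Parquet file.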
Common Use Cases
Schema Analysis
# Get detailed schema information (in the pip version, --detail is an option of inspect)
parquet-tools inspect --detail FILE.parquet
Data Sampling
# View a random sample of rows (not every parquet-tools build provides a sample subcommand; confirm with parquet-tools --help)
parquet-tools sample --num-rows 100 FILE.parquet
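Where a build offers no sampling command, a uniform random sample can be drawn from any line-oriented output instead. The sketch below implements reservoir sampling (Algorithm R) in awk through a hypothetical helper named sample_lines. The helper name, its optional seed argument, and the assumption that each row arrives on its own line are illustrative and not part of Parquet Tools.

```shell
# sample_lines is a hypothetical helper, not a Parquet Tools command.
# Reservoir sampling (Algorithm R): keep K lines chosen uniformly at random
# from stdin while holding only K lines in memory.
# Arguments: K (sample size), optional SEED for repeatable results.
sample_lines() {
  k="$1"
  seed="${2:-}"
  awk -v k="$k" -v seed="$seed" '
    BEGIN { if (seed != "") srand(seed); else srand() }
    NR <= k { pool[NR] = $0; next }          # fill the reservoir first
    {
      slot = int(rand() * NR) + 1            # uniform integer in 1..NR
      if (slot <= k) pool[slot] = $0         # replace with probability k/NR
    }
    END {
      n = (NR < k) ? NR : k                  # fewer than k lines: return them all
      for (i = 1; i <= n; i++) print pool[i]
    }
  '
}

# Usage (assumes one JSON record per line):
# parquet-tools cat --json FILE.parquet | sample_lines 100
```

The sample is returned in reservoir order rather than file order, and the input is read only once, so the approach also suits files too large to load in full.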
Best Practices
- Check the file schema before processing
- Choose an output format that matches the consumer, such as CSV for spreadsheet tools
- Limit the number of rows printed when viewing large files, since dumping every row can exhaust memory
- Record row counts before and after each conversion to validate the data
- Use JSON output for programmatic processing
- Validate schemas regularly
- Document any schema changes
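Schema validation can be automated by comparing the current schema text against a saved baseline. The sketch below uses a hypothetical helper named check_schema. The file names and return codes are illustrative assumptions. Inspection output may also include values that change without a schema change, such as row counts, so it may need filtering before comparison.

```shell
# check_schema is a hypothetical helper, not a Parquet Tools command.
# It compares a file holding the current schema text against a baseline.
# Returns 0 if they match, 1 if they differ (and prints the diff),
# 2 if no baseline existed (the current schema is saved as the baseline).
check_schema() {
  current="$1"
  baseline="$2"
  if [ ! -f "$baseline" ]; then
    cp "$current" "$baseline"
    return 2
  fi
  if cmp -s "$current" "$baseline"; then
    return 0
  fi
  diff "$baseline" "$current"
  return 1
}

# Usage:
# parquet-tools inspect FILE.parquet > current_schema.txt
# check_schema current_schema.txt baseline_schema.txt
```

Keeping the baseline file under version control gives a simple record of when and how the schema changed.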