LinuxCommandLibrary

parquet-tools

TLDR

Show schema

$ parquet-tools schema [file.parquet]
View data
$ parquet-tools cat [file.parquet]
Show metadata
$ parquet-tools meta [file.parquet]
View first N rows
$ parquet-tools head -n [10] [file.parquet]
Show row count
$ parquet-tools rowcount [file.parquet]
Convert to JSON
$ parquet-tools cat --json [file.parquet]
Show column index
$ parquet-tools column-index [file.parquet]

SYNOPSIS

parquet-tools command [options] file

DESCRIPTION

parquet-tools inspects Apache Parquet files. Parquet is a columnar storage format widely used in big data systems.
The schema command shows column names, types, and nesting, which reveals the data structure without reading the contents.
The cat and head commands print the actual rows. JSON output makes the data easy to pass to other tools.
The meta command shows compression, encoding, and statistics. Its row group and column chunk details reveal the physical layout of the file.
Parquet files written by Spark, Hive, and other systems can be examined, which is useful for debugging data pipelines.
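The inspection steps described above can be chained into one small helper. This is a minimal sketch, not part of parquet-tools: the function name `inspect_parquet` and the `PARQUET_TOOLS` override variable are hypothetical, and the subcommands are the ones listed on this page.

```shell
# Hypothetical helper: print schema, row count, and the first rows of a Parquet file.
# PARQUET_TOOLS is an assumed override; it defaults to the parquet-tools found on PATH.
inspect_parquet() {
    file="$1"
    rows="${2:-5}"   # number of rows to preview, 5 if not given
    if [ -z "$file" ]; then
        echo "usage: inspect_parquet FILE [ROWS]" >&2
        return 2
    fi
    tool="${PARQUET_TOOLS:-parquet-tools}"

    echo "== schema =="
    "$tool" schema "$file" || return 1
    echo "== row count =="
    "$tool" rowcount "$file" || return 1
    echo "== first $rows rows =="
    "$tool" head -n "$rows" "$file"
}
```

After sourcing the function, `inspect_parquet [file.parquet] 10` prints the three sections in order and stops at the first subcommand that fails.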

PARAMETERS

cat
Print file contents.
head
Print first rows.
schema
Show schema.
meta
Show file metadata.
rowcount
Count rows.
column-index
Show column index.
-n N
Number of rows to print (used with head).
--json
Print output as JSON (used with cat).
--columns COLS
Restrict output to the listed columns.
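As a sketch of how the --json option can feed other tools, the helper below writes the JSON output to a file and reports its line count. The function name `export_json` and the `PARQUET_TOOLS` override are hypothetical, and the assumption that each record occupies one line of output should be checked against the installed version.

```shell
# Hypothetical helper: dump a Parquet file as JSON and report how many lines were written.
# PARQUET_TOOLS is an assumed override; it defaults to the parquet-tools found on PATH.
export_json() {
    src="$1"
    dest="$2"
    if [ -z "$src" ] || [ -z "$dest" ]; then
        echo "usage: export_json FILE.parquet OUT.json" >&2
        return 2
    fi
    "${PARQUET_TOOLS:-parquet-tools}" cat --json "$src" > "$dest" || return 1
    # Assumption: one JSON record per line, so the line count approximates the record count.
    lines=$(wc -l < "$dest" | tr -d ' ')
    echo "$dest: $lines lines"
}
```

The resulting file can then be passed to any line-oriented tool such as grep or wc.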

CAVEATS

Large files can be slow to read in full, so prefer head over cat when sampling data. Some complex types may be displayed differently than expected. Requires a Java runtime.
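Because a missing runtime only shows up when the command fails, a quick preflight check can help. This is a minimal sketch: the function name `check_commands` is hypothetical, and it assumes the tool is installed under the name `parquet-tools` on PATH.

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 when all are present, 1 otherwise.
check_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_commands java parquet-tools && parquet-tools schema file.parquet
```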

HISTORY

The Parquet format was developed by Twitter and Cloudera around 2013. parquet-tools provides command-line inspection for this widely adopted columnar format.
