orc-tools
TLDR
Show ORC file metadata
$ orc-metadata [file.orc]
Show file contents
$ orc-contents [file.orc]
Get file statistics
$ orc-statistics [file.orc]
Convert CSV to ORC
$ orc-tools convert [data.csv] -o [output.orc]
Scan ORC file
$ orc-scan [file.orc]
Merge ORC files
$ orc-tools merge [file1.orc] [file2.orc] -o [merged.orc]
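The single-file commands above can be wrapped in a loop when several files need checking. The following is a minimal sketch, assuming the files sit in one directory and end in .orc; the function name scan_dir is hypothetical, and orc-scan is invoked exactly as in the entry above.

```shell
# Sketch: run orc-scan over every .orc file in a directory.
# scan_dir is a hypothetical helper name, not part of the ORC tools.
scan_dir() {
    dir=$1
    found=0
    for f in "$dir"/*.orc; do
        # Skip the unexpanded pattern when the directory has no .orc files.
        [ -e "$f" ] || continue
        found=1
        echo "== $f"
        # Keep going if one file fails to scan, but report it on stderr.
        orc-scan "$f" || echo "scan failed: $f" >&2
    done
    [ "$found" -eq 1 ] || echo "no .orc files in $dir"
}
```

Calling scan_dir with a directory path prints a header line per file followed by whatever orc-scan reports for it.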
SYNOPSIS
orc-tools command [options] files...
DESCRIPTION
orc-tools is a collection of utilities for working with Apache ORC (Optimized Row Columnar) files. ORC is a columnar storage format optimized for Hadoop workloads.
The tools allow inspection, conversion, and manipulation of ORC files.
PARAMETERS
metadata
Show file metadata.
contents
Display contents.
statistics
Show statistics.
convert
Convert to ORC.
scan
Scan and validate.
merge
Merge files.
-o file
Output file.
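A common way to combine these commands is to convert a file and then inspect the result. The sketch below assumes the command forms shown in the TLDR section (orc-tools convert with -o, then orc-metadata); the function name csv_to_orc and the file paths are hypothetical.

```shell
# Sketch: convert a CSV file to ORC, then show the new file's metadata.
# csv_to_orc is a hypothetical helper name; paths are placeholders.
csv_to_orc() {
    src=$1
    dst=$2
    # Stop early with a clear message if the input file is missing.
    [ -f "$src" ] || { echo "missing input: $src" >&2; return 1; }
    # -o names the output file, as described above.
    orc-tools convert "$src" -o "$dst" || return 1
    orc-metadata "$dst"
}
```

For example, csv_to_orc data.csv output.orc would write output.orc and then print its metadata, provided both tools are on the PATH.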
ORC FEATURES
- Columnar storage
- Compression (ZLIB, Snappy, LZO, LZ4, Zstandard)
- Predicate pushdown
- Type evolution
- ACID support
CAVEATS
Java is required. Large files may need JVM memory tuning, such as a larger heap. The tools are part of the Apache ORC project.
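One possible way to handle both caveats is to check for Java first and then raise the heap limit for a single run. This is a sketch under assumptions: it presumes orc-tools starts a JVM that honors the standard JAVA_TOOL_OPTIONS environment variable, the 4g heap size in the usage note is illustrative, and the helper names require_java, heap_opts, and convert_with_heap are hypothetical.

```shell
# Sketch: verify Java is available, then convert with a larger JVM heap.
# Assumption: the JVM behind orc-tools reads JAVA_TOOL_OPTIONS at startup.

# Report whether a java executable is on the PATH.
require_java() {
    if command -v java >/dev/null 2>&1; then
        echo "java found"
    else
        echo "java not found" >&2
        return 1
    fi
}

# Build the JVM heap option for a size such as 4g.
heap_opts() {
    printf '%s' "-Xmx$1"
}

# Run the conversion with the requested maximum heap size.
convert_with_heap() {
    heap=$1
    src=$2
    dst=$3
    require_java >/dev/null || return 1
    JAVA_TOOL_OPTIONS="$(heap_opts "$heap")" orc-tools convert "$src" -o "$dst"
}
```

For example, convert_with_heap 4g data.csv output.orc would run the conversion with a 4 GB heap limit. The JVM usually prints a "Picked up JAVA_TOOL_OPTIONS" notice on stderr when the variable is set.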
HISTORY
Apache ORC was created at Hortonworks as an efficient columnar storage format for Hive and later became a top-level Apache project.
SEE ALSO
parquet-tools(1), hive(1), spark(1)


