orc-tools

TLDR

Show ORC file metadata

$ orc-metadata [file.orc]
Show file contents
$ orc-contents [file.orc]
Get file statistics
$ orc-statistics [file.orc]
Convert CSV to ORC
$ orc-tools convert [data.csv] -o [output.orc]
Scan ORC file
$ orc-scan [file.orc]
Merge ORC files
$ orc-tools merge [file1.orc] [file2.orc] -o [merged.orc]
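
As a rough sketch of how the convert example above might be applied in bulk, the shell function below converts every CSV file in a directory to a matching ORC file. The name csv_dir_to_orc is hypothetical, and the orc-tools convert invocation is assumed to work exactly as written in the example above; check it against the version you have installed.

```shell
#!/bin/sh
# Sketch: convert each *.csv in a directory to a matching *.orc file.
# csv_dir_to_orc is a hypothetical helper, not part of Apache ORC.
csv_dir_to_orc() {
    dir=${1:-.}
    for csv in "$dir"/*.csv; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$csv" ] || continue
        out=${csv%.csv}.orc
        echo "converting $csv -> $out"
        # Invocation assumed from the convert example on this page.
        orc-tools convert "$csv" -o "$out" || return 1
    done
}
```

Calling csv_dir_to_orc ./exports would write each .orc file next to its source CSV and stop at the first failed conversion.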

SYNOPSIS

orc-tools command [options] files...

DESCRIPTION

orc-tools is a collection of utilities for working with Apache ORC (Optimized Row Columnar) files. ORC is a columnar storage format optimized for Hadoop workloads.
The tools allow inspection, conversion, and manipulation of ORC files.
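
To illustrate the inspection side, here is a minimal sketch that runs whichever of the inspection commands listed in the TLDR section are installed against a single file. The helper name inspect_orc is an assumption made for this example; the tool names come from the examples on this page, and each tool is called with only the file path as its argument.

```shell
#!/bin/sh
# Sketch: run each ORC inspection tool found on PATH against one file.
# inspect_orc is a hypothetical helper name, not part of Apache ORC.
inspect_orc() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "inspect_orc: no such file: $file" >&2
        return 1
    fi
    for tool in orc-metadata orc-statistics orc-contents; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $tool $file =="
            "$tool" "$file"
        else
            # Tools may be packaged separately, so a missing one is not fatal.
            echo "skipping $tool (not on PATH)" >&2
        fi
    done
}
```

Running inspect_orc file.orc prints a header line before each tool's output, which keeps the three reports easy to tell apart in a log.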

PARAMETERS

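metadata
Show file metadata. Run as orc-metadata.
contents
Display the file's contents. Run as orc-contents.
statistics
Show file statistics. Run as orc-statistics.
convert
Convert an input file such as CSV to ORC. Run as orc-tools convert.
scan
Scan and validate the file. Run as orc-scan.
merge
Merge several ORC files into one. Run as orc-tools merge.
-o file
Output file for convert and merge.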

ORC FEATURES

- Columnar storage
- Compression (ZLIB, Snappy, LZO, LZ4, Zstandard)
- Predicate pushdown
- Type evolution
- ACID support

CAVEATS

A Java runtime is required. Large files may need memory tuning, typically a larger JVM heap. The tools are distributed as part of the Apache ORC project.
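
One way to approach memory tuning, sketched under assumptions: if the tools are run from the Apache ORC uber JAR, the JVM heap limit can be raised with the standard -Xmx option. The wrapper name orc_java, the variables ORC_TOOLS_JAR and ORC_HEAP, the placeholder JAR file name, and the 4g default are all illustrative choices and are not defined by this page.

```shell
#!/bin/sh
# Sketch: run the ORC Java tools with an explicit JVM heap limit.
# Assumption: the tools JAR is available locally; the file name below is a
# placeholder, so point ORC_TOOLS_JAR at the JAR you actually downloaded.
ORC_TOOLS_JAR=${ORC_TOOLS_JAR:-orc-tools-X.Y.Z-uber.jar}
# Assumption: 4g is only an example default; size it to your files.
ORC_HEAP=${ORC_HEAP:-4g}

# orc_java is a hypothetical wrapper; it forwards all arguments to the JAR.
orc_java() {
    java "-Xmx$ORC_HEAP" -jar "$ORC_TOOLS_JAR" "$@"
}
```

For example, ORC_HEAP=8g orc_java meta file.orc would request an 8 GB heap for a single run, assuming the JAR exposes a meta subcommand.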

HISTORY

ORC was created at Hortonworks in 2013 as a more efficient columnar storage format for Apache Hive, and became a top-level Apache project in 2015.

SEE ALSO

parquet-tools(1), hive(1), spark(1)
