hive

Execute SQL-like queries on Hadoop data

TLDR

Start a Hive interactive shell

$ hive

Run a HiveQL query
$ hive -e "[hiveql_query]"

Run a HiveQL file with a variable substitution
$ hive [-d|--define] [key]=[value] -f [path/to/file.sql]

Start Hive with a configuration property set (e.g. mapred.reduce.tasks=32)
$ hive --hiveconf [conf_name]=[conf_value]

SYNOPSIS

hive [options]...

PARAMETERS

-d variable=value
    Define a Hive variable.

-e query
    Execute the specified Hive query.

-f filename
    Execute the Hive commands from the specified file.

-H
    Display usage information.

-i filename
    Initialization SQL file to execute.

-p port
    Connect to Hive Server on the specified port.

--hiveconf property=value
    Use value for given property.

--hivevar name=value
    Variable substitution to apply to Hive commands.

-v
    Verbose mode (show all commands executed).
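
As a rough illustration of how `-f` and `--hivevar` combine, the sketch below writes a small HiveQL script that references `${hivevar:...}` placeholders and assembles the matching command line. The table name `web_logs`, the variable names `log_date` and `max_rows`, and the helper `build_hive_cmd` are hypothetical. The command is only printed, not executed, so it can be inspected without a Hive installation.

```shell
#!/bin/sh
# Hypothetical HiveQL script: web_logs, log_date and max_rows are illustrative names.
# The quoted 'EOF' stops the shell from expanding ${hivevar:...} itself.
script=$(mktemp)
cat > "$script" <<'EOF'
SELECT * FROM web_logs
WHERE dt = '${hivevar:log_date}'
LIMIT ${hivevar:max_rows};
EOF

# Print the hive invocation that would run a script with the given variables.
# Arguments: log date, row limit, path to the HiveQL file.
build_hive_cmd() {
    printf 'hive --hivevar log_date=%s --hivevar max_rows=%s -f %s\n' "$1" "$2" "$3"
}

build_hive_cmd 2024-01-01 10 "$script"
```

Running the printed command on a machine with Hive would substitute both values into the script before it executes.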

DESCRIPTION

The `hive` command is the command-line interface to Apache Hive, a data warehouse system built on top of Hadoop. It lets users write and execute HiveQL queries, manage tables, and work with data stored in the Hadoop Distributed File System (HDFS) or other compatible storage systems, without writing Java code or using lower-level APIs.

The command supports options for configuration, connection management, and script execution, which makes it a common tool for data analysts, data engineers, and database administrators working with large datasets in Hadoop-based environments. The version of the `hive` client should match the version of Hive it talks to; a mismatch can lead to errors.

CAVEATS

Ensure the Hadoop environment is properly configured and accessible before running the `hive` command. A version mismatch between the Hive client and server can cause errors. Incorrectly formatted HiveQL queries can fail or produce unexpected results.

EXIT CODES

The `hive` command returns 0 on success and a non-zero value on failure. Common causes of failure include invalid HiveQL syntax, connection issues, or errors during query execution.
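
The exit status can be checked from a script in the usual shell way. The following is a minimal sketch, not a canonical wrapper: `run_hive_query` and the `HIVE_BIN` variable are hypothetical names introduced here so the function can be exercised with a stand-in command when Hive is not installed.

```shell
#!/bin/sh
# HIVE_BIN is a hypothetical override; it defaults to the real hive command.
HIVE_BIN=${HIVE_BIN:-hive}

# Run one HiveQL statement with -e and report a failure based on the exit status.
run_hive_query() {
    "$HIVE_BIN" -e "$1"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "hive exited with status $status" >&2
    fi
    return "$status"
}
```

A caller can then stop on the first failure, for example with `run_hive_query "SHOW DATABASES;" || exit 1`.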

CONFIGURATION FILES

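Hive's behavior can be customized through configuration files such as `hive-site.xml` and `hive-default.xml`, with site-specific settings normally placed in `hive-site.xml`. These files define properties like database connection settings, memory allocation, and other parameters that affect Hive's operation. Individual properties can also be set for a single invocation with `--hiveconf`.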
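
A minimal sketch of a site-level override, assuming the CLI reads `hive-site.xml` from the directory named by the `HIVE_CONF_DIR` environment variable. The temporary directory and the choice of property (`hive.cli.print.header`) are illustrative only; a real deployment would usually edit the `hive-site.xml` in its existing configuration directory.

```shell
#!/bin/sh
# Create a throwaway configuration directory (illustrative only).
conf_dir=$(mktemp -d)

# Write a hive-site.xml with a single property override.
cat > "$conf_dir/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
</configuration>
EOF

# Point later hive invocations in this shell at the new directory.
export HIVE_CONF_DIR="$conf_dir"
echo "Using configuration in $HIVE_CONF_DIR"
```

The same property could instead be set for a single run with `hive --hiveconf hive.cli.print.header=true`.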

HISTORY

Apache Hive was initially developed by Facebook and later became an Apache project. It was created to provide a SQL-like interface for querying and analyzing large datasets stored in Hadoop. The `hive` command-line interface has been a fundamental part of Hive since its inception, offering a convenient way for users to interact with the Hive data warehouse system.

Early versions focused on basic query execution and table management. Over time, the `hive` command has been enhanced with new features, improved performance, and better integration with other Hadoop ecosystem components.

SEE ALSO

hadoop(1), beeline(1)
