hive
Execute SQL-like queries on Hadoop data
TLDR
Start a Hive interactive shell
Run HiveQL
Run a HiveQL file with a variable substitution
Run HiveQL with a Hive configuration property set (e.g. mapred.reduce.tasks=32)
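The entries above list no commands, so the following is a minimal sketch of what they might look like. The function names, queries, variable names, and file paths are hypothetical placeholders; only the `hive` options themselves come from the PARAMETERS section.

```shell
# Sketch only: helper names, queries, and paths are hypothetical placeholders.

# Start an interactive shell (left commented so this snippet stays non-interactive).
# hive

run_query() {
  # -e executes the quoted HiveQL string and then exits.
  hive -e "$1"
}

run_file_with_var() {
  # -d defines a variable; -f runs the commands in the given file.
  hive -d "$1" -f "$2"
}

run_with_conf() {
  # --hiveconf sets a configuration property for this invocation only.
  hive --hiveconf "$1" -e "$2"
}
```

For example, `run_query "SHOW TABLES;"`, `run_file_with_var day=2024-01-01 path/to/report.hql`, or `run_with_conf mapred.reduce.tasks=32 "SELECT 1;"`.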
SYNOPSIS
hive [options]...
PARAMETERS
-d variable=value
Define a Hive variable.
-e query
Execute the specified Hive query.
-f filename
Execute the Hive commands from the specified file.
-H
Display usage information.
-i filename
Initialization SQL file to execute.
-p port
Connect to Hive Server on the specified port.
--hiveconf property=value
Use value for given property.
--hivevar name=value
Variable substitution to apply to Hive commands.
-v
Verbose mode (show all commands executed).
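As an illustration of how the variable and file options combine, the sketch below writes a small script that references a variable as `${hivevar:tbl}` and supplies its value on the command line. The table name `web_logs` and the variable name `tbl` are assumptions made for the example.

```shell
# Sketch: a HiveQL script that references a variable, with the value passed
# on the command line. "web_logs" and "tbl" are hypothetical names.
script="$(mktemp)"

# The quoted heredoc delimiter keeps the shell from expanding ${...} itself,
# so Hive receives the placeholder and performs the substitution.
cat > "$script" <<'EOF'
SELECT COUNT(*) FROM ${hivevar:tbl};
EOF

# Only invoke hive when it is actually installed.
if command -v hive >/dev/null 2>&1; then
  # -v echoes each executed statement, which shows the substituted query.
  hive --hivevar tbl=web_logs -v -f "$script"
fi
```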
DESCRIPTION
The `hive` command is the command-line interface to Apache Hive, a data warehouse system built on top of Hadoop. It lets users write and execute HiveQL queries, manage tables, and work with data stored in the Hadoop Distributed File System (HDFS) or other compatible storage systems, without writing Java code or using lower-level APIs. Its command-line options cover configuration, connection management, and script execution. It is commonly used by data analysts, data engineers, and database administrators working with large datasets in Hadoop-based environments.
The version of the `hive` command should match the version of the Hive installation it is used with; a mismatch may lead to errors.
CAVEATS
Ensure the Hadoop environment is properly configured and accessible before running the `hive` command. Version compatibility between Hive client and server is important. Incorrectly formatted HiveQL queries can cause errors or unexpected results.
EXIT CODES
The `hive` command returns 0 on success and a non-zero value on failure. Common causes of failure include invalid HiveQL syntax, connection issues, or errors during query execution.
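In a script, the exit status can be checked directly. The following is a minimal sketch; `run_checked` is a hypothetical helper name and the messages are illustrative.

```shell
# Sketch: run a query and branch on the exit status.
# run_checked is a hypothetical helper, not part of Hive.
run_checked() {
  if hive -e "$1"; then
    echo "query succeeded"
  else
    # In the else branch, $? still holds the status of the hive command.
    status=$?
    echo "query failed with exit code $status" >&2
    return "$status"
  fi
}
```

Returning the original status lets a calling script stop a pipeline of dependent queries at the first failure.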
CONFIGURATION FILES
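Hive's behavior can be customized through configuration files, chiefly `hive-site.xml`. Default values are listed in `hive-default.xml` (shipped as `hive-default.xml.template` in later releases), and settings placed in `hive-site.xml` override them. These files define properties such as database connection settings, memory allocation, and other parameters that affect Hive's operation.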
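To see which value Hive resolves for a property, or to override it for a single run without editing the files, something like the sketch below can be used. `show_property` is a hypothetical helper; it relies on the HiveQL `SET <property>;` statement, which prints the property's current value.

```shell
# Sketch: print the value Hive resolves for a property, optionally overriding
# it for this invocation only. show_property is a hypothetical helper.
show_property() {
  # Usage: show_property <property> [override-value]
  if [ -n "${2:-}" ]; then
    # A --hiveconf value takes effect for this run without touching hive-site.xml.
    hive --hiveconf "$1=$2" -e "SET $1;"
  else
    hive -e "SET $1;"
  fi
}
```

For example, `show_property mapred.reduce.tasks` reports the configured value, while `show_property mapred.reduce.tasks 32` reports the value with the one-off override applied.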
HISTORY
Apache Hive was initially developed by Facebook and later became an Apache project. It was created to provide a SQL-like interface for querying and analyzing large datasets stored in Hadoop. The `hive` command-line interface has been a fundamental part of Hive since its inception, offering a convenient way for users to interact with the Hive data warehouse system.
Early versions focused on basic query execution and table management. Over time, the `hive` command has been enhanced with new features, improved performance, and better integration with other Hadoop ecosystem components.
SEE ALSO
hadoop(1), beeline(1)