impala
Execute SQL queries on Impala
TLDR
Launch impala in station mode
Launch impala in Access Point mode
Switch between different sections
Select a network to connect to
Display hotkeys
SYNOPSIS
impala-shell [options]
PARAMETERS
-i hostname:port
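Specifies the Impala daemon (impalad) to connect to. If the port is omitted, the shell uses the default for the selected protocol: 21000 for Beeswax, 21050 for HiveServer2.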
-q query
Executes a single SQL query in non-interactive mode and exits. Useful for scripting.
--database=db_name
Specifies the initial database to use after connecting.
--ssl
Enables SSL/TLS encryption for the connection to Impala.
--ca_cert=path
Path to the CA certificate file for validating the Impala daemon's certificate when SSL is enabled.
-k, --kerberos
Authenticates with Kerberos (GSSAPI), using the ticket already in the credential cache (for example, one obtained with kinit).
-l, --ldap
Enables LDAP authentication.
-u user, --user=user
Specifies the username for LDAP authentication. The shell prompts for the password.
--ldap_password_cmd=command
Specifies a command whose output supplies the LDAP password, avoiding the interactive prompt.
--query_option=key=value
Sets a query option (e.g., query_timeout_s, batch_size) for the current session.
-f file
Executes SQL commands from a specified file in non-interactive mode.
--delimited
Displays query results with tab-separated values, useful for scripting.
--disable_webserver_reporting
Disables logging the shell's activities to the Impala Web UI.
-h, --help
Displays a help message and exits.
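As a rough illustration of how the options above combine, the following Python sketch assembles an argument list for a non-interactive invocation. The function name build_shell_args and the example host are hypothetical, and the executable name is a parameter: upstream Apache Impala ships the shell as impala-shell, so adjust it if your installation differs. Only the -i, --database, --ssl, --ca_cert, --query_option, --delimited, and -q options are used.

```python
from typing import Dict, List, Optional


def build_shell_args(
    host: str,
    port: int,
    query: Optional[str] = None,
    database: Optional[str] = None,
    ssl: bool = False,
    ca_cert: Optional[str] = None,
    query_options: Optional[Dict[str, str]] = None,
    delimited: bool = False,
    executable: str = "impala-shell",  # assumption: name of the installed shell
) -> List[str]:
    """Assemble an argument list from a subset of the shell's options."""
    args = [executable, "-i", f"{host}:{port}"]
    if database:
        args.append(f"--database={database}")
    if ssl:
        args.append("--ssl")
        # The CA certificate only matters when SSL/TLS is enabled.
        if ca_cert:
            args.append(f"--ca_cert={ca_cert}")
    # One --query_option flag per key=value pair.
    for key, value in (query_options or {}).items():
        args.append(f"--query_option={key}={value}")
    if delimited:
        args.append("--delimited")
    if query is not None:
        args.extend(["-q", query])
    return args
```

The resulting list can be handed to subprocess.run(args, check=True). Passing a list rather than a single string avoids shell quoting problems in the SQL text.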
DESCRIPTION
The Impala shell, installed as impala-shell in Apache Impala distributions, is a command-line interface for Apache Impala, a distributed, massively parallel SQL query engine for data stored in a Hadoop cluster.
Impala lets users run SQL queries directly against large datasets stored in HDFS, Apache Kudu, or Apache HBase, with latencies low enough for interactive analytics. The shell connects to an Impala daemon (impalad) and lets users execute SQL statements, manage databases and tables, and retrieve query results. Developers, data analysts, and administrators use it both for interactive data exploration and for scripting analytical workloads.
CAVEATS
The impala command requires a running Impala daemon to connect to. It is specifically designed for the Impala ecosystem and is not a general-purpose Linux SQL client. Network connectivity and proper security configurations (e.g., Kerberos, SSL/TLS) are crucial for successful operation in production environments. Performance can vary significantly based on cluster size, data distribution, and query complexity.
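Because the shell cannot do anything without a reachable daemon, a script may want to check connectivity before invoking it. The sketch below is one possible pre-flight check, not part of Impala itself: the function name can_reach is hypothetical, and it only confirms that a TCP connection can be opened. It does not verify that the listener is an Impala daemon or that authentication will succeed.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and applies the timeout
        # to the connection attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A caller would pass the same host and port given to the -i option and skip or retry the shell invocation when the check fails.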
INTERACTIVE VS. NON-INTERACTIVE MODE
The impala shell operates in two primary modes: interactive and non-interactive. When invoked without the -q or -f option, it enters an interactive prompt where users can type and execute SQL commands directly. With -q or -f, it executes the provided query or script and then exits, making it suitable for batch processing and automation.
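For automation, non-interactive mode pairs naturally with --delimited output. The following Python sketch runs one query with -q and splits the tab-separated result into rows. The names run_query and parse_delimited are hypothetical, the executable name impala-shell is an assumption about the installation, and the code assumes result rows arrive on standard output with one row per line.

```python
import subprocess
from typing import List


def parse_delimited(output: str) -> List[List[str]]:
    """Split tab-separated shell output into a list of rows."""
    return [line.split("\t") for line in output.splitlines()]


def run_query(
    host_port: str,
    query: str,
    executable: str = "impala-shell",  # assumption: name of the installed shell
) -> List[List[str]]:
    """Run a single query non-interactively and return its rows as strings."""
    completed = subprocess.run(
        [executable, "-i", host_port, "--delimited", "-q", query],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return parse_delimited(completed.stdout)
```

All values come back as strings, so any type conversion is left to the caller. Values that themselves contain tabs or newlines would need a more careful parser than this one.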
AUTHENTICATION AND AUTHORIZATION
Impala supports several authentication mechanisms, including Kerberos (GSSAPI) and LDAP. The Impala shell selects between them with -k (--kerberos) for Kerberos and -l (--ldap) for LDAP; LDAP connections additionally take -u for the username and, optionally, --ldap_password_cmd to supply the password without a prompt. Authorization is a separate layer, typically managed through Apache Ranger or a similar system, which restricts users to the data they are permitted to access.
HISTORY
Apache Impala was initially developed by Cloudera to provide low-latency, interactive SQL queries over large datasets stored in Hadoop, addressing the performance limitations of earlier SQL-on-Hadoop solutions such as Apache Hive (which traditionally relied on MapReduce). Cloudera announced it as an open-source project in 2012; it entered the Apache Incubator in 2015 and graduated to a top-level Apache project in 2017. The Impala shell has been a core component since Impala's inception, providing a command-line interface akin to traditional RDBMS clients but aimed at big data environments.


