LinuxCommandLibrary

impala

Execute SQL queries on Impala

TLDR

Launch impala in station mode

$ impala

Launch impala in Access Point mode

$ impala [-m|--mode] ap

Switch between different sections
$ <Tab>|<Shift Tab>

Select a network to connect to
$ <Space>

Display hotkeys
$ <?>

SYNOPSIS

impala [options]

PARAMETERS

-i hostname:port
    Specifies the Impala daemon to connect to. The default port is 21000.

-q query
    Executes a single SQL query in non-interactive mode and exits. Useful for scripting.

--database=db_name
    Specifies the initial database to use after connecting.

--ssl
    Enables SSL/TLS encryption for the connection to Impala.

--ca_cert=path
    Path to the CA certificate file for validating the Impala daemon's certificate when SSL is enabled.

--auth_option=option
    Specifies the authentication mechanism. Common options include PLAIN, NOSASL, GSSAPI (Kerberos).

--auth_creds_kinit
    Uses existing Kerberos credentials obtained via kinit for GSSAPI authentication.

--ldap_user=user
    Specifies the LDAP username for authentication when using LDAP.

--ldap_password_file=path
    Path to a file containing the LDAP password for authentication.

--query_option=key=value
    Sets a query option (e.g., query_timeout_s, batch_size) for the current session.

-f file
    Executes SQL commands from a specified file in non-interactive mode.

--delimited
    Displays query results with tab-separated values, useful for scripting.

--disable_webserver_reporting
    Disables logging the shell's activities to the Impala Web UI.

-h, --help
    Displays a help message and exits.
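
The options above can be combined for common workflows. The following is a hedged sketch: the host, database, and file names are placeholders, and the flag spellings follow the parameter list above.

```shell
# Open an interactive session against a specific daemon,
# starting in a particular database (placeholder names)
impala -i impalad.example.com:21000 --database=sales

# Run one query non-interactively; --delimited emits
# tab-separated values that are easy to post-process
impala -i impalad.example.com:21000 \
       -q "SELECT COUNT(*) FROM orders" --delimited

# Execute a SQL script from a file over an SSL/TLS-encrypted
# connection, validating the daemon's certificate against a CA
impala -i impalad.example.com:21000 --ssl \
       --ca_cert=/etc/ssl/certs/impala-ca.pem -f nightly_report.sql
```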

DESCRIPTION

The impala command launches the Impala shell, a command-line interface for interacting with Apache Impala, a high-performance, distributed SQL query engine for data stored in a Hadoop cluster. (In Apache Impala releases, the shell binary is typically named impala-shell.)

Impala enables users to issue SQL queries directly against large datasets stored in HDFS, Apache Kudu, or Apache HBase, providing low-latency responses for interactive analytics. The shell connects to an Impala daemon (impalad) and allows users to execute SQL statements, manage databases and tables, and retrieve query results. It's an essential tool for developers, data analysts, and administrators working with Impala, facilitating both interactive data exploration and scripting of analytical workloads.

CAVEATS

The impala command requires a running Impala daemon to connect to. It is specifically designed for the Impala ecosystem and is not a general-purpose Linux SQL client. Network connectivity and proper security configurations (e.g., Kerberos, SSL/TLS) are crucial for successful operation in production environments. Performance can vary significantly based on cluster size, data distribution, and query complexity.

INTERACTIVE VS. NON-INTERACTIVE MODE

The impala shell operates in two primary modes: interactive and non-interactive. When invoked without the -q or -f option, it enters an interactive prompt where users can type and execute SQL commands directly. With -q or -f, it executes the provided query or script and then exits, making it suitable for batch processing and automation.
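
The non-interactive forms slot naturally into cron jobs and shell pipelines. A minimal sketch, assuming the file and query names shown here are illustrative:

```shell
# Batch mode: run a saved SQL script and capture results
# and errors separately (paths are placeholders)
impala -f /opt/etl/daily_rollup.sql > rollup.out 2> rollup.err

# One-shot query in a pipeline; --delimited makes the
# tab-separated output easy to feed into standard tools
impala -q "SELECT dt, COUNT(*) FROM events GROUP BY dt" --delimited \
    | sort -k1,1
```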

AUTHENTICATION AND AUTHORIZATION

Impala supports various authentication mechanisms, including Kerberos (GSSAPI), LDAP, and basic PLAIN authentication. The impala shell provides options like --auth_option, --auth_creds_kinit, and LDAP-specific parameters to configure secure connections to Impala daemons. Authorization is typically managed via Apache Ranger or similar systems, ensuring users only access permitted data.
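
Using only the authentication flags listed in the PARAMETERS section, secure connections might look like the following sketch (realm, host, user, and file paths are placeholders):

```shell
# Kerberos (GSSAPI): obtain a ticket first, then let the shell
# reuse the existing credential cache
kinit analyst@EXAMPLE.COM
impala -i impalad.example.com:21000 \
       --auth_option=GSSAPI --auth_creds_kinit

# LDAP over SSL/TLS: username on the command line, password
# read from a permission-protected file rather than argv
impala -i impalad.example.com:21000 --ssl --auth_option=PLAIN \
       --ldap_user=analyst --ldap_password_file=/run/secrets/ldap_pass
```

Reading the LDAP password from a file keeps it out of the process list and shell history, which is why a file-based option is preferred in automation.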

HISTORY

Apache Impala was initially developed by Cloudera to provide low-latency, interactive SQL queries over large datasets stored in Hadoop, addressing the performance limitations of earlier SQL-on-Hadoop solutions like Apache Hive (which traditionally relied on MapReduce). It was open-sourced and later became an Apache Incubator project. The impala shell has been a core component since its inception, providing a familiar command-line interface for database interaction, akin to traditional RDBMS clients, but optimized for the big data environment.

SEE ALSO

hive(1), beeline(1), kinit(1), psql(1), mysql(1)
