odps-tunnel

Upload and download data to/from MaxCompute

TLDR

Download a table to a local file
$ tunnel download [table_name] [path/to/file];

Upload a local file to a table partition
$ tunnel upload [path/to/file] [table_name]/[partition_spec];

Upload a file to a table, specifying field and record delimiters
$ tunnel upload [path/to/file] [table_name] -fd [field_delim] -rd [record_delim];

Upload a file to a table using multiple threads
$ tunnel upload [path/to/file] [table_name] -threads [num];

SYNOPSIS

odps-tunnel command [options]

Common Commands:
odps-tunnel upload [options] <local_file_path> <table_name>
odps-tunnel download [options] <table_name> <local_file_path>

PARAMETERS

-e, --endpoint
    Specifies the MaxCompute service endpoint (e.g., 'http://service.cn-hangzhou.maxcompute.aliyun.com/api').

-i, --id
    Specifies the Alibaba Cloud AccessKey ID for authentication.

-k, --key
    Specifies the Alibaba Cloud AccessKey Secret for authentication.

-p, --project
    Specifies the MaxCompute project name to operate within.

-t, --table
    Specifies the target MaxCompute table name for data transfer.

-pt, --partition
    Specifies the partition for a partitioned table (e.g., 'dt=20230101,hr=10').

-f, --file
    Specifies the local file path for data input or output.

-h, --help
    Displays help information for the command or a specific subcommand.

-v, --version
    Displays the version of the MaxCompute Tunnel client.

--skip-header
    For upload, skips the first line of the input file, treating it as a header.

--delimiter
    Specifies the column delimiter for text files (e.g., ',').

--encoding
    Specifies the character encoding for the input/output file (e.g., 'UTF-8', 'GBK').

--compress
    Enables data compression during transfer, reducing the amount of data sent over the network.

--threads
    Specifies the number of concurrent threads for data transfer (e.g., '8').

--overwrite
    For upload, overwrites existing data in the target table or partition.

--strict
    Enables strict mode for data type conversion during upload, enforcing exact type matching.

--null-value
    Specifies the string representation for NULL values in the data file (e.g., '\N').
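
As an illustration of how these options combine, the following sketch assembles an upload command and prints it instead of running it. The option names are taken from the list above. The helper name upload_cmd and the sample values are hypothetical, and the flags accepted by an installed client should be confirmed with --help before relying on them.

```shell
#!/bin/sh
# Sketch only: upload_cmd is a hypothetical helper, not part of the client.
# It builds an odps-tunnel upload command line from the options listed above
# and prints it (a dry run); it never contacts MaxCompute.
upload_cmd() {
    file=$1
    table=$2
    partition=$3

    # Start with the options that apply to every upload in this sketch.
    set -- odps-tunnel upload --delimiter ',' --encoding 'UTF-8' --skip-header --threads 4

    # Add the partition option only when a partition spec was given.
    if [ -n "$partition" ]; then
        set -- "$@" --partition "$partition"
    fi

    # Positional arguments follow the SYNOPSIS order: local file, then table.
    set -- "$@" "$file" "$table"

    # Dry run: print the assembled command instead of executing it.
    printf '%s\n' "$*"
}
```

Calling upload_cmd data.csv sales 'dt=20230101,hr=10' prints the full command line for review. Printing first makes it easy to check the flags against the client's help output before any data is transferred.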

DESCRIPTION

odps-tunnel is a command-line client for Alibaba Cloud's MaxCompute service (formerly ODPS) that transfers data between the local file system and MaxCompute tables. It uploads local files into MaxCompute tables and downloads table data to local files. Options cover delimiters and file encoding, compression, partitioned tables, error handling, and multi-threaded transfer. It is typically used by data engineers and analysts for ETL, bulk data loading, and exporting data for reporting.

CAVEATS

Authentication: A valid Alibaba Cloud AccessKey ID and Secret are required. Supply credentials through environment variables or a configuration file rather than on the command line, where they can end up in shell history and process listings.
Network Latency: Transfer performance depends heavily on network latency and bandwidth between the client and the MaxCompute endpoint.
Data Volume: The client is optimized for large data volumes, but very large files may still be easier to manage when the data is partitioned or when MaxCompute's own import/export features are used.
Schema Matching: On upload, the columns in the local file must be compatible with the target table's schema; otherwise data conversion errors may occur.
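
One way to catch schema mismatches early is to check the file before uploading it. The sketch below uses awk to confirm that every record has the expected number of fields. The helper name check_field_count is hypothetical, and the check is deliberately simple: it assumes a single-character delimiter and does not understand quoted fields that contain the delimiter.

```shell
#!/bin/sh
# Sketch only: check_field_count is a hypothetical pre-upload check.
# Usage: check_field_count FILE DELIMITER EXPECTED_FIELD_COUNT
# Prints one line per record whose field count differs from the expected
# value and returns non-zero if any such record was found.
# Limitation: quoted fields containing the delimiter are not handled.
check_field_count() {
    awk -F "$2" -v want="$3" '
        NF != want {
            printf "line %d: %d fields, expected %d\n", NR, NF, want
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For a three-column table, check_field_count data.csv ',' 3 returns success only when every line splits into exactly three fields, so it can gate the upload step in a script.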

CONFIGURATION FILE SUPPORT

The odps-tunnel client can read settings from a configuration file (e.g., 'tunnel.ini' or similar), so that common parameters such as the endpoint, AccessKey, and project are kept in one place instead of being repeated on every command line.
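
Because such a file holds an AccessKey Secret, it should be readable only by its owner. The sketch below writes a placeholder settings file with restrictive permissions. The helper name write_tunnel_config, the key names, and the values are all assumptions made for illustration, not the client's documented format; consult the client's own documentation for the real file name and keys.

```shell
#!/bin/sh
# Sketch only: write_tunnel_config is a hypothetical helper.
# Usage: write_tunnel_config PATH
# The key names in the file are ASSUMED for illustration and may not match
# what the client actually reads.
write_tunnel_config() {
    (
        # Create the file with owner-only permissions from the start.
        umask 077
        cat > "$1" <<'EOF'
endpoint=http://service.cn-hangzhou.maxcompute.aliyun.com/api
project=my_project
access_id=REPLACE_WITH_ACCESS_KEY_ID
access_key=REPLACE_WITH_ACCESS_KEY_SECRET
EOF
    )
    # Also tighten permissions in case the file already existed.
    chmod 600 "$1"
}
```

Setting the umask before the file is created avoids a window in which the secret is world-readable; the final chmod covers the case where an older, more permissive file is being overwritten.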

DATA FORMATS

The client is used mainly with delimited text data (CSV, TSV). It can also handle complex data types, provided the values are formatted to match the MaxCompute table schema.

JOB ID & STATUS

Each upload or download started through odps-tunnel is assigned a unique job ID. The ID can be passed to subcommands such as 'show' and 'purge' to monitor progress or manage the lifecycle of the transfer job.

HISTORY

MaxCompute was originally launched by Alibaba Cloud as ODPS (Open Data Processing Service), and the odps-tunnel client was developed as part of that ecosystem for high-throughput data ingestion and extraction. The client has continued to be maintained since ODPS became MaxCompute, tracking platform changes such as enhanced data types and partition management, with a focus on transfer performance, stability, and ease of use for very large datasets.

SEE ALSO

scp(1), rsync(1)
