odps-tunnel
Upload and download data to/from MaxCompute
TLDR
Download a table to a local file
Upload a local file to a table partition
Upload a local file to a table, specifying field and record delimiters
Upload a local file to a table using multiple threads
SYNOPSIS
odps-tunnel command [options]
Common Commands:
odps-tunnel upload [options] <local_file_path> <table_name>
odps-tunnel download [options] <table_name> <local_file_path>
PARAMETERS
-e, --endpoint
Specifies the MaxCompute service endpoint (e.g., 'http://service.cn-hangzhou.maxcompute.aliyun.com/api').
-i, --id
Specifies the Alibaba Cloud AccessKey ID for authentication.
-k, --key
Specifies the Alibaba Cloud AccessKey Secret for authentication.
-p, --project
Specifies the MaxCompute project name to operate within.
-t, --table
Specifies the MaxCompute table to upload data into or download data from.
-pt, --partition
Specifies the partition for a partitioned table (e.g., 'dt=20230101,hr=10').
-f, --file
Specifies the local file path for data input or output.
-h, --help
Displays help information for the command or a specific subcommand.
-v, --version
Displays the version of the MaxCompute Tunnel client.
--skip-header
For upload, skips the first line of the input file, treating it as a header.
--delimiter
Specifies the column delimiter for text files (e.g., ',').
--encoding
Specifies the character encoding for the input/output file (e.g., 'UTF-8', 'GBK').
--compress
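Enables compression of data in transit, reducing the volume sent over the network. This usually speeds up transfers over limited-bandwidth links at the cost of extra CPU time.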
--threads
Specifies the number of concurrent threads for data transfer (e.g., '8').
--overwrite
For upload, overwrites existing data in the target table or partition.
--strict
Enables strict mode for data type conversion during upload, enforcing exact type matching.
--null-value
Specifies the string representation for NULL values in the data file (e.g., '\N').
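The text-file options above work together: the delimiter separates columns, the NULL marker stands in for missing values, the header line is skipped, and the encoding governs how the file's bytes are read. The Python sketch below prepares a local file intended for an upload run with --delimiter ',', --null-value '\N', --skip-header and --encoding UTF-8. It illustrates the file layout under stated assumptions and is not the client's own parser: it assumes the client splits each line on the delimiter without CSV-style quoting, so it rejects values that contain the delimiter or a line break. The file name 'sales.csv', the column names, and the helpers 'format_line' and 'write_upload_file' are hypothetical.

```python
import os
import tempfile

# Hypothetical settings chosen to match an upload invoked with
#   --delimiter ',' --null-value '\N' --skip-header --encoding UTF-8
DELIMITER = ","
NULL_MARKER = r"\N"  # the two characters: backslash, capital N
ENCODING = "utf-8"


def format_line(values, delimiter=DELIMITER, null_marker=NULL_MARKER):
    """Join one record into a delimited line, writing None as the NULL marker.

    No quoting or escaping is applied (an assumption about the client), so
    values containing the delimiter or a line break are rejected instead of
    silently shifting columns.
    """
    fields = []
    for value in values:
        text = null_marker if value is None else str(value)
        if delimiter in text or "\n" in text or "\r" in text:
            raise ValueError(f"value {text!r} contains the delimiter or a line break")
        fields.append(text)
    return delimiter.join(fields)


def write_upload_file(path, header, rows):
    """Write a header line followed by one delimited line per record."""
    with open(path, "w", encoding=ENCODING, newline="") as f:
        f.write(format_line(header) + "\n")  # the line --skip-header would drop
        for row in rows:
            f.write(format_line(row) + "\n")


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "sales.csv")
    write_upload_file(out, ["id", "city", "amount"],
                      [[1, "Hangzhou", 12.5], [2, None, 3.0]])
    with open(out, encoding=ENCODING) as f:
        print(f.read(), end="")
```

With these settings a genuine string value equal to '\N' would be indistinguishable from NULL, so pick a marker that cannot occur in the data.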
DESCRIPTION
The odps-tunnel command is a command-line interface (CLI) client for Alibaba Cloud MaxCompute (formerly ODPS). It transfers data in bulk between the local file system and MaxCompute tables: users can upload large datasets from local files into MaxCompute tables, or download table data into local files. It supports configurable delimiters, character encodings, NULL representations and compression, and provides options for partitioned tables, strict type checking and multithreaded transfer. It is a core tool for data engineers and analysts running ETL, data loading and reporting workflows on MaxCompute.
CAVEATS
Authentication: Requires a valid Alibaba Cloud AccessKey ID and AccessKey Secret. For security, supply credentials through environment variables or a configuration file rather than directly on the command line, where they can be exposed in shell history and process listings.
Network Latency: Performance can be significantly affected by network latency and bandwidth between the client and MaxCompute endpoint.
Data Volume: The client is designed for large data volumes, but very large datasets may be easier to manage if split by table partition and uploaded one partition at a time, or moved with MaxCompute's other import/export features.
Schema Matching: When uploading, each record in the local file must match the target table's schema in column count, column order and data types; otherwise data conversion errors may occur.
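Because schema mismatches otherwise surface only as conversion errors during the upload, it can help to check a file locally first. The following Python sketch validates delimited lines against a hypothetical three-column schema (id BIGINT, city STRING, amount DOUBLE). Its checks are deliberately conservative approximations, not MaxCompute's own conversion rules: it assumes unquoted comma-separated values, '\N' as the NULL marker and BIGINT as a signed 64-bit integer, and it covers only these three types. The names 'SCHEMA', 'field_error' and 'validate_lines' are hypothetical. Pass only the data lines; drop a header line first if the file has one.

```python
import re

# Hypothetical target schema: (column name, MaxCompute type), in table order.
SCHEMA = [("id", "BIGINT"), ("city", "STRING"), ("amount", "DOUBLE")]
NULL_MARKER = r"\N"  # assumed to match the --null-value used for the upload

# Strict patterns: Python's int()/float() would also accept "1_000", " 7", "nan".
_BIGINT_RE = re.compile(r"[+-]?\d+")
_DOUBLE_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")
# Assumption: BIGINT is a signed 64-bit integer; the service's exact bounds may differ.
_BIGINT_MIN, _BIGINT_MAX = -(2**63), 2**63 - 1


def field_error(text, col_type):
    """Return an error message if text does not look like col_type, else None."""
    if text == NULL_MARKER or col_type == "STRING":
        return None
    if col_type == "BIGINT":
        if not _BIGINT_RE.fullmatch(text):
            return "not an integer"
        if not _BIGINT_MIN <= int(text) <= _BIGINT_MAX:
            return "outside the signed 64-bit range"
        return None
    if col_type == "DOUBLE":
        return None if _DOUBLE_RE.fullmatch(text) else "not a decimal number"
    return f"type {col_type} is not covered by this sketch"


def validate_lines(lines, schema=SCHEMA, delimiter=","):
    """Yield (line_number, message) for each problem; numbering starts at 1."""
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) != len(schema):
            yield lineno, f"expected {len(schema)} columns, found {len(fields)}"
            continue
        for (name, col_type), text in zip(schema, fields):
            message = field_error(text, col_type)
            if message:
                yield lineno, f"column {name!r}: {message}"


if __name__ == "__main__":
    sample = ["1,Hangzhou,12.5\n", "x,Beijing,3\n", "3,Shanghai\n"]
    for lineno, message in validate_lines(sample):
        print(f"line {lineno}: {message}")
```

For the sample lines, the sketch reports a non-integer id on line 2 and a missing column on line 3.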
CONFIGURATION FILE SUPPORT
The odps-tunnel client can read common parameters such as the endpoint, AccessKey and project from a configuration file (for example, 'tunnel.ini'). This avoids repeating them on every command and keeps the settings in one place.
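A minimal Python sketch of the configuration-file approach, assuming a hypothetical INI layout with a '[tunnel]' section and the keys 'endpoint', 'project', 'access_id' and 'access_key'; the client's actual file name, format and key names may differ. Values from hypothetical TUNNEL_<KEY> environment variables override the file, so the AccessKey Secret need not be stored in the file or typed on the command line. 'load_settings' is an illustrative name, not part of odps-tunnel.

```python
import configparser
import os

# Hypothetical layout; odps-tunnel's real file name and key names may differ:
#
#   [tunnel]
#   endpoint   = http://service.cn-hangzhou.maxcompute.aliyun.com/api
#   project    = my_project
#   access_id  = <AccessKey ID>
#   access_key = <AccessKey Secret>
REQUIRED_KEYS = ("endpoint", "project", "access_id", "access_key")


def load_settings(path, section="tunnel", env=None):
    """Return connection settings, letting TUNNEL_<KEY> variables override the file.

    The TUNNEL_* environment variable names are hypothetical, chosen for this sketch.
    """
    env = os.environ if env is None else env
    # interpolation=None keeps '%' characters in secrets from being interpreted
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path, encoding="utf-8")  # a missing file is silently ignored
    file_values = parser[section] if parser.has_section(section) else {}

    settings = {}
    for key in REQUIRED_KEYS:
        env_name = f"TUNNEL_{key.upper()}"
        value = env.get(env_name) or file_values.get(key)
        if not value:
            raise LookupError(f"setting {key!r} not found in {path} or ${env_name}")
        settings[key] = value
    return settings


if __name__ == "__main__":
    try:
        settings = load_settings("tunnel.ini")
    except LookupError as err:
        print(f"configuration incomplete: {err}")
    else:
        # Never print the secret itself
        print(f"project={settings['project']} endpoint={settings['endpoint']}")
```

Letting the environment override the file means the same file can be shared across machines while each one supplies its own secret.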
DATA FORMATS
Although the client is primarily used with delimited text data (CSV, TSV), it can load values for a range of MaxCompute data types, including complex structures, as long as each value is formatted to match the type of its target column in the table schema.
JOB ID & STATUS
Each upload or download operation initiated through odps-tunnel is assigned a unique job ID. This ID can be used with subcommands like 'show' and 'purge' to monitor the progress or manage the lifecycle of the data transfer job.
HISTORY
MaxCompute was originally launched as ODPS (Open Data Processing Service) by Alibaba Cloud. The odps-tunnel client was developed as an essential component of the ODPS ecosystem to facilitate high-throughput data ingestion and extraction. As ODPS evolved into MaxCompute, the odps-tunnel client continued to be maintained and updated, reflecting the platform's advancements in data processing capabilities and new features like enhanced data types and partition management. Its development has focused on improving transfer performance, stability, and ease of use for handling massive datasets in cloud environments.