LinuxCommandLibrary

aws-glue

Manage AWS Glue resources

TLDR

List jobs
$ aws glue list-jobs

Start a job
$ aws glue start-job-run --job-name [job_name]

Start running a workflow
$ aws glue start-workflow-run --name [workflow_name]

List triggers
$ aws glue list-triggers

Start a trigger
$ aws glue start-trigger --name [trigger_name]

Create a dev endpoint
$ aws glue create-dev-endpoint --endpoint-name [name] --role-arn [role_arn_used_by_endpoint]
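start-trigger activates a scheduled or conditional trigger, so scripts that run repeatedly may want to check the trigger's state first. The sketch below assumes a POSIX shell and a configured CLI; ensure_trigger_active is a hypothetical helper name. For an ON_DEMAND trigger, start-trigger fires it immediately, so the helper is only idempotent for scheduled and conditional triggers.

```shell
# Hypothetical helper: activate a scheduled or conditional Glue trigger
# only when it is not already active or activating.
ensure_trigger_active() {
  local name="$1" state
  # Trigger.State holds values such as CREATED, ACTIVATED, DEACTIVATED
  state=$(aws glue get-trigger --name "$name" \
    --query 'Trigger.State' --output text) || return 1
  case "$state" in
    ACTIVATED|ACTIVATING)
      echo "already active" ;;
    *)
      # start-trigger prints the trigger name as JSON; discard it
      aws glue start-trigger --name "$name" > /dev/null || return 1
      echo "activated" ;;
  esac
}
```

For example, ensure_trigger_active nightly-trigger prints either "already active" or "activated" (the trigger name is a placeholder).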

SYNOPSIS

aws glue subcommand [options]

Examples:
aws glue start-job-run --job-name "MyGlueJob"
aws glue get-job-runs --job-name "MyGlueJob"
aws glue create-job --name "NewJob" --role "arn:aws:iam::123456789012:role/GlueServiceRole" --command Name=pythonshell,ScriptLocation=s3://my-bucket/scripts/myscript.py,PythonVersion=3.9
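The first two examples are often combined: start a run, then poll its state until it finishes. A minimal sketch, assuming a POSIX shell; run_glue_job and its default 30-second poll interval are illustrative choices, not part of the CLI. It relies on start-job-run returning a JobRunId and on get-job-run reporting JobRun.JobRunState.

```shell
# Hypothetical helper: start a Glue job run and poll until it reaches a
# terminal state. Prints "<run-id> <state>"; returns 0 only on SUCCEEDED.
# Usage: run_glue_job <job-name> [poll-seconds]
run_glue_job() {
  local job="$1" interval="${2:-30}" run_id state
  # --query/--output text extract just the JobRunId from the JSON response
  run_id=$(aws glue start-job-run --job-name "$job" \
    --query 'JobRunId' --output text) || return 1
  while true; do
    state=$(aws glue get-job-run --job-name "$job" --run-id "$run_id" \
      --query 'JobRun.JobRunState' --output text) || return 1
    case "$state" in
      SUCCEEDED)
        echo "$run_id $state"; return 0 ;;
      FAILED|ERROR|TIMEOUT|STOPPED|EXPIRED)
        echo "$run_id $state"; return 1 ;;
      *)  # STARTING, RUNNING, STOPPING, WAITING
        sleep "$interval" ;;
    esac
  done
}
```

For example, run_glue_job MyGlueJob 60 || echo "run did not succeed" makes the exit status usable in a CI step.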

PARAMETERS

subcommand
    Specifies the AWS Glue API operation to perform, such as create-job, start-job-run, get-table, get-databases, or list-jobs. Each subcommand has its own required and optional parameters; run aws glue help for the full list of subcommands, or aws glue <subcommand> help for one subcommand's options.

--job-name value
    Identifies the target Glue job by name in job-run subcommands such as start-job-run, get-job-run, and get-job-runs. Note that create-job names the new job with --name instead.

--arguments json-map
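    A JSON map of string key-value pairs passed to a job run by start-job-run. Keys conventionally carry a -- prefix, for example '{"--input_path":"s3://my-bucket/in"}'. Values given here override the job's default arguments for that run only.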
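Quoting a JSON map by hand on the command line is error-prone. The hypothetical helper below builds the map from key=value pairs, adding the -- prefix and escaping backslashes and double quotes; it assumes values contain no newlines or other control characters.

```shell
# Hypothetical helper: build the JSON map for `start-job-run --arguments`
# from key=value pairs. Adds the conventional "--" key prefix and escapes
# backslashes and double quotes (other control characters are not handled).
glue_args_json() {
  local json="" pair key val
  for pair in "$@"; do
    key=${pair%%=*}   # text before the first "="
    val=${pair#*=}    # text after the first "="
    val=$(printf '%s' "$val" | sed 's/\\/\\\\/g; s/"/\\"/g')
    json="$json${json:+,}\"--$key\":\"$val\""
  done
  printf '{%s}\n' "$json"
}
```

Usage, with a placeholder bucket: aws glue start-job-run --job-name MyGlueJob --arguments "$(glue_args_json input_path=s3://my-bucket/in)".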

--region value
    Specifies the AWS region to which the command applies (e.g., us-east-1). Overrides the default region configured for the AWS CLI.

--output value
    Sets the output format: json (the default unless configured otherwise), text, or table; AWS CLI v2 also supports yaml and yaml-stream.

--profile value
    Uses a specific named profile from your AWS credentials file. Useful for managing multiple AWS accounts or roles.
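--region and --profile can be combined to inspect several regions of one account in a single pass. The sketch below uses a hypothetical list_jobs_by_region helper; the profile and region names in the usage line are placeholders for whatever your configuration defines.

```shell
# Hypothetical helper: print "<region> <job-name>" for every Glue job
# visible to one named profile in each of the given regions.
# Usage: list_jobs_by_region <profile> <region>...
list_jobs_by_region() {
  local profile="$1" region name
  shift
  for region in "$@"; do
    # --output text prints JobNames tab-separated; split into one per line
    aws glue list-jobs --profile "$profile" --region "$region" \
      --query 'JobNames' --output text < /dev/null | tr '\t' '\n' |
      while IFS= read -r name; do
        if [ -n "$name" ]; then
          printf '%s %s\n' "$region" "$name"
        fi
      done
  done
}
```

For example, list_jobs_by_region prod us-east-1 eu-west-1.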

--query value
    Applies a JMESPath expression to the response before it is printed, e.g. --query 'JobNames'. The filtering happens client-side, so it changes what is displayed but not what the API returns.
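JMESPath can filter as well as select. As a sketch (the function name is hypothetical), this lists only the failed runs of a job with their error messages, using the Id, JobRunState, and ErrorMessage fields of each entry in the get-job-runs response.

```shell
# Hypothetical helper: list failed runs of a Glue job as
# "<run-id><TAB><error message>" lines. The [?...] filter keeps only
# entries whose JobRunState is FAILED; .[Id,ErrorMessage] selects two fields.
failed_job_runs() {
  local job="$1"
  aws glue get-job-runs --job-name "$job" \
    --query "JobRuns[?JobRunState=='FAILED'].[Id,ErrorMessage]" \
    --output text
}
```

For example, failed_job_runs MyGlueJob prints nothing when no run has failed.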

--cli-input-json file://path/to/file.json
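    Reads the subcommand's input parameters from a JSON file instead of individual flags, which is convenient for complex inputs and automation. Running the subcommand with --generate-cli-skeleton input prints an empty template to fill in.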
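As a sketch of --cli-input-json, the hypothetical helpers below write a create-job input file for a Python shell job and submit it. The top-level keys mirror the create-job parameters; the job name, role ARN, and script location are placeholders, values are inserted without JSON escaping, and a real job may need further fields such as MaxCapacity or GlueVersion.

```shell
# Hypothetical helper: write a create-job input file for a Python shell job.
# Values are inserted verbatim (no JSON escaping), so keep them simple.
write_pythonshell_job_spec() {
  local file="$1" name="$2" role="$3" script="$4"
  cat > "$file" <<EOF
{
  "Name": "$name",
  "Role": "$role",
  "Command": {
    "Name": "pythonshell",
    "ScriptLocation": "$script",
    "PythonVersion": "3.9"
  }
}
EOF
}

# Hypothetical helper: create the job; the file:// prefix makes the CLI read the file
create_job_from_file() {
  aws glue create-job --cli-input-json "file://$1"
}
```

Usage: write_pythonshell_job_spec new-job.json NewJob arn:aws:iam::123456789012:role/GlueServiceRole s3://my-bucket/scripts/myscript.py && create_job_from_file new-job.json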

DESCRIPTION

The `aws glue` command is the AWS Command Line Interface (AWS CLI) service command for AWS Glue, a fully managed extract, transform, and load (ETL) service. It exposes the Glue API operations as subcommands that can be run from a terminal or called from scripts.

Through `aws glue` subcommands, you can create, update, and delete Glue jobs (e.g., Spark, Python shell, Ray, and Streaming ETL jobs), manage triggers and workflows, and work with the Glue Data Catalog to define databases, tables, and partitions. It is commonly used to automate ETL processes, integrate them into CI/CD pipelines, monitor job runs, and manage connections and security configurations.
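Working with the Data Catalog often starts with an inventory of what it contains. A minimal sketch with a hypothetical list_catalog helper that walks get-databases and get-tables in the current account and region; stdin is redirected for the inner call so it cannot consume the loop's input.

```shell
# Hypothetical helper: print "<database><TAB><table>" for every table in the
# Glue Data Catalog of the current account and region.
list_catalog() {
  local db table
  aws glue get-databases --query 'DatabaseList[].Name' --output text |
    tr '\t' '\n' |
    while IFS= read -r db; do
      [ -n "$db" ] || continue
      # </dev/null keeps the inner CLI call from reading the loop's stdin
      aws glue get-tables --database-name "$db" \
        --query 'TableList[].Name' --output text < /dev/null |
        tr '\t' '\n' |
        while IFS= read -r table; do
          if [ -n "$table" ]; then
            printf '%s\t%s\n' "$db" "$table"
          fi
        done
    done
}
```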

CAVEATS

Using `aws glue` requires the AWS CLI to be installed and configured with valid AWS credentials, and the calling identity needs IAM permissions for each Glue action it invokes; missing permissions produce AccessDeniedException errors. Effective use assumes familiarity with Glue concepts such as jobs, triggers, workflows, and the Data Catalog. Glue API calls are rate-limited, so automated scripts that poll or iterate over many resources can be throttled, and every call requires network connectivity to the regional Glue endpoint.

The exact parameters and their syntax vary significantly between subcommands; always refer to the official AWS CLI documentation for specific usage details.
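For the permission and rate-limit caveats, a script can fail fast before doing real work and let the CLI retry throttled calls. A sketch, assuming a POSIX shell: AWS_RETRY_MODE and AWS_MAX_ATTEMPTS are standard AWS CLI retry settings, while the adaptive mode and ten attempts chosen here are assumptions to tune, and glue_preflight is a hypothetical name.

```shell
# Hypothetical preflight check for scripts that call Glue repeatedly.
# Retry values are assumptions; existing settings from the caller are kept.
glue_preflight() {
  export AWS_RETRY_MODE="${AWS_RETRY_MODE:-adaptive}"
  export AWS_MAX_ATTEMPTS="${AWS_MAX_ATTEMPTS:-10}"
  # Fails when credentials are missing, expired, or the endpoint is unreachable
  if ! aws sts get-caller-identity --query 'Arn' --output text; then
    echo "AWS credentials are not usable" >&2
    return 1
  fi
  # A read-only Glue call checks glue:ListJobs permission and connectivity
  if ! aws glue list-jobs > /dev/null; then
    echo "Glue API call failed (permissions or network)" >&2
    return 1
  fi
}
```

Calling glue_preflight || exit 1 at the top of a script stops it before any Glue resources are touched.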

INSTALLATION

The `aws glue` command is included in both major versions of the AWS CLI. AWS CLI v2, the current version, is installed with the bundled installers for Linux, macOS, and Windows or through some package managers; v1 is installed with `pip`. For example, to install v2 on x86_64 Linux:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
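After installing, aws --version confirms which major version is on the PATH. The hypothetical helper below extracts it, assuming the usual "aws-cli/<version> Python/..." output format; some older v1 releases print this line to stderr, hence the 2>&1.

```shell
# Hypothetical helper: print the major version (1 or 2) of the AWS CLI on PATH.
# Assumes `aws --version` output starting with "aws-cli/<major>.<minor>...".
aws_cli_major_version() {
  local out
  out=$(aws --version 2>&1) || return 1
  case "$out" in
    aws-cli/*) ;;
    *) return 1 ;;   # unrecognized output
  esac
  out=${out#aws-cli/}
  printf '%s\n' "${out%%.*}"   # keep text before the first "."
}
```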

CONFIGURATION

After installation, configure the AWS CLI with your credentials and preferred region using aws configure:
aws configure
AWS Access Key ID [None]: YOUR_ACCESS_KEY_ID
AWS Secret Access Key [None]: YOUR_SECRET_ACCESS_KEY
Default region name [None]: your-preferred-region
Default output format [None]: json
This setup allows `aws glue` commands to authenticate and operate correctly within your AWS account.
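In CI, where the interactive prompts are unavailable, aws configure set writes the same settings. A sketch with a hypothetical configure_glue_profile helper; it only sets the region and output format, so the named profile still needs credentials from elsewhere (for example aws configure --profile <name> or a role configuration).

```shell
# Hypothetical helper: set region and output format for a named profile
# without interactive prompts. Credentials are not configured here.
configure_glue_profile() {
  local profile="$1" region="$2"
  aws configure set region "$region" --profile "$profile" || return 1
  aws configure set output json --profile "$profile" || return 1
  # Read the value back to confirm it was written
  aws configure get region --profile "$profile"
}
```

For example, configure_glue_profile glue-dev eu-west-1 followed by aws glue list-jobs --profile glue-dev (profile name and region are placeholders).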

COMMON USE CASES

Common use cases for `aws glue` include:
1. Triggering Glue ETL jobs and monitoring their progress.
2. Listing and inspecting existing Glue jobs, triggers, and workflows.
3. Programmatically creating or updating Glue Data Catalog tables and partitions.
4. Integrating Glue operations into CI/CD pipelines for automated deployments of ETL processes.
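Use cases 1 and 2 apply to workflows as well as jobs. A minimal sketch, assuming a POSIX shell, with a hypothetical run_glue_workflow helper: it starts a run and polls get-workflow-run. Because a workflow run can reach COMPLETED even when some of its jobs failed, it also checks Run.Statistics.FailedActions.

```shell
# Hypothetical helper: start a Glue workflow and wait until the run ends.
# Returns 0 only if the run COMPLETED with zero failed actions.
# Usage: run_glue_workflow <workflow-name> [poll-seconds]
run_glue_workflow() {
  local wf="$1" interval="${2:-60}" run_id out status failed
  run_id=$(aws glue start-workflow-run --name "$wf" \
    --query 'RunId' --output text) || return 1
  while true; do
    out=$(aws glue get-workflow-run --name "$wf" --run-id "$run_id" \
      --query 'Run.[Status,Statistics.FailedActions]' --output text) || return 1
    status=$(printf '%s\n' "$out" | cut -f1)
    failed=$(printf '%s\n' "$out" | cut -f2)
    case "$status" in
      RUNNING|STOPPING)
        sleep "$interval" ;;
      COMPLETED)
        echo "$run_id $status failed=$failed"
        [ "$failed" = 0 ]
        return ;;
      *)  # STOPPED or ERROR
        echo "$run_id $status"
        return 1 ;;
    esac
  done
}
```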

HISTORY

AWS Glue was announced at AWS re:Invent 2016 and became generally available in August 2017 as a serverless data integration service. The `aws glue` command set in the AWS CLI has evolved alongside the service, adding subcommands and parameters for new features and job types such as Ray and Streaming ETL.

SEE ALSO

aws(1) - the main AWS CLI command
jq(1) - a lightweight command-line JSON processor, often used to parse AWS CLI output
sh(1), bash(1) - standard Unix shells for scripting and automating AWS CLI commands
