aws-glue

Manage AWS Glue resources

TLDR

List jobs

$ aws glue list-jobs

Start a job

$ aws glue start-job-run --job-name [job_name]
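start-job-run returns only a JobRunId, and the run then continues in the background. The sketch below is a minimal example that assumes a POSIX shell and configured credentials. It starts a run and then polls aws glue get-job-run until the run leaves its in-progress states. The function name run_glue_job_and_wait and the 30-second default interval are hypothetical choices, not part of the CLI.

```shell
#!/bin/sh
# run_glue_job_and_wait JOB_NAME [POLL_SECONDS]
# Hypothetical helper: starts a Glue job run, waits for it to finish,
# prints the final state, and returns 0 only if the run SUCCEEDED.
run_glue_job_and_wait() {
    rgj_job="$1"
    rgj_interval="${2:-30}"

    # start-job-run returns {"JobRunId": "..."}; --query extracts the ID.
    rgj_run_id=$(aws glue start-job-run --job-name "$rgj_job" \
        --query 'JobRunId' --output text) || return 1
    echo "started run $rgj_run_id" >&2

    while :; do
        rgj_state=$(aws glue get-job-run --job-name "$rgj_job" \
            --run-id "$rgj_run_id" \
            --query 'JobRun.JobRunState' --output text) || return 1
        case "$rgj_state" in
            STARTING|RUNNING|STOPPING|WAITING)
                # Still in progress: wait, then poll again.
                sleep "$rgj_interval" ;;
            SUCCEEDED)
                echo "$rgj_state"
                return 0 ;;
            *)
                # FAILED, STOPPED, TIMEOUT, ERROR, or any other end state.
                echo "$rgj_state"
                return 1 ;;
        esac
    done
}
```

For example, run_glue_job_and_wait my_etl_job 60 checks the run once a minute. Because the helper returns non-zero for every state except SUCCEEDED, you can chain it with && in a script.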

Start a workflow run

$ aws glue start-workflow-run --name [workflow_name]
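start-workflow-run returns a RunId, and aws glue get-workflow-run reports that run's Status and action statistics. As far as we know, a workflow run can finish with status COMPLETED even when some of its jobs or crawlers failed. The minimal sketch below therefore also checks Statistics.FailedActions. It assumes a POSIX shell and configured credentials, and the helper name glue_workflow_run_ok is hypothetical.

```shell
#!/bin/sh
# glue_workflow_run_ok WORKFLOW_NAME RUN_ID
# Hypothetical helper: prints STATUS<TAB>FAILED_ACTIONS for the run and
# returns 0 only if the run COMPLETED with no failed actions.
glue_workflow_run_ok() {
    # get-workflow-run returns {"Run": {"Status": ..., "Statistics": {...}}}.
    gwr_out=$(aws glue get-workflow-run --name "$1" --run-id "$2" \
        --query 'Run.[Status, Statistics.FailedActions]' --output text) || return 1
    echo "$gwr_out"
    # Text output is tab-separated: field 1 is the status, field 2 the count.
    gwr_status=$(printf '%s\n' "$gwr_out" | cut -f1)
    gwr_failed=$(printf '%s\n' "$gwr_out" | cut -f2)
    if [ "$gwr_status" = "COMPLETED" ] && [ "$gwr_failed" = "0" ]; then
        return 0
    fi
    return 1
}
```

To use it, capture the ID when you start the run with run_id=$(aws glue start-workflow-run --name [workflow_name] --query RunId --output text), and later call glue_workflow_run_ok [workflow_name] "$run_id".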

List triggers

$ aws glue list-triggers

Start a trigger

$ aws glue start-trigger --name [trigger_name]
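start-trigger behaves differently depending on a trigger's type (SCHEDULED, CONDITIONAL, ON_DEMAND, or EVENT), so it can help to check each trigger's type and state before starting one. The minimal sketch below combines list-triggers with aws glue get-trigger. It assumes a POSIX shell, configured credentials, and trigger names without whitespace. The helper name glue_trigger_summary is hypothetical.

```shell
#!/bin/sh
# glue_trigger_summary
# Hypothetical helper: prints NAME<TAB>TYPE<TAB>STATE for every trigger
# returned by list-triggers. Assumes trigger names contain no whitespace.
glue_trigger_summary() {
    # Unquoted substitution splits the tab-separated names into words.
    for gts_name in $(aws glue list-triggers --query 'TriggerNames[]' --output text); do
        aws glue get-trigger --name "$gts_name" \
            --query 'Trigger.[Name, Type, State]' --output text
    done
}
```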

Create a dev endpoint

$ aws glue create-dev-endpoint --endpoint-name [name] --role-arn [role_arn_used_by_endpoint]
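A new development endpoint takes some time to provision. The sketch below is a minimal example that polls aws glue get-dev-endpoint with a bounded number of checks. It assumes a POSIX shell and configured credentials. The status strings PROVISIONING and READY are what we expect the service to report, but treat them as assumptions to verify. The helper name wait_for_dev_endpoint and its defaults are hypothetical.

```shell
#!/bin/sh
# wait_for_dev_endpoint ENDPOINT_NAME [MAX_CHECKS] [POLL_SECONDS]
# Hypothetical helper: polls while the endpoint reports PROVISIONING,
# prints the first other status it sees, and returns 0 only for READY.
wait_for_dev_endpoint() {
    wde_checks="${2:-40}"
    wde_interval="${3:-30}"
    while [ "$wde_checks" -gt 0 ]; do
        wde_status=$(aws glue get-dev-endpoint --endpoint-name "$1" \
            --query 'DevEndpoint.Status' --output text) || return 1
        if [ "$wde_status" != "PROVISIONING" ]; then
            echo "$wde_status"
            if [ "$wde_status" = "READY" ]; then
                return 0
            fi
            return 1
        fi
        wde_checks=$((wde_checks - 1))
        sleep "$wde_interval"
    done
    echo "still PROVISIONING after polling" >&2
    return 1
}
```

A provisioned endpoint is billed while it exists, so remove it with aws glue delete-dev-endpoint --endpoint-name [name] when you are finished.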

SYNOPSIS

aws glue [global-options] subcommand [subcommand-options]

PARAMETERS

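Credentials
    The AWS CLI has no --aws-access-key-id, --aws-secret-access-key, or --aws-session-token options. Supply credentials through aws configure, a named --profile, or the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables.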

--ca-bundle
    CA bundle for SSL verification.

--cli-auto-prompt
    Automatically prompt for CLI input.

--cli-binary-format base64|raw-in-base64-out
    How binary (blob) input is interpreted: as base64 text (the v2 default) or as raw bytes.

--cli-connect-timeout
    Connection timeout in seconds.

--cli-read-timeout
    Read timeout in seconds.

--debug
    Enable debug logging.

--endpoint-url
    Override service endpoint URL.

--max-items
    Maximum total number of items to return (paginated subcommands only).

--no-cli-pager
    Disable the output pager.

--no-paginate
    Disable automatic pagination.

--no-sign-request
    Do not sign requests.

--output json|yaml|yaml-stream|text|table
    Output format.

--page-size
    Number of items to request per service call (paginated subcommands only).

--profile
    Named profile from credentials file.

--region
    AWS region (e.g., us-east-1).

--region-set
    List of regions to try.

--no-verify-ssl
    Disable SSL certificate verification.
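The global options above can be combined with any subcommand. The minimal sketch below assumes a POSIX shell and configured credentials. It uses --region and --output from the list, plus --query, the CLI's standard JMESPath filtering option, which is not listed above. Any extra options, such as --profile dev, are passed through unchanged. The helper name glue_job_names is hypothetical.

```shell
#!/bin/sh
# glue_job_names REGION [EXTRA_AWS_OPTIONS...]
# Hypothetical helper: prints one Glue job name per line for REGION.
# Extra arguments (for example: --profile dev) are passed to the CLI as-is.
glue_job_names() {
    gjn_region="$1"
    shift
    # Text output separates names with tabs; tr puts each on its own line,
    # and sed drops any empty lines.
    aws glue list-jobs --region "$gjn_region" "$@" \
        --query 'JobNames[]' --output text |
    tr '\t' '\n' | sed '/^$/d'
}
```

For example, glue_job_names eu-west-1 --profile dev lists the jobs visible to the dev profile in eu-west-1.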

DESCRIPTION

The aws glue command is a subcommand of the AWS Command Line Interface (CLI) used to manage AWS Glue, a serverless data integration service for ETL (extract, transform, load) workloads. It enables automation of data cataloging, job orchestration, crawling, and schema discovery across data stores like Amazon S3, RDS, DynamoDB, and Redshift.

AWS Glue crawlers infer schemas and record them as tables in the AWS Glue Data Catalog, which makes the data available for querying with Amazon Athena or Amazon Redshift Spectrum. The CLI supports creating and managing jobs (Python or Spark scripts), crawlers (schema inference), triggers, workflows, databases, tables, partitions, classifiers, and development endpoints.
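After a crawler is started with aws glue start-crawler --name [crawler_name], aws glue get-crawler reports both its current State (READY, RUNNING, or STOPPING) and the status of its most recent crawl. The minimal sketch below assumes a POSIX shell and configured credentials, and the helper name glue_crawler_ok is hypothetical. It treats a crawler as healthy only when it is idle and its last crawl succeeded.

```shell
#!/bin/sh
# glue_crawler_ok CRAWLER_NAME
# Hypothetical helper: prints STATE<TAB>LAST_CRAWL_STATUS and returns 0 only
# if the crawler is idle (READY) and its most recent crawl SUCCEEDED.
glue_crawler_ok() {
    gco_out=$(aws glue get-crawler --name "$1" \
        --query 'Crawler.[State, LastCrawl.Status]' --output text) || return 1
    echo "$gco_out"
    # A crawler that has never run shows None for the last crawl status.
    if [ "$gco_out" = "$(printf 'READY\tSUCCEEDED')" ]; then
        return 0
    fi
    return 1
}
```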

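Batch subcommands (for example batch-get-jobs and batch-delete-table) act on several resources in a single call. get-job-runs returns a job's run history, and Data Catalog resources integrate with AWS Lake Formation for access governance. Output can be JSON, YAML, text, or a table, and paginated results are fetched automatically unless --no-paginate is given.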
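One use of run history is to find the runs that did not succeed. The minimal sketch below filters aws glue get-job-runs output for runs that ended in FAILED, ERROR, or TIMEOUT. It assumes a POSIX shell with awk and configured credentials. The helper name glue_failed_runs and the chosen set of failure states are illustrative assumptions.

```shell
#!/bin/sh
# glue_failed_runs JOB_NAME
# Hypothetical helper: prints RUN_ID<TAB>STATE for each recorded run of
# JOB_NAME that ended in FAILED, ERROR, or TIMEOUT.
glue_failed_runs() {
    # Each [Id, JobRunState] pair is printed on its own tab-separated line.
    aws glue get-job-runs --job-name "$1" \
        --query 'JobRuns[].[Id, JobRunState]' --output text |
    awk -F '\t' '$2 == "FAILED" || $2 == "ERROR" || $2 == "TIMEOUT" { print $1 "\t" $2 }'
}
```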

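Prerequisites: install AWS CLI v2 (on x86_64 Linux: curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"; unzip awscliv2.zip; sudo ./aws/install), configure credentials with aws configure, and grant the calling identity the Glue permissions it needs. Jobs, crawlers, and development endpoints also need an IAM role that AWS Glue can assume; the AWS managed policy AWSGlueServiceRole is intended for that role. Glue subcommands have no --dry-run option. To inspect a subcommand's input parameters without sending a request, use --generate-cli-skeleton instead.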
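Before running Glue commands, a quick preflight can confirm both prerequisites: that the installed CLI is version 2 and that credentials resolve. The minimal sketch below assumes a POSIX shell. It relies on aws --version printing a string that starts with aws-cli/2. and on aws sts get-caller-identity failing when no credentials are configured. The helper name check_aws_cli_ready is hypothetical.

```shell
#!/bin/sh
# check_aws_cli_ready
# Hypothetical helper: confirms the CLI is version 2 and that credentials
# resolve, printing the caller's ARN on success.
check_aws_cli_ready() {
    # v2 prints something like "aws-cli/2.x.y Python/... ...".
    case "$(aws --version 2>&1)" in
        aws-cli/2.*) ;;
        *) echo "AWS CLI v2 not found" >&2; return 1 ;;
    esac
    # Fails (non-zero exit) if no usable credentials are configured.
    aws sts get-caller-identity --query 'Arn' --output text
}
```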

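Common use cases include ETL pipelines, data lake construction, and ML feature stores. Glue is serverless, so there is no infrastructure to manage. Jobs and crawlers are billed by the DPU-hour, where a DPU (data processing unit) provides 4 vCPUs and 16 GB of memory.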
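Because billing is by the DPU-hour, a run's consumption grows with both its capacity and its duration: DPU-hours = workers × DPUs per worker × execution seconds / 3600. The rough sketch below applies that formula to a single run using aws glue get-job-run. It assumes a POSIX shell with awk, configured credentials, and the per-worker DPU counts of the G.* worker types (for example, G.1X = 1 DPU and G.2X = 2 DPUs). It ignores billing minimums and per-region prices. It also does not handle runs sized by MaxCapacity instead of workers, such as Python shell jobs. The helper name glue_run_dpu_hours is hypothetical.

```shell
#!/bin/sh
# glue_run_dpu_hours JOB_NAME RUN_ID
# Hypothetical helper: rough DPU-hour estimate for one job run,
#   workers * DPUs per worker * execution seconds / 3600.
# Ignores billing minimums; fails for runs without a G.* worker type.
glue_run_dpu_hours() {
    aws glue get-job-run --job-name "$1" --run-id "$2" \
        --query 'JobRun.[ExecutionTime, NumberOfWorkers, WorkerType]' \
        --output text |
    awk -F '\t' '
        {
            # DPUs per worker for the G.* worker types.
            if ($3 == "G.025X") dpu = 0.25
            else if ($3 == "G.1X") dpu = 1
            else if ($3 == "G.2X") dpu = 2
            else if ($3 == "G.4X") dpu = 4
            else if ($3 == "G.8X") dpu = 8
            else { print "unsupported worker type: " $3 > "/dev/stderr"; exit 1 }
            printf "%.2f\n", $2 * dpu * $1 / 3600
        }'
}
```

For example, a 30-minute run (1800 seconds) on 10 G.1X workers comes to 10 × 1 × 1800 / 3600 = 5.00 DPU-hours.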
