aws-glue
Manage AWS Glue resources
TLDR
List jobs
Start a job
Start running a workflow
List triggers
Start a trigger
Create a dev endpoint
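The tasks listed above correspond to Glue subcommands. The sketch below wraps each one in a small shell function so the mapping is explicit. The function names and the positional arguments (job, workflow, trigger, and endpoint names, plus the role ARN) are illustrative assumptions, not part of the CLI.

```shell
# Hypothetical wrappers, one per task in the list above.
# "$1" and "$2" are caller-supplied resource names, not real defaults.

list_jobs()      { aws glue list-jobs; }
start_job()      { aws glue start-job-run --job-name "$1"; }
start_workflow() { aws glue start-workflow-run --name "$1"; }
list_triggers()  { aws glue list-triggers; }
start_trigger()  { aws glue start-trigger --name "$1"; }

# $1 = endpoint name, $2 = ARN of the IAM role the endpoint assumes
create_dev_endpoint() {
  aws glue create-dev-endpoint --endpoint-name "$1" --role-arn "$2"
}
```

For instance, start_job my-etl runs aws glue start-job-run --job-name my-etl.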
SYNOPSIS
aws glue [global-options] subcommand [subcommand-options]
PARAMETERS
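AWS_ACCESS_KEY_ID
AWS access key ID. Set as an environment variable; the CLI has no command-line flag for credentials.
AWS_SECRET_ACCESS_KEY
AWS secret access key (environment variable).
AWS_SESSION_TOKEN
AWS session token for temporary credentials (environment variable).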
--ca-bundle
CA bundle for SSL verification.
--cli-auto-prompt
Automatically prompt for CLI input.
--cli-binary-format base64|raw-in-base64-out
Encoding used for binary (blob) parameters.
--cli-connect-timeout
Connection timeout in seconds.
--cli-read-timeout
Read timeout in seconds.
--debug
Enable debug logging.
--endpoint-url
Override service endpoint URL.
--max-items
Maximum total number of items to return (paginated subcommands only).
--no-cli-pager
Disable the client-side output pager.
--no-paginate
Disable automatic pagination.
--no-sign-request
Do not sign requests.
--output json|text|table|yaml|yaml-stream
Output format.
--page-size
Number of items requested per API call (paginated subcommands only).
--profile
Named profile from credentials file.
--region
AWS region (e.g., us-east-1).
--no-verify-ssl
Disable SSL certificate verification.
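Several of these options are commonly combined in a single call. A minimal sketch, assuming a hypothetical helper named list_jobs_in that takes a profile name, a region, and an optional item cap:

```shell
# Hypothetical helper: list Glue jobs under a given profile and region.
# $1 = named profile, $2 = region, $3 = item cap (defaults to 20)
list_jobs_in() {
  aws glue list-jobs \
    --profile "$1" \
    --region "$2" \
    --max-items "${3:-20}" \
    --output json \
    --no-cli-pager
}
```

Calling list_jobs_in dev us-east-1 10 would return at most ten job names as JSON without opening a pager.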
DESCRIPTION
The aws glue command is a subcommand of the AWS Command Line Interface (CLI) used to manage AWS Glue, a serverless data integration service for ETL (extract, transform, load) workloads. It enables automation of data cataloging, job orchestration, crawling, and schema discovery across data stores like Amazon S3, RDS, DynamoDB, and Redshift.
AWS Glue automatically discovers, catalogs, and cleans data, making it available for querying with Athena or analysis in Redshift Spectrum. The CLI supports creating and managing jobs (Python/Spark scripts), crawlers (schema inference), triggers, workflows, databases, tables, partitions, classifiers, and development endpoints.
Operations include batch actions for efficiency, job monitoring with run history, and integration with Lake Formation for governance. Output formats include JSON, text, or table; pagination is automatic.
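Run history for a job can be inspected with get-job-runs. The sketch below assumes a hypothetical helper name and a JMESPath --query expression chosen for illustration; Id, JobRunState, and StartedOn are fields of the JobRun structure in the response.

```shell
# Hypothetical helper: show the most recent runs of a job as a table.
# $1 = job name, $2 = number of runs to show (defaults to 5)
recent_job_runs() {
  aws glue get-job-runs \
    --job-name "$1" \
    --max-items "${2:-5}" \
    --query 'JobRuns[].[Id,JobRunState,StartedOn]' \
    --output table
}
```

Switching --output to json or text gives the same data in a form that is easier to parse in scripts.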
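Prerequisites: Install AWS CLI v2 (curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"; unzip awscliv2.zip; sudo ./aws/install), configure credentials (aws configure), and grant the calling identity the Glue permissions it needs. Jobs and crawlers also need an IAM role that Glue can assume, typically with the AWSGlueServiceRole managed policy attached. The Glue subcommands do not offer a --dry-run option, so test against non-production resources.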
Common use cases: ETL pipelines, data lake building, ML feature stores. Glue scales serverlessly and bills per DPU-hour.
CAVEATS
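Requires the AWS CLI to be installed and configured. IAM permissions are essential (e.g., glue:CreateJob). API rate limits apply, so heavily scripted use can hit throttling errors. The Glue subcommands have no --dry-run option. The command is designed for scripting rather than interactive use.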
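One way to soften throttling in scripts is the CLI's built-in retry handling, controlled by the AWS_RETRY_MODE and AWS_MAX_ATTEMPTS environment variables. A sketch, assuming a hypothetical wrapper named with_retries and a hypothetical GLUE_MAX_ATTEMPTS override variable:

```shell
# Hypothetical wrapper: run any command with the CLI retry settings applied.
# The body is a subshell, so the exports do not leak into the caller.
with_retries() (
  export AWS_RETRY_MODE=standard
  export AWS_MAX_ATTEMPTS="${GLUE_MAX_ATTEMPTS:-5}"
  "$@"
)
```

For example, with_retries aws glue list-jobs retries throttled requests up to the configured number of attempts.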
COMMON SUBCOMMANDS
create-job, get-job, start-job-run, create-crawler, start-crawler, get-database, batch-create-partition, list-jobs.
EXAMPLE USAGE
aws glue create-job --name my-etl --role arn:aws:iam::123456789012:role/GlueRole --command Name=glueetl,ScriptLocation=s3://bucket/script.py
aws glue start-job-run --job-name my-etl
aws glue get-crawler --name my-crawler
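start-job-run returns as soon as the run is queued, so scripts usually poll get-job-run for the outcome. A minimal polling sketch; the function name, its arguments, and the 30-second default interval are assumptions made for illustration:

```shell
# Hypothetical helper: block until a job run reaches a terminal state.
# $1 = job name, $2 = run ID, $3 = poll interval in seconds (defaults to 30)
# Prints the final state; returns 0 on success, 1 on failure, 2 if the CLI call fails.
# Variables are not declared local, to stay portable across POSIX shells.
wait_for_job_run() {
  job_name=$1
  run_id=$2
  interval=${3:-30}
  while :; do
    state=$(aws glue get-job-run --job-name "$job_name" --run-id "$run_id" \
      --query 'JobRun.JobRunState' --output text) || return 2
    case $state in
      SUCCEEDED) echo "$state"; return 0 ;;
      FAILED|STOPPED|TIMEOUT|ERROR|EXPIRED) echo "$state"; return 1 ;;
      *) sleep "$interval" ;;   # still starting, running, or stopping
    esac
  done
}
```

The run ID can be captured when the run is started, for example run_id=$(aws glue start-job-run --job-name my-etl --query JobRunId --output text), and then passed to wait_for_job_run my-etl "$run_id".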
HISTORY
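AWS Glue became generally available in August 2017, and AWS CLI v1 supported it from the start. AWS CLI v2, generally available since 2020, changed the default handling of binary parameters to base64 (see --cli-binary-format) and added features such as auto-prompt and a client-side pager. The command set keeps growing with the service, for example with Glue 4.0 (late 2022, Spark 3.3) and AWS Glue for Ray.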


