AWS.Glue (aws-elixir v1.0.4)
Glue
Defines the public endpoint for the Glue service.
Summary
Functions
Creates one or more partitions in a batch operation.
Deletes a list of connection definitions from the Data Catalog.
Deletes one or more partitions in a batch operation.
Deletes multiple tables at once.
Deletes a specified batch of versions of a table.
Retrieves information about a list of blueprints.
Returns a list of resource metadata for a given list of crawler names.
Retrieves the details for the custom patterns specified by a list of names.
Retrieves a list of data quality results for the specified result IDs.
Returns a list of resource metadata for a given list of development endpoint names.
Returns a list of resource metadata for a given list of job names.
Retrieves partitions in a batch request.
Returns the configuration for the specified table optimizers.
Returns a list of resource metadata for a given list of trigger names.
Returns a list of resource metadata for a given list of workflow names.
Annotate datapoints over time for a specific data quality statistic.
Stops one or more job runs for a specified job definition.
Updates one or more partitions in a batch operation.
Cancels the specified recommendation run that was being used to generate rules.
Cancels a run where a ruleset is being evaluated against a data source.
Cancels (stops) a task run.
Cancels the statement.
Validates the supplied schema.
Registers a blueprint with Glue.
Creates a new catalog in the Glue Data Catalog.
Creates a classifier in the user's account.
Creates settings for a column statistics task.
Creates a connection definition in the Data Catalog.
Creates a new crawler with specified targets, role, configuration, and optional schedule.
Creates a custom pattern that is used to detect sensitive data across the columns and rows of your structured data.
Creates a data quality ruleset with DQDL rules applied to a specified Glue table.
Creates a new database in a Data Catalog.
Creates a new development endpoint.
Creates a Zero-ETL integration in the caller's account between two resources with Amazon Resource Names (ARNs): the SourceArn and TargetArn.
This API can be used for setting up the ResourceProperty of the Glue connection (for the source) or Glue database ARN (for the target).
This API is used to provide optional override properties for the tables that need to be replicated.
Creates a new job definition.
Creates a Glue machine learning transform.
Creates a new partition.
Creates a specified partition index in an existing table.
Creates a new registry which may be used to hold a collection of schemas.
Creates a new schema set and registers the schema definition.
Transforms a directed acyclic graph (DAG) into code.
Creates a new security configuration.
Creates a new session.
Creates a new table definition in the Data Catalog.
Creates a new table optimizer for a specific function.
Creates a new trigger.
Creates a Glue usage profile.
Creates a new function definition in the Data Catalog.
Creates a new workflow.
Deletes an existing blueprint.
Removes the specified catalog from the Glue Data Catalog.
Removes a classifier from the Data Catalog.
Delete the partition column statistics of a column.
Deletes table statistics of columns.
Deletes settings for a column statistics task.
Deletes a connection from the Data Catalog.
Removes a specified crawler from the Glue Data Catalog, unless the crawler state is RUNNING.
Deletes a custom pattern by specifying its name.
Deletes a data quality ruleset.
Removes a specified database from a Data Catalog.
Deletes a specified development endpoint.
Deletes the specified Zero-ETL integration.
Deletes the table properties that have been created for the tables that need to be replicated.
Deletes a specified job definition.
Deletes a Glue machine learning transform.
Deletes a specified partition.
Deletes a specified partition index from an existing table.
Delete the entire registry including schema and all of its versions.
Deletes a specified policy.
Deletes the entire schema set, including the schema set and all of its versions.
Remove versions from the specified schema.
Deletes a specified security configuration.
Deletes the session.
Removes a table definition from the Data Catalog.
Deletes an optimizer and all associated metadata for a table.
Deletes a specified version of a table.
Deletes a specified trigger.
Deletes the specified Glue usage profile.
Deletes an existing function definition from the Data Catalog.
Deletes a workflow.
The DescribeConnectionType API provides full details of the supported options for a given connection type in Glue.
Provides details regarding the entity used with the connection type, with a description of the data model for each field in the selected entity.
Returns a list of inbound integrations for the specified integration.
The API is used to retrieve a list of integrations.
Retrieves the details of a blueprint.
Retrieves the details of a blueprint run.
Retrieves the details of blueprint runs for a specified blueprint.
The name of the Catalog to retrieve.
Retrieves the status of a migration operation.
Retrieves all catalogs defined in a catalog in the Glue Data Catalog.
Retrieve a classifier by name.
Lists all classifier objects in the Data Catalog.
Retrieves partition statistics of columns.
Retrieves table statistics of columns.
Get the associated metadata/information for a task run, given a task run ID.
Retrieves information about all runs associated with the specified table.
Gets settings for a column statistics task.
Retrieves a connection definition from the Data Catalog.
Retrieves a list of connection definitions from the Data Catalog.
Retrieves metadata for a specified crawler.
Retrieves metrics about specified crawlers.
Retrieves metadata for all crawlers defined in the customer account.
Retrieves the details of a custom pattern by specifying its name.
Retrieves the security configuration for a specified catalog.
Retrieve the training status of the model along with more information (CompletedOn, StartedOn, FailureReason).
Retrieve a statistic's predictions for a given Profile ID.
Retrieves the result of a data quality rule evaluation.
Gets the specified recommendation run that was used to generate rules.
Returns an existing ruleset by identifier or name.
Retrieves a specific run where a ruleset is evaluated against a data source.
Retrieves the definition of a specified database.
Retrieves all databases defined in a given Data Catalog.
Transforms a Python script into a directed acyclic graph (DAG).
Retrieves information about a specified development endpoint.
Retrieves all the development endpoints in this Amazon Web Services account.
This API is used to query preview data from a given connection type or from a native Amazon S3 based Glue Data Catalog.
This API is used for fetching the ResourceProperty of the Glue connection (for the source) or Glue database ARN (for the target).
This API is used to retrieve optional override properties for the tables that need to be replicated.
Retrieves an existing job definition.
Returns information on a job bookmark entry.
Retrieves the metadata for a given job run.
Retrieves metadata for all runs of a given job definition.
Retrieves all current job definitions.
Creates mappings.
Gets details for a specific task run on a machine learning transform.
Gets a list of runs for a machine learning transform.
Gets a Glue machine learning transform artifact and all its corresponding metadata.
Gets a sortable, filterable list of existing Glue machine learning transforms.
Retrieves information about a specified partition.
Retrieves the partition indexes associated with a table.
Retrieves information about the partitions in a table.
Gets code to perform a specified mapping.
Describes the specified registry in detail.
Retrieves the resource policies set on individual resources by Resource Access Manager during cross-account permission grants.
Retrieves a specified resource policy.
Describes the specified schema in detail.
Retrieves a schema by the SchemaDefinition.
Get the specified schema by its unique ID assigned when a version of the schema is created or registered.
Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry.
Retrieves a specified security configuration.
Retrieves a list of all security configurations.
Retrieves the session.
Retrieves the statement.
Retrieves the Table definition in a Data Catalog for a specified table.
Returns the configuration of all optimizers associated with a specified table.
Retrieves a specified version of a table.
Retrieves a list of strings that identify available versions of a specified table.
Retrieves the definitions of some or all of the tables in a given Database.
Retrieves a list of tags associated with a resource.
Retrieves the definition of a trigger.
Gets all the triggers associated with a job.
Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
Allows a third-party analytical engine to retrieve unfiltered table metadata from the Data Catalog.
Retrieves information about the specified Glue usage profile.
Retrieves a specified function definition from the Data Catalog.
Retrieves multiple function definitions from the Data Catalog.
Retrieves resource metadata for a workflow.
Retrieves the metadata for a given workflow run.
Retrieves the workflow run properties which were set during the run.
Retrieves metadata for all runs of a given workflow.
Imports an existing Amazon Athena Data Catalog to Glue.
Lists all the blueprint names in an account.
List all task runs for a particular account.
The ListConnectionTypes API provides a discovery mechanism to learn available connection types in Glue.
Retrieves the names of all crawler resources in this Amazon Web Services account, or the resources with the specified tag.
Returns all the crawls of a specified crawler.
Lists all the custom patterns that have been created.
Returns all data quality execution results for your account.
Lists the recommendation runs meeting the filter criteria.
Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source.
Returns a paginated list of rulesets for the specified list of Glue tables.
Retrieve annotations for a data quality statistic.
Retrieves a list of data quality statistics.
Retrieves the names of all DevEndpoint resources in this Amazon Web Services account, or the resources with the specified tag.
Returns the available entities supported by the connection type.
Retrieves the names of all job resources in this Amazon Web Services account, or the resources with the specified tag.
Retrieves a sortable, filterable list of existing Glue machine learning transforms in this Amazon Web Services account, or the resources with the specified tag.
Returns a list of registries that you have created, with minimal registry information.
Returns a list of schema versions that you have created, with minimal information.
Returns a list of schemas with minimal details.
Retrieve a list of sessions.
Lists statements for the session.
Lists the history of previous optimizer runs for a specific table.
Retrieves the names of all trigger resources in this Amazon Web Services account, or the resources with the specified tag.
List all the Glue usage profiles.
Lists names of workflows created in the account.
Modifies a Zero-ETL integration in the caller's account.
Sets the security configuration for a specified catalog.
Annotate all datapoints for a Profile.
Sets the Data Catalog resource policy for access control.
Puts the metadata key value pair for a specified schema version ID.
Puts the specified workflow run properties for the given workflow run.
Queries for the schema version metadata information.
Adds a new version to the existing schema.
Removes a key value pair from the schema version metadata for the specified schema version ID.
Resets a bookmark entry.
Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run.
Executes the statement.
Searches a set of tables based on properties in the table metadata as well as on the parent database.
Starts a new run of the specified blueprint.
Starts a column statistics task run, for a specified table and columns.
Starts a column statistics task run schedule.
Starts a crawl using the specified crawler, regardless of what is scheduled.
Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED.
Starts a recommendation run that is used to generate rules when you don't know what rules to write.
Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table).
Begins an asynchronous task to export all labeled data for a particular transform.
Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality.
Starts a job run using a job definition.
Starts a task to estimate the quality of the transform.
Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels.
Starts an existing trigger.
Starts a new run of the specified workflow.
Stops a task run for the specified table.
Stops a column statistics task run schedule.
If the specified crawler is running, stops the crawl.
Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running.
Stops the session.
Stops a specified trigger.
Stops the execution of the specified workflow run.
Adds tags to a resource.
Tests a connection to a service to validate the service credentials that you provide.
Removes tags from a resource.
Updates a registered blueprint.
Updates an existing catalog's properties in the Glue Data Catalog.
Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present).
Creates or updates partition statistics of columns.
Creates or updates table statistics of columns.
Updates settings for a column statistics task.
Updates a connection definition in the Data Catalog.
Updates a crawler.
Updates the schedule of a crawler using a cron expression.
Updates the specified data quality ruleset.
Updates an existing database definition in a Data Catalog.
Updates a specified development endpoint.
This API can be used for updating the ResourceProperty of the Glue connection (for the source) or Glue database ARN (for the target).
This API is used to provide optional override properties for the tables that need to be replicated.
Updates an existing job definition.
Synchronizes a job from the source control repository.
Updates an existing machine learning transform.
Updates a partition.
Updates an existing registry which is used to hold a collection of schemas.
Updates the description, compatibility setting, or version checkpoint for a schema set.
Synchronizes a job to the source control repository.
Updates a metadata table in the Data Catalog.
Updates the configuration for an existing table optimizer.
Updates a trigger definition.
Updates a Glue usage profile.
Updates an existing function definition in the Data Catalog.
Updates an existing workflow.
Functions
Creates one or more partitions in a batch operation.
Deletes a list of connection definitions from the Data Catalog.
Deletes one or more partitions in a batch operation.
Deletes multiple tables at once.
After completing this operation, you no longer have access to the table versions and partitions that belong to the deleted table. Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.
To ensure the immediate deletion of all related resources, before calling BatchDeleteTable, use DeleteTableVersion or BatchDeleteTableVersion, and DeletePartition or BatchDeletePartition, to delete any resources that belong to the table.
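The cleanup order described above can be sketched as follows. This is a minimal illustration, not the library's own helper: the module GlueExamples.TableCleanup and its function names are hypothetical, the request keys (DatabaseName, TableName, VersionIds, PartitionsToDelete, TablesToDelete) are taken from the Glue API as I understand it, and the {:ok, result, response} return shape is assumed from how aws-elixir functions are documented elsewhere.

```elixir
defmodule GlueExamples.TableCleanup do
  # Hypothetical helper module (not part of aws-elixir). It builds request
  # maps and runs the cleanup calls in the order recommended above.

  # Request map for BatchDeleteTableVersion.
  def versions_input(database, table, version_ids) do
    %{"DatabaseName" => database, "TableName" => table, "VersionIds" => version_ids}
  end

  # Request map for BatchDeletePartition; each partition is identified by
  # its list of partition values.
  def partitions_input(database, table, partition_values) do
    %{
      "DatabaseName" => database,
      "TableName" => table,
      "PartitionsToDelete" => Enum.map(partition_values, fn values -> %{"Values" => values} end)
    }
  end

  # Request map for BatchDeleteTable.
  def tables_input(database, tables) do
    %{"DatabaseName" => database, "TablesToDelete" => tables}
  end

  # Deletes versions and partitions first, then the table itself.
  # Returns the first result that is not {:ok, _, _}.
  def purge_table(client, database, table, version_ids, partition_values) do
    with {:ok, _, _} <-
           AWS.Glue.batch_delete_table_version(client, versions_input(database, table, version_ids)),
         {:ok, _, _} <-
           AWS.Glue.batch_delete_partition(client, partitions_input(database, table, partition_values)),
         {:ok, _, _} <- AWS.Glue.batch_delete_table(client, tables_input(database, [table])) do
      :ok
    end
  end
end
```

A batch call can succeed overall while reporting per-item failures in its result, so a production version would probably also inspect each result map before moving to the next step.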
Deletes a specified batch of versions of a table.
Retrieves information about a list of blueprints.
Returns a list of resource metadata for a given list of crawler names.
After calling the ListCrawlers operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
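As a rough sketch of the list-then-fetch pattern described above: the module GlueExamples.CrawlerLookup is hypothetical, the CrawlerNames and Crawlers keys are assumed from the Glue API, and only the first page of ListCrawlers results is handled.

```elixir
defmodule GlueExamples.CrawlerLookup do
  # Hypothetical helper module, not part of aws-elixir.

  # Builds the BatchGetCrawlers request from a ListCrawlers result map.
  # A nil or empty result yields an empty name list.
  def batch_input(list_result) do
    %{"CrawlerNames" => Map.get(list_result || %{}, "CrawlerNames", [])}
  end

  # Lists crawler names (first page only), then fetches their metadata.
  # Skips the batch call when there is nothing to look up.
  def crawlers(client) do
    with {:ok, listed, _response} <- AWS.Glue.list_crawlers(client, %{}) do
      case batch_input(listed) do
        %{"CrawlerNames" => []} ->
          {:ok, []}

        input ->
          with {:ok, result, _response} <- AWS.Glue.batch_get_crawlers(client, input) do
            {:ok, Map.get(result, "Crawlers", [])}
          end
      end
    end
  end
end
```

The same shape should carry over to the other list and batch-get pairs in this module (development endpoints, jobs, triggers, workflows), with the request and response keys changed accordingly.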
Retrieves the details for the custom patterns specified by a list of names.
Retrieves a list of data quality results for the specified result IDs.
Returns a list of resource metadata for a given list of development endpoint names.
After calling the ListDevEndpoints operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
Returns a list of resource metadata for a given list of job names.
After calling the ListJobs operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
Retrieves partitions in a batch request.
Returns the configuration for the specified table optimizers.
Returns a list of resource metadata for a given list of trigger names.
After calling the ListTriggers operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
Returns a list of resource metadata for a given list of workflow names.
After calling the ListWorkflows operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
batch_put_data_quality_statistic_annotation(client, input, options \\ [])
Annotate datapoints over time for a specific data quality statistic.
Stops one or more job runs for a specified job definition.
Updates one or more partitions in a batch operation.
cancel_data_quality_rule_recommendation_run(client, input, options \\ [])
Cancels the specified recommendation run that was being used to generate rules.
cancel_data_quality_ruleset_evaluation_run(client, input, options \\ [])
Cancels a run where a ruleset is being evaluated against a data source.
Cancels (stops) a task run.
Machine learning task runs are asynchronous tasks that Glue runs on your behalf as part of various machine learning workflows. You can cancel a machine learning task run at any time by calling CancelMLTaskRun with a task run's parent transform's TransformID and the task run's TaskRunId.
Cancels the statement.
Validates the supplied schema.
This call has no side effects; it simply validates the supplied schema using DataFormat as the format. Since it does not take a schema set name, no compatibility checks are performed.
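A minimal sketch of such a validity check follows. The module GlueExamples.SchemaCheck is hypothetical; the DataFormat values and the SchemaDefinition and Valid keys are assumed from the Glue API rather than stated on this page.

```elixir
defmodule GlueExamples.SchemaCheck do
  # Hypothetical helper module, not part of aws-elixir.

  # Data formats assumed to be accepted by the service.
  @formats ["AVRO", "JSON", "PROTOBUF"]

  # Builds the request map; raises FunctionClauseError for other formats.
  def validity_input(data_format, definition) when data_format in @formats do
    %{"DataFormat" => data_format, "SchemaDefinition" => definition}
  end

  # Returns {:ok, true | false} when the service reports validity,
  # otherwise passes the library's result through unchanged.
  def valid?(client, data_format, definition) do
    case AWS.Glue.check_schema_version_validity(client, validity_input(data_format, definition)) do
      {:ok, %{"Valid" => valid}, _response} -> {:ok, valid}
      other -> other
    end
  end
end
```

Because the call only validates, it can be run before CreateSchema or RegisterSchemaVersion without changing anything in the registry.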
Registers a blueprint with Glue.
Creates a new catalog in the Glue Data Catalog.
Creates a classifier in the user's account.
This can be a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field of the request is present.
create_column_statistics_task_settings(client, input, options \\ [])
Creates settings for a column statistics task.
Creates a connection definition in the Data Catalog.
Connections used for creating federated resources require the IAM glue:PassConnection permission.
Creates a new crawler with specified targets, role, configuration, and optional schedule.
At least one crawl target must be specified, in the s3Targets field, the jdbcTargets field, or the DynamoDBTargets field.
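The target requirement above can be checked locally before calling create_crawler. In this sketch the module GlueExamples.CrawlerInput is hypothetical, and the request keys (Name, Role, DatabaseName, Targets, and the capitalised S3Targets, JdbcTargets, DynamoDBTargets) are assumed from the Glue API. The check covers only the three target kinds named in the text.

```elixir
defmodule GlueExamples.CrawlerInput do
  # Hypothetical helper module, not part of aws-elixir.

  # Target fields named in the description above (assumed key spelling).
  @target_keys ["S3Targets", "JdbcTargets", "DynamoDBTargets"]

  # Returns {:ok, request_map} when at least one target list is non-empty,
  # otherwise {:error, :no_crawl_target}.
  def build(name, role, database, targets) do
    if Enum.any?(@target_keys, fn key -> Map.get(targets, key, []) != [] end) do
      {:ok, %{"Name" => name, "Role" => role, "DatabaseName" => database, "Targets" => targets}}
    else
      {:error, :no_crawl_target}
    end
  end
end
```

A caller could then pass the map from {:ok, input} to AWS.Glue.create_crawler(client, input); the path "s3://bucket/prefix" used in any test data is a placeholder.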
Creates a custom pattern that is used to detect sensitive data across the columns and rows of your structured data.
Each custom pattern you create specifies a regular expression and an optional list of context words. If no context words are passed, only a regular expression is checked.
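A small sketch of a request with and without context words follows. The module GlueExamples.CustomPattern is hypothetical, and the Name, RegexString, and ContextWords keys are assumed from the Glue API.

```elixir
defmodule GlueExamples.CustomPattern do
  # Hypothetical helper module, not part of aws-elixir.

  # Builds the request map. ContextWords is left out entirely when no
  # context words are given, so only the regular expression is checked.
  def input(name, regex, context_words \\ []) do
    base = %{"Name" => name, "RegexString" => regex}

    case context_words do
      [] -> base
      words -> Map.put(base, "ContextWords", words)
    end
  end
end
```

For example, GlueExamples.CustomPattern.input("EMP_ID", "EMP-[0-9]{6}", ["employee"]) produces a map that could be passed to AWS.Glue.create_custom_entity_type(client, input); the pattern and context word here are made up for illustration.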
Creates a data quality ruleset with DQDL rules applied to a specified Glue table.
You create the ruleset using the Data Quality Definition Language (DQDL). For more information, see the Glue developer guide.
Creates a new database in a Data Catalog.
Creates a new development endpoint.
Creates a Zero-ETL integration in the caller's account between two resources with Amazon Resource Names (ARNs): the SourceArn and TargetArn.
create_integration_resource_property(client, input, options \\ [])
This API can be used for setting up the ResourceProperty of the Glue connection (for the source) or Glue database ARN (for the target).
These properties can include the role to access the connection or database. To set both source and target properties, the same API needs to be invoked with the Glue connection ARN as ResourceArn with SourceProcessingProperties and the Glue database ARN as ResourceArn with TargetProcessingProperties respectively.
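The two invocations described above might look like the following sketch. The module GlueExamples.IntegrationProperties is hypothetical, and the nested RoleArn key is an assumption based on the statement that the properties can include the access role; other property fields are omitted.

```elixir
defmodule GlueExamples.IntegrationProperties do
  # Hypothetical helper module, not part of aws-elixir.

  # Source side: ResourceArn is the Glue connection ARN.
  def source_input(connection_arn, role_arn) do
    %{
      "ResourceArn" => connection_arn,
      "SourceProcessingProperties" => %{"RoleArn" => role_arn}
    }
  end

  # Target side: ResourceArn is the Glue database ARN.
  def target_input(database_arn, role_arn) do
    %{
      "ResourceArn" => database_arn,
      "TargetProcessingProperties" => %{"RoleArn" => role_arn}
    }
  end

  # Invokes the same API twice, once per side, stopping at the first
  # result that is not {:ok, _, _}.
  def configure(client, source, target) do
    with {:ok, _, _} <- AWS.Glue.create_integration_resource_property(client, source),
         {:ok, _, _} <- AWS.Glue.create_integration_resource_property(client, target) do
      :ok
    end
  end
end
```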
This API is used to provide optional override properties for the tables that need to be replicated.
These properties can include properties for filtering and partitioning for the source and target tables. To set both source and target properties, the same API needs to be invoked with the Glue connection ARN as ResourceArn with SourceTableConfig, and the Glue database ARN as ResourceArn with TargetTableConfig respectively.
Creates a new job definition.
Creates a Glue machine learning transform.
This operation creates the transform and all the necessary parameters to train it.
Call this operation as the first step in the process of using a machine learning transform (such as the FindMatches transform) for deduplicating data. You can provide an optional Description, in addition to the parameters that you want to use for your algorithm.
You must also specify certain parameters for the tasks that Glue runs on your behalf as part of learning from your data and creating a high-quality machine learning transform. These parameters include Role, and optionally, AllocatedCapacity, Timeout, and MaxRetries. For more information, see Jobs.
Creates a new partition.
Creates a specified partition index in an existing table.
Creates a new registry which may be used to hold a collection of schemas.
Creates a new schema set and registers the schema definition.
Returns an error if the schema set already exists without actually registering the version.
When the schema set is created, a version checkpoint will be set to the first version. Compatibility mode "DISABLED" restricts any additional schema versions from being added after the first schema version. For all other compatibility modes, validation of compatibility settings will be applied only from the second version onwards when the RegisterSchemaVersion API is used.
When this API is called without a RegistryId, this will create an entry for a "default-registry" in the registry database tables, if it is not already present.
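The optional RegistryId and compatibility mode described above can be sketched as a request builder. The module GlueExamples.SchemaInput and its option names are hypothetical; the SchemaName, DataFormat, SchemaDefinition, Compatibility, and RegistryId/RegistryName keys are assumed from the Glue API.

```elixir
defmodule GlueExamples.SchemaInput do
  # Hypothetical helper module, not part of aws-elixir.

  # Builds a CreateSchema request map. Options:
  #   :registry_name  - when omitted, no RegistryId is sent, which per the
  #                     text above results in the "default-registry" being used
  #   :compatibility  - compatibility mode string; omitted when not given
  def create(schema_name, data_format, definition, opts \\ []) do
    %{"SchemaName" => schema_name, "DataFormat" => data_format, "SchemaDefinition" => definition}
    |> put_if_present("Compatibility", Keyword.get(opts, :compatibility))
    |> put_registry(Keyword.get(opts, :registry_name))
  end

  # Adds a key only when a value was supplied.
  defp put_if_present(map, _key, nil), do: map
  defp put_if_present(map, key, value), do: Map.put(map, key, value)

  # Wraps the registry name in the RegistryId structure.
  defp put_registry(map, nil), do: map
  defp put_registry(map, name), do: Map.put(map, "RegistryId", %{"RegistryName" => name})
end
```

The resulting map could be passed to AWS.Glue.create_schema(client, input). Passing compatibility: "DISABLED" would, per the description above, block any schema versions after the first.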
Transforms a directed acyclic graph (DAG) into code.
Creates a new security configuration.
A security configuration is a set of security properties that can be used by Glue. You can use a security configuration to encrypt data at rest. For information about using security configurations in Glue, see Encrypting Data Written by Crawlers, Jobs, and Development Endpoints.
Creates a new session.
Creates a new table definition in the Data Catalog.
Creates a new table optimizer for a specific function.
Creates a new trigger.
Creates a Glue usage profile.
Creates a new function definition in the Data Catalog.
Creates a new workflow.
Deletes an existing blueprint.
Removes the specified catalog from the Glue Data Catalog.
After completing this operation, you no longer have access to the databases, tables (and all table versions and partitions that might belong to the tables) and the user-defined functions in the deleted catalog. Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.
To ensure the immediate deletion of all related resources before calling the DeleteCatalog operation, use DeleteTableVersion (or BatchDeleteTableVersion), DeletePartition (or BatchDeletePartition), DeleteTable (or BatchDeleteTable), DeleteUserDefinedFunction and DeleteDatabase to delete any resources that belong to the catalog.
Removes a classifier from the Data Catalog.
delete_column_statistics_for_partition(client, input, options \\ [])
Delete the partition column statistics of a column.
The Identity and Access Management (IAM) permission required for this operation is DeletePartition.
Deletes table statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is DeleteTable.
delete_column_statistics_task_settings(client, input, options \\ [])
Deletes settings for a column statistics task.
Deletes a connection from the Data Catalog.
Removes a specified crawler from the Glue Data Catalog, unless the crawler state is RUNNING.
Deletes a custom pattern by specifying its name.
Deletes a data quality ruleset.
Removes a specified database from a Data Catalog.
After completing this operation, you no longer have access to the tables (and all table versions and partitions that might belong to the tables) and the user-defined functions in the deleted database. Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.
To ensure the immediate deletion of all related resources, before calling DeleteDatabase, use DeleteTableVersion or BatchDeleteTableVersion, DeletePartition or BatchDeletePartition, DeleteUserDefinedFunction, and DeleteTable or BatchDeleteTable, to delete any resources that belong to the database.
Deletes a specified development endpoint.
Deletes the specified Zero-ETL integration.
Deletes the table properties that have been created for the tables that need to be replicated.
Deletes a specified job definition.
If the job definition is not found, no exception is thrown.
Deletes a Glue machine learning transform.
Machine learning transforms are a special type of transform that use machine learning to learn the details of the transformation to be performed by learning from examples provided by humans. These transformations are then saved by Glue. If you no longer need a transform, you can delete it by calling DeleteMLTransforms. However, any Glue jobs that still reference the deleted transform will no longer succeed.
Deletes a specified partition.
Deletes a specified partition index from an existing table.
Delete the entire registry including schema and all of its versions.
To get the status of the delete operation, you can call the GetRegistry API after the asynchronous call. Deleting a registry will deactivate all online operations for the registry such as the UpdateRegistry, CreateSchema, UpdateSchema, and RegisterSchemaVersion APIs.
Deletes a specified policy.
Deletes the entire schema set, including the schema set and all of its versions.
To get the status of the delete operation, you can call the GetSchema API after the asynchronous call. Deleting a registry will deactivate all online operations for the schema, such as the GetSchemaByDefinition and RegisterSchemaVersion APIs.
Remove versions from the specified schema.
A version number or range may be supplied. If the compatibility mode forbids deleting a version that is necessary, such as BACKWARDS_FULL, an error is returned. Calling the GetSchemaVersions API after this call will list the status of the deleted versions.
When the range of version numbers contains a checkpointed version, the API will return a 409 conflict and will not proceed with the deletion. You have to remove the checkpoint first using the DeleteSchemaCheckpoint API before using this API.
You cannot use the DeleteSchemaVersions API to delete the first schema version in the schema set. The first schema version can only be deleted by the DeleteSchema API. This operation will also delete the attached SchemaVersionMetadata under the schema versions. Hard deletes will be enforced on the database.
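The version-range rules above can be sketched as a small request builder. The module GlueExamples.SchemaVersionDeletion is hypothetical; the "N" and "N-M" string formats for Versions and the SchemaId structure are assumed from the Glue API, and the sketch assumes the first schema version is numbered 1.

```elixir
defmodule GlueExamples.SchemaVersionDeletion do
  # Hypothetical helper module, not part of aws-elixir.

  # Formats a single version ("3") or an inclusive range ("2-5").
  # Rejects anything that includes version 1 (assumed to be the first
  # schema version, which only DeleteSchema can remove).
  def versions(first, last)
      when is_integer(first) and is_integer(last) and first >= 2 and last >= first do
    if first == last do
      {:ok, Integer.to_string(first)}
    else
      {:ok, "#{first}-#{last}"}
    end
  end

  def versions(_first, _last), do: {:error, :invalid_range}

  # Builds the DeleteSchemaVersions request map.
  def input(schema_name, registry_name, versions) do
    %{
      "SchemaId" => %{"SchemaName" => schema_name, "RegistryName" => registry_name},
      "Versions" => versions
    }
  end
end
```

The local check does not know about checkpoints, so a range that contains the checkpointed version would still be rejected by the service with the 409 conflict described above.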
Deletes a specified security configuration.
Deletes the session.
Removes a table definition from the Data Catalog.
After completing this operation, you no longer have access to the table versions and partitions that belong to the deleted table. Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.
To ensure the immediate deletion of all related resources, before calling
DeleteTable
, use DeleteTableVersion
or
BatchDeleteTableVersion
, and DeletePartition
or
BatchDeletePartition
, to delete any resources that belong to the
table.
Deletes an optimizer and all associated metadata for a table.
The optimization will no longer be performed on the table.
Deletes a specified version of a table.
Deletes a specified trigger.
If the trigger is not found, no exception is thrown.
Deletes the specified Glue usage profile.
Deletes an existing function definition from the Data Catalog.
Deletes a workflow.
The DescribeConnectionType API provides full details of the supported options for a given connection type in Glue.
Provides details regarding the entity used with the connection type, with a description of the data model for each field in the selected entity.
The response includes all the fields which make up the entity.
Returns a list of inbound integrations for the specified integration.
The API is used to retrieve a list of integrations.
Retrieves the details of a blueprint.
Retrieves the details of a blueprint run.
Retrieves the details of blueprint runs for a specified blueprint.
The name of the Catalog to retrieve.
This should be all lowercase.
Retrieves the status of a migration operation.
Retrieves all catalogs defined in a catalog in the Glue Data Catalog.
For a Redshift-federated catalog use case, this operation returns the list of catalogs mapped to Redshift databases in the Redshift namespace catalog.
Retrieve a classifier by name.
Lists all classifier objects in the Data Catalog.
Retrieves partition statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is GetPartition.
Retrieves table statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is GetTable.
Get the associated metadata/information for a task run, given a task run ID.
Retrieves information about all runs associated with the specified table.
Gets settings for a column statistics task.
Retrieves a connection definition from the Data Catalog.
Retrieves a list of connection definitions from the Data Catalog.
Retrieves metadata for a specified crawler.
Retrieves metrics about specified crawlers.
Retrieves metadata for all crawlers defined in the customer account.
Retrieves the details of a custom pattern by specifying its name.
get_data_catalog_encryption_settings(client, input, options \\ [])
Retrieves the security configuration for a specified catalog.
Retrieve the training status of the model along with more information (CompletedOn, StartedOn, FailureReason).
Retrieve a statistic's predictions for a given Profile ID.
Retrieves the result of a data quality rule evaluation.
get_data_quality_rule_recommendation_run(client, input, options \\ [])
Gets the specified recommendation run that was used to generate rules.
Returns an existing ruleset by identifier or name.
get_data_quality_ruleset_evaluation_run(client, input, options \\ [])
Retrieves a specific run where a ruleset is evaluated against a data source.
Retrieves the definition of a specified database.
Retrieves all databases defined in a given Data Catalog.
Transforms a Python script into a directed acyclic graph (DAG).
Retrieves information about a specified development endpoint.
When you create a development endpoint in a virtual private cloud (VPC), Glue returns only a private IP address, and the public IP address field is not populated. When you create a non-VPC development endpoint, Glue returns only a public IP address.
Retrieves all the development endpoints in this Amazon Web Services account.
When you create a development endpoint in a virtual private cloud (VPC), Glue returns only a private IP address and the public IP address field is not populated. When you create a non-VPC development endpoint, Glue returns only a public IP address.
This API is used to query preview data from a given connection type or from a native Amazon S3 based Glue Data Catalog.
Returns records as an array of JSON blobs. Each record is formatted using Jackson JsonNode based on the field type defined by the DescribeEntity API.
Spark connectors generate schemas according to the same data type mapping as in the DescribeEntity API. Spark connectors convert data to the appropriate data types matching the schema when returning rows.
This API is used for fetching the ResourceProperty of the Glue connection (for the source) or Glue database ARN (for the target).
This API is used to retrieve optional override properties for the tables that need to be replicated.
These properties can include properties for filtering and partitioning for source and target tables.
Retrieves an existing job definition.
Returns information on a job bookmark entry.
For more information about enabling and using job bookmarks, see:
* Tracking processed data using job bookmarks
Retrieves the metadata for a given job run.
Job run history is accessible for 90 days for your workflow and job run.
Retrieves metadata for all runs of a given job definition.
Retrieves all current job definitions.
Creates mappings.
Gets details for a specific task run on a machine learning transform.
Machine learning task runs are asynchronous tasks that Glue runs on your behalf as part of various machine learning workflows. You can check the stats of any task run by calling GetMLTaskRun with the TaskRunID and its parent transform's TransformID.
Gets a list of runs for a machine learning transform.
Machine learning task runs are asynchronous tasks that Glue runs on your behalf as part of various machine learning workflows. You can get a sortable, filterable list of machine learning task runs by calling GetMLTaskRuns with their parent transform's TransformID and other optional parameters as documented in this section.
This operation returns a list of historic runs and must be paginated.
Gets a Glue machine learning transform artifact and all its corresponding metadata.
Machine learning transforms are a special type of transform that use machine learning to learn the details of the transformation to be performed by learning from examples provided by humans. These transformations are then saved by Glue. You can retrieve their metadata by calling GetMLTransform.
Gets a sortable, filterable list of existing Glue machine learning transforms.
Machine learning transforms are a special type of transform that use machine learning to learn the details of the transformation to be performed by learning from examples provided by humans. These transformations are then saved by Glue, and you can retrieve their metadata by calling GetMLTransforms.
Retrieves information about a specified partition.
Retrieves the partition indexes associated with a table.
Retrieves information about the partitions in a table.
Gets code to perform a specified mapping.
Describes the specified registry in detail.
Retrieves the resource policies set on individual resources by Resource Access Manager during cross-account permission grants.
Also retrieves the Data Catalog resource policy.
If you enabled metadata encryption in Data Catalog settings, and you do not have permission on the KMS key, the operation can't return the Data Catalog resource policy.
Retrieves a specified resource policy.
Describes the specified schema in detail.
Retrieves a schema by the SchemaDefinition.
The schema definition is sent to the Schema Registry, canonicalized, and hashed. If the hash is matched within the scope of the SchemaName or ARN (or the default registry, if none is supplied), that schema’s metadata is returned. Otherwise, a 404 or NotFound error is returned. Schema versions in Deleted statuses will not be included in the results.
Get the specified schema by its unique ID assigned when a version of the schema is created or registered.
Schema versions in Deleted status will not be included in the results.
Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry.
This API allows you to compare two schema versions between two schema definitions under the same schema.
Retrieves a specified security configuration.
Retrieves a list of all security configurations.
Retrieves the session.
Retrieves the statement.
Retrieves the Table definition in a Data Catalog for a specified table.
Returns the configuration of all optimizers associated with a specified table.
Retrieves a specified version of a table.
Retrieves a list of strings that identify available versions of a specified table.
Retrieves the definitions of some or all of the tables in a given Database.
Retrieves a list of tags associated with a resource.
Retrieves the definition of a trigger.
Gets all the triggers associated with a job.
Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
For IAM authorization, the public IAM action associated with this API is glue:GetPartition.
Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
For IAM authorization, the public IAM action associated with this API is glue:GetPartitions.
Allows a third-party analytical engine to retrieve unfiltered table metadata from the Data Catalog.
For IAM authorization, the public IAM action associated with this API is glue:GetTable.
Retrieves information about the specified Glue usage profile.
Retrieves a specified function definition from the Data Catalog.
Retrieves multiple function definitions from the Data Catalog.
Retrieves resource metadata for a workflow.
Retrieves the metadata for a given workflow run.
Job run history is accessible for 90 days for your workflow and job run.
Retrieves the workflow run properties which were set during the run.
Retrieves metadata for all runs of a given workflow.
Imports an existing Amazon Athena Data Catalog to Glue.
Lists all the blueprint names in an account.
List all task runs for a particular account.
The ListConnectionTypes API provides a discovery mechanism to learn available connection types in Glue.
The response contains a list of connection types with high-level details of what is supported for each connection type. The connection types listed are the set of supported options for the ConnectionType value in the CreateConnection API.
Retrieves the names of all crawler resources in this Amazon Web Services account, or the resources with the specified tag.
This operation allows you to see which resources are available in your account, and their names.
This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag are retrieved.
Returns all the crawls of a specified crawler.
Returns only the crawls that have occurred since the launch date of the crawler history feature, and only retains up to 12 months of crawls. Older crawls will not be returned.
You may use this API to:
* Retrieve all the crawls of a specified crawler.
* Retrieve all the crawls of a specified crawler within a limited count.
* Retrieve all the crawls of a specified crawler in a specific time range.
* Retrieve all the crawls of a specified crawler with a particular state, crawl ID, or DPU hour value.
Lists all the custom patterns that have been created.
Returns all data quality execution results for your account.
list_data_quality_rule_recommendation_runs(client, input, options \\ [])
Lists the recommendation runs meeting the filter criteria.
list_data_quality_ruleset_evaluation_runs(client, input, options \\ [])
Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source.
Returns a paginated list of rulesets for the specified list of Glue tables.
list_data_quality_statistic_annotations(client, input, options \\ [])
Retrieve annotations for a data quality statistic.
Retrieves a list of data quality statistics.
Retrieves the names of all DevEndpoint resources in this Amazon Web Services account, or the resources with the specified tag.
This operation allows you to see which resources are available in your account, and their names.
This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag are retrieved.
Returns the available entities supported by the connection type.
Retrieves the names of all job resources in this Amazon Web Services account, or the resources with the specified tag.
This operation allows you to see which resources are available in your account, and their names.
This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag are retrieved.
Retrieves a sortable, filterable list of existing Glue machine learning transforms in this Amazon Web Services account, or the resources with the specified tag.
This operation takes the optional Tags field, which you can use as a filter of the responses so that tagged resources can be retrieved as a group. If you choose to use tag filtering, only resources with the tags are retrieved.
Returns a list of registries that you have created, with minimal registry information.
Registries in the Deleting status will not be included in the results. Empty results will be returned if there are no registries available.
Returns a list of schema versions that you have created, with minimal information.
Schema versions in Deleted status will not be included in the results. Empty results will be returned if there are no schema versions available.
Returns a list of schemas with minimal details.
Schemas in Deleting status will not be included in the results. Empty results will be returned if there are no schemas available.
When the RegistryId is not provided, all the schemas across registries will be part of the API response.
Retrieve a list of sessions.
Lists statements for the session.
Lists the history of previous optimizer runs for a specific table.
Retrieves the names of all trigger resources in this Amazon Web Services account, or the resources with the specified tag.
This operation allows you to see which resources are available in your account, and their names.
This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag are retrieved.
List all the Glue usage profiles.
Lists names of workflows created in the account.
Modifies a Zero-ETL integration in the caller's account.
put_data_catalog_encryption_settings(client, input, options \\ [])
Sets the security configuration for a specified catalog.
After the configuration has been set, the specified encryption is applied to every catalog write thereafter.
Annotate all datapoints for a Profile.
Sets the Data Catalog resource policy for access control.
Puts the metadata key value pair for a specified schema version ID.
A maximum of 10 key value pairs will be allowed per schema version. They can be added over one or more calls.
Puts the specified workflow run properties for the given workflow run.
If a property already exists for the specified run, its value is overwritten; otherwise, the property is added to the existing properties.
Queries for the schema version metadata information.
Adds a new version to the existing schema.
Returns an error if the new version of the schema does not meet the compatibility requirements of the schema set. This API will not create a new schema set and will return a 404 error if the schema set is not already present in the Schema Registry.
If this is the first schema definition to be registered in the Schema Registry, this API will store the schema version and return immediately. Otherwise, this call has the potential to run longer than other operations due to compatibility modes. You can call the GetSchemaVersion API with the SchemaVersionId to check compatibility modes.
If the same schema definition is already stored in Schema Registry as a version, the schema ID of the existing schema is returned to the caller.
Removes a key value pair from the schema version metadata for the specified schema version ID.
Resets a bookmark entry.
For more information about enabling and using job bookmarks, see:
* Tracking processed data using job bookmarks
Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run.
The selected nodes and all nodes that are downstream from the selected nodes are run.
Executes the statement.
Searches a set of tables based on properties in the table metadata as well as on the parent database.
You can search against text or filter conditions.
You can only get tables that you have access to based on the security policies defined in Lake Formation. You need at least a read-only access to the table for it to be returned. If you do not have access to all the columns in the table, these columns will not be searched against when returning the list of tables back to you. If you have access to the columns but not the data in the columns, those columns and the associated metadata for those columns will be included in the search.
Starts a new run of the specified blueprint.
Starts a column statistics task run, for a specified table and columns.
start_column_statistics_task_run_schedule(client, input, options \\ [])
Starts a column statistics task run schedule.
Starts a crawl using the specified crawler, regardless of what is scheduled.
If the crawler is already running, returns a CrawlerRunningException.
Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED.
start_data_quality_rule_recommendation_run(client, input, options \\ [])
Starts a recommendation run that is used to generate rules when you don't know what rules to write.
Glue Data Quality analyzes the data and comes up with recommendations for a potential ruleset. You can then triage the ruleset and modify the generated ruleset to your liking.
Recommendation runs are automatically deleted after 90 days.
start_data_quality_ruleset_evaluation_run(client, input, options \\ [])
Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table).
The evaluation computes results which you can retrieve with the GetDataQualityResult API.
Begins an asynchronous task to export all labeled data for a particular transform.
This task is the only label-related API call that is not part of the typical active learning workflow. You typically use StartExportLabelsTaskRun when you want to work with all of your existing labels at the same time, such as when you want to remove or change labels that were previously submitted as truth. This API operation accepts the TransformId whose labels you want to export and an Amazon Simple Storage Service (Amazon S3) path to export the labels to. The operation returns a TaskRunId. You can check on the status of your task run by calling the GetMLTaskRun API.
Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality.
This API operation is generally used as part of the active learning workflow that starts with the StartMLLabelingSetGenerationTaskRun call and that ultimately results in improving the quality of your machine learning transform.
After the StartMLLabelingSetGenerationTaskRun finishes, Glue machine learning will have generated a series of questions for humans to answer. (Answering these questions is often called 'labeling' in the machine learning workflows). In the case of the FindMatches transform, these questions are of the form, “What is the correct way to group these rows together into groups composed entirely of matching records?” After the labeling process is finished, users upload their answers/labels with a call to StartImportLabelsTaskRun. After StartImportLabelsTaskRun finishes, all future runs of the machine learning transform use the new and improved labels and perform a higher-quality transformation.
By default, StartMLLabelingSetGenerationTaskRun continually learns from and combines all labels that you upload unless you set Replace to true. If you set Replace to true, StartImportLabelsTaskRun deletes and forgets all previously uploaded labels and learns only from the exact set that you upload. Replacing labels can be helpful if you realize that you previously uploaded incorrect labels, and you believe that they are having a negative effect on your transform quality.
You can check on the status of your task run by calling the GetMLTaskRun operation.
Starts a job run using a job definition.
Starts a task to estimate the quality of the transform.
When you provide label sets as examples of truth, Glue machine learning uses some of those examples to learn from them. The rest of the labels are used as a test to estimate quality.
Returns a unique identifier for the run. You can call GetMLTaskRun to get more information about the stats of the EvaluationTaskRun.
start_ml_labeling_set_generation_task_run(client, input, options \\ [])
Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels.
When the StartMLLabelingSetGenerationTaskRun finishes, Glue will have generated a "labeling set" or a set of questions for humans to answer.
In the case of the FindMatches transform, these questions are of the form, “What is the correct way to group these rows together into groups composed entirely of matching records?”
After the labeling process is finished, you can upload your labels with a call to StartImportLabelsTaskRun. After StartImportLabelsTaskRun finishes, all future runs of the machine learning transform will use the new and improved labels and perform a higher-quality transformation.
Starts an existing trigger.
See Triggering Jobs for information about how different types of trigger are started.
Starts a new run of the specified workflow.
Stops a task run for the specified table.
stop_column_statistics_task_run_schedule(client, input, options \\ [])
Stops a column statistics task run schedule.
If the specified crawler is running, stops the crawl.
Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running.
Stops the session.
Stops a specified trigger.
Stops the execution of the specified workflow run.
Adds tags to a resource.
A tag is a label you can assign to an Amazon Web Services resource. In Glue, you can tag only certain resources. For information about what resources you can tag, see Amazon Web Services Tags in Glue.
Tests a connection to a service to validate the service credentials that you provide.
You can either provide an existing connection name or a TestConnectionInput for testing a non-existing connection input. Providing both at the same time will cause an error.
If the action is successful, the service sends back an HTTP 200 response.
Removes tags from a resource.
Updates a registered blueprint.
Updates an existing catalog's properties in the Glue Data Catalog.
Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present).
update_column_statistics_for_partition(client, input, options \\ [])
Creates or updates partition statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is UpdatePartition.
Creates or updates table statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is UpdateTable.
update_column_statistics_task_settings(client, input, options \\ [])
Updates settings for a column statistics task.
Updates a connection definition in the Data Catalog.
Updates a crawler.
If a crawler is running, you must stop it using StopCrawler before updating it.
Updates the schedule of a crawler using a cron expression.
Updates the specified data quality ruleset.
Updates an existing database definition in a Data Catalog.
Updates a specified development endpoint.
update_integration_resource_property(client, input, options \\ [])
This API can be used for updating the ResourceProperty of the Glue connection (for the source) or Glue database ARN (for the target).
These properties can include the role to access the connection or database. Since the same resource can be used across multiple integrations, updating resource properties will impact all the integrations using it.
This API is used to provide optional override properties for the tables that need to be replicated.
These properties can include properties for filtering and partitioning for the source and target tables. To set both source and target properties, the same API needs to be invoked twice: once with the Glue connection ARN as ResourceArn together with SourceTableConfig, and once with the Glue database ARN as ResourceArn together with TargetTableConfig.
The override will be reflected across all the integrations using the same ResourceArn and source table.
Updates an existing job definition.
The previous job definition is completely overwritten by this information.
Synchronizes a job from the source control repository.
This operation takes the job artifacts that are located in the remote repository and updates the Glue internal stores with these artifacts.
This API supports optional parameters which take in the repository information.
Updates an existing machine learning transform.
Call this operation to tune the algorithm parameters to achieve better results.
After calling this operation, you can call the StartMLEvaluationTaskRun operation to assess how well your new parameters achieved your goals (such as improving the quality of your machine learning transform, or making it more cost-effective).
Updates a partition.
Updates an existing registry which is used to hold a collection of schemas.
The updated properties relate to the registry, and do not modify any of the schemas within the registry.
Updates the description, compatibility setting, or version checkpoint for a schema set.
For updating the compatibility setting, the call will not validate compatibility for the entire set of schema versions with the new compatibility setting. If the value for Compatibility is provided, the VersionNumber (a checkpoint) is also required. The API will validate the checkpoint version number for consistency.
If the value for the VersionNumber (checkpoint) is provided, Compatibility is optional and this can be used to set/reset a checkpoint for the schema.
This update will happen only if the schema is in the AVAILABLE state.
Synchronizes a job to the source control repository.
This operation takes the job artifacts from the Glue internal stores and makes a commit to the remote repository that is configured on the job.
This API supports optional parameters which take in the repository information.
Updates a metadata table in the Data Catalog.
Updates the configuration for an existing table optimizer.
Updates a trigger definition.
Updates a Glue usage profile.
Updates an existing function definition in the Data Catalog.
Updates an existing workflow.
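Several of the list operations above are paginated and accept a Tags filter. The following is a minimal sketch of collecting every page of such an operation. It is not part of aws-elixir: the module name GluePaging and its collect/3 function are hypothetical, and the sketch assumes that each call returns {:ok, body, response} with string keys such as "JobNames" and "NextToken" in the body, as the Glue JSON API does for ListJobs. A stub function stands in for the network call so the example runs on its own.

```elixir
defmodule GluePaging do
  # Hypothetical helper (not provided by aws-elixir).
  # `fetch` is any one-argument function that takes a request map and
  # returns {:ok, body, response} or {:error, reason}.
  # `items_key` is the response key holding the page's items.
  def collect(fetch, input, items_key) when is_function(fetch, 1) do
    do_collect(fetch, input, items_key, [])
  end

  defp do_collect(fetch, input, items_key, acc) do
    case fetch.(input) do
      {:ok, body, _response} ->
        # Append this page's items to what has been gathered so far.
        acc = acc ++ Map.get(body, items_key, [])

        case Map.get(body, "NextToken") do
          token when is_binary(token) and token != "" ->
            # More pages remain: repeat the request with the token.
            do_collect(fetch, Map.put(input, "NextToken", token), items_key, acc)

          _ ->
            {:ok, acc}
        end

      {:error, reason} ->
        {:error, reason}
    end
  end
end

# Stub standing in for a real call; it serves two pages of job names.
fake_fetch = fn
  %{"NextToken" => "page-2"} ->
    {:ok, %{"JobNames" => ["job-c"]}, nil}

  _first_page ->
    {:ok, %{"JobNames" => ["job-a", "job-b"], "NextToken" => "page-2"}, nil}
end

{:ok, names} = GluePaging.collect(fake_fetch, %{"MaxResults" => 2}, "JobNames")
IO.inspect(names)
```

With a real client, the stub would presumably be replaced by something like `fn input -> AWS.Glue.list_jobs(client, input) end`, with a "Tags" map added to the input to restrict the result to tagged resources. The helper stops at the first error and returns it unchanged, so retry behaviour is left to the caller.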