Extract, transform, load (ETL) is the data integration process for loading information from one or more source databases into a data warehouse or a target database.
It consists of three functions or stages:
• Extract: In this stage, data is read and extracted from the source database into a staging area.
• Transform: Here, the raw data is checked, validated for any data integrity issues, and transformed so that it matches the data warehouse or target database schema.
• Load: Finally, the transformed data is loaded into the data warehouse or target database.
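The three stages above can be sketched in a few lines of Python. This is only a minimal illustration, not how AWS Glue itself is implemented: it assumes two in-memory SQLite databases standing in for the source and the target, and the table names (orders, orders_clean), column names, and sample rows are all hypothetical.

```python
import sqlite3


def extract(source):
    """Extract: read the raw rows from the source database into a staging list."""
    return source.execute("SELECT id, name, amount FROM orders").fetchall()


def transform(rows):
    """Transform: drop rows that fail a basic integrity check and reshape the
    rest to match the (hypothetical) target schema."""
    clean = []
    for row_id, name, amount in rows:
        if name is None or amount is None:
            continue  # integrity check: skip incomplete rows
        clean.append((row_id, name.strip().title(), round(float(amount), 2)))
    return clean


def load(target, rows):
    """Load: write the transformed rows into the target database."""
    target.executemany(
        "INSERT INTO orders_clean (id, customer, amount_usd) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()


def run_etl():
    # Hypothetical source database with some messy sample data.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (id INTEGER, name TEXT, amount TEXT)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, " alice ", "10.5"), (2, None, "3"), (3, "bob", "7.254")],
    )

    # Hypothetical target database with a stricter schema.
    target = sqlite3.connect(":memory:")
    target.execute(
        "CREATE TABLE orders_clean "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount_usd REAL)"
    )

    load(target, transform(extract(source)))
    return target.execute(
        "SELECT id, customer, amount_usd FROM orders_clean ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl())
```

The row with the missing name is rejected during the transform stage, so only two cleaned rows reach the target table.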
AWS Glue: Functionality and Features
The main features of AWS Glue include:
Serverless computing: AWS Glue is a serverless offering. Serverless means users don’t have to provision or manage a server themselves. Whenever a user runs an AWS Glue job, Amazon provisions the compute resources it needs and shuts them down when they are no longer in use. This automatic provisioning spares users the work of scaling and managing infrastructure.
Apache Spark: AWS Glue is based on the Apache Spark analytics engine for big data processing. Users can also write their own ETL scripts in Scala and Python.
Easy development: For users who choose to write their ETL code manually, AWS Glue offers “development endpoints”: environments in which they can develop and test their AWS Glue scripts.
AWS Glue Data Catalog: The AWS Glue Data Catalog is a metadata repository that stores information about the user’s data sources and data stores, giving the user more visibility into data assets regardless of their location.
Job scheduling: AWS Glue simplifies scheduling by letting users start jobs on a schedule, in response to an event, or on demand.
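As a rough illustration of the scheduling options, the sketch below builds the request for a Glue trigger as a plain Python dictionary. The helper name build_trigger_request, the job name, and the trigger naming scheme are hypothetical; the keys (Name, Type, Schedule, Actions, StartOnCreation) follow the Glue CreateTrigger API. Only scheduled and on-demand triggers are covered; event-based triggers need extra configuration and are left out here.

```python
def build_trigger_request(job_name, mode, cron=None):
    """Build keyword arguments for a Glue CreateTrigger call.

    mode is "schedule" or "on-demand" (a hypothetical convention for this sketch).
    cron is the body of a Glue cron expression, e.g. "0 2 * * ? *".
    """
    request = {
        "Name": f"{job_name}-{mode}-trigger",  # hypothetical naming scheme
        "Actions": [{"JobName": job_name}],    # the job this trigger starts
    }
    if mode == "schedule":
        if not cron:
            raise ValueError("a cron expression is required for scheduled triggers")
        request["Type"] = "SCHEDULED"
        request["Schedule"] = f"cron({cron})"
        request["StartOnCreation"] = True  # activate the schedule immediately
    elif mode == "on-demand":
        request["Type"] = "ON_DEMAND"      # fires only when started explicitly
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return request


if __name__ == "__main__":
    # Run the hypothetical job "nightly-orders-etl" every day at 02:00 UTC.
    print(build_trigger_request("nightly-orders-etl", "schedule", cron="0 2 * * ? *"))
```

Assuming boto3 is installed and AWS credentials are configured, the resulting dictionary could be passed on with boto3.client("glue").create_trigger(**request).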
Some of the most recent AWS Glue updates are as follows:
• June 2019: Support for Python 3.6 in Python shell jobs
• May 2019: Support for connecting directly to AWS Glue via a virtual private cloud (VPC) endpoint
• May 2019: Support for real-time, continuous logging for AWS Glue jobs with Apache Spark
• March 2019: Support for custom CSV classifiers to infer the schema of CSV data
Who uses AWS Glue?
Companies that reportedly use AWS Glue in their tech stacks include:
• Tessian
• iOLAP
• Postmates
• Chime
• Depop
• SparkPost
• Bizongo
AWS Glue Integrations
Below is a list of tools that integrate with AWS Glue.
• MySQL
• Amazon S3
• Amazon RDS
• Microsoft SQL Server
• Oracle
• Amazon Redshift
• Amazon EMR
AWS Glue Alternatives
Below are alternatives to AWS Glue:
• AWS Data Pipeline
• Airflow
• Apache Spark
• Talend
• Alooma
Features:
Focus: Data catalog, ETL
Database replication: Full table; incremental via change data capture through AWS Database Migration Service (DMS)
SaaS sources: None
Ability for customers to add new data sources: Developers can write custom Python or Scala code and import custom libraries and JAR files into Glue ETL jobs to access data sources not natively supported by AWS Glue.
Connects to data warehouses / data lakes: Yes / Yes
Transparent pricing: Yes
G2 customer satisfaction: 4.1/5
Support SLAs: Available
Purchase process: Options for self-service and talking with sales
Compliance, governance, and security certifications: HIPAA, GDPR
Data sharing: Yes, within AWS
Vendor lock-in: AWS Glue is strongly tied to the AWS platform. Usage is billed monthly.

Author's Bio: 

Mr. Gowrishankar
Web Development and Python Trainer
12+ Years Experience in Web Development and Python