2. Big Data Analytics using Glue
Sridevi Murugayen
Senior Cloud Architect
AWS User Group - Chennai
Agilisium
3. What is Big Data
Big Data
Volume
Velocity
Variety
Value
Veracity
4. What is Analytics
Discovery, interpretation, and communication of meaningful patterns in data;
The process of applying those patterns towards effective decision making.
• Descriptive Analytics: What happened?
• Diagnostic Analytics: Why did it happen?
• Predictive Analytics: What will happen?
• Prescriptive Analytics: What should I do next?
5. Reference Architecture
Easy, fast and cost-effective way to process vast amounts of data
* This is only a representative image. It does not include all services and all scenarios.
6. Glue
AWS Glue is a fully managed, serverless ETL service that prepares and loads data for analytics
It also provides a centralized metadata repository, the AWS Glue Data Catalog
Use AWS Glue
• to build a data warehouse to organize, cleanse, validate, and format data
• to run serverless queries against your Amazon S3 data lake
• to create event-driven ETL pipelines
• to understand your data assets
21. Pricing
ETL Job:
• $0.44 per DPU-Hour, billed per second, with a 10-minute minimum for each ETL job of type Apache Spark
• $0.44 per DPU-Hour, billed per second, with a 1-minute minimum for each ETL job of type Python shell
• $0.44 per DPU-Hour, billed per second, with a 10-minute minimum for each provisioned development endpoint
Crawler:
• $0.44 per DPU-Hour, billed per second, with a 10-minute minimum per crawler run
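The DPU-Hour rates above can be turned into a rough cost estimate. The sketch below is only an illustration of the arithmetic, using the rate and minimums quoted on this slide; the function name `dpu_cost`, the workload labels, and the DPU counts in the usage note are hypothetical and not part of any AWS API. Actual bills may differ by region and Glue version.

```python
import math

# Rate and minimum billing durations as quoted on this slide.
DPU_HOUR_RATE_USD = 0.44
MIN_BILLED_SECONDS = {
    "spark_etl": 10 * 60,      # Apache Spark ETL job
    "python_shell": 1 * 60,    # Python shell ETL job
    "dev_endpoint": 10 * 60,   # provisioned development endpoint
    "crawler": 10 * 60,        # crawler run
}


def dpu_cost(kind: str, dpus: float, runtime_seconds: float) -> float:
    """Estimate the cost in USD of one run.

    Billing is per second, so the runtime is rounded up to a whole
    second and then raised to the minimum for the workload type.
    """
    if kind not in MIN_BILLED_SECONDS:
        raise ValueError(f"unknown workload type: {kind}")
    billed_seconds = max(math.ceil(runtime_seconds), MIN_BILLED_SECONDS[kind])
    return dpus * (billed_seconds / 3600) * DPU_HOUR_RATE_USD


if __name__ == "__main__":
    # Hypothetical workloads, chosen only to show the effect of the minimum.
    print(f"{dpu_cost('spark_etl', 10, 15 * 60):.2f}")  # 15-minute Spark job
    print(f"{dpu_cost('spark_etl', 10, 3 * 60):.2f}")   # 3-minute Spark job
```

Under these assumptions, a 10-DPU Spark job that runs for 15 minutes comes to about $1.10, while the same job finishing in 3 minutes is still billed for 10 minutes, about $0.73.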
Storage:
• Free for the first million objects stored
• $1 per 100,000 objects stored above 1M, per month
Requests:
• Free for the first million requests per month
• $1 per million requests above 1M in a month
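The storage and request tiers above can be estimated in the same way. This is a minimal sketch that assumes charges above the free tier scale linearly (pro rata) rather than in whole blocks, which the slide does not specify; the function names are hypothetical.

```python
FREE_OBJECTS = 1_000_000    # first million stored objects are free
FREE_REQUESTS = 1_000_000   # first million requests per month are free


def catalog_storage_cost(objects_stored: int) -> float:
    """Monthly storage cost in USD: $1 per 100,000 objects above 1M."""
    billable = max(0, objects_stored - FREE_OBJECTS)
    return billable / 100_000 * 1.00


def catalog_request_cost(requests_per_month: int) -> float:
    """Monthly request cost in USD: $1 per million requests above 1M."""
    billable = max(0, requests_per_month - FREE_REQUESTS)
    return billable / 1_000_000 * 1.00


if __name__ == "__main__":
    # Hypothetical usage figures for illustration only.
    print(catalog_storage_cost(1_500_000))   # 500,000 objects above the free tier
    print(catalog_request_cost(3_000_000))   # 2 million requests above the free tier
```

With these example figures, 1.5 million stored objects would cost $5 per month and 3 million requests would cost $2 for the month; usage at or below one million of either is free.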