© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.
23 August 2022
Simplify data integration using AWS Glue
Nico Anandito
Analytics Specialist Solutions Architect
Amazon Web Services
Agenda
1. AWS Glue introduction
2. AWS Glue to simplify data integration
• Ingest
• Transform
• Operationalize
3. Demo
Data integration is hard
Data
• Growing exponentially
• From new sources
• Increasingly diverse
Personas
• No or low code developers
• Data analysts and data scientists
Applications
• Real-time / SLA sensitive
• Highly scalable
• Price performance
Data integration platform trends
• Tools for all personas
• Scalable infrastructure
• Open standards
• Low cost
Modern Data Architecture with AWS Glue
• Data integration in batch and real time
• Performant and cost-effective
• Centralized catalog and governance
• Tools for diverse skill sets
Services shown: Amazon DynamoDB, Amazon SageMaker, Amazon Redshift, Amazon OpenSearch Service, Amazon EMR, Amazon S3, Amazon Aurora, Amazon Athena
AWS Glue
Serverless data integration for complex workloads
Serverless
No infrastructure to maintain. Allocate the compute power you need and run jobs.
Cost-effective
All-in-one pricing model is 55% cheaper than other cloud data integration solutions.
Handles complex workloads
Connect to 65+ data sources and process petabytes of data in real time. Batch and event-driven modes are included.
No lock-in
Develop data integration pipelines in open source Spark SQL, PySpark, Python, and Scala.
Data integration for every user
Development environments cater to different skill sets: visual ETL development for data engineers, notebook-style development for data scientists, and no-code development for data analysts.
Globe Telecom Develops a 360-Degree Customer View on AWS
Building a robust subscriber profile for more than 90 million customers using AWS Glue
• Can onboard 40 times more user attributes a month
• High platform availability
• Integrates easily with downstream applications
“Now, more than ever, multiple downstream applications and analytical functions have access to real-time behavioral data, placing us in a stronger position to deliver more relevant and meaningful interactions with each of our customers. We can personalize engagements from messaging, real-time offers, to product bundles and more,”
Derick Adil, Director, Asset Delivery and Domain Integration, Globe Telecom
Read more: https://aws.amazon.com/solutions/case-studies/globe-telecom-cadenz/
Ingestion Transform Deploy
AWS Glue
Serverless Data Integration in the Cloud
Unified Data Catalog with automated schema discovery
Automated schema discovery and management
• Structured and semi-structured discovery (Glue crawlers)
• No movement of data = low costs/admin
• All metadata centrally available for search and query = productivity
• Automate data discovery = productivity
• Unify structured and semi-structured data = speed to insight
[Diagram labels. Sources: OLTP, ERP, CRM, transactional systems, data warehouse, data lake, devices, web, sensors. Center: Data Catalog. Consumers: machine learning, DW queries, big data processing, interactive, real-time, business intelligence.]
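To make the idea of automated schema discovery concrete, here is a toy sketch in plain Python. It is not how Glue crawlers are implemented. The helper names (`type_name`, `infer_schema`), the type labels, and the sample records are all hypothetical. The sketch walks semi-structured JSON records and merges the field types it sees into one unified schema.

```python
import json
from typing import Any


def type_name(value: Any) -> str:
    """Map a Python value parsed from JSON to a simple type label."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "struct"
    return "null"


def infer_schema(json_lines: list[str]) -> dict[str, str]:
    """Union the top-level fields seen across records (hypothetical helper)."""
    schema: dict[str, str] = {}
    for line in json_lines:
        record = json.loads(line)
        for field, value in record.items():
            observed = type_name(value)
            previous = schema.get(field)
            if previous is None or previous == "null":
                schema[field] = observed
            elif observed not in (previous, "null"):
                # Conflicting types: widen bigint+double to double, else string.
                numeric = {"bigint", "double"}
                schema[field] = "double" if {previous, observed} <= numeric else "string"
    return schema


# Hypothetical sample records with differing fields and types.
records = [
    '{"device_id": 1, "temp": 21.5, "site": "web"}',
    '{"device_id": 2, "temp": 22, "tags": ["a", "b"]}',
]
schema = infer_schema(records)
```

The point of the sketch is the merge step: fields that appear in only some records still land in the schema, and conflicting types are widened rather than rejected.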
Custom Connectors with AWS Glue
Data sources: on-premises DBs, proprietary stores, SaaS applications
Custom connector
• No additional cost for connecting to sources
• Flexible and easy to build connectors
AWS Glue Connectors Marketplace
+ Many more…
Ingestion Transform Deploy
AWS Glue
Serverless Data Integration in the Cloud
AWS Glue Execution Engine
Fast and predictable
• Job starts in seconds
• Reduced job latencies, enabling micro-batching
Diverse workloads
• Serverless Apache Spark and Python environment
Cost effective
• Per-second billing with a 1-minute minimum
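The per-second billing with a 1-minute minimum can be worked through in a few lines. This is a minimal sketch under stated assumptions: rounding partial seconds up, the DPU count, and the 0.44 USD per DPU-hour rate are illustrative and do not come from the slides. Check current regional pricing before relying on any number.

```python
import math


def billed_seconds(runtime_seconds: float, minimum_seconds: int = 60) -> int:
    """Per-second billing with a 1-minute minimum (rounding up is an assumption)."""
    return max(minimum_seconds, math.ceil(runtime_seconds))


def job_cost(runtime_seconds: float, dpus: float, rate_per_dpu_hour: float) -> float:
    """Cost = billed seconds x DPUs x hourly rate / 3600."""
    return billed_seconds(runtime_seconds) * dpus * rate_per_dpu_hour / 3600


ASSUMED_RATE = 0.44  # assumption: USD per DPU-hour, for illustration only

# A 20-second job is billed as 60 seconds; a 754.2-second job as 755 seconds.
short_job = job_cost(runtime_seconds=20, dpus=10, rate_per_dpu_hour=ASSUMED_RATE)
long_job = job_cost(runtime_seconds=754.2, dpus=10, rate_per_dpu_hour=ASSUMED_RATE)
```

Under these assumptions, very short jobs pay for the full first minute. That matters when sizing micro-batches.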
AWS Glue Studio
Visual job authoring and monitoring
• Monitor thousands of jobs through a single pane of glass
• Advanced transforms through code snippets
• Support for AWS Marketplace and custom connectors
• Preview your data at each step of the visual job authoring process
• Real-time schema inference without having to catalog the data
AWS Glue Studio Notebook (new)
Interactive AWS Glue jobs development
• Submit AWS Glue jobs from the AWS Glue Studio notebook
• Use notebook magics to define transforms in SQL and control cost
• Built-in monitoring support
AWS Glue Interactive Sessions (new)
Existing options
• Time to first query = 10-15 minutes
• High cost of a long-running cluster
• “Noisy neighbor” problem
AWS Glue Interactive Sessions
• Step 1: Connect notebook to Sessions API (time required: in seconds)
• Time to first query ~ 1 min
• Development tool of your choice
• Rapid development
• Built-in cost control
Challenges with PII detection and remediation
• Size of the dataset
• Location of PII
• Accuracy
AWS Glue PII detection and remediation (new)
Three simple steps
1. Type of scan: full scan or sample scan
2. Entities to detect: built-in entities (e.g. SSN, passport) or custom entities
3. Remediation: store results or redact/mask results
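The three steps above (choose a scan type, choose entities, remediate) can be sketched in plain Python. This is an illustration of the workflow only, not the AWS Glue transform or its API. The regular expressions, the `CUST-` customer ID format, the function names, and the mask string are all assumptions made for the example.

```python
import random
import re

# Built-in style entity: US Social Security number in NNN-NN-NNNN form.
BUILT_IN = {"SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b")}
# Custom entity: a hypothetical internal customer ID such as CUST-000123.
CUSTOM = {"CUSTOMER_ID": re.compile(r"\bCUST-\d{6}\b")}


def choose_rows(rows, scan="full", sample_fraction=0.1, seed=0):
    """Step 1: pick a full scan or a random sample of the rows."""
    if scan == "full":
        return list(rows)
    rng = random.Random(seed)
    size = max(1, int(len(rows) * sample_fraction))
    return rng.sample(list(rows), size)


def detect(rows, patterns):
    """Step 2: report which entity types appear anywhere in the scanned rows."""
    found = set()
    for row in rows:
        for name, pattern in patterns.items():
            if pattern.search(row):
                found.add(name)
    return found


def redact(row, patterns, mask="*******"):
    """Step 3: replace every match with a fixed mask."""
    for pattern in patterns.values():
        row = pattern.sub(mask, row)
    return row


rows = ["Alice 123-45-6789 CUST-000123", "Bob has no identifiers"]
patterns = {**BUILT_IN, **CUSTOM}
entities = detect(choose_rows(rows, scan="full"), patterns)
cleaned = [redact(row, patterns) for row in rows]
```

A sample scan trades accuracy for speed on large datasets: it is cheaper, but it can miss PII that appears only in rows outside the sample.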
Glue Auto-scaling (new)
[Chart: cost over the job execution timeline (t1 to t11, from AWS Glue job start to end), without autoscaling vs. with autoscaling. The gap between the two is the potential savings. Annotated phases: list operation, wide transform, uneven distribution of data partitions.]
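The potential savings in the chart can be approximated with simple arithmetic. The sketch below assumes a hypothetical per-interval worker demand for t1 to t11 and a made-up minimum of two workers. It compares provisioning for the peak throughout the job with paying only for the workers each interval needs. It does not model how Glue actually makes scaling decisions.

```python
def worker_seconds_fixed(demand, interval_seconds):
    """Without autoscaling: provision for the peak during every interval."""
    return max(demand) * len(demand) * interval_seconds


def worker_seconds_autoscaled(demand, interval_seconds, min_workers=2):
    """With autoscaling: pay only for the workers each interval needs."""
    return sum(max(min_workers, d) for d in demand) * interval_seconds


# Hypothetical workers needed at t1..t11: a light list operation, a wide
# transform that needs many workers, then a skewed tail with few busy workers.
demand = [2, 2, 4, 10, 10, 10, 6, 3, 2, 2, 2]
fixed = worker_seconds_fixed(demand, interval_seconds=60)
scaled = worker_seconds_autoscaled(demand, interval_seconds=60)
savings_pct = round(100 * (fixed - scaled) / fixed, 1)
```

With this made-up demand curve, the fixed allocation uses 6600 worker-seconds and the scaled one 3180, a saving of roughly half. Real savings depend on how uneven the job's phases are.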
AWS Glue Streaming ETL
• Process stream data and make it queryable in seconds
• Join streams against each other or static data
• Automatic updates to the AWS Glue Data Catalog
• Dozens of supported data targets
• Simplify your architecture with one service for streaming and batch data integration
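Joining a stream against static data in small batches can be illustrated without any streaming engine. The following is a plain Python sketch, not Glue or Spark code. The event fields, the lookup table, and the function names are hypothetical.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Group an iterator of events into small lists (micro-batches)."""
    iterator = iter(stream)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def enrich(batch, reference):
    """Join each event against static reference data on device_id."""
    return [
        {**event, "site": reference.get(event["device_id"], "unknown")}
        for event in batch
    ]


# Hypothetical static lookup table and event stream.
device_sites = {"d1": "Jakarta", "d2": "Bandung"}
events = [
    {"device_id": "d1", "temp": 30.1},
    {"device_id": "d2", "temp": 27.4},
    {"device_id": "d9", "temp": 25.0},
]
results = [enrich(batch, device_sites) for batch in micro_batches(events, 2)]
```

Each micro-batch is enriched and emitted on its own. This is why results can become queryable shortly after the events arrive, rather than after a full batch run.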
Ingestion Transform Deploy
AWS Glue
Serverless Data Integration in the Cloud
Monitoring dashboard to check job status
Orchestrate jobs easily with AWS Glue workflows
• Orchestrate Glue jobs and other AWS services
• Schedule jobs or trigger based on events
• Monitor execution of the workflows in one place
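As a rough sketch of scheduling and event-based triggering, the helpers below build keyword arguments for the boto3 Glue client's `create_workflow` and `create_trigger` calls. The workflow name, job names, and schedule are hypothetical, and the helper functions themselves are illustrative rather than part of any AWS library.

```python
def scheduled_trigger(workflow, job, cron):
    """Keyword arguments for glue.create_trigger: start a job on a schedule."""
    return {
        "Name": f"{workflow}-start",
        "WorkflowName": workflow,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron})",
        "Actions": [{"JobName": job}],
        "StartOnCreation": True,
    }


def on_success_trigger(workflow, upstream_job, downstream_job):
    """Keyword arguments for glue.create_trigger: chain a job after another."""
    return {
        "Name": f"{workflow}-{upstream_job}-succeeded",
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": upstream_job, "State": "SUCCEEDED"}
            ]
        },
        "Actions": [{"JobName": downstream_job}],
        "StartOnCreation": True,
    }


def create_workflow(glue_client, name, triggers):
    """Create the workflow, then attach each trigger to it."""
    glue_client.create_workflow(Name=name)
    for trigger in triggers:
        glue_client.create_trigger(**trigger)


# Hypothetical workflow and job names; the cron expression means 02:00 UTC daily.
triggers = [
    scheduled_trigger("nightly-etl", "ingest-orders", "0 2 * * ? *"),
    on_success_trigger("nightly-etl", "ingest-orders", "transform-orders"),
]
```

With boto3 installed and credentials configured, the call would be `create_workflow(boto3.client("glue"), "nightly-etl", triggers)`. Passing the client in as an argument keeps the helpers easy to test with a stub.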
Glue APIs to build CI/CD pipeline
• Boto3 endpoints to automate the CI/CD pipeline
• Automate to save development hours
• Deploy jobs faster without any manual intervention
• Manage the Data Catalog through code snippets
[Diagram labels: Data Engineers, commit, AWS CodeCommit/Git, AWS CodePipeline, AWS Lambda, deploy, AWS Glue Job, AWS Cloud]
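A deploy step in such a pipeline, for example inside an AWS Lambda function, could look roughly like the sketch below. It uses the boto3 Glue client calls `get_job`, `create_job`, and `update_job`. The Glue version, worker type, and worker count are illustrative assumptions, and `job_definition` and `deploy_job` are hypothetical helper names, not something shown in the slides.

```python
def job_definition(role_arn, script_s3_path, workers=2):
    """Settings shared by create_job and update_job (values are assumptions)."""
    return {
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "GlueVersion": "3.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": workers,
    }


def deploy_job(glue_client, name, definition):
    """Create the job if it does not exist, otherwise update it in place."""
    try:
        glue_client.get_job(JobName=name)
    except glue_client.exceptions.EntityNotFoundException:
        glue_client.create_job(Name=name, **definition)
        return "created"
    glue_client.update_job(JobName=name, JobUpdate=definition)
    return "updated"
```

Making the step idempotent (create on first run, update afterwards) lets the same pipeline stage run on every commit without manual intervention. In real use, `glue_client` would be `boto3.client("glue")`.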
Demo
Demo Architecture
Summary
No-code to advanced data use cases
Process petabytes of data in both batch and real time using Apache Spark
Migrate from expensive traditional ETL solutions to gain flexibility and reduce costs
Catalog data assets to make them available to AWS Analytics Services
AWS Glue to simplify data integration in the cloud
A modern data strategy can help you manage, act on, and react to your data so you can make
better decisions, respond faster, and uncover new opportunities. Dive deeper with these resources
today.
• Harness data to reinvent your organization
• In unpredictable times, a data strategy is key
• Make data a strategic asset
• Rewiring your culture to be data-driven
• Put your data to work with a modern analytics approach
• … and more!
Visit the AWS Data resource hub
tinyurl.com/aws-data-hub-id
AWS Training and Certification for Data and Analytics
AWS Data & Analytics FREE Training Resources
Discover how to harness data, one of the world’s most valuable resources, and innovate at scale.
AWS Data Analytics Learning Plan
This learning plan exposes you to the fastest way to get answers from all your data to all your users. It can also help prepare you for the AWS Certified Data Analytics - Specialty certification exam.
AWS Certified Data Analytics - Specialty
Earning AWS Certified Data Analytics - Specialty validates expertise in using AWS data lakes and analytics services.
https://bit.ly/3Ntlhy7 https://go.aws/3lwF0RR
https://bit.ly/3wBVjD1
Thank you for attending AWS Innovate – Data Edition
We hope you found it interesting! A kind reminder to complete the survey.
Let us know what you thought of today’s event and how we can improve the event
experience for you in the future.
aws-apj-marketing@amazon.com
twitter.com/AWSCloud
facebook.com/AmazonWebServices
youtube.com/user/AmazonWebServices
slideshare.net/AmazonWebServices
twitch.tv/aws
Thank you!
More Related Content

Similar to Sederhanakan_integrasi_data_anda_dengan_AWS_Glue_handout.pdf

Module 3 - QuickSight Overview
Module 3 - QuickSight OverviewModule 3 - QuickSight Overview
Module 3 - QuickSight Overview
Lam Le
 
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
Amazon Web Services
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
SwathiPonugumati
 
AWS Enterprise Day | Journey to the AWS Cloud
AWS Enterprise Day | Journey to the AWS CloudAWS Enterprise Day | Journey to the AWS Cloud
AWS Enterprise Day | Journey to the AWS Cloud
Amazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
Amazon Web Services
 
Transforming Enterprise IT - Transformation Day Montreal 2018
Transforming Enterprise IT - Transformation Day Montreal 2018Transforming Enterprise IT - Transformation Day Montreal 2018
Transforming Enterprise IT - Transformation Day Montreal 2018
Amazon Web Services
 
Best Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
Best Practices for Cloud Migrations with Zero Disruption with AWS MarketplaceBest Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
Best Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
Denodo
 
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
Amazon Web Services
 
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
Amazon Web Services
 
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Amazon Web Services
 
AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user group
AWS Chicago
 
Machine learning in the physical world by Kip Larson from AWS IoT
Machine learning in the physical world by  Kip Larson from AWS IoTMachine learning in the physical world by  Kip Larson from AWS IoT
Machine learning in the physical world by Kip Larson from AWS IoT
Bill Liu
 
AWS IoT - from Cloud to Edge | AWS Floor28
AWS IoT - from Cloud to Edge | AWS Floor28AWS IoT - from Cloud to Edge | AWS Floor28
AWS IoT - from Cloud to Edge | AWS Floor28
Amazon Web Services
 
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
Amazon Web Services
 
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
Amazon Web Services LATAM
 
Scale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWSScale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWS
Amazon Web Services
 
reInvent reCap 2022
reInvent reCap 2022reInvent reCap 2022
reInvent reCap 2022
CloudHesive
 
Implementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid EnvironmentImplementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid Environment
Amazon Web Services
 
How Cardknox Migrated 1M+ Sensitive Records to AWS
 How Cardknox Migrated 1M+ Sensitive Records to AWS How Cardknox Migrated 1M+ Sensitive Records to AWS
How Cardknox Migrated 1M+ Sensitive Records to AWS
Amazon Web Services
 
Single View of Data
Single View of DataSingle View of Data
Single View of Data
confluent
 

Similar to Sederhanakan_integrasi_data_anda_dengan_AWS_Glue_handout.pdf (20)

Module 3 - QuickSight Overview
Module 3 - QuickSight OverviewModule 3 - QuickSight Overview
Module 3 - QuickSight Overview
 
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
 
AWS Enterprise Day | Journey to the AWS Cloud
AWS Enterprise Day | Journey to the AWS CloudAWS Enterprise Day | Journey to the AWS Cloud
AWS Enterprise Day | Journey to the AWS Cloud
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Transforming Enterprise IT - Transformation Day Montreal 2018
Transforming Enterprise IT - Transformation Day Montreal 2018Transforming Enterprise IT - Transformation Day Montreal 2018
Transforming Enterprise IT - Transformation Day Montreal 2018
 
Best Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
Best Practices for Cloud Migrations with Zero Disruption with AWS MarketplaceBest Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
Best Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
 
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
 
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
 
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
 
AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user group
 
Machine learning in the physical world by Kip Larson from AWS IoT
Machine learning in the physical world by  Kip Larson from AWS IoTMachine learning in the physical world by  Kip Larson from AWS IoT
Machine learning in the physical world by Kip Larson from AWS IoT
 
AWS IoT - from Cloud to Edge | AWS Floor28
AWS IoT - from Cloud to Edge | AWS Floor28AWS IoT - from Cloud to Edge | AWS Floor28
AWS IoT - from Cloud to Edge | AWS Floor28
 
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
 
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
 
Scale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWSScale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWS
 
reInvent reCap 2022
reInvent reCap 2022reInvent reCap 2022
reInvent reCap 2022
 
Implementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid EnvironmentImplementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid Environment
 
How Cardknox Migrated 1M+ Sensitive Records to AWS
 How Cardknox Migrated 1M+ Sensitive Records to AWS How Cardknox Migrated 1M+ Sensitive Records to AWS
How Cardknox Migrated 1M+ Sensitive Records to AWS
 
Single View of Data
Single View of DataSingle View of Data
Single View of Data
 

Recently uploaded

State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 

Recently uploaded (20)

State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf

Sederhanakan_integrasi_data_anda_dengan_AWS_Glue_handout.pdf

  • 1. © 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. 23 August 2022
  • 2. Simplify data integration using AWS Glue. Nico Anandito, Analytics Specialist Solutions Architect, Amazon Web Services
  • 3. Agenda: 1. AWS Glue introduction 2. AWS Glue to simplify data integration (Ingest, Transform, Operationalize) 3. Demo
  • 4. Data integration is hard. Data: growing exponentially; from new sources; increasingly diverse
  • 5. Data integration is hard. Data: growing exponentially; from new sources; increasingly diverse. Personas: no- or low-code developers; data analysts and data scientists
  • 6. Data integration is hard. Data: growing exponentially; from new sources; increasingly diverse. Personas: no- or low-code developers; data analysts and data scientists. Applications: real-time/SLA sensitive; highly scalable; price performance
  • 7. Data integration platform trends: tools for all personas; scalable infrastructure; open standards; low cost
  • 8. Modern Data Architecture with AWS Glue: data integration in batch and real time; performant and cost-effective; centralized catalog and governance; tools for diverse skill sets. Services shown: Amazon DynamoDB, Amazon SageMaker, Amazon Redshift, Amazon OpenSearch Service, Amazon EMR, Amazon S3, Amazon Aurora, Amazon Athena
  • 9. AWS Glue: Serverless Data Integration for complex workloads. Serverless: no infrastructure to maintain; allocate the compute power you need and run jobs. Cost-effective: the all-in-one pricing model is 55% cheaper than other cloud data integration solutions. Handles complex workloads: connect to 65+ data sources and process petabytes of data in real time, with batch and event-driven modes. No lock-in: develop data integration pipelines in open source SparkSQL, PySpark, Python, Scala. Data integration for every user: development environments catered to different skill sets, with visual ETL development for Data Engineers, notebook-style development for Data Scientists, and no-code development for Data Analysts
  • 10. Globe Telecom Develops a 360-Degree Customer View on AWS. Building a robust subscriber profile for more than 90 million customers using AWS Glue. Can onboard 40 times more user attributes a month; high platform availability; integrates easily with downstream applications. “Now, more than ever, multiple downstream applications and analytical functions have access to real-time behavioral data, placing us in a stronger position to deliver more relevant and meaningful interactions with each of our customers. We can personalize engagements from messaging, real-time offers, to product bundles and more,” Derick Adil, Director, Asset Delivery and Domain Integration, Globe Telecom. Read more: https://aws.amazon.com/solutions/case-studies/globe-telecom-cadenz/
  • 11. AWS Glue: Serverless Data Integration in the Cloud. Ingestion, Transform, Deploy
  • 12. Unified Data Catalog with automated schema discovery. Diagram labels: OLTP, ERP, CRM (transactional systems); data warehouse; data lake; devices, web, sensors; Data Catalog; machine learning, DW queries, big data processing, interactive, real-time, business intelligence. Automated schema discovery and management; structured and semi-structured discovery (Glue Crawlers). No movement of data = low costs/admin. All metadata centrally available for search and query = productivity. Automate data discovery = productivity. Unify structured, semi-structured data = speed to insight
  • 13. Custom Connectors with AWS Glue. Data sources: on-premises DBs, proprietary stores, SaaS applications. Custom connector: no additional cost for connecting to sources; flexible and easy to build connectors
  • 14. AWS Glue Connectors Marketplace + many more…
  • 15. AWS Glue: Serverless Data Integration in the Cloud. Ingestion, Transform, Deploy
  • 16. AWS Glue Execution Engine: cost effective; fast and predictable; diverse workloads. Job starts in seconds; reduced job latencies enabling micro-batching; serverless Apache Spark and Python environment; per-second billing with a 1-minute minimum
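
The billing rule on the execution engine slide (per-second billing with a 1-minute minimum) can be sketched as simple arithmetic. This is only an illustration: the price per DPU-hour used as the default is an assumed placeholder, not a figure from the slides, and rounding the runtime up to a whole second is also an assumption.

```python
import math


def glue_job_cost(runtime_seconds, dpus, price_per_dpu_hour=0.44, minimum_seconds=60):
    """Estimate the cost of one job run under per-second billing with a minimum.

    price_per_dpu_hour is an assumed placeholder; check current pricing for
    your region before relying on any number this returns.
    """
    # Round up to whole seconds (assumption), then apply the 1-minute minimum.
    billed_seconds = max(math.ceil(runtime_seconds), minimum_seconds)
    # Convert the hourly DPU price to a per-second price and scale by capacity.
    return billed_seconds * dpus * price_per_dpu_hour / 3600


# A 20-second micro-batch on 10 DPUs is billed as if it ran for 60 seconds.
short_run = glue_job_cost(runtime_seconds=20, dpus=10)
one_minute = glue_job_cost(runtime_seconds=60, dpus=10)
print(short_run == one_minute)
```

Under this model, very short runs all pay the same one-minute floor, so micro-batches that finish well under a minute gain nothing further from being faster.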
  • 17. AWS Glue Studio: visual job authoring and monitoring. Monitor thousands of jobs through a single pane of glass; advanced transforms through code snippets; support for AWS Marketplace and custom connectors; preview your data at each step of the visual job authoring process; real-time schema inference without having to catalog
  • 18. AWS Glue Studio Notebook (new): interactive AWS Glue jobs development; submit AWS Glue jobs from the AWS Glue Studio notebook; use notebook magics to define transforms in SQL and control cost; built-in monitoring support
  • 19. AWS Glue Interactive Sessions (new). Existing options: time to first query = 10-15 minutes; high cost of a long-running cluster; “noisy neighbor” problem. AWS Glue Interactive Sessions: step 1, connect notebook to Sessions API (in seconds); time to first query ~ 1 min; development tool of your choice; rapid development; built-in cost control
  • 20. Challenges with PII detection and remediation: size of the dataset; location of PII; accuracy
  • 21. AWS Glue PII detection and remediation (new) in three simple steps: 1. Type of scan: full scan or sample scan. 2. Entities to detect: built-in entities (e.g. SSN, passport) or custom entities. 3. Remediation: store results or redact/mask results
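
The three steps on the PII slide (choose a scan type, choose entities, choose a remediation) can be illustrated with a small plain-Python sketch. This is not the AWS Glue transform or its API; the regular expressions, the `EMPLOYEE_ID` custom entity, and the function names are hypothetical stand-ins chosen only to show the flow.

```python
import random
import re

# Hypothetical stand-in for "built-in entities"; real detectors are more robust.
BUILT_IN_ENTITIES = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(rows, entities, sample_fraction=1.0, seed=0):
    """Steps 1 and 2: scan all rows (or a sample) and report entities per column."""
    rows = list(rows)
    if rows and sample_fraction < 1.0:
        # Sample scan: inspect only a random subset of the rows.
        count = max(1, int(len(rows) * sample_fraction))
        rows = random.Random(seed).sample(rows, count)
    findings = {}
    for row in rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue
            for name, pattern in entities.items():
                if pattern.search(value):
                    findings.setdefault(column, set()).add(name)
    return findings


def redact(rows, entities, mask="*******"):
    """Step 3 (one option): return copies of the rows with matches masked."""
    cleaned = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if isinstance(value, str):
                for pattern in entities.values():
                    value = pattern.sub(mask, value)
            new_row[column] = value
        cleaned.append(new_row)
    return cleaned


# Custom entity (hypothetical): an internal employee badge number.
entities = {**BUILT_IN_ENTITIES, "EMPLOYEE_ID": re.compile(r"\bEMP-\d{6}\b")}
rows = [
    {"id": 1, "note": "SSN 123-45-6789 on file"},
    {"id": 2, "note": "badge EMP-004211"},
]
findings = detect_pii(rows, entities)  # full scan; the "store results" option
masked = redact(rows, entities)        # the "redact/mask results" option
```

A sample scan trades accuracy for speed on large datasets, which is the tension the preceding slide lists under size of the dataset and accuracy.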
  • 22. Glue Auto-scaling (new). Chart: cost over the job execution timeline (t1 to t11) of an AWS Glue job from start to end, without autoscaling and with autoscaling; annotated phases: list operation, wide transform, uneven distribution of data partitions; potential savings highlighted
  • 23. AWS Glue Streaming ETL: process stream data and make it queryable in seconds; join streams against each other or static data; automatic updates to the AWS Glue Data Catalog; dozens of supported data targets; simplify your architecture with one service for streaming and batch data integration
  • 24. AWS Glue: Serverless Data Integration in the Cloud. Ingestion, Transform, Deploy
  • 25. Monitoring dashboard to check job status
  • 26. Orchestrate jobs easily with AWS Glue workflows: orchestrate Glue jobs and other AWS services; schedule jobs or trigger based on events; monitor execution of the workflows in one place
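
As a rough sketch of how a workflow with a schedule and a follow-on job might be described, the helpers below build the request parameters that Boto3's Glue `create_trigger` call accepts. The workflow and job names are hypothetical, the cron expression is only an example, and the conditional trigger is used here as one way to react to a job-completion event; nothing is sent to AWS by this code.

```python
def scheduled_trigger(workflow, job_name, cron):
    """Parameters for a trigger that starts a job on a schedule."""
    return {
        "Name": f"{workflow}-start",
        "WorkflowName": workflow,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron})",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def on_success_trigger(workflow, upstream_job, downstream_job):
    """Parameters for a trigger that runs one job after another succeeds."""
    return {
        "Name": f"{workflow}-{upstream_job}-to-{downstream_job}",
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "JobName": upstream_job,
                    "State": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": downstream_job}],
        "StartOnCreation": True,
    }


# Hypothetical names: ingest nightly at 02:00 UTC, then transform on success.
triggers = [
    scheduled_trigger("nightly-sales", "ingest-sales", "0 2 * * ? *"),
    on_success_trigger("nightly-sales", "ingest-sales", "transform-sales"),
]

# With credentials configured, these could be submitted roughly as:
#   glue = boto3.client("glue")
#   glue.create_workflow(Name="nightly-sales")
#   for params in triggers:
#       glue.create_trigger(**params)
```

Keeping the trigger definitions as plain dictionaries makes them easy to review and version alongside the job scripts.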
  • 27. Glue APIs to build CI/CD pipeline: Boto3 endpoints to automate the CI/CD pipeline; automate to save development hours; deploy jobs faster without any manual intervention; manage the Data Catalog through code snippets. Diagram (AWS Cloud): Data Engineers, AWS CodeCommit/Git, AWS CodePipeline, AWS Lambda, AWS Glue job; arrows labeled commit and deploy
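
A deploy step in such a pipeline (for example, inside the Lambda function in the diagram) might look roughly like the sketch below. It assumes a Boto3 Glue client is passed in; the job name, role ARN, script path, worker settings, and Glue version are hypothetical values, and enabling autoscaling through the `--enable-auto-scaling` job argument is optional.

```python
def build_job_definition(name, role_arn, script_s3_path, max_workers=10, autoscaling=True):
    """Build the parameters for a Spark ETL job definition."""
    default_args = {"--job-language": "python"}
    if autoscaling:
        # With autoscaling on, NumberOfWorkers acts as the upper bound.
        default_args["--enable-auto-scaling"] = "true"
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "GlueVersion": "3.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": max_workers,
        "DefaultArguments": default_args,
    }


def deploy_job(glue_client, definition):
    """Create the job if it does not exist yet, otherwise update it in place."""
    try:
        glue_client.get_job(JobName=definition["Name"])
    except glue_client.exceptions.EntityNotFoundException:
        glue_client.create_job(**definition)
        return "created"
    # update_job takes the same fields, minus the name, under JobUpdate.
    job_update = {key: value for key, value in definition.items() if key != "Name"}
    glue_client.update_job(JobName=definition["Name"], JobUpdate=job_update)
    return "updated"


# Hypothetical values for illustration only.
definition = build_job_definition(
    name="transform-sales",
    role_arn="arn:aws:iam::123456789012:role/example-glue-role",
    script_s3_path="s3://example-bucket/scripts/transform_sales.py",
)

# With credentials configured, a pipeline step could then run:
#   import boto3
#   deploy_job(boto3.client("glue"), definition)
```

Making the deploy step idempotent (create or update) lets the same pipeline stage run on every commit without manual intervention.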
  • 28. Demo
  • 29. Demo Architecture
  • 30. Summary: AWS Glue to simplify data integration in the cloud. No-code to advanced data use cases; process petabytes of data both in batch and real time using Apache Spark; migrate from expensive traditional ETL solutions to gain flexibility and reduce costs; catalog data assets to make them available to AWS Analytics Services
  • 31. A modern data strategy can help you manage, act on, and react to your data so you can make better decisions, respond faster, and uncover new opportunities. Dive deeper with these resources today: Harness data to reinvent your organization; In unpredictable times, a data strategy is key; Make data a strategic asset; Rewiring your culture to be data-driven; Put your data to work with a modern analytics approach; … and more! Visit the AWS Data resource hub: tinyurl.com/aws-data-hub-id
  • 32. AWS Training and Certification for Data and Analytics. Discover how to harness data, one of the world’s most valuable resources, and innovate at scale. This learning plan exposes you to the fastest way to get answers from all your data to all your users. It can also help prepare you for the AWS Certified Data Analytics - Specialty certification exam. Earning AWS Certified Data Analytics - Specialty validates expertise in using AWS data lakes and analytics services. Resources: AWS Data & Analytics FREE Training Resources; AWS Data Analytics Learning Plan; AWS Certified Data Analytics - Specialty. Links: https://bit.ly/3Ntlhy7 https://go.aws/3lwF0RR https://bit.ly/3wBVjD1
  • 33. Thank you for attending AWS Innovate – Data Edition. We hope you found it interesting! A kind reminder to complete the survey. Let us know what you thought of today’s event and how we can improve the event experience for you in the future. aws-apj-marketing@amazon.com; twitter.com/AWSCloud; facebook.com/AmazonWebServices; youtube.com/user/AmazonWebServices; slideshare.net/AmazonWebServices; twitch.tv/aws
  • 34. Thank you!