© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.
23 August 2022
Simplify data integration using AWS Glue
Nico Anandito
Analytics Specialist Solutions Architect
Amazon Web Services
Agenda
1. AWS Glue introduction
2. AWS Glue to simplify data integration
• Ingest
• Transform
• Operationalize
3. Demo
Data integration is hard
Data
• Growing exponentially
• From new sources
• Increasingly diverse
Personas
• No- or low-code developers
• Data analysts and data scientists
Applications
• Real-time / SLA sensitive
• Highly scalable
• Price performance
Data integration platform trends
Tools for all personas
Scalable infrastructure
Open standards
Low cost
Modern Data Architecture with AWS Glue
• Data integration in batch and real time
• Performant and cost-effective
• Centralized catalog and governance
• Tools for diverse skill sets
Services shown: Amazon DynamoDB, Amazon SageMaker, Amazon Redshift, Amazon OpenSearch Service, Amazon EMR, Amazon S3, Amazon Aurora, Amazon Athena
AWS Glue
Serverless data integration for complex workloads
Serverless
No infrastructure to maintain. Allocate the compute power you need and run jobs.
Cost-effective
All-in-one pricing model is 55% cheaper than other cloud data integration solutions.
Handles complex workloads
Connect to 65+ data sources and process petabytes of data in real time; includes batch and event-driven modes.
No lock-in
Develop data integration pipelines in open source Spark SQL, PySpark, Python, and Scala.
Data integration for every user
Development environments cater to different skill sets: visual ETL development for data engineers, notebook-style development for data scientists, and no-code development for data analysts.
Globe Telecom Develops a 360-Degree Customer View on AWS
Building a robust subscriber profile for more than 90 million customers using AWS Glue
• Can onboard 40 times more user attributes a month
• High platform availability
• Integrates easily with downstream applications
“Now, more than ever, multiple downstream applications and analytical functions have access to real-time behavioral data, placing us in a stronger position to deliver more relevant and meaningful interactions with each of our customers. We can personalize engagements from messaging, real-time offers, to product bundles and more.”
Derick Adil, Director, Asset Delivery and Domain Integration, Globe Telecom
Read more: https://aws.amazon.com/solutions/case-studies/globe-telecom-cadenz/
Ingestion Transform Deploy
AWS Glue
Serverless Data Integration in the Cloud
Unified Data Catalog with automated schema discovery
Automated schema discovery and management
Sources: transactional systems (OLTP, ERP, CRM); data warehouse; data lake; devices, web, sensors
• Structured and semi-structured discovery (Glue crawlers)
• No movement of data = low costs/admin
• All metadata centrally available for search and query = productivity
• Automated data discovery = productivity
• Unified structured and semi-structured data = speed to insight
Data Catalog consumers: machine learning, DW queries, big data processing, interactive, real-time, business intelligence
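The crawler-based discovery described here could be driven from Python roughly as follows. This is a minimal sketch, not the presenter's demo code: the crawler name, IAM role ARN, database name, and S3 path are hypothetical placeholders, and the helper functions are illustrative. `create_crawler`, `start_crawler`, and `get_tables` are boto3 Glue client operations; they appear only in comments so the block runs without AWS credentials.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the boto3 Glue `create_crawler` call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def summarize_tables(get_tables_response):
    """Map each table name to its (column, type) pairs.

    Expects the response shape returned by the Glue `get_tables` call.
    """
    summary = {}
    for table in get_tables_response.get("TableList", []):
        columns = table.get("StorageDescriptor", {}).get("Columns", [])
        summary[table["Name"]] = [(c["Name"], c["Type"]) for c in columns]
    return summary


# Hypothetical values for illustration only.
request = build_crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="sales_db",
    s3_path="s3://example-bucket/sales/",
)

# With AWS credentials configured, the request could be used like this:
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_crawler(**request)
#   glue.start_crawler(Name=request["Name"])
#   print(summarize_tables(glue.get_tables(DatabaseName="sales_db")))

# A hand-written response fragment, used here in place of a live call.
sample_response = {
    "TableList": [
        {
            "Name": "orders",
            "StorageDescriptor": {
                "Columns": [
                    {"Name": "order_id", "Type": "bigint"},
                    {"Name": "amount", "Type": "double"},
                ]
            },
        }
    ]
}
print(summarize_tables(sample_response))
```

Once a crawler has populated the Data Catalog, the same table metadata is what services such as Amazon Athena read when querying the data.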
Custom Connectors with AWS Glue
Data sources: on-premises DBs, proprietary stores, SaaS applications
• No additional cost for connecting to sources
• Flexible and easy to build connectors
AWS Glue Connectors Marketplace
+ Many more…
AWS Glue Execution Engine
Fast and predictable
• Job starts in seconds
• Reduced job latencies enabling micro-batching
Diverse workloads
• Serverless Apache Spark and Python environment
Cost effective
• Per-second billing with a 1-minute minimum
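Per-second billing with a 1-minute minimum can be made concrete with a little arithmetic. The sketch below assumes a hypothetical rate of 0.44 USD per DPU-hour, a job running on 10 DPUs, and rounding up to whole seconds; actual rates and rounding rules vary by Region and job type, so treat the numbers as illustrative only.

```python
import math

MINIMUM_BILLED_SECONDS = 60  # 1-minute minimum, as stated on the slide


def billed_seconds(run_seconds):
    """Round up to whole seconds (assumed) and apply the 1-minute minimum."""
    return max(MINIMUM_BILLED_SECONDS, math.ceil(run_seconds))


def job_cost(run_seconds, dpus, rate_per_dpu_hour):
    """Estimated cost = DPUs x billed hours x hourly rate per DPU."""
    return dpus * billed_seconds(run_seconds) / 3600 * rate_per_dpu_hour


RATE = 0.44  # assumed USD per DPU-hour, for illustration only

for seconds in (20, 60, 95, 3600):
    cost = job_cost(seconds, dpus=10, rate_per_dpu_hour=RATE)
    print(seconds, "s ->", round(cost, 4), "USD")
```

Under these assumptions a 20-second run costs the same as a 60-second run, which is why the minimum matters most for very short micro-batch jobs.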
AWS Glue Studio
Visual job authoring and monitoring
• Monitor thousands of jobs through a single pane of glass
• Advanced transforms through code snippets
• Support for AWS Marketplace and custom connectors
• Preview your data at each step of the visual job authoring process
• Real-time schema inference without having to catalog
AWS Glue Studio Notebook (New)
Interactive AWS Glue jobs development
• Submit AWS Glue jobs from the AWS Glue Studio notebook
• Use notebook magics to define transforms in SQL and control cost
• Built-in monitoring support
AWS Glue Interactive Sessions (New)
Existing options
• Time to first query = 10-15 minutes
• High cost of a long-running cluster
• “Noisy neighbor” problem
AWS Glue Interactive Sessions
• Step 1: Connect notebook to Sessions API (time required: in seconds)
• Time to first query ~ 1 min
• Development tool of your choice
• Rapid development
• Built-in cost control
Challenges with PII detection and remediation
• Size of the dataset
• Location of PII
• Accuracy
AWS Glue PII Detection and Remediation (New)
Three simple steps
1. Type of scan: full scan or sample scan
2. Entities to detect: built-in entities (e.g. SSN, passport) or custom entities
3. Remediation: store results, or redact/mask results
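The three steps can be illustrated with a plain-Python sketch. This is not how AWS Glue implements PII detection; it is a regex-based stand-in in which the entity patterns, the `EMPLOYEE_ID` custom entity, and the function names are all hypothetical, and the "sample scan" simply takes the first N rows.

```python
import re

# Step 2: entities to detect. Both patterns are simplified illustrations.
BUILT_IN_ENTITIES = {"SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b")}
CUSTOM_ENTITIES = {"EMPLOYEE_ID": re.compile(r"\bEMP-\d{6}\b")}


def scan(rows, entities, sample_size=None):
    """Step 1: full scan (sample_size=None) or sample scan of the first N rows.

    Returns (row_index, entity_name) pairs for every row with a match.
    """
    subset = rows if sample_size is None else rows[:sample_size]
    findings = []
    for index, text in enumerate(subset):
        for name, pattern in entities.items():
            if pattern.search(text):
                findings.append((index, name))
    return findings


def redact(rows, entities, mask="****"):
    """Step 3: replace every detected entity with a mask string."""
    cleaned = []
    for text in rows:
        for pattern in entities.values():
            text = pattern.sub(mask, text)
        cleaned.append(text)
    return cleaned


rows = [
    "Customer SSN is 123-45-6789",
    "No sensitive data here",
    "Handled by EMP-004217",
]
entities = {**BUILT_IN_ENTITIES, **CUSTOM_ENTITIES}

print(scan(rows, entities))                 # full scan
print(scan(rows, entities, sample_size=2))  # sample scan skips the third row
print(redact(rows, entities))
```

The sample scan is cheaper but can miss PII that appears only in unsampled rows, which is the accuracy trade-off the previous slide alludes to.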
Glue Auto Scaling (New)
[Figure: cost over the job execution timeline (t1 to t11, from AWS Glue job start to end), without auto scaling vs. with auto scaling. Phases marked: list operation, wide transform, uneven distribution of data partitions. Potential savings are highlighted.]
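The potential savings in the chart can be approximated by comparing worker time under a fixed allocation against worker time that follows demand. The per-interval worker counts below are invented for illustration and do not come from the slide.

```python
# Hypothetical number of workers the job can actually use in each interval
# t1..t11: a list operation (little parallelism), a wide transform (peak),
# then skewed partitions where only a few workers stay busy.
workers_needed = [1, 1, 4, 10, 10, 10, 6, 3, 2, 2, 1]


def worker_intervals_fixed(demand):
    """Without auto scaling: provision for the peak during every interval."""
    return max(demand) * len(demand)


def worker_intervals_scaled(demand):
    """With idealized auto scaling: pay only for workers in use per interval."""
    return sum(demand)


fixed = worker_intervals_fixed(workers_needed)
scaled = worker_intervals_scaled(workers_needed)
savings = 1 - scaled / fixed

print(f"fixed={fixed} scaled={scaled} savings={savings:.0%}")
```

Real scaling is not instantaneous, so actual savings would likely be smaller than this idealized comparison suggests.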
AWS Glue Streaming ETL
Process stream data & make it queryable in seconds
Join streams against each other or static data
Automatic updates to the AWS Glue Data Catalog
Dozens of supported data targets
Simplify your architecture with one service for streaming and batch data integration
Monitoring dashboard to check job status
Orchestrate jobs easily with AWS Glue workflows
• Orchestrate Glue jobs and other AWS services
• Schedule jobs or trigger based on events
• Monitor execution of the workflows in one place
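Scheduling and chaining jobs inside a workflow might be expressed as trigger definitions like these. The workflow, trigger, and job names and the cron expression are hypothetical; the dictionaries follow the parameter names of the boto3 Glue `create_trigger` operation, and the live calls appear only in comments.

```python
def scheduled_trigger(workflow, name, cron, job_name):
    """Trigger that starts `job_name` on a schedule inside a workflow."""
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "SCHEDULED",
        "Schedule": cron,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def on_success_trigger(workflow, name, upstream_job, downstream_job):
    """Trigger that starts `downstream_job` after `upstream_job` succeeds."""
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "JobName": upstream_job,
                    "State": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": downstream_job}],
        "StartOnCreation": True,
    }


# Hypothetical workflow: ingest at 02:00 UTC daily, then transform.
triggers = [
    scheduled_trigger("nightly-sales", "start-nightly", "cron(0 2 * * ? *)", "ingest-sales"),
    on_success_trigger("nightly-sales", "after-ingest", "ingest-sales", "transform-sales"),
]

# With AWS credentials configured:
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_workflow(Name="nightly-sales")
#   for trigger in triggers:
#       glue.create_trigger(**trigger)

for trigger in triggers:
    print(trigger["Name"], trigger["Type"], [a["JobName"] for a in trigger["Actions"]])
```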
Glue APIs to build CI/CD pipeline
• Boto3 APIs to automate the CI/CD pipeline
• Automate to save development hours
• Deploy jobs faster without any manual intervention
• Manage Data Catalog through code snippets
[Diagram: data engineers commit to AWS CodeCommit/Git; AWS CodePipeline and AWS Lambda deploy the AWS Glue job in the AWS Cloud.]
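A deploy step in such a pipeline could decide between creating and updating a job along these lines. This is a sketch under stated assumptions: the job name, role ARN, script location, Glue version, and worker settings are hypothetical, and `plan_deployment` is an illustrative helper rather than part of any AWS SDK. `list_jobs`, `create_job`, and `update_job` are boto3 Glue client operations, shown only in comments.

```python
def job_definition(name, role_arn, script_location):
    """Minimal Glue Spark job definition (parameter names follow `create_job`)."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": "3.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
    }


def plan_deployment(existing_job_names, definition):
    """Return the Glue operation name and keyword arguments to deploy a job.

    New jobs use `create_job`; existing jobs use `update_job`, which takes
    the job name separately from the rest of the definition.
    """
    if definition["Name"] in existing_job_names:
        update = {k: v for k, v in definition.items() if k != "Name"}
        return "update_job", {"JobName": definition["Name"], "JobUpdate": update}
    return "create_job", dict(definition)


# Hypothetical values for illustration only.
definition = job_definition(
    name="transform-sales",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueJobRole",
    script_location="s3://example-bucket/scripts/transform_sales.py",
)

# In a pipeline step (for example an AWS Lambda function) this could become:
#   import boto3
#   glue = boto3.client("glue")
#   existing = glue.list_jobs()["JobNames"]  # first page only; paginate for more
#   operation, kwargs = plan_deployment(existing, definition)
#   getattr(glue, operation)(**kwargs)

print(plan_deployment([], definition)[0])
print(plan_deployment(["transform-sales"], definition)[0])
```

Keeping the decision logic free of SDK calls makes it easy to unit test in the pipeline before anything touches a real account.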
Demo
Demo Architecture
Summary
No-code to advanced data use cases
Process petabytes of data both in batch and real time using Apache Spark
Migrate from expensive traditional ETL solutions to gain flexibility and reduce costs
Catalog data assets to make them available to AWS Analytics Services
AWS Glue to simplify data integration in the cloud
A modern data strategy can help you manage, act on, and react to your data so you can make
better decisions, respond faster, and uncover new opportunities. Dive deeper with these resources
today.
• Harness data to reinvent your organization
• In unpredictable times, a data strategy is key
• Make data a strategic asset
• Rewiring your culture to be data-driven
• Put your data to work with a modern analytics approach
• … and more!
Visit the AWS Data resource hub
tinyurl.com/aws-data-hub-id
AWS Training and Certification for Data and Analytics
Discover how to harness data, one of the world’s most valuable resources, and innovate at scale.
This learning plan exposes you to the fastest way to get answers from all your data to all your users. It can also help prepare you for the AWS Certified Data Analytics - Specialty certification exam.
Earning AWS Certified Data Analytics - Specialty validates expertise in using AWS data lakes and analytics services.
AWS Data & Analytics FREE Training Resources
AWS Data Analytics Learning Plan
AWS Certified Data Analytics - Specialty
https://bit.ly/3Ntlhy7 https://go.aws/3lwF0RR
https://bit.ly/3wBVjD1
Thank you for attending AWS Innovate – Data Edition
We hope you found it interesting! A kind reminder to complete the survey.
Let us know what you thought of today’s event and how we can improve the event
experience for you in the future.
aws-apj-marketing@amazon.com
twitter.com/AWSCloud
facebook.com/AmazonWebServices
youtube.com/user/AmazonWebServices
slideshare.net/AmazonWebServices
twitch.tv/aws
Thank you!
© 2021, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Color palette feedback
E X A M P L E S
0
0
0
255
255
255
40
40
40
254
143
1
222
72
230
240
130
112
229
230
255
233
127
147
64
104
138
191
112
213
Current palette
0
0
0
255
255
255
6
0
73
254
143
1
222
72
230
74
201
209
21
163
99
105
20
225
40
73
189
181
145
253
Recommended palette
(hyperlinks)
(hyperlinks)
Or whatever color is
predominant in the slide
background if a solidish
color is selected for
content slides
Text
Text
Text
Text
Text
Text
Text
Hyperlink
Usable on this
background color
Usable on this
background color
Usable on this
background color
242
244
244
242
244
244

More Related Content

Similar to Sederhanakan_integrasi_data_anda_dengan_AWS_Glue_handout.pdf

Module 3 - QuickSight Overview
Module 3 - QuickSight OverviewModule 3 - QuickSight Overview
Module 3 - QuickSight OverviewLam Le
 
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...Amazon Web Services
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxSwathiPonugumati
 
AWS Enterprise Day | Journey to the AWS Cloud
AWS Enterprise Day | Journey to the AWS CloudAWS Enterprise Day | Journey to the AWS Cloud
AWS Enterprise Day | Journey to the AWS CloudAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Transforming Enterprise IT - Transformation Day Montreal 2018
Transforming Enterprise IT - Transformation Day Montreal 2018Transforming Enterprise IT - Transformation Day Montreal 2018
Transforming Enterprise IT - Transformation Day Montreal 2018Amazon Web Services
 
Best Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
Best Practices for Cloud Migrations with Zero Disruption with AWS MarketplaceBest Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
Best Practices for Cloud Migrations with Zero Disruption with AWS MarketplaceDenodo
 
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...Amazon Web Services
 
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018Amazon Web Services
 
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...Amazon Web Services
 
AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS Chicago
 
Machine learning in the physical world by Kip Larson from AWS IoT
Machine learning in the physical world by  Kip Larson from AWS IoTMachine learning in the physical world by  Kip Larson from AWS IoT
Machine learning in the physical world by Kip Larson from AWS IoTBill Liu
 
AWS IoT - from Cloud to Edge | AWS Floor28
AWS IoT - from Cloud to Edge | AWS Floor28AWS IoT - from Cloud to Edge | AWS Floor28
AWS IoT - from Cloud to Edge | AWS Floor28Amazon Web Services
 
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...Amazon Web Services
 
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...Amazon Web Services LATAM
 
Scale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWSScale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWSAmazon Web Services
 
reInvent reCap 2022
reInvent reCap 2022reInvent reCap 2022
reInvent reCap 2022CloudHesive
 
Implementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid EnvironmentImplementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid EnvironmentAmazon Web Services
 
How Cardknox Migrated 1M+ Sensitive Records to AWS
 How Cardknox Migrated 1M+ Sensitive Records to AWS How Cardknox Migrated 1M+ Sensitive Records to AWS
How Cardknox Migrated 1M+ Sensitive Records to AWSAmazon Web Services
 
Single View of Data
Single View of DataSingle View of Data
Single View of Dataconfluent
 

Similar to Sederhanakan_integrasi_data_anda_dengan_AWS_Glue_handout.pdf (20)

Module 3 - QuickSight Overview
Module 3 - QuickSight OverviewModule 3 - QuickSight Overview
Module 3 - QuickSight Overview
 
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
 
AWS Enterprise Day | Journey to the AWS Cloud
AWS Enterprise Day | Journey to the AWS CloudAWS Enterprise Day | Journey to the AWS Cloud
AWS Enterprise Day | Journey to the AWS Cloud
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Transforming Enterprise IT - Transformation Day Montreal 2018
Transforming Enterprise IT - Transformation Day Montreal 2018Transforming Enterprise IT - Transformation Day Montreal 2018
Transforming Enterprise IT - Transformation Day Montreal 2018
 
Best Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
Best Practices for Cloud Migrations with Zero Disruption with AWS MarketplaceBest Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
Best Practices for Cloud Migrations with Zero Disruption with AWS Marketplace
 
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
RET303_Drive Warehouse Efficiencies with the Same AWS IoT Technology that Pow...
 
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018
 
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
 
AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user group
 
Machine learning in the physical world by Kip Larson from AWS IoT
Machine learning in the physical world by  Kip Larson from AWS IoTMachine learning in the physical world by  Kip Larson from AWS IoT
Machine learning in the physical world by Kip Larson from AWS IoT
 
AWS IoT - from Cloud to Edge | AWS Floor28
AWS IoT - from Cloud to Edge | AWS Floor28AWS IoT - from Cloud to Edge | AWS Floor28
AWS IoT - from Cloud to Edge | AWS Floor28
 
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
Accelerate your journey to AI: IBM Cloud Pak for Data on AWS - DEM18-S - New ...
 
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
Transformation Track AWS Cloud Experience Argentina - Why Enterprise Workload...
 
Scale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWSScale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWS
 
reInvent reCap 2022
reInvent reCap 2022reInvent reCap 2022
reInvent reCap 2022
 
Implementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid EnvironmentImplementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid Environment
 
How Cardknox Migrated 1M+ Sensitive Records to AWS
 How Cardknox Migrated 1M+ Sensitive Records to AWS How Cardknox Migrated 1M+ Sensitive Records to AWS
How Cardknox Migrated 1M+ Sensitive Records to AWS
 
Single View of Data
Single View of DataSingle View of Data
Single View of Data
 

Recently uploaded

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Sederhanakan_integrasi_data_anda_dengan_AWS_Glue_handout.pdf

  • 1. © 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. 2 3 A u g u s t , 2 0 2 2
  • 2. © 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. Simplify data integration Using AWS Glue Nico Anandito Analytics Specialist Solutions Architect Amazon Web Services
  • 3. © 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. Agenda 1. AWS Glue introduction 2. AWS Glue to simplify data integration • Ingest • Transform • Operationalize 3. Demo
  • 4. © 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. Data integration is hard Data G R O W I N G E X P O N E N T I A L L Y F R O M N E W S O U R C E S I N C R E A S I N G L Y D I V E R S E
  • 5. © 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. Data integration is hard Data G R O W I N G E X P O N E N T I A L L Y F R O M N E W S O U R C E S I N C R E A S I N G L Y D I V E R S E Personas N O O R L O W C O D E D E V E L O P E R S D A T A A N A L Y S T S A N D D A T A S C I E N T I S T S
  • 6. © 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. Data integration is hard Data G R O W I N G E X P O N E N T I A L L Y F R O M N E W S O U R C E S I N C R E A S I N G L Y D I V E R S E Personas N O O R L O W C O D E D E V E L O P E R S D A T A A N A L Y S T S A N D D A T A S C I E N T I S T S Applications R E A L - T I M E / S L A S E N S I T I V E H I G H L Y S C A L A B L E P R I C E P E R F O R M A N C E
  • 7. © 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. Data integration platform trends Tools for all personas Scalable Infrastructure Open Standards Low cost
  • 8. © 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. D A T A I N T E G R A T I O N I N B A T C H A N D R E A L T I M E P E R F O R M A N T A N D C O S T - E F F E C T I V E C E N T R A L I Z E D C A T A L O G A N D G O V E R N A N C E Amazon DynamoDB Amazon SageMaker Amazon Redshift Amazon OpenSearch Service Amazon EMR Amazon S3 Amazon Aurora Amazon Athena T O O L S F O R D I V E R S E S K I L L S E T S Modern Data Architecture with AWS Glue
  • 9. AWS Glue: Serverless Data Integration for complex workloads. Serverless: no infrastructure to maintain; allocate the compute power you need and run jobs. Cost-effective: all-in-one pricing model is 55% cheaper than other cloud data integration solutions. Handles complex workloads: connect to 65+ data sources and process petabytes of data in real time, with batch and event-driven modes. No lock-in: develop data integration pipelines in open source SparkSQL, PySpark, Python, Scala. Data integration for every user: development environments catered to different skill sets, with visual ETL development for Data Engineers, notebook-style development for Data Scientists, and no-code development for Data Analysts
  • 10. Globe Telecom Develops A 360-Degree Customer View on AWS. Building a robust subscriber profile for more than 90 million customers using AWS Glue. Can onboard 40 times more user attributes a month; high platform availability; integrates easily with downstream applications. “Now, more than ever, multiple downstream applications and analytical functions have access to real-time behavioral data, placing us in a stronger position to deliver more relevant and meaningful interactions with each of our customers. We can personalize engagements from messaging, real-time offers, to product bundles and more.” Derick Adil, Director, Asset Delivery and Domain Integration, Globe Telecom. Read more: https://aws.amazon.com/solutions/case-studies/globe-telecom-cadenz/
  • 11. AWS Glue: Serverless Data Integration in the Cloud. Ingestion, Transform, Deploy
  • 12. Unified Data Catalog with automated schema discovery. Sources: OLTP, ERP, CRM, data warehouse, data lake, devices, web, sensors, transactional systems. Automated schema discovery and management; structured and semi-structured discovery (Glue Crawlers). No movement of data = low costs/admin. All metadata centrally available for search and query = productivity. Automate data discovery = productivity. Unify structured, semi-structured data = speed to insight. Uses of the Data Catalog: machine learning, DW queries, big data processing, interactive, real-time, business intelligence
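A crawler like the one this slide describes can be defined through the Glue API. The sketch below only builds the keyword arguments for boto3's `create_crawler` call, so it runs without AWS access; the crawler name, role ARN, database name, bucket path, and schedule are all hypothetical placeholders, not values from the talk.

```python
def build_crawler_request(name, role_arn, database, s3_path, schedule=None):
    """Return keyword arguments for the boto3 Glue `create_crawler` call."""
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # Crawl one S3 prefix; the crawler infers table schemas from the files.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Update existing catalog tables when the schema drifts; only log deletions.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }
    if schedule is not None:
        request["Schedule"] = schedule
    return request


# Hypothetical values, for illustration only.
crawler_request = build_crawler_request(
    name="raw-sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="sales_raw",
    s3_path="s3://example-bucket/raw/sales/",
    schedule="cron(0 2 * * ? *)",  # daily at 02:00 UTC
)

# To create and start the crawler (requires boto3 and AWS credentials):
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_crawler(**crawler_request)
#   glue.start_crawler(Name=crawler_request["Name"])
```

Keeping the request as a plain dictionary makes it easy to review or store in version control before anything is created in an account.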
  • 13. Custom Connectors with AWS Glue. Data sources: on-premises DBs, proprietary stores, SaaS applications. Custom connector: no additional cost for connecting to sources; flexible and easy to build connectors
  • 14. AWS Glue Connectors Marketplace + Many more…
  • 15. AWS Glue: Serverless Data Integration in the Cloud. Ingestion, Transform, Deploy
  • 16. AWS Glue Execution Engine. Cost effective; fast and predictable; diverse workloads. Job starts in seconds; reduced job latencies enabling micro-batching; serverless Apache Spark and Python environment; per-second billing with a 1-minute minimum
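The billing rule on this slide (per-second billing with a 1-minute minimum) can be worked through as simple arithmetic. This is a sketch only: the price of 0.44 USD per DPU-hour is an assumed example rate that is not stated in the talk, and actual rates vary by region and job type.

```python
def glue_job_cost(runtime_seconds, dpus, price_per_dpu_hour=0.44, minimum_seconds=60):
    """Estimate the cost of one job run under per-second billing with a minimum.

    price_per_dpu_hour is an assumed example rate, not a figure from the slides.
    """
    # Runs shorter than the minimum are billed as if they ran for the minimum.
    billed_seconds = max(runtime_seconds, minimum_seconds)
    return dpus * (billed_seconds / 3600) * price_per_dpu_hour


# A 20-second micro-batch is billed as 60 seconds; a 15-minute run is billed as-is.
short_run = glue_job_cost(runtime_seconds=20, dpus=10)
long_run = glue_job_cost(runtime_seconds=15 * 60, dpus=10)
print(f"short run: ${short_run:.4f}, long run: ${long_run:.2f}")
```

Because the minimum is one minute, very short runs cost the same as a 60-second run, which is worth keeping in mind when sizing micro-batches.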
  • 17. AWS Glue Studio: visual job authoring and monitoring. Monitor thousands of jobs through a single pane of glass; advanced transforms through code snippets; support for AWS Marketplace and custom connectors; preview your data at each step of the visual job authoring process; real-time schema inference without having to catalog
  • 18. AWS Glue Studio Notebook (New): interactive AWS Glue jobs development. Submit AWS Glue jobs from the AWS Glue Studio notebook; use notebook magics to define transforms in SQL and control cost; built-in monitoring support
  • 19. AWS Glue Interactive Sessions (New). Existing options: time to first query = 10-15 minutes; high cost of a long-running cluster; “noisy neighbor” problem. AWS Glue Interactive Sessions: time to first query ~ 1 min (step 1, connect notebook to Sessions API: in seconds); development tool of your choice; rapid development; built-in cost control
  • 20. Challenges with PII detection and remediation: size of the dataset; location of PII; accuracy
  • 21. AWS Glue PII Detection and remediation (New): three simple steps. 1. Type of scan: full scan or sample scan. 2. Entities to detect: built-in entities (e.g. SSN, passport) or custom entities. 3. Remediation: store results or redact/mask results
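The three steps on this slide can be illustrated with a small pure-Python stand-in. This is not the AWS Glue PII transform or its API; the regular expressions, the `EMPLOYEE_ID` custom entity, and the "first N rows" sampling are simplifying assumptions chosen only to show the shape of the workflow.

```python
import re

# Step 2 (built-in entities): a simplified US SSN pattern, for illustration only.
BUILT_IN_ENTITIES = {"SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b")}


def detect_and_redact(rows, entities, sample_size=None, mask="****"):
    """Scan text rows for entities, record findings, and mask the matches."""
    # Step 1: full scan, or a sample scan over the first `sample_size` rows.
    scanned = rows if sample_size is None else rows[:sample_size]
    findings, redacted = [], []
    for index, text in enumerate(scanned):
        for name, pattern in entities.items():
            if pattern.search(text):
                findings.append((index, name))    # Step 3a: store results
                text = pattern.sub(mask, text)    # Step 3b: redact/mask results
        redacted.append(text)
    return findings, redacted


# Step 2 (custom entities): a hypothetical employee-ID format.
entities = dict(BUILT_IN_ENTITIES, EMPLOYEE_ID=re.compile(r"\bEMP-\d{5}\b"))
rows = ["name=Ana ssn=123-45-6789", "order for EMP-00042 shipped", "nothing sensitive"]

findings, redacted = detect_and_redact(rows, entities)
sample_findings, _ = detect_and_redact(rows, entities, sample_size=1)
```

A sample scan is cheaper but can miss entities that only appear in rows outside the sample, which is the trade-off behind the "size of the dataset" and "accuracy" challenges listed on the previous slide.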
  • 22. Glue Auto-scaling (New). Figure: cost over the job execution timeline (t1 to t11) of an AWS Glue job, from start to end, without autoscaling and with autoscaling; annotated stages are a list operation, a wide transform, and an uneven distribution of data partitions; the potential savings are marked.
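The idea behind the figure can be sketched numerically. The per-interval worker demand below is made up for illustration (it is not read from the chart), and the model is deliberately crude: without autoscaling the job holds its peak worker count for every interval, while with autoscaling it holds only what each interval needs.

```python
def worker_intervals(demand, autoscaling):
    """Total worker-intervals billed for a job with the given per-interval demand."""
    if autoscaling:
        # Pay only for the workers each interval actually needs.
        return sum(demand)
    # Fixed-size job: provisioned for the peak during the whole run.
    return max(demand) * len(demand)


# Hypothetical demand for intervals t1..t11: low during the list operation,
# high during wide transforms, low again when partitions are unevenly distributed.
demand = [2, 2, 10, 10, 10, 4, 4, 10, 3, 3, 2]

fixed = worker_intervals(demand, autoscaling=False)
scaled = worker_intervals(demand, autoscaling=True)
savings = 1 - scaled / fixed
print(f"fixed={fixed}, autoscaled={scaled}, potential savings={savings:.0%}")
```

Real savings depend on how quickly workers can be added and removed, which this sketch ignores.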
  • 23. AWS Glue Streaming ETL. Process stream data and make it queryable in seconds; join streams against each other or static data; automatic updates to the AWS Glue Data Catalog; dozens of supported data targets; simplify your architecture with one service for streaming and batch data integration
  • 24. AWS Glue: Serverless Data Integration in the Cloud. Ingestion, Transform, Deploy
  • 25. Monitoring dashboard to check job status
  • 26. Orchestrate jobs easily with AWS Glue workflows. Orchestrate Glue jobs and other AWS services; schedule jobs or trigger based on events; monitor execution of the workflows in one place
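The two ways of starting work in a workflow (on a schedule, or when something happens) map onto Glue triggers. The sketch below builds keyword arguments for boto3's `create_trigger` call without contacting AWS; the workflow, trigger, and job names are hypothetical, and the conditional trigger shown reacts to a job finishing successfully, which is only one kind of event.

```python
def scheduled_trigger(name, workflow, job_name, cron_expression):
    """Arguments for a trigger that starts a job on a cron schedule."""
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron_expression})",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def on_success_trigger(name, workflow, upstream_job, downstream_job):
    """Arguments for a trigger that starts a job after another job succeeds."""
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": upstream_job, "State": "SUCCEEDED"}
            ]
        },
        "Actions": [{"JobName": downstream_job}],
        "StartOnCreation": True,
    }


# Hypothetical two-step workflow: ingest nightly, then transform once ingest succeeds.
nightly = scheduled_trigger("nightly-start", "sales-workflow", "ingest-sales", "0 1 * * ? *")
chained = on_success_trigger("after-ingest", "sales-workflow", "ingest-sales", "transform-sales")

# With boto3 and AWS credentials, each dict can be passed as
#   boto3.client("glue").create_trigger(**nightly)
```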
  • 27. Glue APIs to build CI/CD pipeline. Boto3 endpoints to automate CI/CD pipeline; automate to save development hours; deploy jobs faster without any manual intervention; manage Data Catalog through code snippets. Diagram (AWS Cloud): Data Engineers, AWS CodeCommit/Git, AWS CodePipeline, AWS Lambda, AWS Glue job; arrows labeled commit and deploy
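A deploy step of the kind this slide describes, for example one run from an AWS Lambda function in the pipeline, could create or update a Glue job through Boto3. The following is a minimal sketch under stated assumptions: the job name, role ARN, script path, and Glue version are hypothetical, and `FakeGlueClient` is an in-memory stand-in included only so the example runs without AWS credentials. With a real client, pass `boto3.client("glue")` instead.

```python
def deploy_glue_job(glue, name, role_arn, script_location, glue_version="3.0"):
    """Create the job if it does not exist yet, otherwise update it in place."""
    job_definition = {
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": glue_version,
    }
    try:
        glue.get_job(JobName=name)
    except glue.exceptions.EntityNotFoundException:
        glue.create_job(Name=name, **job_definition)
        return "created"
    glue.update_job(JobName=name, JobUpdate=job_definition)
    return "updated"


class FakeGlueClient:
    """In-memory stand-in mirroring the three Glue client calls used above."""

    class exceptions:
        class EntityNotFoundException(Exception):
            pass

    def __init__(self):
        self.jobs = {}

    def get_job(self, JobName):
        if JobName not in self.jobs:
            raise self.exceptions.EntityNotFoundException(JobName)
        return {"Job": self.jobs[JobName]}

    def create_job(self, Name, **definition):
        self.jobs[Name] = definition

    def update_job(self, JobName, JobUpdate):
        self.jobs[JobName] = JobUpdate


client = FakeGlueClient()
role = "arn:aws:iam::123456789012:role/ExampleGlueJobRole"  # hypothetical
first = deploy_glue_job(client, "transform-sales", role, "s3://example-bucket/scripts/v1.py")
second = deploy_glue_job(client, "transform-sales", role, "s3://example-bucket/scripts/v2.py")
```

Passing the client in as an argument keeps the deploy logic testable in the pipeline before it touches a real account.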
  • 28. Demo
  • 29. Demo Architecture
  • 30. Summary: AWS Glue to simplify data integration in the cloud. No-code to advanced data use cases; process petabytes of data both in batch and real time using Apache Spark; migrate from expensive traditional ETL solutions to gain flexibility and reduce costs; catalog data assets to make them available to AWS Analytics Services
  • 31. A modern data strategy can help you manage, act on, and react to your data so you can make better decisions, respond faster, and uncover new opportunities. Dive deeper with these resources today. • Harness data to reinvent your organization • In unpredictable times, a data strategy is key • Make data a strategic asset • Rewiring your culture to be data-driven • Put your data to work with a modern analytics approach • … and more! Visit the AWS Data resource hub: tinyurl.com/aws-data-hub-id
  • 32. AWS Training and Certification for Data and Analytics. Discover how to harness data, one of the world’s most valuable resources, and innovate at scale. This learning plan exposes you to the fastest way to get answers from all your data to all your users. It can also help prepare you for the AWS Certified Data Analytics - Specialty certification exam. Earning AWS Certified Data Analytics - Specialty validates expertise in using AWS data lakes and analytics services. Resources: AWS Data & Analytics FREE Training Resources; AWS Data Analytics Learning Plan; AWS Certified Data Analytics - Specialty. Links: https://bit.ly/3Ntlhy7 https://go.aws/3lwF0RR https://bit.ly/3wBVjD1
  • 33. Thank you for attending AWS Innovate – Data Edition. We hope you found it interesting! A kind reminder to complete the survey. Let us know what you thought of today’s event and how we can improve the event experience for you in the future. aws-apj-marketing@amazon.com, twitter.com/AWSCloud, facebook.com/AmazonWebServices, youtube.com/user/AmazonWebServices, slideshare.net/AmazonWebServices, twitch.tv/aws
  • 34. Thank you!