© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build An ETL Pipeline To Analyze
Customer Data
Jean-Pierre Dodel
Product Manager – Amazon Comprehend
AWS
AIM416
Roy Hasson
Manager, Global Business Development – Analytics
AWS
Agenda
Introduction to Amazon Comprehend
Introduction to AWS Glue
Workshop use case and architecture
Getting started
Amazon Comprehend
Discover insights and relationships in text
[Figure: a library of news articles (sample topics: World Series, Washington, storm, stock market) analyzed for:]
Entities
Key phrases
Language
Sentiment
Syntax
Amazon Comprehend
Amazon.com, Inc. is located in
Seattle, WA and was founded
July 5, 1994 by Jeff Bezos. Our
customers love buying everything
from books to blenders at great
prices.
Named Entities
- Amazon.com: Organization
- Seattle, WA: Location
- July 5, 1994: Date
- Jeff Bezos: Person
Key phrases
- our customers
- books
- blenders
- great prices
Sentiment
Positive
Language
English
Extracting insight from text
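The extraction shown on this slide could be sketched in Python roughly as follows. This is a minimal sketch, not the presenters' code: the function name `analyze_text` and the shape of the returned dictionary are assumptions made for illustration, and `client` is expected to be an Amazon Comprehend client such as the one returned by `boto3.client("comprehend")`.

```python
from typing import Any, Dict


def analyze_text(client: Any, text: str) -> Dict[str, Any]:
    """Run four Amazon Comprehend detections on one document.

    `client` is assumed to expose the Comprehend operations used below.
    """
    # Detect the dominant language first; the other calls need a language code.
    languages = client.detect_dominant_language(Text=text)["Languages"]
    language_code = max(languages, key=lambda lang: lang["Score"])["LanguageCode"]

    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)

    # Keep only the fields shown on the slide (hypothetical output shape).
    return {
        "language": language_code,
        "entities": [(e["Text"], e["Type"]) for e in entities["Entities"]],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
        "sentiment": sentiment["Sentiment"],
    }
```

With boto3 installed and AWS credentials configured, `analyze_text(boto3.client("comprehend"), "Amazon.com, Inc. is located in Seattle, WA ...")` should return results comparable to the slide, although the exact entities and phrases depend on the service.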
Amazon Comprehend
Use cases
Voice of customer analytics: Analyzing what customers are saying about your brand, products, and services
Semantic search: Making search smarter by searching on key phrase, sentiment, and topic
Knowledge management/discovery: Organizing documents, categorizing by topic, and personalizing experiences
AWS Glue enables you to …
Discover: Automatically discover and categorize your data, making it immediately searchable and queryable across data sources.
Develop: Generate code to clean, enrich, and reliably move data between various data sources. Easily customize this code or bring your own.
Deploy: Run your jobs on a serverless, fully managed, scale-out environment. No compute resources to provision or manage.
What is AWS Glue Data Catalog?
Unified metadata repository across relational databases, Amazon Relational
Database Service (Amazon RDS), Amazon Redshift, and Amazon Simple Storage
Service (Amazon S3) … with support for more coming soon!
Single searchable view into your data, no matter where it is stored
Ability to automatically crawl and classify your data
Augment technical metadata with business metadata for tables
Track data evolution using schema versioning
Apache Hive metastore compatible and integrated with AWS analytics services
What is a crawler?
Crawlers automatically build your AWS Glue Data Catalog and keep it in sync
Scan your data stored in various data stores, extract metadata and data
statistics, and add table definitions to your Data Catalog
• Classify data using built-in and custom classifiers
• You can write your own using Grok expressions
Discover new data and extract schema definitions
• Detect schema changes and version tables
• Detect Apache Hive style partitions on Amazon S3
Run ad hoc or on a schedule; serverless – only pay when crawler runs
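To make "Apache Hive style partitions" concrete: such a layout encodes partition columns as `name=value` folders in the object key. The following is a small illustrative sketch, not the crawler's actual logic; the function name `parse_hive_partitions` and the sample key are hypothetical.

```python
from typing import Dict


def parse_hive_partitions(s3_key: str) -> Dict[str, str]:
    """Return the name=value partition segments found in an S3 object key."""
    partitions: Dict[str, str] = {}
    # Every path segment except the final file name may be a partition folder.
    for segment in s3_key.split("/")[:-1]:
        name, sep, value = segment.partition("=")
        if sep and name and value:
            partitions[name] = value
    return partitions


# Hypothetical object key, used only to illustrate the folder convention.
key = "parquet/product_category=Books/part-00000.snappy.parquet"
print(parse_hive_partitions(key))  # {'product_category': 'Books'}
```

A crawler that detects this layout can register `product_category` as a partition column, so queries that filter on it only need to read the matching folders.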
What is AWS Glue ETL?
1. Customize the mappings
2. AWS Glue generates transformation graph and Python or Scala code
3. Connect your notebook to development endpoints to customize your code
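As a rough illustration of what "customizing the mappings" means, AWS Glue scripts describe a mapping as a list of (source column, source type, target column, target type) tuples. The plain-Python sketch below applies such tuples to a single record. It is not Glue's implementation: `apply_mappings`, the `CASTS` table, and the sample column names are assumptions chosen for this example.

```python
from typing import Any, Callable, Dict, List, Tuple

# (source column, source type, target column, target type)
Mapping = Tuple[str, str, str, str]

# Hypothetical cast table covering only the types used in this example.
CASTS: Dict[str, Callable[[Any], Any]] = {"string": str, "int": int, "double": float}


def apply_mappings(record: Dict[str, Any], mappings: List[Mapping]) -> Dict[str, Any]:
    """Rename and cast the mapped fields; fields without a mapping are dropped."""
    out: Dict[str, Any] = {}
    for source, _source_type, target, target_type in mappings:
        if source in record:
            out[target] = CASTS[target_type](record[source])
    return out


mappings = [
    ("review_id", "string", "id", "string"),
    ("star_rating", "string", "rating", "int"),
]
print(apply_mappings({"review_id": "R1", "star_rating": "5", "extra": 1}, mappings))
# {'id': 'R1', 'rating': 5}
```

In a real job the same tuples would be passed to the generated Glue script, which runs the transformation on Spark rather than record by record in Python.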
AWS Glue utilizes the power of Apache Spark
Spark core: RDDs
SparkSQL
DataFrames Dynamic Frames
AWS Glue ETL
Workshop use case
• Ingest Amazon product reviews
• Perform NLP using Amazon Comprehend & AWS Glue
• Sentiment analysis
• Entity recognition
• Key phrase extraction
• Language detection
• Enrich product reviews dataset
• Query and visualize using Amazon Athena & Amazon QuickSight
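The enrichment step could look roughly like the sketch below, which adds a sentiment label to each review using the Amazon Comprehend batch API. It is an assumption-laden sketch rather than the workshop script: `add_sentiment`, `chunks`, and the `review_body` field name are hypothetical, `client` is expected to be a Comprehend client such as `boto3.client("comprehend")`, and the batch size of 25 reflects the documented per-call limit as far as I know. The service also limits the size of each document, which this sketch does not handle.

```python
from typing import Any, Dict, Iterator, List

BATCH_SIZE = 25  # assumed per-call document limit for the batch operations


def chunks(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def add_sentiment(
    client: Any,
    reviews: List[Dict[str, Any]],
    text_field: str = "review_body",  # hypothetical column name
    language_code: str = "en",
) -> List[Dict[str, Any]]:
    """Return copies of the reviews with a 'sentiment' key added."""
    enriched: List[Dict[str, Any]] = []
    for batch in chunks(reviews, BATCH_SIZE):
        response = client.batch_detect_sentiment(
            TextList=[review[text_field] for review in batch],
            LanguageCode=language_code,
        )
        # Results carry the position of the document within the batch.
        by_index = {item["Index"]: item["Sentiment"] for item in response["ResultList"]}
        for position, review in enumerate(batch):
            # Documents reported in ErrorList get no label (None).
            enriched.append({**review, "sentiment": by_index.get(position)})
    return enriched
```

Inside a Glue job the same idea would typically run per Spark partition, so that each executor sends its own batches to the service.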
Workshop architecture
[Architecture diagram: Amazon Reviews Dataset, AWS Glue Crawler, Data Catalog, and a second AWS Glue Crawler, connected by steps numbered 1 to 7]
Getting started
1. Configure AWS Glue crawler: s3://amazon-reviews-pds/parquet/
2. Query crawled table in Amazon Athena
3. Create AWS Glue ETL job from the console. Select “A new script to be
authored by you.”
4. Once the job is created, copy and paste the PySpark script from
https://github.com/rhasson/reinvent2018_aim416
5. Follow instructions in the script to complete each section
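Steps 1 and 2 above can also be scripted instead of done in the console. The sketch below only builds the request parameters. Every name in it is an assumption made for illustration: the crawler name, IAM role ARN, database name, results bucket, and the `parquet` table name (crawlers usually name a table after the crawled folder, but that should be checked in the Data Catalog), as well as the `product_category` column used in the query.

```python
from typing import Any, Dict


def crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> Dict[str, Any]:
    """Build keyword arguments for the AWS Glue create_crawler operation."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build keyword arguments for the Amazon Athena start_query_execution operation."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# All values below except the dataset path are hypothetical placeholders.
crawler = crawler_request(
    name="reviews-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="reviews_db",
    s3_path="s3://amazon-reviews-pds/parquet/",
)
query = athena_request(
    sql=(
        "SELECT product_category, COUNT(*) AS reviews "
        "FROM parquet GROUP BY product_category ORDER BY reviews DESC LIMIT 10"
    ),
    database="reviews_db",
    output_location="s3://example-results-bucket/athena/",
)
```

With boto3 and credentials in place, these dictionaries could be passed as `boto3.client("glue").create_crawler(**crawler)`, followed by `start_crawler(Name=crawler["Name"])`, and `boto3.client("athena").start_query_execution(**query)`.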
Bonus–Predict future sentiment
1. Use Amazon Comprehend to add sentiment labels to review dataset
2. Select relevant features (columns)
3. Select possible computed features
4. Split data into train & test
5. Save train & test datasets to Amazon S3 in CSV format
6. Experiment and build your model using Amazon SageMaker
7. Profit!
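Steps 4 and 5 above could be sketched with the Python standard library as shown below. This is one possible approach, not the workshop's: the helper names, the 80/20 split, and the fixed seed are assumptions, and the upload to Amazon S3 is left out of the code.

```python
import csv
import random
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def train_test_split(
    rows: Sequence[Row], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[Row], List[Row]]:
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def write_csv(path: str, rows: Sequence[Row], columns: Sequence[str]) -> None:
    """Write only the selected feature columns to a CSV file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(columns), extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

The resulting files could then be copied to Amazon S3, for example with `boto3.client("s3").upload_file(Filename, Bucket, Key)`. Some Amazon SageMaker built-in algorithms expect CSV input without a header row and with the label in the first column, so the layout should be checked against the chosen algorithm's documentation.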
Thank you!
Jean-Pierre Dodel
Roy Hasson

More Related Content

What's hot

Financial Svcs: Mine Actionable Insights from Customer Interactions Using Mac...
Financial Svcs: Mine Actionable Insights from Customer Interactions Using Mac...Financial Svcs: Mine Actionable Insights from Customer Interactions Using Mac...
Financial Svcs: Mine Actionable Insights from Customer Interactions Using Mac...Amazon Web Services
 
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018Amazon Web Services
 
Serverless Architectural Patterns and Best Practices (ARC305-R2) - AWS re:Inv...
Serverless Architectural Patterns and Best Practices (ARC305-R2) - AWS re:Inv...Serverless Architectural Patterns and Best Practices (ARC305-R2) - AWS re:Inv...
Serverless Architectural Patterns and Best Practices (ARC305-R2) - AWS re:Inv...Amazon Web Services
 
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...Amazon Web Services
 
Deep Dive on Amazon Rekognition, ft. Tinder & News UK (AIM307-R) - AWS re:Inv...
Deep Dive on Amazon Rekognition, ft. Tinder & News UK (AIM307-R) - AWS re:Inv...Deep Dive on Amazon Rekognition, ft. Tinder & News UK (AIM307-R) - AWS re:Inv...
Deep Dive on Amazon Rekognition, ft. Tinder & News UK (AIM307-R) - AWS re:Inv...Amazon Web Services
 
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...Amazon Web Services
 
Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018
Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018
Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018Amazon Web Services
 
[REPEAT] Better Analytics Through Natural Language Processing (AIM405-R) - AW...
[REPEAT] Better Analytics Through Natural Language Processing (AIM405-R) - AW...[REPEAT] Better Analytics Through Natural Language Processing (AIM405-R) - AW...
[REPEAT] Better Analytics Through Natural Language Processing (AIM405-R) - AW...Amazon Web Services
 
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...Amazon Web Services
 
Unsupervised Learning with Amazon SageMaker (AIM333) - AWS re:Invent 2018
Unsupervised Learning with Amazon SageMaker (AIM333) - AWS re:Invent 2018Unsupervised Learning with Amazon SageMaker (AIM333) - AWS re:Invent 2018
Unsupervised Learning with Amazon SageMaker (AIM333) - AWS re:Invent 2018Amazon Web Services
 
From Monolith to Microservices (And All the Bumps along the Way) (CON360-R1) ...
From Monolith to Microservices (And All the Bumps along the Way) (CON360-R1) ...From Monolith to Microservices (And All the Bumps along the Way) (CON360-R1) ...
From Monolith to Microservices (And All the Bumps along the Way) (CON360-R1) ...Amazon Web Services
 
How Do I Know I Need an Amazon Neptune Graph Database? (DAT316) - AWS re:Inve...
How Do I Know I Need an Amazon Neptune Graph Database? (DAT316) - AWS re:Inve...How Do I Know I Need an Amazon Neptune Graph Database? (DAT316) - AWS re:Inve...
How Do I Know I Need an Amazon Neptune Graph Database? (DAT316) - AWS re:Inve...Amazon Web Services
 
On-Ramp to Graph Databases and Amazon Neptune (DAT335) - AWS re:Invent 2018
On-Ramp to Graph Databases and Amazon Neptune (DAT335) - AWS re:Invent 2018On-Ramp to Graph Databases and Amazon Neptune (DAT335) - AWS re:Invent 2018
On-Ramp to Graph Databases and Amazon Neptune (DAT335) - AWS re:Invent 2018Amazon Web Services
 
Searching Your Data with Amazon Elasticsearch Service (ANT384) - AWS re:Inven...
Searching Your Data with Amazon Elasticsearch Service (ANT384) - AWS re:Inven...Searching Your Data with Amazon Elasticsearch Service (ANT384) - AWS re:Inven...
Searching Your Data with Amazon Elasticsearch Service (ANT384) - AWS re:Inven...Amazon Web Services
 
How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...
How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...
How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...Amazon Web Services
 
Advancing Autonomous Vehicle Development Using Distributed Deep Learning (CMP...
Advancing Autonomous Vehicle Development Using Distributed Deep Learning (CMP...Advancing Autonomous Vehicle Development Using Distributed Deep Learning (CMP...
Advancing Autonomous Vehicle Development Using Distributed Deep Learning (CMP...Amazon Web Services
 
Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...
Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...
Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...Amazon Web Services
 
Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...
Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...
Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...Amazon Web Services
 
Build an Intelligent Multi-Modal User Agent with Voice and NLU (AIM340) - AWS...
Build an Intelligent Multi-Modal User Agent with Voice and NLU (AIM340) - AWS...Build an Intelligent Multi-Modal User Agent with Voice and NLU (AIM340) - AWS...
Build an Intelligent Multi-Modal User Agent with Voice and NLU (AIM340) - AWS...Amazon Web Services
 

What's hot (20)

Financial Svcs: Mine Actionable Insights from Customer Interactions Using Mac...
Financial Svcs: Mine Actionable Insights from Customer Interactions Using Mac...Financial Svcs: Mine Actionable Insights from Customer Interactions Using Mac...
Financial Svcs: Mine Actionable Insights from Customer Interactions Using Mac...
 
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
 
Serverless Architectural Patterns and Best Practices (ARC305-R2) - AWS re:Inv...
Serverless Architectural Patterns and Best Practices (ARC305-R2) - AWS re:Inv...Serverless Architectural Patterns and Best Practices (ARC305-R2) - AWS re:Inv...
Serverless Architectural Patterns and Best Practices (ARC305-R2) - AWS re:Inv...
 
AWS reInvent 2018 recap edition
AWS reInvent 2018 recap editionAWS reInvent 2018 recap edition
AWS reInvent 2018 recap edition
 
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...
 
Deep Dive on Amazon Rekognition, ft. Tinder & News UK (AIM307-R) - AWS re:Inv...
Deep Dive on Amazon Rekognition, ft. Tinder & News UK (AIM307-R) - AWS re:Inv...Deep Dive on Amazon Rekognition, ft. Tinder & News UK (AIM307-R) - AWS re:Inv...
Deep Dive on Amazon Rekognition, ft. Tinder & News UK (AIM307-R) - AWS re:Inv...
 
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
 
Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018
Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018
Improve Accessibility Using Machine Learning (AIM332) - AWS re:Invent 2018
 
[REPEAT] Better Analytics Through Natural Language Processing (AIM405-R) - AW...
[REPEAT] Better Analytics Through Natural Language Processing (AIM405-R) - AW...[REPEAT] Better Analytics Through Natural Language Processing (AIM405-R) - AW...
[REPEAT] Better Analytics Through Natural Language Processing (AIM405-R) - AW...
 
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
 
Unsupervised Learning with Amazon SageMaker (AIM333) - AWS re:Invent 2018
Unsupervised Learning with Amazon SageMaker (AIM333) - AWS re:Invent 2018Unsupervised Learning with Amazon SageMaker (AIM333) - AWS re:Invent 2018
Unsupervised Learning with Amazon SageMaker (AIM333) - AWS re:Invent 2018
 
From Monolith to Microservices (And All the Bumps along the Way) (CON360-R1) ...
From Monolith to Microservices (And All the Bumps along the Way) (CON360-R1) ...From Monolith to Microservices (And All the Bumps along the Way) (CON360-R1) ...
From Monolith to Microservices (And All the Bumps along the Way) (CON360-R1) ...
 
How Do I Know I Need an Amazon Neptune Graph Database? (DAT316) - AWS re:Inve...
How Do I Know I Need an Amazon Neptune Graph Database? (DAT316) - AWS re:Inve...How Do I Know I Need an Amazon Neptune Graph Database? (DAT316) - AWS re:Inve...
How Do I Know I Need an Amazon Neptune Graph Database? (DAT316) - AWS re:Inve...
 
On-Ramp to Graph Databases and Amazon Neptune (DAT335) - AWS re:Invent 2018
On-Ramp to Graph Databases and Amazon Neptune (DAT335) - AWS re:Invent 2018On-Ramp to Graph Databases and Amazon Neptune (DAT335) - AWS re:Invent 2018
On-Ramp to Graph Databases and Amazon Neptune (DAT335) - AWS re:Invent 2018
 
Searching Your Data with Amazon Elasticsearch Service (ANT384) - AWS re:Inven...
Searching Your Data with Amazon Elasticsearch Service (ANT384) - AWS re:Inven...Searching Your Data with Amazon Elasticsearch Service (ANT384) - AWS re:Inven...
Searching Your Data with Amazon Elasticsearch Service (ANT384) - AWS re:Inven...
 
How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...
How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...
How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...
 
Advancing Autonomous Vehicle Development Using Distributed Deep Learning (CMP...
Advancing Autonomous Vehicle Development Using Distributed Deep Learning (CMP...Advancing Autonomous Vehicle Development Using Distributed Deep Learning (CMP...
Advancing Autonomous Vehicle Development Using Distributed Deep Learning (CMP...
 
Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...
Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...
Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...
 
Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...
Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...
Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...
 
Build an Intelligent Multi-Modal User Agent with Voice and NLU (AIM340) - AWS...
Build an Intelligent Multi-Modal User Agent with Voice and NLU (AIM340) - AWS...Build an Intelligent Multi-Modal User Agent with Voice and NLU (AIM340) - AWS...
Build an Intelligent Multi-Modal User Agent with Voice and NLU (AIM340) - AWS...
 

Similar to Build an ETL Pipeline to Analyze Customer Data (AIM416) - AWS re:Invent 2018

Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudAmazon Web Services
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueAmazon Web Services
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAmazon Web Services
 
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018Amazon Web Services
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueAmazon Web Services
 
Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
 Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
Big Data Meets AI - Driving Insights and Adding Intelligence to Your SolutionsAmazon Web Services
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017Amazon Web Services
 
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Amazon Web Services
 
Best Practices for Distributed Machine Learning and Predictive Analytics Usin...
Best Practices for Distributed Machine Learning and Predictive Analytics Usin...Best Practices for Distributed Machine Learning and Predictive Analytics Usin...
Best Practices for Distributed Machine Learning and Predictive Analytics Usin...Amazon Web Services
 
Big Data and Alexa_Voice-Enabled Analytics
Big Data and Alexa_Voice-Enabled Analytics Big Data and Alexa_Voice-Enabled Analytics
Big Data and Alexa_Voice-Enabled Analytics Amazon Web Services
 
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018Amazon Web Services
 
It's all about the data - Tel Aviv Summit 2018
It's all about the data - Tel Aviv Summit 2018It's all about the data - Tel Aviv Summit 2018
It's all about the data - Tel Aviv Summit 2018Amazon Web Services
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...AWS Riyadh User Group
 
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAutomate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAmazon Web Services
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopAWS Germany
 
Market Prediction Using ML: Experiment with Amazon SageMaker and the Deutsche...
Market Prediction Using ML: Experiment with Amazon SageMaker and the Deutsche...Market Prediction Using ML: Experiment with Amazon SageMaker and the Deutsche...
Market Prediction Using ML: Experiment with Amazon SageMaker and the Deutsche...Amazon Web Services
 
Big Data - EBC on the road Brazil Edition [Portuguese]
Big Data - EBC on the road Brazil Edition [Portuguese]Big Data - EBC on the road Brazil Edition [Portuguese]
Big Data - EBC on the road Brazil Edition [Portuguese]Amazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 

Similar to Build an ETL Pipeline to Analyze Customer Data (AIM416) - AWS re:Invent 2018 (20)

Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the Cloud
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
 
Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
 Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
 
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
 
Best Practices for Distributed Machine Learning and Predictive Analytics Usin...
Best Practices for Distributed Machine Learning and Predictive Analytics Usin...Best Practices for Distributed Machine Learning and Predictive Analytics Usin...
Best Practices for Distributed Machine Learning and Predictive Analytics Usin...
 
Big Data and Alexa_Voice-Enabled Analytics
Big Data and Alexa_Voice-Enabled Analytics Big Data and Alexa_Voice-Enabled Analytics
Big Data and Alexa_Voice-Enabled Analytics
 
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
 
It's all about the data - Tel Aviv Summit 2018
It's all about the data - Tel Aviv Summit 2018It's all about the data - Tel Aviv Summit 2018
It's all about the data - Tel Aviv Summit 2018
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
 
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAutomate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
 
Market Prediction Using ML: Experiment with Amazon SageMaker and the Deutsche...
Market Prediction Using ML: Experiment with Amazon SageMaker and the Deutsche...Market Prediction Using ML: Experiment with Amazon SageMaker and the Deutsche...
Market Prediction Using ML: Experiment with Amazon SageMaker and the Deutsche...
 
Big Data - EBC on the road Brazil Edition [Portuguese]
Big Data - EBC on the road Brazil Edition [Portuguese]Big Data - EBC on the road Brazil Edition [Portuguese]
Big Data - EBC on the road Brazil Edition [Portuguese]
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 

Build an ETL Pipeline to Analyze Customer Data (AIM416) - AWS re:Invent 2018

  • 2. Build An ETL Pipeline To Analyze Customer Data (AIM416). Presenters: Jean-Pierre Dodel, Product Manager – Amazon Comprehend, AWS; Roy Hasson, Manager, Global Business Development – Analytics, AWS. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 3. Agenda: Introduction to Amazon Comprehend; Introduction to AWS Glue; Workshop use case and architecture; Getting started
  • 5. Amazon Comprehend: Discover insights and relationships in text. [Diagram: a library of news articles is analyzed for entities, key phrases, language, sentiment, and syntax; example labels: WORLD SERIES, WASHINGTON, STORM, STOCK MARKET]
  • 6. Amazon Comprehend: Extracting insight from text
      Sample text: "Amazon.com, Inc. is located in Seattle, WA and was founded July 5, 1994 by Jeff Bezos. Our customers love buying everything from books to blenders at great prices."
      Named entities:
        - Amazon.com: Organization
        - Seattle, WA: Location
        - July 5, 1994: Date
        - Jeff Bezos: Person
      Key phrases:
        - our customers
        - books
        - blenders
        - great prices
      Sentiment: Positive
      Language: English
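The four result groups on slide 6 correspond to four Amazon Comprehend API operations. The following is a minimal sketch, assuming a boto3 Comprehend client is passed in (for example `boto3.client("comprehend")`). The helper name `analyze_text` and the shape of its return value are illustrative and do not come from the workshop material.

```python
def analyze_text(comprehend, text):
    """Run the four analyses from slide 6 on one piece of text.

    `comprehend` is assumed to be a boto3 Comprehend client, e.g.
    boto3.client("comprehend"); any object with the same methods works.
    """
    # Language detection takes no language hint, so run it first and
    # pass the most likely language code to the other operations.
    languages = comprehend.detect_dominant_language(Text=text)["Languages"]
    language = max(languages, key=lambda lang: lang["Score"])["LanguageCode"]

    entities = comprehend.detect_entities(Text=text, LanguageCode=language)
    phrases = comprehend.detect_key_phrases(Text=text, LanguageCode=language)
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode=language)

    # Illustrative summary shape (an assumption, not a Comprehend format).
    return {
        "language": language,
        "entities": [(e["Text"], e["Type"]) for e in entities["Entities"]],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
        "sentiment": sentiment["Sentiment"],
    }
```

Passing the client as a parameter keeps the helper easy to exercise with a stand-in object before any AWS credentials are configured.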
  • 7. Amazon Comprehend: Use cases
      - Voice of customer analytics: Analyzing what customers are saying about your brand, products, and services
      - Semantic search: Making search smarter by searching on key phrase, sentiment, and topic
      - Knowledge management/discovery: Organizing documents, categorizing by topic, and personalizing experiences
  • 9. AWS Glue enables you to …
      - Discover: Automatically discover and categorize your data, making it immediately searchable and queryable across data sources
      - Develop: Generate code to clean, enrich, and reliably move data between various data sources. Easily customize this code or bring your own.
      - Deploy: Run your jobs on a serverless, fully managed, scale-out environment. No compute resources to provision or manage.
  • 10. What is AWS Glue Data Catalog? Unified metadata repository across relational databases, Amazon Relational Database Service (Amazon RDS), Amazon Redshift, and Amazon Simple Storage Service (Amazon S3) … with support for more coming soon!
      - Single searchable view into your data, no matter where it is stored
      - Ability to automatically crawl and classify your data
      - Augment technical metadata with business metadata for tables
      - Track data evolution using schema versioning
      - Apache Hive metastore compatible and integrated with AWS analytics services
  • 11. What is a crawler? Crawlers automatically build your AWS Glue Data Catalog and keep it in sync
      - Scan your data stored in various data stores, extract metadata and data statistics, and add table definitions to your Data Catalog
          • Classify data using built-in and custom classifiers
          • You can write your own using Grok expressions
      - Discover new data and extract schema definitions
          • Detect schema changes and version tables
          • Detect Apache Hive style partitions on Amazon S3
      - Run ad hoc or on a schedule; serverless, so you pay only when the crawler runs
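Slide 11 mentions Apache Hive style partitions, where folder names in an object key take the form `name=value`. The sketch below illustrates that naming convention only; it is not how the crawler is implemented, and the function name and the example keys are hypothetical.

```python
def parse_hive_partitions(s3_key):
    """Return the name=value partition segments found in an object key."""
    partitions = {}
    # The last path segment is the file name, so it is skipped.
    for segment in s3_key.split("/")[:-1]:
        name, separator, value = segment.partition("=")
        # Only folders of the form name=value count as partitions.
        if separator and name and value:
            partitions[name] = value
    return partitions


# Hypothetical object key, used only to illustrate the layout.
print(parse_hive_partitions("reviews/year=2018/month=11/part-00000.parquet"))
```

A crawler that detects this layout can expose `year` and `month` as partition columns of the table, which lets query engines skip folders that a filter rules out.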
  • 12. What is AWS Glue ETL?
      1. Customize the mappings
      2. AWS Glue generates transformation graph and Python or Scala code
      3. Connect your notebook to development endpoints to customize your code
  • 13. AWS Glue utilizes the power of Apache Spark. [Diagram labels: Spark core: RDDs; SparkSQL; DataFrames; Dynamic Frames; AWS Glue ETL]
  • 15. Workshop use case
      • Ingest Amazon product reviews
      • Perform NLP using Amazon Comprehend & AWS Glue
          • Sentiment analysis
          • Entity recognition
          • Key phrase extraction
          • Language detection
      • Enrich product reviews dataset
      • Query and visualize using Amazon Athena & Amazon QuickSight
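The enrichment step in slide 15 can be sketched with the Comprehend batch operation `batch_detect_sentiment`, which accepts up to 25 documents per call. This is a minimal sketch under stated assumptions, not the workshop's PySpark script: `comprehend` is assumed to be a boto3 Comprehend client, reviews are assumed to be plain dictionaries, and the field name `review_body` and the helper `add_sentiment` are illustrative.

```python
def add_sentiment(comprehend, reviews, text_field="review_body", batch_size=25):
    """Return copies of `reviews` with a `sentiment` field added.

    `comprehend` is assumed to be a boto3 Comprehend client;
    `text_field` is an assumed column name for the review text.
    """
    enriched = []
    for start in range(0, len(reviews), batch_size):
        batch = reviews[start:start + batch_size]
        response = comprehend.batch_detect_sentiment(
            TextList=[review[text_field] for review in batch],
            LanguageCode="en",  # assumes English reviews
        )
        # Each result carries the index of its document within the batch.
        by_index = {r["Index"]: r["Sentiment"] for r in response["ResultList"]}
        for i, review in enumerate(batch):
            # Failed documents are reported in ErrorList instead of
            # ResultList, so they end up with sentiment None here.
            enriched.append({**review, "sentiment": by_index.get(i)})
    return enriched
```

Batching reduces the number of API calls compared with one `detect_sentiment` call per review. Inside a Spark job the same idea would typically be applied per partition of the dataset.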
  • 16. Workshop architecture. [Diagram: Amazon Reviews Dataset, AWS Glue Crawler, Data Catalog, and a second AWS Glue Crawler, with steps numbered 1 to 7]
  • 17. Getting started
      1. Configure AWS Glue crawler: s3://amazon-reviews-pds/parquet/
      2. Query crawled table in Amazon Athena
      3. Create AWS Glue ETL job from the console. Select “A new script to be authored by you.”
      4. Once job is created, copy/paste PySpark script from https://github.com/rhasson/reinvent2018_aim416
      5. Follow instructions in the script to complete each section
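Step 1 of slide 17 can also be scripted instead of configured by hand. The sketch below assumes a boto3 Glue client (for example `boto3.client("glue")`) and uses its `create_crawler` and `start_crawler` operations. The crawler name, IAM role ARN, and database name are placeholders to replace with your own values; only the S3 path comes from the slide.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the Glue CreateCrawler operation."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue, request):
    """Create the crawler and run it once.

    `glue` is assumed to be a boto3 Glue client.
    """
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Placeholder values; only the S3 path is taken from the slide.
request = crawler_request(
    name="reviews-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="reviews_db",
    s3_path="s3://amazon-reviews-pds/parquet/",
)
```

Once the crawler has finished, the table it created appears in the Data Catalog and can be queried from Amazon Athena as in step 2.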
  • 18. Bonus: Predict future sentiment
      1. Use Amazon Comprehend to add sentiment labels to review dataset
      2. Select relevant features (columns)
      3. Select possible computed features
      4. Split data into train & test
      5. Save train & test datasets to Amazon S3 in CSV format
      6. Experiment and build your model using Amazon SageMaker
      7. Profit!
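Steps 2, 4, and 5 of the bonus exercise can be sketched with the Python standard library alone. This is an illustration under assumptions rather than the workshop's method: rows are assumed to be dictionaries, the column names `star_rating`, `helpful_votes`, and `sentiment` are hypothetical, and the 80/20 split with a fixed seed is an arbitrary choice.

```python
import csv
import io
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def to_csv(rows, columns):
    """Serialize the selected columns of each row as CSV text."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops columns that were not selected.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical labeled reviews; the column names are assumptions.
rows = [
    {"star_rating": n % 5 + 1, "helpful_votes": n, "sentiment": "POSITIVE"}
    for n in range(10)
]
train, test = split_train_test(rows)
train_csv = to_csv(train, ["star_rating", "sentiment"])
# The CSV text could then be uploaded with an S3 client, for example
# s3.put_object(Bucket=..., Key=..., Body=train_csv) in boto3.
```

Fixing the seed keeps the split reproducible between runs, which makes it easier to compare models trained in Amazon SageMaker on the same data.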
  • 20. Thank you! Jean-Pierre Dodel, Roy Hasson