SlideShare a Scribd company logo
1 of 22
Download to read offline
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build Machine Learning Solutions
on Data Lakes
Vikrant Kahlir
Solutions Architect
AWS
A R C 3 2 1
Joseph Glover
Solutions Architect
AWS
Gitansh Chadha
Solutions Architect
AWS
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Overview
Lab 1: Load data in Amazon DynamoDB
Lab 2: Amazon Kinesis
Lab 3: AWS Glue
Lab 4: Amazon SageMaker
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build machine learning solutions on data lakes
Real-time ratings
Processed
data
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Link to workshop
https://bit.ly/2DOW6FB
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build machine learning solutions on data lakes
Real-time ratings
Processed
data
Lab 1
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS Database Migration Service (AWS DMS)
How it works
Load table by table
Creates tables at target database
Sets up metadata required at target
Populates data from source
Each process loads one entire table
Can use multiple processes
Can be paused
When restarted will continue from where
it was stopped
Will reload any tables that were currently
in progress
Replication
instance
Source Target
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon DynamoDB
Table creation options
PartitionKey, Type:
SortKey, Type:
Provisioned Reads:
Provisioned Writes:
LSI Schema GSI Schema
AttributeName [S,N,B]
AttributeName [S,N,B]
1+
1+
Provisioned Reads: 1+
Provisioned Writes: 1+
TableName
Optional
Required
CreateTable
String,
Number,
Binary ONLY
Per Second
Unique to
Account and
Region
Highly available
Consistent, single-digit
millisecond latency
at any scale
Fully managed
Integrates with AWS Lambda,
Amazon Redshift, and more
Secure
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build machine learning solutions on data lakes
Real-time ratings
Processed
data
Lab 2
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Streaming with Amazon Kinesis
Easily collect, process, and analyze video and data streams in real time
Capture, process, and
store video streams
Capture, process,
and store data
streams
Load data streams
into data stores
Analyze data streams
with SQL
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build machine learning solutions on data lakes
Real-time ratings
Processed
data
Lab 3
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS Glue
AWS Glue Data Catalog
Discover and organize your data in various databases, data
warehouses, and data lakes
Job authoring
Focus on the writing transformations
Generate code through a wizard, or write your own
Job execution
Runs jobs in Spark containers—automatic scaling
Fully managed hosting with auto scaling
AWS Glue is serverless—only pay for the resources you consume
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build machine learning solutions on data lakes
Real-time ratings
Processed
data
Lab 4
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Overview: Lab 4
Data exploration and preparation
One hot encoding
Binary recommender
Training and hosting
Factorization machines
Hyperparameters
Movie recommender
Nearest neighbor
Inference call
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon SageMaker
Build
Pre-built Jupyter Notebooks
Built-in, high-performance algorithms
Train & tune
One-click training
Hyperparameter optimization
Deploy
One-click deployment
Fully managed hosting with auto scaling
Thank you!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Vikrant Kahlir
awsvik@amazon.com
Joseph Glover
joglover@amazon.com
Gitansh Chadha
gitanshc@amazon.com
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

More Related Content

What's hot

Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Amazon Web Services
 
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Amazon Web Services
 
Amazon EMR: Optimize Transient Clusters for Data Processing & ETL (ANT341) - ...
Amazon EMR: Optimize Transient Clusters for Data Processing & ETL (ANT341) - ...Amazon EMR: Optimize Transient Clusters for Data Processing & ETL (ANT341) - ...
Amazon EMR: Optimize Transient Clusters for Data Processing & ETL (ANT341) - ...Amazon Web Services
 
Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018
Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018
Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018Amazon Web Services
 
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018Amazon Web Services
 
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...Amazon Web Services
 
Workshop: Architecting a Serverless Data Lake
Workshop: Architecting a Serverless Data LakeWorkshop: Architecting a Serverless Data Lake
Workshop: Architecting a Serverless Data LakeAmazon Web Services
 
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...Amazon Web Services
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...Amazon Web Services
 
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Amazon Web Services
 
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018Amazon Web Services
 
Machine Learning & Analytics on Snowball Edge (WPS314) - AWS re:Invent 2018
Machine Learning & Analytics on Snowball Edge (WPS314) - AWS re:Invent 2018Machine Learning & Analytics on Snowball Edge (WPS314) - AWS re:Invent 2018
Machine Learning & Analytics on Snowball Edge (WPS314) - AWS re:Invent 2018Amazon Web Services
 
Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...
Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...
Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...Amazon Web Services
 
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Amazon Web Services
 
Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...
Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...
Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...Amazon Web Services
 
Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018
Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018
Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018Amazon Web Services
 
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...Amazon Web Services
 
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...Amazon Web Services
 
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018Amazon Web Services
 
Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...
Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...
Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...Amazon Web Services
 

What's hot (20)

Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
 
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
 
Amazon EMR: Optimize Transient Clusters for Data Processing & ETL (ANT341) - ...
Amazon EMR: Optimize Transient Clusters for Data Processing & ETL (ANT341) - ...Amazon EMR: Optimize Transient Clusters for Data Processing & ETL (ANT341) - ...
Amazon EMR: Optimize Transient Clusters for Data Processing & ETL (ANT341) - ...
 
Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018
Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018
Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018
 
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
 
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
 
Workshop: Architecting a Serverless Data Lake
Workshop: Architecting a Serverless Data LakeWorkshop: Architecting a Serverless Data Lake
Workshop: Architecting a Serverless Data Lake
 
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT331) - AWS re:Inv...
 
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
 
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
 
Machine Learning & Analytics on Snowball Edge (WPS314) - AWS re:Invent 2018
Machine Learning & Analytics on Snowball Edge (WPS314) - AWS re:Invent 2018Machine Learning & Analytics on Snowball Edge (WPS314) - AWS re:Invent 2018
Machine Learning & Analytics on Snowball Edge (WPS314) - AWS re:Invent 2018
 
Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...
Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...
Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...
 
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
 
Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...
Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...
Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...
 
Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018
Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018
Managing Modern Infrastructure in Enterprises (ENT227-R1) - AWS re:Invent 2018
 
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
 
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
 
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
 
Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...
Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...
Hands-On Building and Deploying .NET Applications on AWS (DEV331-R1) - AWS re...
 

Similar to Build Machine Learning Solutions on Data Lakes (ARC321) - AWS re:Invent 2018

Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopAWS Germany
 
WildRydes Serverless Data Processing Workshop
WildRydes Serverless Data Processing WorkshopWildRydes Serverless Data Processing Workshop
WildRydes Serverless Data Processing WorkshopAmazon Web Services
 
BDA309 Build Your First Big Data Application on AWS
BDA309 Build Your First Big Data Application on AWSBDA309 Build Your First Big Data Application on AWS
BDA309 Build Your First Big Data Application on AWSAmazon Web Services
 
AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)
AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)
AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)Adir Sharabi
 
It's all about the data - Tel Aviv Summit 2018
It's all about the data - Tel Aviv Summit 2018It's all about the data - Tel Aviv Summit 2018
It's all about the data - Tel Aviv Summit 2018Amazon Web Services
 
Intelligence of Things: IoT, AWS DeepLens and Amazon SageMaker - AWS Summit S...
Intelligence of Things: IoT, AWS DeepLens and Amazon SageMaker - AWS Summit S...Intelligence of Things: IoT, AWS DeepLens and Amazon SageMaker - AWS Summit S...
Intelligence of Things: IoT, AWS DeepLens and Amazon SageMaker - AWS Summit S...Amazon Web Services
 
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Amazon Web Services
 
Building Serverless Applications with Amazon DynamoDB & AWS Lambda - Workshop...
Building Serverless Applications with Amazon DynamoDB & AWS Lambda - Workshop...Building Serverless Applications with Amazon DynamoDB & AWS Lambda - Workshop...
Building Serverless Applications with Amazon DynamoDB & AWS Lambda - Workshop...Amazon Web Services
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudAmazon Web Services
 
How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...
How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...
How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...Amazon Web Services
 
Scaling from zero to millions of users
Scaling from zero to millions of usersScaling from zero to millions of users
Scaling from zero to millions of usersAmazon Web Services
 
CI/CD for Your Machine Learning Pipeline with Amazon SageMaker (DVC303) - AWS...
CI/CD for Your Machine Learning Pipeline with Amazon SageMaker (DVC303) - AWS...CI/CD for Your Machine Learning Pipeline with Amazon SageMaker (DVC303) - AWS...
CI/CD for Your Machine Learning Pipeline with Amazon SageMaker (DVC303) - AWS...Amazon Web Services
 
Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
 Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
Big Data Meets AI - Driving Insights and Adding Intelligence to Your SolutionsAmazon Web Services
 
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSightBusiness Intelligence in Minutes with Amazon Athena and Amazon QuickSight
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSightAmazon Web Services
 
Building Serverless ETL Pipelines
Building Serverless ETL PipelinesBuilding Serverless ETL Pipelines
Building Serverless ETL PipelinesAmazon Web Services
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Amazon Web Services
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAdir Sharabi
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAmazon Web Services
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...AWS Riyadh User Group
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSAmazon Web Services
 

Similar to Build Machine Learning Solutions on Data Lakes (ARC321) - AWS re:Invent 2018 (20)

Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
 
WildRydes Serverless Data Processing Workshop
WildRydes Serverless Data Processing WorkshopWildRydes Serverless Data Processing Workshop
WildRydes Serverless Data Processing Workshop
 
BDA309 Build Your First Big Data Application on AWS
BDA309 Build Your First Big Data Application on AWSBDA309 Build Your First Big Data Application on AWS
BDA309 Build Your First Big Data Application on AWS
 
AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)
AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)
AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)
 
It's all about the data - Tel Aviv Summit 2018
It's all about the data - Tel Aviv Summit 2018It's all about the data - Tel Aviv Summit 2018
It's all about the data - Tel Aviv Summit 2018
 
Intelligence of Things: IoT, AWS DeepLens and Amazon SageMaker - AWS Summit S...
Intelligence of Things: IoT, AWS DeepLens and Amazon SageMaker - AWS Summit S...Intelligence of Things: IoT, AWS DeepLens and Amazon SageMaker - AWS Summit S...
Intelligence of Things: IoT, AWS DeepLens and Amazon SageMaker - AWS Summit S...
 
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
 
Building Serverless Applications with Amazon DynamoDB & AWS Lambda - Workshop...
Building Serverless Applications with Amazon DynamoDB & AWS Lambda - Workshop...Building Serverless Applications with Amazon DynamoDB & AWS Lambda - Workshop...
Building Serverless Applications with Amazon DynamoDB & AWS Lambda - Workshop...
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the Cloud
 
How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...
How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...
How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...
 
Scaling from zero to millions of users
Scaling from zero to millions of usersScaling from zero to millions of users
Scaling from zero to millions of users
 
CI/CD for Your Machine Learning Pipeline with Amazon SageMaker (DVC303) - AWS...
CI/CD for Your Machine Learning Pipeline with Amazon SageMaker (DVC303) - AWS...CI/CD for Your Machine Learning Pipeline with Amazon SageMaker (DVC303) - AWS...
CI/CD for Your Machine Learning Pipeline with Amazon SageMaker (DVC303) - AWS...
 
Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
 Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
 
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSightBusiness Intelligence in Minutes with Amazon Athena and Amazon QuickSight
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight
 
Building Serverless ETL Pipelines
Building Serverless ETL PipelinesBuilding Serverless ETL Pipelines
Building Serverless ETL Pipelines
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWS
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWS
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Build Machine Learning Solutions on Data Lakes (ARC321) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Build Machine Learning Solutions on Data Lakes Vikrant Kahlir Solutions Architect AWS A R C 3 2 1 Joseph Glover Solutions Architect AWS Gitansh Chadha Solutions Architect AWS
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Agenda Overview Lab 1: Load data in Amazon DynamoDB Lab 2: Amazon Kinesis Lab 3: AWS Glue Lab 4: Amazon SageMaker
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Build machine learning solutions on data lakes Real-time ratings Processed data
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Link to workshop https://bit.ly/2DOW6FB
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Build machine learning solutions on data lakes Real-time ratings Processed data Lab 1
  • 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS Database Migration Service (AWS DMS) How it works Load table by table Creates tables at target database Sets up metadata required at target Populates data from source Each process loads one entire table Can use multiple processes Can be paused When restarted will continue from where it was stopped Will reload any tables that were currently in progress Replication instance Source Target
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon DynamoDB Table creation options PartitionKey, Type: SortKey, Type: Provisioned Reads: Provisioned Writes: LSI Schema GSI Schema AttributeName [S,N,B] AttributeName [S,N,B] 1+ 1+ Provisioned Reads: 1+ Provisioned Writes: 1+ TableName Optional Required CreateTable String, Number, Binary ONLY Per Second Unique to Account and Region Highly available Consistent, single-digit millisecond latency at any scale Fully managed Integrates with AWS Lambda, Amazon Redshift, and more Secure
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Build machine learning solutions on data lakes Real-time ratings Processed data Lab 2
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Streaming with Amazon Kinesis Easily collect, process, and analyze video and data streams in real time Capture, process, and store video streams Capture, process, and store data streams Load data streams into data stores Analyze data streams with SQL
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Build machine learning solutions on data lakes Real-time ratings Processed data Lab 3
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS Glue AWS Glue Data Catalog Discover and organize your data in various databases, data warehouses, and data lakes Job authoring Focus on the writing transformations Generate code through a wizard, or write your own Job execution Runs jobs in Spark containers—automatic scaling Fully managed hosting with auto scaling AWS Glue is serverless—only pay for the resources you consume
  • 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Build machine learning solutions on data lakes Real-time ratings Processed data Lab 4
  • 19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Overview: Lab 4 Data exploration and preparation One hot encoding Binary recommender Training and hosting Factorization machines Hyperparameters Movie recommender Nearest neighbor Inference call
  • 20. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon SageMaker Build Pre-built Jupyter Notebooks Built-in, high-performance algorithms Train & tune One-click training Hyperparameter optimization Deploy One-click deployment Fully managed hosting with auto scaling
  • 21. Thank you! © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Vikrant Kahlir awsvik@amazon.com Joseph Glover joglover@amazon.com Gitansh Chadha gitanshc@amazon.com
  • 22. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.