© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ben Snively, Solutions Architect – Data and Analytics, AI/ML
Wednesday, May 22, 2019
Data Lifecycle: Driving Insights with Analytics and Machine Learning

Driving Insights:
Deliver decision makers the insights to transform an organization, by identifying unmet customer needs or by optimizing operational processes.
Questions to ask:
• What business question is being answered?
• Does the data support answering it?
• Who are the users driving the insights?
• What skills do those users have?

Business needs come in various forms:
• Present actionable information and reporting to executives and managers
• Combine heterogeneous datasets to answer additional questions
• Query and investigate your datasets
• Drive operational and security understanding
• Understand what’s happening in the business now

Analytics solutions
• Data Warehousing
• Big Data Processing
• Interactive Query
• Operational Analytics
• Real-time Analytics

Data Warehousing
Present actionable information and reporting to executives and managers

Amazon Redshift: data warehousing
Fast, powerful, simple, and fully managed data warehouse at 1/10 the cost
Massively parallel, scale from gigabytes to petabytes
• Fast at scale: columnar storage technology to improve I/O efficiency and scale query performance
• Inexpensive: as low as $1,000 per terabyte per year, 1/10 the cost of traditional data warehouse solutions; start at $0.25 per hour
• Open file formats: analyze optimized data formats on the latest SSD, and all open data formats in Amazon S3
• Secure: audit everything; encrypt data end-to-end; extensive certification and compliance
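
The columnar storage point on this slide can be illustrated with a toy sketch. It is not how Redshift is implemented (real engines add compression, block metadata, and parallel execution); the table and field names are made up for the example. It only shows why an aggregate over one column touches fewer values when data is laid out by column.

```python
# Toy illustration only; all names and values here are hypothetical.

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 42.0},
]


def to_columnar(records):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columnar(rows)

# The same aggregate, computed from each layout.
total_from_rows = sum(record["amount"] for record in rows)
total_from_columns = sum(columns["amount"])

# Values that must be read to answer the query in each layout:
# every field of every row, versus only the "amount" column.
values_read_row_layout = sum(len(record) for record in rows)
values_read_column_layout = len(columns["amount"])
```

With three fields per row, the row layout reads three times as many values as the column layout for this query; the gap widens as tables get wider.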

Amazon Redshift Data Warehouse
• Relational data
• Gigabytes to petabytes scale
• Reporting and analysis
• Schema defined prior to data load
Diagram labels: On Prem, AWS Glue ETL, Redshift COPY, Amazon QuickSight, existing or new BI tool
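
As a rough sketch of "schema defined prior to data load" followed by a Redshift COPY, the snippet below builds the two SQL statements as strings. The table, bucket, and IAM role names are hypothetical placeholders, and the helper is illustrative only: it does no escaping and is not meant for untrusted input.

```python
# Hypothetical table definition; the schema exists before any data is loaded.
CREATE_TABLE_SQL = """
CREATE TABLE sales (
    order_id BIGINT,
    region   VARCHAR(32),
    amount   DECIMAL(12, 2)
);
""".strip()


def build_copy_statement(table, s3_uri, iam_role_arn, data_format="PARQUET"):
    """Return a Redshift COPY statement that loads files from Amazon S3."""
    # Only formats that need no extra format argument are allowed in this sketch.
    allowed_formats = {"PARQUET", "ORC", "CSV"}
    fmt = data_format.upper()
    if fmt not in allowed_formats:
        raise ValueError(f"unsupported format in this sketch: {data_format}")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {fmt};"
    )


copy_sql = build_copy_statement(
    table="sales",
    s3_uri="s3://example-bucket/sales/",                               # placeholder
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",  # placeholder
)
```

The statements would be run through any SQL client connected to the cluster; the ETL step that prepares the files in S3 is outside this sketch.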

A Data Lake is not an Enterprise Data Warehouse
Data Lake | EDW
Complementary to EDW (not replacement) | Data lake can be source for EDW
Schema on read (no predefined schemas) | Schema on write (predefined schemas)
Structured/semi-structured/unstructured data | Structured data only
Fast ingestion of new data/content | Time consuming to introduce new content
Data Science + Prediction/Advanced Analytics + BI use cases | BI use cases
Data at low level of detail/granularity | Data at summary/aggregated level of detail
Loosely defined SLAs | Tight SLAs (production schedules)
Flexibility in tools (open source/tools for advanced analytics) | Limited flexibility in tools (SQL only)
Elastic storage and compute capacity, decoupled | Explicitly sized environments, compute and storage scaled in linearly
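
The schema-on-read versus schema-on-write contrast can be sketched in a few lines of plain Python. This is a conceptual illustration under assumed names (the two-field schema and the sample records are invented), not a model of any particular AWS service.

```python
import json

SCHEMA = {"user_id": int, "event": str}  # hypothetical predefined schema


def load_schema_on_write(raw_lines, schema=SCHEMA):
    """Validate every record at load time; fail the load on any mismatch."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        for field, expected_type in schema.items():
            if not isinstance(record.get(field), expected_type):
                raise ValueError(f"bad or missing field {field!r}: {line}")
        # Fields outside the schema are dropped at load time.
        table.append({field: record[field] for field in schema})
    return table


def query_schema_on_read(raw_lines, field):
    """Keep raw lines untouched; interpret them only when a query runs."""
    for line in raw_lines:
        record = json.loads(line)
        if field in record:
            yield record[field]


raw = [
    '{"user_id": 1, "event": "login"}',
    '{"user_id": 2, "event": "purchase", "amount": 19.99}',  # new field appears
    '{"device": "sensor-7", "temp_c": 21.5}',                # different shape
]

# Schema on read: new content is queryable as soon as it lands.
amounts = list(query_schema_on_read(raw, "amount"))
temperatures = list(query_schema_on_read(raw, "temp_c"))

# Schema on write: only records matching the predefined schema can be loaded.
warehouse_rows = load_schema_on_write(raw[:2])
```

Loading all three records through `load_schema_on_write` raises `ValueError`, which mirrors the "time consuming to introduce new content" row: the schema has to change before the new data can be loaded.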

Amazon Redshift Spectrum
Extend the data warehouse to exabytes of data in Amazon S3 data lake
• Exabyte Redshift SQL queries against Amazon S3
• Join data across Redshift and Amazon S3
• Scale compute and storage separately
• Stable query performance and unlimited concurrency
• CSV, ORC, Grok, Avro, & Parquet data formats
• Pay only for the amount of data scanned
Diagram labels: Amazon S3 data lake, Amazon Redshift data, Amazon Redshift Spectrum query engine

Amazon Redshift Spectrum: Query your Data Lake
Diagram labels:
• Amazon Redshift (JDBC/ODBC)
• Amazon Redshift Spectrum: scale-out serverless compute (nodes 1 2 3 4 ... N)
• AWS Glue Data Catalog
• COPY commands: hot data
• Query directly on Data Lake
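
A minimal sketch of the pattern on this slide: hot data sits in local Redshift tables (loaded with COPY), while colder data is queried in place on S3 through an external schema backed by the AWS Glue Data Catalog. The SQL is held in Python strings so it could be sent through any client; every schema, database, table, column, and role name is a hypothetical placeholder.

```python
# Register a Glue Data Catalog database as an external schema (placeholder names).
EXTERNAL_SCHEMA_SQL = """
CREATE EXTERNAL SCHEMA lake
FROM DATA CATALOG
DATABASE 'example_lake_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleSpectrumRole';
"""

# Join an external table (files stay in Amazon S3) with a local Redshift table.
JOIN_SQL = """
SELECT c.customer_name, SUM(e.amount) AS total_amount
FROM lake.click_events AS e
JOIN public.customers AS c
  ON c.customer_id = e.customer_id
GROUP BY c.customer_name;
"""


def statements():
    """Return the statements in the order they would be executed."""
    return [sql.strip() for sql in (EXTERNAL_SCHEMA_SQL, JOIN_SQL)]
```

Because charges follow the amount of data scanned, a query like this generally benefits from columnar formats such as Parquet or ORC and from selecting only the columns it needs.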

Data Lakes extend the traditional data warehouse
• Relational and nonrelational data
• TBs–EBs scale
• Diverse analytical engines
• Low-cost storage & analytics
Diagram labels:
• Sources: OLTP, ERP, CRM, LOB; devices, web, sensors, social
• Data warehouse: business intelligence
• Data lake: big data processing, real-time, machine learning

Visual insights for everyone with Amazon QuickSight
• Pay only for what you use
• Scale to tens of thousands of users
• Embedded analytics
• Build end-to-end BI solutions

Big Data Processing
Combine heterogeneous datasets to answer additional questions

Amazon EMR: big data processing
Analytics and ML at scale
Run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink, and many others
Enterprise-grade security
• Easy: launch fully managed Hadoop & Spark in minutes; no cluster setup, node provisioning, cluster tuning
• Latest versions: updated with the latest open source frameworks within 30 days of release
• Low cost: flexible billing with per-second billing, Amazon EC2 Spot, Reserved Instances, and Auto Scaling to reduce costs 50%-80%
• Amazon S3 storage: process data directly in the Amazon S3 data lake securely with high performance using the EMRFS connector
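
The low-cost claim combines two effects: per-second billing for short-lived clusters and discounted capacity such as Spot. The arithmetic can be sketched as below. The hourly rate, node count, and discount are made-up inputs, not AWS prices, and the model ignores minimum charges, storage, and data transfer.

```python
import math


def per_second_cost(nodes, runtime_seconds, hourly_rate_usd, discount=0.0):
    """Cost of a cluster billed per second, with an optional fractional discount."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    rate_per_second = hourly_rate_usd / 3600.0
    return nodes * runtime_seconds * rate_per_second * (1.0 - discount)


def hourly_rounded_cost(nodes, runtime_seconds, hourly_rate_usd):
    """Cost of the same cluster if every started hour were billed in full."""
    billed_hours = math.ceil(runtime_seconds / 3600.0)
    return nodes * billed_hours * hourly_rate_usd


# Hypothetical transient ETL job: 10 nodes for 30 minutes at $0.36 per node-hour.
on_demand = per_second_cost(10, 1800, 0.36)
with_discount = per_second_cost(10, 1800, 0.36, discount=0.5)  # assumed 50% discount
full_hours = hourly_rounded_cost(10, 1800, 0.36)
```

For this example the per-second figure is half of the hour-rounded figure, and the assumed discount halves it again; the savings depend entirely on the inputs chosen.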

Hadoop / Spark Analytics on AWS
• Workloads: batch, script, interactive, real-time, machine learning, NoSQL
• YARN (Hadoop Resource Manager)
• Amazon EMR: managed Hadoop / Spark
• Amazon S3: object storage (data lake on AWS)

Fitting this into the Common Data Catalog
• Amazon S3: stores the data (source of truth)
• AWS Glue Data Catalog: describes the data (unified data view)
• Amazon EMR: interactive Spark cluster (EMRFS, HDFS)
• Amazon EMR: transient ETL job (EMRFS, HDFS)
• MySQL DB instance
• …

Interactive Query
Query and investigate your datasets

Amazon Athena: interactive analysis
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
• Query instantly: zero setup cost; just point to Amazon S3 and start querying
• Pay per query: pay only for queries run; save 30%–90% on per-query costs through compression
• Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types
• Easy: serverless, with zero infrastructure and zero administration; integrated with Amazon QuickSight
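
The pay-per-query and compression points can be made concrete with a little arithmetic, assuming the charge is proportional to the bytes a query scans. The price per terabyte below is a placeholder parameter, not a quoted rate, and the data sizes are invented.

```python
TIB = 1024 ** 4  # bytes in one tebibyte, used here as the billing unit


def scan_cost(bytes_scanned, price_per_tb_usd):
    """Cost of one query when the charge is proportional to bytes scanned."""
    return bytes_scanned / TIB * price_per_tb_usd


def compression_saving(raw_bytes, compressed_bytes):
    """Fraction of the per-query cost saved by scanning compressed data."""
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return 1.0 - compressed_bytes / raw_bytes


PRICE = 5.0  # placeholder price per TB scanned; check current pricing

raw_size = 2 * TIB
compressed_size = raw_size // 4  # assumed 4:1 compression ratio

cost_raw = scan_cost(raw_size, PRICE)
cost_compressed = scan_cost(compressed_size, PRICE)
saving = compression_saving(raw_size, compressed_size)
```

An assumed 4:1 compression ratio gives a 75% saving, which sits inside the 30%–90% range the slide cites; real ratios depend on the data and the codec.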

Operational Analytics
Drive operational and security understanding

Operational analytics for logs and search with Amazon Elasticsearch Service
• Fully managed; deploy production-ready cluster in minutes
• Direct access to Elasticsearch open-source APIs, Logstash and Kibana
• Amazon VPC support; at-rest and in-transit encryption
• Easily scale up and down
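
Since the service exposes the open-source Elasticsearch APIs, a log search is just a JSON body in the Elasticsearch Query DSL. The sketch below builds one for "recent log lines containing some text". The field names `message` and `@timestamp` follow common Logstash conventions and are assumptions about the index, as is the helper itself; sending the request (endpoint, signing, index name) is left out.

```python
import json


def build_log_query(text, minutes=15, size=20):
    """Build a search body for recent log lines whose message matches `text`."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],  # newest first
        "query": {
            "bool": {
                # Full-text match on the (assumed) message field.
                "must": [{"match": {"message": text}}],
                # Restrict to the last N minutes without affecting scoring.
                "filter": [
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}}
                ],
            }
        },
    }


body = build_log_query("timeout")
print(json.dumps(body, indent=2))
```

The same body could be pasted into Kibana's developer console or posted to an index's `_search` endpoint.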

Real-time Analytics
Understand what’s happening in the business now
Demonstration

Machine Learning solutions
• AI Services
• ML Services
• ML Frameworks and Infrastructure

AI Services
• Vision: Rekognition Image, Rekognition Video, Textract
• Speech: Polly, Transcribe
• Language: Translate, Comprehend & Comprehend Medical
• Chatbots: Lex
• Forecasting: Forecast
• Recommendations: Personalize
ML Services
Amazon SageMaker: Build, Train, Deploy
• Pre-built algorithms
• Data labeling (Ground Truth)
• Notebook Hosting
• Algorithms & models (AWS Marketplace for Machine Learning)
• One-click model training & tuning
• Optimization (Neo)
• Reinforcement learning
• Hyper Parameter Optimization
• One-click deployment & hosting
• Auto-scaling
• Virtual Private Cloud
• Private Link
• Elastic Inference integration
ML Frameworks & Infrastructure
• Frameworks, Interfaces, Infrastructure
• EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference, Inferentia

AI Services
• Vision: Rekognition Image, Rekognition Video, Textract
• Speech: Polly, Transcribe
• Language: Translate, Comprehend & Comprehend Medical
• Chatbots: Lex
• Forecasting: Forecast
• Recommendations: Personalize

ML Services
Amazon SageMaker: Build, Train, Deploy
• One-click model training & deployment
• 10x better algorithm performance
• Predictive insights to improve decision making
Capabilities:
• Pre-built algorithms
• Data labeling (Ground Truth)
• Notebook Hosting
• Algorithms & models (AWS Marketplace for Machine Learning)
• One-click model training & tuning
• Optimization (Neo)
• Reinforcement learning
• Hyper Parameter Optimization
Demonstration

Analytics solutions
• Data Warehousing
• Big Data Processing
• Interactive Query
• Operational Analytics
• Real-time Analytics
Machine Learning solutions
• AI Services
• ML Services
• ML Frameworks and Infrastructure

Getting Started:
• Start small, build upon successes
• Use MVP principles, building incrementally
• Build loosely coupled (decoupled) solutions
• Pick the right tool for the right job, based on the business question, the users, and the data
• Leverage managed/serverless solutions
Questions?

Leveraging Cloud Analytics to Support Data-Driven Decisions

  • 1. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ben Snively, Solutions Architect – Data and Analytics, AI/ML Wednesday, May 22, 2019 Data Lifecycle – Driving Insights with Analytics and Machine Learning
  • 2. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Driving Insights: Deliver decisions makers the insights to transform an organization by identifying unmet needs within the customers or by optimizing operational processes Questions to ask: What business question is being answered? Does the data support answering them? Who are the users driving the insights? What skills do those users have?
  • 3. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Business needs come in various forms: • Present actionable information and reporting to executives and managers • Combined heterogeneous datasets together to be able to answer additional questions • Query and Investigate your datasets • Drive operational and security understanding. • Understand what’s happening in the business now
  • 4. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Analytics solutions Data Warehousing Big Data Processing Interactive Query Operational Analytics Real time Analytics Analytics
  • 5. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Warehousing Data Warehousing Big Data Processing Interactive Query Operational Analytics Real time Analytics Present actionable information and reporting to executives and managers Analytics
  • 6. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Redshift – data warehousing Fast, powerful, simple, and fully managed data warehouse at 1/10 the cost Massively parallel, scale from gigabytes to petabytes Fast at scale Columnar storage technology to improve I/O efficiency and scale query performance $ Inexpensive As low as $1,000 per terabyte per year, 1/10 the cost of traditional data warehouse solutions; start at $0.25 per hour Open file formats Secure Audit everything; encrypt data end-to-end; extensive certification and compliance Analyze optimized data formats on the latest SSD, and all open data formats in Amazon S3 Analytics
  • 7. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Redshift Data Warehouse Relational data Gigabytes to petabytes scale Reporting and analysis Schema defined prior to data load AWS Glue ETL On Prem Amazon QuickSight Existing or new BI tool Redshift COPY Analytics
  • 8. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Complementary to EDW (not replacement) Data lake can be source for EDW Schema on read (no predefined schemas) Schema on write (predefined schemas) Structured/semi-structured/Unstructured data Structured data only Fast ingestion of new data/content Time consuming to introduce new content Data Science + Prediction/Advanced Analytics + BI use cases BI use cases Data at low level of detail/granularity Data at summary/aggregated level of detail Loosely defined SLAs Tight SLAs (production schedules) Flexibility in tools (open source/tools for advanced analytics) Limited flexibility in tools (SQL only) Elastic storage and compute capacity – decoupled Explicitly sized environments, compute and storage scaled in linearly A Data Lake is not an Enterprise Data Warehouse Data Lake EDW Analytics
  • 9. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Redshift Spectrum E x t e n d t h e d a t a w a r e h o u s e t o e x a b y t e s o f d a t a i n A m a z o n S 3 d a t a l a k e Amazon S3 Data Lake Amazon Redshift data Amazon Redshift Spectrum query engine Exabyte Redshift SQL queries against Amazon S3 Join data across Redshift and Amazon S3 Scale compute and storage separately Stable query performance and unlimited concurrency CSV, ORC, Grok, Avro, & Parquet data formats Pay only for the amount of data scanned Analytics
  • 10. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amaz on Reds hift Spec tr um Quer y your D ata Lak e Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon Redshift Spectrum Scale-out serverless compute AWS Glue Data Catalog COPY commands Hot data Query directly on Data Lake Analytics
  • 11. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Lakes extend the traditional data warehouse Data warehouse Business intelligence OLTP ERP CRM LOB • Relational and nonrelational data • TBs–EBs scale • Diverse analytical engines • Low-cost storage & analytics Devices Web Sensors Social Data lake Big data processing, real-time, machine learning Analytics
  • 12. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Visual insights for everyone with Amazon QuickSight Pay only for what you use Scale to tens of thousands of users Embedded analytics Build end-to-end BI solutions Visualization
  • 13. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data Processing Data Warehousing Big Data Processing Interactive Query Operational Analytics Real time Analytics Combined heterogeneous datasets together to be able to answer additional questions Analytics
  • 14. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon EMR – big data processing Analytics and ML at scale Run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink, and many others Enterprise-grade security $ Latest versions Updated with the latest open source frameworks within 30 days of release Low cost Flexible billing with per- second billing, Amazon EC2 Spot, Reserved Instances, and Auto Scaling to reduce costs 50%-80% Amazon S3 storage Process data directly in the Amazon S3 data lake securely with high performance using the EMRFS connector Easy Launch fully managed Hadoop & Spark in minutes; no cluster setup, node provisioning, cluster tuning Data Lake 100110000100101011100 1010101110010101000 00111100101100101 010001100001 Analytics
  • 15. Hadoop / Spark Analytics on AWS YARN (Hadoop Resource Manager) NoSQL, Machine learning, Real-time, Interactive, Script, Batch Data Lake on AWS Amazon S3 Amazon EMR Managed Hadoop / Spark Object storage Analytics
  • 16. Fitting this into the Common Data Catalog Amazon S3 Interactive Spark cluster Amazon EMR Amazon EMR EMRFS HDFS Transient ETL job Source of Truth EMRFS HDFS Describes the data MySQL DB instance Unified data view AWS Glue Data Catalog Stores the data … Analytics
  • 17. Interactive Query Data Warehousing Big Data Processing Interactive Query Operational Analytics Real time Analytics Query and Investigate your datasets Analytics
  • 18. Amazon Athena – interactive analysis Interactive query service to analyze data in Amazon S3 using standard SQL No infrastructure to set up or manage and no data to load Query instantly Zero setup cost; just point to Amazon S3 and start querying Pay per query Pay only for queries run; save 30%–90% on per-query costs through compression Open ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types Easy Serverless: zero infrastructure, zero administration Integrated with Amazon QuickSight Analytics
  • 19. Operational Analytics Data Warehousing Big Data Processing Interactive Query Operational Analytics Real time Analytics Drive operational and security understanding Analytics
  • 20. Operational analytics for logs and search with Amazon Elasticsearch Service Fully managed; deploy production-ready cluster in minutes Direct access to Elasticsearch open-source APIs, Logstash and Kibana Amazon VPC support; at-rest and in-transit encryption Easily scale up and down Analytics
  • 21. Real time Analytics Data Warehousing Big Data Processing Interactive Query Operational Analytics Real time Analytics Understand what’s happening in the business now
  • 22. Operational analytics for logs and search with Amazon Elasticsearch Service Fully managed; deploy production-ready cluster in minutes Direct access to Elasticsearch open-source APIs, Logstash and Kibana Amazon VPC support; at-rest and in-transit encryption Easily scale up and down Analytics
  • 23. Demonstration
  • 24. Machine Learning solutions AI Services ML Services ML Frameworks and Infrastructure
  • 25. ML FRAMEWORKS & INFRASTRUCTURE AI SERVICES REKOGNITION IMAGE POLLY TRANSCRIBE TRANSLATE COMPREHEND & COMPREHEND MEDICAL LEX REKOGNITION VIDEO Vision Speech Language Chatbots AMAZON SAGEMAKER BUILD TRAIN FORECAST Forecasting TEXTRACT Recommendations DEPLOY Pre-built algorithms Data labeling (GROUND TRUTH) One-click model training & tuning Optimization (NEO) ML SERVICES Frameworks Interfaces Infrastructure EC2 P3 & P3dn EC2 C5 FPGAs GREENGRASS ELASTIC INFERENCE Reinforcement learning Algorithms & models (AWS MARKETPLACE FOR MACHINE LEARNING) INFERENTIA Notebook Hosting One-click deployment & hosting Auto-scaling Virtual Private Cloud Private Link Elastic Inference integration Hyper Parameter Optimization PERSONALIZE
  • 26. AI SERVICES REKOGNITION IMAGE POLLY TRANSCRIBE TRANSLATE COMPREHEND & COMPREHEND MEDICAL LEX REKOGNITION VIDEO Vision Speech Language Chatbots FORECAST Forecasting TEXTRACT Recommendations PERSONALIZE
  • 27. One-click model training & deployment 10x better algorithm performance Predictive insights to improve decision making AMAZON SAGEMAKER BUILD TRAIN DEPLOY Pre-built algorithms Data labeling (GROUND TRUTH) One-click model training & tuning Optimization (NEO) ML SERVICES Reinforcement learning Algorithms & models (AWS MARKETPLACE FOR MACHINE LEARNING) Notebook Hosting Hyper Parameter Optimization
  • 28. Demonstration
  • 29. Analytics solutions Data Warehousing Big Data Processing Interactive Query Operational Analytics Real time Analytics Machine Learning solutions AI Services ML Services ML Frameworks and Infrastructure
  • 30. Getting Started: • Start Small, build upon successes • Use MVP principles incrementally building • Build Loosely/De-coupled solutions • Pick the right tool for the right job • Based on business question • Users • Data • Leverage Managed/Serverless solutions
  • 31. Questions?

Editor's Notes

  1. Attend this webinar to learn about AWS business intelligence (BI) analytics, visualization, artificial intelligence, and machine learning services that can transform data into insights.
  2. Data analytics is the stage where an organization can identify ways to increase revenue or reduce cost. Analytics and visualization deliver decision makers the insights to transform an organization by identifying unmet customer needs or by optimizing operational processes. Data-driven decisions transform how managers allocate resources and evaluate results within an organization. Reliance on data reduces the role of hearsay and instinct when making choices. A manager’s intuition is now backed with data at the front end of the planning process, through the course of implementation, and when evaluating the impact of his or her decisions. Key considerations in this phase include clearly defining the requirements for analytics, aligning the output to the use cases, and ensuring that consumers of data within the organization find the generated insights actionable. Let’s review some of the solutions available for analytics within the AWS portfolio during this stage.
  3. Picking the right analytical engine for your needs (200). AWS offers analytical engines for several use cases such as big data processing, data warehousing, ad-hoc analysis, real-time streaming, and operational/log analytics. In this session, you will learn which engines you can use for your use case to analyze all of the data stored in your Amazon S3 data lake in open formats. You will also learn how to use these engines together to generate new insights, such as complementing your data warehouse workloads with ad-hoc and real-time analytics engines to incorporate new data into your reports.
  4. We begin with data warehousing. Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. It allows you to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution. Most results come back in seconds. With Amazon Redshift, you can start small for just $0.25 per hour with no commitments and scale out to petabytes of data for $1,000 per terabyte per year, less than a tenth the cost of traditional solutions. Fast: Amazon Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiency and by parallelizing queries across multiple nodes. Data load speed scales linearly with cluster size, with integrations to Amazon S3, Amazon DynamoDB, Amazon EMR, Amazon Kinesis, and any SSH-enabled host. Inexpensive: You only pay for what you use. You can have an unlimited number of users doing unlimited analytics on all your data for just $1,000 per terabyte per year, 1/10th the cost of traditional data warehouse solutions. Most customers see a 3-4x reduction in data size after compression, reducing their costs to $250-$333 per uncompressed terabyte per year. Extensible: Redshift Spectrum enables you to run queries against exabytes of data in Amazon S3 as easily as you run queries against petabytes of data stored on local disks in Amazon Redshift, using the same SQL syntax and BI tools you use today. You can store highly structured, frequently accessed data on Redshift local disks, keep vast amounts of unstructured data in an Amazon S3 “Data Lake”, and query seamlessly across both. Simple: Amazon Redshift allows you to easily automate most of the common administrative tasks to manage, monitor, and scale your data warehouse.
By handling all these time-consuming, labor-intensive tasks, Amazon Redshift frees you up to focus on your data and business. Scalable: You can easily resize your cluster up and down as your performance and capacity needs change with just a few clicks in the console or a simple API call. Secure: Security is built in. You can encrypt data at rest and in transit using hardware-accelerated AES-256 and SSL, isolate your clusters using Amazon VPC, and even manage your keys using AWS Key Management Service (KMS) and hardware security modules (HSMs).
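The compression arithmetic in the note above can be checked with a short sketch. It assumes only the figures quoted in the note ($1,000 per terabyte per year and a 3-4x compression ratio); the function and constant names are illustrative, not part of any AWS API.

```python
def cost_per_uncompressed_tb(price_per_tb_year: float, compression_ratio: float) -> float:
    """Yearly cost per terabyte of uncompressed data.

    price_per_tb_year: price per terabyte of stored (compressed) data per year.
    compression_ratio: uncompressed size divided by compressed size (3 means 3x).
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return price_per_tb_year / compression_ratio


PRICE_PER_TB_YEAR = 1000.0  # USD, the figure quoted in the note (assumed current for this sketch)

for ratio in (3, 4):
    cost = cost_per_uncompressed_tb(PRICE_PER_TB_YEAR, ratio)
    print(f"{ratio}x compression: about ${cost:.0f} per uncompressed TB per year")
```

At 3x and 4x this gives roughly $333 and $250, which matches the range stated in the note.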
  5. Traditionally, analytics was run through a relational data warehouse, which collects data from multiple source systems and produces operational reports. This method of analytics had a few characteristics: it was optimized for relational data sources; it scaled up to PBs; it required the questions to be known before the DW was designed, because a schema had to be created to define what type of data is loaded into the data warehouse; and it enabled operational reporting on top of the data in the DW. The belief that data is an asset is putting pressure on traditional architectures. It can’t be business as usual anymore because these new customer requirements might break the traditional approach. Customers need to: (1) Capture and store new non-relational data at EB scale. Customers want to store new non-relational data that comes from sources not currently in the data warehouse. This includes machine-generated data (i.e., IoT devices), log files, clickstream data, social media, etc. These new sources generate data at a high volume that can scale to exabyte size. The traditional data warehouse was not optimized for storing all of this non-relational data because it was designed for relational data at PB scale. (2) Secure and combine data from new and existing sources. Customers want to have a single view of all of their data, and they want an easy way to catalog and search all of this data to do analytics on top of it. Furthermore, they want their data to be secured to prevent unauthorized access. The traditional data architectures were not built to account for this. Data exists in silos, or, if it is centralized into an enterprise data warehouse, it is extremely costly to build ETL to move the data, which will not scale at EB data volumes. (3) Do new types of analysis on their data (machine learning, big data processing, and real-time analytics). Customers increasingly need to do new types of analytics.
They want to move from answering questions about what happened in the past to using statistical models and forecasting techniques to understand and answer what could happen. To do this, customers need to incorporate machine learning, big data processing, and real-time analytics. However, their traditional architecture could only accommodate reporting and ad hoc analysis on relational data.
  6. This table provides a point of comparison, from the old world to the new. The AWS cloud is the best place to build a data lake. A data lake on AWS gives you access to the most complete platform for big data. AWS provides you with secure infrastructure and offers a broad set of scalable, cost-effective services to collect, store, categorize, and analyze your data to get meaningful insights. AWS makes it easy to build and tailor your data lake to your specific data analytic requirements.
  7. Amazon Redshift Spectrum enables you to run Amazon Redshift SQL queries against exabytes of data in Amazon S3. With Redshift Spectrum, you can extend the analytic power of Amazon Redshift beyond data stored on local disks in your data warehouse to query vast amounts of unstructured data in your Amazon S3 “Data Lake”, without having to load or transform any data. Redshift Spectrum applies sophisticated query optimization, scaling processing across thousands of nodes so results are fast, even with large data sets and complex queries. Redshift Spectrum directly queries data in Amazon S3 using the open data formats you already use, including Avro, CSV, Grok, ORC, Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV. Since Redshift Spectrum supports the same SQL syntax as Amazon Redshift, you can run sophisticated queries using the same Business Intelligence (BI) tools you use today. You can also run queries that span both the frequently accessed data stored locally in Amazon Redshift and your full data sets stored cost-effectively in Amazon S3. Start Querying Instantly: Same SQL. Same BI tools. No loading required. With Amazon Redshift Spectrum, you can start querying your data in Amazon S3 immediately, with no loading or transformation required. You just need to register your Amazon Athena data catalog, AWS Glue Data Catalog, or Apache Hive Metastore as an external schema. You can use the same SQL you use for querying Amazon Redshift tables and any BI tool that supports Redshift today. Fast Performance: Leverage the powerful Amazon Redshift query optimizer. Amazon Redshift delivers super-fast performance whether it is for ad-hoc analysis on large unstructured data sets in Amazon S3 or frequent analysis on structured data sets in Redshift tables.
You can maintain hot data in your Amazon Redshift clusters to get the performance of local disks, and use Amazon Redshift Spectrum to extend your queries to cold data stored in Amazon S3 for unlimited scalability and low cost. The Amazon Redshift query optimizer will automatically determine how to minimize data scanned in Amazon S3 and the number of Redshift Spectrum nodes to use in the query. Limitless Scalability: Separate compute and storage. With Amazon Redshift Spectrum, you don’t have to worry about scaling your cluster. It lets you separate storage and compute, allowing you to scale each independently. You can even run multiple Amazon Redshift clusters against the same Amazon S3 Data Lake, enabling limitless concurrency. Redshift Spectrum automatically scales out to thousands of instances if needed, so queries run quickly, whether processing a terabyte, a petabyte, or an exabyte. Pay Per Query: Only pay for data processed. With Amazon Redshift Spectrum, you only pay for the queries you run. You are charged $5 per terabyte of data processed to execute your query. Redshift Spectrum can query compressed data. You can both save 30% to 90% on your per-query costs and improve performance by compressing, partitioning, and converting your data to a columnar format. There are no charges for Redshift Spectrum when you’re not running queries. You pay standard Amazon S3 rates for data storage and Amazon Redshift instance rates for the clusters used.
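To make the pay-per-query figures above concrete, here is a minimal sketch of the arithmetic. It assumes the $5 per terabyte rate and the 30% to 90% savings range quoted in the note; the 2 TB query size is a made-up example, and the helper name is hypothetical rather than anything from the Redshift API.

```python
PRICE_PER_TB_PROCESSED = 5.0  # USD per terabyte processed, as quoted in the note


def spectrum_query_cost(tb_processed: float, savings_fraction: float = 0.0) -> float:
    """Cost of one query given the data it processes.

    savings_fraction models the reduction from compressing, partitioning,
    and converting to a columnar format (0.30 to 0.90 per the note).
    """
    if tb_processed < 0:
        raise ValueError("tb_processed must be non-negative")
    if not 0.0 <= savings_fraction < 1.0:
        raise ValueError("savings_fraction must be in [0, 1)")
    return PRICE_PER_TB_PROCESSED * tb_processed * (1.0 - savings_fraction)


# Hypothetical query that would process 2 TB of raw, uncompressed data.
for label, savings in (("no optimization", 0.0), ("30% savings", 0.30), ("90% savings", 0.90)):
    print(f"{label}: ${spectrum_query_cost(2.0, savings):.2f}")
```

The same query drops from $10 to between $7 and $1 across the quoted savings range, before S3 storage and cluster charges, which are billed separately.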
  8. If we take a look behind the scenes, we have the Redshift cluster in green, and the purple boxes are the autoscaling, multi-tenant Spectrum fleet of compute nodes. Tricks for Spectrum in the Redshift optimizer; high levels of parallelism to operate on S3 data. Slice: up to 10 Spectrum compute; GBs or even an exabyte. Separate storage and compute. Loading data into your local cluster is now optional. Data in S3 + Redshift + Spectrum. Many clusters: perf hit, but flexible.
  9. For big data processing, Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB. Amazon EMR securely and reliably handles a broad set of big data use cases, including log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bioinformatics. We have found EMR to be one of the most cost-effective Hadoop and Spark distributions because of its flexible billing options, including the per-second billing introduced this year. We have Spot pricing that can dramatically lower your bill, or Reserved Instances that can lower your bill 50-80%. Finally, with the ability to automatically resize your clusters down based on scaling rules, EMR is the most cost-effective place to run your Hadoop and Spark workloads. Easy to Use: You can launch an Amazon EMR cluster in minutes. You don’t need to worry about node provisioning, cluster setup, Hadoop configuration, or cluster tuning. Amazon EMR takes care of these tasks so you can focus on analysis. Low Cost: Amazon EMR pricing is simple and predictable: you pay a per-second rate for every second used, with a one-minute minimum charge. You can launch a 10-node Hadoop cluster for as little as $0.15 per hour. Because Amazon EMR has native support for Amazon EC2 Spot and Reserved Instances, you can also save 50-80% on the cost of the underlying instances. Elastic: With Amazon EMR, you can provision one, hundreds, or thousands of compute instances to process data at any scale. You can easily increase or decrease the number of instances manually or with Auto Scaling, and you only pay for what you use.
Reliable: You can spend less time tuning and monitoring your cluster. Amazon EMR has tuned Hadoop for the cloud; it also monitors your cluster, retrying failed tasks and automatically replacing poorly performing instances. Secure: Amazon EMR automatically configures Amazon EC2 firewall settings that control network access to instances, and you can launch clusters in an Amazon Virtual Private Cloud (VPC), a logically isolated network you define. For objects stored in Amazon S3, you can use Amazon S3 server-side encryption or Amazon S3 client-side encryption with EMRFS, with AWS Key Management Service or customer-managed keys. Flexible: You have complete control over your cluster. You have root access to every instance, you can easily install additional applications, and you can customize every cluster with bootstrap actions. You can also launch Amazon EMR clusters with custom Amazon Linux AMIs.
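The per-second billing rule described above (a per-second rate with a one-minute minimum) can be sketched as follows. The $0.15 per hour figure is the cluster rate quoted in the note; treating it as a single combined hourly rate, and the function names themselves, are simplifying assumptions for illustration only.

```python
import math

MINIMUM_BILLABLE_SECONDS = 60  # one-minute minimum charge, per the note


def billable_seconds(runtime_seconds: float) -> int:
    """Round the runtime up to whole seconds and apply the one-minute minimum."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    return max(MINIMUM_BILLABLE_SECONDS, math.ceil(runtime_seconds))


def cluster_charge(runtime_seconds: float, hourly_rate: float) -> float:
    """Charge for one cluster run at a given hourly rate, billed per second."""
    return billable_seconds(runtime_seconds) * hourly_rate / 3600.0


HOURLY_RATE = 0.15  # USD per hour for the 10-node cluster quoted in the note

print(f"45-second job:  ${cluster_charge(45, HOURLY_RATE):.4f}")    # billed as 60 seconds
print(f"90-minute job:  ${cluster_charge(5400, HOURLY_RATE):.4f}")  # billed as 5400 seconds
```

Under hourly billing the 45-second job would have cost a full hour, which is why per-second billing matters most for short, transient clusters.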
  10. Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets. Athena is integrated out of the box with the AWS Glue Data Catalog, allowing you to create a unified metadata repository across various services, crawl data sources to discover schemas and populate your Catalog with new and modified table and partition definitions, and maintain schema versioning. You can also use Glue’s fully managed ETL capabilities to transform data or convert it into columnar formats to optimize cost and improve performance. Start Querying Instantly: Serverless. No ETL. Athena is serverless. You can quickly query your data without having to set up and manage any servers or data warehouses. Just point to your data in Amazon S3, define the schema, and start querying using the built-in query editor. Amazon Athena allows you to tap into all your data in S3 without the need to set up complex processes to extract, transform, and load the data (ETL). Pay Per Query: Only pay for data scanned. With Amazon Athena, you pay only for the queries that you run. You are charged $5 per terabyte scanned by your queries. You can save from 30% to 90% on your per-query costs and get better performance by compressing, partitioning, and converting your data into columnar formats. Athena queries data directly in Amazon S3. There are no additional storage charges beyond S3. Open, Powerful, Standard: Built on Presto. Runs standard SQL.
ANSI SQL interface and JDBC/ODBC drivers. Handles multiple formats (CSV, JSON, Avro, Parquet, ORC, geospatial), compression types (GZ, LZO, BZ2), and complex joins and data types (arrays, maps, structs). Easy: Serverless. Zero infrastructure. Zero administration.
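Why compressing, partitioning, and converting to columnar formats lowers the per-query bill can be illustrated with a rough estimate of data scanned. This is a deliberately simplified model, not how Athena meters queries: it assumes the three effects multiply independently, and every parameter value below is hypothetical except the $5 per terabyte rate quoted in the note.

```python
PRICE_PER_TB_SCANNED = 5.0  # USD per terabyte scanned, as quoted in the note


def estimated_tb_scanned(raw_tb: float,
                         partition_fraction: float = 1.0,
                         column_fraction: float = 1.0,
                         compression_ratio: float = 1.0) -> float:
    """Rough estimate of terabytes scanned by one query.

    partition_fraction: share of partitions the query's filters touch (0-1].
    column_fraction: share of columns the query reads; only meaningful
        for columnar formats, where unused columns can be skipped (0-1].
    compression_ratio: raw size divided by compressed size (>= 1).
    """
    for name, value in (("partition_fraction", partition_fraction),
                        ("column_fraction", column_fraction)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1]")
    if compression_ratio < 1.0:
        raise ValueError("compression_ratio must be >= 1")
    return raw_tb * partition_fraction * column_fraction / compression_ratio


def estimated_query_cost(raw_tb: float, **layout: float) -> float:
    """Estimated cost in USD for one query under the simplified model."""
    return PRICE_PER_TB_SCANNED * estimated_tb_scanned(raw_tb, **layout)


# Hypothetical 1 TB dataset: raw CSV versus a partitioned, compressed columnar layout.
print(f"raw CSV:          ${estimated_query_cost(1.0):.4f}")
print(f"optimized layout: ${estimated_query_cost(1.0, partition_fraction=0.1, column_fraction=0.1, compression_ratio=4.0):.4f}")
```

Real savings depend on the query and the data; the sketch only shows why each of the three techniques shrinks the amount of data a query has to scan.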
  11. We are making rapid improvements to solve the three hardest challenges for customers adopting AI/ML: cost, ease of use, and data. Our launches this year underscore that. We see the machine learning stack as having three key layers. ML Frameworks: The bottom layer is for expert machine learning practitioners, such as researchers and developers. These are people who are comfortable building models, tuning models, training models, figuring out how to deploy them into production, and managing them themselves. The vast majority of machine learning in the cloud today at this layer is being done through Amazon SageMaker, which provides a managed experience for frameworks, or the AWS Deep Learning AMI that we built, which effectively embeds all the major frameworks. Infrastructure: AWS offers a broad array of compute options for training and inference with powerful GPU-based instances, compute and memory optimized instances, and even FPGAs. Our P3 instances provide up to 14 times better performance than previous-generation Amazon EC2 GPU compute instances. C5 instances offer a higher memory to vCPU ratio and deliver a 25% improvement in price/performance compared to C4 instances, and are ideal for demanding inference applications. We also have Amazon EC2 F1, a compute instance with field programmable gate arrays (FPGAs) that you can program to create custom hardware accelerations for your machine learning applications. F1 instances are easy to program and come with everything you need to develop, simulate, debug, and compile your hardware acceleration code. You can reuse your designs as many times, and across as many F1 instances, as you like. The new Amazon EC2 P3dn instance has four times the networking bandwidth and twice the GPU memory of the largest P3 instance. P3dn is ideal for large-scale distributed training. No one else has anything close.
P3dn.24xlarge instances offer 96 vCPUs of Intel Skylake processors to reduce preprocessing time of data required for machine learning training. The enhanced networking of the P3dn instance allows GPUs to be used more efficiently in multi-node configurations so training jobs complete faster. Finally, the extra GPU memory allows developers to easily handle more advanced machine learning models, such as holding and processing multiple batches of 4k images for image classification and object detection systems. ML Services: To enable most enterprises and companies to scale machine learning, we have made ML accessible for everyday developers and scientists. Amazon SageMaker removes the heavy lifting, complexity, and guesswork from each step of the machine learning process. SageMaker makes model building and training easier by providing pre-built development notebooks, popular machine learning algorithms optimized for petabyte-scale datasets, and automatic model tuning, enabling developers to build, train, and deploy models in a single click. SageMaker is already helping thousands of developers easily get started with building, training, and deploying models. AI Services: At the top layer are AI services, which are ready-made for all developers, with no ML skills required. For example, customers say "here is an object, tell me what's in it" or "here's a face, tell me if it's part of this facial group" using Amazon Rekognition. Or turn text into speech using Amazon Polly. Or build conversational apps with Amazon Lex. Convert speech to text with Amazon Transcribe. Translate text between languages using Amazon Translate. Understand relationships and find insights from unstructured text using Amazon Comprehend.
  12. We have a portfolio of solution-based AI services that can be accessed via a simple API call across vision, speech, language, and conversational chatbots. AWS has invested deeply in these services as they address some of the most common problems and opportunities customers are facing where AI can advance the state of the art. AWS has the capability to invest at a level of scale that would be uneconomical for most customers, and our scale enables us to offer these services at low cost. Customers can build these capabilities into their new and existing applications to reduce costs, increase speed, improve customer satisfaction and insight, and build ‘modern’ intelligent applications. Our AI services are intentionally easy to use. They can be accessed via a simple API call. When used in conjunction, they create compelling solutions that target common business problems and use cases. ADDITIONAL COLOR: Amazon Rekognition: Rekognition makes it easy to add image and video analysis to your applications. You just provide an image or video to the Rekognition API, and the service can identify the objects, people, text, scenes, and activities, as well as detect any inappropriate content. Amazon Rekognition also provides highly accurate facial analysis and facial recognition on images and video that you provide. You can detect, analyze, and compare faces for a wide variety of user verification, people counting, and public safety use cases. Rekognition is a simple and easy to use API that can quickly analyze any image or video file stored in Amazon S3. Amazon Rekognition is always learning from new data, and we are continually adding new labels and facial recognition features to the service. More info: https://aws.amazon.com/rekognition/ Amazon Polly: Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products.
Polly is a text to speech service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. With dozens of lifelike voices across a variety of languages, you can select the ideal voice and build speech-enabled applications that work in many different countries. More info: https://aws.amazon.com/polly/ Amazon Transcribe: Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to their applications. Using the Amazon Transcribe API, you can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech. Amazon Transcribe can be used for lots of common applications, including the transcription of customer service calls and generating subtitles on audio and video content. The service can transcribe audio files stored in common formats, like WAV and MP3, with time stamps for every word so that you can easily locate the audio in the original source by searching for the text. Amazon Transcribe is continually learning and improving to keep pace with the evolution of language. More info: https://aws.amazon.com/transcribe/ Amazon Translate: Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. Neural machine translation is a form of language translation automation that uses deep learning models to deliver more accurate and more natural sounding translation than traditional statistical and rule-based translation algorithms. Amazon Translate allows you to localize content - such as websites and applications - for international users, and to easily translate large volumes of text efficiently. More info: https://aws.amazon.com/translate/ Amazon Comprehend: Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. 
The service identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; analyzes text using tokenization and parts of speech; and automatically organizes a collection of text files by topic. Using these APIs, you can analyze text and apply the results in a wide range of applications including voice of customer analysis, intelligent document search, and content personalization for web applications.  More info: https://aws.amazon.com/comprehend Amazon Lex: Amazon Lex is a service for building conversational interfaces into any application using voice and text. Amazon Lex provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text, to enable you to build applications with highly engaging user experiences and lifelike conversational interactions. With Amazon Lex, the same deep learning technologies that power Amazon Alexa are now available to any developer, enabling you to quickly and easily build sophisticated, natural language, conversational bots  More info: https://aws.amazon.com/lex    
  13. Amazon SageMaker is the most widely used machine learning service, because it removes the complexity that holds back developer success. It allows companies of all sizes to easily build sophisticated machine learning models, from prediction engines to intelligent applications and processes. Industry leaders use SageMaker to transform their business; let’s take a look at some successes…