Building your Data Project on AWS - Luke Anderson
1.
Building Your Data Lake on AWS
Luke Anderson, Business Development, AWS
© 2018 Amazon Web Services, Inc. or its Affiliates. All rights reserved.
2.
What to expect from the session
1. Defining the Data Lake
2. Reducing Costs
3. Increasing Performance
4. Planning for the Future
3.
Rethink how to become a data-driven business
• Business outcomes
• Experimentation
• Agile and timely
4.
Traditionally, analytics looked like this (duplication and sprawl)
[Diagram: separate silos for Hadoop, Spark, NoSQL, storage arrays, databases, and a data warehouse; raw data and structured data (SQL) are moved between them by repeated ETL steps that feed advanced analytics]
5.
Defining the AWS data lake
A data lake is an architecture with a virtually limitless, centralized storage platform capable of categorization, processing, analysis, and consumption of heterogeneous data sets.
Key data lake attributes:
• Decoupled storage and compute
• Rapid ingest and transformation
• Secure multi-tenancy
• Query in place
• Schema on read
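"Schema on read" means raw data lands in storage unchanged and structure is applied only when it is read. The following is a minimal sketch in plain Python of that idea; the CSV layout and the `SALES_SCHEMA` column names and parsers are hypothetical, and the same raw text could later be read with a different schema without being rewritten.

```python
import csv
import io
from datetime import datetime

# Raw data is stored as-is; nothing about its structure is enforced on write.
RAW_CSV = "2018-06-01T10:00:00,42,19.99\n2018-06-01T10:05:00,7,5.00\n"

# Hypothetical schema: column name -> parser applied at read time.
SALES_SCHEMA = {
    "ts": datetime.fromisoformat,
    "user_id": int,
    "amount": float,
}


def read_with_schema(raw_text, schema):
    """Apply `schema` to raw CSV text only when it is read (schema on read)."""
    names = list(schema)
    for row in csv.reader(io.StringIO(raw_text)):
        # Pair each positional field with its column name and parse it.
        yield {name: schema[name](value) for name, value in zip(names, row)}


for record in read_with_schema(RAW_CSV, SALES_SCHEMA):
    print(record)
```

Engines such as Athena apply the same principle at scale: the table definition in the catalog is the schema, and the objects in S3 stay untouched.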
6.
AWS Data Lake Components: any analytic workload, any scale, at the lowest possible cost
• Insights: QuickSight, SageMaker, Comprehend
• Analytics: Redshift + Spectrum (DW), EMR (big data processing), Athena (interactive), Elasticsearch Service, Kinesis Data Analytics (real-time)
• Data lake: Glue (ETL & Data Catalog), S3/Glacier (storage)
• Data movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams
7.
Reasons to choose Amazon S3 for a data lake
• Unmatched durability, availability, and scalability
• Best security, compliance, and audit capability
• Object-level control at any scale
• Business insight into your data
• Twice as many partner integrations
• Most ways to bring data in
8.
Reducing Data Lake Costs
9.
Optimize costs with data tiering
[Diagram: storage tiers from hot to cold: HDFS, Amazon S3 Standard, Amazon S3 Infrequent Access, Amazon Glacier, with S3 Analytics informing the tiering]
• Use EMR/Hadoop with local HDFS for the hottest data sets
• Store cooler data in S3 and cold data in Glacier to reduce costs
• Use S3 Analytics to optimize your tiering strategy
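The S3-to-Glacier part of this tiering can be automated with an S3 lifecycle rule. A minimal sketch using boto3's `put_bucket_lifecycle_configuration`; the bucket name, the `raw/` prefix, and the 30/90-day thresholds are assumptions to tune, for example against the access patterns S3 Analytics reports.

```python
def build_tiering_rule(prefix, ia_days=30, glacier_days=90):
    """Build a lifecycle configuration that moves objects under `prefix`
    from S3 Standard to Standard-IA, then on to Glacier."""
    # S3 does not transition objects younger than 30 days to STANDARD_IA.
    if ia_days < 30:
        raise ValueError("ia_days must be at least 30")
    if glacier_days <= ia_days:
        raise ValueError("glacier_days must be greater than ia_days")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_tiering(bucket, prefix):
    """Apply the rule to a real bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above has no dependencies

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_tiering_rule(prefix),
    )


# Usage (hypothetical bucket and prefix): apply_tiering("my-data-lake", "raw/")
```

Keeping the rule builder separate from the API call makes it easy to review or test the policy before it touches a bucket.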
10.
Process data in place: Amazon Athena, Amazon Redshift Spectrum, Amazon EMR, and AWS Glue all work directly on data stored in Amazon S3
11.
Amazon EMR: Decouple compute & storage
• Highly distributed processing frameworks such as Hadoop and Spark
• Compress datasets
• Use columnar file formats
• Aggregate small files with the S3DistCp "groupBy" option
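S3DistCp's groupBy option takes a regular expression and concatenates files whose names produce the same captured value, turning many small objects into fewer large ones. The sketch below only simulates that grouping locally in Python to show the idea; the object keys and the `hour=` pattern are hypothetical, and it does not call S3DistCp itself.

```python
import re
from collections import defaultdict

# Hypothetical small objects written every few minutes, partitioned by hour.
SMALL_FILES = {
    "logs/hour=10/part-0001.csv": "a,1\n",
    "logs/hour=10/part-0002.csv": "b,2\n",
    "logs/hour=11/part-0001.csv": "c,3\n",
}


def group_small_files(files, pattern):
    """Concatenate files whose keys yield the same regex capture(s).

    Keys that do not match `pattern` are skipped in this sketch.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for key in sorted(files):
        match = regex.match(key)
        if match:
            # The joined capture groups name the combined output file.
            groups["".join(match.groups())].append(files[key])
    return {name: "".join(parts) for name, parts in groups.items()}


print(group_small_files(SMALL_FILES, r".*(hour=\d+).*"))
```

Fewer, larger objects mean fewer S3 requests and less per-file task overhead for Hadoop and Spark.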
12.
Amazon Redshift Spectrum: Exabyte-scale query in place
• Structured data with joins
• Multiple on-demand clusters to scale concurrency
• Columnar file formats
• Data partitioning
• Better query performance with predicate pushdown
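Partitioning pays off because the engine can skip whole prefixes whose partition values fail the query predicate before reading any data. A minimal sketch of that pruning step, assuming Hive-style `key=value` prefixes (the layout commonly used for Spectrum and Athena partitions) and hypothetical object keys.

```python
def parse_partitions(key):
    """Extract Hive-style partition values, e.g. 'year=2018' -> {'year': '2018'}."""
    parts = {}
    for segment in key.split("/")[:-1]:  # the last segment is the file name
        if "=" in segment:
            name, value = segment.split("=", 1)
            parts[name] = value
    return parts


def prune(keys, predicate):
    """Keep only objects whose partition values satisfy `predicate`."""
    return [k for k in keys if predicate(parse_partitions(k))]


KEYS = [
    "sales/year=2017/month=12/part-0.parquet",
    "sales/year=2018/month=01/part-0.parquet",
    "sales/year=2018/month=02/part-0.parquet",
]

# Similar in spirit to: WHERE year = '2018' AND month = '02'
print(prune(KEYS, lambda p: p.get("year") == "2018" and p.get("month") == "02"))
```

Only the surviving objects are scanned; predicate pushdown then filters rows inside them.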
13.
Amazon Athena: Query without ETL
• Serverless service
• Schema on read
• Compress datasets
• Columnar file formats
• Optimize file sizes
• Optimized querying (Presto backend)
• Query data in Glacier (coming)
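Because Athena is serverless, running a query is just an API call plus polling for completion. A minimal boto3 sketch; the client is passed in, and the database, table, and S3 output location in the usage comment are hypothetical.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run `sql` with Amazon Athena and wait for it to finish.

    `athena` is a boto3 Athena client, e.g. boto3.client("athena").
    Returns the query execution ID; raises RuntimeError on failure.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)  # still QUEUED or RUNNING


# Usage (hypothetical names):
#   import boto3
#   athena = boto3.client("athena")
#   qid = run_athena_query(athena, "SELECT COUNT(*) FROM logs", "datalake",
#                          "s3://my-athena-results/")
#   rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
```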
14.
Today, all of these tools retrieve a lot of data they don't need, and they do the heavy lifting.
15.
Today: you need to… the entire object from Amazon Glacier to Amazon S3 and then use it.
[Diagram: Amazon Glacier → Amazon S3]
16.
Amazon S3 Select and Amazon Glacier Select
Select a subset of data from an object based on a SQL expression
17.
Motivation behind S3 Select
"GET all the data from S3 objects, and my application will filter the data that I need."
Redshift Spectrum example:
• Beta customer: ran 50,000 queries
• Amount of data fetched from S3: 6 PB
• Amount of data used in Redshift: 650 TB
• Data actually needed from S3: about 10%
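The 10% figure follows from the slide's own numbers, assuming decimal units (1 PB = 1,000 TB):

```latex
\frac{650\ \text{TB}}{6\ \text{PB}} = \frac{650\ \text{TB}}{6000\ \text{TB}} \approx 0.108,
\qquad 6000\ \text{TB} - 650\ \text{TB} = 5350\ \text{TB fetched but never used}
```

Roughly 89% of the bytes transferred were discarded after the fact, which is the waste S3 Select targets.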
18.
Amazon S3 Select
SELECT a filtered set of data from within an object using standard SQL statements
• First content-aware API within Amazon S3
• Unlike Amazon Athena and Redshift Spectrum, operates within the Amazon S3 system
• A SQL statement operates on one object at a time, not across a group of objects
• Works and scales like GET requests
• Accessible via the SDKs (Java, Python), the AWS CLI, and a Presto connector; others to follow
• Who will use it?
  • Amazon Redshift Spectrum, Amazon Athena, Presto, and other custom query engines
  • Everyone doing log mining
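A minimal boto3 sketch of calling S3 Select on a single object. Matching rows come back as an event stream of byte chunks rather than a single body. The client is passed in; the bucket, key, and column positions in the usage comment are hypothetical, and `FileHeaderInfo: NONE` assumes the CSV has no header row, so columns are addressed positionally as `_1`, `_2`, and so on.

```python
def select_csv(s3, bucket, key, expression):
    """Run an S3 Select SQL expression against one CSV object.

    `s3` is a boto3 S3 client, e.g. boto3.client("s3"). Only matching rows
    are returned; they arrive as an event stream of byte chunks.
    """
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}, "CompressionType": "NONE"},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    for event in response["Payload"]:
        if "Records" in event:  # other events: Stats, Progress, Cont, End
            chunks.append(event["Records"]["Payload"])
    # A row can be split across chunks, so join the bytes before decoding.
    return b"".join(chunks).decode("utf-8")


# Usage (hypothetical bucket, key, and columns):
#   import boto3
#   rows = select_csv(boto3.client("s3"), "my-data-lake", "logs/2018-06-01.csv",
#                     "SELECT s._1 FROM S3Object s WHERE s._4 = 'x'")
```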
19.
Amazon S3 Select
Input format: delimited text (CSV, TSV), JSON …
Compression: GZIP …
Output format: delimited text (CSV, TSV), JSON …
Supported SQL:
• Clauses: SELECT, FROM, WHERE
• Data types: String, Integer, Float, Decimal, Timestamp, Boolean
• Operators: Conditional, Math, Logical, String (LIKE, ||)
• Functions: String, Cast, Math, Aggregate
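The input formats, compression, and output formats listed above map onto the `InputSerialization`, `CompressionType`, and `OutputSerialization` parameters of boto3's `select_object_content`. The helper below is hypothetical. It assumes CSV input has a header row and that JSON input is newline-delimited (one document per line).

```python
def select_request(bucket, key, expression, input_format="CSV",
                   compression="NONE", output_format="CSV"):
    """Build keyword arguments for boto3's s3.select_object_content."""
    inputs = {
        "CSV": {"CSV": {"FileHeaderInfo": "USE"}},  # first row holds column names
        "JSON": {"JSON": {"Type": "LINES"}},        # one JSON document per line
    }
    outputs = {"CSV": {"CSV": {}}, "JSON": {"JSON": {}}}
    # Copy the input settings and add the compression codec, e.g. "GZIP".
    input_serialization = dict(inputs[input_format], CompressionType=compression)
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": input_serialization,
        "OutputSerialization": outputs[output_format],
    }


# Usage (hypothetical object), reading gzipped JSON lines and returning JSON:
#   s3.select_object_content(**select_request(
#       "my-data-lake", "events/2018-06-01.json.gz",
#       "SELECT s.user FROM S3Object s WHERE s.status = 500",
#       input_format="JSON", compression="GZIP", output_format="JSON"))
```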
20.
Amazon S3 Select: Simple pattern matches
Before, download the whole object and filter it locally:
    …get-object …object… | awk -F ',' '{ if ($4 == "x") print $1 }'
After, let S3 do the filtering:
    ...select-object …object… 'SELECT o._1 WHERE o._4 == "x"…'
21.
Amazon S3 Select: Serverless applications
[Diagram: a new object in Amazon S3 triggers an AWS Lambda function, which runs S3 Select and sends results to Amazon SNS]
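The S3 → Lambda → S3 Select → SNS pattern can be sketched as a Lambda handler for S3 "object created" notifications. This is a minimal sketch under stated assumptions: the boto3 clients are injected through a hypothetical `make_handler` factory so the logic can be tested, the objects are headerless CSV, and the topic ARN and filter expression in the usage comment are hypothetical.

```python
from urllib.parse import unquote_plus


def make_handler(s3, sns, topic_arn, expression):
    """Return a Lambda handler that runs `expression` with S3 Select on each
    newly created object and publishes any matching rows to SNS."""

    def handler(event, context):
        published = 0
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            response = s3.select_object_content(
                Bucket=bucket,
                Key=key,
                ExpressionType="SQL",
                Expression=expression,
                InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
                OutputSerialization={"CSV": {}},
            )
            rows = b"".join(
                e["Records"]["Payload"] for e in response["Payload"] if "Records" in e
            ).decode("utf-8")
            if rows:  # only notify when the filter matched something
                sns.publish(TopicArn=topic_arn, Message=rows)
                published += 1
        return {"published": published}

    return handler


# In the Lambda module (hypothetical topic ARN and filter):
#   import boto3
#   handler = make_handler(boto3.client("s3"), boto3.client("sns"),
#                          "arn:aws:sns:us-east-1:123456789012:alerts",
#                          "SELECT * FROM S3Object s WHERE s._4 = 'ERROR'")
```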
22.
Amazon S3 Select: Serverless MapReduce
Before: 200 seconds and 11.2 cents
    # Download and process all keys
    for key in src_keys:
        response = s3_client.get_object(Bucket=src_bucket, Key=key)
        contents = response['Body'].read().decode('utf-8')
        for line in contents.split('\n')[:-1]:
            line_count += 1
            try:
                data = line.split(',')
                srcIp = data[0][:8]
                ….
After: 95 seconds and 2.8 cents
    # Select IP address and keys
    for key in src_keys:
        response = s3_client.select_object_content(
            Bucket=src_bucket, Key=key,
            ExpressionType='SQL',
            Expression='SELECT SUBSTRING(obj._1, 1, 8), obj._2 FROM s3object AS obj',
            InputSerialization={'CSV': {}},
            OutputSerialization={'CSV': {}})
        # Results arrive as an event stream; join the record chunks first
        contents = b''.join(event['Records']['Payload']
                            for event in response['Payload'] if 'Records' in event)
        for line in contents.decode('utf-8').splitlines():
            line_count += 1
            try:
                ….
About 2X faster at 1/4 of the cost
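The slide shows only the map side of the job. A minimal sketch of both halves, assuming (as the expression above does) that column 1 of each CSV object holds an IP address: S3 Select returns just the 8-character prefix per row (map), and a `Counter` sums the prefixes across objects (reduce). The function name and aggregation are illustrative, not the original job's code.

```python
from collections import Counter


def count_by_ip_prefix(s3_client, bucket, keys):
    """Map: S3 Select returns only an 8-character IP prefix per row.
    Reduce: sum the per-prefix counts across all objects."""
    totals = Counter()
    for key in keys:
        response = s3_client.select_object_content(
            Bucket=bucket,
            Key=key,
            ExpressionType="SQL",
            Expression="SELECT SUBSTRING(obj._1, 1, 8) FROM S3Object AS obj",
            InputSerialization={"CSV": {}},
            OutputSerialization={"CSV": {}},
        )
        # Join all record chunks so rows split across events stay intact.
        body = b"".join(
            e["Records"]["Payload"] for e in response["Payload"] if "Records" in e
        )
        totals.update(line for line in body.decode("utf-8").splitlines() if line)
    return totals


# Usage (hypothetical bucket and keys):
#   import boto3
#   counts = count_by_ip_prefix(boto3.client("s3"), "my-logs", ["a.csv", "b.csv"])
#   print(counts.most_common(10))
```

In a real serverless MapReduce, each key (or batch of keys) would be handled by its own Lambda invocation and the partial counters merged afterwards; the loop here keeps the sketch self-contained.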
23.
Demo – S3 Select Timing
24.
Amazon S3 Select with Presto
• Works with your existing Hive Metastore
• Automatically converts predicates into S3 Select requests
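Converting predicates into S3 Select requests means rewriting simple column filters as the WHERE clause of an S3 Select expression. The helper below sketches that translation. It is hypothetical and far simpler than Presto's real planner: it assumes comparisons on positional CSV columns, ANDs them together, and casts numeric comparisons because CSV fields are read as strings.

```python
def to_s3_select(columns, predicates):
    """Translate (column_index, operator, value) predicates into S3 Select SQL.

    `columns` are 1-based CSV column positions to return; predicates are ANDed.
    """
    allowed = {"=", "<>", "<", "<=", ">", ">="}
    projection = ", ".join(f"s._{i}" for i in columns)
    clauses = []
    for index, op, value in predicates:
        if op not in allowed:
            raise ValueError(f"operator {op!r} cannot be pushed down")
        if isinstance(value, str):
            column = f"s._{index}"
            literal = "'" + value.replace("'", "''") + "'"  # escape quotes
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # CSV fields are strings, so cast before comparing numerically.
            sql_type = "INT" if isinstance(value, int) else "FLOAT"
            column = f"CAST(s._{index} AS {sql_type})"
            literal = repr(value)
        else:
            raise TypeError(f"unsupported literal: {value!r}")
        clauses.append(f"{column} {op} {literal}")
    sql = f"SELECT {projection} FROM S3Object s"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql


# A filter the engine cannot translate (say, a UDF) would stay in the engine
# and run after the rows come back from S3.
print(to_s3_select([1], [(4, "=", "x")]))
```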
25.
Amazon S3 Select: Accelerating big data
[Chart: query performance before and after S3 Select]
After: 5X faster with 1/40 of the CPU
26.
Using Amazon Glacier Select
27.
How Amazon Glacier Select Works
28.
Delivering Results Faster
29.
Optimizing data lake performance
• Aggregate small files: EMR S3DistCp, Amazon Kinesis Firehose
• S3 Select: big data cheaper and faster, up to 400% faster
• Data formats: columnar formats
• EMRFS consistent view (Amazon S3 + Amazon DynamoDB)
30.
Amazon Kinesis: Real time
Easily collect, process, and analyze video and data streams in real time
• Kinesis Video Streams (new): capture, process, and store video streams for analytics
• Kinesis Data Streams: build custom applications that analyze data streams
• Kinesis Data Firehose: load data streams into AWS data stores
• Kinesis Data Analytics: analyze data streams with SQL
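Loading a stream into the data lake with Kinesis Data Firehose can start from a producer like this minimal boto3 sketch. It batches records to stay under PutRecordBatch's 500-records-per-call limit and returns the records Firehose reports as failed so the caller can retry them. The client is passed in, the delivery stream name is hypothetical, and the per-request size limit is not enforced here.

```python
def send_to_firehose(firehose, stream_name, records, batch_size=500):
    """Send byte records to a Kinesis Data Firehose delivery stream.

    `firehose` is a boto3 client, e.g. boto3.client("firehose").
    Returns the records that failed so the caller can retry them.
    """
    if not 1 <= batch_size <= 500:
        raise ValueError("PutRecordBatch accepts at most 500 records per call")
    failed = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        response = firehose.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=[{"Data": data} for data in batch],
        )
        if response["FailedPutCount"]:
            # RequestResponses lines up one-to-one with the submitted records.
            for data, result in zip(batch, response["RequestResponses"]):
                if "ErrorCode" in result:
                    failed.append(data)
    return failed


# Usage (hypothetical stream name):
#   import boto3
#   leftovers = send_to_firehose(boto3.client("firehose"), "clickstream-to-s3",
#                                [b'{"page": "/home"}\n'])
```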
31.
Data preparation accounts for ~80% of the work
[Chart: breakdown of data science work into building training sets, cleaning and organizing data, collecting data sets, mining data for patterns, refining algorithms, and other]
32.
AWS Glue: Serverless data catalog & ETL service
• Data Catalog: discover data and extract schema
  • Automatically discovers data and stores schema
  • Makes data searchable and available for ETL
• ETL job authoring: auto-generates customizable ETL code in Python and Spark
  • Generates customizable code
  • Schedules and runs your ETL jobs
  • Serverless
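Glue normally schedules jobs itself through triggers. This minimal boto3 sketch instead starts an existing job on demand and polls it until it leaves the running states. The job name and argument in the test and usage are hypothetical, and the job script (for example one Glue generated and you customized) is assumed to exist already.

```python
import time

# States in which a Glue job run is still in progress.
IN_PROGRESS = {"STARTING", "RUNNING", "STOPPING", "WAITING"}


def run_glue_job(glue, job_name, arguments=None, poll_seconds=30.0):
    """Start an AWS Glue ETL job and wait until it finishes.

    `glue` is a boto3 Glue client, e.g. boto3.client("glue").
    Returns the final JobRunState, such as "SUCCEEDED" or "FAILED".
    """
    kwargs = {"JobName": job_name}
    if arguments:
        kwargs["Arguments"] = arguments  # job parameters, keys start with "--"
    run_id = glue.start_job_run(**kwargs)["JobRunId"]
    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        if run["JobRunState"] not in IN_PROGRESS:
            return run["JobRunState"]
        time.sleep(poll_seconds)


# Usage (hypothetical job and parameter):
#   import boto3
#   state = run_glue_job(boto3.client("glue"), "raw-to-parquet",
#                        {"--day": "2018-06-01"})
```

Treating any state outside the in-progress set as final keeps the loop from hanging if the service reports a state this sketch does not list.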
33.
Amazon SageMaker (GA)
The quickest and easiest way to get ML models from idea to production
• End-to-end machine learning platform
• Zero setup
• Flexible model training
• Pay by the second
34.
Planning for the Future
35.
[Reference architecture diagram: Collect → Store → Analyze → Visualize, captioned "Evolve as needed!". Sources include mobile apps (iOS, Android), web apps, IoT, applications, and logging (Logstash), producing transactional, stream, file, and search data. Stores include Amazon RDS (SQL), Amazon DynamoDB (NoSQL), Amazon ElastiCache (cache), Amazon ES (search), Apache Kafka and Amazon Kinesis (stream storage), and Amazon S3 and Amazon Glacier (file storage), arranged from hot to cold. Analysis includes Amazon Redshift, Impala, Pig, and Amazon Elastic MapReduce (batch and interactive), Amazon Kinesis and AWS Lambda (stream processing), and Amazon ML (predictions). Results reach users through Amazon QuickSight, notebooks, IDEs, and apps & APIs.]
36.
AWS Training Offer
Make your data-driven decisions count, and make a career in Big Data on AWS. Follow the Big Data Specialty learning path and become a specialist in Big Data:
• Implement core AWS Big Data services according to best practices
• Design and maintain Big Data
• Leverage tools to automate data analysis
Certification path: Certified Cloud Practitioner → Associate-level Certification → AWS Certified Big Data - Specialty
Training: Free AWS digital training: Foundational knowledge; Free AWS digital training: Big Data Technology Fundamentals; Big Data on AWS (3-day classroom training)
Who should attend:
• Enterprise solutions architects
• Data scientists
• Big Data solutions architects
• Data analysts
Visit www.aws.training to find out more.
37.
Thank you for attending the AWS Data Driven Decisions Webinar Series. We hope you found it interesting!
A kind reminder to complete the survey: let us know what you thought of today's event and how we can improve the event experience for you in the future.
aws-apac-marketing@amazon.com
twitter.com/AWSCloud
facebook.com/AmazonWebServices
youtube.com/user/AmazonWebServices
slideshare.net/AmazonWebServices
twitch.tv/aws