How to Build a Data Lake | AWS Summit Tel Aviv 2019

AWS makes it easy to build and operate highly scalable, flexible data platforms that collect, process, and analyze data, so you can get timely insights and react quickly to new information. In this session we talk about how to improve over time using your data: how to turn everyday data into relevant business insights, continuously improve your business processes, and keep innovating based on your data.

  1. 1. How To Build a Data Lake (session DEV305). Eden Perry, Solutions Architect, Amazon Web Services; Adir Sharabi, Solutions Architect, Amazon Web Services. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. 2. Agenda
       • AWSome Airlines Recap
       • Introduction to Data Lakes
       • AWS Data Platform Services and Data Lake Patterns
       • Data Lake in Action: Building a Data Lake for AWSome Airlines and Developing Dashboards with Amazon QuickSight
  3. 3. AWSome Airlines Recap
  5. 5. AWSome Airlines Operational Dashboard
  6. 6. AWSome Airlines Operational Dashboard
  7. 7. AWSome Airlines High-Level Architecture (diagram). Labels: Frontend; Data Microservices; Common Interfaces; Machine Learning Services; Serverless Scheduler; Data Lake and Analytics; Flights; Resources.
  8. 8. What About the Data? Resources; Departures; IoT Devices; Weather Data; Crews & Teams.
  9. 9. AWSome Airlines Business Requirements
       1. Establish a robust data pipeline that captures and stores all the data generated at AWSome Airlines
       2. Provide business insights from the collected data, track KPIs, and gain deep visibility in order to optimize the business flows
  10. 10. Introduction to Data Lakes
  11. 11. Data Lake Definition: A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.
  12. 12. Data Lake Main Concepts
       • All data in one place, a single source of truth
       • Supports different formats: structured, semi-structured, unstructured, and raw data
       • Supports fast ingestion and consumption
       • Schema on read
       • Designed for low-cost storage
       • Decouples storage and compute
       • Supports protection and security rules
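
"Schema on read" means raw records are stored as they arrive and a schema is applied only when the data is read. The following is a minimal sketch of that idea in plain Python, not taken from the talk; the field names (`flight_id`, `delay_minutes`, `gate`) and the sample records are hypothetical.

```python
import json

# Raw events are stored exactly as they arrived; nothing is validated on write.
# These sample records are hypothetical.
RAW_LINES = [
    '{"flight_id": "AW101", "delay_minutes": "12", "gate": "B4"}',
    '{"flight_id": "AW202", "delay_minutes": 0}',
    '{"flight_id": "AW303"}',
]

# The schema lives with the reader, not with the stored data.
# Each entry maps a field name to (type converter, default when missing).
SCHEMA = {
    "flight_id": (str, None),
    "delay_minutes": (int, 0),
}


def read_with_schema(lines, schema):
    """Parse raw JSON lines, keep only schema fields, and cast their values."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field, default)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


rows = read_with_schema(RAW_LINES, SCHEMA)
```

A different consumer could read the same raw lines with a different `SCHEMA` (for example, one that keeps `gate`) without rewriting the stored data.
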
  14. 14. Simplified Data Pipeline (diagram). Stages: Data Sources; Ingest; Store (Amazon S3); Catalog; Process & Analyze; Consume.
  15. 15. Multiple Data Sources (diagram). Data sources: Amazon DynamoDB; web logs / cookies; ERP; connected devices. Other stages: Ingest; Store (Amazon S3); Catalog; Process & Analyze; Consume.
  16. 16. Amazon DynamoDB
       • Fully managed, multi-region, multi-master database
       • Nonrelational database that delivers reliable performance at any scale
       • Consistent single-digit millisecond latency
       • Built-in security, backup and restore, and in-memory caching
       • Supports streams
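
As an illustration of the streams support mentioned above, here is a hedged sketch of an AWS Lambda-style handler that reads a DynamoDB Streams batch and converts the typed attribute values into plain Python values. The event shape (`Records`, `eventName`, `dynamodb.NewImage`) follows the DynamoDB Streams record format; the item attributes and the sample event are hypothetical, and only a subset of attribute types is handled.

```python
def from_attribute_value(attribute_value):
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    (type_tag, value), = attribute_value.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        # Numbers arrive as strings in the DynamoDB JSON format.
        return float(value) if "." in value else int(value)
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_attribute_value(v) for v in value]
    if type_tag == "M":
        return {k: from_attribute_value(v) for k, v in value.items()}
    raise ValueError(f"unsupported attribute type in this sketch: {type_tag}")


def handler(event, context=None):
    """Return (event name, plain item) pairs for each change in the batch."""
    changes = []
    for record in event["Records"]:
        # NewImage is only present when the stream view type includes it
        # and the change is not a delete.
        image = record["dynamodb"].get("NewImage", {})
        item = {k: from_attribute_value(v) for k, v in image.items()}
        changes.append((record["eventName"], item))
    return changes


# Hypothetical sample batch: one insert and one delete.
SAMPLE_EVENT = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"flight_id": {"S": "AW101"}},
                "NewImage": {
                    "flight_id": {"S": "AW101"},
                    "delay_minutes": {"N": "12"},
                    "boarding": {"BOOL": True},
                },
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"flight_id": {"S": "AW202"}}},
        },
    ]
}

changes = handler(SAMPLE_EVENT)
```
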
  17. 17. Ingestion Options (diagram). Ingest: Amazon Kinesis; AWS Snowball; Amazon MSK; Database Migration Service. Data sources: Amazon DynamoDB; web logs / cookies; ERP; connected devices. Other stages: Store (Amazon S3); Catalog; Process & Analyze; Consume.
  18. 18. Amazon Kinesis
       • Real-time processing
       • High throughput; elastic
       • Easy to use
       • Integrated with Amazon EMR, Amazon S3, Amazon Redshift, and DynamoDB
  19. 19. Amazon Kinesis: Streaming Data Made Easy
       • Amazon Kinesis Data Streams: for technical developers. Build your own custom applications that process or analyze streaming data.
       • Amazon Kinesis Data Firehose: for all developers and data scientists. Easily load massive volumes of streaming data into S3, Amazon Redshift, and Amazon Elasticsearch.
       • Amazon Kinesis Data Analytics: for all developers and data scientists. Easily analyze data streams using standard SQL queries.
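
For readers building custom applications on Kinesis Data Streams, it helps to know how a record reaches a shard: the service hashes the record's partition key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The sketch below, which is not from the talk, imitates that routing locally. It assumes shards that split the hash space evenly; the helper names and the sample key are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count):
    """Split the hash space into contiguous, inclusive (start, end) ranges."""
    size = HASH_SPACE // shard_count
    ranges = []
    for index in range(shard_count):
        start = index * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if index == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the shard whose range contains the key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash value outside all shard ranges")


ranges = even_shard_ranges(4)
shard_index = shard_for("AW101", ranges)
```

Because the same partition key always hashes to the same value, all records for one key (for example, one flight) land on the same shard and keep their order there.
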
  20. 20. Amazon Kinesis: Streaming Data Made Easy
       • Amazon Kinesis Data Streams: for technical developers. Build your own custom applications that process or analyze streaming data.
       • Amazon Kinesis Data Analytics: for all developers and data scientists. Easily analyze data streams using standard SQL queries.
       • Amazon Kinesis Data Firehose: for all developers and data scientists. Easily load massive volumes of streaming data into S3, Amazon Redshift, and Amazon Elasticsearch.
  21. 21. Amazon Kinesis + AWS Lambda
       • Amazon Kinesis Data Firehose: for all developers and data scientists. Easily load massive volumes of streaming data into S3, Amazon Redshift, and Amazon Elasticsearch.
       • AWS Lambda: run your code without provisioning servers. Lets you process and transform records on the fly.
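
To make "transform records on the fly" concrete, here is a minimal sketch of a Lambda function used as a Kinesis Data Firehose data transformation. Firehose passes a batch under `records`, each with a `recordId` and base64-encoded `data`, and expects each record back with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The payload fields (`delay_minutes`, `delayed`) and the 15-minute threshold are hypothetical, and the sketch assumes `delay_minutes` is numeric.

```python
import base64
import json

DELAY_THRESHOLD_MINUTES = 15  # hypothetical business rule


def handler(event, context=None):
    """Add a 'delayed' flag to each JSON record in a Firehose batch."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Return the original data so Firehose can route the failure.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        payload["delayed"] = payload.get("delay_minutes", 0) > DELAY_THRESHOLD_MINUTES
        # A trailing newline keeps records separable once they land in S3.
        data = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(data).decode("ascii"),
        })
    return {"records": output}
```
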
  22. 22. Storage Layer (diagram). Store: Amazon S3. Ingest: Amazon Kinesis; AWS Snowball; Amazon MSK; Database Migration Service. Data sources: Amazon DynamoDB; web logs / cookies; ERP; connected devices. Other stages: Catalog; Process & Analyze; Consume.
  23. 23. Amazon S3 is the Base
       • Secure, highly scalable, durable object storage with millisecond latency for data access
       • Store any type of data, from websites, mobile apps, corporate applications, and IoT sensors, at any scale
       • Store data in the format you want: unstructured (logs, dump files) | semi-structured (JSON, XML) | structured (CSV, Parquet)
       • Storage lifecycle integration: Amazon S3 Standard | Amazon S3 Infrequent Access | Amazon Glacier
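
The storage lifecycle tiers above can be expressed as an S3 lifecycle rule. The sketch below only builds the configuration dictionary; it does not call AWS. The dictionary uses the shape accepted by boto3's `put_bucket_lifecycle_configuration`, while the function name, the `raw/` prefix, and the 30-day and 90-day thresholds are illustrative assumptions.

```python
def lifecycle_configuration(prefix, ia_after_days=30, glacier_after_days=90):
    """Build a lifecycle rule: Standard -> Standard-IA -> Glacier for a prefix."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the Standard-IA transition")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = lifecycle_configuration("raw/")

# With boto3 installed and credentials configured, this could be applied as
# (bucket name is hypothetical):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="awsome-airlines-data-lake", LifecycleConfiguration=config)
```
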
  24. 24. Data Discovery and Catalog (diagram). Catalog: AWS Glue. Store: Amazon S3. Ingest: Amazon Kinesis; AWS Snowball; Amazon MSK; Database Migration Service. Data sources: Amazon DynamoDB; web logs / cookies; ERP; connected devices. Other stages: Process & Analyze; Consume.
  25. 25. AWS Glue - Serverless Data Catalog and ETL
       • Automatically discovers data and stores schema
       • Data searchable and available for ETL
       • Generates customizable code
       • Schedules and runs your ETL jobs
       • Serverless
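
To give a feel for what "automatically discovers data and stores schema" involves, here is a toy schema-inference sketch over newline-delimited JSON. It is not AWS Glue's actual classifier logic; the type names, the widening rules, and the sample records are assumptions chosen for illustration.

```python
import json

# Hypothetical mapping from Python types to catalog-style type names.
TYPE_NAMES = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def infer_schema(json_lines):
    """Infer a column -> type-name mapping from newline-delimited JSON records."""
    schema = {}
    for line in json_lines:
        for column, value in json.loads(line).items():
            if value is None:
                continue  # a null tells us nothing about the column type
            # Nested lists and objects fall back to "string" in this sketch.
            observed = TYPE_NAMES.get(type(value), "string")
            previous = schema.get(column)
            if previous is None or previous == observed:
                schema[column] = observed
            elif {previous, observed} == {"bigint", "double"}:
                schema[column] = "double"  # widen integers to doubles
            else:
                schema[column] = "string"  # conflicting types: keep as text
    return schema


SAMPLE_LINES = [
    '{"flight_id": "AW101", "delay_minutes": 12}',
    '{"flight_id": "AW202", "delay_minutes": 3.5, "cancelled": false}',
]

schema = infer_schema(SAMPLE_LINES)
```
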
  26. 26. Process and Analyze (diagram). Process & Analyze: Amazon Athena; Amazon EMR; Amazon Redshift; Amazon Elasticsearch. Catalog: AWS Glue. Store: Amazon S3. Ingest: Amazon Kinesis; AWS Snowball; Amazon MSK; Database Migration Service. Data sources: Amazon DynamoDB; web logs / cookies; ERP; connected devices. Other stages: Consume.
  27. 27. Amazon Athena - Interactive Analysis
       • Interactive query service to analyze data in Amazon S3 using standard SQL
       • No infrastructure to set up or manage and no data to load
       • Supports multiple data formats: define schema on demand
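
A query against data in S3 is submitted to Athena as a SQL string plus a database name and an S3 location for results. The sketch below only assembles those request parameters in the shape accepted by boto3's `start_query_execution`; it does not call AWS. The table, column, database, and bucket names are hypothetical.

```python
def athena_query_request(database, table, output_location, limit=10):
    """Build request parameters for an Athena query over a cataloged table."""
    # Reject anything that is not a plain identifier to avoid SQL injection.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    query = (
        "SELECT flight_id, avg(delay_minutes) AS avg_delay "
        f'FROM "{table}" '
        "GROUP BY flight_id "
        "ORDER BY avg_delay DESC "
        f"LIMIT {int(limit)}"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


request = athena_query_request(
    database="awsome_airlines",          # hypothetical catalog database
    table="flight_events",               # hypothetical table
    output_location="s3://example-athena-results/",  # hypothetical bucket
)

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("athena").start_query_execution(**request)
```
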
  28. 28. Querying the Data Lake (diagram). Consume: BI tools; Jupyter Notebooks; Amazon API Gateway; Amazon QuickSight. Process & Analyze: Amazon Athena; Amazon EMR; Amazon Redshift; Amazon Elasticsearch. Catalog: AWS Glue. Store: Amazon S3. Ingest: Amazon Kinesis; AWS Snowball; Amazon MSK; Database Migration Service. Data sources: Amazon DynamoDB; web logs / cookies; ERP; connected devices.
  29. 29. Amazon QuickSight
       • Supports a variety of data sources and targets
       • Fully managed and scalable
       • Super fast and easy to use
       • Cost-effective
  30. 30. Data Lake in Action: Building a Data Lake for AWSome Airlines and Developing Dashboards with Amazon QuickSight
  31. 31. AWSome Airlines Business Requirements
       1. Establish a robust data pipeline that captures and stores all the data generated at AWSome Airlines
       2. Provide business insights from the collected data, track KPIs, and gain deep visibility in order to optimize the business flows
  32. 32. Building Blocks for AWSome Airlines Data Lake
  36. 36. What have we learned?
       • What a data lake is and when we need to build one
       • AWS data lake building blocks and patterns
       • How to use Amazon QuickSight to visualize data and transform it into business insights
       • Reach out to your AWS contact or to AWS Partners and start building your data lake!
  37. 37. Thank you! Eden Perry (@edenperr), Adir Sharabi (@adirs). http://bit.ly/2SGp8Ls
