Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Architecting an Open Data Lake for the Enterprise

2,080 views

Published on

Learn how to reduce development time and innovate on AWS. In this webinar, Beachbody - sellers of fitness, weight loss, and muscle-building home-exercise videos - talks about their experience migrating to a data lake on Amazon Simple Storage Service (Amazon S3) using Talend. Beachbody will describe how they created an open enterprise data platform, giving their employees access to secure, well-governed data, and increasing DevOps efficiency across the entire company.

  • Do This Simple 2-Minute Ritual To Loss 1 Pound Of Belly Fat Every 72 Hours  http://ishbv.com/bkfitness3/pdf
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Do This Simple 2-Minute Ritual To Loss 1 Pound Of Belly Fat Every 72 Hours ♣♣♣ https://tinyurl.com/y6qaaou7
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • only 2 cups a day for 1 week for a flat stomach ♣♣♣ http://ishbv.com/bkfitness3/pdf
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • For AWS Online Course Coupons and Discounts Visit http://www.todaycourses.com
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Architecting an Open Data Lake for the Enterprise

  1. 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. October 31, 2017 | 10:00 AM PT Architecting an Open Data Lake for the Enterprise © 2017, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. 2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Today’s Presenters Pratap Ramamurthy, Solutions Architect, Amazon Web Services Ashwin Viswanath, Director, Cloud Product Marketing, Talend Eric Anderson, Executive Director, Data, Beachbody
  3. 3. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Today’s Agenda • An overview of AWS and AWS Marketplace, with an emphasis on AWS data lake solutions and Talend • Overview of the Talend solutions featured in our story • Challenges faced by Beachbody • The Beachbody success story with AWS and Talend • Q&A/Discussion
  4. 4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Learning Objectives: 1. How to migrate a variety of structured and unstructured data sources to a data lake 2. How to shorten development and testing cycles 3. How to mitigate complex deployment challenges common to real-time data 4. How to take advantage of Spark and Hadoop by generating native code
  5. 5. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. The Data Lake and AWS Drive business value with any type of data
  6. 6. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Legacy Data Warehouses & RDBMS • Complex to setup and manage • Do not scale • Takes months to add new data sources • Queries take too long • Cost $MM upfront
  7. 7. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Should I Build a Data Lake? Starting by amassing "all your data" and dumping into a large repository for the data gurus to start finding "insights" is like trying to win the lottery by buying all the tickets
  8. 8. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Rethink How to Become a Data-driven Business • Business outcomes - start with the insights and actions you want to drive, then work backwards to a streamlined design • Experimentation - start small, test many ideas, keep the good ones and scale those up, paying only for what you consume • Agile and timely - deploy data processing infrastructure in minutes, not months. take advantage of a rich platform of services to respond quickly to changing business
  9. 9. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Business Case Determines Platform Design Ingest/ Collect Consume/ visualize Store Process/ analyze Data 1 4 0 9 5 Answers & Insights START HERE WITH A BUSINESS CASE
  10. 10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Experiment and Scale Based on Your Business Needs MATCH AVAILABLE DATA Metrics and Monitoring Workflow Logs ERP Transactions Ingest/ Collect Consume/ visualize Store Process/ analyze Data 1 4 0 9 5 Answers & Insights
  11. 11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Business Outcomes on a Modern Data Architecture Outcome 1 : Modernize and consolidate • Insights to enhance business applications and create new digital services Outcome 2 : Innovate for new revenues • Personalization, demand forecasting, risk analysis Outcome 3 : Real-time engagement • Interactive customer experience, event-driven automation, fraud detection Outcome 4 : Automate for expansive reach • Automation of business processes and physical infrastructure
  12. 12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Use an Optimal Combination of Highly Interoperable Services
  13. 13. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Why Amazon S3 for Modern Data Architecture? Designed for 11 9s of durability Designed for 99.99% availability Durable Available High performance  Multiple upload  Range GET  Store as much as you need  Scale storage and compute independently  No minimum usage commitments Scalable  Amazon EMR  Amazon Redshift  Amazon DynamoDB  Amazon Athena IntegratedEasy to use  Simple REST API  AWS SDKs  Read-after-create consistency  Event notification  Lifecycle policies
  14. 14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Decouple Storage and Compute • Legacy design was large databases or data warehouses with integrated hardware • Big Data architectures often benefit from decoupling storage and compute
  15. 15. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Improving Data Agility with Talend Ashwin Viswanath, Director of Cloud Product Marketing, Talend
  16. 16. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Lakes: The First Phase and the Future First phase: Capture and store raw data of many different types at scale Next phase: Augment enterprise data warehousing strategies
  17. 17. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Why Data Lake Projects Fail No DevOps Practices for Scalability & Testing Lack of Expertise Siloed Operating Model Poor Data Governance Poor Architectural Design & Integration
  18. 18. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Foundational Elements of a Data Lake Data Preparation Self-service Data IngestMetadata Management Data Classification Data Lake Data GovernanceData Lineage Security Data Profiling
  19. 19. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Streamline DevOps Process for Big Data Custom Cluster Configuration • Retrieve Hadoop configuration data from job server • Upload configuration files to different clusters based on role: dev/test/prod • Enforce uniform security standards • Available for Spark and Spark Streaming jobs Portable integration jobs across your environment
  20. 20. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data Matching and Machine Learning on Spark • New Data Stewardship interface simplifies matching process • Improved performance through continuous matching speeding time to insight Harmonize data at scale by learning from your key experts
  21. 21. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Faster and Better Real-time Insights with Spark Enterprise – class robustness and Intelligent Integration • New Spark Support • Production-ready with Spark 2.1 • Toggle between Spark 1.X and 2.X • Easily upgrade to Spark 2.X • Natural Language Processing with Spark • Data Preparation for Spark Streaming • Talend Data Mapper runs with Spark Streaming • Spark Streaming support for Kerberized Kafka 0.10
  22. 22. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data Governance Complete End-to-end Data Lineage • Understand more about your unstructured data with new cloud and big data metadata bridges • Save time by automatically harvesting data structures to build a data lake inventory • Manage change with version control and notifications Metadata bridges S3, Hadoop HDFS, Hive, MongoDB, Couchbase, Cassandra, Apache Atlas Files systems Amazon S3, Hadoop HDFS, Unix, Windows, Linux File formats CSV, Excel, JSON, Avro, Parquet Know Your Data for Increased Data Protection, Accessibility and Compliance
  23. 23. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Beachbody – Fitness goes Big Data Driving innovation with Talend on AWS
  24. 24. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. About Beachbody • A leading provider of fitness, nutrition, and weight-loss programs • Creator of P90X® Series, INSANITY®, FOCUS T25®, 21 Day Fix®, Body Beast®, PiYo®, and Hip Hop Abs® • Empowers over 23 Million customers • Supports 350K+ independent “Coach” distributors • Operates with 800+ employees • Sees 5 million+ monthly unique visits across digital platforms • Reached $1 billion in gross sales in 2015
  25. 25. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. The Challenge - Do More Better, Faster, Cheaper
  26. 26. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Project Vision – Open Enterprise Data Lake • Build an OPEN Enterprise Data Platform • Open Source Technology: Bring Your Own Tool • Decentralized Data Ownership: Many teams can publish • Centralized people, processes, and tools available • Capture All Data as real-time as possible • Access to All raw + processed data by Authorized Users • HIPAA/PII encrypted or masked to for compliance • Shift Time from Collecting Data -> Analyzing Data
  27. 27. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. The Technology
  28. 28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Lake Architecture Amazon S3 Data Lake folder structure AWS
  29. 29. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Architectural Design
  30. 30. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Lake Component - Storage
  31. 31. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Lake Component – Data Pipeline
  32. 32. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Lake Component – RDBMS
  33. 33. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Lake Component – Compute
  34. 34. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Lake Component – Analytics
  35. 35. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Business Benefits • Reduced Data Acquisition Time by 5x • Improved Marketing Campaigns • Reduced Site Tagging Costs • Improved Employee Retention and Satisfaction • Automated Customer Self-Service Order Status • Identified Web Funnel Conversion Opportunities (testing now)
  36. 36. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Next steps and further information • Data Lake solution on AWS: https://aws.amazon.com/big-data/data-lake-on-aws/ • Take a Free 30-Day Trial of Talend Integration Cloud: https://iam.integrationcloud.talend.com/idp/federation/up/login • Try AWS for free: https://aws.amazon.com/
  37. 37. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Q & A
  38. 38. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you!

×