Automating Big Data Technologies for Faster Time-to-Value

Big data technologies can be extremely complex and require manual operation. If you can intelligently automate your big data operations, you can lower your costs, make your team more productive, scale more efficiently, and lower the risk of failure. Demandbase, creator of a targeting and personalization platform for business-to-business (B2B) companies, uses Qubole and a data lake on AWS to reduce the management complexity and cost of processing and analyzing its data. Hear how Qubole empowers Demandbase to analyze trillions of rows of structured and unstructured data in real time, making its data scientists and data engineers productive from day one.

  1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. November 1, 2017 | 11:00 AM PT. Automating Big Data Technologies for Faster Time-to-Value
  2. Today’s Presenters
     • David Potes, Solutions Architect, Amazon Web Services
     • Minesh Patel, Technical Director, Qubole
     • Seth Myers, Senior Data Scientist, Demandbase
  3. Today’s Agenda
     1. An overview of AWS and AWS Marketplace, with an emphasis on AWS data lake solutions and Qubole
     2. Overview of the Qubole solutions featured in our story
     3. Challenges faced by Demandbase
     4. The Demandbase success story with AWS and Qubole
     5. Q&A/Discussion
  4. Learning Objectives
     1. How to dramatically reduce management complexities for analytics operations
     2. How to reduce the costs of processing and analyzing data in a data lake on AWS
     3. How to operate at the scale and efficiency of a large enterprise, with a small data team
  5. Introduction to Data Lake Concepts
  6. Unlocking Data: Most companies and organizations are embarking on ambitious innovation initiatives to unlock their data. The data already exists but goes unused or is locked away from complementary data sets in isolated data silos.
  7. Enter Data Lake Architectures: A data lake is a new and increasingly popular architecture to store and analyze massive volumes and heterogeneous types of data.
  8. Benefits of a Data Lake – All Data in One Place: Store and analyze all of your data, from all of your sources, in one centralized location. “Why is the data distributed in many locations? Where is the single source of truth?”
  9. Benefits of a Data Lake – Quick Ingest: Quickly ingest data without needing to force it into a pre-defined schema. “How can I collect data quickly from various sources and store it efficiently?”
  10. Benefits of a Data Lake – Storage vs Compute: Separating your storage and compute allows you to scale each component as required. “How can I scale up with the volume of data being generated?”
  11. Benefits of a Data Lake – Schema on Read: “Is there a way I can apply multiple analytics and processing frameworks to the same data?” A data lake enables ad-hoc analysis by applying schemas on read, not write.
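The schema-on-read idea from slide 11 can be illustrated with a minimal sketch. It is not taken from the presentation: the event records, field names, and the `read_with_schema` helper are all hypothetical, and plain JSON strings stand in for objects stored in the lake. The point it shows is that the same raw records can be read through two different schemas without being rewritten.

```python
import json

# Raw events are stored exactly as produced; nothing is enforced on write.
# These records and field names are invented for illustration.
RAW_EVENTS = [
    '{"account": "acme", "page": "/pricing", "ms": 5300}',
    '{"account": "acme", "page": "/docs", "ms": 1200, "referrer": "search"}',
    '{"account": "globex", "page": "/pricing", "ms": 800}',
]


def read_with_schema(lines, schema):
    """Parse each raw line and project it onto `schema` at read time.

    `schema` maps an output field name to a (source key, converter) pair.
    A key missing from a record yields None instead of failing the read.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: (convert(record[key]) if key in record else None)
            for field, (key, convert) in schema.items()
        }


# Two hypothetical consumers apply different schemas to the same stored data.
ENGAGEMENT_SCHEMA = {
    "account": ("account", str),
    "seconds": ("ms", lambda value: value / 1000),
}
TRAFFIC_SCHEMA = {
    "page": ("page", str),
    "referrer": ("referrer", str),
}

engagement = list(read_with_schema(RAW_EVENTS, ENGAGEMENT_SCHEMA))
traffic = list(read_with_schema(RAW_EVENTS, TRAFFIC_SCHEMA))

print(engagement)
print(traffic)
```

Because the records were never forced into one table layout, adding a third consumer only means writing a third schema.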
  12. AWS Approach to Data Lake
  13. Amazon S3 is the Data Lake
  14. Designed Benefits of an Amazon S3 Data Lake
      Fixed Cluster Data Lake
      • Limited to only the single tool contained on the cluster (i.e. Hadoop or data warehouse or Cassandra, etc.). Use cases and ecosystem tools change rapidly
      • Expensive to add nodes to add storage capacity
      • Expensive to replicate data against node loss
      • Complexity in scaling local storage capacity
      • Long refresh cycles to add additional storage equipment
      Amazon S3 Data Lake
      • Decouple storage and compute by making Amazon S3 object-based storage, not a fixed tool cluster, the data lake
      • Flexibility to use any and all tools in the ecosystem. The right tool for the job
      • Future-proof your architecture. As new use cases and new tools emerge, you can plug and play the current best of breed
  15. Why Amazon S3 for Data Lake?
      • Durable: Designed for 11 9s of durability
      • Available: Designed for 99.99% availability
      • High performance: Multiple upload; Range GET
      • Scalable: Store as much as you need; Scale storage and compute independently; No minimum usage commitments
      • Integrated: Amazon EMR; Amazon Redshift; Amazon DynamoDB
      • Easy to use: Simple REST API; AWS SDKs; Read-after-create consistency; Event notification; Lifecycle policies
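To make the "11 9s" and "99.99%" figures on slide 15 concrete, here is a small arithmetic sketch. It assumes both figures are annual design targets (not guarantees) and uses an illustrative count of 10 million stored objects; the function names are made up for this example. Exact fractions are used to avoid floating-point noise at eleven decimal places.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def failure_fraction(nines):
    """Complement of an 'N nines' figure, e.g. 4 nines (99.99%) -> 1/10,000."""
    return Fraction(1, 10 ** nines)


def expected_objects_lost_per_year(object_count, durability_nines=11):
    """Expected annual object loss if durability is met exactly as designed."""
    return object_count * failure_fraction(durability_nines)


def expected_downtime_minutes_per_year(availability_nines=4):
    """Annual unavailability budget implied by the availability figure."""
    return MINUTES_PER_YEAR * failure_fraction(availability_nines)


loss = expected_objects_lost_per_year(10_000_000)  # illustrative object count
downtime = expected_downtime_minutes_per_year()

print(f"Expected loss: one object every {1 / loss} years")
print(f"Downtime budget: {float(downtime):.2f} minutes per year")
```

Under these assumptions, 10 million objects imply an expected loss of one object every 10,000 years, and 99.99% availability leaves a budget of about 52.56 minutes of unavailability per year.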
  16. Automating Complex Tasks: Qubole makes Big Data technologies swift and simple
  17. About Qubole: One of the largest cloud-agnostic Big Data as a Service companies. Founded by the pioneers of “big data” at Facebook and the creators of Apache Hive
  18. Poll Question #1: What is the status of your big data initiative?
  19. The Vision
  20. Qubole Data Service Amazon
  21. Autonomous Data Management
  22. Qubole Cloud Agents
  23. Total Cost Savings Among Qubole Customers in 2016 and 2017
      • Cluster Life Cycle Management: $150M. Amount saved by automatically terminating a cluster when inactive
      • Workload-aware Autoscaling: $121M. Amount saved by predictively adjusting the number of nodes to meet demand
      • Spot Shopper: $40M. Amount saved by utilizing Spot instances
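The first two savings categories on slide 23 come down to two decisions: shut an inactive cluster down, and size an active one to its workload. The sketch below is a simplified illustration of those decisions, not Qubole's algorithm. The `ClusterState` fields, the thresholds, and the `plan_cluster` function are hypothetical, and the sizing rule is reactive (based on the current queue) rather than predictive as the slide describes.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterState:
    running_nodes: int
    pending_tasks: int
    idle_minutes: float  # time since the cluster last ran any work


def plan_cluster(state, tasks_per_node=8, idle_timeout_minutes=30,
                 min_nodes=1, max_nodes=60):
    """Return the target node count; 0 means terminate the cluster.

    All thresholds are illustrative assumptions, not Qubole defaults.
    """
    # Life cycle management: an inactive cluster is shut down entirely.
    if state.pending_tasks == 0 and state.idle_minutes >= idle_timeout_minutes:
        return 0
    # Autoscaling: size the cluster to the queue, within fixed bounds.
    needed = math.ceil(state.pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, needed))


print(plan_cluster(ClusterState(running_nodes=10, pending_tasks=0, idle_minutes=45)))
print(plan_cluster(ClusterState(running_nodes=10, pending_tasks=100, idle_minutes=0)))
```

A predictive policy would replace `pending_tasks` with a forecast of upcoming demand; the bounds and the idle check would stay the same.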
  24. Architectural Diagram
  25. Poll Question #2: What big data technology are you using or evaluating?
  26. Why Qubole?
  27. Demandbase Automates With Qubole: Demandbase provides more value for their B2B marketing customers by automating Big Data and Machine Learning operations.
  28. Who is Demandbase? Demandbase is a B2B marketing automation company that leverages artificial intelligence to automate all aspects of the advertising, selling, and marketing process.
  29. The Challenge
      • Many factors determine which accounts a business should target
        • Do they have a need/budget for the product?
        • Are they currently in-market for the product?
        • Do they have decision makers ready to buy?
      • These insights must come from many different types of big datasets
      • Demandbase’s previous account identification tool took multiple days to run
      • Our clients could not iterate or modify their strategies with such slow turn-around
  30. The Data Used to Identify Accounts
      • To determine an account’s need for the product
        • We have firmographic information on 14 million accounts
        • We’ve built a knowledge graph of all accounts using NLP technology that crawls 350 TB of web pages a month
      • To determine if an account is in-market
        • We track 700 billion web interactions a year, each one mapped to employees across all accounts
      • To identify decision makers
        • We are currently tracking over 100 million employees across all accounts
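The volumes on slide 30 are easier to reason about as rates. The following back-of-the-envelope sketch converts them, assuming a 365-day year, a 30-day month, and evenly spread traffic; the constant names are made up for this example, and "over 100 million" employees is treated as exactly 100 million, so the per-employee figure is an upper estimate.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years

WEB_INTERACTIONS_PER_YEAR = 700 * 10**9  # "700 billion web interactions a year"
CRAWLED_TB_PER_MONTH = 350               # "350 TB of web pages a month"
TRACKED_EMPLOYEES = 100 * 10**6          # "over 100 million", taken as a floor
ACCOUNTS = 14 * 10**6                    # "14 million accounts"

interactions_per_second = WEB_INTERACTIONS_PER_YEAR / SECONDS_PER_YEAR
crawled_tb_per_day = CRAWLED_TB_PER_MONTH / 30  # assumes a 30-day month
interactions_per_employee = WEB_INTERACTIONS_PER_YEAR / TRACKED_EMPLOYEES
interactions_per_account = WEB_INTERACTIONS_PER_YEAR / ACCOUNTS

print(f"{interactions_per_second:,.0f} interactions per second on average")
print(f"{crawled_tb_per_day:.1f} TB crawled per day")
print(f"{interactions_per_employee:,.0f} interactions per tracked employee per year")
print(f"{interactions_per_account:,.0f} interactions per account per year")
```

Averages hide peaks, so real ingest capacity would need headroom above the per-second figure.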
  31. [Diagram: all 14M accounts are scored, with the top 5K available to the user; keywords are extracted from 700B web interactions; buyers at each account are identified from 100M+ contacts.]
  32. The Solution
      • The user requests a new list of accounts with a button-press
      • 60 EC2 servers are spun up
      • A machine learning algorithm is built using Spark and MLlib
      • For each of 14 million accounts
        • Information about relevant web interactions, buyers, online content, etc. is fed into the machine learning model
        • The model scores each account
      • Top 5K accounts are pushed to the web app, along with relevant info
      • From button-press to new account list: 20 minutes
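The score-then-rank step on slide 32 can be sketched in a few lines. This is a single-machine stand-in, not Demandbase's Spark and MLlib job: the feature names, weights, account IDs, and both functions are hypothetical, and a hand-set logistic score takes the place of a trained model. It shows only the shape of the computation: score every account, then keep the top K.

```python
import heapq
import math


def score_account(features, weights, bias=0.0):
    """Logistic score in (0, 1); a stand-in for a trained model."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def top_accounts(accounts, weights, k):
    """Score every account and return the k best as (score, account_id) pairs."""
    scored = ((score_account(features, weights), account_id)
              for account_id, features in accounts.items())
    # nlargest keeps only k items in memory, so the full ranking is never sorted.
    return heapq.nlargest(k, scored)


# Invented accounts and features, for illustration only.
ACCOUNTS = {
    "acme": {"in_market_signals": 3.0, "buyers_identified": 2.0},
    "globex": {"in_market_signals": 0.5, "buyers_identified": 0.0},
    "initech": {"in_market_signals": 1.0, "buyers_identified": 4.0},
}
WEIGHTS = {"in_market_signals": 0.8, "buyers_identified": 0.5}

top_two = top_accounts(ACCOUNTS, WEIGHTS, k=2)
for score, account_id in top_two:
    print(f"{account_id}: {score:.3f}")
```

At the scale on the slide (14 million accounts, top 5K), the same pattern would run as a distributed job, with each worker scoring a partition of accounts before the partial results are merged.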
  33. Qubole Makes This Possible
      • Qubole manages all of our EC2 instances
      • So far, we’ve successfully tested 20 different concurrent models (20 × 60 EC2 servers)
      • Qubole keeps our costs down through dynamic bidding and heterogeneous server clusters
      • Our web app calls Qubole’s easy-to-implement Play API, which spins up the EC2 instances and deploys our Spark job
      • With Qubole taking care of the infrastructure, we could focus on developing the machine learning
      • Qubole allowed us to build a self-serve machine-learning-as-a-service solution
  34. Next Steps and Further Information
      • Try a pre-configured production-ready Qubole deployment on AWS Data Lake: https://aws.amazon.com/quickstart/architecture/qubole-on-data-lake-foundation/
      • Buy on AWS Marketplace: https://aws.amazon.com/marketplace/pp/B06XX76R24
      • Learn more about Qubole: https://www.qubole.com/products/qds-for-aws/
      • Learn more about Demandbase: https://www.demandbase.com/technology/
      • Try AWS: https://aws.amazon.com/
  35. Q & A
  36. Thank you!
