AWS Initiate Day Dublin 2019 – Big Data Meets AI

  1. Big Data Meets AI: Driving Insights and Adding Intelligence to Your Solutions. Wes Neary, Solutions Architect, UKIR Public Sector. Thursday 11th April 2019. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. What we’ll cover • Big Data and why organizations care • Common Challenges: Which, What, How… • AWS Big Data Solutions • Big Data Driving Machine Learning • Final Design Tenets
  3. Big Data Is Defined Many Different Ways • Volume • Velocity • Variety • Veracity • Value • Variability • Visualization
  4. Data is a strategic asset for every organization “The world’s most valuable resource is no longer oil, but data.”* *Copyright: The Economist, 2017, David Parkins
  5. Most Important: Driving Value from Data. Organizations that successfully generate business value from their data will outperform their peers. An Aberdeen survey saw organizations who implemented a data lake outperforming similar companies by 9% in organic revenue growth.* Organic revenue growth: Leaders 24%, Followers 15%. *Aberdeen: Angling for Insight in Today’s Data Lake, Michael Lock, SVP Analytics and Business Intelligence
  6. Customers want more value from their data • Growing exponentially • From new sources • Increasingly diverse • Used by many people • Analyzed by many applications
  7. Data Lakes Extend Traditional Approaches • Relational and nonrelational data • TBs–EBs scale • Diverse analytical engines • Low-cost storage & analytics. Diagram labels: OLTP, ERP, CRM, LOB; Data warehouse; Business intelligence; Devices, Web, Sensors, Social; Data lake; Big data processing, real-time, machine learning
  8. Data lakes on AWS • Durable and available; EB scale • Secure, compliant, auditable • Object-level controls for fine-grained access • Fast performance by retrieving subsets of data • The most ways to bring data in • 2x as many integrations with partners • Broad set of analytics and ML services. Services shown: S3, Lake Formation & Glue, Snowball, Snowmobile, Kinesis Data Streams, Kinesis Data Firehose, Redshift, EMR, Athena, Kinesis, Elasticsearch Service, SageMaker, Comprehend, Rekognition
  9. Data Lake Components
  10. Common Questions
  11. WHICH tool should I use?
  12. WHICH tool should I use? Purpose-built engines. Right tool for the right job.
  13. WHAT Data Do I Have? Gartner: “Through 2018, 80% of data lakes will not include effective metadata management capabilities, making them inefficient.” Data Lake on AWS: Storage | Archival Storage | Data Catalog
  14. Set up a catalog, ETL, and data prep with AWS Glue • Serverless provisioning, configuration, and scaling to run your ETL jobs • Pay only for the resources used for jobs • Crawl your data sources, identify data formats, and suggest schemas and transformations • Automates the effort in building, maintaining and running ETL jobs
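
The crawling step on this slide (inspect the data, identify formats, suggest a schema) can be illustrated with a toy sketch. This is not how AWS Glue is implemented: `infer_schema`, `type_name`, the type labels and the sample records are all hypothetical, and the code only shows the general idea of deriving column types from sample records.

```python
from collections import defaultdict


def type_name(value):
    """Map a Python value to a simple column-type label (labels are illustrative)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(records):
    """Suggest one type per column from a list of dict records."""
    seen = defaultdict(set)
    for record in records:
        for column, value in record.items():
            if value is not None:  # nulls carry no type information
                seen[column].add(type_name(value))

    schema = {}
    for column, types in seen.items():
        if len(types) == 1:
            schema[column] = next(iter(types))
        elif types == {"bigint", "double"}:
            schema[column] = "double"  # widen mixed numerics
        else:
            schema[column] = "string"  # fall back to the most general type
    return schema


# Hypothetical sample data standing in for files found by a crawl.
sample = [
    {"user_id": 1, "amount": 9.5, "country": "IE"},
    {"user_id": 2, "amount": 12, "country": "UK"},
]
suggested = infer_schema(sample)
print(suggested)
```

A catalog built this way is only a suggestion; in practice the proposed schema would be reviewed before ETL jobs depend on it.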
  15. “Beeswax uses Amazon S3 and AWS Glue Data Catalog to build a highly reliable data lake that is fully managed by AWS. Our platform leverages the AWS Glue Data Catalog integration with Amazon EMR in Hive and Spark SQL applications to deliver reporting and optimization features to our customers.” —Ram Kumar Rengaswamy, CTO, Beeswax
  16. HOW can I get started? MOST Important: Selecting an Agile Framework • Start with a tool that will serve the purpose • Experiment, Test, Iterate, Adopt. Let’s look at an example: evolution of the Netflix data pipeline
  17. EXPERIMENT with new things. Aggregate and upload events to Hadoop/Hive for batch processing. Batch → Batch + Real-time
  18. ADAPT your solution. Chukwa front-end → Kafka; Kafka front-end, Kafka
  19. FOCUS on business value. “Amazon Kinesis Streams processes multiple terabytes of log data each day, yet events show up in our analytics in seconds. We can discover and respond to issues in real time, ensuring high availability and a great customer experience.”
  20. Big Data Processing & Analytics
  21. Big data processing with Apache Spark & Hadoop with Amazon EMR • Easy-to-use notebooks • Low cost vs on-premises • Elastic autoscaling • Reliable 99.9% SLA • Secure with encryption and keys • Flexible, open source choice. Themes: Enterprise-grade, Easy, Lowest cost
  22. FINRA’s legacy system did not scale to handle 135 billion events per day. They needed to run complex surveillance queries over 20+ PB of data. FINRA migrated their big data appliance to an S3 data lake and uses EMR for ingestion and processing.
  23. The Forrester Wave Cloud Hadoop/Spark Platforms Q1 2019 The 11 Providers That Matter Most and How They Stack Up by Noel Yuhanna and Mike Gualtieri February 13, 2019 The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave™. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
  24. Data warehouse for business reporting with Amazon Redshift • Fast: up to 10x faster than traditional data warehouses • Easy to set up, deploy and manage • Cost-effective • Scale on demand for large data volumes and high query concurrency • Query data in open formats directly from the data lake
  25. “20 percent of our queries now complete in less than one second. Best of all, we didn’t have to change anything to get this speed-up with Redshift, which supports our mission-critical workloads.” —Greg Rokita, Executive Director of Technology, Edmunds
  26. Real-time analytics for timely insights with Amazon Kinesis • Make streaming data available to multiple real-time analytics applications • Run streaming applications without managing any infrastructure • Durable to reduce the probability of data loss • Scalable to process data from hundreds of thousands of sources with low latencies
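
As a rough illustration of the producer side of this slide, the sketch below sends one event to a stream. It assumes the boto3 Kinesis client's `put_record` call (`StreamName`, `Data`, `PartitionKey`); the stream name, the event fields and the `publish_event` helper are hypothetical, and a small stand-in client is used so the example runs without AWS access.

```python
import json


def publish_event(kinesis_client, stream_name, event, key_field="device_id"):
    """Send one event to a Kinesis data stream and return the service response."""
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # payload is sent as bytes
        PartitionKey=str(event[key_field]),      # records with the same key go to the same shard
    )


class FakeKinesisClient:
    """Stand-in that records calls; replace with boto3.client("kinesis") in real use."""

    def __init__(self):
        self.records = []

    def put_record(self, **kwargs):
        self.records.append(kwargs)
        return {
            "ShardId": "shardId-000000000000",
            "SequenceNumber": str(len(self.records)),
        }


client = FakeKinesisClient()
event = {"device_id": "sensor-42", "temp_c": 21.5}  # hypothetical event
response = publish_event(client, "clickstream-demo", event)
print(response["ShardId"])
```

Because the client is passed in, the same function can be exercised in tests with the stand-in and pointed at a real stream later.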
  27. “Amazon Kinesis makes it simple to scale our solution end to end, including the capture, processing, and delivery of actionable insights. This empowers our customers to better understand their user base.” — Indu Narayan, Director of Data, Yieldmo
  28. Operational analytics for logs and search with Amazon Elasticsearch • Fully managed; deploy production-ready cluster in minutes • Direct access to Elasticsearch open-source APIs, Logstash and Kibana • VPC support; at-rest and in-transit encryption • Scale up and down easily
  29. “Ultimately, we are improving our software products and offering better service to our customers because of the real-time visibility we’re getting into log data.” “Amazon Elasticsearch Service enables data forensic activities to take place and help find and fix application problems faster.” —Tommy Li, Senior Software Architect, Autodesk
  30. Interactive analysis with Amazon Athena • Interactive query service to analyze data in Amazon S3 using standard SQL • No infrastructure to set up or manage and no data to load • Ability to run SQL queries on data archived in Amazon Glacier (coming soon)
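
To make "standard SQL over data in S3" concrete, here is a minimal sketch of submitting a query. It assumes the boto3 Athena client's `start_query_execution` call; the database, table, bucket path and the `run_query` helper are hypothetical, and a stand-in client is used so the example runs without AWS access.

```python
def run_query(athena_client, sql, database, output_location):
    """Submit a SQL query to Athena and return its execution id.

    The call is asynchronous: the id is used later to poll for status
    and fetch results.
    """
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


class FakeAthenaClient:
    """Stand-in that records calls; replace with boto3.client("athena") in real use."""

    def __init__(self):
        self.calls = []

    def start_query_execution(self, **kwargs):
        self.calls.append(kwargs)
        return {"QueryExecutionId": "example-query-id"}


athena = FakeAthenaClient()
query_id = run_query(
    athena,
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",  # hypothetical table
    database="weblogs",                                    # hypothetical database
    output_location="s3://example-bucket/athena-results/",  # hypothetical bucket
)
print(query_id)
```

No cluster is created or sized anywhere in this flow, which is the point the slide makes about having no infrastructure to manage.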
  31. “We only pay when we’re actually querying the data, and we don’t have to keep a cluster running all the time. Using Amazon Athena, we’re able to query seven years’ worth of data—adding up to hundreds of terabytes—get results at least 50 percent faster, and save nearly $15,000 per month.” —Matt Chesler, director of DevOps at Movable Ink
  32. Serverless analytics: deliver on-demand analytics on the data lake • Serverless. Zero infrastructure. Zero administration • Never pay for idle resources • Availability and fault tolerance built in • Automatically scales resources with usage. Diagram labels: Devices, Web, Sensors, Social; AWS IoT; AI/ML; S3 data lake; Glue (ETL & Data Catalog); Athena; QuickSight
  33. Machine Learning and Big Data
  34. Big Data driving Machine Learning • Cycle: Better Decisions, Better Products, More Users, More Data • Data sources: Click stream, User activity, Generated content, Purchases, Clicks, Likes, Sensor data • Platform: Object Storage, Databases, Data warehouse, Streaming analytics, BI, Hadoop, Spark/Presto, Elasticsearch, Machine Learning, Deep Learning/AI
  35. Agility in Machine Learning – for all users • AI SERVICES: Vision (Rekognition Image, Rekognition Video, Textract), Speech (Polly, Transcribe), Language (Translate, Comprehend), Chatbots (Lex), Forecasting (Forecast), Recommendations (Personalize) • ML SERVICES: Amazon SageMaker (Build, Train, Deploy): Pre-built algorithms & notebooks, Data labeling (Ground Truth), Algorithms & models (AWS Marketplace for Machine Learning), One-click model training & tuning, Optimization (Neo), Reinforcement learning, One-click deployment & hosting • ML FRAMEWORKS & INFRASTRUCTURE: Frameworks, Interfaces, Infrastructure (EC2 P3 & P3N, EC2 C5, FPGAs, Greengrass, Elastic Inference)
  36. Visual insights for everyone with Amazon QuickSight • Pay only for what you use • Scale to tens of thousands of users • Embedded analytics • Build end-to-end BI solutions • ML Insights
  37. Supporting people with sight loss using Amazon Polly. RNIB creates and distributes accessible information in the form of synthesized content • Largest library of audiobooks in the UK for nearly 2 million people with sight loss • Naturalness of generated speech is critical to captivate and engage readers • No restrictions on speech redistributions. “Amazon Polly delivers incredibly lifelike voices which captivate and engage our readers.” John Worsfold, Solutions Implementation Manager, RNIB
  38. Saving lives with Amazon SageMaker “The scalability of Amazon SageMaker, and its ability to integrate with native AWS services, adds enormous value for us. We are excited about how our continued collaboration between the GE Health Cloud and Amazon SageMaker will drive better outcomes for our healthcare provider partners and deliver improved patient care.” - Sharath Pasupunuti, AI Engineering Leader
  39. Core Tenets • Build decoupled systems • Use the right tool for the job • Leverage managed and serverless services • Use event-journal design patterns • Be cost-conscious • Enable your application with machine learning (ML) • Replace capacity planning with a consumption model • Don’t forget metadata management
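
Two of these tenets, decoupled systems and event-journal patterns, can be sketched in a few lines. The example below is an illustration only, not something shown in the talk: `EventJournal`, the event fields and both consumer functions are hypothetical. Producers append immutable events to a journal, and independent consumers derive their own views by reading it.

```python
class EventJournal:
    """Append-only sequence of events; existing entries are never modified."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(dict(event))  # store a copy so callers cannot mutate history
        return len(self._events) - 1      # offset of the new event

    def read_from(self, offset=0):
        return [dict(e) for e in self._events[offset:]]


def total_by_customer(events):
    """Consumer 1: revenue per customer, derived from purchase events."""
    totals = {}
    for e in events:
        if e["type"] == "purchase":
            totals[e["customer"]] = totals.get(e["customer"], 0) + e["amount"]
    return totals


def count_by_type(events):
    """Consumer 2: event counts per type, independent of consumer 1."""
    counts = {}
    for e in events:
        counts[e["type"]] = counts.get(e["type"], 0) + 1
    return counts


journal = EventJournal()
journal.append({"type": "purchase", "customer": "alice", "amount": 10})
journal.append({"type": "click", "customer": "bob"})
journal.append({"type": "purchase", "customer": "alice", "amount": 20})
journal.append({"type": "purchase", "customer": "bob", "amount": 5})

print(total_by_customer(journal.read_from()))
print(count_by_type(journal.read_from()))
```

Neither consumer knows about the other or about the producer, so a new view can be added later by replaying the journal from offset 0.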
  40. Thank you. wesneary@amazon.co.uk