Kudu Forrester Webinar

Topics including: The transformative value of real-time data and analytics, and current barriers to adoption. The importance of an end-to-end solution for data-in-motion that includes ingestion, processing, and serving. Apache Kudu’s role in simplifying real-time architectures.

Published in: Technology

  1. 1. 1© Cloudera, Inc. All rights reserved. Apache Kudu Webinar Series Understanding and Unlocking the Value of Real-Time Data Ryan Lippert | Cloudera Michele Goetz | Forrester (Special Guest)
  2. 2. 2© Cloudera, Inc. All rights reserved. Kudu Webinar Series Part 1: Lambda Architectures – Simplified by Apache Kudu A look into the potential trouble involved with a lambda architecture, and how Apache Kudu can dramatically simplify real-time analytics. Part 2: Extending the Capabilities of Operational and Analytical Databases An examination of how Apache Kudu expands the set of use cases that Cloudera’s Operational and Analytical databases can handle. Part 3: Data-in-Motion: Unlock the Value of Real-Time Data Forrester will discuss their research into real-time data pipelines and analytics, and Cloudera will discuss how to make it a reality. Part 4: Technical Deep-Dive into Apache Kudu An in-depth examination of the technical architecture and design of Apache Kudu, straight from a PMC Member.
  3. 3. 3© Cloudera, Inc. All rights reserved. Updateable Analytic Storage Simple real-time analytics and updates with Apache Kudu Kudu: Storage for fast analytics on fast data • Simplified architecture for building real-time analytic applications • Designed for next-generation hardware for faster analytic performance across frameworks • Native Hadoop storage engine Flexibility for the right tools for the right use case in one platform • Only analytic database for big data with Kudu + Impala • Simple real-time applications with Kudu + Spark Use cases • Time series data • Machine data analytics • Online reporting [Platform diagram] INTEGRATE: Structured (Sqoop); Unstructured (Kafka, Flume). PROCESS, ANALYZE, SERVE: Batch (Spark, Hive, Pig, MapReduce); Stream (Spark); SQL (Impala); Search (Solr); Other (Kite). UNIFIED SERVICES: Resource Management (YARN); Security (Sentry, RecordService). STORE: Filesystem (HDFS); Relational (Kudu); NoSQL (HBase); Other (Object Store).
  4. 4. 4© Cloudera, Inc. All rights reserved. Ingest data of any type or volume Process data as it arrives Serve data to users and applications Real-Time Data
  5. 5. 5© Cloudera, Inc. All rights reserved. Agenda Drivers for agile, real-time data platforms What are the key use cases driving businesses toward real-time platforms? Data on adoption trends for real-time technologies What is Forrester seeing in the market for real-time technologies? Deploying a real-time OSS architecture to grow your business How can you build a scalable, cost-effective platform to grow your business?
  6. 6. © 2017 FORRESTER. REPRODUCTION PROHIBITED. Michele Goetz Special Guest Speaker Principal Analyst Serving Enterprise Architecture Professionals
  7. 7. 7© 2017 FORRESTER. REPRODUCTION PROHIBITED. Agenda Drivers for agile, real-time data platforms What are the key use cases driving businesses toward real-time platforms? Data on adoption trends for real-time technologies What is Forrester seeing in the market for real-time technologies? Deploying a real-time OSS architecture to grow your business How can you build a scalable, cost-effective platform to grow your business?
  8. 8. 8© 2017 FORRESTER. REPRODUCTION PROHIBITED. Superior CX depends on data and insights
  9. 9. 9© 2017 FORRESTER. REPRODUCTION PROHIBITED. Fraud and risk management requires real-time data
  10. 10. 10© 2017 FORRESTER. REPRODUCTION PROHIBITED. IoT heat map shows where data matters most, now
  11. 11. 11© 2017 FORRESTER. REPRODUCTION PROHIBITED. Data bottlenecks are catalysts for transition
  12. 12. 12© 2017 FORRESTER. REPRODUCTION PROHIBITED. Create a road map for a real-time, agile data platform
  13. 13. 13© 2017 FORRESTER. REPRODUCTION PROHIBITED. Agenda Drivers for agile, real-time data platforms What are the key use cases driving businesses toward real-time platforms? Data on adoption trends for real-time technologies What is Forrester seeing in the market for real-time technologies? Deploying a real-time OSS architecture to grow your business How can you build a scalable, cost-effective platform to grow your business?
  14. 14. 14© 2017 FORRESTER. REPRODUCTION PROHIBITED. Leaders are focused on the technologies that allow data and insights to be consumed across the organization. Survey question: What are your firm's plans for the following data driven initiatives? Responses, shown as (expanding/implemented; planning to implement within the next 12 months): Creating an organizational center of excellence for business intelligence (51%; 22%). Combine content management and data management programs into a unified information management program (51%; 22%). Changing our processes to promote data stewardship and sharing (51%; 22%). Investing in platforms to and share out data content (51%; 22%). Creating a business led data stewardship or governance program (51%; 22%). Changing management incentives to promote data sharing (49%; 24%). Implementing analytics insights in software systems to aid customers or support employee decisions (52%; 22%). Investing more in business friendly, self-service visualization and analytics (52%; 23%). Engaging external services providers or strategic business consultants for data and analytics or insights services (54%; 22%). Providing data preparation tools for self-service data management (54%; 23%). Investing in distributed real time insight delivery technology (58%; 22%). Base: 3005 global data and analytics decision-makers. Source: Business Technographics® Global Data & Analytics Survey, 2016
  15. 15. 15© 2017 FORRESTER. REPRODUCTION PROHIBITED. 54% of data and analytics technology decision-makers increased their budgets for data and analytics from 2015 to 2016. Survey question: Which of the following describes your [TDM=”IT budget data and analytics technology or services”; BDM=”business budget for data and analytics technology or services”] from 2015 to 2016? Responses: Decrease by 5% to 10% (4%); Don’t know (5%); Decrease by 1-4% (6%); Increase by more than 10% (6%); Increase by 5% to 10% (22%); Increase by 1-4% (26%); Stay about the same (30%). Base: 325 global data and analytics technology decision-makers. “Don’t know” not shown. Source: Business Technographics® Global Data & Analytics Survey, 2016
  16. 16. 16© 2017 FORRESTER. REPRODUCTION PROHIBITED. Companies of all sizes are spending millions for data & analytics. Survey question: Please estimate, in millions, how much your data and analytics budget is for 2016? (Note: Number is in US Dollars) Responses, shown as (SMB, 20-999 employees*; Enterprise, 1,000 or more employees): Less than $1 million (55%; 32%); $1 million to under $10 million (22%; 30%); $10 million to under $100 million (9%; 13%); $100 million to under $500 million (1%; 4%); $500 million to under $1 billion (1%; 2%); $1 billion to under $5 billion (0%; 2%); $5 billion or more (0%; 1%). Note: Don’t know excluded. Base: 765*, 1,288 global data and analytics decision makers Source: Business Technographics® Global Data & Analytics Survey, 2016
  17. 17. 17© 2017 FORRESTER. REPRODUCTION PROHIBITED. Among the DM technologies Forrester tracks, interest for stream processing tools has grown the most YoY. Survey question: What are your firm's plans to use the following data management technologies? % with commitment (expanding, implemented, or planning to implement in the next 12 months), with year-over-year change: Stream processing tools 59% to 64% (+5 p.p.); Inverted index database 61% to 64% (+3 p.p.); Distributed NoSQL databases 63% to 61% (-2 p.p.); Hadoop 63% to 62% (-1 p.p.); Associative index databases 60% to 58% (-2 p.p.); RDF, triple store 59% to 56% (-3 p.p.). % with interest, but no immediate plans (two series, in the order charted): 20%, 19%, 19%, 20%, 19%, 19% and 13%, 13%, 16%, 14%, 14%, 13%. Base: 2094 and *1805 global data and analytics technology decision-makers. Source: Business Technographics® Global Data & Analytics Survey, 2016
  18. 18. 18© 2017 FORRESTER. REPRODUCTION PROHIBITED. Streaming analytics high in the list of big data plans. Survey question: Which of the following are included in your plans for big data? Responses: NoSQL other than Hadoop (16%); A MPP (massively parallel processing) data warehouse (18%); Semantic technologies (ontology building, search, auto curation, graph, etc.) (22%); Hadoop (including HBase or Accumulo) (23%); Data anonymization or de-identification (23%); Creating or building out a data lake (26%); Marketing or digital data management platforms and service providers that brand their offerings as big data (26%); Packaged analytics technologies that brand themselves as big data (27%); Unstructured data mining / analytics (28%); Distributed in memory databases, grids, analytics tools (30%); Streaming analytics / computing (33%); Large scale predictive modeling, data mining or other advanced analytics (36%); Public cloud big data services (40%). Base: Total: 2094 Source: Business Technographics® Global Data & Analytics Survey, 2016
  19. 19. 19© Cloudera, Inc. All rights reserved. Agenda Drivers for agile, real-time data platforms What are the key use cases driving businesses toward real-time platforms? Data on adoption trends for real-time technologies What is Forrester seeing in the market for real-time technologies? Deploying a real-time OSS architecture to grow your business How can you build a scalable, cost-effective platform to grow your business?
  20. 20. 20© Cloudera, Inc. All rights reserved. Trend Towards Real-Time Data Platforms is Clear Drivers for Real-Time Platforms • Enhancing customer experiences • Risk Management • Advancement of IoT and broader instrumentation Adoption is Accelerating • Top data-driven initiative by investment: distributed delivery of real-time data • DM technology with highest momentum: stream processing • Top big data plans: streaming analytics is top 3 • Broad, large investments: 90% of decision makers are either continuing or increasing their investments in data and analytics; millions/billions being spent
  21. 21. 21© Cloudera, Inc. All rights reserved. The Underlying Driver What drives a use case to real-time? High Frequency Trading APT Detection Fraud Detection Predictive Maintenance Next Best Offer Inventory Management Shipping/Logistic Systems CRM Systems Employee Management Strategic Planning Real-time data management use cases are defined by a common set of characteristics. • Narrow time window in which to make a decision (automated or manual) • Opportunity for the data points to change the decision path • Decreasing value of data over time Not all use cases have a pressing need for real-time data. • Broader strategic decisions, for example, do not require real-time data input • Over time, decreases in HW costs and increases in availability of real-time systems will lead most use cases to be conducted in real-time Real Time Some Latency Acceptable
  22. 22. 22© Cloudera, Inc. All rights reserved. Moving to Real-Time and Leveraging Analytics What do we have to gain? “Monitoring System” Sensors are automatically monitored and programmed to deliver warnings when readings are delivered outside of an “optimal zone”. Basic models developed over small subsets of data. “Predictive System” Ingestion and processing of all sensor data into an unlimited data store with analytic capabilities enables machine learning, which can provide automated optimization and predictive maintenance. “Only 1 percent of data from an oil rig with 30,000 sensors is examined. The data that are used today are mostly for anomaly detection and control, not optimization and prediction, which provide the greatest value.” - McKinsey & Company Traditional Architectures Real-Time Analytic Capabilities
  23. 23. 23© Cloudera, Inc. All rights reserved. Ingest data of any type or volume Process data as it arrives Serve data to users and applications Real-Time Data
  24. 24. 24© Cloudera, Inc. All rights reserved. Ingestion at Cloudera • Apache Sqoop for data from relational databases • Apache Flume for logs, event based data • Apache Kafka is fast, scalable, and fault-tolerant messaging Partners, such as StreamSets, provide rich visualization tools Ingestion in Real-Time Stream Ingestion is a Must for Many Use Cases Ingestion isn’t just about internal business data anymore. • Traditional ingestion was internally focused, and often a matter of moving data from one silo or system to another • Today, businesses aim to take in data from a variety of external sources, IoT sensors, and machine-generated (user/network) data Your data journey can’t start until the data arrives. • Each step of the ingest/process/serve data pipeline must occur at real-time speed if decisions are to be made in time to affect the course of business Visualization helps practitioners understand their data. • Complex tasks can be made less complex via graphical representations; data ingestion is no different
  25. 25. 25© Cloudera, Inc. All rights reserved. Stream Processing at Cloudera Spark Streaming, the leading open-source framework for real-time use cases, is deployed in Cloudera’s real-time architectures. Cloudera has the broadest base of Hadoop-adjacent experience with Spark and integrating it with Apache components. Ingestion in Real-Time Unlocking Value at Speed For some use cases, batch just isn’t enough. • Batch processing can lead to bottlenecks and delays in data transformations that cause missed opportunities. Apache Spark is gaining momentum for a reason. • Leveraging Apache Spark for stream processing enables real-time use cases with sub-second latency and best-in-class APIs. Spark has a best-in-class ecosystem. • Machine learning (via MLlib) is seamlessly integrated into Spark. • Broadest set of vendors and contributors working on Spark among available processing engines, leading to rapid innovation.
  26. 26. 26© Cloudera, Inc. All rights reserved. Data Serving at Cloudera Apache Kudu provides batch analysis and real-time serving within the same storage layer Apache HBase yields the best read/write performance Cloudera Search enables SQL-like faceted search in natural language Apache Kafka can be used to serve data to applications and users Serving in Real-Time Inject Data into Real-Time Decisions You need options that suit your use case. • Platform proliferation hurts IT departments as skillsets are divided; fewer platforms with broad capabilities help. Apache Kudu changes the game for open source software. • Combining real-time serving with analytic scans through a relational database required a complex lambda architecture until Kudu • Together, simplification and affordability should drive more use cases to real-time automated processes, in turn driving increased revenue, decreased risk, and better service for companies deploying Kudu
  27. 27. 27© Cloudera, Inc. All rights reserved. HDFS Fast Scans, Analytics and Processing of Stored Data Fast On-Line Updates & Data Serving Arbitrary Storage (Active Archive) Fast Analytics (on fast-changing or frequently-updated data) Apache Kudu: Filling the Analytic Gap Unchanging Fast Changing Frequent Updates HBase Append-Only Real-Time Kudu Kudu fills the Gap Modern analytic applications often require complex data flow & difficult integration work to move data between HBase & HDFS Analytic Gap Pace of Analysis Pace of Data
  28. 28. 28© Cloudera, Inc. All rights reserved. Real-Time Data Analysis at Work Customer 360  “Next Best Offer 2.0” Kafka Spark Streaming Kudu Spark MLlib Application Data Sources Individual Session Customer Interaction Spark Full Model/Learning Data Request Sent For Stream Processing Data Cleaned/Ordered/Processed, Then Delivered to Kudu for Modelling User’s navigation returns the results they are looking for, in addition to offers and suggestions hyper-customized for them. Illustrative, models will likely have >2 dimensions
  29. 29. 29© Cloudera, Inc. All rights reserved. Machine Learning Kudu opens the door to machine learning Kudu provides the ability to leverage real-time updates and analytic scans together - critical for many machine learning applications. Source: GHOSTS IN THE MACHINE: Artificial intelligence, risks and regulation in financial markets
  30. 30. 30© Cloudera, Inc. All rights reserved. The Time for Real-Time Data and Analytics is Now. And the platform for it is Cloudera Enterprise.
  31. 31. 31© Cloudera, Inc. All rights reserved.
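
The ingest, process, and serve flow on slide 28 (Kafka, then Spark Streaming, then Kudu) can be illustrated with a minimal, pure-Python sketch. Nothing here uses the real Kafka, Spark, or Kudu APIs: the `deque` stands in for the message queue, `process` for the stream-processing step, and the hypothetical `InMemoryTable` for a store that supports both keyed upserts and full scans. The `Event` fields and the per-customer row layout are assumptions made up for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    customer_id: str
    timestamp: int   # hypothetical unit: seconds since session start
    page: str


class InMemoryTable:
    """Hypothetical stand-in for the serving layer: keyed upserts plus scans."""

    def __init__(self):
        self._rows = {}

    def upsert(self, key, row):
        self._rows[key] = row

    def get(self, key):
        return self._rows.get(key)

    def scan(self):
        return list(self._rows.values())


def process(events):
    """Stream-processing stand-in: drop malformed events, then order by time."""
    cleaned = [e for e in events if e.customer_id and e.page]
    return sorted(cleaned, key=lambda e: e.timestamp)


def run_pipeline(raw_events, table):
    queue = deque(raw_events)            # ingest: stand-in for the message queue
    batch = []
    while queue:
        batch.append(queue.popleft())
    for event in process(batch):         # process: clean and order
        current = table.get(event.customer_id) or {
            "customer_id": event.customer_id, "views": 0, "last_page": None}
        updated = {**current, "views": current["views"] + 1,
                   "last_page": event.page}
        table.upsert(event.customer_id, updated)   # serve: readable right away
    return table


raw = [Event("c1", 3, "checkout"), Event("c2", 1, "home"),
       Event("", 2, "home"), Event("c1", 1, "shoes")]
table = run_pipeline(raw, InMemoryTable())

# Analytic scan over the same rows that were just updated.
most_active = max(table.scan(), key=lambda r: r["views"])
print(most_active)
```

The point of the sketch is the last two steps: the same table takes row-level updates as events arrive and answers a scan-style query immediately afterwards, which is the combination the deck attributes to Kudu.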
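
As a quick sanity check on slide 15, the 54% headline can be reproduced from the chart values. This sketch assumes the labels and percentages on that slide pair up in the order they were extracted; that pairing is an assumption, not something the transcript states.

```python
# Slide 15 responses, paired in extracted order (assumed pairing).
budget_change_shares = {
    "Decrease by 5% to 10%": 4,
    "Don't know": 5,
    "Decrease by 1-4%": 6,
    "Increase by more than 10%": 6,
    "Increase by 5% to 10%": 22,
    "Increase by 1-4%": 26,
    "Stay about the same": 30,
}

# Sum every category that reports a budget increase.
increased = sum(share for label, share in budget_change_shares.items()
                if label.startswith("Increase"))
total = sum(budget_change_shares.values())

print(f"Increased budgets: {increased}%")
print(f"All categories: {total}%")
```

Under that pairing the three "Increase" categories add up to the 54% quoted on the slide, and all categories together come to 99%, which is consistent with rounding.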
