© Cloudera, Inc. All rights reserved.
Apache Kudu Webinar Series
Understanding and Unlocking
the Value of Real-Time Data
Ryan Lippert | Cloudera
Michele Goetz | Forrester (Special Guest)
Kudu Webinar Series
Part 1: Lambda Architectures – Simplified by Apache Kudu
A look into the potential trouble involved with a lambda architecture, and how Apache Kudu can
dramatically simplify real-time analytics.
Part 2: Extending the Capabilities of Operational and Analytical Databases
An examination of how Apache Kudu expands the set of use cases that Cloudera’s Operational and
Analytical databases can handle.
Part 3: Data-in-Motion: Unlock the Value of Real-Time Data
Forrester will discuss their research into real-time data pipelines and analytics, and Cloudera will
discuss how to make it a reality.
Part 4: Technical Deep-Dive into Apache Kudu
An in-depth examination of the technical architecture and design of Apache Kudu, straight from a PMC
Member.
Updateable Analytic Storage
Simple real-time analytics and updates with Apache Kudu
Kudu: Storage for fast analytics on fast data
• Simplified architecture for building real-time analytic
applications
• Designed for next-generation hardware for faster analytic
performance across frameworks
• Native Hadoop storage engine
Flexibility for the right tools for the right use
case in one platform
• Only analytic database for big data with Kudu + Impala
• Simple real-time applications with Kudu + Spark
Use cases
• Time series data
• Machine data analytics
• Online reporting
[Platform diagram]
• INTEGRATE: Structured (Sqoop); Unstructured (Kafka, Flume)
• PROCESS, ANALYZE, SERVE: Batch (Spark, Hive, Pig, MapReduce); Stream (Spark); SQL (Impala); Search (Solr); Other (Kite)
• UNIFIED SERVICES: Resource Management (YARN); Security (Sentry, RecordService)
• STORE: Filesystem (HDFS); Relational (Kudu); NoSQL (HBase); Other (Object Store)
Ingest data of any
type or volume
Process data as it
arrives
Serve data to users
and applications
Real-Time Data
Agenda
Drivers for agile, real-time data platforms
What are the key use cases driving businesses towards real-time
platforms?
Data on adoption trends for real-time technologies
What is Forrester seeing in the market for real-time technologies?
Deploying a real-time OSS architecture to grow your business
How can you build a scalable, cost-effective platform to grow your
business?
© 2017 FORRESTER. REPRODUCTION PROHIBITED.
Michele Goetz
Special Guest Speaker
Principal Analyst Serving Enterprise Architecture Professionals
Superior CX depends on data and insights
Fraud and risk management requires real-time data
IoT heat map shows where data matters most, now
Data bottlenecks are catalysts for transition
Create a road map for a real-time, agile data platform
Leaders are focused on the technologies that allow data and
insights to be consumed across the organization
What are your firm's plans for the following data-driven initiatives?
(First figure: expanding/implemented. Second figure: planning to implement within the next 12 months.)
• Creating an organizational center of excellence for business intelligence: 51%; 22%
• Combine content management and data management programs into a unified information management program: 51%; 22%
• Changing our processes to promote data stewardship and sharing: 51%; 22%
• Investing in platforms to and share out data content: 51%; 22%
• Creating a business-led data stewardship or governance program: 51%; 22%
• Changing management incentives to promote data sharing: 49%; 24%
• Implementing analytics insights in software systems to aid customers or support employee decisions: 52%; 22%
• Investing more in business-friendly, self-service visualization and analytics: 52%; 23%
• Engaging external services providers or strategic business consultants for data and analytics or insights services: 54%; 22%
• Providing data preparation tools for self-service data management: 54%; 23%
• Investing in distributed real-time insight delivery technology: 58%; 22%
Base: 3005 global data and analytics decision-makers.
Source: Business Technographics® Global Data & Analytics Survey, 2016
54% of data and analytics technology decision-makers increased
their budgets for data and analytics from 2015 to 2016
Which of the following describes your [TDM=”IT budget data and analytics technology or
services”; BDM=”business budget for data and analytics technology or services”] from 2015 to 2016?
• Stay about the same: 30%
• Increase by 1-4%: 26%
• Increase by 5% to 10%: 22%
• Increase by more than 10%: 6%
• Decrease by 1-4%: 6%
• Don’t know: 5%
• Decrease by 5% to 10%: 4%
Base: 325 global data and analytics technology decision-makers. “Don’t know” not shown.
Source: Business Technographics® Global Data & Analytics Survey, 2016
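The 54% headline can be checked against the chart. The sketch below assumes the bar values pair with the answer labels in the order they appear on the slide (4% with "Decrease by 5% to 10%" through 30% with "Stay about the same"); that pairing is an assumption read off the chart, and the helper name `share` is illustrative only.

```python
# Budget-change shares in percent of respondents, paired with the chart
# labels in the order they appear on the slide (an assumed pairing).
budget_change = {
    "Stay about the same": 30,
    "Increase by 1-4%": 26,
    "Increase by 5% to 10%": 22,
    "Increase by more than 10%": 6,
    "Decrease by 1-4%": 6,
    "Don't know": 5,
    "Decrease by 5% to 10%": 4,
}


def share(prefix: str) -> int:
    """Sum the shares of every answer whose label starts with `prefix`."""
    return sum(pct for label, pct in budget_change.items() if label.startswith(prefix))


increased = share("Increase")  # the three "Increase ..." buckets combined
decreased = share("Decrease")  # the two "Decrease ..." buckets combined
print(f"increased: {increased}%, decreased: {decreased}%")
```

Under that pairing, the three "Increase" buckets add up to the 54% quoted in the slide title.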
Companies of all sizes are spending millions for data & analytics
Please estimate, in millions, how much your data and analytics budget is for 2016? (Note:
Number is in US Dollars)
(First figure: SMB (20-999 employees)*. Second figure: Enterprise (1,000 or more employees).)
• Less than $1 million: 55%; 32%
• $1 million to under $10 million: 22%; 30%
• $10 million to under $100 million: 9%; 13%
• $100 million to under $500 million: 1%; 4%
• $500 million to under $1 billion: 1%; 2%
• $1 billion to under $5 billion: 0%; 2%
• $5 billion or more: 0%; 1%
Note: Don’t know excluded. Base: 765*, 1,288 global data and analytics decision makers
Source: Business Technographics® Global Data & Analytics Survey, 2016
Among the DM technologies Forrester tracks, interest for stream
processing tools has grown the most YoY
What are your firm's plans to use the following data management technologies?
Base: 2094 and *1805 global data and analytics technology decision-makers.
Source: Business Technographics® Global Data & Analytics Survey, 2016
% with commitment (expanding, implemented, or planning to implement in the next 12 months)
% with interest, but no immediate plans
YoY change by technology:
• Stream processing tools: +5 p.p.
• Inverted index database: +3 p.p.
• Distributed NoSQL databases: -2 p.p.
• Hadoop: -1 p.p.
• Associative index databases: -2 p.p.
• RDF, triple store: -3 p.p.
59% 61% 63% 63% 60% 59% 64% 64% 61% 62% 58% 56%
-20% -19% -19% -20% -19% -19%
-13% -13% -16% -14% -14% -13%
Streaming analytics high in the list of big data plans
Which of the following are included in your plans for big data?
• Public cloud big data services: 40%
• Large scale predictive modeling, data mining or other advanced analytics: 36%
• Streaming analytics / computing: 33%
• Distributed in memory databases, grids, analytics tools: 30%
• Unstructured data mining / analytics: 28%
• Packaged analytics technologies that brand themselves as big data: 27%
• Marketing or digital data management platforms and service providers that brand their offerings as big data: 26%
• Creating or building out a data lake: 26%
• Data anonymization or de-identification: 23%
• Hadoop (including HBase or Accumulo): 23%
• Semantic technologies (ontology building, search, auto curation, graph, etc.): 22%
• A MPP (massively parallel processing) data warehouse: 18%
• NoSQL other than Hadoop: 16%
Base: Total: 2094
Source: Business Technographics® Global Data & Analytics Survey, 2016
Trend Towards Real-Time Data Platforms is Clear
Drivers for Real-Time Platforms
• Enhancing customer experiences
• Risk Management
• Advancement of IoT and broader instrumentation
Adoption is Accelerating
• Top data-driven initiative by investment: distributed delivery of
real-time data
• DM technology with highest momentum: stream processing
• Top big data plans: streaming analytics is top 3
• Broad, large investments: 90% of decision makers are either
continuing or increasing their investments in data and analytics;
millions/billions being spent
The Underlying Driver
What drives a use case to real-time?
Use cases, ordered from “Real Time” to “Some Latency Acceptable”:
• High Frequency Trading
• APT Detection
• Fraud Detection
• Predictive Maintenance
• Next Best Offer
• Inventory Management
• Shipping/Logistic Systems
• CRM Systems
• Employee Management
• Strategic Planning
Real-time data management use cases are defined by a common set of characteristics.
• Narrow time window in which to make a decision (automated or manual)
• Opportunity for the data points to change the decision path
• Decreasing value of data over time
Not all use cases have a pressing need for real-time data.
• Broader strategic decisions, for example, do not require real-time data input
• Over time, decreases in HW costs and increases in availability of real-time systems will lead most use cases to be conducted in real-time
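Two of the characteristics above, a narrow decision window and the decreasing value of data over time, can be put into numbers. The sketch below is only an illustration: the exponential-decay model, the half-life, the latencies, and the function names are all assumptions, not figures from the presentation.

```python
def remaining_value(initial_value: float, age_seconds: float, half_life_seconds: float) -> float:
    """Value left in a data point after `age_seconds`, assuming it halves every half-life."""
    return initial_value * 0.5 ** (age_seconds / half_life_seconds)


def fits_decision_window(pipeline_latency_seconds: float, decision_window_seconds: float) -> bool:
    """True when data arrives early enough to still change the decision."""
    return pipeline_latency_seconds <= decision_window_seconds


# Hypothetical fraud-detection numbers: the signal loses half its value every
# 2 seconds and the transaction must be approved or declined within 1 second.
HALF_LIFE = 2.0
WINDOW = 1.0

for name, latency in [("hourly batch", 3600.0), ("streaming", 0.5)]:
    value = remaining_value(100.0, latency, HALF_LIFE)
    usable = fits_decision_window(latency, WINDOW)
    print(f"{name}: value left {value:.2f}, arrives in time: {usable}")
```

With these made-up inputs the hourly batch result arrives with essentially no value left and well outside the window, while the streaming result keeps most of its value. Use cases such as strategic planning would have a half-life of days or weeks, which is why some latency is acceptable there.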
Moving to Real-Time and Leveraging Analytics
What do we have to gain?
Traditional Architectures: “Monitoring System”
Sensors are automatically monitored and programmed to deliver warnings when readings are delivered outside of an “optimal zone”. Basic models developed over small subsets of data.
Real-Time Analytic Capabilities: “Predictive System”
Ingestion and processing of all sensor data into an unlimited data store with analytic capabilities enables machine learning, which can provide automated optimization and predictive maintenance.
“Only 1 percent of data from an oil rig with 30,000 sensors
is examined. The data that are used today are mostly for
anomaly detection and control, not optimization and
prediction, which provide the greatest value.”
- McKinsey & Company
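The contrast between the two systems can be sketched in a few lines. This is a toy illustration under stated assumptions: the "optimal zone" bounds and the readings are invented, and a least-squares straight line stands in for the machine learning the slide refers to.

```python
from statistics import mean

OPTIMAL_ZONE = (60.0, 80.0)  # hypothetical lower and upper bounds for a sensor


def monitor(reading: float, zone=OPTIMAL_ZONE) -> bool:
    """Monitoring system: warn only once a reading is already outside the zone."""
    low, high = zone
    return not (low <= reading <= high)


def linear_trend(readings):
    """Least-squares slope and intercept over sample indices 0..n-1 (needs n >= 2)."""
    xs = range(len(readings))
    x_mean, y_mean = mean(xs), mean(readings)
    denom = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, readings)) / denom
    return slope, y_mean - slope * x_mean


def steps_until_exit(readings, zone=OPTIMAL_ZONE):
    """Predictive system (toy): samples left before the fitted trend leaves the zone.

    Returns None when the trend is flat, since it then never crosses a bound.
    """
    slope, intercept = linear_trend(readings)
    if slope == 0:
        return None
    low, high = zone
    bound = high if slope > 0 else low
    crossing_index = (bound - intercept) / slope
    return max(0.0, crossing_index - (len(readings) - 1))


history = [70.0, 72.0, 74.0, 76.0]  # every reading is still inside the zone
print("warning now:", monitor(history[-1]))
print("samples until predicted exit:", steps_until_exit(history))
```

The monitoring check stays silent because the latest reading is in range, while the trend already indicates how soon the sensor is likely to leave the zone. A real predictive system would train over all sensor data rather than one short series.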
Ingest data of any
type or volume
Process data as it
arrives
Serve data to users
and applications
Real-Time Data
Ingestion at Cloudera
• Apache Sqoop for data from
relational databases
• Apache Flume for logs, event
based data
• Apache Kafka is fast,
scalable, and fault-tolerant
messaging
Partners, such as Streamsets,
provide rich visualization tools
Ingestion in Real-Time
Stream Ingestion is a Must for Many Use Cases
Ingestion isn’t just about internal business data anymore.
• Traditional ingestion was internally focused, and often a matter of
moving data from one silo or system to another
• Today, businesses aim to take in data from a variety of external
sources, IoT sensors, and machine-generated (user/network)
data
Your data journey can’t start until the data arrives.
• Each step of the ingest/process/serve data pipeline must occur
at real-time speed if decisions are to be made in time to affect
the course of business
Visualization helps practitioners understand their data.
• Complex tasks can be made less complex via graphical
representations; data ingestion is no different
Stream Processing at Cloudera
Spark Streaming, the leading open-source framework for
real-time use cases, is deployed in Cloudera’s real-time
architectures.
Cloudera has the broadest base
of Hadoop-adjacent experience
with Spark and integrating it
with Apache components.
Ingestion in Real-Time
Unlocking Value at Speed
For some use cases, batch just isn’t enough.
• Batch processing can lead to bottlenecks and delays in data
transformations that cause missed opportunities.
Apache Spark is gaining momentum for a reason.
• Leveraging Apache Spark for stream processing enables real-time
use cases with sub-second latency and best-in-class APIs.
Spark has a best-in-class ecosystem.
• Machine learning (via MLlib) is seamlessly integrated into Spark.
• Broadest set of vendors and contributors working on Spark
among available processing engines, leading to rapid innovation.
Data Serving at Cloudera
Apache Kudu provides batch
analysis and real-time serving within
the same storage layer
Apache HBase yields the best
read/write performance
Cloudera Search enables SQL-like
faceted search in natural language
Apache Kafka can be used to serve
data to applications and users
Serving in Real-Time
Inject Data into Real-Time Decisions
You need options that suit your use case.
• Platform proliferation hurts IT departments as skillsets are
divided; fewer platforms with broad capabilities help.
Apache Kudu changes the game for open source
software.
• Until Kudu, combining real-time serving with analytic scans
through a relational database required a complex lambda
architecture
• Together, simplification and affordability should drive more use
cases to real-time automated processes, in turn driving
increased revenue, decreased risk, and better service for
companies deploying Kudu
Apache Kudu: Filling the Analytic Gap
[Figure: storage engines positioned by Pace of Data and Pace of Analysis. Labels shown: Unchanging; Append-Only; Fast Changing, Frequent Updates; Real-Time.]
• HDFS: Fast Scans, Analytics and Processing of Stored Data
• HBase: Fast On-Line Updates & Data Serving
• Arbitrary Storage (Active Archive)
• Kudu: Fast Analytics (on fast-changing or frequently-updated data)
Analytic Gap: Modern analytic applications often require complex data flow & difficult integration work to move data between HBase & HDFS.
Kudu fills the Gap
Real-Time Data Analysis at Work
Customer 360  “Next Best Offer 2.0”
[Pipeline diagram] Components shown: Data Sources, Customer Interaction, Kafka, Spark Streaming, Kudu, Spark MLlib, Spark, Application
Paths shown: Individual Session; Full Model/Learning
• Data Request Sent For Stream Processing
• Data Cleaned/Ordered/Processed, Then Delivered to Kudu for Modelling
• User’s navigation returns the results they are looking for, in addition to offers and suggestions hyper-customized for them.
Illustrative, models will likely have >2 dimensions
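The flow on this slide (events arrive, are cleaned and ordered, land in an updatable table, and then feed a model that serves offers) can be walked through with in-memory stand-ins. Nothing below uses the Kafka, Spark, or Kudu APIs: `Event`, `clean_and_order`, `ProfileTable`, and `next_best_offer` are hypothetical names, and "most-viewed category" is a deliberately trivial substitute for a model trained with Spark MLlib.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One customer interaction, as it might arrive from a message queue."""
    customer_id: str
    timestamp: int
    category: str


def clean_and_order(events):
    """Stand-in for the stream-processing step: drop malformed events, sort by time."""
    valid = [e for e in events if e.customer_id and e.category]
    return sorted(valid, key=lambda e: e.timestamp)


class ProfileTable:
    """In-memory stand-in for a store that supports both row updates and scans."""

    def __init__(self):
        self._rows = {}

    def upsert(self, customer_id: str, category: str) -> None:
        """Real-time path: update one customer's row as each event arrives."""
        row = self._rows.setdefault(customer_id, defaultdict(int))
        row[category] += 1

    def get(self, customer_id: str) -> dict:
        """Serving path: read a single customer's current profile."""
        return dict(self._rows.get(customer_id, {}))

    def scan(self) -> dict:
        """Analytic path: read every row, e.g. to retrain a full model."""
        return {key: dict(row) for key, row in self._rows.items()}


def next_best_offer(table: ProfileTable, customer_id: str):
    """Toy model: offer the category viewed most often (ties broken alphabetically)."""
    views = table.get(customer_id)
    if not views:
        return None
    return max(sorted(views), key=views.get)


incoming = [
    Event("c1", 3, "shoes"),
    Event("c1", 1, "hats"),
    Event("", 2, "shoes"),  # malformed: no customer id, dropped during cleaning
    Event("c1", 4, "shoes"),
    Event("c2", 5, "bags"),
]

table = ProfileTable()
for event in clean_and_order(incoming):
    table.upsert(event.customer_id, event.category)

print(next_best_offer(table, "c1"), next_best_offer(table, "c2"))
```

The point of the sketch is that one table serves both the per-session lookup (`get`) and the full scan used for learning (`scan`), which is the role the slide assigns to Kudu.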
Machine Learning
Kudu opens the door to machine learning
Kudu provides the ability
to leverage real-time
updates and analytic
scans together - critical for
many machine learning
applications.
Source: GHOSTS IN THE MACHINE: Artificial intelligence, risks and regulation in financial markets
The Time for
Real-Time Data
and Analytics
is Now.
And the platform for it
is Cloudera Enterprise.
31© Cloudera, Inc. All rights reserved.

More Related Content

What's hot

Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldCloudera, Inc.
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsCloudera, Inc.
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Cloudera, Inc.
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Cloudera, Inc.
 
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...Cloudera, Inc.
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18Cloudera, Inc.
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformCloudera, Inc.
 
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Cloudera, Inc.
 
Part 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchPart 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchCloudera, Inc.
 
Live Cloudera Cybersecurity Solution Demo
Live Cloudera Cybersecurity Solution DemoLive Cloudera Cybersecurity Solution Demo
Live Cloudera Cybersecurity Solution DemoCloudera, Inc.
 
Transforming Insurance Analytics with Big Data and Automated Machine Learning

Transforming Insurance Analytics with Big Data and Automated Machine Learning
Transforming Insurance Analytics with Big Data and Automated Machine Learning

Transforming Insurance Analytics with Big Data and Automated Machine Learning
Cloudera, Inc.
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...Cloudera, Inc.
 
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Cloudera, Inc.
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionCloudera, Inc.
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudCloudera, Inc.
 
Big data journey to the cloud maz chaudhri 5.30.18
Big data journey to the cloud   maz chaudhri 5.30.18Big data journey to the cloud   maz chaudhri 5.30.18
Big data journey to the cloud maz chaudhri 5.30.18Cloudera, Inc.
 
Making Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the EnterpriseMaking Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the EnterpriseCloudera, Inc.
 
Part 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduPart 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduCloudera, Inc.
 
Advanced Analytics for Investment Firms and Machine Learning
Advanced Analytics for Investment Firms and Machine LearningAdvanced Analytics for Investment Firms and Machine Learning
Advanced Analytics for Investment Firms and Machine LearningCloudera, Inc.
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsCloudera, Inc.
 

What's hot (20)

Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr

 
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
 
Part 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchPart 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science Workbench
 
Live Cloudera Cybersecurity Solution Demo
Live Cloudera Cybersecurity Solution DemoLive Cloudera Cybersecurity Solution Demo
Live Cloudera Cybersecurity Solution Demo
 
Transforming Insurance Analytics with Big Data and Automated Machine Learning

Transforming Insurance Analytics with Big Data and Automated Machine Learning
Transforming Insurance Analytics with Big Data and Automated Machine Learning

Transforming Insurance Analytics with Big Data and Automated Machine Learning

 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera
Supercharge Splunk with Cloudera

Supercharge Splunk with Cloudera

 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solution
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
 
Big data journey to the cloud maz chaudhri 5.30.18
Big data journey to the cloud   maz chaudhri 5.30.18Big data journey to the cloud   maz chaudhri 5.30.18
Big data journey to the cloud maz chaudhri 5.30.18
 
Making Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the EnterpriseMaking Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the Enterprise
 
Part 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduPart 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache Kudu
 
Advanced Analytics for Investment Firms and Machine Learning
Advanced Analytics for Investment Firms and Machine LearningAdvanced Analytics for Investment Firms and Machine Learning
Advanced Analytics for Investment Firms and Machine Learning
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of Things
 

Viewers also liked

Enabling the Connected Car Revolution

Enabling the Connected Car Revolution
Enabling the Connected Car Revolution

Enabling the Connected Car Revolution
Cloudera, Inc.
 
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Cloudera, Inc.
 
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Cloudera, Inc.
 
Securing the Data Hub--Protecting your Customer IP (Technical Workshop)
Securing the Data Hub--Protecting your Customer IP (Technical Workshop)Securing the Data Hub--Protecting your Customer IP (Technical Workshop)
Securing the Data Hub--Protecting your Customer IP (Technical Workshop)Cloudera, Inc.
 
Kafka & Couchbase Integration Patterns
Kafka & Couchbase Integration PatternsKafka & Couchbase Integration Patterns
Kafka & Couchbase Integration PatternsManuel Hurtado
 
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)Cloudera, Inc.
 
Charlotte whiplash presentation
Charlotte whiplash presentationCharlotte whiplash presentation
Charlotte whiplash presentationcharlotteellis111
 
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...Lightbend
 
Identificacion de peligros y evaluacion de riesgos en oficinas taller de nve...
Identificacion de peligros y evaluacion de riesgos en oficinas  taller de nve...Identificacion de peligros y evaluacion de riesgos en oficinas  taller de nve...
Identificacion de peligros y evaluacion de riesgos en oficinas taller de nve...Alex Cumbicus Saavedra
 
The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)Cloudera, Inc.
 
MDE-experiments
MDE-experimentsMDE-experiments
MDE-experimentsmiso_uam
 
Real-time Data Processing using AWS Lambda
Real-time Data Processing using AWS LambdaReal-time Data Processing using AWS Lambda
Real-time Data Processing using AWS LambdaAmazon Web Services
 
The Power of the Log
The Power of the LogThe Power of the Log
The Power of the LogBen Stopford
 

Viewers also liked (20)

Enabling the Connected Car Revolution

Enabling the Connected Car Revolution
Enabling the Connected Car Revolution

Enabling the Connected Car Revolution

 
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1

 
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
 
Top 5 IoT Use Cases
Top 5 IoT Use CasesTop 5 IoT Use Cases
Top 5 IoT Use Cases
 
Securing the Data Hub--Protecting your Customer IP (Technical Workshop)
Securing the Data Hub--Protecting your Customer IP (Technical Workshop)Securing the Data Hub--Protecting your Customer IP (Technical Workshop)
Securing the Data Hub--Protecting your Customer IP (Technical Workshop)
 
Apache Beam
Apache Beam Apache Beam
Apache Beam
 
Kudu Forrester Webinar

  • 1. Apache Kudu Webinar Series: Understanding and Unlocking the Value of Real-Time Data. Ryan Lippert | Cloudera; Michele Goetz | Forrester (Special Guest). © Cloudera, Inc. All rights reserved.
  • 2. Kudu Webinar Series
      Part 1: Lambda Architectures – Simplified by Apache Kudu. A look into the potential trouble involved with a lambda architecture, and how Apache Kudu can dramatically simplify real-time analytics.
      Part 2: Extending the Capabilities of Operational and Analytical Databases. An examination of how Apache Kudu expands the set of use cases that Cloudera’s Operational and Analytical databases can handle.
      Part 3: Data-in-Motion: Unlock the Value of Real-Time Data. Forrester will discuss their research into real-time data pipelines and analytics, and Cloudera will discuss how to make it a reality.
      Part 4: Technical Deep-Dive into Apache Kudu. An in-depth examination of the technical architecture and design of Apache Kudu, straight from a PMC Member.
  • 3. Updateable Analytic Storage: Simple real-time analytics and updates with Apache Kudu
      Kudu: Storage for fast analytics on fast data
        • Simplified architecture for building real-time analytic applications
        • Designed for next-generation hardware for faster analytic performance across frameworks
        • Native Hadoop storage engine
      Flexibility for the right tools for the right use case in one platform
        • Only analytic database for big data with Kudu + Impala
        • Simple real-time applications with Kudu + Spark
      Use cases
        • Time series data
        • Machine data analytics
        • Online reporting
      [Platform diagram. Integrate: structured (Sqoop), unstructured (Kafka, Flume). Process, analyze, serve: batch (Spark, Hive, Pig, MapReduce), stream (Spark), SQL (Impala), search (Solr), other (Kite). Unified services: resource management (YARN), security (Sentry, RecordService). Store: NoSQL (HBase), other (Object Store), filesystem (HDFS), relational (Kudu).]
  • 4. Real-Time Data: Ingest data of any type or volume. Process data as it arrives. Serve data to users and applications.
  • 5. Agenda
      Drivers for agile, real-time data platforms: What are the key use cases driving businesses toward real-time platforms?
      Data on adoption trends for real-time technologies: What is Forrester seeing in the market for real-time technologies?
      Deploying a real-time OSS architecture to grow your business: How can you build a scalable, cost-effective platform to grow your business?
  • 6. Michele Goetz, Special Guest Speaker. Principal Analyst Serving Enterprise Architecture Professionals. © 2017 FORRESTER. REPRODUCTION PROHIBITED.
  • 7. Agenda
      Drivers for agile, real-time data platforms: What are the key use cases driving businesses toward real-time platforms?
      Data on adoption trends for real-time technologies: What is Forrester seeing in the market for real-time technologies?
      Deploying a real-time OSS architecture to grow your business: How can you build a scalable, cost-effective platform to grow your business?
  • 8. Superior CX depends on data and insights
  • 9. Fraud and risk management requires real-time data
  • 10. IoT heat map shows where data matters most, now
  • 11. Data bottlenecks are catalysts for transition
  • 12. Create a road map for a real-time, agile data platform
  • 13. Agenda
      Drivers for agile, real-time data platforms: What are the key use cases driving businesses toward real-time platforms?
      Data on adoption trends for real-time technologies: What is Forrester seeing in the market for real-time technologies?
      Deploying a real-time OSS architecture to grow your business: How can you build a scalable, cost-effective platform to grow your business?
  • 14. Leaders are focused on the technologies that allow data and insights to be consumed across the organization
      What are your firm's plans for the following data driven initiatives?
      (Expanding/Implemented; Planning to implement within the next 12 months)
        Creating an organizational center of excellence for business intelligence: 51%; 22%
        Combine content management and data management programs into a unified information management program: 51%; 22%
        Changing our processes to promote data stewardship and sharing: 51%; 22%
        Investing in platforms to and share out data content: 51%; 22%
        Creating a business led data stewardship or governance program: 51%; 22%
        Changing management incentives to promote data sharing: 49%; 24%
        Implementing analytics insights in software systems to aid customers or support employee decisions: 52%; 22%
        Investing more in business friendly, self-service visualization and analytics: 52%; 23%
        Engaging external services providers or strategic business consultants for data and analytics or insights services: 54%; 22%
        Providing data preparation tools for self-service data management: 54%; 23%
        Investing in distributed real time insight delivery technology: 58%; 22%
      Base: 3005 global data and analytics decision-makers. Source: Business Technographics® Global Data & Analytics Survey, 2016
  • 15. 15© 2017 FORRESTER. REPRODUCTION PROHIBITED. 54% of data and analytics technology decision-makers increased their budgets for data and analytics from 2015 to 2016 Which of the following describes your [TDM=”IT budget data and analytics technology or services”; BDM=”business budget for data and analytics technology or services”] from 2015 to 2016? Decrease by 5% to 10%: 4%; Don’t know: 5%; Decrease by 1-4%: 6%; Increase by more than 10%: 6%; Increase by 5% to 10%: 22%; Increase by 1-4%: 26%; Stay about the same: 30% Base: 325 global data and analytics technology decision-makers. “Don’t know” not shown. Source: Business Technographics® Global Data & Analytics Survey, 2016
  • 16. 16© 2017 FORRESTER. REPRODUCTION PROHIBITED. Companies of all sizes are spending millions for data & analytics Please estimate, in millions, how much your data and analytics budget is for 2016? (Note: Number is in US Dollars) Each bracket is listed as SMB (20-999 employees)*, then Enterprise (1,000 or more employees): Less than $1 million: 55%, 32%; $1 million to under $10 million: 22%, 30%; $10 million to under $100 million: 9%, 13%; $100 million to under $500 million: 1%, 4%; $500 million to under $1 billion: 1%, 2%; $1 billion to under $5 billion: 0%, 2%; $5 billion or more: 0%, 1% Note: Don’t know excluded. Base: 765*, 1,288 global data and analytics decision makers Source: Business Technographics® Global Data & Analytics Survey, 2016
  • 17. 17© 2017 FORRESTER. REPRODUCTION PROHIBITED. Among the DM technologies Forrester tracks, interest for stream processing tools has grown the most YoY What are your firm's plans to use the following data management technologies? Base: 2094 and *1805 global data and analytics technology decision-makers. Source: Business Technographics® Global Data & Analytics Survey, 2016 Each technology is listed as % with commitment (expanding, implemented, or planning to implement in the next 12 months), earlier survey → later survey with the YoY change, then % with interest, but no immediate plans, earlier survey → later survey: Stream processing tools: 59% → 64% (+5 p.p.), 20% → 13%; Inverted index database: 61% → 64% (+3 p.p.), 19% → 13%; Distributed NoSQL databases: 63% → 61% (-2 p.p.), 19% → 16%; Hadoop: 63% → 62% (-1 p.p.), 20% → 14%; Associative index databases: 60% → 58% (-2 p.p.), 19% → 14%; RDF, triple store: 59% → 56% (-3 p.p.), 19% → 13%
  • 18. 18© 2017 FORRESTER. REPRODUCTION PROHIBITED. Streaming analytics high in the list of big data plans Which of the following are included in your plans for big data? NoSQL other than Hadoop: 16%; A MPP (massively parallel processing) data warehouse: 18%; Semantic technologies (ontology building, search, auto curation, graph, etc.): 22%; Hadoop (including HBase or Accumulo): 23%; Data anonymization or de-identification: 23%; Creating or building out a data lake: 26%; Marketing or digital data management platforms and service providers that brand their offerings as big data: 26%; Packaged analytics technologies that brand themselves as big data: 27%; Unstructured data mining / analytics: 28%; Distributed in memory databases, grids, analytics tools: 30%; Streaming analytics / computing: 33%; Large scale predictive modeling, data mining or other advanced analytics: 36%; Public cloud big data services: 40% Base: Total: 2094 Source: Business Technographics® Global Data & Analytics Survey, 2016
  • 19. 19© Cloudera, Inc. All rights reserved. Agenda Drivers for agile, real-time data platforms: What are the key use cases driving businesses towards real-time platforms? Data on adoption trends for real-time technologies: What is Forrester seeing in the market for real-time technologies? Deploying a real-time OSS architecture to grow your business: How can you build a scalable, cost-effective platform to grow your business?
  • 20. 20© Cloudera, Inc. All rights reserved. Trend Towards Real-Time Data Platforms is Clear Drivers for Real-Time Platforms • Enhancing customer experiences • Risk Management • Advancement of IoT and broader instrumentation Adoption is Accelerating • Top data-driven initiative by investment: distributed delivery of real-time data • DM technology with highest momentum: stream processing • Top big data plans: streaming analytics is top 3 • Broad, large investments: 90% of decision makers are either continuing or increasing their investments in data and analytics; millions/billions being spent
  • 21. 21© Cloudera, Inc. All rights reserved. The Underlying Driver What drives a use case to real-time? High Frequency Trading APT Detection Fraud Detection Predictive Maintenance Next Best Offer Inventory Management Shipping/Logistic Systems CRM Systems Employee Management Strategic Planning Real-time data management use cases are defined by a common set of characteristics. • Narrow time window in which to make a decision (automated or manual) • Opportunity for the data points to change the decision path • Decreasing value of data over time Not all use cases have a pressing need for real-time data. • Broader strategic decisions, for example, do not require real-time data input • Over time, decreases in HW costs and increases in availability of real-time systems will lead most use cases to be conducted in real-time Real Time Some Latency Acceptable
  • 22. 22© Cloudera, Inc. All rights reserved. Moving to Real-Time and Leveraging Analytics What do we have to gain? “Monitoring System” Sensors are automatically monitored and programmed to deliver warnings when readings are delivered outside of an “optimal zone”. Basic models developed over small subsets of data. “Predictive System” Ingestion and processing of all sensor data into an unlimited data store with analytic capabilities enables machine learning, which can provide automated optimization and predictive maintenance. “Only 1 percent of data from an oil rig with 30,000 sensors is examined. The data that are used today are mostly for anomaly detection and control, not optimization and prediction, which provide the greatest value.” - McKinsey & Company Traditional Architectures Real-Time Analytic Capabilities
  • 23. 23© Cloudera, Inc. All rights reserved. Ingest data of any type or volume Process data as it arrives Serve data to users and applications Real-Time Data
  • 24. 24© Cloudera, Inc. All rights reserved. Ingestion at Cloudera • Apache Sqoop for data from relational databases • Apache Flume for logs and event-based data • Apache Kafka for fast, scalable, and fault-tolerant messaging Partners, such as Streamsets, provide rich visualization tools Ingestion in Real-Time Stream Ingestion is a Must for Many Use Cases Ingestion isn’t just about internal business data anymore. • Traditional ingestion was internally focused, and often a matter of moving data from one silo or system to another • Today, businesses aim to take in data from a variety of external sources, IoT sensors, and machine-generated (user/network) data Your data journey can’t start until the data arrives. • Each step of the ingest/process/serve data pipeline must occur at real-time speed if decisions are to be made in time to affect the course of business Visualization helps practitioners understand their data. • Complex tasks can be made less complex via graphical representations; data ingestion is no different
  • 25. 25© Cloudera, Inc. All rights reserved. Stream Processing at Cloudera Spark Streaming, the leading open-source framework for real-time use cases, is deployed in Cloudera’s real-time architectures. Cloudera has the broadest base of Hadoop-adjacent experience with Spark and integrating it with Apache components. Ingestion in Real-Time Unlocking Value at Speed For some use cases, batch just isn’t enough. • Batch processing can lead to bottlenecks and delays in data transformations that cause missed opportunities. Apache Spark is gaining momentum for a reason. • Leveraging Apache Spark for stream processing enables real-time use cases with sub-second latency and best-in-class APIs. Spark has a best-in-class ecosystem. • Machine learning (via MLlib) is seamlessly integrated into Spark. • Broadest set of vendors and contributors working on Spark among available processing engines, leading to rapid innovation.
  • 26. 26© Cloudera, Inc. All rights reserved. Data Serving at Cloudera Apache Kudu provides batch analysis and real-time serving within the same storage layer Apache HBase yields the best read/write performance Cloudera Search enables SQL-like faceted search in natural language Apache Kafka can be used to serve data to applications and users Serving in Real-Time Inject Data into Real-Time Decisions You need options that suit your use case. • Platform proliferation hurts IT departments as skillsets are divided; fewer platforms with broad capabilities help. Apache Kudu changes the game for open source software. • Combining real-time serving with analytic scans through a relational database had taken a complex lambda architecture until Kudu • Together, simplification and affordability should drive more use cases to real-time automated processes, in turn driving increased revenue, decreased risk, and better service for companies deploying Kudu
  • 27. 27© Cloudera, Inc. All rights reserved. HDFS Fast Scans, Analytics and Processing of Stored Data Fast On-Line Updates & Data Serving Arbitrary Storage (Active Archive) Fast Analytics (on fast-changing or frequently-updated data) Apache Kudu: Filling the Analytic Gap Unchanging Fast Changing Frequent Updates HBase Append-Only Real-Time Kudu Kudu fills the Gap Modern analytic applications often require complex data flow & difficult integration work to move data between HBase & HDFS Analytic Gap Pace of Analysis Pace of Data
  • 28. 28© Cloudera, Inc. All rights reserved. Real-Time Data Analysis at Work Customer 360  “Next Best Offer 2.0” Kafka Spark Streaming Kudu Spark MLlib Application Data Sources Individual Session Customer Interaction Spark Full Model/Learning Data Request Sent For Stream Processing Data Cleaned/Ordered/Processed, Then Delivered to Kudu for Modelling User’s navigation returns the results they are looking for, in addition to offers and suggestions hyper-customized for them. Illustrative, models will likely have >2 dimensions
  • 29. 29© Cloudera, Inc. All rights reserved. Machine Learning Kudu opens the door to machine learning Kudu provides the ability to leverage real-time updates and analytic scans together - critical for many machine learning applications. Source: GHOSTS IN THE MACHINE: Artificial intelligence, risks and regulation in financial markets
  • 30. 30© Cloudera, Inc. All rights reserved. The Time for Real-Time Data and Analytics is Now. And the platform for it is Cloudera Enterprise.
  • 31. 31© Cloudera, Inc. All rights reserved.
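
The "Monitoring System" described on slide 22, where sensors deliver warnings when readings fall outside an "optimal zone", can be illustrated with a minimal sketch. This is not taken from the deck: the names `OptimalZone` and `find_warnings` and the sample thresholds are hypothetical, chosen only to show the idea of a fixed-range check.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class OptimalZone:
    """Hypothetical acceptable range for a single sensor (inclusive bounds)."""
    low: float
    high: float


def find_warnings(readings: Iterable[float], zone: OptimalZone) -> List[Tuple[int, float]]:
    """Return (position, value) for every reading outside the optimal zone."""
    return [
        (i, value)
        for i, value in enumerate(readings)
        if not (zone.low <= value <= zone.high)
    ]


if __name__ == "__main__":
    # Illustrative temperature readings; the thresholds are assumptions.
    zone = OptimalZone(low=60.0, high=80.0)
    print(find_warnings([72.5, 81.2, 59.9, 75.0], zone))
```

A check like this only flags anomalies one reading at a time, which is the limitation the slide contrasts with a "Predictive System" that stores and analyzes all sensor data.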

Editor's Notes

  1. Ingest: Collecting the Data Today’s data-in-motion conversation, like the data journey itself, starts with ingestion. The increase in sensor-generated data associated with IoT, combined with the demands for social media data collection, has created a deluge of unstructured data that is difficult for organizations to contend with. As a common initial bottleneck in the data-in-motion journey, organizations often reach for a robust ingestion solution. However, it’s important to understand ingestion as part of a broader real-time data context; it’s a critical component, but only the first of three. Cloudera takes an open-source approach to ingestion, as it does with all three stages of the data-in-motion journey. Identifying the need for a streaming data capture system, Cloudera led the development of Apache Flume, the open standard for collecting and moving a vast amount of log data. The subsequent integration of Flume with Apache Kafka created an ingest architecture that has been replicated across Cloudera’s customer base in a variety of use cases. With Flume and Kafka, Cloudera deploys the leading streaming ingest platform. Flume can provide lightweight agents deployed on edge nodes that number in the hundreds or thousands, each of which can be tiered to enable efficient ingest topologies. The integration between Kafka and Flume is bidirectional, meaning either component can be a producer or consumer of data depending on the specifics of your use case. A rising trend in data ingestion is the use of a rich visual interface that enables a user to interact with their ingestion architecture in an easy-to-use manner. While Cloudera delivers all the functionality underneath, we work with best-in-class partners such as Streamsets, Cask, and others to deliver rich visualization. This enables Cloudera to focus on our core competency of data management, while enabling vendors with large engineering teams dedicated to visualization to focus on theirs.
Portability, neutrality, and a history of success for companies like Informatica, Talend, and others in similar spaces create the best experience for our customers.
  2. Cloudera relies on Spark Streaming to process data once it is ingested. As the leading open-source processing framework for real-time use cases, Spark Streaming is an open standard and one of the most easily-recognizable components of the broader Apache Hadoop™ ecosystem. Cloudera has the broadest base of Hadoop-adjacent experience with Apache Spark™ and Spark Streaming; this is a product of early adoption and integration of these projects into Cloudera Enterprise. Spark Streaming provides the strongest processing solution for data-in-motion use cases as a result of: • Best-in-class performance: - High throughput ensures that jobs will not bottleneck at the processing stage - Sub-second latency enables real-time capabilities • Best-in-class API and Features: - Easy-to-use SQL based APIs for authoring streaming jobs help expand the number of use cases and value of data in motion - “Exactly once” stream processing semantics help ensure accuracy - Sliding window computations enable fast insights into time period data slices - Built-in APIs for maintaining and updating in-memory information • Best-in-class ecosystem: - Largest set of vendors working with and around Spark among available processing engines, enabling access to latest innovations - Broadest and deepest machine learning library (MLlib) is seamlessly integrated Spark Streaming from Cloudera, in particular, benefits users through the most robust integration into the ingestion and serving phases that bookend the data-in-motion story. This integration ensures a fast, easy, and secure delivery of processed data to the serving stage of data in motion.
  3. Whereas ingestion and processing have a relatively consistent flow irrespective of use case, the serving phase of a data-in-motion solution requires a variety of options in order to deliver the right data, to the right place, at the right time. Without this ability to quickly serve data to decision points, a solution loses its real-time capability and ceases to be a data-in-motion solution. Cloudera has a variety of options that help serve the diverse needs of individual use cases: • Apache Kudu™: A new, Cloudera-initiated Apache project, Kudu offers the unique ability to do fast scans on fast data. With an overwhelming number of data-in-motion use cases requiring analysis or visualization of streaming data, Kudu can enable the required batch analysis and real-time serving within the same storage layer. • Apache HBase™: HBase offers the best random read/write performance of any component within the Hadoop ecosystem. This capability, combined with high levels of concurrent access, enables online applications and operational needs that require the ability to query the latest data. • Cloudera Search: Powered by Apache Solr™, Cloudera Search democratizes data by enabling non-technical users to perform SQL-like, faceted search in natural language. Solr’s native integration into Cloudera Enterprise generates faster and more secure results. • Apache Kafka: Kafka’s fast, scalable, and durable design enables hundreds of megabytes of reads and writes per second, from thousands of clients. In addition to playing a role in ingestion, Kafka can be used to serve data to applications and users. This “last mile” step in the data-in-motion story is arguably the most critical step, which is why this breadth of options is necessary. Each use case, including the tendencies and workflows of the expected users, requires a different set of data access capabilities.
Cloudera can meet any requirement through these tools, and can do so as the final step in an end-to-end data-in-motion story.
  4. Kudu allows you to have your cake and eat it too
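
The "sliding window computations" mentioned in note 2 can be illustrated with a small, framework-free sketch. This is plain Python rather than the Spark Streaming API, and the function name `sliding_window_sums`, the event layout `(timestamp_seconds, value)`, and the assumption that events arrive in time order are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Event = Tuple[float, float]  # (timestamp in seconds, value); an assumed layout


def sliding_window_sums(events: Iterable[Event], window_seconds: float) -> Iterator[Event]:
    """For each time-ordered event, yield (timestamp, sum of values in the trailing window).

    The window covers the half-open interval (timestamp - window_seconds, timestamp].
    """
    window: Deque[Event] = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have aged out of the trailing window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total


if __name__ == "__main__":
    events = [(0, 1.0), (1, 2.0), (2, 3.0), (5, 4.0)]
    for ts, window_sum in sliding_window_sums(events, window_seconds=3):
        print(ts, window_sum)
```

Keeping a running total and evicting expired events means each event is added and removed at most once, which is why windowed aggregates can be maintained continuously as data arrives instead of being recomputed in batch.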