2. Agenda
PROPRIETARY & CONFIDENTIAL
• Introduction to operational data applications
• Challenges with building operational data applications on Hadoop
• Goals and Motivation for CDAP
• Introduction to CDAP and Architecture Overview
• Building Blocks
  • Datasets
  • Programs
  • Application and Application Template
• Use-cases
3. What are Operational Data Applications?
Applications that use data insights to enhance the customer or user experience, achieve a business objective, or improve a business process.
4. Examples
• 360-Degree Customer View
• Recommendation Engine
• Predictive Modeling
• Fraud Analysis
• Network Threat Detection
• Telemetry
• Time Series Analysis
• And many more
10. Motivation
• Simple yet powerful platform for developers to build applications on Hadoop
• Expose capabilities rather than features
• Make Hadoop accessible to developers with no Hadoop knowledge
11. Goals
• Unified platform for building solutions on Hadoop
• Simpler application development lifecycle
• Reusable Data and Processing Patterns
• Framework-level correctness and consistency
20. Building Blocks
• Dataset: Encapsulated data access patterns and data model in a reusable, domain-specific API
• Program: Standardized containers for processing paradigms
• Application: Programmatic abstraction for composing multiple Datasets and Programs that integrates ingestion, exploration, transformation and serving
22. Dataset Motivation
RDBMS (raw storage, with interfaces, data modeling, data layout, optimizations and schema on top)
• Optimizations are pushed closer to storage
• Applications use SQL to access data (store or retrieve)
• Simpler Applications!
Hadoop (raw storage)
• Modeling, layout and optimizations are embedded within applications
• Hard to scale - lack of reusability
Dataset (raw distributed storage, with model, layout, optimizations and optional schema)
• Access through domain-specific APIs with optional SQL Interface
• Optimizations are encapsulated within datasets
• Simpler Applications!
23. Building Blocks - Dataset
• Encapsulate a data access pattern and data model in a reusable, domain-specific API
• Establishes best practices in schema definition
• Abstract away underlying storage platform
• Reusable as data storage templates
• Easy sharing of stored data:
  • Between applications
  • Batch and real-time processing
• Integrated with Transactions for consistency
• Integrated testing
• Extensible to create your own solutions
• Transparent Integration with
  • Hive metastore
  • MR Input/Output Formats
  • Spark RDDs
24. Dataset - Types
• Secondary Indexes
  • Example use case: Entity storage - store customer records indexed by location
• Object Mapping
  • Example use case: Entity storage - easily store User instances for user profiles
• Timeseries Data
  • Example use case: any data organized around a time dimension
• Data Cube
  • Example use case: Retail product sales reports, web analytics
• Partitioned Fileset
  • Example use case: Time partitioned processing of feeds
• And many more
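To make the Secondary Indexes use case concrete, the following is a minimal in-memory sketch of the idea, not CDAP's own indexed dataset. The class name IndexedCustomerStore, its nested Customer type and all method names are assumptions made for illustration; a real dataset would keep both the records and the index in distributed storage.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical stand-in for an indexed dataset: a primary map keyed by
 * customer id plus a secondary index from location to customer ids.
 */
final class IndexedCustomerStore {

  /** Hypothetical customer record; not a CDAP class. */
  static final class Customer {
    final String id;
    final String name;
    final String location;

    Customer(String id, String name, String location) {
      this.id = id;
      this.name = name;
      this.location = location;
    }
  }

  private final Map<String, Customer> byId = new HashMap<>();
  private final Map<String, Set<String>> idsByLocation = new HashMap<>();

  /** Writes the record and keeps the secondary index in step with it. */
  void put(Customer customer) {
    Customer previous = byId.put(customer.id, customer);
    if (previous != null) {
      // Drop the stale index entry when an existing customer is overwritten.
      Set<String> old = idsByLocation.get(previous.location);
      if (old != null) {
        old.remove(previous.id);
        if (old.isEmpty()) {
          idsByLocation.remove(previous.location);
        }
      }
    }
    idsByLocation.computeIfAbsent(customer.location, k -> new TreeSet<>()).add(customer.id);
  }

  /** Primary-key lookup. */
  Customer get(String id) {
    return byId.get(id);
  }

  /** Finds records through the index instead of scanning every row. */
  List<Customer> findByLocation(String location) {
    List<Customer> result = new ArrayList<>();
    Set<String> ids = idsByLocation.get(location);
    if (ids != null) {
      for (String id : ids) {
        result.add(byId.get(id));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    IndexedCustomerStore store = new IndexedCustomerStore();
    store.put(new Customer("c1", "Ada", "Berlin"));
    store.put(new Customer("c2", "Grace", "Austin"));
    store.put(new Customer("c3", "Linus", "Berlin"));
    for (Customer c : store.findByLocation("Berlin")) {
      System.out.println(c.id + " " + c.name);
    }
  }
}
```

Callers only see put, get and findByLocation; how the index is laid out and maintained stays inside the class, which is the kind of encapsulation the dataset types above are meant to provide.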
25. Dataset - Example
• A Java Library
• Table Dataset
  • First Name, Last Name and Link to Picture in a Table
• Fileset Dataset
  • Pictures in a Fileset
• Instance of Dataset as
  • HBase Table and
  • HDFS Directory
• Access using SQL (Hive)
• Tigon, MR & Spark can access
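The example above, a dataset that combines a Table holding names and a picture link with a Fileset holding the pictures, can be sketched in plain Java as follows. This is only an illustration under stated assumptions: UserProfileDataset, Profile, the method names and the "pictures/<id>.img" path layout are all hypothetical, and two in-memory maps stand in for the HBase table and the HDFS directory.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical composite dataset: one domain-specific API in front of
 * a table (profile rows) and a fileset (picture bytes).
 */
final class UserProfileDataset {

  /** Row as it would sit in the table: names plus a link to the picture. */
  static final class Profile {
    final String firstName;
    final String lastName;
    final String pictureLink;

    Profile(String firstName, String lastName, String pictureLink) {
      this.firstName = firstName;
      this.lastName = lastName;
      this.pictureLink = pictureLink;
    }
  }

  // Stand-in for the Table dataset (an HBase table in the slide's example).
  private final Map<String, Profile> table = new HashMap<>();
  // Stand-in for the Fileset dataset (an HDFS directory in the slide's example).
  private final Map<String, byte[]> fileset = new HashMap<>();

  /** Stores the picture in the fileset and the row, with its link, in the table. */
  void save(String userId, String firstName, String lastName, byte[] picture) {
    String link = "pictures/" + userId + ".img"; // hypothetical path layout
    fileset.put(link, picture.clone());
    table.put(userId, new Profile(firstName, lastName, link));
  }

  /** Returns the profile row, or null if the user is unknown. */
  Profile profile(String userId) {
    return table.get(userId);
  }

  /** Follows the link stored in the table to fetch the picture bytes. */
  byte[] picture(String userId) {
    Profile p = table.get(userId);
    if (p == null) {
      return null;
    }
    byte[] bytes = fileset.get(p.pictureLink);
    return bytes == null ? null : bytes.clone();
  }

  public static void main(String[] args) {
    UserProfileDataset ds = new UserProfileDataset();
    ds.save("u42", "Ada", "Lovelace", new byte[] {1, 2, 3});
    Profile p = ds.profile("u42");
    System.out.println(p.firstName + " " + p.lastName + " -> " + p.pictureLink);
    System.out.println(ds.picture("u42").length + " bytes");
  }
}
```

An application using this class never decides where pictures live or how rows are keyed; swapping the two maps for real storage would not change the calling code.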
33. Building Blocks - Application
Programmatic abstraction for composing a use case by combining Datasets and Programs to perform ingestion, transformation and serving.
public class PurchaseApp extends AbstractApplication {

  @Override
  public void configure() {
    . . .
    addStream(new Stream("purchaseStream"));
    createDataset("frequentCustomers", KeyValueTable.class);
    createDataset("userProfiles", KeyValueTable.class);
    addFlow(new PurchaseFlow());
    addWorkflow(new PurchaseHistoryWorkflow());
    addService(new PurchaseHistoryService());
    addService(UserProfileServiceHandler.SERVICE_NAME, new UserProfileServiceHandler());
    addService(new CatalogLookupService());
    try {
      createDataset("history", PurchaseHistoryStore.class, PurchaseHistoryStore.properties());
      ObjectStores.createObjectStore(getConfigurer(), "purchases", Purchase.class);
    } catch (UnsupportedTypeException e) {
      throw new RuntimeException(e);
    }
  }
}
34. Building Blocks - Application Template
• Is a use-case Blueprint
• Composed using one or more Programs and Datasets
• Supports real-time or batch or combination
• Highly reusable through configuration & extensible through plugins
• Is an application that is reusable through configuration and extensible through plugins.
• Plugins extend the Application Template by implementing an interface expected by the template.
• Supported with an end-to-end testing framework
[Diagram: an Application Template exposes a Pluggable Interface; Adapter1, Adapter2 and Adapter3 each combine a Plugin with a configuration (Config1, Config2, Config3).]
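The relationship between a template, its pluggable interface and per-adapter configuration can be sketched in plain Java. This is a sketch under stated assumptions rather than CDAP's template API: TemplateSketch, RecordTransform, PipelineTemplate, the two plugins and the "plugin" and "prefix" configuration keys are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

final class TemplateSketch {

  /** Hypothetical interface the template expects every plugin to implement. */
  interface RecordTransform {
    void configure(Map<String, String> config);
    String apply(String record);
  }

  /** Plugin 1: upper-cases each record and needs no settings. */
  static final class UpperCase implements RecordTransform {
    public void configure(Map<String, String> config) { }
    public String apply(String record) {
      return record.toUpperCase(Locale.ROOT);
    }
  }

  /** Plugin 2: prepends a prefix taken from the configuration. */
  static final class Prefix implements RecordTransform {
    private String prefix = "";
    public void configure(Map<String, String> config) {
      prefix = config.getOrDefault("prefix", "");
    }
    public String apply(String record) {
      return prefix + record;
    }
  }

  /** The template: fixed pipeline logic, with the plugin chosen by configuration. */
  static final class PipelineTemplate {
    private final Map<String, Supplier<RecordTransform>> registry = new HashMap<>();

    void register(String name, Supplier<RecordTransform> factory) {
      registry.put(name, factory);
    }

    /** Runs one adapter, i.e. the template plus a plugin name and its settings. */
    List<String> run(Map<String, String> config, List<String> input) {
      String pluginName = config.get("plugin");
      Supplier<RecordTransform> factory = registry.get(pluginName);
      if (factory == null) {
        throw new IllegalArgumentException("Unknown plugin: " + pluginName);
      }
      RecordTransform transform = factory.get();
      transform.configure(config);
      List<String> output = new ArrayList<>();
      for (String record : input) {
        output.add(transform.apply(record));
      }
      return output;
    }
  }

  public static void main(String[] args) {
    PipelineTemplate template = new PipelineTemplate();
    template.register("upper", UpperCase::new);
    template.register("prefix", Prefix::new);

    Map<String, String> config1 = new HashMap<>();
    config1.put("plugin", "upper");
    Map<String, String> config2 = new HashMap<>();
    config2.put("plugin", "prefix");
    config2.put("prefix", "id-");

    List<String> input = Arrays.asList("a", "b");
    System.out.println(template.run(config1, input)); // prints [A, B]
    System.out.println(template.run(config2, input)); // prints [id-a, id-b]
  }
}
```

The same PipelineTemplate serves both adapters; only the configuration differs, and a new behavior is added by registering another RecordTransform rather than by changing the template.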
35. Use-cases
• Scalable and reliable real-time business critical analytics
• Closed Loop Recommendation and Analytics
• Data Ingestion As A Service - Real-time and Batch
• Extendable and Reusable use-case blueprints
• Data As A Service
• Reduce application development and operational complexity
• ETL Automation - Real-time and Batch
36. Want to Learn More?
Open-source (Apache License v2)
Website:
http://cdap.io
Mailing List:
cdap-user@googlegroups.com
cdap-dev@googlegroups.com
IRC:
#cdap on freenode.net