This document discusses using Red Hat JBoss Middleware and Hortonworks to enable a modern data architecture. It provides an overview of Red Hat and JBoss Middleware and describes three use cases: 1) combining data from Hadoop with traditional sources using data virtualization, 2) federating across geographically distributed Hadoop clusters with data security, and 3) creating virtual data marts for a Hadoop data lake.
Neustar is a fast-growing provider of enterprise services in telecommunications, online advertising, Internet infrastructure, and advanced technology. Neustar engaged Think Big Analytics to leverage Hadoop to expand its data analysis capacity. This session describes how Hadoop has expanded Neustar's data warehouse capacity, increased agility for data analysis, reduced costs, and enabled new data products. We look at the challenges and opportunities in capturing hundreds of terabytes of compact binary network data, ad hoc analysis, integration with a scale-out relational database, more agile data development, and building new products that integrate multiple big data sets.
Hadoop Reporting and Analysis - Jaspersoft (Hortonworks)
Hadoop is deployed for a wide variety of uses, including web analytics, fraud detection, security monitoring, healthcare, environmental analysis, and social media monitoring.
Introduction to Microsoft HDInsight and BI Tools (DataWorks Summit)
This document discusses Hortonworks Data Platform (HDP) for Windows. It includes an agenda for the presentation which covers an introduction to HDP for Windows, integrating HDP with Microsoft tools, and a demo. The document lists the speakers and provides information on Windows support for Hadoop components. It describes what is included in HDP for Windows, such as deployment choices and full interoperability across platforms. Integration with Microsoft tools like SQL Server, Excel, and Power BI is highlighted. A demo of using Excel to interact with HDP is promised.
Introduction to Microsoft Azure HDInsight by Dattatrey Sindhol (HARMAN Services)
This document provides an introduction to Microsoft Azure HDInsight, including:
- An overview of HDInsight and how it is Microsoft's Hadoop distribution running in the cloud based on Hortonworks Data Platform.
- The architecture of HDInsight and how it is tightly integrated with Microsoft's technology stack.
- Examples of use cases for HDInsight like iterative data exploration, data warehousing on demand, and ETL automation.
This was presented at NHN on Jan. 27, 2009.
It introduces Big Data, its storage systems, and approaches to analyzing it. In particular, it covers the MapReduce debates and hybrid systems that combine RDBMSs with MapReduce. It also explains various schema-free, non-relational data stores.
20100806 cloudera 10 hadoopable problems webinar (Cloudera, Inc.)
Jeff Hammerbacher introduced 10 common problems that are suitable for solving with Hadoop. These include modeling true risk, customer churn analysis, recommendation engines, ad targeting, point of sale transaction analysis, analyzing network data to predict failures, threat analysis, trade surveillance, search quality, and using Hadoop as a data sandbox. Many of these problems involve analyzing large and complex datasets from multiple sources to discover patterns and relationships.
Richard McDougall discusses trends in big data and frameworks for building big data applications. He outlines the growth of data, how big data is driving real-world benefits, and early adopter industries. McDougall also summarizes batch processing frameworks like Hadoop and Spark, graph processing frameworks like Pregel, and real-time processing frameworks like Storm. Finally, he discusses interactive processing frameworks such as Hive, Impala, and Shark and how to unify the big data platform using virtualization.
1) Big data is growing exponentially and new frameworks like Hadoop are needed to analyze large, unstructured datasets.
2) Hadoop uses distributed computing and storage across commodity servers to provide scalable and cost-effective analytics. It leverages local disks on each node for temporary data to improve performance.
3) Virtualizing Hadoop simplifies operations, enables mixed workloads, and provides high availability through features like vMotion and HA. It also allows for elastic scaling of compute and storage resources.
Strata + Hadoop World 2012: Data Science on Hadoop: How Cloudera Impala Unloc... (Cloudera, Inc.)
This talk will cover what tools and techniques work and don’t work well for data scientists working on Hadoop today and how to leverage the lessons learned by the experts to increase your productivity as well as what to expect for the future of data science on Hadoop. We will leverage insights derived from the top data scientists working on big data systems at Cloudera as well as experiences from running big data systems at Facebook, Google, and Yahoo.
This document discusses real-time big data applications and provides a reference architecture for search, discovery, and analytics. It describes combining analytical and operational workloads using a unified data model and operational database. Examples are given of organizations using this approach for real-time search, analytics and continuous adaptation of large and diverse datasets.
Supporting Financial Services with a More Flexible Approach to Big Data (WANdisco Plc)
In this webinar, WANdisco and Hortonworks look at three examples of using 'Big Data' to get a more comprehensive view of customer behavior and activity in the banking and insurance industries. Then we'll pull out the common threads from these examples, and see how a flexible next-generation Hadoop architecture lets you get a step up on improving your business performance. Join us to learn:
- How to leverage data from across an entire global enterprise
- How to analyze a wide variety of structured and unstructured data to get quick, meaningful answers to critical questions
- What industry leaders have put in place
This document provides an overview of an advanced Big Data hands-on course covering Hadoop, Sqoop, Pig, Hive and enterprise applications. It introduces key concepts like Hadoop and large data processing, demonstrates tools like Sqoop, Pig and Hive for data integration, querying and analysis on Hadoop. It also discusses challenges for enterprises adopting Hadoop technologies and bridging the skills gap.
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme... (VMworld)
VMworld 2013
Abhishek Kashyap, Pivotal
Kevin Leong, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
The document provides an overview of big data analytics using Hadoop. It discusses how Hadoop allows for distributed processing of large datasets across computer clusters. The key components of Hadoop discussed are HDFS for storage, and MapReduce for parallel processing. HDFS provides a distributed, fault-tolerant file system where data is replicated across multiple nodes. MapReduce allows users to write parallel jobs that process large amounts of data in parallel on a Hadoop cluster. Examples of how companies use Hadoop for applications like customer analytics and log file analysis are also provided.
Big Data comes from a variety of sources, as human activities online generate vast amounts of data every day through intentional, accidental, and unknown means. This includes activity on social media, sensors, logs, and more. Content delivery networks (CDNs) can help distribute big data by caching content on servers located closer to users. While pushing content to CDNs offloads work from origin servers and improves performance, it also segments users and requires replication strategies to maintain consistency. Techniques include pre-computing static content from dynamic sources, pushing searches and other functions to CDNs, and experimenting with different cache models. Overall, CDNs can be an effective way to distribute big data, but they also introduce more complexity and dependence on the CDN.
Architecting Virtualized Infrastructure for Big Data (Richard McDougall)
This document discusses architecting virtualized infrastructure for big data. It notes that data is growing exponentially and that the value of data now exceeds hardware costs. It advocates using virtualization to simplify and optimize big data infrastructure, enabling flexible provisioning of workloads like Hadoop, SQL, and NoSQL clusters on a unified analytics cloud platform. This platform leverages both shared and local storage to optimize performance while reducing costs.
IBM Big Data: the marriage of Hadoop and data warehousing (DataWorks Summit)
This document discusses IBM's Big Data platform and the marriage of Hadoop and data warehousing. It covers how Big Data is driving new use cases across enterprises due to the 3Vs of volume, velocity and variety. It also discusses how Hadoop and data warehousing complement each other by providing massively parallel processing for analytics on all types of data at scale. The emergence of the Hadoop data warehouse is examined as the next generation Big Data platform that can provide timely insights from both structured and unstructured data.
Thousands of unsecured Hadoop clusters have been targets of attacks where criminals have deleted databases and files. According to reports, over 5,000 Hadoop installations were accessible on port 50070 without authentication, allowing attackers to destroy data nodes and snapshots containing terabytes of data within seconds. A study found nearly 4,500 servers with the Hadoop Distributed File System exposed over 5 petabytes of data. Many of these unsecured systems have likely already been compromised by attackers destroying data.
Infrastructure Considerations for Analytical Workloads (Cognizant)
Using Apache Hadoop clusters and Mahout for analyzing big data workloads yields extraordinary performance; we offer a detailed comparison of running Hadoop in a physical vs. virtual infrastructure environment.
The document discusses big data and MapReduce frameworks like Hadoop. It provides an overview of MapReduce and how it allows distributed processing of large datasets using simple map and reduce functions. The document also covers several common design patterns for MapReduce jobs, including filtering, sorting, joins, and computing statistics.
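As a concrete instance of the filtering pattern mentioned there, here is a minimal Hadoop mapper sketch in Java. The class name and the "ERROR" predicate are invented for illustration; such a job would typically run with zero reduce tasks so that matching records pass straight to output.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Filtering pattern: emit only the records that satisfy a predicate.
// Hypothetical example: keep log lines containing "ERROR".
public class ErrorLineFilterMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        if (line.toString().contains("ERROR")) {
            // Pass the matching record through unchanged; no aggregation,
            // so the job can set the number of reduce tasks to zero.
            context.write(line, NullWritable.get());
        }
    }
}
```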
The document discusses the rise of Big Data as a Service (BDaaS) and how recent technological advancements have enabled its emergence. It provides a brief history of Hadoop and how improvements in networking, storage, virtualization and containers have addressed earlier limitations. It defines BDaaS and describes the public cloud and on-premises deployment models. Finally, it highlights how BlueData's software platform can deliver an integrated BDaaS solution both on-premises and across multiple public clouds including AWS.
Evolution of Big Data at Intel - Crawl, Walk and Run Approach (DataWorks Summit)
Intel's big data journey began in 2011 with an evaluation of Hadoop. Since then, Intel has expanded its use of Hadoop and Cloudera across multiple environments. Intel's 3-year roadmap focuses on evolving its Hadoop platform to support more advanced analytics, real-time capabilities, and integrating with traditional BI tools. Key strategies include designing for scalability, following an iterative approach to understand data, and leveraging open source technologies.
VMware Serengeti - Based on Infochimps Ironfan (Jim Kaskade)
This document discusses virtualizing Hadoop for the enterprise. It begins with discussing trends driving changes in enterprise IT like cloud, mobile apps, and big data. It then discusses how Hadoop can address big, fast, and flexible data needs. The rest of the document discusses how virtualizing Hadoop through solutions like Project Serengeti can provide enterprises with elasticity, high availability, and operational simplicity for their Hadoop implementations. It also discusses how virtualization allows enterprises to integrate Hadoop with other workloads and data platforms.
The document discusses big data and Hadoop. It describes the three V's of big data - variety, volume, and velocity. It also discusses Hadoop components like HDFS, MapReduce, Pig, Hive, and YARN. Hadoop is a framework for storing and processing large datasets in a distributed computing environment. It allows for the ability to store and use all types of data at scale using commodity hardware.
Red Hat's document discusses using JBoss Data Virtualization to gain better insights from big data. It describes challenges with existing data integration approaches as data sources grow in size, type and location. Red Hat's big data strategy is to reduce the information gap by making all data easily consumable for analytics. JBoss Data Virtualization software virtually unifies data across sources and exposes it to applications through standard interfaces. The demonstration shows integrating social media sentiment data from Hadoop with sales data from MySQL to analyze movie ticket and merchandise sales.
Big data insights with Red Hat JBoss Data Virtualization (Kenneth Peeples)
You’re hearing a lot about big data these days. And big data and the technologies that store and process it, like Hadoop, aren’t just new data silos. You might be looking to integrate big data with existing enterprise information systems to gain better understanding of your business. You want to take informed action.
During this session, we’ll demonstrate how Red Hat JBoss Data Virtualization can integrate with Hadoop through Hive and provide users easy access to data. You’ll learn how Red Hat JBoss Data Virtualization:
Can help you integrate your existing and growing data infrastructure.
Integrates big data with your existing enterprise data infrastructure.
Lets non-technical users access big data result sets.
We’ll also provide typical use cases and examples, and a demonstration of integrating Hadoop sentiment analysis with sales data.
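As a hedged illustration of that demonstration (not the actual demo code), the sketch below issues one federated ANSI SQL query through a JBoss Data Virtualization server using the JDBC driver of Teiid, its upstream project. The VDB name, host, credentials, and schema/table names are all hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SentimentVsSalesSketch {
    public static void main(String[] args) throws Exception {
        // Teiid-style JDBC URL; "MoviesVDB", host, port, and credentials are
        // placeholders. The Teiid JDBC driver jar must be on the classpath
        // for DriverManager to locate it.
        String url = "jdbc:teiid:MoviesVDB@mm://dv-host:31000";
        try (Connection conn = DriverManager.getConnection(url, "user", "secret");
             Statement stmt = conn.createStatement();
             // One ANSI SQL statement against the virtual schema; the server
             // pushes one branch down to Hive (as HiveQL) and the other to MySQL.
             ResultSet rs = stmt.executeQuery(
                 "SELECT s.movie_title, s.avg_sentiment, t.ticket_sales "
               + "FROM hive_src.sentiment s "
               + "JOIN mysql_src.sales t ON s.movie_title = t.movie_title")) {
            while (rs.next()) {
                System.out.printf("%s sentiment=%.2f sales=%d%n",
                        rs.getString(1), rs.getDouble(2), rs.getLong(3));
            }
        }
    }
}
```

The point of the sketch is that the client sees a single relational source; the data virtualization server is responsible for translating and pushing each branch down to the underlying systems before joining the results.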
Hadoop Distributed File System (HDFS) presentation 27-5-2015 (Abdul Nasir)
Hadoop is a rapidly growing ecosystem of components, based on Google's MapReduce algorithm and file system work, for implementing MapReduce [3] algorithms in a scalable fashion on distributed commodity hardware. Hadoop enables users to store and process large volumes of data and analyze it in ways not previously possible with SQL-based approaches or less scalable solutions. Remarkable improvements in conventional compute and storage resources help make Hadoop clusters feasible for most organizations. This paper begins with a discussion of the evolution of Big Data [1][7][9] and its future based on Gartner's Hype Cycle. We explain how the Hadoop Distributed File System (HDFS) works and illustrate its architecture, then discuss Hadoop's MapReduce paradigm for distributing a task across multiple nodes, with sample data sets, along with how MapReduce and HDFS work when put together. The paper ends with a discussion of sample Big Data Hadoop use cases that show how enterprises can gain a competitive benefit by being early adopters of big data analytics. HDFS is the core component of the Apache Hadoop project; in HDFS, computation is carried out in the nodes where the relevant data is stored, and Hadoop also implements a parallel computational paradigm named MapReduce. In this paper, we measure the performance of read and write operations in HDFS for both small and large files, using a Hadoop cluster with five nodes. The results indicate that HDFS performs well for files larger than the default block size and poorly for files smaller than the default block size.
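For orientation, here is a minimal write-then-read sketch against the standard org.apache.hadoop.fs client API, the same path the paper measures. The NameNode address and file path are placeholders, and the small-file comment paraphrases the paper's block-size finding.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder address
        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/tmp/hello.txt");
            // Write: the client streams data to a pipeline of DataNodes.
            // Files smaller than the block size still cost a full NameNode
            // metadata entry, one reason many small files perform poorly.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }
            // Read the file back and copy it to stdout.
            try (FSDataInputStream in = fs.open(path)) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }
}
```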
VMworld 2013
Chris Greer, FedEx
Richard McDougall, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
Beyond Mission Critical: Virtualizing Big Data and Hadoop (Chiou-Nan Chen)
Virtualizing big data platforms like Hadoop provides organizations with agility, elasticity, and operational simplicity. It allows clusters to be quickly provisioned on demand, workloads to be independently scaled, and mixed workloads to be consolidated on shared infrastructure. This reduces costs while improving resource utilization for emerging big data use cases across many industries.
This document discusses using Red Hat JBoss Data Virtualization to gain better insights from big data. It describes how data challenges are getting bigger with the growth of big data, cloud, and mobile. Data virtualization software can virtually unify fragmented data across sources and make it available to applications as a single data source. The demo scenario shows how JBoss Data Virtualization is used to mashup sentiment analysis data from Hive with sales data from MySQL to determine if sentiment is a predictor of sales. A live demo then demonstrates integrating these different data sources through a JBoss Data Virtualization virtual data model.
Hortonworks and Red Hat Webinar - Part 2 (Hortonworks)
Learn more about creating reference architectures that optimize the delivery of the Hortonworks Data Platform. You will hear more about Hive and JBoss Data Virtualization security, and you will also see in action how to combine sentiment data from Hadoop with data from traditional relational sources.
Integration intervention: Get your apps and data up to speed (Kenneth Peeples)
SOA has been the de facto methodology for enterprise application and process integration, because loosely coupled components and composite applications are more agile and efficient. The perfect solution? Not quite.
The data’s always been the problem. The most efficient and agile applications and services can be dragged down by the point-to-point data connections of a traditional data integration stack. Virtualized data services can eliminate the friction and get your applications up to speed.
In this webinar we'll show you how to (replay at http://www.redhat.com/en/about/events/integration-intervention-get-your-apps-and-data-speed):
-Quickly and easily create a virtual data services layer to plug data into your SOA infrastructure for an agile and efficient solution
-Derive more business value from your services.
1. The document discusses security considerations for deploying big data as a service (BDaaS) across multiple tenants and applications. It focuses on maintaining a single user identity to prevent data duplication and enforce access policies consistently.
2. It describes using Apache Ranger to centrally define and enforce policies across Hadoop services like HDFS, HBase, Hive. Ranger integrates with LDAP/AD for authentication.
3. The key challenge is propagating user identities from the application layer to the data layer. This can be done by connecting HDFS directly via Kerberos or using a "super-user" that impersonates other users when accessing HDFS.
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su... (Abhiraj Butala)
The talk covers limitations of current Hadoop eco-system components in handling security (Authentication, Authorization, Auditing) in multi-tenant, multi-application environments. Then it proposes how we can use Apache Ranger and HDFS super-user connections to enforce correct HDFS authorization policies and achieve the required auditing.
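A minimal sketch of the "super-user" impersonation path mentioned above, using Hadoop's UserGroupInformation API. The Kerberos principal, keytab path, and user name are placeholders, and the cluster must additionally whitelist the service account via the hadoop.proxyuser.* settings for impersonation to be permitted.

```java
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // The platform's service account logs in once with its own keytab.
        UserGroupInformation.loginUserFromKeytab(
                "bdaas-service@EXAMPLE.COM", "/etc/security/keytabs/bdaas.keytab");

        // Impersonate the end user so HDFS authorization (e.g. Ranger
        // policies) and audit logs see "alice", not the service account.
        UserGroupInformation proxy = UserGroupInformation.createProxyUser(
                "alice", UserGroupInformation.getLoginUser());

        proxy.doAs((PrivilegedExceptionAction<Void>) () -> {
            try (FileSystem fs = FileSystem.get(conf)) {
                for (FileStatus st : fs.listStatus(new Path("/user/alice"))) {
                    System.out.println(st.getPath());
                }
            }
            return null;
        });
    }
}
```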
Virtualized Big Data Platform at VMware Corp IT @ VMworld 2015 (Rajit Saha)
At the VMware Corporate IT Data Solution and Delivery Team, we have built an enterprise advanced data analytics platform on top of vSphere 6.0 with VMware Big Data Extensions, Isilon HDFS, Pivotal HD 3.0, Spring XD 1.2, and Alpine Data Labs.
Enough talking about Big Data and Hadoop; let's see how Hadoop works in action.
We will locate a real dataset, ingest it into our cluster, connect it to a database, apply some queries and data transformations, save our results, and present them via a BI tool.
Analysis of historical movie data by BHADRA (Bhadra Gowdra)
A recommendation system understands a person's taste and automatically finds new, desirable content for them based on patterns among their likes and ratings of different items. In this paper, we propose a recommendation system, built on the Hadoop framework, for the large amount of data available on the web in the form of ratings, reviews, opinions, complaints, remarks, feedback, and comments about any item (product, event, individual, or service).
Overview of Big data, Hadoop and Microsoft BI - version1 (Thanh Nguyen)
Big Data and advanced analytics are critical topics for executives today. But many still aren't sure how to turn that promise into value. This presentation provides an overview of 16 examples and use cases that lay out the different ways companies have approached the issue and found value: everything from pricing flexibility to customer preference management to credit risk analysis to fraud protection and discount targeting. For the latest on Big Data & Advanced Analytics: http://mckinseyonmarketingandsales.com/topics/big-data
Overview of big data & hadoop version 1 - Tony Nguyen (Thanh Nguyen)
Overview of Big data, Hadoop and Microsoft BI - version1
Big Data and Hadoop are emerging topics in data warehousing for many executives, BI practices, and technologists today. However, many people still aren't sure how Big Data and existing data warehouses can be married to turn that promise into value. This presentation provides an overview of Big Data technology and how it can fit into the current BI/data warehousing context.
http://www.quantumit.com.au
http://www.evisional.com
Kenneth Peeples, a JBoss technology evangelist, presented on better business results through open source integration. The presentation included an overview of open source integration with data virtualization, a preview of JBoss middleware integration offerings, and integration examples to help attendees get started. Specifically, it provided information on Red Hat's data virtualization, messaging, integration/ESB, and service design products and how they can help organizations innovate faster through open hybrid cloud environments. It also presented sample use cases and implementations including big data integration with data virtualization and a travel triage application.
This document discusses Hadoop and its relationship to Microsoft technologies. It provides an overview of what Big Data is, how Hadoop fits into the Windows and Azure environments, and how to program against Hadoop in Microsoft environments. It describes Hadoop capabilities like Extract-Load-Transform and distributed computing. It also discusses how HDFS works on Azure storage and support for Hadoop in .NET, JavaScript, HiveQL, and Polybase. The document aims to show Microsoft's vision of making Hadoop better on Windows and Azure by integrating with technologies like Active Directory, System Center, and SQL Server. It provides links to get started with Hadoop on-premises and on Windows Azure.
Similar to Red Hat - Presentation at Hortonworks Booth - Strata 2014
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level (Hortonworks)
The HDF 3.3 release delivers several exciting enhancements and new features. But the most noteworthy of them is the addition of support for Kafka 2.0 and Kafka Streams.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-3-taking-stream-processing-next-level/
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy (Hortonworks)
Forrester forecasts* that direct spending on the Internet of Things (IoT) will exceed $400 Billion by 2023. From manufacturing and utilities, to oil & gas and transportation, IoT improves visibility, reduces downtime, and creates opportunities for entirely new business models.
But successful IoT implementations require far more than simply connecting sensors to a network. The data generated by these devices must be collected, aggregated, cleaned, processed, interpreted, understood, and used. Data-driven decisions and actions must be taken, without which an IoT implementation is bound to fail.
https://hortonworks.com/webinar/iot-predictions-2019-beyond-data-heart-iot-strategy/
Getting the Most Out of Your Data in the Cloud with Cloudbreak (Hortonworks)
Cloudbreak, a part of Hortonworks Data Platform (HDP), simplifies the provisioning and cluster management within any cloud environment to help your business toward its path to a hybrid cloud architecture.
https://hortonworks.com/webinar/getting-data-cloud-cloudbreak-live-demo/
Johns Hopkins - Using Hadoop to Secure Access Log Events (Hortonworks)
In this webinar, we talk with experts from Johns Hopkins as they share techniques and lessons learned in real-world Apache Hadoop implementation.
https://hortonworks.com/webinar/johns-hopkins-using-hadoop-securely-access-log-events/
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys (Hortonworks)
Cybersecurity today is a big data problem. There's a ton of data landing on you faster than you can load it, let alone search it. In order to make sense of it, we need to act on data-in-motion, using both machine learning and the most advanced pattern recognition system on the planet: your SOC analysts. Advanced visualization makes your analysts more efficient, helping them find the hidden gems, or bombs, in masses of logs and packets.
https://hortonworks.com/webinar/catch-hacker-real-time-live-visuals-bots-bad-guys/
We have introduced several new features as well as delivered some significant updates to keep the platform tightly integrated and compatible with HDP 3.0.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-2-release-raises-bar-operational-efficiency/
Curing Kafka Blindness with Hortonworks Streams Messaging Manager (Hortonworks)
With the growth of Apache Kafka adoption in all major streaming initiatives across large organizations, the operational and visibility challenges associated with Kafka are on the rise as well. Kafka users want better visibility in understanding what is going on in the clusters as well as within the stream flows across producers, topics, brokers, and consumers.
With no tools in the market that readily address the challenges of the Kafka Ops teams, the development teams, and the security/governance teams, Hortonworks Streams Messaging Manager is a game-changer.
https://hortonworks.com/webinar/curing-kafka-blindness-hortonworks-streams-messaging-manager/
Interpretation Tool for Genomic Sequencing Data in Clinical Environments (Hortonworks)
The healthcare industry—with its huge volumes of big data—is ripe for the application of analytics and machine learning. In this webinar, Hortonworks and Quanam present a tool that uses machine learning and natural language processing in the clinical classification of genomic variants to help identify mutations and determine clinical significance.
Watch the webinar: https://hortonworks.com/webinar/interpretation-tool-genomic-sequencing-data-clinical-environments/
IBM+Hortonworks = Transformation of the Big Data Landscape (Hortonworks)
Last year IBM and Hortonworks jointly announced a strategic and deep partnership. Join us as we take a close look at the partnership accomplishments and the conjoined road ahead with industry-leading analytics offers.
View the webinar here: https://hortonworks.com/webinar/ibmhortonworks-transformation-big-data-landscape/
The document provides an overview of Apache Druid, an open-source distributed real-time analytics database. It discusses Druid's architecture including segments, indexing, and nodes like brokers, historians and coordinators. It also covers integrating Druid with Hortonworks Data Platform for unified querying and visualization of streaming and historical data.
Accelerating Data Science and Real Time Analytics at Scale (Hortonworks)
Gaining business advantages from big data is moving beyond just the efficient storage and deep analytics on diverse data sources to using AI methods and analytics on streaming data to catch insights and take action at the edge of the network.
https://hortonworks.com/webinar/accelerating-data-science-real-time-analytics-scale/
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA (Hortonworks)
Thanks to sensors and the Internet of Things, industrial processes now generate a sea of data. But are you plumbing its depths to find the insight it contains, or are you just drowning in it? Now, Hortonworks and Seeq team to bring advanced analytics and machine learning to time-series data from manufacturing and industrial processes.
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ... (Hortonworks)
Trimble Transportation Enterprise is a leading provider of enterprise software to over 2,000 transportation and logistics companies. They have designed an architecture that leverages Hortonworks Big Data solutions and Machine Learning models to power up multiple Blockchains, which improves operational efficiency, cuts down costs and enables building strategic partnerships.
https://hortonworks.com/webinar/blockchain-with-machine-learning-powered-by-big-data-trimble-transportation-enterprise/
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense (Hortonworks)
For years, the healthcare industry has had problems of data scarcity and latency. Clearsense solved the problem by building an open-source Hortonworks Data Platform (HDP) solution while providing decades worth of clinical expertise. Clearsense is delivering smart, real-time streaming data, to its healthcare customers enabling mission-critical data to feed clinical decisions.
https://hortonworks.com/webinar/delivering-smart-real-time-streaming-data-healthcare-customers-clearsense/
Making Enterprise Big Data Small with Ease (Hortonworks)
Every division in an organization builds its own database to keep track of its business. When the organization becomes big, those individual databases grow as well, and the data in each may become siloed, with no view of the data in the others.
https://hortonworks.com/webinar/making-enterprise-big-data-small-ease/
Driving Digital Transformation Through Global Data Management (Hortonworks)
Using your data smarter and faster than your peers could be the difference between dominating your market and merely surviving. Organizations are investing in IoT, big data, and data science to drive better customer experience and create new products, yet these projects often stall in the ideation phase due to a lack of global data management processes and technologies. Your new data architecture may be taking shape around you, but your goal of globally managing, governing, and securing your data across a hybrid, multi-cloud landscape can remain elusive. Learn how industry leaders are developing their global data management strategy to drive innovation and ROI.
Presented at Gartner Data and Analytics Summit
Speaker:
Dinesh Chandrasekhar
Director of Product Marketing, Hortonworks
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features (Hortonworks)
Hortonworks DataFlow (HDF) is the complete solution that addresses the most complex streaming architectures of today’s enterprises. More than 20 billion IoT devices are active on the planet today and thousands of use cases across IIOT, Healthcare and Manufacturing warrant capturing data-in-motion and delivering actionable intelligence right NOW. “Data decay” happens in a matter of seconds in today’s digital enterprises.
To meet all the needs of such fast-moving businesses, we have made significant enhancements and new streaming features in HDF 3.1.
https://hortonworks.com/webinar/series-hdf-3-1-technical-deep-dive-new-streaming-features/
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A... (Hortonworks)
Join the Hortonworks product team as they introduce HDF 3.1 and the core components for a modern data architecture to support stream processing and analytics.
You will learn about the three main themes that HDF addresses:
Developer productivity
Operational efficiency
Platform interoperability
https://hortonworks.com/webinar/series-hdf-3-1-redefining-data-motion-modern-data-architectures/
Unlock Value from Big Data with Apache NiFi and Streaming CDC (Hortonworks)
The document discusses Apache NiFi and streaming change data capture (CDC) with Attunity Replicate. It provides an overview of NiFi's capabilities for dataflow management and visualization. It then demonstrates how Attunity Replicate can be used for real-time CDC to capture changes from source databases and deliver them to NiFi for further processing, enabling use cases across multiple industries. Examples of source systems include SAP, Oracle, SQL Server, and file data, with targets including Hadoop, data warehouses, and cloud data stores.
Manyata Tech Park Bangalore_ Infrastructure, Facilities and Morenarinav14
Located in the bustling city of Bangalore, Manyata Tech Park stands as one of India’s largest and most prominent tech parks, playing a pivotal role in shaping the city’s reputation as the Silicon Valley of India. Established to cater to the burgeoning IT and technology sectors
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...The Third Creative Media
"Navigating Invideo: A Comprehensive Guide" is an essential resource for anyone looking to master Invideo, an AI-powered video creation tool. This guide provides step-by-step instructions, helpful tips, and comparisons with other AI video creators. Whether you're a beginner or an experienced video editor, you'll find valuable insights to enhance your video projects and bring your creative ideas to life.
Measures in SQL (SIGMOD 2024, Santiago, Chile)Julian Hyde
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
Alluxio Webinar | 10x Faster Trino Queries on Your Data PlatformAlluxio, Inc.
Alluxio Webinar
June. 18, 2024
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Jianjian Xie (Staff Software Engineer, Alluxio)
As Trino users increasingly rely on cloud object storage for retrieving data, speed and cloud cost have become major challenges. The separation of compute and storage creates latency challenges when querying datasets; scanning data between storage and compute tiers becomes I/O bound. On the other hand, cloud API costs related to GET/LIST operations and cross-region data transfer add up quickly.
The newly introduced Trino file system cache by Alluxio aims to overcome the above challenges. In this session, Jianjian will dive into Trino data caching strategies, the latest test results, and discuss the multi-level caching architecture. This architecture makes Trino 10x faster for data lakes of any scale, from GB to EB.
What you will learn:
- Challenges relating to the speed and costs of running Trino in the cloud
- The new Trino file system cache feature overview, including the latest development status and test results
- A multi-level cache framework for maximized speed, including Trino file system cache and Alluxio distributed cache
- Real-world cases, including a large online payment firm and a top ridesharing company
- The future roadmap of Trino file system cache and Trino-Alluxio integration
Consistent toolbox talks are critical for maintaining workplace safety, as they provide regular opportunities to address specific hazards and reinforce safe practices.
These brief, focused sessions ensure that safety is a continual conversation rather than a one-time event, which helps keep safety protocols fresh in employees' minds. Studies have shown that shorter, more frequent training sessions are more effective for retention and behavior change compared to longer, infrequent sessions.
Engaging workers regularly, toolbox talks promote a culture of safety, empower employees to voice concerns, and ultimately reduce the likelihood of accidents and injuries on site.
The traditional method of conducting safety talks with paper documents and lengthy meetings is not only time-consuming but also less effective. Manual tracking of attendance and compliance is prone to errors and inconsistencies, leading to gaps in safety communication and potential non-compliance with OSHA regulations. Switching to a digital solution like Safelyio offers significant advantages.
Safelyio automates the delivery and documentation of safety talks, ensuring consistency and accessibility. The microlearning approach breaks down complex safety protocols into manageable, bite-sized pieces, making it easier for employees to absorb and retain information.
This method minimizes disruptions to work schedules, eliminates the hassle of paperwork, and ensures that all safety communications are tracked and recorded accurately. Ultimately, using a digital platform like Safelyio enhances engagement, compliance, and overall safety performance on site. https://safelyio.com/
Photoshop Tutorial for Beginners (2024 Edition)alowpalsadig
Photoshop Tutorial for Beginners (2024 Edition)
Explore the evolution of programming and software development and design in 2024. Discover emerging trends shaping the future of coding in our insightful analysis."
Here's an overview:Introduction: The Evolution of Programming and Software DevelopmentThe Rise of Artificial Intelligence and Machine Learning in CodingAdopting Low-Code and No-Code PlatformsQuantum Computing: Entering the Software Development MainstreamIntegration of DevOps with Machine Learning: MLOpsAdvancements in Cybersecurity PracticesThe Growth of Edge ComputingEmerging Programming Languages and FrameworksSoftware Development Ethics and AI RegulationSustainability in Software EngineeringThe Future Workforce: Remote and Distributed TeamsConclusion: Adapting to the Changing Software Development LandscapeIntroduction: The Evolution of Programming and Software Development
Photoshop Tutorial for Beginners (2024 Edition)Explore the evolution of programming and software development and design in 2024. Discover emerging trends shaping the future of coding in our insightful analysis."Here's an overview:Introduction: The Evolution of Programming and Software DevelopmentThe Rise of Artificial Intelligence and Machine Learning in CodingAdopting Low-Code and No-Code PlatformsQuantum Computing: Entering the Software Development MainstreamIntegration of DevOps with Machine Learning: MLOpsAdvancements in Cybersecurity PracticesThe Growth of Edge ComputingEmerging Programming Languages and FrameworksSoftware Development Ethics and AI RegulationSustainability in Software EngineeringThe Future Workforce: Remote and Distributed TeamsConclusion: Adapting to the Changing Software Development LandscapeIntroduction: The Evolution of Programming and Software Development
The importance of developing and designing programming in 2024
Programming design and development represents a vital step in keeping pace with technological advancements and meeting ever-changing market needs. This course is intended for anyone who wants to understand the fundamental importance of software development and design, whether you are a beginner or a professional seeking to update your knowledge.
Course objectives:
1. **Learn about the basics of software development:
- Understanding software development processes and tools.
- Identify the role of programmers and designers in software projects.
2. Understanding the software design process:
- Learn about the principles of good software design.
- Discussing common design patterns such as Object-Oriented Design.
3. The importance of user experience (UX) in modern software:
- Explore how user experience can improve software acceptance and usability.
- Tools and techniques to analyze and improve user experience.
4. Increase efficiency and productivity through modern development tools:
- Access to the latest programming tools and languages used in the industry.
- Study live examples of applications
A neural network is a machine learning program, or model, that makes decisions in a manner similar to the human brain, by using processes that mimic the way biological neurons work together to identify phenomena, weigh options and arrive at conclusions.
Liberarsi dai framework con i Web Component.pptxMassimo Artizzu
In Italian
Presentazione sulle feature e l'utilizzo dei Web Component nell sviluppo di pagine e applicazioni web. Racconto delle ragioni storiche dell'avvento dei Web Component. Evidenziazione dei vantaggi e delle sfide poste, indicazione delle best practices, con particolare accento sulla possibilità di usare web component per facilitare la migrazione delle proprie applicazioni verso nuovi stack tecnologici.
8 Best Automated Android App Testing Tool and Framework in 2024.pdfkalichargn70th171
Regarding mobile operating systems, two major players dominate our thoughts: Android and iPhone. With Android leading the market, software development companies are focused on delivering apps compatible with this OS. Ensuring an app's functionality across various Android devices, OS versions, and hardware specifications is critical, making Android app testing essential.
Using Query Store in Azure PostgreSQL to Understand Query PerformanceGrant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
Malibou Pitch Deck For Its €3M Seed Roundsjcobrien
French start-up Malibou raised a €3 million Seed Round to develop its payroll and human resources
management platform for VSEs and SMEs. The financing round was led by investors Breega, Y Combinator, and FCVC.
The Rising Future of CPaaS in the Middle East 2024Yara Milbes
Explore "The Rising Future of CPaaS in the Middle East in 2024" with this comprehensive PPT presentation. Discover how Communication Platforms as a Service (CPaaS) is transforming communication across various sectors in the Middle East.
What to do when you have a perfect model for your software but you are constrained by an imperfect business model?
This talk explores the challenges of bringing modelling rigour to the business and strategy levels, and talking to your non-technical counterparts in the process.
Enhanced Screen Flows UI/UX using SLDS with Tom KittPeter Caitens
Join us for an engaging session led by Flow Champion, Tom Kitt. This session will dive into a technique of enhancing the user interfaces and user experiences within Screen Flows using the Salesforce Lightning Design System (SLDS). This technique uses Native functionality, with No Apex Code, No Custom Components and No Managed Packages required.
Transforming Product Development using OnePlan To Boost Efficiency and Innova...OnePlan Solutions
Ready to overcome challenges and drive innovation in your organization? Join us in our upcoming webinar where we discuss how to combat resource limitations, scope creep, and the difficulties of aligning your projects with strategic goals. Discover how OnePlan can revolutionize your product development processes, helping your team to innovate faster, manage resources more effectively, and deliver exceptional results.
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...kalichargn70th171
Visual testing plays a vital role in ensuring that software products meet the aesthetic requirements specified by clients in functional and non-functional specifications. In today's highly competitive digital landscape, users expect a seamless and visually appealing online experience. Visual testing, also known as automated UI testing or visual regression testing, verifies the accuracy of the visual elements that users interact with.
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...kalichargn70th171
In today's fiercely competitive mobile app market, the role of the QA team is pivotal for continuous improvement and sustained success. Effective testing strategies are essential to navigate the challenges confidently and precisely. Ensuring the perfection of mobile apps before they reach end-users requires thoughtful decisions in the testing plan.
Red Hat - Presentation at Hortonworks Booth - Strata 2014
1. Discover Red Hat and Hortonworks for the Modern Data Architecture
Kimberly Palko, Product Manager, Red Hat
2. Agenda
● Red Hat and JBoss Middleware overview
● Combining data in Hadoop with traditional data sources
● Federating two geographically distributed Hadoop clusters
● Virtual data marts for a Hadoop data lake
3. Red Hat & JBoss Middleware Overview
4. Engineering Collaboration Benefits
● Integration with JBoss Data Virtualization: enables agile Hadoop big data integration with existing enterprise assets and maximizes universal data utilization to enable self-service analytics
● Integration with the Red Hat JBoss Middleware product family: enables millions of JBoss developers to quickly build applications with Hadoop
● Integration with Red Hat Storage: enables Hadoop to use Red Hat Storage as a secure, resilient storage pool for data applications
● Integration with Red Hat Enterprise Linux OpenStack Platform: simplifies automated deployment of Hadoop on OpenStack
● Integration with Red Hat Enterprise Linux and OpenJDK: develop and deploy Apache Hadoop as an integrated component across multiple deployment scenarios
5. Big Data Integration: Turn Data into Actionable Information
Speed of iteration leads to success.
[Architecture diagram: semi/unstructured data (social, logs), structured data (DW, OLAP, OLTP), and streaming data (events, IoT) are ingested, enriched, integrated, and analyzed through Hadoop and NoSQL; data integration and data services with JBoss Data Virtualization; in-memory data management with JBoss Data Grid; BI analytics (diagnostic, descriptive, predictive, prescriptive); SOA applications; event processing and messaging with JBoss BRMS and JBoss A-MQ; all running on Red Hat Enterprise Linux and Red Hat Storage.]
6. Data Challenges Getting Bigger…
[Diagram: Hadoop ecosystem components including HDFS, MapReduce, Hive, HBase, NoSQL stores, Storm, and Spark.]
7. Make Big Data Accessible for Everyone
8. Data Supply and Integration Solution
Data virtualization sits in front of multiple data sources and allows them to be treated as a single source, delivering the desired data, in the required form, at the right time, to any application and/or user.
THINK: VIRTUAL MACHINE FOR DATA
9. Easy Access to Big Data
● The reporting tool accesses the data virtualization server via a rich SQL dialect
● The data virtualization server translates the rich SQL dialect to HiveQL
● Hive translates HiveQL to MapReduce
● MapReduce runs the job on the big data
[Diagram: Analytical Reporting Tool → Data Virtualization Server → Hive → MapReduce → HDFS (Hadoop big data).]
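To make this flow concrete, here is a minimal sketch (all schema, table, and column names are hypothetical) of the kind of SQL a reporting tool might issue against a virtual table, with a comment showing roughly how the server could push it down as HiveQL:

    -- Ordinary SQL issued by the reporting tool against a virtual view
    -- "Sales.weblogs" that the data virtualization server maps to Hive:
    SELECT region,
           COUNT(*) AS visits,
           AVG(session_seconds) AS avg_session
    FROM Sales.weblogs
    WHERE visit_date >= DATE '2014-01-01'
    GROUP BY region
    ORDER BY visits DESC;

    -- The server translates this into roughly equivalent HiveQL, e.g.:
    --   SELECT region, COUNT(*), AVG(session_seconds)
    --   FROM weblogs WHERE visit_date >= '2014-01-01' GROUP BY region;
    -- applying any operations the source cannot handle (such as the
    -- final sort) itself.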
10. Different Users, Different Views of Big Data
● Logical tables with different forms of aggregation
● Logical tables containing extra derived data
● Logical tables with filtered data
● All reports/users share the same specifications
[Diagram: multiple logical tables defined in the data virtualization server over Hive, MapReduce, and HDFS.]
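As a sketch of what these logical tables can look like, the following Teiid-style view definitions (all names hypothetical) show one aggregated view, one view with extra derived data, and one filtered view over the same Hive-backed source:

    -- Aggregated logical table over a hypothetical source table Hive.clicks:
    CREATE VIEW DailyClicks AS
    SELECT click_date, page, COUNT(*) AS clicks
    FROM Hive.clicks
    GROUP BY click_date, page;

    -- Logical table containing extra derived data:
    CREATE VIEW ClicksEnriched AS
    SELECT c.*,
           CASE WHEN session_seconds > 300 THEN 'engaged' ELSE 'bounce' END AS visit_type
    FROM Hive.clicks c;

    -- Logical table with filtered data; all three share one underlying specification:
    CREATE VIEW ClicksEU AS
    SELECT * FROM Hive.clicks WHERE region = 'EU';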
11. USE CASE 1: COMBINING DATA FROM HADOOP WITH TRADITIONAL SOURCES, USING JBOSS DATA VIRTUALIZATION
12. Integration of Big Data with “Small Data”
● Integrating small data with big data is easy
● Integration specifications can be shared or developed for individual reports, as sketched below
[Diagram: the data virtualization server joins Hive, MapReduce, and HDFS with a relational database and an application server.]
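A minimal sketch of such an integration, assuming a Hive-backed source model and a relational source model registered in the same virtual database (all model, table, and column names are hypothetical); the server federates the join across the two systems:

    -- Join clickstream data in Hadoop with customer master data in an RDBMS.
    SELECT cust.customer_name,
           cust.segment,
           COUNT(*) AS page_views
    FROM HiveModel.clickstream AS clicks
    JOIN OracleModel.customers AS cust
      ON clicks.customer_id = cust.customer_id
    GROUP BY cust.customer_name, cust.segment;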
13. Caching the Big Data
● Caches to speed up interactive reporting
● Caches to create a consistent view of big data
● Different caches for different reports
[Diagram: cached views in the data virtualization server over Hive, MapReduce, and HDFS.]
14. USE CASE 2: GEOGRAPHICALLY DISTRIBUTED HADOOP CLUSTERS WITH DATA VIRTUALIZATION - SECURING DATA BY USER ROLE
15. Role-Based Access Control
● Roles: define roles based on the organization hierarchy
● Users: external authentication via Kerberos, LDAP, etc.
● VDB: assign users and groups to a virtual database
16. Authentication
● Kerberos: from the client to the virtual database
● Login modules: LDAP (MS Active Directory, OpenLDAP, etc.), any JAAS-based security domain
● REST and web services: WS-UsernameToken, HTTP Basic authentication
● SAML: SAML authentication for web client applications
18. Row and Column Masking
● Row-based masking, e.g. keyed off a geographic marker
● Column masking to a constant, null, or a SQL expression
Example: change all but the last 4 digits of a credit card number to stars (substring is 1-based, so the last four characters start at length - 3):
concat('****', substring(column, length(column) - 3))
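One way to express such a mask is directly in a virtual view. The sketch below assumes a Teiid-style hasRole() function and hypothetical table, column, and role names; users in the privileged role see the full number, everyone else sees a masked one:

    CREATE VIEW PaymentsMasked AS
    SELECT payment_id,
           amount,
           CASE WHEN hasRole('data', 'PCI_Admin')
                THEN card_number
                ELSE concat('****', substring(card_number, length(card_number) - 3))
           END AS card_number
    FROM Finance.payments;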
19. Summary of Security Capabilities
● Authentication
– Kerberos, LDAP, WS-UsernameToken, HTTP Basic, SAML
● Authorization
– Virtual data views, role-based access control
● Administration
– Centralized management of VDB privileges
● Audit
– Centralized audit logging and dashboard
● Protection
– Row and column masking
– SSL encryption (ODBC and JDBC)
21. Use Case 2: Federating across Geographically Distributed Hadoop Clusters
Problem: Geographically distributed Hadoop clusters contain sensitive data, like patient records or customer identification, that cannot be accessed by other regions due to regulatory policy. IT needs access to all data, but users can only access the data in their region.
Solution: Leverage JBoss Data Virtualization to provide row-level security and masking of columns while federating across Hadoop clusters.
Data can be accessed by multiple tools and methods already in-house.
[Diagram: Connect, Compose, Consume layers of JBoss Data Virtualization federating Hive on a Hadoop cluster in one geographic region with Hive on a Hadoop cluster in a second geographic region.]
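A minimal sketch of the row-level rule, again assuming a Teiid-style hasRole() function and hypothetical role and source names; IT sees every row, while regional users see only their own region's rows:

    CREATE VIEW PatientsSecured AS
    SELECT *
    FROM (SELECT * FROM RegionEast.patients
          UNION ALL
          SELECT * FROM RegionWest.patients) AS p
    WHERE hasRole('data', 'IT_Global')
       OR (hasRole('data', 'East_Users') AND p.region = 'EAST')
       OR (hasRole('data', 'West_Users') AND p.region = 'WEST');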
22. Use Case 2 - Architecture
[Diagram: business analytics, custom applications, and packaged applications consume a virtual data mart layered over the data system.]
23. Use Case 2 - Resources
● GUIDE
How-to guide: https://github.com/DataVirtualizationByExample/HortonworksUseCase2
Tutorial: available soon
● VIDEOS
http://vimeo.com/user16928011/hortonworksusecase2short
● SOURCE
https://github.com/DataVirtualizationByExample/HortonworksUseCase2
24. USE CASE 3: VIRTUAL DATA MARTS FOR HADOOP DATA LAKE - WITH JBOSS DATA VIRTUALIZATION
25. Data for the entire organization in a Hadoop Data Lake
Problem: How does IT control access and give business users just the data they need?
- Does every line of business have access to everyone’s data?
- How do business users get access to the data they need in a simple (even self-service) way?
[Diagram: a Hadoop data lake holding HR employee files, marketing clickstream data, finance expense reports, server logs, sales transactions, customer accounts, and Twitter sentiment data.]
26. Secure, Self-Service Virtual Data Marts for Hadoop
Solution: Use JBoss Data Virtualization to create virtual data marts on top of a Hadoop cluster.
- Lines of business get access to the data they need in a simple manner
- IT maintains the process and control it needs
- All data remains in the data lake; nothing is copied or moved
[Diagram: Marketing, Finance, and IT virtual data marts over the Hadoop data lake containing marketing clickstream data, customer accounts, Twitter sentiment data, server logs, HR employee files, sales transactions, and finance expense reports.]
27. Optional hierarchical data architectures with virtual data marts
Can be combined with security features like user-role access and row and column masking, as sketched below.
[Diagram: a department base virtual database (VDB) with Team 1 and Team 2 VDBs layered on top, each exposing its own views (View1, View2).]
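A minimal sketch of the layering, with all names hypothetical: the team view is defined against the department base VDB rather than against the raw sources, so the base VDB's security rules still apply underneath:

    -- In the department base VDB: a shared, already-secured view.
    CREATE VIEW DeptSales AS
    SELECT region, product, amount, sale_date
    FROM Lake.sales_transactions;

    -- In a team VDB layered on top ("DeptBase" is the hypothetical
    -- name under which the base VDB is imported):
    CREATE VIEW Team1View AS
    SELECT region, SUM(amount) AS revenue
    FROM DeptBase.DeptSales
    GROUP BY region;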
28. Virtual Data Marts for Operational Data
Problem: All the legacy and archived data is in the Hadoop data lake. We want to access the most recent, up-to-the-minute operational data often and quickly.
[Diagram: the Hadoop data lake holds historical data: HR employee files, marketing clickstream data, finance expense reports, server logs, sales transactions, customer accounts, and Twitter sentiment data.]
29. Caching for Faster Performance: Materialized Views
● The same cached view serves multiple queries
● Refreshed automatically or manually
● The cache repository can be any supported data source
[Diagram: Query 1 and Query 2 against a virtual database (VDB) are served by a cached or materialized copy of View 1.]
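A minimal sketch of how a view can be marked for materialization in Teiid-style DDL; the MATERIALIZED option is the standard flag, while the TTL extension property and all table and column names here are assumptions for illustration:

    -- Ask the server to cache the view's result set; the TTL property
    -- (milliseconds) is assumed here to request a refresh roughly hourly.
    CREATE VIEW SalesSummary (
        region string,
        revenue bigdecimal
    ) OPTIONS (MATERIALIZED 'TRUE', "teiid_rel:MATVIEW_TTL" 3600000)
    AS SELECT region, SUM(amount) FROM Lake.sales_transactions GROUP BY region;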
30. Virtual Operational Data Store
Solution: Use JBoss Data Virtualization to integrate up-to-the-minute data from multiple diverse data sources so that it can be quickly queried.
- Use HDP for older data
- Use JDV to materialize the data in HDP for faster access and to combine it with the operational VDB
[Diagram: an operational VDB with up-to-the-minute data sits alongside a materialized view of the historical data in the Hadoop data lake (HR employee files, marketing clickstream data, finance expense reports, server logs, sales transactions, customer accounts, Twitter sentiment data), with periodic transfer from the data sources.]
32. Use Case 3 - Overview
Objective:
– Purpose-oriented data views for functional teams over a rich variety of semi-structured and structured data
Problem:
– Data lakes hold large volumes of consolidated clickstream, product, and customer data that need to be constrained for multi-departmental use.
Solution:
– Leverage HDP to mash up clickstream analysis data with product and customer data
– Leverage JBoss Data Virtualization to provide virtual data marts for the Marketing and Product teams
33. Use Case 3 - Architecture
[Diagram: business analytics, custom applications, and packaged applications sit over a virtual data mart; underneath, HDP 2.1 (governance & integration, security, operations, data access, data management) draws on existing sources (CRM, ERP, clickstream, logs) and emerging sources (sensor, sentiment, geo, unstructured).]
34. Use Case 3 - Resources
● GUIDE
How-to guide: https://github.com/DataVirtualizationByExample/HortonworksUseCase3
Tutorial: available soon
● VIDEOS
http://vimeo.com/user16928011/hwxuc3configuration
http://vimeo.com/user16928011/hwxuc3run
http://vimeo.com/user16928011/hwxuc3overview
● SOURCE
https://github.com/DataVirtualizationByExample/HortonworksUseCase3
36. Use Case 1: Combine Data from Hadoop with Traditional Data Sources
Problem: Data from new data sources like social media, clickstream, and sensors needs to be combined with data from traditional sources to get the full value.
Solution: Leverage JBoss Data Virtualization to mash up new data in Hadoop with data in traditional data sources, without moving or copying any data, and access it through a variety of BI tools and SOA technologies.
Data can be accessed by multiple tools and methods already in-house.
[Diagram: Connect, Compose, Consume layers of JBoss Data Virtualization over Source 1 (Hive/Hadoop, holding data from new sources like social media, clickstream, and sensor data) and Source 2 (traditional relational databases in the enterprise).]
37. Use Case 1 - Architecture
[Diagram: business analytics, custom applications, and packaged applications consume a virtual data mart federating traditional repositories (RDBMS, EDW, MPP) with the data system.]
39. Use Case 1 - Resources
http://hortonworks.com/hadoop-tutorial/evolving-data-stratagic-asset-using-hdp-red-hat-jboss-data-virtualization/
40. Benefits of Data Virtualization on Big Data
● Enterprise democratization of big data
● Any reporting or analytical tool can be used
● Easy access to big data
● Seamless integration of big data and small data
● Sharing of integration specifications
● Collaborative development on big data
● Fine-grained security of big data
● Speedy delivery of reports on big data