Modern Data Architecture: In-Memory with Hadoop - the new BI

Hadoop and the new BI:
The Modern Data Architecture
…for in memory Big Data Analytics

10 December 2013

Quick Housekeeping
Q&A box is available for your questions

Webinar will be recorded for future viewing

Thank You for joining!

© Hortonworks Inc. 2013

Modern Data Architecture


Page 3

Your Presenters
• Paul Groom (@datagroom)
– Chief Innovation Officer
– 28 years buried in the big data of the data
guiding business users to value
– Two wheels are more fun than four

• John Kreisa (@marked_man)
– VP Strategic Marketing, Hortonworks
– Over 20 years in data management as a
developer and a marketer
– Avid camper


Page 4

Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop in the MDA
• Kognitio’s role in the MDA
• Q&A


Page 5

Existing Data Architecture
• APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
• DATA SYSTEM (REPOSITORIES): RDBMS, EDW, MPP
• SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)

Data growth (Source: IDC)
• 2.8 ZB in 2012
• 85% from New Data Types
• 15x Machine Data by 2020
• 40 ZB by 2020
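
As a rough illustration of the IDC figures quoted on this slide (2.8 ZB in 2012, 40 ZB by 2020), the implied compound annual growth rate can be worked out in a few lines of Python. The function name and the assumption of smooth year-on-year compounding are illustrative choices, not part of the IDC forecast.

```python
def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Compound annual growth rate implied by two data-volume estimates."""
    return (end_zb / start_zb) ** (1 / years) - 1


# Figures quoted on the slide: 2.8 ZB in 2012, 40 ZB by 2020.
rate = implied_annual_growth(2.8, 40.0, 2020 - 2012)
print(f"Implied growth: about {rate:.0%} per year")  # about 39% per year
```

Under that assumption, data volume grows by roughly two fifths every year, which is the pressure on existing repositories that the following slides address.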


Page 6

Modern Data Architecture Enabled
• APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
• DEV & DATA TOOLS: Build & Test
• OPERATIONAL TOOLS: Manage & Monitor
• DATA SYSTEM (REPOSITORIES): RDBMS, EDW, MPP
• SOURCES: Existing Sources, Emerging Sources (Sensor, Sentiment, Geo, Unstructured)


Page 7

Hadoop Powers Modern Data Architecture
[Diagram: a Hadoop Cluster made up of many nodes, each providing compute & storage]

Hadoop clusters provide
scale-out storage and
distributed data processing
on commodity hardware

Apache Hadoop is an open source project
governed by the Apache Software Foundation
(ASF) that allows you to gain insight from massive
amounts of structured and unstructured data
quickly and without significant investment.
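
To make "distributed data processing" a little more concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps behind a word count. It is plain Python, not the Hadoop API: the function names, the sample log lines and the idea of treating each inner list as the data held on one node are all assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line: str) -> list[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(blocks: list[list[str]]) -> dict[str, int]:
    # Each inner list stands in for the lines stored on one node (hypothetical).
    mapped = [pair for block in blocks for line in block for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


blocks = [["error disk full", "login ok"], ["error timeout", "login ok"]]
counts = word_count(blocks)
print(counts)
```

On a real cluster the map step runs on the nodes that already hold the data and only the grouped intermediate results move across the network, which is what lets the model scale out on commodity hardware.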


Page 8

Drivers of Hadoop Adoption

• New Business Applications
– From NEW types of data (or existing types for longer)


Page 9

Most Common NEW TYPES OF DATA
1. Sentiment
Understand how your customers feel about your brand and
products – right now

2. Clickstream
Capture and analyze website visitors’ data trails and
optimize your website

3. Sensor/Machine
Discover patterns in data streaming automatically from
remote sensors and machines

4. Geographic
Analyze location-based data to manage operations where
they occur

5. Server Logs
Research logs to diagnose process failures and prevent
security breaches

6. Unstructured (txt, video, pictures, etc.)
Understand patterns in files across millions of web pages,
emails, and documents


Keep Existing Data Around Longer
• Online archive
– Data that was once moved to tape can
now be queried to understand long term trends

• Compliance retention
– Industry specific requirements for retention
of data

• Combine with external historical data sources
– Weather, survey, research, purchased, etc.


Drivers of Hadoop Adoption
• Architectural: A Modern Data Architecture
– Complement your existing data systems: the right workload in the right place

• New Business Applications

Page 12

Requirements for Hadoop Adoption
Requirements for Hadoop’s Role in the Modern Data Architecture

• Integrated
– Interoperable with existing data center investments

• Key Services
– Platform, operational and data services essential for the enterprise

• Skills
– Leverage your existing skills: development, operations, analytics

Page 13

Requirements for Enterprise Hadoop

1. Key Services
– Platform, Operational and Data services essential for the enterprise
2. Skills
– Leverage your existing skills: development, analytics, operations
3. Integrated
– Engineered with existing data center investments

HORTONWORKS DATA PLATFORM (HDP)
• OPERATIONAL SERVICES: AMBARI, FALCON*, OOZIE
• DATA SERVICES: FLUME, SQOOP, LOAD & EXTRACT (NFS, WebHDFS, KNOX*), PIG, HIVE & HCATALOG, HBASE
• CORE PLATFORM SERVICES: HDFS, YARN, MAP REDUCE, TEZ
• Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
• Deployment options: OS/VM, Cloud, Appliance

Page 14



Familiar and Existing Tools

(Requirements: 1 Key Services, 2 Skills, 3 Integration)

2. Skills
• DEVELOP: Collect, Process, Build
• ANALYZE: Explore, Query, Deliver
• OPERATE: Provision, Manage, Monitor



Page 16

3. Integration
Integrated with:
• Applications: Business Intelligence, Developer IDEs, Data Integration
• Systems: Data Systems & Storage, Systems Management
• Platforms: Operating Systems, Virtualization, Cloud, Appliances

[Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DEV & DATA TOOLS (Build & Test); OPERATIONAL TOOLS (Manage & Monitor); DATA SYSTEM repositories (RDBMS, EDW, MPP); SOURCES (Existing Sources, Emerging Sources)]

Page 17

A Modern Data Architecture Applied
• Complement data systems
• Right workload, right place

[Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DATA SYSTEM repositories (RDBMS, EDW, MPP); SOURCES (Existing Sources, Emerging Sources)]




Page 18

Kognitio in the Modern Data Architecture
• APPLICATIONS: Business Analytics, Business Intelligence Tools, OLAP Clients
• DEV & DATA TOOLS: Build & Test
• OPERATIONAL TOOLS: Manage & Monitor
• DATA SYSTEM: In-memory MPP Accelerator; REPOSITORIES: RDBMS, EDW, MPP
• SOURCES: Existing Sources, Emerging Sources



Page 19

[Diagram: APPLICATIONS (BusinessObjects BI); DEV & DATA TOOLS; OPERATIONAL TOOLS; DATA SYSTEM (RDBMS, HANA, EDW, MPP); INFRASTRUCTURE; SOURCES (Existing Sources, Emerging Sources)]




Page 20

Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• Kognitio’s role in the MDA
• Q&A


Page 21

Hadoop and the new BI
Requirements for Hadoop’s Role in the Modern Data Architecture

1. Integrated
– Interoperable with existing data center investments
2. Skills
– Leverage your existing skills: development, operations, analytics
3. Key Services
– Platform, operational and data services essential for the enterprise

Page 22

Motivation
(Requirement 1, Key Services: Platform, Operational and Data services essential for the enterprise)

• Historical architecture = Existing investment
[Graphic label: Cognos]

• Must plug-and-play with MDA
– Do not disrupt, enhance!

• Performance and behavior expectations
– Dynamic ad-hoc access
– Drill unlimited
– Report on-demand


Page 23

Business [Intelligence] Desires

More timely
Lower latency
Richer data model
More granularity
Better concurrency
Self service


Page 24

BI Activity

Insulate the Hadoop cluster


Page 25

In-memory analytical platform
• Software only
– Easy to deploy alongside HDP
– Simple two-stage install

• Commodity Hardware (Requirement 3: Integration)
– x86-64 Linux platform with 10GbE network, same as HDP
– Biased to more RAM and less disk

• Scale-out MPP
– Same compute model as Hadoop
– Strong focus on 100% effective CPU utilization for any given query

• Exploits features of underlying persistent store
– Simple ‘Pull data’ access methods
– Parallelism: all HDP nodes intercommunicating with all Kognitio nodes

• ANSI 2011 SQL
– Mature, fully featured
– Transaction processing capable

• Not-only-SQL (Requirement 2: Skills)
– Any script or binaries executed in-line within SQL queries

Page 26

Tight Integration

(Requirement 3: Integration)

• Map-reduce Connector
– Filtered access

• HDFS Connector
– Low Latency access
Page 27

So why In-memory?

INSTANT WAIT

• Exploit the ‘Dynamic’ access element of ‘D’-RAM
– Data placed in memory in structures best suited for CPUs, not for disks

Page 28

In-memory – getting work done


Page 29

Building Data Models
• Hadoop is a great repository
• Perfect to handle volume and variability without effort
• Perfect to ‘triage’ the data, to reshape, filter and project into…
• Data Virtualisation / Logical Data Warehouse
… but with the associated horsepower to dynamically analyse the data
• Plug standard tools straight in – not a Java programmer in sight!
• Central control and security
• Data model shelf life getting shorter – sandboxes and workbenches
– Build on-demand to meet today’s needs – just pull data from your HDP
– Lots of project based discovery and analytics
– World is changing rapidly
– Ever tighter feedback loops
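
The triage step described above (filter, project, then query with standard SQL) can be sketched as follows. This is only an illustration under stated assumptions: the sample clickstream records, the column names and the `page_views` table are hypothetical, and Python's built-in `sqlite3` in-memory database stands in for an in-memory analytical platform; it is not Kognitio's or HDP's interface.

```python
import sqlite3

# Hypothetical raw clickstream records, as they might be pulled from the cluster.
raw_events = [
    {"visitor": "u1", "page": "/home", "ms": 120, "agent": "bot"},
    {"visitor": "u2", "page": "/pricing", "ms": 340, "agent": "browser"},
    {"visitor": "u2", "page": "/signup", "ms": 510, "agent": "browser"},
    {"visitor": "u3", "page": "/pricing", "ms": 95, "agent": "browser"},
]

# Triage: filter out bot traffic and project only the columns the model needs.
rows = [(e["visitor"], e["page"], e["ms"]) for e in raw_events if e["agent"] != "bot"]

# Build the short-lived data model in memory (sqlite3 is a stand-in here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page_views (visitor TEXT, page TEXT, ms INTEGER)")
db.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)

# Any SQL-speaking tool could now query the model; no Java required.
top_pages = db.execute(
    "SELECT page, COUNT(*) AS views FROM page_views "
    "GROUP BY page ORDER BY views DESC, page"
).fetchall()
print(top_pages)
```

Because the model is built on demand from data that stays in the repository, it can be thrown away and rebuilt when the question changes, which matches the short shelf life of data models noted on the slide.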


Page 30

Analytical Complexity

[Chart: analytical workloads plotted on two axes, Increasing Computation and Technology/Automation. Workloads shown: Reporting & BPM, Campaign Management, Fraud detection, Dynamic Interaction, Clustering, Dynamic Simulation, Statistical Analysis, Behaviour modelling, Machine learning algorithms]

Page 31

The Analytical Enterprise

[Diagram roles: Data Scientist, Systems Admin, Business Analyst]

Key: “Graduation”
• Projects will need to easily graduate from the Data Science Lab and
become part of Business as Usual

Mature SQL atop Hadoop
Kognitio is an in-memory analytical platform that is tightly
integrated with Hadoop for high-performance advanced analytics
that make Big Data more consumable for enterprises,
especially those with mature BI environments or ingrained tools.

• Powering advanced analytics at organizations worldwide, such as:

• Privately held
• Invented the in-memory analytical platform
• Labs in the UK, HQ in New York, NY


Page 33

[Diagram: APPLICATIONS (Business Analytics, Business Intelligence Tools, OLAP Clients); DEV & DATA TOOLS (Build & Test); OPERATIONAL TOOLS (Manage & Monitor); DATA SYSTEM repositories (RDBMS, EDW, MPP); SOURCES (Existing Sources, Emerging Sources)]




Page 34

Forrester Wave: a “strong performer”

• Kognitio’s EDW is a strong, cost-effective alternative to SAP HANA.

• Kognitio…was designed from the start as an MPP (distributed) in-memory RDBMS,
making extensive use of RAM-based processing for maximum performance.

• Kognitio’s entirely in-memory, distributed EDW is appealing for customers
looking for fast performance on commodity hardware

© Forrester Corp. Used with permission.

Download a complimentary copy of the
full report at www.kognitio.com/wave

Page 35

The Modern Data Architecture
More about Kognitio and Hortonworks
http://hortonworks.com/partner/kognitio

Get started with Hortonworks Sandbox
http://hortonworks.com/hadoop-tutorial/

Follow us:
@hortonworks @kognitio

Question & Answer session will be conducted electronically,
using the panel to the right of your screen
Today’s Slides available at: www.slideshare.net/kognitio
