December 16, 2013

Making Big Data Analytics
Fast and Easy
Using Actian, Yellowfin and Hadoop

John Ryan
Marketing Manager APAC
Actian Corporation

Ryan Templeton
Snr Solutions Architect
Actian Corporation

Ivan Seow
Snr Technical Consultant
Yellowfin
Take Action on Big Data

Making BI Easy

2
Take Action on Big Data

Fastest Data Prep Engine
Fastest Hadoop Loader
Fastest Single Node Database
Fastest MPP Database
Huge library of Analytical Functions

Making BI Easy

Ranked #1 BI Vendor: Dresner Global BI Study 2012 & 13
#1 Dashboard Vendor: BARC BI Survey 12
#1 Enterprise Reporting Vendor: BARC BI Survey 13
Gartner: ‘Vendor to Consider’

4
Today’s Agenda
1.  Big Data Analytics with Hadoop
2.  Making Analytics in Hadoop Fast & Easy
3.  Customer Example (Telecom)
4.  Demo: From Data to Dashboard
•  Making Hadoop Fast and Easy
•  Making BI Fast and Easy

5.  Summary

5
Big Data Analytics
With Hadoop
Confidential © 2012 Actian Corporation

6
73%

Expect to have HDFS
in production

Based on 263 respondents
TDWI Best Practices Report – Q2 2013
7
71%

Big Data Source for Analytics
Most Likely to Benefit from Hadoop

Based on 263 respondents
TDWI Best Practices Report – Q2 2013
8
Why is analytics inside Hadoop
so hard and slow?

HDFS is a file system,
not a database

Need a Data Scientist

Queries not standard SQL,
only resemble SQL

MapReduce inefficient
for analytic queries

9
Making Big Data with
Hadoop Fast and Easy
With Actian and Yellowfin

10
Actian Big Data Analytic Platform
Big Data Storage

Business
Intelligence

Accelerating Big Data 2.0

Connect

Prepare

Optimize

Analyze

Enterprise

VALUE

DATA
Applications

DW

Advanced technology platform:

Multiple deployment options:
  On-premise
  Cloud
  Hybrid
  Embedded

Industry leading:
  Scale
  Performance
  Complexity
  Cost (price/performance)
  Time to Value

11
Industry Leading Performance
Process Hadoop Data Faster

Analyze Data Faster

Dataflow vs PIG (MapReduce)

Database Benchmarks

DBT-3@1TB : Run times

TPC-H QphH@1TB Benchmarks (non-clustered)

16
Today’s demonstration

Connect
Hadoop

Transform
Data

Actian Dataflow

Parallel
Load

Fast
Database
Queries

Actian Vector

Fast
Analysis

BI Visualization Layer
Yellowfin BI

17
Telecom Example
Storing CDR Log Files inside Hadoop

18
Customer Use Case
  Tier two telecom provider

  Planning for large growth
with minimal staff impact

  Business demands deeper insights

19
IT Challenges

Collect, manage, process
CDR data in Hadoop

Swamped with data.
Network switch dumps 200 MB/min
during peak times.
Hundreds of thousands of records per drop.
170 columns.

Users are domain experts,
not data scientists

Too hard to analyze
Raw data must first be distilled
and enriched to gain insight
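
To put the peak rate above in perspective, the arithmetic can be sketched as follows. Only the 200 MB/min figure comes from the slide; the function name and the decimal GB convention are assumptions, and the full-day number is an upper bound because the slide quotes the rate for peak times only.

```python
# Back-of-the-envelope volume estimate from the peak rate quoted on the slide.
PEAK_MB_PER_MIN = 200  # from the slide: the switch dumps 200 MB/min at peak


def peak_volume_gb(minutes: float, mb_per_min: float = PEAK_MB_PER_MIN) -> float:
    """Volume in GB (decimal, 1 GB = 1000 MB) written during `minutes` at peak rate."""
    return mb_per_min * minutes / 1000


if __name__ == "__main__":
    print(peak_volume_gb(60))       # one hour at peak rate
    print(peak_volume_gb(60 * 24))  # upper bound: a whole day at peak rate
```

One hour at peak rate works out to 12 GB, which is the scale the processing pipeline has to keep up with.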

20
What the business was asking for
Fastest time to
decision

Speed up processing by an order of magnitude

Increased granularity
of analysis

Without increasing processing times or bogging down
backend

Proactive analysis,
not reactive

Enable trend analysis and predictive capabilities

Answer real
business questions

e.g. visual insight into near real-time customer and vendor
performance, routing performance optimization, etc.

Scale for future
growth

Extensible for future capabilities and scalable growth

21
Specific Business Questions - CDR Analysis
  Answer Service Rate (ASR & Adjusted ASR)
•  Calls completed vs. route attempts (vendor performance)
•  Calls completed vs. call attempts (customer satisfaction)

  Opportunity Monitor
•  Calculate profit/loss per call due to routing path chosen

  Post Dial Delay (PDD)
•  Annoying delay until path through network selected

  Analysis of near real time quality measures
•  Call duration, jitter and packet loss

  Trends and correlations of above metrics

22
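The two completion ratios listed under Answer Service Rate can be sketched as below. This is a minimal illustration, not the customer's implementation: the `CallRecord` fields are hypothetical stand-ins for the real CDR columns, and the slide does not say which ratio is called ASR and which Adjusted ASR, so the keys are named after the denominators instead.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class CallRecord:
    # Hypothetical fields; a real CDR row has many more columns.
    call_id: str
    route_attempts: int  # number of vendor routes tried for this call
    completed: bool      # True if the call was eventually completed


def completion_ratios(records: Iterable[CallRecord]) -> Dict[str, float]:
    """Return calls completed per route attempt and per call attempt."""
    rows = list(records)
    completed = sum(1 for r in rows if r.completed)
    call_attempts = len(rows)
    route_attempts = sum(r.route_attempts for r in rows)
    return {
        # vendor performance view
        "per_route_attempt": completed / route_attempts if route_attempts else 0.0,
        # customer satisfaction view
        "per_call_attempt": completed / call_attempts if call_attempts else 0.0,
    }


if __name__ == "__main__":
    sample = [
        CallRecord("a", 1, True),
        CallRecord("b", 3, True),
        CallRecord("c", 2, False),
        CallRecord("d", 2, True),
    ]
    print(completion_ratios(sample))
```

With the sample data, 3 of 4 calls complete over 8 route attempts, so the customer-facing ratio is higher than the vendor-facing one. That gap is the kind of signal the routing analysis looks for.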
CDR Workflow Overview
CONNECT

TRANSFORM
Filter data
Logical functions

Extract failed
routing attempts

Split flow for separate
processing rules

Meta-node
encapsulates
processing

PARALLEL
DATA
LOAD
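
The filter and split steps in this workflow can be sketched in plain Python. This is an illustration of the idea only, not Actian Dataflow code: the `status` field and the `"COMPLETED"` value are hypothetical, since the slides do not show the actual CDR schema.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]  # hypothetical flat CDR row


def split_flow(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Filter out unusable rows, then split the rest into (completed, failed)."""
    completed: List[Record] = []
    failed: List[Record] = []
    for rec in records:
        status = rec.get("status")
        if not status:
            # Filter step: drop rows that carry no call outcome.
            continue
        if status == "COMPLETED":
            completed.append(rec)
        else:
            # Extract failed routing attempts for their own processing rules.
            failed.append(rec)
    return completed, failed


if __name__ == "__main__":
    rows = [
        {"status": "COMPLETED", "route": "A"},
        {"status": "FAILED", "route": "B"},
        {"route": "C"},
    ]
    ok, bad = split_flow(rows)
    print(len(ok), len(bad))
```

Each output list can then be handed to its own set of rules before the parallel load.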

23
Data processing – Execution Plan
Compiled to a set
of physical graphs

Phase 1

Phase 2

Four parallel pipelines, each running:
Reader → FilterRows → DeriveFields → Group(partial) → Repartition → Group(final) → Writer
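
The Group(partial), Repartition, Group(final) pattern named in the plan is a standard two-stage parallel aggregation. A minimal single-process sketch follows, assuming a simple sum per key. The function names are hypothetical and this is not the Dataflow API; each inner list stands in for the rows one parallel reader would see.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (group key, value), e.g. a vendor and a call count


def partial_group(rows: List[Row]) -> Counter:
    """Stage 1: each partition sums its own rows locally."""
    acc: Counter = Counter()
    for key, value in rows:
        acc[key] += value
    return acc


def repartition(partials: List[Counter], n: int) -> List[List[Row]]:
    """Stage 2: route every partial sum for a key to the same bucket."""
    buckets: List[List[Row]] = [[] for _ in range(n)]
    for partial in partials:
        for key, value in partial.items():
            buckets[hash(key) % n].append((key, value))
    return buckets


def final_group(bucket: List[Row]) -> Dict[str, int]:
    """Stage 3: combine the partial sums that landed in one bucket."""
    return dict(partial_group(bucket))


def run(partitions: List[List[Row]]) -> Dict[str, int]:
    partials = [partial_group(p) for p in partitions]
    buckets = repartition(partials, len(partitions))
    result: Dict[str, int] = {}
    for bucket in buckets:
        result.update(final_group(bucket))  # keys are disjoint across buckets
    return result


if __name__ == "__main__":
    data = [[("a", 1), ("b", 2)], [("a", 3)], [("c", 4), ("b", 1)], []]
    print(run(data))
```

Grouping partially before the repartition means only one row per key per partition crosses the network, rather than every raw record.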

24
Demo
Making Big Data Analytics Fast and Easy

25
Customer Take Aways – Actionable Insights

FAST
Processing streaming
CDR data in seconds

26
Customer Take Aways - Analysis

Deeper
Analysis
visibility at the Area Code
and Exchange level

27
Customer Take Aways – Cost Savings

20,000

updates made to routing
tables during first week
of collecting data

28
Customer Take Aways - Scalability

8.9 Billion
rows of data collected
during first 6 months

29
Solution Architecture
Clustered Execution

Hadoop
Collection

Parallel Loading

Paraccel
Dataflow

Vectorwise
Very fast reporting
database

Extraction
Cleansing

Yellowfin BI

End Users

•  Dashboard
•  Ad Hoc
•  Statistics
•  Data Mining
•  Analytics
Desktop &
Mobile Devices

Enrichment
Aggregation
Data
Retention

Analysis
Mining

OSS/BSS

30

Summary – Take Action on Big Data
Big Data Storage

Business
Intelligence

Accelerating Big Data 2.0

Connect

Prepare

Optimize

Analyze

Enterprise

VALUE

DATA
Applications

DW

Advanced technology platform:

Multiple deployment options:

Industry leading:

  On-premise

  Scale

  Cloud

  Performance

  Hybrid

  Complexity

  Embedded

  Cost (price/performance)
  Time to Value

31
Actian
www.actian.com

John Ryan
John.Ryan@actian.com

Ryan Templeton
Ryan.Templeton@actian.com

Yellowfin
www.Yellowfin.bi

Ivan Seow
Ivan.Seow@Yellowfin.BI

Questions

32

Making Big Data Analytics with Hadoop fast & easy (webinar slides)
