RUNNING COGNOS ON HADOOP
Cost Effective, Highly Scalable, High Speed
Agenda
• Introduction
• Running Cognos on Hadoop
– Hadoop Overview
– Hive Overview
– Performance
– BigSheets Demo
• Additional Resources
Copyright 2016 Senturus, Inc. All Rights Reserved
Introduction: Today’s Presenters
Paul Yip
BigInsights Product Manager
IBM Analytics
John Peterson
CEO and Co-Founder
Senturus, Inc.
Presentation Slide Deck on www.senturus.com
Resource Library
The purpose of Senturus is to make you
successful with business analytics.
We host dozens of live webinars every
year and offer a comprehensive library of
recorded webinars, demos, white papers,
presentations, case studies, and reviews
of new software releases on our website.
Our content is constantly updated, so
visit us often to see what’s new in the
industry.
www.senturus.com/resources/
Hear the Recording
This slide deck is from the webinar: Running Cognos on Hadoop: Cost Effective, Highly Scalable, High Speed
To view the FREE recording of the presentation, and download this deck, go to:
www.senturus.com/resources/running-cognos-on-hadoop/
THE CHALLENGES OF DATA TODAY
… AND THE TOOLS TO CONFRONT THEM
THE BIG MATCH UP
The 4 V’s of Data
• Volume
• Variety
• Veracity
• Velocity
MEETS
Business Analytics*
• Standard (highly formatted) Reports
• Dashboards
• Ad Hoc Analysis
• Alerts, etc.
• Predictive Analytics
* Existing & new systems
SO WHAT YOU NEED IS…
• Virtually unlimited, low-cost Staging Area that accepts all data formats
• Easy way to Explore raw data
• Low-cost Archive for past or less used data
• Repository for transformed data (subset) that fully supports queries from standard BI tools (typically SQL)
DID YOU KNOW?
… that IBM is a major Hadoop stack vendor
* Other Hadoop vendors include: Amazon, Microsoft, Intel, Pivotal/EMC, Teradata…
POLL
How is your organization combining SQL, standard BI tools and Hadoop?
• Via Hive
• Via a “value-add” tool like Impala or BigSQL
• Using Hadoop, but not SQL against it
• Not using Hadoop
• Don’t know
IBM BIGINSIGHTS & COGNOS
A POSSIBLE SOLUTION
© 2015 IBM Corporation
IBM BigInsights and Cognos
Hadoop Patterns
Paul Yip
BigInsights Product Manager
IBM Toronto Software Lab
ypaul@ca.ibm.com
What is Hadoop?
General Formula
1. Bunch of commodity servers (nodes) with internal disk
• Example: 12 x 6TB disk = 72TB per node
2. Network them and install Hadoop
• Example: 20 nodes x 72TB = 1440 TB cluster
3. Result: a big file system that also runs analytics
Features
• Significantly lower cost than SAN
• End-user / applications just see “files”
• Cluster and data are resilient
• Add performance / capacity by adding more nodes
[Figure: File1 is split into blocks a, b, c and d; each block is stored three times across the DataNodes, shown alongside a NameNode]
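The sizing arithmetic on this slide can be checked with a short Python sketch. The replication factor of 3 is an assumption read off the figure, which draws three copies of every block; the slide itself quotes only raw capacity, so the usable figure below is illustrative rather than a number from the deck.

```python
# Worked arithmetic for the cluster sizing example on the slide.
DISKS_PER_NODE = 12
DISK_TB = 6
NODES = 20
# Assumption: three copies of every block, as drawn in the figure.
REPLICATION_FACTOR = 3


def raw_capacity_tb(nodes, disks_per_node, disk_tb):
    """Total disk capacity across the cluster, before replication."""
    return nodes * disks_per_node * disk_tb


def usable_capacity_tb(raw_tb, replication_factor):
    """Capacity left for distinct data once every block is stored N times."""
    return raw_tb / replication_factor


per_node = DISKS_PER_NODE * DISK_TB
raw = raw_capacity_tb(NODES, DISKS_PER_NODE, DISK_TB)
usable = usable_capacity_tb(raw, REPLICATION_FACTOR)
print(per_node, raw, usable)
```

Under that assumption, roughly a third of the raw 1440 TB is available for distinct data; adding nodes raises both figures in proportion.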
Distributed Analytics Example: MapReduce
• MapReduce computation model
• Data stored in a distributed file system spanning many inexpensive computers
• Bring function to the data
• Distribute application to the compute resources where the data is stored
• Scalable to thousands of nodes and petabytes of data
MapReduce Application
1. Map Phase (break job into small parts)
2. Shuffle (transfer interim output for final processing)
3. Reduce Phase (boil all output down to a single result set)
Return a single result set
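import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input line
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text val, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(val.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> val, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : val) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
  // . . . (job driver not shown on the slide)
}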
[Figure: map tasks are distributed to the Hadoop data nodes in the cluster]
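The three phases listed on this slide (map, shuffle, reduce) can be mimicked on a single machine. The Python sketch below is only an illustration of the computation model: the function names are made up for this example and are not part of the Hadoop API, and nothing here is distributed.

```python
from collections import defaultdict


def map_phase(line):
    """Map: break one input line into (word, 1) pairs."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: boil the values for one key down to a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big sql", "big sheets"])
print(counts)
```

In a real cluster each call to `map_phase` would run on the node holding that block of input, which is the "bring function to the data" idea from the slide.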
Hive provides a SQL interface to MapReduce
[Diagram: SQL → Hive → Map → Reduce → Output]
• The first SQL interface for Hadoop data
• De-facto standard for SQL on Hadoop
• Ships with all major Hadoop distributions
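To show what a SQL interface saves, the sketch below expresses a word count as a single GROUP BY statement. SQLite (Python's built-in sqlite3 module) stands in for Hive purely for illustration; the table and column names are hypothetical, and no Hive-specific syntax is used.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a SQL-on-Hadoop engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [(w,) for w in "big data big sql big sheets".split()],
)

# One declarative statement replaces hand-written map and reduce functions.
rows = conn.execute(
    "SELECT word, COUNT(*) AS n FROM words GROUP BY word ORDER BY word"
).fetchall()
print(rows)
conn.close()
```

The grouping key plays the role of the map output key, and COUNT(*) plays the role of the reducer's sum.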
SQL on Hadoop Matters for Big Data Analytics
For BI Tools like Cognos
Visualizations from Cognos 10.2.2
Sounds great! But is there a…
DOWN SIDE?
Hive – Joins in MapReduce
• For joins, MR is used to group data together at the same reducer based upon the join key
• Mappers read blocks from each “table” in the join
• The <key> is the value of the join key, the <value> is the record to be joined
• Reducer receives a mix of records from each table with the same join key
• Reducers produce the results of the join
[Figure: map tasks read blocks of the employees and depts tables; one reduce task per join key value (dept 1, dept 2, dept 3)]
select e.fname, e.lname, d.dept_name
from employees e, depts d
where e.salary > 30000
and d.dept_id = e.dept_id
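The join described on this slide can be sketched as a reduce-side join in plain Python. The sample rows and the "E"/"D" tags are invented for illustration; the table and column names follow the query above, and the code simulates the idea rather than showing how Hive actually implements it.

```python
from collections import defaultdict

# Hypothetical sample data matching the columns used in the query.
employees = [
    {"fname": "Ann", "lname": "Lee", "salary": 45000, "dept_id": 1},
    {"fname": "Bob", "lname": "Ng", "salary": 28000, "dept_id": 1},
    {"fname": "Cy", "lname": "Roy", "salary": 52000, "dept_id": 2},
]
depts = [
    {"dept_id": 1, "dept_name": "Sales"},
    {"dept_id": 2, "dept_name": "Finance"},
]


def map_employee(rec):
    """Mapper for employees: apply the salary filter, key by dept_id."""
    if rec["salary"] > 30000:
        yield rec["dept_id"], ("E", rec)


def map_dept(rec):
    """Mapper for depts: key by dept_id."""
    yield rec["dept_id"], ("D", rec)


def shuffle(pairs):
    """Send every record with the same join key to the same reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, tagged):
    """Reducer: pair each employee with each dept sharing the key."""
    emps = [rec for tag, rec in tagged if tag == "E"]
    dps = [rec for tag, rec in tagged if tag == "D"]
    return [(e["fname"], e["lname"], d["dept_name"]) for e in emps for d in dps]


pairs = [p for r in employees for p in map_employee(r)]
pairs += [p for r in depts for p in map_dept(r)]
result = sorted(
    row for key, vals in shuffle(pairs).items() for row in reduce_join(key, vals)
)
print(result)
```

Every record has to travel through the shuffle to reach its reducer, which hints at why joins are comparatively expensive under MapReduce.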
N-way Joins in MapReduce
• For N-way joins involving different join keys, multiple jobs are used
[Figure: a first job maps blocks of employees and depts and reduces by dept (dept 1, dept 2, dept 3) into temp files; a second job maps the temp files and emp_phones and reduces by emp_id (emp_id 1, emp_id 2, … emp_id N) into the results]
select e.fname, e.lname, d.dept_name, p.phone_type, p.phone_number
from employees e, depts d, emp_phones p
where e.salary > 30000
and d.dept_id = e.dept_id
and p.emp_id = e.emp_id
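Because the two join conditions use different keys (dept_id and emp_id), the query above needs two chained jobs. The Python sketch below simulates that chain; `mapreduce_join` is a hypothetical helper standing in for one complete job, and the rows are made-up sample data.

```python
from collections import defaultdict


def mapreduce_join(left, right, key):
    """Simulate one job: group both inputs by `key`, then join per group."""
    groups = defaultdict(lambda: ([], []))
    for rec in left:
        groups[rec[key]][0].append(rec)
    for rec in right:
        groups[rec[key]][1].append(rec)
    return [{**l, **r} for ls, rs in groups.values() for l in ls for r in rs]


# Hypothetical sample data matching the columns used in the query.
employees = [
    {"emp_id": 10, "fname": "Ann", "lname": "Lee", "salary": 45000, "dept_id": 1},
    {"emp_id": 11, "fname": "Bob", "lname": "Ng", "salary": 28000, "dept_id": 1},
]
depts = [{"dept_id": 1, "dept_name": "Sales"}]
emp_phones = [
    {"emp_id": 10, "phone_type": "work", "phone_number": "555-0100"},
    {"emp_id": 10, "phone_type": "cell", "phone_number": "555-0101"},
    {"emp_id": 11, "phone_type": "work", "phone_number": "555-0102"},
]

# Job 1: filter on salary, join employees with depts on dept_id (temp output).
filtered = [e for e in employees if e["salary"] > 30000]
temp = mapreduce_join(filtered, depts, "dept_id")

# Job 2: join the temp output with emp_phones on emp_id.
joined = mapreduce_join(temp, emp_phones, "emp_id")
rows = sorted(
    (r["fname"], r["lname"], r["dept_name"], r["phone_type"], r["phone_number"])
    for r in joined
)
print(rows)
```

The intermediate `temp` list corresponds to the temp files in the figure: the second job cannot start until the first has written them.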
IBM BigInsights
Hive is Really 3 Things…
Storage Format, Metastore, and Execution Engine
[Diagram components: Applications; SQL Execution Engine: Hive (open source), MapReduce; Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others…; Hive Metastore (open source)]
Big SQL Preserves Open Source Foundation
Leverages Hive metastore and storage formats.
No Lock-in. Data part of Hadoop, not BigSQL. Fall back to Open Source Hive Engine at any time.
[Diagram components: Applications; SQL Execution Engines: IBM BigSQL (IBM), Hive (open source); Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others…; Hive Metastore (open source)]
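The separation between storage, metastore and execution engine can be pictured with a toy model in Python. Everything below is hypothetical: the dictionaries stand in for files in the cluster and for metastore entries, and the two engine functions are placeholders rather than Hive or Big SQL code. The point is only that two engines can answer queries over the same files by consulting the same table definitions.

```python
import csv
import io

# Storage: files in a shared file system (stand-in for HDFS).
storage = {"/warehouse/depts.csv": "1,Sales\n2,Finance\n"}

# Metastore: table name -> location, format and column names.
metastore = {
    "depts": {
        "location": "/warehouse/depts.csv",
        "format": "csv",
        "columns": ["dept_id", "dept_name"],
    }
}


def read_table(name):
    """Resolve a table through the metastore and parse its file."""
    meta = metastore[name]
    if meta["format"] != "csv":
        raise ValueError("only CSV is handled in this sketch")
    reader = csv.reader(io.StringIO(storage[meta["location"]]))
    return [dict(zip(meta["columns"], row)) for row in reader]


def engine_a_count(name):
    """Stand-in for one SQL engine: count the rows in a table."""
    return len(read_table(name))


def engine_b_names(name):
    """Stand-in for a second engine reading the same table definition."""
    return [row["dept_name"] for row in read_table(name)]


print(engine_a_count("depts"), engine_b_names("depts"))
```

Swapping one engine function for the other changes nothing in `storage` or `metastore`, which is the "no lock-in" argument made on the slide.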
IBM First/Only to Produce Audited Benchmark
Hadoop-DS (based on TPC-DS) / Oct 2014
• Letters of attestation are available for both Hadoop-DS benchmarks at 10TB and 30TB scale
• InfoSizing, Transaction Processing Performance Council Certified Auditors, verified both IBM results as well as results on Cloudera Impala and Hortonworks Hive
• These results are for a non-TPC benchmark. A subset of the TPC-DS Benchmark standard requirements was implemented
Performance Test – Hadoop-DS (based on TPC-DS)
20-node (physical) cluster
• TPC-DS stands for Transaction Processing Performance Council – Decision Support (workload), an industry-standard benchmark for SQL
Configurations tested (results updated Oct 2015):
• Hive 1.2.1 on IBM Open Platform V4.1, 20 nodes
• Big SQL V4.1 on IBM Open Platform V4.1, 20 nodes
But first … is performance everything?
Big SQL runs more SQL out-of-box
Porting effort:
• Big SQL 4.1: 1 hour
• Hive / Spark SQL 1.5.0: 3-4 weeks
Big SQL is the only engine that can execute all 99 queries with minimal porting effort
Cognos & Hadoop Lessons Learned
Notes from Cognos Development
HIVE
• Very restrictive with respect to join predicates
• Support of SQL has a history of many limitations, such as:
– Limited set operations (union), which had problems
– Lack of usable SQL-OLAP
• Result: more processing done locally by Cognos
IMPALA
• Join restrictions are partially lifted
• Try to re-write using CROSS JOIN (but not for outer joins)
• Cannot push down full outer joins (FOJ)
• Other gaps include
– Set operations (union, intersect and except)
– Sub-queries
– ORDER BY (because they used to require LIMIT N)
• Cannot have multiple distinct aggregates
Queries on Big SQL work out-of-box
Big SQL Security – Best In Class
• Role Based Access Control
• Row Level Security
• Column Level Security
• Separation of Duties / Audit
[Figure labels: BRANCH_A, BRANCH_B, FINANCE]
See it in action on YouTube:
https://www.youtube.com/watch?v=N2FN5h25-_s
Announced at Strata + Hadoop World Sept 2015:
Big SQL V4.1 vs Hive 1.2.1 Performance Test Update
See it in action on YouTube:
www.youtube.com/watch?v=SYQgzRGhqVU
Performance Test Summary
Big SQL V4 vs. Hive 1.2.1 @ 1TB
• In 99 / 99 Queries, Big SQL was Faster
• On Average, Big SQL was 21X faster
• Excluding the Top 5 and Bottom 5 Results, Big SQL was 19X faster
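The two averages on this slide differ only in whether the extreme results are excluded. The Python sketch below shows one way such figures could be computed. The timings are made up, and treating the "average" as the mean of per-query speedup ratios is an assumption; the deck does not say how IBM calculated its numbers.

```python
def speedups(baseline_secs, candidate_secs):
    """Per-query speedup: baseline time divided by candidate time."""
    return [b / c for b, c in zip(baseline_secs, candidate_secs)]


def mean(values):
    return sum(values) / len(values)


def trimmed_mean(values, trim):
    """Mean after dropping the `trim` smallest and `trim` largest values."""
    ordered = sorted(values)
    return mean(ordered[trim:len(ordered) - trim])


# Hypothetical elapsed times in seconds for six queries (not benchmark data).
hive_secs = [100.0, 400.0, 90.0, 60.0, 1000.0, 30.0]
big_sql_secs = [10.0, 20.0, 9.0, 30.0, 10.0, 15.0]

ratios = speedups(hive_secs, big_sql_secs)
average = mean(ratios)
trimmed = trimmed_mean(ratios, 1)  # the slide drops 5 from each end of 99
print(ratios, average, trimmed)
```

With these sample numbers a single outlier pulls the plain mean well above the trimmed mean, which is why reporting both figures is informative.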
IBM BigInsights
BigSheets
Browser based analytics tool for Big Data
• Explore, visualize, transform unstructured and structured data
• Visual data cleansing and analysis
• Filter and enrich content
• Visualize Data
• Export data into common formats
No programming knowledge needed!
BigSheets…
QUICK DEMO
Major Canadian Insurance Company
BigSheets was the primary reason for choosing BigInsights. The client had huge
tables on their mainframe system, and their business analysts wanted a way to
see the big picture as opposed to sub-setting the data (the traditional Excel way).
BigSheets allowed them to do what they needed to do, and they described it as a
"game changer". They were then able to analyze multiple tables/files that were
100GB in size.
[Figure: before: Mainframe db and subsets (small datasets, incomplete data, spreadsheet proliferation); after: BigInsights (complete data, centralized data)]
BigSheets Empowers Business Users on Hadoop
Tara Paider @ Nationwide
AVP, IT Architecture, Data Management & Analytics
"Nationwide runs Monte Carlo simulations. The data preparation step
normally takes approximately 3 days.
Frustrated with this, as part of a POC, business analysts re-implemented the
transformations on IBM BigInsights and it now runs in 10 minutes.*
Technologies within BigInsights enabled business users to leverage the
power of Hadoop and MapReduce without advanced programming skills."
*The performance improvement is attributed to pushing computation and transformations close to the data
and the scale out capability of Hadoop.
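As a rough check on the quoted improvement, the before and after durations can be put on the same scale. The sketch assumes "3 days" means three full 24-hour days; since the quote says "approximately", the result is an order-of-magnitude figure only.

```python
# Convert both durations to minutes and compare.
MINUTES_PER_DAY = 24 * 60

before_minutes = 3 * MINUTES_PER_DAY  # assumption: three full 24-hour days
after_minutes = 10

speedup = before_minutes / after_minutes
print(before_minutes, speedup)
```

Under that assumption the data preparation step runs a few hundred times faster than before.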
Putting it all together…
Big Data Technology Patterns
Traditional Enterprise Analytic Environment
[Diagram components: Data Sources (Structured, Operational); Staging Area; EDW; Marts; Archive; Information and Insight (BI, Performance Management, Predictive Analytics and Modeling); Information Movement & Transformation]
• Task: Extract Operational Data to Staging Area
• Task: Normalize Data for Enterprise Consumability
• Task: Provide Guided and Interactive Access
• Task: Deliver Data for Deeper Analysis and Modeling
• Task: Archive “Cold” Data to Reduce Costs
• Task: Tooling to Facilitate Data Movement & Transformation
Traditional Approach to Improve Analytic Architectures
[Diagram components: Data Sources (Structured, Operational); Expanded EDW; Staging Area; Marts; Archive; Information and Insight (BI, Performance Management, Predictive Analytics and Modeling); Information Movement & Transformation]
• Put Staging Area in the EDW
• Still only structured (models don’t get more aperture)
• Doesn’t accelerate the speed of model lifecycles
• $$ Expensive $$
Faster, Deeper Insights While Reducing Costs
[Diagram components: Structured; Operational; Sensor; Geospatial; Time Series; Unstructured; External; Social; Streaming; Staging Area; Landing; Exploration; Discovery; Archive; Day 0 Archive; Enterprise Warehouse; Marts; BI; Performance Management; Predictive Analytics and Modeling; Information Movement & Transformation]
• Faster Performance – up to 4-digit improvements
• Reduces storage 10x – 25x
• Load and Go (no tuning, +++)
Current State: Analytics Development Cycle
Typical Phases…. High cost of failure
• Bright Idea!
• Request IT for data extraction
• Gather Requirements
• Data Integration Effort Estimates
• Solution Design
• Infrastructure Cost Analysis
• Business Case Development
• GO or NO-GO Management Approvals
• Data Quality & ETL Development
• Report Development
• Review Results… 6 Months later
• Procurement: DB, storage, SW licenses
Actors Involved
• Business Analysts
• Solution Architects
• Infrastructure Architects
• Database Administrators
• SAN Administrators
• Management
• ETL Developers
• Report Developers
Target State: Rapid Prototyping
Many Ideas…
• Land data in Hadoop (if not already)
• Explore Data with BigSheets
• Prototype Reports with Cognos/BigSQL
• GO or No-Go Management Approvals
Elapsed Time: Days/Weeks
Still Required for “Go To Production”
• Infrastructure Cost Analysis
• Data Quality & ETL Development
• Operationalization
Develop a culture of fail fast
• More Precise Requirements Gathering
• More Information about Data Quality
• More Accurate Project Estimates
• More Reliable Business Cases
• Actionable Insights before Production Ready Solution
Right Tool, Right Job
Data types: Unstructured, Semi Structured, Structured
Tools: Text Analytics; BigSheets, Big SQL; Cognos w/Big SQL; Cognos + RDBMS
Summary
• Hadoop is a Big File System that can run analytics
• Unlike a database, it can store anything as files (structured, semi-structured, unstructured)
• Hadoop lowers the cost to pennies per GB – making it possible to have a copy of all data from source systems
• SQL on Hadoop enables analytics for the masses – no need to learn MapReduce
• IBM BigInsights: BigSheets and Big SQL enable discovery and rapid BI prototyping
“Big SQL makes access to Hive data faster and more secure”
Other Resources….
• Watch Big SQL take on Spark SQL
https://www.youtube.com/watch?v=bAs74frPUq8
• Improve security for Hive data with Big SQL
https://www.youtube.com/watch?v=N2FN5h25-_s
BUSINESS ANALYTICS:
ARCHITECTED TO SCALE
SENTURUS INTRODUCTION
Laser Focused on Business Analytics
• Dashboards, Reporting & Visualizations
• Data Preparation
• Big Data & Advanced Analytics
• Enterprise Planning
Senturus Offerings
• Comprehensive Consulting
• Dedicated Resources
• Training
• Accelerated Development Tools
• Migrations/Upgrades/Installations
• Performance & Optimization
• Jumpstarts
• Project Roadmaps & Assessments
900+ Clients, 2000+ Projects, 16+ Years
ADDITIONAL RESOURCES
Demo Cloud: Big SQL Technology Sandbox
• Big SQL Technology Sandbox is a large, shared environment for data science
• You can use it to run R, SQL, Spark, and Hadoop jobs
• It is a high performance cluster demonstrating the advantages of parallelized processing of big data sets
For more information, see link on Senturus website
Related Videos
You may also be interested in the following YouTube videos authored by Paul Yip.
• Access Apache Hive Data Faster and More Securely with Big SQL
• Spark vs IBM Big SQL Performance
• Hadoop HDFS vs Spectrum Scale (GPFS)
Free Resources on www.senturus.com
Upcoming Events
www.senturus.com/events
Thank You!
www.senturus.com
info@senturus.com
888 601 6010
Copyright 2016 by Senturus, Inc.
This entire presentation is copyrighted and may not be
reused or distributed without the written consent of
Senturus, Inc.

More Related Content

What's hot

SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
OReillyStrata
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
DataWorks Summit
 

What's hot (19)

Big Data: SQL query federation for Hadoop and RDBMS data
Big Data:  SQL query federation for Hadoop and RDBMS dataBig Data:  SQL query federation for Hadoop and RDBMS data
Big Data: SQL query federation for Hadoop and RDBMS data
 
Big Data: InterConnect 2016 Session on Getting Started with Big Data Analytics
Big Data:  InterConnect 2016 Session on Getting Started with Big Data AnalyticsBig Data:  InterConnect 2016 Session on Getting Started with Big Data Analytics
Big Data: InterConnect 2016 Session on Getting Started with Big Data Analytics
 
Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014
 
Sqrrl and Accumulo
Sqrrl and AccumuloSqrrl and Accumulo
Sqrrl and Accumulo
 
Protecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against DisastersProtecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against Disasters
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
 
Big Data: Big SQL web tooling (Data Server Manager) self-study lab
Big Data:  Big SQL web tooling (Data Server Manager) self-study labBig Data:  Big SQL web tooling (Data Server Manager) self-study lab
Big Data: Big SQL web tooling (Data Server Manager) self-study lab
 
Apache hive essentials
Apache hive essentialsApache hive essentials
Apache hive essentials
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
 
Big SQL 3.0 - Toronto Meetup -- May 2014
Big SQL 3.0 - Toronto Meetup -- May 2014Big SQL 3.0 - Toronto Meetup -- May 2014
Big SQL 3.0 - Toronto Meetup -- May 2014
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the Experts
 
Interactive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroDataInteractive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroData
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
 
SUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UK
SUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UKSUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UK
SUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UK
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
 
Big Data: Working with Big SQL data from Spark
Big Data:  Working with Big SQL data from Spark Big Data:  Working with Big SQL data from Spark
Big Data: Working with Big SQL data from Spark
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
 

Viewers also liked

Contemporary hero’s quest
Contemporary hero’s questContemporary hero’s quest
Contemporary hero’s quest
USMEPCOM
 
Construction of marine and offshore structures(2007)
Construction of marine and offshore structures(2007)Construction of marine and offshore structures(2007)
Construction of marine and offshore structures(2007)
calir2lune
 
Customer visits - 10 Do's and Don'ts
Customer visits - 10 Do's and Don'tsCustomer visits - 10 Do's and Don'ts
Customer visits - 10 Do's and Don'ts
Gopal Shenoy
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks
Chiranjeevi Adi
 

Viewers also liked (20)

Cognos Analytics Version 11 Questions Answered
Cognos Analytics Version 11 Questions AnsweredCognos Analytics Version 11 Questions Answered
Cognos Analytics Version 11 Questions Answered
 
7 Deadly Real Estate Prospecting Sins
7 Deadly Real Estate Prospecting Sins7 Deadly Real Estate Prospecting Sins
7 Deadly Real Estate Prospecting Sins
 
characterization of polymers
characterization of polymerscharacterization of polymers
characterization of polymers
 
Contemporary hero’s quest
Contemporary hero’s questContemporary hero’s quest
Contemporary hero’s quest
 
centrifuge principle and application
centrifuge principle and applicationcentrifuge principle and application
centrifuge principle and application
 
Construction of marine and offshore structures(2007)
Construction of marine and offshore structures(2007)Construction of marine and offshore structures(2007)
Construction of marine and offshore structures(2007)
 
Construction and demolition waste
Construction and demolition wasteConstruction and demolition waste
Construction and demolition waste
 
Customer visits - 10 Do's and Don'ts
Customer visits - 10 Do's and Don'tsCustomer visits - 10 Do's and Don'ts
Customer visits - 10 Do's and Don'ts
 
Your 1st Ceph cluster
Your 1st Ceph clusterYour 1st Ceph cluster
Your 1st Ceph cluster
 
chapter 16 - social psychology
 chapter 16 - social psychology chapter 16 - social psychology
chapter 16 - social psychology
 
Emerging Trends in Clinical Data Management
Emerging Trends in Clinical Data ManagementEmerging Trends in Clinical Data Management
Emerging Trends in Clinical Data Management
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks
 
Introduction to Mobile Core Network
Introduction to Mobile Core NetworkIntroduction to Mobile Core Network
Introduction to Mobile Core Network
 
Cell Culture BASICS
Cell Culture  BASICSCell Culture  BASICS
Cell Culture BASICS
 
Cash Flow Statement PPT
Cash Flow Statement PPTCash Flow Statement PPT
Cash Flow Statement PPT
 
Co-op Marketing Strategies: Building a Better Co-op/MDF Program
Co-op Marketing Strategies: Building a Better Co-op/MDF ProgramCo-op Marketing Strategies: Building a Better Co-op/MDF Program
Co-op Marketing Strategies: Building a Better Co-op/MDF Program
 
Organizational Culture
Organizational CultureOrganizational Culture
Organizational Culture
 
Guide to Construction Procurement Strategies
Guide to Construction Procurement StrategiesGuide to Construction Procurement Strategies
Guide to Construction Procurement Strategies
 
Cell suspension culture
Cell suspension cultureCell suspension culture
Cell suspension culture
 
Final - Comcast
Final - ComcastFinal - Comcast
Final - Comcast
 

Similar to Running Cognos on Hadoop

SATISH NAKKA-04222015
SATISH NAKKA-04222015SATISH NAKKA-04222015
SATISH NAKKA-04222015
Satish Nakka
 
Chandan's_Resume
Chandan's_ResumeChandan's_Resume
Chandan's_Resume
Chandan Das
 

Similar to Running Cognos on Hadoop (20)

NaliniProfile
NaliniProfileNaliniProfile
NaliniProfile
 
Présentation du FME World Tour 2018 à Montréal
Présentation du FME World Tour 2018 à MontréalPrésentation du FME World Tour 2018 à Montréal
Présentation du FME World Tour 2018 à Montréal
 
Global Innovation Nights - Spark
Global Innovation Nights - SparkGlobal Innovation Nights - Spark
Global Innovation Nights - Spark
 
Présentation du FME World Tour 2018 à Québec
Présentation du FME World Tour 2018 à QuébecPrésentation du FME World Tour 2018 à Québec
Présentation du FME World Tour 2018 à Québec
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3
 
Nyc mule soft_meetup_13_march_2021
Nyc mule soft_meetup_13_march_2021Nyc mule soft_meetup_13_march_2021
Nyc mule soft_meetup_13_march_2021
 
SATISH NAKKA-04222015
SATISH NAKKA-04222015SATISH NAKKA-04222015
SATISH NAKKA-04222015
 
Choosing the Right Tool for the Job: Cognos Workspace vs. Traditional Studios...
Choosing the Right Tool for the Job: Cognos Workspace vs. Traditional Studios...Choosing the Right Tool for the Job: Cognos Workspace vs. Traditional Studios...
Choosing the Right Tool for the Job: Cognos Workspace vs. Traditional Studios...
 
Chandan's_Resume
Chandan's_ResumeChandan's_Resume
Chandan's_Resume
 
Hadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality ChallengeHadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality Challenge
 
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdf
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdfDataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdf
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdf
 
Hadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTData
Hadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTDataHadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTData
Hadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTData
 
Build a Big Data solution using DB2 for z/OS
Build a Big Data solution using DB2 for z/OSBuild a Big Data solution using DB2 for z/OS
Build a Big Data solution using DB2 for z/OS
 
Transitioning to Cognos Workspace Advanced: Migrating from Query & Analysis S...
Transitioning to Cognos Workspace Advanced: Migrating from Query & Analysis S...Transitioning to Cognos Workspace Advanced: Migrating from Query & Analysis S...
Transitioning to Cognos Workspace Advanced: Migrating from Query & Analysis S...
 
Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on Hadoop
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
 
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowHow to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
 
Packaging Automation Best Practices for InduSoft Web Studio
Packaging Automation Best Practices for InduSoft Web StudioPackaging Automation Best Practices for InduSoft Web Studio
Packaging Automation Best Practices for InduSoft Web Studio
 
NTT Data - Shinichi Yamada - Hadoop World 2010
NTT Data - Shinichi Yamada - Hadoop World 2010NTT Data - Shinichi Yamada - Hadoop World 2010
NTT Data - Shinichi Yamada - Hadoop World 2010
 
How to Architect and Develop Cloud Native Applications
How to Architect and Develop Cloud Native ApplicationsHow to Architect and Develop Cloud Native Applications
How to Architect and Develop Cloud Native Applications
 


Running Cognos on Hadoop

  • 1. RUNNING COGNOS ON HADOOP Cost Effective, Highly Scalable, High Speed
  • 2. Agenda: • Introduction • Running Cognos on Hadoop – Hadoop Overview – Hive Overview – Performance – BigSheets Demo • Additional Resources. Copyright 2016 Senturus, Inc. All Rights Reserved
  • 3. Introduction: Today’s Presenters. Paul Yip, BigInsights Product Manager, IBM Analytics. John Peterson, CEO and Co-Founder, Senturus, Inc.
  • 4. Presentation Slide Deck on www.senturus.com
  • 5. Resource Library: The purpose of Senturus is to make you successful with business analytics. We host dozens of live webinars every year and offer a comprehensive library of recorded webinars, demos, white papers, presentations, case studies, and reviews of new software releases on our website. Our content is constantly updated, so visit us often to see what’s new in the industry. www.senturus.com/resources/
  • 6. This slide deck is from the webinar: Running Cognos on Hadoop: Cost Effective, Highly Scalable, High Speed To view the FREE recording of the presentation, and download this deck, go to: www.senturus.com/resources/running-cognos-on-hadoop/ Hear the Recording Copyright 2016 Senturus, Inc. All Rights Reserved.
  • 7. THE CHALLENGES OF DATA TODAY … AND THE TOOLS TO CONFRONT THEM
  • 8. THE BIG MATCH UP: The 4 V’s of Data (Volume, Variety, Veracity, Velocity) MEETS Business Analytics* (Standard (highly formatted) Reports; Dashboards; Ad Hoc Analysis; Alerts, etc.; Predictive Analytics). * Existing & new systems
  • 9. SO WHAT YOU NEED IS… • Virtually unlimited, low-cost Staging Area that accepts all data formats • Easy way to Explore raw data • Low-cost Archive for past or less used data • Repository for transformed data (subset) that fully supports queries from standard BI tools (typically SQL)
  • 10. DID YOU KNOW? … that IBM is a major Hadoop stack vendor. * Other Hadoop vendors include: Amazon, Microsoft, Intel, Pivotal/EMC, Teradata…
  • 11. POLL: How is your organization combining SQL, standard BI tools and Hadoop? • Via HIVE • Via a “value-add” tool like Impala or BigSQL • Using Hadoop, but not SQL against it • Not using Hadoop • Don’t know
  • 12. A POSSIBLE SOLUTION: IBM BIGINSIGHTS & COGNOS
  • 13. © 2015 IBM Corporation IBM BigInsights and Cognos Hadoop Patterns Paul Yip BigInsights Product Manager IBM Toronto Software Lab ypaul@ca.ibm.com
  • 15. What is Hadoop? General Formula: 1. Bunch of commodity servers (nodes) with internal disk • Example: 12 x 6TB disk = 72TB per node 2. Network them and install Hadoop • Example: 20 nodes x 72TB = 1440 TB cluster 3. Result: A big file system that also runs analytics. Features: • Significantly lower cost than SAN • End-user / applications just see “files” • Cluster and data are resilient • Add performance / capacity by adding more nodes. (Diagram: File1 is split into blocks a, b, c, d; the NameNode tracks them, and three copies of each block are spread across the DataNodes.)
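The capacity arithmetic on this slide can be reproduced with a short sketch. The function names are illustrative only, and the replication factor of 3 is an assumption: it is the HDFS default and matches the three copies of each block in the slide's diagram, but the slide itself quotes only raw capacity.

```python
def raw_capacity_tb(nodes: int, disks_per_node: int, disk_tb: float) -> float:
    """Raw cluster capacity in TB: nodes x disks per node x TB per disk."""
    return nodes * disks_per_node * disk_tb


def usable_capacity_tb(raw_tb: float, replication: int = 3) -> float:
    """Space left for distinct data when each block is stored `replication` times.

    replication=3 is an assumption (the HDFS default), not a figure from the slide.
    """
    return raw_tb / replication


if __name__ == "__main__":
    raw = raw_capacity_tb(nodes=20, disks_per_node=12, disk_tb=6)
    print(raw)                      # 1440, as on the slide (20 nodes x 72TB)
    print(usable_capacity_tb(raw))  # 480.0 under the 3x replication assumption
```

Real clusters also reserve space for the operating system, temporary files and headroom, none of which is modeled here.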
  • 17. Distributed Analytics Example: MapReduce. MapReduce computation model: • Data stored in a distributed file system spanning many inexpensive computers • Bring function to the data • Distribute application to the compute resources where the data is stored • Scalable to thousands of nodes and petabytes of data. MapReduce Application: 1. Map Phase (break job into small parts) 2. Shuffle (transfer interim output for final processing) 3. Reduce Phase (boil all output down to a single result set). (Diagram: map tasks are distributed to the Hadoop Data Nodes in the cluster; after the shuffle, a single result set is returned.) Code excerpt as shown on the slide (truncated): public static class TokenizerMapper extends Mapper<Object,Text,Text,IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(Object key, Text val, Context StringTokenizer itr = new StringTokenizer(val.toString()); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); context.write(word, one); } } } public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWrita private IntWritable result = new IntWritable(); public void reduce(Text key, Iterable<IntWritable> val, Context context){ int sum = 0; for (IntWritable v : val) { sum += v.get(); . . .
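The three phases listed on this slide can be illustrated with a single-process word count. This is a minimal sketch in plain Python, not the Hadoop API: the function names are made up for illustration, and nothing here is distributed. It mirrors only the roles that TokenizerMapper and IntSumReducer play in the truncated Java excerpt.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated token."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values that were emitted for the same key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data on hadoop", "big sql on hadoop"]))
```

In a real cluster each input block would get its own map task on the node that stores it, and the shuffle would move data over the network to the reducers.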
  • 18. Hive provides a SQL interface to MapReduce • The first SQL interface for Hadoop data • De-facto standard for SQL on Hadoop • Ships with all major Hadoop distributions. (Diagram: SQL → Hive → Map → Reduce → Output)
  • 19. SQL on Hadoop Matters for Big Data Analytics: For BI Tools like Cognos. (Visualizations from Cognos 10.2.2)
  • 20. Sounds great! But is there a… DOWN SIDE?
  • 21. Hive – Joins in MapReduce • For joins, MR is used to group data together at the same reducer based upon the join key • Mappers read blocks from each “table” in the join • The <key> is the value of the join key, the <value> is the record to be joined • Reducer receives a mix of records from each table with the same join key • Reducers produce the results of the join. Example query: select e.fname, e.lname, d.dept_name from employees e, depts d where e.salary > 30000 and d.dept_id = e.dept_id (Diagram: map tasks read blocks of the employees and depts tables and feed reducers for dept 1, dept 2 and dept 3.)
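The join steps described on this slide can be sketched as a reduce-side join in plain Python. This is an illustration of the pattern only, not Hive's actual implementation; the sample rows and function names are hypothetical, while the column names and the salary filter follow the slide's query.

```python
from collections import defaultdict

# Hypothetical sample rows; only the column names come from the slide's query.
EMPLOYEES = [
    {"fname": "Ann", "lname": "Lee", "salary": 50000, "dept_id": 1},
    {"fname": "Bob", "lname": "Ray", "salary": 25000, "dept_id": 1},
    {"fname": "Cy", "lname": "Poe", "salary": 40000, "dept_id": 2},
]
DEPTS = [
    {"dept_id": 1, "dept_name": "Sales"},
    {"dept_id": 2, "dept_name": "Finance"},
]


def map_employees(rows):
    """Map: apply the salary filter, then emit (join key, tagged record)."""
    for row in rows:
        if row["salary"] > 30000:
            yield row["dept_id"], ("emp", row)


def map_depts(rows):
    """Map: emit every department keyed by the join key."""
    for row in rows:
        yield row["dept_id"], ("dept", row)


def shuffle(*mapped):
    """Shuffle: bring all records with the same join key to one group."""
    groups = defaultdict(list)
    for stream in mapped:
        for key, tagged in stream:
            groups[key].append(tagged)
    return groups


def reduce_join(tagged_records):
    """Reduce: pair each employee with each department sharing the key."""
    emps = [r for tag, r in tagged_records if tag == "emp"]
    depts = [r for tag, r in tagged_records if tag == "dept"]
    for e in emps:
        for d in depts:
            yield e["fname"], e["lname"], d["dept_name"]


def join(employees, depts):
    groups = shuffle(map_employees(employees), map_depts(depts))
    return sorted(row for key in groups for row in reduce_join(groups[key]))


if __name__ == "__main__":
    print(join(EMPLOYEES, DEPTS))
```

Tagging each record with its source table is what lets a single reducer tell the two sides apart after the shuffle has mixed them.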
  • 22. N-way Joins in MapReduce • For N-way joins involving different join keys, multiple jobs are used. Example query: select e.fname, e.lname, d.dept_name, p.phone_type, p.phone_number from employees e, depts d, emp_phones p where e.salary > 30000 and d.dept_id = e.dept_id and p.emp_id = e.emp_id (Diagram: a first job maps employees and depts and reduces by dept 1, dept 2, dept 3 into temp files; a second job maps the temp files and emp_phones and reduces by emp_id 1, emp_id 2, … emp_id N to produce the results.)
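Because the two join conditions use different keys (dept_id and emp_id), the work is split into two chained jobs, with the first job's output acting as the temp files. The sketch below simulates that chaining in plain Python; `run_join_job` and `three_way_join` are hypothetical helpers and the rows are made-up sample data, so treat it as an outline of the idea rather than how Hive plans the query.

```python
from collections import defaultdict


def run_join_job(left, right, key):
    """One simulated MapReduce job: join two lists of dicts on a single key.

    Records are grouped by the key's value (the shuffle), then every left
    record is merged with every right record in the same group (the reduce).
    """
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[key]][0].append(row)
    for row in right:
        groups[row[key]][1].append(row)
    return [{**l, **r} for ls, rs in groups.values() for l in ls for r in rs]


def three_way_join(employees, depts, emp_phones):
    """Chain two jobs because the two join keys differ."""
    filtered = [e for e in employees if e["salary"] > 30000]
    temp = run_join_job(filtered, depts, "dept_id")   # job 1: output = temp files
    return run_join_job(temp, emp_phones, "emp_id")   # job 2: final results


if __name__ == "__main__":
    # Hypothetical sample data; column names follow the slide's query.
    employees = [
        {"emp_id": 1, "fname": "Ann", "lname": "Lee", "salary": 50000, "dept_id": 1},
        {"emp_id": 2, "fname": "Bob", "lname": "Ray", "salary": 25000, "dept_id": 1},
    ]
    depts = [{"dept_id": 1, "dept_name": "Sales"}]
    emp_phones = [
        {"emp_id": 1, "phone_type": "work", "phone_number": "555-0100"},
        {"emp_id": 1, "phone_type": "cell", "phone_number": "555-0101"},
        {"emp_id": 2, "phone_type": "work", "phone_number": "555-0102"},
    ]
    for row in three_way_join(employees, depts, emp_phones):
        print(row["fname"], row["dept_name"], row["phone_type"], row["phone_number"])
```

Each extra job means writing intermediate results to disk and reading them back, which is one reason multi-key joins are costly on MapReduce.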
  • 23. IBM BigInsights
  • 24. Hive is Really 3 Things… Storage Format, Metastore, and Execution Engine. (Diagram labels: Applications; SQL Execution Engine: Hive (Open Source) on MapReduce; Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others…; Hive Metastore (open source))
  • 25. Big SQL Preserves Open Source Foundation: Leverages Hive metastore and storage formats. No Lock-in. Data part of Hadoop, not BigSQL. Fall back to Open Source Hive Engine at any time. (Diagram labels: Applications; SQL Execution Engines: IBM BigSQL (IBM), Hive (Open Source); Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others…; Hive Metastore (open source))
  • 26. IBM First/Only to Produce Audited Benchmark: Hadoop-DS (based on TPC-DS) / Oct 2014 • Letters of attestation are available for both Hadoop-DS benchmarks at 10TB and 30TB scale • InfoSizing, Transaction Processing Performance Council Certified Auditors, verified both IBM results as well as results on Cloudera Impala and HortonWorks HIVE • These results are for a non-TPC benchmark. A subset of the TPC-DS Benchmark standard requirements was implemented
  • 28. Performance Test – Hadoop-DS (based on TPC-DS), 20 (Physical Node) Cluster • TPC-DS is the Transaction Processing Performance Council’s Decision Support workload, an industry standard benchmark for SQL. Systems compared (updated Oct 2015 results): Hive 1.2.1 on IBM Open Platform V4.1, 20 nodes; Big SQL V4.1 on IBM Open Platform V4.1, 20 nodes
  • 29. But first … is performance everything?
  • 30. Big SQL runs more SQL out-of-box. Porting Effort: Big SQL 4.1: 1 hour; Hive / Spark SQL 1.5.0: 3-4 weeks. Big SQL is the only engine that can execute all 99 queries with minimal porting effort
  • 31. Cognos & Hadoop Lessons Learned: Notes from Cognos Development. HIVE • Very restrictive with respect to join predicates • Support of SQL has a history of many limitations, such as: – Limited set operation (union) which had problems – Lack of usable SQL-OLAP • Resulting in more local processing. IMPALA • Join restrictions are partially lifted • Try to re-write using CROSS JOIN (but not for outer joins) • Cannot push FOJ (full outer join) • Other gaps include: – Set operations (union, intersect and except) – Sub-queries – ORDER BY (because they used to require LIMIT N) – Cannot have multiple distinct aggregates. Queries on Big SQL work out-of-box
  • 32. Big SQL Security – Best In Class: Role Based Access Control; Row Level Security; Column Level Security; Separation of Duties / Audit. (Groups shown in diagram: BRANCH_A, BRANCH_B, FINANCE.) See it in action on YouTube: https://www.youtube.com/watch?v=N2FN5h25-_s
  • 33. Big SQL V4.1 vs Hive 1.2.1 Performance Test Update, announced at Strata + Hadoop World, Sept 2015. See it in action on YouTube: www.youtube.com/watch?v=SYQgzRGhqVU
  • 34. Performance Test Summary: Big SQL V4 vs. Hive 1.2.1 @ 1TB • In 99 / 99 Queries, Big SQL was Faster • On Average, Big SQL was 21X faster • Excluding the Top 5 and Bottom 5 Results, Big SQL was 19X faster
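The two averages on this slide differ only in whether the five best and five worst results are excluded. The sketch below shows one way such figures could be computed. It assumes the average is a mean of per-query speedup ratios, which the slide does not state, and the numbers are made-up illustrative values, not the benchmark results.

```python
def speedups(baseline_seconds, candidate_seconds):
    """Per-query speedup: baseline time divided by candidate time."""
    return [b / c for b, c in zip(baseline_seconds, candidate_seconds)]


def mean(values):
    return sum(values) / len(values)


def trimmed_mean(values, trim=5):
    """Mean after dropping the `trim` smallest and `trim` largest values."""
    if len(values) <= 2 * trim:
        raise ValueError("not enough values to trim")
    ordered = sorted(values)
    return mean(ordered[trim:len(ordered) - trim])


if __name__ == "__main__":
    # Hypothetical speedup ratios for 16 queries (NOT the benchmark data).
    ratios = [2, 3, 4, 5, 6, 10, 10, 10, 10, 10, 10, 20, 30, 40, 50, 100]
    print(mean(ratios))          # 20.0: a few large outliers pull the mean up
    print(trimmed_mean(ratios))  # 10.0: outliers at both ends are excluded
```

Reporting both figures, as the slide does, shows how much of the headline average depends on a handful of extreme queries.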
  • 35. IBM BigInsights
  • 36. BigSheets: Browser based analytics tool for BigData • Explore, visualize, transform unstructured and structured data • Visual data cleansing and analysis • Filter and enrich content • Visualize Data • Export data into common formats. No programming knowledge needed!
  • 37. QUICK DEMO: BigSheets…
  • 39. Major Canadian Insurance Company: BigSheets was the primary reason for choosing BigInsights. The client had huge tables on their mainframe system, and their business analysts wanted a way to see the big picture as opposed to sub-setting the data (the traditional Excel way). BigSheets allowed them to do what they needed to do, and they described it as a "game changer". They were then able to analyze multiple tables/files that were 100GB in size. Before: mainframe db subsets (small datasets, incomplete data, spreadsheet proliferation). After: BigInsights (complete data, centralized data).
  • 40. BigSheets Empowers Business Users on Hadoop. Tara Paider @ Nationwide, AVP, IT Architecture, Data Management & Analytics: "Nationwide runs Monte Carlo simulations. The data preparation step normally takes approximately 3 days. Frustrated with this, as part of a POC, business analysts re-implemented the transformations on IBM BigInsights and it now runs in 10 minutes.* Technologies within BigInsights enabled business users to leverage the power of Hadoop and Map Reduce without advanced programming skills." *The performance improvement is attributed to pushing computation and transformations close to the data and the scale out capability of Hadoop.
  • 41. Putting it all together… Big Data Technology Patterns
  • 42. Traditional Enterprise Analytic Environment. Components: Data Sources (Structured, Operational); Staging Area; EDW; Archive; Marts; Information and Insight (BI Performance Management; Predictive Analytics and Modeling); Information Movement & Transformation. Tasks: Extract Operational Data to Staging Area; Normalize Data for Enterprise Consumability; Provide Guided and Interactive Access; Deliver Data for Deeper Analysis and Modeling; Archive “Cold” Data to Reduce Costs; Tooling to Facilitate Data Movement & Transformation
  • 43. Traditional Approach to Improve Analytic Architectures: Put Staging Area in the EDW (Expanded EDW) • Still only structured (models don’t get more aperture) • Doesn’t accelerate the speed of model lifecycles • $$ Expensive $$. Components: Data Sources (Structured, Operational); Expanded EDW with Staging Area; Marts; Archive; Information and Insight (BI Performance Management; Predictive Analytics and Modeling); Information Movement & Transformation
  • 44. Faster, Deeper Insights While Reducing Costs • Faster Performance – up to 4-digit improvements • Reduces storage 10x – 25x • Load and Go (no tuning, +++). Components: data sources (Structured, Operational, Sensor, Geospatial, Time Series, Unstructured, External, Social, Streaming); Landing, Exploration, Discovery; Staging Area; Day 0 Archive; Enterprise Warehouse; Marts; Archive; BI Performance Management; Predictive Analytics and Modeling; Information Movement & Transformation
  • 45. Current State: Analytics Development Cycle. Typical Phases: Bright Idea!; Request IT for data extraction; Gather Requirements; Data Integration Effort Estimates; Solution Design; Infrastructure Cost Analysis; Business Case Development; GO or NO-GO Management Approvals; Procurement (DB, storage, SW licenses); Data Quality & ETL Development; Report Development; Review Results… 6 Months later. High cost of failure. Actors Involved: • Business Analysts • Solution Architects • Infrastructure Architects • Database Administrators • SAN Administrators • Management • ETL Developers • Report Developers
  • 46. Target State: Rapid Prototyping. Many Ideas…; Land data in Hadoop (if not already); Explore Data with BigSheets; Prototype Reports with Cognos/BigSQL; GO or No-Go Management Approvals. Elapsed Time: Days/Weeks. Develop a culture of fail fast. Still Required for “Go To Production”: Infrastructure Cost Analysis; Data Quality & ETL Development; Operationalization. • More Precise Requirements Gathering • More Information about Data Quality • More Accurate Project Estimates • More Reliable Business Cases • Actionable Insights before Production Ready Solution
  • 47. Right Tool, Right Job. Data types: Unstructured, Semi Structured, Structured. Tools: Text Analytics; Big Sheets, Big SQL; Cognos w/Big SQL; Cognos + RDBMS
  • 48. Summary • Hadoop is a Big File System that can run analytics – Unlike a database, it can store anything as files (structured, semi-structured, unstructured) – Hadoop lowers the cost to pennies per GB, making it possible to have a copy of all data from source systems • SQL on Hadoop enables analytics for the masses, with no need to learn MapReduce • IBM BigInsights – BigSheets and Big SQL enable discovery and rapid BI prototyping. “Big SQL makes access to Hive data faster and more secure”
  • 49. Other Resources… • Watch Big SQL take on Spark SQL: https://www.youtube.com/watch?v=bAs74frPUq8 • Improve security for Hive data with Big SQL: https://www.youtube.com/watch?v=N2FN5h25-_s
  • 51. SENTURUS INTRODUCTION: BUSINESS ANALYTICS: ARCHITECTED TO SCALE
  • 52. Laser Focused on Business Analytics • Dashboards, Reporting & Visualizations • Data Preparation • Big Data & Advanced Analytics • Enterprise Planning
  • 53. Senturus Offerings • Comprehensive Consulting • Dedicated Resources • Training • Accelerated Development Tools • Migrations/Upgrades/Installations • Performance & Optimization • Jumpstarts • Project Roadmaps & Assessments
  • 54. 900+ Clients, 2000+ Projects, 16+ Years
  • 56. Demo Cloud: Big SQL Technology Sandbox • Big SQL Technology Sandbox is a large, shared environment for data science • You can use it to run R, SQL, Spark, and Hadoop jobs • It is a high performance cluster demonstrating the advantages of parallelized processing of big data sets. For more information, see link on Senturus website
  • 57. Related Videos: You may also be interested in the following YouTube videos authored by Paul Yip. • Access Apache Hive Data Faster and More Securely with Big SQL • Spark vs IBM Big SQL Performance • Hadoop HDFS vs Spectrum Scale (GPFS)
  • 58. Free Resources on www.senturus.com
  • 59. Upcoming Events: www.senturus.com/events
  • 61. Thank You! www.senturus.com info@senturus.com 888 601 6010 Copyright 2016 by Senturus, Inc. This entire presentation is copyrighted and may not be reused or distributed without the written consent of Senturus, Inc.