Qlik Sense and Big Data 
Making Big Data Relevant for the Business User 
Bob Hardaway – Solution Architect 
2 October, 2014
And now they coming, yeah, now they coming 
Out from the shadows 
To take me to the club because they know 
That I shut this down, 'cause they been watching all my windows 
They gathered up the wall and listening 
You understand, they got a plan for us 
I bet you didn't know that I was dangerous
Intelligence Community Comprehensive National 
Cyber Security Initiative Data Center (ICCNCSIDC) 
Capable of processing all forms of communication, including the 
complete contents of private emails, cell phone calls, and Internet 
searches, as well as all types of personal data trails—parking receipts, 
travel itineraries, bookstore purchases, and other digital 'pocket litter'.
Big Data comes with big challenges 
The Big Data bottleneck 
Reports 
Data Scientists 
Business Users 
Big Data 
“many organizations lack the skills required to exploit big data”
“most of these skills are in short supply and rare in the market at large”
“data science encompasses hard skills”
Source: Gartner Big Data Hype Cycle Report 2013
Qlik relieves the Big Data bottleneck 
The Big Data bottleneck 
Data Scientists 
Reports 
Analytics & 
Discovery 
Big Data Business Users 
QlikView’s user-centric Business Discovery approach gives
decision-makers access to the benefits of Big Data
What is Big Data?
Big Data happens in every part of History 
• Paper – a medium to write down ideas and information, but not enough writers to disseminate it
• Print – technology to distribute information, but no place to store it
• Computer – a place to store it, but it can’t keep up with computing requirements
• Internet – distributed computing globally, but too many emails to read
We always create more than we can consume!
The Internet of Things (IoT) 
• Cisco estimates 50B connected 
devices by 2020 
• Intel says 15B by 2015 
• Uber adds 70,000 drivers per week
• AirBnB had 42M bookings last 
year 
• ZipCar lets you reserve a 
parking space anywhere 
The Physical Web – Google project to de-App devices 
“People should be able to walk up to any smart device – a vending machine, a 
poster, a toy, a bus stop, a rental car – and not have to download an app first,” – 
Scott Jenson
Quantifying Big Data 
“Bigness is the least important thing … it’s the insights that can be
gained from interactions vs. transactions … the customer experience
vs. the value of what was purchased.”
– Stephen Brobst, CTO, Teradata
• Real-time streaming data: high volumes at low latency; complexity in processing, analysis, and deriving insights (e.g. 12TB/day across 80 servers; 32 billion rows per day)
• Very large data sets: on the order of 100s of TB to PBs (e.g. 75TB of compressed data processed per day; 7,500+ analytical jobs per day; 15TB per day at a 1:7 compression ratio; 4PB of storage)
• Structured & unstructured data living together: OLTP, DW, and data marts alongside text, audio, video, click streams, log files, etc.
• Images, flat files, DNA: 4TB of TIFF converted to 11 million PDF files using Hadoop in under 24 hours
A Less Alliterative Definition 
• Big Data is about analyzing ALL your data, ALL the time 
– Traditional BI systems operate on assumptions, and limited data 
sets that preclude true discovery and insight 
– The same question gets asked over and over
• The cost of analysis has always been the limiting factor for 
Business Intelligence 
– Solutions have to be justified before they are deployed 
• Big Data is about storing everything cheaply and letting the
User look for value
• Big Data is about driving the business based on Data 
• Big Data doesn’t solve every problem, but it does put the 
User in charge of the process
Hadoop – A Brief History
• Google releases a paper on GFS; Doug Cutting applies its ideas to Nutch, a distributed search platform
• 2006 – Cutting joins Yahoo, which estimates a billion-page index will cost $500K to build and $30K/month to support
• 2008 – Hadoop is promoted to a top-level Apache project; search index creation time drops from 12 days to 8 hours; a 1,400-node Yahoo cluster sorts 500GB in 59 seconds; Cloudera launches
• 2011 – Yahoo spins its remaining Hadoop team out into Hortonworks; the 3rd Hadoop World conference attracts 2,300 developers, versus 275 the first time
• 2013 – Cloudera adds real-time search, based on Lucene (also created by Cutting)
• 2014 – Apache Spark becomes the most contributed-to Hadoop-related project
Big Data is much more than just storage
Prepare for Big Data business demands:
1. Real-time analytics: real-time agility
2. Extreme analytic engines: advanced analytic capability
3. Big Data exploration, DW/ETL pre-processing: transformation and exploration
4. Big Data cache + BI infrastructure: advanced data management
Popular “Big Data” Myths 
• You need to have Ga-zinga-bytes to deploy a Big Data solution 
– Typical Cloudera Cluster is 15-20 nodes, < 10TB of data 
– Hadoop storage is 300-400% cheaper than an EDW
• Hadoop is all you need 
– Hadoop is an enabling technology that provides the foundation for 
Big Data solutions 
– Focus today is on data management 
• The RDBMS is dead 
– RDBMS is still critical – but not for high-volume, low-quality analytics
• QlikView can’t handle Big Data
– Reality is a Human can’t handle Big Data 
– It’s all about the use case 
– Direct Discovery is a unique approach
Gartner Top Big Data Challenges 
You need to determine 
your goals/objectives 
Qlik can help you with 
these challenges
Turn Big Data (lots of dots) Into Small Data (Insights)
The Value in Big Data Comes from Context and Relevance
They’re both the same number of bricks: the same volume of data, same schema. You choose what is relevant to your analysis, whether more history or more categories.
The Big Data Value Chain (value vs. size)

Storage tier                  Speed (t/TB)   Price ($/TB)
Hard Disk Drives (HDD)        3300s          $50
Solid State Storage (SSD)     1000-300s      $500
Random Access Memory (RAM)    1s             $4,500

• Keep data in memory when the value obtained from processing it is high
• Leave data on disk when it is inactive or the value from processing it is low
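The placement rule above can be sketched as a simple policy function. This is an illustration, not a real storage manager; the value scores are hypothetical, and the $/TB thresholds are simply the prices from the table reused as break-even points:

```python
# Tier prices from the table above, in $/TB.
TIER_PRICE = {"RAM": 4500, "SSD": 500, "HDD": 50}

def choose_tier(value_per_tb, active):
    """Pick the cheapest tier whose cost is justified by the data's value.

    value_per_tb: estimated business value ($/TB) of processing this data.
    active: whether the data is still being queried at all.
    """
    if not active:
        return "HDD"                      # inactive data stays on disk
    if value_per_tb >= TIER_PRICE["RAM"]:
        return "RAM"                      # high value: keep it in memory
    if value_per_tb >= TIER_PRICE["SSD"]:
        return "SSD"
    return "HDD"

print(choose_tier(10_000, active=True))   # RAM
print(choose_tier(800, active=True))      # SSD
print(choose_tier(10_000, active=False))  # HDD
```

The point of the sketch is that tiering is an economic decision, not a technical one: data flows toward memory only when the value of processing it exceeds the cost of holding it there.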
The Big Data Value Chain
Fine, Big Data is here, 
but 
what are the Big Data Use Cases 
that matter to my Business?
Initially Hadoop Came About to Reduce Costs 
• How cheaply? 
– By one estimate, running a 75-node, 300TB Hadoop cluster costs
$1.05 million over 3 years.
– The same capacity in an RDBMS may cost 2.5x as much over the same period.
• This type of savings means companies can keep ‘more’ or all of 
their data. 
• Hadoop is for storage, not analytics 
– Data storage remains the most common use case for Hadoop 
• Example: 
– Expedia is moving from DB2 to Cloudera with expected savings 
of approximately $100 million per year.
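A back-of-the-envelope calculation from the cluster estimate above (the 2.5x RDBMS multiplier is the figure cited on this slide, not an independent benchmark):

```python
# Figures quoted above: a 75-node, 300TB Hadoop cluster,
# $1.05M total cost of ownership over 3 years.
hadoop_total = 1_050_000   # dollars over 3 years
capacity_tb = 300
years = 3

hadoop_per_tb_year = hadoop_total / capacity_tb / years
rdbms_per_tb_year = hadoop_per_tb_year * 2.5   # the 2.5x estimate

print(f"Hadoop: ${hadoop_per_tb_year:,.0f} per TB per year")
print(f"RDBMS (2.5x estimate): ${rdbms_per_tb_year:,.0f} per TB per year")
```

At roughly $1,167 per TB per year for Hadoop versus about $2,917 for the RDBMS estimate, the per-TB gap is what makes "keep everything" economically plausible.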
But Big Data Technologies are Evolving Rapidly 
• 2010 – Download Apache Hadoop, cobble together surplus
hardware, hire a couple of Java developers
• 2012 – CDH 4 from Cloudera reduces deployment time from days 
to minutes 
• 2013 – AWS introduces Elastic Map Reduce (EMR) 
• 2014 – Google Counters with Google Compute Engine (GCE) 
• Platform Vendors cover more than just Hadoop-like capabilities 
– Map-Reduce for large scale, batch processing 
– NoSQL for real-time, ad hoc query with operational performance
– Spark/Solr/Impala for real-time analytics 
– R Integration for deep predictive/advanced analytics 
– All need a delivery agent (aka Visualization tool) to bring the 
benefit to the business
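As a minimal illustration of the map-reduce pattern named above, here is a word count sketched in plain Python. This is the conceptual shape only, not the Hadoop API: map emits key-value pairs, shuffle groups them by key, reduce aggregates each group:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data is big", "data wants to be free"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

In a real cluster each phase runs in parallel across many nodes and the shuffle moves data over the network; the program structure, however, is exactly this.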
Big Data Use Cases are About Finding Value 
• Internet (Expedia) 
– Search Index Generation 
– User Engagement Behavior 
– Targeting / Advertising 
Optimizations 
– Recommendations 
• BioMed (Carefusion) 
– Computational BioMedical 
Systems 
– Bioinformatics 
– Data Mining and Genome 
Analysis 
• Financial (Metlife / Wells Fargo) 
– Prediction Models 
– Fraud Analysis 
– Portfolio Risk Management 
• Telecom (British Telecom / Deutsche Telekom)
– Call data records 
– Set top & DVR streams 
• Social (Facebook) 
– Recommendations 
– Network Graphs 
– Feed Updates 
• Enterprise Operations 
– email and image processing 
– Robust ETL 
– Data Archival 
– Natural Language Processing 
• Media & Entertainment (DIRECTV) 
– Customer 360 
– Marketing Campaigns 
• Agriculture (ADM) 
– Process “agri” stream 
– Mineral Management 
• Image (Corbis) 
– Geo-Spatial processing 
• Education (State of …) 
– Systems Research 
– Statistical analysis of the web
Big Data Ecosystem is Much More Than Just Hadoop
• Data visualization, statistical & in-memory analytics
• Open-source distributed processing frameworks
• Big Data analytic appliances
• Massively parallel processing platforms
• Big Data integration
• Packaged MapReduce platforms
Vendor logos on the slide included IBM BigInsights & Streams, Oracle Big Data Appliance, SAP HANA, and Splunk.
Qlik Brings Big Data 
to the 
Business User
Insight Comes from Big Data, in Context
Sources span batch and real-time: Hadoop, NoSQL databases, SAP HANA, Google BigQuery, and the advanced analytics platform vendors.
Leveraging QlikView for Big Data Discovery 
Define Your Use Case 
• A hybrid approach that
– Provides any and all business stakeholders with a simple but
powerful environment for exploring data, without
– Limiting or filtering what data is available for analysis
• Follow the Value 
– Start with simple questions: 
• What data do we already have that we are not making
good use of today?
– Let your business decide where the exploration goes 
• The technologies are cost effective, flexible and designed 
for a business-first methodology
QlikView Direct Discovery 
• Combines the associative capabilities of the QlikView in-memory
dataset with a query model where:
– The aggregated query result is passed back to a QlikView object
without being loaded into the QlikView data model
– The result set is still part of the associative experience
– Users can drill to detail records
(Slide diagram: a QlikView application combines a batch-loaded in-memory data model with Direct Discovery queries against the external source.)
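The hybrid pattern can be sketched generically. This is not the QlikView API; it is a Python sketch in which SQLite stands in for the big-data source, and the table and field names are invented for illustration. The small dimension values live in memory, while aggregation is pushed down to the source and only the result comes back:

```python
import sqlite3

# SQLite stands in for a big external source; in the Direct Discovery
# pattern the detail rows stay in the source and only aggregates return.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales (region TEXT, amount REAL)")
src.executemany("INSERT INTO sales VALUES (?, ?)",
                [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0)])

# The in-memory "model": just the small dimension table, loaded up front.
regions = [r[0] for r in src.execute("SELECT DISTINCT region FROM sales")]

def aggregate(region):
    """Push the aggregation down to the source; only the result returns."""
    (total,) = src.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchone()
    return total

print({r: aggregate(r) for r in sorted(regions)})
# {'APAC': 50.0, 'EMEA': 200.0}
```

The design trade-off is the same one the slide describes: selections stay fast and associative because the dimensions are in memory, while the billions of detail rows never have to fit there.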
Complement Hadoop and EDW co-existence
• Broad application, using Direct Discovery against aggregates, to discover new trends
• Deep application, against the data warehouse, to confirm and take action
• Move highly valuable data to the EDW for broader accessibility, then point QlikView at the new source
Big Data Business Needs
Data: clinical, claims, monitoring, others

• Descriptive Analytics – How are we doing?
Example: How many claims did we pay today?
• Predictive Analytics – What might happen in the future?
Example: Which of tomorrow’s claims might be requesting an Emergency Room (ER) admission?
• Prescriptive Analytics – Best course of action given objectives, requirements & constraints.
Example: What would be effective steps to reduce the probability of ER admission?

QlikView is a leader in Descriptive but barely plays in Predictive
and Prescriptive. Radically different algorithmic and
visualization concepts are needed to play in that arena.
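The three levels can be illustrated on toy claims data. All the numbers here are hypothetical, and the "model" is nothing more than a past-frequency estimate, not a real predictive system:

```python
# Toy claims history: (claim_type, resulted_in_er_admission)
history = [("surgery", True), ("surgery", True), ("surgery", False),
           ("dental", False), ("dental", False), ("dental", False)]

# Descriptive: how are we doing? (How many claims did we pay?)
paid_today = len(history)

# Predictive: estimate P(ER admission) per claim type from past frequency.
def er_rate(claim_type):
    outcomes = [er for t, er in history if t == claim_type]
    return sum(outcomes) / len(outcomes)

# Prescriptive: flag claim types whose ER rate exceeds a chosen
# threshold, so preventive steps can be targeted at them.
flagged = [t for t in {"surgery", "dental"} if er_rate(t) > 0.5]

print(paid_today)   # 6
print(flagged)      # ['surgery']
```

Each level builds on the previous one: the descriptive count summarizes what happened, the predictive rate projects it forward, and the prescriptive rule turns the projection into an action.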
King.com: Big Data in Action 
• 1.6B rows of data per day in Hadoop
– 211M rows per day extracted for analysis in QlikView
• Customer browsing activity: 
– Player Interactions within each game 
– Many additional metrics 
• Results: Marketing ROI of campaigns achieved for the first 
time (# of players, # of games played, time played, etc.)
Thank You


Editor's Notes

  • #17 The Bloor Group writes in “Why In-Memory Technology Will Dominate Big Data”, from the Kognitio download site http://www.kognitio.com/information-center/reports/ : If the goal is to accelerate BI activities dramatically, the natural approach is to have an in-memory processing resource that can be used where it makes a difference, flowing the data from disk through SSD to memory in order to support those BI workloads. In other words, data is kept in memory when the value obtained from processing it is high, and data stays on disk when it is inactive or the value from processing it is low.
  • #19 Readwrite.com/2013/05/29/the-real-reason-hadoop-is-such-a-big-deal-in-big-data#awesm=-ov83pYC1hKZ58O Rainstor.com/compression-tames-big-data-on-hadoop
  • #20 Readwrite.com/2013/05/29/the-real-reason-hadoop-is-such-a-big-deal-in-big-data#awesm=-ov83pYC1hKZ58O Rainstor.com/compression-tames-big-data-on-hadoop