Qlik Sense and Big Data 
Making Big Data Relevant for the Business User 
Bob Hardaway – Solution Architect 
2 October, 2014
And now they coming, yeah, now they coming 
Out from the shadows 
To take me to the club because they know 
That I shut this down, 'cause they been watching all my windows 
They gathered up the wall and listening 
You understand, they got a plan for us 
I bet you didn't know that I was dangerous
Intelligence Community Comprehensive National 
Cyber Security Initiative Data Center (ICCNCSIDC) 
Capable of processing all forms of communication, including the 
complete contents of private emails, cell phone calls, and Internet 
searches, as well as all types of personal data trails—parking receipts, 
travel itineraries, bookstore purchases, and other digital 'pocket litter'.
Big Data comes with big challenges 
The Big Data bottleneck 
Reports 
Data Scientists 
Business Users 
Big Data 
“many organizations lack the skills required to exploit big data”
“most of these skills are in short supply and rare in the market at large”
“data science encompasses hard skills”
Source: Gartner Big Data Hype Cycle Report 2013
Qlik relieves the Big Data bottleneck 
The Big Data bottleneck 
Data Scientists 
Reports 
Analytics & Discovery 
Big Data 
Business Users 
QlikView’s user-centric Business Discovery approach gives 
decision-makers access to the benefits of Big Data
What is Big Data?
Big Data happens in every part of History 
Paper 
• Medium to write ideas and information 
• Not enough writers to disseminate 
Print 
• Technology to distribute information 
• No place to store 
Computer 
• Place to store 
• Can’t keep up with computing requirements 
Internet 
• Distributed computing globally 
• Too many Emails to read 
We always create more than we can consume!
The Internet of Things (IoT) 
• Cisco estimates 50B connected devices by 2020 
• Intel says 15B by 2015 
• Uber adds 70,000 drivers per week 
• AirBnB had 42M bookings last year 
• ZipCar lets you reserve a parking space anywhere 
The Physical Web – Google project to de-App devices 
“People should be able to walk up to any smart device – a vending machine, a poster, a toy, a bus stop, a rental car – and not have to download an app first,” – Scott Jenson
Quantifying Big Data 
Bigness is the least important thing … it’s the insights that can be 
gained from interactions vs. transactions … the customer experience 
vs. the value of what was purchased 
- Stephen Brobst, CTO Teradata 
Real time streaming data 
High volumes in Low latency 
Complexity in processing, analysis 
and deriving insights 
12TB/day across 80 servers 
32 billion rows per day 
Very large data sets 
Order of 100s of TB to PBs 
Structured & Unstructured Data, living together 
(OLTP, DW, data marts) 
text, audio, video, click streams, log files, etc 
75TB compressed data processing/day 
7500+ analytical jobs per day 
15TB per day @ 1:7 compression ratio 
4 PB storage 
Images - Flat file - DNA: 4TB of TIFF converted to 11 million PDF files 
using Hadoop in < 24 hours
A Less Alliterative Definition 
• Big Data is about analyzing ALL your data, ALL the time 
– Traditional BI systems operate on assumptions and limited data sets that preclude true discovery and insight 
– The same question gets asked over and over 
• The cost of analysis has always been the limiting factor for Business Intelligence 
– Solutions have to be justified before they are deployed 
• Big Data is about storing everything cheaply and letting the User look for value 
• Big Data is about driving the business based on Data 
• Big Data doesn’t solve every problem, but it does put the User in charge of the process
Hadoop – A Brief History 
• Google releases a paper on GFS; Hadoop is based on a distributed search platform called Nutch 
• Cutting joins Yahoo, estimates a billion-page index will cost $500k and $30k/month to support 
• Hadoop promoted to top-level Apache project; predictive search index creation time reduced from 12 days to 8 hours 
• A 1400-node Yahoo cluster sorts 500GB in 59s. Cloudera launches 
• Yahoo spins remaining Hadoop folks out into Hortonworks 
• 3rd Hadoop World conference attracts 2300 developers, vs 275 the first time 
• Cloudera adds real-time search, based on Lucene, also created by Cutting 
• Apache Spark becomes the most contributed-to Hadoop-related project 
Timeline years shown: 2006, 2008, 2011, 2013, 2014
Big Data is much more than just storage 
• Real-time Analytics 
• Extreme Analytic Engines 
• Big Data Exploration, DW/ETL Pre-processing 
• Big Data Cache + BI Infrastructure 
Prepare for Big Data Business Demands 
• Real-Time Agility 
• Advanced Analytic Capability 
• Transformation and Exploration 
• Advanced Data Management
Popular “Big Data” Myths 
• You need to have Ga-zinga-bytes to deploy a Big Data solution 
– Typical Cloudera Cluster is 15-20 nodes, < 10TB of data 
– Hadoop storage is 3-400% cheaper than an EDW 
• Hadoop is all you need 
– Hadoop is an enabling technology that provides the foundation for 
Big Data solutions 
– Focus today is on data management 
• The RDBMS is dead 
– RDBMS is still critical – but not for high volume, low quality analytics 
• QlikView can’t handle Big Data 
– The reality is that a human can’t handle Big Data 
– It’s all about the use case 
– Direct Discovery is a unique approach
Gartner Top Big Data Challenges 
You need to determine 
your goals/objectives 
Qlik can help you with 
these challenges
Turn Big Data (lots of dots) Into Small Data (Insights) 
The Value in Big Data Comes from Context and Relevance 
More History 
They’re both the same number of bricks! 
The same volume of data, same schema. 
You choose what is relevant to your analysis. 
More Categories
               Hard Disk Drives (HDD)   Solid State Storage (SSD)   Random Access Memory (RAM) 
Speed (t/TB)   3300s                    1000-300s                   1s 
Price ($/TB)   $50                      $500                        $4500 
• Keep data in memory when the value obtained from processing it is high 
• Leave data on disk when it is inactive or the value from processing it is low 
Value 
Size 
The Big Data Value Chain
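The storage-tier figures above can be worked through numerically. The following is a minimal sketch, not part of the original deck: it uses the slide's per-terabyte numbers, assumes 300s for SSD (the fast end of the stated 1000-300s range), and the names `TIERS` and `tier_estimate` are hypothetical.

```python
# Per-terabyte figures from the slide; the SSD speed is an assumption
# (fast end of the stated 1000-300s range). All names are illustrative.
TIERS = {
    "HDD": {"seconds_per_tb": 3300, "dollars_per_tb": 50},
    "SSD": {"seconds_per_tb": 300, "dollars_per_tb": 500},
    "RAM": {"seconds_per_tb": 1, "dollars_per_tb": 4500},
}


def tier_estimate(tier: str, size_tb: float) -> dict:
    """Estimate processing time (seconds) and price (dollars) for size_tb of data."""
    figures = TIERS[tier]
    return {
        "read_seconds": figures["seconds_per_tb"] * size_tb,
        "price_dollars": figures["dollars_per_tb"] * size_tb,
    }


if __name__ == "__main__":
    # Compare the three tiers for a 10 TB working set.
    for name in TIERS:
        est = tier_estimate(name, size_tb=10)
        print(f"{name}: {est['read_seconds']:.0f} s, ${est['price_dollars']:,.0f}")
```

On these figures RAM costs 90 times as much per terabyte as HDD and is 3300 times faster, which is why only high-value, actively used data is worth keeping in memory.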
Fine, Big Data is here, 
but 
what are the Big Data Use Cases 
that matter to my Business?
Initially Hadoop Came About to Reduce Costs 
• How cheaply? 
– By one estimate, running a 75-node, 300TB Hadoop cluster costs 
$1.05 million over 3 years. 
– An RDBMS alone may cost 2.5x as much over the same time period. 
• This type of savings means companies can keep ‘more’ or all of 
their data. 
• Hadoop is for storage, not analytics 
– Data storage remains the most common use case for Hadoop 
• Example: 
– Expedia is moving from DB2 to Cloudera with expected savings 
of approximately $100 million per year.
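The cost estimate above can be turned into a per-terabyte rate. This is a rough sketch using only the slide's figures ($1.05 million, 300TB, 3 years, 2.5x multiplier); the function and variable names are hypothetical, and it assumes the full 300TB is usable capacity.

```python
# Figures quoted on the slide; everything else here is illustrative.
HADOOP_COST_USD = 1_050_000   # 75-node, 300TB cluster over 3 years
CAPACITY_TB = 300             # assumed to be fully usable capacity
YEARS = 3
RDBMS_MULTIPLIER = 2.5        # "may cost 2.5x for the same time period"


def cost_per_tb_year(total_cost: float, capacity_tb: float, years: float) -> float:
    """Spread a total cost evenly over capacity and time."""
    return total_cost / (capacity_tb * years)


hadoop_rate = cost_per_tb_year(HADOOP_COST_USD, CAPACITY_TB, YEARS)
rdbms_total = HADOOP_COST_USD * RDBMS_MULTIPLIER
rdbms_rate = cost_per_tb_year(rdbms_total, CAPACITY_TB, YEARS)

if __name__ == "__main__":
    print(f"Hadoop: ${hadoop_rate:,.2f} per TB per year")
    print(f"RDBMS:  ${rdbms_rate:,.2f} per TB per year (total ${rdbms_total:,.0f})")
```

On these assumptions Hadoop works out to roughly $1,167 per TB per year against roughly $2,917 for the RDBMS.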
But Big Data Technologies are Evolving Rapidly 
• 2010 – Download Apache Hadoop, cobble together surplus 
hardware, hire a couple of Java developers 
• 2012 – CDH 4 from Cloudera reduces deployment time from days 
to minutes 
• 2013 – AWS introduces Elastic Map Reduce (EMR) 
• 2014 – Google counters with Google Compute Engine (GCE) 
• Platform Vendors cover more than just Hadoop-like capabilities 
– Map-Reduce for large scale, batch processing 
– NoSQL for real-time, ad hoc query with operational performance 
– Spark/Solr/Impala for real-time analytics 
– R Integration for deep predictive/advanced analytics 
– All need a delivery agent (aka Visualization tool) to bring the 
benefit to the business
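For readers unfamiliar with the Map-Reduce batch model listed above, here is a single-process sketch of its three phases (map, shuffle, reduce) as a word count. It is a teaching illustration only: it does not use Hadoop or any cluster API, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "Big insights"]))
```

In a real cluster the map and reduce steps run in parallel across nodes and the shuffle moves data over the network; the structure of the computation is the same.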
Big Data Use Cases are About Finding Value 
• Internet (Expedia) 
– Search Index Generation 
– User Engagement Behavior 
– Targeting / Advertising 
Optimizations 
– Recommendations 
• BioMed (Carefusion) 
– Computational BioMedical 
Systems 
– Bioinformatics 
– Data Mining and Genome 
Analysis 
• Financial (Metlife / Wells Fargo) 
– Prediction Models 
– Fraud Analysis 
– Portfolio Risk Management 
• Telecom (BritTelecom/DeutscheTele) 
– Call data records 
– Set top & DVR streams 
• Social (Facebook) 
– Recommendations 
– Network Graphs 
– Feed Updates 
• Enterprise Operations 
– email and image processing 
– Robust ETL 
– Data Archival 
– Natural Language Processing 
• Media & Entertainment (DIRECTV) 
– Customer 360 
– Marketing Campaigns 
• Agriculture (ADM) 
– Process “agri” stream 
– Mineral Management 
• Image (Corbis) 
– Geo-Spatial processing 
• Education (State of …) 
– Systems Research 
– Statistical analysis of the web
Big Data Ecosystem is Much More Than Just Hadoop 
Data Visualization, Statistical & In-memory Analytics 
Open source Distributed Processing Frameworks 
Big Data Analytic Appliances 
Massively Parallel Processing Platforms 
Big data Integration 
Packaged Mapreduce platforms 
Big Insights & 
Streams 
Big Data 
Appliance 
HANA 
splunk >
Qlik Brings Big Data 
to the 
Business User
Insight Comes from Big Data, in Context 
NoSQL 
Databases 
SAP HANA 
Google 
BigQuery 
Batch 
Real-time 
Hadoop 
Advanced 
Analytics 
Platform 
Vendors
Leveraging QlikView for Big Data Discovery 
Define Your Use Case 
• A Hybrid approach that 
– Provides any and all business stakeholders with a simple but 
powerful environment for exploring data, without 
– Limiting or filtering what data is available for analysis 
when 
• Follow the Value 
– Start with simple questions: 
• What data do we already have that we are not making 
good use of today? 
– Let your business decide where the exploration goes 
• The technologies are cost effective, flexible and designed 
for a business-first methodology
QlikView Direct Discovery 
• Combines the associative capabilities of the QlikView in-memory 
dataset with a query model where: 
– The aggregated query result is passed back to a QlikView object 
without being loaded into the QlikView data model 
– The result set is still part of the associative experience 
– Capability to Drill to Detail records 
QlikView In-Memory Data Model 
QlikView Application 
Direct Discovery 
Batch Load
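The split between batch-loaded data and direct queries described above can be sketched conceptually. The example below is not QlikView code and does not use any Qlik API: it uses Python's built-in `sqlite3` as a stand-in for a large external source, and the schema and function names are hypothetical.

```python
import sqlite3


def build_source() -> sqlite3.Connection:
    """Stand-in for a large external source (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("East", "A", 10.0), ("East", "B", 5.0),
         ("West", "A", 7.5), ("West", "B", 2.5)],
    )
    return conn


def batch_load_dimension(conn: sqlite3.Connection) -> list:
    """Batch load: bring only the small dimension values into memory."""
    rows = conn.execute("SELECT DISTINCT region FROM sales ORDER BY region")
    return [row[0] for row in rows]


def direct_aggregate(conn: sqlite3.Connection, selected_regions: list) -> dict:
    """Direct query: the source computes the aggregate; only totals come back."""
    placeholders = ",".join("?" for _ in selected_regions)
    sql = (
        "SELECT region, SUM(amount) FROM sales "
        f"WHERE region IN ({placeholders}) GROUP BY region ORDER BY region"
    )
    return dict(conn.execute(sql, selected_regions).fetchall())


def drill_to_detail(conn: sqlite3.Connection, region: str) -> list:
    """Drill to detail: fetch the underlying records only when asked."""
    sql = "SELECT product, amount FROM sales WHERE region = ? ORDER BY product"
    return conn.execute(sql, (region,)).fetchall()


if __name__ == "__main__":
    source = build_source()
    print(batch_load_dimension(source))
    print(direct_aggregate(source, ["East"]))
    print(drill_to_detail(source, "West"))
```

The design point is that the detail rows never leave the source unless the user drills down; a selection made on the in-memory dimension only changes the filter of the aggregate query.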
Complement Hadoop and EDW co-existence 
Data 
Warehouse 
Aggregates 
Direct Discovery 
Broad Application to 
discover new trends 
Deep Application to 
confirm and take action 
Move highly valuable data 
to EDW for more broad accessibility 
Point QlikView to new source
Big Data Business Needs 
DATA: Clinical, Claims, Monitoring, others 
Descriptive Analytics 
• How are we doing? 
• How many claims did we pay today? 
Predictive Analytics 
• What might happen in the future? 
• Which of tomorrow’s claims might be requesting an Emergency Room (ER) admission? 
Prescriptive Analytics 
• Best course of action given objectives, requirements & constraints 
• What would be effective steps to reduce probability of ER admission? 
QlikView is a leader in Descriptive analytics but barely plays in Predictive 
and Prescriptive. Radically different algorithmic and 
visualization concepts are needed to play in that arena
King.com: Big Data in Action 
• 1.6B rows of data per day in Hadoop 
– 211M rows per day extracted for analysis in QlikView 
• Customer browsing activity: 
– Player Interactions within each game 
– Many additional metrics 
• Results: Marketing ROI of campaigns achieved for the first 
time (# of players, # of games played, time played, etc.)
Thank You

More Related Content

What's hot

Informatica Becomes Part of the Business Data Lake Ecosystem
Informatica Becomes Part of the Business Data Lake EcosystemInformatica Becomes Part of the Business Data Lake Ecosystem
Informatica Becomes Part of the Business Data Lake Ecosystem
Capgemini
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with Alation
Databricks
 
Eric Andersen Keynote
Eric Andersen KeynoteEric Andersen Keynote
Eric Andersen Keynote
Data Con LA
 
Optimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analyticsOptimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analytics
Cloudera, Inc.
 
The Big Picture: Real-time Data is Defining Intelligent Offers
The Big Picture: Real-time Data is Defining Intelligent OffersThe Big Picture: Real-time Data is Defining Intelligent Offers
The Big Picture: Real-time Data is Defining Intelligent Offers
Cloudera, Inc.
 
Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...
Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...
Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...
Vasu S
 
How can Insurers Accelerate Digital Transformation with Data Virtualization (...
How can Insurers Accelerate Digital Transformation with Data Virtualization (...How can Insurers Accelerate Digital Transformation with Data Virtualization (...
How can Insurers Accelerate Digital Transformation with Data Virtualization (...
Denodo
 
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera, Inc.
 
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Pentaho
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Tristan Baker
 
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
GetInData
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
Julian Tong
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
Cloudera, Inc.
 
Capgemini Insights and Data
Capgemini Insights and Data Capgemini Insights and Data
Capgemini Insights and Data
DataWorks Summit/Hadoop Summit
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
Denodo
 
Why Your Customers Want a Cognitive Call Center
Why Your Customers Want a Cognitive Call CenterWhy Your Customers Want a Cognitive Call Center
Why Your Customers Want a Cognitive Call Center
Perficient, Inc.
 
Slides: Relational to NoSQL Migration
Slides: Relational to NoSQL MigrationSlides: Relational to NoSQL Migration
Slides: Relational to NoSQL Migration
DATAVERSITY
 
Pentaho Healthcare Solutions
Pentaho Healthcare SolutionsPentaho Healthcare Solutions
Pentaho Healthcare Solutions
Pentaho
 
Customer Experience: A Catalyst for Digital Transformation
Customer Experience: A Catalyst for Digital TransformationCustomer Experience: A Catalyst for Digital Transformation
Customer Experience: A Catalyst for Digital Transformation
Cloudera, Inc.
 
Solution Centric Architectural Presentation - A Journey from Data Paralysis t...
Solution Centric Architectural Presentation - A Journey from Data Paralysis t...Solution Centric Architectural Presentation - A Journey from Data Paralysis t...
Solution Centric Architectural Presentation - A Journey from Data Paralysis t...
Denodo
 

What's hot (20)

Informatica Becomes Part of the Business Data Lake Ecosystem
Informatica Becomes Part of the Business Data Lake EcosystemInformatica Becomes Part of the Business Data Lake Ecosystem
Informatica Becomes Part of the Business Data Lake Ecosystem
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with Alation
 
Eric Andersen Keynote
Eric Andersen KeynoteEric Andersen Keynote
Eric Andersen Keynote
 
Optimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analyticsOptimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analytics
 
The Big Picture: Real-time Data is Defining Intelligent Offers
The Big Picture: Real-time Data is Defining Intelligent OffersThe Big Picture: Real-time Data is Defining Intelligent Offers
The Big Picture: Real-time Data is Defining Intelligent Offers
 
Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...
Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...
Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...
 
How can Insurers Accelerate Digital Transformation with Data Virtualization (...
How can Insurers Accelerate Digital Transformation with Data Virtualization (...How can Insurers Accelerate Digital Transformation with Data Virtualization (...
How can Insurers Accelerate Digital Transformation with Data Virtualization (...
 
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learning
 
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
 
Capgemini Insights and Data
Capgemini Insights and Data Capgemini Insights and Data
Capgemini Insights and Data
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
Why Your Customers Want a Cognitive Call Center
Why Your Customers Want a Cognitive Call CenterWhy Your Customers Want a Cognitive Call Center
Why Your Customers Want a Cognitive Call Center
 
Slides: Relational to NoSQL Migration
Slides: Relational to NoSQL MigrationSlides: Relational to NoSQL Migration
Slides: Relational to NoSQL Migration
 
Pentaho Healthcare Solutions
Pentaho Healthcare SolutionsPentaho Healthcare Solutions
Pentaho Healthcare Solutions
 
Customer Experience: A Catalyst for Digital Transformation
Customer Experience: A Catalyst for Digital TransformationCustomer Experience: A Catalyst for Digital Transformation
Customer Experience: A Catalyst for Digital Transformation
 
Solution Centric Architectural Presentation - A Journey from Data Paralysis t...
Solution Centric Architectural Presentation - A Journey from Data Paralysis t...Solution Centric Architectural Presentation - A Journey from Data Paralysis t...
Solution Centric Architectural Presentation - A Journey from Data Paralysis t...
 

Similar to Big data4businessusers

Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-Koenig
Manish Chopra
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Roi Blanco
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
Mithlesh Sadh
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
RojaT4
 
SKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSISSKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSIS
Skillwise Consulting
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
Nagarjuna D.N
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
almaraniabwmalk
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
Tomy Rhymond
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
kcmallu
 
Big Data
Big DataBig Data
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
Dr.K.Sreenivas Rao
 
Big data business case
Big data   business caseBig data   business case
Big data business case
Karthik Padmanabhan ( MLE℠)
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
itnewsafrica
 
Content1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxContent1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docx
dickonsondorris
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
Md. Salman Ahmed
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
Denodo
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data Management
Tony Bain
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
Caserta
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
6535ANURAGANURAG
 

Similar to Big data4businessusers (20)

Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-Koenig
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
SKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSISSKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSIS
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
Big Data
Big DataBig Data
Big Data
 
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
 
Big data business case
Big data   business caseBig data   business case
Big data business case
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Content1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxContent1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docx
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data Management
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 

Recently uploaded

Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
Edge AI and Vision Alliance
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 


Big data4businessusers

  • 1. Qlik Sense and Big Data Making Big Data Relevant for the Business User Bob Hardaway – Solution Architect 2 October, 2014
  • 2. And now they coming, yeah, now they coming Out from the shadows To take me to the club because they know That I shut this down, 'cause they been watching all my windows They gathered up the wall and listening You understand, they got a plan for us I bet you didn't know that I was dangerous
  • 3. Intelligence Community Comprehensive National Cyber Security Initiative Data Center (ICCNCSIDC) Capable of processing all forms of communication, including the complete contents of private emails, cell phone calls, and Internet searches, as well as all types of personal data trails—parking receipts, travel itineraries, bookstore purchases, and other digital 'pocket litter'.
  • 4. Big Data comes with big challenges The Big Data bottleneck Reports Data Scientists Business Users Big Data “ many organizations lack the skills required to exploit big data ” “ most of these skills are in short supply and rare in the market at large ” “ data science encompasses hard skills ” Source: Gartner Big Data Hype Cycle Report 2013
  • 5. Qlik relieves the Big Data bottleneck The Big Data bottleneck Data Scientists Reports Analytics & Discovery Big Data Business Users QlikView’s user-centric Business Discovery approach gives decision-makers access to the benefits of Big Data
  • 6. What is Big Data?
  • 7. Big Data happens in every part of History Paper Print Computer Internet • Medium to write ideas and information • Not enough writers to disseminate • Technology to distribute information • No place to store • Place to store • Can’t keep up with computing requirements • Distributed computing globally • Too many Emails to read We always create more than we can consume!
  • 8. The Internet of Things (IoT) • Cisco estimates 50B connected devices by 2020 • Intel says 15B by 2015 • Uber adds 70000 drivers per week • AirBnB had 42M bookings last year • ZipCar lets you reserve a parking space anywhere The Physical Web – Google project to de-App devices “People should be able to walk up to any smart device – a vending machine, a poster, a toy, a bus stop, a rental car – and not have to download an app first,” – Scott Jenson
  • 9. Quantifying Big Data Bigness is the least important thing … it’s the insights that can be gained from interactions vs. transactions … the customer experience vs. the value of what was purchased - Stephen Brobst, CTO Teradata Real time streaming data High volumes in Low latency Complexity in processing, analysis and deriving insights 12TB/day across 80 servers 32 billion rows per day Very large data sets Order of 100s of TB to PBs Structured & Unstructured Data, living together (OLTP, DW, data marts) text, audio, video, click streams, log files, etc 75TB compressed data processing/day 7500+ analytical jobs per day 15TB per day @ 1:7 compression ratio 4 PB storage Images - Flat file - DNA 4TB of TIFF to 11mn PDF files Using Hadoop in < 24hours
  • 10. A Less Alliterative Definition • Big Data is about analyzing ALL your data, ALL the time – Traditional BI systems operate on assumptions, and limited data sets that preclude true discovery and insight – The Same question gets asked over and over • The cost of analysis has always been the limiting factor for Business Intelligence – Solutions have to be justified before they are deployed • Big Data is about storing everything, cheaply and letting the User look for value • Big Data is about driving the business based on Data • Big Data doesn’t solve every problem, but it does put the User in charge of the process
  • 11. Hadoop – A Brief History Cutting joins Yahoo, estimates a billion-page index will cost $500k and $30k/month to support A 1400-node Yahoo cluster sorts 500GB in 59s. Cloudera launches Google releases a paper on GFS, based on a distributed search platform called Nutch Hadoop promoted to top level Apache project, predictive search index creation time reduced from 12 days to 8 hours Yahoo spins remaining Hadoop folks out into Hortonworks Apache Spark becomes the most contributed to Hadoop related project 3rd Hadoop World conf attracts 2300 developers, vs 275 the first time Cloudera adds real-time search, based on Lucene, also created by Cutting 2006 2008 2011 2013 2014
  • 12. Real-time Analytics Big Data is much more than just storage Extreme Analytic Engines Big Data Exploration, DW/ETL Pre-processing Big Data Cache + BI Infrastructure Prepare for Big Data Business Demands Real-Time Agility Advanced Analytic Capability Transformation and Exploration Advanced Data Management 1 4 3 2 1
  • 13. Popular “Big Data” Myths • You need to have Ga-zinga-bytes to deploy a Big Data solution – Typical Cloudera Cluster is 15-20 nodes, < 10TB of data – Hadoop storage is 3-400% cheaper than an EDW • Hadoop is all you need – Hadoop is an enabling technology that provides the foundation for Big Data solutions – Focus today is on data management • The RDBMS is dead – RDBMS is still critical – but not for high volume, low quality analytics • QlikView can’t handle Big Data – Reality is a Human can’t handle Big Data – It’s all about the use case – Direct Discovery is a unique approach
  • 14. Gartner Top Big Data Challenges You need to determine your goals/objectives Qlik can help you with these challenges
  • 15. Turn Big Data (lots of dots) Into Small Data (Insights) The Value in Big Data Comes from Context and Relevance More History They’re both the same number of bricks! The same volume of data, same schema. You choose what is relevant to your analysis. More Categories
  • 16. Hard Disk Drives (HDD) Solid State Storage (SSD) Random Access Memory (RAM) Speed (t/TB) 3300s 1000-300s 1s Price $/TB $ 50 $ 500 $ 4500 • Keep data in memory when the value obtained from processing it is high • Leave data on disk when it is inactive or the value from processing it is low Value Size The Big Data Value Chain
  • 17. Fine, Big Data is here, but what are the Big Data Use Cases that matter to my Business?
  • 18. Initially Hadoop Came About to Reduce Costs • How cheaply? – By one estimate, running a 75-node, 300TB Hadoop cluster costs $1.05 Million over 3 years. – The same storage in an RDBMS may cost 2.5x as much over the same time period. • This type of savings means companies can keep ‘more’ or all of their data. • Hadoop is for storage, not analytics – Data storage remains the most common use case for Hadoop • Example: – Expedia is moving from DB2 to Cloudera with expected savings of approximately $100 million per year.
  • 19. But Big Data Technologies are Evolving Rapidly • 2010 – Download Apache Hadoop, cobble together surplus hardware, hire a couple of Java developers • 2012 – CDH 4 from Cloudera reduces deployment time from days to minutes • 2013 – AWS introduces Elastic MapReduce (EMR) • 2014 – Google counters with Google Compute Engine (GCE) • Platform Vendors cover more than just Hadoop-like capabilities – MapReduce for large scale, batch processing – NoSQL for real-time, ad hoc query with operational performance – Spark/Solr/Impala for real-time analytics – R Integration for deep predictive/advanced analytics – All need a delivery agent (aka Visualization tool) to bring the benefit to the business
  • 20. Big Data Use Cases are About Finding Value • Internet (Expedia) – Search Index Generation – User Engagement Behavior – Targeting / Advertising Optimizations – Recommendations • BioMed (Carefusion) – Computational BioMedical Systems – Bioinformatics – Data Mining and Genome Analysis • Financial (Metlife / Wells Fargo) – Prediction Models – Fraud Analysis – Portfolio Risk Management • Telecom (BritTelecom/DeutscheTele) – Call data records – Set top & DVR streams • Social (Facebook) – Recommendations – Network Graphs – Feed Updates • Enterprise Operations – email and image processing – Robust ETL – Data Archival – Natural Language Processing • Media & Entertainment (DIRECTV) – Customer 360 – Marketing Campaigns • Agriculture (ADM) – Process “agri” stream – Mineral Management • Image (Corbis) – Geo-Spatial processing • Education (State of …) – Systems Research – Statistical analysis of the web
  • 21. Big Data Ecosystem is Much More Than Just Hadoop Data Visualization, Statistical & In-memory Analytics Open source Distributed Processing Frameworks Big Data Analytic Appliances Massively Parallel Processing Platforms Big data Integration Packaged Mapreduce platforms Big Insights & Streams Big Data Appliance HANA splunk >
  • 22. Qlik Brings Big Data to the Business User
  • 23. Insight Comes from Big Data, in Context NoSQL Databases SAP HANA Google BigQuery Batch Real-time Hadoop Advanced Analytics Platform Vendors
  • 24. Leveraging QlikView for Big Data Discovery Define Your Use Case • A Hybrid approach that – Provides any/all business stakeholders with a simple but powerful environment for exploring data, without – Limiting or filtering what data is available for analysis when • Follow the Value – Start with simple questions: • What data do we already have that we are not making good use of today? – Let your business decide where the exploration goes • The technologies are cost effective, flexible and designed for a business-first methodology
  • 25. QlikView Direct Discovery • Combines the associative capabilities of the QlikView in-memory dataset with a query model where: – The aggregated query result is passed back to a QlikView object without being loaded into the QlikView data model – The result set is still part of the associative experience – Capability to Drill to Detail records QlikView In-Memory Data Model QlikView Application Direct Discovery Batch Load
  • 26. Complement Hadoop and EDW co-existence Data Warehouse Aggregates Direct Discovery Broad Application to discover new trends Deep Application to confirm and take action Move highly valuable data to EDW for more broad accessibility Point QlikView to new source
  • 27. Big Data Business Needs Descriptive Analytics Predictive Analytics DATA Clinical, Claims, Monitoring, others How are we doing? What might happen in the future? Prescriptive Analytics Best course of action given objectives, requirements & constraints How many claims did we pay today? Which of tomorrow’s claims might be requesting an Emergency Room (ER) admission? What would be effective steps to reduce probability of ER admission? QlikView is a leader in Descriptive but barely plays in Predictive and Prescriptive. Radically different algorithmic and visualization concepts are needed to play in that arena
  • 28. King.com: Big Data in Action • 1.6B rows of data per day in Hadoop – 211M rows per day extracted for analysis in QlikView • Customer browsing activity: – Player Interactions within each game – Many additional metrics • Results: Marketing ROI of campaigns achieved for the first time (# of players, # of games played, time played, etc.)
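
The cost figures on slide 18 and the King.com volumes on slide 28 can be sanity-checked with a little arithmetic. The sketch below is only illustrative: the variable names are assumptions introduced here, and the inputs are the numbers quoted on those two slides, taken at face value.

```python
# Figures quoted on slide 18 (one estimate, not independently verified).
HADOOP_COST_USD = 1_050_000   # 75-node, 300TB cluster over 3 years
CLUSTER_TB = 300
YEARS = 3
RDBMS_MULTIPLIER = 2.5        # slide 18: an RDBMS "may cost 2.5x"

# Cost per terabyte over the full period, and per terabyte per year.
hadoop_per_tb = HADOOP_COST_USD / CLUSTER_TB
hadoop_per_tb_year = hadoop_per_tb / YEARS

# Implied RDBMS cost for the same capacity and period.
rdbms_total = HADOOP_COST_USD * RDBMS_MULTIPLIER

# Figures quoted on slide 28 (King.com).
KING_ROWS_STORED = 1_600_000_000     # rows per day landing in Hadoop
KING_ROWS_EXTRACTED = 211_000_000    # rows per day extracted into QlikView

# Share of the daily data that is actually pulled in for analysis.
extract_share = KING_ROWS_EXTRACTED / KING_ROWS_STORED

print(f"Hadoop: ${hadoop_per_tb:,.0f} per TB over {YEARS} years")
print(f"Hadoop: ${hadoop_per_tb_year:,.0f} per TB per year")
print(f"RDBMS (implied): ${rdbms_total:,.0f} over {YEARS} years")
print(f"King.com extract share: {extract_share:.1%}")
```

On these numbers, roughly one row in eight stored in Hadoop reaches the in-memory analysis layer, which fits the deck's point that cheap storage holds everything while the user chooses what is relevant.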

Editor's Notes

  1. The Bloor Group writes in “Why In-Memory Technology will dominate Big Data” (from the Kognitio download site, http://www.kognitio.com/information-center/reports/): If the goal is to accelerate BI activities dramatically, the natural approach is to have an in-memory processing resource that can be used where it makes a difference, flowing the data from disk through SSD to memory in order to support those BI workloads. In other words, data is kept in memory when the value obtained from processing it is high, and data stays on disk when it is inactive or the value from processing it is low.
  2. Readwrite.com/2013/05/29/the-real-reason-hadoop-is-such-a-big-deal-in-big-data#awesm=-ov83pYC1hKZ58O Rainstor.com/compression-tames-big-data-on-hadoop
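
The rule in note 1 (keep high-value data in memory, leave low-value data on disk) can be made concrete with the speed and price figures from slide 16. This is a minimal sketch, not anything from the deck: the `Tier` class, the `tier_summary` and `pick_tier` helpers, and the value-score thresholds are all hypothetical, and the SSD speed uses the slower end of the 1000-300 s/TB range the slide gives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    seconds_per_tb: float  # processing time per TB, from slide 16
    dollars_per_tb: float  # price per TB, from slide 16


# Slide 16 figures; SSD is quoted as a range, the slower bound is used here.
TIERS = [
    Tier("HDD", 3300, 50),
    Tier("SSD", 1000, 500),
    Tier("RAM", 1, 4500),
]


def tier_summary(size_tb):
    """Return {tier name: (seconds to process, dollars to store)} for size_tb."""
    return {
        t.name: (size_tb * t.seconds_per_tb, size_tb * t.dollars_per_tb)
        for t in TIERS
    }


def pick_tier(value_score, hot=0.8, warm=0.4):
    """Map a 0..1 value score to a tier. The thresholds are assumptions."""
    if value_score >= hot:
        return "RAM"
    if value_score >= warm:
        return "SSD"
    return "HDD"


for name, (seconds, dollars) in tier_summary(10).items():
    print(f"{name}: {seconds:,.0f} s to process 10 TB, ${dollars:,.0f} to store")
```

For a 10 TB data set, the slide's figures make RAM 90 times more expensive than disk to hold but 3300 times faster to process, which is the trade-off behind the value chain argument.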