How to integrate
Qlik with Cloudera
November 10, 2018
Luciano Assad
Solution Architect – Pre-Sales
Agenda
1 Big Data Landscape
2 Hadoop Ecosystem
3 Cloudera
4 15 Points of Integration with Cloudera
5 Qlik Big Data Methodologies
http://mattturck.com/bigdata2017/
Challenge
Pokemon or Big Data?
https://pixelastic.github.io/pokemonorbigdata/
- Pokemon or Big Data
What is Apache Hadoop?
• Hadoop is a software framework for storing, processing, and analyzing “big data”
- Open source
- Distributed
- Scalable
- Fault-tolerant
• Hadoop - Blocks diagram:
- HDFS: a file system to manage the storage of data
- MapReduce: a framework to define a data processing task
- YARN: a framework to run the data processing task
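To make the "define a data processing task" idea concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases as a word count. It is plain Python, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and on a real cluster YARN would schedule these phases across nodes that read their input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as happens between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases into one job."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data big insights", "data lake"]))
```

The split into three small functions mirrors the division of labor on the slide: the job author writes only the map and reduce logic, while grouping and scheduling are left to the framework.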
Large Ecosystem
• In-Memory, Data Flow Engine
• Analytical SQL-on-Hadoop
• NoSQL Database
• Machine Learning
• Search
• Scripting
• Integration & Streaming
• Management & Coordination
• Resource Management
• Storage
Reference: https://www.dotnettricks.com/learn/hadoop/apache-hadoop-ecosystem-and-components
Most used for BI
CDH
Cloudera’s Distribution including Apache Hadoop - Components
• Hadoop Core Components
- Hadoop Distributed File System
- YARN and MapReduce
• Hadoop Ecosystem
- Spark
- HBase
- Flume
- Sqoop
- Hive
- Impala
- Solr
- ...
CDH - Important Components
Component Definition
Project         What does it do?
Spark           In-memory execution framework
HBase           NoSQL database built on HDFS
Hive            SQL processing engine designed for batch workloads
Impala          SQL query engine designed for BI workloads
Parquet         Very efficient columnar data storage format
Sqoop           Data movement to/from RDBMSs
Flume, Kafka    Streaming data ingestion
Solr            Enables users to find the data they need
Hue             Web-based user interface for Hadoop
Oozie           Workflow scheduler used to manage jobs
Sentry          Authorization tool, providing security for Hadoop
How do I create a Lab Environment?
Cloudera QuickStart
https://goo.gl/zwwDRg
Recommended requirements:
4 cores - 12 GB RAM
Qlik + Cloudera
15 Points of Integration with Cloudera
Go Beyond SQL
Fast & Flexible
BI & Analytics
Enterprise Ready
• Data Lake Browser: Cloudera Data Explorer
• Writeback with Kudu: Interactive analytics
• IoT and Kafka Integration: Event driven / Streaming analytics
• App on Demand w/ Impala: In memory user generated slices
• Direct Query w/ Impala: Data stored in Parquet or Kudu
• Complex Data Types with Impala: Maps, arrays, and structures
• Data Science Workbench: Powered by Qlik Associative Engine
• Advanced Analytics: Integration with Spark/Python/R
• Solr Integration: In-memory apps built on Solr Data
• Qlik Solr-API App on Demand: Search + QAP + D3js
• Cloudera Altus: Analytic DB Integration
• Cloudera Metadata Miner: Impala, Cloudera Manager, Navigator
• SAP Offload with Attunity: SAP S&D Module into HDFS/Impala
• Security – SSO Support: Kerberos delegation/SSO pass-thru
• Cloudera Metrics Dashboard: REST API based management console for Cloudera Manager
DEMO
Cloudera Data Lake Explorer
&
Cloudera Metadata Miner
Qlik + Cloudera
Cloudera Data Lake Explorer
https://goo.gl/g7PywC
Qlik + Cloudera
Cloudera Metadata Catalog
https://goo.gl/g7PywC
Qlik + Cloudera
Where to find more information?
http://cloudera.qlik.com/
Qlik Big Data Methodologies
Different data volumes and complexities are best met using different methods
• Different methods ensure an optimized experience for the user for every situation
• Methods can be combined to meet different use cases
• Methods vary in deployment complexity
Data Volume
• Size (rows)
• Dimensions (columns)
• Cardinality (uniqueness)
App Complexity
• Computational complexity such as set analysis
• Object density
Methods
• Segmentation
• Chaining
• In-Memory
• On-Demand App Generation
• On-Demand App Generation (APIs)
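The three data-volume measures named on this slide (size, dimensions, cardinality) can be estimated from a sample before choosing a method. The sketch below is a hypothetical helper, not part of any Qlik or Cloudera tool: `profile_volume` and its output keys are assumed names, and it expects rows as dictionaries with hashable values.

```python
from typing import Any, Dict, List


def profile_volume(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return row count, column count, and distinct values per column."""
    # Collect every column name that appears in at least one row.
    columns = sorted({name for row in rows for name in row})
    # Cardinality = number of distinct values; a missing key counts as None.
    cardinality = {
        name: len({row.get(name) for row in rows}) for name in columns
    }
    return {
        "rows": len(rows),
        "columns": len(columns),
        "cardinality": cardinality,
    }


if __name__ == "__main__":
    sample = [
        {"region": "BR", "product": "A"},
        {"region": "BR", "product": "B"},
        {"region": "US", "product": "A"},
    ]
    print(profile_volume(sample))
```

High-cardinality columns are usually the ones worth watching, since uniqueness rather than raw row count tends to drive the in-memory footprint.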
On Demand App Generation
1. User views summary data in Selection App and selects a slice of data
2. User requests the Analysis App to be built
3. Source data is extracted and Analysis App is created
4. Repeat steps 1-3 as many times as needed
[Diagram: Big Data Repository, Selection App (summary data, step 1), request (step 2), Analysis App (step 3)]
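The steps above can be sketched as a small simulation. This is plain Python under stated assumptions, not Qlik's implementation: the in-memory `REPOSITORY` stands in for a table queried through an engine such as Impala, and `build_summary`, `generate_analysis_app`, and the `max_rows` guard are hypothetical names introduced for illustration.

```python
from collections import Counter
from typing import Dict, List, Set

# Stand-in for the big data repository (assumption: a tiny in-memory table).
REPOSITORY: List[Dict[str, object]] = [
    {"region": "South", "year": 2017, "sales": 120},
    {"region": "South", "year": 2018, "sales": 150},
    {"region": "North", "year": 2018, "sales": 90},
]


def build_summary(repository: List[Dict[str, object]], dimension: str) -> Dict[object, int]:
    """Step 1: aggregated row counts that the selection app would display."""
    return dict(Counter(row[dimension] for row in repository))


def generate_analysis_app(
    repository: List[Dict[str, object]],
    selections: Dict[str, Set[object]],
    max_rows: int = 1000,
) -> List[Dict[str, object]]:
    """Steps 2-3: extract only the detail rows matching the selected slice."""
    rows = [
        row
        for row in repository
        if all(row[field] in values for field, values in selections.items())
    ]
    # Hypothetical guard: refuse slices too large to load in memory.
    if len(rows) > max_rows:
        raise ValueError("slice too large; refine the selection first")
    return rows


if __name__ == "__main__":
    print(build_summary(REPOSITORY, "region"))
    print(generate_analysis_app(REPOSITORY, {"region": {"South"}, "year": {2018}}))
```

Step 4 is simply calling `generate_analysis_app` again with a different set of selections.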
ODAG – Selection App
Aggregated Data Dictionary
ODAG – Analysis App
1. Binding of Selections
2. Where-Statement Generation
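The idea behind these two steps, turning the values selected in the selection app into a filter on the source query, can be illustrated as follows. This is a Python sketch of the concept, not Qlik load-script syntax; `quote` and `build_where` are assumed names, and field names are taken as trusted identifiers rather than validated.

```python
from typing import Dict, Sequence, Union

Value = Union[str, int, float]


def quote(value: Value) -> str:
    """Render a SQL literal; single quotes inside strings are doubled."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_where(selections: Dict[str, Sequence[Value]]) -> str:
    """Turn {field: selected values} into a WHERE clause of IN lists."""
    clauses = []
    for field, values in selections.items():
        if not values:
            continue  # nothing selected in this field: no constraint
        rendered = ", ".join(quote(v) for v in values)
        clauses.append(f"{field} IN ({rendered})")
    return ("WHERE " + " AND ".join(clauses)) if clauses else ""


if __name__ == "__main__":
    print(build_where({"region": ["South"], "year": [2017, 2018]}))
```

Fields with no selection are skipped so that an unconstrained dimension does not produce an empty `IN ()` list, which most SQL engines reject.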
DEMO
On Demand App Generation
On Demand App Generation
Too hard? Ask a wizard to help you!
https://goo.gl/dNkdB7
DEMO
ODAG Wizard
Will you share this presentation?
https://goo.gl/ZMcGP9
Thank you!
Luciano Assad
Solution Architect – Pre-Sales Brazil
luciano.assad@qlik.com

QMeeting 2018 - How to integrate Qlik and Cloudera
