QMeeting 2018 - How to integrate Qlik and Cloudera

How to integrate
Qlik with Cloudera
November 10, 2018
Luciano Assad
Solution Architect – Pre-Sales

2
Agenda
1 Big Data Landscape
2 Hadoop Ecosystem
3 Cloudera
4 15 Points of Integration with Cloudera
5 Qlik Big Data Methodologies

4
http://mattturck.com/bigdata2017/

6
Challenge: Pokemon or Big Data?

13
https://pixelastic.github.io/pokemonorbigdata/
- Pokemon or Big Data

15
What is Apache Hadoop?
• Hadoop is a software framework for storing, processing, and analyzing “big data”
- Open source
- Distributed
- Scalable
- Fault-tolerant
• Hadoop - Blocks diagram:
- HDFS: A file system to manage the storage of data
- MapReduce: A framework to define a data processing task
- YARN: A framework to run the data processing task
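To make the "define a data processing task" block more concrete, here is a minimal single-process sketch of the MapReduce programming model in plain Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions made for illustration only. On a real cluster, HDFS would hold the input and YARN would run the map and reduce tasks across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input; in Hadoop these lines would be blocks of an HDFS file.
    print(word_count(["big data big cluster", "data lake"]))
```

The point of the split is that the map and reduce functions only describe the task; where and how they run is left to the execution layer.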

16
Large Ecosystem
• In-Memory Data Flow Engine
• Analytical SQL-on-Hadoop
• NoSQL Database
• Machine Learning
• Search
• Scripting
• Integration & Streaming
• Management & Coordination
• Resource Management
• Storage
Reference: https://www.dotnettricks.com/learn/hadoop/apache-hadoop-ecosystem-and-components

17
Large Ecosystem
Most used for BI

19
CDH
Cloudera’s Distribution including Apache Hadoop - Components
Hadoop Core Components:
• Hadoop Distributed File System
• YARN and MapReduce
Hadoop Ecosystem:
• Spark
• HBase
• Flume
• Sqoop
• Hive
• Impala
• Solr
• ...

20
CDH - Important Components
Component Definition
Project        What does it do?
Spark          In-memory execution framework
HBase          NoSQL database built on HDFS
Hive           SQL processing engine designed for batch workloads
Impala         SQL query engine designed for BI workloads
Parquet        Very efficient columnar data storage format
Sqoop          Data movement to/from RDBMSs
Flume, Kafka   Streaming data ingestion
Solr           Enables users to find the data they need
Hue            Web-based user interface for Hadoop
Oozie          Workflow scheduler used to manage jobs
Sentry         Authorization tool, providing security for Hadoop

21
How do I create a Lab Environment?
Cloudera QuickStart
https://goo.gl/zwwDRg
Recommended requirements:
4 cores - 12 GB RAM
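Before building the lab, it can help to check the machine against the recommended requirements. The sketch below is only an illustration: the function names are hypothetical, the thresholds are the ones from the slide, and the RAM detection relies on `os.sysconf`, which is assumed to be available (it typically is on Linux, and the code falls back to `None` where it is not).

```python
import os

MIN_CORES = 4     # recommended cores, from the slide
MIN_RAM_GB = 12   # recommended RAM in GB, from the slide


def meets_lab_requirements(cores, ram_gb, min_cores=MIN_CORES, min_ram_gb=MIN_RAM_GB):
    """Return True when both the core count and the RAM reach the minimums."""
    return cores >= min_cores and ram_gb >= min_ram_gb


def detect_host():
    """Best-effort detection of (cores, ram_gb); ram_gb is None if unknown."""
    cores = os.cpu_count() or 0
    try:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024 ** 3
    except (AttributeError, ValueError, OSError):
        ram_gb = None  # os.sysconf or these names are not available here
    return cores, ram_gb


if __name__ == "__main__":
    cores, ram_gb = detect_host()
    if ram_gb is None:
        print(f"{cores} cores detected; RAM could not be determined")
    else:
        print("OK" if meets_lab_requirements(cores, ram_gb) else "Below recommendation")
```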

24
Qlik + Cloudera
15 Points of Integration with Cloudera
Go Beyond SQL; Fast & Flexible BI & Analytics; Enterprise Ready
1. Data Lake Browser: Cloudera Data Explorer
2. Writeback with Kudu: Interactive analytics
3. IoT and Kafka Integration: Event driven / Streaming analytics
4. App on Demand w/ Impala: In memory user generated slices
5. Direct Query w/ Impala: Data stored in Parquet or Kudu
6. Complex Data Types with Impala: Maps, arrays, and structures
7. Data Science Workbench: Powered by Qlik Associative Engine
8. Advanced Analytics: Integration with Spark/Python/R
9. Solr Integration: In-memory apps built on Solr Data
10. Qlik Solr-API App on Demand: Search + QAP + D3js
11. Cloudera Altus: Analytic DB Integration
12. Cloudera Metadata Miner: Impala, Cloudera Manager, Navigator
13. SAP Offload with Attunity: SAP S&D Module into HDFS/Impala
14. Security – SSO Support: Kerberos delegation/SSO pass-thru
15. Cloudera Metrics Dashboard: REST API based management console for Cloudera Manager

25
DEMO: Cloudera Data Lake Explorer & Cloudera Metadata Miner

26
Qlik + Cloudera
https://goo.gl/g7PywC

30
Qlik + Cloudera
Cloudera Metadata Catalog

31
Qlik + Cloudera
Where to find more information?
http://cloudera.qlik.com/

33
Qlik Big Data Methodologies
Different data volumes and complexities are best met using different methods
• Different methods ensure an optimized experience for the user for every situation
• Methods can be combined to meet different use cases
• Methods vary in deployment complexity
Data Volume
• Size (rows)
• Dimensions (columns)
• Cardinality (uniqueness)
App Complexity
• Computational complexity such as set analysis
• Object density
Methods
• Segmentation
• Chaining
• In-Memory
• On-Demand App Generation
• On-Demand App Generation (APIs)
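The three data volume measures (size, dimensions, cardinality) can be estimated on a sample before choosing a method. The following is a rough sketch in plain Python; the function name `profile_volume` and the sample records are assumptions for illustration and are not part of any Qlik API.

```python
def profile_volume(rows):
    """Profile a list of dict records.

    Returns the size (row count), dimensions (column count) and the
    cardinality (number of distinct values) of each column.
    """
    columns = list(rows[0].keys()) if rows else []
    cardinality = {col: len({row[col] for row in rows}) for col in columns}
    return {"size": len(rows), "dimensions": len(columns), "cardinality": cardinality}


if __name__ == "__main__":
    # Hypothetical sample records
    sample = [
        {"customer": "A", "country": "BR", "amount": 10},
        {"customer": "B", "country": "BR", "amount": 25},
        {"customer": "A", "country": "US", "amount": 10},
    ]
    print(profile_volume(sample))
```

A column with cardinality close to the row count (for example a transaction ID) tends to weigh more on an in-memory model than a low-cardinality column such as a country code.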

34
On Demand App Generation
1. User views summary data in Selection App and selects a slice of data
2. User requests the Analysis App to be built
3. Source data is extracted and Analysis App is created
4. Repeat steps 1-3 as many times as needed
Diagram: Big Data Repository, Summary Data, Selection App, Request, Analysis App (labels 1-3 match the steps above)
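The on-demand flow can be imitated in a few lines of plain Python. This is only a conceptual sketch, not Qlik code: the in-memory `REPOSITORY` list stands in for the big data repository, and the function names and sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for the big data repository (detail rows).
REPOSITORY = [
    {"region": "South", "year": 2017, "sales": 120},
    {"region": "South", "year": 2018, "sales": 150},
    {"region": "North", "year": 2018, "sales": 90},
    {"region": "North", "year": 2018, "sales": 60},
]


def build_selection_summary(repository, field):
    """Step 1: the Selection App holds only aggregated counts per value."""
    return Counter(row[field] for row in repository)


def build_analysis_app(repository, selections):
    """Steps 2-3: extract only the detail rows that match the selected slice."""
    return [
        row for row in repository
        if all(row[field] in values for field, values in selections.items())
    ]


# Step 1: summary shown to the user
summary = build_selection_summary(REPOSITORY, "region")
# Steps 2-3: the user picks a slice and requests the detailed app
analysis_rows = build_analysis_app(REPOSITORY, {"region": {"North"}, "year": {2018}})

if __name__ == "__main__":
    print(dict(summary))
    print(analysis_rows)
```

Step 4 is simply calling `build_analysis_app` again with a different selection; only the requested slice is ever loaded in detail.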

35
ODAG – Selection App
Aggregated Data Dictionary

36
ODAG – Analysis App
1. Binding of Selections
2. Where-Statement Generation
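The idea behind binding selections and generating a WHERE statement can be sketched generically as follows. This is not Qlik load script syntax: the helper names (`quote`, `build_where`, `build_query`), the table name and the field names are hypothetical, and field names are assumed to be trusted identifiers. A production version would prefer parameterized queries where the driver supports them.

```python
def quote(value):
    """Render one selected value as a SQL literal.

    Numbers are left as-is; anything else is quoted, with single
    quotes doubled so the literal stays well-formed.
    """
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_where(selections):
    """Turn bound selections (field -> list of values) into a WHERE clause.

    Fields without any selected value are skipped.
    """
    clauses = []
    for field, values in selections.items():
        if values:
            clauses.append(f"{field} IN ({', '.join(quote(v) for v in values)})")
    return "WHERE " + " AND ".join(clauses) if clauses else ""


def build_query(table, selections):
    """Assemble the extraction query for the selected slice."""
    return f"SELECT * FROM {table} {build_where(selections)}".strip()


if __name__ == "__main__":
    # Hypothetical selections made in the Selection App
    print(build_query("sales", {"region": ["North"], "year": [2018, 2017], "product": []}))
```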

37
DEMO

38
Too hard? Ask a wizard to help you!
https://goo.gl/dNkdB7

40
Will you share this presentation?
https://goo.gl/ZMcGP9

Thank you!
Luciano Assad
Solution Architect – Pre-Sales Brazil
luciano.assad@qlik.com
