EVENT OBJECTIVES
Strengthen studies in the field of Business Intelligence;
Promote the development of techniques, methodologies, and interfaces together with the community;
Foster interaction among students, professionals, and companies, improving the quality of networking.
14. 14
Agenda
1. Big Data Landscape
2. Hadoop Ecosystem
3. Cloudera
4. 15 Points of Integration with Cloudera
5. Qlik Big Data Methodologies
15. 15
What is Apache Hadoop?
• Hadoop is a software framework for storing, processing, and analyzing “big data”
- Open source
- Distributed
- Scalable
- Fault-tolerant
• Hadoop - Blocks diagram:
- HDFS: a file system to manage the storage of data
- MapReduce: a framework to define a data processing task
- YARN: a framework to run the data processing task
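To make "a framework to define a data processing task" concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases applied to a word count. It only imitates the programming model in plain Python; it does not use Hadoop, HDFS, or YARN, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big insights", "data lake"]))
```

In a real cluster the map and reduce functions would run in parallel on many nodes, with the input read from HDFS and the resources scheduled by YARN; the sketch keeps only the logical flow.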
19. 19
CDH
Cloudera’s Distribution including Apache Hadoop - Components
Hadoop Core Components:
- Hadoop Distributed File System
- YARN and MapReduce
Hadoop Ecosystem:
- Spark
- HBase
- Flume
- Sqoop
- Hive
- Impala
- Solr
- ...
20. 20
CDH - Important Components
Component Definition
Project: What does it do?
- Spark: In-memory execution framework
- HBase: NoSQL database built on HDFS
- Hive: SQL processing engine designed for batch workloads
- Impala: SQL query engine designed for BI workloads
- Parquet: Very efficient columnar data storage format
- Sqoop: Data movement to/from RDBMSs
- Flume, Kafka: Streaming data ingestion
- Solr: Enables users to find the data they need
- Hue: Web-based user interface for Hadoop
- Oozie: Workflow scheduler used to manage jobs
- Sentry: Authorization tool, providing security for Hadoop
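As a rough illustration of why a columnar format such as Parquet suits analytic queries, the sketch below pivots row-oriented records into one list per column, so an aggregate over a single field reads only that column. This shows the general idea only, not Parquet's actual file layout or encoding; the helper name to_columnar and the sample fields are hypothetical.

```python
from typing import Any, Dict, List


def to_columnar(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list of values per column.

    Assumes every row carries the same set of keys.
    """
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


if __name__ == "__main__":
    rows = [
        {"region": "south", "sales": 120},
        {"region": "north", "sales": 80},
        {"region": "south", "sales": 50},
    ]
    columns = to_columnar(rows)
    # The aggregate touches only the "sales" column, never "region".
    print(sum(columns["sales"]))
```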
21. 21
How do I create a Lab Environment?
Cloudera QuickStart
https://goo.gl/zwwDRg
Recommended requirements:
4 cores - 12 GB RAM
24. 24
Qlik + Cloudera
15 Points of Integration with Cloudera
Categories: Go Beyond SQL, Fast & Flexible, BI & Analytics, Enterprise Ready
1. Data Lake Browser: Cloudera Data Explorer
2. Writeback with Kudu: Interactive analytics
3. IoT and Kafka Integration: Event driven / Streaming analytics
4. App on Demand w/ Impala: In-memory user generated slices
5. Direct Query w/ Impala: Data stored in Parquet or Kudu
6. Complex Data Types with Impala: Maps, arrays, and structures
7. Data Science Workbench: Powered by Qlik Associative Engine
8. Advanced Analytics: Integration with Spark/Python/R
9. Solr Integration: In-memory apps built on Solr Data
10. Qlik Solr-API App on Demand: Search + QAP + D3js
11. Cloudera Altus: Analytic DB Integration
12. Cloudera Metadata Miner: Impala, Cloudera Manager, Navigator
13. SAP Offload with Attunity: SAP S&D Module into HDFS/Impala
14. Security – SSO Support: Kerberos delegation/SSO pass-thru
15. Cloudera Metrics Dashboard: REST API based management console for Cloudera Manager
33. 33
Qlik Big Data Methodologies
Different data volumes and complexities are best met using different methods
• Different methods ensure an optimized experience for the user in every situation
• Methods can be combined to meet different use cases
• Methods vary in deployment complexity
Data Volume
• Size (rows)
• Dimensions (columns)
• Cardinality (uniqueness)
App Complexity
• Computational complexity such as set analysis
• Object density
Methods
- Segmentation
- Chaining
- In-Memory
- On-Demand App Generation
- On-Demand App Generation (APIs)
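The three data-volume factors named on this slide (rows, columns, cardinality) can be measured before choosing a method. The sketch below profiles a small in-memory sample; the function name profile_volume and the sample fields are hypothetical, and it deliberately returns raw measurements without recommending a method, since the slide gives no thresholds.

```python
from typing import Any, Dict, List


def profile_volume(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Measure size (rows), dimensions (columns), and cardinality (distinct values).

    Values must be hashable; a key missing from a row is counted as None.
    """
    column_names = sorted({name for row in rows for name in row})
    cardinality = {
        name: len({row.get(name) for row in rows}) for name in column_names
    }
    return {
        "rows": len(rows),
        "columns": len(column_names),
        "cardinality": cardinality,
    }


if __name__ == "__main__":
    sample = [
        {"customer": "a", "country": "BR"},
        {"customer": "b", "country": "BR"},
        {"customer": "c", "country": "US"},
    ]
    print(profile_volume(sample))
```

A high-cardinality field such as a customer identifier generally costs more to hold in memory than a low-cardinality one such as a country code, which is why the slide lists cardinality alongside row and column counts.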
34. 34
On Demand App Generation
1. User views summary data in Selection App and selects a slice of data
2. User requests the Analysis App to be built
3. Source data is extracted and Analysis App is created
4. Repeat steps 1-3 as many times as needed
[Diagram labels: Big Data Repository, Summary Data, Selection App, Request, Analysis App; arrows numbered 1-3 to match the steps]
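The on-demand flow above can be sketched in a few lines. This is a minimal illustration only: an in-memory SQLite table stands in for the big data repository, the table, columns, and function names are hypothetical, and the max_rows guard is an assumption added for illustration. It does not use any Qlik or Impala API.

```python
import sqlite3
from typing import List, Sequence, Tuple


def build_repository() -> sqlite3.Connection:
    """Create a tiny in-memory table standing in for the big data repository."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("south", "a", 10.0), ("south", "b", 20.0),
         ("north", "a", 5.0), ("north", "c", 7.5)],
    )
    return conn


def summary_data(conn: sqlite3.Connection) -> List[Tuple]:
    """Step 1: aggregated view that the selection app would display."""
    return conn.execute(
        "SELECT region, COUNT(*), SUM(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    ).fetchall()


def build_analysis_app(conn: sqlite3.Connection, regions: Sequence[str],
                       max_rows: int = 1000) -> List[Tuple]:
    """Steps 2-3: extract only the detail rows of the selected slice."""
    if not regions:
        raise ValueError("select at least one region")
    placeholders = ", ".join("?" for _ in regions)
    where = f"WHERE region IN ({placeholders})"
    params = tuple(regions)
    # Assumed guard: refuse slices that are too large to load in memory.
    count = conn.execute(f"SELECT COUNT(*) FROM sales {where}", params).fetchone()[0]
    if count > max_rows:
        raise ValueError(f"slice has {count} rows, limit is {max_rows}")
    return conn.execute(
        f"SELECT region, product, amount FROM sales {where} ORDER BY product",
        params,
    ).fetchall()


if __name__ == "__main__":
    repo = build_repository()
    print(summary_data(repo))                   # step 1: user reviews the summary
    print(build_analysis_app(repo, ["south"]))  # steps 2-3: slice is extracted
```

Step 4 corresponds to calling build_analysis_app again with a different selection. The selected values are passed as bound parameters rather than concatenated into the SQL text.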