© 2016 Ness SES. All Rights Reserved
BIG DATA
advanced topics
Cloudera vs Hortonworks
MOLDOVAN Radu Adrian
Timisoara May 2016
Who am I? :)
❏ passionate about technology
❏ 20 years of programming using open source
❏ last 4 years in Big Data
❏ Big Data Architect @
Cloudera and Hortonworks: The Similarities
- both are built on top of Apache Hadoop
- both are mature offerings with security features
- both provide paid consulting, training and services
- strong development communities
- master-slave architecture
- support for MapReduce
- YARN as resource manager
- both reduce deployment time
Cloudera and Hortonworks: The Differences
Cloudera
- commercial license (free 60-day trial)
- repositioned as an “enterprise data hub”
- founded in 2008 by engineers from Facebook, Google, Oracle and Yahoo
- 400+ customers
- funding: $1.04B
Hortonworks
- open source license, completely free
- positioned as a Hadoop distro
- has no proprietary software
- founded in 2011; Teradata, Yahoo & Microsoft
- funding: $248M
Source: https://www.crunchbase.com
Security Solutions
http://www.forbes.com/sites/gilpress/2016/03/14/top-10-hot-big-data-technologies/#7cd07887f26a
Hortonworks
Apache Ranger
Apache Knox
Apache Falcon
Cloudera
Project Rhino
Project Sentry
Cluster ecosystem - VISUALIZE
COLLECT PROCESS STORE VISUALIZE
Legend: (C) = Cloudera, (H) = Hortonworks, (C+H) = both
Hadoop (HDFS) (C+H)
Res. Manager
YARN (C+H)
Warehouse DB
Presto (H)
MapReduce
Pig (C+H)
Search Engines
SolrCloud (C+H)
Analytics
Columnar Store
Accumulo (C+H)
Impala (C)
Machine Learning
Spark ML (C+H)
Mahout (H)
HBase (C+H)
Data Streaming
Storm (H)
Spark Streaming (C+H)
Hive (C+H)
Tableau
Data Aggregation
Flume (C+H)
Msg Brokers + Streams
Kafka (C+H)
Data Loader
Sqoop (C+H)
In Memory
Spark (C+H)
Tez (H)
Logi
Jasper Reports
D3
Pentaho*
Interactive Reporting
Crystal Reports
Data Governance
Atlas (H)
Cloudera
Cloudera Management Service
Hortonworks
Trends - Forbes report Q1 2016
http://www.forbes.com/sites/gilpress/2016/03/14/top-10-hot-big-data-technologies/#7cd07887f26a
Big Data - Buzz words #TAGs
FAULT TOLERANCE
DATA LOCALITY
LAMBDA ARCHITECTURE
CRUD => CRUD
SHARDING
REPLICATION
RESILIENT SYSTEMS
DISRUPTIVE TECHNOLOGIES
Cloud Computing
Internet of Things
Data Analytics
Thank you!
Skype: r.moldovan
