Big data advanced topics - part I
© 2016 Ness SES. All Rights Reserved
BIG DATA
Open Source Projects vs Amazon Services
MOLDOVAN Radu Adrian
Iasi May 2016
Who am I? :)
❏ passionate about technology
❏ 20 years of programming using open source
❏ last 4 years in Big Data
❏ Big Data Architect @
… where Enterprise ends and Big Data starts
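[Diagram: www.XYZ.com is served by load balancers 1..n, each in front of a group of web servers (Web 1.1 … Web n.x). All web servers read from and write to a single back end: one Database, plus a search index and a Cache. The Database is marked as a single point of failure with limited scalability.]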
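The load balancers on this slide spread incoming requests across the web servers. The slide does not say which balancing policy is used, so the following is only a minimal sketch assuming a simple round-robin policy; the class name and server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers in a fixed rotating order (hypothetical helper)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def next_server(self):
        # Each call returns the next server in the rotation, wrapping around at the end.
        return next(self._servers)


balancer = RoundRobinBalancer(["web-1.1", "web-1.2", "web-1.3"])
picks = [balancer.next_server() for _ in range(5)]
print(picks)  # ['web-1.1', 'web-1.2', 'web-1.3', 'web-1.1', 'web-1.2']
```

However many web servers the balancers rotate through, every one of them still talks to the same database, which is why the slide flags it as the single point of failure.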
… where Enterprise ends and Big Data starts
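[Diagram: the same front end (www.XYZ.com, load balancers 1..n, web servers Web 1.1 … Web n.x) now reads from and writes to a noSQL ring (nodes 1-5) and a search cluster (nodes 1..n) instead of a single database. Underneath, a cluster of nodes 1..n, each with its own HDDs, CPU and RAM, is coordinated by a Resource Manager and organized in layers: DFS (distributed file system), MPP (massively parallel processing) and RES. MANAGER.]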
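The noSQL ring replaces the single database by spreading keys across its nodes. Dynamo-style stores such as Cassandra and Riak place nodes on a hash ring for this (consistent hashing). The following is a simplified sketch of that idea, not any product's actual partitioner: it assumes MD5 as the ring hash, a fixed number of virtual nodes per server, and hypothetical node names.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Map a string to a point on the ring (a 128-bit integer derived from MD5).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=8):
        self._points = []   # sorted hash positions on the ring
        self._owners = {}   # hash position -> node name
        for node in nodes:
            self.add(node, vnodes)

    def add(self, node, vnodes=8):
        # Place several virtual copies of each node on the ring to even out the load.
        for i in range(vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = node

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first node position after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[idx]]


ring = HashRing(["node1", "node2", "node3", "node4", "node5"])
print(ring.node_for("user:42"))
```

The payoff of this design is that adding a node only moves the keys that land in the new node's ranges; every other key stays where it was, so the ring can grow without reshuffling all data.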
Architecture - High Level
INFRASTRUCTURE LAYER: Database, Analytics, Bigdata
INFORMATION LAYER
MULTI CHANNEL DELIVERY: Dashboard, Laptop, Mobile/Tablet, Email, SMS, Print
ANALYTICS LAYER: Realtime, Near Realtime, Reports + Statistics, Custom Tools
Data Processing
- system generated data
- dimensional data
- de/normalize data
Data Ingestion/Extraction
- external data
- reference internal data
- discovery data
Data Loading
- operational data
- business information data
Big Data - ETL + BI
Data sources: ERP, Flat Files, CRM, Live Stream, RDBMS, Web Services
ETL (Extract, Transform, Load) layer: Massively Parallel Processing, Distributed System, noSQL DB, warehouse DB (OLAP), search engines
BI (Business Intelligence): Web Services, Data Science, Data Monetization, Data Exploration, Data Visualisation
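A minimal, purely illustrative sketch of the Extract, Transform, Load flow on this slide. It assumes an in-memory CSV string as a stand-in for a flat-file source and a plain Python dict as a stand-in for the target store; the column names and data are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flat-file source.
RAW = """customer,country,amount
alice,RO,10.50
bob,DE,7.25
carol,RO,4.00
"""


def extract(text):
    # Extract: parse the raw flat-file rows into dictionaries.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    # Transform: cast amounts to numbers and aggregate them per country.
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += float(row["amount"])
    return dict(totals)


def load(totals, target):
    # Load: write the aggregated results into the target store.
    target.update(totals)
    return target


warehouse = load(transform(extract(RAW)), {})
print(warehouse)  # {'RO': 14.5, 'DE': 7.25}
```

In a real pipeline each stage would run on the distributed systems listed above and the target would be one of the stores (noSQL DB, warehouse DB, search engine), but the three-stage split stays the same.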
CAP Theorem
CONSISTENCY (quorum), AVAILABILITY, PARTITION TOLERANCE: pick 2
Systems placed on the diagram: RDBMS, HP Vertica (Columnar), Cassandra (Columnar), Dynamo (Key-Value), Couchbase (Document), Riak (Key-Value), HDFS, HBase (Columnar), MongoDB (Document), Redis (Key-Value), Memcached (Key-Value)
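The "(quorum)" note next to CONSISTENCY refers to quorum reads and writes in replicated stores: with N replicas, a write acknowledged by W replicas and a read that consults R replicas are guaranteed to share at least one replica exactly when R + W > N, so the read sees the latest acknowledged write. A small sketch (function names are hypothetical) that states the rule and checks it by brute force:

```python
from itertools import combinations


def is_strong_quorum(n: int, r: int, w: int) -> bool:
    # Rule of thumb: read and write sets must overlap in at least one replica.
    return r + w > n


def always_overlap(n: int, r: int, w: int) -> bool:
    # Brute-force check: every possible read set intersects every possible write set.
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))


for n, r, w in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 2, 3)]:
    print(n, r, w, is_strong_quorum(n, r, w), always_overlap(n, r, w))
```

With N = 3, reading and writing at 2 replicas each keeps reads consistent while still tolerating one unavailable replica; dropping to R = W = 1 favours availability and latency at the cost of possibly stale reads.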
Cluster ecosystem - components
Coordinator: ZooKeeper
Management: Ambari
Workflow: Oozie, NiFi (?)
Security: Ranger + Knox + Falcon, Kerberos, LDAP
Monitoring: Ganglia, Nagios
Logs: Kibana, Logstash
Cluster ecosystem - COLLECT
(COLLECT → PROCESS → STORE → VISUALIZE)
Data Integration: Talend, Informatica
Data Streaming: Storm, MapR Streams, Spark Streaming, Flink Streaming
Data Aggregation: Flume, Scribe
Msg Brokers + Streams: RabbitMQ, ActiveMQ, Kafka
Data Loader: Sqoop
Data Governance: Atlas
Amazon services: Amazon Simple Queue Service (SQS), Amazon Kinesis
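Stream platforms such as Kafka and Kinesis split a stream into partitions (shards) and route each record by its key, so all records with the same key land on the same partition in order, and consumers read each partition sequentially by offset. A toy in-memory sketch of that idea; the class is hypothetical and CRC32 is only a stand-in for whatever hash a real broker uses.

```python
import zlib
from collections import defaultdict


class PartitionedLog:
    """Toy append-only log split into partitions (hypothetical, in memory)."""

    def __init__(self, partitions: int):
        self.partitions = partitions
        self._logs = defaultdict(list)

    def partition_for(self, key: str) -> int:
        # Same key -> same partition, so per-key ordering is preserved.
        return zlib.crc32(key.encode("utf-8")) % self.partitions

    def append(self, key: str, value: str):
        p = self.partition_for(key)
        self._logs[p].append((key, value))
        return p, len(self._logs[p]) - 1   # (partition, offset) of the new record

    def read(self, partition: int, offset: int = 0):
        # Consumers read one partition sequentially, starting at a given offset.
        return self._logs[partition][offset:]


log = PartitionedLog(partitions=4)
offsets = [log.append("sensor-7", f"reading-{i}") for i in range(3)]
p = log.partition_for("sensor-7")
print(log.read(p))  # [('sensor-7', 'reading-0'), ('sensor-7', 'reading-1'), ('sensor-7', 'reading-2')]
```

Ordering is guaranteed only within a partition, not across the whole stream; that trade-off is what lets these systems scale out by adding partitions.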
Cluster ecosystem - PROCESS
(COLLECT → PROCESS → STORE → VISUALIZE)
HADOOP (HDFS)
Res. Manager: Mesos, YARN
MapReduce, Pig, Hive
HBase, MongoDB
In Memory: Spark, Tez
Analytics: Impala (Drill)
Graphs: Spark GraphX, Neo4J, Titan, Flink Gelly
Distributions: Cloudera, Hortonworks, MapR
Amazon services: Amazon DynamoDB, Amazon EC2, Amazon EMR, Amazon S3, Amazon Glacier
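MapReduce, the first processing engine listed here, splits a job into a map phase, a shuffle that groups intermediate pairs by key, and a reduce phase. The classic word-count example, simulated in a single Python process (no Hadoop APIs involved; the input lines are hypothetical):

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit (word, 1) for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)


lines = ["big data big cluster", "data lake"]
intermediate = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)  # {'big': 2, 'data': 2, 'cluster': 1, 'lake': 1}
```

On a cluster, map tasks run next to the HDFS blocks holding each input split and reducers each receive one slice of the keys; in-memory engines such as Spark keep the same map/shuffle/reduce shape but avoid writing every intermediate step to disk.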
Cluster ecosystem - STORE
(COLLECT → PROCESS → STORE → VISUALIZE)
Warehouse DB: Presto (ANSI SQL), HP Vertica
Search Engines: SolrCloud, Elasticsearch
Columnar Store: Cassandra, Accumulo
Machine Learning: Spark ML, FlinkML, Mahout
Key-Value Store: Redis, Riak, Memcached
Amazon services: Amazon Redshift, Amazon DynamoDB, Amazon ElastiCache, Amazon Elasticsearch Service, Amazon Machine Learning
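Key-value stores such as Memcached and Redis are often used as caches in front of slower databases. Memcached evicts least-recently-used items when memory runs out, and Redis offers LRU-based eviction policies. A minimal in-process LRU cache sketch built on `OrderedDict`; the class is hypothetical and its capacity is counted in items rather than bytes.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()   # oldest entry first, newest last

    def get(self, key):
        if key not in self._data:
            return None                 # cache miss
        self._data.move_to_end(key)     # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")        # "a" becomes most recently used
cache.set("c", 3)     # capacity exceeded: evicts "b"
print(cache.get("b"), cache.get("a"), cache.get("c"))  # None 1 3
```

A typical cache-aside pattern checks the cache first and, on a miss, reads from the database and calls `set`, so hot keys stay in memory and cold ones age out.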
Cluster ecosystem - components
(COLLECT → PROCESS → STORE → VISUALIZE)
Tableau, Logi, JasperReports, D3, Pentaho*, Crystal Reports*
Cluster ecosystem - VISUALIZE
(COLLECT → PROCESS → STORE → VISUALIZE)
COLLECT
- Data Integration: Talend, Informatica
- Data Streaming: Storm, MapR Streams, Spark Streaming, Flink Streaming
- Data Aggregation: Flume, Scribe
- Msg Brokers + Streams: RabbitMQ, ActiveMQ, Kafka
- Data Loader: Sqoop
- Data Governance: Atlas
PROCESS
- HADOOP (HDFS)
- Res. Manager: Mesos, YARN
- MapReduce, Pig, Hive
- HBase, MongoDB
- In Memory: Spark, Tez
- Analytics: Impala (Drill)
- Graphs: Spark GraphX, Titan, Neo4J, Flink Gelly
- Cloudera, Hortonworks, MapR
STORE
- Warehouse DB: Presto (ANSI SQL), HP Vertica
- Search Engines: SolrCloud, Elasticsearch
- Columnar Store: Cassandra, Accumulo
- Machine Learning: Spark ML, FlinkML, Mahout
- Key-Value Store: Redis, Riak, Memcached
VISUALIZE (Interactive Reporting)
- Tableau, Logi, JasperReports, D3, Pentaho*, Crystal Reports
Trends - Forbes report Q1 2016
http://www.forbes.com/sites/gilpress/2016/03/14/top-10-hot-big-data-technologies/#7cd07887f26a
Thank you!
Skype: r.moldovan