© 2016 Ness SES. All Rights Reserved
BIG DATA
Open Source Projects
vs
Amazon Services
MOLDOVAN Radu Adrian
Iasi May 2016
Who am I? :)
❏ passionate about technology
❏ 20 years of programming using open source
❏ last 4 years in Big Data
❏ Big Data Architect @
… where Enterprise ends and Big Data starts
www.XYZ.com
Load Balancer 1 … Load Balancer n
Web Server 1.1 … Web Server 1.x, Web Server n.1 … Web Server n.x
read / write
Database, search index, Cache
← Single Point of Failure
← Limited Scalability
… where Enterprise ends and Big Data starts
www.XYZ.com
Load Balancer 1 … Load Balancer n
Web Server 1.1 … Web Server 1.x, Web Server n.1 … Web Server n.x
read / write
noSQL Ring: nodes 1, 2, 3, 4, 5
search: nodes 1, 2, 3, 4, … n
DFS + Resource Manager: nodes 1, 2, … n, each with HDDs, CPU, RAM
DFS, MPP, RES. MANAGER
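The "noSQL Ring" in this diagram suggests nodes arranged on a hash ring, as in Dynamo-style stores. The slide does not say how keys are placed, so the following is only a minimal sketch, assuming plain consistent hashing with SHA-256. The `HashRing` class, the `node-1` … `node-5` names and the sample key are hypothetical; real stores typically add virtual nodes and replication on top of this.

```python
import bisect
import hashlib


def _position(value: str) -> int:
    """Map a string to a position on the ring (an integer derived from SHA-256)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring: each key belongs to the next node clockwise."""

    def __init__(self, nodes):
        # Sort nodes by ring position so lookups can use binary search.
        self._ring = sorted((_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_right(self._positions, _position(key))
        return self._ring[idx % len(self._ring)][1]


# Five nodes, matching the five positions drawn on the slide's ring.
ring = HashRing(["node-1", "node-2", "node-3", "node-4", "node-5"])
owner = ring.node_for("user:42")
```

With this layout, removing one node only moves the keys that node owned; every other key keeps its owner, which is why a ring avoids the single point of failure of the previous slide.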
Architecture - High Level
INFRASTRUCTURE LAYER: Database, Analytics, Bigdata
INFORMATION LAYER
MULTI CHANNEL DELIVERY: Dashboard, Laptop, Mobile/Tablet, Email, SMS, Print
ANALYTICS LAYER: Realtime, Near Realtime, Reports + Statistics, Custom Tools
Data Processing
- system generated data
- dimensional data
- de/normalize data
Data Ingestion/Extraction
- external data
- reference internal data
- discovery data
Data Loading
- operational data
- business information data
Big data - ETL + BI
ERP, Flat Files, CRM, Live Stream, RDBMS, Web Services
Extract, Transform, Load
Massive Parallel Processing Distributed System
noSQL DB, warehouse DB (OLAP), search engines
Business Intelligence
Web Services, Data Science, Data Monetization, Data Exploration, Data Visualisation
ETL, BI
CAP Theorem
CONSISTENCY (quorum)
AVAILABILITY
PARTITION TOLERANCE
RDBMS
HP Vertica (Columnar)
Cassandra (Columnar)
Dynamo (Key-Value)
Couchbase (Document)
Riak (Key-Value)
HDFS
HBase (Columnar)
MongoDB (Document)
Redis (Key-Value)
Memcached (Key-Value)
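The "(quorum)" note under CONSISTENCY can be made concrete with the usual replica-overlap rule: with N replicas, a read that contacts R of them is guaranteed to see the latest acknowledged write to W of them when R + W > N. The slide gives no formula, so this is a minimal sketch of that standard rule; the function names are hypothetical.

```python
def read_overlaps_write(n_replicas: int, read_quorum: int, write_quorum: int) -> bool:
    """Return True when every read set must share a replica with every write set (R + W > N)."""
    if not (1 <= read_quorum <= n_replicas and 1 <= write_quorum <= n_replicas):
        raise ValueError("quorum sizes must be between 1 and the replica count")
    return read_quorum + write_quorum > n_replicas


def majority(n_replicas: int) -> int:
    """Smallest group size that is more than half of the replicas."""
    return n_replicas // 2 + 1


# With 3 replicas, majority reads and writes (2 + 2 > 3) always overlap;
# reading and writing a single replica (1 + 1 <= 3) gives no such guarantee.
quorum_ok = read_overlaps_write(3, majority(3), majority(3))
one_one_ok = read_overlaps_write(3, 1, 1)
```

Lower R and W favour availability and latency; higher values favour consistency, which is the trade-off the triangle on this slide describes.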
Cluster ecosystem - components
Coordinator: ZooKeeper
Management: Ambari
Workflow: Oozie, ???NiFi
Security: Ranger + Knox + Falcon, Kerberos, LDAP
Monitoring: Ganglia, Nagios
Logs: Kibana, Logstash
Cluster ecosystem - COLLECT
COLLECT PROCESS STORE VISUALIZE
Data Integration: Talend, Informatica
Data Streaming: Storm, MapR Streams, Spark Streaming, Flink Stream
Data Aggregation: Flume, Scribe
Msg Brokers + Streams: RabbitMQ, ActiveMQ, Kafka
Data Loader: Sqoop
Data Governance: Atlas
Amazon Services: Amazon Simple Queue Service (SQS), Amazon Kinesis
Cluster ecosystem - PROCESS
COLLECT PROCESS STORE VISUALIZE
HADOOP (HDFS)
Res. Manager: Mesos, Yarn
MapReduce
PIG
HIVE
HBase
MongoDB
Analytics: Impala (Drill)
GRAPHs: Spark GraphX, Neo4J, Titan, Flink Gelly
In Memory: Spark, TEZ
Cloudera, Hortonworks, MapR
Amazon Services: Amazon DynamoDB, Amazon EC2, Amazon EMR, Amazon S3, Amazon Glacier
Cluster ecosystem - STORE
COLLECT PROCESS STORE VISUALIZE
Warehouse DB: Presto (ANSI), HP Vertica
Search Engines: SolrCloud, Elasticsearch
Columnar Store: Cassandra, Accumulo
Machine Learning: Spark ML, FlinkML, Mahout
Key - Value Store: Redis, Riak, Memcached
Amazon Services: Amazon Redshift, Amazon DynamoDB, Amazon ElastiCache, Amazon Elasticsearch, Amazon ML
Cluster ecosystem - components
COLLECT PROCESS STORE VISUALIZE
Tableau, Logi, Jasper Reports, D3, Pentaho*, Crystal Reports*
Cluster ecosystem - VISUALIZE
COLLECT PROCESS STORE VISUALIZE
HADOOP (HDFS)
Res. Manager: Mesos, Yarn
Cloudera, Hortonworks, MapR
Data Integration: Talend, Informatica
Data Streaming: Storm, MapR Streams, Spark Streaming, Flink Stream
Data Aggregation: Flume, Scribe
Msg Brokers + Streams: RabbitMQ, ActiveMQ, Kafka
Data Loader: Sqoop
Data Governance: Atlas
MapReduce
PIG
HIVE
HBase
MongoDB
Analytics: Impala (Drill)
GRAPHs: Spark GraphX, Titan, Neo4J, Flink Gelly
In Memory: Spark, TEZ
Warehouse DB: Presto (ANSI), HP Vertica
Search Engines: SolrCloud, Elasticsearch
Columnar Store: Cassandra, Accumulo
Machine Learning: Spark ML, FlinkML, Mahout
Key - Value Store: Redis, Riak, Memcached
Tableau, Logi, Jasper Reports, D3, Pentaho*, Interactive Reporting, Crystal Reports
Trends - Forbes report Q1 2016
http://www.forbes.com/sites/gilpress/2016/03/14/top-10-hot-big-data-technologies/#7cd07887f26a
Thank you!
Skype: r.moldovan

Big data advanced topics - part I
