Quick dive into the
Big Data pool
without drowning
Demi Ben-Ari - VP R&D @ Panorays
About Me
Demi Ben-Ari, Co-Founder & VP R&D @ Panorays
● B.Sc. Computer Science – Academic College Tel-Aviv Yaffo
● Co-Founder “Big Things” Big Data Community
In the Past:
● Sr. Data Engineer - Windward
● Team Leader & Sr. Java Software Engineer,
Missile defense and Alert System - “Ofek” – IAF
Interested in almost every kind of technology – A True Geek
Agenda
● Basic Concepts
● Introduction to Big Data frameworks
● Distributed Systems => Problems
● Monitoring
● Conclusions
Say “Distributed”, Say “Big Data”,
Say….
Some basic concepts
What is Big Data (IMHO)?
● Systems involving the “3 Vs”:
What are the right questions we want to ask?
○ Volume - How much?
○ Velocity - How fast?
○ Variety - What kind? (Difference)
What is Big Data (IMHO)
● Some define it as the “7 Vs”
○ Variability (constantly changing)
○ Veracity (accuracy)
○ Visualization
○ Value
What is Big Data (IMHO)
● Characteristics
○ Multi-region availability
○ Very fast and reliable response
○ No single point of failure
Why Not Relational Data
● Relational Model Provides
○ Normalized table schema
○ Cross table joins
○ ACID compliance (Atomicity, Consistency, Isolation, Durability)
● But at very high cost
○ Big Data table joins - billions of rows - massive overhead
○ Sharding tables across systems is complex and fragile
● Modern applications have different priorities
○ The need for speed and availability outweighs consistency
○ Racks of commodity servers trump massive high-end systems
○ Real world need for transactional guarantees is limited
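One way to see why sharding is fragile is to count how many keys change shard when a node is added. The sketch below assumes the simplest possible scheme, hash modulo the number of shards; the key names, the shard counts and the helper functions are hypothetical and only there for illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash (md5, chosen only for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_shards) != shard_for(key, new_shards)
    )
    return moved / len(keys)


# Hypothetical keys; any reasonably large key set behaves similarly.
keys = [f"user-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_shards=4, new_shards=5)
print(f"{fraction:.0%} of keys change shard when going from 4 to 5 shards")
```

With plain modulo placement, roughly four out of five keys have to move for a single added shard, which is part of why real systems tend to use schemes such as consistent hashing instead.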
What strategies help manage Big Data?
● Distribute data across nodes
○ Replication
● Relax consistency requirements
● Relax schema requirements
● Optimize data to suit actual needs
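One common way to relax consistency on replicated data is quorum reads and writes: with N replicas, a write waits for W acknowledgements and a read asks R replicas. The sketch below is an illustration under that assumption, not something the slides prescribe; it checks by brute force whether every read set is guaranteed to overlap every write set. The function name is hypothetical.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every possible read set of size r shares at least one
    replica with every possible write set of size w, out of n replicas."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# 3 replicas: reading 2 and writing 2 always overlaps on a fresh copy,
# reading 1 and writing 1 does not (faster, but a read may be stale).
print(quorums_always_overlap(3, r=2, w=2))
print(quorums_always_overlap(3, r=1, w=1))
```

The overlap holds exactly when R + W > N, so lowering R or W buys speed and availability at the price of possibly stale reads.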
What is the NoSQL landscape?
● 4 broad classes of non-relational databases (DB-Engines)
○ Graph: data elements each relate to N others in graph / network
○ Key-Value: keys map to arbitrary values of any data type
○ Document: document sets (JSON) queryable in whole or part
○ Wide Column Store (Column Family): keys mapped to sets of
any number of typed columns
● Three key factors to help understand the subject
○ Consistency: Get identical results, regardless of which node is queried?
○ Availability: Respond to very high read and write volumes?
○ Partition tolerance: Still available when part of it is down?
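The four classes above can be pictured with plain in-memory structures. This is only a rough sketch of the data shapes, not of how any particular database stores them; all keys, names and values are made up for illustration.

```python
# Key-Value: a key maps to an opaque value of any type.
key_value = {"session:42": b"\x00\x01 any bytes at all"}

# Document: JSON-like records that can be queried by field.
documents = [
    {"_id": 1, "name": "Ada", "tags": ["admin", "ops"]},
    {"_id": 2, "name": "Linus", "tags": ["dev"]},
]
admins = [doc["name"] for doc in documents if "admin" in doc["tags"]]

# Wide column: row key -> column family -> columns.
# Rows do not need to share the same set of columns.
wide_column = {
    "ship:123": {"position": {"lat": 32.08, "lon": 34.78}, "meta": {"flag": "IL"}},
    "ship:456": {"position": {"lat": 51.5}},
}

# Graph: each element relates to N others (adjacency sets here).
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"dave"},
    "dave": {"carol"},
}
friends_of_friends = {
    fof for friend in graph["alice"] for fof in graph[friend]
} - {"alice"}

print(admins, friends_of_friends)
```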
What is the CAP theorem?
● In distributed systems, consistency, availability and partition tolerance exist in
a mutually dependent relationship. Pick any two.
Availability
Partition tolerance
Consistency
MySQL, PostgreSQL,
Greenplum, Vertica,
Neo4J
Cassandra,
DynamoDB, Riak,
CouchDB, Voldemort
HBase, MongoDB, Redis, BigTable, BerkeleyDB
Graph
Key-Value
Wide Column
RDBMS
DB Engines - Comparison
● http://db-engines.com/en/ranking
DB Engines - Comparison
What does DevOps really mean?
Development
Software Engineering
UX
Operations
System Admin
Database Admin
What does DevOps really mean?
DevOps
Cross-functional teams
Operators automating systems
Developers operating systems
Introduction to
Big Data
Frameworks
https://d152j5tfobgaot.cloudfront.net/wp-content/uploads/2015/02/yourstory_BigData.jpg
Characteristics of Hadoop
● A system to process very large amounts of unstructured and complex
data at the desired speed
● A system to run on a large number of machines that don’t share any
memory or disk
● A system to run on a cluster of machines that can be put together at
relatively low cost and maintained easily
Hadoop Principles
● “A system that moves the computation to where the data is”
● Key Concepts of Hadoop
○ Flexibility
○ Scalability
○ Low cost
○ Fault tolerant
Hadoop Core Components
● HDFS - Hadoop Distributed File System
○ Provides a distributed data storage system to store data in smaller
blocks in a fail safe manner
● MapReduce - Programming framework
○ Has the ability to take a query over a dataset, divide it and run it in
parallel on multiple nodes
● YARN (Yet Another Resource Negotiator) - MRv2
○ Splits the responsibilities of the MapReduce JobTracker
■ ResourceManager (global)
■ ApplicationMaster (per application)
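The MapReduce idea of dividing a query and running it in parallel can be sketched in a single process. This is a toy word count, assuming two text chunks that stand in for HDFS blocks; it is not Hadoop code, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk: str):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


# Each chunk could be mapped on a different node; here they run in sequence.
chunks = ["big data big pool", "data pool data"]
mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each chunk is mapped independently, the map step can run wherever the chunk is stored, which is the "move the computation to the data" principle in miniature.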
Hadoop Ecosystem
Hadoop Core
HDFS
MapReduce /
YARN
Hadoop Common
Hadoop Applications
Hive Pig HBase Oozie Zookeeper Sqoop Spark
Hadoop (+Spark) Distributions
Elastic MapReduce DataProc
New Age BI Applications
● Able to understand various types of data
● Ability to clean the data
● Process data with applied rules locally and in distributed environment
● Visualize sizeable data with speed
● Extend results by sharing within the enterprise
Big Data Analytics
● Processing large amounts of data without data movement
● Avoid data connectors if possible (run natively)
● Ability to understand a wide range of data types and data
compression formats
● Ability to process data on a variety of processing frameworks
● Distributed data processing
○ In-Memory a big plus
● Super fast visualization
○ In-Memory a big plus
When to choose Hadoop?
● Large volumes of data to store and process
● Semi-Structured or Unstructured data
● Data is not well categorized
● Data contains a lot of redundancy
● Data arrives in streams or large batches
● Complex batch jobs arriving in parallel
● You don’t know how the data might be useful
Distributed Systems => Problems
https://imgflip.com/i/1ap5kr
http://kingofwallpapers.com/otter/otter-004.jpg
Monolith Structure
OS CPU Memory Disk
Processes Java
Application
Server
Database
Web Server
Load
Balancer
Users - Other Applications
Monitoring
System
UI
Many times...all of this was on a single physical server!
Distributed Microservices Architecture
Service A
Queue
DB
Service B
DB
Cache
Cache
DB
Service C
Web
Server
DB
Analytics Cluster
Master
Slave Slave Slave
Monitoring System???
MongoDB + Spark
Worker 1
Worker 2
….
….
…
…
Worker N
Spark
Cluster
Master
Write
Read
Master
Sharded
MongoDB
Replica Set
Cassandra + Spark
Worker 1
Worker 2
….
….
…
…
Worker N
Cassandra
Cluster
Spark
Cluster
Write
Read
Cassandra + Serving
Cassandra
Cluster
Write
Read
UI Client
UI Client
UI Client
UI Client
Web Service
Web Service
Web Service
Web Service
Problems
● Multiple physical servers
● Multiple logical services
● Want Scaling => More Servers
● Even if you had all of the metrics
○ You’ll have an overflow of the data
● Your monitoring becomes a “Big Data” problem itself
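A back-of-the-envelope calculation shows how quickly metrics pile up. The server counts, metrics per server and the 10-second reporting interval below are assumed numbers, chosen only to illustrate the growth.

```python
def datapoints_per_day(servers: int, metrics_per_server: int, interval_seconds: int) -> int:
    """Number of metric samples produced per day for a fleet of servers."""
    samples_per_day = 86_400 // interval_seconds  # seconds in a day / interval
    return servers * metrics_per_server * samples_per_day


# One monolith server versus a modest distributed setup (assumed sizes).
small = datapoints_per_day(servers=1, metrics_per_server=100, interval_seconds=10)
large = datapoints_per_day(servers=200, metrics_per_server=500, interval_seconds=10)
print(f"single server: {small:,} datapoints/day")
print(f"cluster:       {large:,} datapoints/day")
```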
This is what “Distributed” really Means
The DevOps Guy
(It might be you)
Monitoring is Crucial
http://memeguy.com/photo/46871/you-are-being-monitored
Monitoring
Operating System
Metrics
Some help
from “the Cloud”
AWS’s CloudWatch / GCP StackDriver
Report to Where?
● We chose:
● Graphite (InfluxDB) + Grafana
● Can correlate System and
Application metrics in one
place :)
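Reporting an application metric to Graphite can be as small as writing one text line to a socket. The sketch below assumes Graphite's plaintext protocol ("path value timestamp" followed by a newline) on the default Carbon port 2003; the host, the metric name and the helper functions are hypothetical.

```python
import socket
import time


def graphite_line(path: str, value, timestamp: int) -> str:
    """Format one metric in Graphite's plaintext protocol."""
    return f"{path} {value} {timestamp}\n"


def send_metric(path: str, value, host: str = "localhost", port: int = 2003,
                timestamp=None) -> None:
    """Send a single metric to a Carbon plaintext listener (assumed host/port)."""
    ts = int(time.time()) if timestamp is None else timestamp
    payload = graphite_line(path, value, ts).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Formatting only; send_metric(...) needs a reachable Graphite instance.
print(graphite_line("app.spark.jobs.completed", 42, 1500000000), end="")
```

Using the same naming scheme for system and application metrics is what makes it possible to put them side by side on one Grafana dashboard.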
Monitoring
Cassandra
Monitoring Cassandra
● OpsCenter - by DataStax
Monitoring Cassandra
Monitoring Spark
Ways to Monitor Spark
● Grafana-spark-dashboards
○ Blog:
http://www.hammerlab.org/2015/02/27/monitoring-spark-with-graphite-and-grafana/
● Spark UI - Online on each application running
● Spark History Server - Offline (After application finishes)
● Spark REST API
○ Query it from internal tools for ad-hoc monitoring
● Back to the basics: dstat, iostat, iotop, jstack
● Blog post by Tzach Zohar - “Tips from the Trenches”
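Ad-hoc monitoring through the Spark REST API could look roughly like this. The sketch assumes the `/api/v1/applications` endpoint served by the Spark UI (port 4040 by default, 18080 for the History Server) and a response shaped as a list of applications with `name` and `attempts` fields; the host name, helper functions and sample payload are made up.

```python
import json
from urllib.request import urlopen


def applications_url(host: str, port: int = 4040) -> str:
    """URL of the application list on a Spark UI or History Server."""
    return f"http://{host}:{port}/api/v1/applications"


def fetch_applications(host: str, port: int = 4040):
    """Fetch and decode the application list (needs a reachable Spark UI)."""
    with urlopen(applications_url(host, port), timeout=5) as response:
        return json.load(response)


def running_app_names(applications):
    """Names of applications that have at least one attempt not yet completed."""
    return [
        app["name"]
        for app in applications
        if any(not attempt.get("completed", True) for attempt in app.get("attempts", []))
    ]


# Sample payload with an assumed shape, so the sketch runs without a cluster.
sample = json.loads("""[
  {"id": "app-1", "name": "nightly-etl", "attempts": [{"completed": true}]},
  {"id": "app-2", "name": "streaming-ingest", "attempts": [{"completed": false}]}
]""")
print(running_app_names(sample))
```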
Monitoring
Your Data
https://memegenerator.net/instance/53617544
Data Questions: What should be measured?
● Did all of the computation occur?
○ Are there any data layers missing?
● How much data do we have? (Volume)
● Is all of the data in the Database?
● Data Quality Assurance
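The questions above can be turned into a small automated check. This is a minimal sketch, assuming each data layer reports a row count after a run; the layer names, the threshold and the function are hypothetical.

```python
def check_layers(expected_layers, row_counts, min_rows: int = 1):
    """Return a list of problems: layers that are missing or suspiciously small."""
    problems = []
    for layer in expected_layers:
        count = row_counts.get(layer)
        if count is None:
            problems.append(f"{layer}: missing")
        elif count < min_rows:
            problems.append(f"{layer}: only {count} rows")
    return problems


# Assumed output of one pipeline run: the last layer never got written.
expected = ["raw", "enriched", "aggregated"]
observed = {"raw": 1000, "enriched": 0}
problems = check_layers(expected, observed)
print(problems)
```

Emitting the same counts as metrics on every run makes it possible to follow them over time and alert when a layer stops growing.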
Data Answers!
● The method doesn’t really matter, as long as you:
○ Can follow the results over time
○ Know your data flow and know what might fail
○ It’s easy for anyone to add more monitoring
(For the ones that add the new data each time…)
○ Don’t trust others to add monitoring
(It will always end up being the DevOps’s “fault” -> No monitoring will be
applied)
Logging?
Monitoring?
https://lh4.googleusercontent.com/DFVcH-E5XKj8cbhEtI0qabmf_wwVqWWvk0pK5H5rnC_kVxY2tXClKfzV-LvAH61YRLJUEvtO9amjWfjcY4Z57VBYCuQ95_hdAVEHgLAuepJiArH0wJERWuzzmgnPysCiIA
ELK - Elasticsearch + Logstash + Kibana
http://www.digitalgov.gov/2014/05/07/analyzing-search-data-in-real-time-to-drive-decisions/
Monitoring Stack
Alerting
Metrics Collection
Datastore
Dashboard
Data Monitoring
Log Monitoring
Big Data - Are we there yet?
● “3 Vs”: - What are the right questions we want to ask?
○ Volume - How much?
■ Can it run on a single machine in reasonable time?
○ Velocity - How fast?
■ Can a single machine handle the throughput?
○ Variety - What kind? (Difference)
■ Is your data uniform and unchanging?
● If the answer to most of the previous questions is “Yes”,
think again before adding the complexity of “Big Data”
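The volume question can often be answered with simple arithmetic before any cluster is set up. The data size and the sustained throughput below are assumed figures for illustration, not measurements.

```python
def hours_on_single_machine(data_gb: float, throughput_mb_per_s: float) -> float:
    """Rough time to scan a dataset once on one machine, in hours."""
    seconds = data_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


# Assumed: 500 GB of data, 200 MB/s sustained processing throughput.
hours = hours_on_single_machine(data_gb=500, throughput_mb_per_s=200)
print(f"about {hours:.1f} hours for one full pass")
```

If one full pass fits comfortably inside the time budget, a single machine may be enough.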
Conclusions
● Think carefully before going into the “Big Data pool”
○ See if you really have a problem that you’re trying to solve
○ It’s not a silver bullet
● Take measures to automate and monitor everything
● Having Clusters and distributed frameworks will cost a lot - eventually
● Fit your storage layer(s) to the needs
Questions?
https://www.stayathomemum.com.au/wp-content/uploads/2015/01/DDDDDD.jpg
Still feel like you’re
drowning?
● LinkedIn
● Twitter: @demibenari
● Blog:
http://progexc.blogspot.com/
● demi.benari@gmail.com
● “Big Things” Community
Meetup, YouTube, Facebook,
Twitter
● GDG Cloud
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays

More Related Content

What's hot

What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)
Brian Brazil
 
Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...
Eric Sammer
 
Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015
Eric Sammer
 
Taskerman - a distributed cluster task manager
Taskerman - a distributed cluster task managerTaskerman - a distributed cluster task manager
Taskerman - a distributed cluster task manager
Raghavendra Prabhu
 
from source to solution - building a system for event-oriented data
from source to solution - building a system for event-oriented datafrom source to solution - building a system for event-oriented data
from source to solution - building a system for event-oriented data
Eric Sammer
 
Building highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin FrançoisBuilding highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin François
Paris Data Engineers !
 
End to-end monitoring with the prometheus operator - Max Inden
End to-end monitoring with the prometheus operator - Max IndenEnd to-end monitoring with the prometheus operator - Max Inden
End to-end monitoring with the prometheus operator - Max Inden
Paris Container Day
 
Open TSDB Lightning Talk
Open TSDB Lightning TalkOpen TSDB Lightning Talk
Open TSDB Lightning Talk
CloudOps2005
 
Anatomy of a Prometheus Client Library (PromCon 2018)
Anatomy of a Prometheus Client Library (PromCon 2018)Anatomy of a Prometheus Client Library (PromCon 2018)
Anatomy of a Prometheus Client Library (PromCon 2018)
Brian Brazil
 
Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)
Brian Brazil
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed Tracing
Amuhinda Hungai
 
Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack
Srinath Perera
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB
 
Just enough web ops for web developers
Just enough web ops for web developersJust enough web ops for web developers
Just enough web ops for web developers
Datadog
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
C4Media
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
Robbie Strickland
 
Optimizing Spark
Optimizing SparkOptimizing Spark
Optimizing Spark
Stitch Fix Algorithms
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For Architects
Kevin Brockhoff
 
ELK Wrestling (Leeds DevOps)
ELK Wrestling (Leeds DevOps)ELK Wrestling (Leeds DevOps)
ELK Wrestling (Leeds DevOps)
Steve Elliott
 
Cassandra Summit 2014: Diagnosing Problems in Production
Cassandra Summit 2014: Diagnosing Problems in ProductionCassandra Summit 2014: Diagnosing Problems in Production
Cassandra Summit 2014: Diagnosing Problems in Production
DataStax Academy
 

What's hot (20)

What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)
 
Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...
 
Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015
 
Taskerman - a distributed cluster task manager
Taskerman - a distributed cluster task managerTaskerman - a distributed cluster task manager
Taskerman - a distributed cluster task manager
 
from source to solution - building a system for event-oriented data
from source to solution - building a system for event-oriented datafrom source to solution - building a system for event-oriented data
from source to solution - building a system for event-oriented data
 
Building highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin FrançoisBuilding highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin François
 
End to-end monitoring with the prometheus operator - Max Inden
End to-end monitoring with the prometheus operator - Max IndenEnd to-end monitoring with the prometheus operator - Max Inden
End to-end monitoring with the prometheus operator - Max Inden
 
Open TSDB Lightning Talk
Open TSDB Lightning TalkOpen TSDB Lightning Talk
Open TSDB Lightning Talk
 
Anatomy of a Prometheus Client Library (PromCon 2018)
Anatomy of a Prometheus Client Library (PromCon 2018)Anatomy of a Prometheus Client Library (PromCon 2018)
Anatomy of a Prometheus Client Library (PromCon 2018)
 
Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed Tracing
 
Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
 
Just enough web ops for web developers
Just enough web ops for web developersJust enough web ops for web developers
Just enough web ops for web developers
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
Optimizing Spark
Optimizing SparkOptimizing Spark
Optimizing Spark
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For Architects
 
ELK Wrestling (Leeds DevOps)
ELK Wrestling (Leeds DevOps)ELK Wrestling (Leeds DevOps)
ELK Wrestling (Leeds DevOps)
 
Cassandra Summit 2014: Diagnosing Problems in Production
Cassandra Summit 2014: Diagnosing Problems in ProductionCassandra Summit 2014: Diagnosing Problems in Production
Cassandra Summit 2014: Diagnosing Problems in Production
 

Viewers also liked

Considerations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT projectConsiderations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT project
Akmal Chaudhri
 
GDBinSV_Meetup_DBMS_Trends_10062016
GDBinSV_Meetup_DBMS_Trends_10062016GDBinSV_Meetup_DBMS_Trends_10062016
GDBinSV_Meetup_DBMS_Trends_10062016
Joshua Bae
 
Essential Tools For Your Big Data Arsenal
Essential Tools For Your Big Data ArsenalEssential Tools For Your Big Data Arsenal
Essential Tools For Your Big Data Arsenal
MongoDB
 
O.inf 12015 parte 1 (sec 12)- opinfo ok
O.inf 12015   parte 1 (sec 12)- opinfo okO.inf 12015   parte 1 (sec 12)- opinfo ok
O.inf 12015 parte 1 (sec 12)- opinfo ok
comandantebrasil2
 
La Revista Rural 01
La Revista Rural 01La Revista Rural 01
La Revista Rural 01
larevistarural
 
Gps Company Profile 4
Gps Company Profile 4Gps Company Profile 4
Gps Company Profile 4kelahi
 
Mejors Pintores del Mundo-Ortega Maila-Obras Y biografia
Mejors Pintores del Mundo-Ortega Maila-Obras Y biografiaMejors Pintores del Mundo-Ortega Maila-Obras Y biografia
Mejors Pintores del Mundo-Ortega Maila-Obras Y biografia
Arte Mundial
 
Carpeta institucional club de abuelas 2013
Carpeta institucional  club de abuelas 2013Carpeta institucional  club de abuelas 2013
Carpeta institucional club de abuelas 2013Emanuel Pagés
 
Oncology Big Data: A Mirage or Oasis of Clinical Value?
Oncology Big Data:  A Mirage or Oasis of Clinical Value? Oncology Big Data:  A Mirage or Oasis of Clinical Value?
Oncology Big Data: A Mirage or Oasis of Clinical Value?
Michael Peters
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
Demi Ben-Ari
 
SebKMT Cable Sheath testers & Fault Locators
SebKMT Cable Sheath testers & Fault LocatorsSebKMT Cable Sheath testers & Fault Locators
SebKMT Cable Sheath testers & Fault Locators
Thorne & Derrick International
 
30-ABRIL-2010-Factores asociados a la permanencia de los pacientes en el serv...
30-ABRIL-2010-Factores asociados a la permanencia de los pacientes en el serv...30-ABRIL-2010-Factores asociados a la permanencia de los pacientes en el serv...
30-ABRIL-2010-Factores asociados a la permanencia de los pacientes en el serv...
Instituto Nacional de Psiquiatria "Dr Ramon de la Fuente Muñiz"
 
Guia rotaria para poner fin a la Polio
Guia rotaria para poner fin a la PolioGuia rotaria para poner fin a la Polio
Guia rotaria para poner fin a la PolioPablo Figueroa Bresler
 
Southern Boating Media Kit 2009
Southern Boating Media Kit 2009Southern Boating Media Kit 2009
Southern Boating Media Kit 2009
CarlMischka
 
Curriculum Vitae (Espanol) Justin Scott Newberry Sisson
Curriculum Vitae (Espanol)   Justin Scott Newberry SissonCurriculum Vitae (Espanol)   Justin Scott Newberry Sisson
Curriculum Vitae (Espanol) Justin Scott Newberry Sisson
jnewberr
 
La Huella Imagen y Comunicaciones, la primera boutique comunicacional de Vene...
La Huella Imagen y Comunicaciones, la primera boutique comunicacional de Vene...La Huella Imagen y Comunicaciones, la primera boutique comunicacional de Vene...
La Huella Imagen y Comunicaciones, la primera boutique comunicacional de Vene...
La Huella imagen y comunicaciones, CA
 
Estudio de la influencia de elementos parásitos en el valor del roe de una an...
Estudio de la influencia de elementos parásitos en el valor del roe de una an...Estudio de la influencia de elementos parásitos en el valor del roe de una an...
Estudio de la influencia de elementos parásitos en el valor del roe de una an...
fralbe com
 
Sniffing HTTPS Using YAMAS | Lucideus Tech
Sniffing HTTPS Using YAMAS | Lucideus TechSniffing HTTPS Using YAMAS | Lucideus Tech
Sniffing HTTPS Using YAMAS | Lucideus Tech
Rahul Tyagi
 

Viewers also liked (20)

Considerations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT projectConsiderations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT project
 
GDBinSV_Meetup_DBMS_Trends_10062016
GDBinSV_Meetup_DBMS_Trends_10062016GDBinSV_Meetup_DBMS_Trends_10062016
GDBinSV_Meetup_DBMS_Trends_10062016
 
Essential Tools For Your Big Data Arsenal
Essential Tools For Your Big Data ArsenalEssential Tools For Your Big Data Arsenal
Essential Tools For Your Big Data Arsenal
 
O.inf 12015 parte 1 (sec 12)- opinfo ok
O.inf 12015   parte 1 (sec 12)- opinfo okO.inf 12015   parte 1 (sec 12)- opinfo ok
O.inf 12015 parte 1 (sec 12)- opinfo ok
 
La Revista Rural 01
La Revista Rural 01La Revista Rural 01
La Revista Rural 01
 
Gps Company Profile 4
Gps Company Profile 4Gps Company Profile 4
Gps Company Profile 4
 
Mejors Pintores del Mundo-Ortega Maila-Obras Y biografia
Mejors Pintores del Mundo-Ortega Maila-Obras Y biografiaMejors Pintores del Mundo-Ortega Maila-Obras Y biografia
Mejors Pintores del Mundo-Ortega Maila-Obras Y biografia
 
Carpeta institucional club de abuelas 2013
Carpeta institucional  club de abuelas 2013Carpeta institucional  club de abuelas 2013
Carpeta institucional club de abuelas 2013
 
Oncology Big Data: A Mirage or Oasis of Clinical Value?
Oncology Big Data:  A Mirage or Oasis of Clinical Value? Oncology Big Data:  A Mirage or Oasis of Clinical Value?
Oncology Big Data: A Mirage or Oasis of Clinical Value?
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
SebKMT Cable Sheath testers & Fault Locators
SebKMT Cable Sheath testers & Fault LocatorsSebKMT Cable Sheath testers & Fault Locators
SebKMT Cable Sheath testers & Fault Locators
 
30-ABRIL-2010-Factores asociados a la permanencia de los pacientes en el serv...
30-ABRIL-2010-Factores asociados a la permanencia de los pacientes en el serv...30-ABRIL-2010-Factores asociados a la permanencia de los pacientes en el serv...
30-ABRIL-2010-Factores asociados a la permanencia de los pacientes en el serv...
 
Guia rotaria para poner fin a la Polio
Guia rotaria para poner fin a la PolioGuia rotaria para poner fin a la Polio
Guia rotaria para poner fin a la Polio
 
Presentacion de internet !!!!
Presentacion de internet !!!!Presentacion de internet !!!!
Presentacion de internet !!!!
 
Southern Boating Media Kit 2009
Southern Boating Media Kit 2009Southern Boating Media Kit 2009
Southern Boating Media Kit 2009
 
Segunda de ralhp
Segunda de ralhpSegunda de ralhp
Segunda de ralhp
 
Curriculum Vitae (Espanol) Justin Scott Newberry Sisson
Curriculum Vitae (Espanol)   Justin Scott Newberry SissonCurriculum Vitae (Espanol)   Justin Scott Newberry Sisson
Curriculum Vitae (Espanol) Justin Scott Newberry Sisson
 
La Huella Imagen y Comunicaciones, la primera boutique comunicacional de Vene...
La Huella Imagen y Comunicaciones, la primera boutique comunicacional de Vene...La Huella Imagen y Comunicaciones, la primera boutique comunicacional de Vene...
La Huella Imagen y Comunicaciones, la primera boutique comunicacional de Vene...
 
Estudio de la influencia de elementos parásitos en el valor del roe de una an...
Estudio de la influencia de elementos parásitos en el valor del roe de una an...Estudio de la influencia de elementos parásitos en el valor del roe de una an...
Estudio de la influencia de elementos parásitos en el valor del roe de una an...
 
Sniffing HTTPS Using YAMAS | Lucideus Tech
Sniffing HTTPS Using YAMAS | Lucideus TechSniffing HTTPS Using YAMAS | Lucideus Tech
Sniffing HTTPS Using YAMAS | Lucideus Tech
 

Similar to Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays

BigData Hadoop
BigData Hadoop BigData Hadoop
BigData Hadoop
Kumari Surabhi
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Demi Ben-Ari
 
Hadoop Training Tutorial for Freshers
Hadoop Training Tutorial for FreshersHadoop Training Tutorial for Freshers
Hadoop Training Tutorial for Freshers
rajkamaltibacademy
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Codemotion
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
Tung Nguyen
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
Adaryl "Bob" Wakefield, MBA
 
Big data nyu
Big data nyuBig data nyu
Big data nyu
Edward Capriolo
 
Data Science Machine Lerning Bigdat.pptx
Data Science Machine Lerning Bigdat.pptxData Science Machine Lerning Bigdat.pptx
Data Science Machine Lerning Bigdat.pptx
Priyadarshini648418
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Codemotion
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Demi Ben-Ari
 
Big Data
Big DataBig Data
Big Data
Neha Mehta
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Demi Ben-Ari
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
markgrover
 
Solving the Database Problem
Solving the Database ProblemSolving the Database Problem
Solving the Database Problem
Jay Gordon
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
Debajani Mohanty
 
Processing Drone data @Scale
Processing Drone data @ScaleProcessing Drone data @Scale
Processing Drone data @Scale
Dr Hajji Hicham
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
James Serra
 
The Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystemsThe Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystems
taimur hafeez
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series Database
Pramit Choudhary
 

Similar to Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays (20)

BigData Hadoop
BigData Hadoop BigData Hadoop
BigData Hadoop
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
 
Hadoop Training Tutorial for Freshers
Hadoop Training Tutorial for FreshersHadoop Training Tutorial for Freshers
Hadoop Training Tutorial for Freshers
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
 

Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays

  • 1. Quick dive into the Big Data pool without drowning Demi Ben-Ari - VP R&D @ Panorays
  • 2. About Me Demi Ben-Ari, Co-Founder & VP R&D @ Panorays ● B.Sc. Computer Science – Academic College Tel-Aviv Yaffo ● Co-Founder “Big Things” Big Data Community In the Past: ● Sr. Data Engineer - Windward ● Team Leader & Sr. Java Software Engineer, Missile defense and Alert System - “Ofek” – IAF Interested in almost every kind of technology – A True Geek
  • 3. Agenda ● Basic Concepts ● Introduction to Big Data frameworks ● Distributed Systems => Problems ● Monitoring ● Conclusions
  • 4. Say “Distributed”, Say “Big Data”, Say….
  • 6. What is Big Data (IMHO)? ● Systems involving the “3 Vs”: What are the right questions we want to ask? ○ Volume - How much? ○ Velocity - How fast? ○ Variety - What kind? (Difference)
  • 7. What is Big Data (IMHO) ● Some define it as the “7 Vs”, adding: ○ Variability (constantly changing) ○ Veracity (accuracy) ○ Visualization ○ Value
  • 8. What is Big Data (IMHO) ● Characteristics ○ Multi-region availability ○ Very fast and reliable response ○ No single point of failure
  • 9. Why Not Relational Data ● Relational Model Provides ○ Normalized table schema ○ Cross table joins ○ ACID compliance (Atomicity, Consistency, Isolation, Durability) ● But at very high cost ○ Big Data table joins - billions of rows - massive overhead ○ Sharding tables across systems is complex and fragile ● Modern applications have different priorities ○ Needs for speed and availability take priority over consistency ○ Commodity server racks trump massive high-end systems ○ Real world need for transactional guarantees is limited
  • 10. What strategies help manage Big Data? ● Distribute data across nodes ○ Replication ● Relax consistency requirements ● Relax schema requirements ● Optimize data to suit actual needs
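The "distribute data across nodes" and "replication" strategies can be sketched in a few lines. This is only an illustration, not how any particular database does it: the function name `replica_nodes`, the node names, and the choice of simple modulo hashing (rather than the consistent hashing that production systems typically use) are all assumptions made for the example.

```python
import hashlib
from typing import List


def replica_nodes(key: str, nodes: List[str], replication_factor: int = 2) -> List[str]:
    """Pick the nodes that should store `key`.

    The primary is chosen by hashing the key (modulo the node count);
    the remaining replicas are the next nodes in list order.
    Hypothetical helper, for illustration only.
    """
    if not nodes:
        raise ValueError("need at least one node")
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    # Never ask for more replicas than there are nodes.
    count = min(replication_factor, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c"]  # hypothetical node names
    print(replica_nodes("user:42", cluster))
```

With a replication factor of 2, losing any single node still leaves one copy of every key, which is the point of the "no single point of failure" characteristic mentioned earlier.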
  • 11. What is the NoSQL landscape? ● 4 broad classes of non-relational databases (DB-Engines) ○ Graph: data elements each relate to N others in graph / network ○ Key-Value: keys map to arbitrary values of any data type ○ Document: document sets (JSON) queryable in whole or part ○ Wide Column Store (Column Family): keys mapped to sets of any number of typed columns ● Three key factors to help understand the subject ○ Consistency: Get identical results, regardless of which node is queried? ○ Availability: Respond to very high read and write volumes? ○ Partition tolerance: Still available when part of it is down?
  • 12. What is the CAP theorem? ● In distributed systems, consistency, availability and partition tolerance exist in a mutually dependent relationship. Pick any two. [Diagram: triangle of Consistency, Availability and Partition tolerance, with three groups of databases: MySQL, PostgreSQL, Greenplum, Vertica, Neo4J; Cassandra, DynamoDB, Riak, CouchDB, Voldemort; HBase, MongoDB, Redis, BigTable, BerkeleyDB. Legend: Graph, Key-Value, Wide Column, RDBMS]
  • 13. DB Engines - Comparison ● http://db-engines.com/en/ranking
  • 14. DB Engines - Comparison
  • 15. What does DevOps really mean? ● Development: Software Engineering, UX ● Operations: System Admin, Database Admin
  • 16. What does DevOps really mean? ● DevOps: Cross-functional teams, Operators automating systems, Developers operating systems
  • 18. Characteristics of Hadoop ● A system to process very large amounts of unstructured and complex data at the desired speed ● A system to run on a large number of machines that don’t share any memory or disk ● A system to run on a cluster of machines that can be put together at relatively low cost and with easier maintenance
  • 19. Hadoop Principles ● “A system to move the computation to where the data is” ● Key Concepts of Hadoop: Flexibility, Scalability, Low cost, Fault Tolerant
  • 20. Hadoop Core Components ● HDFS - Hadoop Distributed File System ○ Provides a distributed data storage system to store data in smaller blocks in a fail-safe manner ● MapReduce - Programming framework ○ Has the ability to take a query over a dataset, divide it and run it in parallel on multiple nodes ● YARN - (Yet Another Resource Negotiator) MRv2 ○ Splitting the MapReduce JobTracker’s responsibilities ■ Resource Manager (Global) ■ Application Master (Per application)
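The MapReduce idea of "divide it and run it in parallel" can be illustrated with the classic word count. The sketch below runs in a single Python process and only mimics the three phases (map, shuffle, reduce); it does not use Hadoop's API, and all function names are made up for the example. On a real cluster each phase would be spread across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big pool"]))
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the framework is free to run them on whichever node already holds the data.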
  • 21. Hadoop Ecosystem ● Hadoop Core: HDFS, MapReduce / YARN, Hadoop Common ● Hadoop Applications: Hive, Pig, HBase, Oozie, Zookeeper, Sqoop, Spark
  • 23. New Age BI Applications ● Able to understand various types of data ● Ability to clean the data ● Process data with applied rules locally and in distributed environment ● Visualize sizeable data with speed ● Extend results by sharing within the enterprise
  • 24. Big Data Analytics ● Processing large amounts of data without data movement ● Avoid data connectors if possible (run natively) ● Ability to understand a vast number of data types and data compressions ● Ability to process data on a variety of processing frameworks ● Distributed data processing ○ In-Memory a big plus ● Super fast visualization ○ In-Memory a big plus
  • 25. When to choose Hadoop? ● Large volumes of data to store and process ● Semi-Structured or Unstructured data ● Data is not well categorized ● Data contains a lot of redundancy ● Data arrives in streams or large batches ● Complex batch jobs arriving in parallel ● You don’t know how the data might be useful
  • 26. Distributed Systems => Problems https://imgflip.com/i/1ap5kr http://kingofwallpapers.com/otter/otter-004.jpg
  • 27. Monolith Structure [Diagram labels: OS, CPU, Memory, Disk, Processes, Java Application Server, Database, Web Server, Load Balancer, Users - Other Applications, Monitoring System, UI] Many times... all of this was on a single physical server!
  • 28. Distributed Microservices Architecture [Diagram labels: Service A, Queue, DB, Service B, DB, Cache, Cache, DB, Service C, Web Server, DB, Analytics Cluster, Master, Slave, Slave, Slave] Monitoring System???
  • 29. MongoDB + Spark [Diagram labels: Worker 1, Worker 2, …, Worker N, Spark Cluster, Master, Write, Read, Master, Sharded MongoDB, Replica Set]
  • 30. Cassandra + Spark [Diagram labels: Worker 1, Worker 2, …, Worker N, Cassandra Cluster, Spark Cluster, Write, Read]
  • 31. Cassandra + Serving [Diagram labels: Cassandra Cluster, Write, Read, UI Client (x4), Web Service (x4)]
  • 32. Problems ● Multiple physical servers ● Multiple logical services ● Want Scaling => More Servers ● Even if you had all of the metrics ○ You’ll have an overflow of the data ● Your monitoring becomes a “Big Data” problem itself
  • 33. This is what “Distributed” really Means The DevOps Guy (It might be you)
  • 37. AWS’s CloudWatch / GCP Stackdriver
  • 38. Report to Where? ● We chose: ● Graphite (InfluxDB) + Grafana ● Can correlate System and Application metrics in one place :)
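As a rough sketch of what reporting an application metric to Graphite can look like: Graphite's plaintext protocol accepts lines of the form `<metric path> <value> <timestamp>`, by default on TCP port 2003. The metric name, host and helper names below are assumptions for the example, not details taken from the talk.

```python
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one metric in Graphite's plaintext protocol: 'path value timestamp\\n'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(path: str, value: float,
                host: str = "localhost", port: int = 2003,
                timeout: float = 2.0) -> None:
    """Send a single metric over TCP. Host and port are assumed defaults."""
    payload = graphite_line(path, value).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical metric name; prints the line instead of sending it.
    print(graphite_line("app.pipeline.rows_written", 42, 1700000000), end="")
```

Sending system metrics and application metrics under one naming scheme is what makes it possible to correlate them on a single Grafana dashboard.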
  • 43. Ways to Monitor Spark ● Grafana-spark-dashboards ○ Blog: http://www.hammerlab.org/2015/02/27/monitoring-spark-with-graphite-and-grafana/ ● Spark UI - Online on each application running ● Spark History Server - Offline (After application finishes) ● Spark REST API ○ Querying via internal tools to do ad-hoc monitoring ● Back to the basics: dstat, iostat, iotop, jstack ● Blog post by Tzach Zohar - “Tips from the Trenches”
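A small ad-hoc check against the Spark REST API might look like the sketch below. It relies on the `/api/v1/applications/[app-id]/jobs` endpoint that Spark exposes under the application UI (and the History Server), and on the `jobId` and `status` fields of each job entry. The base URL, application id and helper names are placeholders, not values from the talk.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen


def jobs_url(base_url: str, app_id: str) -> str:
    """Build the jobs endpoint URL for one Spark application."""
    return f"{base_url.rstrip('/')}/api/v1/applications/{app_id}/jobs"


def failed_jobs(jobs: List[Dict[str, Any]]) -> List[int]:
    """Return the ids of jobs whose status is FAILED."""
    return [job["jobId"] for job in jobs if job.get("status") == "FAILED"]


def fetch_failed_jobs(base_url: str, app_id: str, timeout: float = 5.0) -> List[int]:
    """Query a running Spark UI or History Server and list failed job ids."""
    with urlopen(jobs_url(base_url, app_id), timeout=timeout) as resp:
        return failed_jobs(json.load(resp))


if __name__ == "__main__":
    # Canned response with hypothetical values, so the sketch runs without a cluster.
    sample = [{"jobId": 0, "status": "SUCCEEDED"}, {"jobId": 1, "status": "FAILED"}]
    print(failed_jobs(sample))
```

The result of such a check can be pushed to the same metrics store as everything else, so Spark job failures show up next to system metrics.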
  • 45. Data Questions? What should we measure? ● Did all of the computation occur? ○ Are there any data layers missing? ● How much data do we have? (Volume) ● Is all of the data in the Database? ● Data Quality Assurance
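The first two questions ("are any data layers missing?" and "how much data do we have?") can be turned into simple checks. The layer names, the baseline and the 20% tolerance below are invented for the example; a real pipeline would pick its own values.

```python
from typing import Iterable, List


def missing_layers(expected: Iterable[str], present: Iterable[str]) -> List[str]:
    """Return the expected data layers that were not produced, sorted by name."""
    return sorted(set(expected) - set(present))


def volume_ok(row_count: int, baseline: int, tolerance: float = 0.2) -> bool:
    """Check that today's row count is within `tolerance` of the usual baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return abs(row_count - baseline) / baseline <= tolerance


if __name__ == "__main__":
    # Hypothetical layer names and counts.
    print(missing_layers(["raw", "clean", "aggregated"], ["raw", "aggregated"]))
    print(volume_ok(row_count=90, baseline=100))
```

Reporting these results as metrics over time, rather than as one-off checks, is what lets the team follow the data's health the same way it follows CPU or memory.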
  • 46. Data Answers! ● The method doesn’t really matter, as long as you: ○ Can follow the results over time ○ Know your data flow and know what might fail ○ Make it easy for anyone to add more monitoring (for the ones that add the new data each time…) ○ Don’t trust others to add monitoring (it will always end up being the DevOps’s “fault”, and no monitoring will be applied)
  • 48. ELK - Elasticsearch + Logstash + Kibana http://www.digitalgov.gov/2014/05/07/analyzing-search-data-in-real-time-to-drive-decisions/
  • 50. Big Data - Are we there yet? ● “3 Vs”: What are the right questions we want to ask? ○ Volume - How much? ■ Can it run on a single machine in reasonable time? ○ Velocity - How fast? ■ Can a single machine handle the throughput? ○ Variety - What kind? (Difference) ■ Is your data not changing and varying? ● If the answer to most of the previous questions is “Yes”, think again about whether you want to add the complexity of “Big Data”
  • 51. Conclusions ● Think carefully before going into the “Big Data pool” ○ See if you really have a problem that you’re trying to solve ○ It’s not a silver bullet ● Take measures to automate and monitor everything ● Having Clusters and distributed frameworks will cost a lot - eventually ● Fit your storage layer(s) to the needs
  • 53. ● LinkedIn ● Twitter: @demibenari ● Blog: http://progexc.blogspot.com/ ● demi.benari@gmail.com ● “Big Things” Community Meetup, YouTube, Facebook, Twitter ● GDG Cloud