1©2017 Check Point Software Technologies Ltd.
BIG DATA FORUM
ALEXANDER FOK, BIG DATA ARCHITECT
MARCH 2017 MEETING
Agenda
• Big Data Forum Scope
• Big Data Projects in CP
• NoSQL Overview
Big Data Forum Scope
• Engineers talk about Engineering
• Bring ideas, dilemmas, problems, technologies, reference architectures
• Ask QUESTIONS
• What is a Big Data problem?
Simple Exercise
• Which is more expensive?
• Network IO
• Disk IO
• RAM Memory IO
• CPU Cache IO
• CPU Context switch
• Google interview question
How long would it take to copy 1TB file?
• USB 2 ~480 Mbps
• Network 1 Gbps
• SATA drive 6 Gbps
• SAS 12 Gbps
• SSD PCIe 10 Gbps
• Disk to disk: 1TB/500Mbps = ~5h
• Over LAN: 1TB/250Mbps = ~10h
• Over WAN: 1TB/100Mbps = ~25h
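The arithmetic behind these estimates can be reproduced with a few lines of Python. This is only a sketch: the helper name `transfer_hours` is made up for illustration, 1 TB is assumed to be 10^12 bytes, and the rates are the effective throughputs quoted on the slide, treated as sustained.

```python
def transfer_hours(size_bytes: float, rate_bits_per_s: float) -> float:
    """Hours needed to move size_bytes at a sustained rate given in bits per second."""
    return size_bytes * 8 / rate_bits_per_s / 3600


TB = 10**12    # decimal terabyte (assumption)
MBPS = 10**6   # one megabit per second

# Effective throughputs taken from the slide
scenarios = {
    "disk to disk": 500 * MBPS,
    "over LAN": 250 * MBPS,
    "over WAN": 100 * MBPS,
}

for name, rate in scenarios.items():
    print(f"{name}: {transfer_hours(TB, rate):.1f} h")
```

Under these assumptions the exact values come out a little below the rounded figures on the slide, which is consistent with the slide rounding up.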
Some World Numbers
• NYSE – 1TB/day
• Airbus A380 640TB/flight (10GB/s)
• Verizon – 1M Events/s, 20 B/day
Big Data Forum for You
• What is Big Data Forum for you?
• What is Big Data for you?
• What do you expect from this forum?
• What are you willing to contribute to this forum?
Some CP Projects
• Threat Cloud infrastructure – distributed threat DBs handling hundreds of millions of threat indicators (IPs, files, domains, etc.)
• Threat Cloud DataLake
• MTP infrastructure – AWS based applications store
• GWs TI Access Logs (AV, URL filtering, AB, etc.) – 300 GB/day, 20K events/s
• CPDiag – 300 GB/day of diagnostic data from CPDiag-enabled GWs
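Numbers like "300 GB/day, 20K events/s" can be turned into a quick sizing estimate. The sketch below is illustrative only: the function names are made up, 1 GB is assumed to be 10^9 bytes, and the event rate is assumed to be a sustained daily average rather than a peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GB = 10**9  # decimal gigabyte (assumption)


def events_per_day(events_per_second: float) -> float:
    """Daily event count, assuming the rate is a sustained average."""
    return events_per_second * SECONDS_PER_DAY


def avg_event_bytes(bytes_per_day: float, events_per_second: float) -> float:
    """Average size of one event implied by a daily volume and an event rate."""
    return bytes_per_day / events_per_day(events_per_second)


daily_events = events_per_day(20_000)
event_size = avg_event_bytes(300 * GB, 20_000)
print(f"{daily_events:,.0f} events/day, about {event_size:.0f} bytes per event")
```

If the rate on the slide is a peak rather than an average, the real daily count is lower and the implied event size is larger.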
Rule of Thumb
• If you can solve the problem with more RAM – do it
• If you can solve the problem with more CPUs (threads) – do it
• Else – you have to distribute the solution
Machine Learning vs Big Data Problems
[Figure: diagram relating "Big Data" and "Machine Learning"]
Big Data Three V’s Challenges
• Volume – a lot of data to collect and make accessible
• Velocity – data must be processed at a rapid pace (correlations, enrichments, etc.)
• Variety – no predefined data schema (veracity)
Big Data Three V’s Challenges in Real Life
• Sizing
• Duplication prevention
• Correlation
• Data Integrity, Consistency
• Visualization
• Integration with other systems
• Retention Policy, Distribution, Regulations, Security
• Storage and Backup
Main Challenge - How can we handle 10, 100, or 1000 times more load?
NoSQL Selected Problems
• Scale out architecture
• Cheap storage
• Reasonable computation times for various data exploration scenarios:
̶ Key-value lookup
̶ Document search
̶ Generic filtering and aggregation analysis - batch processing
̶ Interactive queries
• Stream Processing
̶ (Near) Real Time Complex Events Processing
NoSQL Trend Overview
• Databases that provide mechanisms for storing and retrieving data modeled in ways other than the tabular relations used by RDBMSs
• Data collection, visualization, access management – are not always part of the solution ecosystem
• What now? Everybody is looking for SQL over NoSQL, or NewSQL
• Compromised Consistency (CAP Theorem)
• How many NoSQL DBs are there?
CAP (Brewer's) Theorem
NoSQL – Where To?
• We have a lot of data
• What next?
• Emerging Computational Models
NoSQL Types
• Key-Value stores
• Document stores
• Search Engines
• Column stores
• Graph DBs
• Time Series
• RDF, Object Oriented, Multivalue
• Cloud Provided Solutions
Key-Value Stores
• Map concept – key -> value
̶ Value can be any object (blob)
• Cache – in memory
• Store – has a solid persistence model
• Additional query mechanisms (secondary indexes, time series)
• Abstract Data Types (data structures and the algorithms on them) – sets, hashes, lists, queues, etc.
• Distribution
• Aerospike, Couchbase, Dynamo, Redis, Riak, HBase, Cassandra
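The "cache" flavor of the map concept can be sketched in a few lines of Python. This is a toy, not how any of the products above work: the class name `TinyKV`, its methods, and the per-key expiry (TTL) are assumptions made for illustration, and there is no persistence or distribution.

```python
import time


class TinyKV:
    """Toy in-memory key-value cache with optional per-key expiry (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable clock, handy for testing

    def put(self, key, value, ttl=None):
        """Store any object under key; ttl is a lifetime in seconds, or None."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        """Return the stored value, or default if the key is missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazy eviction on read
            return default
        return value


kv = TinyKV()
kv.put("ip:203.0.113.7", {"verdict": "malicious"}, ttl=60)
print(kv.get("ip:203.0.113.7"))
```

A "store" in the sense of the slide would add a persistence model on top of this, for example an append-only log replayed at startup.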
Document stores
• Semi-structured data store
• Key-value retrieval and APIs based on document properties
• Collections, tags, metadata, etc.
• Search engines
• Couchbase (CouchDB + Memcached), MongoDB, Riak, ElasticSearch
Search Engines
• Limited Document Stores with poor raw data storage capabilities
• Strong indexing mechanisms at ingestion time
• Complex query capabilities
• ElasticSearch, Solr, Splunk
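Indexing at ingestion time is commonly done with an inverted index, which maps each term to the documents containing it. The sketch below is a minimal illustration with made-up names (`InvertedIndex`, `add`, `search`) and naive whitespace tokenization; it does not describe the internals of any product listed above.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: term -> set of document ids (hypothetical sketch)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index a document at ingestion time; tokenization is a plain split."""
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every term of the query (AND)."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self._postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return result


index = InvertedIndex()
index.add(1, "malicious domain blocked")
index.add(2, "benign domain allowed")
print(index.search("malicious domain"))
```

The work is paid once per document at write time, so a query only intersects small sets instead of scanning raw data.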
Column stores
• Good for append only scenarios
• Good for batch data insertion
• Good for analytic queries that read only part (a few columns) of a massive data set
• Bad for updates
• Bad for searches for specific objects
• Bad for real time analytics
• Vertica, Greenplum, Cassandra, File systems based solutions
• OLAP usage, except Cassandra
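The trade-offs above follow from the storage layout. The sketch below contrasts a row layout with a column layout using plain Python lists; the sample records and the helper names `to_columns` and `get_row` are invented for illustration, and every row is assumed to have the same fields.

```python
# Row layout: one record per entry (sample data, invented for illustration)
rows = [
    {"ts": 1, "gw": "gw-a", "bytes": 120},
    {"ts": 2, "gw": "gw-b", "bytes": 300},
    {"ts": 3, "gw": "gw-a", "bytes": 80},
]


def to_columns(records):
    """Pivot row records into one list per column (assumes identical fields)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def get_row(columns, i):
    """Rebuild a single record; this has to touch every column."""
    return {name: values[i] for name, values in columns.items()}


columns = to_columns(rows)

# An aggregation reads only the column it needs.
total_bytes = sum(columns["bytes"])
print(total_bytes)

# Fetching one specific object means visiting all columns.
print(get_row(columns, 1))
```

Appending a batch extends each column list at the end, while updating or fetching one record spreads the work across all columns, which matches the good and bad cases listed on the slide.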
Graph DBs
• Graph traversal queries
̶ Social networks, recommendation engines
• Data Modeling - not about implementation mechanism
• Scale problems
• Query Standardization issues – Cypher, SPARQL, XQuery, Gremlin
• Neo4j, Titan
• Modern approach – computational engine oriented solutions – Spark Graph, ELK Graph
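A typical traversal query behind a recommendation engine is "friends of friends". The sketch below runs it over an adjacency map in plain Python; the sample graph and the function name `recommend` are invented for illustration and say nothing about how Neo4j or the other engines execute such queries.

```python
from collections import Counter

# Undirected friendship graph as an adjacency map (sample data)
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}


def recommend(graph, user):
    """Rank friends-of-friends by the number of mutual friends (two-hop traversal)."""
    direct = graph.get(user, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically for stable output
    return sorted(counts, key=lambda c: (-counts[c], c))


print(recommend(friends, "ann"))
```

Each extra hop multiplies the number of edges visited, which is one reason graph workloads run into the scale problems mentioned above.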
Time Series
• Used for analyzing meters and performance counters, e.g. monitoring systems (CPU, RAM, etc.)
• Riak TS, TSDB
• https://prometheus.io/
• http://influxdb.com/
• http://opentsdb.net/
• https://github.com/kairosdb/kairosdb
Cloud Provided Solutions
• Azure Tables
• Amazon DynamoDB
• Google BigTable, Spanner
Tip of The Day
• Apache is a cemetery full of dead bodies
̶ No one-size-fits-all solution/technology/architecture
̶ COOL is not always RIGHT
• http://nosql-database.org/
• http://db-engines.com
NoSQL Adaptation Barriers
• Use of low-level languages for queries (C++, Java, Python) and tools (Kibana, Jupyter Notebooks)
• Lack of standardization
• Integrative challenge
• No one-size-fits-all solutions – REAL NEED to KNOW and UNDERSTAND the underlying technology
Yuval Noah Harari on Big Data, Google and the end of free will
http://www.ft.com/cms/s/2/50bb4830-6a4c-11e6-ae5b-a7cc5dd5a28c.html
“Listen, Google,”
• “Well, I know you from the day you were born. I have read all your emails, recorded all your phone calls, and
know your favourite films, your DNA and the entire biometric history of your heart. I have exact data about each
date you went on, and I can show you second-by-second graphs of your heart rate, blood pressure and sugar
levels whenever you went on a date with John or Paul. And, naturally enough, I know them as well as I know you.
Based on all this information, on my superb algorithms and on decades’ worth of statistics about millions of
relationships — I advise you to go with John, with an 87 per cent probability of being more satisfied with him in
the long run.
• “Indeed, I know you so well that I even know you don’t like this answer. Paul is much more handsome than John
and, because you give external appearances too much weight, you secretly wanted me to say ‘Paul’. Looks
matter, of course, but not as much as you think. Your biochemical algorithms — which evolved tens of thousands
of years ago in the African savannah — give external beauty a weight of 35 per cent in their overall rating of
potential mates. My algorithms — which are based on the most up-to-date studies and statistics — say that looks
have only a 14 per cent impact on the long-term success of romantic relationships. So, even though I took Paul’s
beauty into account, I still tell you that you would be better off with John.”
Alexander Fok, Big Data Architect
THANK YOU

Choosing The Best AWS Service For Your Website + API.pptx
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 

Check Point Big Data Forum m3

  • 1. BIG DATA FORUM • Alexander Fok, Big Data Architect • March 2017 meeting • ©2017 Check Point Software Technologies Ltd.
  • 2. Agenda • Big Data Forum Scope • Big Data Projects in CP • NoSQL Overview
  • 3. Big Data Forum Scope • Engineers talk about engineering • Bring ideas, dilemmas, problems, technologies, reference architectures • Ask QUESTIONS • What is a Big Data problem?
  • 4. Simple Exercise • Which is more expensive? • Network IO • Disk IO • RAM IO • CPU cache IO • CPU context switch • (A Google interview question)
  • 5. How long would it take to copy a 1TB file? • USB 2 ~480 Mbps • Network 1 Gbps • SATA drive 6 Gbps • SAS 12 Gbps • SSD PCIe 10 Gbps • Disk to disk: 1TB / 500 Mbps = ~5h • Over LAN: 1TB / 250 Mbps = ~10h • Over WAN: 1TB / 100 Mbps = ~25h
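
The copy-time estimates on slide 5 can be reproduced with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and a perfectly sustained rate with no protocol overhead; the function name `copy_hours` is illustrative only. Under those assumptions the results come out slightly below the slide's figures, which appear to be rounded up.

```python
TB = 10**12  # assumption: decimal terabyte, in bytes


def copy_hours(size_bytes: float, rate_mbps: float) -> float:
    """Hours needed to move size_bytes at a sustained rate given in megabits/s."""
    bits = size_bytes * 8
    seconds = bits / (rate_mbps * 10**6)
    return seconds / 3600


# Effective rates taken from the slide: disk to disk, LAN, WAN.
for label, mbps in [("disk to disk", 500), ("over LAN", 250), ("over WAN", 100)]:
    print(f"{label}: {copy_hours(TB, mbps):.1f} h")
```

The effective rates (500, 250, 100 Mbps) are far below the nominal interface speeds listed on the same slide, which is the point of the exercise: sustained throughput, not the interface rating, determines the copy time.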
  • 6. Some World Numbers • NYSE – 1TB/day • Airbus A380 – 640TB/flight (10GB/s) • Verizon – 1M events/s, 20 B/day
  • 7. Big Data Forum for You • What is the Big Data Forum for you? • What is Big Data for you? • What do you expect from this forum? • What are you willing to contribute to this forum?
  • 8. Some CP Projects • Threat Cloud infrastructure – distributed threat DBs handling hundreds of millions of threat indicators (IPs, files, domains, etc.) • Threat Cloud DataLake • MTP infrastructure – AWS-based applications store • GW TI access logs (AV, URL filtering, AB, etc.) – 300 GB/day, 20K events/s • CPDiag – 300 GB/day of diagnostic data from CPDiag-enabled GWs
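
As a sanity check on the access-log figures in slide 8, the two numbers can be combined into an average event size and then scaled. This is only a sketch: it assumes decimal gigabytes, and it assumes that 300 GB/day and 20K events/s describe the same stream at a sustained average rate, which the slide does not state. The helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GB = 10**9   # assumption: decimal gigabyte
TB = 10**12  # assumption: decimal terabyte


def avg_event_bytes(bytes_per_day: float, events_per_s: float) -> float:
    """Average bytes per event if the event rate is sustained all day."""
    return bytes_per_day / (events_per_s * SECONDS_PER_DAY)


def scaled_tb_per_day(bytes_per_day: float, factor: int) -> float:
    """Daily volume in TB after multiplying the load by factor."""
    return bytes_per_day * factor / TB


daily = 300 * GB
print(f"average event size: {avg_event_bytes(daily, 20_000):.0f} bytes")
for factor in (10, 100, 1000):
    print(f"x{factor}: {scaled_tb_per_day(daily, factor):.0f} TB/day")
```

Under these assumptions an event averages under 200 bytes, and a 1000-fold load increase would mean hundreds of terabytes per day.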
  • 9. Rule of Thumb • If you can solve the problem with more RAM – do it • If you can solve the problem with more CPUs (threads) – do it • Else – you have to distribute the solution
  • 10. Machine Learning vs Big Data Problems • Big Data • Machine Learning
  • 11. Big Data Three V’s Challenges • Volume – a lot of data to collect and make accessible • Velocity – data processed at a rapid pace (correlations, enrichments, etc.) • Variety – no predefined data schema (veracity)
  • 12. Big Data Three V’s Challenges in Real Life • Sizing • Duplication prevention • Correlation • Data integrity, consistency • Visualization • Integration with other systems • Retention policy, distribution, regulations, security • Storage and backup • Main challenge: how can we handle 10, 100, 1000 times more load?
  • 13. NoSQL Selected Problems • Scale-out architecture • Cheap storage • Reasonable computation times for various data exploration scenarios: - Key-value lookup - Document search - Generic filtering and aggregation analysis (batch processing) - Interactive queries • Stream processing: - (Near) real-time complex event processing
  • 14. NoSQL Trend Overview • A database providing a mechanism for storage and retrieval of data other than the tabular relations used by RDBMSs • Data collection, visualization and access management are not always part of the solution ecosystem • What now – everybody is looking for SQL over NoSQL, or NewSQL • Compromised consistency (CAP theorem) • How many NoSQL DBs are there?
  • 15. CAP (Brewer's) Theorem
  • 16. NoSQL – Where To? • We have a lot of data • What next? • Emerging computational models
  • 17. NoSQL Types • Key-value stores • Document stores • Search engines • Column stores • Graph DBs • Time series • RDF, object-oriented, multivalue • Cloud-provided solutions
  • 18. Key-Value Stores • Map concept – key->value - Value can be any object (blob) • Cache – in memory • Store – has a solid persistency model • Additional query mechanisms (secondary indexes, time series) • Abstract data types (data structures and algorithms on them) – sets, hashes, lists, queues, etc. • Distribution • Aerospike, Couchbase, Dynamo, Redis, Riak, HBase, Cassandra
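
The map concept and the secondary-index idea from slide 18 can be illustrated with a toy in-memory store. This is a minimal sketch and not the API of any product named on the slide; `TinyKVStore`, `index_fn` and `keys_where` are made-up names, and persistence and distribution are deliberately left out.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Optional, Set


class TinyKVStore:
    """In-memory key->value map with one optional secondary index (toy example)."""

    def __init__(self, index_fn: Optional[Callable[[Any], Hashable]] = None) -> None:
        self._data: Dict[str, Any] = {}
        self._index_fn = index_fn  # derives the indexed attribute from a value
        self._index: Dict[Hashable, Set[str]] = defaultdict(set)

    def put(self, key: str, value: Any) -> None:
        if self._index_fn is not None:
            if key in self._data:
                # Overwrite: drop the key from its old index bucket first.
                self._index[self._index_fn(self._data[key])].discard(key)
            self._index[self._index_fn(value)].add(key)
        self._data[key] = value

    def get(self, key: str) -> Any:
        """Primary access path: direct lookup by key."""
        return self._data.get(key)

    def keys_where(self, index_value: Hashable) -> List[str]:
        """Secondary access path: all keys whose value maps to index_value."""
        return sorted(self._index.get(index_value, set()))


store = TinyKVStore(index_fn=lambda v: v["kind"])
store.put("1.2.3.4", {"kind": "ip", "score": 90})
store.put("evil.example", {"kind": "domain", "score": 75})
store.put("5.6.7.8", {"kind": "ip", "score": 40})
print(store.get("1.2.3.4"))
print(store.keys_where("ip"))
```

The store never inspects the value itself; only the caller-supplied `index_fn` does, which keeps the "value can be any object" property while still allowing a lookup by something other than the key.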
  • 19. Document Stores • Semi-structured data store • Key-value retrieval and APIs based on document properties • Collections, tags, metadata, etc. • Search engines • Couchbase (CouchDB + Memcached), MongoDB, Riak, Elasticsearch
  • 20. Search Engines • Limited document stores with poor raw data storage capabilities • Strong indexing mechanisms at ingestion time • Complex query capabilities • Elasticsearch, Solr, Splunk
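
"Indexing at ingestion time" on slide 20 can be pictured as an inverted index: each document is tokenized once when it arrives, so queries become set operations over pre-built posting lists. The sketch below is a deliberately simplified illustration with made-up names (`InvertedIndex`, `ingest`, `search_all`); real engines add analysis, scoring and on-disk segment structures that are not shown here.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


class InvertedIndex:
    """Maps each term to the set of document ids that contain it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def ingest(self, doc_id: str, text: str) -> None:
        # The indexing cost is paid here, once, when the document arrives.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self._postings[term].add(doc_id)

    def search_all(self, *terms: str) -> List[str]:
        """Ids of documents containing every given term (AND query)."""
        if not terms:
            return []
        sets = [self._postings.get(t.lower(), set()) for t in terms]
        return sorted(set.intersection(*sets))


index = InvertedIndex()
index.ingest("log-1", "URL filtering blocked access to evil.example")
index.ingest("log-2", "AV blocked a malicious file")
index.ingest("log-3", "URL filtering allowed access")
print(index.search_all("blocked"))
print(index.search_all("url", "blocked"))
```

Note that the index keeps only terms and ids, not the original text, which mirrors the slide's remark about poor raw data storage capabilities.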
  • 21. Column Stores • Good for append-only scenarios • Good for batch data insertion • Good for analytics queries requiring massive partial reads of data • Bad for updates • Bad for searches for specific objects • Bad for real-time analytics • Vertica, Greenplum, Cassandra, file-system-based solutions • OLAP usage, except Cassandra
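
The trade-offs listed on slide 21 follow from the storage layout. The sketch below contrasts a row layout with a column layout using plain Python lists; it assumes every record has the same fields, and the field names (`ts`, `gw`, `bytes`) are invented for illustration.

```python
from typing import Any, Dict, List

# Row layout: each record is stored together.
rows: List[Dict[str, Any]] = [
    {"ts": 1, "gw": "gw-a", "bytes": 100},
    {"ts": 2, "gw": "gw-b", "bytes": 250},
    {"ts": 3, "gw": "gw-a", "bytes": 50},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Column layout: one contiguous list per field, aligned by position."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def fetch_record(columns: Dict[str, List[Any]], position: int) -> Dict[str, Any]:
    """Reassembling one object has to touch every column."""
    return {name: values[position] for name, values in columns.items()}


columns = to_columns(rows)

# Analytics query: only the "bytes" column is read, the others are never touched.
total_bytes = sum(columns["bytes"])
print(total_bytes)

# Specific-object lookup: every column is visited to rebuild a single record.
print(fetch_record(columns, 1))
```

An aggregate over one field reads a single list, which is why partial reads are cheap, while fetching or updating one object spans all columns, which is why the slide marks those cases as bad.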
  • 22. Column Stores
  • 23. Graph DBs • Graph traversal queries - Social networks, recommendation engines • Data modeling – not about implementation mechanism • Scale problems • Query standardization issues – Cypher, SPARQL, XQuery, Gremlin • Neo4j, Titan • Modern approach – computational-engine-oriented solutions – Spark Graph, ELK Graph
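
A typical graph traversal query from slide 23, a "people you may know" recommendation, can be sketched over a plain adjacency map. This is only an illustration of the traversal itself, not of any query language listed on the slide; the function name `recommend` and the sample graph are assumptions.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend(graph: Dict[str, Set[str]], user: str) -> List[str]:
    """Rank friends-of-friends by the number of mutual friends (two-hop traversal)."""
    friends = graph.get(user, set())
    mutual_counts: Counter = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            # Skip the user and people who are already direct friends.
            if candidate != user and candidate not in friends:
                mutual_counts[candidate] += 1
    # Most mutual friends first; ties are broken alphabetically.
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked]


graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}
print(recommend(graph, "ann"))
```

On a single machine this is a pair of nested loops; the scale problems mentioned on the slide arise when the adjacency map no longer fits on one node and each hop may cross the network.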
  • 24. Time Series • Used for analysis of meters and performance counters, e.g. monitoring systems (CPU, RAM, etc.) • Riak TS, TSDB • https://prometheus.io/ • http://influxdb.com/ • http://opentsdb.net/ • https://github.com/kairosdb/kairosdb
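
A common operation on the monitoring data described in slide 24 is downsampling: raw samples are grouped into fixed time windows and reduced to one value per window. The sketch below assumes samples are `(unix_seconds, value)` pairs and uses a plain average; the function name `downsample_avg` is hypothetical and not taken from any of the listed systems.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def downsample_avg(samples: Iterable[Tuple[int, float]], window_s: int) -> Dict[int, float]:
    """Average the values in each window, keyed by the window start time."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        window_start = ts - ts % window_s  # align the timestamp to its window
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# CPU utilisation samples taken every 30 seconds (made-up values).
cpu_samples = [(0, 10.0), (30, 20.0), (60, 40.0), (90, 60.0), (120, 80.0)]
print(downsample_avg(cpu_samples, 60))
```

Because each sample lands in exactly one window and windows are keyed by time, this kind of data is append-only and naturally ordered, which is what dedicated time-series stores are built around.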
  • 25. Cloud-Provided Solutions • Azure Tables • Amazon DynamoDB • Google Bigtable, Spanner
  • 26. Tip of the Day • Apache is a cemetery full of dead bodies - No one-size-fits-all solution/technology/architecture - COOL is not always RIGHT • http://nosql-database.org/ • http://db-engines.com
  • 27. NoSQL Adaptation Barriers • Use of low-level languages for queries (C++, Java, Python) and tools (Kibana, Jupyter Notebooks) • Lack of standardization • Integration challenge • No one-size-fits-all solutions – REAL NEED to KNOW and UNDERSTAND the underlying technology
  • 28. Yuval Noah Harari on Big Data, Google and the end of free will http://www.ft.com/cms/s/2/50bb4830-6a4c-11e6-ae5b-a7cc5dd5a28c.html
  • 29. “Listen, Google,” • “Well, I know you from the day you were born. I have read all your emails, recorded all your phone calls, and know your favourite films, your DNA and the entire biometric history of your heart. I have exact data about each date you went on, and I can show you second-by-second graphs of your heart rate, blood pressure and sugar levels whenever you went on a date with John or Paul. And, naturally enough, I know them as well as I know you. Based on all this information, on my superb algorithms and on decades’ worth of statistics about millions of relationships – I advise you to go with John, with an 87 per cent probability of being more satisfied with him in the long run. • “Indeed, I know you so well that I even know you don’t like this answer. Paul is much more handsome than John and, because you give external appearances too much weight, you secretly wanted me to say ‘Paul’. Looks matter, of course, but not as much as you think. Your biochemical algorithms – which evolved tens of thousands of years ago in the African savannah – give external beauty a weight of 35 per cent in their overall rating of potential mates. My algorithms – which are based on the most up-to-date studies and statistics – say that looks have only a 14 per cent impact on the long-term success of romantic relationships. So, even though I took Paul’s beauty into account, I still tell you that you would be better off with John.”
  • 30. THANK YOU • Alexander Fok, Big Data Architect

Editor's Notes

  1. OLTP vs OLAP?
  2. ~200 NoSQL DBs