In-Memory Computing 
Srinath Perera 
Director, Research 
WSO2 Inc.
Performance Numbers (based on Jeff Dean’s numbers)

Operation                              Mem Ops / Sec   If Memory access is a Second
L1 cache reference                     0.05            1/20th sec
Main memory reference                  1               1 sec
Send 2K bytes over 1 Gbps network      200             3 min
Read 1 MB sequentially from memory     2500            41 min
Disk seek                              1*10^5          27 hours
Read 1 MB sequentially from disk       2*10^5          2 days
Send packet CA->Netherlands->CA        1.5*10^6        17 days

Operation                 Speed MB/sec
Hadoop Select             3
Terasort Benchmark        18
Complex Query Hadoop      0.2
CEP                       60
CEP Complex               2.5
SSD                       300-500
Disk                      50-100
Latency Lags Bandwidth 
• Observation in Prof. Patterson’s keynote in 2004
• Bandwidth improves, but latency does not
• The same holds now, and the gap is widening with new systems
Handling Speed Differences in Memory 
Hierarchy 
1. Caching 
– E.g. Processor caches, file cache, 
disk cache, permission cache 
2. Replication 
– E.g. RAID, Content Distribution 
Networks (CDN), Web Cache 
3. Prediction – Predict what data will be needed and prefetch it
– Trades off bandwidth
– E.g. disk caches, Google Earth
The above three do not always work
• Limitations
– Caching works only if the working set is small
– Prefetching works only when access patterns are predictable
– Replication is expensive and limited by the receiving-side machines
• Let's assume you are reading and filtering 10 GB of data (assuming 6 bytes per record, that is about 1.7 billion records)
– About 3 minutes to read the data from disk
– 35 ms to filter 10M records on my laptop => about 6 seconds to process all data
– Keeping data in memory can give about a 30X speedup
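The estimate on this slide can be reproduced with a few lines of arithmetic. The sketch below is only a back-of-envelope check: it assumes "6b" means 6 bytes per record, takes 50 MB/sec (the low end of the disk figure in the performance table) as the disk throughput, and scales the 35 ms per 10M records linearly. The constant and function names are made up for illustration.

```python
# Back-of-envelope estimate for scanning and filtering a data set.
# All constants are assumptions taken from the slides or chosen for illustration.
DATA_BYTES = 10 * 10**9        # 10 GB of data
RECORD_BYTES = 6               # assumed record size ("6b" read as 6 bytes)
DISK_MB_PER_SEC = 50           # low end of the 50-100 MB/sec disk figure
FILTER_SEC_PER_10M = 0.035     # 35 ms to filter 10 million records


def estimate(data_bytes=DATA_BYTES):
    """Return (record count, disk read seconds, filter seconds)."""
    records = data_bytes / RECORD_BYTES
    read_sec = data_bytes / (DISK_MB_PER_SEC * 10**6)
    filter_sec = records / 10**7 * FILTER_SEC_PER_10M
    return records, read_sec, filter_sec


records, read_sec, filter_sec = estimate()
print(f"records:     {records / 1e9:.2f} billion")
print(f"disk read:   {read_sec / 60:.1f} min")
print(f"filter:      {filter_sec:.1f} s")
print(f"read/filter: {read_sec / filter_sec:.0f}x")
```

Under these assumptions the scan is dominated by the disk read (a bit over 3 minutes against roughly 6 seconds of filtering), which is where the "about 30X" figure comes from.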
Data Access Patterns in Big Data Applications 
• Read from Disk, process once (Basic Analytics)
– Data can be prefetched; batch load is only about 100 times faster.
– OK if processing time > data read time
• Read from Disk, iteratively Process (Machine Learning Algos, e.g. K-Means)
– Need to load data from disk once and process (e.g. Spark supports this)
• Interactive (OLAP) 
– Queries are random, data may be scattered. Once query started, can load 
data to memory and process 
• Random Access (e.g. Graph Processing) 
– Very hard to optimize 
• Realtime Access 
– As data comes in
In-Memory Computing
Four Myths 
• Myths 
– Too expensive: a 1TB RAM cluster costs 20-40k (about 1$/GB)
– It is not durable 
– Flash is fast enough 
– It is about In-Memory DBs 
• From Nikita Ivanov’s post
– http://gridgaintech.wordpress.com/2013/09/18/four-myths-of-in-memory-computing/
Let us look at each Big data access 
pattern and where In-Memory 
Computing can make a difference
Access Pattern 1: Read from Disk, Process Once
• With 60MB chunks, processing takes Tp = 35ms vs. Td = 1.2 sec to read from disk, so keeping all data in memory gives about a 30X speedup
• However, this benefit is smaller if the computation is more complex (e.g. Sort)
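A small model makes the second bullet concrete. The sketch below assumes that reading a chunk and processing it do not overlap, so the disk-based cost per chunk is Td + Tp and the in-memory cost is Tp. The function name and the two heavier workloads are hypothetical; only Tp = 35 ms and Td = 1.2 sec come from the slide.

```python
def in_memory_speedup(t_process, t_disk):
    """Speedup from skipping the disk read, assuming read and processing do not overlap."""
    return (t_disk + t_process) / t_process


T_DISK = 1.2  # seconds to read one 60 MB chunk (Td from the slide)

# The first workload uses Tp from the slide; the other two are made-up heavier jobs.
workloads = [("filter", 0.035), ("heavier job", 0.35), ("very heavy job", 3.5)]
for label, t_process in workloads:
    speedup = in_memory_speedup(t_process, T_DISK)
    print(f"{label:15s} Tp={t_process:5.3f}s  speedup={speedup:5.1f}x")
```

With the slide's numbers the speedup is roughly 35x. Once processing takes longer than the read itself, the gain from keeping data in memory drops toward 1x.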
Access Pattern 2: Read from Disk, iteratively Process 
• Very common pattern for machine learning algorithms (e.g. K-Means)
• In this case, the advantages are greater
– If we cannot hold the data fully in memory, we need to offload it
– Then we need to read it again on every iteration
– The cost of repeatedly loading and processing is then very high, and in-memory computing is much faster
• Spark lets you load the data fully into memory and process it
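The effect of iteration can be sketched as a simple cost model. It assumes that a disk-based job re-reads the data on every pass while an in-memory job pays the load cost once; the per-chunk times reuse Tp = 35 ms and Td = 1.2 sec from the previous slide. The function names are illustrative and do not correspond to any Spark API.

```python
def disk_based_cost(iterations, t_load, t_process):
    # Data does not stay in memory: every pass re-reads it from disk.
    return iterations * (t_load + t_process)


def in_memory_cost(iterations, t_load, t_process):
    # Data is loaded once, then every pass runs against memory.
    return t_load + iterations * t_process


T_LOAD, T_PROCESS = 1.2, 0.035  # seconds per 60 MB chunk, from the earlier slide
for iterations in (1, 10, 100):
    disk = disk_based_cost(iterations, T_LOAD, T_PROCESS)
    memory = in_memory_cost(iterations, T_LOAD, T_PROCESS)
    print(f"{iterations:4d} iterations: disk={disk:7.2f}s "
          f"memory={memory:6.2f}s ratio={disk / memory:4.1f}x")
```

In this model a single pass gains nothing, because the data has to be loaded either way, while the ratio grows with every additional iteration. That is one way to read "advantages are greater" for algorithms such as K-Means.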
Spark 
• New programming model built on functional programming concepts
• Can be much faster for recursive use cases
• Has a complete stack of products

    // Word count; `spark` is assumed to be a SparkContext
    val file = spark.textFile("hdfs://...")
    val counts = file.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
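For readers unfamiliar with the functional style, the same word count can be written in plain Python without Spark. This is only a single-machine sketch of what the three steps compute, not how Spark executes them; the function name `word_count` and the sample lines are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    # flatMap: split every line and flatten the results into one stream of words
    words = chain.from_iterable(line.split(" ") for line in lines)
    # map: pair every word with the count 1
    pairs = ((word, 1) for word in words)
    # reduceByKey(_ + _): add up the counts that share the same word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


print(word_count(["to be or", "not to be"]))
```

In Spark the same steps run in parallel over partitions of the file, and intermediate results can be kept in memory between stages.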
Access Pattern 3: Interactive Queries 
• Need to be responsive, < 10 sec 
• Harder to predict what data is needed 
• Queries tend to be simpler 
• Can be made faster by a RAM Cloud
– SAP HANA
– VoltDB
• With smaller queries, disk may still be OK; Apache Drill is an alternative
VoltDB Story 
• The VoltDB team (Michael Stonebraker et al.) observed that 92% of the work in a DB is related to disk
• By building a complete in-memory database cluster, they made it 20x faster!
Distributed Cloud (e.g. Hazelcast) 
• Store the data partitioned and replicated across many machines
• Used as a cache that spans multiple machines
• Key-value access
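The idea of partitioning and replicating key-value data can be sketched in a single process. The class below is a toy model, not Hazelcast's API: each "node" is a plain dictionary, a key's primary node is chosen by hashing, and the next node in the ring holds a backup copy. The class name, the replica count and the `failed` parameter are assumptions made for illustration.

```python
import zlib


class PartitionedStore:
    """Toy key-value store: keys are hashed to a primary node and copied to backups."""

    def __init__(self, node_count=4, replicas=2):
        self.nodes = [dict() for _ in range(node_count)]
        self.replicas = replicas

    def _owners(self, key):
        # A stable hash picks the primary; the following nodes hold the replicas.
        primary = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return [(primary + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        for node_id in self._owners(key):
            self.nodes[node_id][key] = value

    def get(self, key, failed=()):
        # Read from the first owner that is not marked as failed.
        for node_id in self._owners(key):
            if node_id not in failed:
                return self.nodes[node_id].get(key)
        raise KeyError(f"all replicas of {key!r} are unavailable")


store = PartitionedStore()
store.put("user:42", "alice")
primary = store._owners("user:42")[0]
print(store.get("user:42"))                    # served by the primary
print(store.get("user:42", failed={primary}))  # served by the backup
```

Real data grids also handle rebalancing when nodes join or leave, which this sketch leaves out.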
Access Pattern 4: Random Accesses 
• E.g. Graph Traversal
• This is the hardest use case
• In easy cases, there is a small working set and the problem can be solved with a cache (e.g. checking users against a blacklist); this is not the case with most graph operations, such as traversal
• In hard cases, In-Memory Computing is the only real solution
• Can be 1000x faster or more
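Why a cache handles the easy case but not the hard one can be shown with a small simulation. The sketch below runs an LRU cache against two synthetic access patterns: lookups over a small key set (standing in for a blacklist check) and uniformly random lookups over a large key space (standing in for graph traversal). The cache size, key ranges and request count are arbitrary assumptions.

```python
import random
from collections import OrderedDict


def hit_rate(keys, capacity):
    """Fraction of lookups served by an LRU cache holding at most `capacity` keys."""
    cache = OrderedDict()
    hits = 0
    for key in keys:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(keys)


rng = random.Random(0)  # fixed seed so the run is repeatable
CAPACITY, REQUESTS = 1_000, 50_000
blacklist_lookups = [rng.randrange(500) for _ in range(REQUESTS)]    # small working set
graph_lookups = [rng.randrange(1_000_000) for _ in range(REQUESTS)]  # random vertices

print(f"small working set: {hit_rate(blacklist_lookups, CAPACITY):.1%} hits")
print(f"random access:     {hit_rate(graph_lookups, CAPACITY):.1%} hits")
```

When the working set fits in the cache almost every lookup is a hit. With random access over a key space far larger than the cache, nearly every lookup goes to the slower store, so only holding the whole data set in memory helps.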
Access Pattern 5: Realtime Processing 
• This is already an In-Memory technology, using tools like Complex Event Processing (e.g. WSO2 CEP) or stream processing (e.g. Apache Storm)
Faster Access to Data 
• In-Memory databases (e.g. VoltDB, MemSQL) 
– Provide the same SQL interface
– Can be thought of as a fast database
– VoltDB has been shown to be about 20X faster than MySQL
• Distributed Cache
– Can be integrated as a large cache
Load Data Set to Memory and Analyze 
• Used with Interactive and Random access use cases
• Can be as much as 1000x faster for some use cases
• Tools 
– Spark 
– Hazelcast 
– SAP Hana
Realtime Processing 
• Realtime analytics tools 
– CEP (WSO2 CEP) 
– Stream Processing (e.g. Storm) 
• Can generate results within a few milliseconds to seconds
• Can process tens of thousands to millions of events per second
• Not all algorithms can be implemented
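A typical building block of such tools is a windowed aggregate that is updated in memory as each event arrives. The sketch below is a minimal illustration of that idea and does not use the WSO2 CEP or Storm APIs; the class name, the time-based window and the sample events are assumptions.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of the event values seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds  # assumed to be positive
        self.events = deque()                 # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Record an event and return the current window average."""
        self.events.append((timestamp, value))
        self.total += value
        # Drop events that have fallen out of the window.
        while self.events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


window = SlidingWindowAverage(window_seconds=10)
for ts, value in [(0, 10), (5, 20), (12, 30)]:
    print(f"t={ts:2d} value={value} window average={window.add(ts, value)}")
```

Each event is processed in constant amortized time and only the current window is held in memory. Algorithms that need the full history do not fit this model, which matches the last bullet above.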
In Memory Computing with WSO2 Platform
Thank You

Today's Technology and Emerging Technology LandscapeToday's Technology and Emerging Technology Landscape
Today's Technology and Emerging Technology Landscape
 
An Emerging Technologies Timeline
An Emerging Technologies TimelineAn Emerging Technologies Timeline
An Emerging Technologies Timeline
 
The Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming ApplicationsThe Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming Applications
 
Analytics and AI: The Good, the Bad and the Ugly
Analytics and AI: The Good, the Bad and the UglyAnalytics and AI: The Good, the Bad and the Ugly
Analytics and AI: The Good, the Bad and the Ugly
 
Transforming a Business Through Analytics
Transforming a Business Through AnalyticsTransforming a Business Through Analytics
Transforming a Business Through Analytics
 
SoC Keynote:The State of the Art in Integration Technology
SoC Keynote:The State of the Art in Integration TechnologySoC Keynote:The State of the Art in Integration Technology
SoC Keynote:The State of the Art in Integration Technology
 

Recently uploaded

Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 

Recently uploaded (20)

Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 

In-Memory Computing: How, Why? and common Patterns

  • 1. In-Memory Computing Srinath Perera Director, Research WSO2 Inc.
  • 2. Performance Numbers (based on Jeff Dean’s numbers)

      Operation                               Mem Ops / Sec   If Memory access is a Second
      L1 cache reference                      0.05            1/20th sec
      Main memory reference                   1               1 sec
      Send 2K bytes over 1 Gbps network       200             3 min
      Read 1 MB sequentially from memory      2500            41 min
      Disk seek                               1*10^5          27 hours
      Read 1 MB sequentially from disk        2*10^5          2 days
      Send packet CA->Netherlands->CA         1.5*10^6        17 days

      Operation                Speed MB/sec
      Hadoop Select            3
      Terasort Benchmark       18
      Complex Query Hadoop     0.2
      CEP                      60
      CEP Complex              2.5
      SSD                      300-500
      Disk                     50-100
  • 4. Latency Lags Bandwidth
      • Observation from Prof. Patterson’s 2004 keynote
      • Bandwidth improves, but latency does not
      • The same holds now, and the gap is widening with new systems
  • 5. Handling Speed Differences in Memory Hierarchy
      1. Caching
          – E.g. processor caches, file cache, disk cache, permission cache
      2. Replication
          – E.g. RAID, Content Distribution Networks (CDN), web cache
      3. Prediction: predict what data will be needed and prefetch it
          – Trades extra bandwidth for lower latency
          – E.g. disk caches, Google Earth
  • 6. The above three do not always work
      • Limitations
          – Caching works only if the working set is small
          – Prefetching works only when access patterns are predictable
          – Replication is expensive and limited by the receiving-side machines
      • Let's assume you are reading and filtering 10 GB of data (at 6 bytes per record, that is about 1.7 billion records)
          – About 3 minutes to read the data from disk
          – 35 ms to filter 10M records on my laptop => about 6 seconds to process all the data
          – Keeping the data in memory can give about a 30X speedup
  • 7. Data Access Patterns in Big Data Applications
      • Read from disk, process once (basic analytics)
          – Data can be prefetched; batch load is only about 100 times faster
          – OK if processing time > data read time
      • Read from disk, process iteratively (machine learning algorithms, e.g. K-Means)
          – Need to load data from disk once and then process it (e.g. Spark supports this)
      • Interactive (OLAP)
          – Queries are random and data may be scattered; once a query has started, the data can be loaded into memory and processed
      • Random access (e.g. graph processing)
          – Very hard to optimize
      • Realtime access
          – As data comes in
  • 9. Four Myths
      • Myths
          – Too expensive: 1 TB RAM cluster for 20-40k (about 1$/GB)
          – It is not durable
          – Flash is fast enough
          – It is about In-Memory DBs
      • From Nikita Ivanov’s post
          – http://gridgaintech.wordpress.com/2013/09/18/four-myths-of-in-memory-computing/
  • 10. Let us look at each Big data access pattern and where In-Memory Computing can make a difference
  • 11. Access Pattern 1: Read from Disk, Process Once
      • If Tp = 35 ms vs Td = 1.2 sec with 60 MB chunks, keeping all data in memory gives about a 30X speedup
      • However, this benefit is smaller if the computation is more complex (e.g. sort)
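The 30X figure can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted on the slides (60 MB chunks, 10M records per chunk, Tp = 35 ms, Td = 1.2 s, a 10 GB data set). It assumes that disk reads and filtering overlap, so the disk-bound cost per chunk is roughly Td. The object and method names are illustrative and do not come from the deck.

```scala
// Back-of-the-envelope check of the ~30X figure, using numbers quoted on the slides.
// Assumption: reading and filtering overlap, so a disk-bound chunk costs about Td.
object InMemorySpeedup {
  val chunkBytes: Double = 60e6        // 60 MB chunk
  val recordsPerChunk: Double = 10e6   // 10M records filtered per chunk
  val tpSeconds: Double = 0.035        // Tp: time to filter one chunk in memory
  val tdSeconds: Double = 1.2          // Td: time to read one chunk from disk

  // Bytes per record implied by the chunk size
  def bytesPerRecord: Double = chunkBytes / recordsPerChunk

  // Disk-bound cost divided by memory-bound cost, per chunk
  def speedup: Double = tdSeconds / tpSeconds

  // Number of chunks in a data set of the given size
  def chunks(totalBytes: Double): Double = totalBytes / chunkBytes

  def main(args: Array[String]): Unit = {
    val total = 10e9 // 10 GB data set
    println(f"bytes per record: $bytesPerRecord%.1f")
    println(f"records in 10 GB: ${total / bytesPerRecord}%.2e")
    println(f"disk read time:   ${chunks(total) * tdSeconds}%.0f s")
    println(f"filter time:      ${chunks(total) * tpSeconds}%.1f s")
    println(f"speedup (Td/Tp):  $speedup%.1f")
  }
}
```

With these inputs the ratio Td/Tp comes out a little above 30, and the full 10 GB scan takes minutes from disk against seconds from memory.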
  • 12. Access Pattern 2: Read from Disk, Process Iteratively
      • Very common pattern for machine learning algorithms (e.g. K-Means)
      • In this case, the advantages are greater
          – If we cannot hold the data fully in memory, we need to offload it
          – Then we need to read it again
          – The cost of loading and processing is then very high, and in-memory computing is much faster
      • Spark lets you load the data fully into memory and process it
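A simple cost model shows why the advantage grows with the number of passes. This is a sketch under stated assumptions: each pass costs Td to read plus Tp to process when the data is reloaded every time, and Td is paid only once when the data stays in memory. The function names are hypothetical, and the per-chunk figures are the ones quoted on the slides.

```scala
// Toy cost model for iterative processing; names are illustrative.
object IterativeCost {
  // Total time when the data must be re-read from disk on every pass
  def reloadEachPass(passes: Int, td: Double, tp: Double): Double =
    passes * (td + tp)

  // Total time when the data is loaded once and then kept in memory
  def loadOnce(passes: Int, td: Double, tp: Double): Double =
    td + passes * tp

  def main(args: Array[String]): Unit = {
    val (td, tp) = (1.2, 0.035) // per-chunk figures quoted on the slides
    for (passes <- Seq(1, 10, 100)) {
      val ratio = reloadEachPass(passes, td, tp) / loadOnce(passes, td, tp)
      println(f"$passes%3d passes: in-memory is $ratio%.1fx faster")
    }
  }
}
```

With a single pass there is no gain, because the data has to be read once either way. As the number of passes grows, the ratio approaches (Td + Tp) / Tp.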
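  • 13. Spark
      • New programming model built on functional programming concepts
      • Can be much faster for iterative use cases
      • Has a complete stack of products

      // Word count; `spark` is assumed to be an existing SparkContext
      val file = spark.textFile("hdfs://...")
      val counts = file.flatMap(line => line.split(" "))  // split each line into words
        .map(word => (word, 1))                            // pair each word with a count of 1
        .reduceByKey(_ + _)                                // sum the counts per word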
  • 14. Access Pattern 3: Interactive Queries
      • Need to be responsive, < 10 sec
      • Harder to predict what data is needed
      • Queries tend to be simpler
      • Can be made faster by a RAM cloud
          – SAP Hana
          – VoltDB
      • With smaller queries, disk may still be OK; Apache Drill is an alternative
  • 15. VoltDB Story
      • The VoltDB team (Michael Stonebraker et al.) observed that 92% of the work in a database is related to disk
      • By building a complete in-memory database cluster, they made it 20x faster!
  • 16. Distributed Cloud (e.g. Hazelcast)
      • Stores the data partitioned and replicated across many machines
      • Used as a cache that spans multiple machines
      • Key-value access
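The idea of partitioning and replicating key-value data can be sketched in a few lines. This is a toy, single-process model and not Hazelcast's API: each "machine" is an in-memory map, a key's owner is chosen by hashing, and copies are placed on the next nodes in order. The class name, the `replicas` parameter and the `failed` argument are all assumptions made for illustration.

```scala
import scala.collection.mutable

// Toy partitioned, replicated key-value store (illustrative only, not Hazelcast's API).
class PartitionedStore(nodeCount: Int, replicas: Int) {
  require(nodeCount > 0 && replicas >= 1 && replicas <= nodeCount)

  // One in-memory map per simulated machine
  private val nodes = Vector.fill(nodeCount)(mutable.Map.empty[String, String])

  // The owner node plus the next (replicas - 1) nodes hold copies of the key
  def nodesFor(key: String): Seq[Int] = {
    val owner = Math.floorMod(key.hashCode, nodeCount)
    (0 until replicas).map(i => (owner + i) % nodeCount)
  }

  // Write the value to every node responsible for the key
  def put(key: String, value: String): Unit =
    nodesFor(key).foreach(n => nodes(n)(key) = value)

  // Read from the first responsible node that is not marked as failed
  def get(key: String, failed: Set[Int] = Set.empty): Option[String] =
    nodesFor(key).filterNot(failed.contains).headOption.flatMap(n => nodes(n).get(key))
}

object PartitionedStoreDemo {
  def main(args: Array[String]): Unit = {
    val store = new PartitionedStore(nodeCount = 4, replicas = 2)
    store.put("user:42", "alice")
    val owner = store.nodesFor("user:42").head
    println(store.get("user:42"))                      // served by the owner
    println(store.get("user:42", failed = Set(owner))) // served by the backup copy
  }
}
```

A real data grid also handles rebalancing, membership changes and consistency between copies, none of which is modelled here.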
  • 17. Access Pattern 4: Random Accesses
      • E.g. graph traversal
      • This is the hardest use case
      • In easy cases, there is a small working set and the problem can be solved with a cache (e.g. checking users against a blacklist); that is not the case with most graph operations, such as traversal
      • In hard cases, in-memory computing is the only real solution
      • Can be 1000x faster or more
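To see why graph traversal is dominated by random access, consider a breadth-first search over an adjacency map held in memory. This is a minimal sketch with an assumed graph representation (`Map[Int, Seq[Int]]`). Every neighbour lookup jumps to an unpredictable key, which is cheap in RAM but would cost a seek if the adjacency lists lived on disk.

```scala
import scala.collection.mutable

object GraphTraversal {
  // Breadth-first search over an adjacency map held entirely in memory.
  // Returns the hop distance from `start` to every reachable vertex.
  def bfs(graph: Map[Int, Seq[Int]], start: Int): Map[Int, Int] = {
    val dist = mutable.LinkedHashMap(start -> 0)
    val queue = mutable.Queue(start)
    while (queue.nonEmpty) {
      val v = queue.dequeue()
      // Each neighbour lookup is a random access into the adjacency map
      for (n <- graph.getOrElse(v, Seq.empty) if !dist.contains(n)) {
        dist(n) = dist(v) + 1
        queue.enqueue(n)
      }
    }
    dist.toMap
  }

  def main(args: Array[String]): Unit = {
    val graph = Map(1 -> Seq(2, 3), 2 -> Seq(4), 3 -> Seq(4), 4 -> Seq(5))
    println(bfs(graph, 1))
  }
}
```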
  • 18. Access Pattern 5: Realtime Processing
      • This is already an in-memory technology, using tools such as Complex Event Processing (e.g. WSO2 CEP) or stream processing (e.g. Apache Storm)
  • 19.
  • 20. Faster Access to Data
      • In-memory databases (e.g. VoltDB, MemSQL)
          – Provide the same SQL interface
          – Can be thought of as a fast database
          – VoltDB has been shown to be about 20X faster than MySQL
      • Distributed cache
          – Can be integrated as a large cache
  • 21. Load Data Set to Memory and Analyze
      • Used with interactive and random-access use cases
      • Can be up to 1000x faster for some use cases
      • Tools
          – Spark
          – Hazelcast
          – SAP Hana
  • 22. Realtime Processing
      • Realtime analytics tools
          – CEP (e.g. WSO2 CEP)
          – Stream processing (e.g. Storm)
      • Can generate results within a few milliseconds to seconds
      • Can process tens of thousands to millions of events per second
      • Not all algorithms can be implemented
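As a rough illustration of how such tools keep their state in memory, the sketch below maintains a sliding-window average that is updated once per incoming event. It is a toy operator and not the API of WSO2 CEP or Storm. The class name and the window size are assumptions.

```scala
import scala.collection.mutable

// Toy CEP-style operator: average of the last `size` events, updated per event.
class SlidingAverage(size: Int) {
  require(size > 0)
  private val window = mutable.Queue.empty[Double]
  private var sum = 0.0

  // Process one incoming event and return the current windowed average
  def onEvent(value: Double): Double = {
    window.enqueue(value)
    sum += value
    // Evict the oldest event once the window is over capacity
    if (window.size > size) sum -= window.dequeue()
    sum / window.size
  }
}

object SlidingAverageDemo {
  def main(args: Array[String]): Unit = {
    val avg = new SlidingAverage(size = 3)
    Seq(10.0, 20.0, 30.0, 40.0).foreach(v => println(avg.onEvent(v)))
  }
}
```

Each event is handled in constant time with only the current window kept in memory, which is how such operators can return results within milliseconds. Algorithms that need the full history do not fit this model.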
  • 23. In Memory Computing with WSO2 Platform