In-Memory Computing Essentials

In-memory computing platforms and databases boost application performance and solve scalability problems by storing and processing large datasets across a cluster of interconnected machines. This session is for software engineers and architects who build data-intensive applications and want practical experience with in-memory computing. You will be introduced to the fundamental capabilities of distributed in-memory systems and will learn how to tap into your cluster’s resources and how to offset any negative impact that the network might have on the performance of your applications.

In-Memory Computing Essentials
for Java Developers and Architects

Your Speaker: Denis Magda
➔ Distributed in-memory system
◆ Apache Ignite Committer and PMC Member
◆ Head of DevRel at GridGain
➔ Java engineering and architecture
◆ Java engineering at Oracle
◆ Technology evangelism at Sun Microsystems

Agenda
• Introduction
– Why In-Memory Computing?
– Apache Ignite, Brief Overview
• Essentials
– Data Partitioning
– Affinity Co-location
– Co-located Computations

In-Memory Software Scales Horizontally
(Diagram: Application, Cluster)

Comparing System-Event Latencies
System Event                                Actual Latency    Scaled Latency
One CPU cycle                               0.4 ns            1 s
Level 1 cache access                        0.9 ns            2 s
Level 2 cache access                        2.8 ns            7 s
Level 3 cache access                        28 ns             1 min
Main memory access (DDR DIMM)               ~100 ns           4 min
Intel Optane DC persistent memory access    ~350 ns           15 min
Intel Optane DC SSD I/O                     <10 µs            7 hrs
NVMe SSD I/O                                ~25 µs            17 hrs
SSD I/O                                     50 – 150 µs       1.5 – 4 days
Rotational disk I/O                         1 – 10 ms         1 – 9 months
Internet: SF to NY                          65 ms             5 years
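
The scaled column appears to stretch one 0.4 ns CPU cycle to 1 second, which means each real nanosecond becomes 2.5 scaled seconds. The sketch below reproduces that arithmetic for two rows. The class and method names are hypothetical, and the scaling factor is inferred from the first row of the table rather than stated on the slide.

```java
public class ScaledLatencySketch {
    // Assumption inferred from the table: 0.4 ns maps to 1 s,
    // so 1 ns of real latency maps to 2.5 scaled seconds.
    static final double SCALED_SECONDS_PER_NANO = 2.5;

    /** Converts a real latency in nanoseconds to scaled seconds. */
    static double scaledSeconds(double actualNanos) {
        return actualNanos * SCALED_SECONDS_PER_NANO;
    }

    public static void main(String[] args) {
        double mainMemory = scaledSeconds(100);       // ~100 ns main memory access
        double sfToNy = scaledSeconds(65_000_000);    // 65 ms expressed in nanoseconds

        System.out.printf("Main memory access: %.1f scaled minutes%n", mainMemory / 60);
        System.out.printf("Internet, SF to NY: %.1f scaled years%n",
                sfToNy / (365.0 * 24 * 3600));
    }
}
```

On this scale a main memory access takes a few minutes while a cross-country network hop takes years, which is the gap the rest of the talk is about.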

Memory Versus Disk Latency
System Event                                Actual Latency    Scaled Latency
Main memory access (DDR DIMM)               ~100 ns           4 min
Intel Optane DC SSD I/O                     < 10 µs           7 hrs
NVMe SSD I/O                                ~25 µs            17 hrs
SSD I/O                                     50 – 150 µs       1.5 – 4 days
Rotational disk I/O                         1 – 10 ms         1 – 9 months

Apache Ignite In-Memory Computing Platform
• Application Layer: Web, SaaS, Social, Mobile, IoT
• SQL, Key-Value, Transactions, Compute Grid, Service Grid
• Events, Streaming, Messaging, Machine and Deep Learning
• Distributed In-Memory Tier
• Disk Tier: RDBMS, NoSQL, Hadoop, Mainframe, Distributed Ignite Persistence
• Rolling Upgrades, Security & Auditing, Monitoring & Management
• Segmentation Protection, Data Center Replication, Network Backups
• Full, Incremental, Continuous Backups
• Point-in-Time Recovery, Heterogeneous Recovery

Apache Ignite as a Cache or as a Database
Ignite as a Cache and Data Grid
Ignite as a Database

Apache Ignite Is In Top 5 Apache Projects

Data Partitioning
Using all of the cluster’s memory and CPUs

Partitioning and Horizontal Scalability
RDBMS: record#1, record#2, record#3, record#4
➔ partitioning
Partitioned Database: record#1, record#2, record#3, record#4

The World Database Schema
Tables: Country, City, CountryLanguage
https://dev.mysql.com/doc/world-setup/en/

Partitioning
Country table partitions: p0, p1, p2, p3, p4, p5, ..., p1022, p1023, p1024
Partitions shown on the cluster nodes: p0, p1024, p4, p2, p1022, p5

Record -> Partition -> Node Mapping
Country table partitions: p0, p1, p2, p3, p4, p5, ..., p1022, p1023, p1024
Partitions shown on the cluster nodes: p0, p1024, p4, p2, p1022, p5
1. Primary key to partition (example: Code = ‘USA’)
2. Partition to node
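
The two mapping steps can be sketched in plain Java. This is a simplified illustration, not Ignite's actual affinity function: it assumes 1024 partitions, a hash-modulo rule for step 1, and a round-robin rule for step 2. The class, method, and node names are hypothetical.

```java
import java.util.List;

public class PartitionMappingSketch {
    // Assumed partition count for the illustration.
    static final int PARTITIONS = 1024;

    /** Step 1: map a primary key to a partition (simple hash-modulo, an assumption). */
    static int partitionFor(Object primaryKey) {
        return Math.floorMod(primaryKey.hashCode(), PARTITIONS);
    }

    /** Step 2: map a partition to a node (round-robin, an assumption). */
    static String nodeFor(int partition, List<String> nodes) {
        return nodes.get(partition % nodes.size());
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-A", "node-B", "node-C"); // hypothetical node names
        int partition = partitionFor("USA");                        // Code = 'USA'
        System.out.println("USA -> p" + partition + " -> " + nodeFor(partition, nodes));
    }
}
```

Because both steps are deterministic, any node or client holding the same partition-to-node map can find the owner of a record without asking a coordinator. A round-robin rule like this one would move most partitions whenever the node list changes, so a production system needs a more stable assignment.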

Affinity Co-Location
Reducing network utilization for complex requests

Default Data Distribution
Country Table: Canada, France
City Table: Toronto, Calgary, Paris, Marseille, Montreal, Ottawa

Data Is Shuffled During the JOIN Phase
(Diagram: a thick client and server nodes holding Canada, France, Toronto, Calgary, Paris, Marseille, Ottawa, and Montreal; Paris, Ottawa, and Montreal are copied between nodes.)
1. Initiating Execution (thick client)
2. Execution on Servers (map phase)
3. Data Shuffling
4. Reduce Phase (thick client)

Disk Versus Network Latency
System Event                                Actual Latency    Scaled Latency
Intel Optane DC SSD I/O                     < 10 µs           7 hrs
NVMe SSD I/O                                ~25 µs            17 hrs
SSD I/O                                     50 – 150 µs       1.5 – 4 days
Rotational disk I/O                         1 – 10 ms         1 – 9 months
Internet: SF to NY                          65 ms             5 years

Co-Located Data Distribution
Country Table: Canada, France
City Table, stored with Canada: Toronto, Calgary, Montreal, Ottawa
City Table, stored with France: Paris, Marseille
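
One way to get this layout is to compute a city's partition from its country code instead of from the full city key. The sketch below illustrates that idea in plain Java and does not use the Ignite API. The class names, the city IDs, and the hash-modulo rule are assumptions made for the example.

```java
public class AffinityColocationSketch {
    // Assumed partition count for the illustration.
    static final int PARTITIONS = 1024;

    /** Hypothetical composite key: the city id plus the code of its country. */
    static final class CityKey {
        final int cityId;
        final String countryCode; // plays the role of the affinity key

        CityKey(int cityId, String countryCode) {
            this.cityId = cityId;
            this.countryCode = countryCode;
        }
    }

    /** Countries are partitioned by their primary key, the country code. */
    static int partitionForCountry(String countryCode) {
        return Math.floorMod(countryCode.hashCode(), PARTITIONS);
    }

    /** Cities ignore cityId and reuse the country's partition, so they are stored together. */
    static int partitionForCity(CityKey key) {
        return partitionForCountry(key.countryCode);
    }

    public static void main(String[] args) {
        int canada = partitionForCountry("CAN");
        int toronto = partitionForCity(new CityKey(1, "CAN")); // city ids are made up
        int ottawa = partitionForCity(new CityKey(2, "CAN"));
        System.out.println("Canada p" + canada + ", Toronto p" + toronto + ", Ottawa p" + ottawa);
    }
}
```

With this rule a JOIN between Country and City on the country code finds matching rows in the same partition, so no rows need to be shuffled between nodes.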

JOINs with Co-Located Data
(Diagram: a thick client and server nodes holding Canada, Toronto, Calgary, Ottawa, France, Marseille, and Paris; no data moves between nodes.)
1. Initiating Execution (thick client)
2. Execution on Servers (map phase)
3. Reduce Phase (thick client)

Co-Located Computations
Executing data-intensive logic on cluster nodes

Story About Millions of Savings Accounts
(Diagram: an Application and an RDBMS that stores all savings accounts)
1. Read all accounts
2. Interest calculation
3. Write changes back

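Story About Millions of Savings Accounts
(Diagram: an Application and an Ignite Cluster)
1. Send compute task
2. The task runs on the cluster nodes (step 2 is marked on three nodes in the diagram)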
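
The difference between the two approaches can be sketched with a small in-process simulation. Nothing here uses the Ignite API: `Node` is a hypothetical stand-in for a cluster node that holds its share of the accounts, and the "compute task" is an ordinary Java lambda that each node applies to its local data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class ColocatedComputeSketch {
    /** Hypothetical stand-in for a cluster node and the accounts it stores. */
    static final class Node {
        final Map<Integer, Double> accounts = new HashMap<>();

        /** Runs the task against node-local data; no account leaves the node. */
        void execute(Consumer<Map<Integer, Double>> task) {
            task.accept(accounts);
        }
    }

    /** Builds a simulated cluster and spreads the accounts across its nodes. */
    static List<Node> cluster(int nodeCount, int accountCount, double balance) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new Node());
        }
        for (int id = 0; id < accountCount; id++) {
            nodes.get(id % nodeCount).accounts.put(id, balance);
        }
        return nodes;
    }

    /** Step 1: send the task to every node. Step 2: each node updates its own accounts. */
    static void applyInterest(List<Node> nodes, double rate) {
        Consumer<Map<Integer, Double>> task =
                local -> local.replaceAll((id, balance) -> balance * (1 + rate));
        for (Node node : nodes) {
            node.execute(task);
        }
    }

    public static void main(String[] args) {
        List<Node> nodes = cluster(3, 9, 100.0);
        applyInterest(nodes, 0.02);
        System.out.println("Account 0 balance: " + nodes.get(0).accounts.get(0));
    }
}
```

In the RDBMS version every account crosses the network twice, once to be read and once to be written back. Here only the small task travels to the nodes.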

In-Memory Computing Essentials

1. In-Memory Computing Essentials for Java Developers and Architects
2. Your Speaker: Denis Magda
3. Agenda
4. Why In-Memory Computing?
5. Speed and Scale
6. In-Memory Software Scales Horizontally
7. Comparing System-Event Latencies
8. Memory Versus Disk Latency
9. Apache Ignite Brief Overview
10. Apache Ignite In-Memory Computing Platform
11. Apache Ignite as a Cache or as a Database
12. Apache Ignite Is In Top 5 Apache Projects
13. Data Partitioning: Using all of the cluster’s memory and CPUs
14. Partitioning and Horizontal Scalability
15. The World Database Schema
16. Partitioning
17. Record -> Partition -> Node Mapping
18. Record -> Partition -> Node Mapping
19. Affinity Co-Location: Reducing network utilization for complex requests
20. Default Data Distribution
21. Data Is Shuffled During the JOIN Phase
22. Disk Versus Network Latency
23. Co-Located Data Distribution
24. How to Group Related Data
25. JOINs with Co-Located Data
26. Co-Located Computations: Executing data-intensive logic on cluster nodes
27. Executing Custom Logic in Cluster
28. Story About Millions of Savings Accounts
29. Story About Millions of Savings Accounts
30. Summary
