© 2017 GridGain Systems, Inc.
In-Memory Performance
Durability of Disk
Apache Ignite
the in-memory hammer in your data science toolkit
Denis Magda
Ignite PMC Chair
GridGain Director of Product Management
Agenda
• Apache Ignite Overview
• Clustering and Deployment
• Distributed Storage
• Distributed SQL
• Distributed Computations
• Machine Learning
Apache Ignite In-Memory Computing Platform
• Memory-Centric Storage
• Ignite Native Persistence (Flash, SSD, Intel 3D XPoint)
• Third-Party Persistence (RDBMS, HDFS, NoSQL)
• SQL, Transactions, Compute, Services, ML, Streaming, Key/Value
• IoT, Financial Services, Pharma & Healthcare, E-Commerce, Travel & Logistics, Telco
Clustering and Deployment
Clustering
• Server Nodes
• Act as containers for data and computations
• Generally started as standalone processes
• Client Nodes
• Provide a cluster entry point to run operations
• Embedded in application code
Deployment
• Nodes are logical entities
• Each node runs in a JVM process
• A single JVM process can host many nodes
• On-Premise and Cloud
• Physical server or VM
• AWS, Azure, Google Compute Engine
• Kubernetes, Mesos, YARN
Distributed Storage
• Distributed key-value store: a distributed, partitioned hash map
• APIs: JCache, Transactions, Compute, SQL
• ACID transactions
• Dynamic scaling
• Caching for 3rd party storage (RDBMS, NoSQL, HDFS)
• Every server node keeps its data in durable memory
Partitions Distribution
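• Ignite Node 1: partitions A, C
• Ignite Node 2: partitions B, A
• Ignite Node 3: partitions C, D
• Ignite Node 4: partitions D, B
• Each partition is held by two nodes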
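The placement in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not Ignite's actual affinity function: the node names, the partition count, the crc32 hash, and the "next node holds the backup" rule are all assumptions made for the example.

```python
import zlib

NODES = ["node-1", "node-2", "node-3", "node-4"]  # hypothetical node names
PARTITIONS = 8  # illustrative partition count, not an Ignite default


def partition_of(key):
    """Map a key to a partition with a stable hash (crc32 is an assumption)."""
    return zlib.crc32(repr(key).encode("utf-8")) % PARTITIONS


def owners_of(partition, nodes=NODES, backups=1):
    """Return the primary node followed by the backup nodes of a partition."""
    return [nodes[(partition + i) % len(nodes)] for i in range(backups + 1)]


def locate(key):
    """Nodes that hold the entry for this key: primary first, then backups."""
    return owners_of(partition_of(key))


if __name__ == "__main__":
    for key in ("Toronto", "Mumbai", 42):
        print(key, "-> partition", partition_of(key), "on", locate(key))
```

With one backup, every partition lands on two different nodes, which matches the picture of each letter appearing twice across the four nodes.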
Durable Memory
• Off-heap: removes noticeable GC pauses
• Automatic defragmentation
• Stores a superset of data
• Predictable memory consumption
• Fully transactional (Write-Ahead Log)
• Instantaneous restarts
• Every server node in the Ignite cluster has its own durable memory
Ignite Native Persistence
On each server node:
1. Update (RAM)
2. Persist (Write-Ahead Log)
3. Ack
4. Checkpointing (Partition File 1 ... Partition File N)
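The four steps above can be sketched as a toy model. This is a simplified illustration under stated assumptions, not Ignite's storage engine: the ServerNode class, its method names, the crc32 partitioning, truncating the log at checkpoint time, and the eager reload on restart are all hypothetical choices made to keep the example short.

```python
import zlib


class ServerNode:
    """Toy model of update -> write-ahead log -> ack -> checkpoint."""

    def __init__(self, partitions=4):
        self.partitions = partitions
        self.ram = {}                                   # in-memory data
        self.wal = []                                   # write-ahead log
        self.partition_files = {p: {} for p in range(partitions)}
        self.dirty = set()                              # keys not yet checkpointed

    def _partition(self, key):
        return zlib.crc32(str(key).encode("utf-8")) % self.partitions

    def put(self, key, value):
        self.ram[key] = value                # 1. Update (RAM)
        self.wal.append((key, value))        # 2. Persist (Write-Ahead Log)
        self.dirty.add(key)
        return "ack"                         # 3. Ack

    def checkpoint(self):
        # 4. Checkpointing: copy changed entries into the partition files.
        for key in self.dirty:
            self.partition_files[self._partition(key)][key] = self.ram[key]
        self.dirty.clear()
        self.wal.clear()  # simplification: the log is dropped once it is checkpointed

    def restart(self):
        # Rebuild memory from the partition files, then replay the log.
        self.ram = {}
        for contents in self.partition_files.values():
            self.ram.update(contents)
        for key, value in self.wal:
            self.ram[key] = value
        self.dirty = {key for key, _ in self.wal}
```

An update acknowledged before a checkpoint survives a restart because it is replayed from the log, while checkpointed data is read back from the partition files.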
Distributed SQL
• Access through JDBC, ODBC and the SQL API
• Java, .NET, C++ and BI tools
• DDL and DML support: SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER
• Cross-platform compatibility
• Indexes in RAM or on disk
• Dynamic scaling
• Server nodes of the Apache Ignite cluster keep data in durable memory
Data Definition Language
• CREATE/DROP TABLE
• CREATE/DROP INDEX
• ALTER TABLE
• Changes are durable with Ignite Native Persistence
CREATE TABLE `city` (
`ID` INT(11),
`Name` CHAR(35),
`CountryCode` CHAR(3),
`District` CHAR(20),
`Population` INT(11),
PRIMARY KEY (`ID`, `CountryCode`)
) WITH "template=partitioned, backups=1, affinityKey=CountryCode";
Data Manipulation Language
• Follows the ANSI SQL-99 specification
• Fault-tolerant and consistent
• INSERT, UPDATE, DELETE
• SELECT
• JOINs
• Subqueries
SELECT country.name, city.name, MAX(city.population) as max_pop
FROM country JOIN city ON city.countrycode = country.code
WHERE country.code IN ('USA','RUS','CHN')
GROUP BY country.name, city.name ORDER BY max_pop DESC LIMIT 3;
Affinity Collocation
• Tables: Country, Language, City
• key (country = 5): partition 10
• key (cityId = 10, countryId = 5): partition 10
• key (cityId = 11, countryId = 9): partition 12
• Partitions are stored on disk on the server nodes
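The idea behind the diagram is that only the affinity part of a key decides the partition. A minimal sketch, assuming a crc32 hash and a partition count of 16 (both picked for the example, so the partition numbers will not match the 10 and 12 on the slide):

```python
import zlib

PARTITIONS = 16  # illustrative value


def partition_for(affinity_value):
    """Partition is derived from the affinity value only (hash is an assumption)."""
    return zlib.crc32(str(affinity_value).encode("utf-8")) % PARTITIONS


def country_partition(country_id):
    # A country is placed by its own id.
    return partition_for(country_id)


def city_partition(city_id, country_id):
    # city_id is deliberately ignored: the city follows its country.
    return partition_for(country_id)


if __name__ == "__main__":
    print("country 5        ->", country_partition(5))
    print("city 10 (ctry 5) ->", city_partition(10, 5))
    print("city 11 (ctry 9) ->", city_partition(11, 9))
```

Because city_partition looks only at countryId, every city of country 5 shares a partition, and therefore a node, with the country 5 entry.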
Collocated Joins
1. Initial Query
2. Query execution over local data
3. Reduce multiple results into one
SELECT ct.name, c.name
FROM Country as ct
JOIN City as c ON ct.id = c.countryId
WHERE ct.name = 'Canada';
• Ignite Node 1: Canada with Toronto, Ottawa, Montreal, Calgary
• Ignite Node 2: India with Mumbai, New Delhi
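A collocated join can be pictured as each node joining only its own rows, followed by a reduce step that concatenates the partial results. The sketch below mimics that flow with plain Python lists; the country ids (1 and 2) and the function names are assumptions made for the example, and no Ignite API is involved.

```python
# Each dict stands for the local data of one Ignite node (ids are hypothetical).
nodes = [
    {"country": [(1, "Canada")],
     "city": [("Toronto", 1), ("Ottawa", 1), ("Montreal", 1), ("Calgary", 1)]},
    {"country": [(2, "India")],
     "city": [("Mumbai", 2), ("New Delhi", 2)]},
]


def local_join(node, country_name):
    """Step 2: join Country and City using only the rows stored on this node."""
    ids = {cid: name for cid, name in node["country"] if name == country_name}
    return [(ids[country_id], city)
            for city, country_id in node["city"] if country_id in ids]


def collocated_join(all_nodes, country_name):
    """Steps 1 and 3: send the query to every node, then merge the results."""
    result = []
    for node in all_nodes:
        result.extend(local_join(node, country_name))
    return result


if __name__ == "__main__":
    for row in collocated_join(nodes, "Canada"):
        print(row)
```

No city row crosses a node boundary here, which is what makes the collocated case cheap.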
Non-Collocated Joins
1. Initial Query
2. Query execution (local + remote data)
3. Potential data movement
4. Reduce multiple results into one
SELECT ct.name, c.name
FROM Country as ct
JOIN City as c ON ct.id = c.countryId
WHERE ct.name = 'Canada';
• Ignite Node 1: Canada; cities Toronto, Calgary
• Ignite Node 2: India; cities Montreal, Ottawa, Mumbai, New Delhi
• Step 3 moves the Montreal and Ottawa rows between the nodes
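When related rows are not collocated, a purely local join misses matches, and completing the result means pulling rows from other nodes. The sketch below illustrates both outcomes with plain Python lists; the country ids, the function names, and the way moved rows are counted are assumptions for the example rather than a description of Ignite's query engine.

```python
# Montreal and Ottawa belong to Canada (id 1) but live on the second node.
nodes = [
    {"country": [(1, "Canada")],
     "city": [("Toronto", 1), ("Calgary", 1)]},
    {"country": [(2, "India")],
     "city": [("Montreal", 1), ("Ottawa", 1), ("Mumbai", 2), ("New Delhi", 2)]},
]


def local_only_join(all_nodes, country_name):
    """Join each node's countries with that node's cities only."""
    rows = []
    for node in all_nodes:
        ids = {cid: name for cid, name in node["country"] if name == country_name}
        rows += [(ids[c], city) for city, c in node["city"] if c in ids]
    return rows


def distributed_join(all_nodes, country_name):
    """Join against local and remote cities, counting rows that had to move."""
    rows, moved = [], 0
    for i, node in enumerate(all_nodes):
        ids = {cid: name for cid, name in node["country"] if name == country_name}
        if not ids:
            continue
        for j, other in enumerate(all_nodes):
            for city, country_id in other["city"]:
                if country_id in ids:
                    rows.append((ids[country_id], city))
                    if j != i:
                        moved += 1  # this row came from a remote node
    return rows, moved
```

With this data, local_only_join finds only Toronto and Calgary, while distributed_join returns all four Canadian cities at the cost of moving two rows.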
Distributed Computations
Compute Grid
• C = Compute, R = Result
• C = C1 + C2
• R = R1 + R2, in T/2 time
• Automatic Failover
• Load Balancing
• Zero Deployment
• Ignite cluster nodes, each with durable memory
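The split C = C1 + C2 and R = R1 + R2 can be sketched with a thread pool standing in for the cluster. The word-count task and the function names are assumptions chosen for illustration; a real compute grid would ship the parts to separate nodes rather than to local threads.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """The computation applied to one share of the input (C1 or C2)."""
    return sum(len(line.split()) for line in lines)


def run_on_grid(lines, nodes=2):
    """Split the work, run the parts in parallel, and add up the results."""
    chunks = [lines[i::nodes] for i in range(nodes)]        # C = C1 + C2
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(count_words, chunks))      # R1, R2
    return sum(partials)                                    # R = R1 + R2


if __name__ == "__main__":
    print(run_on_grid(["a b", "c d e", "f"]))
```

The T/2 figure on the slide assumes the two halves take equal time and that splitting and merging cost little.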
Client-Server Processing
1. Initial Request
2. Fetch data from remote nodes
3. Process entire data-set
Co-located Processing
1. Initial Request
2. Co-located processing with data
3. Reduce multiple results into one
Both cases involve a client node and two server nodes with on-disk data.
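The difference between the two approaches shows up in how many values cross the network. A minimal sketch, assuming two hypothetical server nodes holding small integer lists and a "maximum" computation picked for the example:

```python
# Data held by each server node (names and values are made up).
server_nodes = {"node-1": [3, 8, 1], "node-2": [7, 2, 9, 4]}


def client_server_max(nodes):
    """Fetch every value to the client, then process the entire data set."""
    fetched = [value for data in nodes.values() for value in data]
    return max(fetched), len(fetched)    # (result, values transferred)


def colocated_max(nodes):
    """Compute next to the data, then reduce one partial result per node."""
    partials = [max(data) for data in nodes.values()]
    return max(partials), len(partials)  # (result, values transferred)


if __name__ == "__main__":
    print("client-server:", client_server_max(server_nodes))
    print("co-located:   ", colocated_max(server_nodes))
```

Both functions return the same answer; the co-located version transfers one value per node instead of the whole data set.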
Machine Learning
Genetic Algorithm Grid
• Biological Evolution Simulation
• Chromosome and Genes Cluster
• Collocated Computation
• F = Fitness Calculation, C = Crossover, M = Mutation
• F = F1 + F2
• C = C1 + C2
• M = M1 + M2
• Each Ignite cluster node runs its own share (F1, C1, M1 and F2, C2, M2) over durable memory
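The three operations named on the slide (fitness, crossover, mutation) can be sketched as a small single-process genetic algorithm. Everything specific here is an assumption for illustration: the bit-string chromosomes, the "count the ones" fitness, the mutation rate, and the elitist selection. The distribution of F, C and M across cluster nodes is not modelled.

```python
import random


def fitness(chromosome):
    """F: score a chromosome (here, the number of 1 genes)."""
    return sum(chromosome)


def crossover(a, b, rng):
    """C: combine two parents at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(chromosome, rng, rate=0.05):
    """M: flip each gene with a small probability."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


def evolve(population, generations, rng):
    """Run the F -> C -> M loop, always keeping the best chromosome so far."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: len(ranked) // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(len(population) - 1)
        ]
        population = [ranked[0]] + children
    return max(population, key=fitness)


rng = random.Random(42)
start = [[rng.randint(0, 1) for _ in range(16)] for _ in range(20)]
best = evolve(start, generations=30, rng=rng)
```

Since the best chromosome is carried over unchanged in every generation, the final fitness can never be lower than the best fitness in the starting population.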
Machine Learning Grid
• Distributed Algorithms: K-Means, Regressions, Decision Trees, Random Forest
• Multi-Language Support: R, C++, Python, Java, Scala, REST
• Distributed Core Algebra
• Dense and Sparse Algebra
• Large Scale Parallelization
• No ETL
• Server nodes with durable memory
Any Questions?
Thank you for joining us. Follow the conversation.
http://ignite.apache.org
#apacheignite
#denismagda

More Related Content

What's hot

Loading data into Apache Ignite
Loading data into Apache IgniteLoading data into Apache Ignite
Loading data into Apache IgniteStephen Darlington
 
Apache Ignite - Distributed Database Orchestration
Apache Ignite - Distributed Database OrchestrationApache Ignite - Distributed Database Orchestration
Apache Ignite - Distributed Database OrchestrationAriel Jatib
 
In-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software EngineersIn-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software EngineersDenis Magda
 
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...In-Memory Computing Summit
 
Continuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteContinuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteDenis Magda
 
Deploying Distributed Databases and In-Memory Computing Platforms with Kubern...
Deploying Distributed Databases and In-Memory Computing Platforms with Kubern...Deploying Distributed Databases and In-Memory Computing Platforms with Kubern...
Deploying Distributed Databases and In-Memory Computing Platforms with Kubern...Stephen Darlington
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningEvans Ye
 
Large Table Partitioning with PostgreSQL and Django
 Large Table Partitioning with PostgreSQL and Django Large Table Partitioning with PostgreSQL and Django
Large Table Partitioning with PostgreSQL and DjangoEDB
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...DataStax
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightDataWorks Summit
 
In-Memory Computing Essentials
In-Memory Computing EssentialsIn-Memory Computing Essentials
In-Memory Computing EssentialsDenis Magda
 
Online Upgrade Using Logical Replication
 Online Upgrade Using Logical Replication Online Upgrade Using Logical Replication
Online Upgrade Using Logical ReplicationEDB
 
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...DataWorks Summit
 
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudYARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudDataWorks Summit
 
PostgreSQL continuous backup and PITR with Barman
 PostgreSQL continuous backup and PITR with Barman PostgreSQL continuous backup and PITR with Barman
PostgreSQL continuous backup and PITR with BarmanEDB
 
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017Codemotion
 
Keynote: The Postgres Ecosystem
Keynote: The Postgres EcosystemKeynote: The Postgres Ecosystem
Keynote: The Postgres EcosystemEDB
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Zero-to-hero: Running Postgres in Kubernetes, Enterprise Postgres Day
Zero-to-hero: Running Postgres in Kubernetes, Enterprise Postgres DayZero-to-hero: Running Postgres in Kubernetes, Enterprise Postgres Day
Zero-to-hero: Running Postgres in Kubernetes, Enterprise Postgres DayEDB
 
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...Spark Summit
 

What's hot (20)

Loading data into Apache Ignite
Loading data into Apache IgniteLoading data into Apache Ignite
Loading data into Apache Ignite
 
Apache Ignite - Distributed Database Orchestration
Apache Ignite - Distributed Database OrchestrationApache Ignite - Distributed Database Orchestration
Apache Ignite - Distributed Database Orchestration
 
In-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software EngineersIn-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software Engineers
 
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
 
Continuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteContinuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache Ignite
 
Deploying Distributed Databases and In-Memory Computing Platforms with Kubern...
Deploying Distributed Databases and In-Memory Computing Platforms with Kubern...Deploying Distributed Databases and In-Memory Computing Platforms with Kubern...
Deploying Distributed Databases and In-Memory Computing Platforms with Kubern...
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
 
Large Table Partitioning with PostgreSQL and Django
 Large Table Partitioning with PostgreSQL and Django Large Table Partitioning with PostgreSQL and Django
Large Table Partitioning with PostgreSQL and Django
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsight
 
In-Memory Computing Essentials
In-Memory Computing EssentialsIn-Memory Computing Essentials
In-Memory Computing Essentials
 
Online Upgrade Using Logical Replication
 Online Upgrade Using Logical Replication Online Upgrade Using Logical Replication
Online Upgrade Using Logical Replication
 
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
 
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudYARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
 
PostgreSQL continuous backup and PITR with Barman
 PostgreSQL continuous backup and PITR with Barman PostgreSQL continuous backup and PITR with Barman
PostgreSQL continuous backup and PITR with Barman
 
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
 
Keynote: The Postgres Ecosystem
Keynote: The Postgres EcosystemKeynote: The Postgres Ecosystem
Keynote: The Postgres Ecosystem
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Zero-to-hero: Running Postgres in Kubernetes, Enterprise Postgres Day
Zero-to-hero: Running Postgres in Kubernetes, Enterprise Postgres DayZero-to-hero: Running Postgres in Kubernetes, Enterprise Postgres Day
Zero-to-hero: Running Postgres in Kubernetes, Enterprise Postgres Day
 
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
 

Similar to Apache Ignite In-Memory Computing Platform Overview

OSDC 2017 - Christos Erotocritou - Apache ignite in-memory data fabric
OSDC 2017 - Christos Erotocritou - Apache ignite in-memory data fabricOSDC 2017 - Christos Erotocritou - Apache ignite in-memory data fabric
OSDC 2017 - Christos Erotocritou - Apache ignite in-memory data fabricNETWAYS
 
Nike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-igniteNike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-igniteDani Traphagen
 
A Gentle Introduction to GPU Computing by Armen Donigian
A Gentle Introduction to GPU Computing by Armen DonigianA Gentle Introduction to GPU Computing by Armen Donigian
A Gentle Introduction to GPU Computing by Armen DonigianData Con LA
 
Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...
Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...
Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...Provectus
 
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
 Improving Apache Spark™ In-Memory Computing with Apache Ignite™ Improving Apache Spark™ In-Memory Computing with Apache Ignite™
Improving Apache Spark™ In-Memory Computing with Apache Ignite™Tom Diederich
 
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User StoreAzure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User StoreDataStax Academy
 
AquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21JDA Labs MTL
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT_MTL
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWKent Graziano
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Hortonworks
 
How we broke Apache Ignite by adding persistence
How we broke Apache Ignite by adding persistenceHow we broke Apache Ignite by adding persistence
How we broke Apache Ignite by adding persistenceStephen Darlington
 
How we broke Apache Ignite by adding persistence, by Stephen Darlington (Grid...
How we broke Apache Ignite by adding persistence, by Stephen Darlington (Grid...How we broke Apache Ignite by adding persistence, by Stephen Darlington (Grid...
How we broke Apache Ignite by adding persistence, by Stephen Darlington (Grid...Altinity Ltd
 
Solving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute finalSolving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute finalAvere Systems
 
From Mainframe to Microservices: Vanguard’s Move to the Cloud - ENT331 - re:I...
From Mainframe to Microservices: Vanguard’s Move to the Cloud - ENT331 - re:I...From Mainframe to Microservices: Vanguard’s Move to the Cloud - ENT331 - re:I...
From Mainframe to Microservices: Vanguard’s Move to the Cloud - ENT331 - re:I...Amazon Web Services
 
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceDemystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceSnowflake Computing
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native PlatformSunil Govindan
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native PlatformSunil Govindan
 
Data relay introduction to big data clusters
Data relay introduction to big data clustersData relay introduction to big data clusters
Data relay introduction to big data clustersChris Adkin
 

Similar to Apache Ignite In-Memory Computing Platform Overview (20)

OSDC 2017 - Christos Erotocritou - Apache ignite in-memory data fabric
OSDC 2017 - Christos Erotocritou - Apache ignite in-memory data fabricOSDC 2017 - Christos Erotocritou - Apache ignite in-memory data fabric
OSDC 2017 - Christos Erotocritou - Apache ignite in-memory data fabric
 
Nike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-igniteNike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-ignite
 
A Gentle Introduction to GPU Computing by Armen Donigian
A Gentle Introduction to GPU Computing by Armen DonigianA Gentle Introduction to GPU Computing by Armen Donigian
A Gentle Introduction to GPU Computing by Armen Donigian
 
Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...
Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...
Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...
 
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
 Improving Apache Spark™ In-Memory Computing with Apache Ignite™ Improving Apache Spark™ In-Memory Computing with Apache Ignite™
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
 
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User StoreAzure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User Store
 
AquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks Presentation
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFW
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
 
How we broke Apache Ignite by adding persistence
How we broke Apache Ignite by adding persistenceHow we broke Apache Ignite by adding persistence
How we broke Apache Ignite by adding persistence
 
How we broke Apache Ignite by adding persistence, by Stephen Darlington (Grid...
How we broke Apache Ignite by adding persistence, by Stephen Darlington (Grid...How we broke Apache Ignite by adding persistence, by Stephen Darlington (Grid...
How we broke Apache Ignite by adding persistence, by Stephen Darlington (Grid...
 
Solving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute finalSolving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute final
 
From Mainframe to Microservices: Vanguard’s Move to the Cloud - ENT331 - re:I...
From Mainframe to Microservices: Vanguard’s Move to the Cloud - ENT331 - re:I...From Mainframe to Microservices: Vanguard’s Move to the Cloud - ENT331 - re:I...
From Mainframe to Microservices: Vanguard’s Move to the Cloud - ENT331 - re:I...
 
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceDemystifying Data Warehouse as a Service
Demystifying Data Warehouse as a Service
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos Erotocritou
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Data relay introduction to big data clusters
Data relay introduction to big data clustersData relay introduction to big data clusters
Data relay introduction to big data clusters
 

Recently uploaded

costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 

Recently uploaded (20)

costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Apache Ignite In-Memory Computing Platform Overview

  • 1. In-Memory Performance, Durability of Disk
  • 2. Apache Ignite: the in-memory hammer in your data science toolkit. Denis Magda, Ignite PMC Chair, GridGain Director of Product Management
  • 3. Agenda: Apache Ignite Overview; Clustering and Deployment; Distributed Storage; Distributed SQL; Distributed Computations; Machine Learning
  • 4. Apache Ignite In-Memory Computing Platform. Memory-Centric Storage: Ignite Native Persistence (Flash, SSD, Intel 3D XPoint); Third-Party Persistence (RDBMS, HDFS, NoSQL). SQL, Transactions, Compute, Services, ML, Streaming, Key/Value. IoT, Financial Services, Pharma & Healthcare, E-Commerce, Travel & Logistics, Telco
  • 5. Clustering and Deployment
  • 6. Clustering. Server Nodes: act as containers for data and computations; generally started as standalone processes. Client Nodes: provide a cluster entry point to run operations; embedded in application code
  • 7. Deployment. Nodes are logical entities: a node runs in a JVM process; many nodes can run in a single JVM process. On-Premise and Cloud: physical server or VM; AWS, Azure, Google Compute Engine; Kubernetes, Mesos, YARN
  • 8. Distributed Storage
  • 9. Distributed Storage. Distributed Key-Value Store: distributed partitioned hash map; dynamic scaling; ACID transactions; JCache & SQL; 3rd party storage caching (RDBMS, NoSQL, HDFS). Diagram labels: three Server Nodes, each with Durable Memory; JCache, Transactions, Compute, SQL
  • 10. Partitions Distribution. Diagram labels: Ignite Node (A, C); Ignite Node (B, A); Ignite Node (C, D); Ignite Node (D, B)
  • 11. Durable Memory. Off-heap; removes noticeable GC pauses; automatic defragmentation; stores superset of data; predictable memory consumption; fully transactional (Write-Ahead Log); instantaneous restarts. Diagram labels: Ignite Cluster of three Server Nodes, each with Durable Memory
  • 12. Ignite Native Persistence. 1. Update RAM; 2. Persist (Write-Ahead Log); 3. Ack; 4. Checkpointing (Partition File 1 through Partition File N). Diagram label: Server Node
  • 13. Distributed SQL
  • 14. Distributed SQL. DDL, DML support: SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER. Cross-platform compatibility: JDBC, ODBC, SQL API (Java, .NET, C++), BI tools. Indexes in RAM or disk. Dynamic scaling. Diagram labels: Apache Ignite Cluster of three Server Nodes, each with Durable Memory
  • 15. Data Definition Language. CREATE/DROP TABLE; CREATE/DROP INDEX; ALTER TABLE; Changes Durability: Ignite Native Persistence. Example: CREATE TABLE `city` ( `ID` INT(11), `Name` CHAR(35), `CountryCode` CHAR(3), `District` CHAR(20), `Population` INT(11), PRIMARY KEY (`ID`, `CountryCode`) ) WITH "template=partitioned, backups=1, affinityKey=CountryCode";
  • 16. Data Manipulation Language. ANSI-99 specification; fault-tolerant and consistent; INSERT, UPDATE, DELETE; SELECT with JOINs and subqueries. Example: SELECT country.name, city.name, MAX(city.population) as max_pop FROM country JOIN city ON city.countrycode = country.code WHERE country.code IN ('USA','RUS','CHN') GROUP BY country.name, city.name ORDER BY max_pop DESC LIMIT 3;
  • 17. Affinity Collocation. Tables: Country, Language, City. Diagram labels: two Server Nodes (on-disk); key (country = 5): partition 10; key (cityId = 10, countryId = 5): partition 10; key (cityId = 11, countryId = 9): partition 12
  • 18. Collocated Joins. 1. Initial Query; 2. Query execution over local data; 3. Reduce multiple results in one. Diagram labels: Ignite Node (Canada: Toronto, Ottawa, Montreal, Calgary); Ignite Node (India: Mumbai, New Delhi). Query: SELECT ct.name, c.name FROM Country as ct JOIN City as c ON ct.id = c.countryId WHERE ct.name = 'Canada';
  • 19. Non-Collocated Joins. 1. Initial Query; 2. Query execution (local + remote data); 3. Potential data movement; 4. Reduce multiple results in one. Diagram labels: Ignite Node (Canada: Toronto, Calgary); Ignite Node (India: Mumbai, New Delhi, plus Montreal and Ottawa); Montreal and Ottawa are shown again at step 3 (data movement). Query: SELECT ct.name, c.name FROM Country as ct JOIN City as c ON ct.id = c.countryId WHERE ct.name = 'Canada';
  • 20. Distributed Computations
  • 21. Compute Grid. C = C1 + C2; R = R1 + R2, where C = Compute and R = Result in T/2 time. Automatic Failover; Load Balancing; Zero Deployment. Diagram labels: Ignite Cluster, Durable Memory on each node
  • 22. Client-Server Processing: 1. Initial Request; 2. Fetch data from remote nodes; 3. Process entire data-set. Co-located Processing: 1. Initial Request; 2. Co-located processing with data; 3. Reduce multiple results in one. Diagram labels: Client Node, two Server Nodes (on-disk), Data 1, Data 2
  • 23. Machine Learning
  • 24. Genetic Algorithm Grid. Collocated computation; biological evolution simulation; chromosome and genes cluster. F = F1 + F2; C = C1 + C2; M = M1 + M2, where F = Fitness Calculation, C = Crossover, M = Mutation. Diagram labels: Ignite Cluster, Durable Memory on each node
  • 25. Machine Learning Grid. Distributed algorithms: K-Means, Regressions, Decision Trees, Random Forest. Distributed core algebra; dense and sparse algebra; large scale parallelization; No ETL. Multi-language support: R, C++, Python, Java, Scala, REST. Diagram labels: three Server Nodes, each with Durable Memory
  • 26. Any Questions? Thank you for joining us. Follow the conversation: http://ignite.apache.org, #apacheignite, #denismagda
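
The affinity collocation idea from slides 17 and 18 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ignite's implementation: `PARTITIONS`, `partition_for`, `CityKey` and `city_partition` are hypothetical names, and the plain modulo mapping stands in for Ignite's real affinity function, so the partition numbers it produces differ from the ones shown on the slide. The point it shows is that a city row is placed by its country ID alone, so it always lands in the same partition as its country.

```python
from dataclasses import dataclass

PARTITIONS = 16  # hypothetical partition count, chosen only for this sketch


def partition_for(affinity_key: int) -> int:
    """Map an affinity key to a partition (plain modulo, an assumption)."""
    return affinity_key % PARTITIONS


@dataclass(frozen=True)
class CityKey:
    city_id: int
    country_id: int  # affinity key: decides where the row is stored


def city_partition(key: CityKey) -> int:
    # Only the affinity field is mapped, not the whole composite key.
    return partition_for(key.country_id)


country_partition = partition_for(5)          # country with id 5
toronto = CityKey(city_id=10, country_id=5)   # city of country 5
mumbai = CityKey(city_id=11, country_id=9)    # city of country 9

same_partition = city_partition(toronto) == country_partition
other_partition = city_partition(mumbai)
print(same_partition, other_partition)
```

Because the city and its country share a partition, a join between them can run on one node over local data, which is the situation slide 18 describes.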

Editor's Notes

  1. The Apache Ignite Platform. Apache Ignite is a memory-centric data platform used to build fast, scalable and resilient solutions. At the heart of the platform lies a distributed memory-centric data store with ACID semantics and powerful processing APIs, including SQL, Compute, Key/Value and transactions. The memory-centric approach lets Apache Ignite leverage memory for high throughput and low latency whilst utilising local disk or SSD to provide durability and fast recovery. The main difference between the memory-centric approach and the traditional disk-centric approach is that memory is treated as fully functional storage, not just as a caching layer as in most databases. For example, Apache Ignite can function in a pure in-memory mode, in which case it can be treated as an In-Memory Database (IMDB) and In-Memory Data Grid (IMDG) in one. On the other hand, when persistence is turned on, Ignite functions as a memory-centric system where most of the processing happens in memory, but the data and indexes are persisted to disk. The main difference from a traditional disk-centric RDBMS or NoSQL system is that Ignite is strongly consistent, horizontally scalable, and supports both SQL and key-value processing APIs. The Apache Ignite platform can be integrated with third-party databases and external storage media and can be deployed on any infrastructure. It provides linear scalability, built-in fault tolerance, comprehensive security and auditing alongside advanced monitoring and management. The platform caters for a range of use cases, including core banking services, real-time product pricing, reconciliation and risk calculation engines, analytics and machine learning.
  2. Ignite Data Grid is a distributed key-value store that enables storing data both in memory and on disk within distributed clusters and provides extensive APIs. Ignite Data Grid can be viewed as a distributed partitioned hash map with every cluster node owning a portion of the overall data. This way the more cluster nodes we add, the more data we can store.
  3. The Apache Ignite memory-centric platform is based on the Durable Memory architecture, which allows storing and processing data and indexes both in memory and on disk when the Ignite Persistent Store feature is enabled. The memory architecture helps achieve in-memory performance with the durability of disk using all the available resources of the cluster. Ignite's durable memory is built and operates in a way similar to the Virtual Memory of operating systems such as Linux. However, one significant difference between the two architectures is that Durable Memory always keeps the whole data set and indexes on disk if the Ignite Persistent Store is used, while Virtual Memory uses the disk for swapping purposes only. In memory: off-heap memory; removes noticeable GC pauses; automatic defragmentation; predictable memory consumption; boosts SQL performance. On disk: optional persistence; support for flash, SSD and Intel 3D XPoint; stores a superset of the data; fully transactional via the Write-Ahead Log (WAL); instantaneous cluster restarts.
  4. Ignite Native Persistence is a distributed ACID and SQL-compliant disk store that transparently integrates with Ignite's Durable Memory as an optional disk layer storing data and indexes on SSD, Flash, 3D XPoint, and other types of non-volatile storage. With Ignite Persistence enabled, you no longer need to keep all the data and indexes in memory or warm them up after a node or cluster restart, because the Durable Memory is tightly coupled with persistence and treats it as a secondary memory tier. This means that if a subset of data or an index is missing in RAM, the Durable Memory will read it from disk.
  5. Apache Ignite incorporates distributed SQL database capabilities as a part of its platform. The database is horizontally scalable, fault tolerant and SQL ANSI-99 compliant. It supports SQL DML commands including SELECT, UPDATE, INSERT, MERGE, and DELETE queries, together with a subset of DDL commands relevant for distributed databases. Data sets as well as indexes can be stored both in RAM and on disk thanks to the durable memory architecture. This allows executing distributed SQL operations across different memory layers, achieving in-memory performance with the durability of disk. You can interact with Apache Ignite using SQL via natively developed APIs for Java, .NET and C++, or via the Ignite JDBC or ODBC drivers. The drivers provide true cross-platform connectivity from languages such as PHP, Ruby and more.
  6. Ignite In-Memory Compute Grid allows executing distributed computations in a parallel fashion to gain high performance, low latency, and linear scalability. Ignite compute grid provides a set of simple APIs that allow users to distribute computations and data processing across multiple computers in the cluster. Disk-centric systems, like RDBMS or NoSQL, generally use the classic client-server approach, where the data is brought from the server to the client side, where it gets processed and is then usually discarded. This approach does not scale well, as moving the data over the network is the most expensive operation in a distributed system. A much more scalable approach is collocated processing, which reverses the flow by bringing the computations to the servers where the data actually resides. This approach allows you to execute advanced logic or distributed SQL with JOINs exactly where the data is stored, avoiding expensive serialization and network trips.
  7. https://ignite.apache.org/collocatedprocessing.html Collocating computations with data minimizes data serialization over the network and can significantly improve the performance and scalability of your application. Whenever possible, you should make a best effort to collocate your computations with the cluster nodes caching the data that needs to be processed. Let's assume that a blizzard is approaching New York City. As a telecommunications company, you have to warn everyone by sending a message with precise instructions on how to behave during such weather conditions. There are around 8 million New Yorkers in your database who have to receive the text message. With the client-server approach the company has to connect to the database and move all 8 million (!) records from there to a client application that will text everyone. This is highly inefficient and wastes the network and computational resources of the company's IT infrastructure. However, if the company initially collocates all the cities it covers with the people who live there, then it can send a single computation (!) to the cluster node that stores information about all New Yorkers and send the text message from there. This approach avoids moving 8 million records over the network and helps utilize cluster resources for computation needs. That's collocated processing in action!
  8. https://github.com/techbysample/gagrid GA Grid (Beta) is an in-memory Genetic Algorithm (GA) component for Apache Ignite. A GA is a method of solving optimization problems by simulating the process of biological evolution. GA Grid provides a distributed GA library built on top of the mature and scalable Apache Ignite platform. GAs are excellent for searching through large and complex data sets for an optimal solution. Real world applications of GAs include automotive design, computer gaming, robotics, investments, traffic/shipment routing and more. Glossary. Chromosome: a sequence of Genes; a Chromosome represents a potential solution. Crossover: the process in which the genes within chromosomes are combined to derive new chromosomes. Fitness Score: a numerical score that measures the value of a particular Chromosome (i.e., solution) relative to other Chromosomes in the population. Gene: one of the discrete building blocks that make up the Chromosome. Genetic Algorithm (GA): a method of solving optimization problems by simulating the process of biological evolution. A GA continuously enhances a population of potential solutions. With each iteration, a GA selects the 'best fit' individuals from the current population to create offspring for the next generation. After subsequent generations, a GA will "evolve" the population toward an optimal solution. Mutation: the process where genes within a chromosome are randomly updated to produce new characteristics. Population: the collection of potential solutions, or Chromosomes. Selection: the process of choosing candidate solutions (Chromosomes) for the next generation.
  9. DEMO: run several ML samples from the standard distribution. Main benefits: No ETL (online "in place" ML); in-memory speed and scale; large scale parallelization; optimized ML/DL algorithms; last-mile GPU optimization. The rationale for building ML Grid is quite simple. Many users employ Ignite as the central high-performance storage and processing system for various data sets. If they wanted to perform ML or Deep Learning (DL) on these data sets (i.e., training sets or model inference), they had to ETL them first into some other system like Apache Mahout or Apache Spark. The roadmap for ML Grid is to start with a core algebra implementation based on Ignite co-located distributed processing. The initial version was released with Ignite 2.0. Future releases will introduce custom DSLs for Python, R and Scala, a growing collection of optimized ML algorithms such as Linear and Logistic Regression, Decision Tree/Random Forest, SVM and Naive Bayes, as well as support for Ignite-optimized Neural Networks and integration with TensorFlow. The current beta version of Apache Ignite Machine Learning Grid (ML Grid) supports a distributed machine learning library built on top of the highly optimized and scalable Apache Ignite platform and implements local and distributed vector and matrix algebra operations as well as distributed versions of widely used algorithms.
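
The contrast between client-server and collocated processing in notes 6 and 7 can be made concrete with a small Python simulation. It does not use any Ignite API: the cluster is modelled as a plain dictionary, and `build_cluster`, `client_server_notify`, `collocated_notify` and the node names are hypothetical, introduced only for this sketch. Each function returns the number of people notified together with the number of items that would have to cross the network, with 8 stand-in records in place of the 8 million in the blizzard example.

```python
from collections import defaultdict


def build_cluster(people, node_for_city):
    """Place each (name, city) record on the node that owns its city."""
    cluster = defaultdict(list)
    for name, city in people:
        cluster[node_for_city[city]].append((name, city))
    return cluster


def client_server_notify(cluster, city):
    """Pull every matching record to the client, then process it there."""
    moved = [rec for records in cluster.values() for rec in records if rec[1] == city]
    return len(moved), len(moved)  # (people notified, records moved)


def collocated_notify(cluster, node_for_city, city):
    """Send one task to the owning node; only a single count comes back."""
    owner = node_for_city[city]
    notified = sum(1 for _, c in cluster[owner] if c == city)
    return notified, 1  # (people notified, results moved)


people = [(f"ny-{i}", "New York") for i in range(8)]
people += [(f"bos-{i}", "Boston") for i in range(3)]
node_for_city = {"New York": "node-1", "Boston": "node-2"}
cluster = build_cluster(people, node_for_city)

client_server = client_server_notify(cluster, "New York")
collocated = collocated_notify(cluster, node_for_city, "New York")
print(client_server, collocated)  # (8, 8) (8, 1)
```

Both approaches notify the same people; the difference is that the network cost of the client-server version grows with the data set, while the collocated version returns one small result per task.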