Data Grids with Oracle Coherence
The Data Bottleneck – grid performance is coupled to how quickly the grid can send and receive data. The general solution is to scale the repository by replicating data across several machines. But data replication does not scale... data partitioning does. Coherence has also evolved functions for server-side processing: tools for providing reliable, asynchronous, distributed work that can be collocated with data. Coherence leverages these to provide the fastest, most scalable cache on the market. Low-latency access is facilitated by simplifying the contract.
Client DS Node DS Node DS Node DS Node Data Source
Client DS Node DS Node DS Node DS Node Data Source DS Node DS Node DS Node DS Node Bottleneck
 
Data on Disk DB DB DB DB DB DB
Data on Disk Greatly increases bandwidth available for reading data. Copy 6 Copy 1 Copy 2 Copy 5 Copy 4 Copy 3
Data is copied to different machines. What is wrong with this architecture? Copy 6 Copy 1 Copy 2 Copy 5 Copy 4 Copy 3
Data on Disk Record 1 = foo Record 1 = foo Record 1 = bar Client Writes Data Becomes out of date
Data on Disk Record 1 = foo Record 1 = foo Record 1 = bar Client Writes Data Lock Record 1
Copy 6 Copy 1 Copy 2 Copy 5 Copy 4 Copy 3 Client Writes Data Controller All nodes must be locked before data can be written => The Distributed Locking Problem
Client Each machine is responsible for a subset of the records. Each record exists on only one machine. 765, 769… 1, 2, 3… 97, 98, 99… 333, 334… 244, 245… 169, 170…
Node Node Node Node Node Node
 
The Post-It Note is a great example of Business Evolution – a product that starts its life in one role but evolves into something else… otherwise known as Accidental Genius. Coherence is another good example!
Database Node Node Node Node Node Node Client Client Client Near Cache
 
 
All data is stored as key value pairs [key1, value1] [key2, value2] [key3, value3] [key4, value4] [key5, value5] [key6, value6]
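Because the cluster is exposed as a map, basic access is just puts and gets. A minimal sketch, assuming the classic com.tangosol API and a cache named "trades" defined in the local cache configuration:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class BasicAccess {
    public static void main(String[] args) {
        // Joins (or starts) the cluster and returns the named cache.
        // "trades" is an assumed name; it must exist in the cache config.
        NamedCache cache = CacheFactory.getCache("trades");

        // All data is stored as key-value pairs.
        cache.put("key1", "value1");
        cache.put("key2", "value2");

        // The get is routed to whichever node owns the key.
        System.out.println(cache.get("key1")); // prints "value1"

        CacheFactory.shutdown();
    }
}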
Database Write (asynchronous) Bulk load at start-up Node Node Node Node Node Node
Coherence works to a simpler contract. It is efficient only for simple data access. As such it can do this one job quickly and scalably.
What is RAC? RAC is a clustered database which runs in parallel over several machines. It supports all the features of vanilla Oracle DB but has better scalability, fault tolerance and bandwidth. Clustered, therefore fault tolerant and scalable. Supports ACID, SQL etc. No disk access. DB DB DB DB DB DB Cache Node Cache Node Cache Node Cache Node Cache Node Cache Node
Fault Tolerant Scalable Fast
Database. Async write-behind. Near cache kept coherent via proactive update/expiry of data as it changes. Queries run in parallel where possible (aggregations etc). Objects held in serialised form. In-memory storage of data – no disk-induced latencies (unless you want them). Node Node Node Node Node Node Client Client Client Near Cache
In Memory Cache (Fragile) DB
Data is held on at least two machines. Thus, should one fail, the backup copy will still be available. Node 6 Node 1 Node 2 Node 5 Node 4 Node 3 Node 2 Backup The more machines, the faster failover will be!
Node Node Node Node Node Node Node Node Number of nodes (n) Processing / Storage / Bandwidth
Key Point: Resilience is sacrificed for speed
Summary so far…
 
Node Node Node Node Client UDP cache.get("foo") Foo Well-Known Hashing Algorithm
Client Connection Proxy Data Storage Process Primary Backup Connection Proxy myCache.put("Key", "Value"); Data Storage Process Primary Backup Data Storage Process Primary Backup Data Storage Process Primary Backup
Data Storage Process Primary Bar Primary Node X Backup Node X Backup Foo Redistribution Repartitioning Node X I think Node X has died I think Node X has died Death detection is vote based – there is no central management Redistribution Repartitioning Data Storage Process Primary Node X Backup Bar Data Storage Process Primary… Backup… Data Storage Process Primary… Backup… Consensus
Client A Near Cache key1, val Data Invalidation Message: value for key1 is now invalid Client B cache.put(key1, SomethingNew) cache.get(key1) Near Cache (in-process) Node Node Node Node Node Node
Client Lock(Key3) Process Unlock(Key3) 765, 769… 1, 2, 3… 97, 98, 99… 333, 334… 244, 245… 169, 170…
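For reference, here is roughly what the expensive lock/process/unlock pattern above looks like in code. A sketch only, using the lock and unlock methods a NamedCache inherits from Coherence's ConcurrentMap interface; the cache name, key, timeout and process() helper are illustrative:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class ExplicitLocking {
    public static void main(String[] args) {
        NamedCache cache = CacheFactory.getCache("trades"); // assumed cache name

        // Acquire a cluster-wide lock on the key, waiting up to 5 seconds.
        if (cache.lock("Key3", 5000)) {
            try {
                // Read-modify-write while holding the lock.
                Object value = cache.get("Key3");
                cache.put("Key3", process(value));
            } finally {
                cache.unlock("Key3"); // always release the lock
            }
        }
    }

    private static Object process(Object value) {
        return value + "-processed"; // stand-in for real business logic
    }
}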
Client cache.invoke(EP) 765, 769… 1, 2, 3… 97, 98… 333, 334… 244, 245… 169, 170… class Foo extends AbstractProcessor: public Object process(Entry entry) public Map processAll(Set setEntries) EntryProcessor Key is Locked Key is Unlocked Your code goes here Analogous to: Stored Procedures
Client cache.invoke(..) cache.invoke("Key3", new ValueChangingEntryProcessor("NewVal")); 765, 769… 1, 2, 3… 97, 98, 99… 333, 334… 244, 245… 169, 170…
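A hedged sketch of what the ValueChangingEntryProcessor named above might look like. AbstractProcessor and InvocableMap.Entry are the standard Coherence types; the class just needs to be serialisable so it can be shipped to the owning node:

import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;
import java.io.Serializable;

public class ValueChangingEntryProcessor extends AbstractProcessor
        implements Serializable {
    private final String newValue;

    public ValueChangingEntryProcessor(String newValue) {
        this.newValue = newValue;
    }

    // Runs on the node that owns the key; the key is locked for the duration.
    public Object process(InvocableMap.Entry entry) {
        Object old = entry.getValue();
        entry.setValue(newValue);  // written back to the cache on completion
        return old;                // returned to the calling client
    }
}

// Client side – four network hops rather than twelve for explicit locking:
// cache.invoke("Key3", new ValueChangingEntryProcessor("NewVal"));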
Run any arbitrary piece of code on any or all of the nodes. Client service.execute(some code) service.execute(new GCAgent(), null, null); Node Node Node Node Node Node
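A sketch of the GCAgent invocable shown above, under the assumption that an invocation service named "InvocationService" is defined in the operational configuration; AbstractInvocable is the standard convenience base class:

import com.tangosol.net.AbstractInvocable;
import com.tangosol.net.CacheFactory;
import com.tangosol.net.InvocationService;
import java.io.Serializable;

public class GCAgent extends AbstractInvocable implements Serializable {
    // Executes on each member the invocation is sent to.
    public void run() {
        System.gc(); // request a garbage collection on this node
    }

    public static void main(String[] args) {
        InvocationService service =
                (InvocationService) CacheFactory.getService("InvocationService");
        // Null member set = all service members; null observer = fire and forget.
        service.execute(new GCAgent(), null, null);
    }
}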
Client cache.put(foo) 765, 769… 1, 2, 3… 97, 98… 333, 334… 244, 245… 169, 170… class MyListener implements MapListener: public void entryInserted(MapEvent evt) public void entryUpdated(MapEvent evt) public void entryDeleted(MapEvent evt) MapListener Key is Locked Key is Unlocked Your code goes here
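A sketch of a listener implementing the three MapListener callbacks named above. The class name is made up; a backing map listener uses the same interface wired up server side through the cache configuration, whereas the registration shown in the comment is client side:

import com.tangosol.util.MapEvent;
import com.tangosol.util.MapListener;

public class LoggingListener implements MapListener {
    public void entryInserted(MapEvent evt) {
        System.out.println("inserted " + evt.getKey() + " = " + evt.getNewValue());
    }

    public void entryUpdated(MapEvent evt) {
        System.out.println("updated " + evt.getKey()
                + " from " + evt.getOldValue() + " to " + evt.getNewValue());
    }

    public void entryDeleted(MapEvent evt) {
        System.out.println("deleted " + evt.getKey());
    }
}

// Client-side registration: cache.addMapListener(new LoggingListener());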
Client cache.put(765, X) 765, 769… 1, 2, 3… 97, 98… 333, 334… 244, 245… 169, 170… class CacheStore: public void store(Object key, Object val) public void erase(Object key) CacheStore Retry Queue Exception Thrown Should multiple changes be made to the same key they will be coalesced Database
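A sketch of a CacheStore, assuming the AbstractCacheStore convenience base class and leaving the actual database calls as placeholder comments. Coherence invokes these callbacks itself once the store is declared in the cache configuration:

import com.tangosol.net.cache.AbstractCacheStore;

public class TradeCacheStore extends AbstractCacheStore {
    // Called on a cache miss when read-through is configured.
    public Object load(Object key) {
        // e.g. SELECT ... WHERE id = ? (placeholder for real JDBC/ORM code)
        return null;
    }

    // Called after a cache write (synchronously, or later if write-behind).
    public void store(Object key, Object value) {
        // e.g. INSERT or UPDATE the row for this key. If this throws,
        // Coherence queues the event and retries it later.
    }

    // Called after a cache remove.
    public void erase(Object key) {
        // e.g. DELETE the row for this key.
    }
}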
Comparing the four primary constructs: Entry Processor – locks the key, takes parameters and returns values, called by the client. Invocable – takes parameters and returns values, called by the client. Backing Map Listener – locks the key, called by Coherence, responds to a cache event. CacheStore – called by Coherence, responds to a cache event, is guaranteed (failed executions are retried), coalesces changes.
C# Object Java Object IPofSerialiser PofSerialiser Node Node Node Node Node Node
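For cross-language storage each data class implements Coherence's PortableObject interface (plus a type-id registration in the pof-config file, not shown). A sketch with an assumed Trade class:

import com.tangosol.io.pof.PofReader;
import com.tangosol.io.pof.PofWriter;
import com.tangosol.io.pof.PortableObject;
import java.io.IOException;

public class Trade implements PortableObject {
    private String ticker;
    private double quantity;

    public Trade() {
        // POF deserialisation requires a public no-arg constructor.
    }

    // Field indices must match between the Java and C# implementations.
    public void readExternal(PofReader in) throws IOException {
        ticker   = in.readString(0);
        quantity = in.readDouble(1);
    }

    public void writeExternal(PofWriter out) throws IOException {
        out.writeString(0, ticker);
        out.writeDouble(1, quantity);
    }
}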
C# MyInvocable Java Invocable Run on Server Mapping via pof-config file Data objects are marshalled as described in the previous slide Node Node Node Node Node Node
 
Affinity attributes define mappings between data Trades Market Data
Entry Processor Price trade based on market data Trade and market data for the same ticker are collocated Trades Market Data
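One common way to express the trade/market-data association is a key class implementing KeyAssociation, so that a trade lands in the same partition as the market data for its ticker. A sketch; the field names are illustrative:

import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

public class TradeKey implements KeyAssociation, Serializable {
    private final String tradeId;
    private final String ticker;

    public TradeKey(String tradeId, String ticker) {
        this.tradeId = tradeId;
        this.ticker = ticker;
    }

    // Entries whose associated keys are equal are stored in the same
    // partition, so trades are collocated with their market data and an
    // EntryProcessor pricing a trade can read the market data locally.
    public Object getAssociatedKey() {
        return ticker;
    }

    public boolean equals(Object o) {
        if (!(o instanceof TradeKey)) return false;
        TradeKey other = (TradeKey) o;
        return tradeId.equals(other.tradeId) && ticker.equals(other.ticker);
    }

    public int hashCode() {
        return tradeId.hashCode() * 31 + ticker.hashCode();
    }
}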
 
Two approaches compared: send code to data vs. send data to code.
Client DS Node DS Node DS Node DS Node Data Layer: Coherence Cluster Processing Layer: DataSynapse Grid Node Node Node Node Node Node
Client Processes are performed on the node where data exists Node Node Node Node Node Node
Client DS Node (Compute) DB (Data) Client Coherence Data + Compute
Feed Server Trade Cache Cache Store Domain Processing Update Trade Entry Processor
A very enticing proposition for new applications
 
Copy 6 Copy 1 Copy 2 Copy 5 Copy 4 Copy 3 Client Writes Data Controller
Client 765, 769… 1, 2, 3… 97, 98, 99… 333, 334… 244, 245… 169, 170…
 
In Memory Cache (Fragile) DB
Trades Market Data Data affinity Reliable Processing Client 1, 2, 3… 97, 98… 333, 334… 244, 245… 169, 170… Cache Store DB
So in conclusion…

Editor's Notes

  1. Welcome to the Coherence section of the Enterprise Engineering Program. The purpose of this section is to give you an understanding of what a data cache is, why one is useful for making an application both performant and scalable, and how Coherence, the bank's recommended data cache, works under the hood.
  2. We'll start off by looking at the problems arising from bottlenecking at data sources, in particular those alluded to by the previous speaker. Bottlenecks generally arise when a system accesses a data source located on a single physical machine, be it a client machine (as in the grid invocations described in previous lectures) or a database. Neither of these use cases scales as the data and processing requirements increase. Having understood the bottleneck problem we'll see how clustering is a suitable solution, as it spreads the data source across multiple machines, yielding scalability. The third section looks at Coherence in detail, focussing on the various functions offered to developers. Finally we'll reflect on how Coherence can be used as more than just a caching technology, becoming an application container that makes the construction of distributed, fault-tolerant, inherently scalable applications easier.
  3. Here we see a common grid use case with a client invoking a task on four DataSynapse grid engines, all of which source their data from the database (most likely simultaneously). There is an obvious problem with this architecture. What is it?
  4. The database becomes a bottleneck as the number of engines in the grid increases. The bottleneck arises because the database is generally located on a single physical machine, so physical constraints are placed on its scalability when it is used in conjunction with a scalable middle tier such as a DataSynapse compute grid. The key constraints are the bandwidth and CPU of the data server.
  5. Although Coherence may have a simple interface, behind it lies a powerful technology. Unlike some simple clustered data repositories, which rely on copies of the dataset being held on each machine, Coherence spreads its data across the cluster. Thus each machine is responsible for its own portion of the data set. In the example seen here, the user requests the key "2" from the cache (note that a cache is analogous to a table in a database; it is a single HashMap instance). The query for key "2" is directed to the single machine on which the data resides, in this case the node in the top left corner. A subsequent request for key "334" is routed to the machine in the bottom left corner, as it is this machine which is responsible for that key.
  6. So lets introduce Coherence – A story of accidental genius. Why accidental genius? Well…
  7. … but we'll come back to this later…
  8. Coherence is fundamentally a data repository (or data fabric – the term is coined because the data is held in a distributed manner across a set of machines, a cluster). It is however a special kind of data repository, designed with low-latency, highly available, distributed systems in mind. If you require fast access to prefabricated data (that is to say, data that has been pre-processed into the required form) in a distributed world, Coherence is likely to be your technology of choice. There are three important entities: the Coherence cluster itself, which is sandwiched between the client on the left and the persistent data source on the right. The client has its own, in-process, second-level cache. The persistent data source is usually only used for data writes; it does not contribute to data retrieval (as the cluster, in the centre of the diagram, will typically be pre-populated with data, but more on that later).
  9. And of course there is the possibility that it may be more than just a clever data repository… but we'll come to that later too…
  10. Having briefly introduced the technology, let's take a look at what it is, what it is not, and how it relates to other data repository technologies.
  11. Key concept: Coherence is just a map. All data is stored as key value pairs.
  12. In a typical installation Coherence will be pre-populated with data so that the cluster becomes the primary data source rather than just a caching layer sitting above it. The read-through pattern is not used in most implementations as it relies on the speed and scalability of the database tier.
  13. Coherence is not a database. It is a much lighter-weight product designed for fast data retrieval operations. Databases provide a variety of additional functionality which Coherence does not support, including ACID (Atomic, Consistent, Isolated and Durable) transactions, the joining of data in different caches (or tables) and all the features of the SQL language. Coherence does however support an object-based query language which is not dissimilar to SQL. Coherence is not suited to complex data operations or long transactions. It is designed for very fast data access via lookups based on simple attributes, e.g. retrieving a trade by its trade ID, writing a new trade, retrieving trades in a date range etc.
  14. Now let's compare Coherence with some other prominent products in the Oracle suite (which RBS favour). First, the relationship between Oracle RAC (Real Application Cluster) and Coherence. RAC is a clustered database technology. Being clustered it, like Coherence, is fault tolerant and highly available – that is to say that loss of a single machine will not significantly affect the running of the application. However, unlike Coherence, RAC is durable to almost any failure as data is persisted to (potentially several different) disks. Coherence's lack of disk access makes it significantly faster and thus the choice for many highly performant applications. Finally RAC supports SQL and thus can handle complex data processing.
  15. Coherence is fast, fault tolerant and scalable. Let's look at what makes it each of these things…
  16. Coherence is faster than most other data repositories for five main reasons. Firstly, it stores all data solely in memory: there is no need to access disk during data retrieval. Secondly, objects are always held in their serialised form (and there is a custom implementation of serialisation which outperforms the standard mechanism). Holding data in a serialised form allows Coherence to skip the serialisation step on the server, meaning that data requests take only one serialisation hit, occurring when they are deserialised on the client after a response. Note that both keys and values are held in their serialised form (and in fact the hash code has to be cached as a result of this). Thirdly, writes to the database are usually performed asynchronously (although this is configurable). Asynchronous persistence of data is desirable as it means Coherence does not have to wait for disk access on a potentially bottlenecked resource. As we'll see later it also does some clever stuff to batch writes to persistent stores to make them more efficient. The result of asynchronous database access is that writes to the Coherence cluster are fast and will stay fast as the cluster scales. The downside is that data could be lost should a critical failure occur. Fourthly, Coherence includes a second-level cache that sits in process on the client. This is analogous to a typical caching layer, holding on to some defined number of objects previously requested by the client. Coherence ensures that the data in these Near Caches is kept coherent (no pun intended :). Finally, queries which run across multiple data elements can be run in parallel as the data exists on different machines. This significantly optimises such queries.
  17. Coherence is both fault tolerant and highly available. That is to say that the loss of a single machine will not significantly impact the operation of the cluster. The reason for this resilience is that loss of a single node will result in a seamless failover to a backup copy held elsewhere in the cluster. All operations that were running on the node when it went down will also be re-executed elsewhere. It is worth emphasising that this is one of the most powerful features of the product. It can efficiently detect node loss and deal with it. It also deals with the addition of new nodes in the same seamless manner.
  18. Coherence holds data on only one machine (two if you include the backup). Thus each node added to an n-node cluster increases the storage capacity by a further 1/n. CPU and bandwidth capacity are increased too as machines are added. This allows the cluster to scale linearly through the simple addition of commodity hardware. There is no need to buy bigger and bigger boxes.
  19. So in summary: scaling up an application implies scaling the ability to access data. Without a scalable data layer, a bottleneck will inevitably occur.
  20. Coherence is a data source that will scale processing power, bandwidth and storage through the addition of commodity hardware.
  21. However the price paid is two-fold. It lacks the true durability of a database (even though it is fault-tolerant) and it lacks the ability to efficiently process complex data. But these sacrifices are made to enable very low latency, scalable data access.
  22. The Well-Known Hashing Algorithm is the algorithm used to determine on which machine each hash bucket will be stored. This algorithm is distributed to all members of the cluster, hence "well known". This has the effect that the location of every key is known to all nodes.
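To make the idea concrete, here is a toy illustration (not Coherence's actual algorithm): because every member applies the same deterministic function to a key, any node can compute the owner without consulting a central directory.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WellKnownHash {
    private static final int PARTITIONS = 257; // illustrative partition count

    // Every cluster member runs the same pure function, so the partition
    // (and hence the owning node) of any key is known everywhere.
    static int partitionFor(String key) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, PARTITIONS);
    }

    public static void main(String[] args) {
        System.out.println("'foo' -> partition " + partitionFor("foo"));
        System.out.println("'334' -> partition " + partitionFor("334"));
    }
}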
  23. Now looking at writing data to the cluster: the flow is similar to that of gets, with the put travelling through a connection proxy which locates the data to be written and forwards on the write. The difference is that writes must also be written to the backup copy, which will exist on a different machine. This adds two extra network hops to the transaction.
  24. Coherence includes a clever mechanism for detecting and responding to node failure. In the example given here node X suffers a critical failure due to, say, a network outage or machine failure. The surrounding cluster members broadcast alerts stating that they have not heard from Node X for some period of time. If several nodes raise alerts about the same machine, a group decision is made to orphan the lost node from the cluster. Once Node X has been removed from the cluster the backup of its data, seen here on the node to its left, is instantly promoted to being a primary store. This is quickly followed by the redistribution of data around the cluster to fully back up all data and to ensure there is an even distribution across the cluster. The redistribution step is throttled to ensure it does not swamp cluster communication. However this step completes more quickly on larger clusters, where less data must be redistributed to each node.
  25. All client processes can configure a near cache that sits "in process". This cache provides an in-process repository of recently requested values. Coherence takes responsibility for keeping the data in each near cache coherent. Thus in the example shown here Client A requests key1 from the cluster. This is returned and the key-value pair is stored in the client's in-process near cache. Next, Client B writes a new value to key1 from a different process. Coherence messages all other clients that have the value near-cached, notifying them that the value for key1 has changed. Note that this is dynamic invalidation: the new value is not passed in the message. Should Client A make a subsequent request for key1 it will fall through to the server to retrieve the latest value. Thus near caching is a great way to store data which may be needed again by a client process.
  26. Locking keys directly is supported in Coherence, but it is expensive. In the example here a client locks a key, performs an action and then unlocks it again. This takes a scary 12 network hops to complete. Fortunately, there is a better way…
  27. In this example the client invokes an Entry Processor against a specific key in the cache. A serialised version of the entry processor is passed from the client to the cluster. The cluster locks the key and executes the passed Entry Processor code. The Entry Processor performs the set of actions defined in the process() method. The cluster unlocks the key. Thus an arbitrary piece of code is run against a key on the server.
  28. Here we see an example of an entry processor, the ValueChangingEntryProcessor which updates the value associated with a certain key. Note that in contrast to the locking example described on a previous slide, this execution involves only 4 rather than 12 network hops.
  29. Invocables are the second of the four primary constructs and are analogous to a DataSynapse grid task in that they allow an arbitrary piece of code to be run on the server. Invocables are similar to Entry Processors except that they are not associated with any particular key. As such they can be defined to run against a single machine or across the whole cluster. In the example here an Invocable is used to invoke a garbage collection on all nodes of the cluster. Another good example of the use of Invocables is the bulk loading of data, with Invocables used to parallelise the execution across the available machines.
  30. Backing Map Listeners are the third of the four primary constructs and are analogous to triggers in a database. In the example here the client writes a tuple to the cache, and in response to this event a Backing Map Listener fires, executing some user-defined code. The code is executed synchronously, that is to say the key is locked for the duration of the execution.
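  A sketch of what such a listener might look like; it is a plain MapListener registered against the backing map via configuration rather than in code, and the class shown here is illustrative:

```java
import com.tangosol.util.AbstractMapListener;
import com.tangosol.util.MapEvent;

// On the server the event carries data in its internal (serialised
// Binary) form, so real implementations typically convert values back
// via the BackingMapManagerContext before using them.
public class AuditingBackingMapListener extends AbstractMapListener {
    public void entryInserted(MapEvent event) {
        // Fires synchronously on the owning node: the key stays locked
        // until this method returns, so keep the work short.
        System.out.println("Inserted key: " + event.getKey());
    }
}
```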
  31. The last of the four primary constructs is the CacheStore. CacheStores are usually used to persist data to a database, and contain built-in retry logic should an exception be thrown during their execution. Looking at the example here: the client writes a tuple to the cache. This event causes a CacheStore to fire in an attempt to persist the tuple (note that this may be executed synchronously or asynchronously). In this case the user-defined code in the CacheStore throws an exception. The CacheStore catches the exception and adds the store event to a retry queue. A defined period of time later, the CacheStore is called again; this time the execution succeeds and the tuple is written to the database. The retry queue is fault tolerant: so long as the cluster is up, it will continue to retry store events until they succeed. Should multiple values be received for the same key during the write delay of an asynchronous CacheStore, the values will be coalesced, that is to say only the most recent tuple will be persisted. This coalescing also applies to the retry queue.
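  A sketch of a database-backed CacheStore; the Dao class here is a hypothetical stand-in for real JDBC or ORM code:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import com.tangosol.net.cache.CacheStore;

public class TradeCacheStore implements CacheStore {

    public void store(Object key, Object value) {
        // Throwing here places the event on the retry queue when the
        // store is configured for asynchronous write-behind.
        Dao.save(key, value);
    }

    public void storeAll(Map entries) {
        for (Iterator it = entries.entrySet().iterator(); it.hasNext(); ) {
            Map.Entry e = (Map.Entry) it.next();
            Dao.save(e.getKey(), e.getValue());
        }
    }

    public void erase(Object key) {
        Dao.delete(key);
    }

    public void eraseAll(Collection keys) {
        for (Iterator it = keys.iterator(); it.hasNext(); ) {
            Dao.delete(it.next());
        }
    }

    // CacheLoader methods: used to read through to the database on a miss.
    public Object load(Object key) {
        return Dao.find(key);
    }

    public Map loadAll(Collection keys) {
        Map result = new HashMap();
        for (Iterator it = keys.iterator(); it.hasNext(); ) {
            Object key = it.next();
            result.put(key, Dao.find(key));
        }
        return result;
    }

    // Hypothetical stand-in for real persistence code.
    static class Dao {
        static void save(Object key, Object value) { /* INSERT/UPDATE */ }
        static void delete(Object key)             { /* DELETE */ }
        static Object find(Object key)             { return null; /* SELECT */ }
    }
}
```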
  32. Thus, to summarise the four primary constructs: Both Entry Processors and Invocables are called from the client but run on the server. Both accept parameters during construction and can return values after their execution. Backing Map Listeners and CacheStores both run on the cluster in response to cache events. Backing Map Listeners, like Entry Processors, lock the key for which they are executing. Synchronous CacheStores also lock, but their use in asynchronous mode tends to be more common. CacheStores are guaranteed, in that they will retry should execution fail, and this retry logic is fault tolerant (it will retry on a different machine should the one it is running on fail). They also coalesce changes.
  33. Coherence currently supports Java and .NET as client platforms, with C++ being added later this year. Communication between languages is achieved by converting objects into a language-neutral binary format known as POF (Portable Object Format). In the example the C# client defines the POF serialisation routine, which is executed by the IPofSerialiser (written in C#) to create a POF object that is stored in the cluster. When a Java client requests the same object it is inflated by the PofSerialiser (written in Java) to create a comparable Java object.
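  On the Java side, a POF-enabled class might look like this sketch (the Trade class and its POF indexes are illustrative; the indexes must match those used by the equivalent C# class so that both languages read and write the same binary layout):

```java
import java.io.IOException;
import com.tangosol.io.pof.PofReader;
import com.tangosol.io.pof.PofWriter;
import com.tangosol.io.pof.PortableObject;

public class Trade implements PortableObject {
    private String ticker;
    private int quantity;

    public Trade() {}  // POF deserialisation requires a public no-arg constructor

    public void readExternal(PofReader in) throws IOException {
        ticker   = in.readString(0);
        quantity = in.readInt(1);
    }

    public void writeExternal(PofWriter out) throws IOException {
        out.writeString(0, ticker);
        out.writeInt(1, quantity);
    }
}
```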
  34. The previous slide covered the marshalling of data from one language to another. However, non-Java clients also need to execute code on the cluster and, as the cluster is written in Java, any code run there must also be in Java. To solve this problem, server-side code, such as the Invocable shown here, is mapped from a C# implementation on the client to a Java implementation on the server. Thus calling MyInvocable in C# will result in the Java version of MyInvocable being run on the server, with the objects it uses marshalled from one language to the other via POF (as described in the previous slide).
  35. Data affinity allows associations to be set up between data in different caches, so that the associated objects are collocated on the same machine. In the example here, trade data and market data are linked via the ticker, meaning that all trades for ticker ATT will be stored on the same machine as the ATT market data.
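  A sketch of how such an association might be expressed in code, using Coherence's KeyAssociation interface (the TradeKey class is illustrative):

```java
import com.tangosol.net.cache.KeyAssociation;

// Trades keyed by TradeKey will be stored in the same partition (and
// hence on the same machine) as market data keyed by the plain ticker.
public class TradeKey implements KeyAssociation {
    private final String tradeId;
    private final String ticker;

    public TradeKey(String tradeId, String ticker) {
        this.tradeId = tradeId;
        this.ticker  = ticker;
    }

    public Object getAssociatedKey() {
        // All keys returning the same associated key land in the same partition.
        return ticker;
    }

    // equals(), hashCode() and serialisation omitted for brevity,
    // but required in practice.
}
```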
  36. Thus when an Entry Processor executes, say to run a trade-pricing routine, it can access both the trade and its market data without making a wire call, as the market data for that particular trade will be held on the same machine (whenever possible).
  37. In a standard architecture (the upper example) data is retrieved from a data source and sent to the application tier for processing. In the Coherence Application-Centric approach (the lower example), however, the code is sent to the machine that holds the data, and executed there. This is one of the real penny-dropping concepts that can revolutionise a system's performance. But it is important to note that Coherence is not a direct substitute for a compute grid such as DataSynapse. Application-Centric Coherence means leveraging the inherent distribution Coherence provides, as well as its collocation of processing and data.
  38. The classic Service-Centric approach to using Coherence is described in this slide: a set of DataSynapse grid nodes source their data from a Coherence data cluster. But as we have seen, Coherence allows you to run arbitrary routines in a distributed manner across the cluster, running them on the nodes on which the data lives.
  39. This presents the possibility of folding the compute grid and data cluster tiers together, so that parallel execution occurs solely across the data cluster. This has the fundamental advantage that far less data needs to be transmitted across the wire.
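  A sketch of what this folding looks like in code, reusing the hypothetical ValueChangingEntryProcessor from the earlier sketch (the "repriced" value and cache name are purely illustrative):

```java
import java.util.Map;
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.filter.AlwaysFilter;

public class ParallelReprice {
    public static void main(String[] args) {
        NamedCache trades = CacheFactory.getCache("trades");

        // Run the processor against every entry: each node executes it
        // locally, in parallel, over the entries it owns, so only the
        // (small) results cross the wire.
        Map results = trades.invokeAll(
            AlwaysFilter.INSTANCE,
            new ValueChangingEntryProcessor("repriced"));

        System.out.println("Processed " + results.size() + " trades");
    }
}
```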
  40. Thus, looking at the anatomy of a simple Application-Centric deployment, we see: a feed server enters a trade into the Trade cache, using an Entry Processor to execute some pre-processing. This in turn fires a CacheStore, which reliably executes some domain processing for that trade on the same machine. The domain processing results in the trade being updated in the cache. One of the key benefits of this architecture is the inherent distribution of processing: as feeds come in for different trades, the domain processing for each one is executed on the machine on which that trade's data is held. Not only is the execution collocated with the data, the executions are also implicitly load-balanced across the Coherence cluster.
  41. So, falling back to my opening comment about Post-It Notes and Business Evolution, Coherence has evolved from being a data repository into an application container which provides: free distribution of processing across multiple machines, free fault tolerance, and free scalability to potentially thousands of machines. This is an enticing proposition for any new application.
  42. Although Coherence may have a simple interface, behind it lies a powerful technology. Unlike some simple clustered data repositories, which rely on copies of the full dataset being held on each machine, Coherence spreads its data across the cluster, so that each machine is responsible for its own portion of the data set. In the example seen here, the user requests the key “2” from the cache (note that a cache is analogous to a table in a database: it is a single HashMap instance). The query for key “2” is directed to the single machine on which that datum resides, in this case the node in the top left corner. A subsequent request for key “334” is routed to the machine in the bottom left corner, as it is that machine which is responsible for the key.
  43. So, in conclusion, we have seen that Coherence provides a variety of additional functionality to facilitate the highly available execution of domain processing across a cluster of machines. Thus a simple data-caching layer has become a framework for distributed, fault-tolerant processing.