Paris NoSQL User Group - In Memory Data Grids in Action (without transactions chapter)

In Memory Data Grids in Action with Oracle Coherence, presented to NoSQL users.
The "transactions" chapter is missing because it has been rescheduled to another session.

Published in: Technology

Transcript of "Paris NoSQL User Group - In Memory Data Grids in Action (without transactions chapter)"

  1. In Memory Data Grid in Action with Oracle Coherence, for Paris NoSQL User Group. Cyrille Le Clerc. Wednesday, May 25, 2011. Note: the transactions chapter will be presented during another session.
  2. Speaker: Cyrille Le Clerc (@cyrilleleclerc, blog.xebia.fr). Large scale; In Memory Data Grid; Open Source (Apache CXF, ...); “you build it, you run it”.
  3. Once upon a time...
  4. On the financial side. Needs within the financial market: very low latency, rich queries & transactions, scalability, data consistency. Coherence was released in 2001 and started as a distributed cache. GigaSpaces XAP was released in 2001 and started as a data grid.
  5. Let’s define an In Memory Data Grid ...
  6. Let’s define an In Memory Data Grid: eXtreme Scale. This is an In Memory Data Grid.
  7. Let’s define an In Memory Data Grid: this is Network Attached Memory.
  8. Let’s define an In Memory Data Grid. Similarities with document-oriented NoSQL: partitioned, distributed hashtable, schema-less, value is not opaque, scale-out scalability. Very fast: in memory (persistence coming), business logic inside the data. Consistent and Available: transactional, redundant. Written in Java, data are POJOs: not necessarily, clients exist in Java, Microsoft, etc.
  9. Use cases for this presentation
  10. Train Booking System: trains, stations, seats, booking and passengers.
  11. eCommerce Web Site: warehouse stocks and customers' shopping carts. (Diagram: shopping carts holding one unit each of items such as canon-eos, ipod, headphone, iphone, ipad, barbie and cabbage-doll; warehouse stock entries such as { "name": "Barbie Computer", "stock": 637, "weight": 200 }.)
  12. In Memory Data Grids Key Principles
  13. Store Everything in a Mainframe! 3 TB of RAM, 80 x 5.2 GHz cores, much more than $1,000,000. IBM z11 (http://ibm.com/).
  14. Spread on Inexpensive Servers: from a mainframe (http://ibm.com/) to cheap servers (http://1userverrack.net/)!
  15. Partition Data: the data held by the mainframe is split into partitions alpha, beta and gamma, spread over small servers. Partition for scalability.
  16. Duplicate Data: the master copy of partition alpha is synchronously replicated to a standby backup. Duplicate data for high availability.
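
The two principles above (partition for scalability, duplicate for high availability) can be sketched in plain Java. This is only an illustration under simple assumptions: the class name `PartitionedGridSketch`, the modulo-hash routing and the single in-process backup map are hypothetical and are not how Coherence, GigaSpaces or eXtreme Scale are implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: hash partitioning with one synchronous backup per partition. */
class PartitionedGridSketch {
    private final int partitionCount;
    private final List<Map<String, String>> primaries = new ArrayList<>();
    private final List<Map<String, String>> backups = new ArrayList<>();

    PartitionedGridSketch(int partitionCount) {
        this.partitionCount = partitionCount;
        for (int i = 0; i < partitionCount; i++) {
            primaries.add(new HashMap<>());
            backups.add(new HashMap<>());
        }
    }

    /** The key alone decides the partition, so any client can route a request. */
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Synchronous duplication: the write returns once both copies are updated. */
    void put(String key, String value) {
        int p = partitionOf(key);
        primaries.get(p).put(key, value);
        backups.get(p).put(key, value);
    }

    String get(String key) {
        return primaries.get(partitionOf(key)).get(key);
    }

    /** Simulates the loss of a primary copy: the backup is promoted and re-duplicated. */
    void failPrimary(int partition) {
        primaries.set(partition, backups.get(partition));
        backups.set(partition, new HashMap<>(primaries.get(partition)));
    }
}
```

In a real grid the primary and the backup live on different servers, which is what makes the duplication useful.
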
  17. Data Access Patterns
  18. Data Access Patterns: this is not traditional Java EE coding style! You can apply very complex business logic inside the data, stored-procedure style. This is a change management challenge!
  19. Pattern: Targeted Operation
  20. Pattern: Targeted Operation. The “Book Train Tickets” request { "train-id": "tgv-3071-20110512", "time": "2011/05/12 12:15", "departure": "Paris", "arrival": "Marseille", "seats": 3 } targets one partition among alpha, beta and gamma, each of which hosts the “Search Trains” logic. “train-id” is indexed.
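
A targeted operation can be sketched as follows: the booking logic runs against the single partition that owns the train, and no other partition is visited. The names (`TargetedOperationSketch`, `book`, `partitionsVisited`) are hypothetical, and the free-seat counter is a simplification of the seat list shown later in the deck.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: the operation is routed by key to exactly one partition. */
class TargetedOperationSketch {
    private final List<Map<String, Integer>> partitions = new ArrayList<>();
    private int partitionsVisited = 0;

    TargetedOperationSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    private Map<String, Integer> ownerOf(String trainId) {
        return partitions.get(Math.floorMod(trainId.hashCode(), partitions.size()));
    }

    void addTrain(String trainId, int freeSeats) {
        ownerOf(trainId).put(trainId, freeSeats);
    }

    /** Business logic executed "inside the data": check and update on the owning partition only. */
    boolean book(String trainId, int seats) {
        Map<String, Integer> owner = ownerOf(trainId);
        partitionsVisited++;
        Integer free = owner.get(trainId);
        if (free == null || free < seats) {
            return false;
        }
        owner.put(trainId, free - seats);
        return true;
    }

    int freeSeats(String trainId) {
        return ownerOf(trainId).getOrDefault(trainId, 0);
    }

    int partitionsVisited() {
        return partitionsVisited;
    }
}
```
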
  21. Pattern: Map Reduce Style Operation
  22. Pattern: Map Reduce. The query { "departure": "Paris", "arrival": "Marseille", "time": "2011/05/12 12:00", "seats": 3 } is sent to every partition (alpha, beta and gamma); each one runs “Search Trains” on its own data. Distributed “Search Train Ticket”.
  23. Pattern: Map Reduce. Each partition returns its partial result: { "Paris -> Marseille : 12:15", "Paris -> Marseille : 13:15" }, { #NONE# } and { "Paris -> Lyon -> Marseille : 12:40" }. Distributed “Search Train Ticket”.
  24. Pattern: Map Reduce. The partial results are merged into a single answer: { "Paris -> Marseille : 12:15", "Paris -> Lyon -> Marseille : 12:40", "Paris -> Marseille : 13:15" }. Distributed “Search Train Ticket”.
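
The map and reduce steps of the three slides above can be sketched like this. The `Trip` class and its fields are hypothetical, partitions are simulated as in-process lists, and the "map" calls run sequentially here whereas a grid would run them in parallel on each node.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: query every partition, then merge and sort the partial results. */
class MapReduceSearchSketch {
    /** Hypothetical flattened entry; departureTime is "HH:mm", so it sorts correctly as text. */
    static final class Trip {
        final String route;
        final String departure;
        final String arrival;
        final String departureTime;

        Trip(String route, String departure, String arrival, String departureTime) {
            this.route = route;
            this.departure = departure;
            this.arrival = arrival;
            this.departureTime = departureTime;
        }

        @Override
        public String toString() {
            return route + " : " + departureTime;
        }
    }

    /** Map step: runs inside one partition, against that partition's data only. */
    static List<Trip> searchPartition(List<Trip> partitionData, String departure,
                                      String arrival, String notBefore) {
        List<Trip> matches = new ArrayList<>();
        for (Trip t : partitionData) {
            if (t.departure.equals(departure) && t.arrival.equals(arrival)
                    && t.departureTime.compareTo(notBefore) >= 0) {
                matches.add(t);
            }
        }
        return matches;
    }

    /** Reduce step: gather the partial results and order them by departure time. */
    static List<String> search(List<List<Trip>> partitions, String departure,
                               String arrival, String notBefore) {
        List<Trip> merged = new ArrayList<>();
        for (List<Trip> partition : partitions) {
            merged.addAll(searchPartition(partition, departure, arrival, notBefore));
        }
        merged.sort((a, b) -> a.departureTime.compareTo(b.departureTime));
        List<String> result = new ArrayList<>();
        for (Trip t : merged) {
            result.add(t.toString());
        }
        return result;
    }
}
```

Without an index, `searchPartition` looks at every entry of every partition, which is the "distributed table scan" the deck warns about.
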
  25. Data Access Patterns: this is not traditional Java EE coding style, so plan for change management. Don’t forget that “Map Reduce” = “Distributed Table Scan”: use indexes.
  26. CAP Theorem & In Memory Data Grids
  27. CAP Theorem and In Memory Data Grid. Brewer’s Conjecture: only 2 of these 3 properties can be achieved at any given moment in time: Consistency, Availability, Partition Tolerance. http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
  28. CAP Theorem and In Memory Data Grid. Same diagram as the previous slide, with Data Grids positioned among Consistency, Availability and Partition Tolerance. http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
  29. Cross Data Center Data Consistency: London, New York, Tokyo. Worldwide replication for the financial market.
  30. Cross Data Center Data Consistency. Warehouse stocks: the West Coast and East Coast data centers both hold { "name": "Barbie Computer", "stock": 147, "weight": 200 }.
  31. Cross Data Center Data Consistency. One data center receives “set stock to 146” while the other copy still reads { "name": "Barbie Computer", "stock": 147, "weight": 200 }. Propagation delay!
  32. Cross Data Center Data Consistency. One data center receives “set stock to 146” while the other receives “set weight 175” on its copy of the same entry. Reconciliation API needed!
  33. Cross Data Center Data Consistency. The same concurrent updates (“set stock to 146” and “set weight 175”), now with network partitioning between West Coast and East Coast.
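
The deck does not show what a reconciliation API looks like. One possible approach, shown here purely as an assumption, is a field-level three-way merge: each site's copy is compared with the last state both sites agreed on, non-conflicting changes are combined, and a conflict is reported when both sites changed the same field differently. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Illustrative sketch: field-level three-way merge of two diverged copies of one entry. */
class ReconciliationSketch {
    /**
     * Merges the west and east copies relative to their common base.
     * Assumes all three maps carry the same set of fields.
     * Throws IllegalStateException when both sites changed a field to different values.
     */
    static Map<String, Object> reconcile(Map<String, Object> base,
                                         Map<String, Object> west,
                                         Map<String, Object> east) {
        Map<String, Object> merged = new HashMap<>(base);
        for (String field : base.keySet()) {
            Object original = base.get(field);
            Object westValue = west.get(field);
            Object eastValue = east.get(field);
            boolean westChanged = !Objects.equals(original, westValue);
            boolean eastChanged = !Objects.equals(original, eastValue);
            if (westChanged && eastChanged && !Objects.equals(westValue, eastValue)) {
                throw new IllegalStateException("Conflict on field '" + field + "'");
            }
            if (westChanged) {
                merged.put(field, westValue);
            } else if (eastChanged) {
                merged.put(field, eastValue);
            }
        }
        return merged;
    }
}
```

With the slide's example, the West Coast change (stock 146) and the East Coast change (weight 175) touch different fields, so the merged entry carries both. Two different stock values would need a business rule, which this sketch only signals with an exception.
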
  34. Data Modeling
  35. Data Modeling. Dominant Question Driven Design: opposite to relational modeling, which is Domain Driven Design. Constrained Tree Schema: because RPC matters. Denormalized: due to dominant questions and the Constrained Tree Schema (CTS).
  36. Data Modeling: typical relational data model. Entities: Seat (number, price), Booking (reduction), Passenger (name), Train (code, type), TrainStop (date), TrainStation (code, name).
  37. Data Modeling: partitioning-ready entities tree. Train is the root entity of the tree; TrainStation is reference data, duplicated in each data grid node. Find the root entity and denormalize.
  38. Data Modeling: remove unused data. Booking (reduction) and Passenger (name) are dropped, and Seat gains a “booked” attribute. The Train tree is partitioned; TrainStation is replicated.
  39. Data Modeling: Data Grid ready data structure. Partitioned: Train (code, type) with Seat (number, price, booked) and TrainStop (date). Replicated: TrainStation (code, name).
  40. Data Modeling is Hard!
  41. Data Modeling is Hard! Two Account entities (number), each with CashWithdrawal (date, amount), are linked by a MoneyTransfer (id, date, amount) going from one account to the other. Two root entities for the same MoneyTransfer!
  42. Data Modeling is Hard! Split MoneyTransfer: each Account (number) keeps its CashWithdrawal (date, amount); the transfer becomes a MoneyTransferOut (id, date, amount) on one account and a MoneyTransferIn (id, date, amount) on the other.
  43. Data Modeling is Hard! Split MoneyTransfer (same entities as the previous slide).
  44. Data Modeling is Hard! Data Grid ready data structure: Account (number) with CashWithdrawal (date, amount), MoneyTransferOut (id, date, amount) and MoneyTransferIn (id, date, amount).
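
The MoneyTransfer split can be sketched as follows: one logical transfer is stored twice, once under each Account root, and the two entries share the same id. All class and field names are hypothetical, dates are omitted for brevity, and amounts are held in cents as an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: a transfer stored as an "out" entry and an "in" entry, one per root entity. */
class AccountModelSketch {
    static final class TransferEntry {
        final String id;
        final long amountInCents;
        final boolean outgoing; // true = MoneyTransferOut, false = MoneyTransferIn

        TransferEntry(String id, long amountInCents, boolean outgoing) {
            this.id = id;
            this.amountInCents = amountInCents;
            this.outgoing = outgoing;
        }
    }

    /** Root entity: everything below it lives in the same partition as the account. */
    static final class Account {
        final String number;
        final List<TransferEntry> transfers = new ArrayList<>();

        Account(String number) {
            this.number = number;
        }

        long balanceDelta() {
            long delta = 0;
            for (TransferEntry t : transfers) {
                delta += t.outgoing ? -t.amountInCents : t.amountInCents;
            }
            return delta;
        }
    }

    private final Map<String, Account> accounts = new HashMap<>();

    Account account(String number) {
        return accounts.computeIfAbsent(number, Account::new);
    }

    /** Two writes, possibly on two partitions; keeping them atomic is a transaction concern. */
    void transfer(String id, String from, String to, long amountInCents) {
        account(from).transfers.add(new TransferEntry(id, amountInCents, true));
        account(to).transfers.add(new TransferEntry(id, amountInCents, false));
    }
}
```
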
  45. Grid Internals
  46. Data Serialization: used for data transfer and byte-oriented storage. Must support evolvable data structures. A hot topic, see Apache Thrift, Apache Avro and Google Protocol Buffers.
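
What "evolvable" means can be shown with a minimal sketch based on the JDK's `DataOutputStream`: the payload starts with a version number, and the reader tolerates older payloads that lack newer fields. This is an assumption-laden illustration (class names, the `seatCount` field and the version scheme are hypothetical), not the wire format of POF, Thrift, Avro or Protocol Buffers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative sketch: version-prefixed serialization that can still read old payloads. */
class EvolvableSerializationSketch {
    static final class Train {
        String code;
        String name;
        int seatCount; // hypothetical field added in version 2
    }

    /** Payload as an old (version 1) writer would have produced it. */
    static byte[] writeV1(String code, String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(1);
            out.writeUTF(code);
            out.writeUTF(name);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Current (version 2) writer: same leading fields, new field appended at the end. */
    static byte[] writeV2(Train train) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(2);
            out.writeUTF(train.code);
            out.writeUTF(train.name);
            out.writeInt(train.seatCount);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reader for both versions: fields missing from old payloads get a default value. */
    static Train read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int version = in.readInt();
            Train train = new Train();
            train.code = in.readUTF();
            train.name = in.readUTF();
            train.seatCount = version >= 2 ? in.readInt() : 0;
            return train;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```
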
  47. Data Storage. Option 1, store Java Beans in the grid: no need to unmarshal for in-process operations, but beware of the garbage collector! Option 2, store byte arrays in the grid: you pay unmarshalling at each read and write and use low-level, byte-oriented APIs to read data, but it is slightly more garbage collector friendly.
  48. Communication Protocols: UDP multicast (Coherence, GigaSpaces); TCP/IP (WebSphere eXtreme Scale).
  49. Topology: partitions are made of shards (1 primary + 0..* backups). Shard locations are dynamic (they change at runtime and at restart). Can use dedicated “directory servers” or embed the directory in the “data nodes”.
  50. JVM and Memory: many vendors recommend tiny 1.4 GB JVMs! Garbage collector hell. More than ten JVMs per server: management hell. More and more IMDGs support large heaps.
  51. APIs
  52. Raw Java Mapping with Oracle Coherence. Hand-coded serialization: JUnit is your friend! (Diagram: Train [code, type] with Seat [number, price, booked] and TrainStop [date].)

        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.List;

        import com.tangosol.io.AbstractEvolvable;
        import com.tangosol.io.pof.PofReader;
        import com.tangosol.io.pof.PofWriter;
        import com.tangosol.io.pof.PortableObject;

        public class Train extends AbstractEvolvable implements PortableObject {
            enum Type { HIGH_SPEED, NORMAL }

            /** Key of the Cache */
            String code;
            /** Indexed */
            String name;
            Type type;
            List<Seat> seats = new ArrayList<Seat>();
            int version;
            List<TrainStop> trainStops = new ArrayList<TrainStop>();

            @Override
            public int getImplVersion() { return 1; }

            // The property indexes (0..5) must match between readExternal and writeExternal.
            @Override
            public void readExternal(PofReader pofReader) throws IOException {
                this.code = pofReader.readString(0);
                this.name = pofReader.readString(1);
                this.type = (Type) pofReader.readObject(2);
                pofReader.readCollection(3, this.seats);
                pofReader.readCollection(4, this.trainStops);
                this.version = pofReader.readInt(5);
            }

            @Override
            public void writeExternal(PofWriter pofWriter) throws IOException {
                pofWriter.writeString(0, this.code);
                pofWriter.writeString(1, this.name);
                pofWriter.writeObject(2, this.type);
                pofWriter.writeCollection(3, this.seats, Seat.class);
                pofWriter.writeCollection(4, this.trainStops, TrainStop.class);
                pofWriter.writeInt(5, this.version);
            }
        }

  53. JPA Style Mapping with WebSphere eXtreme Scale. Sub entities can have cross relations. (Diagram: Train [code, type] with Seat [number, price, booked] and TrainStop [date].)

        @Entity(schemaRoot=true)
        public class Train {
            @Id
            String code;

            @Index
            @Basic
            String name;

            @OneToMany(cascade=CascadeType.ALL)
            List<Seat> seats = new ArrayList<Seat>();

            @Version
            int version;
            ...
        }

  54. Map API with Oracle Coherence.

        import java.util.Collections;
        import java.util.Map;
        import java.util.Set;

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;
        import com.tangosol.util.Filter;
        import com.tangosol.util.QueryHelper;

        NamedCache trainCache = CacheFactory.getCache("train-cache");

        /** Save */
        void persist(Train train) {
            trainCache.put(train.getCode(), train);
        }

        /** Find by key */
        Train findByCode(String code) {
            return (Train) trainCache.get(code);
        }

        /** Find by Query Language */
        Train findByTrainName(String name) {
            Filter filter = QueryHelper.createFilter("name = :name",
                    Collections.singletonMap("name", name));
            Set<Map.Entry<String, Train>> trainEntrySet = trainCache.entrySet(filter);
            if (trainEntrySet.isEmpty()) {
                return null;
            } else {
                return trainEntrySet.iterator().next().getValue();
            }
        }

  55. JPA Style with WebSphere eXtreme Scale: JPA style Entity Manager.

        /** Save */
        void persist(Train train) {
            entityManager.persist(train);
        }

        /** Find by key */
        Train findByCode(String code) {
            return (Train) entityManager.find(Train.class, code);
        }

        /** Query Language */
        Train findByTrainName(String name) {
            Query q = entityManager.createQuery("select t from Train t where t.name=:name");
            q.setParameter("name", name);
            return (Train) q.getSingleResult();
        }

  56. Creating Indexes: Map reduce (without index) = Distributed Table Scan!
  57. Indexes with Oracle Coherence.

        class Train {
            String name;
            Collection<String> getTrainStationsCodes() {
                return Collections2.transform(trainStops, ...);
            }
            ...
        }

        {
            NamedCache trainCache = CacheFactory.getCache("train-cache");
            trainCache.addIndex(new ReflectionExtractor("getName"), false, null);
            trainCache.addIndex(new ReflectionExtractor("getTrainStationsCodes"), false, null);
        }

  58. Indexes with WebSphere eXtreme Scale.

        @Entity(schemaRoot=true)
        class Train {
            @Index
            @Basic
            String name;

            @Index
            Collection<String> getTrainStationsCodes() {
                return Collections2.transform(trainStops, ...);
            }
            ...
        }

        Query query = em.createQuery("select t from Train t where t.name=:name");
        query.getPlan();

      This is an execution plan:

        for q2 in Train ObjectMap using INDEX on name = ( ?name)
          filter ( q2.c[0] = ?name )
          returning new Tuple( q2 )
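
Why an index avoids the "distributed table scan" can be shown on a single simulated partition. The sketch below is plain Java with hypothetical names; it only contrasts a full scan with a lookup in a secondary map and says nothing about how Coherence or eXtreme Scale build their indexes internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: one partition holding train names by code, plus a name index. */
class IndexedPartitionSketch {
    private final Map<String, String> nameByCode = new HashMap<>();        // primary storage
    private final Map<String, List<String>> codesByName = new HashMap<>(); // secondary index
    int entriesScanned = 0;

    /** Every write also maintains the index, which is the price paid for fast reads. */
    void put(String code, String name) {
        String previous = nameByCode.put(code, name);
        if (previous != null) {
            codesByName.get(previous).remove(code);
        }
        codesByName.computeIfAbsent(name, n -> new ArrayList<String>()).add(code);
    }

    /** Without index: every entry of the partition is inspected. */
    List<String> findByNameScan(String name) {
        List<String> codes = new ArrayList<String>();
        for (Map.Entry<String, String> entry : nameByCode.entrySet()) {
            entriesScanned++;
            if (entry.getValue().equals(name)) {
                codes.add(entry.getKey());
            }
        }
        return codes;
    }

    /** With index: a single map lookup, whatever the partition size. */
    List<String> findByNameIndexed(String name) {
        List<String> hit = codesByName.get(name);
        if (hit == null) {
            return new ArrayList<String>();
        }
        return new ArrayList<String>(hit);
    }
}
```
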
  59. More APIs: another Java EE versus Spring battle? JSR 347 Data Grids vs. Spring Data. Serialization / Object to Tuple Mapping API? Unified API on top of NoSQL stores?
  60. Data Grid <-> Relational Database Interactions
  61. Data Grid <-> Relational Database: Data Grids are “In Memory” -> we need to persist data on disk!
  62. Data Grid <-> Relational Database. (Diagram labels: “update / insert / delete” and “select directly modified in DB”.)
  63. Data Grid <-> Relational Database. Data Grid -> backend DB: highly available write-behind queues + SQL batched statements.
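
A write-behind queue with batching can be sketched as below. Writes are acknowledged immediately, repeated updates to the same key are coalesced, and a later flush groups the pending changes into batches. The names are hypothetical, the "statements" are plain `key=value` strings standing in for SQL, and the high availability of the queue itself (the hard part in practice) is not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: pending writes are coalesced per key and flushed in batches. */
class WriteBehindQueueSketch {
    private final Map<String, String> pending = new LinkedHashMap<>();
    private final int batchSize;

    WriteBehindQueueSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Returns immediately; only the latest value per key is kept for the database. */
    void put(String key, String value) {
        pending.put(key, value);
    }

    /**
     * Drains the queue into batches of at most batchSize entries.
     * A real implementation would send each batch as one batched SQL statement.
     */
    List<List<String>> flush() {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (Map.Entry<String, String> entry : pending.entrySet()) {
            current.add(entry.getKey() + "=" + entry.getValue());
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        pending.clear();
        return batches;
    }
}
```
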
  64. Data Grid <-> Relational Database. Data Grid -> Relational Database: Constrained Tree Schema <-> relational impedance mismatch. (Diagram: Train [code, type] with Seat [number, price, booked] and TrainStop [date]; TrainStation [code, name].)
  65. Data Grid <-> Relational Database. DB writes MUST succeed! Prefer raw SQL rather than reused business logic. Denormalize the database: remove the foreign keys, use the same PKs in DB and data grid. Support unordered SQL statements. Align the database on the Data Grid model!
  66. Data Grid <-> Relational Database. Relational Database -> Data Grid: Data Grid originated scheduled refresh from the backend DB, e.g. select * from train where last_modif > ? (Oracle System Change Number, etc).
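
The scheduled refresh can be sketched as a high-water-mark loop: each run loads only the rows modified since the previous run and remembers the largest `last_modif` seen. The database is simulated by an in-memory list, the `Row` class and method names are hypothetical, and `last_modif` is treated as a plain number.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: incremental reload of the grid driven by a last_modif high-water mark. */
class ScheduledRefreshSketch {
    static final class Row {
        final String code;
        final String name;
        final long lastModif;

        Row(String code, String name, long lastModif) {
            this.code = code;
            this.name = name;
            this.lastModif = lastModif;
        }
    }

    private final Map<String, String> grid = new HashMap<>();
    private long highWaterMark = Long.MIN_VALUE;

    /** Stands in for: select * from train where last_modif > ? */
    static List<Row> selectModifiedSince(List<Row> table, long since) {
        List<Row> changed = new ArrayList<>();
        for (Row row : table) {
            if (row.lastModif > since) {
                changed.add(row);
            }
        }
        return changed;
    }

    /** One scheduled run; returns the number of rows reloaded into the grid. */
    int refresh(List<Row> table) {
        List<Row> changed = selectModifiedSince(table, highWaterMark);
        for (Row row : changed) {
            grid.put(row.code, row.name);
            highWaterMark = Math.max(highWaterMark, row.lastModif);
        }
        return changed.size();
    }

    String get(String code) {
        return grid.get(code);
    }
}
```

A wall-clock timestamp can miss a row whose transaction commits late with an older value, which is presumably why the slide points at a database-maintained marker such as the Oracle System Change Number.
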
  67. Data Grid <-> Relational Database. Relational Database -> Data Grid: database originated push from the backend DB. JMS = durable subscription (Oracle Database Change Notification, etc).
  68. Data Grid <-> Relational Database. In Memory -> prepare for reloading after maintenance operations! Need for “graceful shutdown with disk persistence”. Prepare consistency checkers.
  69. Transactions
  70. We didn’t have the time to talk about transactions. Another session is planned at the Paris NoSQL User Group for this.
  71. Let’s go live!
  72. Data Grids and Operations. Standard packaging? Do It Yourself (layout, scripts, etc). Limited management: Do It Yourself (stop/start, detecting data loss, etc). Limited debugging tools: Do It Yourself (debugging consoles, troubleshooting agents). JVM pandemic: dozens of JVMs to manage!
  73. Data Grids and Operations: Dev / Ops collaboration is required. Experts only!
  74. The right tool for the right job
  75. The right tool for the right job. Incredibly fast, even with transactions! Scalable, if you solve the data loading issue. Good at data replication (when it implements it): reconciliation API, etc. Very geeky on both dev and ops side: not an enterprise grade data store; requires very skilled people + change management. “Quite” expensive.
  76. Questions / Answers?