Paris NoSQL User Group - In Memory Data Grids in Action (without transactions chapter)

In Memory Data Grids in Action with Oracle Coherence presented to No SQL users.
The "transactions" chapter is missing as it has been rescheduled to another session.

Transcript

  • 1. In Memory Data Grid in Action with Oracle Coherence, for Paris NoSQL User Group. Cyrille Le Clerc. Wednesday, May 25, 2011. (Transactions chapter will be presented during another session.)
  • 2. Speaker: Cyrille Le Clerc, @cyrilleleclerc, blog.xebia.fr. Large Scale; In Memory Data Grid; Open Source (Apache CXF, ...); “you build it, you run it”.
  • 3. Once upon a time...
  • 4. On the Financial side. Needs within financial market: very low latency, rich queries & transactions, scalability, data consistency. Coherence: released in 2001, started as a distributed cache. GigaSpaces XAP: released in 2001, started as a data grid.
  • 5. Let’s define an In Memory Data Grid ...
  • 6. Let’s define an In Memory Data Grid. eXtreme Scale. This is an In Memory Data Grid.
  • 7. Let’s define an In Memory Data Grid. This is Network Attached Memory.
  • 8. Let’s define an In Memory Data Grid. Similarities with NoSQL document oriented: partitioned, distributed Hashtable, schema-less, value is not opaque, scale-out scalability. Very fast: in memory (persistence coming), business logic inside the data. Consistent and Available: transactional, redundant. Written in Java, data are POJOs: not necessary, clients in Java, Microsoft, etc.
  • 9. Use cases for this presentation
  • 10. Train Booking System: trains, stations, seats, booking and passengers.
  • 11. eCommerce Web Site: warehouse & customers shopping carts. Diagram, shopping carts: (canon-eos: 1, ipod: 1, headphone: 1), (iphone: 1, ...), (ipad: 1, iphone: 1), (barbie: 1, iphone: 1, cabbage-doll: 1). Diagram, warehouse stocks: e.g. { "name": "Barbie Computer", "stock": 637, "weight": 200 }.
  • 12. In Memory Data Grids Key Principles
  • 13. Store Everything in a Mainframe! 3 TB of RAM, 80 x 5.2 GHz cores, much more than $1,000,000. IBM z11 (http://ibm.com/).
  • 14. Spread on Inexpensive Servers. Mainframe (http://ibm.com/) -> Cheap Servers! (http://1userverrack.net/)
  • 15. Partition Data. Mainframe -> small servers: Partition alpha, Partition beta, Partition gamma. Partition for scalability.
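As an illustration of the partitioning idea, here is a minimal sketch of routing a key to one of the three partitions named on the slide. The `PartitionRouter` class and the hash-modulo rule are assumptions made for this example; the slides do not say how any particular product assigns keys, and real grids typically hash to a fixed number of logical partitions that are then mapped to servers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical router: maps a cache key to one of a fixed list of partitions. */
final class PartitionRouter {
    private final List<String> partitions;

    PartitionRouter(List<String> partitions) {
        if (partitions.isEmpty()) {
            throw new IllegalArgumentException("at least one partition is required");
        }
        this.partitions = new ArrayList<>(partitions);
    }

    /** The same key always yields the same partition; floorMod keeps the index non-negative. */
    String partitionFor(String key) {
        int index = Math.floorMod(key.hashCode(), partitions.size());
        return partitions.get(index);
    }
}
```

Because the owner is computed from the key alone, any client can find the right server without asking a central coordinator.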
  • 16. Duplicate Data. Partition alpha: Master -> (sync synchronization) -> Standby Backup. Duplicate data for high availability.
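The master/backup duplication can be sketched as follows. This is a simplified, single-process model under stated assumptions: `Shard` and `ReplicatedPartition` are hypothetical names, the "network" is a direct method call, and there is exactly one backup.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical shard: an in-memory map standing in for one copy of a partition. */
final class Shard {
    private final Map<String, String> data = new HashMap<>();

    void apply(String key, String value) { data.put(key, value); }

    String read(String key) { return data.get(key); }
}

/** A partition made of one primary shard and one synchronously updated backup. */
final class ReplicatedPartition {
    private Shard primary = new Shard();
    private Shard backup = new Shard();

    /** The write returns only after both copies have applied it. */
    void put(String key, String value) {
        primary.apply(key, value);
        backup.apply(key, value); // synchronous: completed before put() returns
    }

    String get(String key) { return primary.read(key); }

    /** Simulates losing the primary: the backup is promoted to primary. */
    void failover() {
        primary = backup;
        backup = new Shard(); // a real grid would re-copy the data to a new backup here
    }
}
```

Since the backup is updated before the write is acknowledged, a failover does not lose acknowledged writes; the price is one extra hop on every write.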
  • 17. Data Access Patterns
  • 18. Data Access Patterns. This is not traditional Java EE coding style! Can apply very complex business logic inside the data, “stored procedures” style. Change management challenge!
  • 19. Pattern: Targeted Operation
  • 20. Pattern: Targeted Operation. “Book Train Tickets”: the request { "train-id": "tgv-3071-20110512", "time": 2011/05/12 12:15, "departure": "Paris", "arrival": "Marseille", "seats": 3 } is routed to a single partition (diagram: Search Trains service on Partition gamma, Partition beta and Partition alpha). “train-id” is indexed.
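A targeted operation can be sketched as sending the business logic to the one partition that owns the key, instead of fetching the data to the client. Everything below (`TrainSeats`, `TargetedGrid`, `invoke`, `book`) is hypothetical and single-threaded; it only illustrates the routing, not the API of Coherence or any other product.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical train record: only the field needed for the booking example. */
final class TrainSeats {
    int freeSeats;

    TrainSeats(int freeSeats) { this.freeSeats = freeSeats; }
}

/** Hypothetical grid: each partition is a plain map holding the trains it owns. */
final class TargetedGrid {
    private final List<Map<String, TrainSeats>> partitions = new ArrayList<>();

    TargetedGrid(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    private Map<String, TrainSeats> owner(String trainId) {
        return partitions.get(Math.floorMod(trainId.hashCode(), partitions.size()));
    }

    void put(String trainId, TrainSeats train) { owner(trainId).put(trainId, train); }

    /** Runs the operation on the single partition owning the key; no other partition is touched. */
    <R> R invoke(String trainId, Function<TrainSeats, R> operation) {
        TrainSeats train = owner(trainId).get(trainId);
        if (train == null) {
            throw new IllegalArgumentException("unknown train " + trainId);
        }
        return operation.apply(train);
    }

    /** Books seats next to the data: decrements and returns true only if enough seats remain. */
    boolean book(String trainId, int seats) {
        return this.<Boolean>invoke(trainId, train -> {
            if (train.freeSeats < seats) {
                return false;
            }
            train.freeSeats -= seats;
            return true;
        });
    }
}
```

The check and the decrement happen in one place, next to the data, which is the "stored procedures style" mentioned on the earlier slide.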
  • 21. Pattern: Map Reduce Style Operation
  • 22. Pattern: Map Reduce. Distributed “Search Train Ticket”: the query { "departure": "Paris", "arrival": "Marseille", "time": 2011/05/12 12:00, "seats": 3 } is sent to the Search Trains service of every partition (gamma, beta, alpha).
  • 23. Pattern: Map Reduce. Distributed “Search Train Ticket”, partial results: Partition gamma returns { "Paris -> Marseille : 12:15", "Paris -> Marseille : 13:15" }; Partition beta returns { #NONE# }; Partition alpha returns { "Paris -> Lyon -> Marseille : 12:40" }.
  • 24. Pattern: Map Reduce. Distributed “Search Train Ticket”, merged result: { "Paris -> Marseille : 12:15", "Paris -> Lyon -> Marseille : 12:40", "Paris -> Marseille : 13:15" }.
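The three map reduce slides can be sketched as a scatter/gather: every partition filters its own data, then the partial results are merged and sorted. `Journey` and `ScatterGatherSearch` are hypothetical names, departure times are plain "HH:mm" strings, and the partitions are queried sequentially here whereas a grid would query them in parallel.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical search result: a route and its departure time as "HH:mm". */
final class Journey {
    final String route;
    final String departure;

    Journey(String route, String departure) {
        this.route = route;
        this.departure = departure;
    }

    @Override
    public String toString() { return route + " : " + departure; }
}

final class ScatterGatherSearch {
    /** Map step: one partition scans its own journeys and keeps those leaving at or after notBefore. */
    static List<Journey> searchPartition(List<Journey> partition, String notBefore) {
        return partition.stream()
                .filter(j -> j.departure.compareTo(notBefore) >= 0)
                .collect(Collectors.toList());
    }

    /** Reduce step: partial results are concatenated, then sorted by departure time. */
    static List<Journey> search(List<List<Journey>> partitions, String notBefore) {
        List<Journey> merged = new ArrayList<>();
        for (List<Journey> partition : partitions) {
            merged.addAll(searchPartition(partition, notBefore));
        }
        merged.sort(Comparator.comparing((Journey j) -> j.departure));
        return merged;
    }
}
```

With the data from the slides (two direct trains on one partition, none on the second, one via Lyon on the third), the merged list comes back ordered 12:15, 12:40, 13:15. Note that every partition does a full local scan, which is why the next slide equates map reduce with a distributed table scan.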
  • 25. Data Access Patterns. This is not traditional Java EE coding style: change management. Don’t forget “Map Reduce” = “Distributed Table Scan”: use indexes.
  • 26. CAP Theorem & In Memory Data Grids
  • 27. CAP Theorem and In Memory Data Grid. Brewer’s Conjecture: Consistency, Availability, Partition Tolerance. Only 2 of these 3 properties can be achieved at any given moment in time. http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
  • 28. CAP Theorem and In Memory Data Grid. Same diagram as the previous slide, with “Data Grids” positioned on it. http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
  • 29. Cross Data Center Data Consistency. London, New York, Tokyo: world wide replication for financial market.
  • 30. Cross Data Center Data Consistency. Warehouse stocks: West Coast and East Coast both hold { "name": "Barbie Computer", "stock": 147, "weight": 200 }.
  • 31. Cross Data Center Data Consistency. One data center receives “set stock to 146” while the other still holds { "name": "Barbie Computer", "stock": 147, "weight": 200 }: propagation delay!
  • 32. Cross Data Center Data Consistency. One data center receives “set stock to 146”, the other receives “set weight 175”, both starting from { "name": "Barbie Computer", "stock": 147, "weight": 200 }: reconciliation API needed!
  • 33. Cross Data Center Data Consistency. Same concurrent updates (“set stock to 146”, “set weight 175”) with the link between West Coast and East Coast cut: network partitioning.
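To make the "reconciliation API needed" point concrete, here is a minimal sketch of one possible strategy: a field-by-field three-way merge against the last common version. `FieldReconciler` is a hypothetical helper, not the reconciliation API of any product, and it assumes both sites keep the same set of fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class FieldReconciler {
    /**
     * Three-way merge: for each field of the common base version, keep the value
     * from whichever site changed it. Throws if both sites changed the same field
     * to different values, since that conflict needs a business decision.
     */
    static Map<String, Object> reconcile(Map<String, Object> base,
                                         Map<String, Object> west,
                                         Map<String, Object> east) {
        Map<String, Object> merged = new HashMap<>(base);
        for (String field : base.keySet()) {
            Object original = base.get(field);
            Object westValue = west.get(field);
            Object eastValue = east.get(field);
            boolean westChanged = !Objects.equals(original, westValue);
            boolean eastChanged = !Objects.equals(original, eastValue);
            if (westChanged && eastChanged && !Objects.equals(westValue, eastValue)) {
                throw new IllegalStateException("conflict on field " + field);
            }
            merged.put(field, westChanged ? westValue : eastValue);
        }
        return merged;
    }
}
```

Applied to the slides' example, the merged record has stock 146 and weight 175. If both coasts had changed the stock, the merge would fail and a business rule would have to decide.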
  • 34. Data Modeling
  • 35. Data Modeling. Dominant Question Driven Design: opposite to Relational, which is Domain Driven Design. Constrained Tree Schema: because RPC matters. Denormalized: due to dominant questions and CTS.
  • 36. Data Modeling. Typical relational data model: Seat (number, price), Booking (reduction), Passenger (name), Train (code, type), TrainStop (date), TrainStation (code, name).
  • 37. Data Modeling. Find the root entity and denormalize. Partitioning ready entities tree: Train (code, type) is the root entity, with Seat (number, price), Booking (reduction), Passenger (name) and TrainStop (date). TrainStation (code, name) is reference data, duplicated in each data grid node.
  • 38. Data Modeling. Remove unused data: Booking and Passenger are dropped, Seat becomes (number, price, booked). Partitioned: Train (code, type), Seat, TrainStop (date). Replicated: TrainStation (code, name).
  • 39. Data Modeling. Data Grid ready data structure. Partitioned: Train (code, type), Seat (number, price, booked), TrainStop (date). Replicated: TrainStation (code, name).
  • 40. Data Modeling is Hard!
  • 41. Data Modeling is Hard! Two Accounts (number), each with CashWithdrawal (date, amount); one MoneyTransfer (id, date, amount) references one Account as “from” and the other as “to”. Two root entities for the same MoneyTransfer!
  • 42. Data Modeling is Hard! Split MoneyTransfer: it becomes MoneyTransferIn (id, date, amount) and MoneyTransferOut (id, date, amount), each attached to its own Account (number) next to CashWithdrawal (date, amount).
  • 43. Data Modeling is Hard! Split MoneyTransfer (same diagram as slide 42).
  • 44. Data Modeling is Hard! Data Grid ready data structure: Account (number) is the root entity of CashWithdrawal (date, amount), MoneyTransferOut (id, date, amount) and MoneyTransferIn (id, date, amount).
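The split of MoneyTransfer into two legs can be sketched with plain Java classes. The entity and field names follow the slides; the `TransferService` helper, the `netTransfers` method and the use of cents as `long` are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Outgoing leg of a transfer, stored under the debited account. */
final class MoneyTransferOut {
    final String id;
    final String date;
    final long amount;

    MoneyTransferOut(String id, String date, long amount) {
        this.id = id;
        this.date = date;
        this.amount = amount;
    }
}

/** Incoming leg of the same transfer, stored under the credited account. */
final class MoneyTransferIn {
    final String id;
    final String date;
    final long amount;

    MoneyTransferIn(String id, String date, long amount) {
        this.id = id;
        this.date = date;
        this.amount = amount;
    }
}

/** Root entity: everything below it lives in the same partition as the account. */
final class Account {
    final String number;
    final List<MoneyTransferOut> transfersOut = new ArrayList<>();
    final List<MoneyTransferIn> transfersIn = new ArrayList<>();

    Account(String number) { this.number = number; }

    /** Sum of incoming minus outgoing transfers, computed from local data only. */
    long netTransfers() {
        long in = transfersIn.stream().mapToLong(t -> t.amount).sum();
        long out = transfersOut.stream().mapToLong(t -> t.amount).sum();
        return in - out;
    }
}

final class TransferService {
    /** Records one transfer as two legs sharing the same id, one per root entity. */
    static void transfer(String id, String date, long amount, Account from, Account to) {
        from.transfersOut.add(new MoneyTransferOut(id, date, amount));
        to.transfersIn.add(new MoneyTransferIn(id, date, amount));
    }
}
```

Each account can now answer its dominant questions from its own partition. The cost is that one transfer becomes two writes, possibly on two partitions, which a real system must keep consistent with a transaction or a compensation step.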
  • 45. Grid Internals
  • 46. Data Serialization. Used for data transfer and byte oriented storage. Must support evolvable data structure. Hot topic, like Apache Thrift, Apache Avro, Google Protocol Buffers.
  • 47. Data Storage. Store Java Beans in the grid: no need to unmarshall for in-process operations; beware of garbage collector! Store byte arrays in the grid: pay unmarshalling at each read and write; low-level / byte-oriented APIs to read data; slightly more garbage collector friendly.
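The two storage options can be contrasted with a small sketch. `StockItem`, `ObjectStore` and `ByteStore` are hypothetical, and the hand-written `DataOutputStream` encoding merely stands in for whatever serialization format a grid would use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical value object with a hand-written binary form. */
final class StockItem {
    final String name;
    final int stock;

    StockItem(String name, int stock) {
        this.name = name;
        this.stock = stock;
    }

    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(name);
            out.writeInt(stock);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static StockItem fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new StockItem(in.readUTF(), in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Option 1: keep Java objects; in-process reads cost nothing but add many objects to the heap. */
final class ObjectStore {
    private final Map<String, StockItem> data = new HashMap<>();

    void put(String key, StockItem item) { data.put(key, item); }

    StockItem get(String key) { return data.get(key); }
}

/** Option 2: keep byte arrays; every write marshals and every read unmarshals. */
final class ByteStore {
    private final Map<String, byte[]> data = new HashMap<>();

    void put(String key, StockItem item) { data.put(key, item.toBytes()); }

    StockItem get(String key) {
        byte[] bytes = data.get(key);
        return bytes == null ? null : StockItem.fromBytes(bytes);
    }
}
```

The object store hands back the very instance that was stored, while the byte store pays a decode on every read and returns a fresh copy each time.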
  • 48. Communication Protocols. UDP multicast (Coherence, GigaSpaces). TCP/IP (WebSphere eXtreme Scale).
  • 49. Topology. Partitions made of shards (1 primary + 0..* backups). Dynamic shards location (changes at runtime and at restart). Can use dedicated “directory servers” or embed it in the “data nodes”.
  • 50. JVM and Memory. Many vendors recommend tiny 1.4 GB JVMs: garbage collector hell. More than ten JVMs per server: management hell. More and more IMDGs support large heaps.
  • 51. APIs
  • 52. Raw Java Mapping with Oracle Coherence. Hand-coded serialization: JUnit is your friend! Diagram: Seat (number, price, booked), Train (code, type), TrainStop (date).

        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.List;
        import com.tangosol.io.AbstractEvolvable;
        import com.tangosol.io.pof.PofReader;
        import com.tangosol.io.pof.PofWriter;
        import com.tangosol.io.pof.PortableObject;

        public class Train extends AbstractEvolvable implements PortableObject {

            enum Type { HIGH_SPEED, NORMAL }

            /** Key of the Cache */
            String code;
            /** Indexed */
            String name;
            Type type;
            List<Seat> seats = new ArrayList<Seat>();
            int version;
            List<TrainStop> trainStops = new ArrayList<TrainStop>();

            @Override
            public int getImplVersion() { return 1; }

            // Property indexes (0..5) must be identical in readExternal and writeExternal.
            @Override
            public void readExternal(PofReader pofReader) throws IOException {
                this.code = pofReader.readString(0);
                this.name = pofReader.readString(1);
                this.type = (Type) pofReader.readObject(2);
                pofReader.readCollection(3, this.seats);
                pofReader.readCollection(4, this.trainStops);
                this.version = pofReader.readInt(5);
            }

            @Override
            public void writeExternal(PofWriter pofWriter) throws IOException {
                pofWriter.writeString(0, this.code);
                pofWriter.writeString(1, this.name);
                pofWriter.writeObject(2, this.type);
                pofWriter.writeCollection(3, this.seats, Seat.class);
                pofWriter.writeCollection(4, this.trainStops, TrainStop.class);
                pofWriter.writeInt(5, this.version);
            }
        }
  • 53. JPA Style Mapping with WebSphere eXtreme Scale. Sub entities can have cross relations. Diagram: Seat (number, price, booked), Train (code, type), TrainStop (date).

        @Entity(schemaRoot=true)
        public class Train {

            @Id
            String code;

            @Index
            @Basic
            String name;

            @OneToMany(cascade=CascadeType.ALL)
            List<Seat> seats = new ArrayList<Seat>();

            @Version
            int version;
            ...
        }
  • 54. Map API with Oracle Coherence.

        import java.util.Collections;
        import java.util.Map;
        import java.util.Set;
        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;
        import com.tangosol.util.Filter;
        import com.tangosol.util.QueryHelper;

        NamedCache trainCache = CacheFactory.getCache("train-cache");

        /** Save */
        void persist(Train train) {
            trainCache.put(train.getCode(), train);
        }

        /** Find by key */
        Train findByCode(String code) {
            return (Train) trainCache.get(code);
        }

        /** Find by Query Language: returns the first match, or null if there is none */
        Train findByTrainName(String name) {
            Filter filter = QueryHelper.createFilter("name = :name",
                    Collections.singletonMap("name", name));
            Set<Map.Entry<String, Train>> trainEntrySet = trainCache.entrySet(filter);
            if (trainEntrySet.isEmpty()) {
                return null;
            } else {
                return trainEntrySet.iterator().next().getValue();
            }
        }
  • 55. JPA Style with WebSphere eXtreme Scale: JPA style Entity Manager.

        /** Save */
        void persist(Train train) {
            entityManager.persist(train);
        }

        /** Find by key */
        Train findByCode(String code) {
            return (Train) entityManager.find(Train.class, code);
        }

        /** Query Language */
        Train findByTrainName(String name) {
            Query q = entityManager.createQuery("select t from Train t where t.name=:name");
            q.setParameter("name", name);
            return (Train) q.getSingleResult();
        }
  • 56. Creating Indexes. Map reduce (without index) = Distributed Table Scan!
  • 57. Indexes with Oracle Coherence.

        class Train {
            String name;

            Collection<String> getTrainStationsCodes() {
                return Collections2.transform(trainStops, ...);
            }
            ...
        }

        {
            NamedCache trainCache = CacheFactory.getCache("train-cache");
            trainCache.addIndex(new ReflectionExtractor("getName"), false, null);
            trainCache.addIndex(new ReflectionExtractor("getTrainStationsCodes"), false, null);
        }
  • 58. Indexes with WebSphere eXtreme Scale.

        @Entity(schemaRoot=true)
        class Train {

            @Index
            @Basic
            String name;

            @Index
            Collection<String> getTrainStationsCodes() {
                return Collections2.transform(trainStops, ...);
            }
            ...
        }

        Query query = em.createQuery("select t from Train t where t.name=:name");
        query.getPlan();

    This is an execution plan:

        for q2 in Train ObjectMap using INDEX on name = ( ?name)
          filter ( q2.c[0] = ?name )
        returning new Tuple( q2 )
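To show why an index avoids the "distributed table scan", here is a minimal sketch of what an index on `name` amounts to inside one partition: a second map from attribute value to keys, maintained on every write. `NamedTrain` and `IndexedTrainCache` are hypothetical; this is not how Coherence or eXtreme Scale implement their indexes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical entry: a train identified by its code, searchable by name. */
final class NamedTrain {
    final String code;
    final String name;

    NamedTrain(String code, String name) {
        this.code = code;
        this.name = name;
    }
}

final class IndexedTrainCache {
    private final Map<String, NamedTrain> byCode = new HashMap<>();
    /** The index: train name -> codes of the trains carrying that name. */
    private final Map<String, Set<String>> codesByName = new HashMap<>();

    /** Every write also maintains the index, which is the price paid for fast reads. */
    void put(NamedTrain train) {
        NamedTrain previous = byCode.put(train.code, train);
        if (previous != null) {
            Set<String> stale = codesByName.get(previous.name);
            if (stale != null) {
                stale.remove(previous.code);
            }
        }
        codesByName.computeIfAbsent(train.name, n -> new HashSet<>()).add(train.code);
    }

    /** Without index: visits every entry of the partition. */
    List<NamedTrain> findByNameScan(String name) {
        List<NamedTrain> result = new ArrayList<>();
        for (NamedTrain train : byCode.values()) {
            if (train.name.equals(name)) {
                result.add(train);
            }
        }
        return result;
    }

    /** With index: one lookup, then only the matching entries are read. */
    List<NamedTrain> findByNameIndexed(String name) {
        List<NamedTrain> result = new ArrayList<>();
        for (String code : codesByName.getOrDefault(name, new HashSet<>())) {
            result.add(byCode.get(code));
        }
        return result;
    }
}
```

Both lookups return the same trains; the indexed one does work proportional to the number of matches rather than to the size of the partition.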
  • 59. More APIs. Another Java EE versus Spring battle? JSR 347 Data Grids vs. Spring Data. Serialization / Object to Tuple Mapping API? Unified API on top of NoSQL stores?
  • 60. Data Grid <-> Relational Database Interactions
  • 61. Data Grid <-> Relational Database. Data Grids are “In Memory” -> we need to persist data on disk!
  • 62. Data Grid <-> Relational Database. Diagram labels: update / insert / delete; “select”; “directly modified in DB”.
  • 63. Data Grid <-> Relational Database. Data Grid -> Relational Database (backend DB): highly available write-behind queues + SQL batched statements.
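A write-behind queue with batching can be sketched as below. `WriteBehindQueue` is a hypothetical class: the `batchWriter` callback stands in for a batched SQL statement, the flush would normally run on a timer, and the "highly available" part (replicating the queue itself inside the grid) is not shown.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class WriteBehindQueue {
    /** Pending writes by key; a later write to the same key replaces the earlier one. */
    private final Map<String, String> pending = new LinkedHashMap<>();
    private final int batchSize;
    private final Consumer<List<Map.Entry<String, String>>> batchWriter;

    WriteBehindQueue(int batchSize, Consumer<List<Map.Entry<String, String>>> batchWriter) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.batchWriter = batchWriter;
    }

    /** Called on every grid update: returns immediately, the database is written later. */
    synchronized void enqueue(String key, String value) {
        pending.put(key, value);
    }

    /**
     * Writes all pending entries in batches and returns the number of batches sent.
     * If the writer throws, pending entries are kept so that the flush can be retried
     * (this assumes the database writes are idempotent).
     */
    synchronized int flush() {
        int batches = 0;
        List<Map.Entry<String, String>> batch = new ArrayList<>();
        for (Map.Entry<String, String> entry : pending.entrySet()) {
            batch.add(new AbstractMap.SimpleImmutableEntry<>(entry));
            if (batch.size() == batchSize) {
                batchWriter.accept(batch);
                batches++;
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            batchWriter.accept(batch);
            batches++;
        }
        pending.clear();
        return batches;
    }
}
```

Coalescing repeated updates of the same key and grouping writes into batches is what lets the grid absorb bursts that the database could not take one statement at a time.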
  • 64. Data Grid <-> Relational Database. Data Grid -> Relational Database. Constrained Tree Schema <-> Relational impedance mismatch. Diagram: Seat (number, price, booked), Train (code, type), TrainStop (date), TrainStation (code, name).
  • 65. Data Grid <-> Relational Database. DB writes MUST succeed! Prefer raw SQL rather than reused business logic. Denormalize the database: remove the foreign keys, use same PKs in DB and data grid; support unordered SQL statements. Align the database on the Data Grid model!
  • 66. Data Grid <-> Relational Database. Relational Database -> Data Grid. Data Grid originated scheduled refresh from the backend DB: select * from train where last_modif > ? (Oracle System Change Number, etc).
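The scheduled refresh can be sketched with an in-memory stand-in for the database. `TrainRow` and `ScheduledRefresher` are hypothetical; `selectModifiedSince` plays the role of the slide's `select * from train where last_modif > ?`, and `lastModif` is a plain number rather than a real timestamp or System Change Number.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical row of the backend "train" table, with its last_modif value. */
final class TrainRow {
    final String code;
    final String name;
    final long lastModif;

    TrainRow(String code, String name, long lastModif) {
        this.code = code;
        this.name = name;
        this.lastModif = lastModif;
    }
}

final class ScheduledRefresher {
    private final Map<String, TrainRow> cache = new HashMap<>();
    /** Highest last_modif value already loaded into the grid. */
    private long watermark = Long.MIN_VALUE;

    /** Stand-in for: select * from train where last_modif > ? */
    static List<TrainRow> selectModifiedSince(List<TrainRow> table, long since) {
        List<TrainRow> rows = new ArrayList<>();
        for (TrainRow row : table) {
            if (row.lastModif > since) {
                rows.add(row);
            }
        }
        return rows;
    }

    /** One scheduled run: loads only the rows changed since the previous run. */
    int refresh(List<TrainRow> table) {
        List<TrainRow> changed = selectModifiedSince(table, watermark);
        for (TrainRow row : changed) {
            cache.put(row.code, row);
            watermark = Math.max(watermark, row.lastModif);
        }
        return changed.size();
    }

    TrainRow get(String code) { return cache.get(code); }
}
```

Two limits of this approach are worth keeping in mind: deleted rows are never seen by such a query, and a strict `>` comparison can miss a row committed later with the same `last_modif` value as the watermark.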
  • 67. Data Grid <-> Relational Database. Relational Database -> Data Grid. Database originated push from the backend DB: JMS = durable subscription (Oracle Database Change Notification, etc).
  • 68. Data Grid <-> Relational Database. In Memory -> prepare for reloading after maintenance operations! Need for “graceful shutdown with disk persistence”. Prepare consistency checkers.
  • 69. Transactions
  • 70. We didn’t have the time to talk about transactions. Another session is planned at Paris NoSQL User Group for this.
  • 71. Let’s go live!
  • 72. Data Grids and Operations. Standard packaging? Do It Yourself (layout, scripts, etc). Limited management: Do It Yourself (stop/start, detecting data loss, etc). Limited debugging tools: Do It Yourself (debugging consoles, troubleshooting agents). JVM pandemic: dozens of JVMs to manage!
  • 73. Data Grids and Operations. Dev / Ops collaboration is required. Experts only!
  • 74. The right tool for the right job
  • 75. The right tool for the right job. Incredibly fast! Even with transactions! Scalable, if you solve the data loading issue. Good at data replication (when it implements it): reconciliation API, etc. Very geeky on both dev and ops side: not an enterprise grade data store; requires very skilled people + change management. “Quite” expensive.
  • 76. Questions / Answers?