In-memory data and compute on top of Hadoop

Speakers: Anthony Baker and Jags Ramnarayan
Hadoop gives us dramatic volume scalability at a low price. But core Hadoop is designed for sequential access – write once and read many times – making it impractical to use Hadoop from a real-time/online application. Add a distributed in-memory tier in front and you get the best of both worlds: very high speed, high concurrency, and the ability to scale to very large volumes. We present the seamless integration of in-memory data grids with Hadoop to achieve interesting new design patterns – ingesting raw or processed data into Hadoop, random reads and writes on operational data in memory or on massive historical data in Hadoop with O(1) lookup times, zero-ETL MapReduce processing, deep-scale SQL processing on data in Hadoop, and the ability to easily push analytic models from Hadoop into memory. We introduce and present these ideas and code samples using Pivotal's in-memory real-time products and the Pivotal Hadoop platform.


  1. In-memory data and compute on top of Hadoop. Jags Ramnarayan – Chief Architect, Fast Data, Pivotal; Anthony Baker – Architect, Fast Data, Pivotal. © 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission.
  2. Agenda
     • In-memory data grid – concepts, strengths, weaknesses
     • HDFS – strengths, weaknesses
     • What is our proposal?
     • How do you use this? SQL syntax and demo
     • HDFS integration architecture and demo
     • MapReduce integration and demo – in-memory, parallel stored procedures
     • Comparison to HBase
  3. "It is raining databases in the cloud"
     • Next-gen transactional DBs are memory based, distributed, elastic, HA, cloud ready…
       – In-memory data grids (IMDG), NoSQL, caching: Pivotal GemFire, Oracle Coherence, Redis, Cassandra, …
     • Next-gen OLAP DBs are centered around Hadoop
       – Driver: they say it is "volume, velocity, variety"
       – Or is it just cost/TB? (The 451 Group)
  4. Agenda – In-memory data grid: concepts, strengths, weaknesses; HDFS: strengths, weaknesses; What is our proposal?; How do you use this? (SQL syntax and demo); HDFS integration architecture and demo; MapReduce integration and demo (in-memory, parallel stored procedures); Comparison to HBase
  5. IMDG basic concepts
     • Distributed memory-oriented store: KV/objects or SQL; queriable, indexable, and transactional
     • Multiple storage models: replication or partitioning in memory, with synchronous copies in the cluster; overflow to disk and/or an RDBMS
     • Parallelize Java app logic
     • Multiple failure-detection schemes
     • Dynamic membership (elastic)
     • Vendors differentiate on SQL support, WAN, events, etc.
     [Diagram: a replicated region handles thousands of concurrent connections with low latency – synchronous replication for slow-changing data; a partitioned region with a redundant copy – partition for large data or highly transactional data]
  6. Key IMDG pattern – distributed caching
     • Designed to work with existing RDBs (sketched generically below)
       – Read through: fetch from the DB on a cache miss
       – Write through: reflect in the cache only if the DB write succeeds
       – Write behind: reliable, in-order queue with batched writes to the DB
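     A minimal read-through cache in plain Java (illustrative only, not the GemFire API): on a miss, a supplied loader fetches the row from the backing database and populates the cache.

     import java.util.Map;
     import java.util.concurrent.ConcurrentHashMap;
     import java.util.function.Function;

     public class ReadThroughCache<K, V> {
         private final Map<K, V> memory = new ConcurrentHashMap<>();
         private final Function<K, V> dbLoader;  // e.g. a JDBC SELECT by key

         public ReadThroughCache(Function<K, V> dbLoader) {
             this.dbLoader = dbLoader;
         }

         // Read through: fetch from the DB only on a cache miss.
         public V get(K key) {
             return memory.computeIfAbsent(key, dbLoader);
         }
     }

     Write through and write behind follow the same shape: the former writes the DB before updating the cache entry, the latter enqueues the update and drains the queue to the DB in batches.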
  7. Traditional RDB integration can be challenging
     [Diagram: memory tables feed an in-order update queue drained by a DB writer/synchronizer, either synchronously or asynchronously in batches]
     • Synchronous "write through": single point of bottleneck and failure; not an option for "write heavy" loads; complex two-phase commit protocol; parallel recovery is difficult
     • Asynchronous "write behind": cannot sustain high "write" rates; the queue may have to be persistent; parallel recovery is difficult
  8. Some IMDGs and NoSQL stores offer "shared nothing persistence"
     [Diagram: each member's memory tables flush records through OS buffers into append-only operation logs, with a log compressor]
     • Append-only operation logs: fully parallel, zero disk seeks
     • But a cluster restart requires a log scan
     • Very large volumes pose challenges
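     For illustration, a bare-bones append-only operation log in Java (a sketch, not the product's on-disk format): every mutation is appended sequentially, so writes need no disk seeks, and recovery replays the log from the start – which is exactly why a cluster restart requires a log scan.

     import java.io.BufferedOutputStream;
     import java.io.Closeable;
     import java.io.DataOutputStream;
     import java.io.File;
     import java.io.FileOutputStream;
     import java.io.IOException;

     public class OperationLog implements Closeable {
         private final DataOutputStream out;

         public OperationLog(File file) throws IOException {
             // open in append mode so a restart keeps adding to the same log
             out = new DataOutputStream(
                 new BufferedOutputStream(new FileOutputStream(file, true)));
         }

         // append one key/value mutation record; no seek, purely sequential
         public synchronized void append(String key, String value) throws IOException {
             out.writeUTF(key);
             out.writeUTF(value);
         }

         @Override
         public void close() throws IOException {
             out.close();
         }
     }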
  9. Agenda – In-memory data grid: concepts, strengths, weaknesses; HDFS: strengths, weaknesses; What is our proposal?; How do you use this? (SQL syntax and demo); HDFS integration architecture and demo; MapReduce integration and demo (in-memory, parallel stored procedures); Comparison to HBase
  10. Hadoop core (HDFS) for scalable, parallel storage
     • Maturing and will be ubiquitous
     • Handles very large data sets on commodity hardware
     • Handles failures well
     • Simple coherency model
  11. Hadoop design center – batch and sequential
     • 64MB immutable blocks
     • Write once, read many design
     • For random reads, you have to sequentially walk through records each time
     • The NameNode can be a contention point
     • Slow failure detection
  12. Hadoop strengths
     • Massive volumes (TB to PB)
     • HA, compression
     • Ever-growing and maturing ecosystem for parallel compute and analytics
     • Storage systems like Isilon now offer an HDFS interface
     • Optimized for virtual machines
  13. Agenda – In-memory data grid: concepts, strengths, weaknesses; HDFS: strengths, weaknesses; What is our proposal?; How do you use this? (SQL syntax and demo); HDFS integration architecture and demo; MapReduce integration and demo (in-memory, parallel stored procedures); Comparison to HBase
  14. SQL + IMDG (objects) + HDFS
     • Data in many shapes – support multiple data models
     • Operational data is the focus; it is (mostly) in memory
     • A main-memory based, distributed, low-latency data store for big data
     • All data and history in HDFS
  15. SQL + IMDG (objects) + HDFS
     • Replication or partitioning
     • Storage model: in-memory, in-memory with local disk, or in-memory with HDFS persistence
  16. SQL + IMDG (objects) + HDFS
     • SQL engine – designed for online/OLTP workloads, with transactions
     • IMDG caching features – readThru, writeBehind, etc.
  17. SQL + IMDG (objects) + HDFS
     • Analytics on the HDFS data without going through the in-memory tier – sequential walk-through or incremental processing
     • With parallel ingestion, you get near real-time visibility of data for deep analytics
     • Tight HDFS integration – streaming and read/write cases
  18. SQL + IMDG (objects) + HDFS
     • A closed loop between real-time and analytics
     • The MapReduce 'reduce' phase can directly emit results to the in-memory tier
  19. GemFire XD – a Pivotal HD service
     • Working set in memory, geo-replicated; history and time series in HDFS
     • SQLFire (SQL) and GemFire (objects, JSON)
     • SQL engine – cost-based optimizer, in-memory indexing, distributed transactions, RDB integration…
     • Plus clustering, in-memory storage, HA, replication, WAN, events, distributed queues…
     • Pivotal HD: integrated install and config; Command Center – monitoring, optimizations to Hadoop
  20. The real-time latency spectrum
     • Machine latency (milliseconds) and human interactions (seconds): GemFire XD and online/OLTP/operational DBs
     • Interactive reports (seconds to minutes) and batch processing (minutes to hours): analytics and data warehousing with Pivotal HD and HAWQ
  21. Real time on top of Hadoop – who else? Many more… Most are focused on interactive queries for analytics.
  22. Design patterns
     • Streaming ingest – consume unbounded event streams
       – Write fast into memory; stream all writes to HDFS for batch analytics (see the sketch after this slide)
         • e.g. maintain the latest price for each security in memory; the time series in HDFS
         • continuously ingest click streams, audit trails, or interaction data
       – Trap interactions or OLTP transactions, do in-line stream processing (actionable insights), and write results or raw state into HDFS
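     A hypothetical ingest loop for this pattern, assuming the FLIGHTS table is created with streaming HDFS persistence as shown later (slide 38) and the demo's jdbc:sqlfire://localhost:1527 endpoint: rows land in memory first, and the HDFS store streams them out for batch analytics.

     import java.sql.Connection;
     import java.sql.DriverManager;
     import java.sql.PreparedStatement;

     public class StreamingIngest {
         public static void main(String[] args) throws Exception {
             try (Connection conn =
                      DriverManager.getConnection("jdbc:sqlfire://localhost:1527");
                  PreparedStatement ps = conn.prepareStatement(
                      "INSERT INTO FLIGHTS (FLIGHT_ID, SEGMENT_NUMBER, ORIG_AIRPORT) " +
                      "VALUES (?, ?, ?)")) {
                 for (int i = 0; i < 1000; i++) {          // simulated event stream
                     ps.setString(1, String.format("AA%04d", i));
                     ps.setInt(2, 1);
                     ps.setString(3, "PDX");
                     ps.addBatch();
                     if (i % 100 == 99) ps.executeBatch(); // write fast, in batches
                 }
                 ps.executeBatch();
             }
         }
     }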
  23. Design patterns
     • High-performance operational database
       – Keep operational data in memory; history in HDFS is randomly accessible
         • e.g. the last month of trades in memory, but all history accessible at some cost
       – Take analytic output from Hadoop/SQL analytics and make it visible to online apps
  24. Agenda – In-memory data grid: concepts, strengths, weaknesses; HDFS: strengths, weaknesses; What is our proposal?; How do you use this? (SQL syntax and demo); HDFS integration architecture and demo; MapReduce integration and demo (in-memory, parallel stored procedures); Comparison to HBase
  25. Agenda – How do you use this? SQL syntax and demo
  26. In-memory partitioning and replication
  27. Explore features using a simple star schema
     FLIGHTS: FLIGHT_ID CHAR(6) NOT NULL, SEGMENT_NUMBER INTEGER NOT NULL, ORIG_AIRPORT CHAR(3), DEPART_TIME TIME, …; PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER)
     FLIGHTAVAILABILITY (1–M from FLIGHTS): FLIGHT_ID CHAR(6) NOT NULL, SEGMENT_NUMBER INTEGER NOT NULL, FLIGHT_DATE DATE NOT NULL, ECONOMY_SEATS_TAKEN INTEGER, …; PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER, FLIGHT_DATE); FOREIGN KEY (FLIGHT_ID, SEGMENT_NUMBER) REFERENCES FLIGHTS (FLIGHT_ID, SEGMENT_NUMBER)
     FLIGHTHISTORY (1–1): FLIGHT_ID CHAR(6), SEGMENT_NUMBER INTEGER, ORIG_AIRPORT CHAR(3), DEPART_TIME TIME, DEST_AIRPORT CHAR(3), …
     Several code/dimension tables: AIRLINES (airline information, very static), COUNTRIES (countries served by flights), CITIES, MAPS (photos of regions served)
     Assume thousands of FLIGHTS rows and millions of FLIGHTAVAILABILITY records.
  28. Creating tables
     CREATE TABLE AIRLINES (AIRLINE CHAR(2) NOT NULL PRIMARY KEY, AIRLINE_FULL VARCHAR(24), BASIC_RATE DOUBLE PRECISION, DISTANCE_DISCOUNT DOUBLE PRECISION, …);
     [Diagram: the table hosted across three GemFire XD members]
  29. Replicated tables
     CREATE TABLE AIRLINES (AIRLINE CHAR(2) NOT NULL PRIMARY KEY, AIRLINE_FULL VARCHAR(24), BASIC_RATE DOUBLE PRECISION, DISTANCE_DISCOUNT DOUBLE PRECISION, …) REPLICATE;
     Design pattern: replicate reference tables in star schemas (seldom changed, often referenced in queries).
     [Diagram: a full copy of the table on each GemFire XD member]
  30. Partitioned tables
     CREATE TABLE FLIGHTS (FLIGHT_ID CHAR(6) NOT NULL, SEGMENT_NUMBER INTEGER NOT NULL, ORIG_AIRPORT CHAR(3), DEST_AIRPORT CHAR(3), DEPART_TIME TIME, FLIGHT_MILES INTEGER NOT NULL) PARTITION BY COLUMN (FLIGHT_ID);
     Design pattern: partition fact tables in star schemas for load balancing (large, write heavy).
     [Diagram: each member holds a slice of the partitioned table alongside the replicated tables]
  31. Partitioned but highly available
     CREATE TABLE FLIGHTS (FLIGHT_ID CHAR(6) NOT NULL, SEGMENT_NUMBER INTEGER NOT NULL, ORIG_AIRPORT CHAR(3), DEST_AIRPORT CHAR(3), DEPART_TIME TIME, FLIGHT_MILES INTEGER NOT NULL) PARTITION BY COLUMN (FLIGHT_ID) REDUNDANCY 1;
     Design pattern: increase redundant copies for HA and to load-balance queries across replicas.
     [Diagram: each member holds its primary partitions plus redundant copies of other members' partitions]
  32. Colocation for related data
     CREATE TABLE FLIGHTAVAILABILITY (FLIGHT_ID CHAR(6) NOT NULL, SEGMENT_NUMBER INTEGER NOT NULL, …) PARTITION BY COLUMN (FLIGHT_ID) COLOCATE WITH (FLIGHTS);
     Design pattern: colocate related tables for maximum join performance.
     [Diagram: FLIGHTAVAILABILITY partitions placed on the same members as the matching FLIGHTS partitions]
  33. Native disk-resident tables (operation logging)
     CREATE TABLE FLIGHTS (FLIGHT_ID CHAR(6) NOT NULL, SEGMENT_NUMBER INTEGER NOT NULL, …) PARTITION BY COLUMN (FLIGHT_ID) PERSISTENT;
     The data dictionary is always persisted on each server.
     sqlf backup /export/fileServerDirectory/sqlfireBackupLocation
  34. Demo environment
     A single virtual machine running a GemFire XD locator, three GemFire XD servers, and Pulse (monitoring); SQL clients connect via jdbc:sqlfire://localhost:1527.
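     A minimal client for this environment (assuming the SQLFire/GemFire XD thin-client JDBC driver is on the classpath), connecting to the endpoint above and querying the AIRLINES table from the demo schema:

     import java.sql.Connection;
     import java.sql.DriverManager;
     import java.sql.ResultSet;
     import java.sql.Statement;

     public class DemoClient {
         public static void main(String[] args) throws Exception {
             try (Connection conn =
                      DriverManager.getConnection("jdbc:sqlfire://localhost:1527");
                  Statement stmt = conn.createStatement();
                  ResultSet rs = stmt.executeQuery(
                      "SELECT AIRLINE, AIRLINE_FULL FROM AIRLINES")) {
                 while (rs.next()) {
                     System.out.println(rs.getString(1) + " = " + rs.getString(2));
                 }
             }
         }
     }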
  35. Demo: replicated and partitioned tables
  36. Agenda – HDFS integration architecture and demo
  37. Effortless HDFS integration
     Options: fast streaming writes; random read/write; with or without time series
  38. Streaming all writes to HDFS
     CREATE HDFSSTORE streamingstore NAMENODE hdfs://PHD1:8020 DIR /stream-tables BATCHSIZE 10 BATCHTIMEINTERVAL 2000 QUEUEPERSISTENT true;
     CREATE TABLE FLIGHTS (FLIGHT_ID CHAR(6) NOT NULL, SEGMENT_NUMBER INTEGER NOT NULL, …) PARTITION BY COLUMN (FLIGHT_ID) PERSISTENT HDFSSTORE streamingstore WRITEONLY;
  39. Read and write to HDFS
     CREATE HDFSSTORE RWStore NAMENODE hdfs://PHD1:8020 DIR /indexed-tables BATCHSIZE 10 BATCHTIMEINTERVAL 2000 QUEUEPERSISTENT true;
     CREATE TABLE FLIGHTS (FLIGHT_ID CHAR(6) NOT NULL, SEGMENT_NUMBER INTEGER NOT NULL, …) PARTITION BY COLUMN (FLIGHT_ID) PERSISTENT HDFSSTORE RWStore;
  40. Write path – streaming to HDFS
     [Diagram: a SQL client writes to a FLIGHTS bucket and its backup; each appends to a local append-only store, and a DFS client flushes batches through the NameNode to DataNodes under /GFXD/APP/FLIGHTS/BucketN]
     In-memory partitioned data is colocated with the HDFS DataNode.
  41. Directory structure in HDFS
     Time-stamped records allow incremental Map/Reduce jobs.
     Write-only tables: /GFXD/APP.FLIGHT_HISTORY/<bucket>/ holds data files per bucket (0-1-XXX.shop, 0-2-XXX.shop, …).
     Read/write tables: /GFXD/APP.FLIGHTS/<bucket>/ holds data + bloom + index files per bucket (0-1-XXX.hop, 0-2-XXX.hop, …).
  42. Read/write with compaction
     Now with sorting – and compaction.
     [Diagram: the same write path as slide 40, with the local append-only stores sorted and compacted before flushing to HDFS]
     A log-structured merge tree (like HBase, Cassandra).
  43. Read path for HDFS tables
     [Diagram: a SQL client reads through the FLIGHTS bucket; misses go via the DFS client to the NameNode and DataNode, with a block cache in front of the data, bloom, and index files]
     A short-circuit read path for local blocks; the block cache avoids I/O for bloom and index lookups.
  44. Tiered compaction
     • Async writes allow lock-free sequential I/O, but more files mean slower reads
     • Compactions balance read/write throughput
     • Minor compactions merge small files into bigger files
     • Major compactions merge all files into one single file (see the sketch after this slide)
     [Diagram: files arranged in time order across levels 0–2, each holding data, bloom, and index sections]
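     What a major compaction amounts to can be shown with a toy k-way merge (illustrative only, ignoring deletes and version shadowing): several sorted runs become one sorted run, so reads consult one file instead of many.

     import java.util.ArrayList;
     import java.util.Comparator;
     import java.util.List;
     import java.util.PriorityQueue;

     public class CompactionSketch {
         // merge k sorted runs (files) into a single sorted run
         public static List<String> majorCompact(List<List<String>> runs) {
             // heap entry = {run index, position within that run}
             PriorityQueue<int[]> heap = new PriorityQueue<>(
                 Comparator.comparing((int[] e) -> runs.get(e[0]).get(e[1])));
             for (int r = 0; r < runs.size(); r++) {
                 if (!runs.get(r).isEmpty()) heap.add(new int[]{r, 0});
             }
             List<String> merged = new ArrayList<>();
             while (!heap.isEmpty()) {
                 int[] e = heap.poll();
                 merged.add(runs.get(e[0]).get(e[1]));
                 if (e[1] + 1 < runs.get(e[0]).size()) {
                     heap.add(new int[]{e[0], e[1] + 1});
                 }
             }
             return merged;  // one sorted file; a real LSM also drops shadowed keys
         }
     }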
  45. "Closed loop" with analytics
     [Diagram: Map/Reduce, Pivotal HAWQ, and Hive read GemFire XD data in HDFS through an InputFormat, and write results back through an OutputFormat and the DFS client into the local append-only store]
  46. Demo environment with Pivotal HD
     The same virtual machine as before, plus a Pivotal HD NameNode and DataNode: a GemFire XD locator, three GemFire XD servers, Pulse (monitoring), and SQL clients on jdbc:sqlfire://localhost:1527.
  47. Demo: HDFS tables
  48. Operational vs. historical data
     • Operational data is retained in memory for fast access
     • User-supplied criteria identify operational data, enforced on incoming updates or periodically:
       CREATE TABLE flights_history (…) PARTITION BY PRIMARY KEY EVICTION BY CRITERIA (LAST_MODIFIED_DURATION > 300000) EVICTION FREQUENCY 60 SECONDS HDFSSTORE (bar);
     • Query hints or connection properties control use of historical data (client example after this slide):
       SELECT * FROM flights_history --PROPERTIES queryHDFS = true
       WHERE orig_airport = 'PDX' AND miles > 1000 ORDER BY dest_airport
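     A hedged client-side example reusing the slide's --PROPERTIES queryHDFS = true hint syntax: the same table queried twice, first against in-memory operational data only (the default), then including evicted history in HDFS.

     import java.sql.Connection;
     import java.sql.DriverManager;
     import java.sql.ResultSet;
     import java.sql.Statement;

     public class HistoryQuery {
         public static void main(String[] args) throws Exception {
             try (Connection conn =
                      DriverManager.getConnection("jdbc:sqlfire://localhost:1527");
                  Statement stmt = conn.createStatement()) {
                 // operational (in-memory) data only
                 ResultSet hot = stmt.executeQuery(
                     "SELECT COUNT(*) FROM flights_history WHERE orig_airport = 'PDX'");
                 hot.next();
                 System.out.println("in-memory rows: " + hot.getInt(1));

                 // include HDFS-resident history via the query hint
                 ResultSet all = stmt.executeQuery(
                     "SELECT COUNT(*) FROM flights_history --PROPERTIES queryHDFS = true\n" +
                     "WHERE orig_airport = 'PDX'");
                 all.next();
                 System.out.println("with HDFS history: " + all.getInt(1));
             }
         }
     }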
  49. Agenda – MapReduce integration and demo
  50. Hadoop Map/Reduce
     • Map/Reduce is a framework for processing massive data sets in parallel
       – The Mapper acts on local file splits to transform individual data elements
       – The Reducer receives all values for a key and generates an aggregate result
       – The Driver provides the job configuration
       – InputFormat and OutputFormat define the data source and sink
     • Hadoop manages job execution
     [Diagram: the InputFormat supplies local data to Mappers on each node; Hadoop sorts and shuffles keys to the Reducers, and the OutputFormat writes the aggregate result]
  51. Map/Reduce with GemFire XD
     Users can execute Hadoop Map/Reduce jobs against GemFire XD data using:
     • EventInputFormat, to read data from HDFS without impacting online availability or performance
     • SqlfOutputFormat, to write data into a SQL table for immediate use by online applications (via PUT INTO foo (…) VALUES (?, ?, …) over jdbc:sqlfire://localhost:1527)
  52. Demo: Map/Reduce
  53. Using the InputFormat – Mapper

     // count each airport present in a FLIGHT_HISTORY row
     public class SampleMapper extends MapReduceBase
         implements Mapper<Object, Row, Text, IntWritable> {

       public void map(Object key, Row row,
           OutputCollector<Text, IntWritable> output,
           Reporter reporter) throws IOException {
         try {
           IntWritable one = new IntWritable(1);
           ResultSet rs = row.getRowAsResultSet();
           String origAirport = rs.getString("ORIG_AIRPORT");
           String destAirport = rs.getString("DEST_AIRPORT");
           output.collect(new Text(origAirport), one);
           output.collect(new Text(destAirport), one);
         } catch (SQLException e) {
           …
         }
       }
     }

     JobConf conf = new JobConf(getConf());
     conf.setJobName("Busy Airport Count");
     conf.set(EventInputFormat.HOME_DIR, hdfsHomeDir);
     conf.set(EventInputFormat.INPUT_TABLE, tableName);
     conf.setInputFormat(EventInputFormat.class);
     conf.setMapperClass(SampleMapper.class);
     ...
  54. Use Spring Hadoop for job configuration

     <beans:beans …>
       <job id="busyAirportsJob"
           libs="…"
           input-format="com.vmware.sqlfire.internal.engine.hadoop.mapreduce.EventInputFormat"
           output-path="${flights.intermediate.path}"
           mapper="demo.sqlf.mr2.BusyAirports.SampleMapper"
           combiner="org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer"
           reducer="org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer" />

       <job id="topBusyAirportJob"
           libs="${LIB_DIR}/sqlfire-mapreduce-1.0-SNAPSHOT.jar"
           input-path="${flights.intermediate.path}"
           output-path="${flights.output.path}"
           mapper="demo.sqlf.mr2.TopBusyAirport.TopBusyAirportMapper"
           reducer="demo.sqlf.mr2.TopBusyAirport.TopBusyAirportReducer"
           number-reducers="1" />
       …
     </beans:beans>
  55. Using the OutputFormat – Reducer

     // find the max, aka the busiest airport
     public class TopBusyAirportReducer extends MapReduceBase
         implements Reducer<Text, StringIntPair, Key, BusyAirportModel> {

       public void reduce(Text token, Iterator<StringIntPair> values,
           OutputCollector<Key, BusyAirportModel> output,
           Reporter reporter) throws IOException {
         String topAirport = null;
         int max = 0;
         while (values.hasNext()) {
           StringIntPair v = values.next();
           if (v.getSecond() > max) {
             max = v.getSecond();
             topAirport = v.getFirst();
           }
         }
         BusyAirportModel busy =
             new BusyAirportModel(topAirport, max);
         output.collect(null, busy);
       }
     }

     JobConf conf = new JobConf(getConf());
     conf.setJobName("Top Busy Airport");
     conf.set(SqlfOutputFormat.OUTPUT_URL,
         "jdbc:sqlfire://localhost:1527");
     conf.set(SqlfOutputFormat.OUTPUT_SCHEMA, "APP");
     conf.set(SqlfOutputFormat.OUTPUT_TABLE, "BUSY_AIRPORT");
     conf.setReducerClass(TopBusyAirportReducer.class);
     conf.setOutputKeyClass(Key.class);
     conf.setOutputValueClass(BusyAirportModel.class);
     conf.setOutputFormat(SqlfOutputFormat.class);
     ...
  56. Where do the results go?
     Reduced values are automatically inserted into the output table by matching column names: PUT INTO BUSY_AIRPORT (flights, airport) VALUES (?, ?)

     public class BusyAirportModel {
       private String airport;
       private int flights;

       public BusyAirportModel(String airport, int flights) {
         this.airport = airport;
         this.flights = flights;
       }

       public void setFlights(int idx, PreparedStatement ps)
           throws SQLException {
         ps.setInt(idx, flights);
       }

       public void setAirport(int idx, PreparedStatement ps)
           throws SQLException {
         ps.setString(idx, airport);
       }
     }
  57. Agenda – In-memory data grid: concepts, strengths, weaknesses; HDFS: strengths, weaknesses; What is our proposal?; How do you use this? (SQL syntax and demo); HDFS integration architecture and demo; MapReduce integration and demo (in-memory, parallel stored procedures); Comparison to HBase
  58. Scaling application logic with parallel "data-aware procedures"
  59. Why not Map/Reduce?
     [Diagram contrasting traditional Map/Reduce with parallel "data aware" procedures; image source: UC Berkeley Spark project]
  60. Procedures – managed in Spring containers as beans
     Java stored procedures may be created according to the SQL standard (a sketch of the procedure class follows):
     CREATE PROCEDURE getOverBookedFlights ()
     LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA DYNAMIC RESULT SETS 1
     EXTERNAL NAME 'examples.OverBookedStatus.getOverBookedStatus';
     SQLFire also supports the JDBC type Types.JAVA_OBJECT; a parameter of type JAVA_OBJECT accepts an arbitrary serializable Java object.
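     A sketch of what the class behind that EXTERNAL NAME might look like (the body and overbooking threshold are hypothetical). With PARAMETER STYLE JAVA and DYNAMIC RESULT SETS 1, the method takes a ResultSet array and uses the standard nested connection:

     package examples;

     import java.sql.Connection;
     import java.sql.DriverManager;
     import java.sql.PreparedStatement;
     import java.sql.ResultSet;

     public class OverBookedStatus {
         public static void getOverBookedStatus(ResultSet[] rs) throws Exception {
             // "jdbc:default:connection" is the standard connection available
             // to a Java stored procedure running inside the engine
             Connection conn = DriverManager.getConnection("jdbc:default:connection");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT FLIGHT_ID, SEGMENT_NUMBER FROM FLIGHTAVAILABILITY " +
                 "WHERE ECONOMY_SEATS_TAKEN > 100");  // hypothetical threshold
             rs[0] = ps.executeQuery();  // returned as the dynamic result set
         }
     }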
  61. Data-aware procedures
     Parallelize the procedure and prune execution to the nodes with the required data. The procedure call is extended with the following syntax:
     CALL [PROCEDURE] procedure_name ([expression [, expression]*])
       [WITH RESULT PROCESSOR processor_name]
       [ {ON TABLE table_name [WHERE whereClause]}
       | {ON {ALL | SERVER GROUPS (server_group_name [, server_group_name]*)}} ]
     CALL getOverBookedFlights() ON TABLE FLIGHTAVAILABILITY WHERE FLIGHT_ID = 'AA1116';
     The ON TABLE clause hints at the data the procedure depends on: if the table is partitioned by columns in the WHERE clause, execution is pruned to the nodes with the data (the node holding 'AA1116' in this case).
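     Invoking the data-aware form from a client is plain JDBC; a minimal sketch using the deck's procedure and routing key:

     import java.sql.CallableStatement;
     import java.sql.Connection;
     import java.sql.DriverManager;
     import java.sql.ResultSet;

     public class DataAwareCall {
         public static void main(String[] args) throws Exception {
             try (Connection conn =
                      DriverManager.getConnection("jdbc:sqlfire://localhost:1527")) {
                 // the ON TABLE ... WHERE clause lets the engine prune execution
                 // to the member(s) holding FLIGHT_ID 'AA1116'
                 CallableStatement cs = conn.prepareCall(
                     "CALL getOverBookedFlights() " +
                     "ON TABLE FLIGHTAVAILABILITY WHERE FLIGHT_ID = 'AA1116'");
                 cs.execute();
                 ResultSet rs = cs.getResultSet();  // first dynamic result set
                 while (rs.next()) {
                     System.out.println(rs.getString(1));
                 }
             }
         }
     }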
  62. Parallelize the procedure, then aggregate (reduce)
     Register a Java result processor (optional in some cases):
     CALL [PROCEDURE] procedure_name ([expression [, expression]*])
       [WITH RESULT PROCESSOR processor_name]
       [ {ON TABLE table_name [WHERE whereClause]}
       | {ON {ALL | SERVER GROUPS (server_group_name [, server_group_name]*)}} ]
     [Diagram: a client call fans out to fabric servers 1–3 and the processor aggregates their results]
  63. High-density storage in memory – off the Java heap
  64. Off-heap to minimize JVM copying and GC (MemScale)
     • An off-heap memory manager for Java
       – The JVM memory manager was not designed for this volume
       – We believe TB-memory machines are now commodity class
     • Key principles (see the sketch after this slide)
       – Avoid defragmentation and compaction of data blocks through reusable buffer pools
       – Avoid all the copying in Java heaps: young gen – from – to – old gen – user-to-kernel copy – network copy, then repeat on the replica side
     • Hadoop exacerbates the copying problem
       – Multiple JVMs are involved: TaskTracker (JVM) – DataNode (JVM) – file system/network
       – Let alone all the copies and intermediate disk storage required in MR shuffling
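     The reusable-buffer-pool idea can be illustrated with plain NIO (a sketch, not the MemScale implementation): direct ByteBuffers live outside the Java heap, and recycling them through a pool avoids both GC pressure and repeated allocation.

     import java.nio.ByteBuffer;
     import java.util.concurrent.ArrayBlockingQueue;
     import java.util.concurrent.BlockingQueue;

     public class DirectBufferPool {
         private final BlockingQueue<ByteBuffer> pool;

         public DirectBufferPool(int buffers, int bufferSize) {
             pool = new ArrayBlockingQueue<>(buffers);
             for (int i = 0; i < buffers; i++) {
                 pool.add(ByteBuffer.allocateDirect(bufferSize));  // off-heap
             }
         }

         public ByteBuffer acquire() throws InterruptedException {
             return pool.take();    // blocks if all buffers are in use
         }

         public void release(ByteBuffer buf) {
             buf.clear();           // reset for reuse; no reallocation, no GC churn
             pool.offer(buf);
         }
     }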
  65. Integration with Spring XD (future)
     • Spring XD is a distributed, extensible framework for ingestion, real-time analytics, and batch processing
     • GemFire XD as a source and sink
     • Spring XD's runtime (DIRT) is pluggable; GemFire XD could be an optional runtime
  66. Comparison to HBase (speaker note: don't make this a product pitch :-))
  67. Some HBase 0.9x challenges
     • HBase inherently is not HA; HDFS is
       – Failed region servers can cause pauses
     • WAL writes have to go synchronously to HDFS (and its replicas)
       – HDFS inherently detects failures slowly (it assumes overload)
     • Probability of hotspots – regions are sorted, not stored on a random hash
     • WAN replication needs a lot of work
     • No backup and recovery
  68. Some HBase 0.9x challenges
     • No real querying – just key-based range scans
       – And LSM on disk is suboptimal to a B+tree for querying
     • You cannot execute transactions or integrate with RDBs
     • Some like the ColumnFamily data model; really?
       – Pros: self-describing; a nested model is possible
       – Cons: query-engine optimization is difficult; the mapping is your problem; bloat
  69. Learn more. Stay connected.
     Jags – jramnarayan at gopivotal.com; Anthony – abaker at gopivotal.com
     http://communities.vmware.com/community/vmtn/appplatform/vfabric_sqlfire
     Twitter: twitter.com/springsource | YouTube: youtube.com/user/SpringSourceDev | Google+: plus.google.com/+springframework
  70. Extras
  71. Consistency model
  72. Consistency model without transactions
     • Replication within the cluster is always eager and synchronous
     • Row updates are always atomic; no need to use transactions
     • FIFO consistency: writes performed by a single thread are seen by all other processes in the order in which they were issued
  73. Consistency model without transactions
     • Consistency in partitioned tables
       – A partitioned-table row is owned by one member at a point in time
       – All updates are serialized to replicas through the owner
       – "Total ordering" at the row level: atomic and isolated
     • Membership changes and consistency – that needs another hour :-)
     • Pessimistic concurrency support using SELECT FOR UPDATE (example after this slide)
     • Support for referential integrity
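     The SELECT FOR UPDATE option mentioned above, as a plain JDBC sketch (table and column names from the demo schema): the selected rows stay locked until the transaction commits.

     import java.sql.Connection;
     import java.sql.DriverManager;
     import java.sql.ResultSet;
     import java.sql.Statement;

     public class PessimisticUpdate {
         public static void main(String[] args) throws Exception {
             try (Connection conn =
                      DriverManager.getConnection("jdbc:sqlfire://localhost:1527")) {
                 conn.setAutoCommit(false);
                 try (Statement stmt = conn.createStatement()) {
                     // lock the rows for the duration of the transaction
                     ResultSet rs = stmt.executeQuery(
                         "SELECT FLIGHT_ID FROM FLIGHTS " +
                         "WHERE FLIGHT_ID = 'AA1116' FOR UPDATE");
                     while (rs.next()) { /* rows are now locked */ }
                     stmt.executeUpdate(
                         "UPDATE FLIGHTS SET DEPART_TIME = CURRENT_TIME " +
                         "WHERE FLIGHT_ID = 'AA1116'");
                     conn.commit();   // releases the locks
                 } catch (Exception e) {
                     conn.rollback();
                     throw e;
                 }
             }
         }
     }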
  74. Distributed transactions
     • Full support for distributed transactions
     • Supports READ_COMMITTED and REPEATABLE_READ
     • Highly scalable, without any centralized coordinator or lock manager
     • We make some important assumptions
       – Most OLTP transactions are small in duration and size
       – Write-write conflicts are very rare in practice
  75. Distributed transactions – how does it work?
     • Each data node has a sub-coordinator to track transaction state
     • Local "write" locks are eagerly acquired on each replica
       – An object is owned by a single primary at a point in time
       – Fail fast if the lock cannot be obtained
     • Atomic, and works with the cluster's failure-detection system
     • Isolated until commit for READ_COMMITTED
     • Only local isolation is supported during commit
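     From the client, one of these transactions is ordinary JDBC; a minimal sketch at READ_COMMITTED, using the deck's FLIGHTAVAILABILITY table:

     import java.sql.Connection;
     import java.sql.DriverManager;
     import java.sql.PreparedStatement;

     public class SeatUpdateTxn {
         public static void main(String[] args) throws Exception {
             try (Connection conn =
                      DriverManager.getConnection("jdbc:sqlfire://localhost:1527")) {
                 conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
                 conn.setAutoCommit(false);
                 try (PreparedStatement ps = conn.prepareStatement(
                          "UPDATE FLIGHTAVAILABILITY " +
                          "SET ECONOMY_SEATS_TAKEN = ECONOMY_SEATS_TAKEN + 1 " +
                          "WHERE FLIGHT_ID = ? AND SEGMENT_NUMBER = ?")) {
                     ps.setString(1, "AA1116");
                     ps.setInt(2, 1);
                     ps.executeUpdate();   // write lock acquired eagerly on replicas
                     conn.commit();        // locks released at commit
                 } catch (Exception e) {
                     conn.rollback();      // e.g. fail-fast on a write-write conflict
                     throw e;
                 }
             }
         }
     }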
  76. GemFire XD performance benchmark (in-memory)
  77. How does it perform? Does it scale?
     • Scale from 2 to 10 servers (one per host)
     • Scale from 200 to 1,200 simulated clients (10 hosts)
     • A single partitioned table: int PK, 40 fields (20 ints, 20 strings)
  78. How does it perform? Does it scale?
     • CPU utilization remained low per server – about 30% – indicating many more clients could be handled
  79. Is latency low at scale?
     • Latency decreases with server capacity
     • 50–70% of operations take < 1 millisecond
     • About 90% take less than 2 milliseconds
