SUPPORTING QUERYING ON MULTI-MILLION EVENTS PER SECOND from Structure:Data 2012

Presentation from Damian Black, SQLstream
#dataconf
More at http://event.gigaom.com/structuredata/

Published in: Technology, Business

  1. SUPPORTING QUERYING ON MULTI-MILLION EVENTS PER SECOND
     SPEAKER: Damian Black, CEO, SQLstream
     Monday, July 30, 2012
  2. Querying at Multi-million Events per Second
     Real-time Big Data through Relational Streaming
     Copyright © 2012 SQLstream Inc.
  3. Real-time Big Data through Relational Streaming
     So what is a Streaming Big Data Platform?
     » Stream any data in, immediately stream out real-time answers.
     » Continuously analyze and process massive data volumes.
     » React in real time to each and every new record.
     And what is Relational Streaming?
     » A paradigm for processing Streaming Big Data tuples.
     » Familiar relational expressions with automatic optimization.
     » Relational queries executed continuously on a massively parallel scale.
     Confidential and Trade Secret SQLstream Inc. © 2012
  4. Comparison of Techniques for Scaling
     How are the tuples held?
     » Data Warehouses: appear as a single Fat Table
     » Hadoop and HDFS: appear as a single Fat File
     » Relational Streaming: appear as a single Fat Stream
     How do we change the data?
     » Data Warehouses: new tuples overwrite old; old tuples are updated
     » Hadoop and HDFS: new tuples from old; old tuples left alone
     » Relational Streaming: new tuples from old; old tuples left alone
     What kind of cluster scale?
     » Data Warehouses: 10s+ of nodes; shared state propagation
     » Hadoop and HDFS: 1000s+ of nodes; no shared state
     » Relational Streaming: 1000s+ of nodes; no shared state
  5. Parallel Processing done Hadoop
     » Historical queries: finite tuple sets are mapped into finite tuple sets.
     » Independent chunks: need to break data into independent chunks.
     » Procedural, phased: procedural, step-wise process used.
     For example, great for sorting many years' gaming scores under different keys.
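The batch pattern on slide 5 (independent chunks, then procedural phases) can be sketched in plain Python. This is a minimal illustration, not Hadoop code: the function names, the record fields (`game`, `player`, `score`), and the in-memory "shuffle" are all assumptions made for the example.

```python
from collections import defaultdict


def map_phase(chunk, key_field):
    """Map: turn one independent chunk into (key, record) pairs."""
    return [(record[key_field], record) for record in chunk]


def shuffle_phase(mapped_chunks):
    """Shuffle: bring all records that share a key together."""
    groups = defaultdict(list)
    for mapped in mapped_chunks:
        for key, record in mapped:
            groups[key].append(record)
    return groups


def reduce_phase(groups):
    """Reduce: sort each key's records by score, highest first."""
    return {
        key: sorted(records, key=lambda r: r["score"], reverse=True)
        for key, records in groups.items()
    }


def sort_scores(chunks, key_field):
    """Run the three phases one after another over a finite data set."""
    mapped = [map_phase(chunk, key_field) for chunk in chunks]
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical historical data, already split into two independent chunks.
chunks = [
    [{"game": "g1", "player": "ann", "score": 10},
     {"game": "g2", "player": "bob", "score": 5}],
    [{"game": "g1", "player": "cat", "score": 30}],
]
by_game = sort_scores(chunks, "game")
```

Each phase finishes before the next begins, which is the "procedural, phased" behaviour the slide describes; re-keying the same data (for example by `player`) means running the whole job again.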
  6. Parallel Processing with Relational
     » Historical queries: finite tuple sets are mapped into finite tuple sets.
     » Continuous queries: infinite tuple streams mapped to infinite tuple streams.
     » Independent chunks: need to break data into independent chunks.
     » Ordered streams: data are processed in the context of streams.
     » Procedural, phased: procedural, step-wise process used.
     » Declarative, parallel: declarative, fine-grained parallel processing.
     For example, great for giving the real-time leaderboard over a rolling minute.
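To make slide 6's rolling-minute leaderboard concrete, here is a small Python sketch of a continuous query over an ordered stream. It is an illustration under stated assumptions, not SQLstream's implementation: the class name, the `(timestamp, player, points)` event shape, and timestamps in seconds arriving in order are all hypothetical.

```python
from collections import defaultdict, deque


class RollingLeaderboard:
    """Keeps per-player point totals over the last `window_seconds`."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.events = deque()           # (timestamp, player, points), oldest first
        self.totals = defaultdict(int)  # player -> points inside the window

    def add(self, timestamp, player, points):
        """Process one new event; assumes timestamps never go backwards."""
        self.events.append((timestamp, player, points))
        self.totals[player] += points
        self._expire(timestamp)

    def _expire(self, now):
        """Drop events that have slid out of the window."""
        while self.events and self.events[0][0] <= now - self.window:
            _, player, points = self.events.popleft()
            self.totals[player] -= points
            if self.totals[player] == 0:
                del self.totals[player]

    def top(self, n=3):
        """Current leaderboard: highest total first, ties broken by name."""
        return sorted(self.totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


board = RollingLeaderboard(window_seconds=60)
board.add(0, "ann", 10)
board.add(30, "bob", 7)
board.add(59, "ann", 5)
first = board.top(2)
board.add(61, "bob", 1)   # the event at t=0 now falls outside the window
second = board.top(2)
```

The answer is updated on every arriving record and never "finishes", which is the contrast with the finite, phased batch job on slide 5.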
  7. SQL: The only declarative dataflow language standard
     » The key to massive scale parallelism is Dataflow Execution
     » Hadoop provides Dataflow Execution, but only in waves:
       » Each wave consists of a procedural execution phase
       » Tuple sets are transformed to new tuple sets
       » Tuple sets are chunked and shuffled over a “hash partition” scheme
     » Relational Streaming maximizes Dataflow Execution:
       » SQL is a declarative language amenable to intelligent “superscalar” optimization
       » Tuple streams are shuffled also using hash partitioning
     Relational Streaming allows both pipelining and parallel processing.
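Slide 7 mentions hash partitioning as the shuffle mechanism for both tuple sets and tuple streams. The sketch below shows the general idea in Python; the function names and the choice of CRC32 as the hash are assumptions for illustration and are not taken from Hadoop or SQLstream.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition index using a stable hash (CRC32 here)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def shuffle(tuples, key_field, num_partitions):
    """Route every tuple to the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for t in tuples:
        partitions[partition_for(t[key_field], num_partitions)].append(t)
    return partitions


# Hypothetical score events keyed by song.
events = [
    {"song_id": "s1", "points": 3},
    {"song_id": "s2", "points": 5},
    {"song_id": "s1", "points": 4},
    {"song_id": "s3", "points": 1},
]
parts = shuffle(events, "song_id", num_partitions=4)
```

Because the partition depends only on the key, all tuples for one key reach the same worker, so per-key aggregates can be computed without any shared state between workers.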
  8. Tuple Processing: Hadoop versus Relational Streaming
     » Hadoop style: data chunking, coarse-grained dataflow.
     » Relational Streaming: DAGs of fine-grained dataflow.
  9. Application Example: MMO Multiplayer Scoring
     » Many MMO servers streaming game action in real time.
     » Streaming analytics maintained over varying time windows.
     » Aggregated and continuously sorted: streaming “order by”.
     [Diagram: many servers, each emitting a stream.]
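One way to picture slide 9's streaming “order by” across many servers is a lazy k-way merge of streams that are each already time-ordered. The Python sketch below uses the standard library's `heapq.merge`; the event tuples and server names are hypothetical, and this is only one possible reading of the slide, not a description of how SQLstream implements it.

```python
import heapq


def merge_streams(*streams):
    """Lazily merge per-server streams into one stream ordered by timestamp.

    Each input must already be ordered by its first field (the timestamp).
    """
    return heapq.merge(*streams, key=lambda event: event[0])


# Hypothetical (timestamp, server, action) events from three game servers.
server_a = [(1, "a", "hit"), (4, "a", "score"), (9, "a", "hit")]
server_b = [(2, "b", "score"), (3, "b", "hit")]
server_c = [(5, "c", "hit"), (8, "c", "score")]

merged = list(merge_streams(server_a, server_b, server_c))
```

`heapq.merge` returns an iterator and pulls from each input only as needed, so the same function also works when the inputs are unbounded generators rather than lists.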
  10. Streaming SQL: MMO Multiplayer Scoring
     -- Score per song: last week's points at full weight, each older week at half the previous weight.
     CREATE OR REPLACE PUMP "SONG_SCORE_PUMP" STOPPED AS
     INSERT INTO "S_SONG_SCORE" ("songId", "SCORE")
     SELECT STREAM
         "SONG_ID" AS "songId",
         SUM("POINTS") OVER "LAST_WEEK"
         + ((SUM("POINTS") OVER "LAST_2_WEEKS" - SUM("POINTS") OVER "LAST_WEEK") * 0.5)
         + ((SUM("POINTS") OVER "LAST_3_WEEKS" - SUM("POINTS") OVER "LAST_2_WEEKS") * 0.25)
         + ((SUM("POINTS") OVER "LAST_4_WEEKS" - SUM("POINTS") OVER "LAST_3_WEEKS") * 0.125)
         AS "SCORE"
     FROM "S_SONG_SCORE_CHANGE"
     -- Four overlapping sliding windows per song, 7 to 28 days long.
     WINDOW
         "LAST_WEEK" AS (PARTITION BY "SONG_ID" RANGE INTERVAL 7 DAY PRECEDING),
         "LAST_2_WEEKS" AS (PARTITION BY "SONG_ID" RANGE INTERVAL 14 DAY PRECEDING),
         "LAST_3_WEEKS" AS (PARTITION BY "SONG_ID" RANGE INTERVAL 21 DAY PRECEDING),
         "LAST_4_WEEKS" AS (PARTITION BY "SONG_ID" RANGE INTERVAL 28 DAY PRECEDING);
     » Millions of events per second
     » Real-time game scoring
     » Amazon EC2
     [Diagram: many servers, each emitting a stream.]
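The scoring expression on slide 10 subtracts adjacent window sums to isolate each week's points, then weights them 1, 0.5, 0.25, and 0.125. The Python sketch below recomputes that arithmetic over a small in-memory list so the formula can be checked by hand. The event shape `(timestamp, song_id, points)`, the function names, and the inclusive window boundaries are assumptions; it does not model the pump or the streaming engine.

```python
from datetime import datetime, timedelta

WINDOW_DAYS = (7, 14, 21, 28)
WEIGHTS = (1.0, 0.5, 0.25, 0.125)


def window_sum(events, song_id, now, days):
    """Sum of points for one song over the `days` preceding `now`."""
    start = now - timedelta(days=days)
    return sum(
        points
        for ts, sid, points in events
        if sid == song_id and start <= ts <= now
    )


def song_score(events, song_id, now):
    """Weighted score: week 1 in full, each older week at half the weight."""
    sums = [window_sum(events, song_id, now, d) for d in WINDOW_DAYS]
    score = sums[0] * WEIGHTS[0]
    for i in range(1, len(sums)):
        # The difference of adjacent windows is the points earned in week i+1.
        score += (sums[i] - sums[i - 1]) * WEIGHTS[i]
    return score


now = datetime(2012, 7, 30)
events = [
    (now - timedelta(days=1), "s1", 100),   # week 1
    (now - timedelta(days=10), "s1", 40),   # week 2
    (now - timedelta(days=17), "s1", 20),   # week 3
    (now - timedelta(days=25), "s1", 8),    # week 4
    (now - timedelta(days=2), "s2", 999),   # different song, ignored for s1
]
score_s1 = song_score(events, "s1", now)
```

For this data the score for `s1` works out to 100 + 40 × 0.5 + 20 × 0.25 + 8 × 0.125 = 126.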
  11. Relational Streaming / Hadoop Synergy
     » Relational Stream Processors (RSPs)
       » Co-located with Hadoop Servers to stream/re-stream local data
     » RSPs + Hadoop integrate Real-time and Historical processing:
       » Querying the future – Continuous ETL and Analytics (parallel pipelines)
       » Querying the past – Hadoop batch jobs on stored tuples (parallel batches)
       » Re-streaming and Re-querying (for example, scenario & sensitivity analyses)
     [Diagram: Hadoop & Relational Streaming Server. Hadoop operations: Map, Split, Combine, Sort, Reduce. Relational Streaming operations: Select, Project, Join, Agg, Order, Group.]
  12. Use Cases for S3 Data (Sensor x System x Service)
     » Sensor Data:
       » Vehicle, GPS and transportation sensors
       » M2M sensor networks
       » Smart Energy sensors
     » System Data:
       » Log file processing for real-time Security, Compliance, Fraud
       » Cloud performance monitoring
       » Service Level Monitoring
     » Service Data:
       » SMS analysis, CDRs for billing, Fraud
       » Real-time pricing and promotion for eCommerce
       » Active Internet (real-time context-dependent content)
  13. Relational Streaming – A New Data Management Quadrant
     [Quadrant diagram. One axis runs from “High-level Declarative Language & Operation” to “Low-level Procedural Language & Operation”. Other labels: Historical analysis, Continuous analysis, Periodic batches, Real-time processing.]
  14. Conclusions: Relational Streaming – the next “Big Data” frontier?
     » Streaming Views: any view of any data, in real time, and all the time.
     » Real-time Reaction: harness real-time data and react and adapt in real time.
     » Massively Parallel: deliver fine-grained parallelism on a massive scale.
     We already query historical data… let's now query future data!
  15. Thanks! Any questions?