BACK TO THE FUTURE: DATAFLOW FINALLY COMES OF AGE from Structure 2012
Presentation by Damian Black, SQLstream

#structureconf
More at http://event.gigaom.com/structure/

Presentation Transcript

  • BACK TO THE FUTURE: DATAFLOW FINALLY COMES OF AGE! Speaker: Damian Black, CEO, SQLstream. Tuesday, November 27, 12
  • Real-time Big Data with Relational Streaming Dataflow Technology. Copyright © 2012 – Proprietary and Confidential Information of SQLstream Inc.
  • Brief History of Dataflow
      What is Dataflow?
        - Parallel processing model invented in the 1970s
        - Graph-based execution, without destructive updates
        - Data flow along arcs to nodes, are combined, and flow along output arcs
      What happened to Dataflow?
        - A number of experimental parallel computers were designed and built
        - Transputer and Occam were literally decades ahead of their time
        - Due for a resurgence thanks to inexpensive multi-core servers and SQL
      What is Relational Streaming?
        - A dataflow paradigm for processing Streaming Big Data tuples
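The dataflow model described on this slide can be illustrated with a small sketch. It is not taken from the presentation: the `Node` class, the `run` helper, and the example graph computing (a + b) * (a - b) are all hypothetical names chosen for illustration. Each node fires only when every input arc holds a token, and it emits a new value instead of modifying its inputs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical dataflow node: fires once every input arc holds a token."""
    name: str
    op: Callable[..., object]
    arity: int
    inputs: Dict[int, object] = field(default_factory=dict)          # arc index -> token
    outputs: List[Tuple["Node", int]] = field(default_factory=list)  # (target node, target arc)

    def receive(self, port: int, token: object, results: list) -> None:
        self.inputs[port] = token
        if len(self.inputs) < self.arity:
            return                                  # still waiting for other arcs
        args = [self.inputs[i] for i in range(self.arity)]
        self.inputs = {}                            # tokens are consumed by the firing
        value = self.op(*args)                      # a new token; inputs are not mutated
        if not self.outputs:
            results.append((self.name, value))      # sink node: record the result
        for target, target_port in self.outputs:
            target.receive(target_port, value, results)


def run(a, b):
    """Evaluate (a + b) * (a - b) as a three-node dataflow graph."""
    add = Node("add", lambda x, y: x + y, 2)
    sub = Node("sub", lambda x, y: x - y, 2)
    mul = Node("mul", lambda x, y: x * y, 2)
    add.outputs.append((mul, 0))
    sub.outputs.append((mul, 1))
    results = []
    for node in (add, sub):                         # both nodes receive copies of a and b
        node.receive(0, a, results)
        node.receive(1, b, results)
    return results
```

Calling `run(5, 3)` lets `add` and `sub` fire independently (they could run in parallel), after which `mul` fires on their two outputs.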
  • Dataflow Graph: Pipelined and Superscalar Processing. Relational Streaming: DAGs of fine-grained dataflow.
  • Comparison of Techniques for Dataflow Scaling
                              Hadoop and HDFS                   Relational Streaming
      Data Distribution       Fat File                          Fat Stream
      Dataflow Enablement     Generate new tuples from old,     Generate new tuples from old,
                              leaving old tuples unaltered      leaving old tuples unaltered
  • Dataflow: Hadoop versus Relational Streaming
      Hadoop style: data chunking, coarse-grained dataflow.
      Relational Streaming: DAGs of fine-grained dataflow.
  • Parallel Dataflow Execution
      Stages: Collect, Clean, Aggregate, Analyze, Deliver
      » Hadoop Map Reduce Process
      Relational Streaming Approach (Low Latency):
      » Continuous Parallel Dataflow Execution
      » Real-time Answers Immediately
      » Intelligently populate data store: Hadoop or Data Warehouse
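As a rough illustration of continuous execution through the Collect, Clean, Aggregate, Analyze, and Deliver stages, the sketch below chains Python generators so that each tuple moves through every stage as soon as it arrives. The event fields (`url`, `status`), the error threshold, and the alert text are assumptions made for the example and do not come from the presentation.

```python
from collections import defaultdict


def collect(raw_events):
    """Collect: hand over events one at a time as they arrive."""
    for event in raw_events:
        yield event


def clean(events):
    """Clean: drop incomplete events and normalize the rest."""
    for e in events:
        if e.get("url") and e.get("status") is not None:
            yield {"url": e["url"].strip().lower(), "status": int(e["status"])}


def aggregate(events):
    """Aggregate: keep a running count of server errors per URL."""
    errors = defaultdict(int)
    for e in events:
        if e["status"] >= 500:
            errors[e["url"]] += 1
        yield {"url": e["url"], "errors_so_far": errors[e["url"]]}


def analyze(rows, threshold=2):
    """Analyze: pass on only rows whose error count reached the threshold."""
    for row in rows:
        if row["errors_so_far"] >= threshold:
            yield row


def deliver(alerts):
    """Deliver: format each alert for a downstream consumer."""
    for a in alerts:
        yield f"ALERT {a['url']}: {a['errors_so_far']} errors"


raw = iter([
    {"url": "/a", "status": 500},
    {"url": " /A ", "status": "503"},
    {"url": None, "status": 200},
    {"url": "/b", "status": 200},
])
pipeline = deliver(analyze(aggregate(clean(collect(raw)))))
first_alert = next(pipeline)   # produced after only two input events were read
unread = list(raw)             # the remaining input had not been touched yet
```

The first alert is available before the rest of the input has been read, which is the low-latency property the slide contrasts with batch processing. A real deployment would run the stages in parallel across cores or servers; generators only show the per-tuple flow.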
  • Relational Streaming synergies with Hadoop
      » Relational Stream Processors co-located with Hadoop Servers
      » Stream/re-stream into and from local data stores in parallel
      » Combination performs Real-time and Historical processing:
          » Querying the future – Continuous ETL and Analytics (parallel pipelines)
          » Querying the past – Hadoop batch jobs on stored tuples (parallel batches)
      [Diagram: several "Hadoop & Relational Streaming Server" boxes, each showing relational operators (Select, Project, Join, Agg, Order, Group) and Hadoop stages (Split, Map, Combine, Sort, Reduce)]
  • Application Example – Google: “Youtube Mozilla Glow”
      » Mozilla Firefox 4 – Real-time Download Monitor
      » Continuous processing of download requests
      » Real-time integration with Hadoop and HBase
  • Cloud Monitoring – Detecting Service Error Spikes

      SELECT STREAM ROWTIME, url, "numErrorsLastMinute"
      FROM (
        SELECT STREAM ROWTIME, url, "numErrorsLastMinute",
          AVG("numErrorsLastMinute") OVER (PARTITION BY url RANGE INTERVAL '1' MINUTE PRECEDING) AS "avgErrorsPerMinute",
          STDDEV("numErrorsLastMinute") OVER (PARTITION BY url RANGE INTERVAL '1' MINUTE PRECEDING) AS "stdDevErrorsPerMinute"
        FROM "ServiceRequestsPerMinute") AS S
      WHERE S."numErrorsLastMinute" > S."avgErrorsPerMinute" + 2 * S."stdDevErrorsPerMinute";

      » Millions of records per second
      » Real-time Bollinger Bands
      » Amazon EC2
      [Diagram: multiple streams feeding multiple servers]
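To make the logic of the query above concrete, here is a plain-Python sketch of the same idea: for each URL, keep a one-minute sliding window of error counts and emit a row when the current count exceeds the window average plus two standard deviations. This is an approximation under stated assumptions, not SQLstream's implementation: rows are assumed to arrive in `rowtime` order with timestamps in seconds, the window is assumed to include the current row, and `STDDEV` is modeled with the population standard deviation. The function name `error_spikes` is hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


def error_spikes(rows, window_seconds=60, num_std=2.0):
    """Yield (rowtime, url, errors) rows that break above the upper band.

    rows: iterable of (rowtime_seconds, url, errors), ordered by rowtime.
    """
    windows = defaultdict(deque)          # url -> deque of (rowtime, errors)
    for rowtime, url, errors in rows:
        win = windows[url]
        win.append((rowtime, errors))
        # Evict rows that fell out of the one-minute window for this url.
        while win[0][0] < rowtime - window_seconds:
            win.popleft()
        values = [v for _, v in win]
        upper_band = mean(values) + num_std * pstdev(values)
        if errors > upper_band:
            yield (rowtime, url, errors)


# Six quiet readings ten seconds apart, then a burst on the same URL.
rows = [(t, "/api", 1) for t in range(0, 60, 10)]
rows += [(60, "/api", 20), (60, "/home", 3)]
spikes = list(error_spikes(rows))
```

Because the current row is part of its own window, a window needs at least six readings before any single value can exceed the mean by more than two population standard deviations, so the example feeds several readings per minute.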
  • A New Streaming Data Management Quadrant
      [Quadrant diagram. One axis runs from High-level Declarative Language & Operation to Low-level Procedural Language & Operation. The other contrasts Historical analysis / Periodic batches with Continuous analysis / Real-time processing. Labels on the diagram: Real-time Big Data, Batched Big Data, Relational Streaming, Hadoop, Data Warehouses, Big Data, Messaging Middleware.]
  • Benefits of Real-time “Big Dataflow” with Relational Streaming
      1. Real-time Integration: Continuous, real-time data integration
      2. Real-time Analysis: Process, analyze, and react – all in real-time
      3. RT Parallel Processing: Made easy, auto-optimized, massive scale
      Dataflow finally comes of age.
      Relational Streaming. The Next Wave of Big Data.
      Confidential and Trade Secret SQLstream Inc. © 2012
  • Query the Future ® The Future of Query.