Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processing Revolutionizing Big Data"

Stream processing, combined with consistent, durable, reliable stream storage, is kicking the revolution in Big Data processing up a notch. This modern paradigm is enabling a new generation of data middleware that delivers on the streaming promise of a simplified and unified programming model. From data ingest, transformation, and messaging to search, time series and more, a robust streaming data ecosystem means we’ll all be able to more quickly build applications that solve problems we could not solve before.

Published in: Technology

  1. Stream Processing Revolutionizing Big Data
     Srikanth Satya, April 2018
  2. Data-Intensive Apps Need Disruptive Technologies (pravega.io)
     The Unbundled Database vision sounds awesome!
       • Loosely coupled data derivations and transformations
       • Update derived state by observing data changes
       • Observe changes in derived state, all the way to the edge
       • Integrity and correctness: end-to-end IDs, idempotence, data consistency and exactly-once semantics
     BUT realizing it requires disruptive systems capabilities
       • Shared, durable, consistent, unbounded distributed log storage
       • Ability to dynamically scale both the log(s) and downstream processors in coordination with data arrival volume
       • Ability to deliver timely and accurate results by processing the log continuously, even with late-arriving or out-of-order data
  3. Data-Intensive Apps Are Disruptive
     Same Unbundled Database vision and required systems capabilities as slide 2, followed by:
     We passionately believe in these principles. As the industry leaders in storage, we’re developing a new, open storage primitive enabling all of us to realize the full potential of this powerful vision.
  4. Introducing Pravega Stream Storage
  5. Introducing Pravega Stream Storage
     A new storage abstraction, a stream, for continuous and infinite data
       • Named, durable, append-only, infinite sequence of bytes
       • With low-latency appends to and reads from the tail of the sequence
       • With high-throughput reads for older portions of the sequence
     Coordinated scaling of stream storage and stream processing
       • Stream writes partitioned by app-defined routing key
       • Stream reads independently and automatically partitioned by arrival rate SLO
       • Scaling protocol to allow stream processors to scale in lockstep with storage
     Enabling system-wide exactly-once processing across multiple apps
       • Streams are ordered and strongly consistent
       • Chain independent streaming apps via streams
       • Stream transactions integrate with checkpoint schemes such as the one used in Flink
  6. Revisiting the Disruptive Capabilities
     Required Systems Capabilities
       • Shared, durable, consistent, unbounded distributed log storage
       • Dynamically scale logs in coordination with downstream processors
       • Deliver accurate results by processing continuously, even with late-arriving or out-of-order data
     Enabling Pravega Features
       • Durable, append-only byte streams
       • Consistent tail and replay reads
       • Unlimited retention, storage efficiency
       • Auto-scaling
       • Independently scale readers/writers
       • Transactions and exactly once
       • Event time and processing time
  7. The Streaming Revolution
     Enabling continuous pipelines with consistent replay, composability, elasticity, exactly once
     [Diagram labels: Ingest Buffer & Pub/Sub, Streaming Search, Streaming Analytics, Cloud-Scale Storage, Pravega Stream Store, State Synchronizer]
  8. Pravega for Ingest Buffer and Pub/Sub
     Ingest Buffer, Distributed Ledger or Messaging using Pravega Event Client
     [Diagram labels: Writers, Stream, Reader Groups, Consumers]
  9. Pravega for Application State Synchronization
     Distributed State via State Synchronizer Client
       • Shared Properties
       • Shared scalar data
       • Shared K/V data
     [Diagram labels: “Shared State” Stream; App Process #1 through App Process #n, each with a State Synchronizer and a Stream Client]
  10. Pravega + Flink = Pure Streaming End-to-End
     Dynamically Scale Storage + Compute Based On Data Arrival Volume
       • Protocol coordination between streaming storage and streaming engine to systematically scale up and down the number of segments and Flink workers based on load variance over time
       • Utilize transactional writes to extend exactly-once processing semantics across multiple, chained apps
       • Writers scale based on app configuration; stream storage elastically and independently scales based on aggregate incoming volume of data
     [Diagram labels: Social, IoT, … Writers; “Raw Stream” Segments; Streaming App Workers; Sink; “Cooked Stream” Segments; 2nd Streaming App Workers; Sink]
  11. Search Reimagined for a Streaming World
     Advantages of This Approach
       • Seamlessly integrate search into streaming pipelines: continuous indexing + continuous query
       • Dynamically scale search based on data volume arrival rate and query SLA
       • Eliminate redundant storage across input streams and search
     [Diagram labels: stream pipeline, Input Streams, Pravega Search (Continuous Indexing, Continuous Query), Result Streams, stream pipeline]
  12. Pravega: an open source project with an open community
       • Software includes infinite byte stream primitive, event abstraction, ingest buffer, and pub/sub services
       • Flink integration for scale, elasticity, and system-wide exactly once
       • Join the community at pravega.io
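
Two ideas from slides 5 and 10, writes partitioned by an app-defined routing key and transactional writes that become visible all at once, can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: it is not Pravega's client API or implementation. The names `ToyStream`, `ToyTransaction` and `events_for_key` are hypothetical, the segment count is fixed (no auto-scaling), and nothing is durable or distributed.

```python
import hashlib


class ToyStream:
    """Hypothetical in-memory stand-in for a stream made of append-only segments."""

    def __init__(self, num_segments):
        # Assumption: a fixed number of segments; real stream stores may scale this.
        self.segments = [[] for _ in range(num_segments)]

    def segment_for(self, routing_key):
        # Hash the routing key so the same key always maps to the same segment.
        digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.segments)

    def append(self, routing_key, event):
        # Append-only write: events with the same key keep their write order.
        index = self.segment_for(routing_key)
        self.segments[index].append((routing_key, event))
        return index

    def begin_transaction(self):
        return ToyTransaction(self)


class ToyTransaction:
    """Buffers events and publishes them to the stream only on commit."""

    def __init__(self, stream):
        self._stream = stream
        self._pending = []
        self._open = True

    def append(self, routing_key, event):
        if not self._open:
            raise RuntimeError("transaction is closed")
        self._pending.append((routing_key, event))

    def commit(self):
        if not self._open:
            raise RuntimeError("transaction is closed")
        # All buffered events become visible together, in write order.
        for routing_key, event in self._pending:
            self._stream.append(routing_key, event)
        self._pending = []
        self._open = False

    def abort(self):
        # Discard buffered events; readers never see them.
        self._pending = []
        self._open = False


def events_for_key(stream, routing_key):
    """Read back, in order, the events written with one routing key."""
    segment = stream.segments[stream.segment_for(routing_key)]
    return [event for key, event in segment if key == routing_key]
```

In this toy model, a downstream writer could open a transaction per checkpoint interval and commit it only once the checkpoint completes, which is the general shape of how transactional sinks pair with a checkpoint scheme such as Flink's. The pairing itself is not modeled here.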
