Insights Without Tradeoffs Using Structured Streaming
Michael Armbrust - @michaelarmbrust
Spark Summit East 2017
In Spark 2.0, we introduced Structured Streaming, which lets users continually and incrementally update their view of the world as new data arrives, while still using the same familiar Spark SQL abstractions. I talk about the progress we've made since then on robustness, latency, expressiveness, and observability, using examples of production end-to-end continuous applications.
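The core idea behind that model - the engine folds each new batch of input into a standing result table instead of recomputing over all history - can be sketched in plain Python. This is a conceptual illustration only, not the Spark API; the class and method names are invented for the example.

```python
# Conceptual sketch of incremental aggregation, the model behind
# Structured Streaming: a standing query result is updated with each
# new micro-batch rather than recomputed from scratch.
# Plain Python for illustration; not the Spark API.

from collections import Counter


class RunningWordCount:
    """Maintains a word-count 'result table' updated incrementally."""

    def __init__(self):
        self.counts = Counter()  # the continuously updated result

    def add_batch(self, lines):
        """Fold one micro-batch of input lines into the standing result."""
        for line in lines:
            self.counts.update(line.split())
        return dict(self.counts)  # current snapshot of the result table


# Two micro-batches arriving over time yield the same final answer
# that a single batch job over all the data would.
query = RunningWordCount()
query.add_batch(["spark streaming", "spark sql"])
snapshot = query.add_batch(["structured streaming"])
```

The point of the sketch is the contract, not the implementation: because only the delta is processed per batch, the same logical query serves both the historical (throughput) and the live (latency) side of the tradeoff the talk describes.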


Slide transcript:

1. Insights Without Tradeoffs Using Structured Streaming. Michael Armbrust - @michaelarmbrust. Spark Summit East 2017.
2. Parallelism: split up the problem to harness many machines for computation. Complexity: handle cross-machine communication and failures.
3. Developer Productivity: quickly and concisely express common computations. Efficiency: hand-tune code to minimize overheads and process the most data per cycle. SQL.
4. Throughput: process large historical repositories quickly. Latency: up-to-date answers as new data arrives. Structured Streaming.
5. Production Use Cases.
6. Streaming at [company logo]: collect logs and metrics from a variety of sources to ensure the security, availability and performance of our cloud platform.
7. Engineer Office Hours. MY HOURS (Spark SQL, Structured Streaming, Databricks): TODAY 4:30 – 5:15. OTHER ENGINEERS (Spark Core, R, Data Science, ML, GraphFrames, Deep Learning, Databricks, Spark SQL, Structured Streaming): TODAY 1:45 – 5:15; THURS 10:30 – 2:30 @ Databricks Booth.
8. Thank you! All code available at databricks.com/blog and @michaelarmbrust.