Presented at All Things Open 2022
Presented by Danny McCormick
Title: Streaming Data Pipelines With Apache Beam
Abstract: Handling big data presents big problems. Along with traditional concerns like scalability and performance, the increasingly common need for live streaming data processing introduces problems like late or incomplete data from flaky data sources. Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines that addresses these challenges. Using one of the open source Beam SDKs, you can build a program that defines a pipeline to be executed by one of Beam’s supported distributed processing back-ends, which include Apache Flink, Apache Spark, and Google Cloud Dataflow.
This talk will explore some problems associated with processing large datasets at scale and how you can write Apache Beam pipelines that address those issues. It will include a demo of a basic Beam streaming pipeline.
Takeaways: an understanding of some challenges associated with large datasets, the Apache Beam model, and how to write a basic Beam streaming pipeline.
Audience: anyone dealing with big datasets or interested in data processing at scale.