10.
@kimutansk
The era of the Lambda Architecture
• An architecture proposed by Nathan Marz, then at Twitter
– How to beat the CAP theorem (*)
• Runs a batch layer and a real-time layer in parallel, then merges their results for display
(*) http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
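The merge step described above can be sketched in a few lines of plain Scala. This is only an illustration under stated assumptions: the `LambdaQuery` object, the `View` type, and the idea that both layers produce per-key counts are hypothetical and do not come from the slides.

```scala
// Hypothetical illustration of the serving-side merge in a Lambda Architecture.
object LambdaQuery {
  // Assumed view shape: key -> count (for example, page -> page views).
  type View = Map[String, Long]

  // Sum the batch view and the real-time increment for every key seen in either.
  def merge(batchView: View, realtimeView: View): View =
    (batchView.keySet ++ realtimeView.keySet).map { key =>
      key -> (batchView.getOrElse(key, 0L) + realtimeView.getOrElse(key, 0L))
    }.toMap
}
```

The batch view would cover data up to the last batch run, and the real-time view only the events that arrived since then, so a query reads both and adds them.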
59.
Product overview: Flink
• Example implementation using the procedural, low-level API
– The process method is applied to each event in the stream and handles it individually.
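import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

// the source stream of (key, value) pairs
val stream: DataStream[(String, String)] = ...
// key the stream by its first field and apply the process function
val result: DataStream[(String, Long)] =
  stream
    .keyBy(0)
    .process(new CountWithTimeoutFunction())

// the data type stored in state
case class CountWithTimestamp(key: String, count: Long, lastModified: Long)

class CountWithTimeoutFunction extends ProcessFunction[(String, String), (String, Long)] {
  // keyed state: one CountWithTimestamp per key
  lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext
    .getState(new ValueStateDescriptor[CountWithTimestamp]("myState", classOf[CountWithTimestamp]))
  // called for each event in the stream
  override def processElement(value: (String, String),
      ctx: Context, out: Collector[(String, Long)]): Unit = ...
  // called when a timer registered through ctx fires
  override def onTimer(timestamp: Long, ctx: OnTimerContext,
      out: Collector[(String, Long)]): Unit = ...
}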
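The slide elides the bodies of processElement and onTimer. As a rough sketch of what the names suggest (count events per key, and emit the count once a key has been idle for a timeout), the following plain-Scala simulation runs without Flink. Everything here is an assumption for illustration: `CountWithTimeoutSim`, `advanceTo`, and the timeout parameter are hypothetical stand-ins for Flink's keyed state and timer service, not Flink APIs.

```scala
import scala.collection.mutable

// Same shape as the state type on the slide.
final case class CountWithTimestamp(key: String, count: Long, lastModified: Long)

// Hypothetical simulator: not a Flink class, just the pattern in plain Scala.
final class CountWithTimeoutSim(timeoutMs: Long) {
  // Per-key state, playing the role of ValueState[CountWithTimestamp].
  private val state = mutable.Map.empty[String, CountWithTimestamp]
  // Registered timers as (fireTime, key), playing the role of the timer service.
  private val timers = mutable.Set.empty[(Long, String)]

  // Analogue of processElement: bump the count and register a timeout timer.
  def processElement(key: String, timestamp: Long): Unit = {
    val current = state.getOrElse(key, CountWithTimestamp(key, 0L, timestamp))
    state(key) = current.copy(count = current.count + 1, lastModified = timestamp)
    timers += ((timestamp + timeoutMs, key))
  }

  // Analogue of time advancing: fire every timer due at or before `now`.
  def advanceTo(now: Long): List[(String, Long)] = {
    val due = timers.filter(_._1 <= now).toList.sortBy(_._1)
    timers --= due
    due.flatMap { case (fireTime, key) => onTimer(fireTime, key) }
  }

  // Analogue of onTimer: emit only if no newer event arrived for this key.
  private def onTimer(fireTime: Long, key: String): Option[(String, Long)] =
    state.get(key).collect {
      case s if fireTime == s.lastModified + timeoutMs => (s.key, s.count)
    }
}
```

A timer set by an older event is ignored when it fires, because a later event for the same key has moved lastModified forward; only the timer belonging to the latest event emits a result.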
60.
Product overview: Flink
• Example implementation using the Streaming SQL API
– Assign a schema to the stream and describe the processing in SQL
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = TableEnvironment.getTableEnvironment(env)

// read a DataStream from an external source
val ds: DataStream[(Long, String, Integer)] = env.addSource(...)
// register the DataStream under the name "Orders"
tableEnv.registerDataStream("Orders", ds, 'user, 'product, 'amount)
// run a SQL query on the Table and retrieve the result as a new Table
val result = tableEnv.sql(
  "SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'")
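To make the query's meaning concrete, here is roughly what it computes, written against an ordinary Scala collection instead of a stream. The `Order` case class, the `OrdersQuery` object, and any sample rows are hypothetical; only the field names and the SQL text come from the slide.

```scala
// Hypothetical row type mirroring the 'user, 'product, 'amount schema.
final case class Order(user: Long, product: String, amount: Int)

object OrdersQuery {
  // Collection equivalent of:
  //   SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'
  def rubberOrders(orders: Seq[Order]): Seq[(String, Int)] =
    orders
      .filter(_.product.contains("Rubber")) // LIKE '%Rubber%' is a substring match
      .map(o => (o.product, o.amount))      // project product and amount
}
```

On a stream, the same filter and projection are applied continuously to each arriving row rather than once to a finished collection.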
72.
Thank you for your attention!
Enjoy Stream Processing!
https://www.flickr.com/photos/neokratz/4913885458