32. CUSTOMIZING WHAT WHERE WHEN HOW
[Figure panels: Classic Batch, Windowed Batch, Streaming, Streaming + Accumulation]
For more information see https://cloud.google.com/dataflow/examples/gaming-example
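As a rough illustration of the difference between the first two panels, the plain-Python sketch below (not Beam code) sums scores once over the whole input, then once per fixed event-time window. The events, the 60-second window size, and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical (event_time_seconds, score) pairs; not taken from the slides.
EVENTS = [(5, 3), (42, 4), (61, 7), (119, 1), (130, 8)]
WINDOW_SIZE = 60  # assumed fixed-window width, in seconds


def classic_batch_sum(events):
    """Classic batch: a single result computed over the whole input."""
    return sum(score for _, score in events)


def windowed_batch_sum(events, size):
    """Windowed batch: one result per fixed event-time window.

    Each event is assigned to the window whose start is the largest
    multiple of `size` not exceeding its event time.
    """
    totals = defaultdict(int)
    for event_time, score in events:
        window_start = (event_time // size) * size
        totals[window_start] += score
    return dict(totals)


if __name__ == "__main__":
    print(classic_batch_sum(EVENTS))
    print(windowed_batch_sum(EVENTS, WINDOW_SIZE))
```

The streaming panels add the question of when results are emitted and how later refinements relate to earlier ones, which this batch-only sketch does not attempt to model.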
37. WORD COUNT
import re
import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | beam.io.textio.ReadFromText("input.txt")
     # Split each line into words on runs of non-word characters.
     | beam.FlatMap(lambda s: re.split(r"\W+", s)))
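To make the word-splitting step concrete, here is a minimal stdlib-only sketch of what a flat-map over lines is intended to do: apply a function that returns a list to every line, then flatten the lists into one collection. The sample lines and the helper names `split_words` and `flat_map` are assumptions for illustration, not part of Beam.

```python
import re
from itertools import chain

# Hypothetical input lines standing in for the contents of input.txt.
LINES = ["to be, or not to be", "that is the question."]


def split_words(line):
    """Split a line on runs of non-word characters."""
    return re.split(r"\W+", line)


def flat_map(fn, items):
    """Apply fn to each item and flatten the resulting lists into one list."""
    return list(chain.from_iterable(fn(item) for item in items))


WORDS = flat_map(split_words, LINES)

if __name__ == "__main__":
    print(WORDS)
```

Note that `re.split` returns an empty string when a line starts or ends with a separator (the second sample line ends in a period), so a real pipeline may want to filter out empty words before counting.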
38. WORD COUNT
import re
import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | beam.io.textio.ReadFromText("input.txt")
     | beam.FlatMap(lambda s: re.split(r"\W+", s))
     # Count the occurrences of each distinct word.
     | beam.combiners.Count.PerElement())
39. WORD COUNT
import re
import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | beam.io.textio.ReadFromText("input.txt")
     | beam.FlatMap(lambda s: re.split(r"\W+", s))
     | beam.combiners.Count.PerElement()
     # Format each (word, count) pair; MapTuple unpacks the pair (Python 3 has no tuple parameters).
     | beam.MapTuple(lambda w, c: "%s: %d" % (w, c)))
40. WORD COUNT
import re
import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | beam.io.textio.ReadFromText("input.txt")
     | beam.FlatMap(lambda s: re.split(r"\W+", s))
     | beam.combiners.Count.PerElement()
     | beam.MapTuple(lambda w, c: "%s: %d" % (w, c))
     # Write the formatted lines to files with the given prefix.
     | beam.io.textio.WriteToText("output/stringcounts"))
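For readers who want to check what the counting and formatting steps should produce, the sketch below mirrors them with the standard library only. It is not Beam code: `count_per_element` and `format_counts` are hypothetical helper names, the sample words are made up, and the results are sorted here only to make the output deterministic (a Beam collection carries no ordering guarantee).

```python
from collections import Counter

# Hypothetical words, as they might arrive from the splitting step.
SAMPLE_WORDS = ["to", "be", "or", "not", "to", "be"]


def count_per_element(words):
    """Return (word, count) pairs, one per distinct word, sorted by word."""
    return sorted(Counter(words).items())


def format_counts(pairs):
    """Render each (word, count) pair as 'word: count'."""
    return ["%s: %d" % (word, count) for word, count in pairs]


if __name__ == "__main__":
    for line in format_counts(count_per_element(SAMPLE_WORDS)):
        print(line)
```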
49. Runner API
[Diagram labels: Beam Java, Beam Python, Other Languages, Runner API]
A runner- and language-agnostic representation of the user’s pipeline graph. It contains only nodes of Beam model primitives that all runners understand, which maintains portability across runners.
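The idea can be sketched in a few lines of plain Python: describe the pipeline as language-neutral data whose nodes name only primitives, so any runner can check and consume it. Everything below is a hypothetical illustration, not the actual Runner API (which is defined with protocol buffers rather than JSON); the primitive set, node names, and function names are assumptions.

```python
import json

# Assumed primitive set for illustration; the real set is defined by the Beam model.
PRIMITIVES = {"Read", "ParDo", "GroupByKey", "Write"}


def build_graph():
    """Hypothetical word-count graph expressed only as primitive nodes."""
    return {
        "nodes": [
            {"id": "read", "primitive": "Read", "inputs": []},
            {"id": "split", "primitive": "ParDo", "inputs": ["read"]},
            {"id": "pair_with_one", "primitive": "ParDo", "inputs": ["split"]},
            {"id": "group", "primitive": "GroupByKey", "inputs": ["pair_with_one"]},
            {"id": "sum", "primitive": "ParDo", "inputs": ["group"]},
            {"id": "format", "primitive": "ParDo", "inputs": ["sum"]},
            {"id": "write", "primitive": "Write", "inputs": ["format"]},
        ]
    }


def runner_understands(graph, supported=PRIMITIVES):
    """A runner can execute the graph if every node is a primitive it supports."""
    return all(node["primitive"] in supported for node in graph["nodes"])


def serialize(graph):
    """Language-neutral encoding; JSON is used here only to stay stdlib-only."""
    return json.dumps(graph, sort_keys=True)


if __name__ == "__main__":
    graph = build_graph()
    print(runner_understands(graph))
    print(serialize(graph))
```

In this sketch a composite step such as counting per element never appears as a node of its own; it is expanded into primitives before the graph is handed over, which is what lets runners stay unaware of SDK-specific conveniences.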