Sorry
How Bieber broke
Google Cloud at Spotify
Neville Li
@sinisa_lyh
Who am I?
● Spotify NYC since 2011
● Formerly Yahoo! Search
● Music recommendations
● Data & ML infrastructure
● Scala since 2013
About Us
● 100M+ active users
● 40M+ subscribers
● 30M+ songs, 20K new per day
● 2B+ playlists
● 1B+ plays per day
And We Have Data
Hadoop at Spotify
● On-premise → Amazon EMR → On-premise
● ~2,500 nodes, largest in EU
● 100PB+ Disk, 100TB+ RAM
● 60TB+ per day log ingestion
● 20K+ jobs per day
Data Processing
● Luigi, Python M/R, circa 2011
● Scalding, Spark, circa 2013
● 200+ Scala users, 1K+ unique jobs
● Storm for real-time
● Hive for ad-hoc analysis
Moving to Google Cloud
Early 2015
Hive → BigQuery
● Full Avro scans → Columnar storage
● M/R jobs → Optimized execution engine
● Batch → interactive query
● $$$ → Pay for bytes processed
● Beam/Dataflow integration
Dremel Paper, 2010
Top Tracks in Vancouver Jun 2017
● 30 date partitioned tables, 60TB
● 1 metadata table, 418GB
● 94.2s, 4.82TB processed
Track | Artist | Count
Despacito - Remix | Luis Fonsi | 349188
I'm the One | DJ Khaled | 214863
HUMBLE. | Kendrick Lamar | 200690
Unforgettable | French Montana | 192141
XO TOUR Llif3 | Lil Uzi Vert | 148546
Slide | Calvin Harris | 134236
Attention | Charlie Puth | 131514
Strip That Down | Liam Payne | 125191
2U | David Guetta | 124998
Top Bieber Tracks Jun 2017
● 30 date partitioned tables, 60TB
● 1 metadata table, 418GB
● 81.8s, 2.92TB processed
Track | Count
Love Yourself | 28400153
Sorry | 27269157
What Do You Mean? | 20047452
Company | 11794409
Purpose | 8314381
I'll Show You | 6627706
Baby | 6161099
Boyfriend | 5854955
The Feeling | 5415162
As Long As You Love Me | 4738529
Adoption
● ~500 unique users
● ~640K queries per month
● ~500PB queried per month
● Cluster management, multi-tenancy & resource utilization
● Tuning and scaling
● 3 sets of separate APIs
● Not cloud native & missing Google Cloud connectors
What is Beam and
Cloud Dataflow?
The Evolution of Apache Beam
MapReduce, BigTable, Dremel, Colossus,
Flume, Megastore, Spanner, PubSub, Millwheel
↓
Apache Beam
Google Cloud Dataflow
What is Apache Beam?
1. The Beam Programming Model
2. SDKs for writing Beam pipelines -- Java & Python
3. Runners for existing distributed processing backends
○ Apache Flink (thanks to data Artisans)
○ Apache Spark (thanks to Cloudera and PayPal)
○ Google Cloud Dataflow (fully managed service)
○ Local runner for testing
The Beam Model: Asking the Right Questions
What results are calculated?
Where in event time are results calculated?
When in processing time are results materialized?
How do refinements of results relate?
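The four questions map directly onto Scio/Beam primitives. A minimal sketch of answering What and Where (assuming Scio's 0.3-era API; the element values and the one-minute window size are made up for illustration):

```scala
import com.spotify.scio._
import org.joda.time.{Duration, Instant}

val sc = ScioContext()
sc.parallelize(Seq(("sorry", 5L), ("sorry", 42L), ("baby", 70L))) // (track, seconds)
  // Where in event time: assign timestamps, then fixed 1-minute windows
  .timestampBy { case (_, s) => new Instant(s * 1000) }
  .withFixedWindows(Duration.standardMinutes(1))
  // What is calculated: play counts per track, per window
  .keys
  .countByValue
  .saveAsTextFile("counts.txt")
sc.close()
```

When and How come in via triggers and accumulation modes on the window, which Beam exposes for streaming pipelines.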
Customizing What / Where / When / How
1. Classic Batch
2. Windowed Batch
3. Streaming
4. Streaming + Accumulation
Why Beam and
Cloud Dataflow?
● Unified batch and streaming model
● Hosted, fully managed, no ops
● Auto-scaling, dynamic work re-balance
● GCP ecosystem - BigQuery, Bigtable, Datastore, Pubsub
Beam + Cloud Dataflow
No More
PagerDuty!
Why Beam and
Scala?
● High level DSL
● Familiarity with Scalding, Spark and Flink
● Functional programming natural fit for data
● Numerical libraries - Breeze, Algebird
● Macros & shapeless for code generation
Beam and Scala
Scio
A Scala API for Apache Beam and
Google Cloud Dataflow
Scio
Ecclesiastical Latin IPA: /ˈʃi.o/, [ˈʃiː.o], [ˈʃi.i̯o]
Verb: I can, know, understand, have knowledge.
github.com/spotify/scio
Apache Licence 2.0
Cloud Storage, Pub/Sub, Datastore, Bigtable, BigQuery
Batch | Streaming | Interactive | REPL
Scio Scala API
Dataflow Java SDK | Scala Libraries
Extra features
WordCount
val sc = ScioContext()
sc.textFile("shakespeare.txt")
  .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
  .countByValue
  .saveAsTextFile("wordcount.txt")
sc.close()
PageRank
def pageRank(in: SCollection[(String, String)]) = {
  val links = in.groupByKey()
  var ranks = links.mapValues(_ => 1.0)
  for (i <- 1 to 10) {
    val contribs = links.join(ranks).values
      .flatMap { case (urls, rank) =>
        val size = urls.size
        urls.map((_, rank / size))
      }
    ranks = contribs.sumByKey.mapValues((1 - 0.85) + 0.85 * _)
  }
  ranks
}
Why Scio?
Type safe BigQuery
Macro generated case classes, schemas and converters
@BigQuery.fromQuery("SELECT id, name FROM [users] WHERE ...")
class User // look mom no code!
sc.typedBigQuery[User]().map(u => (u.id, u.name))
@BigQuery.toTable
case class Score(id: String, score: Double)
data.map(kv => Score(kv._1, kv._2)).saveAsTypedBigQuery("table")
● Best of both worlds
● BigQuery for slicing and
dicing huge datasets
● Scala for custom logic
● Seamless integration
BigQuery + Scio
REPL
$ scio-repl
Welcome to
                _____
 ________________(_)_____
 __  ___/  ___/_  /_  __ \
 _(__  )/ /__  _  / / /_/ /
 /____/ \___/  /_/  \____/   version 0.3.4
Using Scala version 2.11.11 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.
Using 'scio-test' as your BigQuery project.
BigQuery client available as 'bq'
Scio context available as 'sc'
scio> _
Available in github.com/spotify/homebrew-public
● DAG and source code visualization
● BigQuery caching, legacy SQL & standard SQL (SQL 2011) support
● HDFS, Protobuf, TensorFlow, Bigtable,
Elasticsearch & Cassandra I/O
● Join optimizations - hash, skewed, sparse
● Job metrics
● DistCache, job orchestration
Other goodies
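As a sketch of how one of the join optimizations is used (hash join; the method name is from Scio's API, the pipeline contents are made up): the small side is shipped to every worker as a side input, so the large side never has to be shuffled.

```scala
import com.spotify.scio._

val sc = ScioContext()
val plays  = sc.parallelize(Seq(("t1", 100L), ("t2", 200L), ("t1", 50L))) // large side
val titles = sc.parallelize(Seq(("t1", "Sorry"), ("t2", "Baby")))         // small side
// hashJoin broadcasts `titles` as a side input instead of co-shuffling both
val joined = plays.hashJoin(titles) // SCollection[(String, (Long, String))]
sc.close()
```

skewedJoin and sparse joins follow the same pattern for hot keys and mostly-missing keys respectively.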
Demo Time!
Adoption
● At Spotify
○ 200+ users, 700+ unique production pipelines (up from ~70 ten months ago)
○ Most new to Scala and Scio
○ Both batch and streaming jobs
● Externally
○ ~10 companies, several fairly large ones
Use Cases
Fan Insights
● Listener stats
[artist|track] ×
[context|geography|demography] ×
[day|week|month]
● BigQuery, GCS, Datastore
● TBs daily
● 150+ Java jobs → ~10 Scio jobs
Master Metadata
● n1-standard-1 workers
● 1 core 3.75GB RAM
● Autoscaling 2-35 workers
● 26 Avro sources - artist, album, track, disc, cover art, ...
● 120GB out, 70M records
● 200 LOC vs original Java 600 LOC
And we broke Google
Listening history
● user × track ×
[day|week|month|all time]
● 300B elements
● 800 n1-highmem-32 workers
● 32 core 208GB RAM
● 240TB in - Bigtable
● 90TB out - GCS
BigDiffy
● Pairwise field-level statistical diff
● Diff 2 SCollection[T] given keyFn: T => String
● T: Avro, BigQuery JSON, Protobuf
● Leaf field Δ - numeric, string (Levenshtein), vector (Cosine)
● Δ statistics - min, max, μ, σ, etc.
● Non-deterministic fields
○ ignore field
○ treat "repeated" field (List) as unordered list (Set)
Part of github.com/spotify/ratatool
Dataset Diff
● Diff stats
○ Global: # of SAME, DIFF, MISSING LHS/RHS
○ Key: key → SAME, DIFF, MISSING LHS/RHS
○ Field: field → min, max, μ, σ, etc.
● Use cases
○ Validating pipeline migration
○ Sanity checking ML models
Pairwise field-level deltas
val lKeyed = lhs.keyBy(keyFn)
val rKeyed = rhs.keyBy(keyFn)
val deltas = (lKeyed outerJoin rKeyed).map { case (k, (lOpt, rOpt)) =>
  (lOpt, rOpt) match {
    case (Some(l), Some(r)) =>
      val ds = diffy(l, r) // Seq[Delta]
      val dt = if (ds.isEmpty) SAME else DIFFERENT
      (k, (ds, dt))
    case (_, _) =>
      val dt = if (lOpt.isDefined) MISSING_RHS else MISSING_LHS
      (k, (Nil, dt))
  }
}
Summing deltas
import com.twitter.algebird._

// convert deltas to map of (field → summable stats)
def deltasToMap(ds: Seq[Delta], dt: DeltaType)
  : Map[String,
        (Long, Option[(DeltaType, Min[Double], Max[Double], Moments)])] = {
  // ...
}

deltas
  .map { case (_, (ds, dt)) => deltasToMap(ds, dt) }
  .sum // Semigroup!
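The `.sum` works because per-field stats form a semigroup: they merge associatively, so partial results from any subset of workers combine to the same total. A stdlib-only sketch of that merge for a single field's (count, min, max), with Algebird's types swapped out for a plain tuple:

```scala
// Associative merge of per-record stats -- the property Algebird's
// Semigroup abstracts; (count, min, max) for one field.
def mergeStats(a: (Long, Double, Double),
               b: (Long, Double, Double)): (Long, Double, Double) =
  (a._1 + b._1, math.min(a._2, b._2), math.max(a._3, b._3))

val perRecord = Seq((1L, 0.2, 0.2), (1L, 0.9, 0.9), (1L, 0.5, 0.5))
val total = perRecord.reduce(mergeStats) // (3, 0.2, 0.9)
```

Algebird's Min, Max and Moments bundle exactly this kind of merge, which is what lets the whole diff reduce in one shuffle stage.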
1. Copy & paste from legacy
codebase to Scio
2. Verify with BigDiffy
3. Profit!
Featran
● a.k.a. Featran77 or F77
● Type safe feature transformer for machine learning
● Column-wise aggregation and transformation
● 1 shuffle stage (Algebird aggregation)
● In memory, Flink, Scalding, Scio & Spark runner
● Scala collection, array, Breeze, TensorFlow & NumPy output formats
● Property-based testing, 100% coverage
https://github.com/spotify/featran
Transforming features
case class Record(d1: Double, d2: Option[Double], d3: Double,
s: Seq[String])
val spec = FeatureSpec.of[Record]
.required(_.d1)(Binarizer("bin", threshold = 0.5))
.optional(_.d2)(Bucketizer("bucket", Array(0.0, 10.0, 20.0)))
.required(_.d3)(StandardScaling("std"))
.required(_.d3)(QuantileDiscretizer("q", 4))
.required(_.s)(NHotEncoder("nhot"))
val f = spec.extract(records)
f.featureNames // human readable names
f.featureValues[DenseVector[Double]] // output format
f.featureSettings // settings can be reloaded
shapeless-datatype
● Case class ↔ Avro, BigQuery, Datastore & TensorFlow types
● Compile time and extendable type mapping
● Generic mapper between case class types
● Type and lens based record matcher
https://github.com/nevillelyh/shapeless-datatype
protobuf-generic
● Generic Protobuf manipulation without compiled classes
● Similar to Avro GenericRecord
● Protobuf type T → JSON schema
● Bytes ↔ JSON with JSON schema
● Command line tool to inspect binary files
https://github.com/nevillelyh/protobuf-generic
Other uses
● AB testing
○ Statistical analysis with bootstrap
and DimSum
● Monetization
○ Ads targeting
○ User conversion analysis
● User understanding
○ Diversity
○ Session analysis
○ Behavior analysis
● Home page ranking
● Audio fingerprint analysis
We're developing so
fast that Google's
having a hard time
keeping up with capacity
Google Cloud Capacity
● Almost too easy to write big complex jobs leveraging
BigQuery & Dataflow
● BigQuery export limit 10TB/day/project
● Compute engine datacenter capacity
● Shuffle disk size & throughput
● Bigtable throttling
Scala Challenges
Serialization
● Data ser/de
○ Scalding, Spark and Storm use Kryo and Twitter Chill
○ Dataflow/Beam requires an explicit Coder[T],
sometimes inferable via Guava TypeToken
○ ClassTag to the rescue, fallback to Kryo/Chill
● Lambda ser/de
○ Custom ClosureCleaner, Serializable and @transient lazy val
○ Scala 2.12, Java 8 lambda issues #10232 #10233
REPL
● Spark REPL transports lambda bytecode via HTTP
● Dataflow requires job jar for execution (masterless)
● Custom class loader and ILoop
● Interpreted classes → job jar → job submission
● SCollection[T]#closeAndCollect(): Iterator[T]
to mimic Spark actions
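In the REPL that looks like (a sketch; the values are arbitrary):

```scala
val sc = ScioContext()
val doubled = sc.parallelize(1 to 5).map(_ * 2)
// closeAndCollect() submits the job, waits for it to finish, and
// pulls the results back into the driver -- like Spark's collect()
val xs: Iterator[Int] = doubled.closeAndCollect()
```

Since Dataflow has no long-lived master to stream results from, materializing them after job completion is the closest analogue to a Spark action.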
Macros and IntelliJ IDEA
● IntelliJ IDEA does not see macro expanded classes
https://youtrack.jetbrains.com/issue/SCL-8834
● @BigQueryType.{fromTable, fromQuery}
class MyRecord
● Scio IDEA plugin
https://github.com/spotify/scio-idea-plugin
Make IntelliJ
Intelligent Again!
Scio in Apache Zeppelin
Local Zeppelin server, remote managed Dataflow cluster, NO OPS
The End
Thank You
Neville Li
@sinisa_lyh
