Microservices, Containers,
and Machine Learning
Paco Nathan, @pacoid

Downloads: Java JDK
• follow the license agreement instructions
• then click the download for your OS
• need JDK instead of JRE (for Maven, etc.)
• JDK 6, 7, or 8 is fine

Downloads: Python
For Python 2.7, check out Anaconda by
Continuum Analytics for a full-featured

Downloads: Spark
Let’s get started using Apache Spark, in just a few
easy steps… Download code from:

or for a fallback:

Also, the GitHub project:

Connect into the inflated “spark” directory,
then run:

Spark Deconstructed
Spark Deconstructed: Log Mining Example
// load error messages from a log into memory
// then interactively search for various patterns

// base RDD
val lines = sc.textFile("hdfs://...")

// transformed RDDs
val errors = lines.filter(_.startsWith("ERROR"))
val messages = errors.map(_.split("\t")).map(r => r(1))
messages.cache()

// action 1
messages.filter(_.contains("mysql")).count()

// action 2
messages.filter(_.contains("php")).count()
We start with Spark running on a cluster, submitting the code above to be evaluated on it.
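To make the shape of the log-mining example concrete without a cluster, here is a minimal plain-Python sketch. It is not the Spark API: generators stand in for lazy transformations, a list stands in for the cached RDD, and the sample log lines, the search terms, and the helper name `count_matching` are all hypothetical.

```python
# Plain-Python stand-in for the log mining pipeline (not the Spark API).
# The sample log lines are hypothetical, tab-separated records.
sample_log = [
    "ERROR\tmysql connection refused\t2014-10-01",
    "INFO\tjob started\t2014-10-01",
    "ERROR\tphp fatal error\t2014-10-01",
    "ERROR\tmysql timeout\t2014-10-02",
]

# "transformations": generators are lazy, so nothing is evaluated yet
errors = (line for line in sample_log if line.startswith("ERROR"))
fields = (line.split("\t") for line in errors)
messages_lazy = (r[1] for r in fields)

# "cache": materialize once so that several actions can reuse the result
messages = list(messages_lazy)


def count_matching(msgs, term):
    """An "action": count the cached messages that contain `term`."""
    return sum(1 for m in msgs if term in m)


mysql_count = count_matching(messages, "mysql")
php_count = count_matching(messages, "php")
print(mysql_count, php_count)
```

The point of materializing `messages` once is the same as caching in the example: the second search reuses the filtered data instead of rereading the log.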
At this point, take a look at the transformed
RDD operator graph:

scala> messages.toDebugString
res5: String =
MappedRDD[4] at map at <console>:16 (3 partitions)
MappedRDD[3] at map at <console>:16 (3 partitions)
FilteredRDD[2] at filter at <console>:14 (3 partitions)
MappedRDD[1] at textFile at <console>:12 (3 partitions)
HadoopRDD[0] at textFile at <console>:12 (3 partitions)
[Diagram sequence: the cluster holds the input as three blocks (block 1, block 2, block 3). While the first action runs, the data is cached alongside each block (cache 1, cache 2, cache 3); the second action is then served from cache.]

Key Points:	

• graph-parallel systems	

• importance of workflows	

• optimizations
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs

J. Gonzalez, Y. Low, H. Gu, D. Bickson, C. Guestrin

Pregel: Large-scale graph computing at Google

Grzegorz Czajkowski, et al.

GraphX: Unified Graph Analytics on Spark

Ankur Dave, Databricks

Advanced Exercises: GraphX
GraphX: demo
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

case class Peep(name: String, age: Int)

// vertices: (id, attribute)
val nodeArray = Array(
  (1L, Peep("Kim", 23)), (2L, Peep("Pat", 31)),
  (3L, Peep("Chris", 52)), (4L, Peep("Kelly", 39)),
  (5L, Peep("Leslie", 45))
)

// edges: (source id, destination id, attribute)
val edgeArray = Array(
  Edge(2L, 1L, 7), Edge(2L, 4L, 2),
  Edge(3L, 2L, 4), Edge(3L, 5L, 3),
  Edge(4L, 1L, 1), Edge(5L, 3L, 9)
)

val nodeRDD: RDD[(Long, Peep)] = sc.parallelize(nodeArray)
val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)
val g: Graph[Peep, Int] = Graph(nodeRDD, edgeRDD)

// keep only the triplets whose edge attribute exceeds 7
val results = g.triplets.filter(t => t.attr > 7)

for (triplet <- results.collect) {
  println(s"${triplet.srcAttr.name} loves ${triplet.dstAttr.name}")
}
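For readers without a Spark shell at hand, the same triplet filter can be sketched in plain Python. This is only an illustration of what a triplet is (source attribute, edge attribute, destination attribute), not the GraphX API; the `triplets` helper and the namedtuple layout are assumptions made for the sketch, while the node and edge data are copied from the demo.

```python
from collections import namedtuple

Peep = namedtuple("Peep", ["name", "age"])
Edge = namedtuple("Edge", ["src", "dst", "attr"])

# same vertices and edges as the GraphX demo
nodes = {
    1: Peep("Kim", 23), 2: Peep("Pat", 31), 3: Peep("Chris", 52),
    4: Peep("Kelly", 39), 5: Peep("Leslie", 45),
}
edges = [
    Edge(2, 1, 7), Edge(2, 4, 2),
    Edge(3, 2, 4), Edge(3, 5, 3),
    Edge(4, 1, 1), Edge(5, 3, 9),
]


def triplets(nodes, edges):
    """Yield (source attribute, edge attribute, destination attribute)."""
    for e in edges:
        yield nodes[e.src], e.attr, nodes[e.dst]


# keep only the triplets whose edge attribute exceeds 7
results = [(s.name, d.name) for s, attr, d in triplets(nodes, edges) if attr > 7]

for src, dst in results:
    print("{} loves {}".format(src, dst))
```

Only the edge from vertex 5 to vertex 3 has an attribute above 7, so the loop prints a single line for Leslie and Chris.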
TextRank Demo:

IPYTHON_OPTS="notebook --pylab inline" ./bin/pyspark
Typical Workflows:
[Diagram, circa 2010; surviving labels: representation, optimization, evaluation; ETL into; train set; test set; data pipelines; actionable results; decisions, feedback; bar developers; foo algorithms]
Workflows: Scraper pipeline
Typical data rates, e.g., for	

• ~2K msgs/month	

• ~6 MB as JSON	

• ~13 MB parsed	

Three months’ list activity represents a graph of:	

• 1061 senders	

• 753,400 nodes	

• 1,027,806 edges	

A big graph! However, it satisfies the definition of a graph-parallel system, with lots of data locality to leverage.
Workflows: A Few Notes about Microservices and Containers
The Strengths and Weaknesses of Microservices

Abel Avram	

DockerCon EU Keynote: State of the Art in Microservices

Adrian Cockcroft

Microservices Architecture

Martin Fowler
Workflows: An Example…
Python-based service in a Docker container?	

Just Enough Math, IPython+Docker

Paco Nathan, Andrew Odewahn, Kyle Kelly	

Docker Jumpstart

Andrew Odewahn
Workflows: A Brief Note about ETL in SparkSQL
Spark SQL Data Sources API: Unified Data Access for
the Spark Platform

Michael Armbrust
This Workflow: Microservices meet Parallel Processing
[Diagram; surviving labels: archives, community, Data Prep, Scraper, data, Unique Word IDs]
not so big data… relatively big compute…
Workflows: Scraper pipeline
[Diagram; surviving labels: email list, monthly list, by date]
"date": "2014-10-01T00:16:08+00:00",
"id": "CA+B-+fyrBU1yGZAYJM_u=gnBVtzB=sXoBHkhmS-6L1n8K5Hhbw",
"next_thread": "CALEj8eP5hpQDM=p2xryL-JT-x_VhkRcD59Q+9Qr9LJ9sYLeLVg",
"next_url": "
"prev_thread": "",
"sender": "Debasish Das <>",
"subject": "Re: memory vs data_size",
"text": "\nOnly fit the data in memory where you want to run the iterative\nalgorithm....\n
Workflows: Parser pipeline
[Diagram; surviving labels: "tag and", "JSON", "Treebank,"]
"graf": [ [1, "Only", "only", "RB", 1, 0], [2, "fit", "fit", "VBP", 1, 1 ] ... ],
"id": "CA+B-+fyrBU1yGZAYJM_u=gnBVtzB=sXoBHkhmS-6L1n8K5Hhbw",
"polr": 0.2,
"sha1": "178b7a57ec6168f20a8a4f705fb8b0b04e59eeb7",
"size": 14,
"subj": 0.7,
"tile": [ [1, 2], [2, 3], [3, 4] ... ]

"date": "2014-10-01T00:16:08+00:00",
"id": "CA+B-+fyrBU1yGZAYJM_u=gnBVtzB=sXoBHkhmS-6L1n8K5Hhbw",
"next_thread": "CALEj8eP5hpQDM=p2xryL-JT-x_VhkRcD59Q+9Qr9LJ9sYLeLVg",
"next_url": "
"prev_thread": "",
"sender": "Debasish Das <>",
"subject": "Re: memory vs data_size",
"text": "\nOnly fit the data in memory where you want to run the iterative\nalgorithm....\n\nFor
Workflows: TextRank pipeline
[Diagram; surviving label: word graph]
"Compatibility of systems of linear constraints"
[{'index': 0, 'stem': 'compat', 'tag': 'NNP','word': 'compatibility'},
{'index': 1, 'stem': 'of', 'tag': 'IN', 'word': 'of'},
{'index': 2, 'stem': 'system', 'tag': 'NNS', 'word': 'systems'},
{'index': 3, 'stem': 'of', 'tag': 'IN', 'word': 'of'},
{'index': 4, 'stem': 'linear', 'tag': 'JJ', 'word': 'linear'},
{'index': 5, 'stem': 'constraint', 'tag': 'NNS','word': 'constraints'}]
TextRank: Bringing Order into Texts

Rada Mihalcea, Paul Tarau
Workflows: TextRank – how it works
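As a rough sketch of the idea, TextRank links words that occur near each other and then runs PageRank over that word graph. The plain-Python version below reuses the tokens from the "Compatibility of systems of linear constraints" slide. Everything else is an assumption made for illustration: the noun/adjective filter, the co-occurrence window of 2 applied to the filtered stems, the damping factor, and the helper names `build_word_graph` and `pagerank`. It is not the GraphX implementation used later in the deck.

```python
# Tokens copied from the "Compatibility of systems of linear constraints" slide.
tokens = [
    {"index": 0, "stem": "compat", "tag": "NNP", "word": "compatibility"},
    {"index": 1, "stem": "of", "tag": "IN", "word": "of"},
    {"index": 2, "stem": "system", "tag": "NNS", "word": "systems"},
    {"index": 3, "stem": "of", "tag": "IN", "word": "of"},
    {"index": 4, "stem": "linear", "tag": "JJ", "word": "linear"},
    {"index": 5, "stem": "constraint", "tag": "NNS", "word": "constraints"},
]

# Assumption: only nouns (NN*) and adjectives (JJ*) become graph nodes.
stems = [t["stem"] for t in tokens if t["tag"][:2] in ("NN", "JJ")]


def build_word_graph(stems, window=2):
    """Link each stem to the stems that follow it within `window` positions."""
    graph = {s: set() for s in stems}
    for i, s in enumerate(stems):
        for t in stems[i + 1 : i + 1 + window]:
            if s != t:
                graph[s].add(t)
                graph[t].add(s)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration over an undirected graph given as {node: neighbors}."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        new_rank = {}
        for v in graph:
            # each neighbor shares its rank equally among its own neighbors
            incoming = sum(rank[u] / len(graph[u]) for u in graph[v])
            new_rank[v] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


graph = build_word_graph(stems)
ranks = pagerank(graph)

for stem, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(stem, round(score, 3))
```

In this tiny graph the two middle stems have the most neighbors, so they end up with the highest rank; on real messages the ranks are what separates keyphrase candidates from filler words.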
TextRank impl
TextRank impl: load parquet files
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val sqlCtx = new org.apache.spark.sql.SQLContext(sc)
import sqlCtx._

val edge = sqlCtx.parquetFile("graf_edge.parquet")
val node = sqlCtx.parquetFile("graf_node.parquet")

// pick one message as an example; at scale we'd parallelize
val msg_id = "CA+B-+fyrBU1yGZAYJM_u=gnBVtzB=sXoBHkhmS-6L1n8K5Hhbw"
TextRank impl: use SparkSQL to collect node list + edge list
// both queries bind `sql`; redefining a val is fine in the spark-shell REPL
val sql = """
SELECT node_id, root 
FROM node 
WHERE id='%s' AND keep='1'
""".format(msg_id)

val n = sqlCtx.sql(sql.stripMargin).distinct()
val nodes: RDD[(Long, String)] = n.map{ p =>
  (p(0).asInstanceOf[Int].toLong, p(1).asInstanceOf[String])
}

val sql = """
SELECT node0, node1 
FROM edge 
WHERE id='%s'
""".format(msg_id)

val e = sqlCtx.sql(sql.stripMargin).distinct()
val edges: RDD[Edge[Int]] = e.map{ p =>
  Edge(p(0).asInstanceOf[Int].toLong, p(1).asInstanceOf[Int].toLong, 0)
}
TextRank impl: use GraphX to run PageRank
// run PageRank
val g: Graph[String, Int] = Graph(nodes, edges)
val r = g.pageRank(0.0001).vertices

r.join(nodes).sortBy(_._2._1, ascending=false).foreach(println)

// save the ranks
case class Rank(id: Int, rank: Float)
val rank = r.map(p => Rank(p._1.toInt, p._2.toFloat))

// median of a sequence: mean of the two middle values when the size is even
def median[T](s: Seq[T])(implicit n: Fractional[T]) = {
  import n._
  val (lower, upper) = s.sortWith(_<_).splitAt(s.size / 2)
  if (s.size % 2 == 0) (lower.last + upper.head) / fromInt(2) else upper.head
}

val min_rank = median(
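The median helper is what turns the PageRank scores into a cutoff: words ranked below the median are treated as breaks between keyphrases. A minimal Python rendering of the same split-in-the-middle logic is sketched below; the list of sample ranks is hypothetical, and like the Scala version it assumes a non-empty input.

```python
def median(values):
    """Median via sort and split, mirroring the Scala helper on the slide."""
    s = sorted(values)
    mid = len(s) // 2
    lower, upper = s[:mid], s[mid:]
    if len(s) % 2 == 0:
        # even size: mean of the two middle values
        return (lower[-1] + upper[0]) / 2.0
    # odd size: the middle value is the first element of the upper half
    return upper[0]


# hypothetical PageRank scores for the words of one message
sample_ranks = [0.25, 1.5, 0.75, 0.5]
min_rank = median(sample_ranks)
print(min_rank)
```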
TextRank impl: join ranked words with parsed text
var span:List[String] = List()
var last_index = -1
var rank_sum = 0.0

var phrases:collection.mutable.Map[String, Double] = collection.mutable.Map()

val sql = """
SELECT n.num, n.raw, r.rank
FROM node n JOIN rank r ON n.node_id = r.id 
WHERE n.id='%s' AND n.keep='1'
ORDER BY n.num
""".format(msg_id)

val s = sqlCtx.sql(sql.stripMargin).collect()
TextRank impl: “pull strings” for the top-ranked keyphrases
s.foreach { x =>
  //println (x)
  val index = x.getInt(0)
  val word = x.getString(1)
  val rank = x.getFloat(2)
  var isStop = false

  // test for break from past
  if (span.size > 0 && rank < min_rank) isStop = true
  if (span.size > 0 && (index - last_index > 1)) isStop = true

  // clear accumulation
  if (isStop) {
    val phrase = span.mkString(" ")
    phrases += (phrase -> rank_sum)
    span = List()
    last_index = index
    rank_sum = 0.0
  }

  // start or append
  if (rank >= min_rank) {
    span = span :+ word
    last_index = index
    rank_sum += rank
  }
}
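The loop above walks the ranked words in text order and glues adjacent, sufficiently ranked words into phrases, summing their ranks. A plain-Python sketch of the same accumulation follows. The function name `pull_phrases`, the sample rows, and the ranks are hypothetical, and the sketch adds one step that is not on the slide: it flushes the span still open when the input ends.

```python
def pull_phrases(rows, min_rank):
    """rows: (index, word, rank) tuples ordered by index."""
    phrases = {}
    span, last_index, rank_sum = [], -1, 0.0
    for index, word, rank in rows:
        # break the current span on a low-ranked word or a gap in positions
        is_stop = bool(span) and (rank < min_rank or index - last_index > 1)
        if is_stop:
            phrases[" ".join(span)] = rank_sum
            span, last_index, rank_sum = [], index, 0.0
        # start a new span or extend the current one
        if rank >= min_rank:
            span.append(word)
            last_index = index
            rank_sum += rank
    if span:
        # addition for this sketch: keep the span still open at the end
        phrases[" ".join(span)] = rank_sum
    return phrases


# hypothetical ranked words; positions 1 and 3 ("of") were filtered out earlier
rows = [
    (0, "compatibility", 0.25),
    (2, "systems", 0.5),
    (4, "linear", 0.5),
    (5, "constraints", 0.25),
]
phrases = pull_phrases(rows, min_rank=0.25)

for phrase, score in sorted(phrases.items(), key=lambda kv: kv[1], reverse=True):
    print(phrase, score)
```

Because "linear" and "constraints" sit at adjacent positions they merge into one phrase whose score is the sum of both ranks, which is why multi-word keyphrases tend to rise to the top of the summary.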
TextRank impl: report the top keyphrases
// summarize the text as a list of ranked keyphrases
val summary = sc.parallelize(phrases.toSeq)
  .sortBy(_._2, ascending=false)
Reply Graph
Reply Graph: load parquet files
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val sqlCtx = new org.apache.spark.sql.SQLContext(sc)
import sqlCtx._

val edge = sqlCtx.parquetFile("reply_edge.parquet")
val node = sqlCtx.parquetFile("reply_node.parquet")
Reply Graph: use SparkSQL to collect node list + edge list
// both queries bind `sql`; redefining a val is fine in the spark-shell REPL
val sql = "SELECT id, sender FROM node"
val n = sqlCtx.sql(sql).distinct()
val nodes: RDD[(Long, String)] = n.map{ p =>
  (p(0).asInstanceOf[Long], p(1).asInstanceOf[String])
}

val sql = "SELECT replier, sender, num FROM edge"
val e = sqlCtx.sql(sql).distinct()
val edges: RDD[Edge[Int]] = e.map{ p =>
  Edge(p(0).asInstanceOf[Long], p(1).asInstanceOf[Long], p(2).asInstanceOf[Int])
}
Reply Graph: use GraphX to run graph analytics
// run graph analytics
val g: Graph[String, Int] = Graph(nodes, edges)
val r = g.pageRank(0.0001).vertices

r.join(nodes).sortBy(_._2._1, ascending=false).foreach(println)

// define a reduce operation to compute the highest degree vertex
def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
  if (a._2 > b._2) a else b
}

// compute the max degrees
val maxInDegree: (VertexId, Int) = g.inDegrees.reduce(max)
val maxOutDegree: (VertexId, Int) = g.outDegrees.reduce(max)
val maxDegrees: (VertexId, Int) = g.degrees.reduce(max)

// strongly connected components
val scc = g.stronglyConnectedComponents(10).vertices
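The degree computation can be checked by hand on a small graph. The sketch below is plain Python rather than GraphX, and the edge list is a hypothetical set of (replier, sender) pairs. It counts in-degrees, out-degrees, and total degrees, then reduces each with the same "keep the larger count" rule as the `max` function.

```python
from collections import Counter
from functools import reduce

# hypothetical reply edges as (replier, sender) pairs
edges = [(1, 2), (3, 2), (4, 2), (1, 3), (1, 4), (3, 1)]

out_degrees = Counter(src for src, _ in edges)
in_degrees = Counter(dst for _, dst in edges)
degrees = in_degrees + out_degrees


def max_degree(a, b):
    """Reduce step: keep the (vertex, degree) pair with the larger degree."""
    return a if a[1] > b[1] else b


max_in = reduce(max_degree, in_degrees.items())
max_out = reduce(max_degree, out_degrees.items())
max_total = reduce(max_degree, degrees.items())

print(max_in, max_out, max_total)
```

Note that on a tie this reduce keeps whichever pair it sees later, so with tied degrees the winning vertex depends on iteration order; the sample edges are chosen so that each maximum is unique.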
Reply Graph: PageRank of top dev@spark email, 4Q2014
(389,(22.690229478710016,Sean Owen <>))
(857,(20.832469059298248,Akhil Das <>))
(652,(13.281821379806798,Michael Armbrust <>))
(101,(9.963167550803664,Tobias Pfeiffer <>))
(471,(9.614436778460558,Steve Lewis <>))
(931,(8.217073486575732,shahab <>))
(48,(7.653814912512137,ll <>))
(1011,(7.602002681952157,Ashic Mahtab <>))
(1055,(7.572376489758199,Cheng Lian <>))
(122,(6.87247388819558,Gerard Maas <>))
(904,(6.252657820614504,Xiangrui Meng <>))
(827,(6.0941062762076115,Jianshi Huang <>))
(887,(5.835053915864531,Davies Liu <>))
(303,(5.724235650446037,Ted Yu <>))
(206,(5.430238461114108,Deep Pradhan <>))
(483,(5.332452537151523,Akshat Aranya <>))
(185,(5.259438927615685,SK <>))
(636,(5.235941228955769,Matei Zaharia <matei.zaha…>))

// seaaaaaaaaaan!
maxInDegree: (org.apache.spark.graphx.VertexId, Int) = (389,126)
maxOutDegree: (org.apache.spark.graphx.VertexId, Int) = (389,170)
maxDegrees: (org.apache.spark.graphx.VertexId, Int) = (389,296)
Reply Graph: What SSSP looks like in GraphX/Pregel
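To give a feel for single-source shortest paths (SSSP) in the Pregel style, here is a minimal plain-Python sketch of the vertex-centric loop: active vertices send candidate distances along their out-edges, each receiver merges its messages with `min`, and only vertices whose distance improved stay active. It is not the GraphX Pregel API; the function name `pregel_sssp` and the weighted edge list are hypothetical, and the sketch assumes non-negative edge weights so that the loop terminates.

```python
import math


def pregel_sssp(edges, source):
    """edges: (src, dst, weight) tuples; returns shortest distances from source."""
    vertices = {v for s, d, _ in edges for v in (s, d)} | {source}
    dist = {v: math.inf for v in vertices}
    dist[source] = 0.0
    active = {source}

    while active:  # one pass of this loop is one superstep
        # send: every active vertex offers dist + weight to its out-neighbors
        inbox = {}
        for s, d, w in edges:
            if s in active:
                candidate = dist[s] + w
                # merge: keep only the smallest message per receiver
                inbox[d] = min(inbox.get(d, math.inf), candidate)

        # vertex program: adopt a shorter distance and stay active if improved
        active = set()
        for v, msg in inbox.items():
            if msg < dist[v]:
                dist[v] = msg
                active.add(v)
    return dist


# hypothetical weighted edges
edges = [(1, 2, 7.0), (1, 3, 2.0), (3, 2, 3.0), (2, 4, 1.0)]
distances = pregel_sssp(edges, source=1)
print(distances)
```

Vertex 2 is first reached directly at cost 7 and then improved to 5 through vertex 3 one superstep later, which is the kind of incremental refinement the message-passing model is built around.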
Look Ahead: Where is this heading?
Feature learning with Word2Vec

Matt Krzus
by topic
better than
features… models… insights…
Apache Spark developer certificate program
• defined by Spark experts @Databricks
• assessed by O’Reilly Media
• establishes the bar for Spark expertise
Anthony Joseph

UC Berkeley	

begins 2015-02-23
Ameet Talwalkar


begins 2015-04-14
events worldwide:
video+preso archives:
Strata CA

San Jose, Feb 18-20
Spark Summit East

NYC, Mar 18-19
Big Data Tech Con

Boston, Apr 26-28
Strata EU

London, May 5-7
Spark Summit 2015

SF, Jun 15-17
Fast Data Processing 

with Spark

Holden Karau

Packt (2013)
Spark in Action

Chris Fregly

Manning (2015*)
Learning Spark

Holden Karau, 

Andy Konwinski,
Matei Zaharia

O’Reilly (2015*)
Just Enough Math
O’Reilly, 2014

monthly newsletter for updates, 

events, conf summaries, etc.:
Enterprise Data Workflows
with Cascading
O’Reilly, 2013

More Related Content

What's hot

A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016 A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architecture
Stepan Pushkarev
Scaling Up AI Research to Production with PyTorch and MLFlow
Scaling Up AI Research to Production with PyTorch and MLFlowScaling Up AI Research to Production with PyTorch and MLFlow
Scaling Up AI Research to Production with PyTorch and MLFlow
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Alex Zeltov
Composable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and WeldComposable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and Weld
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFramesApache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
SparkApplicationDevMadeEasy_Spark_Summit_2015Lance Co Ting Keh
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversitySpark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
Alex Zeltov
Data Science meets Software Development
Data Science meets Software DevelopmentData Science meets Software Development
Data Science meets Software Development
Alexis Seigneurin
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Helena Edelson
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
Apache spark-the-definitive-guide-excerpts-r1
Apache spark-the-definitive-guide-excerpts-r1Apache spark-the-definitive-guide-excerpts-r1
Apache spark-the-definitive-guide-excerpts-r1
Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
Spark streaming
Spark streamingSpark streaming
Spark streaming
Noam Shaish
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Robert "Chip" Senkbeil
Writing Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySparkWriting Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySpark

What's hot (20)

A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016 A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architecture
Scaling Up AI Research to Production with PyTorch and MLFlow
Scaling Up AI Research to Production with PyTorch and MLFlowScaling Up AI Research to Production with PyTorch and MLFlow
Scaling Up AI Research to Production with PyTorch and MLFlow
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Composable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and WeldComposable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and Weld
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFramesApache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversitySpark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
Data Science meets Software Development
Data Science meets Software DevelopmentData Science meets Software Development
Data Science meets Software Development
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Apache spark-the-definitive-guide-excerpts-r1
Apache spark-the-definitive-guide-excerpts-r1Apache spark-the-definitive-guide-excerpts-r1
Apache spark-the-definitive-guide-excerpts-r1
Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
Spark streaming
Spark streamingSpark streaming
Spark streaming
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Writing Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySparkWriting Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySpark

Similar to Microservices, Containers, and Machine Learning

#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos
Paco Nathan
Intro to apache spark stand ford
Intro to apache spark stand fordIntro to apache spark stand ford
Intro to apache spark stand ford
Thu Hiền
Brief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICMEBrief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICME
Paco Nathan
What's new with Apache Spark?
What's new with Apache Spark?What's new with Apache Spark?
What's new with Apache Spark?
Paco Nathan
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsSparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
Li Ming Tsai
Advanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xinAdvanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xin
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
BigDataEverywhere's talk about "Introduction to spark"'s talk about "Introduction to spark"'s talk about "Introduction to spark"'s talk about "Introduction to spark"
Giivee The
How Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapeHow Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscape
Paco Nathan
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Olalekan Fuad Elesin
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
Paco Nathan
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
IT Event
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
Benjamin Bengfort
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
Wisely chen
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
Vienna Data Science Group
Stanford CS347 Guest Lecture: Apache Spark
Stanford CS347 Guest Lecture: Apache SparkStanford CS347 Guest Lecture: Apache Spark
Stanford CS347 Guest Lecture: Apache Spark
Reynold Xin
Spark 101
Spark 101Spark 101
Spark 101
Mohit Garg
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
DataWorks Summit
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosApache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Euangelos Linardos

Similar to Microservices, Containers, and Machine Learning (20)

#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos
Intro to apache spark stand ford
Intro to apache spark stand fordIntro to apache spark stand ford
Intro to apache spark stand ford
Brief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICMEBrief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICME
What's new with Apache Spark?
What's new with Apache Spark?What's new with Apache Spark?
What's new with Apache Spark?
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsSparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
Advanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xinAdvanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xin
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...'s talk about "Introduction to spark"'s talk about "Introduction to spark"'s talk about "Introduction to spark"'s talk about "Introduction to spark"
How Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapeHow Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscape
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
Stanford CS347 Guest Lecture: Apache Spark
Stanford CS347 Guest Lecture: Apache SparkStanford CS347 Guest Lecture: Apache Spark
Stanford CS347 Guest Lecture: Apache Spark
Spark 101
Spark 101Spark 101
Spark 101
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosApache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos Linardos

Microservices, Containers, and Machine Learning

  • 1. Microservices, Containers, and Machine Learning Paco Nathan, @pacoid
  • 3. Downloads: Java JDK
      jdk7-downloads-1880260.html
      • follow the license agreement instructions
      • then click the download for your OS
      • need JDK instead of JRE (for Maven, etc.)
      • JDK 6, 7, or 8 is fine
  • 4. Downloads: Python
      For Python 2.7, check out Anaconda by Continuum Analytics for a full-featured platform.
  • 5. Downloads: Spark
      Let’s get started using Apache Spark, in just a few easy steps…
      Download code from: or for a fallback:
      Also, the GitHub project:
  • 6. Downloads: Spark
      Change into the unpacked “spark” directory, then run:

          ./bin/spark-shell
  • 8. Spark Deconstructed: Log Mining Example

          // load error messages from a log into memory
          // then interactively search for various patterns

          // base RDD
          val lines = sc.textFile("hdfs://...")

          // transformed RDDs
          val errors = lines.filter(_.startsWith("ERROR"))
          val messages = errors.map(_.split("\t")).map(r => r(1))
          messages.cache()

          // action 1
          messages.filter(_.contains("mysql")).count()

          // action 2
          messages.filter(_.contains("php")).count()
  • 9. Spark Deconstructed: Log Mining Example
      We start with Spark running on a cluster… submitting code to be evaluated on it:
      [diagram: one Driver and three Workers]
  • 10. Spark Deconstructed: Log Mining Example
      [same code as slide 8, with the callout “discussing the other part”]
  • 11. Spark Deconstructed: Log Mining Example
      At this point, take a look at the transformed RDD operator graph:

          scala> messages.toDebugString
          res5: String =
          MappedRDD[4] at map at <console>:16 (3 partitions)
            MappedRDD[3] at map at <console>:16 (3 partitions)
              FilteredRDD[2] at filter at <console>:14 (3 partitions)
                MappedRDD[1] at textFile at <console>:12 (3 partitions)
                  HadoopRDD[0] at textFile at <console>:12 (3 partitions)
  • 12. Spark Deconstructed: Log Mining Example
      [same code as slide 8; diagram: one Driver and three Workers]
  • 13. Spark Deconstructed: Log Mining Example
      [same code; diagram: each Worker holds one HDFS block: block 1, block 2, block 3]
  • 14. Spark Deconstructed: Log Mining Example
      [same code; diagram as on slide 13]
  • 15. Spark Deconstructed: Log Mining Example
      [same code; diagram: each Worker is labeled “read HDFS block”]
  • 16. Spark Deconstructed: Log Mining Example
      [same code; diagram: each Worker is labeled “process, cache data” and now holds a cache: cache 1, cache 2, cache 3]
  • 17. Spark Deconstructed: Log Mining Example
      [same code; diagram: each Worker holds its block and its cache]
  • 18. Spark Deconstructed: Log Mining Example
      [same code; diagram as on slide 17]
  • 19. Spark Deconstructed: Log Mining Example
      [same code; diagram: each Worker is labeled “process from cache”]
  • 20. Spark Deconstructed: Log Mining Example
      [same code; diagram as on slide 17]
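
The walkthrough on slides 8 to 20 can be mimicked on one machine. The following is a minimal plain-Python sketch, not Spark code: the sample log lines are invented, generator expressions stand in for lazy transformations, and a list stands in for the cached RDD. One difference worth noting is that Spark's `cache()` only marks the RDD and fills the cache during the first action, whereas the list here is built immediately.

```python
# Plain-Python stand-in for the Spark job; no cluster or HDFS involved.
# The sample log lines are invented for illustration.
log_lines = [
    "ERROR\tmysql connection refused",
    "INFO\tservice started",
    "ERROR\tphp fatal error in index.php",
    "ERROR\tmysql too many connections",
]

# "transformed RDDs": generator expressions are lazy, like transformations
errors = (line for line in log_lines if line.startswith("ERROR"))
fields = (line.split("\t") for line in errors)

# "cache": materialize the second tab-separated field once, so that both
# counts below reuse it instead of re-reading the log
messages = [r[1] for r in fields]

# "actions": each count scans the cached messages
mysql_count = sum(1 for m in messages if "mysql" in m)
php_count = sum(1 for m in messages if "php" in m)

print(mysql_count, php_count)
```

The second count never touches `log_lines` again, which is the point of the “process from cache” step on slide 19.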
  • 23. GraphX
      PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
        J. Gonzalez, Y. Low, H. Gu, D. Bickson, C. Guestrin
        bickson-guestrin.pdf
      Pregel: Large-scale graph computing at Google
        Grzegorz Czajkowski, et al.
        graph-computing-at-google.html
      GraphX: Unified Graph Analytics on Spark
        Ankur Dave, Databricks
        graphx@sparksummit_2014-07.pdf
      Advanced Exercises: GraphX
        analytics-with-graphx.html
  • 24. GraphX: demo

          import org.apache.spark.graphx._
          import org.apache.spark.rdd.RDD

          case class Peep(name: String, age: Int)

          val nodeArray = Array(
            (1L, Peep("Kim", 23)), (2L, Peep("Pat", 31)),
            (3L, Peep("Chris", 52)), (4L, Peep("Kelly", 39)),
            (5L, Peep("Leslie", 45))
          )
          val edgeArray = Array(
            Edge(2L, 1L, 7), Edge(2L, 4L, 2),
            Edge(3L, 2L, 4), Edge(3L, 5L, 3),
            Edge(4L, 1L, 1), Edge(5L, 3L, 9)
          )

          val nodeRDD: RDD[(Long, Peep)] = sc.parallelize(nodeArray)
          val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)
          val g: Graph[Peep, Int] = Graph(nodeRDD, edgeRDD)

          // keep only the triplets whose edge attribute exceeds 7
          val results = g.triplets.filter(t => t.attr > 7)

          for (triplet <- results.collect) {
            println(s"${triplet.srcAttr.name} loves ${triplet.dstAttr.name}")
          }
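
To see what the triplet filter in the demo selects, here is a minimal plain-Python sketch of the same data. It is an illustration rather than the GraphX API: `triplets` is a hypothetical helper that joins each edge with its source and destination vertex attributes, which is the view that GraphX triplets expose.

```python
from collections import namedtuple

Peep = namedtuple("Peep", ["name", "age"])

nodes = {
    1: Peep("Kim", 23), 2: Peep("Pat", 31), 3: Peep("Chris", 52),
    4: Peep("Kelly", 39), 5: Peep("Leslie", 45),
}
# (source id, destination id, edge attribute), as on the slide
edges = [(2, 1, 7), (2, 4, 2), (3, 2, 4), (3, 5, 3), (4, 1, 1), (5, 3, 9)]


def triplets(nodes, edges):
    """Yield (source vertex, edge attribute, destination vertex) views."""
    for src, dst, attr in edges:
        yield nodes[src], attr, nodes[dst]


# same predicate as the slide: edge attribute strictly greater than 7
results = [(s, a, d) for s, a, d in triplets(nodes, edges) if a > 7]
lines = ["%s loves %s" % (s.name, d.name) for s, _, d in results]
for line in lines:
    print(line)
```

Only the edge from vertex 5 to vertex 3 has an attribute above 7, so a single line is printed.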
  • 27. Typical Workflows
      [workflow diagram, circa 2010, grouped under representation, optimization, evaluation; labels: ETL into cluster/cloud; data; visualize, reporting; Data Prep; Features; Learners, Parameters; Unsupervised Learning; Explore; train set; test set; models; Evaluate; Optimize; Scoring; production data; use cases; data pipelines; actionable results; decisions, feedback; developers; algorithms]
  • 28. Workflows: Scraper pipeline
      Typical data rates, e.g., for one mailing list:
      • ~2K msgs/month
      • ~6 MB as JSON
      • ~13 MB parsed
      Three months’ list activity represents a graph of:
      • 1061 senders
      • 753,400 nodes
      • 1,027,806 edges
      A big graph! However, it satisfies the definition of a graph-parallel system, with lots of data locality to leverage.
  • 29. Workflows: A Few Notes about Microservices and Containers
      The Strengths and Weaknesses of Microservices
        Abel Avram
      DockerCon EU Keynote: State of the Art in Microservices
        Adrian Cockcroft
        europe-keynote-state-of-the-art-in-microservices-by-adrian-cockcroft-battery-ventures/
      Microservices Architecture
        Martin Fowler
  • 30. Workflows: An Example…
      Python-based service in a Docker container?
      Just Enough Math, IPython+Docker
        Paco Nathan, Andrew Odewahn, Kyle Kelly
      Docker Jumpstart
        Andrew Odewahn
  • 31. Workflows: A Brief Note about ETL in SparkSQL
      Spark SQL Data Sources API: Unified Data Access for the Spark Platform
        Michael Armbrust
        data-sources-api-unified-data-access-for-the-spark-platform.html
  • 32. This Workflow: Microservices meet Parallel Processing
      not so big data… relatively big compute…
      [diagram labels: services; email archives; community leaderboards; SparkSQL; Data Prep; Features; Explore; Scraper / Parser; NLTK data; Unique Word IDs; TextRank, Word2Vec, etc.; community insights]
  • 33. Workflows: Scraper pipeline
      [diagram labels: Apache email list archive; urllib2: crawl monthly list by date; Py: filter quoted content; Py: segment paragraphs; message JSON]
  • 34. Workflows: Scraper pipeline
      [diagram labels: Apache email list archive; urllib2: crawl monthly list by date; Py: filter quoted content; Py: segment paragraphs; message JSON]

          {
            "date": "2014-10-01T00:16:08+00:00",
            "id": "CA+B-+fyrBU1yGZAYJM_u=gnBVtzB=sXoBHkhmS-6L1n8K5Hhbw",
            "next_thread": "CALEj8eP5hpQDM=p2xryL-JT-x_VhkRcD59Q+9Qr9LJ9sYLeLVg",
            "next_url": "…",
            "prev_thread": "",
            "sender": "Debasish Das <>",
            "subject": "Re: memory vs data_size",
            "text": "\nOnly fit the data in memory where you want to run the iterative\nalgorithm....\n…"
          }
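
The two “Py” stages named in the scraper pipeline can be sketched roughly as follows. This is a minimal illustration under assumptions, not the deck's implementation: the helper names `filter_quoted` and `segment_paragraphs` are hypothetical, quoted content is assumed to be lines starting with `>`, paragraphs are assumed to be separated by blank lines, and the sample text is invented apart from its first sentence.

```python
def filter_quoted(text):
    """Drop lines that quote an earlier message (lines starting with '>')."""
    kept = [ln for ln in text.split("\n") if not ln.lstrip().startswith(">")]
    return "\n".join(kept)


def segment_paragraphs(text):
    """Split on blank lines and re-join wrapped lines within a paragraph."""
    grafs = []
    for block in text.split("\n\n"):
        graf = " ".join(ln.strip() for ln in block.split("\n") if ln.strip())
        if graf:
            grafs.append(graf)
    return grafs


# sample message body; only the first sentence comes from the slide
raw = ("\nOnly fit the data in memory where you want to run the iterative\n"
       "algorithm.\n\n> quoted reply text\n\nSecond paragraph here.\n")

grafs = segment_paragraphs(filter_quoted(raw))
for graf in grafs:
    print(graf)
```

Real list traffic also carries attribution lines such as “On … wrote:” and signatures, which a production filter would need to handle separately.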
  • 36. Workflows: Parser pipeline
      [diagram labels: message JSON; TextBlob: segment sentences; TextBlob: tag and lemmatize words; TextBlob: sentiment analysis; Py: generate skip-grams; Treebank, WordNet; parsed JSON]

      message JSON:

          {
            "date": "2014-10-01T00:16:08+00:00",
            "id": "CA+B-+fyrBU1yGZAYJM_u=gnBVtzB=sXoBHkhmS-6L1n8K5Hhbw",
            "next_thread": "CALEj8eP5hpQDM=p2xryL-JT-x_VhkRcD59Q+9Qr9LJ9sYLeLVg",
            "next_url": "…",
            "prev_thread": "",
            "sender": "Debasish Das <>",
            "subject": "Re: memory vs data_size",
            "text": "\nOnly fit the data in memory where you want to run the iterative\nalgorithm....\n\nFor…"
          }

      parsed JSON:

          {
            "graf": [ [1, "Only", "only", "RB", 1, 0], [2, "fit", "fit", "VBP", 1, 1] ... ],
            "id": "CA+B-+fyrBU1yGZAYJM_u=gnBVtzB=sXoBHkhmS-6L1n8K5Hhbw",
            "polr": 0.2,
            "sha1": "178b7a57ec6168f20a8a4f705fb8b0b04e59eeb7",
            "size": 14,
            "subj": 0.7,
            "tile": [ [1, 2], [2, 3], [3, 4] ... ]
          }
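
The `tile` field in the parsed JSON looks like pairs of neighboring word IDs. A minimal sketch of how such pairs could be generated is below; this is an assumption about the “generate skip-grams” stage, and both the helper name `make_tiles` and its `window` parameter are hypothetical.

```python
def make_tiles(word_ids, window=1):
    """Pair each word ID with the IDs that follow it within `window` positions."""
    tiles = []
    for i, left in enumerate(word_ids):
        for right in word_ids[i + 1 : i + 1 + window]:
            tiles.append([left, right])
    return tiles


# adjacent pairs, matching the shape of the "tile" field on the slide
tiles = make_tiles([1, 2, 3, 4])
print(tiles)

# a wider window also links words that are two positions apart
wider = make_tiles([1, 2, 3, 4], window=2)
print(wider)
```

With `window=1` the pairs are plain bigrams; a larger window skips over intervening words, which is what makes them skip-grams.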
  • 37. Workflows: TextRank pipeline
      [diagram labels: parsed JSON; Spark: create word graph; RDD word graph; NetworkX: visualize graph; GraphX: run TextRank; Spark: extract phrases; ranked phrases]
  • 38. Workflows: TextRank pipeline
      "Compatibility of systems of linear constraints"

          [{'index': 0, 'stem': 'compat', 'tag': 'NNP', 'word': 'compatibility'},
           {'index': 1, 'stem': 'of', 'tag': 'IN', 'word': 'of'},
           {'index': 2, 'stem': 'system', 'tag': 'NNS', 'word': 'systems'},
           {'index': 3, 'stem': 'of', 'tag': 'IN', 'word': 'of'},
           {'index': 4, 'stem': 'linear', 'tag': 'JJ', 'word': 'linear'},
           {'index': 5, 'stem': 'constraint', 'tag': 'NNS', 'word': 'constraints'}]

      [diagram: word graph with nodes compat, system, linear, constraint]
      TextRank: Bringing Order into Texts
        Rada Mihalcea, Paul Tarau
        papers/mihalcea.emnlp04.pdf
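
A minimal sketch of turning the tagged tokens above into a word graph follows. The rules are assumptions for illustration: keep tokens whose tag starts with `NN` or `JJ`, use stems as nodes, and link each kept word to the next kept word. The TextRank paper describes a co-occurrence window over the text, so this is a simplification, and the deck's own rules are not shown on the slide.

```python
tokens = [
    {"index": 0, "stem": "compat", "tag": "NNP", "word": "compatibility"},
    {"index": 1, "stem": "of", "tag": "IN", "word": "of"},
    {"index": 2, "stem": "system", "tag": "NNS", "word": "systems"},
    {"index": 3, "stem": "of", "tag": "IN", "word": "of"},
    {"index": 4, "stem": "linear", "tag": "JJ", "word": "linear"},
    {"index": 5, "stem": "constraint", "tag": "NNS", "word": "constraints"},
]


def keep(token):
    """Assumed filter: nouns and adjectives only."""
    return token["tag"].startswith(("NN", "JJ"))


kept = [t for t in tokens if keep(t)]

# unique stems become nodes, numbered from 1 in first-seen order
node_ids = {}
for t in kept:
    node_ids.setdefault(t["stem"], len(node_ids) + 1)

# link each kept word to the next kept word
edges = [(node_ids[a["stem"]], node_ids[b["stem"]])
         for a, b in zip(kept, kept[1:])]

print(node_ids)
print(edges)
```

The resulting node and edge lists have the same shape as the `(node_id, root)` and `(node0, node1)` rows that the later slides load from Parquet.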
  • 41. TextRank impl: load parquet files

          import org.apache.spark.graphx._
          import org.apache.spark.rdd.RDD

          val sqlCtx = new org.apache.spark.sql.SQLContext(sc)
          import sqlCtx._

          val edge = sqlCtx.parquetFile("graf_edge.parquet")
          edge.registerTempTable("edge")

          val node = sqlCtx.parquetFile("graf_node.parquet")
          node.registerTempTable("node")

          // pick one message as an example; at scale we'd parallelize
          val msg_id = "CA+B-+fyrBU1yGZAYJM_u=gnBVtzB=sXoBHkhmS-6L1n8K5Hhbw"
  • 42. TextRank impl: use SparkSQL to collect node list + edge list

          val sql = """
          SELECT node_id, root
          FROM node
          WHERE id='%s' AND keep='1'
          """.format(msg_id)

          val n = sqlCtx.sql(sql.stripMargin).distinct()
          val nodes: RDD[(Long, String)] = n.map{ p =>
            (p(0).asInstanceOf[Int].toLong, p(1).asInstanceOf[String])
          }
          nodes.collect()

          val sql = """
          SELECT node0, node1
          FROM edge
          WHERE id='%s'
          """.format(msg_id)

          val e = sqlCtx.sql(sql.stripMargin).distinct()
          val edges: RDD[Edge[Int]] = e.map{ p =>
            Edge(p(0).asInstanceOf[Int].toLong, p(1).asInstanceOf[Int].toLong, 0)
          }
          edges.collect()
  • 43. TextRank impl: use GraphX to run PageRank

          // run PageRank
          val g: Graph[String, Int] = Graph(nodes, edges)
          val r = g.pageRank(0.0001).vertices

          r.join(nodes).sortBy(_._2._1, ascending=false).foreach(println)

          // save the ranks
          case class Rank(id: Int, rank: Float)
          val rank = r.map(p => Rank(p._1.toInt, p._2.toFloat))
          rank.registerTempTable("rank")

          def median[T](s: Seq[T])(implicit n: Fractional[T]) = {
            import n._
            val (lower, upper) = s.sortWith(_<_).splitAt(s.size / 2)
            if (s.size % 2 == 0) (lower.last + upper.head) / fromInt(2) else upper.head
          }

          val min_rank = median(
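
For readers without a Spark shell at hand, the ranking step and the median threshold can be sketched in plain Python. This is a textbook power-iteration PageRank, not the GraphX implementation: the damping factor 0.85 and the normalization (ranks sum to 1) are assumptions of this sketch, so the numbers will not match `g.pageRank(0.0001)`. The small word graph is the one from the “linear constraints” example, with links stored in both directions.

```python
def pagerank(nodes, edges, damping=0.85, tol=1e-4, max_iter=100):
    """Power-iteration PageRank over directed (src, dst) edges."""
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # rank held by vertices without out-links is spread evenly
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for v in nodes:
            for dst in out_links[v]:
                new_rank[dst] += damping * rank[v] / len(out_links[v])
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def median(values):
    """Same rule as the Scala helper: mean of the two middle values if even."""
    s = sorted(values)
    mid = len(s) // 2
    return (s[mid - 1] + s[mid]) / 2.0 if len(s) % 2 == 0 else s[mid]


nodes = [1, 2, 3, 4]
links = [(1, 2), (2, 3), (3, 4)]
edges = links + [(b, a) for a, b in links]  # treat word links as undirected

ranks = pagerank(nodes, edges)
min_rank = median(ranks.values())  # words ranked below this are skipped
print(ranks, min_rank)
```

On this path-shaped graph the two inner words outrank the two outer ones, and the median falls between the two groups.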
  • 44. TextRank impl: join ranked words with parsed text

          var span:List[String] = List()
          var last_index = -1
          var rank_sum = 0.0

          var phrases:collection.mutable.Map[String, Double] = collection.mutable.Map()

          val sql = """
          SELECT n.num, n.raw, r.rank
          FROM node n JOIN rank r ON n.node_id = r.id
          WHERE n.id='%s' AND n.keep='1'
          ORDER BY n.num
          """.format(msg_id)

          val s = sqlCtx.sql(sql.stripMargin).collect()
  • 45. TextRank impl: “pull strings” for the top-ranked keyphrases
      s.foreach { x =>
        //println (x)
        val index = x.getInt(0)
        val word = x.getString(1)
        val rank = x.getFloat(2)
        var isStop = false
        // test for break from past
        if (span.size > 0 && rank < min_rank) isStop = true
        if (span.size > 0 && (index - last_index > 1)) isStop = true
        // clear accumulation
        if (isStop) {
          val phrase = span.mkString(" ")
          phrases += (phrase -> rank_sum)
          span = List()
          last_index = index
          rank_sum = 0.0
        }
        // start or append
        if (rank >= min_rank) {
          span = span :+ word
          last_index = index
          rank_sum += rank
        }
      }
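The accumulation loop on slide 45 can be easier to follow as a standalone function. The sketch below restates it in plain Scala, without Spark. `Token`, `keyphrases`, and the sample data are hypothetical names and values introduced here. One deliberate difference: it also emits a span that is still open after the last token, which the loop as shown on the slide does not do.

```scala
// One parsed token: position in the text, raw word, and its rank (hypothetical type).
case class Token(index: Int, word: String, rank: Double)

def keyphrases(tokens: Seq[Token], minRank: Double): Map[String, Double] = {
  val phrases = collection.mutable.Map[String, Double]()
  var span = Vector.empty[String]
  var lastIndex = -1
  var rankSum = 0.0

  // store the current span (if any) as a phrase and reset the accumulators
  def flush(): Unit = {
    if (span.nonEmpty) phrases += (span.mkString(" ") -> rankSum)
    span = Vector.empty
    rankSum = 0.0
  }

  for (t <- tokens) {
    // a span ends at a low-ranked word or at a gap in token positions
    val lowRank = t.rank < minRank
    val gap = t.index - lastIndex > 1
    if (span.nonEmpty && (lowRank || gap)) flush()

    // start a new span or extend the current one
    if (!lowRank) {
      span = span :+ t.word
      lastIndex = t.index
      rankSum += t.rank
    }
  }
  flush() // assumption: a span still open at the end is also a phrase
  phrases.toMap
}

// hypothetical tokens; position 4 is missing, as if a word had been filtered out
val sample = Seq(
  Token(0, "graph", 1.0), Token(1, "analytics", 0.5), Token(2, "is", 0.125),
  Token(3, "fun", 0.75), Token(5, "spark", 1.5), Token(6, "sql", 1.0))

val result = keyphrases(sample, minRank = 0.5)
```

Here "graph analytics" ends at the low-ranked "is", "fun" ends at the position gap, and "spark sql" is emitted by the final flush.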
  • 46. TextRank impl: report the top keyphrases
      // summarize the text as a list of ranked keyphrases
      val summary = sc.parallelize(phrases.toSeq)
        .distinct()
        .sortBy(_._2, ascending=false)
  • 48. Reply Graph: load parquet files
      import org.apache.spark.graphx._
      import org.apache.spark.rdd.RDD
      val sqlCtx = new org.apache.spark.sql.SQLContext(sc)
      import sqlCtx._
      val edge = sqlCtx.parquetFile("reply_edge.parquet")
      edge.registerTempTable("edge")
      val node = sqlCtx.parquetFile("reply_node.parquet")
      node.registerTempTable("node")
      edge.schemaString
      node.schemaString
  • 49. Reply Graph: use SparkSQL to collect node list + edge list
      // node list: (id, sender) pairs
      val sql = "SELECT id, sender FROM node"
      val n = sqlCtx.sql(sql).distinct()
      val nodes: RDD[(Long, String)] = n.map{ p =>
        (p(0).asInstanceOf[Long], p(1).asInstanceOf[String])
      }
      nodes.collect()
      // edge list: replier -> sender, weighted by the number of replies
      val sql = "SELECT replier, sender, num FROM edge"
      val e = sqlCtx.sql(sql).distinct()
      val edges: RDD[Edge[Int]] = e.map{ p =>
        Edge(p(0).asInstanceOf[Long], p(1).asInstanceOf[Long], p(2).asInstanceOf[Int])
      }
      edges.collect()
  • 50. Reply Graph: use GraphX to run graph analytics
      // run graph analytics
      val g: Graph[String, Int] = Graph(nodes, edges)
      val r = g.pageRank(0.0001).vertices
      r.join(nodes).sortBy(_._2._1, ascending=false).foreach(println)
      // define a reduce operation to compute the highest degree vertex
      def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
        if (a._2 > b._2) a else b
      }
      // compute the max degrees
      val maxInDegree: (VertexId, Int) = g.inDegrees.reduce(max)
      val maxOutDegree: (VertexId, Int) = g.outDegrees.reduce(max)
      val maxDegrees: (VertexId, Int) = g.degrees.reduce(max)
      // strongly connected components (at most 10 iterations)
      val scc = g.stronglyConnectedComponents(10).vertices
      nodes.join(scc).foreach(println)
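To make the degree computation on slide 50 concrete without a cluster, here is a minimal sketch on plain Scala collections. It reuses the slide's `max` reduce function; `replyEdges` and `counts` are hypothetical names introduced here, and the edge data is invented. Edges point from replier to sender, so in-degree counts replies received and out-degree counts replies written.

```scala
type VertexId = Long // GraphX defines the same alias

// hypothetical reply edges: (replier, sender)
val replyEdges: Seq[(VertexId, VertexId)] =
  Seq((1L, 2L), (3L, 2L), (2L, 1L), (4L, 2L), (1L, 3L))

// count how often each vertex id occurs in a list of ids
def counts(ids: Seq[VertexId]): Map[VertexId, Int] =
  ids.groupBy(identity).map { case (id, xs) => (id, xs.size) }

val outDeg = counts(replyEdges.map(_._1))
val inDeg = counts(replyEdges.map(_._2))
val deg = counts(replyEdges.flatMap { case (s, d) => Seq(s, d) })

// same reduce operation as on the slide: keep the pair with the higher degree
def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) =
  if (a._2 > b._2) a else b

val maxInDegree = inDeg.reduce(max)
val maxOutDegree = outDeg.reduce(max)
val maxDegrees = deg.reduce(max)
```

With ties, which vertex wins depends on the order in which `reduce` visits the pairs, both here and on an RDD, so the sample data is chosen to have unique maxima.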
  • 51. Reply Graph: PageRank of top dev@spark email, 4Q2014
      (389,(22.690229478710016,Sean Owen <>))
      (857,(20.832469059298248,Akhil Das <>))
      (652,(13.281821379806798,Michael Armbrust <>))
      (101,(9.963167550803664,Tobias Pfeiffer <>))
      (471,(9.614436778460558,Steve Lewis <>))
      (931,(8.217073486575732,shahab <>))
      (48,(7.653814912512137,ll <>))
      (1011,(7.602002681952157,Ashic Mahtab <>))
      (1055,(7.572376489758199,Cheng Lian <>))
      (122,(6.87247388819558,Gerard Maas <>))
      (904,(6.252657820614504,Xiangrui Meng <>))
      (827,(6.0941062762076115,Jianshi Huang <>))
      (887,(5.835053915864531,Davies Liu <>))
      (303,(5.724235650446037,Ted Yu <>))
      (206,(5.430238461114108,Deep Pradhan <>))
      (483,(5.332452537151523,Akshat Aranya <>))
      (185,(5.259438927615685,SK <>))
      (636,(5.235941228955769,Matei Zaharia <matei.zaha…>))
      // seaaaaaaaaaan!
      maxInDegree: (org.apache.spark.graphx.VertexId, Int) = (389,126)
      maxOutDegree: (org.apache.spark.graphx.VertexId, Int) = (389,170)
      maxDegrees: (org.apache.spark.graphx.VertexId, Int) = (389,296)
  • 52. Reply Graph: What SSSP looks like in GraphX/Pregel com/databricks/apps/graphx/sssp.scala
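Slide 52 points at the SSSP source file without showing it. As a rough illustration of the Pregel pattern only, the sketch below simulates single-source shortest paths in plain Scala with the three roles a Pregel program has: send messages along edges, merge messages per vertex, and update vertex state. It is not the contents of sssp.scala and does not call the GraphX API. The function name, the edge data, and the weights are assumptions, and weights are taken to be non-negative.

```scala
type VertexId = Long

// hypothetical weighted, directed edges: (src, dst, weight)
val weightedEdges: Seq[(VertexId, VertexId, Double)] =
  Seq((1L, 2L, 1.0), (2L, 3L, 2.0), (1L, 3L, 5.0), (3L, 4L, 1.0))

def sssp(edges: Seq[(VertexId, VertexId, Double)], source: VertexId): Map[VertexId, Double] = {
  val vertices = edges.flatMap { case (s, d, _) => Seq(s, d) }.distinct
  // initial state: distance 0 at the source, infinity everywhere else
  var dist: Map[VertexId, Double] =
    vertices.map(v => v -> (if (v == source) 0.0 else Double.PositiveInfinity)).toMap
  var active = true
  while (active) {
    // send: an edge offers its destination a distance only if that is an improvement
    val msgs = edges.collect {
      case (s, d, w) if dist(s) + w < dist(d) => (d, dist(s) + w)
    }
    // merge: keep the smallest offer per destination vertex
    val merged = msgs.groupBy(_._1).map { case (v, ms) => v -> ms.map(_._2).min }
    // vertex program: adopt the smaller of the old and the received distance
    dist = dist ++ merged.map { case (v, m) => v -> math.min(dist(v), m) }
    // stop once a superstep produces no messages
    active = merged.nonEmpty
  }
  dist
}

val distances = sssp(weightedEdges, 1L)
```

In this sample, vertex 3 is first reached directly at cost 5.0 and then improved to 3.0 through vertex 2 in the next superstep, which in turn lowers vertex 4 to 4.0.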
  • 53. Look Ahead: Where is this heading? Feature learning with Word2Vec (Matt Krzus)
 [Diagram labels: ranked phrases; GraphX: run Con.Comp.; MLlib: run Word2Vec; aggregated by topic; MLlib: run KMeans; topic vectors; better than LDA?; features… models… insights…]
  • 55. Apache Spark developer certificate program
 defined by Spark experts @Databricks
 assessed by O’Reilly Media
 establishes the bar for Spark expertise
  • 56. MOOCs:
 Anthony Joseph, UC Berkeley, begins 2015-02-23 (berkeleyx-cs100-1x-introduction-big-6181)
 Ameet Talwalkar, UCLA, begins 2015-04-14 (uc-berkeleyx-cs190-1x-scalable-machine-6066)
  • 57. community: events worldwide; video+preso archives; resources; workshops
  • 59. confs:
 Strata CA, San Jose, Feb 18-20
 Spark Summit East, NYC, Mar 18-19
 Big Data Tech Con, Boston, Apr 26-28
 Strata EU, London, May 5-7
 Spark Summit 2015, SF, Jun 15-17
  • 60. books:
 Fast Data Processing with Spark, Holden Karau, Packt (2013)
 Spark in Action, Chris Fregly, Manning (2015*)
 Learning Spark, Holden Karau, Andy Konwinski, Matei Zaharia, O’Reilly (2015*)
  • 61. presenter:
 Just Enough Math, O’Reilly, 2014
 Enterprise Data Workflows with Cascading, O’Reilly, 2013
 monthly newsletter for updates, events, conf summaries, etc.