Rapid Prototyping in
PySpark Streaming
The Thermodynamics of Docker Containers
Rich Seymour @rseymour
Washington DC Area Apache Spark Interactive Meetup
Or trying the
bleeding edge
without
bleeding out
Why?
The Buzzwords
 Docker
 Spark
 Thermodynamics (not really a buzzword)
The Buzzversions
 Docker (1.5.0) we have stats
 Spark (1.2.1) pyspark streaming!
 Thermodynamics (beta)
Docker
At a high level
(docker) containers
contain your
dependencies
Fire up multiple
services in seconds on
your laptop
Docker
 Open Sourced by dotCloud March 2013
 Switched from LXC to libcontainer March 2014
 Written in Go
 Allows us to contain dependencies with:
 cgroups, namespaces, capabilities, netlink, netfilter, etc.
 Currently Linux only, but supported on Amazon, Google, Microsoft and Red Hat cloud offerings
 Gives us a registry and a method for pulling binary diffs
Control Groups (cgroups)
 Started in 2007
 Basis for Docker’s control of resources
Cute Logo
Spark
“a fast and general
engine for large scale
data processing”
Spark
Apache Project born out of UC Berkeley’s Algorithms,
Machines, People Lab (AMPLab)
Java / Scala / Python APIs for computing on resilient
distributed datasets across a cluster of multicore machines.
Resilient Distributed Datasets (RDDs)
“Represents an immutable, partitioned collection
of elements that can be operated on in parallel.”
Immutable: can’t be changed over time.
If you want to preserve a change, create a new
RDD on the left of the equals sign.
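The same idea in plain Python terms (a sketch of the concept, not the Spark API — tuples stand in for RDDs):

```python
# Plain-Python analogy of RDD immutability: you never mutate the
# source collection, you bind a derived collection to a new name
# on the left of the equals sign.
base = (1, 2, 3)                      # like a parent RDD: immutable
doubled = tuple(x * 2 for x in base)  # a "new RDD" derived from it
```

The parent stays intact, which is what lets Spark recompute lost partitions from lineage.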
Partitioned: split up, often by key with a partitioner
if your RDD is made up of key-value pairs:
my_rdd = [(1, "Apple"), (2, "IBM")]
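A sketch of what "partitioned by key" means, mimicking Spark's default hash partitioning in plain Python (the helper names here are mine, not Spark's):

```python
# Mimic a hash partitioner: partition = hash(key) % numPartitions.
my_rdd = [(1, "Apple"), (2, "IBM")]

def assign_partition(key, num_partitions):
    return hash(key) % num_partitions

partitions = {}
for k, v in my_rdd:
    partitions.setdefault(assign_partition(k, 2), []).append((k, v))
# All pairs with the same key always land in the same partition,
# which is what lets the *ByKey operations work per-partition.
```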
aggregate
aggregateByKey
cache
cartesian
checkpoint
coalesce
cogroup
collect
collectAsMap
combineByKey
context
count
countApprox
countApproxDistinct
countByKey
countByValue
distinct
filter
first
flatMap
flatMapValues
fold
foldByKey
foreach
foreachPartition
fullOuterJoin
getCheckpointFile
getNumPartitions
getStorageLevel
glom
groupBy
groupByKey
groupWith
histogram
id
intersection
isCheckpointed
join
keyBy
keys
leftOuterJoin
lookup
map
mapPartitions
mapPartitionsWithIndex
mapPartitionsWithSplit
mapValues
max
mean
meanApprox
min
name
partitionBy
persist
pipe
randomSplit
reduce
reduceByKey
reduceByKeyLocally
repartition
repartitionAndSortWithinPartit
ions
rightOuterJoin
sample
sampleByKey
sampleStdev
sampleVariance
saveAsHadoopDataset
saveAsHadoopFile
saveAsNewAPIHadoopDatas
et
saveAsNewAPIHadoopFile
saveAsPickleFile
saveAsSequenceFile
saveAsTextFile
setName
sortBy
sortByKey
stats
stdev
subtract
subtractByKey
sum
sumApprox
take
takeOrdered
takeSample
toDebugString
top
union
unpersist
values
variance
zip
zipWithIndex
zipWithUniqueId
94 RDD methods… (we’ll revisit these)
94 methods? They fall into two groups: Transformations (lazy) and Actions (which force evaluation).
94 RDD methods… and a pipe is one
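The transformation/action split matters because transformations are lazy and actions force evaluation. A plain-Python analogy using generators (not actual Spark code):

```python
data = range(5)

# "Transformation": builds a recipe, nothing is computed yet.
doubled = (x * 2 for x in data)

# "Action": evaluation only happens here, when we demand results.
result = list(doubled)
```

In Spark, `map` and `filter` behave like the generator expression; `collect`, `count`, and `pprint` behave like the `list()` call.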
Pipes circa 1964 – Doug McIlroy
Summary--what's most important.
To put my strongest concerns into a nutshell:
1. We should have some ways of coupling programs like
garden hose--screw in another segment when it becomes
necessary to massage data in another way.
This is the way of IO also.
2. Our loader should be able to do link-loading and
controlled establishment.
3. Our library filing scheme should allow for rather
general indexing, responsibility, generations, data path
switching.
4. It should be possible to get private system components
(all routines are system components) for buggering around
with.
M. D. McIlroy
October 11, 1964
Interesting side notes: http://www.cs.dartmouth.edu/~doug/sieve/
Resilient Distributed Datasets (RDDs)
“Represents an immutable, partitioned collection
of elements that can be operated on in parallel.”
Wolfsburg – Inside the Volkswagen Plant photo by Roger
https://www.flickr.com/photos/24736216@N07/5869083813/
Such parallel efficiency!
PySpark Streaming
Creates Discretized Streams (DStreams) composed of
RDDs which can be processed in microbatches.
In Python
With a lot of caveats
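Conceptually (this is a sketch of the idea, not PySpark's actual implementation), a DStream chops an endless stream into small batches and hands each batch to ordinary batch logic:

```python
def microbatch(stream, batch_size):
    """Yield fixed-size batches from a (possibly endless) stream of records."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch        # each batch plays the role of one RDD
            batch = []
    if batch:
        yield batch            # flush the final partial batch

# A "streaming" job is then just batch logic applied per microbatch:
counts = [len(b) for b in microbatch(range(7), 3)]
```

Real DStreams batch by time interval rather than by count, but the shape of the computation is the same.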
cache
checkpoint
cogroup
combineByKey
context
count
countByValue
countByValueAndWindow
countByWindow
filter
flatMap
flatMapValues
foreachRDD
fullOuterJoin
glom
groupByKey
groupByKeyAndWindow
join
leftOuterJoin
map
mapPartitions
mapPartitionsWithIndex
mapValues
partitionBy
persist
pprint
reduce
reduceByKey
reduceByKeyAndWindow
reduceByWindow
repartition
rightOuterJoin
saveAsTextFiles
slice
transform
transformWith
union
updateStateByKey
window
39 DStream methods
PySpark is in some ways
just helpers for Functional
Programming in Python
Functional Programming in Python
Please check out this article by Mary Rose Cook (@maryrosecook) in
which she writes:
“Functional code is characterized by one thing:
the absence of side effects”
https://codewords.hackerschool.com/issues/one/an-introduction-to-functional-programming
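Her point in two functions (plain Python; the function names are mine):

```python
def double_in_place(xs):
    # Side effect: the caller's list is mutated.
    for i in range(len(xs)):
        xs[i] *= 2

def doubled(xs):
    # Pure: a new list comes back, the input is untouched.
    # This is the shape every function you hand to map/filter should have.
    return [x * 2 for x in xs]

nums = [1, 2, 3]
fresh = doubled(nums)   # nums is still [1, 2, 3]
```

Spark depends on this: it may re-run your function on a partition at any time, so side effects would make results unpredictable.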
Today we’ll look at
And how they
relate (kinda) to
thermodynamics
CPU limits in cgroups
cpu shares
Each container gets 1024 shares by default. Unless you specify a
different value, everything is equal. As soon as you do, the
scheduler steps in.
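Shares are relative weights, not absolute limits: under contention each cgroup gets its shares divided by the total shares of everyone competing. A quick sketch of the arithmetic (my helper, not a Docker API):

```python
def cpu_fraction(my_shares, all_shares):
    """Fraction of CPU time a cgroup gets when all cgroups are busy."""
    return my_shares / float(sum(all_shares))

# Two containers at the 1024 default split the CPU evenly;
# drop one to 512 and it gets a third of the machine instead.
even = cpu_fraction(1024, [1024, 1024])
third = cpu_fraction(512, [512, 1024])
```

When nothing else is busy, a container can still use the whole CPU regardless of its share value.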
Thermodynamics...
The idea was to run docker containers that try to use all of the
CPU, but then limit them using cgroups and see how well that
works.
"Tolman & Einstein" by Los Angeles Times. Original uploader was Tillman at en.wikipedia - Transferred from en.wikipedia; transfer was stated to be made by User:chazchaz101.(Original text :
Los Angeles Times photographic archive, UCLA Library [1]). Licensed under Public Domain via Wikimedia Commons -
https://commons.wikimedia.org/wiki/File:Tolman_%26_Einstein.jpg#mediaviewer/File:Tolman_%26_Einstein.jpg
STOP
(Py)Spark Streaming is the wrong tool for this job.
“You must go on, I can’t go on, I’ll go on,”
Or less dramatically
Whatever! We can learn how and why PySpark works in this
situation, even though it isn’t ideal.
So, why?
"cpu_stats": {
    "cpu_usage": {
        "percpu_usage": [
            13407699471,
            40464379579,
            44303391682,
            15849983951
        ],
        "total_usage": 114025454683,
        "usage_in_kernelmode": 80000000,
        "usage_in_usermode": 113940000000
    },
    "system_cpu_usage": 52519779330000000,
    "throttling_data": {}
},
(all values in nanoseconds)
def calculate_cpu_percent(prev_cpu, prev_sys, stats):
    cpu_percent = 0.0
    cpu_delta = float(stats['cpu_stats']['cpu_usage']['total_usage']) - prev_cpu
    system_delta = float(stats['cpu_stats']['system_cpu_usage']) - prev_sys
    if system_delta > 0.0 and cpu_delta > 0.0:
        cpu_percent = (cpu_delta / system_delta) * \
            float(len(stats['cpu_stats']['cpu_usage']['percpu_usage'])) * 100.0
    return cpu_percent
Really easy to do in a for loop (e.g.
https://github.com/docker/docker/blob/ea8cb16af7e8c83a264a1d1c48db3cacd4cc082b/api/client/commands.go#L2640-L2665
and
https://github.com/docker/docker/blob/ea8cb16af7e8c83a264a1d1c48db3cacd4cc082b/api/client/commands.go#L2758-L2771)
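For completeness, here is what that for-loop version might look like. This is a sketch: the one-JSON-document-per-line input format and the state dict are my assumptions, but it reuses the calculate_cpu_percent function shown above:

```python
import json

def calculate_cpu_percent(prev_cpu, prev_sys, stats):
    cpu_percent = 0.0
    cpu_delta = float(stats['cpu_stats']['cpu_usage']['total_usage']) - prev_cpu
    system_delta = float(stats['cpu_stats']['system_cpu_usage']) - prev_sys
    if system_delta > 0.0 and cpu_delta > 0.0:
        cpu_percent = (cpu_delta / system_delta) * \
            float(len(stats['cpu_stats']['cpu_usage']['percpu_usage'])) * 100.0
    return cpu_percent

def percents(lines):
    """One JSON doc per line, shaped {'id': ..., 'stats': ...};
    remember the last reading per container and diff against it."""
    prev = {}  # container id -> (prev_cpu, prev_sys)
    for line in lines:
        doc = json.loads(line)
        cid, stats = doc['id'], doc['stats']
        prev_cpu, prev_sys = prev.get(cid, (0.0, 0.0))
        yield cid, calculate_cpu_percent(prev_cpu, prev_sys, stats)
        prev[cid] = (float(stats['cpu_stats']['cpu_usage']['total_usage']),
                     float(stats['cpu_stats']['system_cpu_usage']))
```

The loop works because each iteration sees exactly the previous sample — the very assumption that breaks in a distributed DStream.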
But not straightforward over a DStream of RDDs.
RDDs are generally
not in guaranteed
order
System CPU: 101 nanoseconds
Container X: 12 nanoseconds
Container Y: 16 nanoseconds
System CPU: 106 nanoseconds
Container X: 11 nanoseconds
Container Y: 16 nanoseconds
System CPU: 110 nanoseconds
Container X: 17 nanoseconds
Container Y: 19 nanoseconds
System CPU: 101 nanoseconds
Container X: 12 nanoseconds
Container Y: 16 nanoseconds
System CPU: 106 nanoseconds
Container X: 10 nanoseconds
Container Y: 16 nanoseconds
System CPU: 110 nanoseconds
Container X: 17 nanoseconds
Container Y: 19 nanoseconds
Out of order!!! Why not just a FIFO queue?
./stats.sh |
./easy_perc.py
❯ ./stats.sh | head -200 | ./easy_perc.py
isosystem_medium_2 0.00350599250802
isosystem_medium_2 74.9828826
isosystem_medium_1 0.00353414664914
isosystem_medium_2 93.8992185
isosystem_medium_1 91.7493855
isosystem_medium_1 99.5302075188
isosystem_medium_2 99.1333246115
isosystem_medium_2 99.6940311
isosystem_medium_1 99.6248363
isosystem_large_1 0.00706705183238
isosystem_medium_2 99.2483985
isosystem_medium_1 99.8950363
isosystem_large_1 199.3501746
isosystem_medium_1 99.2755894264
isosystem_large_1 198.486462344
isosystem_medium_2 99.6565749626
isosystem_large_1 199.3187556
isosystem_medium_1 99.1366565
isosystem_medium_2 100.2461571
isosystem_large_1 199.1434093
Not perfect, but once
stable it’s nice and
easy
Let’s see how I did it
in PySpark
Streaming
keyed_up = stats.map(safe_load).filter(lambda x: x != None).flatMap(key_up) \
    .filter(lambda x: x != None).groupByKeyAndWindow(20, 5)
mins = keyed_up.mapValues(min)
maxes = keyed_up.mapValues(max)
diffed = maxes.join(mins).mapValues(lambda x: x[0] - x[1])
system_diff = diffed.filter(lambda x: x[0][1] == 'system_cpu_usage')
total_diff = diffed.filter(lambda x: x[0][1] == 'total_usage')
tot_cpus = maxes.filter(lambda x: x[0][1] == 'tot_cpus')
math_me = system_diff.map(rm_subkey).join(total_diff.map(rm_subkey))
percs = math_me.mapValues(lambda x: x[1]/x[0] * 100.0 if x[0] > 0.01 else 0) \
    .join(tot_cpus.map(rm_subkey)) \
    .mapValues(lambda x: x[0]*x[1])
percs.filter(lambda x: x != None).pprint()
keyed_up = stats \
    .map(safe_load) \
    .filter(lambda x: x != None) \
    .flatMap(key_up) \
    .filter(lambda x: x != None) \
    .groupByKeyAndWindow(20, 5)
{
    "id": "isosystem_large_1",
    "stats": {
        "read": "2015-02-24T13:28:03.510603276-05:00",
        "network": {...},
        "cpu_stats": {
            "cpu_usage": {
                "total_usage": 54016836033,
                "percpu_usage": [
                    14829724030,
                    8132644889,
                    17463950886,
                    13590516228
                ],
                "usage_in_kernelmode": 50000000,
                "usage_in_usermode": 53970000000
            },
            "system_cpu_usage": 52812200870000000,
            "throttling_data": {}
        },
        "memory_stats": {...},
        "blkio_stats": {}
    }
}
Turn the JSON into 3 key-value (K, V) pairs, i.e.:
K: ('isosystem_large_1', 'total_usage')
V: 54016836033.0
(('isosystem_large_1', 'total_usage'), 54016836033.0)
(('isosystem_large_1', 'system_cpu_usage'), 52812200870000000.0)
(('isosystem_large_1', 'tot_cpus'), 4)
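The deck doesn't show key_up itself; here is a hypothetical reconstruction that produces exactly those three pairs (the function body is my guess at the author's helper, not the original code):

```python
def key_up(doc):
    # Hypothetical reconstruction: flatten one stats document into
    # ((container, metric), value) pairs for flatMap.
    try:
        cid = doc['id']
        cpu_stats = doc['stats']['cpu_stats']
        cpu = cpu_stats['cpu_usage']
        return [((cid, 'total_usage'), float(cpu['total_usage'])),
                ((cid, 'system_cpu_usage'), float(cpu_stats['system_cpu_usage'])),
                ((cid, 'tot_cpus'), len(cpu['percpu_usage']))]
    except (KeyError, TypeError):
        return [None]   # dropped later by .filter(lambda x: x != None)
```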
(('isosystem_large_1', 'total_usage'), 54016836033.0)
(('isosystem_large_1', 'total_usage'), 54016936033.0)
(('isosystem_large_1', 'total_usage'), 54017036033.0)
(('isosystem_large_1', 'system_cpu_usage'), 52812200870000000.0)
(('isosystem_large_1', 'system_cpu_usage'), 52812200870200000.0)
(('isosystem_large_1', 'system_cpu_usage'), 52812200870400000.0)
(('isosystem_large_1', 'tot_cpus'), 4)
(('isosystem_large_1', 'tot_cpus'), 4)
(('isosystem_large_1', 'tot_cpus'), 4)
So groupByKeyAndWindow(20, 5) groups every 20-second window of data by
key, then slides the window forward by 5 seconds to keep a moving delta.
mins = keyed_up.mapValues(min)
maxes = keyed_up.mapValues(max)
(('isosystem_large_1', 'total_usage'), 10000.0)
(('isosystem_large_1', 'total_usage'), 20000.0)
(('isosystem_large_1', 'total_usage'), 40000.0)
mins -> (('isosystem_large_1', 'total_usage'), 10000.0)
maxes -> (('isosystem_large_1', 'total_usage'), 40000.0)
diffed = maxes.join(mins) \
    .mapValues(lambda x: x[0] - x[1])
maxes
(('isosystem_large_1', 'total_usage'), 40000.0)
mins
(('isosystem_large_1', 'total_usage'), 10000.0)
diffed
(('isosystem_large_1', 'total_usage'), 30000.0)
join and map the values to their difference
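What join plus mapValues does per microbatch, sketched over plain dicts (no Spark needed):

```python
maxes = {('isosystem_large_1', 'total_usage'): 40000.0}
mins  = {('isosystem_large_1', 'total_usage'): 10000.0}

# join: pair up the values that share a key...
joined = {k: (maxes[k], mins[k]) for k in maxes.keys() & mins.keys()}

# ...then mapValues(lambda x: x[0] - x[1]) takes the difference.
diffed = {k: hi - lo for k, (hi, lo) in joined.items()}
```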
system_diff = diffed.filter(lambda x: x[0][1] == 'system_cpu_usage')
total_diff = diffed.filter(lambda x: x[0][1] == 'total_usage')
tot_cpus = maxes.filter(lambda x: x[0][1] == 'tot_cpus')
Give me 3 streams where I filter by the ‘subkey’
math_me = system_diff.map(rm_subkey) \
    .join(total_diff.map(rm_subkey))

def rm_subkey(x):
    return (x[0][0], x[1])

In other words take: (('isosystem_large_1', 'total_usage'), 30000.0)
and make: ('isosystem_large_1', 30000.0)
and join it with (('isosystem_large_1', 'system_cpu_usage'), 90000.0) ->
('isosystem_large_1', 90000.0)
-> ('isosystem_large_1', (90000.0, 30000.0))
percs = math_me.mapValues(lambda x: x[1]/x[0] * 100.0 if x[0] > 0.01 else 0) \
    .join(tot_cpus.map(rm_subkey)) \
    .mapValues(lambda x: x[0]*x[1])
math_me
('isosystem_large_1', (90000.0, 30000.0))
33.3 <- (30000.0/90000.0) * 100.0
tot_cpus
('isosystem_large_1', 4)
percs
('isosystem_large_1', 133.3)
join with tot_cpus and scale the percentage by the core count
percs.filter(lambda x: x != None).pprint()
Finally an Action!
def calculate_cpu_percent(prev_cpu, prev_sys, stats):
    cpu_percent = 0.0
    cpu_delta = float(stats['cpu_stats']['cpu_usage']['total_usage']) - prev_cpu
    system_delta = float(stats['cpu_stats']['system_cpu_usage']) - prev_sys
    if system_delta > 0.0 and cpu_delta > 0.0:
        cpu_percent = (cpu_delta / system_delta) * \
            float(len(stats['cpu_stats']['cpu_usage']['percpu_usage'])) * 100.0
    return cpu_percent
The future!
[SPARK-5154] [PySpark] [Streaming]
Kafka streaming support in Python
[SPARK-5704] [SQL] [PySpark]
createDataFrame from RDD with columns
Coming in Spark 1.3.0!
Things I’d like to see
Could we abstract out the best parts of PySpark to work
as a pure Python library?
Even just for local use?
Thank You
@rseymour
Rapid Prototyping in PySpark Streaming: The Thermodynamics of Docker Containers 2015 02-24 Washington DC Apache Spark Interactive

Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
fkyes25
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 

Recently uploaded (20)

Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 

Rapid Prototyping in PySpark Streaming: The Thermodynamics of Docker Containers (2015-02-24, Washington DC Area Apache Spark Interactive Meetup)

Editor's Notes

  1. YARN and Mesos also support cgroups
  2. Range or hash partitioning.
  3. There are lots of cool ways to run things in parallel from the command line; GNU parallel is one.
  4. Tolman & Einstein
  5. http://localhost:8888/notebooks/DStream%20CPU%20Percentage.ipynb
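
The partitioning note above (range or hash) can be illustrated with a minimal pure-Python sketch. This is not Spark's actual implementation, just the idea behind its default HashPartitioner: a key-value pair lands in partition `hash(key) % numPartitions`. The `hash_partition` helper and the sample pairs (borrowed from the RDD slide) are illustrative, not part of the Spark API.

```python
# Sketch of hash partitioning as used by Spark's default HashPartitioner:
# a (key, value) pair goes to bucket hash(key) % num_partitions.
# Pure Python illustration; `hash_partition` is a hypothetical helper.

def hash_partition(pairs, num_partitions):
    """Group (key, value) pairs into num_partitions buckets by key hash."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        buckets[hash(key) % num_partitions].append((key, value))
    return buckets

my_rdd = [(1, "Apple"), (2, "IBM")]
parts = hash_partition(my_rdd, 2)
```

In real PySpark you would instead call `rdd.partitionBy(numPartitions)` on a pair RDD; the point here is only that the partition a key lands in is a deterministic function of the key, which is what makes key-based operations like `aggregateByKey` shuffle-efficient.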