Extending Structured Streaming Made Easy with Algebra with Erik Erlandson

Databricks
DatabricksDeveloper Marketing and Relations at MuleSoft
Erik Erlandson
Red Hat, Inc.
Extending Structured
Streaming Made Easy
with Algebra
#SAISDev2
eje@redhat.com
@manyangled
#SAISDev2
In The Beginning
2
D
A
T
A
#SAISDev2
In The Beginning
3
D
A
T
A
1 Data Set
#SAISDev2
In The Beginning
4
D
A
T
A
1 File
1 Data Set
#SAISDev2
In The Beginning
5
D
A
T
A
1 Machine
1 Data Set
1 File
#SAISDev2
Sum
6
2
3
5
#SAISDev2
Sum
7
2
3
5
s = 0
#SAISDev2
Sum
8
2
3
5
s = s + 2 (2)
#SAISDev2
Sum
9
2
3
5
s = s + 3 (5)
#SAISDev2
Sum
10
2
3
5 s = s + 5 (10)
#SAISDev2
Commodity Scale-Out
11
2003 Google FS
2004 MapReduce
#SAISDev2
Commodity Scale-Out
12
2003 Google FS
2004 MapReduce
2006 Hadoop & HDFS
#SAISDev2
Commodity Scale-Out
13
2003 Google FS
2004 MapReduce
2006 Hadoop & HDFS
2009 Spark & RDD
#SAISDev2
Commodity Scale-Out
14
2003 Google FS
2004 MapReduce
2006 Hadoop & HDFS
2009 Spark & RDD
2015 DataFrame
2016 Structured Streaming
#SAISDev2
Our Scale-Out World
15
2
3
2
5
3
5
2
3
5
logical
#SAISDev2
Our Scale-Out World
16
2 3 2
5 3 5
2 3 5
2
3
2
5
3
5
2
3
5
physical
logical
#SAISDev2
Scale-Out Sum
17
2 3 5
s
=
0
#SAISDev2
Scale-Out Sum
18
2 3 5
s
=
s
+
2
(2)
#SAISDev2
Scale-Out Sum
19
2 3 5
s
=
s
+
3
(5)
#SAISDev2
Scale-Out Sum
20
2 3 5
s
=
s
+
5
(10)
#SAISDev2
Scale-Out Sum
21
2 3 5 10
#SAISDev2
Scale-Out Sum
22
5 3 5
2 3 5 10
13
#SAISDev2
Scale-Out Sum
23
2 3 2
5 3 5
2 3 5 10
13
7
#SAISDev2
Scale-Out Sum
24
2 3 2
5 3 5
2 3 5 10
13 + 7 = 20
#SAISDev2
Scale-Out Sum
25
2 3 2
5 3 5
2 3 5 10 + 20 = 30
#SAISDev2
Unique
26
2 3 5
{}
#SAISDev2
Unique
27
2 3 5
{}+
2
=
{2}
#SAISDev2
Unique
28
2 3 5
{2}+
3
=
{2,3}
#SAISDev2
Unique
29
2 3 5
{2,3}+
5
=
{2,3,5}
#SAISDev2
Unique
30
2 3 5 {2,3,5}
#SAISDev2
Unique
31
5 3 5
2 3 5 {2,3,5}
{3,5}
#SAISDev2
Unique
32
2 3 2
5 3 5
2 3 5 {2,3,5}
{3,5}
{2,3}
#SAISDev2
Unique
33
2 3 2
5 3 5
2 3 5
{3,5} U {2,3} = {2,3,5}
{2,3,5}
#SAISDev2
Unique
34
2 3 2
5 3 5
2 3 5 {2,3,5} U {2,3,5} = {2,3,5}
#SAISDev2
Patterns
35
Examples Sum Unique Pattern
s = 0
s = {}
0 { }
zero
(aka identity)
#SAISDev2
Patterns
36
Examples Sum Unique Pattern
s = 0
s = {}
0 { }
zero
(aka identity)
2 + 3 = 5
{2} + 3 = {2, 3}
addition set insertion
update
(aka reduce)
#SAISDev2
Patterns
37
Examples Sum Unique Pattern
s = 0
s = {}
0 { }
zero
(aka identity)
2 + 3 = 5
{2} + 3 = {2, 3}
addition set insertion
update
(aka reduce)
13 + 7 = 20
{3,5} U {2,3} = {2,3,5}
addition set union
merge
(aka combine)
#SAISDev2
Spark Operators
38
Operation Data Accumulator Zero Update Merge
Sum Numbers Number 0 a + x a1 + a2
#SAISDev2
Spark Operators
39
Operation Data Accumulator Zero Update Merge
Sum Numbers Number 0 a + x a1 + a2
Max Numbers Number -∞ max(a, x) max(a1, a2)
#SAISDev2
Spark Operators
40
Operation Data Accumulator Zero Update Merge
Sum Numbers Number 0 a + x a1 + a2
Max Numbers Number -∞ max(a, x) max(a1, a2)
Average Numbers (sum, count) (0, 0) (sum + x, count + 1) (s1 + s2, c1 + c2)
Present
sum / count
#SAISDev2
Shhhhh...
41
Operation Data Accumulator Zero Update Merge
Sum Numbers Number 0 a + x a1 + a2
Max Numbers Number -∞ max(a, x) max(a1, a2)
Average Numbers (sum, count) (0, 0) (sum + x, count + 1) (s1 + s2, c1 + c2)
We’re secretly algebras!
#SAISDev2
Algebras are Pattern Checklists
42
Object Sets
(data types) ✔
#SAISDev2
Algebras are Pattern Checklists
43
Object Sets
(data types) ✔
Operations ✔
#SAISDev2
Algebras are Pattern Checklists
44
Object Sets
(data types) ✔
Operations ✔
Properties ✔
#SAISDev2
DataFrame Aggregations...
records.show(5)
+----------+---------+
| user_id|wordcount|
+----------+---------+
|6458791872| 12|
|7699035787| 5|
|2509155359| 9|
|9914782373| 18|
|7816616846| 12|
+----------+---------+
45
records.groupBy($"user_id")
.agg(avg($"wordcount").alias("avg"))
.orderBy($"avg".desc)
.show(5)
+----------+----+
| user_id| avg|
+----------+----+
|9438801796|42.0|
|0837938601|41.0|
|0004926696|40.0|
|7439949213|39.0|
|2505585758|39.0|
+----------+----+
#SAISDev2
DataFrame Aggregations...
records.show(5)
+----------+---------+
| user_id|wordcount|
+----------+---------+
|6458791872| 12|
|7699035787| 5|
|2509155359| 9|
|9914782373| 18|
|7816616846| 12|
+----------+---------+
46
records.groupBy($"user_id")
.agg(avg($"wordcount").alias("avg"))
.orderBy($"avg".desc)
.show(5)
+----------+----+
| user_id| avg|
+----------+----+
|9438801796|42.0|
|0837938601|41.0|
|0004926696|40.0|
|7439949213|39.0|
|2505585758|39.0|
+----------+----+
Are Algebras!
#SAISDev2
Aggregator Algebra
47
Data Type
#SAISDev2
Aggregator Algebra
48
Data Type
Accumulator Type
#SAISDev2
Aggregator Algebra
49
Data Type
Accumulator Type
Zero
#SAISDev2
Aggregator Algebra
50
Data Type
Accumulator Type
Zero
Update
#SAISDev2
Aggregator Algebra
51
Data Type
Accumulator Type
Zero
Update
Merge
#SAISDev2
Aggregator Algebra
52
Data Type
Accumulator Type
Zero
Update
Merge
Present
#SAISDev2
Merge Is Associative
53
10 13 7
2
3
5
5
3
5
2
3
2
(10 + 13) + 7 = 30
#SAISDev2
Merge Is Associative
54
10 13 7
2
3
5
5
3
5
2
3
2
(10 + 13) + 7 = 30
10 + (13 + 7) = 30
#SAISDev2
Merge Is (usually) Commutative
55
10 13 7
2
3
5
5
3
5
2
3
2
10 + 13 + 7 = 30
#SAISDev2
Merge Is (usually) Commutative
56
10 13 7
2
3
5
5
3
5
2
3
2
10 + 13 + 7 = 30
13 + 7 + 10 = 30
#SAISDev2
Merge Is (usually) Commutative
57
10 13 7
2
3
5
5
3
5
2
3
2
sum a1 + a2 = a2 + a1
max max(a1, a2) = max(a2, a1)
avg (s1+s2, n1+n2) = (s2+s1, n2+n1)
#SAISDev2
Structured Streaming
58
a 5
c 7
a 5
c 7
user wordcount
a 5
c 7
Aggregations
Logical
#SAISDev2
Structured Streaming
59
a 5
c 7
b 8
c 10
a 5
c 7
a 5
c 7
b 8
c 10
user wordcount
a 5
b 8
c 17
a 5
c 7 Aggregations
Logical
Discarded
#SAISDev2
Structured Streaming
60
a 5
c 7
b 8
c 10
a 9
b 6
a 5
c 7
a 5
c 7
b 8
c 10
user wordcount
a 5
c 7
b 8
c 10
a 9
b 6
a 14
b 14
c 17
a 5
b 8
c 17
a 5
c 7 Aggregations
Logical
Discarded
#SAISDev2
Structured Streaming
61
a 5
c 7
b 4
a 8
c 10
a 3
user wordcount
#SAISDev2
Structured Streaming
62
a 5
c 7
b 4
a 8
c 10
a 3
user wordcount
a 5, 8, 3
b 4
c 7, 10
group-by
#SAISDev2
Structured Streaming
63
a 5
c 7
b 4
a 8
c 10
a 3
user wordcount
a 5, 8, 3
b 4
c 7, 10
a 16
b 4
c 17
group-by
update
#SAISDev2
Structured Streaming
64
a 5
c 7
b 4
a 8
c 10
a 3
user wordcount
a 5, 8, 3
b 4
c 7, 10
a 5
b 8
c 20
a 16
b 4
c 17
a 21
b 12
c 37
group-by
update
merge
#SAISDev2
Structured Streaming
65
a 5
c 7
b 4
a 8
c 10
a 3
user wordcount
a 5, 8, 3
b 4
c 7, 10
a 5
b 8
c 20
a 16
b 4
c 17
a 21
b 12
c 37
group-by
update
merge
S
B
Algebras
#SAISDev2
Algebras Around Us
66
Business
Logic
#SAISDev2
Algebras Around Us
67
Business
Logic
Data Type
Accumulator
Zero
Update
Merge
Present
#SAISDev2
Algebras Around Us
68
Business
Logic
Data Type
Accumulator
Zero
Update
Merge
Present
Structured
Streaming
Query
#SAISDev2
Aggregating Quantiles
69
10
9
12
27
.
.
.
?
aggregating
query
The median of
my data is 11
The 90th
percentile of
my data is 25
#SAISDev2
Distribution Sketch: T-Digest
70
0
1 CDF
#SAISDev2
Distribution Sketch: T-Digest
71
q = 0.9
0
1 CDF
#SAISDev2
Distribution Sketch: T-Digest
72
q = 0.9
0
1
(x,q)
CDF
#SAISDev2
Distribution Sketch: T-Digest
73
q = 0.9
x is the 90th %-ile
0
1
(x,q)
CDF
#SAISDev2
Distribution Sketch: T-Digest
74
q = 0.9
x is the 90th %-ile
0
1
(x,q)
CDF
#SAISDev2
Is T-Digest an Algebra?
75
Data Type Numeric
#SAISDev2
Is T-Digest an Algebra?
76
Data Type Numeric
Accumulator Type T-Digest Sketch
#SAISDev2
Is T-Digest an Algebra?
77
Data Type Numeric
Accumulator Type T-Digest Sketch
Zero Empty T-Digest
#SAISDev2
Is T-Digest an Algebra?
78
Data Type Numeric
Accumulator Type T-Digest Sketch
Zero Empty T-Digest
Update tdigest + x
#SAISDev2
Is T-Digest an Algebra?
79
Data Type Numeric
Accumulator Type T-Digest Sketch
Zero Empty T-Digest
Update tdigest + x
Merge tdigest1 + tdigest2
#SAISDev2
Is T-Digest an Algebra?
80
Data Type Numeric
Accumulator Type T-Digest Sketch
Zero Empty T-Digest
Update tdigest + x
Merge tdigest1 + tdigest2
Present tdigest.cdfInverse(quantile)
#SAISDev2
Is T-Digest an Algebra?
81
Data Type Numeric
Accumulator Type T-Digest Sketch
Zero Empty T-Digest
Update tdigest + x
Merge tdigest1 + tdigest2
Present tdigest.cdfInverse(quantile)
YES!
#SAISDev2
User Defined Aggregator Function
82
val sketchCDF = tdigestUDAF[Double]
spark.udf.register("p50",
(c:Any)=>c.asInstanceOf[TDigestSQL].tdigest.cdfInverse(0.5))
spark.udf.register("p90",
(c:Any)=>c.asInstanceOf[TDigestSQL].tdigest.cdfInverse(0.9))
#SAISDev2
User Defined Aggregator Function
83
val sketchCDF = tdigestUDAF[Double]
spark.udf.register("p50",
(c:Any)=>c.asInstanceOf[TDigestSQL].tdigest.cdfInverse(0.5))
spark.udf.register("p90",
(c:Any)=>c.asInstanceOf[TDigestSQL].tdigest.cdfInverse(0.9))
#SAISDev2
Streaming Percentiles
84
val query = records
.writeStream //...
+---------+
|wordcount|
+---------+
| 12|
| 5|
| 9|
| 18|
| 12|
+---------+
val r = records.withColumn("time", current_timestamp())
.groupBy(window($”time”, “30 seconds”))
.agg(sketchCDF($"wordcount").alias("CDF"))
.select(callUDF("p50", $"CDF").alias("p50"),
callUDF("p90", $"CDF").alias("p90"))
val query = r.writeStream //...
+----+----+
| p50| p90|
+----+----+
|15.6|31.0|
|16.0|30.8|
|15.8|30.0|
|15.7|31.0|
|16.0|31.0|
+----+----+
#SAISDev2
Streaming Percentiles
85
val query = records
.writeStream //...
+---------+
|wordcount|
+---------+
| 12|
| 5|
| 9|
| 18|
| 12|
+---------+
val r = records.withColumn("time", current_timestamp())
.groupBy(window($”time”, “30 seconds”))
.agg(sketchCDF($"wordcount").alias("CDF"))
.select(callUDF("p50", $"CDF").alias("p50"),
callUDF("p90", $"CDF").alias("p90"))
val query = r.writeStream //...
+----+----+
| p50| p90|
+----+----+
|15.6|31.0|
|16.0|30.8|
|15.8|30.0|
|15.7|31.0|
|16.0|31.0|
+----+----+
#SAISDev2
Most-Frequent Items
86
#DogRates
#YOLO
#SAIRocks
#TruthBomb
#Mondays
?
aggregating
query
#tag frequency
#DogRates 1000000000
#Blockchain 780000
#TaylorSwift 650000
#SAISDev2
Heavy-Hitter Sketch
87
CountMin
Heap
I estimate
frequencies of
objects!
I store objects
in sorted order!
#SAISDev2
Heavy-Hitter Sketch
88
#DogRates
update
CountMin
Heap
#SAISDev2
Heavy-Hitter Sketch
89
#DogRates
update query
CountMin
Heap
#SAISDev2
Heavy-Hitter Sketch
90
#DogRates
update query
insert
CountMin
Heap
#SAISDev2
Heavy-Hitter Sketch
91
#DogRates
#tag frequency
#DogRates 1000000000
#Blockchain 780000
#TaylorSwift 650000
update query
insert
top k
CountMin
Heap
#SAISDev2
Heavy-Hitter Sketch
92
#DogRates
#tag frequency
#DogRates 1000000000
#Blockchain 780000
#TaylorSwift 650000
update query
insert
top k
CountMin
Heap
#SAISDev2
Is Heavy-Hitter an Algebra?
93
Data Type Items
#SAISDev2
Is Heavy-Hitter an Algebra?
94
Data Type Items
Accumulator Type
countmin sketch
heap
#SAISDev2
Is Heavy-Hitter an Algebra?
95
Data Type Items
Accumulator Type
countmin sketch
heap
Zero
empty countmin
empty heap
#SAISDev2
Is Heavy-Hitter an Algebra?
96
Data Type Items
Accumulator Type
countmin sketch
heap
Zero
empty countmin
empty heap
Update
countmin.update(x)
heap.insert(x, frequency)
#SAISDev2
Is Heavy-Hitter an Algebra?
97
Data Type Items
Accumulator Type
countmin sketch
heap
Zero
empty countmin
empty heap
Update
countmin.update(x)
heap.insert(x, frequency)
Merge
countmin1 + countmin2
update(heap1 + heap2)
#SAISDev2
Is Heavy-Hitter an Algebra?
98
Data Type Items
Accumulator Type
countmin sketch
heap
Zero
empty countmin
empty heap
Update
countmin.update(x)
heap.insert(x, frequency)
Merge
countmin1 + countmin2
update(heap1 + heap2)
Present heap.top(k)
#SAISDev2
Is Heavy-Hitter an Algebra?
99
Data Type Items
Accumulator Type
countmin sketch
heap
Zero
empty countmin
empty heap
Update
countmin.update(x)
heap.insert(x, frequency)
Merge
countmin1 + countmin2
update(heap1 + heap2)
Present heap.top(k)
YES!
#SAISDev2
Streaming Most-Frequent Items
100
val query = records
.writeStream //...
+----------+
| hashtag|
+----------+
| #Gardiner|
| #PETE|
| #Nutiva|
| #Fairfax|
| #Darcy|
+----------+
val windowBy60 =
windowing.windowBy[(Timestamp, String)](_._1, 60)
val top3 = new TopKAgg[(Timestamp, String)](_._2)
val r = records.withColumn("time", current_timestamp())
.as[(Timestamp, String)]
.groupByKey(windowBy60).agg(top3.toColumn)
.map { case (_, tk) => tk.toList.toString }
val query = r.writeStream //...
+------------------------------------------------------+
+ value |
+------------------------------------------------------+
|List((#Iddesleigh,3), (#Gardiner,2), (#Willoughby,2)) |
|List((#Elizabeth,3), (#HERPRESENT,1), (#upforhours,1))|
|List((#PETE,4), (#Nutiva,1), (#Fairfax,1)) |
+------------------------------------------------------+
#SAISDev2
Review
101
#SAISDev2
Review
102
#SAISDev2
Review
103
#SAISDev2
Review
104
#SAISDev2
Review
105
eje@redhat.com
@manyangled
1 of 105

Recommended

Forecasting Revenue With Stationary Time Series Models by
Forecasting Revenue With Stationary Time Series ModelsForecasting Revenue With Stationary Time Series Models
Forecasting Revenue With Stationary Time Series ModelsGeoffery Mullings
344 views12 slides
R演習補講 (2腕バンディット問題を題材に) by
R演習補講 (2腕バンディット問題を題材に)R演習補講 (2腕バンディット問題を題材に)
R演習補講 (2腕バンディット問題を題材に)Masatoshi Yoshida
3K views14 slides
Muhammad ariefnugraha 142014066_kode4 by
Muhammad ariefnugraha 142014066_kode4Muhammad ariefnugraha 142014066_kode4
Muhammad ariefnugraha 142014066_kode4Muhammad Nugraha
41 views65 slides
R v01 rprogamming_basic01 (R 프로그래밍 기본) by
R v01 rprogamming_basic01 (R 프로그래밍 기본)R v01 rprogamming_basic01 (R 프로그래밍 기본)
R v01 rprogamming_basic01 (R 프로그래밍 기본)BuskersBu
18 views32 slides
cs8project by
cs8projectcs8project
cs8projectKevin Zhang
230 views6 slides
Implementing qrcode by
Implementing qrcodeImplementing qrcode
Implementing qrcodeBoshan Sun
671 views63 slides

More Related Content

Similar to Extending Structured Streaming Made Easy with Algebra with Erik Erlandson

User Defined Aggregation in Apache Spark: A Love Story by
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
251 views50 slides
User Defined Aggregation in Apache Spark: A Love Story by
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
402 views50 slides
Beyond PHP - It's not (just) about the code by
Beyond PHP - It's not (just) about the codeBeyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the codeWim Godden
2.9K views56 slides
Algebra and Trigonometry 9th Edition Larson Solutions Manual by
Algebra and Trigonometry 9th Edition Larson Solutions ManualAlgebra and Trigonometry 9th Edition Larson Solutions Manual
Algebra and Trigonometry 9th Edition Larson Solutions Manualkejeqadaqo
2.6K views163 slides
Logistic Regression, Linear and Quadratic Discriminant Analysis and K-Nearest... by
Logistic Regression, Linear and Quadratic Discriminant Analysis and K-Nearest...Logistic Regression, Linear and Quadratic Discriminant Analysis and K-Nearest...
Logistic Regression, Linear and Quadratic Discriminant Analysis and K-Nearest...Tarek Dib
857 views8 slides
Introduction to R by
Introduction to RIntroduction to R
Introduction to RSander Kieft
496 views26 slides

Similar to Extending Structured Streaming Made Easy with Algebra with Erik Erlandson(20)

User Defined Aggregation in Apache Spark: A Love Story by Databricks
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
Databricks251 views
User Defined Aggregation in Apache Spark: A Love Story by Databricks
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
Databricks402 views
Beyond PHP - It's not (just) about the code by Wim Godden
Beyond PHP - It's not (just) about the codeBeyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the code
Wim Godden2.9K views
Algebra and Trigonometry 9th Edition Larson Solutions Manual by kejeqadaqo
Algebra and Trigonometry 9th Edition Larson Solutions ManualAlgebra and Trigonometry 9th Edition Larson Solutions Manual
Algebra and Trigonometry 9th Edition Larson Solutions Manual
kejeqadaqo2.6K views
Logistic Regression, Linear and Quadratic Discriminant Analysis and K-Nearest... by Tarek Dib
Logistic Regression, Linear and Quadratic Discriminant Analysis and K-Nearest...Logistic Regression, Linear and Quadratic Discriminant Analysis and K-Nearest...
Logistic Regression, Linear and Quadratic Discriminant Analysis and K-Nearest...
Tarek Dib857 views
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in... by Data Con LA
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...
Data Con LA871 views
GCC by Kir Chou
GCCGCC
GCC
Kir Chou3.8K views
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO by Altinity Ltd
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEODangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Altinity Ltd3.1K views
Beyond php - it's not (just) about the code by Wim Godden
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
Wim Godden957 views
SQL techniques for faster applications by Connor McDonald
SQL techniques for faster applicationsSQL techniques for faster applications
SQL techniques for faster applications
Connor McDonald127 views
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介 by Masayuki Matsushita
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
Remainder estimate for MathML integration.pdf by Igalia
Remainder estimate for MathML integration.pdfRemainder estimate for MathML integration.pdf
Remainder estimate for MathML integration.pdf
Igalia65 views
Beyond php - it's not (just) about the code by Wim Godden
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
Wim Godden403 views
Beyond php - it's not (just) about the code by Wim Godden
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
Wim Godden8 views
RDataMining slides-r-programming by Yanchang Zhao
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programming
Yanchang Zhao338 views
第5回 様々なファイル形式の読み込みとデータの書き出し(解答付き) by Wataru Shito
第5回 様々なファイル形式の読み込みとデータの書き出し(解答付き)第5回 様々なファイル形式の読み込みとデータの書き出し(解答付き)
第5回 様々なファイル形式の読み込みとデータの書き出し(解答付き)
Wataru Shito472 views

More from Databricks

DW Migration Webinar-March 2022.pptx by
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
4.3K views25 slides
Data Lakehouse Symposium | Day 1 | Part 1 by
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
1.5K views43 slides
Data Lakehouse Symposium | Day 1 | Part 2 by
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
739 views16 slides
Data Lakehouse Symposium | Day 4 by
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
1.8K views74 slides
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop by
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
6.3K views64 slides
Democratizing Data Quality Through a Centralized Platform by
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
1.4K views36 slides

More from Databricks(20)

DW Migration Webinar-March 2022.pptx by Databricks
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks4.3K views
Data Lakehouse Symposium | Day 1 | Part 1 by Databricks
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks1.5K views
Data Lakehouse Symposium | Day 1 | Part 2 by Databricks
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks739 views
Data Lakehouse Symposium | Day 4 by Databricks
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks1.8K views
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop by Databricks
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks6.3K views
Democratizing Data Quality Through a Centralized Platform by Databricks
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks1.4K views
Learn to Use Databricks for Data Science by Databricks
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks1.6K views
Why APM Is Not the Same As ML Monitoring by Databricks
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks743 views
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix by Databricks
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks689 views
Stage Level Scheduling Improving Big Data and AI Integration by Databricks
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks850 views
Simplify Data Conversion from Spark to TensorFlow and PyTorch by Databricks
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks1.8K views
Scaling your Data Pipelines with Apache Spark on Kubernetes by Databricks
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks2.1K views
Scaling and Unifying SciKit Learn and Apache Spark Pipelines by Databricks
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks667 views
Sawtooth Windows for Feature Aggregations by Databricks
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks605 views
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink by Databricks
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks675 views
Re-imagine Data Monitoring with whylogs and Spark by Databricks
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks550 views
Raven: End-to-end Optimization of ML Prediction Queries by Databricks
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks449 views
Processing Large Datasets for ADAS Applications using Apache Spark by Databricks
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks513 views
Massive Data Processing in Adobe Using Delta Lake by Databricks
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks719 views
Machine Learning CI/CD for Email Attack Detection by Databricks
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
Databricks389 views

Recently uploaded

TGP 2.docx by
TGP 2.docxTGP 2.docx
TGP 2.docxsandi636490
10 views8 slides
UNEP FI CRS Climate Risk Results.pptx by
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptxpekka28
11 views51 slides
Binder1.pdf by
Binder1.pdfBinder1.pdf
Binder1.pdfEstherSita2
10 views21 slides
Cross-network in Google Analytics 4.pdf by
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdfGA4 Tutorials
6 views7 slides
Short Story Assignment by Kelly Nguyen by
Short Story Assignment by Kelly NguyenShort Story Assignment by Kelly Nguyen
Short Story Assignment by Kelly Nguyenkellynguyen01
19 views17 slides
SUPER STORE SQL PROJECT.pptx by
SUPER STORE SQL PROJECT.pptxSUPER STORE SQL PROJECT.pptx
SUPER STORE SQL PROJECT.pptxkhan888620
12 views16 slides

Recently uploaded(20)

UNEP FI CRS Climate Risk Results.pptx by pekka28
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptx
pekka2811 views
Cross-network in Google Analytics 4.pdf by GA4 Tutorials
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdf
GA4 Tutorials6 views
Short Story Assignment by Kelly Nguyen by kellynguyen01
Short Story Assignment by Kelly NguyenShort Story Assignment by Kelly Nguyen
Short Story Assignment by Kelly Nguyen
kellynguyen0119 views
SUPER STORE SQL PROJECT.pptx by khan888620
SUPER STORE SQL PROJECT.pptxSUPER STORE SQL PROJECT.pptx
SUPER STORE SQL PROJECT.pptx
khan88862012 views
Ukraine Infographic_22NOV2023_v2.pdf by AnastosiyaGurin
Ukraine Infographic_22NOV2023_v2.pdfUkraine Infographic_22NOV2023_v2.pdf
Ukraine Infographic_22NOV2023_v2.pdf
AnastosiyaGurin1.4K views
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx by DataScienceConferenc1
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
Survey on Factuality in LLM's.pptx by NeethaSherra1
Survey on Factuality in LLM's.pptxSurvey on Factuality in LLM's.pptx
Survey on Factuality in LLM's.pptx
NeethaSherra15 views
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdf by vikas12611618
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdfVikas 500 BIG DATA TECHNOLOGIES LAB.pdf
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdf
vikas126116188 views
Advanced_Recommendation_Systems_Presentation.pptx by neeharikasingh29
Advanced_Recommendation_Systems_Presentation.pptxAdvanced_Recommendation_Systems_Presentation.pptx
Advanced_Recommendation_Systems_Presentation.pptx
CRIJ4385_Death Penalty_F23.pptx by yvettemm100
CRIJ4385_Death Penalty_F23.pptxCRIJ4385_Death Penalty_F23.pptx
CRIJ4385_Death Penalty_F23.pptx
yvettemm1006 views
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M... by DataScienceConferenc1
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation by DataScienceConferenc1
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
3196 The Case of The East River by ErickANDRADE90
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East River
ErickANDRADE9012 views

Extending Structured Streaming Made Easy with Algebra with Erik Erlandson