Liang Zhang (lzhang6@wpi.edu)
Data Science Dept.,
Worcester Polytechnic Institute
Spark-ITS:
Indexing for Large-Scale Time
Series Data on Spark
#SAISEco5
Prof. Elke A. Rundensteiner
Prof. Mohamed Y. Eltabakh
Liang Zhang
Noura Alghamdi
Data Science Research Group @ Worcester Polytechnic Institute
Liang Zhang, Noura Alghamdi, Mohamed Y. Eltabakh, Elke A. Rundensteiner. TARDIS: Distributed Indexing Framework for
Big Time Series Data. Proceedings of 35th IEEE International Conference on Data Engineering ICDE, 2019
Outline
• Motivation
• Background
• Spark-ITS Framework
– Overview
– Index Construction
– Query Processing
• Performance Evaluation
Time Series are Continuously Produced Everywhere
• How to deal with billions of time series?
• Examples: climate data, web log, stock price, EEG
Almost all Time Series Data Mining Tasks Rely on Similarity Query
• Tasks: clustering, motif discovery, classification, outlier detection
• Similarity query types: whole matching, subsequence matching
Esling, Philippe, and Carlos Agon. "Time-series data mining." ACM (CSUR) 45.1 (2012): 12.
Spark-ITS
• A new index tree and an effective signature that simplify cardinality conversion and better preserve similarity
• A distributed index framework that supports large-scale time series datasets
• Efficient algorithms for exact-match and approximate kNN query processing
Spark-ITS Overview
• Global Index
– 1. Sampling
– 2. Node Statistic
– 3. Build Index Tree
– 4. Assign Partition ID
• Partition
– 1. Read and convert data
– 2. Shuffle data
• Local Index
– 1. Construct Local Structure
– 2. Construct Bloom Filter
• Other labels in the figure: Indexed Data, Query
Background: iSAX Representation
Shieh, Jin, and Eamonn Keogh. "iSAX: indexing and mining terabyte sized time series." SIGKDD ACM, 2008.
Camerra, A., Palpanas, T., Shieh, J., & Keogh, E. "iSAX 2.0: Indexing and mining one billion time series." ICDM, 2010
PAA: Piecewise Aggregate Approximation
iSAX: indexable Symbolic Aggregate approXimation
• A time series of length 16
• PAA representation with 4 segments
• SAX representation with 4 segments and cardinality 4: [11, 10, 01, 00]
• iSAX representation with 4 segments and variable cardinality: [1, 1, 01, 0], with per-segment cardinalities 2, 2, 4, 2
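The chain from raw series to PAA to SAX word can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: it assumes the series is already z-normalized, uses the rounded standard-normal quartile breakpoints for cardinality 4, and the helper names (`paa`, `sax`, `reduce_cardinality`) are hypothetical. The toy series is chosen so that the result reproduces the word [11, 10, 01, 00] from the slide.

```python
from bisect import bisect_left

# Breakpoints cutting a standard normal distribution into 4 equiprobable
# regions (cardinality 4), rounded to two decimals.
BREAKPOINTS_CARD4 = [-0.67, 0.0, 0.67]

def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    if len(series) % segments != 0:
        raise ValueError("series length must be a multiple of the segment count")
    size = len(series) // segments
    return [sum(series[i:i + size]) / size for i in range(0, len(series), size)]

def sax(paa_values, breakpoints, bits):
    """Label each PAA value with the binary index of the region it falls in."""
    return [format(bisect_left(breakpoints, v), f"0{bits}b") for v in paa_values]

def reduce_cardinality(symbol, bits):
    """Moving to a coarser cardinality keeps only the leading bits of a symbol."""
    return symbol[:bits]

# Toy series of length 16 (assumed to be normalized already).
series = [1.0] * 4 + [0.3] * 4 + [-0.3] * 4 + [-1.0] * 4
word = sax(paa(series, 4), BREAKPOINTS_CARD4, bits=2)

# Variable-cardinality word: the third segment keeps 2 bits, the others 1 bit.
isax_word = [reduce_cardinality(s, 1) if i != 2 else s for i, s in enumerate(word)]
```

Here `word` is the cardinality-4 SAX word and `isax_word` the mixed-cardinality form; dropping trailing bits is what makes symbols of different cardinality comparable.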
Word-level Similarity
(Figure: series A, B and C plotted over time points 1 to 11, values from -2 to 2, against the 3-bit SAX regions 000 to 111.)
State-of-the-art: Character-level Similarity
A: [0, 0, 011, 1]
B: [0, 0, 010, 1]
C: [0, 0, 010, 1]
B and C are similar
Proposed: Word-level Similarity
A: [01, 01, 01, 10]
B: [00, 00, 01, 11]
C: [01, 01, 01, 10]
A and C are similar
New Index Tree Supports Word-level Similarity
State-of-the-art: iSAX Binary Tree (nodes shown, level by level; leaf and internal nodes are marked in the figure)
Root
Level 1: [0, 1, 0]  [1, 1, 1]  [0, 0, 0]
Level 2: [0, 11, 0]  [0, 10, 0]
Level 3: [0, 11, 01]  [0, 11, 00]
. . .
Proposed: iSAX-T K-ary Tree (nodes shown, level by level; leaf and internal nodes are marked in the figure)
Root
1st bit: [1, 1, 0]  [1, 1, 1]  [0, 0, 0]  [0, 0, 1]  . . .
2nd bit: [01, 00, 10]  [01, 01, 11]  [00, 00, 10]  . . .
3rd bit: [010, 000, 100]  [011, 001, 101]  . . .
iSAX-T(Transpose) Signature
Time series: [1100, 1101, 0110, 0001]
Transpose the bit matrix, then read each row as one hex digit:
1 1 0 0        1 1 0 0  = C
1 1 0 1   →    1 1 1 0  = E
0 1 1 0        0 0 1 0  = 2
0 0 0 1        0 1 0 1  = 5
iSAX-T signatures:
SAX(T,4,16) = {1100, 1101, 0110, 0001} = CE25
SAX(T,4,8)  = {110, 110, 011, 000}     = CE2
SAX(T,4,4)  = {11, 11, 01, 00}         = CE
SAX(T,4,2)  = {1, 1, 0, 0}             = C
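The transpose-then-hex step can be written directly. The sketch below is illustrative only; the function name `isax_t` and the input format (a list of equal-length bit strings, one per segment) are assumptions, and it requires the segment count to be a multiple of 4 so that each bit level maps to whole hex digits.

```python
def isax_t(word):
    """Transpose the bit matrix of a SAX word and hex-encode each bit level.

    `word` holds one bit string per segment, all of the same length.
    """
    if not word or len(word) % 4 != 0:
        raise ValueError("segment count must be a positive multiple of 4")
    bits = len(word[0])
    if any(len(symbol) != bits for symbol in word):
        raise ValueError("all symbols must have the same number of bits")
    hex_width = len(word) // 4  # hex digits needed per bit level
    signature = []
    for level in range(bits):
        # Row `level` of the transposed matrix: that bit of every segment.
        row = "".join(symbol[level] for symbol in word)
        signature.append(format(int(row, 2), f"0{hex_width}X"))
    return "".join(signature)

full = isax_t(["1100", "1101", "0110", "0001"])
coarse = isax_t(["11", "11", "01", "00"])
```

Because each bit level becomes its own hex digit, lowering the cardinality is a plain prefix operation on the signature: `coarse` is the first two characters of `full`.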
Global Index[1/4]: Sampling
(Figure: raw time-series values stored in blocks, from which blocks are sampled.)
Sampling → Map → Reduce (word-counting MapReduce process)
Time series → (iSAX-T(b bits), Freq: 1) → (iSAX-T(b bits), Freq(b bits))
Map output: (1256ae3e, 1), (0134ef45, 1), (234567ae, 1), (1256ae3e, 1), (1256ae3e, 1), (234567ae, 1), (237867ae, 1), (024567ae, 1), (6243371e, 1), ……, (452167ef, 1)
Reduce output: (1256ae3e, 23), (0134ef45, 2), (234567ae, 20), (024567ae, 4), (237867ae, 10), ……, (452167ef, 10)
• Segment number: 8, so 2 hex letters represent 1 bit level
• Initial cardinality: b-bit level
• Data sizes are based on 1 billion time series of length 256
• Data sizes shown along the pipeline (HDFS): 1 Terabyte, 100 G, 0.9 G, 0.1 G
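The word-counting step amounts to emitting (signature, 1) pairs and summing them per signature. The sketch below runs locally in plain Python rather than on Spark, and the function names are hypothetical; it only illustrates the shape of the map and reduce phases using a few of the signatures from the slide.

```python
from collections import Counter

def map_phase(signatures):
    """Emit one (signature, 1) pair per sampled time series."""
    return [(signature, 1) for signature in signatures]

def reduce_phase(pairs):
    """Sum the counts of identical signatures."""
    totals = Counter()
    for signature, count in pairs:
        totals[signature] += count
    return dict(totals)

# A handful of sampled signatures at the initial b-bit level.
sampled = ["1256ae3e", "0134ef45", "234567ae", "1256ae3e", "1256ae3e", "234567ae"]
freq_b = reduce_phase(map_phase(sampled))
```

On a cluster the same two phases would typically be expressed with the engine's own map and reduce-by-key operators; the local version keeps the example self-contained.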
Global Index[2/4]: Node Statistic
1st layer: (iSAX-T(b), Freq(b)) → Map → (iSAX-T(1), Freq(b)) → Reduce → [(iSAX-T(1), Freq(1))] → Judge: max(Freq(1)) → Filter
2nd layer: (iSAX-T(b), Freq(b)) → Map → (iSAX-T(2), Freq(b)) → Reduce → [(iSAX-T(2), Freq(2))] → Judge: max(Freq(2)) → Filter
3rd layer: (iSAX-T(b), Freq(b)) → Map → (iSAX-T(3), Freq(b)) → Reduce → [(iSAX-T(3), Freq(3))] → Judge: max(Freq(3))
……
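One way to read the layer-by-layer pipeline above is sketched here. The slide does not spell out the Judge and Filter rules, so this sketch assumes that Judge stops once the largest node fits into one partition and that Filter keeps only signatures under oversized nodes for the next layer; both rules, the function names and the toy data are assumptions. With 8 segments each bit level takes 2 hex letters, hence `letters_per_level=2`.

```python
from collections import Counter

def layer_statistics(freq_b, layer, letters_per_level=2):
    """Map: truncate each signature to `layer` bit levels. Reduce: sum counts."""
    prefix_len = layer * letters_per_level
    stats = Counter()
    for signature, freq in freq_b.items():
        stats[signature[:prefix_len]] += freq
    return dict(stats)

def collect_node_statistics(freq_b, capacity, max_layers, letters_per_level=2):
    """Return one {prefix: frequency} dict per layer, top layer first."""
    layers = []
    remaining = dict(freq_b)
    for layer in range(1, max_layers + 1):
        if not remaining:
            break
        stats = layer_statistics(remaining, layer, letters_per_level)
        layers.append(stats)
        # Judge (assumed rule): stop once every node fits in one partition.
        if max(stats.values()) <= capacity:
            break
        # Filter (assumed rule): only oversized nodes are refined further.
        prefix_len = layer * letters_per_level
        remaining = {s: f for s, f in remaining.items()
                     if stats[s[:prefix_len]] > capacity}
    return layers

freq_b = {"0201aa": 60, "0202bb": 50, "0301cc": 5}
layers = collect_node_statistics(freq_b, capacity=100, max_layers=3)
```

In this toy run node "02" exceeds the capacity of 100, so only its signatures reach the second layer, where both children fit and the loop stops.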
Global Index[3/4]: Build Tree
Segment number: 8
Partition Capacity: 100,000
Input (iSAX-T, Freq) pairs:
• 1st layer: (“01”, 512), (“02”, 355,000), …., (“ff”, 270,520)
• 2nd layer: (“0201”, 5,012), (“0202”, 100,550), …., (“ffff”, 10,520)
• 3rd layer: (“020201”, 12), (“020202”, 550), …., (“0202ff”, 620)
Resulting tree (iSAX-T, Freq):
Root
• 01, Freq: 512
• 02, Freq: 350,000
– 0201, Freq: 5,012
– 0202, Freq: 100,550 (children: 020201, Freq: 12; 020202, Freq: 550; . . . ; 0202ff, Freq: 620)
– . . .
– 02ff, Freq: 620
• 03, Freq: 4,352
• . . .
• ff, Freq: 270,520
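Because a child's signature extends its parent's by one bit level, the tree can be assembled from the per-layer pairs by prefix lookup. The following is a minimal sketch under that assumption; the dictionary-based node layout and the names `build_tree` and `leaves` are illustrative, and the frequencies are the ones shown in the tree figure.

```python
def build_tree(layers, letters_per_level=2):
    """Attach every (signature, freq) pair under the node holding its parent prefix."""
    root = {"signature": "", "freq": None, "children": {}}
    index = {"": root}
    for layer in layers:  # layers must be ordered from the first layer down
        for signature, freq in layer:
            parent = index.get(signature[:-letters_per_level])
            if parent is None:
                continue  # the parent was not kept, so this node is skipped
            node = {"signature": signature, "freq": freq, "children": {}}
            parent["children"][signature] = node
            index[signature] = node
    return root

def leaves(node):
    """Signatures of all leaf nodes below `node`, in insertion order."""
    if not node["children"]:
        return [node["signature"]]
    return [sig for child in node["children"].values() for sig in leaves(child)]

layers = [
    [("01", 512), ("02", 350000), ("03", 4352), ("ff", 270520)],
    [("0201", 5012), ("0202", 100550), ("02ff", 620)],
    [("020201", 12), ("020202", 550), ("0202ff", 620)],
]
tree = build_tree(layers)
leaf_signatures = leaves(tree)
```

The leaves are the nodes that later receive partition IDs.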
Global Index[4/4]: Assign Partition Id to Leaf Nodes
Bin Packing Problem: how to fit a set of nodes into the smallest number of partitions?
Partition capacity: 100,000
Node to pack: iSAX-T: 02, Freq: 390,500, with children
• 0201, Freq: 70,000
• 0202, Freq: 50,000
• 0203, Freq: 20,000
• 0204, Freq: 40,000
• 0205, Freq: 80,000
• 0206, Freq: 130,500
Packing shown in the figure:
• Partition ID: 1: 0202 (Freq: 50,000), 0204 (Freq: 40,000)
• Partition ID: 2: 0201 (Freq: 70,000), 0203 (Freq: 20,000)
• Partition ID: 3: 0205 (Freq: 80,000)
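The slide names the problem but not the heuristic used to solve it. As an illustration only, the sketch below applies first-fit decreasing, a standard bin-packing heuristic swapped in here for concreteness; it also assumes that a node larger than the capacity (such as 0206) is not packed because it would be split further. The function name is hypothetical.

```python
def first_fit_decreasing(node_freqs, capacity):
    """Pack nodes into partitions: largest first, into the first bin with room."""
    bins = []  # each bin: {"load": total frequency, "nodes": [signatures]}
    ordered = sorted(node_freqs.items(), key=lambda item: item[1], reverse=True)
    for signature, freq in ordered:
        if freq > capacity:
            continue  # assumption: oversized nodes are split further, not packed
        for current in bins:
            if current["load"] + freq <= capacity:
                current["load"] += freq
                current["nodes"].append(signature)
                break
        else:
            bins.append({"load": freq, "nodes": [signature]})
    # Partition IDs are numbered from 1, as on the slide.
    return {pid: current["nodes"] for pid, current in enumerate(bins, start=1)}

children_of_02 = {
    "0201": 70000, "0202": 50000, "0203": 20000,
    "0204": 40000, "0205": 80000, "0206": 130500,
}
assignment = first_fit_decreasing(children_of_02, capacity=100000)
```

For this input the heuristic also needs three partitions, although it groups the nodes differently from the figure; any packing that respects the capacity with the same partition count is equally good for this purpose.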
Repartition: Wrap Global Index as the Partitioner
Global index with partition IDs (iSAX-T, Freq, pid):
Root
• 01, Freq: 512, pid: 1
• 02, Freq: 350,000, pid: 5,6,7
– 0201, Freq: 5,012, pid: 5
– 0202, Freq: 100,550, pid: 6,7 (children: 020201, Freq: 12, pid: 6; 020202, Freq: 550, pid: 6; . . . ; 0202ff, Freq: 620)
– . . .
– 02ff, Freq: 620, pid: 5
• 03, Freq: 4,352, pid: 1
• . . .
• ff, Freq: 360,520, pid: 10,11,12
Each time series is routed by its signature, e.g. iSAX-T: 0202ff45, TS: [0.34, 0.31, 1.14…]
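Routing a record then reduces to finding the leaf of the global index whose signature is a prefix of the record's signature. The sketch below assumes the leaves have been flattened into a dictionary from leaf signature to partition ID (values taken from the figure where shown); the function name and the fallback of returning `None` for unknown prefixes are assumptions.

```python
def partition_id(signature, leaf_pids, letters_per_level=2):
    """Return the pid of the longest leaf prefix of `signature`, or None."""
    for end in range(len(signature), 0, -letters_per_level):
        pid = leaf_pids.get(signature[:end])
        if pid is not None:
            return pid
    return None  # no leaf covers this signature

# Leaf nodes and partition IDs as shown in the figure.
leaf_pids = {"01": 1, "03": 1, "0201": 5, "02ff": 5, "020201": 6, "020202": 6}

pid_a = partition_id("020201ab", leaf_pids)
pid_b = partition_id("0201cd12", leaf_pids)
pid_c = partition_id("aa000000", leaf_pids)
```

In PySpark a function of this shape could be supplied as the `partitionFunc` argument of `RDD.partitionBy`, which is one plausible way to wrap the global index as a partitioner.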
Local Index: Construction Within Each Partition
Partition capacity: 100,000
Node split threshold: 1000
Segment number: 8
(Figure: the time series in one partition are used to build two structures, the Local Index and a Bloom Filter.)
Local Index (nodes shown):
Root, Freq: 90,990
• abcd, Freq: 5000
– abcd12, Freq: 1010 (children: abcd1201, Freq: 42; abcd12ff, Freq: 30; ……)
– abcd45, Freq: 96
– …
• ab3c, Freq: 450
• ab45, Freq: 4000
• ……
Leaf node entries: (iSAX-T, ts, rid)
Bloom Filter: built from the signatures, e.g. iSAX-T: abcd12ef34, …. (+1, +1, +1)
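A Bloom filter over the signatures of one partition can be sketched as follows. This is a generic textbook construction, not the talk's code: the bit-array size, the number of hash functions and the use of seeded SHA-256 digests are all assumptions chosen to keep the example short and deterministic.

```python
import hashlib

class BloomFilter:
    """Set-membership filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive `num_hashes` bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bloom = BloomFilter()
for signature in ["abcd12ef34", "abcd1201aa", "ab45ff0012"]:
    bloom.add(signature)
```

A negative answer from `might_contain` is definitive, which is what lets a query skip the local index entirely; a positive answer still has to be verified against the stored records.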
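Exact Matching Query
• Query: converted to its signature (example shown: 000*,001*,001*, iSAX-T: 002)
• Master: the Global Index returns the partition ID (Pid: 6)
• Worker (Partition 6):
– Bloom Filter: Exist? No: the query has no match. Yes: continue with the Local Index
– Local Index: check the records in the leaf node for euDist = 0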
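The worker-side part of the exact-match flow can be sketched as a membership test followed by a scan of one leaf node. In this illustration `membership` is any object supporting `in` and stands in for the partition's Bloom filter, and `leaf_records` stands in for the records of the leaf found through the local index; both names, and the record layout (signature, ts, rid), are assumptions based on the labels in the figures.

```python
import math

def eu_dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_match(query_ts, query_signature, membership, leaf_records):
    """Return the record ids whose series equal the query exactly."""
    # Filter step: a negative membership answer means no match can exist.
    if query_signature not in membership:
        return []
    # Verification step: a distance of 0 means the series are identical.
    return [rid for _, ts, rid in leaf_records
            if len(ts) == len(query_ts) and eu_dist(ts, query_ts) == 0]

membership = {"abcd12ef"}  # stand-in for the Bloom filter
leaf_records = [
    ("abcd12ef", [0.34, 0.31, 1.14], 7),
    ("abcd12ef", [0.34, 0.31, 1.15], 8),
]
hits = exact_match([0.34, 0.31, 1.14], "abcd12ef", membership, leaf_records)
misses = exact_match([0.34, 0.31, 1.14], "ffff0000", membership, leaf_records)
```

The distance check is still needed after a positive filter answer, since different series can share one signature.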
kNN Approximate Query: One Partition Access
• Query: example shown 000*,001*,001*, iSAX-T: 002
• Master: the iSAX-T Skeleton returns the partition ID (Pid: 6)
• Worker (Partition 6), Local Index:
– Records in leaf/internal node: 1. euDist, 2. sort, 3. Top(K) dist as threshold
– Records in leaf/internal node: 1. euDist, 2. sort, 3. take Top(K)
• Result: (euDist, rid)
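The three steps listed for a node (compute euDist, sort, take Top(K)) can be sketched as below. The record layout (ts, rid) and the function names are assumptions; `heapq.nsmallest` is used as a shortcut for sorting and truncating.

```python
import heapq
import math

def eu_dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_in_node(query, records, k):
    """Return the k nearest records of one node as (euDist, rid) pairs."""
    scored = ((eu_dist(query, ts), rid) for ts, rid in records)
    return heapq.nsmallest(k, scored)  # ascending by distance

records = [([0.0, 0.0], "a"), ([1.0, 1.0], "b"), ([3.0, 4.0], "c")]
top2 = knn_in_node([0.0, 0.0], records, k=2)
threshold = top2[-1][0]  # k-th best distance, usable as a pruning threshold
```

The last distance in the result is the Top(K) distance that the slide uses as a threshold for further candidates.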
kNN Approximate Query: Multi-Partitions Access
• Query: example shown 000*,001*,001*, iSAX-T: 002
• Master: the iSAX-T Skeleton returns Partition 6 and the Sibling Pid List
• Worker, Local Index of Partition 6: Records in leaf/internal node: 1. euDist, 2. sort, 3. Top(K) dist as threshold
• Worker, Local Index of the sibling partitions: Records in leaf/internal node: 1. euDist, 2. sort, 3. take Top(K)
• Result: (euDist, rid)
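One possible reading of the multi-partition flow is a two-pass search: the query's own partition yields a Top(K) distance that serves as a threshold, and the sibling partitions then contribute only records within that threshold. The sketch below implements that reading locally; the two-pass interpretation, the function names and the record layout (ts, rid) are assumptions rather than details confirmed by the slides.

```python
import heapq
import math

def eu_dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_multi_partition(query, home_records, sibling_partitions, k):
    """Approximate kNN over a home partition plus its sibling partitions."""
    # Pass 1: the k-th best distance in the home partition becomes the threshold.
    best = heapq.nsmallest(k, ((eu_dist(query, ts), rid) for ts, rid in home_records))
    threshold = best[-1][0] if len(best) == k else math.inf
    # Pass 2: siblings contribute only records within the threshold.
    for records in sibling_partitions:
        for ts, rid in records:
            dist = eu_dist(query, ts)
            if dist <= threshold:
                best.append((dist, rid))
    # Merge: sort all candidates and keep the top k.
    return heapq.nsmallest(k, best)

home = [([5.0], "h1"), ([1.0], "h2")]
siblings = [[([0.5], "s1"), ([9.0], "s2")]]
result = knn_multi_partition([0.0], home, siblings, k=2)
```

In this toy run the sibling record "s1" displaces "h1", while "s2" is pruned by the threshold without entering the candidate list.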
Experimental Setup
Datasets
Dataset            Size          Length
Random Walk        1 billion     256
Texmex [1]         1 billion     128
DNA [2]            200 million   192
Noaa Climate [3]   200 million   64
• The datasets are normalized
• Each point is stored as a float
Source:
[1] http://corpus-texmex.irisa.fr/
[2] https://genmone.ucsc.edu
[3] https://www.ncdc.gov/

HW&SW Configuration
Spark      2.0.2, Standalone mode
Hadoop     2.7.3
Platform   Ubuntu 16.04 LTS
HW         2 nodes, each node consisting of 56 Xeon E5 processors, 500G RAM, 7TB SATA hard drive

Baseline (State-of-the-Art): Yagoubi, Djamel-Edine, et al. "DPiSAX: Massively Distributed Partitioned iSAX." ICDM 2017
The baseline system uses its default initial cardinality; it needs a large initial value to guarantee enough bit levels for binary splits.

Parameter                                  Baseline   Spark-ITS
Initial cardinality                        512        64
Word length                                8          8
Sampling percent                           10%        10%
Leaf node split threshold of Local index   1000       1000
Index Construction Time
(Figure: index construction time in minutes, split into Global Index and Local Index, against #Time Series from 200m to 1b, for Spark-ITS and the State-of-the-Art baseline. Labeled values: 2,323 and 334. Annotation: 80+%. Dataset: Random Walk Benchmark.)
Index Construction Time: Breakdown
(Figure, two panels: Spark-ITS vs. the State-of-the-Art baseline at 200m, 400m, 600m, 800m and 1b time series, time in minutes. Dataset: Random Walk Benchmark.)
• Global Index Time Breakdown: sampling, statistic, build index, assign Pid
• Repartition and Local Index Time Breakdown: iSAX read and convert, shuffle and build Local index
Exact Matching Query
(Figure: exact matching query performance, Spark-ITS vs. the State-of-the-Art.)
kNN-Approximate Query Performance
(Figure, two panels: Spark-ITS vs. the State-of-the-Art on RandomWalk (400m), Texmex (400m), DNA (200m) and Noaa (200m) time series.)
• Recall: axis from 0% to 60%; annotations ✕30, ✕35, ✕28, ✕72
• Error Ratio: axis from 1.0 to 2.2; annotations 27%, 26%, 48%, 34%
Conclusion
• Index Tree
– Large fan-out decreases the depth of leaf nodes
– Better preserves similarity at the word level
– The signature simplifies cardinality conversion
• Spark-ITS: Index Construction
– Block sampling and node statistic collection build the global index quickly
– Local indices are built synchronously within a partition
– Index construction is 80+% faster
• Spark-ITS: Query
– Exact Matching: query time decreases by 50%
– kNN approximate: accuracy increases more than 10-fold
Acknowledge Funding from...
Xianjin Tech Co., Ltd.
Saudi Arabian Cultural Mission
WPI Computer Science Dept.,
NSF CNS: 305258 II-EN
NSF CRI: 0551584

More Related Content

Similar to Spark-ITS: Indexing for Large-Scale Time Series Data on Spark with Liang Zhang

Finite element analysis and experimental investigations on small size wind tu...
Finite element analysis and experimental investigations on small size wind tu...Finite element analysis and experimental investigations on small size wind tu...
Finite element analysis and experimental investigations on small size wind tu...iaemedu
 
USEFUL FORMULAS Measures of risk Expected returns .docx
USEFUL FORMULAS Measures of risk Expected returns  .docxUSEFUL FORMULAS Measures of risk Expected returns  .docx
USEFUL FORMULAS Measures of risk Expected returns .docxjessiehampson
 
EENG 3305Linear Circuit Analysis IIFinal ExamDecembe.docx
EENG 3305Linear Circuit Analysis IIFinal ExamDecembe.docxEENG 3305Linear Circuit Analysis IIFinal ExamDecembe.docx
EENG 3305Linear Circuit Analysis IIFinal ExamDecembe.docxjack60216
 
Rethinking Technical Analysis
Rethinking Technical AnalysisRethinking Technical Analysis
Rethinking Technical AnalysisThomas Starke
 
AP Statistics - Confidence Intervals with Means - One Sample
AP Statistics - Confidence Intervals with Means - One SampleAP Statistics - Confidence Intervals with Means - One Sample
AP Statistics - Confidence Intervals with Means - One SampleFrances Coronel
 
Project x1.1 Presentation
Project x1.1 PresentationProject x1.1 Presentation
Project x1.1 PresentationJoseph Sroka
 
PV, FV, & Annuity tables
PV, FV, & Annuity tablesPV, FV, & Annuity tables
PV, FV, & Annuity tablesDr. Amir Ikram
 
Tablice rozkładu t-studenta
Tablice rozkładu t-studentaTablice rozkładu t-studenta
Tablice rozkładu t-studentataskbook
 
Socios agraciados para el Córdoba-Albacete de Copa
Socios agraciados para el Córdoba-Albacete de CopaSocios agraciados para el Córdoba-Albacete de Copa
Socios agraciados para el Córdoba-Albacete de Copacordopolis
 
Como se utiliza la tabla t de student (formulas)
Como se utiliza la tabla t de student (formulas)Como se utiliza la tabla t de student (formulas)
Como se utiliza la tabla t de student (formulas)Mayi Virgi Marcelo Santos
 
Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems
Explaining Online Reinforcement Learning Decisions of Self-Adaptive SystemsExplaining Online Reinforcement Learning Decisions of Self-Adaptive Systems
Explaining Online Reinforcement Learning Decisions of Self-Adaptive SystemsAndreas Metzger
 
Chapter 3 Experimental Errors Statistics
Chapter 3 Experimental Errors  StatisticsChapter 3 Experimental Errors  Statistics
Chapter 3 Experimental Errors Statisticswanafif8
 
PV Project Development Pre-Feasibility Financial Model
PV Project Development Pre-Feasibility Financial ModelPV Project Development Pre-Feasibility Financial Model
PV Project Development Pre-Feasibility Financial ModelFadi Maalouf, PMP
 
Financial_Management_Class_Notes (1).pdf
Financial_Management_Class_Notes (1).pdfFinancial_Management_Class_Notes (1).pdf
Financial_Management_Class_Notes (1).pdfSIMBARASHEMABHEKA
 
Financial_Management_Class_Notes.pdf
Financial_Management_Class_Notes.pdfFinancial_Management_Class_Notes.pdf
Financial_Management_Class_Notes.pdfSIMBARASHEMABHEKA
 
Mathematics- IB Math Studies IA
Mathematics- IB Math Studies IAMathematics- IB Math Studies IA
Mathematics- IB Math Studies IAalipinedo
 
Homework 3: Metode Schrenk dan Gaya Dalam (Schrenk’s Method and Internal Forces)
Homework 3: Metode Schrenk dan Gaya Dalam (Schrenk’s Method and Internal Forces)Homework 3: Metode Schrenk dan Gaya Dalam (Schrenk’s Method and Internal Forces)
Homework 3: Metode Schrenk dan Gaya Dalam (Schrenk’s Method and Internal Forces)Sayogyo Rahman Doko
 
Weight chart for_hexagon_bolts_&_nuts
Weight chart for_hexagon_bolts_&_nutsWeight chart for_hexagon_bolts_&_nuts
Weight chart for_hexagon_bolts_&_nutsManish Thakur
 

Similar to Spark-ITS: Indexing for Large-Scale Time Series Data on Spark with Liang Zhang (20)

Finite element analysis and experimental investigations on small size wind tu...
Finite element analysis and experimental investigations on small size wind tu...Finite element analysis and experimental investigations on small size wind tu...
Finite element analysis and experimental investigations on small size wind tu...
 
USEFUL FORMULAS Measures of risk Expected returns .docx
USEFUL FORMULAS Measures of risk Expected returns  .docxUSEFUL FORMULAS Measures of risk Expected returns  .docx
USEFUL FORMULAS Measures of risk Expected returns .docx
 
EENG 3305Linear Circuit Analysis IIFinal ExamDecembe.docx
EENG 3305Linear Circuit Analysis IIFinal ExamDecembe.docxEENG 3305Linear Circuit Analysis IIFinal ExamDecembe.docx
EENG 3305Linear Circuit Analysis IIFinal ExamDecembe.docx
 
Rethinking Technical Analysis
Rethinking Technical AnalysisRethinking Technical Analysis
Rethinking Technical Analysis
 
AP Statistics - Confidence Intervals with Means - One Sample
AP Statistics - Confidence Intervals with Means - One SampleAP Statistics - Confidence Intervals with Means - One Sample
AP Statistics - Confidence Intervals with Means - One Sample
 
Project x1.1 Presentation
Project x1.1 PresentationProject x1.1 Presentation
Project x1.1 Presentation
 
PV, FV, & Annuity tables
PV, FV, & Annuity tablesPV, FV, & Annuity tables
PV, FV, & Annuity tables
 
Tablice rozkładu t-studenta
Tablice rozkładu t-studentaTablice rozkładu t-studenta
Tablice rozkładu t-studenta
 
Socios agraciados para el Córdoba-Albacete de Copa
Socios agraciados para el Córdoba-Albacete de CopaSocios agraciados para el Córdoba-Albacete de Copa
Socios agraciados para el Córdoba-Albacete de Copa
 
Tabelas
TabelasTabelas
Tabelas
 
Como se utiliza la tabla t de student (formulas)
Como se utiliza la tabla t de student (formulas)Como se utiliza la tabla t de student (formulas)
Como se utiliza la tabla t de student (formulas)
 
Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems
Explaining Online Reinforcement Learning Decisions of Self-Adaptive SystemsExplaining Online Reinforcement Learning Decisions of Self-Adaptive Systems
Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems
 
Chapter 3 Experimental Errors Statistics
Chapter 3 Experimental Errors  StatisticsChapter 3 Experimental Errors  Statistics
Chapter 3 Experimental Errors Statistics
 
Present value tables
Present value tablesPresent value tables
Present value tables
 
PV Project Development Pre-Feasibility Financial Model
PV Project Development Pre-Feasibility Financial ModelPV Project Development Pre-Feasibility Financial Model
PV Project Development Pre-Feasibility Financial Model
 
Financial_Management_Class_Notes (1).pdf
Financial_Management_Class_Notes (1).pdfFinancial_Management_Class_Notes (1).pdf
Financial_Management_Class_Notes (1).pdf
 
Financial_Management_Class_Notes.pdf
Financial_Management_Class_Notes.pdfFinancial_Management_Class_Notes.pdf
Financial_Management_Class_Notes.pdf
 
Mathematics- IB Math Studies IA
Mathematics- IB Math Studies IAMathematics- IB Math Studies IA
Mathematics- IB Math Studies IA
 
Homework 3: Metode Schrenk dan Gaya Dalam (Schrenk’s Method and Internal Forces)
Homework 3: Metode Schrenk dan Gaya Dalam (Schrenk’s Method and Internal Forces)Homework 3: Metode Schrenk dan Gaya Dalam (Schrenk’s Method and Internal Forces)
Homework 3: Metode Schrenk dan Gaya Dalam (Schrenk’s Method and Internal Forces)
 
Weight chart for_hexagon_bolts_&_nuts
Weight chart for_hexagon_bolts_&_nutsWeight chart for_hexagon_bolts_&_nuts
Weight chart for_hexagon_bolts_&_nuts
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Spark-ITS: Indexing for Large-Scale Time Series Data on Spark with Liang Zhang

  • 1. Liang Zhang (lzhang6@wpi.edu) Data Science Dept., Worcester Polytechnic Institute Spark-ITS: Indexing for Large-Scale Time Series Data on Spark #SAISEco5
  • 2. Prof. Elke A. Rundensteiner Prof. Mohamed Y. Eltabakh Liang Zhang Noura Alghamdi Data Science Research Group @ Worcester Polytechnic Institute Liang Zhang, Noura Alghamdi, Mohamed Y. Eltabakh, Elke A. Rundensteiner. TARDIS: Distributed Indexing Framework for Big Time Series Data. Proceedings of 35th IEEE International Conference on Data Engineering ICDE, 2019
  • 3. Outline • Motivation • Background • Spark-ITS Framework – Overview – Index Construction – Query Processing • Performance Evaluation
  • 4. Time Series are Continuously Produced Everywhere • How to deal with billions of time series? Examples: climate data, web logs, stock prices, EEG
  • 5. Almost all Time Series Data Mining Tasks rely on Similarity Query. Tasks: Clustering, Motif Discovery, Classification, Outlier Detection. Query types: Whole Matching, Subsequence Matching. Esling, Philippe, and Carlos Agon. "Time-series data mining." ACM (CSUR) 45.1 (2012): 12.
  • 6. Spark-ITS • A new Index Tree and an effective Signature that simplify cardinality conversion and better preserve similarity • A Distributed Index Framework to support large-scale time series datasets • Efficient algorithms for processing Exact Match and kNN Approximate queries
  • 7. Spark-ITS Overview. Components: Global Index, Partition, Local Index, Indexed Data, Query. Global Index: 1. Sampling 2. Node Statistic 3. Build Index Tree 4. Assign Partition ID. Partition: 1. Read and convert data 2. Shuffle data. Local Index: 1. Construct Local Structure 2. Construct Bloom Filter
  • 8. Background: iSAX Representation. PAA: Piecewise Aggregate Approximation. iSAX: indexable Symbolic Aggregate approXimation. Example: a time series of length 16; its PAA representation with 4 segments; its SAX representation with 4 segments and cardinality 4: [11, 10, 01, 00]; its iSAX representation with 4 segments and variable cardinality: [1₂, 1₂, 01₄, 0₂]. Shieh, Jin, and Eamonn Keogh. "iSAX: indexing and mining terabyte sized time series." SIGKDD ACM, 2008. Camerra, A., Palpanas, T., Shieh, J., & Keogh, E. "iSAX 2.0: Indexing and mining one billion time series." ICDM, 2010
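The PAA, SAX and iSAX steps on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a z-normalized series whose length is divisible by the segment count, a power-of-two cardinality, and the usual SAX convention of equiprobable breakpoints under a standard normal distribution. The function names are hypothetical.

```python
from statistics import NormalDist, mean


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    if len(series) % segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by segments")
    width = len(series) // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def sax_breakpoints(cardinality):
    """Cut points that split a standard normal into equiprobable regions."""
    return [NormalDist().inv_cdf(i / cardinality) for i in range(1, cardinality)]


def sax(series, segments, cardinality):
    """SAX word as bit strings; region 0 is the lowest value range."""
    bits = (cardinality - 1).bit_length()  # assumes cardinality is a power of two
    cuts = sax_breakpoints(cardinality)
    word = []
    for value in paa(series, segments):
        region = sum(value > cut for cut in cuts)
        word.append(format(region, f"0{bits}b"))
    return word


def reduce_cardinality(symbol, bits):
    """iSAX property: a lower-cardinality symbol is a bit prefix of the higher one."""
    return symbol[:bits]


# A made-up series of length 16 that reproduces the slide's SAX word.
series = [1.5] * 4 + [0.3] * 4 + [-0.3] * 4 + [-1.5] * 4
word = sax(series, segments=4, cardinality=4)
mixed = [reduce_cardinality(s, b) for s, b in zip(word, [1, 1, 2, 1])]
print(word, mixed)
```

With this input the SAX word is `['11', '10', '01', '00']`, and keeping 1, 1, 2 and 1 bits per segment gives `['1', '1', '01', '0']`, the variable-cardinality form shown on the slide.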
  • 9. Word-level Similarity. [Figure: time series A, B and C plotted over SAX regions 000 to 111, shown once per approach.] State-of-the-art, character-level similarity: A: [0₁, 0₁, 011₃, 1₁], B: [0₁, 0₁, 010₃, 1₁], C: [0₁, 0₁, 010₃, 1₁], so B and C are similar. Proposed, word-level similarity: A: [01₂, 01₂, 01₂, 10₂], B: [00₂, 00₂, 01₂, 11₂], C: [01₂, 01₂, 01₂, 10₂], so A and C are similar.
  • 10. New Index Tree Supports Word-level Similarity. State-of-the-art: iSAX Binary Tree, which refines one segment by one bit per split (1st bit, 2nd bit, 3rd bit); node labels shown: Root; [0, 1, 0], [1, 1, 1], [0, 0, 0]; [0, 11, 0], [0, 10, 0]; [0, 11, 01], [0, 11, 00]. Proposed: iSAX-T K-ary Tree, which refines every segment by one bit per layer; node labels shown: Root; [1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1]; [01, 00, 10], [01, 01, 11], [00, 00, 10]; [010, 000, 100], [011, 001, 101]. Both trees distinguish internal nodes from leaf nodes.
  • 11. iSAX-T (Transpose) Signature. Time series: [1100, 1101, 0110, 0001]. Transpose: the rows 1100, 1101, 0110, 0001 become the bit-level rows 1100, 1110, 0010, 0101. Hex: C, E, 2, 5. iSAX-T signatures: SAX(T,4,16) = {1100, 1101, 0110, 0001} = CE25; SAX(T,4,8) = {110, 110, 011, 000} = CE2; SAX(T,4,4) = {11, 11, 01, 00} = CE; SAX(T,4,2) = {1, 1, 0, 0} = C
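The transpose-then-hex construction on this slide can be sketched in a few lines. This is an illustrative reading of the slide, not the authors' implementation; `isax_t` is a hypothetical name, and the sketch assumes every symbol has the same number of bits and that the segment count is a multiple of 4, so that each bit level maps to whole hex letters.

```python
def isax_t(sax_word):
    """Transpose equal-width SAX symbols and hex-encode each bit level."""
    bits = len(sax_word[0])
    if any(len(symbol) != bits for symbol in sax_word):
        raise ValueError("all symbols must have the same number of bits")
    if len(sax_word) % 4 != 0:
        raise ValueError("this sketch assumes the segment count is a multiple of 4")
    hex_width = len(sax_word) // 4  # one hex letter per 4 segments
    signature = []
    for level in range(bits):
        # Collect the bit at this level from every segment (one column of the word).
        column = "".join(symbol[level] for symbol in sax_word)
        signature.append(format(int(column, 2), f"0{hex_width}X"))
    return "".join(signature)


word = ["1100", "1101", "0110", "0001"]
full = isax_t(word)
print(full)  # CE25
# Lowering the cardinality by one bit level is a plain prefix of the signature.
print(isax_t([s[:3] for s in word]) == full[:3])  # True
```

The second check shows why the signature simplifies cardinality conversion: dropping the last bit of every symbol corresponds to dropping the last hex group of the signature.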
  • 12. Outline • Motivation • Background • Spark-ITS Framework – Overview – Index Construction – Query Processing • Performance Evaluation
  • 13. Global Index [1/4]: Sampling. Pipeline: HDFS → Sampling → Map → Reduce (a word-counting MapReduce process). Map: time series → (iSAX-T(b bits), Freq: 1), e.g. (1256ae3e, 1), (0134ef45, 1), (234567ae, 1), (1256ae3e, 1), (1256ae3e, 1), (234567ae, 1), (237867ae, 1), (024567ae, 1), (6243371e, 1), ……, (452167ef, 1). Reduce: (iSAX-T(b bits), Freq(b bits)), e.g. (1256ae3e, 23), (0134ef45, 2), (234567ae, 20), (024567ae, 4), (237867ae, 10), ……, (452167ef, 10). Segment number: 8, so 2 hex letters represent 1 bit level. Initial cardinality: b bit level. Data sizes shown along the pipeline, based on 1 billion time series of length 256: 1 Terabyte, 100 G, 0.9 G, 0.1 G.
  • 14. Global Index [2/4]: Node Statistic. 1st layer: (iSAX-T(b), Freq(b)) → Map → (iSAX-T(1), Freq(b)) → Reduce → [(iSAX-T(1), Freq(1))] → Judge max(Freq(1)). Filter. 2nd layer: (iSAX-T(b), Freq(b)) → Map → (iSAX-T(2), Freq(b)) → Reduce → [(iSAX-T(2), Freq(2))] → Judge max(Freq(2)). Filter. 3rd layer: (iSAX-T(b), Freq(b)) → Map → (iSAX-T(3), Freq(b)) → Reduce → [(iSAX-T(3), Freq(3))] → Judge max(Freq(3)). ……
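The layer-by-layer statistics can be sketched with plain Python dictionaries standing in for the Map and Reduce steps. Several points are assumptions rather than statements from the slides: that "Judge" stops once no node exceeds the partition capacity, that "Filter" keeps only the signatures under nodes that still exceed it, and that sampled frequencies are compared to the capacity directly. `HEX_PER_LEVEL = 2` follows the slides' segment number of 8; the function names are hypothetical.

```python
from collections import Counter

HEX_PER_LEVEL = 2  # 8 segments -> 8 bits per bit level -> 2 hex letters


def layer_statistics(freq_b, layer):
    """Roll (iSAX-T(b), Freq(b)) pairs up to (iSAX-T(layer), Freq(layer))."""
    stats = Counter()
    for signature, freq in freq_b.items():
        # Map: truncate the signature to this layer; Reduce: sum the frequencies.
        stats[signature[:layer * HEX_PER_LEVEL]] += freq
    return stats


def collect_layers(freq_b, capacity, max_layers):
    """Descend one layer at a time while some node still exceeds the capacity."""
    layers = []
    remaining = dict(freq_b)
    for layer in range(1, max_layers + 1):
        stats = layer_statistics(remaining, layer)
        layers.append(stats)
        # Judge (assumed): stop when every node at this layer fits in a partition.
        if not stats or max(stats.values()) <= capacity:
            break
        # Filter (assumed): only oversized nodes need statistics at the next layer.
        width = layer * HEX_PER_LEVEL
        remaining = {s: f for s, f in remaining.items() if stats[s[:width]] > capacity}
    return layers


sample = {"0201aa": 3, "0202bb": 5, "0202cc": 4, "01ffaa": 2}
for depth, stats in enumerate(collect_layers(sample, capacity=6, max_layers=3), start=1):
    print(depth, dict(stats))
```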
  • 15. Global Index [3/4]: Build Tree. Segment number: 8. Partition capacity: 100,000. Input (iSAX-T, Freq) pairs: 1st layer: (“01”, 512), (“02”, 355,000), …, (“ff”, 270,520). 2nd layer: (“0201”, 5,012), (“0202”, 100,550), …, (“ffff”, 10,520). 3rd layer: (“020201”, 12), (“020202”, 550), …, (“0202ff”, 620). Tree: Root → 01 (Freq: 512), 02 (Freq: 350,000), 03 (Freq: 4,352), …, ff (Freq: 270,520); 02 → 0201 (Freq: 5,012), 0202 (Freq: 100,550), …, 02ff (Freq: 620); 0202 → 020201 (Freq: 12), 020202 (Freq: 550), …, 0202ff (Freq: 620).
  • 16. Global Index [4/4]: Assign Partition ID to Leaf Nodes. Bin Packing Problem: how to fit a set of nodes into the smallest number of partitions? Partition capacity: 100,000. Node 02 (Freq: 390,500) has children 0201 (Freq: 70,000), 0202 (Freq: 50,000), 0203 (Freq: 20,000), 0204 (Freq: 40,000), 0205 (Freq: 80,000), 0206 (Freq: 130,500). Nodes packed into Partition IDs 1, 2 and 3, in slide order: 0202 (50,000), 0204 (40,000), 0201 (70,000), 0203 (20,000), 0205 (80,000).
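The slide frames partition assignment as bin packing but does not say which heuristic is used. As a stand-in, the sketch below uses first-fit decreasing, a common greedy approximation; the function name is hypothetical, and node 0206 is left out because its frequency exceeds the partition capacity.

```python
def first_fit_decreasing(node_freqs, capacity):
    """Greedy bin packing: place the largest nodes first, each in the first partition with room."""
    partitions = []  # each entry is [remaining_capacity, [node ids]]
    ordered = sorted(node_freqs.items(), key=lambda item: item[1], reverse=True)
    for node, freq in ordered:
        if freq > capacity:
            raise ValueError(f"node {node} exceeds the partition capacity")
        for partition in partitions:
            if partition[0] >= freq:
                partition[0] -= freq
                partition[1].append(node)
                break
        else:
            # No existing partition has room, so open a new one.
            partitions.append([capacity - freq, [node]])
    return {pid: nodes for pid, (_, nodes) in enumerate(partitions, start=1)}


leaves = {"0201": 70_000, "0202": 50_000, "0203": 20_000, "0204": 40_000, "0205": 80_000}
print(first_fit_decreasing(leaves, capacity=100_000))
# {1: ['0205', '0203'], 2: ['0201'], 3: ['0202', '0204']}
```

On the slide's numbers this heuristic also needs three partitions, although the grouping it produces need not match the grouping drawn on the slide.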
  • 17. Repartition: Wrap Global Index as the Partitioner. Tree: Root → 01 (Freq: 512, pid: 1), 02 (Freq: 350,000, pid: 5,6,7), 03 (Freq: 4,352, pid: 1), …, ff (Freq: 360,520, pid: 10,11,12); 02 → 0201 (Freq: 5,012, pid: 5), 0202 (Freq: 100,550, pid: 6,7), …, 02ff (Freq: 620, pid: 5); 0202 → 020201 (Freq: 12, pid: 6), 020202 (Freq: 550, pid: 6), …, 0202ff (Freq: 620). Example: a time series TS: [0.34, 0.31, 1.14…] with iSAX-T: 0202ff45 is shown at each level of the tree as it descends to its partition.
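Routing a record through the global index can be sketched as a prefix lookup. The mapping below is a hypothetical flattening of the tree's leaf nodes; the pid for `0202ff` is not given on the slide and is assumed to be 7, and the fallback `default_pid` for signatures unseen during sampling is also an assumption.

```python
HEX_PER_LEVEL = 2  # 8 segments -> 2 hex letters per tree layer


def route(signature, leaf_pids, default_pid=0):
    """Return the partition id of the first global-index leaf that prefixes the signature."""
    for end in range(HEX_PER_LEVEL, len(signature) + 1, HEX_PER_LEVEL):
        pid = leaf_pids.get(signature[:end])
        if pid is not None:
            return pid
    return default_pid  # assumed fallback for signatures missing from the sample


# Leaf nodes only; internal nodes such as "02" and "0202" are deliberately absent.
leaf_pids = {
    "01": 1, "03": 1,
    "0201": 5, "02ff": 5,
    "020201": 6, "020202": 6,
    "0202ff": 7,  # assumed pid, not stated on the slide
}

print(route("0202ff45", leaf_pids))  # 7
```

In PySpark a function like this could be passed as the `partitionFunc` argument of `RDD.partitionBy` on an RDD keyed by signature; that wiring is a suggestion, not something the slides describe.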
  • 18. Local Index: Construction Within Each Partition. Partition capacity: 100,000. Node split threshold: 1000. Segment number: 8. Input: time series in one partition. Output: Local Index and Bloom Filter. Tree: Root (Freq: 90,990) → abcd (Freq: 5000), ab3c (Freq: 450), ab45 (Freq: 4000), ……; abcd → abcd12 (Freq: 1010), abcd45 (Freq: 96), …; abcd12 → abcd12ff (Freq: 30), abcd1201 (Freq: 42), ……. Nodes hold (iSAX-T, ts, rid) records. Example: a record with iSAX-T: abcd12ef34, …. is inserted into the Local Index, with +1 shown on three nodes, and into the Bloom Filter.
  • 19. Outline • Motivation • Background • Spark-ITS Framework – Overview – Index Construction – Query Processing • Performance Evaluation
  • 20. Exact Matching Query. Master: Query (000*, 001*, 001*; iSAX-T: 002) → Global Index → Pid: 6. Worker (Partition 6): Bloom Filter → Exist? (No / Yes); if Yes → Local Index → records in leaf node → euDist = 0.
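The worker-side flow, a Bloom filter check followed by a leaf lookup, can be sketched as below. This is an illustration under assumptions: the Bloom filter is keyed on iSAX-T signatures, the local index is reduced to a dictionary from signature to `(ts, rid)` records, and the class, parameters and function names are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def exact_match(query_ts, query_sig, bloom, leaf_records):
    """Return record ids whose series equals the query (Euclidean distance 0)."""
    if not bloom.might_contain(query_sig):
        return []  # the signature is definitely not in this partition
    return [rid for ts, rid in leaf_records.get(query_sig, []) if ts == query_ts]


bloom = BloomFilter()
bloom.add("abcd12ef")
leaf_records = {"abcd12ef": [((0.1, 0.2), 7), ((0.3, 0.4), 8)]}
print(exact_match((0.3, 0.4), "abcd12ef", bloom, leaf_records))  # [8]
```

The filter lets a worker skip the local index entirely when the signature was never inserted, which is the "No" branch on the slide.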
  • 21. kNN Approximate Query: One Partition Access. Master: Query (000*, 001*, 001*; iSAX-T: 002) → iSAX-T Skeleton → Pid: 6. Worker (Partition 6): Local Index → records in leaf / internal node → 1. euDist 2. sort 3. Top(K) dist as threshold; records in leaf / internal node → 1. euDist 2. sort 3. take Top(K) → (euDist, rid).
  • 22. kNN Approximate Query: Multi-Partitions Access. Master: Query (000*, 001*, 001*; iSAX-T: 002) → iSAX-T Skeleton → Partition 6 and Sibling Pid List. Worker (Partition 6): Local Index → records in leaf / internal node → 1. euDist 2. sort 3. Top(K) dist as threshold. Workers (sibling partitions): Local Index → records in leaf / internal node → 1. euDist 2. sort 3. take Top(K) → (euDist, rid).
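The "euDist, sort, take Top(K)" steps and the use of the K-th distance as a threshold can be sketched as follows. The slides do not spell out how the threshold is applied, so using it to discard sibling candidates is an assumption; the function names and the flat lists of `(ts, rid)` records standing in for index nodes are hypothetical.

```python
import heapq
import math


def knn_in_node(query, records, k):
    """Score every record in a node and keep the k nearest as (euDist, rid)."""
    scored = ((math.dist(query, ts), rid) for ts, rid in records)
    return heapq.nsmallest(k, scored)


def knn_multi_partition(query, home_records, sibling_partitions, k):
    """Search the home partition first, then refine with sibling partitions."""
    best = knn_in_node(query, home_records, k)
    # The k-th best distance so far acts as the pruning threshold (assumed usage).
    threshold = best[-1][0] if len(best) == k else math.inf
    for records in sibling_partitions:
        candidates = [(d, rid) for d, rid in knn_in_node(query, records, k) if d < threshold]
        best = heapq.nsmallest(k, best + candidates)
        if len(best) == k:
            threshold = best[-1][0]
    return best


home = [((1.0, 0.0), 1), ((3.0, 4.0), 2), ((0.0, 2.0), 3)]
siblings = [[((0.0, 0.5), 4), ((10.0, 10.0), 5)]]
print(knn_in_node((0.0, 0.0), home, k=2))                    # [(1.0, 1), (2.0, 3)]
print(knn_multi_partition((0.0, 0.0), home, siblings, k=2))  # [(0.5, 4), (1.0, 1)]
```

The one-partition query corresponds to `knn_in_node` alone; the multi-partition query trades extra partition reads for the chance to replace home-partition answers with closer records from siblings.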
  • 23. Outline • Motivation • Background • Spark-ITS Framework – Overview – Index Construction – Query Processing • Performance Evaluation
  • 24. Experimental Setup
      – Datasets (size, length): Random Walk (1 billion, 256); Texmex (1 billion, 128); DNA (200 million, 192); Noaa Climate (200 million, 64). The dataset is normalized. Each point is saved in float format.
      – Sources: 1. http://corpus-texmex.irisa.fr/ 2. https://genmone.ucsc.edu 3. https://www.ncdc.gov/
      – HW & SW configuration: Spark 2.0.2, Standalone mode; Hadoop 2.7.3; Platform: Ubuntu 16.04 LTS; HW: 2 nodes, each node consisting of 56 Xeon E5 processors, 500G RAM, 7TB SATA hard drive.
      – State-of-the-art baseline: Yagoubi, Djamel-Edine, et al. "DPiSAX: Massively Distributed Partitioned iSAX." ICDM 2017
      – Parameters (Baseline / Spark-ITS): Initial cardinality: 512 / 64; Word length: 8 / 8; Sampling percent: 10% / 10%; Leaf node split threshold of local index: 1000 / 1000.
      – The initial cardinality of the baseline system is its default value; it needs a large initial value to guarantee enough bit levels for binary splits.
  • 25. Index Construction Time. [Figure: stacked bars of Global Index and Local Index construction time (minutes) versus #time series (200m to 1b) for Spark-ITS and the Baseline (state-of-the-art). Dataset: Random Walk Benchmark. Labeled values: 2,323 and 334; annotation: 80+%.]
  • 26. Index Construction Time: Breakdown. [Figure, two panels comparing Spark-ITS with the Baseline (state-of-the-art) at 200m, 400m, 600m, 800m and 1b time series. Dataset: Random Walk Benchmark. Global Index Time Breakdown (minutes, axis 0 to 40): Sampling, Statistic, Build Index, Assign Pid. Repartition and Local Index Time Breakdown (minutes, axis 0 to 2,000): Read and conversion, Shuffle and Build Local Index.]
  • 28. kNN-Approximate Query Performance. [Figure, two panels comparing Spark-ITS with the state-of-the-art per dataset (#time series): RandomWalk (400m), Texmex (400m), DNA (200m), Noaa (200m). Recall panel (axis 0% to 60%), annotated ✕30, ✕35, ✕28, ✕72. Error Ratio panel (axis 1.0 to 2.2), annotated 27%, 26%, 48%, 34%.]
  • 29. Conclusion • Index Tree – Large fan-out decreases the depth of leaf nodes – Keeps better similarity at Word-level – The signature simplifies the conversion of cardinality • Spark-ITS: Index Construction – Block-sampling and node statistic collection build the global index quickly – Synchronously builds local indices within a partition – Constructs the index 80+% faster • Spark-ITS: Query – Exact Matching: the time decreases by 50% – kNN approximate: the accuracy increases more than 10-fold
  • 30. Acknowledgements. Funding from: Xianjin Tech Co., Ltd.; Saudi Arabian Cultural Mission; WPI Computer Science Dept.; NSF CNS: 305258 II-EN; NSF CRI: 0551584