SlideShare a Scribd company logo
Muhammad Ali Gulzar Miryung Kim
University of California, Los Angeles
Automated Debugging of Big Data
Analytics in Apache Spark Using BigSift
#Res3SAIS
Big Data Debugging in the Dark
2#Res3SAIS
Map Reduce
Develop locally Hope it works Run in cloud Bug!
Guesswork
BigDebug: Debugging Primitives for
Interactive Big Data Processing
ICSE 2016
Muhammad Ali Gulzar, Matteo Interlandi, Seunghyun Yoo, Sai
Deep Tetali Tyson Condie, Todd Millstein, Miryung Kim
Presented at Spark Summit 2017
3#Res3SAIS
Interactive Debugging with BigDebug
4#Res3SAIS
4. Backward and Forward Tracing
1. Simulated Breakpoint 2. On Demand Guarded Watchpoint
3. Crash Culprit Identification
Titian: Data Provenance support in Spark
VLDB 2015
Matteo Interlandi, Kshitij Shah, Sai Deep Tetali, Muhammad Ali
Gulzar, Seunghyun Yoo, Miryung Kim, Todd Millstein, Tyson Condie
5#Res3SAIS
Data Provenance – in SQL
6#Res3SAIS
Tuple-ID Time Sendor-ID Temperature
T1 11AM 1 34
T2 11AM 2 35
T3 11AM 3 35
T4 12PM 1 35
T5 12PM 2 35
T6 12PM 3 100
T7 1PM 1 35
T8 1PM 2 35
T9 1PM 3 80
SELECT time, AVG(temp)
FROM sensors
GROUP BY time
Sensors
Result-
ID
Time AVG(temp)
ID-1 11AM 34.6
ID-2 12PM 56.6
ID-3 1PM 50
Outlier
Outlier
Why ID-2 and
ID-3 have those
high values?
Step 1: Instrument Workflow in Spark
7#Res3SAIS
Combiner
LineageRDD
Reducer
LineageRDD
Hadoop
LineageRDD
counts
pairscodeserrorslines
Stage 1
Stage 2
reports
Stage
LineageRDD
Input ID Output ID
{ id1, id 3} 400
{ id2 } 4
Input
ID
Output
ID
offset1 id1
offset2 id2
offset3 id3
Input ID Output ID
[p1, p2] 400
[ p1 ] 4
Input ID Output ID
400 id1
4 id2
Step 2: Example Backward Tracing
8#Res3SAIS
Worker1
Worker3
Input ID Output ID
[p1, p2] 400
[ p1 ] 4
Input ID Output ID
400 id1
4 id2
Reducer Stage
Input ID Output ID
offset1 id1
offset2 id2
offset3 id3
Input ID Output ID
{ id1, id 3} 400
{ id2 } 4
Hadoop Combiner
Input ID Output ID
offset1 id1
… …
Input ID Output ID
{ id1, …} 400
Hadoop Combiner
Worker1
Worker3
Input ID Output ID
[p1, p2] 400
[ p1 ] 4
Input ID Output ID
400 id1
4 id2
Reducer Stage
Input ID Output ID
offset1 id1
offset2 id2
offset3 id3
Input ID Output ID
{ id1, id 3} 400
{ id2 } 4
Hadoop Combiner
Input ID Output ID
offset1 id1
… …
Input ID Output ID
{ id1, …} 400
Hadoop Combiner
Input ID Output ID
p1 400
Input ID Output ID
p1 400
Stage.Input IDReducer.Output ID
Worker2
Step 2: Example Backward Tracing
9#Res3SAIS
Worker1
Input ID Output ID
offset1 id1
offset2 id2
offset3 id3
Input ID Output ID
{ id1, id 3} 400
{ id2 } 4
Hadoop Combiner
Input ID Output ID
offset1 id1
… …
Input ID Output ID
{ id1, …} 400
Hadoop Combiner
Worker1
Input ID Output ID
offset1 id1
offset2 id2
offset3 id3
Input ID Output ID
{ id1, id 3} 400
{ id2 } 4
Hadoop Combiner
Input ID Output ID
offset1 id1
… …
Input ID Output ID
{ id1, …} 400
Hadoop Combiner
Input ID Output ID
p1 400
Input ID Output ID
p1 400
Worker2
Combiner.Output ID Reducer.Output ID
Combiner.Output ID Reducer.Output ID
Step 2: Example Backward Tracing
10#Res3SAIS
Worker1
Input ID Output ID
offset1 id1
offset2 id2
offset3 id3
Input ID Output ID
{ id1, id 3} 400
{ id2 } 4
Hadoop Combiner
Input ID Output ID
offset1 id1
… …
Input ID Output ID
{ id1, …} 400
Hadoop Combiner
Worker1
Input ID Output ID
offset1 id1
offset2 id2
offset3 id3
Input ID Output ID
{ id1, id 3} 400
{ id2 } 4
Hadoop Combiner
Input ID Output ID
offset1 id1
… …
Input ID Output ID
{ id1, …} 400
Hadoop Combiner Worker2
Combiner.Input IDHadoop.Output ID
Combiner.Input IDHadoop.Output ID
BigSift: Automated Debugging of Big
Data Analytics in DISC Applications
SoCC 2017
Muhammad Ali Gulzar, Matteo Interlandi, Xueyuan Han, Tyson
Condie, Miryung Kim
11#Res3SAIS
Motivating Example for Automated
Debugging with BigSift
• Alice writes a Spark program that identifies, for each state in the US,
the delta between the minimum and the maximum snowfall reading
for each day of any year and for any particular year.
• An input data record that measures 1 foot of snowfall on January 1st of
Year 1992, in the 99504 zip code (Anchorage, AK) area, appears as
99504 , 01/01/1992 , 1ft
12#Res3SAIS
Debugging Incorrect Output
13
99504 , 01/01/1992 , 1ft
99504 , 03/01/1992 , 0.1ft
99504 , 01/01/1993 , 70in
99504 , 03/01/1993 , 145mm
99504 , 01/01/1994 , 245mm
99504 , 01/01/1993 , 85mm
90031 , 02/01/1991 , 0mm
AK , 01/01 , [304.8, 21336, 245, 85]
AK , 03/01 , [30.5 , 145]
AK , 1992 , [304.8 , 30.5]
AK , 1993 , [21336, 145, 85]
AK , 1994 , [245]
CA , 02/01 , [0]
CA , 1991 , [0]
TextFile FlatMap GroupByKey Map Output
AK , 01/01 , 304.8
AK , 1992 , 304.8
AK , 03/01 , 30.5
AK , 1992 , 30.5
AK , 01/01 , 21336
AK , 1993 , 21336
AK , 03/01 , 145
AK , 1993 , 145
AK , 01/01 , 245
AK , 1994 , 245
…. ….
AK , 01/01 , 21251
AK , 03/01 , 114.5
AK , 1992 , 274.3
AK , 1993 , 21251
AK , 1994 , 0
CA , 02/01 , 0
CA , 1991 , 0
It is challenging because the input records is very large and the program
involves computing min and max and a unit conversion
#Res3SAIS
Data Provenance with Titian [VLDB ‘15]
99504 , 01/01/1992 , 1ft
99504 , 03/01/1992 , 0.1ft
99504 , 01/01/1993 , 70in
99504 , 03/01/1993 , 145mm
99504 , 01/01/1994 , 245mm
99504 , 01/01/1993 , 85mm
90031 , 02/01/1991 , 0mm
AK , 01/01 , [304.8, 21336, 245, 85]
AK , 03/01 , [30.5 , 145]
AK , 1992 , [304.8 , 30.5]
AK , 1993 , [21336, 145, 85]
AK , 1994 , [245]
CA , 02/01 , [0]
CA , 1991 , [0]
TextFile FlatMap GroupByKey Map Output
AK , 01/01 , 304.8
AK , 1992 , 304.8
AK , 03/01 , 30.5
AK , 1992 , 30.5
AK , 01/01 , 21336
AK , 1993 , 21336
AK , 03/01 , 145
AK , 1993 , 145
AK , 01/01 , 245
AK , 1994 , 245
…. ….
AK , 01/01 , 21251
AK , 03/01 , 114.5
AK , 1992 , 274.3
AK , 1993 , 21251
AK , 1994 , 0
CA , 02/01 , 0
CA , 1991 , 0
#Res3SAIS 14
It over-approximates the scope of failure-inducing inputs i.e. records in
the faulty key-group are all marked as faulty
Delta Debugging
• Alice tests the output from different input configurations using a test function
15
Run 1 fails the test
99504 , 01/01/1992 , 1ft
99504 , 03/01/1992 , 0.1ft
99504 , 01/01/1993 , 70in
99504 , 03/01/1993 , 145mm
99504 , 01/01/1994 , 245mm
99504 , 01/01/1993 , 85mm
90031 , 02/01/1991 , 0mm
AK , 01/01 , 304.8
AK , 1992 , 304.8
AK , 03/01 , 30.5
AK , 1992 , 30.5
AK , 01/01 , 21336
AK , 1993 , 21336
AK , 03/01 , 145
AK , 1993 , 145
AK , 01/01 , 245
AK , 1994 , 245
…. ….
AK , 01/01 , [304.8, 21336, 245, 85]
AK , 03/01 , [30.5 , 145]
AK , 1992 , [304.8 , 30.5]
AK , 1993 , [21336, 145, 85]
AK , 1994 , [245]
CA , 02/01 , [0]
CA , 1991 , [0]
AK , 01/01 , 21251
AK , 03/01 , 114.5
AK , 1992 , 274.3
AK , 1993 , 21251
AK , 1994 , 0
CA , 02/01 , 0
CA , 1991 , 0
TextFile FlatMap GroupByKey Map Output
1
2
def test(key:String, delta: Float) : Boolean = {
delta < 6000
}
#Res3SAIS
Delta Debugging
• DD is an systematic debugging procedure guided by a test function.
DD is inefficient because it does not eliminate irrelevant input records
upfront.
16
TextFile FlatMap GroupByKey Map Output
#Res3SAIS
99504 , 01/01/1992 , 1ft
99504 , 03/01/1992 , 0.1ft
99504 , 01/01/1993 , 70in
AK , 01/01 , 304.8
AK , 1992 , 304.8
AK , 03/01 , 30.5
AK , 1992 , 30.5
AK , 01/01 , 21336
AK , 1993 , 21336
AK , 01/01 , [304.8, 21336]
AK , 03/01 , [30.5]
AK , 1992 , [304.8 , 30.5]
AK , 1993 , [21336]
AK , 01/01 , 21031.2
AK , 03/01 , 0
AK , 1992 , 274.3
AK , 1993 , 0
Run 2 fails the test
1
2
Delta Debugging
• DD is an systematic debugging procedure guided by a test function.
DD is inefficient because it does not eliminate irrelevant input records
upfront.
17
TextFile FlatMap GroupByKey Map Output
#Res3SAIS
99504 , 01/01/1992 , 1ft
99504 , 03/01/1992 , 0.1ft
99504 , 01/01/1993 , 70in
AK , 01/01 , 304.8
AK , 1992 , 304.8
AK , 03/01 , 30.5
AK , 1992 , 30.5
AK , 01/01 , [304.8]
AK , 03/01 , [30.5]
AK , 1992 , [304.8 , 30.5]
AK , 01/01 , 0
AK , 03/01 , 0
AK , 1992 , 274.3
Run 3 passes the test
Delta Debugging
• DD is an systematic debugging procedure guided by a test function.
DD is inefficient because it does not eliminate irrelevant input records
upfront.
18
TextFile FlatMap GroupByKey Map Output
#Res3SAIS
99504 , 01/01/1992 , 1ft
99504 , 03/01/1992 , 0.1ft
99504 , 01/01/1993 , 70in
AK , 01/01 , 21336
AK , 1993 , 21336
AK , 01/01 , [21336]
AK , 1993 , [21336]
AK , 01/01 , 0
AK , 1993 , 0
Run 4 passes the test
Delta Debugging
• DD is an systematic debugging procedure guided by a test function.
DD is inefficient because it does not eliminate irrelevant input records
upfront.
19
TextFile FlatMap GroupByKey Map Output
#Res3SAIS
99504 , 01/01/1992 , 1ft
99504 , 03/01/1992 , 0.1ft
99504 , 01/01/1993 , 70in
AK , 01/01 , 304.8
AK , 1992 , 304.8
AK , 01/01 , [304.8]
AK , 1992 , [304.8]
AK , 01/01 , 0
AK , 1992 , 0
Run 5 passes the test
Delta Debugging
• DD is an systematic debugging procedure guided by a test function.
DD is inefficient because it does not eliminate irrelevant input records
upfront.
20
TextFile FlatMap GroupByKey Map Output
#Res3SAIS
99504 , 01/01/1992 , 1ft
99504 , 03/01/1992 , 0.1ft
99504 , 01/01/1993 , 70in
AK , 03/01 , 30.5
AK , 1992 , 30.5
AK , 03/01 , [30.5]
AK , 1992 , [30.5]
AK , 03/01 , 0
AK , 1992 , 0
Run 6 passes the test
Delta Debugging
• DD is an systematic debugging procedure guided by a test function.
DD is inefficient because it does not eliminate irrelevant input records
upfront.
21
TextFile FlatMap GroupByKey Map Output
#Res3SAIS
99504 , 01/01/1992 , 1ft
99504 , 03/01/1992 , 0.1ft
99504 , 01/01/1993 , 70in
AK , 01/01 , 21336
AK , 1993 , 21336
AK , 01/01 , [21336]
AK , 1993 , [21336]
AK , 01/01 , 0
AK , 1993 , 0
Run 7 passes the test
Delta Debugging
• DD is an systematic debugging procedure guided by a test function.
DD is inefficient because it does not eliminate irrelevant input records
upfront.
22
TextFile FlatMap GroupByKey Map Output
#Res3SAIS
99504 , 01/01/1992 , 1ft
99504 , 03/01/1992 , 0.1ft
99504 , 01/01/1993 , 70in
AK , 03/01 , 30.5
AK , 1992 , 30.5
AK , 01/01 , 21336
AK , 1993 , 21336
AK , 03/01 , [30.5]
AK , 1992 , [30.5]
AK , 01/01 , [21336]
AK , 1993 , [21336]
AK , 03/01 , 0
AK , 1992 , 0
AK , 01/01 , 0
AK , 1993 , 0
Run 8 passes the test
Delta Debugging
• DD is an systematic debugging procedure guided by a test function.
DD is inefficient because it does not eliminate irrelevant input records
upfront.
23
TextFile FlatMap GroupByKey Map Output
#Res3SAIS
99504 , 01/01/1992 , 1ft
99504 , 03/01/1992 , 0.1ft
99504 , 01/01/1993 , 70in
AK , 01/01 , 304.8
AK , 1992 , 304.8
AK , 01/01 , 21336
AK , 1993 , 21336
AK , 01/01 , [304.8 , 21336]
AK , 1992 , [304.8]
AK , 1993 , [21336]
AK , 01/01 , 21031.2
AK , 1992 , 0
AK , 1993 , 0
Run 9 fails the test
Automated Debugging in DISC with BigSift
24#Res3SAIS
Test
Predicate
Pushdown
Prioritizing
Backward
Traces
Bitmap based
Test
Memoization
Input: A Spark
Program, A Test
Function
Output: Minimum Fault-
Inducing
Input Records
Data Provenance + Delta Debugging
Optimization 1: Test Predicate Pushdown
• Observation: During backward tracing, data provenance traces
through all the partitions even though only a few partitions are faulty
25#Res3SAIS
If applicable, BigSift pushes down the test function to test the
output of combiners in order to isolate the faulty partitions.
Test
Test
Test
Test
Test
Test
Test
Optimization 2 :Prioritizing Backward Traces
• Observation: The same faulty input record may contribute to multiple
output records failing the test.
26#Res3SAIS
99504 , 01/01/1992 , 1ft
99504 , 03/01/1992 , 0.1ft
99504 , 01/01/1993 , 70in
99504 , 03/01/1993 , 145mm
99504 , 01/01/1994 , 245mm
99504 , 01/01/1993 , 85mm
90031 , 02/01/1991 , 0mm
AK , 01/01 , [304.8, 21336, 245, 85]
AK , 03/01 , [30.5 , 145]
AK , 1992 , [304.8 , 30.5]
AK , 1993 , [21336, 145, 85]
AK , 1994 , [245]
CA , 02/01 , [0]
CA , 1991 , [0]
TextFile FlatMap GroupByKey Map Output
AK , 01/01 , 304.8
AK , 1992 , 304.8
AK , 03/01 , 30.5
AK , 1992 , 30.5
AK , 01/01 , 21336
AK , 1993 , 21336
AK , 03/01 , 145
AK , 1993 , 145
AK , 01/01 , 245
AK , 1994 , 245
…. ….
AK , 01/01 , 21251
AK , 03/01 , 114.5
AK , 1992 , 274.3
AK , 1993 , 21251
AK , 1994 , 0
CA , 02/01 , 0
CA , 1991 , 0
In case of multiple faulty outputs, BigSift overlaps two backward
traces to minimize the scope of fault-inducing input records
Optimization 3: Bitmap based Test Memoization
• Observation: Delta debugging may
try running a program on the same
subset of input redundantly.
• BigSift leverages bitmap to compactly
encode the offsets of original input to
refer to an input subset
#Res3SAIS
0
1
0
1
0
0
0
0
1
1
Input Data Bitmap
!
Test Outcome
We use a bitmap based test memoization technique to avoid
redundant testing of the same input dataset.
Demo
Debugging Time
29#Res3SAIS
Subject Program Running Time (sec) Debugging Time (sec)
Subject Program Fault Original Job DD BigSift Improvement
Movie Histogram Code 56.2 232.8 17.3 13.5X
Inverted Index Code 107.7 584.2 13.4 43.6X
Rating Histogram Code 40.3 263.4 16.6 15.9X
Sequence Count Code 356.0 13772.1 208.8 66.0X
Rating Frequency Code 77.5 437.9 14.9 29.5X
College Student Data 53.1 235.3 31.8 7.4X
Weather Analysis Data 238.5 999.1 89.9 11.1X
Transit Analysis Code 45.5 375.8 20.2 18.6X
BigSift provides up to a 66X speed up in isolating the precise
fault-inducing input records, in comparison to the baseline DD
Debugging Time
30#Res3SAIS
Subject Program Running Time (sec) Debugging Time (sec)
Subject Program Fault Original Job DD BigSift Improvement
Movie Histogram Code 56.2 232.8 17.3 13.5X
Inverted Index Code 107.7 584.2 13.4 43.6X
Rating Histogram Code 40.3 263.4 16.6 15.9X
Sequence Count Code 356.0 13772.1 208.8 66.0X
Rating Frequency Code 77.5 437.9 14.9 29.5X
College Student Data 53.1 235.3 31.8 7.4X
Weather Analysis Data 238.5 999.1 89.9 11.1X
Transit Analysis Code 45.5 375.8 20.2 18.6X
On average, BigSift takes 62% less time to debug a single faulty
output than the time taken for a single run on the entire data.
Debugging Time
31#Res3SAIS
1
10
100
1000
10000
100000
1000000
10000000
100000000
1E+09
0 2000 4000 6000 8000 10000 12000 14000
#offault-inducinginput
records
Fault Localization Time (s)
Sequence Count
Delta Debugging
BigSift
Test Driven Data
Provenance
Data Provenance
On average, BigSift takes 62% less time to debug a single faulty
output than the time taken for a single run on the entire data.
Fault Localizability over Data Provenance
32#Res3SAIS
143796
6487290
520904
23411
5800
15003060
2554788
350
2
1350
15 13
1 1 1 1 1 1 2
1
10
100
1000
10000
100000
1000000
10000000
100000000
Movie
Historgram
Inverted Index Rating
Histogram
Sequence
Count
Rating
Frequency
College
Students
Weather
Analysis
#offault-inducinginputrecords
Data Provenance Test Driven Data Provenance BigSift & DD
BigSift leverages DD after DP to continue fault isolation, achieving
several orders of magnitude 103 to 107 better precision.
Summary
• BigSift is the first piece of work in automated debugging of big data
analytics in DISC.
• BigSift provides 103X – 107X more precision than data provenance
in terms of fault localizability.
• It provides up to 66X speed up in debugging time over baseline
Delta Debugging.
• In our evaluation we have observed that, on average, BigSift finds
the faulty input in 62% less than the original job execution time.
33#Res3SAIS

More Related Content

What's hot

Performance tuning through iterations
Performance tuning through iterationsPerformance tuning through iterations
Performance tuning through iterations
Gert Franz
 
Oracle trace data collection errors: the story about oceans, islands, and rivers
Oracle trace data collection errors: the story about oceans, islands, and riversOracle trace data collection errors: the story about oceans, islands, and rivers
Oracle trace data collection errors: the story about oceans, islands, and rivers
Cary Millsap
 
Most important "trick" of performance instrumentation
Most important "trick" of performance instrumentationMost important "trick" of performance instrumentation
Most important "trick" of performance instrumentation
Cary Millsap
 
Presentation indexing new features oracle 11g release 1 and release 2
Presentation    indexing new features oracle 11g release 1 and release 2Presentation    indexing new features oracle 11g release 1 and release 2
Presentation indexing new features oracle 11g release 1 and release 2
xKinAnx
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
nkorla1share
 
Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts
Dr. Mohieddin Moradi
 

What's hot (6)

Performance tuning through iterations
Performance tuning through iterationsPerformance tuning through iterations
Performance tuning through iterations
 
Oracle trace data collection errors: the story about oceans, islands, and rivers
Oracle trace data collection errors: the story about oceans, islands, and riversOracle trace data collection errors: the story about oceans, islands, and rivers
Oracle trace data collection errors: the story about oceans, islands, and rivers
 
Most important "trick" of performance instrumentation
Most important "trick" of performance instrumentationMost important "trick" of performance instrumentation
Most important "trick" of performance instrumentation
 
Presentation indexing new features oracle 11g release 1 and release 2
Presentation    indexing new features oracle 11g release 1 and release 2Presentation    indexing new features oracle 11g release 1 and release 2
Presentation indexing new features oracle 11g release 1 and release 2
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
 
Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts
 

Similar to Automated Debugging of Big Data Analytics in Apache Spark Using BigSift with Muhammad Ali Gulzar and Miryung Kim

Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
cookie1969
 
New SQL features in latest MySQL releases
New SQL features in latest MySQL releasesNew SQL features in latest MySQL releases
New SQL features in latest MySQL releases
Georgi Sotirov
 
Reducing Failure Analysis Time: An Industrial Evaluation
Reducing Failure Analysis Time: An Industrial EvaluationReducing Failure Analysis Time: An Industrial Evaluation
Reducing Failure Analysis Time: An Industrial Evaluation
Mojdeh Golagha
 
Roll grinding Six Sigma project
Roll grinding Six Sigma projectRoll grinding Six Sigma project
Roll grinding Six Sigma project
Tariq Aziz
 
11thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp0111thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp01Karam Abuataya
 
11 Things About11g
11 Things About11g11 Things About11g
11 Things About11g
fcamachob
 
Design of security room
Design of security roomDesign of security room
Design of security room
Nabendu Lodh
 
MySQL performance tuning
MySQL performance tuningMySQL performance tuning
MySQL performance tuning
Anurag Srivastava
 
Optimization based Chassis Design
Optimization based Chassis DesignOptimization based Chassis Design
Optimization based Chassis DesignAltair
 
Oracle to Cassandra Core Concepts Guide Pt. 2
Oracle to Cassandra Core Concepts Guide Pt. 2Oracle to Cassandra Core Concepts Guide Pt. 2
Oracle to Cassandra Core Concepts Guide Pt. 2
DataStax Academy
 
Studio 3 Geology Training Manual.pdf
Studio 3 Geology Training Manual.pdfStudio 3 Geology Training Manual.pdf
Studio 3 Geology Training Manual.pdf
RonaldQuispePerez2
 
Oracle Database 12c Application Development
Oracle Database 12c Application DevelopmentOracle Database 12c Application Development
Oracle Database 12c Application Development
Saurabh K. Gupta
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performance
Guy Harrison
 
OpenWorld 2018 - Common Application Developer Disasters
OpenWorld 2018 - Common Application Developer DisastersOpenWorld 2018 - Common Application Developer Disasters
OpenWorld 2018 - Common Application Developer Disasters
Connor McDonald
 
Dun ddd
Dun dddDun ddd
Scheduling by Primavera - Training
Scheduling by Primavera - TrainingScheduling by Primavera - Training
Scheduling by Primavera - TrainingMohammed Feroze
 
Mersen semiconductor fuses-oman-stockist
Mersen semiconductor fuses-oman-stockistMersen semiconductor fuses-oman-stockist
BI Apps ETL-SSIS 2008 & 2012, Pentaho & Talend
BI Apps ETL-SSIS 2008 & 2012, Pentaho & TalendBI Apps ETL-SSIS 2008 & 2012, Pentaho & Talend
BI Apps ETL-SSIS 2008 & 2012, Pentaho & TalendSunny U Okoro
 
Plaxis Report.pdf
Plaxis Report.pdfPlaxis Report.pdf
Plaxis Report.pdf
YETCHAN QUISPE VERA
 

Similar to Automated Debugging of Big Data Analytics in Apache Spark Using BigSift with Muhammad Ali Gulzar and Miryung Kim (20)

Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
 
New SQL features in latest MySQL releases
New SQL features in latest MySQL releasesNew SQL features in latest MySQL releases
New SQL features in latest MySQL releases
 
Reducing Failure Analysis Time: An Industrial Evaluation
Reducing Failure Analysis Time: An Industrial EvaluationReducing Failure Analysis Time: An Industrial Evaluation
Reducing Failure Analysis Time: An Industrial Evaluation
 
Roll grinding Six Sigma project
Roll grinding Six Sigma projectRoll grinding Six Sigma project
Roll grinding Six Sigma project
 
11thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp0111thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp01
 
11 Things About11g
11 Things About11g11 Things About11g
11 Things About11g
 
Design of security room
Design of security roomDesign of security room
Design of security room
 
MySQL performance tuning
MySQL performance tuningMySQL performance tuning
MySQL performance tuning
 
Optimization based Chassis Design
Optimization based Chassis DesignOptimization based Chassis Design
Optimization based Chassis Design
 
Oracle to Cassandra Core Concepts Guide Pt. 2
Oracle to Cassandra Core Concepts Guide Pt. 2Oracle to Cassandra Core Concepts Guide Pt. 2
Oracle to Cassandra Core Concepts Guide Pt. 2
 
Studio 3 Geology Training Manual.pdf
Studio 3 Geology Training Manual.pdfStudio 3 Geology Training Manual.pdf
Studio 3 Geology Training Manual.pdf
 
Oracle Database 12c Application Development
Oracle Database 12c Application DevelopmentOracle Database 12c Application Development
Oracle Database 12c Application Development
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performance
 
OpenWorld 2018 - Common Application Developer Disasters
OpenWorld 2018 - Common Application Developer DisastersOpenWorld 2018 - Common Application Developer Disasters
OpenWorld 2018 - Common Application Developer Disasters
 
Dun ddd
Dun dddDun ddd
Dun ddd
 
Scheduling by Primavera - Training
Scheduling by Primavera - TrainingScheduling by Primavera - Training
Scheduling by Primavera - Training
 
Sql 3
Sql 3Sql 3
Sql 3
 
Mersen semiconductor fuses-oman-stockist
Mersen semiconductor fuses-oman-stockistMersen semiconductor fuses-oman-stockist
Mersen semiconductor fuses-oman-stockist
 
BI Apps ETL-SSIS 2008 & 2012, Pentaho & Talend
BI Apps ETL-SSIS 2008 & 2012, Pentaho & TalendBI Apps ETL-SSIS 2008 & 2012, Pentaho & Talend
BI Apps ETL-SSIS 2008 & 2012, Pentaho & Talend
 
Plaxis Report.pdf
Plaxis Report.pdfPlaxis Report.pdf
Plaxis Report.pdf
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 

Recently uploaded (20)

Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 

Automated Debugging of Big Data Analytics in Apache Spark Using BigSift with Muhammad Ali Gulzar and Miryung Kim

  • 1. Muhammad Ali Gulzar Miryung Kim University of California, Los Angeles Automated Debugging of Big Data Analytics in Apache Spark Using BigSift #Res3SAIS
  • 2. Big Data Debugging in the Dark 2#Res3SAIS Map Reduce Develop locally Hope it works Run in cloud Bug! Guesswork
  • 3. BigDebug: Debugging Primitives for Interactive Big Data Processing ICSE 2016 Muhammad Ali Gulzar, Matteo Interlandi, Seunghyun Yoo, Sai Deep Tetali Tyson Condie, Todd Millstein, Miryung Kim Presented at Spark Summit 2017 3#Res3SAIS
  • 4. Interactive Debugging with BigDebug 4#Res3SAIS 4. Backward and Forward Tracing 1. Simulated Breakpoint 2. On Demand Guarded Watchpoint 3. Crash Culprit Identification
  • 5. Titian: Data Provenance support in Spark VLDB 2015 Matteo Interlandi, Kshitij Shah, Sai Deep Tetali, Muhammad Ali Gulzar, Seunghyun Yoo, Miryung Kim, Todd Millstein, Tyson Condie 5#Res3SAIS
  • 6. Data Provenance – in SQL 6#Res3SAIS Tuple-ID Time Sendor-ID Temperature T1 11AM 1 34 T2 11AM 2 35 T3 11AM 3 35 T4 12PM 1 35 T5 12PM 2 35 T6 12PM 3 100 T7 1PM 1 35 T8 1PM 2 35 T9 1PM 3 80 SELECT time, AVG(temp) FROM sensors GROUP BY time Sensors Result- ID Time AVG(temp) ID-1 11AM 34.6 ID-2 12PM 56.6 ID-3 1PM 50 Outlier Outlier Why ID-2 and ID-3 have those high values?
  • 7. Step 1: Instrument Workflow in Spark 7#Res3SAIS Combiner LineageRDD Reducer LineageRDD Hadoop LineageRDD counts pairscodeserrorslines Stage 1 Stage 2 reports Stage LineageRDD Input ID Output ID { id1, id 3} 400 { id2 } 4 Input ID Output ID offset1 id1 offset2 id2 offset3 id3 Input ID Output ID [p1, p2] 400 [ p1 ] 4 Input ID Output ID 400 id1 4 id2
  • 8. Step 2: Example Backward Tracing 8#Res3SAIS Worker1 Worker3 Input ID Output ID [p1, p2] 400 [ p1 ] 4 Input ID Output ID 400 id1 4 id2 Reducer Stage Input ID Output ID offset1 id1 offset2 id2 offset3 id3 Input ID Output ID { id1, id 3} 400 { id2 } 4 Hadoop Combiner Input ID Output ID offset1 id1 … … Input ID Output ID { id1, …} 400 Hadoop Combiner Worker1 Worker3 Input ID Output ID [p1, p2] 400 [ p1 ] 4 Input ID Output ID 400 id1 4 id2 Reducer Stage Input ID Output ID offset1 id1 offset2 id2 offset3 id3 Input ID Output ID { id1, id 3} 400 { id2 } 4 Hadoop Combiner Input ID Output ID offset1 id1 … … Input ID Output ID { id1, …} 400 Hadoop Combiner Input ID Output ID p1 400 Input ID Output ID p1 400 Stage.Input IDReducer.Output ID Worker2
  • 9. Step 2: Example Backward Tracing 9#Res3SAIS Worker1 Input ID Output ID offset1 id1 offset2 id2 offset3 id3 Input ID Output ID { id1, id 3} 400 { id2 } 4 Hadoop Combiner Input ID Output ID offset1 id1 … … Input ID Output ID { id1, …} 400 Hadoop Combiner Worker1 Input ID Output ID offset1 id1 offset2 id2 offset3 id3 Input ID Output ID { id1, id 3} 400 { id2 } 4 Hadoop Combiner Input ID Output ID offset1 id1 … … Input ID Output ID { id1, …} 400 Hadoop Combiner Input ID Output ID p1 400 Input ID Output ID p1 400 Worker2 Combiner.Output ID Reducer.Output ID Combiner.Output ID Reducer.Output ID
  • 10. Step 2: Example Backward Tracing 10#Res3SAIS Worker1 Input ID Output ID offset1 id1 offset2 id2 offset3 id3 Input ID Output ID { id1, id 3} 400 { id2 } 4 Hadoop Combiner Input ID Output ID offset1 id1 … … Input ID Output ID { id1, …} 400 Hadoop Combiner Worker1 Input ID Output ID offset1 id1 offset2 id2 offset3 id3 Input ID Output ID { id1, id 3} 400 { id2 } 4 Hadoop Combiner Input ID Output ID offset1 id1 … … Input ID Output ID { id1, …} 400 Hadoop Combiner Worker2 Combiner.Input IDHadoop.Output ID Combiner.Input IDHadoop.Output ID
  • 11. BigSift: Automated Debugging of Big Data Analytics in DISC Applications SoCC 2017 Muhammad Ali Gulzar, Matteo Interlandi, Xueyuan Han, Tyson Condie, Miryung Kim 11#Res3SAIS
  • 12. Motivating Example for Automated Debugging with BigSift • Alice writes a Spark program that identifies, for each state in the US, the delta between the minimum and the maximum snowfall reading for each day of any year and for any particular year. • An input data record that measures 1 foot of snowfall on January 1st of Year 1992, in the 99504 zip code (Anchorage, AK) area, appears as 99504 , 01/01/1992 , 1ft 12#Res3SAIS
  • 13. Debugging Incorrect Output 13 99504 , 01/01/1992 , 1ft 99504 , 03/01/1992 , 0.1ft 99504 , 01/01/1993 , 70in 99504 , 03/01/1993 , 145mm 99504 , 01/01/1994 , 245mm 99504 , 01/01/1993 , 85mm 90031 , 02/01/1991 , 0mm AK , 01/01 , [304.8, 21336, 245, 85] AK , 03/01 , [30.5 , 145] AK , 1992 , [304.8 , 30.5] AK , 1993 , [21336, 145, 85] AK , 1994 , [245] CA , 02/01 , [0] CA , 1991 , [0] TextFile FlatMap GroupByKey Map Output AK , 01/01 , 304.8 AK , 1992 , 304.8 AK , 03/01 , 30.5 AK , 1992 , 30.5 AK , 01/01 , 21336 AK , 1993 , 21336 AK , 03/01 , 145 AK , 1993 , 145 AK , 01/01 , 245 AK , 1994 , 245 …. …. AK , 01/01 , 21251 AK , 03/01 , 114.5 AK , 1992 , 274.3 AK , 1993 , 21251 AK , 1994 , 0 CA , 02/01 , 0 CA , 1991 , 0 It is challenging because the input records is very large and the program involves computing min and max and a unit conversion #Res3SAIS
  • 14. Data Provenance with Titian [VLDB ‘15] 99504 , 01/01/1992 , 1ft 99504 , 03/01/1992 , 0.1ft 99504 , 01/01/1993 , 70in 99504 , 03/01/1993 , 145mm 99504 , 01/01/1994 , 245mm 99504 , 01/01/1993 , 85mm 90031 , 02/01/1991 , 0mm AK , 01/01 , [304.8, 21336, 245, 85] AK , 03/01 , [30.5 , 145] AK , 1992 , [304.8 , 30.5] AK , 1993 , [21336, 145, 85] AK , 1994 , [245] CA , 02/01 , [0] CA , 1991 , [0] TextFile FlatMap GroupByKey Map Output AK , 01/01 , 304.8 AK , 1992 , 304.8 AK , 03/01 , 30.5 AK , 1992 , 30.5 AK , 01/01 , 21336 AK , 1993 , 21336 AK , 03/01 , 145 AK , 1993 , 145 AK , 01/01 , 245 AK , 1994 , 245 …. …. AK , 01/01 , 21251 AK , 03/01 , 114.5 AK , 1992 , 274.3 AK , 1993 , 21251 AK , 1994 , 0 CA , 02/01 , 0 CA , 1991 , 0 #Res3SAIS 14 It over-approximates the scope of failure-inducing inputs i.e. records in the faulty key-group are all marked as faulty
  • 15. Delta Debugging • Alice tests the output from different input configurations using a test function 15 Run 1 fails the test 99504 , 01/01/1992 , 1ft 99504 , 03/01/1992 , 0.1ft 99504 , 01/01/1993 , 70in 99504 , 03/01/1993 , 145mm 99504 , 01/01/1994 , 245mm 99504 , 01/01/1993 , 85mm 90031 , 02/01/1991 , 0mm AK , 01/01 , 304.8 AK , 1992 , 304.8 AK , 03/01 , 30.5 AK , 1992 , 30.5 AK , 01/01 , 21336 AK , 1993 , 21336 AK , 03/01 , 145 AK , 1993 , 145 AK , 01/01 , 245 AK , 1994 , 245 …. …. AK , 01/01 , [304.8, 21336, 245, 85] AK , 03/01 , [30.5 , 145] AK , 1992 , [304.8 , 30.5] AK , 1993 , [21336, 145, 85] AK , 1994 , [245] CA , 02/01 , [0] CA , 1991 , [0] AK , 01/01 , 21251 AK , 03/01 , 114.5 AK , 1992 , 274.3 AK , 1993 , 21251 AK , 1994 , 0 CA , 02/01 , 0 CA , 1991 , 0 TextFile FlatMap GroupByKey Map Output 1 2 def test(key:String, delta: Float) : Boolean = { delta < 6000 } #Res3SAIS
  • 16. Delta Debugging • DD is an systematic debugging procedure guided by a test function. DD is inefficient because it does not eliminate irrelevant input records upfront. 16 TextFile FlatMap GroupByKey Map Output #Res3SAIS 99504 , 01/01/1992 , 1ft 99504 , 03/01/1992 , 0.1ft 99504 , 01/01/1993 , 70in AK , 01/01 , 304.8 AK , 1992 , 304.8 AK , 03/01 , 30.5 AK , 1992 , 30.5 AK , 01/01 , 21336 AK , 1993 , 21336 AK , 01/01 , [304.8, 21336] AK , 03/01 , [30.5] AK , 1992 , [304.8 , 30.5] AK , 1993 , [21336] AK , 01/01 , 21031.2 AK , 03/01 , 0 AK , 1992 , 274.3 AK , 1993 , 0 Run 2 fails the test 1 2
  • 17. Delta Debugging • DD is an systematic debugging procedure guided by a test function. DD is inefficient because it does not eliminate irrelevant input records upfront. 17 TextFile FlatMap GroupByKey Map Output #Res3SAIS 99504 , 01/01/1992 , 1ft 99504 , 03/01/1992 , 0.1ft 99504 , 01/01/1993 , 70in AK , 01/01 , 304.8 AK , 1992 , 304.8 AK , 03/01 , 30.5 AK , 1992 , 30.5 AK , 01/01 , [304.8] AK , 03/01 , [30.5] AK , 1992 , [304.8 , 30.5] AK , 01/01 , 0 AK , 03/01 , 0 AK , 1992 , 274.3 Run 3 passes the test
  • 18. Delta Debugging • DD is an systematic debugging procedure guided by a test function. DD is inefficient because it does not eliminate irrelevant input records upfront. 18 TextFile FlatMap GroupByKey Map Output #Res3SAIS 99504 , 01/01/1992 , 1ft 99504 , 03/01/1992 , 0.1ft 99504 , 01/01/1993 , 70in AK , 01/01 , 21336 AK , 1993 , 21336 AK , 01/01 , [21336] AK , 1993 , [21336] AK , 01/01 , 0 AK , 1993 , 0 Run 4 passes the test
  • 19. Delta Debugging • DD is an systematic debugging procedure guided by a test function. DD is inefficient because it does not eliminate irrelevant input records upfront. 19 TextFile FlatMap GroupByKey Map Output #Res3SAIS 99504 , 01/01/1992 , 1ft 99504 , 03/01/1992 , 0.1ft 99504 , 01/01/1993 , 70in AK , 01/01 , 304.8 AK , 1992 , 304.8 AK , 01/01 , [304.8] AK , 1992 , [304.8] AK , 01/01 , 0 AK , 1992 , 0 Run 5 passes the test
  • 20. Delta Debugging • DD is an systematic debugging procedure guided by a test function. DD is inefficient because it does not eliminate irrelevant input records upfront. 20 TextFile FlatMap GroupByKey Map Output #Res3SAIS 99504 , 01/01/1992 , 1ft 99504 , 03/01/1992 , 0.1ft 99504 , 01/01/1993 , 70in AK , 03/01 , 30.5 AK , 1992 , 30.5 AK , 03/01 , [30.5] AK , 1992 , [30.5] AK , 03/01 , 0 AK , 1992 , 0 Run 6 passes the test
  • 21. Delta Debugging • DD is an systematic debugging procedure guided by a test function. DD is inefficient because it does not eliminate irrelevant input records upfront. 21 TextFile FlatMap GroupByKey Map Output #Res3SAIS 99504 , 01/01/1992 , 1ft 99504 , 03/01/1992 , 0.1ft 99504 , 01/01/1993 , 70in AK , 01/01 , 21336 AK , 1993 , 21336 AK , 01/01 , [21336] AK , 1993 , [21336] AK , 01/01 , 0 AK , 1993 , 0 Run 7 passes the test
  • 22. Delta Debugging • DD is an systematic debugging procedure guided by a test function. DD is inefficient because it does not eliminate irrelevant input records upfront. 22 TextFile FlatMap GroupByKey Map Output #Res3SAIS 99504 , 01/01/1992 , 1ft 99504 , 03/01/1992 , 0.1ft 99504 , 01/01/1993 , 70in AK , 03/01 , 30.5 AK , 1992 , 30.5 AK , 01/01 , 21336 AK , 1993 , 21336 AK , 03/01 , [30.5] AK , 1992 , [30.5] AK , 01/01 , [21336] AK , 1993 , [21336] AK , 03/01 , 0 AK , 1992 , 0 AK , 01/01 , 0 AK , 1993 , 0 Run 8 passes the test
  • 23. Delta Debugging • DD is an systematic debugging procedure guided by a test function. DD is inefficient because it does not eliminate irrelevant input records upfront. 23 TextFile FlatMap GroupByKey Map Output #Res3SAIS 99504 , 01/01/1992 , 1ft 99504 , 03/01/1992 , 0.1ft 99504 , 01/01/1993 , 70in AK , 01/01 , 304.8 AK , 1992 , 304.8 AK , 01/01 , 21336 AK , 1993 , 21336 AK , 01/01 , [304.8 , 21336] AK , 1992 , [304.8] AK , 1993 , [21336] AK , 01/01 , 21031.2 AK , 1992 , 0 AK , 1993 , 0 Run 9 fails the test
  • 24. Automated Debugging in DISC with BigSift 24#Res3SAIS Test Predicate Pushdown Prioritizing Backward Traces Bitmap based Test Memoization Input: A Spark Program, A Test Function Output: Minimum Fault- Inducing Input Records Data Provenance + Delta Debugging
  • 25. Optimization 1: Test Predicate Pushdown • Observation: During backward tracing, data provenance traces through all the partitions even though only a few partitions are faulty 25#Res3SAIS If applicable, BigSift pushes down the test function to test the output of combiners in order to isolate the faulty partitions. Test Test Test Test Test Test Test
  • 26. Optimization 2 :Prioritizing Backward Traces • Observation: The same faulty input record may contribute to multiple output records failing the test. 26#Res3SAIS 99504 , 01/01/1992 , 1ft 99504 , 03/01/1992 , 0.1ft 99504 , 01/01/1993 , 70in 99504 , 03/01/1993 , 145mm 99504 , 01/01/1994 , 245mm 99504 , 01/01/1993 , 85mm 90031 , 02/01/1991 , 0mm AK , 01/01 , [304.8, 21336, 245, 85] AK , 03/01 , [30.5 , 145] AK , 1992 , [304.8 , 30.5] AK , 1993 , [21336, 145, 85] AK , 1994 , [245] CA , 02/01 , [0] CA , 1991 , [0] TextFile FlatMap GroupByKey Map Output AK , 01/01 , 304.8 AK , 1992 , 304.8 AK , 03/01 , 30.5 AK , 1992 , 30.5 AK , 01/01 , 21336 AK , 1993 , 21336 AK , 03/01 , 145 AK , 1993 , 145 AK , 01/01 , 245 AK , 1994 , 245 …. …. AK , 01/01 , 21251 AK , 03/01 , 114.5 AK , 1992 , 274.3 AK , 1993 , 21251 AK , 1994 , 0 CA , 02/01 , 0 CA , 1991 , 0 In case of multiple faulty outputs, BigSift overlaps two backward traces to minimize the scope of fault-inducing input records
  • 27. Optimization 3: Bitmap based Test Memoization • Observation: Delta debugging may try running a program on the same subset of input redundantly. • BigSift leverages bitmap to compactly encode the offsets of original input to refer to an input subset #Res3SAIS 0 1 0 1 0 0 0 0 1 1 Input Data Bitmap ! Test Outcome We use a bitmap based test memoization technique to avoid redundant testing of the same input dataset.
  • 28. Demo
  • 29. Debugging Time 29#Res3SAIS Subject Program Running Time (sec) Debugging Time (sec) Subject Program Fault Original Job DD BigSift Improvement Movie Histogram Code 56.2 232.8 17.3 13.5X Inverted Index Code 107.7 584.2 13.4 43.6X Rating Histogram Code 40.3 263.4 16.6 15.9X Sequence Count Code 356.0 13772.1 208.8 66.0X Rating Frequency Code 77.5 437.9 14.9 29.5X College Student Data 53.1 235.3 31.8 7.4X Weather Analysis Data 238.5 999.1 89.9 11.1X Transit Analysis Code 45.5 375.8 20.2 18.6X BigSift provides up to a 66X speed up in isolating the precise fault-inducing input records, in comparison to the baseline DD
  • 30. Debugging Time 30#Res3SAIS Subject Program Running Time (sec) Debugging Time (sec) Subject Program Fault Original Job DD BigSift Improvement Movie Histogram Code 56.2 232.8 17.3 13.5X Inverted Index Code 107.7 584.2 13.4 43.6X Rating Histogram Code 40.3 263.4 16.6 15.9X Sequence Count Code 356.0 13772.1 208.8 66.0X Rating Frequency Code 77.5 437.9 14.9 29.5X College Student Data 53.1 235.3 31.8 7.4X Weather Analysis Data 238.5 999.1 89.9 11.1X Transit Analysis Code 45.5 375.8 20.2 18.6X On average, BigSift takes 62% less time to debug a single faulty output than the time taken for a single run on the entire data.
  • 31. Debugging Time 31#Res3SAIS 1 10 100 1000 10000 100000 1000000 10000000 100000000 1E+09 0 2000 4000 6000 8000 10000 12000 14000 #offault-inducinginput records Fault Localization Time (s) Sequence Count Delta Debugging BigSift Test Driven Data Provenance Data Provenance On average, BigSift takes 62% less time to debug a single faulty output than the time taken for a single run on the entire data.
  • 32. Fault Localizability over Data Provenance 32#Res3SAIS 143796 6487290 520904 23411 5800 15003060 2554788 350 2 1350 15 13 1 1 1 1 1 1 2 1 10 100 1000 10000 100000 1000000 10000000 100000000 Movie Historgram Inverted Index Rating Histogram Sequence Count Rating Frequency College Students Weather Analysis #offault-inducinginputrecords Data Provenance Test Driven Data Provenance BigSift & DD BigSift leverages DD after DP to continue fault isolation, achieving several orders of magnitude 103 to 107 better precision.
  • 33. Summary • BigSift is the first piece of work in automated debugging of big data analytics in DISC. • BigSift provides 103X – 107X more precision than data provenance in terms of fault localizability. • It provides up to 66X speed up in debugging time over baseline Delta Debugging. • In our evaluation we have observed that, on average, BigSift finds the faulty input in 62% less than the original job execution time. 33#Res3SAIS