Data-Intensive Scalable Science:
Beyond MapReduce
Bill Howe, UW
http://escience.washington.edu
Invited talk to the Scientific Computing and Imaging Institute at the University of Utah on data-intensive scalable computing applied to science.


Data-Intensive Scalable Science

  1. 1. Data-Intensive Scalable Science: Beyond MapReduce Bill Howe, UW …plus a bunch of people
  2. 2. [image slide]
  3. 3. http://escience.washington.edu
  4. 4. 01/30/15 Bill Howe, UW 4
  5. 5. 01/30/15 Bill Howe, UW 5
  6. 6. 01/30/15 Bill Howe, UW 6 Science is reducing to a database problem Old model: “Query the world” (Data acquisition coupled to a specific hypothesis) New model: “Download the world” (Data acquired en masse, in support of many hypotheses)  Astronomy: High-resolution, high-frequency sky surveys (SDSS, LSST, PanSTARRS)  Oceanography: high-resolution models, cheap sensors, satellites  Biology: lab automation, high-throughput sequencing,
  7. 7. 01/30/15 Bill Howe, UW 7 Two dimensions: # of bytes and # of apps [figure: Biology, Oceanography, and Astronomy resources plotted along these axes, including LSST, SDSS, PanSTARRS, Galaxy, BioMart, GEO, LANL HIV, Pathway Commons, IOOS, OOI]
  8. 8. 01/30/15 Bill Howe, UW 8 Roadmap  Introduction  Context: RDBMS, MapReduce, etc.  New Extensions for Science  Spatial Clustering  Recursive MapReduce  Skew Handling
  9. 9. 01/30/15 Bill Howe, UW 9 What Does Scalable Mean?  In the past: Out-of-core  “Works even if data doesn’t fit in main memory”  Now: Parallel  “Can make use of 1000s of independent computers”
  10. 10. 01/30/15 Bill Howe, UW 10 Taxonomy of Parallel Architectures [figure callouts: “Easiest to program, but $$$$” vs. “Scales to 1000s of computers”]
  11. 11. 01/30/15 Bill Howe, UW 11 Design Space [figure: throughput vs. latency axes spanning Internet services, private data centers, data-parallel systems, and shared memory; DISC is the focus of this talk] (slide src: Michael Isard, MSR)
  12. 12. 01/30/15 Bill Howe, UW 12 Some distributed algorithm… Map (Shuffle) Reduce
  13. 13. 01/30/15 Bill Howe, UW 13 MapReduce Programming Model  Input & Output: each a set of key/value pairs  Programmer specifies two functions:  Processes input key/value pair  Produces set of intermediate pairs  Combines all intermediate values for a particular key  Produces a set of merged output values (usually just one) map (in_key, in_value) -> list(out_key, intermediate_value) reduce (out_key, list(intermediate_value)) -> list(out_value) Inspired by primitives from functional programming languages such as Lisp, Scheme, and Haskell slide source: Google, Inc.
  14. 14. 01/30/15 Bill Howe, UW 14 Example: What does this do? map(String input_key, String input_value): // input_key: document name // input_value: document contents for each word w in input_value: EmitIntermediate(w, 1); reduce(String output_key, Iterator intermediate_values): // output_key: word // output_values: ???? int result = 0; for each v in intermediate_values: result += v; Emit(result); slide source: Google, Inc.
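The answer to the quiz above is word count. As a runnable illustration, here is a minimal, framework-free Python sketch of the same computation; the function names and the in-memory shuffle are mine, not Google's or Hadoop's:

    # Word count in the MapReduce style: map emits (word, 1), the shuffle
    # groups intermediate values by key, reduce sums the counts per word.
    from collections import defaultdict

    def map_fn(doc_name, doc_contents):
        for word in doc_contents.split():
            yield (word, 1)

    def reduce_fn(word, counts):
        return (word, sum(counts))

    def mapreduce(documents):
        groups = defaultdict(list)          # the "shuffle" phase
        for name, contents in documents.items():
            for key, value in map_fn(name, contents):
                groups[key].append(value)
        return dict(reduce_fn(k, vs) for k, vs in groups.items())

    if __name__ == "__main__":
        docs = {"d1": "to be or not to be", "d2": "to see or not to see"}
        print(mapreduce(docs))              # {'to': 4, 'be': 2, 'or': 2, 'not': 2, 'see': 2}

In a real MapReduce deployment the shuffle is a distributed group-by-key across machines; here it is just an in-memory dictionary.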
  15. 15. 01/30/15 Bill Howe, UW 15 Example: Rendering Bronson et al. Vis 2010 (submitted)
  16. 16. 01/30/15 Bill Howe, UW 16 Example: Isosurface Extraction Bronson et al. Vis 2010 (submitted)
  17. 17. 01/30/15 Bill Howe, UW 17 Large-Scale Data Processing  Many tasks process big data, produce big data  Want to use hundreds or thousands of CPUs  ... but this needs to be easy  Parallel databases exist, but they are expensive, difficult to set up, and do not necessarily scale to hundreds of nodes.  MapReduce is a lightweight framework, providing:  Automatic parallelization and distribution  Fault-tolerance  I/O scheduling  Status and monitoring
  18. 18. 01/30/15 Bill Howe, UW 18 What’s wrong with MapReduce?  Literally Map then Reduce and that’s it…  Reducers write to replicated storage  Complex jobs pipeline multiple stages  No fault tolerance between stages  Map assumes its data is always available: simple!  What else?
  19. 19. 01/30/15 Bill Howe, UW 19 Realistic Job = Directed Acyclic Graph Processing vertices Channels (file, pipe, shared memory) Inputs Outputs slide credit: Michael Isard, MSR
  20. 20. 01/30/15 Bill Howe, UW 20 Pre-Relational: if your data changed, your application broke. Early RDBMS were buggy and slow (and often reviled), but required only 5% of the application code. “Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed.” Key Ideas: Programs that manipulate tabular data exhibit an algebraic structure allowing reasoning and manipulation independently of physical data representation Relational Database History -- Codd 1979
  21. 21. 01/30/15 Bill Howe, UW 21 Key Idea: Data Independence  physical data independence (files and pointers vs. relations)  logical data independence (relations vs. views)  SELECT * FROM my_sequences  SELECT seq FROM ncbi_sequences WHERE seq = ‘GATTACGATATTA’;  f = fopen(‘table_file’); fseek(10030440); while (True) { fread(&buf, 1, 8192, f); if (buf == GATTACGATATTA) { . . .
  22. 22. 01/30/15 Bill Howe, UW 22 Key Idea: Indexes  Databases are especially, but not exclusively, effective at “Needle in Haystack” problems:  Extracting small results from big datasets  Transparently provide “old style” scalability  Your query will always* finish, regardless of dataset size.  Indexes are easily built and automatically used when appropriate: CREATE INDEX seq_idx ON sequence(seq); SELECT seq FROM sequence WHERE seq = ‘GATTACGATATTA’; *almost
  23. 23. 01/30/15 Bill Howe, UW 23 Key Idea: An Algebra of Tables select project join join Other operators: aggregate, union, difference, cross product
  24. 24. 01/30/15 Bill Howe, UW 24 Key Idea: Algebraic Optimization N = ((z*2)+((z*3)+0))/1 Algebraic Laws: 1. (+) identity: x+0 = x 2. (/) identity: x/1 = x 3. (*) distributes: (n*x+n*y) = n*(x+y) 4. (*) commutes: x*y = y*x Apply rules 1, 3, 4, 2: N = (2+3)*z two operations instead of five, no division operator Same idea works with the Relational Algebra!
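Worked out step by step, the arithmetic rewrite on this slide looks as follows (LaTeX notation; each step is annotated with the law it applies):

    \begin{align*}
    N &= \big((z \cdot 2) + ((z \cdot 3) + 0)\big) / 1 \\
      &= \big((z \cdot 2) + (z \cdot 3)\big) / 1   && \text{(1) $(+)$ identity} \\
      &= (z \cdot 2) + (z \cdot 3)                 && \text{(2) $(/)$ identity} \\
      &= z \cdot (2 + 3)                           && \text{(3) $(\cdot)$ distributes} \\
      &= (2 + 3) \cdot z                           && \text{(4) $(\cdot)$ commutes}
    \end{align*}

The same kind of law-driven rewriting, applied to relational operators instead of arithmetic, is what a query optimizer does.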
  25. 25. 01/30/15 Bill Howe, UW 25 Key Idea: Declarative Languages SELECT * FROM Order o, Item i WHERE o.item = i.item AND o.date = today() [logical plan: scan(Item i) and scan(Order o), select date = today(), join on o.item = i.item] Find all orders from today, along with the items ordered
  26. 26. 01/30/15 Bill Howe, UW 26 Shared Nothing Parallel Databases  Teradata  Greenplum  Netezza  Aster Data Systems  Datallegro (acquired by Microsoft)  Vertica  MonetDB (recently commercialized as “Vectorwise”)
  27. 27. 01/30/15 Bill Howe, UW 27 Example System: Teradata AMP = unit of parallelism
  28. 28. 01/30/15 Bill Howe, UW 28 Example System: Teradata SELECT * FROM Orders o, Lines i WHERE o.item = i.item AND o.date = today() [logical plan: scan(Item i) and scan(Order o), select date = today(), join on o.item = i.item] Find all orders from today, along with the items ordered
  29. 29. 01/30/15 Bill Howe, UW 29 Example System: Teradata [figure: AMPs 1–3 each scan their partition of Order o, apply select date=today(), and hash on h(item) to redistribute rows to AMPs 4–6]
  30. 30. 01/30/15 Bill Howe, UW 30 Example System: Teradata [figure: AMPs 1–3 likewise scan Item i and hash on h(item), sending rows to AMPs 4–6]
  31. 31. 01/30/15 Bill Howe, UW 31 Example System: Teradata [figure: AMPs 4–6 each run join on o.item = i.item; AMP k contains all orders and all lines where hash(item) = k]
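To make slides 28–31 concrete, here is a small Python sketch that simulates this hash-partitioned parallel join; it is my own illustration (row layouts, function names, and the use of Python's built-in hash are assumptions), not Teradata code:

    # Simulates the plan above: each source AMP filters its Order rows and
    # hash-partitions both tables on item; each join AMP then joins only the
    # rows that hash to it, so matching keys are guaranteed to be co-located.
    from collections import defaultdict

    NUM_JOIN_AMPS = 3   # AMPs 4-6 in the slides

    def hash_partition(rows, key):
        parts = defaultdict(list)
        for row in rows:
            parts[hash(row[key]) % NUM_JOIN_AMPS].append(row)
        return parts

    def parallel_join(order_partitions, item_partitions, today):
        shuffled_orders, shuffled_items = defaultdict(list), defaultdict(list)
        for amp_rows in order_partitions:                        # AMPs 1-3
            selected = [o for o in amp_rows if o["date"] == today]
            for k, rows in hash_partition(selected, "item").items():
                shuffled_orders[k].extend(rows)
        for amp_rows in item_partitions:                         # AMPs 1-3
            for k, rows in hash_partition(amp_rows, "item").items():
                shuffled_items[k].extend(rows)
        results = []
        for k in range(NUM_JOIN_AMPS):                           # AMPs 4-6
            index = {i["item"]: i for i in shuffled_items[k]}
            for o in shuffled_orders[k]:
                if o["item"] in index:
                    results.append({**o, **index[o["item"]]})
        return results

The essential property is the one the slides rely on: because both tables are partitioned by the same hash of the join key, each join AMP can work independently on its own slice.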
  32. 32. 01/30/15 Bill Howe, UW 32 MapReduce Contemporaries  Dryad (Microsoft)  Relational Algebra  Pig (Yahoo)  Near Relational Algebra over MapReduce  HIVE (Facebook)  SQL over MapReduce  Cascading  Relational Algebra  Clustera  U of Wisconsin  Hbase  Indexing on HDFS
  33. 33. 01/30/15 Bill Howe, UW 33 Example System: Yahoo Pig Pig Latin program
  34. 34. 01/30/15 Bill Howe, UW 34 MapReduce vs RDBMS  RDBMS  Declarative query languages  Schemas  Logical Data Independence  Indexing  Algebraic Optimization  Caching/Materialized Views  ACID/Transactions  MapReduce  High Scalability  Fault-tolerance  “One-person deployment”  (slide callouts pairing RDBMS features with MapReduce-world counterparts: HIVE, Pig, Dryad; Dryad, Pig, HIVE; Pig, (Dryad, HIVE); Hbase)
  35. 35. 01/30/15 Bill Howe, UW 35 Comparison
      System              | Data Model              | Prog. Model                          | Services
      GPL                 | *                       | *                                    | Typing (maybe)
      Workflow            | *                       | dataflow                             | typing, provenance, scheduling, caching, task parallelism, reuse
      Relational Algebra  | Relations               | Select, Project, Join, Aggregate, …  | optimization, physical data independence, data parallelism
      MapReduce           | [(key,value)]           | Map, Reduce                          | massive data parallelism, fault tolerance
      MS Dryad            | IQueryable, IEnumerable | RA + Apply + Partitioning            | typing, massive data parallelism, fault tolerance
      MPI                 | Arrays/Matrices         | 70+ ops                              | data parallelism, full control
  36. 36. 01/30/15 Bill Howe, UW 36 Roadmap  Introduction  Context: RDBMS, MapReduce, etc.  New Extensions for Science  Recursive MapReduce  Skew Handling
  37. 37. 01/30/15 Bill Howe, UW 37 PageRank
      Rank Table R0 (url, rank): www.a.com 1.0, www.b.com 1.0, www.c.com 1.0, www.d.com 1.0, www.e.com 1.0
      Linkage Table L (url_src, url_dest): (www.a.com, www.b.com), (www.a.com, www.c.com), (www.c.com, www.a.com), (www.e.com, www.c.com), (www.d.com, www.b.com), (www.c.com, www.e.com), (www.e.com, www.c.com), (www.a.com, www.d.com)
      Rank Table R3 (url, rank): www.a.com 2.13, www.b.com 3.89, www.c.com 2.60, www.d.com 2.60, www.e.com 2.13
      Each iteration joins Ri with L on Ri.url = L.url_src, divides each source's rank by its out-degree (γ_url COUNT(url_dest)), then projects url_dest and aggregates γ_url_dest SUM(rank) to produce Ri+1.
      [Bu et al. VLDB 2010 (submitted)]
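A minimal Python sketch of the recurrence described above (my own code; it follows the slide's formulation with no damping factor or normalization, so it will not necessarily reproduce the exact R3 values shown):

    # One iteration: join ranks with links on the source url, divide each
    # source's rank by its out-degree, and sum contributions per destination.
    from collections import defaultdict

    def pagerank_step(rank, links):
        out_degree = defaultdict(int)
        for src, _ in links:
            out_degree[src] += 1
        new_rank = defaultdict(float)
        for src, dest in links:                                  # Ri.url = L.url_src
            new_rank[dest] += rank.get(src, 0.0) / out_degree[src]
        return dict(new_rank)                                    # SUM(rank) grouped by url_dest

    rank = {u: 1.0 for u in ["www.a.com", "www.b.com", "www.c.com", "www.d.com", "www.e.com"]}
    links = [("www.a.com", "www.b.com"), ("www.a.com", "www.c.com"),
             ("www.c.com", "www.a.com"), ("www.e.com", "www.c.com"),
             ("www.d.com", "www.b.com"), ("www.c.com", "www.e.com"),
             ("www.e.com", "www.c.com"), ("www.a.com", "www.d.com")]
    for _ in range(3):
        rank = pagerank_step(rank, links)
    print(rank)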
  38. 38. 01/30/15 Bill Howe, UW 38 MapReduce Implementation [figure: each iteration runs a “join & compute rank” MapReduce job over Ri and the L-splits, followed by an aggregation job and a fixpoint-evaluation job; the client checks “Converged?” and, if not done, sets i = i+1 and resubmits] [Bu et al. VLDB 2010 (submitted)]
  39. 39. 01/30/15 Bill Howe, UW 39 What’s the problem?  L is loaded and shuffled in each iteration  L never changes [figure: mappers re-reading Ri and the L-splits every iteration] [Bu et al. VLDB 2010 (submitted)]
  40. 40. 01/30/15 Bill Howe, UW 40 HaLoop: Loop-aware Hadoop  Hadoop: Loop control in client program  HaLoop: Loop control in master node [figure: Hadoop vs. HaLoop architectures] [Bu et al. VLDB 2010 (submitted)]
  41. 41. 01/30/15 Bill Howe, UW 41 Feature: Inter-iteration Locality  Mapper Output Cache  K-means  Neural network analysis  Reducer Input Cache  Recursive join  PageRank  HITS  Social network analysis  Reducer Output Cache  Fixpoint evaluation [Bu et al. VLDB 2010 (submitted)]
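HaLoop implements these caches inside the framework; the toy Python below only illustrates the reducer-input-cache idea (names and structure are mine, not HaLoop's API): the invariant table L is partitioned once and reused, so only the small, changing table R is re-partitioned each iteration.

    # Partition the loop-invariant data once and cache it; re-partition only
    # the data that changes between iterations. step() consumes one partition
    # of each and returns new values for the keys in that partition.
    from collections import defaultdict

    def partition(pairs, num_parts):
        parts = defaultdict(list)
        for key, value in pairs:
            parts[hash(key) % num_parts].append((key, value))
        return parts

    def iterate_with_cache(R, L, num_parts, num_iters, step):
        L_cache = partition(L, num_parts)        # shuffled once, reused every iteration
        for _ in range(num_iters):
            R_parts = partition(list(R.items()), num_parts)
            new_R = {}
            for p in range(num_parts):
                new_R.update(step(dict(R_parts[p]), L_cache[p]))
            R = new_R
        return R

Without the cache, L would be re-read and re-shuffled on every pass, which is exactly the waste identified on slide 39.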
  42. 42. 01/30/15 Bill Howe, UW 42 HaLoop Architecture [Bu et al. VLDB 2010 (submitted)]
  43. 43. 01/30/15 Bill Howe, UW 43 Experiments  Amazon EC2  20, 50, 90 default small instances  Datasets  Billions of Triples (120GB)  Freebase (12GB)  Livejournal social network (18GB)  Queries  Transitive Closure  PageRank  k-means [Bu et al. VLDB 2010 (submitted)]
  44. 44. 01/30/15 Bill Howe, UW 44 Application Run Time  Transitive Closure  PageRank [Bu et al. VLDB 2010 (submitted)]
  45. 45. 01/30/15 Bill Howe, UW 45 Join Time  Transitive Closure  PageRank [Bu et al. VLDB 2010 (submitted)]
  46. 46. 01/30/15 Bill Howe, UW 46 Run Time Distribution  Transitive Closure  PageRank [Bu et al. VLDB 2010 (submitted)]
  47. 47. 01/30/15 Bill Howe, UW 47 Fixpoint Evaluation  PageRank [Bu et al. VLDB 2010 (submitted)]
  48. 48. 01/30/15 Bill Howe, UW 48 Roadmap  Introduction  Context: RDBMS, MapReduce, etc.  New Extensions for Science  Recursive MapReduce  Skew Handling
  49. 49. 01/30/15 Bill Howe, UW 49 N-body Astrophysics Simulation • 15 years in dev • 10^9 particles • Months to run • 7.5 million CPU hours • 500 timesteps • Big Bang to now Simulations from Tom Quinn’s Lab, work by Sarah Loebman, YongChul Kwon, Bill Howe, Jeff Gardner, Magda Balazinska
  50. 50. 01/30/15 Bill Howe, UW 50 Q1: Find Hot Gas SELECT id FROM gas WHERE temp > 150000
  51. 51. 01/30/15 Bill Howe, UW 51 Single Node: Query 1 169 MB 1.4 GB 36 GB [IASDS 09]
  52. 52. 01/30/15 Bill Howe, UW 52 Multiple Nodes: Query 1 Database Z [IASDS 09]
  53. 53. 01/30/15 Bill Howe, UW 53 Q4: Gas Deletion SELECT gas1.id FROM gas1 FULL OUTER JOIN gas2 ON gas1.id=gas2.id WHERE gas2.id IS NULL Particles removed between two timesteps [IASDS 09]
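The outer join plus the IS NULL filter is the standard anti-join idiom for “in gas1 but not in gas2.” In plain Python the same question is a set difference (a sketch with made-up ids):

    gas1_ids = {1, 2, 3, 4, 5}        # particle ids at the earlier timestep
    gas2_ids = {1, 2, 4}              # particle ids at the later timestep
    deleted = gas1_ids - gas2_ids     # {3, 5}: particles removed between the two timesteps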
  54. 54. 01/30/15 Bill Howe, UW 54 Single Node: Query 4 [IASDS 09]
  55. 55. 01/30/15 Bill Howe, UW 55 Multiple Nodes: Query 4 [IASDS 09]
  56. 56. 01/30/15 Bill Howe, UW 56 New Task: Scalable Clustering  Group particles into spatial clusters [Kwon SSDBM 2010]
  57. 57. 01/30/15 Bill Howe, UW 57 Scalable Clustering [Kwon SSDBM 2010]
  58. 58. 01/30/15 Bill Howe, UW 58 Scalable Clustering in Dryad [Kwon SSDBM 2010]
  59. 59. 01/30/15 Bill Howe, UW 59 Scalable Clustering in Dryad YongChul Kwon, Dylan Nunlee, Jeff Gardner, Sarah Loebman, Magda Balazinska, Bill Howe [figure: non-skewed vs. skewed runs]
  60. 60. 01/30/15 Bill Howe, UW 60 Roadmap  Introduction  Context: RDBMS, MapReduce, etc.  New Extensions for Science  Recursive MapReduce  Skew Handling
  61. 61. 01/30/15 Bill Howe, UW 61 Example: Friends of Friends [figure: space decomposed into partitions P1–P4; local clustering finds clusters C1–C6]
  62. 62. 01/30/15 Bill Howe, UW 62 Example: Friends of Friends [figure: merge P1 with P3 and P2 with P4; clusters spanning the boundaries are relabeled: C5 → C3, C6 → C4]
  63. 63. 01/30/15 Bill Howe, UW 63 Example: Friends of Friends [figure: merge the P1–P3 and P2–P4 halves; combining the relabelings C5 → C3, C6 → C4, and C4 → C3 yields C5 → C3 and C6 → C3]
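The relabelings on slides 62–63 (C5 → C3, C6 → C4, C4 → C3, …) are what a union-find structure maintains when clusters touching a partition boundary are merged. A minimal sketch of that bookkeeping (my own code, not the paper's implementation):

    # Union-find over cluster labels: merging boundary clusters eventually
    # maps C4, C5, and C6 all onto C3, as in the example.
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]        # path halving
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)    # keep the smaller label, e.g. C5 -> C3

    for a, b in [("C5", "C3"), ("C6", "C4"), ("C4", "C3")]:   # boundary merges
        union(a, b)

    print({c: find(c) for c in ["C3", "C4", "C5", "C6"]})
    # {'C3': 'C3', 'C4': 'C3', 'C5': 'C3', 'C6': 'C3'}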
  64. 64. 01/30/15 Bill Howe, UW 64 What’s going on?! Example: Unbalanced Computation [figure: task timelines for the Local FoF and Merge phases; the top red line runs for 1.5 hours vs. 5 minutes for the others]
  65. 65. 01/30/15 Bill Howe, UW 65 Which one is better?  How to decompose space?  How to schedule?  How to avoid memory overrun?
  66. 66. 01/30/15 Bill Howe, UW 66 Optimal Partitioning Plan Non-Trivial  Fine grained partitions  Less data = Less skew  Framework overhead dominates  Finding optimal point is time consuming  No guarantee of successful merge phase Can we find a good partitioning plan without trial and error?
  67. 67. 01/30/15 Bill Howe, UW 67 Skew Reduce Framework  User provides three functions  Plus (optionally) two cost functions S = sample of the input block; α and B are metadata about the block
  68. 68. 01/30/15 Bill Howe, UW 68 Skew Reduce Framework [figure: the input is sampled to build a static plan (using a user-supplied cost function; can run offline); the data then flows through Partition → Process → Merge → Finalize to produce the output. Process emits a local result plus data at the boundary and reconciliation state; Merge hierarchically reconciles local results into an intermediate reconciliation state; Finalize updates each local result and produces the final result]
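As a rough sketch of the contract described on slides 67–68 (the function names and the toy min/max bookkeeping below are illustrative assumptions, not SkewReduce's actual API), the user supplies a process function for the serial algorithm, a merge function to reconcile neighboring results, a finalize function, and optional cost functions evaluated on a sample S with block metadata α and B:

    # Illustrative shape of the three user functions plus a cost function.
    def process(block):
        """Serial algorithm on one partition; returns (local_result, boundary_state)."""
        local_result = sorted(block)
        boundary_state = (min(block), max(block)) if block else None
        return local_result, boundary_state

    def merge(state_a, state_b):
        """Hierarchically reconcile two intermediate reconciliation states."""
        if state_a is None:
            return state_b
        if state_b is None:
            return state_a
        return (min(state_a[0], state_b[0]), max(state_a[1], state_b[1]))

    def finalize(local_result, global_state):
        """Update a local result with the reconciled global state."""
        return {"items": local_result, "global_range": global_state}

    def process_cost(sample, alpha, block_size):
        """Optional: estimated cost of process() on a block, from the sample S
        and the block metadata (alpha, B); used to build the static plan."""
        return alpha * block_size * max(len(sample), 1)

The framework's job, per slide 68, is to evaluate the cost functions on a sample, derive the static partition plan, and then drive partition → process → merge → finalize.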
  69. 69. 01/30/15 Bill Howe, UW 69 Contribution: SkewReduce  Two algorithms: a serial algorithm and a merge algorithm  Two cost functions, one for each algorithm  Find a good partition plan and schedule [figure: serial algorithm and merge algorithm, each paired with its cost function]
  70. 70. 01/30/15 Bill Howe, UW 70 Does SkewReduce work?  Static plan yields 2 ~ 8 times faster running time
      Partition plan:     Coarse | Fine | Finer | Finest | Manual | Opt
      Runtime (hours):    14.1   | 8.8  | 4.1   | 5.7    | 2.0    | 1.6
      Runtime (minutes):  87.2   | 63.1 | 77.7  | 98.7   | -      | 14.1
  71. 71. 01/30/15 Bill Howe, UW 71 Data-Intensive Scalable Science
  72. 72. 01/30/15 Bill Howe, UW 72 BACKUP SLIDES
  73. 73. 01/30/15 Bill Howe, UW 73 Visualization + Data Management “Transferring the whole data generated … to a storage device or a visualization machine could become a serious bottleneck, because I/O would take most of the … time. A more feasible approach is to reduce and prepare the data in situ for subsequent visualization and data analysis tasks.” -- SciDAC Review We can no longer afford two separate systems
  74. 74. 01/30/15 Bill Howe, UW 74 Converging Requirements Core vis techniques (isosurfaces, volume rendering, …) Emphasis on interactive performance Mesh data as a first-class citizen Vis DB
  75. 75. 01/30/15 Bill Howe, UW 75 Converging Requirements Declarative languages Automatic data-parallelism Algebraic optimization Vis DB
  76. 76. 01/30/15 Bill Howe, UW 76 Converging Requirements Vis: “Query-driven Visualization” Vis: “In Situ Visualization” Vis: “Remote Visualization” DB: “Push the computation to the data” Vis DB
  77. 77. 01/30/15 Bill Howe, UW 77 Desiderata for a “VisDB”  New Data Model  Structured and Unstructured Grids  New Query Language  Scalable grid-aware operators  Native visualization primitives  New indexing and optimization techniques  “Smart” Query Results  Interactive Apps/Dashboards
