Analyzing and Querying Big Scientific Data
Thomas Heinis
Data-Driven Scientific Discovery
(Figure: Human Brain Project, SDSS, LHC ATLAS.)
Scientists Are Overwhelmed with Big Data
- Large Hadron Collider: 12 Petabytes / experiment
- Sloan Digital Sky Survey: 4 Petabytes / year
- Human Brain Project: ~100 Gigabytes / sec
Scientific Data Growth
(Figure: cumulative size of datasets [Petabytes] by year, 2004 to 2014, for Astronomy [NRAO], Physics [LHC], Simulation [ICESS] and Gene Sequencing [EBI].)
Scientific Data Grows Exponentially!
Data in the Simulation Sciences
(Figure: model COVERAGE and RESOLUTION, with an increasing level of detail.)
Dimensions are Multiplicative!
Increasing model size by an order of magnitude
What is the Human Brain Project?
A 10-year European initiative to understand the human brain, enabling advances in neuroscience, medicine and future computing.
A consortium of 250+ scientists and 135 research groups from over 80 institutions in more than 20 countries in Europe and beyond.
Human Brain Project - Vision
- Future Medicine
  - Symptom-based to biology-based classification
  - Unique signatures of diseases
  - Early diagnosis
- Future Neuroscience
  - Multi-level view of the brain
  - Causal chain of events from genes to cognition
- Future Computing
  - Supercomputing as a scientific method
  - Human-like intelligence
Brain Simulation – Wet Lab
(Figure: neuron structure & electrophysiological properties.)
Simulating the Brain
Simulation Science Data Challenges
(Diagram: Simulation, Observational Data and Post-Simulation Data; tasks: Spatial Modeling, Spatial Analysis, Static 3D Exploration, Interactive 3D Exploration, Dynamic 3D Exploration.)
Need Scalable Spatial Access Methods
Static Exploration
(Figure: single neuron, 3D neural tissue model, and a 3D spatial range query.)
Efficient Spatial Index is Crucial
State-of-the-Art Spatial Indexes
R-Tree: Hierarchy of Minimum Bounding Rectangles (MBRs)
R-Tree variants:
- Hilbert packed R-Tree
- STR R-Tree
- PR-Tree
(Figure: overlapping MBRs intersected by a range query.)
Structural Overlap Degrades Performance
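To make the overlap problem concrete, here is a minimal Python sketch of an R-Tree range query, assuming a toy in-memory tree with axis-aligned 3D MBRs. The `Node` class, the `leaf` helper and the visit counter are hypothetical illustrations, not the API of any R-Tree library, and leaf MBRs stand in for the objects themselves. A range query must descend into every child whose MBR intersects the query, so overlapping MBRs force it into several subtrees.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Axis-aligned 3D box: ((xmin, ymin, zmin), (xmax, ymax, zmax)).
Point = Tuple[float, float, float]
Box = Tuple[Point, Point]


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap (touching counts)."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


@dataclass
class Node:
    mbr: Box
    children: List["Node"] = field(default_factory=list)
    obj_id: Optional[int] = None  # set only on leaf entries (data objects)


def range_query(node: Node, query: Box, stats: dict) -> List[int]:
    """Return ids of ALL objects whose MBR intersects the query.

    stats["visited"] counts inspected nodes, a rough proxy for page reads.
    """
    stats["visited"] = stats.get("visited", 0) + 1
    if node.obj_id is not None:
        return [node.obj_id]
    results: List[int] = []
    for child in node.children:
        # Overlapping sibling MBRs can all pass this test, so the search
        # fans out into several subtrees, even ones with no results.
        if intersects(child.mbr, query):
            results.extend(range_query(child, query, stats))
    return results


def leaf(obj_id: int, lo: Point, hi: Point) -> Node:
    return Node(mbr=(lo, hi), obj_id=obj_id)


# Toy tree: the MBRs of a and b overlap in the region [4, 6]^3.
a = Node(((0, 0, 0), (6, 6, 6)),
         [leaf(1, (0, 0, 0), (1, 1, 1)), leaf(2, (5, 5, 5), (6, 6, 6))])
b = Node(((4, 4, 4), (10, 10, 10)),
         [leaf(3, (4, 4, 4), (4.5, 4.5, 4.5)), leaf(4, (9, 9, 9), (10, 10, 10))])
root = Node(((0, 0, 0), (10, 10, 10)), [a, b])

stats: dict = {}
print(range_query(root, ((5, 5, 5), (5.5, 5.5, 5.5)), stats), stats)
```

Here the query lies where `a` and `b` overlap, so both subtrees are searched although only `a` holds a result; the script prints `[2] {'visited': 4}`.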
Scalability Challenge
(Figure: range query time [seconds] vs. dataset density [million elements per unit volume, 50 to 450] for Hilbert R-Tree, STR R-Tree and PR-Tree.)
Dataset: 100K neurons, 450 million 3D cylinders, 27 GB on disk.
Range queries: 500 uniform random queries per experiment.
Spatial density increases with dataset size.
State of the Art Does Not Scale with Density
FLAT: A Two-Phase Spatial Index
Key idea: two phases, each independent of overlap:
1) SEEDING: find any one object
2) CRAWLING: traverse the neighborhood
Crawling requires reachability.
(Figure: earthquake simulation datasets, where this is no problem.)
Use Connectivity To Avoid Overlap
FLAT: Reachability Problem
Requirements:
- Connectivity: for accessing neighboring objects in the data
- Convex dataset geometry: never crawl outside the query bound
Not every dataset satisfies these requirements!
(Figure: datasets with no connectivity, and with no path inside the query.)
FLAT: Reachability
Index building:
1) Partitioning: group spatially close elements
2) Linking: connect neighboring partitions
Add Connectivity → Enable Recursive Crawling
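The two index-building steps can be sketched as follows, under loudly simplified assumptions: objects are 3D boxes, partitioning is a crude sort-and-chunk stand-in for a real spatial bulk-loading scheme, and two pages are linked when their MBRs touch or overlap. `build_flat_pages` and its parameters are hypothetical names, not FLAT's actual code.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]


def mbr_of(boxes: List[Box]) -> Box:
    """Minimum bounding box enclosing a list of boxes."""
    lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
    return lo, hi


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap (touching counts)."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


def build_flat_pages(objects: List[Box], page_size: int):
    """1) Partition objects into pages, 2) link pages whose MBRs touch.

    Partitioning: sort by box center (x, then y, then z) and cut into
    chunks of page_size (an assumption, for illustration only).
    Linking: O(P^2) pairwise MBR test, kept for readability.
    """
    def center(b: Box) -> Point:
        return tuple((b[0][i] + b[1][i]) / 2 for i in range(3))

    order = sorted(range(len(objects)), key=lambda k: center(objects[k]))
    pages = [order[i:i + page_size] for i in range(0, len(order), page_size)]
    page_mbrs = [mbr_of([objects[k] for k in page]) for page in pages]

    neighbors: Dict[int, List[int]] = {p: [] for p in range(len(pages))}
    for p, q in combinations(range(len(pages)), 2):
        if intersects(page_mbrs[p], page_mbrs[q]):
            neighbors[p].append(q)
            neighbors[q].append(p)
    return pages, page_mbrs, neighbors
```

For six unit cubes in a row and `page_size=2`, this yields three pages linked in a chain (page 1 neighbors pages 0 and 2). A real implementation would use a proper spatial partitioning and an index to find neighboring pages instead of the quadratic loop.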
FLAT: Seeding Phase
R-Tree for seeding, but will it scale with density?
- Range query: find ALL elements inside the query
- Seed query: find ANY ONE element inside the query
Where MBRs overlap, the seed query picks one child arbitrarily, so the seeding phase avoids the overlap overhead of the R-Tree.
Seeding is fast: page reads ≈ height of the tree.
(Figure: a seed query on the R-Tree reaching a seed element and its partition.)
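A seed query differs from a range query only in stopping at the first hit. Below is a minimal sketch on a toy R-Tree, with hypothetical `Node` and `seed_query` names: it tries intersecting children in stored order (one arbitrary choice) and, as an assumption beyond the slide, backtracks if the chosen subtree holds no qualifying object.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Axis-aligned 3D box: ((xmin, ymin, zmin), (xmax, ymax, zmax)).
Point = Tuple[float, float, float]
Box = Tuple[Point, Point]


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap (touching counts)."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


@dataclass
class Node:
    mbr: Box
    children: List["Node"] = field(default_factory=list)
    obj_id: Optional[int] = None  # set only on leaf entries (data objects)


def seed_query(node: Node, query: Box, stats: dict) -> Optional[int]:
    """Return the id of ANY ONE object intersecting the query, or None.

    The search stops at the first hit and only backtracks if a chosen
    subtree turns out to hold no match.
    """
    stats["visited"] = stats.get("visited", 0) + 1
    if node.obj_id is not None:
        return node.obj_id
    for child in node.children:
        if intersects(child.mbr, query):
            hit = seed_query(child, query, stats)
            if hit is not None:
                return hit  # first seed found: stop searching
    return None


def leaf(obj_id: int, lo: Point, hi: Point) -> Node:
    return Node(mbr=(lo, hi), obj_id=obj_id)


# Toy tree: the MBRs of a and b overlap in the region [4, 6]^3.
a = Node(((0, 0, 0), (6, 6, 6)),
         [leaf(1, (0, 0, 0), (1, 1, 1)), leaf(2, (5, 5, 5), (6, 6, 6))])
b = Node(((4, 4, 4), (10, 10, 10)),
         [leaf(3, (4, 4, 4), (4.5, 4.5, 4.5)), leaf(4, (9, 9, 9), (10, 10, 10))])
root = Node(((0, 0, 0), (10, 10, 10)), [a, b])

stats: dict = {}
print(seed_query(root, ((5, 5, 5), (5.5, 5.5, 5.5)), stats), stats)
```

This prints `2 {'visited': 3}`: one node per level (root, `a`, leaf). A full range query over the same box would also descend into `b`, because its MBR overlaps the query too.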
FLAT: Crawling Phase
Starting from the seed page, the neighbor links are used for a recursive graph traversal.
Linear complexity in terms of graph edges
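The crawling phase amounts to a breadth-first traversal of the page graph. A minimal sketch, assuming pages of object ids with precomputed MBRs and neighbor lists (all names hypothetical) and a seed page already found by the seeding phase:

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap (touching counts)."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


def crawl(objects: List[Box], pages: List[List[int]], page_mbrs: List[Box],
          neighbors: Dict[int, List[int]], seed_page: int, query: Box) -> Set[int]:
    """Breadth-first crawl over linked pages, starting from the seed page.

    Only neighbor pages whose MBR intersects the query are expanded, and
    each page is expanded at most once, so the work is linear in the
    page-graph edges reached.
    """
    results: Set[int] = set()
    seen = {seed_page}
    frontier = deque([seed_page])
    while frontier:
        page = frontier.popleft()
        for obj in pages[page]:
            if intersects(objects[obj], query):
                results.add(obj)
        for nxt in neighbors[page]:
            if nxt not in seen and intersects(page_mbrs[nxt], query):
                seen.add(nxt)
                frontier.append(nxt)
    return results
```

With six unit cubes in a row split into three chained pages, starting from page 1 with a query spanning x = 1.5 to 4.5, the crawl also reaches pages 0 and 2 and returns objects `{1, 2, 3, 4}`.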
FLAT: Performance Evaluation
(Figure: range query time [seconds] vs. dataset density [million elements per unit volume, 50 to 450] for Hilbert R-Tree, STR R-Tree, PR-Tree and FLAT; annotated speedup: 7.8x.)
FLAT Decouples Execution Time from Density
FLAT: Scalability
(Figure: time per result object [ms] vs. dataset density [million elements per unit volume, 50 to 450] for Hilbert R-Tree, STR R-Tree, PR-Tree and FLAT.)
Seeding cost amortizes as result cardinality increases.
Trend is "FLAT": Scales With Density
FLAT: iPad Implementation
http://www.youtube.com/watch?v=zaUEARq-IY0
Interactive Exploration
(Figure: spatial range query sequences along a guiding path in the bronchial tree of the lung, the arterial tree of the heart, and a neural network.)
Guiding paths are not known in advance.
Interactive execution of query sequences.
Guided Analysis Is Ubiquitous in Scientific Applications
Interactive Query Execution
(Figure: timeline of the 1st, 2nd and 3rd queries, alternating between retrieving query results from disk and processing them on the CPU. The path is decided only after processing the results, which leaves a prefetching opportunity between queries: predict the next query and prefetch its data.)
Predictive Prefetching Hides Data Retrieval Cost
Predictive Prefetching
- Predict the next query location in the sequence
- Prefetch the data of the next query into a prefetch cache
Existing techniques extrapolate past query locations:
- Exponential Weighted Moving Average (EWMA)
- Straight Line
- Hilbert Prefetching
(Figure: cache hit rate [%] vs. query volume [µm³, 10k to 220k] for small- and large-volume queries; neuroscience dataset, 25 queries in sequence.)
Not Efficient With Arbitrary Query Volume!
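For reference, two of the extrapolation-based predictors named above can be sketched on the sequence of past query centers. This is one plausible formulation, not necessarily the one used in the evaluated systems: straight-line continues the last displacement, and EWMA averages past displacements with weight `alpha` (a hypothetical parameter) on the most recent one. Neither looks at what the queries actually returned.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def straight_line(centers: Sequence[Point]) -> Point:
    """Predict the next query center by repeating the last displacement."""
    if len(centers) < 2:
        raise ValueError("need at least two past query centers")
    (x0, y0, z0), (x1, y1, z1) = centers[-2], centers[-1]
    return (2 * x1 - x0, 2 * y1 - y0, 2 * z1 - z0)


def ewma(centers: Sequence[Point], alpha: float = 0.5) -> Point:
    """Predict the next center from an exponentially weighted moving
    average of past displacement vectors (alpha weights the newest step)."""
    if len(centers) < 2:
        raise ValueError("need at least two past query centers")
    steps: List[Point] = [
        tuple(b[i] - a[i] for i in range(3))
        for a, b in zip(centers, centers[1:])
    ]
    avg = list(steps[0])
    for step in steps[1:]:
        avg = [alpha * step[i] + (1 - alpha) * avg[i] for i in range(3)]
    last = centers[-1]
    return tuple(last[i] + avg[i] for i in range(3))
```

After a right turn, e.g. centers `(0,0,0) → (1,0,0) → (1,1,0)`, straight-line predicts `(1, 2, 0)` while EWMA with `alpha=0.5` predicts `(1.5, 1.5, 0.0)`. Both predict only a location, so how much data to prefetch around it still depends on guessing the query volume.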
SCOUT: Content-Aware Prefetching
Key Insight: Use previous query content!
Approach:
1. Inspect query results
2. Identify the guiding path
3. Predict the next query using the guiding path
Need to Identify the Guiding Path
SCOUT: How Paths Are Defined
Query results = many primitive spatial objects.
Idea: graph framework G(V, E), where vertices = spatial objects and edges connect nearby objects.
→ Independence from the data representation
Exact graph: N² comparisons!
Grid-hash-based construction → approximate graph representation
(Figure: range query, paths, candidate set.)
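A minimal sketch of grid-hash graph construction, assuming each spatial object is a line segment (for example, a cylinder's axis) and that two objects are connected when an endpoint of each falls into the same grid cell. The function names and the `cell` size parameter are hypothetical; the point is that bucketing replaces the N² pairwise distance tests with one pass over the objects.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]
Segment = Tuple[Point, Point]  # e.g. the axis of a cylinder
Cell = Tuple[int, int, int]


def cell_of(p: Point, cell: float) -> Cell:
    """Integer grid coordinates of the cell containing point p."""
    return tuple(int(c // cell) for c in p)


def approx_graph(segments: List[Segment], cell: float) -> Dict[int, Set[int]]:
    """Approximate proximity graph via grid hashing.

    Each segment is hashed into the cells of its two endpoints; segments
    sharing a cell become neighbors. Edges near cell boundaries may be
    missed, and far-apart segments in one large cell may be linked.
    """
    buckets: Dict[Cell, List[int]] = defaultdict(list)
    for idx, (a, b) in enumerate(segments):
        for key in {cell_of(a, cell), cell_of(b, cell)}:
            buckets[key].append(idx)
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(segments))}
    for members in buckets.values():
        for u, v in combinations(members, 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph
```

Consecutive segments of a branch share an endpoint and therefore a cell, so a chain of segments becomes a path in the graph, while an isolated segment far away stays disconnected.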
SCOUT: Guiding Path Identification
Iterative candidate pruning.
Key insight: the guiding path goes through all queries!
(Figure: queries n, n+1, n+2 and n+3 along the guiding path, followed by the predicted query.)
Longer Sequence → Better Prediction
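Iterative candidate pruning can be sketched as a running set intersection, under a simplifying assumption: each query's result graph has already been reduced to a set of path ids (hypothetical labels such as neuron branches). Since the guiding path passes through all queries, any candidate missing from a query's results is dropped.

```python
from typing import Iterable, List, Set


def prune_candidates(query_results: Iterable[Set[str]]) -> List[Set[str]]:
    """Return the candidate guiding paths after each query in the sequence.

    query_results[n] holds the (hypothetical) path ids found in query n.
    The candidate set after query n is the intersection of the path ids
    seen in queries 0..n, so it can only shrink as the sequence grows.
    """
    history: List[Set[str]] = []
    candidates: Set[str] = set()
    for n, paths in enumerate(query_results):
        candidates = set(paths) if n == 0 else candidates & paths
        history.append(set(candidates))
    return history
```

With results `{a, b, c}`, `{a, c, d}` and `{c, e}`, the candidates narrow from three paths to two and then to the single path `c`, which matches the slide's point that longer sequences give better predictions.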
SCOUT: Where to Prefetch
- Prefetch duration is not known in advance.
- Query dimensions are not known in advance.
Idea: incremental prefetching: repeatedly prefetch growing regions by extrapolating the guiding path where it exits the nth query in the sequence.
(Figure: the guiding path exits the query; prefetch regions p1, p2, ..., pn.)
Policy = safest region first
Independence from query size
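The incremental prefetching policy can be sketched as a nearest-first list of growing regions along the extrapolated guiding path. The cube-shaped regions, the linear growth, and the `step` and `half_width` parameters are assumptions for illustration; the slide only states that growing regions are prefetched, safest first.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]


def prefetch_regions(exit_point: Point, direction: Point, step: float,
                     half_width: float, count: int) -> List[Box]:
    """Return growing prefetch boxes along the extrapolated guiding path.

    Region k (k = 1..count) is a cube centered k * step beyond the exit
    point along the path direction, with half-width k * half_width.
    The list is ordered nearest-first ("safest region first"), so
    prefetching can simply stop whenever the next query arrives.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = tuple(c / norm for c in direction)
    regions: List[Box] = []
    for k in range(1, count + 1):
        center = tuple(exit_point[i] + k * step * unit[i] for i in range(3))
        r = k * half_width
        regions.append((tuple(c - r for c in center),
                        tuple(c + r for c in center)))
    return regions
```

Because regions are fetched one at a time in this order, the amount prefetched adapts to however long the user takes, without needing to know the next query's size.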
SCOUT: Prediction Accuracy
(Figure: cache hit rate [%] of EWMA, Straight Line, Hilbert and SCOUT on two visualization sequences (Sequence 1 and Sequence 2), with query volumes of 80K µm³ and 20K µm³ and a sequence length of 32; annotated speedups: 2x and 14.7x.)
Cache hit rate = (amount of data retrieved from cache / total amount of data retrieved) × 100
Dataset: 100K neurons, 450 million 3D cylinders, 27 GB on disk.
72% to 91% Prediction Accuracy
SCOUT speeds up sequences by up to 14.7x
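The cache hit rate definition translates directly into code. This sketch aggregates over a whole query sequence by summing data volumes; that choice is an assumption, since averaging per-query rates instead would weight small queries differently.

```python
from typing import Iterable, Tuple


def cache_hit_rate(per_query: Iterable[Tuple[float, float]]) -> float:
    """Cache hit rate [%] over a query sequence.

    per_query yields (data served from the prefetch cache, total data
    retrieved) for each query, in any consistent unit (e.g. bytes).
    """
    cached = total = 0.0
    for from_cache, retrieved in per_query:
        cached += from_cache
        total += retrieved
    if total == 0:
        raise ValueError("no data retrieved")
    return cached / total * 100
```

For two queries that each retrieve 20 units, with 10 and 20 units served from the cache, the rate is 30 / 40 × 100 = 75%.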
SCOUT: Scalability
(Figure: SCOUT cache hit rate [%] vs. dataset size [# of spatial objects, 50M to 450M].)
SCOUT scales with increasing dataset size.
SCOUT Overhead
(Figure: timeline of disk and CPU activity for the 2nd and 3rd queries, with prediction and prefetching interleaved with retrieving and processing query results; time [sec] spent on prediction vs. retrieving query results for dataset sizes of 50M to 450M spatial objects.)
Prediction overhead: 15-16%
Dynamic Exploration
Mesh: Collection of 3D Connected Polyhedra
(Figure: polyhedra built from 3D vertices, connected polyhedra with shared faces, and a volumetric mesh model.)
Meshes enable high-precision 3D models.
Challenge: Monitoring Memory-Resident Spatial Mesh Models
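To see how shared faces encode connectivity, here is a minimal sketch that builds the vertex adjacency graph of a tetrahedral mesh given as vertex-index quadruples (a hypothetical input format; real mesh files vary).

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Tet = Tuple[int, int, int, int]  # indices of a tetrahedron's four vertices


def mesh_graph(tets: List[Tet]) -> Dict[int, Set[int]]:
    """Vertex adjacency graph of a tetrahedral mesh.

    Every pair of vertices in a tetrahedron is joined by one of its six
    edges. Tetrahedra that share a face share its three vertices and
    edges, so neighboring polyhedra end up connected automatically.
    """
    graph: Dict[int, Set[int]] = {}
    for tet in tets:
        for u, v in combinations(tet, 2):
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
    return graph
```

For two tetrahedra sharing the face (1, 2, 3), the graph has 5 vertices and 9 edges, since the three edges of the shared face are stored once.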
Monitoring Mesh Simulations
(Figure: at each simulation time step (1, 2, 3, ...) the simulation updates the mesh and a monitor issues queries.)
Problem: Efficiently Execute Range Queries
Data Challenge
- Highly dynamic: unpredictable mesh movement; updates affect the entire dataset (figure: timestep 1 vs. timestep 2)
- Mesh detail: increases with dataset size (figure: now vs. future)
Need: Solution That Scales
State of the Art
- Moving object indexes (TPR-Tree, STRIPES): mesh movement is inherently unpredictable
- Static spatial indexes (R-Tree, LUR-Tree, QU-Trade)
- Linear scan
(Figure: approaches ranging from coarse grained to fine grained.)
Neither Scales with Size nor Detail!
Performance Evaluation
Setup: neural mesh dataset with 1.32 billion tetrahedra (33 GB); 15 queries per time step, 60 simulation time steps.
(Figure: each simulation time step brings massive updates but only a few monitoring queries.)
(Figure: total query response time [sec] of Linear Scan, OCTREE, LUR-Tree and QU-Trade for the statistical analysis and microbenchmark workloads; annotated index maintenance shares: 99.5%, 80% and 72%.)
Not enough queries to invest in index maintenance.
Linear Scan Outperforms Indexed Approaches
Can We Do Better?
- Reduce search space → index approach
- No maintenance → linear scan
- Mesh connectivity → query execution
Best of Both Worlds
OCTOPUS: Idea
- Do not rely on an external data structure → directly use the in-memory mesh data
- Mesh graph traversal → retrieve results in spatial proximity
(Figure: mesh graph of vertices and edges.)
Key Insight: Use Mesh Connectivity to Retrieve Query Results!
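The key insight can be sketched as a breadth-first traversal over the in-memory mesh graph, assuming the graph is a vertex adjacency dictionary, positions are read from the current time step, and at least one start vertex inside the query is already known (all names hypothetical). Because it only expands through vertices inside the query, it can miss results when the part of the mesh inside the query is not connected, which is exactly the non-convex case.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]


def inside(p: Point, box: Box) -> bool:
    """True if point p lies in the axis-aligned box (bounds inclusive)."""
    return all(box[0][i] <= p[i] <= box[1][i] for i in range(3))


def crawl_mesh(graph: Dict[int, Set[int]], positions: List[Point],
               starts: List[int], query: Box) -> Set[int]:
    """Collect mesh vertices inside the query by traversing mesh edges.

    No external index is used: positions are read at query time, so the
    mesh can move freely between time steps without index maintenance.
    """
    result = {s for s in starts if inside(positions[s], query)}
    frontier = deque(result)
    while frontier:
        v = frontier.popleft()
        for w in graph.get(v, ()):
            if w not in result and inside(positions[w], query):
                result.add(w)
                frontier.append(w)
    return result
```

On a path of five vertices at x = 0, 1, 2, 3, 4 and a query covering x = 0.5 to 3.5, starting from vertex 2 returns `{1, 2, 3}`.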
OCTOPUS
(Figure: a range query executed on the mesh at time steps 1, 2 and 3.)
Update-Oblivious Query Execution
What About Non-Convex Meshes?
OCTOPUS: Non-Convex Meshes
No reachability! → Surface scan
Using the Mesh Surface Guarantees Accuracy
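The surface scan relies on knowing which vertices lie on the mesh boundary. One standard way to find them in a tetrahedral mesh, sketched below with hypothetical function names, is to count faces: a triangle shared by two tetrahedra is interior, and one belonging to a single tetrahedron is on the surface. Scanning only those vertices against the query yields start points for the traversal.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]
Tet = Tuple[int, int, int, int]


def surface_vertices(tets: List[Tet]) -> Set[int]:
    """Vertices on the mesh boundary: members of faces used by one tet only."""
    faces = Counter(frozenset(f) for tet in tets for f in combinations(tet, 3))
    return {v for face, n in faces.items() if n == 1 for v in face}


def surface_scan(tets: List[Tet], positions: List[Point], query: Box) -> List[int]:
    """Start vertices for traversal: surface vertices inside the query."""
    return sorted(
        v for v in surface_vertices(tets)
        if all(query[0][i] <= positions[v][i] <= query[1][i] for i in range(3))
    )
```

In a tetrahedron split into four around an interior vertex 4, vertices 0 to 3 form the surface. For a query around the corner at vertex 0, only vertex 0 is both on the surface and inside the query; interior vertex 4 would be reached by traversal rather than by the scan. Since this boundary test depends only on which vertices form each tetrahedron, moving vertex positions does not change the result.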
OCTOPUS: Mesh Deformation
Deformation: zero cost of surface maintenance.
(Figure: the mesh graph changing across time steps 1, 2 and 3.)
Scales With Massive Updates
OCTOPUS: Mesh Detail
- Surface points: quadratic increase
- Non-surface points: cubic increase
Scalability: the surface grows more slowly than the volume (and therefore the dataset size)!
Scales with Mesh Resolution
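The scaling argument can be checked on a regular n × n × n grid of points, an idealized stand-in for a mesh: the boundary holds n³ - (n - 2)³ = 6n² - 12n + 8 points, so the share of points a surface scan touches shrinks roughly as 6/n.

```python
def surface_fraction(n: int) -> float:
    """Fraction of points on the boundary of an n x n x n grid (n >= 2).

    Boundary points = n**3 - (n - 2)**3 = 6n^2 - 12n + 8 (quadratic),
    while all points = n**3 (cubic), so the fraction falls as n grows.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    total = n ** 3
    interior = (n - 2) ** 3
    return (total - interior) / total
```

For n = 10 about 49% of the points lie on the surface, but for n = 100 fewer than 6% do.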
OCTOPUS: Performance
(Figure: total query execution time [sec] of OCTOPUS, Linear Scan, OCTREE, LUR-Tree and QU-Trade on the visualization, statistical analysis and microbenchmark workloads; annotated speedups: 8X and 7.3X.)
7.3-8X Speedup
OCTOPUS: Scalability
Setup: uniform random queries, 15 per time step, 60 time steps.
(Figure: OCTOPUS breakdown of total query execution time [sec] into graph traversal and surface scan vs. mesh detail [0.13 to 1.32 billion tetrahedra]; annotated shares: 64% and 41%.)
(Figure: total query execution time [sec] of Linear Scan and OCTOPUS vs. mesh detail [0.13 to 1.32 billion tetrahedra]; annotated speedups: 8X and 10X.)
Scales with Mesh Detail
Algorithm Overview
(Diagram: Simulation, Observational Data and Post-Simulation Data; tasks: Spatial Modeling, Spatial Analysis, Model Validation.)
- OCTOPUS: ICDE'14
- FLAT: ICDE'12
- SCOUT: VLDB'12
- TOUCH: SIGMOD'13
- GIPSY: SSDBM'13
Impact
Human Brain Project:
- Part of the toolset used every day
- February 2013: first 10 million neuron model built
- Still 4 orders of magnitude smaller than the human brain
General applicability:
- Material Sciences
- Astronomy
- Geographical Information Systems
(Figure: model size [GB] vs. simulation size [# neurons, 1K to 10M] for models from 2006, 2008, 2010 and 2013; the 2013 model reaches 2.5 TB.)
Future Challenges
Enable Scientific Breakthroughs via Scalable Data Analysis!
- Address scientific data trends:
  → Progressively complex datasets
  → Increasingly complex scientific queries
  → Modern hardware
- Approximate queries on big data:
  → Use a mechanism of learning & forgetting to manage data synopses
HBP Data Management Challenges
- Data privacy/anonymization
- Scalable querying of petascale data
- Cloud analytics
- Quick & efficient access to raw data
- Distributed workflow execution
- Provenance/reproducibility
- Data personalization
Conclusions
- Enabling data exploration is key to scientific discovery.
- Prior spatial access methods do not scale with data growth.
- Use spatial connectivity to achieve scalability:
  → Explicitly added (FLAT & TOUCH)
  → Implicitly present in the dataset (OCTOPUS & SCOUT)
- Many exciting big data management
Thank You!
Collaborators:
Farhan Tauheed, Anastasia Ailamaki,
Felix Schürmann, Henry Markram,
Sadegh Nobari, Panagiotis Karras,
Laurynas Biveinis, Mirjana Pavlovic

 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
 
TraitCapture: NextGen Monitoring and Visualization from seed to ecosystem
TraitCapture: NextGen Monitoring and Visualization from seed to ecosystemTraitCapture: NextGen Monitoring and Visualization from seed to ecosystem
TraitCapture: NextGen Monitoring and Visualization from seed to ecosystem
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Big Sky Earth 2018 Introduction to machine learning
Big Sky Earth 2018 Introduction to machine learningBig Sky Earth 2018 Introduction to machine learning
Big Sky Earth 2018 Introduction to machine learning
 
Ben Shneiderman: Thrill of Discovery
Ben Shneiderman: Thrill of DiscoveryBen Shneiderman: Thrill of Discovery
Ben Shneiderman: Thrill of Discovery
 
The Transformation of Systems Biology Into A Large Data Science
The Transformation of Systems Biology Into A Large Data ScienceThe Transformation of Systems Biology Into A Large Data Science
The Transformation of Systems Biology Into A Large Data Science
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 

More from eXascale Infolab

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictioneXascale Infolab
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...eXascale Infolab
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapeXascale Infolab
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...eXascale Infolab
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...eXascale Infolab
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataeXascale Infolab
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataeXascale Infolab
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataeXascale Infolab
 
The Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingThe Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingeXascale Infolab
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingeXascale Infolab
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big DataeXascale Infolab
 
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)eXascale Infolab
 
Crowdsourcing is for the tail
Crowdsourcing is for the tailCrowdsourcing is for the tail
Crowdsourcing is for the taileXascale Infolab
 
The Evolution of Big Data Frameworks
The Evolution of Big Data FrameworksThe Evolution of Big Data Frameworks
The Evolution of Big Data FrameworkseXascale Infolab
 
Structured Data in Web Search
Structured Data in Web SearchStructured Data in Web Search
Structured Data in Web SearcheXascale Infolab
 

More from eXascale Infolab (20)

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory map
 
Cikm 2018
Cikm 2018Cikm 2018
Cikm 2018
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
 
Crowd scheduling www2016
Crowd scheduling www2016Crowd scheduling www2016
Crowd scheduling www2016
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data
 
SSSW 2015 Sense Making
SSSW 2015 Sense MakingSSSW 2015 Sense Making
SSSW 2015 Sense Making
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web Data
 
The Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingThe Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task Crowdsourcing
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition ranking
 
OLTP-Bench
OLTP-BenchOLTP-Bench
OLTP-Bench
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big Data
 
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
 
Hasler2014
Hasler2014Hasler2014
Hasler2014
 
Crowdsourcing is for the tail
Crowdsourcing is for the tailCrowdsourcing is for the tail
Crowdsourcing is for the tail
 
The Evolution of Big Data Frameworks
The Evolution of Big Data FrameworksThe Evolution of Big Data Frameworks
The Evolution of Big Data Frameworks
 
Structured Data in Web Search
Structured Data in Web SearchStructured Data in Web Search
Structured Data in Web Search
 

Recently uploaded

Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxRizalinePalanog2
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flyPRADYUMMAURYA1
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑Damini Dixit
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)AkefAfaneh2
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptxAlMamun560346
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oManavSingh202607
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Joonhun Lee
 
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...Mohammad Khajehpour
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Monika Rani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY1301aanya
 

Recently uploaded (20)

Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 o
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 

Braintalk cuso nm

  • 1. Analyzing and Querying Big Scientific Data. Thomas Heinis
  • 2. Data-Driven Scientific Discovery. Scientists are overwhelmed with big data: the Large Hadron Collider (LHC, ATLAS) produces 12 petabytes per experiment, the Sloan Digital Sky Survey (SDSS) 4 petabytes per year, and the Human Brain Project ~100 gigabytes per second.
  • 3. Scientific Data Growth. [Figure: cumulative size of datasets (petabytes) by year, 2004 to 2014, for Astronomy (NRAO), Physics (LHC), Simulation (ICESS), and Gene Sequencing (EBI).] Scientific data grows exponentially!
  • 4. Data in the Simulation Sciences. Models grow along two dimensions, coverage and resolution (an increasing level of detail), and these dimensions are multiplicative! Model size increases by an order of magnitude.
  • 5. What is the Human Brain Project? A 10-year European initiative to understand the human brain, enabling advances in neuroscience, medicine, and future computing. A consortium of 250+ scientists and 135 research groups from over 80 institutions in more than 20 countries in Europe and beyond.
  • 6. Human Brain Project: Vision. Future Medicine: symptom-based to biology-based classification; unique signatures of diseases; early diagnosis. Future Neuroscience: a multi-level view of the brain; the causal chain of events from genes to cognition. Future Computing: supercomputing as a scientific method; human-like intelligence.
  • 7. Brain Simulation: Wet Lab. Neuron structure & electrophysiological properties.
  • 9. Simulation Science Data Challenges. [Diagram stages: Observational Data, Spatial Modeling, Simulation, Post-Simulation Data, Spatial Analysis, Static 3D Exploration, Interactive 3D Exploration, Dynamic 3D Exploration.] Need scalable spatial access methods.
  • 10. Simulation Science Data Challenges. [Diagram stages: Observational Data, Spatial Modeling, Simulation, Post-Simulation Data, Spatial Analysis, Static 3D Exploration, Interactive 3D Exploration, Dynamic 3D Exploration.] Need scalable spatial access methods.
  • 11. Static Exploration. Neural tissue model; single-neuron 3D model; 3D spatial range query. An efficient spatial index is crucial.
  • 12. State-of-the-Art Spatial Indexes. R-Tree: a hierarchy of minimum bounding rectangles (MBRs). R-Tree variants: Hilbert-packed R-Tree, STR R-Tree, PR-Tree. A range query must descend into every child whose MBR overlaps the query, so structural overlap between MBRs degrades performance.
  • 13. Scalability Challenge. [Figure: query time (seconds, 0 to 300) vs. dataset density (million elements per unit volume, 50 to 450) for Hilbert R-Tree, STR R-Tree, and PR-Tree.] Dataset: 100K neurons, 450 million 3D cylinders, 27 GB on disk. Range queries: 500 uniformly random queries per experiment. Spatial density increases with dataset size, and the state of the art does not scale with density.
  • 14. FLAT: A Two-Phase Spatial Index. Key idea: use connectivity to avoid overlap, with two phases that are each independent of overlap: 1) Seeding: find any one object inside the query. 2) Crawling: traverse its neighborhood. This requires reachability.
  • 15. FLAT: The Reachability Problem. Requirement: connectivity for accessing neighboring objects in the data, so that crawling never has to leave the query bounds. Datasets with convex geometry and built-in connectivity, such as earthquake simulation datasets, pose no problem. Not every dataset satisfies this requirement: without connectivity, or without a path inside the query, crawling cannot reach all results.
  • 16. FLAT: Reachability. Index building adds connectivity to enable recursive crawling: 1) Partitioning: group spatially close elements. 2) Linking: connect neighboring partitions.
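A minimal sketch of the two index-building steps, under simplifying assumptions that are not from the slides: objects are reduced to 3D points (the neuron data are cylinders), partitioning uses a uniform grid rather than whatever partitioning FLAT actually applies, and two partitions are linked when their grid cells share a face, edge, or corner. `build_partitions` and `link_partitions` are hypothetical names.

```python
from collections import defaultdict
from itertools import product


def build_partitions(points, cell):
    """Partitioning: group spatially close points by hashing them into cubic grid cells."""
    parts = defaultdict(list)
    for p in points:
        key = tuple(int(c // cell) for c in p)  # integer cell coordinates
        parts[key].append(p)
    return dict(parts)


def link_partitions(parts):
    """Linking: connect each non-empty cell to its non-empty face/edge/corner neighbors."""
    links = {}
    for key in parts:
        links[key] = set()
        for d in product((-1, 0, 1), repeat=3):
            other = tuple(k + dk for k, dk in zip(key, d))
            if other != key and other in parts:
                links[key].add(other)
    return links


pts = [(0.1, 0.1, 0.1), (0.9, 0.2, 0.3), (1.5, 0.5, 0.5), (5.0, 5.0, 5.0)]
parts = build_partitions(pts, cell=1.0)
links = link_partitions(parts)
```

A uniform grid keeps neighbor detection trivial; a production index would more likely size partitions to disk pages.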
  • 17. FLAT: Seeding Phase. A range query finds ALL elements inside the query; a seed query finds ANY ONE element inside the query. FLAT uses an R-Tree for seeding, but will it scale with density? The seeding phase avoids the R-Tree's overlap overhead: where children overlap, the seed query picks one child arbitrarily. Seeding is fast: it reads roughly as many pages as the tree is tall.
  • 18. FLAT: Crawling Phase. Starting from the seed partition (page), the neighbor links drive a recursive graph traversal that collects the range query's results. Its complexity is linear in the number of graph edges.
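The seeding and crawling phases can be sketched as follows. This is an illustration, not FLAT's code: a linear scan over partitions stands in for the seed R-Tree, each `Partition` (a hypothetical type) holds 3D points plus the indices of its linked neighbors, and results are assumed to form one linked region, which is the reachability requirement discussed above.

```python
from collections import deque
from dataclasses import dataclass, field


# A box is ((xmin, ymin, zmin), (xmax, ymax, zmax)).
def intersects(a, b):
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


def contains(box, p):
    return all(box[0][i] <= p[i] <= box[1][i] for i in range(3))


@dataclass
class Partition:
    mbr: tuple     # bounding box of the partition's objects
    objects: list  # 3D points stored in this partition (in FLAT, one disk page)
    neighbors: list = field(default_factory=list)  # indices of linked partitions


def seed(partitions, query):
    """Seeding: index of ANY one partition holding an object inside the query.
    A linear scan stands in for FLAT's seed R-Tree in this sketch."""
    for pid, part in enumerate(partitions):
        if intersects(part.mbr, query) and any(contains(query, o) for o in part.objects):
            return pid
    return None


def crawl(partitions, query, start):
    """Crawling: breadth-first traversal of neighbor links from the seed partition,
    expanding only partitions whose bounding box intersects the query."""
    seen, frontier, results = {start}, deque([start]), []
    while frontier:
        part = partitions[frontier.popleft()]
        results.extend(o for o in part.objects if contains(query, o))
        for nid in part.neighbors:
            if nid not in seen and intersects(partitions[nid].mbr, query):
                seen.add(nid)
                frontier.append(nid)
    return results


def flat_range_query(partitions, query):
    start = seed(partitions, query)
    return [] if start is None else crawl(partitions, query, start)


# Three partitions linked in a chain along the x axis.
parts = [
    Partition(((0, 0, 0), (1, 1, 1)), [(0.5, 0.5, 0.5)], [1]),
    Partition(((1, 0, 0), (2, 1, 1)), [(1.5, 0.5, 0.5)], [0, 2]),
    Partition(((2, 0, 0), (3, 1, 1)), [(2.5, 0.5, 0.5)], [1]),
]
hits = flat_range_query(parts, ((0.2, 0.2, 0.2), (1.8, 0.8, 0.8)))
```

The crawl touches only partitions that overlap the query, so its cost tracks the number of links followed rather than the amount of bounding-box overlap in a tree.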
  • 19. FLAT: Performance Evaluation. [Figure: query time (seconds, 0 to 300) vs. dataset density (million elements per unit volume, 50 to 450) for Hilbert R-Tree, STR R-Tree, PR-Tree, and FLAT; annotated speedup: 7.8x.] Dataset: 100K neurons, 450 million 3D cylinders, 27 GB on disk. Range queries: 500 uniformly random queries per experiment. Spatial density increases with dataset size; FLAT decouples execution time from density.
  • 20. FLAT: Scalability. [Figure: time per result object (ms, 0 to 5) vs. dataset density (million elements per unit volume, 50 to 450) for Hilbert R-Tree, STR R-Tree, PR-Tree, and FLAT.] The trend is "FLAT": FLAT scales with density because the seeding cost amortizes as result cardinality increases. Dataset: 100K neurons, 450 million 3D cylinders, 27 GB on disk. Range queries: 500 uniformly random queries per experiment.
  • 22. Simulation Science Data Challenges. [Diagram stages: Observational Data, Spatial Modeling, Simulation, Post-Simulation Data, Static 3D Exploration, Interactive 3D Exploration, Dynamic 3D Exploration.]
  • 23. Interactive Exploration. Guided analysis, a sequence of spatial range queries along a guiding path, is ubiquitous in scientific applications: e.g., the bronchial tree of the lung, the arterial tree of the heart, and neural networks.
  • 24. Interactive Query Execution. Guiding paths are not known in advance, so the query sequence is executed interactively: the path is decided only after processing each query's results, and the disk is idle while the CPU processes them. This is a prefetching opportunity: predict the next query's location in the sequence and prefetch its data into a prefetch cache. Predictive prefetching hides data retrieval cost.
  • 25. Predictive Prefetching. Existing techniques extrapolate past query locations: Exponentially Weighted Moving Average (EWMA), Straight Line, and Hilbert prefetching. [Figure: cache hit rate (%, 0 to 50) vs. query volume (10k to 220k µm³) on the neuroscience dataset, sequences of 25 queries, spanning small- to large-volume queries.] These techniques are not efficient with arbitrary query volumes!
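For comparison, one common way to build an EWMA-based predictor is to smooth the displacement between consecutive query centers and extrapolate from the last center. The deck does not give its exact formulation, so `ewma_predict` and the smoothing factor `alpha` are assumptions.

```python
def ewma_predict(centers, alpha=0.5):
    """Predict the next query center: smooth consecutive displacements with an
    exponentially weighted moving average, then extrapolate from the last center."""
    if len(centers) < 2:
        raise ValueError("need at least two past query centers")
    velocity = None
    for prev, cur in zip(centers, centers[1:]):
        step = tuple(c - p for p, c in zip(prev, cur))
        if velocity is None:
            velocity = step  # the first displacement initializes the average
        else:
            velocity = tuple(alpha * s + (1 - alpha) * v for s, v in zip(step, velocity))
    return tuple(c + v for c, v in zip(centers[-1], velocity))


straight = ewma_predict([(0, 0, 0), (1, 0, 0), (2, 0, 0)])  # path keeps going along x
turning = ewma_predict([(0, 0, 0), (1, 0, 0), (1, 1, 0)])   # path turns toward y
```

On the turning path the smoothed displacement mixes the old and new directions, which illustrates why purely extrapolating past locations can miss a guiding path that bends.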
  • 26. SCOUT: Content-Aware Prefetching. Key insight: use the content of previous queries! Approach: 1. Inspect query results. 2. Identify the guiding path. 3. Predict the next query using the guiding path. This requires identifying the guiding path.
  • 27. SCOUT: How Paths Are Defined. Query results consist of many primitive spatial objects. Idea: a graph framework G(V, E) whose vertices are the spatial objects and whose edges connect nearby objects, which makes SCOUT independent of the data representation. Building the exact graph takes N² comparisons, so SCOUT uses a grid-hash-based construction of an approximate graph representation of each range query's results.
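A grid-hash construction of an approximate neighbor graph can be sketched like this, assuming objects are represented by 3D points and that "nearby" means "in the same or an adjacent grid cell"; SCOUT's actual cell size and edge rule are not given on the slide. Each object is compared only with objects in its 27-cell neighborhood instead of with all N objects. `grid_graph` is a hypothetical name.

```python
from collections import defaultdict
from itertools import product


def grid_graph(points, cell):
    """Approximate neighbor graph: objects whose grid cells are equal or adjacent
    are joined by an edge, avoiding the N^2 pairwise comparisons of the exact graph."""
    grid = defaultdict(list)  # cell coordinates -> indices of objects in that cell
    for i, p in enumerate(points):
        grid[tuple(int(c // cell) for c in p)].append(i)
    edges = set()
    for key, members in grid.items():
        for d in product((-1, 0, 1), repeat=3):
            other = tuple(k + dk for k, dk in zip(key, d))
            for i in members:
                for j in grid.get(other, ()):  # .get does not insert into the defaultdict
                    if i < j:  # store each undirected edge once
                        edges.add((i, j))
    return edges


pts = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (1.2, 0.5, 0.5), (3.5, 0.5, 0.5)]
edges = grid_graph(pts, cell=1.0)
```

The approximation is deliberate: two objects in adjacent cells may be up to 2√3 cell widths apart, yet no distance is ever computed.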
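  • 28. SCOUT: Guiding Path Identification. Key insight: the guiding path goes through all queries! Iterative candidate pruning: the paths in the results of query n form a candidate set, and each following query (n+1, n+2, n+3) prunes the candidates that do not pass through it; the surviving guiding path determines the predicted query. A longer sequence gives better prediction.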
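Iterative candidate pruning can be illustrated with candidate paths given as lists of 3D vertices and queries as axis-aligned boxes: after each query, only the candidates with a vertex inside it survive. The representation and the names `prune_candidates` and `inside` are assumptions for illustration; the slide does not specify how SCOUT enumerates candidates.

```python
def inside(box, p):
    lo, hi = box
    return all(l <= c <= h for l, c, h in zip(lo, p, hi))


def prune_candidates(paths, queries):
    """The guiding path passes through every query, so after each query keep only
    the candidate paths that have at least one vertex inside it."""
    candidates = dict(paths)  # path name -> list of 3D vertices
    history = []
    for q in queries:
        candidates = {name: pts for name, pts in candidates.items()
                      if any(inside(q, p) for p in pts)}
        history.append(sorted(candidates))  # surviving candidate names after this query
    return candidates, history


# Two branches that share a start and then diverge.
paths = {
    "a": [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)],
    "b": [(0, 0, 0), (1, 0, 0), (2, 1, 0), (3, 2, 0)],
}
queries = [
    ((-0.5, -0.5, -0.5), (1.5, 0.5, 0.5)),  # query n: both branches pass through
    ((1.5, -0.5, -0.5), (2.5, 0.5, 0.5)),   # query n+1: only branch "a" does
]
survivors, history = prune_candidates(paths, queries)
```

Each additional query can only shrink the candidate set, which matches the observation that longer sequences yield better predictions.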
  • 29. SCOUT: Where to Prefetch. Neither the prefetch duration nor the query dimensions are known in advance. Idea: incremental prefetching, which repeatedly prefetches growing regions (p1, p2, …, pn) by extrapolating the guiding path from where it exits the nth query in the sequence, following a safest-region-first policy. This makes prefetching independent of query size.
  • 30. SCOUT: Prediction Accuracy. Cache hit rate = (amount of data retrieved from cache / total amount of data retrieved) × 100. [Figure: cache hit rate (%) of EWMA, Straight Line, Hilbert, and SCOUT on Sequence 1, Sequence 2, and a visualization workload.] Query volumes: 80K and 20K µm³; sequence length: 32. Dataset: 100K neurons, 450 million 3D cylinders, 27 GB on disk. SCOUT reaches 72% to 91% prediction accuracy and speeds up sequences by up to 14.7x (annotated speedups: 2x and 14.7x).
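The hit-rate metric is a single ratio; as a sketch, with hypothetical byte counts that are not measurements from the deck:

```python
def cache_hit_rate(bytes_from_cache, bytes_total):
    """Cache hit rate in percent: data served from the prefetch cache over all data retrieved."""
    if bytes_total <= 0:
        raise ValueError("total retrieved data must be positive")
    return 100.0 * bytes_from_cache / bytes_total


rate = cache_hit_rate(3_000, 4_000)  # hypothetical byte counts
```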
  • 31. SCOUT: Scalability. [Figure: SCOUT cache hit rate (%) as the dataset grows from 50M to 450M spatial objects.] SCOUT scales with increasing dataset size. [Timeline: the CPU processes query results and runs prediction while the disk retrieves query results and prefetches.] [Figure: SCOUT overhead, time (seconds) for prediction vs. retrieving query results from 50M to 450M spatial objects; prediction overhead: 15-16%.]
  • 32. Simulation Science Data Challenges. [Diagram stages: Observational Data, Spatial Modeling, Simulation, Post-Simulation Data, Static 3D Exploration, Interactive 3D Exploration, Dynamic 3D Exploration.]
  • 33. Dynamic Exploration. A mesh is a collection of connected 3D polyhedra (3D vertices and shared faces) forming a volumetric mesh model; meshes enable high-precision 3D models. Challenge: monitoring memory-resident spatial mesh models.
  • 34. Monitoring Mesh Simulations. At every simulation time step, the simulation updates the mesh and a monitor queries it. Problem: efficiently execute range queries.
  • 35. Data Challenge. The mesh is highly dynamic: its movement is unpredictable and updates affect the entire dataset (time step 1 vs. time step 2). Mesh detail increases with dataset size (now vs. future). Need: a solution that scales.
  • 36. State of the Art. Moving-object indexes (TPR-Tree, STRIPES) rely on predictable movement, but mesh movement is inherently unpredictable. Static spatial indexes (R-Tree, LUR-Tree, QU-Trade) and linear scan: [axis from coarse-grained to fine-grained]. Neither scales with size nor detail!
  • 37. Performance Evaluation. Setup: neural mesh dataset of 1.32 billion tetrahedra (33 GB); 15 queries per simulation time step over 60 time steps. [Figure: total query response time (seconds, 0 to 8000) of Linear Scan, Octree, LUR-Tree, and QU-Trade on the statistical analysis microbenchmark; index maintenance accounts for 99.5%, 80%, and 72% of the indexed approaches' time.] Linear scan outperforms the indexed approaches: with few queries and massive updates, there are not enough queries to repay the investment in index maintenance.
  • 38. Can We Do Better? OCTOPUS idea. Key insight: use mesh connectivity to retrieve query results! Best of both worlds: reduce the search space (as an index does) with no maintenance (as a linear scan). Do not rely on an external data structure: use the in-memory mesh data directly. Mesh graph traversal over the mesh graph's vertices and edges retrieves results in spatial proximity.
  • 39. OCTOPUS. Range query execution is update-oblivious: the same traversal runs unchanged at time steps 1, 2, and 3. What about non-convex meshes?
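  • 40. OCTOPUS: Non-Convex Meshes. In a non-convex mesh, parts of the result may not be reachable from one another through the mesh inside the query (no reachability). Scanning the mesh surface guarantees accuracy.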
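A simplified sketch of OCTOPUS-style execution on a mesh graph follows. Assumptions not stated on the slides: the mesh is given as vertex coordinates plus adjacency lists, the surface vertices are known, and when no surface vertex falls inside the query a greedy directed walk (the hypothetical helper `greedy_walk`) looks for an interior seed. The walk can stop at a local minimum, so this sketch does not guarantee completeness for every query.

```python
from collections import deque


def in_box(box, p):
    lo, hi = box
    return all(l <= c <= h for l, c, h in zip(lo, p, hi))


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_walk(coords, adj, start, target):
    """Hypothetical seeding helper: move to the neighbor closest to `target`
    until no neighbor is closer (may stop at a local minimum)."""
    cur = start
    while True:
        best = min(adj[cur], key=lambda v: dist2(coords[v], target), default=cur)
        if dist2(coords[best], target) >= dist2(coords[cur], target):
            return cur
        cur = best


def octopus_query(coords, adj, surface, box, start=0):
    """Range query by mesh-graph traversal, seeded from a scan of the surface vertices."""
    seeds = [v for v in surface if in_box(box, coords[v])]
    if not seeds:  # the query may lie entirely inside the mesh: try a directed walk
        center = tuple((l + h) / 2 for l, h in zip(*box))
        v = greedy_walk(coords, adj, start, center)
        seeds = [v] if in_box(box, coords[v]) else []
    seen, frontier = set(seeds), deque(seeds)
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:
            if w not in seen and in_box(box, coords[w]):
                seen.add(w)
                frontier.append(w)
    return seen


# Toy "mesh": a chain of 5 vertices along x; the two end vertices form the surface.
coords = {i: (float(i), 0.0, 0.0) for i in range(5)}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
surface = {0, 4}
interior_hits = octopus_query(coords, adj, surface, ((1.5, -1, -1), (3.5, 1, 1)))
edge_hits = octopus_query(coords, adj, surface, ((3.5, -1, -1), (4.5, 1, 1)))
```

No auxiliary index is built: the traversal reads the in-memory mesh directly, so moving vertices between time steps requires no index maintenance.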
  • 41. OCTOPUS: Mesh Deformation. Deformation has zero surface-maintenance cost, so OCTOPUS scales with massive updates. [Diagram: the mesh graph across time steps 1, 2, and 3.]
  • 42. OCTOPUS: Mesh Detail. OCTOPUS scales with mesh resolution: as resolution grows, the number of surface points increases quadratically while non-surface points increase cubically. Scalability follows because the surface grows more slowly than the volume (and therefore the dataset size)!
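Assuming, for illustration, an idealized cubic mesh with n vertices per side, the boundary holds n³ − (n − 2)³ = 6n² − 12n + 8 vertices (for n ≥ 2), so the fraction a surface scan must touch shrinks roughly as 6/n. The functions below are hypothetical helpers that make the quadratic-versus-cubic argument concrete.

```python
def surface_count(n):
    """Vertices on the boundary of an n x n x n cubic grid (6n^2 - 12n + 8 for n >= 2)."""
    return n ** 3 - max(n - 2, 0) ** 3


def surface_fraction(n):
    """Share of all vertices that lie on the surface."""
    return surface_count(n) / n ** 3


for n in (10, 100, 1000):
    print(n, surface_count(n), round(surface_fraction(n), 4))
```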
  • 43. OCTOPUS: Performance. [Figure: total query execution time (seconds, 0 to 9000) of OCTOPUS, Linear Scan, Octree, LUR-Tree, and QU-Trade on the visualization and statistical analysis microbenchmarks; annotated speedups: 8x and 7.3x.] OCTOPUS achieves a 7.3x to 8x speedup.
  • 44. OCTOPUS: Scalability. Setup: 15 uniformly random queries per time step, 60 time steps. [Figure: total query execution time (seconds) of Linear Scan and OCTOPUS for mesh detail of 0.13 to 1.32 billion tetrahedra; annotated speedups: 8x and 10x.] [Figure: OCTOPUS time breakdown into graph traversal and surface scan over the same range; annotated shares: 64% and 41%.] OCTOPUS scales with mesh detail.
  • 46. Impact. Human Brain Project: part of the toolset used every day. February 2013: the first 10-million-neuron model was built, still 4 orders of magnitude smaller than the human brain. [Figure: model size (GB) vs. simulation size (1K to 10M neurons) for 2006, 2008, 2010, and 2013, reaching 2.5 TB in 2013.] General applicability: materials science, astronomy, geographical information systems.
  • 47. Future Challenges. Enable scientific breakthroughs via scalable data analysis! Address scientific data trends: progressively complex datasets; increasingly complex scientific queries; modern hardware. Approximate queries on big data: use mechanisms of learning and forgetting to manage data synopses.
  • 48. HBP Data Management Challenges. Data privacy/anonymization; scalable querying of petascale data; cloud analytics; quick and efficient access to raw data; distributed workflow execution; provenance/reproducibility; data personalization.
  • 49. Conclusions. Enabling data exploration is key to scientific discovery. Prior spatial access methods do not scale with data growth. Use spatial connectivity to achieve scalability, either explicitly added (FLAT & TOUCH) or implicitly present in the dataset (OCTOPUS & SCOUT). Many exciting big data management
  • 50. Thank You! Collaborators: Farhan Tauheed, Anastasia Ailamaki, Felix Schürmann, Henry Markram, Sadegh Nobari, Panagiotis Karras, Laurynas Biveinis, Mirjana Pavlovic