Budapest University of Technology and Economics
Department of Measurement and Information Systems
Optimization of Incremental Queries in the Cloud
József Makai, Gábor Szárnyas, Ákos Horváth, István Ráth, Dániel Varró
Budapest University of Technology and Economics
Fault Tolerant Systems Research Group
INCQUERY-D: DISTRIBUTED INCREMENTAL MODEL QUERIES
Incremental Query Evaluation by RETE
▪ AUTOSAR well-formedness validation rule
Communication channel
Logical signal, Mapping, Physical signal
Invalid model fragment
▪ Instance model
Valid model fragment
Incremental Query Evaluation by RETE
• Fill the input nodes
• Fill the worker nodes
• Read the result set
• Modify the model
• Propagate the changes
• Read the changes in the result set (deltas)
[Figure: Rete network with two join nodes and an antijoin node producing the result set; model elements: Communication channel, Logical signal, Mapping, Physical signal]
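
The propagation step above can be illustrated with a minimal sketch of an incremental join node, written in Python. It assumes tuples are plain Python tuples and that the node keeps one memory per input, indexed by join key; the class `JoinNode` and its methods are hypothetical names for illustration and are not taken from IncQuery-D.

```python
from collections import defaultdict


class JoinNode:
    """Hypothetical incremental join node: stores both inputs, emits only deltas."""

    def __init__(self, left_key, right_key):
        self.left_key = left_key        # extracts the join key from a left tuple
        self.right_key = right_key      # extracts the join key from a right tuple
        self.left = defaultdict(set)    # join key -> left tuples seen so far
        self.right = defaultdict(set)   # join key -> right tuples seen so far

    @staticmethod
    def _apply(memory, key, tup, added):
        """Update one memory; return False if the change is a no-op."""
        present = tup in memory[key]
        if added == present:
            return False
        if added:
            memory[key].add(tup)
        else:
            memory[key].discard(tup)
        return True

    def update_left(self, tup, added=True):
        """Apply one change on the left input and return the output delta."""
        key = self.left_key(tup)
        if not self._apply(self.left, key, tup, added):
            return []
        sign = "+" if added else "-"
        return [(sign, tup + other) for other in sorted(self.right[key])]

    def update_right(self, tup, added=True):
        """Apply one change on the right input and return the output delta."""
        key = self.right_key(tup)
        if not self._apply(self.right, key, tup, added):
            return []
        sign = "+" if added else "-"
        return [(sign, other + tup) for other in sorted(self.left[key])]
```

Each update touches only the tuples that share the join key with the changed tuple, which is why a small model modification produces a small delta instead of a full re-evaluation. Output tuples are simple concatenations here, so the join key appears twice.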
Goals of IncQuery-D
▪ Objectives
o Distributed incremental pattern matching
o Adaptation of IncQuery tooling to graph DBs
o Executed over cloud infrastructure (COTS hardware)
▪ Achieve scalability by avoiding memory bottleneck
o Sharding separately
• Data
• Indexers
• Query network
o In memory:
• Index + Query
Assumptions
• Each Rete node fits on a server node
• Indexers can be filled efficiently
• Modification size ≪ model size
• The application requires the complete result set of the query (as opposed to just one match)
INCQUERY-D Architecture
[Figure: Servers 0 to 3, each with a database shard (shard 0 to shard 3); Transaction; INCQUERY-D with a Rete net and an indexer layer]
Distributed query evaluation network
Distributed indexer, model access adapter
Distributed indexing, notification
Distributed persistent storage
Distributed production network
• Each intermediate node can be allocated to a different host
• Remote internode communication
INCQUERY-D Architecture
[Figure: Servers 0 to 3, each with a database shard (shard 0 to shard 3); Transaction; In-memory EMF model; INCQUERY-D with an indexer layer of four Indexer nodes, two Join nodes and an Antijoin node]
Akka
Triple store (4store), Document DB (Mongo), RDF over Column family (Cumulus)
RETE Deployment Process
Pipeline stages: Query Language, Query Predicates, RETE Structure, Platform Description, Allocation / Mapping, Deployment Descriptor
pattern routeSensor(sensor: Sensor) = {
    TrackElement.sensor(switch, sensor);
    Switch(switch);
    SwitchPosition.switch(sp, switch);
    SwitchPosition(sp);
    Route.switchPosition(route, sp);
    Route(route);
    neg find head(route, sensor);
}
pattern head(R, Sen) = {
    Route.routeDefinition(R, Sen);
}
[Figure: graph pattern with nodes route: Route, sp: SwitchPosition, switch: Switch, sensor: Sensor and edges switchPosition, switch, sensor, routeDefinition]
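
To make the semantics of the routeSensor pattern concrete, here is a non-incremental sketch in Python that evaluates it as two joins followed by an antijoin over sets of edge tuples. The function name, the argument names and the tuple layouts are assumptions made for this example; the type constraints (Switch, SwitchPosition, Route) are assumed to be implied by the edge types and are not checked separately.

```python
def route_sensor(sensor_edges, switch_edges, switch_position_edges, route_definitions):
    """Return (route, sp, switch, sensor) matches of the routeSensor pattern.

    sensor_edges:          set of (switch, sensor)  for TrackElement.sensor
    switch_edges:          set of (sp, switch)      for SwitchPosition.switch
    switch_position_edges: set of (route, sp)       for Route.switchPosition
    route_definitions:     set of (route, sensor)   for Route.routeDefinition
    """
    # Join 1: SwitchPosition.switch with TrackElement.sensor on the switch.
    sp_switch_sensor = {
        (sp, switch, sensor)
        for (sp, switch) in switch_edges
        for (other_switch, sensor) in sensor_edges
        if switch == other_switch
    }
    # Join 2: Route.switchPosition with the previous result on the switch position.
    candidates = {
        (route, sp, switch, sensor)
        for (route, sp) in switch_position_edges
        for (other_sp, switch, sensor) in sp_switch_sensor
        if sp == other_sp
    }
    # Antijoin: drop candidates for which head(route, sensor) holds.
    return {
        match for match in candidates
        if (match[0], match[3]) not in route_definitions
    }
```

The pattern itself exposes only the sensor parameter; the sketch returns the full match tuples so that the effect of each join stays visible. Projecting to the last component gives the sensors.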
RETE Deployment Process
▪ Construct language-independent constraints
▪ Resolution of
o syntactic sugar
o type information
Variables: route, sp, switch
Parameter: sensor
Constraints:
Edge: SwitchPosition.switch
Edge: TrackElement.sensor
Edge: Route.switchPosition
Negation: head
RETE Deployment Process
▪ Construct RETE structure (platform independently)
▪ Optimizations:
o Model statistics
o Expected usage profile
[Figure: Rete structure with three join nodes]
RETE Deployment Process
▪ Architecture model (Cloud infrastructure)
o Virtual Machines
• Memory limits
• CPU speed
• Storage capacity
o Communication Channels
• Bandwidth
▪ Specified by a textual DSL (Xtext)
RETE Deployment Process
Machine   Allocated Nodes
1         In1, In2, Join2
2         In3
3         In4
4         Join1, Join3
Allocation can be optimized for query performance and other beneficial system characteristics!
RETE Deployment Process
▪ Configuration scripts for
o Deployment
o Communication middleware
▪ Derived by automated code generation
o Using Eclipse technology: EMF-IncQuery + Xtend
ALLOCATION OPTIMIZATION IN INCQUERY-D
Motivation for Allocation Optimization
▪ Considering data-intensive systems
o Overuse of resources
o Cost of the system
o Overhead of network communication
Data transmission time is a significant component of the global execution time.
Network links can have different capacities.
Poor utilization leads to an expensive system.
[Figure: jobs with local job execution time t and data transmission time t'; a machine with 4000 MB of memory and processes of 2000 MB, 500 MB and 2400 MB]
The Allocation Problem
▪ Inputs
o Rete network for the query, organized into processes, with their resource consumption
o Available infrastructure with important resource parameters
▪ Allocation constraints
▪ Output: valid allocation
▪ Optimization targets
[Figure: four Rete processes (two input nodes, a worker node and a production node) with memory demands of 500 MB, 3200 MB, 2400 MB and 600 MB; two machines with 5000 MB and 6000 MB of memory]
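
A validity check for an allocation can be sketched as follows. This is a minimal Python illustration that considers memory as the only constraint; the process names `p1` to `p4`, the machine names `m1` and `m2`, and the assignment of the slide's memory figures to them are assumptions made for the example.

```python
def is_valid_allocation(allocation, process_memory_mb, machine_capacity_mb):
    """Return True if every process is placed and no machine exceeds its memory."""
    # Every process must be allocated, and nothing unknown may be allocated.
    if set(allocation) != set(process_memory_mb):
        return False
    used = {machine: 0 for machine in machine_capacity_mb}
    for process, machine in allocation.items():
        if machine not in used:
            return False  # allocated to a machine that does not exist
        used[machine] += process_memory_mb[process]
    return all(used[m] <= machine_capacity_mb[m] for m in used)


# Memory figures from the slide; the names are hypothetical.
process_memory_mb = {"p1": 500, "p2": 3200, "p3": 2400, "p4": 600}
machine_capacity_mb = {"m1": 5000, "m2": 6000}

split = {"p1": "m1", "p2": "m1", "p3": "m2", "p4": "m2"}
all_on_m2 = {process: "m2" for process in process_memory_mb}
```

With these numbers `split` is valid (3700 MB on `m1`, 3000 MB on `m2`), while `all_on_m2` is not, because the four processes need 6700 MB in total and the larger machine offers 6000 MB.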
Opt. Target: Communication Minimization
First allocation: 1 × 1,000,000 + 3 × 200,000 + 3 × 200,000, Communication = 2,200,000
Second allocation: 3 × 1,000,000 + 1 × 200,000 + 1 × 200,000, Communication = 3,400,000
Largest volume of data is sent through the faster local link.
[Figure: four Rete processes (two input nodes, a worker node and a production node) with memory demands of 500 MB, 3200 MB, 2400 MB and 600 MB; channels carrying 1,000,000, 200,000 and 200,000 tuples; machines 1 and 2 with 5000 MB and 6000 MB of memory]
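
The two communication totals on this slide can be reproduced with a short Python sketch. It assumes that the multipliers 1 and 3 are the relative costs of a local and a remote link, and that the 1,000,000-tuple channel runs from the worker to the production node while each input node sends 200,000 tuples to the worker; the slide text does not state the topology, so the node names and channels below are hypothetical.

```python
LOCAL_WEIGHT = 1   # multiplier used on the slide for the cheaper link
REMOTE_WEIGHT = 3  # multiplier used on the slide for the more expensive link


def communication_cost(channels, allocation):
    """Sum the tuple volume of each channel, weighted by the link it crosses."""
    total = 0
    for source, target, tuples in channels:
        same_machine = allocation[source] == allocation[target]
        total += (LOCAL_WEIGHT if same_machine else REMOTE_WEIGHT) * tuples
    return total


# Hypothetical topology matching the volumes on the slide.
channels = [
    ("input_a", "worker", 200_000),
    ("input_b", "worker", 200_000),
    ("worker", "production", 1_000_000),
]

# Heavy channel kept local, light channels remote.
first = {"input_a": 1, "input_b": 1, "worker": 2, "production": 2}
# Light channels kept local, heavy channel remote.
second = {"input_a": 1, "input_b": 1, "worker": 1, "production": 2}

cost_first = communication_cost(channels, first)    # 2,200,000
cost_second = communication_cost(channels, second)  # 3,400,000
```

The first allocation is cheaper because the channel with the largest volume stays on one machine, even though two smaller channels become remote.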
Opt. Target: Cost Minimization
Processes (two input nodes, a worker node and a production node): 500 MB, 3200 MB, 2400 MB, 600 MB
Machines: 1: 4000 MB, $5; 2: 4000 MB, $5; 3: 6500 MB, $7
First allocation: Cost = 10
Second allocation: Cost = 12
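
The cost figures above can be checked with an exhaustive search over all placements. The deck reports a CPLEX-based implementation; the brute-force enumeration below is a stand-in that is only practical for tiny examples like this one. Process and machine names are hypothetical, and only machines that host at least one process are assumed to be paid for.

```python
from itertools import product


def cheapest_allocation(process_memory_mb, machines):
    """Try every placement; return (cost, allocation) of the cheapest valid one.

    machines maps a machine name to (capacity_mb, price).
    Returns None if no placement respects the memory capacities.
    """
    processes = sorted(process_memory_mb)
    names = sorted(machines)
    best = None
    for choice in product(names, repeat=len(processes)):
        used = {}
        for process, machine in zip(processes, choice):
            used[machine] = used.get(machine, 0) + process_memory_mb[process]
        if any(used[m] > machines[m][0] for m in used):
            continue  # some machine would run out of memory
        cost = sum(machines[m][1] for m in used)  # pay only for machines in use
        if best is None or cost < best[0]:
            best = (cost, dict(zip(processes, choice)))
    return best


process_memory_mb = {"p1": 500, "p2": 3200, "p3": 2400, "p4": 600}
machines = {"m1": (4000, 5), "m2": (4000, 5), "m3": (6500, 7)}

best_cost, best_allocation = cheapest_allocation(process_memory_mb, machines)
```

The search finds a cost of 10 by packing the processes onto the two 4000 MB machines (for example 3200 + 600 MB and 2400 + 500 MB). No single machine can hold all 6700 MB, and any solution that uses the 6500 MB machine costs at least 12.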
Heuristics in Optimization
Memory consumption of Rete nodes and processes
Memory usage of input nodes can be estimated from the number of model elements in the model database
Communication intensity of network communication channels
[Figure: Rete network of input, worker and production nodes, annotated with estimation steps 1 to 4]
Performance Impact of Optimization
[Figure: First evaluation time of a complex query. X axis: model size (number of elements): 61K, 213K, 867K, 3M, 13M. Y axis: time (sec), ticks at 28, 45, 72, 114, 182, 290, 463, 739. Series: Max. memory, Naive optimization, Communication optimization. Data labels: 739, 616, 194, 144.]
2 minutes gain!
This approach doesn’t work for larger models!
Network Traffic Statistics
Network traffic in megabytes:
               vm0    vm1    vm2    total
Unoptimized
  Remote       300    349    371    1020
  Local         14      2     74      90
Optimized
  Remote       248    280    347     875
  Local         24     20    190     234
▪ Unoptimized:
o Remote Traffic: 1020
o Local Traffic: 90
o Total Traffic: 1110
▪ Optimized:
o Remote Traffic: 875
o Local Traffic: 234
o Total Traffic: 1109
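
The totals above follow directly from the per-VM figures in the chart. The short Python sketch below recomputes them and derives the relative reduction in remote traffic; the variable names are chosen for this example only.

```python
# Per-VM traffic in megabytes (vm0, vm1, vm2), taken from the chart.
traffic_mb = {
    "unoptimized": {"remote": [300, 349, 371], "local": [14, 2, 74]},
    "optimized": {"remote": [248, 280, 347], "local": [24, 20, 190]},
}


def totals(per_vm):
    """Sum remote and local traffic over all VMs."""
    remote = sum(per_vm["remote"])
    local = sum(per_vm["local"])
    return {"remote": remote, "local": local, "total": remote + local}


before = totals(traffic_mb["unoptimized"])
after = totals(traffic_mb["optimized"])

# Share of remote traffic that the optimized allocation avoids.
remote_reduction = 1 - after["remote"] / before["remote"]
```

Total traffic is practically unchanged (1110 MB versus 1109 MB); the optimization moves about 14% of the remote traffic onto local links.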
Conclusion and Future Work
▪ Results
o Novel approach for application-specific resource allocation optimization for distributed Rete
o CPLEX-based implementation for IncQuery-D
o Preliminary evaluation results
• Significant improvements for local resource management
• Performance gains especially over slow / inhomogeneous networks
• Efficient optimization execution (supported by runtime cutoff in CPLEX)
▪ Future work
o Hadoop / YARN support (new IncQuery-D developments)
• Support configuration optimization for other Hadoop-based cloud apps
o Static allocation → Dynamic reallocation
• Take existing configuration as a starting constraint set
• Optimize for changed workload conditions
New INCQUERY-D Architecture
New INCQUERY-D: “Hadoop over Docker”
[Figure: Docker containers 0 to 3, each with a database shard (shard 0 to shard 3); Transaction; In-memory EMF model; indexer layer with four Indexer nodes, two Join nodes and an Antijoin node]
• YARN resource management
• ZooKeeper monitoring
Akka actors embedded into long-running Hadoop jobs

Optimization of Incremental Queries CloudMDE2015


Editor's Notes

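  1. This illustrates very well the concepts that we also work with at the end, so it would be useful to present it.
  2. Key ideas: Overuse of resources must be avoided, but poor utilization leads to an expensive system. Data transmission time is a significant component of the global execution time, and network links can also have different speeds → this is what we optimize for.
  3. Here we will need to explain what the numbers on the individual “edges” mean.
  4. We use normalized tuples for the estimation. For a single node it works as follows: we look at how much data is expected on the input channels (starting with the input nodes, where we know this for certain). From that, we approximate the memory consumption of the processes with linear regression. Based on the node type and the amount of input data, we compute the amount of data arriving on the outgoing channels (it is the same on each of them); for an input node we know this for certain as well, because it forwards everything. We do this level by level, with a breadth-first traversal of the network. This should be summarized here.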
  4. Normalized tuple-t használjuk becsléshez. Egy node-nál ez következőképpen néz ki: megnézzük mennyi adat várható bemeneti csatornákon (először input node-nál, ahol biztosan tudjuk is azt) Abból közelítjük memória fogyasztást processeknek lineáris regresszióval Kiszámoljuk node típusa és bemeneti adat mennyiségének függvényében a kimenő csatornákra jutó adat mennyiségét (mindegyiken ugyanannyi lesz), input node-ra ezt is tudjuk tutira, mert mindent továbbít Ezt végezzük szintről szintre, háló szélességi bejárásával Ezt kellene itt összefoglalni