Budapest University of Technology and Economics
Department of Measurement and Information Systems
Optimization of Incremental Queries in the Cloud
József Makai, Gábor Szárnyas, Ákos Horváth, István Ráth, Dániel Varró
Budapest University of Technology and Economics
Fault Tolerant Systems Research Group
INCQUERY-D: DISTRIBUTED INCREMENTAL MODEL QUERIES
Incremental Query Evaluation by RETE
 AUTOSAR well-formedness validation rule
[Figure: rule over Communication channel, Logical signal, Mapping and Physical signal]
 Instance model
[Figure: an invalid model fragment and a valid model fragment]
Incremental Query Evaluation by RETE
Evaluation steps:
o Fill the input nodes
o Fill the worker nodes
o Read the result set
o Modify the model
o Propagate the changes
o Read the changes in the result set (deltas)
[Figure: Rete network; inputs: Communication channel, Logical signal, Mapping, Physical signal; worker nodes: join, join, antijoin; output: Result set]
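The propagation steps above can be illustrated with a minimal sketch of two Rete worker nodes. This is not the IncQuery-D implementation: the class names `JoinNode` and `AntijoinNode`, the key/value tuple layout and the `+`/`-` delta markers are assumptions made for illustration, and deletions are left out for brevity.

```python
from collections import defaultdict


class JoinNode:
    """Indexes both inputs by join key and emits only the newly created matches."""

    def __init__(self):
        self.left = defaultdict(set)
        self.right = defaultdict(set)

    def insert_left(self, key, value):
        self.left[key].add(value)
        # New matches pair the inserted tuple with every stored right tuple.
        return [(key, value, r) for r in sorted(self.right[key])]

    def insert_right(self, key, value):
        self.right[key].add(value)
        return [(key, l, value) for l in sorted(self.left[key])]


class AntijoinNode:
    """Emits left tuples whose key has no partner on the right input."""

    def __init__(self):
        self.left = defaultdict(set)
        self.right = defaultdict(set)

    def insert_left(self, key, value):
        self.left[key].add(value)
        # Positive delta only if no right tuple blocks this key.
        return [("+", key, value)] if not self.right[key] else []

    def insert_right(self, key, value):
        was_unblocked = not self.right[key]
        self.right[key].add(value)
        # The first blocker for a key retracts every left tuple with that key.
        if was_unblocked:
            return [("-", key, l) for l in sorted(self.left[key])]
        return []


join = JoinNode()
first = join.insert_left("s1", "a")     # [] because the right side is still empty
second = join.insert_right("s1", "b")   # [("s1", "a", "b")]

anti = AntijoinNode()
added = anti.insert_left("s1", "x")       # [("+", "s1", "x")]
retracted = anti.insert_right("s1", "m")  # [("-", "s1", "x")]
```

Each call returns only the delta caused by that single modification, which is why the cost of an update depends on the size of the change rather than on the size of the model.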
Goals of IncQuery-D
 Objectives
o Distributed incremental pattern matching
o Adaptation of IncQuery tooling to graph DBs
o Executed over cloud infrastructure (COTS hardware)
 Achieve scalability by avoiding memory bottleneck
o Sharding separately
• Data
• Indexers
• Query network
o In memory:
• Index + Query
Assumptions
• All Rete nodes fit on a server node
• Indexers can be filled efficiently
• Modification size ≪ model size
• The application requires the complete result set of the query (as opposed to just one match)
INCQUERY-D Architecture
[Figure: Servers 0 to 3, each hosting one database shard (shard 0 to shard 3); labels: Transaction, Indexer layer, Rete net, INCQUERY-D]
Layers named on the slide:
o Distributed query evaluation network
o Distributed indexer, model access adapter: distributed indexing, notification
o Distributed persistent storage
Distributed production network
• Each intermediate node can be allocated to a different host
• Remote internode communication
INCQUERY-D Architecture
[Figure: Servers 0 to 3, each hosting one database shard (shard 0 to shard 3); labels: Transaction, In-memory EMF model, Indexer layer with four Indexer nodes, Rete nodes Join, Join, Antijoin, INCQUERY-D]
Technologies named on the slide:
o Akka
o Triple store (4store), Document DB (Mongo), RDF over Column family (Cumulus)
RETE Deployment Process
Query Language → Query Predicates → RETE Structure → Platform Description → Allocation / Mapping → Deployment Descriptor
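pattern routeSensor(sensor: Sensor) = {
    // the switch has a sensor edge to the sensor
    TrackElement.sensor(switch, sensor);
    Switch(switch);
    // the switch position has a switch edge to the switch
    SwitchPosition.switch(sp, switch);
    SwitchPosition(sp);
    // the route has a switchPosition edge to the switch position
    Route.switchPosition(route, sp);
    Route(route);
    // negative condition: no routeDefinition edge from the route to the sensor
    neg find head(route, sensor);
}

pattern head(R, Sen) = {
    Route.routeDefinition(R, Sen);
}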
[Figure: graph pattern with nodes route: Route, sp: SwitchPosition, switch: Switch, sensor: Sensor and edges switchPosition, switch, sensor, routeDefinition]
RETE Deployment Process
 Construct language-independent constraints
 Resolution of
o syntactic sugar
o type information
Variables: route, sp, switch
Parameter: sensor
Constraints:
o Edge: SwitchPosition.switch
o Edge: TrackElement.sensor
o Edge: Route.switchPosition
o Negation: head
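To make the meaning of these constraints concrete, the following is a naive, non-incremental reference evaluation of the routeSensor query over plain edge sets. It is only a sketch: the function name `route_sensor`, the tuple-based edge encoding and the tiny example model are assumptions, and the type constraints are omitted because the example model is assumed to be well typed.

```python
def route_sensor(sensor_edges, switch_edges, switch_position_edges, route_definition_edges):
    """Return the sensors matched by routeSensor.

    sensor_edges:           set of (switch, sensor)  for TrackElement.sensor
    switch_edges:           set of (sp, switch)      for SwitchPosition.switch
    switch_position_edges:  set of (route, sp)       for Route.switchPosition
    route_definition_edges: set of (route, sensor)   for Route.routeDefinition
    """
    matches = set()
    for switch, sensor in sensor_edges:
        for sp, sp_switch in switch_edges:
            if sp_switch != switch:
                continue
            for route, route_sp in switch_position_edges:
                if route_sp != sp:
                    continue
                # Negation: head(route, sensor) must have no match.
                if (route, sensor) not in route_definition_edges:
                    matches.add(sensor)
    return matches


# Hypothetical model: route r1 defines sen1, route r2 does not define sen2.
result = route_sensor(
    sensor_edges={("sw1", "sen1"), ("sw2", "sen2")},
    switch_edges={("sp1", "sw1"), ("sp2", "sw2")},
    switch_position_edges={("r1", "sp1"), ("r2", "sp2")},
    route_definition_edges={("r1", "sen1")},
)
```

The three nested loops correspond to the three edge constraints and the membership test to the negation; a Rete network computes the same result but caches the intermediate joins.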
RETE Deployment Process
 Construct RETE structure (platform independently)
 Optimizations:
o Model statistics
o Expected usage profile
[Figure: Rete structure with three join nodes]
RETE Deployment Process
 Architecture model (Cloud infrastructure)
o Virtual Machines
• Memory limits
• CPU speed
• Storage capacity
o Communication Channels
• Bandwidth
 Specified by a textual DSL (Xtext)
[Figure: four machines, numbered 1 to 4]
RETE Deployment Process
Machine   Allocated Nodes
1         In1, In2, Join2
2         In3
3         In4
4         Join1, Join3
[Figure: Rete nodes In1 to In4 and Join1 to Join3 placed on machines 1 to 4]
Allocation can be optimized for query performance and other beneficial system characteristics!
RETE Deployment Process
 Configuration scripts for
o Deployment
o Communication middleware
 Derived by automated code generation
o Using Eclipse technology: EMF-IncQuery + Xtend
ALLOCATION OPTIMIZATION IN INCQUERY-D
Motivation for Allocation Optimization
 Considering data-intensive systems
o Overuse of resources
o Cost of the system
o Overhead of network communication
 Data transmission time is a significant component of the global execution time
[Figure: jobs on a timeline; t: local job execution time, t': data transmission time]
 Network links can have different capacities
 Poor utilization leads to an expensive system
[Figure: processes and memory sizes; labels: 4000 MB, 2000 MB, 500 MB, 2400 MB, $$$]
The Allocation Problem
 Inputs
o Rete network for the query, organized into processes
o Resource consumption
o Available infrastructure with important resource parameters
 Allocation constraints
 Output: Valid allocation
 Optimization targets
[Figure: processes 1 to 4 (two input nodes, a worker node, a production node) with memory needs of 500 MB, 3200 MB, 2400 MB and 600 MB; machines 1 and 2 with 5000 MB and 6000 MB]
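A minimal sketch of what a "valid allocation" could mean for the memory figures on this slide: every process is placed and no machine is asked for more memory than it has. The slide does not spell out the constraint set, so the memory-only check, the function name `is_valid` and the pairing of sizes with process and machine numbers are assumptions.

```python
from collections import defaultdict

# Memory need of each process in MB. The four sizes appear on the slide;
# which size belongs to which process number is an assumption.
PROCESS_MEMORY_MB = {1: 500, 2: 3200, 3: 2400, 4: 600}

# Memory capacity of each machine in MB (pairing with machine numbers assumed).
MACHINE_MEMORY_MB = {1: 5000, 2: 6000}


def is_valid(allocation):
    """allocation maps process id -> machine id."""
    if set(allocation) != set(PROCESS_MEMORY_MB):
        return False  # some process is missing or unknown
    used = defaultdict(int)
    for process, machine in allocation.items():
        used[machine] += PROCESS_MEMORY_MB[process]
    # No machine may be overcommitted.
    return all(used[m] <= MACHINE_MEMORY_MB[m] for m in used)


split = is_valid({1: 1, 2: 1, 3: 2, 4: 2})        # 3700 MB on machine 1, 3000 MB on machine 2
all_on_one = is_valid({1: 2, 2: 2, 3: 2, 4: 2})   # 6700 MB does not fit into 6000 MB
```

Under these assumptions the four processes need 6700 MB in total, so no single machine can host all of them and any valid allocation has to use both machines.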
Opt. Target: Communication Minimization
First allocation:
o 1 × 1,000,000 + 3 × 200,000 + 3 × 200,000
o Communication = 2,200,000
Second allocation:
o 3 × 1,000,000 + 1 × 200,000 + 1 × 200,000
o Communication = 3,400,000
In the first allocation the largest volume of data is sent through the faster local link.
[Figure: processes 1 to 4 (two input nodes, a worker node, a production node) with memory needs of 500 MB, 3200 MB, 2400 MB and 600 MB on machines 1 and 2 with 5000 MB and 6000 MB; channel volumes 1,000,000, 200,000 and 200,000]
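The two communication totals on this slide can be reproduced with a short calculation. The sketch assumes that the factors 1 and 3 are the weights of a local and a remote link, and it uses a hypothetical topology (two input nodes feeding a worker node that feeds a production node); the node names and both example allocations are illustrative only.

```python
LOCAL_WEIGHT = 1   # assumed weight of a channel between processes on the same machine
REMOTE_WEIGHT = 3  # assumed weight of a channel that crosses machines

# Hypothetical channels with the three data volumes shown on the slide.
EDGES = {
    ("in_a", "worker"): 1_000_000,
    ("in_b", "worker"): 200_000,
    ("worker", "production"): 200_000,
}


def communication_cost(allocation):
    """allocation maps process name -> machine id."""
    total = 0
    for (src, dst), volume in EDGES.items():
        same_machine = allocation[src] == allocation[dst]
        total += (LOCAL_WEIGHT if same_machine else REMOTE_WEIGHT) * volume
    return total


# The heaviest channel stays local, the two light channels are remote.
heavy_local = communication_cost({"in_a": 1, "worker": 1, "in_b": 2, "production": 2})
# The heaviest channel is remote, the two light channels are local.
heavy_remote = communication_cost({"in_a": 1, "worker": 2, "in_b": 2, "production": 2})
```

With these inputs `heavy_local` is 2,200,000 and `heavy_remote` is 3,400,000, matching the two totals on the slide.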
Opt. Target: Cost Minimization
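[Figure: processes 1 to 4 (two input nodes, a worker node, a production node) with memory needs of 500 MB, 3200 MB, 2400 MB and 600 MB]
Available machines:
o Machine 1: 4000 MB, $5
o Machine 2: 4000 MB, $5
o Machine 3: 6500 MB, $7
First allocation: Cost = 10
Second allocation: Cost = 12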
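For an instance this small, the cost target can be checked by brute force. The sketch below assumes that a machine is paid for as soon as it hosts at least one process and that memory is the only constraint; the talk itself reports a CPLEX-based implementation, so this enumeration is a stand-in for illustration, and the function name `cheapest_allocation` is hypothetical.

```python
from itertools import product

PROCESS_MEMORY_MB = [500, 3200, 2400, 600]
# (capacity in MB, price in $) for the three machines on the slide.
MACHINES = [(4000, 5), (4000, 5), (6500, 7)]


def cheapest_allocation():
    """Try every assignment of processes to machines and keep the cheapest valid one."""
    best = None
    for assignment in product(range(len(MACHINES)), repeat=len(PROCESS_MEMORY_MB)):
        used = [0] * len(MACHINES)
        for need, machine in zip(PROCESS_MEMORY_MB, assignment):
            used[machine] += need
        if any(u > capacity for u, (capacity, _) in zip(used, MACHINES)):
            continue  # some machine is overcommitted
        # Only machines that host at least one process are paid for.
        cost = sum(price for u, (_, price) in zip(used, MACHINES) if u > 0)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


best_cost, best_assignment = cheapest_allocation()
```

The processes need 6700 MB in total, which exceeds the 6500 MB of the largest machine, so at least two machines are required; the cheapest pair is the two $5 machines, giving the cost of 10 shown on the slide.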
Heuristics in Optimization
 Memory consumption of Rete nodes and processes
o Memory usage of Input nodes can be estimated (figure labels: Model database, Number of model elements, ?? MB)
 Communication intensity of network communication channels
[Figure: Rete network of input, worker and production nodes; nodes and channels are labeled with the numbers 1 to 4]
Performance Impact of Optimization
[Figure: first evaluation time of a complex query; x-axis: model size (number of elements): 61K, 213K, 867K, 3M, 13M; y-axis: time (sec), ticks from 28 to 739; series: Naive optimization, Communication optimization; marker: Max. memory; labeled values: 739, 616, 194, 144]
 2 minutes gain!
 This approach doesn't work for larger models!
Network Traffic Statistics
Network Traffic in Megabytes

                       vm0   vm1   vm2   total
Unoptimized  Remote    300   349   371    1020
Unoptimized  Local      14     2    74      90
Optimized    Remote    248   280   347     875
Optimized    Local      24    20   190     234

 Unoptimized:
o Remote Traffic: 1020
o Local Traffic: 90
o Total Traffic: 1110
 Optimized:
o Remote Traffic: 875
o Local Traffic: 234
o Total Traffic: 1109
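The totals above can be cross-checked against the per-VM values printed on the chart. Pairing the four rows of numbers with the remote/local and unoptimized/optimized series is a reading of the chart labels (the per-VM values sum to the stated totals under this reading); the helper names are hypothetical.

```python
# Per-VM traffic in MB for vm0, vm1, vm2, as read from the chart.
TRAFFIC_MB = {
    "unoptimized": {"remote": [300, 349, 371], "local": [14, 2, 74]},
    "optimized": {"remote": [248, 280, 347], "local": [24, 20, 190]},
}


def totals(config):
    """Return (remote, local, total) traffic in MB for one configuration."""
    remote = sum(TRAFFIC_MB[config]["remote"])
    local = sum(TRAFFIC_MB[config]["local"])
    return remote, local, remote + local


def remote_reduction_percent():
    """Relative drop in remote traffic achieved by the optimized allocation."""
    before = totals("unoptimized")[0]
    after = totals("optimized")[0]
    return 100 * (before - after) / before


unoptimized = totals("unoptimized")
optimized = totals("optimized")
reduction = remote_reduction_percent()
```

The total traffic is practically unchanged (1110 MB versus 1109 MB); the optimization moves roughly 14% of the remote traffic onto local links.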
Conclusion and Future Work
 Results
o Novel approach for application-specific resource allocation optimization for distributed Rete
o CPLEX-based implementation for IncQuery-D
o Preliminary evaluation results
• Significant improvements for local resource management
• Performance gains especially over slow / inhomogeneous networks
• Efficient optimization execution (supported by runtime cutoff in CPLEX)
 Future work
o Hadoop / YARN support (new IncQuery-D developments)
• Support configuration optimization for other Hadoop-based cloud apps
o Static allocation → Dynamic reallocation
• Take existing configuration as a starting constraint set
• Optimize for changed workload conditions
New INCQUERY-D Architecture
[Figure: Docker containers 0 to 3, each hosting one database shard (shard 0 to shard 3); labels: Transaction, In-memory EMF model, Indexer layer with four Indexer nodes, Rete nodes Join, Join, Antijoin]
New INCQUERY-D: “Hadoop over Docker”
• YARN resource management
• ZooKeeper monitoring
• Akka actors embedded into long-running Hadoop jobs

Optimization of Incremental Queries CloudMDE2015


Editor's Notes

  1. This illustrates very well the concepts that we also end up working with, so it would be useful to present it.
  2. Key ideas: overuse of resources must be avoided, but poor utilization leads to an expensive system. Data transmission time is a significant component of the global execution time, and network links can have different speeds, so this is what we optimize for.
  3. Here we will need to explain what the numbers on the individual "edges" mean.
  4. We use normalized tuples for the estimation. For a single node it works as follows: we look at how much data is expected on the input channels (starting with the input nodes, where we know it for certain). From that we approximate the memory consumption of the processes with linear regression. Based on the node type and the amount of input data, we compute the amount of data on the outgoing channels (it is the same on each of them); for an input node we know this for certain as well, because it forwards everything. We do this level by level, with a breadth-first traversal of the network. This should be summarized here.
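The estimation procedure in the last note can be sketched as a breadth-first pass over the Rete network. The notes give neither the per-type output formulas nor the regression coefficients, so `OUTPUT_FACTOR`, `SLOPE_MB_PER_TUPLE` and `INTERCEPT_MB` below are placeholder values, the function name `estimate` is hypothetical, and memory is estimated per node rather than per process to keep the example short.

```python
from collections import deque

# Placeholder output-size factors per node type (not taken from the source).
OUTPUT_FACTOR = {"join": 0.5, "antijoin": 0.25, "production": 1.0}
# Placeholder regression coefficients: memory_mb = slope * input_tuples + intercept.
SLOPE_MB_PER_TUPLE = 0.001
INTERCEPT_MB = 50.0


def estimate(nodes, edges, input_sizes):
    """nodes: name -> type; edges: list of (src, dst); input_sizes: input node -> tuples."""
    children = {name: [] for name in nodes}
    pending = {name: 0 for name in nodes}  # parents that are not processed yet
    for src, dst in edges:
        children[src].append(dst)
        pending[dst] += 1

    incoming = {name: 0.0 for name in nodes}
    outgoing, memory_mb = {}, {}
    queue = deque(name for name in nodes if pending[name] == 0)  # input nodes first
    while queue:
        node = queue.popleft()
        if nodes[node] == "input":
            incoming[node] = float(input_sizes[node])
            outgoing[node] = incoming[node]  # an input node forwards everything
        else:
            outgoing[node] = OUTPUT_FACTOR[nodes[node]] * incoming[node]
        memory_mb[node] = SLOPE_MB_PER_TUPLE * incoming[node] + INTERCEPT_MB
        for child in children[node]:
            incoming[child] += outgoing[node]  # same volume on every outgoing channel
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)
    return outgoing, memory_mb


out, mem = estimate(
    nodes={"in1": "input", "in2": "input", "j": "join", "p": "production"},
    edges=[("in1", "j"), ("in2", "j"), ("j", "p")],
    input_sizes={"in1": 1000, "in2": 3000},
)
```

A node is processed only after all of its parents, so the estimates flow from the input nodes towards the production node one level at a time, as the note describes.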