The document discusses distributed query processing. It begins by defining what a query and query processor are. It then describes the main functions, problems, and characteristics of a query processor. The document outlines the main layers of query processing as query decomposition, data localization, global query optimization, and distributed execution. It provides details on each of these layers and processes involved like normalization, analysis, and optimization.
Query Processing : Query Processing Problem, Layers of Query Processing Query Processing in Centralized Systems – Parsing & Translation, Optimization, Code generation, Example Query Processing in Distributed Systems – Mapping global query to local, Optimization,
Query Processing : Query Processing Problem, Layers of Query Processing Query Processing in Centralized Systems – Parsing & Translation, Optimization, Code generation, Example Query Processing in Distributed Systems – Mapping global query to local, Optimization,
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUESAAKANKSHA JAIN
Distributed Database Designs are nothing but multiple, logically related Database systems, physically distributed over several sites, using a Computer Network, which is usually under a centralized site control.
Distributed database design refers to the following problem:
Given a database and its workload, how should the database be split and allocated to sites so as to optimize certain objective function
There are two issues:
(i) Data fragmentation which determines how the data should be fragmented.
(ii) Data allocation which determines how the fragments should be allocated.
Process of minimizing the resource usage.
Aims of Query Decomposition
To transform a high-level query into a Relational Algebra query
&
To check the query is syntactically and semantically correct
It is efficient way to retrieve data
from database.
http://phpexecutor.com
DDBMS, characteristics, Centralized vs. Distributed Database, Homogeneous DDBMS, Heterogeneous DDBMS, Advantages, Disadvantages, What is parallel database, Data fragmentation, Replication, Distribution Transaction
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUESAAKANKSHA JAIN
Distributed Database Designs are nothing but multiple, logically related Database systems, physically distributed over several sites, using a Computer Network, which is usually under a centralized site control.
Distributed database design refers to the following problem:
Given a database and its workload, how should the database be split and allocated to sites so as to optimize certain objective function
There are two issues:
(i) Data fragmentation which determines how the data should be fragmented.
(ii) Data allocation which determines how the fragments should be allocated.
Process of minimizing the resource usage.
Aims of Query Decomposition
To transform a high-level query into a Relational Algebra query
&
To check the query is syntactically and semantically correct
It is efficient way to retrieve data
from database.
http://phpexecutor.com
DDBMS, characteristics, Centralized vs. Distributed Database, Homogeneous DDBMS, Heterogeneous DDBMS, Advantages, Disadvantages, What is parallel database, Data fragmentation, Replication, Distribution Transaction
Query optimization in oodbms identifying subquery for query managementijdms
This paper is based on relatively newer approach for query optimization in object databases, which uses
query decomposition and cached query results to improve execution a query. Issues that are focused here is
fast retrieval and high reuse of cached queries, Decompose Query into Sub query, Decomposition of
complex queries into smaller for fast retrieval of result.
Here we try to address another open area of query caching like handling wider queries. By using some
parts of cached results helpful for answering other queries (wider Queries) and combining many cached
queries while producing the result.
Multiple experiments were performed to prove the productivity of this newer way of optimizing a query.
The limitation of this technique is that it’s useful especially in scenarios where data manipulation rate is
very low as compared to data retrieval rate.
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENTcsandit
This paper is based on relatively newer approach for query optimization in object databases,
which uses query decomposition and cached query results to improve execution a query. Issues
that are focused here is fast retrieval and high reuse of cached queries, Decompose Query into
Sub query, Decomposition of complex queries into smaller for fast retrieval of result.
Here we try to address another open area of query caching like handling wider queries. By
using some parts of cached results helpful for answering other queries (wider Queries) and
combining many cached queries while producing the result.
Multiple experiments were performed to prove the productivity of this newer way of optimizing
a query. The limitation of this technique is that it’s useful especially in scenarios where data
manipulation rate is very low as compared to data retrieval rate.
A time efficient and accurate retrieval of range aggregate queries using fuzz...IJECEIAES
Massive growth in the big data makes difficult to analyse and retrieve the useful information from the set of available data’s. Existing approaches cannot guarantee an efficient retrieval of data from the database. In the existing work stratified sampling is used to partition the tables in terms of stratic variables. However k means clustering algorithm cannot guarantees an efficient retrieval where the choosing centroid in the large volume of data would be difficult. And less knowledge about the stratic variable might leads to the less efficient partitioning of tables. This problem is overcome in the proposed methodology by introducing the FCM clustering instead of k means clustering which can cluster the large volume of data which are similar in nature. Stratification problem is overcome by introducing the post stratification approach which will leads to efficient selection of stratic variable. This methodology leads to an efficient retrieval process in terms of user query within less time and more accuracy.
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...ijdpsjournal
The paper aims at proposing a solution for designing and developing a seamless automation and
integration of machine learning capabilities for Big Data with the following requirements: 1) the ability to
seamlessly handle and scale very large amount of unstructured and structured data from diversified and
heterogeneous sources; 2) the ability to systematically determine the steps and procedures needed for
analyzing Big Data datasets based on data characteristics, domain expert inputs, and data pre-processing
component; 3) the ability to automatically select the most appropriate libraries and tools to compute and
accelerate the machine learning computations; and 4) the ability to perform Big Data analytics with high
learning performance, but with minimal human intervention and supervision. The whole focus is to provide
a seamless automated and integrated solution which can be effectively used to analyze Big Data with highfrequency
and high-dimensional features from different types of data characteristics and different
application problem domains, with high accuracy, robustness, and scalability. This paper highlights the
research methodologies and research activities that we propose to be conducted by the Big Data
researchers and practitioners in order to develop and support seamless automation and integration of
machine learning capabilities for Big Data analytics.
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...ijdpsjournal
The paper aims at proposing a solution for designing and developing a seamless automation and integration of machine learning capabilities for Big Data with the following requirements: 1) the ability to seamlessly handle and scale very large amount of unstructured and structured data from diversified and heterogeneous sources; 2) the ability to systematically determine the steps and procedures needed for
analyzing Big Data datasets based on data characteristics, domain expert inputs, and data pre-processing component; 3) the ability to automatically select the most appropriate libraries and tools to compute and accelerate the machine learning computations; and 4) the ability to perform Big Data analytics with high learning performance, but with minimal human intervention and supervision. The whole focus is to provide
a seamless automated and integrated solution which can be effectively used to analyze Big Data with highfrequency
and high-dimensional features from different types of data characteristics and different application problem domains, with high accuracy, robustness, and scalability. This paper highlights the research methodologies and research activities that we propose to be conducted by the Big Data researchers and practitioners in order to develop and support seamless automation and integration of machine learning capabilities for Big Data analytics.
In this paper we explore the issue of store determination in a portable shared specially appointed system. In our vision reserve determination ought to fulfill the accompanying prerequisites: (i) it ought to bring about low message overhead and (ii) the data ought to be recovered with least postponement. In this paper, we demonstrate that these objectives can be accomplished by part the one bounce neighbors into two sets in view of the transmission run. The proposed approach lessens the quantity of messages overflowed into the system to discover the asked for information. This plan is completely circulated and comes requiring little to no effort as far as store overhead. The test comes about gives a promising outcome in view of the measurements of studies.
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...acijjournal
Apriori is one of the key algorithms to generate frequent itemsets. Analysing frequent itemset is a crucial
step in analysing structured data and in finding association relationship between items. This stands as an
elementary foundation to supervised learning, which encompasses classifier and feature extraction
methods. Applying this algorithm is crucial to understand the behaviour of structured data. Most of the
structured data in scientific domain are voluminous. Processing such kind of data requires state of the art
computing machines. Setting up such an infrastructure is expensive. Hence a distributed environment
such as a clustered setup is employed for tackling such scenarios. Apache Hadoop distribution is one of
the cluster frameworks in distributed environment that helps by distributing voluminous data across a
number of nodes in the framework. This paper focuses on map/reduce design and implementation of
Apriori algorithm for structured data analysis.
GlobalSoft is a MDM-focused software consultancy, specializing in Informatica MDM. GlobalSoft has been a long-term strategic partner of Informatica since the days of Siperian, providing project delivery and training services, as well as support and engineering services from our US & India offfices. Today, GlobalSoft has leveraged its deep product knowledge gained over the past 8 years and over 40 MDM projects into the preeminent service provider for Informatica MDM, and has used this knowledge to develop and offer specialized services and products for MDM.GlobalSoft headquartered in San Jose, CA maintains expert staff in the US and in India is capable of managing and delivering projects or augmenting existing project teams.
MLDM provides an original scientific position in Europe on problems related to pattern recognition, machine learning, classification, modelling, knowledge extraction and data mining. These issues have a strong employability potential for students trained in the field of modelling, prediction or decision support, as well as in the area of the Web, image and video processing, health informatics, etc.
For graphs of mathematical functions, see Graph of a function. For other uses, see Graph (disambiguation). A drawing of a graph. In mathematics graph theory is the study of graphs, which are mathematical structures used.In mathematics, and more specifically in graph theory, a tree is an undirected graph in which any two vertices are connected by exactly one path. In other words, any acyclic connected graph is a tree. A forest is a disjoint union of trees.
Synchronous Optical Networking (SONET) and Synchronous Digital Hierarchy (SDH) are standardized multiplexing protocols that transfer multiple digital bit streams over optical fiber using lasers or light-emitting diodes (LEDs). Lower data rates can also be transferred via an electrical interface.
Discrete Mathematics - Sets. ... He had defined a set as a collection of definite and distinguishable objects selected by the means of certain rules or description. Set theory forms the basis of several other fields of study like counting theory, relations, graph theory and finite state machines.
Discrete Mathematics - Sets. ... He had defined a set as a collection of definite and distinguishable objects selected by the means of certain rules or description. Set theory forms the basis of several other fields of study like counting theory, relations, graph theory and finite state machines.
The Parallel RLC Circuit is the exact opposite to the series circuit we looked at in the previous tutorial although some of the previous concepts and equations still apply.
The Parallel RLC Circuit is the exact opposite to the series circuit we looked at in the previous tutorial although some of the previous concepts and equations still apply.
Discrete Mathematics - Relations. ... Relations may exist between objects of the same set or between objects of two or more sets. Definition and Properties. A binary relation R from set x to y (written as x R y o r R ( x , y ) ) is a subset of the Cartesian product x × y .
Propositional calculus (also called propositional logic, sentential calculus, sentential logic, or sometimes zeroth-order logic) is the branch of logic concerned with the study of propositions (whether they are true or false) that are formed by other propositions with the use of logical connectives, and how their value depends on the truth value of their components. Logical connectives are found in natural languages.
Propositional calculus (also called propositional logic, sentential calculus, sentential logic, or sometimes zeroth-order logic) is the branch of logic concerned with the study of propositions (whether they are true or false) that are formed by other propositions with the use of logical connectives, and how their value depends on the truth value of their components. Logical connectives are found in natural languages.
In computer science, Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized.
Discrete Mathematics is a branch of mathematics involving discrete elements that uses algebra and arithmetic. It is increasingly being applied in the practical fields of mathematics and computer science. It is a very good tool for improving reasoning and problem-solving capabilities.
A numeral system (or system of numeration) is a writing system for expressing numbers; that is, a mathematical notation for representing numbers of a given set, using digits or other symbols in a consistent manner.
The Roman Empire A Historical Colossus.pdfkaushalkr1407
The Roman Empire, a vast and enduring power, stands as one of history's most remarkable civilizations, leaving an indelible imprint on the world. It emerged from the Roman Republic, transitioning into an imperial powerhouse under the leadership of Augustus Caesar in 27 BCE. This transformation marked the beginning of an era defined by unprecedented territorial expansion, architectural marvels, and profound cultural influence.
The empire's roots lie in the city of Rome, founded, according to legend, by Romulus in 753 BCE. Over centuries, Rome evolved from a small settlement to a formidable republic, characterized by a complex political system with elected officials and checks on power. However, internal strife, class conflicts, and military ambitions paved the way for the end of the Republic. Julius Caesar’s dictatorship and subsequent assassination in 44 BCE created a power vacuum, leading to a civil war. Octavian, later Augustus, emerged victorious, heralding the Roman Empire’s birth.
Under Augustus, the empire experienced the Pax Romana, a 200-year period of relative peace and stability. Augustus reformed the military, established efficient administrative systems, and initiated grand construction projects. The empire's borders expanded, encompassing territories from Britain to Egypt and from Spain to the Euphrates. Roman legions, renowned for their discipline and engineering prowess, secured and maintained these vast territories, building roads, fortifications, and cities that facilitated control and integration.
The Roman Empire’s society was hierarchical, with a rigid class system. At the top were the patricians, wealthy elites who held significant political power. Below them were the plebeians, free citizens with limited political influence, and the vast numbers of slaves who formed the backbone of the economy. The family unit was central, governed by the paterfamilias, the male head who held absolute authority.
Culturally, the Romans were eclectic, absorbing and adapting elements from the civilizations they encountered, particularly the Greeks. Roman art, literature, and philosophy reflected this synthesis, creating a rich cultural tapestry. Latin, the Roman language, became the lingua franca of the Western world, influencing numerous modern languages.
Roman architecture and engineering achievements were monumental. They perfected the arch, vault, and dome, constructing enduring structures like the Colosseum, Pantheon, and aqueducts. These engineering marvels not only showcased Roman ingenuity but also served practical purposes, from public entertainment to water supply.
Operation “Blue Star” is the only event in the history of Independent India where the state went into war with its own people. Even after about 40 years it is not clear if it was culmination of states anger over people of the region, a political game of power or start of dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
How to Create Map Views in the Odoo 17 ERPCeline George
The map views are useful for providing a geographical representation of data. They allow users to visualize and analyze the data in a more intuitive manner.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit-www.vavaclasses.com
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
This is a presentation by Dada Robert in a Your Skill Boost masterclass organised by the Excellence Foundation for South Sudan (EFSS) on Saturday, the 25th and Sunday, the 26th of May 2024.
He discussed the concept of quality improvement, emphasizing its applicability to various aspects of life, including personal, project, and program improvements. He defined quality as doing the right thing at the right time in the right way to achieve the best possible results and discussed the concept of the "gap" between what we know and what we do, and how this gap represents the areas we need to improve. He explained the scientific approach to quality improvement, which involves systematic performance analysis, testing and learning, and implementing change ideas. He also highlighted the importance of client focus and a team approach to quality improvement.
The Indian economy is classified into different sectors to simplify the analysis and understanding of economic activities. For Class 10, it's essential to grasp the sectors of the Indian economy, understand their characteristics, and recognize their importance. This guide will provide detailed notes on the Sectors of the Indian Economy Class 10, using specific long-tail keywords to enhance comprehension.
For more information, visit-www.vavaclasses.com
2. 8/10/2017 2Md. Golam Moazzam, Dept. of CSE, JU
OUTLINE
What is Query ?
What is Query Processor?
Main Function of a Query Processor
Main Problems of Query Processing
Characteristics of Query Processor
Main layers of Query Processing
Query Decomposition
Data Localization
Global Query Optimization
Distributed Execution
3. What is Query ?
A query is a statement requesting the retrieval of
information.
The portion of a DML that involves information retrieval is
called a query language.
8/10/2017 3Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
4. What is Query Processor?
In relational database technology, users perform the task of
data processing and data manipulation with the help of high-
level non-procedural language (e.g. SQL).
This high-level query hides the low-level details from the
user about the physical organization of the data and presents
such an environment so that the user can handle the tasks of
even complex queries in an easy, concise and simple
fashion.
The inner procedure of query-task is performed by a
DBMS module called a Query Processor. Hence the users
are relieved from query optimization which is a time-
consuming task that is actually performed by the query
processor.
8/10/2017 4Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
5. What is Query Processor?
Understanding Query processing in distributed database
environments is very difficult instead of centralized
database, because there are many elements or parameters
involved that affect the overall performance of distributed
queries. Moreover, in distributed environment, the query
processor may have to access in many sites. Thus, query
response time may become very high.
That is why, Query processing problem is divided into
several sub-problems/ steps which are easier to solve
individually.
8/10/2017 5Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
6. Main Function/Objective of a Query Processor
Main function of a query processor is to transform a high-
level-query (also called calculus query) into an
equivalent lower-level query (also called algebraic
query).
The conversion must be correct and efficient. The
conversion will be correct if low-level query has the
same semantics as the original query, i.e. if both queries
produce the same result.
8/10/2017 6Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
7. Main Problems of Query Processing
Main problem of query processing is query optimization.
It is a time consuming task, because many execution
strategies are involved to minimize (optimize) computer
recourse consumption.
Total cost also affects query processing. Cost depends on
the time required for processing operations of the query at
various sites, utilization of various computer resources etc,
time needed for exchanging data between sites participating
in the execution of the query etc.
Time and space required to process the query is also an
important factor for the performance of the query
processing.
8/10/2017 7Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
8. Important Characteristics of Query Processor
Languages
Types of Optimization
Optimization Timing
Statistics
Decision Sites
Exploitation of the Network Topology
Exploitation of Replicated Fragments
Use of Semi-join
8/10/2017 8Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
9. Important Characteristics of Query Processor
Languages:
Input language to the query processor is based on high-level
relational DBMS, i.e. input is in calculus query.
On the other hand, output language is based on lower-level
relational DMBS, i.e. output is in algebraic query.
The operations of the output language are implemented
directly in the computer system.
Query processing must perform efficient mapping from the
input language to the output language.
8/10/2017 9Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
10. Important Characteristics of Query Processor
Types of Optimization:
Among all possible strategies for executing query, the one in
which less time and space are required is the best solution
for the optimization of query.
A comparative study for all the strategies should perform
such that how much cost will be required for each solution.
That strategy should be chosen in which there will be less
cost. But keep in mind that choosing the low-cost strategy
may require much space. So care should be taken in this
case also.
8/10/2017 10Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
11. Important Characteristics of Query Processor
Optimization Timing:
The actual time required to optimize the execution of a
query is an important factor. If less time is required, then it
is the best solution for query processing.
8/10/2017 11Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
12. Important Characteristics of Query Processor
Statistics:
The effectiveness of query optimization relies on statistical
information of the database, i.e. how many fragments
query will be needed, which operation should be done first,
size and number of distinct values of each attribute of the
relation, histograms of the attributes for minimizing
probability error etc.
8/10/2017 12Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
13. Important Characteristics of Query Processor
Decision Sites:
In a distributed database, several sites may participate for
answering the query. Most systems use centralized decision
approach, in which a single site generates the strategy.
However, the decision process could be distributed among
various sites.
Exploitation of Network Topology:
Distributed query processor uses computer network, so its
performance depends also on which topology it is using.
The cost function, speed, utilization of various network
resources are important factors for executing query
processor in a distributed environment.
8/10/2017 13Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
14. Important Characteristics of Query Processor
Exploitation of Replicated Fragments:
A distributed relation is usually divided into relation
fragments.
Distributed queries expressed on global relations are
mapped into queries on physical fragments of relations by
translating relations into fragments. We call this process
localization because its main function is to localize the data
involved in the query.
For higher reliability and better read performance, it is
useful to have fragments replicated at different sites.
8/10/2017 14Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
15. Important Characteristics of Query Processor
Use of Semi-join operation:
In distributed system, it is better to use semi-join operation
instead of join operation for minimizing data
communication.
8/10/2017 15Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
16. Layers of Query Processing
Processing of Query in distributed DBMS instead of
centralized (Local) DBMS.
Understanding Query processing in distributed database
environments is very difficult instead of centralized
database, because there are many elements involved.
So, Query processing problem is divided into several sub-
problems/ steps which are easier to solve individually.
A general layering scheme for describing distributed query
processing is given below:
8/10/2017 16Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
17. Main layers of Query Processing
Query processing involves 4 main layers:
• Query Decomposition
• Data Localization
• Global Query Optimization
• Distributed Execution
8/10/2017 17Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
18. Main layers of Query Processing
8/10/2017 18Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
Query Decomposition
Calculus Query on Global Relations
Algebraic Query on Global Relations
Data Localization
Algebraic Query on Fragments
Global Optimization
Distributed Query Execution Plan
Distributed Execution
Global
Schema
Fragment
Schema
Allocation
Schema
Control Site
Local Sites
Fig. Generic Layering Scheme for Distributed Query Processing
19. Main layers of Query Processing
The input is a query on global data expressed in relational
calculus. This query is posed on global (distributed)
relations, meaning that data distribution is hidden.
The first three layers map the input query into an optimized
distributed query execution plan. They perform the
functions of query decomposition, data localization, and
global query optimization.
Query decomposition and data localization correspond to
query rewriting.
The first three layers are performed by a central control site
and use schema information stored in the global directory.
The fourth layer performs distributed query execution by
executing the plan and returns the answer to the query. It is
done by the local sites and the control site.
8/10/2017 19Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
20. Query Decomposition
The first layer decomposes the calculus query into an
algebraic query on global relations. The information needed
for this transformation is found in the global conceptual
schema describing the global relations.
Both input and output queries refer to global relations,
without knowledge of the distribution of data. Therefore,
query decomposition is the same for centralized and
distributed systems.
Query decomposition can be viewed as four successive
steps:
1) Normalization, 2) Analysis,
3) Elimination of redundancy, and 4) Rewriting.
8/10/2017 20Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
21. Query Decomposition
First, the calculus query is rewritten in a normalized form
that is suitable for subsequent manipulation. Normalization
of a query generally involves the manipulation of the query
quantifiers and of the query qualification by applying
logical operator priority.
Second, the normalized query is analyzed semantically so
that incorrect queries are detected and rejected as early as
possible. Typically, they use some sort of graph that
captures the semantics of the query.
8/10/2017 21Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
22. Query Decomposition
Third, the correct query is simplified. One way to simplify a
query is to eliminate redundant predicates.
Fourth, the calculus query is restructured as an algebraic
query. Several algebraic queries can be derived from the
same calculus query, and that some algebraic queries are
“better” than others. The quality of an algebraic query is
defined in terms of expected performance.
8/10/2017 22Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
23. Normalization
The input query may be arbitrarily complex.
It is the goal of normalization to transform the query to a
normalized form to facilitate further processing.
With relational languages such as SQL, the most important
transformation is that of the query qualification (the
WHERE clause), which may be an arbitrarily complex,
quantifier-free predicate, preceded by all necessary
quantifiers ( or ).
There are two possible normal forms for the predicate, one
giving precedence to the AND (^) and the other to the OR
(˅). The conjunctive normal form is a conjunction (^
predicate) of disjunctions (˅ predicates) as follows:
(p11 ˅ p12 ˅ . . . ˅ p1n) ^ . . . . ^ (pm1 ˅ pm2 ˅ . . . ˅ pmn)
where pi j is a simple predicate.
8/10/2017 23Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
24. Normalization
A qualification in disjunctive normal form, on the other
hand, is as follows:
(p11 ^ p12 ^ . . . ^ p1n) ˅ . . . . ˅ (pm1 ^ pm2 ^ . . . ^ pmn)
The transformation is straightforward using the well-known
equivalence rules for logical operations (^, ˅ and ¬):
1. p1 ^ p2 p2 ^ p1
2. p1 ˅ p2 p2 ˅ p1
3. p1 ^ (p2 ^ p3) (p1 ^ p2) ^ p3
4. p1 ˅ (p2 ˅ p3) (p1 ˅ p2) ˅ p3
5. p1 ^ (p2 ˅ p3) (p1 ^ p2) ˅ (p1 ^ p3)
6. p1 ˅ (p2 ^ p3) (p1 ˅ p2) ^ (p1 ˅ p3)
7. ¬ (p1 ^ p2) ¬ p1 ˅ ¬ p2
8. ¬ (p1 ˅ p2) ¬ p1 ^ ¬ p2
9. ¬ (¬ p) p
8/10/2017 24Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
25. Example: Let us consider the following query on the
engineering database that we have been referring to:
“Find the names of employees who have been working on
project P1 for 12 or 24 months”
Engineering Database:
EMP(ENO, ENAME, TITLE)
PROJ(PNO, PNAME, BUDGET).
SAL(TITLE, AMT)
ASG(ENO, PNO, RESP, DUR)
;SAL=Salary, AMT=Amount
; Employees assigned to projects
; RESP=Responsibility, DUR=Duration
8/10/2017 25Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
26. Example: Let us consider the following query on the
engineering database that we have been referring to:
“Find the names of employees who have been working on
project P1 for 12 or 24 months”
The query expressed in SQL is
SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = "P1"
AND DUR = 12 OR DUR = 24
8/10/2017 26Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
27. Example: Let us consider the following query on the
engineering database that we have been referring to:
“Find the names of employees who have been working on
project P1 for 12 or 24 months”
The qualification in conjunctive normal form is
EMP.ENO = ASG.ENO ^ ASG.PNO = “P1” ^ (DUR = 12 ˅
DUR = 24)
The qualification in disjunctive normal form is
(EMP.ENO = ASG.ENO ^ ASG.PNO = “P1” ^ DUR = 12) ˅
(EMP.ENO = ASG.ENO ^ ASG.PNO = “P1” ^ DUR = 24)
8/10/2017 27Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
28. Analysis
Query analysis enables rejection of normalized queries for
which further processing is either impossible or
unnecessary.
The main reasons for rejection are that the query is type
incorrect or semantically incorrect.
A query is type incorrect if any of its attribute or relation
names are not defined in the global schema, or if operations
are being applied to attributes of the wrong type.
The technique used to detect type incorrect queries is similar
to type checking for programming languages.
8/10/2017 28Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
29. Analysis
Example: The following SQL query on the engineering
database is type incorrect for two reasons. First, attribute E# is
not declared in the schema. Second, the operation “>200” is
incompatible with the type string of ENAME.
SELECT E#
FROM EMP
WHERE ENAME > 200
8/10/2017 29Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
30. Analysis
A query is semantically incorrect if its components do not
contribute in any way to the generation of the result.
This is based on the representation of the query as a graph,
called a query graph or connection graph.
In a query graph, one node indicates the result relation, and
any other node indicates an operand relation. An edge
between two nodes one of which does not correspond to the
result represents a join, whereas an edge whose destination
node is the result represents a project.
An important subgraph of the query graph is the join graph,
in which only the joins are considered.
8/10/2017 30Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
31. Analysis
Example: Let us consider the following query:
“Find the names and responsibilities of programmers who have
been working on the CAD/CAM project for more than 3 years.”
The query expressed in SQL is
SELECT ENAME, RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND PNAME = "CAD/CAM"
AND DUR ≥ 36
AND TITLE = "Programmer“
8/10/2017 31Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
34. Analysis
A query is semantically incorrect if its query graph is not
connected.
Example: Let us consider the following SQL query:
SELECT ENAME, RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND PNAME = "CAD/CAM"
AND DUR _ 36
AND TITLE = "Programmer"
8/10/2017 34Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
35. Analysis
It’s query graph, shown below is disconnected, which tells us
that the query is semantically incorrect.
8/10/2017 35Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
PNAME = “CAD/CAM”
ASG
PROJ
EMP
RESULT
EMP.ENO = ASG.ENO
TITLE = “Programmer”
ENAME
RESP
Fig.: Query Graph
DUR ≥ 36
36. Elimination of Redundancy
Simplify the query by eliminating redundancies, e.g.,
redundant predicates.
Redundancies are often due to semantic integrity constraints
expressed in the query language.
Transformation rules are used:
8/10/2017 36Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
37. Elimination of Redundancy
Example:
SELECT TITLE
FROM EMP
WHERE (NOT (TITLE = "Programmer")
AND (TITLE = "Programmer"
OR TITLE = "Elect. Eng.")
AND NOT (TITLE = "Elect. Eng."))
OR ENAME = "J. Doe"
8/10/2017 37Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
38. Elimination of Redundancy
Example:
Can be simplified using the previous rules to become
SELECT TITLE
FROM EMP
WHERE ENAME = "J. Doe“
The simplification proceeds as follows:
Let p1 be TITLE = “Programmer”,
p2 be TITLE = “Elect. Eng.”, and
p3 be ENAME = “J. Doe”.
The query qualification is:
(¬ p1 ^ (p1 ˅ p2) ^ ¬ p2) ˅ p3
8/10/2017 38Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
39. Elimination of Redundancy
The disjunctive normal form for this qualification is obtained by
applying rule 5 which yields:
(¬ p1 ^ ((p1 ^ ¬ p2) ˅ (p2 ^ ¬ p2))) ˅ p3
and then rule 3 yields:
(¬ p1 ^ p1 ^ ¬ p2) ˅ (¬ p1 ^ p2 ^ ¬ p2) ˅ p3
By applying rule 7, we obtain
(false ^ ¬ p2) ˅ (¬ p1 ^ false) ˅ p3
By applying the same rule, we get
(false ˅ false) ˅ p3
which is equivalent to p3 by rule 4.
8/10/2017 39Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
40. Rewriting
The last step of query decomposition rewrites the query in
relational algebra.
For the sake of clarity it is customary to represent the
relational algebra query graphically by an operator tree.
An operator tree is a tree in which a leaf node is a relation
stored in the database, and a non-leaf node is an
intermediate relation produced by a relational algebra
operator. The sequence of operations is directed from the
leaves to the root, which represents the answer to the query.
8/10/2017 40Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
41. Rewriting
The transformation of a tuple relational calculus query into an
operator tree can easily be achieved as follows:
First, a different leaf is created for each different tuple
variable (corresponding to a relation). In SQL, the leaves
are immediately available in the FROM clause.
Second, the root node is created as a project operation
involving the result attributes. These are found in the
SELECT clause in SQL.
Third, the qualification (SQL WHERE clause) is translated
into the appropriate sequence of relational operations
(select, join, union, etc.) going from the leaves to the root.
The sequence can be given directly by the order of
appearance of the predicates and operators.
8/10/2017 41Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
42. Rewriting
Example:
“Find the names of employees other than J. Doe who worked on
the CAD/CAM project for either one or two years” whose SQL
expression is:
SELECT ENAME
FROM PROJ, ASG, EMP
WHERE ASG.ENO = EMP.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME != "J. Doe"
AND PROJ.PNAME = "CAD/CAM"
AND (DUR = 12 OR DUR = 24)
8/10/2017 42Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
43. Rewriting
Operator Tree:
8/10/2017 43Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
ΠENAME
σPNAME=”CAD/CAM”
σDUR=12 ˅ DUR=24
σENAME≠”J. Doe”
PNO
ENO
PROJ ASG EMP
Fig: Operator tree
Project
Select
Join
44. Localization of Distributed Data
Output of the first layer is an algebraic query on distributed
relations which is input to the second layer.
The main role of this layer is to localize the query’s data
using data distribution information.
We know that relations are fragmented and stored in disjoint
subsets, called fragments where each fragment is stored at
different site.
8/10/2017 44Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
45. Localization of Distributed Data
This layer determines which fragments are involved in
the query and transforms the distributed query into a
fragment query.
A naive way to localize a distributed query is to generate a
query where each global relation is substituted by its
localization program. This can be viewed as replacing the
leaves of the operator tree of the distributed query with
subtrees corresponding to the localization programs. We call
the query obtained this way the localized query.
8/10/2017 45Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
46. Global Query Optimization
The input to the third layer is a fragment algebraic query.
The goal of this layer is to find an execution strategy for
the algebraic fragment query which is close to optimal.
Query optimization consists of
i) Finding the best ordering of operations in the fragment
query,
ii) Finding the communication operations which minimize
a cost function.
8/10/2017 46Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
47. Global Query Optimization
The cost function refers to computing resources such as
disk space, disk I/Os, buffer space, CPU cost,
communication cost, and so on.
Query optimization is achieved through the semi-join
operator instead of join operators.
8/10/2017 47Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing
48. Distributed Execution
The last layer is performed by all the sites having
fragments involved in the query.
Each subquery, called a local query, is executing at one
site. It is then optimized using the local schema of the
site.
8/10/2017 48Md. Golam Moazzam, Dept. of CSE, JU
Distributed Query Processing