Query Processing and Query Optimization (Niraj Gandha)
This presentation on query processing and query optimization uses the most basic, fundamental examples and topics for the explanation.
Generalized Linear Models in Spark MLlib and SparkR (Databricks)
Generalized linear models (GLMs) unify various statistical models such as linear regression and logistic regression through the specification of a model family and link function. They are widely used in modeling, inference, and prediction with applications in numerous fields. In this talk, we will summarize recent community efforts in supporting GLMs in Spark MLlib and SparkR. We will review supported model families, link functions, and regularization types, as well as their use cases, e.g., logistic regression for classification and log-linear model for survival analysis. Then we discuss the choices of solvers and their pros and cons given training datasets of different sizes, and implementation details in order to match R’s model output and summary statistics. We will also demonstrate the APIs in MLlib and SparkR, including R model formula support, which make building linear models a simple task in Spark. This is joint work with Eric Liang, Yanbo Liang, and some other Spark contributors.
Inter Task Communication On Volatile Nodes (nagarajan_ka)
Idle desktop computers are already used for high-performance computing, but their wider use for parallel computing is limited by the available programming models. We have built a new communication library that facilitates execution of parallel scientific applications on virtual clusters composed of volatile, ordinary PC nodes.
STATISTICAL APPROACH TO DETERMINE MOST EFFICIENT VALUE FOR TIME QUANTUM IN RO... (ijcsit)
Scheduling various processes is one of the most fundamental functions of the operating system. In that context, one of the most common scheduling algorithms used in most operating systems is the Round Robin method, in which the ready processes waiting in the ready queue take control of the processor circularly, each for a short period of time known as the time quantum (or time slice). Here we discuss the use of statistics and develop a mathematical model to determine the most efficient value for the time quantum. This is strictly theoretical, as we do not know the times of the various processes beforehand. However, the proposed approach is compared with recently developed algorithms in this regard to determine the efficiency of the proposed algorithm.
A report on designing a model for improving CPU Scheduling by using Machine L... (MuskanRath1)
Disclaimer: Please let me know if any portion of this article matches your research; I will include a link to your research in the description section of my article.
Description:
Our paper proposes a model for improving CPU scheduling on a uniprocessor system. The model is implemented in a low-level (assembly) language, and Linux is used for the implementation because it is an open-source environment with an editable kernel.
There are several methods to predict the length of CPU bursts, such as the exponential averaging method; however, these methods may not give accurate or reliable predictions. In this paper, we propose a Machine Learning (ML) based approach to estimate the length of CPU bursts for processes. We use Bayesian theory in our model as a classifier tool that decides which process in the ready queue executes first. The proposed approach selects the most significant attributes of the process using feature selection techniques and then predicts the CPU burst for the process in the grid. Furthermore, applying attribute selection techniques improves performance in terms of space, time, and estimation.
Now in focus: the linear control system functions in MATLAB.
CONTENTS:
INTRODUCTION OF MATLAB
CONTROL SYSTEM TOOLBOX
TRANSFER FUNCTION
Poles & Zeroes
Multiplication Of Transfer Functions
Closed-loop Transfer Function
TIME RESPONSE OF A CONTROL SYSTEM
Impulse
Step
Ramp
STATE SPACE REPRESENTATION
State space to transfer function
Transfer function to state space
Presented at Proc. of the 11th International Conference on Information Integration and Web-based Applications & Services (iiWAS2009), Kuala Lumpur, Malaysia, December 14-16, 2009
PhD defense, Department of Information Technology, Uppsala University, Sweden (Sabesan Manivasakan)
Querying Data Providing Web Services
Manivasakan Sabesan
Department of Information Technology
Uppsala University
Sweden.
Abstract
Web services are often used for search computing, where data is retrieved from servers providing information of different kinds. Such data-providing web services return a set of objects for a given set of parameters without any side effects. There is a need to enable general and scalable search capabilities over data from data-providing web services, which is the topic of this thesis.
The Web Service MEDiator (WSMED) system automatically provides relational views of any data-providing web service operations by reading the WSDL documents describing them. These views can be queried with SQL. Without any knowledge of the costs of executing specific web service operations, the WSMED query processor automatically and adaptively finds an optimized parallel execution plan calling the queried data-providing web services.
For scalable execution of queries to data-providing web services, an algebra operator, PAP, adaptively parallelizes calls to web service operations in execution plans until no significant performance improvement is measured, based on monitoring the flow from the web service operations without any cost knowledge or extensive memory usage.
To comply with the Everything as a Service (XaaS) paradigm, WSMED is itself implemented as a web service that provides operations to query and combine data from data-providing web services. A web-based demonstration of the WSMED web service allows general SQL queries to any data-providing web service operations from a browser.
WSMED assumes that all queried data sources are available as web services. To turn any data-providing system into a data-providing web service, WSMED includes a subsystem, the web service generator, which generates and deploys the web service operations needed to access a data source. The WSMED web service is itself generated by the web service generator.
A Study of the Behavior of Floating-Point Errors (ijpla)
The dangers of programs performing floating-point computations are well known. They stem from numerical reliability issues resulting from rounding errors arising during the computations. In general, these round-off errors are neglected because they are small. However, they can accumulate and propagate, leading to faulty execution and failures. In critical embedded systems, such faults may cause dramatic damage (e.g., the failures of the Ariane 5 launch and the Patriot missile system). The ufp (unit in the first place) and ulp (unit in the last place) functions are used to estimate the maximum value of round-off errors. In this paper, we study the behavior of round-off errors, check their numerical stability using a set of constraints, and ensure that the computed round-off errors do not grow larger when solving constraints on the ufp and ulp values.
Timo Klerx and Kalman Graffi. Bootstrapping Skynet: Calibration and Autonomic Self-Control of Structured Peer-to-Peer Networks. In IEEE P2P ’13: Proceedings of the International Conference on Peer-to-Peer Computing, 2013.
Abstract—Peer-to-peer systems scale to millions of nodes and provide routing and storage functions with best effort quality. In order to provide a guaranteed quality of the overlay functions, even under strong dynamics in the network with regard to peer capacities, online participation and usage patterns, we propose to calibrate the peer-to-peer overlay and to autonomously learn which qualities can be reached. For that, we simulate the peer-to-peer overlay systematically under a wide range of parameter configurations and use neural networks to learn the effects of the configurations on the quality metrics. Thus, by choosing a specific quality setting by the overlay operator, the network can tune itself to the learned parameter configurations that lead to the desired quality. Evaluation shows that the presented self-calibration succeeds in learning the configuration-quality interdependencies and that peer-to-peer systems can learn and adapt their behavior according to desired quality goals.
E2 – Fundamentals, Functions & Arrays (jacksnathalie)
E2 – Fundamentals, Functions & Arrays
Please refer to announcements for details about this exam. Make sure you fill in the information below so that you are graded properly:
Last Name
First Name
Student ID #
Here is the grading matrix where the TA will leave feedback. If you scored 100%, you will most likely not see any feedback.
Question         Max points   Scored points   Feedback
1 - Tracing      3
2 - Testing      2
3 - Refactoring  2
4 - Debugging    3
Interlude – How to Trace Recursive Programs
To help you fill the trace table in question #1, we start with an example of a small recursive program & provide you with the trace table we expect you to draw.
Example Program to Trace
This program implements a power function recursively. We do not have local variables in either function, thus making the information in each activation record a bit shorter.
 1  #include <stdio.h>
 2  #include <stdlib.h>
 3  int pwr(int x, int y){
 4      if( y == 0 )
            return 1;
 5      if( y == 1 )
            return x;
 6      return x * pwr(x, y-1);
 7  }
 8  int main(){
 9      printf("%d to power %d is %d\n", 5, 3, pwr(5,3));
10      return EXIT_SUCCESS;
11  }
Example Trace Table
Please note the following about the table below:
· We only write down the contents of the stack when it is changed, i.e. when we enter a function or assign a new value to a variable in the current activation record.
· When we write the contents of the stack, we write the contents of the whole stack, including previous activation records.
· Each activation record is identified by a bold line specifying the name of the function & the parameters passed to it when it was invoked.
· It is followed by a bullet list with one item per parameter or local variable.
· New activation records are added at the end of the contents of the previous stack
Line #  What happens?                                      Stack is
10      Entering main function                             main's activation record
                                                             · No local vars / parameters
11      Invoking function pwr as part of                   Stack is the same, no need to repeat it
        executing the printf
4       Entering pwr function with arguments 5 & 3         main's activation record
                                                             · No local vars / parameters
                                                           pwr(5,3) activation record
                                                             · x is 5
                                                             · y is 3
5       Testing if y is 0 -> false
6       Testing if y is 1 -> false
7       Invoking pwr(5,2) as part of return statement
4       Entering pwr function with arguments 5 & 2         main's activation record
                                                             · No local vars / parameters
                                                           pwr(5,3) activation record
                                                             · x is 5
                                                             · y is 3
                                                           pwr(5,2) activation record
                                                             · x is 5
                                                             · y is 2
5       Testing if y is 0 -> false
6       Testing if y is 1 -> false
7       Invoking pwr(5,1)
4       Entering pwr function with arguments 5 & 1         main's activation record
                                                             · No local vars / parameters
                                                           pwr(5,3) activation record
                                                             · x is 5
                                                             · y is 3
                                                           pwr(5,2) activation record
                                                             · x is 5
                                                             · y is 2
                                                           pwr(5,1) activation record
                                                             · x is 5
                                                             · y is 1
5       Testing if y is 0 -> false
6       Testing if y is 1 -> true
6       Return value x which is 5
7       Back from invocation of pwr(5,1) with result 5.    main's activation record
                                                             · No local vars / parameters
                                                           pwr(5,3) activation r ...
Big Data Day LA 2016 / Big Data Track - Portable Stream and Batch Processing w... (Data Con LA)
This talk explores deploying a series of small and large batch and streaming pipelines locally, to Spark and Flink clusters and to Google Cloud Dataflow services to give the audience a feel for the portability of Beam, a new portable Big Data processing framework recently submitted by Google to the Apache foundation. This talk will look at how the programming model handles late arriving data in a stream with event time, windows, and triggers.
Adaptive Parallelization of Queries over Dependent Web Service Calls
1. Adaptive Parallelization of Queries over Dependent Web Service Calls Manivasakan Sabesan and Tore Risch Uppsala Database Laboratory Dept. of Information Technology Uppsala University Sweden
3. WSMED System (Web Service MEDiator): architecture diagram. WSDL metadata for web services WS 1 ... WS n is imported into WSMED's meta store; an Operation Wrapper Function (OWF) is automatically generated for each operation, making the web service operations queryable, so that an SQL query over the OWFs is executed as SOAP calls to the underlying web service operations.
10. Query Processing in WSMED: diagram of the two-phase query processor. Phase 1: the calculus generator and central plan creator translate the SQL query into a central plan. Phase 2: the plan splitter, plan function generator, and parallel pipeliner produce the parallel query plan.
12. Plan Splitting and Plan Function Generation (Phase 2): the plan is split into plan functions. PF1(Charstring st1) -> Stream of Charstring str applies GetPlacesWithin('Atlanta', st1, 15.0, 'City') and concat(city, ', ', state2) to produce <city, state2> strings; PF2(Charstring str) -> Stream of <Charstring pl, Charstring st> applies GetPlaceList(str, 100, 'true') to produce <pl, st> pairs.
13. WSMED Process Tree: diagram of query processes qi (i = 0, 1, ..., n) for Query1, with a coordinator q0 and child processes q1 ... q8 arranged in two levels, the levels calling the plan functions GetAllStates, PF1, and PF2.
14. Make Parallel Pipeline: diagram of the pipeline GetAllStates() -> FF_APPLYP(PF1, 2, st1) -> FF_APPLYLP(PF2, 3, str), producing <pl, st> tuples; the fanouts (2 and 3) are manually set on both levels.
24. (continued) 2. A monitoring cycle for a non-leaf query process is complete when the number of received end-of-call messages equals the number of children. 2.1 After the first monitoring cycle, an AFF_APPLYP adds p new child processes (an add stage). 3. When an added node has several levels of children, the init stages of the AFF_APPLYPs in the children produce a binary sub-tree. (Diagram: the process tree after an add stage, with coordinator q0 and children q1 ... q11 across Levels 1 and 2.)
25. (continued) 4. An AFF_APPLYP records, per monitoring cycle i, the average time ti to produce an incoming tuple from the children. 4.1 If ti decreases by more than a threshold (25%), the add stage is rerun. 4.2 If ti increases, we either stop or run a drop stage that drops one child and its children. (Diagram: the process tree after a drop stage, with coordinator q0 and the remaining children across Levels 1 and 2.)
We have developed a system, WSMED, that provides general query capabilities over data accessible through web services by reading WSDL metadata descriptions. A WSDL URI is given to WSMED to import the metadata into its local store. While importing the metadata, WSMED automatically creates each OWF as a declarative function that looks like a regular table; the OWF makes the web service operation queryable. Users can then view these OWFs in a GUI that illustrates their signatures, and can pose SQL queries that treat the OWFs as regular relations, calling any web service without any programming.
There is a common need to search information through data-providing web services, which, without any side effects, return a set of objects for a given set of parameters.
By starting separate query processes, each calling a plan function for different parameter tuples.
The views can be queried with SQL
Central plan: built with a heuristic cost model based on the web service signatures, assuming a web service call is expensive. Sequential execution is slow.
Multilevel execution plans are generated with several layers of parallelism; the process tree fanout transforms the central query plan into a parallel query plan. The coordinator initiates communication between the child processes and ships the plan functions to them; it then streams different parameter tuples to the children, and the results are delivered as streams from the child processes.
End-of-call message.
In these cases the query's process tree is close to a homogeneous fanout tree. The properties of the web services are unknown.