Predictive job scheduling in a connection limited system using parallel genetic algorithm (synopsis)
Published in: Technology, Education

  • 1. Predictive Job Scheduling in a Connection Limited System using Parallel Genetic Algorithm (Synopsis)
  • 2. INTRODUCTION
    Most job-scheduling approaches for parallel machines apply space sharing, which means allocating CPUs/nodes to jobs in a dedicated manner and sharing the machine among multiple jobs by allocating them to different subsets of nodes. Some approaches apply time sharing (or, more precisely, a combination of time and space sharing), i.e. they use multiple time slices per CPU/node. Given a stream of parallel jobs and a set of computing resources, job scheduling determines when and where to execute each job. In a standard working model, when a parallel job arrives at the system, the scheduler tries to allocate the required number of processors to the job for the duration of its runtime and, if they are available, starts the job immediately. If the requested processors are currently unavailable, the job is queued and scheduled to start at a later time. The most commonly evaluated metrics are system metrics, such as system utilization and throughput, and user metrics, such as turnaround time and wait time. The typical charging model is based on the total amount of resources a job uses (resources × runtime).
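The standard working model above (start a job if its processors are free, otherwise queue it) can be sketched as a small first-come-first-served space-sharing simulator. This is a minimal sketch under assumptions that are not in the original: integer time units, jobs sorted by arrival, no time sharing and no backfilling. The names `SpaceSharingFcfs`, `Job` and `schedule` are hypothetical. It reports the user metrics (wait and turnaround time) and the resources × runtime charge, and it targets a current JDK rather than J2SDK 1.4.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical FCFS space-sharing scheduler: each job gets dedicated CPUs
// for its whole runtime; no time sharing and no backfilling.
public class SpaceSharingFcfs {

    static final class Job {
        final int arrival, procs, runtime;
        int start = -1; // set by schedule()

        Job(int arrival, int procs, int runtime) {
            this.arrival = arrival;
            this.procs = procs;
            this.runtime = runtime;
        }

        int waitTime()   { return start - arrival; }
        int turnaround() { return start + runtime - arrival; }
        // Usage-based charge: resources x runtime.
        int charge()     { return procs * runtime; }
    }

    // Assigns start times; jobs must be sorted by arrival time.
    static void schedule(List<Job> jobs, int totalProcs) {
        // Running jobs as {endTime, procs}, earliest finish first.
        PriorityQueue<int[]> running =
            new PriorityQueue<>(Comparator.comparingInt((int[] r) -> r[0]));
        int free = totalProcs;
        int clock = 0;
        for (Job j : jobs) {
            if (j.procs > totalProcs) {
                throw new IllegalArgumentException("job needs more CPUs than exist");
            }
            // FCFS: a job cannot start before it arrives or before its predecessor.
            clock = Math.max(clock, j.arrival);
            // Reclaim CPUs from jobs that have already finished.
            while (!running.isEmpty() && running.peek()[0] <= clock) {
                free += running.poll()[1];
            }
            // Not enough CPUs: the job stays queued until enough running jobs finish.
            while (free < j.procs) {
                int[] done = running.poll();
                clock = done[0];
                free += done[1];
            }
            j.start = clock;
            free -= j.procs;
            running.add(new int[] {clock + j.runtime, j.procs});
        }
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        jobs.add(new Job(0, 4, 10)); // occupies the whole 4-CPU machine
        jobs.add(new Job(1, 2, 5));  // queued until t = 10
        jobs.add(new Job(2, 2, 3));  // also starts at t = 10, on the other 2 CPUs
        schedule(jobs, 4);
        for (Job j : jobs) {
            System.out.println("start=" + j.start + " wait=" + j.waitTime()
                + " turnaround=" + j.turnaround() + " charge=" + j.charge());
        }
    }
}
```

With 4 CPUs the first job fills the machine, so the second and third jobs both wait until t = 10 and then run side by side. Strict FCFS never lets a later job jump ahead; backfilling variants relax exactly that rule.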
  • 3. Data mining, the extraction of hidden predictive information from large databases, is a powerful technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that were traditionally too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online. When implemented on high-performance client/server or parallel processing computers, data mining tools can analyze massive databases to answer questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"
  • 4. Data mining (DM), also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns using tools such as classification, association rule mining and clustering. Data mining is a complex topic that draws on several core fields, including computer science, and builds on established computational techniques from statistics, information retrieval, machine learning and pattern recognition. Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently produced technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature: massive data collection, powerful multiprocessor computers, and data mining algorithms. Commercial databases are growing at unprecedented rates. A META Group survey of data warehouse projects found that 19% of respondents were beyond the 50 gigabyte level, while 59% expected to be there by the second quarter of 1996.
  • 5. In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met cost-effectively with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.
  • 6. Overview of the System
    There are two main types of scheduling: system-level scheduling and application-level scheduling. In system-level scheduling, the scheduling system analyzes the load on every node and selects one node to run the job. The policy is to optimize the overall performance of the whole system: if the system is heavily loaded, the scheduler has to balance the load and increase throughput and resource utilization under the given constraints. If multiple jobs arrive within one scheduling time slot, the scheduling system allocates an appropriate number of jobs to every node so that the jobs finish under a defined objective, usually the minimal average execution time. Because this policy is application-oriented, it is called application-level scheduling.
    A genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. They are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology, such as inheritance, mutation, selection and crossover (also called recombination).
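The system-level policy described above, which inspects the load on every node and picks one, can be sketched as least-loaded selection. Assumptions not in the original: load is measured as the number of running jobs, and each node has a fixed job capacity (mirroring the fixed number of jobs per node assumed in the abstract). `LeastLoadedNode` and `selectNode` are hypothetical names.

```java
// Hypothetical system-level scheduler: place each job on the least-loaded node.
public class LeastLoadedNode {

    // Returns the node with the fewest running jobs that still has a free slot,
    // or -1 if every node is at its capacity (the job must then be queued).
    static int selectNode(int[] runningJobs, int[] capacity) {
        int best = -1;
        for (int i = 0; i < runningJobs.length; i++) {
            if (runningJobs[i] >= capacity[i]) continue; // node is full
            if (best == -1 || runningJobs[i] < runningJobs[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        int[] running = {3, 1, 2};
        int[] capacity = {4, 1, 4}; // node 1 has the fewest jobs but is already at its limit
        for (int job = 0; job < 4; job++) {
            int node = selectNode(running, capacity);
            if (node < 0) {
                System.out.println("job " + job + " -> queued");
            } else {
                running[node]++;
                System.out.println("job " + job + " -> node " + node);
            }
        }
    }
}
```

Here the jobs go to nodes 2, 0 and 2, and the fourth job is queued because every node has reached its capacity.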
  • 7. Genetic algorithms are implemented as a computer simulation in which a population of abstract representations (called chromosomes, the genotype or the genome) of candidate solutions (called individuals, creatures or phenotypes) to an optimization problem evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and proceeds in generations. In each generation, the fitness of every individual in the population is evaluated; multiple individuals are stochastically selected from the current population based on their fitness and modified (recombined and possibly mutated) to form a new population, which is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population. If the algorithm terminates because of the generation limit, a satisfactory solution may or may not have been reached. A typical genetic algorithm requires two things to be defined: (1) a genetic representation of the solution domain and (2) a fitness function to evaluate candidate solutions.
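The generational loop just described (random initial population, fitness evaluation, fitness-based stochastic selection, recombination and mutation, termination on a generation limit or a satisfactory fitness) can be sketched on bit-string chromosomes. The toy fitness function ("OneMax", the count of 1 bits), binary tournament selection, single-point crossover, per-bit mutation, elitism and all parameter values are assumptions chosen for brevity, not the configuration used by the proposed scheduler.

```java
import java.util.Random;

// Hypothetical minimal generational GA on bit-string chromosomes.
// Toy fitness ("OneMax"): number of 1 bits, so the optimum is all ones.
public class SimpleGa {
    static final int LENGTH = 20;
    static final int POP_SIZE = 30;
    static final int MAX_GENERATIONS = 200;
    static final double MUTATION_RATE = 1.0 / LENGTH;

    static int fitness(boolean[] chromosome) {
        int ones = 0;
        for (boolean bit : chromosome) if (bit) ones++;
        return ones;
    }

    // Binary tournament: the fitter of two randomly chosen individuals.
    static boolean[] select(boolean[][] pop, Random rnd) {
        boolean[] a = pop[rnd.nextInt(pop.length)];
        boolean[] b = pop[rnd.nextInt(pop.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    // Single-point crossover, then flip each bit with probability MUTATION_RATE.
    static boolean[] breed(boolean[] p1, boolean[] p2, Random rnd) {
        int cut = rnd.nextInt(LENGTH);
        boolean[] child = new boolean[LENGTH];
        for (int i = 0; i < LENGTH; i++) {
            child[i] = i < cut ? p1[i] : p2[i];
            if (rnd.nextDouble() < MUTATION_RATE) child[i] = !child[i];
        }
        return child;
    }

    // Runs the GA and returns the best chromosome seen.
    static boolean[] run(long seed) {
        Random rnd = new Random(seed);
        boolean[][] pop = new boolean[POP_SIZE][LENGTH];
        for (boolean[] c : pop) {
            for (int i = 0; i < LENGTH; i++) c[i] = rnd.nextBoolean(); // random initial population
        }
        boolean[] best = pop[0];
        for (int gen = 0; ; gen++) {
            for (boolean[] c : pop) if (fitness(c) > fitness(best)) best = c; // evaluate
            // Terminate on a satisfactory fitness or at the generation limit.
            if (fitness(best) == LENGTH || gen == MAX_GENERATIONS) break;
            boolean[][] next = new boolean[POP_SIZE][];
            next[0] = best; // elitism: carry the best individual over unchanged
            for (int k = 1; k < POP_SIZE; k++) {
                next[k] = breed(select(pop, rnd), select(pop, rnd), rnd);
            }
            pop = next;
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] best = run(42L);
        StringBuilder bits = new StringBuilder();
        for (boolean b : best) bits.append(b ? '1' : '0');
        System.out.println(bits + " fitness=" + fitness(best));
    }
}
```

Elitism makes the best fitness non-decreasing across generations, so stopping at the generation limit never returns something worse than an earlier generation produced.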
  • 8. A standard representation of a solution is an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned because of their fixed size, which allows a simple crossover operation. Variable-length representations may also be used, but crossover is more complex to implement in that case. Tree-like representations are explored in genetic programming, and free-form representations are explored in human-based genetic algorithms (HBGA). The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem-dependent. For instance, in the knapsack problem we want to maximize the total value of the objects that we can put in a knapsack of some fixed capacity. A solution might be represented as an array of bits, where each bit stands for a different object and its value (0 or 1) indicates whether or not the object is in the knapsack. Not every such representation is valid, as the total size of the selected objects may exceed the capacity of the knapsack. The fitness of a solution is the sum of the values of all objects in the knapsack if the representation is valid, or 0 otherwise. In some problems it is hard or even impossible to define the fitness expression; in these cases, interactive genetic algorithms are used.
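The knapsack fitness function described above translates directly into code. In this sketch, object sizes are modeled as integer weights; the class name `KnapsackFitness` and the sample data are hypothetical.

```java
// Fitness for the 0/1 knapsack encoding: bit i says whether object i is packed.
// Invalid (over-capacity) chromosomes score 0.
public class KnapsackFitness {

    static int fitness(boolean[] chromosome, int[] weights, int[] values, int capacity) {
        int totalWeight = 0, totalValue = 0;
        for (int i = 0; i < chromosome.length; i++) {
            if (chromosome[i]) {
                totalWeight += weights[i];
                totalValue += values[i];
            }
        }
        return totalWeight <= capacity ? totalValue : 0;
    }

    public static void main(String[] args) {
        int[] weights = {5, 4, 3};
        int[] values = {10, 40, 30};
        int capacity = 7;
        System.out.println(fitness(new boolean[] {false, true, true}, weights, values, capacity)); // 70
        System.out.println(fitness(new boolean[] {true, true, false}, weights, values, capacity)); // 0 (weight 9 > 7)
    }
}
```

Scoring invalid chromosomes as 0 is the simplest penalty; repairing them by dropping objects until they fit is a common alternative.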
  • 9. Once the genetic representation and the fitness function are defined, a GA initializes a population of solutions randomly and then improves it through repeated application of the mutation, crossover and selection operators.
  • 10. Abstract
    Job scheduling is the key feature of any computing environment, and the efficiency of computing depends largely on the scheduling technique used. Intelligence is the key factor lacking in today's job scheduling techniques. Genetic algorithms are powerful search techniques based on the mechanisms of natural selection and natural genetics. The scheduler handles multiple jobs, and the resources the jobs need are at remote locations. We assume that the resource a job needs resides in a single location and is not split over nodes, and that each node holding a resource runs a fixed number of jobs. The existing algorithms are non-predictive and employ greedy algorithms or variants of them. The efficiency of the job scheduling process would increase if previous experience and genetic algorithms were used. In this paper, we propose a model of the scheduling algorithm in which the scheduler learns from previous experience, so that job scheduling becomes more effective as time progresses.
  • 11. Description of Problem
    Similar systems that are already available are non-predictive and employ greedy algorithms or variants of them; that is, they do not predict the situation in advance. As a result, jobs in the network cannot be scheduled so that resources are utilized at the optimal level. The problem is to reduce the processing overhead during scheduling. The proposed system works on data transfer between the PCs of two different networks.
    Existing Method
    Data mining algorithms can be categorized as follows:
      o Association algorithms
      o Classification algorithms
      o Clustering algorithms
    Classification:
  • 12. The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to the specific variable(s) you are trying to predict. For example, a typical classification problem is to divide a database of companies into groups that are as homogeneous as possible with respect to a creditworthiness variable with the values "Good" and "Bad."
    Clustering: The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to all available variables.
    Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:
      o Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly and quickly from the data.
  • 13. A typical example of a predictive problem is targeted marketing: data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.
      o Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.
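Finding products that are often purchased together starts from counting how often each pair of items co-occurs in a transaction (the pair's support), which is the first step of association rule mining. This is a minimal sketch with hypothetical transactions and a hypothetical minimum-support threshold; association miners such as Apriori extend the same counting to larger itemsets and prune candidates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical market-basket sketch: support count for every pair of products
// that appear in the same transaction.
public class PairCounts {

    static Map<String, Integer> countPairs(List<List<String>> transactions) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> basket : transactions) {
            // Deduplicate and sort so each unordered pair has one canonical key.
            List<String> items = new ArrayList<>(new TreeSet<>(basket));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + "+" + items.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> sales = Arrays.asList(
            Arrays.asList("bread", "milk", "diapers"),
            Arrays.asList("beer", "diapers"),
            Arrays.asList("milk", "diapers", "beer"),
            Arrays.asList("bread", "milk"));
        int minSupport = 2; // hypothetical threshold
        for (Map.Entry<String, Integer> e : countPairs(sales).entrySet()) {
            if (e.getValue() >= minSupport) {
                System.out.println(e.getKey() + " " + e.getValue());
            }
        }
        // prints: beer+diapers 2, bread+milk 2, diapers+milk 2 (one per line)
    }
}
```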
  • 14. Proposed System
    Job scheduling is the key feature of any computing environment, and the efficiency of computing depends largely on the scheduling technique used. In the proposed system, a genetic algorithm is used across the network to schedule jobs according to the predicted load. The system handles the scheduling of data packets between the source and destination computers:
      o Job scheduling to route the packets at all ports of the router
      o Maintaining a queue of data packets on which the scheduling algorithm operates
      o First Come First Serve (FCFS) scheduling and genetic algorithm scheduling applied to transfers between the source and destination
      o A comparison of the two algorithms
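The FCFS-versus-GA comparison listed above can be sketched for a single router port. The original does not specify the chromosome or the objective, so this sketch assumes: all packets are already queued, the port sends one packet at a time, a chromosome is a permutation giving the send order, and fitness is the average waiting time. Order crossover and swap mutation keep chromosomes valid permutations, and the FCFS order is seeded into the initial population so that, with elitism, the GA result is never worse than FCFS. All names and numbers are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical FCFS vs GA comparison for ordering the packets queued at one router port.
public class PacketOrderGa {

    // Fitness to minimize: average time a packet waits before it starts sending.
    static double averageWait(int[] order, int[] sendTime) {
        long clock = 0, totalWait = 0;
        for (int p : order) {
            totalWait += clock;
            clock += sendTime[p];
        }
        return (double) totalWait / order.length;
    }

    // Binary tournament selection: the lower average wait wins.
    static int[] tournament(int[][] pop, int[] sendTime, Random rnd) {
        int[] a = pop[rnd.nextInt(pop.length)];
        int[] b = pop[rnd.nextInt(pop.length)];
        return averageWait(a, sendTime) <= averageWait(b, sendTime) ? a : b;
    }

    // Order crossover (OX): copy a slice of parent a, fill the rest in parent b's order.
    static int[] orderCrossover(int[] a, int[] b, Random rnd) {
        int n = a.length;
        int lo = rnd.nextInt(n);
        int hi = lo + rnd.nextInt(n - lo);
        int[] child = new int[n];
        boolean[] used = new boolean[n];
        for (int i = lo; i <= hi; i++) {
            child[i] = a[i];
            used[a[i]] = true;
        }
        int pos = (hi + 1) % n;
        for (int k = 0; k < n; k++) {
            int gene = b[(hi + 1 + k) % n];
            if (!used[gene]) {
                child[pos] = gene;
                used[gene] = true;
                pos = (pos + 1) % n;
            }
        }
        return child;
    }

    // Swap mutation keeps the chromosome a valid permutation.
    static void swapMutate(int[] c, Random rnd) {
        int i = rnd.nextInt(c.length), j = rnd.nextInt(c.length);
        int t = c[i]; c[i] = c[j]; c[j] = t;
    }

    static int[] gaOrder(int[] sendTime, int popSize, int generations, long seed) {
        Random rnd = new Random(seed);
        int n = sendTime.length;
        int[][] pop = new int[popSize][n];
        for (int k = 0; k < popSize; k++) {
            for (int i = 0; i < n; i++) pop[k][i] = i; // individual 0 keeps the FCFS order
            for (int i = n - 1; k > 0 && i > 0; i--) { // Fisher-Yates shuffle for the others
                int j = rnd.nextInt(i + 1);
                int t = pop[k][i]; pop[k][i] = pop[k][j]; pop[k][j] = t;
            }
        }
        int[] best = pop[0];
        for (int g = 0; ; g++) {
            for (int[] c : pop) if (averageWait(c, sendTime) < averageWait(best, sendTime)) best = c;
            if (g == generations) break;
            int[][] next = new int[popSize][];
            next[0] = best.clone(); // elitism
            for (int k = 1; k < popSize; k++) {
                next[k] = orderCrossover(tournament(pop, sendTime, rnd), tournament(pop, sendTime, rnd), rnd);
                if (rnd.nextDouble() < 0.2) swapMutate(next[k], rnd); // assumed mutation rate
            }
            pop = next;
        }
        return best;
    }

    public static void main(String[] args) {
        int[] sendTime = {8, 3, 6, 1, 4, 2}; // hypothetical transmission times, in arrival order
        int[] fcfs = {0, 1, 2, 3, 4, 5};
        int[] ga = gaOrder(sendTime, 30, 100, 42L);
        System.out.println("FCFS avg wait: " + averageWait(fcfs, sendTime));
        System.out.println("GA   avg wait: " + averageWait(ga, sendTime) + " order " + Arrays.toString(ga));
    }
}
```

For this particular objective, sending the shortest packets first is already optimal, so the GA should approach that order. A GA earns its cost only when the objective (for example, predicted load or connection limits) has no such simple rule.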
  • 15. Hardware specifications:
      Processor     : Intel Processor IV
      RAM           : 128 MB
      Hard disk     : 20 GB
      CD drive      : 40x Samsung
      Floppy drive  : 1.44 MB
      Monitor       : 15" Samtron color
      Keyboard      : 108-key Mercury keyboard
      Mouse         : Logitech mouse
    Software specifications:
      Operating system : Windows XP/2000
      Language used    : J2SDK 1.4.0, JCreator
  • 16. Module Design
    Simulated Model: The simulated network is constructed by grouping computers into Network 0 and Network 1. A router is placed between the two networks, and data from one network flows to the other through it.
    First Come First Serve Algorithm: Packet transfer between the networks is implemented using the FCFS algorithm.
    Genetic Algorithm: Packet transfer between the networks is implemented using the genetic algorithm. The algorithm details are discussed in the proposed system design.
    Projecting Result and Comparison: The data transfer between the source and destination networks is shown by drawing the path between source and destination; to draw the path, the points across the network are also collected. The results of the two algorithms are displayed to the user in a separate frame to show the efficiency of the genetic algorithm.
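The simulated model above (Network 0 and Network 1 joined by a router, with the path points collected for drawing) can be sketched as follows. This is a minimal sketch under stated assumptions: hosts are identified by hypothetical labels such as `N0-Host1`, the router forwards its queue in FCFS order, and a path is simply the list of points source, router, destination that a GUI frame could connect. `TwoNetworkSim` and its methods are illustrative names, not the project's classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical model of the simulated topology: two networks joined by one router,
// which forwards queued packets first come, first served.
public class TwoNetworkSim {

    static final class Packet {
        final String source, destination;
        Packet(String source, String destination) {
            this.source = source;
            this.destination = destination;
        }
    }

    final Queue<Packet> routerQueue = new ArrayDeque<>();

    // Every inter-network packet enters the router's queue.
    void send(Packet p) { routerQueue.add(p); }

    // Forwards all queued packets in arrival order and returns each packet's path,
    // i.e. the points a drawing frame could connect: source -> router -> destination.
    List<List<String>> forwardAllFcfs() {
        List<List<String>> paths = new ArrayList<>();
        while (!routerQueue.isEmpty()) {
            Packet p = routerQueue.poll();
            paths.add(Arrays.asList(p.source, "Router", p.destination));
        }
        return paths;
    }

    public static void main(String[] args) {
        TwoNetworkSim sim = new TwoNetworkSim();
        sim.send(new Packet("N0-Host1", "N1-Host3"));
        sim.send(new Packet("N1-Host2", "N0-Host4"));
        for (List<String> path : sim.forwardAllFcfs()) {
            System.out.println(String.join(" -> ", path));
        }
        // prints "N0-Host1 -> Router -> N1-Host3", then "N1-Host2 -> Router -> N0-Host4"
    }
}
```

Swapping `forwardAllFcfs` for a method that first reorders the queue (for example with a GA-chosen send order) is where the two scheduling modules would differ.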