SlideShare a Scribd company logo
1 of 3
Download to read offline
International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064
Volume 2 Issue 9, September 2013
www.ijsr.net
Minimize Staleness and Stretch in Streaming Data
Warehouses
S. M. Subhani1
, M. Nagendramma2
1, 2
Department of CSE, BVSREC, Chimakurthy, A.P, India
Abstract: We study scheduling algorithms for loading data feeds into real time data warehouses, which are used in applications such
as IP network monitoring, online financial trading, and credit card fraud detection. In these applications, the warehouse collects a
large number of streaming data feeds that are generated by external sources and arrive asynchronously. We discuss update scheduling
in streaming data warehouses, which combine the features of traditional data warehouses and data stream systems. In our setting,
external sources push append-only data streams into the warehouse with a wide range of inter-arrival times. While traditional data
warehouses are typically refreshed during downtimes, streaming warehouses are updated as new data arrive. In this paper we
develop a theory of temporal consistency for stream warehouses that allows for multiple consistency levels. We model the
streaming warehouse update problem as a scheduling problem, where jobs correspond to processes that load new data into tables,
and whose objective is to minimize data staleness over time.
Keywords: Data warehouse maintenance, online scheduling
1. Introduction
The goal of a streaming warehouse is to propagate new
data across all the relevant tables and views as quickly as
possible. Once new data are loaded, the applications and
triggers defined on the warehouse can take immediate
action. This allows businesses to make decisions in nearly
real time, which may lead to increased profits, improved
customer satisfaction, and prevention of serious problems
that could develop if no action was taken.
Data warehouses integrate information from multiple
operational databases to enable complex business analyses.
In traditional applications, warehouses are updated
periodically and data analysis is done off-line [3]. In
contrast, real time warehouses [1], also known as active
warehouses [4], continually load incoming data feeds to
support time-critical analyses. For instance, an Internet
Service Provider (ISP) may collect streams of network
configuration and performance data generated by remote
sources in nearly real time. New data must be loaded in a
timely manner and correlated against historical data to
quickly identify network anomalies, denial-of-service
attacks, and inconsistencies among protocol layers.
Similarly, online stock trading applications may discover
profit opportunities by comparing recent transactions in
nearly real time against historical trends. Banks may be
interested in analyzing incoming streams of credit card
transactions to protect customers against identity theft.
Since the effectiveness of a real time warehouse depends on
its ability to ingest new data, we study problems related to
data staleness. In our setting, each table in the warehouse
collects data from an external source. The arrival of a set of
new data releases an update that seeks to append the data to
the corresponding table. Since existing data are not
modified, the processing time of an update is at most
proportional to the amount of new data.
Our first objective is to nonpreemptively1 schedule the
updates on one or more processors in a way that minimizes
the total staleness of all tables. Our first contribution
answers a question implicit in [2] regarding the difficulty of
this problem. We prove that even in the purely online
model, any on-line non preemptive algorithm achieves
staleness at most a constant factor times optimal, provided
that no processor is ever voluntarily idle and provided that
the processors are sufficiently fast.
2. System Model
2.1Warehousing Architecture
Figure 1 illustrates a streaming data warehouse. Each data
stream is generated by an external source, with a batch of
new data, consisting of one or more records being pushed to
the warehouse with period pi. If the period of a stream is
unknown or unpredictable, we let the user choose a period
with which the warehouse should check for new data.
Examples of streams collected by an Internet Service
provider include router performance statistics such as CPU
usage, system logs, routing table updates, link layer alerts
etc.An important property of the data streams in our
motivating applications is that they are append-only,i.e.,
existing records are never modified or deleted. For example,
a stream of average router CPU utilization measurement
may consist of records with fields (timestamp, router name,
CPU utilization) and a new data file with updated CPU
measurement for each router may arrive at the warehouse
every five minutes. [5]
Figure 1: Stream data warehouse
A streaming data warehouse maintains two types of tables:
base and derived. Each table may be stored partially or
Paper ID: 21091302 375
International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064
Volume 2 Issue 9, September 2013
www.ijsr.net
wholly on disk. A base table is loaded directly from a data
stream. A derived table is a materialized view defined over
one or more tables. Each base or derived table Tj has a user
–defined priority pj and a time-dependent staleness function
sj(τ) that will be defined shortly. Relationships among
source and derived tables form a (directed and acyclic)
dependency graph. For each table Tj, we define a set of its
ancestor tables as those which directly or indirectly serve as
its sources, and a set of its dependent tables as those which
are directly or indirectly sourced from Tj. For example, T1,
T2 and T3 are ancestors of T4, and T3 and T4 are dependents
of T1.In practice, warehouse tables are horizontally
partitioned by time so that only a small number of recent
partitions are affected by updates [6][7].
2.2 Earliest Deadline First (EDF)
EDF has been proven to be an optimal uniprocessor
scheduling algorithms .This means that if a set of tasks is
unschedulable under EDF, then no other scheduling
algorithm can feasible schedule this task set. The EDF
algorithm chooses for execution at each instant in the time
currently active job(s) that have the nearest deadlines. The
EDF implementation upon uniform parallel machines is
according to the following rules, No Processor is idled while
there are active jobs waiting for execution, when fewer then
m jobs are active, they are required to execute on the fastest
processor while the slowest are idled, and higher priority
jobs are executed on faster processors. In Earliest Deadline
First scheduling, at every scheduling point the task having
the shortest deadline is taken up for scheduling. A task is
schedule under EDF, if and only if it satisfies the condition
that total processor utilization (Ui) due to the task set is less
than 1.
The Aim of this work is to provide a sensitivity analysis for
task deadline context of multiprocessor system by using a
new approach of EFDF (Earliest Feasible Deadline First)
algorithm. In order to decrease the number of migrations we
prevent a job from moving one processor to another
processor if it is among them higher priority jobs. Therefore,
a job will continue its execution on the same processor if
possible (processor affinity). The result of these comparisons
outlines some situations where one scheme is preferable over
the other. Partitioning schemes are better suited for hard real-
time systems, while a global scheme is preferable for soft
real-time systems.
The final EDF – partitioned scheduling algorithm is
following
1.Sort the released jobs by the local algorithm
2.For each job ji in sorted order
a) If ji’s home track is available, schedule ji on its home
track
b) Else, if there is an available free track, schedule ji on the
free track
c) Else, scan through the tracks r such that ji can be
promoted to track r
i) If track r is free and there is no released job
remaining in the sorted list for home track r,
ii) Schedule ji on track r
d) Else, delay the execution of ji
3. Minimizing Staleness
We call an algorithm eager, or work-conserving, if it leaves
no processor idle while at least one pending update exists.
We first state the rather-inscrutable Theorem 3.1, followed
by an easy-to-read corollary, which implies that for any C <
(v3 - 1)/2, there is a constant (dependent on C) such that the
staleness of any eager algorithm is at most that constant
factor times optimal, provided that each ai is at most Cp/t.
Theorem 3.1. Fix p, t. For any and d such that 0 <, d < 1,
define C, d = vd (1 - )/v3 > 0. Given p processors and t
tables, pick any a such that a/ [1 - a/ (p/t)] = C, d p/t. Then
the penalty incurred by an eager algorithm is at most (1 + a)
2(1/ 4) (1/ (1 - d)) times LOW, provided that each ai = a.
Since LOW is a lower bound on the staleness achieved by
any algorithm, even the optimal, prescient one, and penalty
is an upper bound on the staleness achieved by any eager
algorithm, the corollary implies the claimed competitiveness
Proof: B be the set of batches in this run. For some batch Bi
B, let ci be the length of the first update, di be the wait time,
and bi be the total length of the batch, i.e., the sum of the
lengths of its updates. Clearly,
ci = bi = ci + di, (1)
since ci = bi is obvious and since ci + di is the duration in
time from the start (not release) time of the first job in the
batch till the update for the batch starts, and this duration is
clearly at least the length bi of the batch. For the penalty of
this batch, denoted by i, we take the square of the own time,
i.e., the length ci of the first update plus the wait time di
plus the processing time of the entire batch:
i= [(ci+di) +abi] 2, by the definition of penalty (2)
= (1+a) 2(ci+di) 2, by (1). (3)
Figure 2 illustrates the quantities bi, ci and di using the same
example as that in Figure 1; in particular, we consider the
batch consisting of updates arriving at times ri, 1 and ri, 2.
Let A be the set of all updates. From the definition of
LOW, each update i A has a budget of a2 units, where ai
is the length of update i. Our proof requires the use of a
charging scheme. A charging scheme specifies what
fraction of its budget each update pays to a certain batch.
Let us call a batch Bi tardy if ci < (ci + di) (where comes
from Theorem 3.1); otherwise it is punctual. Let us denote
the corresponding sets by Bt and Bp respectively. More
formally, a charging scheme is a matrix (vij) of nonnegative
values, where vij shows the extent of dependence of batch i
on the budget available to batch j, with the following two
properties.
Paper ID: 21091302 376
International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064
Volume 2 Issue 9, September 2013
www.ijsr.net
Figure 2: A plot of the staleness of table i over time
4. Conclusion
In this paper, we studied the complexity of scheduling data-
loading jobs to minimize the staleness of a real time stream
warehouse. We proved that any on-line non-preemptive
algorithm that is never voluntarily idle achieves a constant
competitive ratio with respect to the total staleness of all
tables in the warehouse, provided that the processors are
sufficiently fast.
We solved the problem of scheduling updates in a real-time
streaming warehouse. We projected the notion of averages
staleness as a scheduling metric and presented scheduling
algorithms designed to handle complex environment of a
streaming data warehouse. We then proposed a scheduling
framework that assigns jobs to processing tracks and also
uses the basic algorithms to schedule jobs within a same.
The main feature of our framework is the ability to reserve
resources for short jobs that dften correspond to important
frequently refreshed tables while avoiding the inefficiencies
associated with partitioned scheduling techniques. Feature
work is needed for choosing the right scheduling
granularity when it is more efficient to update multiple
tables together.
References
[1] L. Golab, T. Johnson, J. S. Seidel and V. Shkapenyuk,
Stream Warehousing with Data Depot, SIGMOD 2009,
847-854.
[2] L. Golab, T. Johnson, and V. Shkapenyuk, Scheduling
Updates in a Real Time Stream Warehouse, ICDE 2009,
1207-1210.
[3] W. Labio, R. Yerneni, and H. Garcia-Molina, Shrinking
the Warehouse Update Window, SIGMOD1999, 383-
394.
[4] N. Polyzotis, S. Skiadopoulos, P. Vassiliadis, A. Simits
is, and N.-E. Frantzell, Supporting Streaming Updates in
an Active Data Warehouse, ICDE 2007, 476-485.
[5] Scalable Scheduling of Updates in Streaming Data
Warehouses Lukasz Golab, Theodore Johnson and
Vladislav Shkapenyuk AT&T Labs – Research, Florham
Park, NJ, 0793.
[6] N. Folkert, A. Gupta, A. Witkowski, S. Subramanian, S.
Bel lamkonda, S. Shankar, T. Bozkaya, and L. Sheng,
optimizing refresh of a set of materialized views, VLD
B 2005, 1043- 1054.
[7] L. Golab, T. Johnson, J. S. Seidel, and V. Shkapenyuk
Stream warehousing with Data Depot, SIGMOD 2009,
847-854.
Paper ID: 21091302 377

More Related Content

What's hot

Distributed System Management
Distributed System ManagementDistributed System Management
Distributed System ManagementIbrahim Amer
 
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemAn Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemIJORCS
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Balman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet BalmanBalman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet Balmanbalmanme
 
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...Editor IJCATR
 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterIRJET Journal
 
Comparative Analysis of Various Grid Based Scheduling Algorithms
Comparative Analysis of Various Grid Based Scheduling AlgorithmsComparative Analysis of Various Grid Based Scheduling Algorithms
Comparative Analysis of Various Grid Based Scheduling Algorithmsiosrjce
 
Continental division of load and balanced ant
Continental division of load and balanced antContinental division of load and balanced ant
Continental division of load and balanced antIJCI JOURNAL
 
Job Resource Ratio Based Priority Driven Scheduling in Cloud Computing
Job Resource Ratio Based Priority Driven Scheduling in Cloud ComputingJob Resource Ratio Based Priority Driven Scheduling in Cloud Computing
Job Resource Ratio Based Priority Driven Scheduling in Cloud Computingijsrd.com
 
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...IRJET Journal
 
Chapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesChapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesHeman Pathak
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Zbigniew Jerzak
 
A survey of various scheduling algorithm in cloud computing environment
A survey of various scheduling algorithm in cloud computing environmentA survey of various scheduling algorithm in cloud computing environment
A survey of various scheduling algorithm in cloud computing environmenteSAT Publishing House
 
A study of load distribution algorithms in distributed scheduling
A study of load distribution algorithms in distributed schedulingA study of load distribution algorithms in distributed scheduling
A study of load distribution algorithms in distributed schedulingeSAT Publishing House
 
Genetic Algorithm for Process Scheduling
Genetic Algorithm for Process SchedulingGenetic Algorithm for Process Scheduling
Genetic Algorithm for Process SchedulingLogin Technoligies
 

What's hot (19)

Distributed System Management
Distributed System ManagementDistributed System Management
Distributed System Management
 
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemAn Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Balman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet BalmanBalman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet Balman
 
E01113138
E01113138E01113138
E01113138
 
Ch13
Ch13Ch13
Ch13
 
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...
 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop Cluster
 
Comparative Analysis of Various Grid Based Scheduling Algorithms
Comparative Analysis of Various Grid Based Scheduling AlgorithmsComparative Analysis of Various Grid Based Scheduling Algorithms
Comparative Analysis of Various Grid Based Scheduling Algorithms
 
Continental division of load and balanced ant
Continental division of load and balanced antContinental division of load and balanced ant
Continental division of load and balanced ant
 
Job Resource Ratio Based Priority Driven Scheduling in Cloud Computing
Job Resource Ratio Based Priority Driven Scheduling in Cloud ComputingJob Resource Ratio Based Priority Driven Scheduling in Cloud Computing
Job Resource Ratio Based Priority Driven Scheduling in Cloud Computing
 
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
 
Chapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesChapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming Languages
 
RTS
RTSRTS
RTS
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...
 
J0210053057
J0210053057J0210053057
J0210053057
 
A survey of various scheduling algorithm in cloud computing environment
A survey of various scheduling algorithm in cloud computing environmentA survey of various scheduling algorithm in cloud computing environment
A survey of various scheduling algorithm in cloud computing environment
 
A study of load distribution algorithms in distributed scheduling
A study of load distribution algorithms in distributed schedulingA study of load distribution algorithms in distributed scheduling
A study of load distribution algorithms in distributed scheduling
 
Genetic Algorithm for Process Scheduling
Genetic Algorithm for Process SchedulingGenetic Algorithm for Process Scheduling
Genetic Algorithm for Process Scheduling
 

Viewers also liked (7)

Lake Water Environment Capacity Analysis Based on Steady-State Model
Lake Water Environment Capacity Analysis Based on Steady-State ModelLake Water Environment Capacity Analysis Based on Steady-State Model
Lake Water Environment Capacity Analysis Based on Steady-State Model
 
Impacts of Agricultural Activities on Water Quality in the Dufuya Dambos, Low...
Impacts of Agricultural Activities on Water Quality in the Dufuya Dambos, Low...Impacts of Agricultural Activities on Water Quality in the Dufuya Dambos, Low...
Impacts of Agricultural Activities on Water Quality in the Dufuya Dambos, Low...
 
Design of Remote Video Monitoring and Motion Detection System based on Arm-Li...
Design of Remote Video Monitoring and Motion Detection System based on Arm-Li...Design of Remote Video Monitoring and Motion Detection System based on Arm-Li...
Design of Remote Video Monitoring and Motion Detection System based on Arm-Li...
 
Detection of Cysts in Ultrasonic Images of Ovary
Detection of Cysts in Ultrasonic Images of OvaryDetection of Cysts in Ultrasonic Images of Ovary
Detection of Cysts in Ultrasonic Images of Ovary
 
Providing Accident Detection in Vehicular Networks through OBD-II Devices and...
Providing Accident Detection in Vehicular Networks through OBD-II Devices and...Providing Accident Detection in Vehicular Networks through OBD-II Devices and...
Providing Accident Detection in Vehicular Networks through OBD-II Devices and...
 
Voice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from LaryngectomyVoice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from Laryngectomy
 
Radiochemical Properties of Irradiated PVA\AgNO3 Film by Electron Beam
Radiochemical Properties of Irradiated PVA\AgNO3 Film by Electron BeamRadiochemical Properties of Irradiated PVA\AgNO3 Film by Electron Beam
Radiochemical Properties of Irradiated PVA\AgNO3 Film by Electron Beam
 

Similar to Minimize Staleness and Stretch in Streaming Data Warehouses

Survey of Real Time Scheduling Algorithms
Survey of Real Time Scheduling AlgorithmsSurvey of Real Time Scheduling Algorithms
Survey of Real Time Scheduling AlgorithmsIOSR Journals
 
A Deterministic Model Of Time For Distributed Systems
A Deterministic Model Of Time For Distributed SystemsA Deterministic Model Of Time For Distributed Systems
A Deterministic Model Of Time For Distributed SystemsJim Webb
 
Real time os(suga)
Real time os(suga) Real time os(suga)
Real time os(suga) Nagarajan
 
Operating system Q/A
Operating system Q/AOperating system Q/A
Operating system Q/AAbdul Munam
 
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...ijceronline
 
capacityshifting1
capacityshifting1capacityshifting1
capacityshifting1Gokul Vasan
 
An Autonomous Self-Assessment Application to Track the Efficiency of a System...
An Autonomous Self-Assessment Application to Track the Efficiency of a System...An Autonomous Self-Assessment Application to Track the Efficiency of a System...
An Autonomous Self-Assessment Application to Track the Efficiency of a System...IOSR Journals
 
SA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdfSA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdfManjuAppukuttan2
 
SCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICE
SCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICESCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICE
SCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICEijait
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesFinalyear Projects
 
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...Finalyear Projects
 
Scheduling and Allocation Algorithm for an Elliptic Filter
Scheduling and Allocation Algorithm for an Elliptic FilterScheduling and Allocation Algorithm for an Elliptic Filter
Scheduling and Allocation Algorithm for an Elliptic Filterijait
 
IRJET - Efficient Load Balancing in a Distributed Environment
IRJET -  	  Efficient Load Balancing in a Distributed EnvironmentIRJET -  	  Efficient Load Balancing in a Distributed Environment
IRJET - Efficient Load Balancing in a Distributed EnvironmentIRJET Journal
 
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems iosrjce
 

Similar to Minimize Staleness and Stretch in Streaming Data Warehouses (20)

Survey of Real Time Scheduling Algorithms
Survey of Real Time Scheduling AlgorithmsSurvey of Real Time Scheduling Algorithms
Survey of Real Time Scheduling Algorithms
 
Updating and Scheduling of Streaming Web Services in Data Warehouses
Updating and Scheduling of Streaming Web Services in Data WarehousesUpdating and Scheduling of Streaming Web Services in Data Warehouses
Updating and Scheduling of Streaming Web Services in Data Warehouses
 
A Deterministic Model Of Time For Distributed Systems
A Deterministic Model Of Time For Distributed SystemsA Deterministic Model Of Time For Distributed Systems
A Deterministic Model Of Time For Distributed Systems
 
Ijariie1161
Ijariie1161Ijariie1161
Ijariie1161
 
Real time os(suga)
Real time os(suga) Real time os(suga)
Real time os(suga)
 
Operating system Q/A
Operating system Q/AOperating system Q/A
Operating system Q/A
 
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
 
capacityshifting1
capacityshifting1capacityshifting1
capacityshifting1
 
An Autonomous Self-Assessment Application to Track the Efficiency of a System...
An Autonomous Self-Assessment Application to Track the Efficiency of a System...An Autonomous Self-Assessment Application to Track the Efficiency of a System...
An Autonomous Self-Assessment Application to Track the Efficiency of a System...
 
G017113537
G017113537G017113537
G017113537
 
SA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdfSA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdf
 
SCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICE
SCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICESCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICE
SCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICE
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehouses
 
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
 
Scheduling and Allocation Algorithm for an Elliptic Filter
Scheduling and Allocation Algorithm for an Elliptic FilterScheduling and Allocation Algorithm for an Elliptic Filter
Scheduling and Allocation Algorithm for an Elliptic Filter
 
Os
OsOs
Os
 
IRJET - Efficient Load Balancing in a Distributed Environment
IRJET -  	  Efficient Load Balancing in a Distributed EnvironmentIRJET -  	  Efficient Load Balancing in a Distributed Environment
IRJET - Efficient Load Balancing in a Distributed Environment
 
K017617786
K017617786K017617786
K017617786
 
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
 
Ijetr012051
Ijetr012051Ijetr012051
Ijetr012051
 

More from International Journal of Science and Research (IJSR)

More from International Journal of Science and Research (IJSR) (20)

Innovations in the Diagnosis and Treatment of Chronic Heart Failure
Innovations in the Diagnosis and Treatment of Chronic Heart FailureInnovations in the Diagnosis and Treatment of Chronic Heart Failure
Innovations in the Diagnosis and Treatment of Chronic Heart Failure
 
Design and implementation of carrier based sinusoidal pwm (bipolar) inverter
Design and implementation of carrier based sinusoidal pwm (bipolar) inverterDesign and implementation of carrier based sinusoidal pwm (bipolar) inverter
Design and implementation of carrier based sinusoidal pwm (bipolar) inverter
 
Polarization effect of antireflection coating for soi material system
Polarization effect of antireflection coating for soi material systemPolarization effect of antireflection coating for soi material system
Polarization effect of antireflection coating for soi material system
 
Image resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fittingImage resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fitting
 
Ad hoc networks technical issues on radio links security &amp; qo s
Ad hoc networks technical issues on radio links security &amp; qo sAd hoc networks technical issues on radio links security &amp; qo s
Ad hoc networks technical issues on radio links security &amp; qo s
 
Microstructure analysis of the carbon nano tubes aluminum composite with diff...
Microstructure analysis of the carbon nano tubes aluminum composite with diff...Microstructure analysis of the carbon nano tubes aluminum composite with diff...
Microstructure analysis of the carbon nano tubes aluminum composite with diff...
 
Improving the life of lm13 using stainless spray ii coating for engine applic...
Improving the life of lm13 using stainless spray ii coating for engine applic...Improving the life of lm13 using stainless spray ii coating for engine applic...
Improving the life of lm13 using stainless spray ii coating for engine applic...
 
An overview on development of aluminium metal matrix composites with hybrid r...
An overview on development of aluminium metal matrix composites with hybrid r...An overview on development of aluminium metal matrix composites with hybrid r...
An overview on development of aluminium metal matrix composites with hybrid r...
 
Pesticide mineralization in water using silver nanoparticles incorporated on ...
Pesticide mineralization in water using silver nanoparticles incorporated on ...Pesticide mineralization in water using silver nanoparticles incorporated on ...
Pesticide mineralization in water using silver nanoparticles incorporated on ...
 
Comparative study on computers operated by eyes and brain
Comparative study on computers operated by eyes and brainComparative study on computers operated by eyes and brain
Comparative study on computers operated by eyes and brain
 
T s eliot and the concept of literary tradition and the importance of allusions
T s eliot and the concept of literary tradition and the importance of allusionsT s eliot and the concept of literary tradition and the importance of allusions
T s eliot and the concept of literary tradition and the importance of allusions
 
Effect of select yogasanas and pranayama practices on selected physiological ...
Effect of select yogasanas and pranayama practices on selected physiological ...Effect of select yogasanas and pranayama practices on selected physiological ...
Effect of select yogasanas and pranayama practices on selected physiological ...
 
Grid computing for load balancing strategies
Grid computing for load balancing strategiesGrid computing for load balancing strategies
Grid computing for load balancing strategies
 
A new algorithm to improve the sharing of bandwidth
A new algorithm to improve the sharing of bandwidthA new algorithm to improve the sharing of bandwidth
A new algorithm to improve the sharing of bandwidth
 
Main physical causes of climate change and global warming a general overview
Main physical causes of climate change and global warming   a general overviewMain physical causes of climate change and global warming   a general overview
Main physical causes of climate change and global warming a general overview
 
Performance assessment of control loops
Performance assessment of control loopsPerformance assessment of control loops
Performance assessment of control loops
 
Capital market in bangladesh an overview
Capital market in bangladesh an overviewCapital market in bangladesh an overview
Capital market in bangladesh an overview
 
Faster and resourceful multi core web crawling
Faster and resourceful multi core web crawlingFaster and resourceful multi core web crawling
Faster and resourceful multi core web crawling
 
Extended fuzzy c means clustering algorithm in segmentation of noisy images
Extended fuzzy c means clustering algorithm in segmentation of noisy imagesExtended fuzzy c means clustering algorithm in segmentation of noisy images
Extended fuzzy c means clustering algorithm in segmentation of noisy images
 
Parallel generators of pseudo random numbers with control of calculation errors
Parallel generators of pseudo random numbers with control of calculation errorsParallel generators of pseudo random numbers with control of calculation errors
Parallel generators of pseudo random numbers with control of calculation errors
 

Recently uploaded

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 

Recently uploaded (20)

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 

Minimize Staleness and Stretch in Streaming Data Warehouses

  • 1. International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064 Volume 2 Issue 9, September 2013 www.ijsr.net Minimize Staleness and Stretch in Streaming Data Warehouses S. M. Subhani1 , M. Nagendramma2 1, 2 Department of CSE, BVSREC, Chimakurthy, A.P, India Abstract: We study scheduling algorithms for loading data feeds into real time data warehouses, which are used in applications such as IP network monitoring, online financial trading, and credit card fraud detection. In these applications, the warehouse collects a large number of streaming data feeds that are generated by external sources and arrive asynchronously. We discuss update scheduling in streaming data warehouses, which combine the features of traditional data warehouses and data stream systems. In our setting, external sources push append-only data streams into the warehouse with a wide range of inter-arrival times. While traditional data warehouses are typically refreshed during downtimes, streaming warehouses are updated as new data arrive. In this paper we develop a theory of temporal consistency for stream warehouses that allows for multiple consistency levels. We model the streaming warehouse update problem as a scheduling problem, where jobs correspond to processes that load new data into tables, and whose objective is to minimize data staleness over time. Keywords: Data warehouse maintenance, online scheduling 1. Introduction The goal of a streaming warehouse is to propagate new data across all the relevant tables and views as quickly as possible. Once new data are loaded, the applications and triggers defined on the warehouse can take immediate action. This allows businesses to make decisions in nearly real time, which may lead to increased profits, improved customer satisfaction, and prevention of serious problems that could develop if no action was taken. Data warehouses integrate information from multiple operational databases to enable complex business analyses. In traditional applications, warehouses are updated periodically and data analysis is done off-line [3]. In contrast, real time warehouses [1], also known as active warehouses [4], continually load incoming data feeds to support time-critical analyses. For instance, an Internet Service Provider (ISP) may collect streams of network configuration and performance data generated by remote sources in nearly real time. New data must be loaded in a timely manner and correlated against historical data to quickly identify network anomalies, denial-of-service attacks, and inconsistencies among protocol layers. Similarly, online stock trading applications may discover profit opportunities by comparing recent transactions in nearly real time against historical trends. Banks may be interested in analyzing incoming streams of credit card transactions to protect customers against identity theft. Since the effectiveness of a real time warehouse depends on its ability to ingest new data, we study problems related to data staleness. In our setting, each table in the warehouse collects data from an external source. The arrival of a set of new data releases an update that seeks to append the data to the corresponding table. Since existing data are not modified, the processing time of an update is at most proportional to the amount of new data. Our first objective is to nonpreemptively1 schedule the updates on one or more processors in a way that minimizes the total staleness of all tables. Our first contribution answers a question implicit in [2] regarding the difficulty of this problem. We prove that even in the purely online model, any on-line non preemptive algorithm achieves staleness at most a constant factor times optimal, provided that no processor is ever voluntarily idle and provided that the processors are sufficiently fast. 2. System Model 2.1Warehousing Architecture Figure 1 illustrates a streaming data warehouse. Each data stream is generated by an external source, with a batch of new data, consisting of one or more records being pushed to the warehouse with period pi. If the period of a stream is unknown or unpredictable, we let the user choose a period with which the warehouse should check for new data. Examples of streams collected by an Internet Service provider include router performance statistics such as CPU usage, system logs, routing table updates, link layer alerts etc.An important property of the data streams in our motivating applications is that they are append-only,i.e., existing records are never modified or deleted. For example, a stream of average router CPU utilization measurement may consist of records with fields (timestamp, router name, CPU utilization) and a new data file with updated CPU measurement for each router may arrive at the warehouse every five minutes. [5] Figure 1: Stream data warehouse A streaming data warehouse maintains two types of tables: base and derived. Each table may be stored partially or Paper ID: 21091302 375
  • 2. International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064 Volume 2 Issue 9, September 2013 www.ijsr.net wholly on disk. A base table is loaded directly from a data stream. A derived table is a materialized view defined over one or more tables. Each base or derived table Tj has a user –defined priority pj and a time-dependent staleness function sj(τ) that will be defined shortly. Relationships among source and derived tables form a (directed and acyclic) dependency graph. For each table Tj, we define a set of its ancestor tables as those which directly or indirectly serve as its sources, and a set of its dependent tables as those which are directly or indirectly sourced from Tj. For example, T1, T2 and T3 are ancestors of T4, and T3 and T4 are dependents of T1.In practice, warehouse tables are horizontally partitioned by time so that only a small number of recent partitions are affected by updates [6][7]. 2.2 Earliest Deadline First (EDF) EDF has been proven to be an optimal uniprocessor scheduling algorithms .This means that if a set of tasks is unschedulable under EDF, then no other scheduling algorithm can feasible schedule this task set. The EDF algorithm chooses for execution at each instant in the time currently active job(s) that have the nearest deadlines. The EDF implementation upon uniform parallel machines is according to the following rules, No Processor is idled while there are active jobs waiting for execution, when fewer then m jobs are active, they are required to execute on the fastest processor while the slowest are idled, and higher priority jobs are executed on faster processors. In Earliest Deadline First scheduling, at every scheduling point the task having the shortest deadline is taken up for scheduling. A task is schedule under EDF, if and only if it satisfies the condition that total processor utilization (Ui) due to the task set is less than 1. The Aim of this work is to provide a sensitivity analysis for task deadline context of multiprocessor system by using a new approach of EFDF (Earliest Feasible Deadline First) algorithm. In order to decrease the number of migrations we prevent a job from moving one processor to another processor if it is among them higher priority jobs. Therefore, a job will continue its execution on the same processor if possible (processor affinity). The result of these comparisons outlines some situations where one scheme is preferable over the other. Partitioning schemes are better suited for hard real- time systems, while a global scheme is preferable for soft real-time systems. The final EDF – partitioned scheduling algorithm is following 1.Sort the released jobs by the local algorithm 2.For each job ji in sorted order a) If ji’s home track is available, schedule ji on its home track b) Else, if there is an available free track, schedule ji on the free track c) Else, scan through the tracks r such that ji can be promoted to track r i) If track r is free and there is no released job remaining in the sorted list for home track r, ii) Schedule ji on track r d) Else, delay the execution of ji 3. Minimizing Staleness We call an algorithm eager, or work-conserving, if it leaves no processor idle while at least one pending update exists. We first state the rather-inscrutable Theorem 3.1, followed by an easy-to-read corollary, which implies that for any C < (v3 - 1)/2, there is a constant (dependent on C) such that the staleness of any eager algorithm is at most that constant factor times optimal, provided that each ai is at most Cp/t. Theorem 3.1. Fix p, t. For any and d such that 0 <, d < 1, define C, d = vd (1 - )/v3 > 0. Given p processors and t tables, pick any a such that a/ [1 - a/ (p/t)] = C, d p/t. Then the penalty incurred by an eager algorithm is at most (1 + a) 2(1/ 4) (1/ (1 - d)) times LOW, provided that each ai = a. Since LOW is a lower bound on the staleness achieved by any algorithm, even the optimal, prescient one, and penalty is an upper bound on the staleness achieved by any eager algorithm, the corollary implies the claimed competitiveness Proof: B be the set of batches in this run. For some batch Bi B, let ci be the length of the first update, di be the wait time, and bi be the total length of the batch, i.e., the sum of the lengths of its updates. Clearly, ci = bi = ci + di, (1) since ci = bi is obvious and since ci + di is the duration in time from the start (not release) time of the first job in the batch till the update for the batch starts, and this duration is clearly at least the length bi of the batch. For the penalty of this batch, denoted by i, we take the square of the own time, i.e., the length ci of the first update plus the wait time di plus the processing time of the entire batch: i= [(ci+di) +abi] 2, by the definition of penalty (2) = (1+a) 2(ci+di) 2, by (1). (3) Figure 2 illustrates the quantities bi, ci and di using the same example as that in Figure 1; in particular, we consider the batch consisting of updates arriving at times ri, 1 and ri, 2. Let A be the set of all updates. From the definition of LOW, each update i A has a budget of a2 units, where ai is the length of update i. Our proof requires the use of a charging scheme. A charging scheme specifies what fraction of its budget each update pays to a certain batch. Let us call a batch Bi tardy if ci < (ci + di) (where comes from Theorem 3.1); otherwise it is punctual. Let us denote the corresponding sets by Bt and Bp respectively. More formally, a charging scheme is a matrix (vij) of nonnegative values, where vij shows the extent of dependence of batch i on the budget available to batch j, with the following two properties. Paper ID: 21091302 376
  • 3. International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064 Volume 2 Issue 9, September 2013 www.ijsr.net Figure 2: A plot of the staleness of table i over time 4. Conclusion In this paper, we studied the complexity of scheduling data- loading jobs to minimize the staleness of a real time stream warehouse. We proved that any on-line non-preemptive algorithm that is never voluntarily idle achieves a constant competitive ratio with respect to the total staleness of all tables in the warehouse, provided that the processors are sufficiently fast. We solved the problem of scheduling updates in a real-time streaming warehouse. We projected the notion of averages staleness as a scheduling metric and presented scheduling algorithms designed to handle complex environment of a streaming data warehouse. We then proposed a scheduling framework that assigns jobs to processing tracks and also uses the basic algorithms to schedule jobs within a same. The main feature of our framework is the ability to reserve resources for short jobs that dften correspond to important frequently refreshed tables while avoiding the inefficiencies associated with partitioned scheduling techniques. Feature work is needed for choosing the right scheduling granularity when it is more efficient to update multiple tables together. References [1] L. Golab, T. Johnson, J. S. Seidel and V. Shkapenyuk, Stream Warehousing with Data Depot, SIGMOD 2009, 847-854. [2] L. Golab, T. Johnson, and V. Shkapenyuk, Scheduling Updates in a Real Time Stream Warehouse, ICDE 2009, 1207-1210. [3] W. Labio, R. Yerneni, and H. Garcia-Molina, Shrinking the Warehouse Update Window, SIGMOD1999, 383- 394. [4] N. Polyzotis, S. Skiadopoulos, P. Vassiliadis, A. Simits is, and N.-E. Frantzell, Supporting Streaming Updates in an Active Data Warehouse, ICDE 2007, 476-485. [5] Scalable Scheduling of Updates in Streaming Data Warehouses Lukasz Golab, Theodore Johnson and Vladislav Shkapenyuk AT&T Labs – Research, Florham Park, NJ, 0793. [6] N. Folkert, A. Gupta, A. Witkowski, S. Subramanian, S. Bel lamkonda, S. Shankar, T. Bozkaya, and L. Sheng, optimizing refresh of a set of materialized views, VLD B 2005, 1043- 1054. [7] L. Golab, T. Johnson, J. S. Seidel, and V. Shkapenyuk Stream warehousing with Data Depot, SIGMOD 2009, 847-854. Paper ID: 21091302 377