SlideShare a Scribd company logo
International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064
Volume 2 Issue 9, September 2013
www.ijsr.net
Minimize Staleness and Stretch in Streaming Data
Warehouses
S. M. Subhani1
, M. Nagendramma2
1, 2
Department of CSE, BVSREC, Chimakurthy, A.P, India
Abstract: We study scheduling algorithms for loading data feeds into real time data warehouses, which are used in applications such
as IP network monitoring, online financial trading, and credit card fraud detection. In these applications, the warehouse collects a
large number of streaming data feeds that are generated by external sources and arrive asynchronously. We discuss update scheduling
in streaming data warehouses, which combine the features of traditional data warehouses and data stream systems. In our setting,
external sources push append-only data streams into the warehouse with a wide range of inter-arrival times. While traditional data
warehouses are typically refreshed during downtimes, streaming warehouses are updated as new data arrive. In this paper we
develop a theory of temporal consistency for stream warehouses that allows for multiple consistency levels. We model the
streaming warehouse update problem as a scheduling problem, where jobs correspond to processes that load new data into tables,
and whose objective is to minimize data staleness over time.
Keywords: Data warehouse maintenance, online scheduling
1. Introduction
The goal of a streaming warehouse is to propagate new
data across all the relevant tables and views as quickly as
possible. Once new data are loaded, the applications and
triggers defined on the warehouse can take immediate
action. This allows businesses to make decisions in nearly
real time, which may lead to increased profits, improved
customer satisfaction, and prevention of serious problems
that could develop if no action was taken.
Data warehouses integrate information from multiple
operational databases to enable complex business analyses.
In traditional applications, warehouses are updated
periodically and data analysis is done off-line [3]. In
contrast, real time warehouses [1], also known as active
warehouses [4], continually load incoming data feeds to
support time-critical analyses. For instance, an Internet
Service Provider (ISP) may collect streams of network
configuration and performance data generated by remote
sources in nearly real time. New data must be loaded in a
timely manner and correlated against historical data to
quickly identify network anomalies, denial-of-service
attacks, and inconsistencies among protocol layers.
Similarly, online stock trading applications may discover
profit opportunities by comparing recent transactions in
nearly real time against historical trends. Banks may be
interested in analyzing incoming streams of credit card
transactions to protect customers against identity theft.
Since the effectiveness of a real time warehouse depends on
its ability to ingest new data, we study problems related to
data staleness. In our setting, each table in the warehouse
collects data from an external source. The arrival of a set of
new data releases an update that seeks to append the data to
the corresponding table. Since existing data are not
modified, the processing time of an update is at most
proportional to the amount of new data.
Our first objective is to nonpreemptively1 schedule the
updates on one or more processors in a way that minimizes
the total staleness of all tables. Our first contribution
answers a question implicit in [2] regarding the difficulty of
this problem. We prove that even in the purely online
model, any on-line non preemptive algorithm achieves
staleness at most a constant factor times optimal, provided
that no processor is ever voluntarily idle and provided that
the processors are sufficiently fast.
2. System Model
2.1Warehousing Architecture
Figure 1 illustrates a streaming data warehouse. Each data
stream is generated by an external source, with a batch of
new data, consisting of one or more records being pushed to
the warehouse with period pi. If the period of a stream is
unknown or unpredictable, we let the user choose a period
with which the warehouse should check for new data.
Examples of streams collected by an Internet Service
provider include router performance statistics such as CPU
usage, system logs, routing table updates, link layer alerts
etc.An important property of the data streams in our
motivating applications is that they are append-only,i.e.,
existing records are never modified or deleted. For example,
a stream of average router CPU utilization measurement
may consist of records with fields (timestamp, router name,
CPU utilization) and a new data file with updated CPU
measurement for each router may arrive at the warehouse
every five minutes. [5]
Figure 1: Stream data warehouse
A streaming data warehouse maintains two types of tables:
base and derived. Each table may be stored partially or
Paper ID: 21091302 375
International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064
Volume 2 Issue 9, September 2013
www.ijsr.net
wholly on disk. A base table is loaded directly from a data
stream. A derived table is a materialized view defined over
one or more tables. Each base or derived table Tj has a user
–defined priority pj and a time-dependent staleness function
sj(τ) that will be defined shortly. Relationships among
source and derived tables form a (directed and acyclic)
dependency graph. For each table Tj, we define a set of its
ancestor tables as those which directly or indirectly serve as
its sources, and a set of its dependent tables as those which
are directly or indirectly sourced from Tj. For example, T1,
T2 and T3 are ancestors of T4, and T3 and T4 are dependents
of T1.In practice, warehouse tables are horizontally
partitioned by time so that only a small number of recent
partitions are affected by updates [6][7].
2.2 Earliest Deadline First (EDF)
EDF has been proven to be an optimal uniprocessor
scheduling algorithms .This means that if a set of tasks is
unschedulable under EDF, then no other scheduling
algorithm can feasible schedule this task set. The EDF
algorithm chooses for execution at each instant in the time
currently active job(s) that have the nearest deadlines. The
EDF implementation upon uniform parallel machines is
according to the following rules, No Processor is idled while
there are active jobs waiting for execution, when fewer then
m jobs are active, they are required to execute on the fastest
processor while the slowest are idled, and higher priority
jobs are executed on faster processors. In Earliest Deadline
First scheduling, at every scheduling point the task having
the shortest deadline is taken up for scheduling. A task is
schedule under EDF, if and only if it satisfies the condition
that total processor utilization (Ui) due to the task set is less
than 1.
The Aim of this work is to provide a sensitivity analysis for
task deadline context of multiprocessor system by using a
new approach of EFDF (Earliest Feasible Deadline First)
algorithm. In order to decrease the number of migrations we
prevent a job from moving one processor to another
processor if it is among them higher priority jobs. Therefore,
a job will continue its execution on the same processor if
possible (processor affinity). The result of these comparisons
outlines some situations where one scheme is preferable over
the other. Partitioning schemes are better suited for hard real-
time systems, while a global scheme is preferable for soft
real-time systems.
The final EDF – partitioned scheduling algorithm is
following
1.Sort the released jobs by the local algorithm
2.For each job ji in sorted order
a) If ji’s home track is available, schedule ji on its home
track
b) Else, if there is an available free track, schedule ji on the
free track
c) Else, scan through the tracks r such that ji can be
promoted to track r
i) If track r is free and there is no released job
remaining in the sorted list for home track r,
ii) Schedule ji on track r
d) Else, delay the execution of ji
3. Minimizing Staleness
We call an algorithm eager, or work-conserving, if it leaves
no processor idle while at least one pending update exists.
We first state the rather-inscrutable Theorem 3.1, followed
by an easy-to-read corollary, which implies that for any C <
(v3 - 1)/2, there is a constant (dependent on C) such that the
staleness of any eager algorithm is at most that constant
factor times optimal, provided that each ai is at most Cp/t.
Theorem 3.1. Fix p, t. For any and d such that 0 <, d < 1,
define C, d = vd (1 - )/v3 > 0. Given p processors and t
tables, pick any a such that a/ [1 - a/ (p/t)] = C, d p/t. Then
the penalty incurred by an eager algorithm is at most (1 + a)
2(1/ 4) (1/ (1 - d)) times LOW, provided that each ai = a.
Since LOW is a lower bound on the staleness achieved by
any algorithm, even the optimal, prescient one, and penalty
is an upper bound on the staleness achieved by any eager
algorithm, the corollary implies the claimed competitiveness
Proof: B be the set of batches in this run. For some batch Bi
B, let ci be the length of the first update, di be the wait time,
and bi be the total length of the batch, i.e., the sum of the
lengths of its updates. Clearly,
ci = bi = ci + di, (1)
since ci = bi is obvious and since ci + di is the duration in
time from the start (not release) time of the first job in the
batch till the update for the batch starts, and this duration is
clearly at least the length bi of the batch. For the penalty of
this batch, denoted by i, we take the square of the own time,
i.e., the length ci of the first update plus the wait time di
plus the processing time of the entire batch:
i= [(ci+di) +abi] 2, by the definition of penalty (2)
= (1+a) 2(ci+di) 2, by (1). (3)
Figure 2 illustrates the quantities bi, ci and di using the same
example as that in Figure 1; in particular, we consider the
batch consisting of updates arriving at times ri, 1 and ri, 2.
Let A be the set of all updates. From the definition of
LOW, each update i A has a budget of a2 units, where ai
is the length of update i. Our proof requires the use of a
charging scheme. A charging scheme specifies what
fraction of its budget each update pays to a certain batch.
Let us call a batch Bi tardy if ci < (ci + di) (where comes
from Theorem 3.1); otherwise it is punctual. Let us denote
the corresponding sets by Bt and Bp respectively. More
formally, a charging scheme is a matrix (vij) of nonnegative
values, where vij shows the extent of dependence of batch i
on the budget available to batch j, with the following two
properties.
Paper ID: 21091302 376
International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064
Volume 2 Issue 9, September 2013
www.ijsr.net
Figure 2: A plot of the staleness of table i over time
4. Conclusion
In this paper, we studied the complexity of scheduling data-
loading jobs to minimize the staleness of a real time stream
warehouse. We proved that any on-line non-preemptive
algorithm that is never voluntarily idle achieves a constant
competitive ratio with respect to the total staleness of all
tables in the warehouse, provided that the processors are
sufficiently fast.
We solved the problem of scheduling updates in a real-time
streaming warehouse. We projected the notion of averages
staleness as a scheduling metric and presented scheduling
algorithms designed to handle complex environment of a
streaming data warehouse. We then proposed a scheduling
framework that assigns jobs to processing tracks and also
uses the basic algorithms to schedule jobs within a same.
The main feature of our framework is the ability to reserve
resources for short jobs that dften correspond to important
frequently refreshed tables while avoiding the inefficiencies
associated with partitioned scheduling techniques. Feature
work is needed for choosing the right scheduling
granularity when it is more efficient to update multiple
tables together.
References
[1] L. Golab, T. Johnson, J. S. Seidel and V. Shkapenyuk,
Stream Warehousing with Data Depot, SIGMOD 2009,
847-854.
[2] L. Golab, T. Johnson, and V. Shkapenyuk, Scheduling
Updates in a Real Time Stream Warehouse, ICDE 2009,
1207-1210.
[3] W. Labio, R. Yerneni, and H. Garcia-Molina, Shrinking
the Warehouse Update Window, SIGMOD1999, 383-
394.
[4] N. Polyzotis, S. Skiadopoulos, P. Vassiliadis, A. Simits
is, and N.-E. Frantzell, Supporting Streaming Updates in
an Active Data Warehouse, ICDE 2007, 476-485.
[5] Scalable Scheduling of Updates in Streaming Data
Warehouses Lukasz Golab, Theodore Johnson and
Vladislav Shkapenyuk AT&T Labs – Research, Florham
Park, NJ, 0793.
[6] N. Folkert, A. Gupta, A. Witkowski, S. Subramanian, S.
Bel lamkonda, S. Shankar, T. Bozkaya, and L. Sheng,
optimizing refresh of a set of materialized views, VLD
B 2005, 1043- 1054.
[7] L. Golab, T. Johnson, J. S. Seidel, and V. Shkapenyuk
Stream warehousing with Data Depot, SIGMOD 2009,
847-854.
Paper ID: 21091302 377

More Related Content

What's hot

Distributed System Management
Distributed System ManagementDistributed System Management
Distributed System ManagementIbrahim Amer
 
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemAn Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemIJORCS
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Balman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet BalmanBalman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet Balmanbalmanme
 
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...Editor IJCATR
 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterIRJET Journal
 
Comparative Analysis of Various Grid Based Scheduling Algorithms
Comparative Analysis of Various Grid Based Scheduling AlgorithmsComparative Analysis of Various Grid Based Scheduling Algorithms
Comparative Analysis of Various Grid Based Scheduling Algorithmsiosrjce
 
Continental division of load and balanced ant
Continental division of load and balanced antContinental division of load and balanced ant
Continental division of load and balanced antIJCI JOURNAL
 
Job Resource Ratio Based Priority Driven Scheduling in Cloud Computing
Job Resource Ratio Based Priority Driven Scheduling in Cloud ComputingJob Resource Ratio Based Priority Driven Scheduling in Cloud Computing
Job Resource Ratio Based Priority Driven Scheduling in Cloud Computingijsrd.com
 
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...IRJET Journal
 
Chapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesChapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesHeman Pathak
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Zbigniew Jerzak
 
A survey of various scheduling algorithm in cloud computing environment
A survey of various scheduling algorithm in cloud computing environmentA survey of various scheduling algorithm in cloud computing environment
A survey of various scheduling algorithm in cloud computing environmenteSAT Publishing House
 
A study of load distribution algorithms in distributed scheduling
A study of load distribution algorithms in distributed schedulingA study of load distribution algorithms in distributed scheduling
A study of load distribution algorithms in distributed schedulingeSAT Publishing House
 
Genetic Algorithm for Process Scheduling
Genetic Algorithm for Process SchedulingGenetic Algorithm for Process Scheduling
Genetic Algorithm for Process SchedulingLogin Technoligies
 

What's hot (19)

Distributed System Management
Distributed System ManagementDistributed System Management
Distributed System Management
 
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemAn Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Balman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet BalmanBalman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet Balman
 
E01113138
E01113138E01113138
E01113138
 
Ch13
Ch13Ch13
Ch13
 
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...
 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop Cluster
 
Comparative Analysis of Various Grid Based Scheduling Algorithms
Comparative Analysis of Various Grid Based Scheduling AlgorithmsComparative Analysis of Various Grid Based Scheduling Algorithms
Comparative Analysis of Various Grid Based Scheduling Algorithms
 
Continental division of load and balanced ant
Continental division of load and balanced antContinental division of load and balanced ant
Continental division of load and balanced ant
 
Job Resource Ratio Based Priority Driven Scheduling in Cloud Computing
Job Resource Ratio Based Priority Driven Scheduling in Cloud ComputingJob Resource Ratio Based Priority Driven Scheduling in Cloud Computing
Job Resource Ratio Based Priority Driven Scheduling in Cloud Computing
 
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
A Heterogeneous Static Hierarchical Expected Completion Time Based Scheduling...
 
Chapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesChapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming Languages
 
RTS
RTSRTS
RTS
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...
 
J0210053057
J0210053057J0210053057
J0210053057
 
A survey of various scheduling algorithm in cloud computing environment
A survey of various scheduling algorithm in cloud computing environmentA survey of various scheduling algorithm in cloud computing environment
A survey of various scheduling algorithm in cloud computing environment
 
A study of load distribution algorithms in distributed scheduling
A study of load distribution algorithms in distributed schedulingA study of load distribution algorithms in distributed scheduling
A study of load distribution algorithms in distributed scheduling
 
Genetic Algorithm for Process Scheduling
Genetic Algorithm for Process SchedulingGenetic Algorithm for Process Scheduling
Genetic Algorithm for Process Scheduling
 

Viewers also liked (7)

Lake Water Environment Capacity Analysis Based on Steady-State Model
Lake Water Environment Capacity Analysis Based on Steady-State ModelLake Water Environment Capacity Analysis Based on Steady-State Model
Lake Water Environment Capacity Analysis Based on Steady-State Model
 
Impacts of Agricultural Activities on Water Quality in the Dufuya Dambos, Low...
Impacts of Agricultural Activities on Water Quality in the Dufuya Dambos, Low...Impacts of Agricultural Activities on Water Quality in the Dufuya Dambos, Low...
Impacts of Agricultural Activities on Water Quality in the Dufuya Dambos, Low...
 
Design of Remote Video Monitoring and Motion Detection System based on Arm-Li...
Design of Remote Video Monitoring and Motion Detection System based on Arm-Li...Design of Remote Video Monitoring and Motion Detection System based on Arm-Li...
Design of Remote Video Monitoring and Motion Detection System based on Arm-Li...
 
Detection of Cysts in Ultrasonic Images of Ovary
Detection of Cysts in Ultrasonic Images of OvaryDetection of Cysts in Ultrasonic Images of Ovary
Detection of Cysts in Ultrasonic Images of Ovary
 
Providing Accident Detection in Vehicular Networks through OBD-II Devices and...
Providing Accident Detection in Vehicular Networks through OBD-II Devices and...Providing Accident Detection in Vehicular Networks through OBD-II Devices and...
Providing Accident Detection in Vehicular Networks through OBD-II Devices and...
 
Voice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from LaryngectomyVoice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from Laryngectomy
 
Radiochemical Properties of Irradiated PVA\AgNO3 Film by Electron Beam
Radiochemical Properties of Irradiated PVA\AgNO3 Film by Electron BeamRadiochemical Properties of Irradiated PVA\AgNO3 Film by Electron Beam
Radiochemical Properties of Irradiated PVA\AgNO3 Film by Electron Beam
 

Similar to Minimize Staleness and Stretch in Streaming Data Warehouses

Survey of Real Time Scheduling Algorithms
Survey of Real Time Scheduling AlgorithmsSurvey of Real Time Scheduling Algorithms
Survey of Real Time Scheduling AlgorithmsIOSR Journals
 
A Deterministic Model Of Time For Distributed Systems
A Deterministic Model Of Time For Distributed SystemsA Deterministic Model Of Time For Distributed Systems
A Deterministic Model Of Time For Distributed SystemsJim Webb
 
Real time os(suga)
Real time os(suga) Real time os(suga)
Real time os(suga) Nagarajan
 
Operating system Q/A
Operating system Q/AOperating system Q/A
Operating system Q/AAbdul Munam
 
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...ijceronline
 
capacityshifting1
capacityshifting1capacityshifting1
capacityshifting1Gokul Vasan
 
An Autonomous Self-Assessment Application to Track the Efficiency of a System...
An Autonomous Self-Assessment Application to Track the Efficiency of a System...An Autonomous Self-Assessment Application to Track the Efficiency of a System...
An Autonomous Self-Assessment Application to Track the Efficiency of a System...IOSR Journals
 
SA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdfSA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdfManjuAppukuttan2
 
SCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICE
SCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICESCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICE
SCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICEijait
 
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...Finalyear Projects
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesFinalyear Projects
 
Scheduling and Allocation Algorithm for an Elliptic Filter
Scheduling and Allocation Algorithm for an Elliptic FilterScheduling and Allocation Algorithm for an Elliptic Filter
Scheduling and Allocation Algorithm for an Elliptic Filterijait
 
IRJET - Efficient Load Balancing in a Distributed Environment
IRJET -  	  Efficient Load Balancing in a Distributed EnvironmentIRJET -  	  Efficient Load Balancing in a Distributed Environment
IRJET - Efficient Load Balancing in a Distributed EnvironmentIRJET Journal
 
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems iosrjce
 

Similar to Minimize Staleness and Stretch in Streaming Data Warehouses (20)

Survey of Real Time Scheduling Algorithms
Survey of Real Time Scheduling AlgorithmsSurvey of Real Time Scheduling Algorithms
Survey of Real Time Scheduling Algorithms
 
Updating and Scheduling of Streaming Web Services in Data Warehouses
Updating and Scheduling of Streaming Web Services in Data WarehousesUpdating and Scheduling of Streaming Web Services in Data Warehouses
Updating and Scheduling of Streaming Web Services in Data Warehouses
 
A Deterministic Model Of Time For Distributed Systems
A Deterministic Model Of Time For Distributed SystemsA Deterministic Model Of Time For Distributed Systems
A Deterministic Model Of Time For Distributed Systems
 
Ijariie1161
Ijariie1161Ijariie1161
Ijariie1161
 
Real time os(suga)
Real time os(suga) Real time os(suga)
Real time os(suga)
 
Operating system Q/A
Operating system Q/AOperating system Q/A
Operating system Q/A
 
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
 
capacityshifting1
capacityshifting1capacityshifting1
capacityshifting1
 
An Autonomous Self-Assessment Application to Track the Efficiency of a System...
An Autonomous Self-Assessment Application to Track the Efficiency of a System...An Autonomous Self-Assessment Application to Track the Efficiency of a System...
An Autonomous Self-Assessment Application to Track the Efficiency of a System...
 
G017113537
G017113537G017113537
G017113537
 
SA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdfSA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdf
 
SCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICE
SCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICESCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICE
SCHEDULING DIFFERENT CUSTOMER ACTIVITIES WITH SENSING DEVICE
 
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehouses
 
Scheduling and Allocation Algorithm for an Elliptic Filter
Scheduling and Allocation Algorithm for an Elliptic FilterScheduling and Allocation Algorithm for an Elliptic Filter
Scheduling and Allocation Algorithm for an Elliptic Filter
 
Os
OsOs
Os
 
IRJET - Efficient Load Balancing in a Distributed Environment
IRJET -  	  Efficient Load Balancing in a Distributed EnvironmentIRJET -  	  Efficient Load Balancing in a Distributed Environment
IRJET - Efficient Load Balancing in a Distributed Environment
 
K017617786
K017617786K017617786
K017617786
 
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems
 
Ijetr012051
Ijetr012051Ijetr012051
Ijetr012051
 

More from International Journal of Science and Research (IJSR)

More from International Journal of Science and Research (IJSR) (20)

Innovations in the Diagnosis and Treatment of Chronic Heart Failure
Innovations in the Diagnosis and Treatment of Chronic Heart FailureInnovations in the Diagnosis and Treatment of Chronic Heart Failure
Innovations in the Diagnosis and Treatment of Chronic Heart Failure
 
Design and implementation of carrier based sinusoidal pwm (bipolar) inverter
Design and implementation of carrier based sinusoidal pwm (bipolar) inverterDesign and implementation of carrier based sinusoidal pwm (bipolar) inverter
Design and implementation of carrier based sinusoidal pwm (bipolar) inverter
 
Polarization effect of antireflection coating for soi material system
Polarization effect of antireflection coating for soi material systemPolarization effect of antireflection coating for soi material system
Polarization effect of antireflection coating for soi material system
 
Image resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fittingImage resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fitting
 
Ad hoc networks technical issues on radio links security &amp; qo s
Ad hoc networks technical issues on radio links security &amp; qo sAd hoc networks technical issues on radio links security &amp; qo s
Ad hoc networks technical issues on radio links security &amp; qo s
 
Microstructure analysis of the carbon nano tubes aluminum composite with diff...
Microstructure analysis of the carbon nano tubes aluminum composite with diff...Microstructure analysis of the carbon nano tubes aluminum composite with diff...
Microstructure analysis of the carbon nano tubes aluminum composite with diff...
 
Improving the life of lm13 using stainless spray ii coating for engine applic...
Improving the life of lm13 using stainless spray ii coating for engine applic...Improving the life of lm13 using stainless spray ii coating for engine applic...
Improving the life of lm13 using stainless spray ii coating for engine applic...
 
An overview on development of aluminium metal matrix composites with hybrid r...
An overview on development of aluminium metal matrix composites with hybrid r...An overview on development of aluminium metal matrix composites with hybrid r...
An overview on development of aluminium metal matrix composites with hybrid r...
 
Pesticide mineralization in water using silver nanoparticles incorporated on ...
Pesticide mineralization in water using silver nanoparticles incorporated on ...Pesticide mineralization in water using silver nanoparticles incorporated on ...
Pesticide mineralization in water using silver nanoparticles incorporated on ...
 
Comparative study on computers operated by eyes and brain
Comparative study on computers operated by eyes and brainComparative study on computers operated by eyes and brain
Comparative study on computers operated by eyes and brain
 
T s eliot and the concept of literary tradition and the importance of allusions
T s eliot and the concept of literary tradition and the importance of allusionsT s eliot and the concept of literary tradition and the importance of allusions
T s eliot and the concept of literary tradition and the importance of allusions
 
Effect of select yogasanas and pranayama practices on selected physiological ...
Effect of select yogasanas and pranayama practices on selected physiological ...Effect of select yogasanas and pranayama practices on selected physiological ...
Effect of select yogasanas and pranayama practices on selected physiological ...
 
Grid computing for load balancing strategies
Grid computing for load balancing strategiesGrid computing for load balancing strategies
Grid computing for load balancing strategies
 
A new algorithm to improve the sharing of bandwidth
A new algorithm to improve the sharing of bandwidthA new algorithm to improve the sharing of bandwidth
A new algorithm to improve the sharing of bandwidth
 
Main physical causes of climate change and global warming a general overview
Main physical causes of climate change and global warming   a general overviewMain physical causes of climate change and global warming   a general overview
Main physical causes of climate change and global warming a general overview
 
Performance assessment of control loops
Performance assessment of control loopsPerformance assessment of control loops
Performance assessment of control loops
 
Capital market in bangladesh an overview
Capital market in bangladesh an overviewCapital market in bangladesh an overview
Capital market in bangladesh an overview
 
Faster and resourceful multi core web crawling
Faster and resourceful multi core web crawlingFaster and resourceful multi core web crawling
Faster and resourceful multi core web crawling
 
Extended fuzzy c means clustering algorithm in segmentation of noisy images
Extended fuzzy c means clustering algorithm in segmentation of noisy imagesExtended fuzzy c means clustering algorithm in segmentation of noisy images
Extended fuzzy c means clustering algorithm in segmentation of noisy images
 
Parallel generators of pseudo random numbers with control of calculation errors
Parallel generators of pseudo random numbers with control of calculation errorsParallel generators of pseudo random numbers with control of calculation errors
Parallel generators of pseudo random numbers with control of calculation errors
 

Recently uploaded

Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfjoachimlavalley1
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfTamralipta Mahavidyalaya
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasiemaillard
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxJisc
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
 
plant breeding methods in asexually or clonally propagated crops
plant breeding methods in asexually or clonally propagated cropsplant breeding methods in asexually or clonally propagated crops
plant breeding methods in asexually or clonally propagated cropsparmarsneha2
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativePeter Windle
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...Jisc
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePedroFerreira53928
 
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...Sayali Powar
 
Salient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptxSalient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptxakshayaramakrishnan21
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
 
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptx
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptxSolid waste management & Types of Basic civil Engineering notes by DJ Sir.pptx
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptxDenish Jangid
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxbennyroshan06
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonSteve Thomason
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxEduSkills OECD
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsCol Mukteshwar Prasad
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersPedroFerreira53928
 
Industrial Training Report- AKTU Industrial Training Report
Industrial Training Report- AKTU Industrial Training ReportIndustrial Training Report- AKTU Industrial Training Report
Industrial Training Report- AKTU Industrial Training ReportAvinash Rai
 

Recently uploaded (20)

Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
plant breeding methods in asexually or clonally propagated crops
plant breeding methods in asexually or clonally propagated cropsplant breeding methods in asexually or clonally propagated crops
plant breeding methods in asexually or clonally propagated crops
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
 
Salient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptxSalient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptx
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptx
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptxSolid waste management & Types of Basic civil Engineering notes by DJ Sir.pptx
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptx
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
Industrial Training Report- AKTU Industrial Training Report
Industrial Training Report- AKTU Industrial Training ReportIndustrial Training Report- AKTU Industrial Training Report
Industrial Training Report- AKTU Industrial Training Report
 

Minimize Staleness and Stretch in Streaming Data Warehouses

  • 1. International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064 Volume 2 Issue 9, September 2013 www.ijsr.net Minimize Staleness and Stretch in Streaming Data Warehouses S. M. Subhani1 , M. Nagendramma2 1, 2 Department of CSE, BVSREC, Chimakurthy, A.P, India Abstract: We study scheduling algorithms for loading data feeds into real time data warehouses, which are used in applications such as IP network monitoring, online financial trading, and credit card fraud detection. In these applications, the warehouse collects a large number of streaming data feeds that are generated by external sources and arrive asynchronously. We discuss update scheduling in streaming data warehouses, which combine the features of traditional data warehouses and data stream systems. In our setting, external sources push append-only data streams into the warehouse with a wide range of inter-arrival times. While traditional data warehouses are typically refreshed during downtimes, streaming warehouses are updated as new data arrive. In this paper we develop a theory of temporal consistency for stream warehouses that allows for multiple consistency levels. We model the streaming warehouse update problem as a scheduling problem, where jobs correspond to processes that load new data into tables, and whose objective is to minimize data staleness over time. Keywords: Data warehouse maintenance, online scheduling 1. Introduction The goal of a streaming warehouse is to propagate new data across all the relevant tables and views as quickly as possible. Once new data are loaded, the applications and triggers defined on the warehouse can take immediate action. This allows businesses to make decisions in nearly real time, which may lead to increased profits, improved customer satisfaction, and prevention of serious problems that could develop if no action was taken. Data warehouses integrate information from multiple operational databases to enable complex business analyses. In traditional applications, warehouses are updated periodically and data analysis is done off-line [3]. In contrast, real time warehouses [1], also known as active warehouses [4], continually load incoming data feeds to support time-critical analyses. For instance, an Internet Service Provider (ISP) may collect streams of network configuration and performance data generated by remote sources in nearly real time. New data must be loaded in a timely manner and correlated against historical data to quickly identify network anomalies, denial-of-service attacks, and inconsistencies among protocol layers. Similarly, online stock trading applications may discover profit opportunities by comparing recent transactions in nearly real time against historical trends. Banks may be interested in analyzing incoming streams of credit card transactions to protect customers against identity theft. Since the effectiveness of a real time warehouse depends on its ability to ingest new data, we study problems related to data staleness. In our setting, each table in the warehouse collects data from an external source. The arrival of a set of new data releases an update that seeks to append the data to the corresponding table. Since existing data are not modified, the processing time of an update is at most proportional to the amount of new data. Our first objective is to nonpreemptively1 schedule the updates on one or more processors in a way that minimizes the total staleness of all tables. Our first contribution answers a question implicit in [2] regarding the difficulty of this problem. We prove that even in the purely online model, any on-line non preemptive algorithm achieves staleness at most a constant factor times optimal, provided that no processor is ever voluntarily idle and provided that the processors are sufficiently fast. 2. System Model 2.1Warehousing Architecture Figure 1 illustrates a streaming data warehouse. Each data stream is generated by an external source, with a batch of new data, consisting of one or more records being pushed to the warehouse with period pi. If the period of a stream is unknown or unpredictable, we let the user choose a period with which the warehouse should check for new data. Examples of streams collected by an Internet Service provider include router performance statistics such as CPU usage, system logs, routing table updates, link layer alerts etc.An important property of the data streams in our motivating applications is that they are append-only,i.e., existing records are never modified or deleted. For example, a stream of average router CPU utilization measurement may consist of records with fields (timestamp, router name, CPU utilization) and a new data file with updated CPU measurement for each router may arrive at the warehouse every five minutes. [5] Figure 1: Stream data warehouse A streaming data warehouse maintains two types of tables: base and derived. Each table may be stored partially or Paper ID: 21091302 375
  • 2. International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064 Volume 2 Issue 9, September 2013 www.ijsr.net wholly on disk. A base table is loaded directly from a data stream. A derived table is a materialized view defined over one or more tables. Each base or derived table Tj has a user –defined priority pj and a time-dependent staleness function sj(τ) that will be defined shortly. Relationships among source and derived tables form a (directed and acyclic) dependency graph. For each table Tj, we define a set of its ancestor tables as those which directly or indirectly serve as its sources, and a set of its dependent tables as those which are directly or indirectly sourced from Tj. For example, T1, T2 and T3 are ancestors of T4, and T3 and T4 are dependents of T1.In practice, warehouse tables are horizontally partitioned by time so that only a small number of recent partitions are affected by updates [6][7]. 2.2 Earliest Deadline First (EDF) EDF has been proven to be an optimal uniprocessor scheduling algorithms .This means that if a set of tasks is unschedulable under EDF, then no other scheduling algorithm can feasible schedule this task set. The EDF algorithm chooses for execution at each instant in the time currently active job(s) that have the nearest deadlines. The EDF implementation upon uniform parallel machines is according to the following rules, No Processor is idled while there are active jobs waiting for execution, when fewer then m jobs are active, they are required to execute on the fastest processor while the slowest are idled, and higher priority jobs are executed on faster processors. In Earliest Deadline First scheduling, at every scheduling point the task having the shortest deadline is taken up for scheduling. A task is schedule under EDF, if and only if it satisfies the condition that total processor utilization (Ui) due to the task set is less than 1. The Aim of this work is to provide a sensitivity analysis for task deadline context of multiprocessor system by using a new approach of EFDF (Earliest Feasible Deadline First) algorithm. In order to decrease the number of migrations we prevent a job from moving one processor to another processor if it is among them higher priority jobs. Therefore, a job will continue its execution on the same processor if possible (processor affinity). The result of these comparisons outlines some situations where one scheme is preferable over the other. Partitioning schemes are better suited for hard real- time systems, while a global scheme is preferable for soft real-time systems. The final EDF – partitioned scheduling algorithm is following 1.Sort the released jobs by the local algorithm 2.For each job ji in sorted order a) If ji’s home track is available, schedule ji on its home track b) Else, if there is an available free track, schedule ji on the free track c) Else, scan through the tracks r such that ji can be promoted to track r i) If track r is free and there is no released job remaining in the sorted list for home track r, ii) Schedule ji on track r d) Else, delay the execution of ji 3. Minimizing Staleness We call an algorithm eager, or work-conserving, if it leaves no processor idle while at least one pending update exists. We first state the rather-inscrutable Theorem 3.1, followed by an easy-to-read corollary, which implies that for any C < (v3 - 1)/2, there is a constant (dependent on C) such that the staleness of any eager algorithm is at most that constant factor times optimal, provided that each ai is at most Cp/t. Theorem 3.1. Fix p, t. For any and d such that 0 <, d < 1, define C, d = vd (1 - )/v3 > 0. Given p processors and t tables, pick any a such that a/ [1 - a/ (p/t)] = C, d p/t. Then the penalty incurred by an eager algorithm is at most (1 + a) 2(1/ 4) (1/ (1 - d)) times LOW, provided that each ai = a. Since LOW is a lower bound on the staleness achieved by any algorithm, even the optimal, prescient one, and penalty is an upper bound on the staleness achieved by any eager algorithm, the corollary implies the claimed competitiveness Proof: B be the set of batches in this run. For some batch Bi B, let ci be the length of the first update, di be the wait time, and bi be the total length of the batch, i.e., the sum of the lengths of its updates. Clearly, ci = bi = ci + di, (1) since ci = bi is obvious and since ci + di is the duration in time from the start (not release) time of the first job in the batch till the update for the batch starts, and this duration is clearly at least the length bi of the batch. For the penalty of this batch, denoted by i, we take the square of the own time, i.e., the length ci of the first update plus the wait time di plus the processing time of the entire batch: i= [(ci+di) +abi] 2, by the definition of penalty (2) = (1+a) 2(ci+di) 2, by (1). (3) Figure 2 illustrates the quantities bi, ci and di using the same example as that in Figure 1; in particular, we consider the batch consisting of updates arriving at times ri, 1 and ri, 2. Let A be the set of all updates. From the definition of LOW, each update i A has a budget of a2 units, where ai is the length of update i. Our proof requires the use of a charging scheme. A charging scheme specifies what fraction of its budget each update pays to a certain batch. Let us call a batch Bi tardy if ci < (ci + di) (where comes from Theorem 3.1); otherwise it is punctual. Let us denote the corresponding sets by Bt and Bp respectively. More formally, a charging scheme is a matrix (vij) of nonnegative values, where vij shows the extent of dependence of batch i on the budget available to batch j, with the following two properties. Paper ID: 21091302 376
  • 3. International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064 Volume 2 Issue 9, September 2013 www.ijsr.net Figure 2: A plot of the staleness of table i over time 4. Conclusion In this paper, we studied the complexity of scheduling data- loading jobs to minimize the staleness of a real time stream warehouse. We proved that any on-line non-preemptive algorithm that is never voluntarily idle achieves a constant competitive ratio with respect to the total staleness of all tables in the warehouse, provided that the processors are sufficiently fast. We solved the problem of scheduling updates in a real-time streaming warehouse. We projected the notion of averages staleness as a scheduling metric and presented scheduling algorithms designed to handle complex environment of a streaming data warehouse. We then proposed a scheduling framework that assigns jobs to processing tracks and also uses the basic algorithms to schedule jobs within a same. The main feature of our framework is the ability to reserve resources for short jobs that dften correspond to important frequently refreshed tables while avoiding the inefficiencies associated with partitioned scheduling techniques. Feature work is needed for choosing the right scheduling granularity when it is more efficient to update multiple tables together. References [1] L. Golab, T. Johnson, J. S. Seidel and V. Shkapenyuk, Stream Warehousing with Data Depot, SIGMOD 2009, 847-854. [2] L. Golab, T. Johnson, and V. Shkapenyuk, Scheduling Updates in a Real Time Stream Warehouse, ICDE 2009, 1207-1210. [3] W. Labio, R. Yerneni, and H. Garcia-Molina, Shrinking the Warehouse Update Window, SIGMOD1999, 383- 394. [4] N. Polyzotis, S. Skiadopoulos, P. Vassiliadis, A. Simits is, and N.-E. Frantzell, Supporting Streaming Updates in an Active Data Warehouse, ICDE 2007, 476-485. [5] Scalable Scheduling of Updates in Streaming Data Warehouses Lukasz Golab, Theodore Johnson and Vladislav Shkapenyuk AT&T Labs – Research, Florham Park, NJ, 0793. [6] N. Folkert, A. Gupta, A. Witkowski, S. Subramanian, S. Bel lamkonda, S. Shankar, T. Bozkaya, and L. Sheng, optimizing refresh of a set of materialized views, VLD B 2005, 1043- 1054. [7] L. Golab, T. Johnson, J. S. Seidel, and V. Shkapenyuk Stream warehousing with Data Depot, SIGMOD 2009, 847-854. Paper ID: 21091302 377