This document summarizes a research presentation on estimating aggregates over dynamic hidden web databases. It introduces the challenges of estimating aggregates over databases that change frequently, as opposed to static databases. It presents two algorithms for aggregate estimation: REISSUE-ESTIMATOR, which tries to infer how search query answers change between rounds, and RS-ESTIMATOR, which automatically maintains a sample of the database according to how it changes. Experimental results show that RS-ESTIMATOR performs better by adapting the sample as the database evolves.
Vldb14
1. Aggregate Estimation Over Dynamic
Hidden Web Databases
Presenter: Weimo Liu (The George Washington University)
Joint work with Saravanan Thirumuruganathan (University
of Texas at Arlington), Nan Zhang (The George Washington
University), and Gautam Das (University of Texas at Arlington)
2. Outline
Background and Motivation
REISSUE-ESTIMATOR
RS-ESTIMATOR
SYSTEM DESIGN
Experimental Results
Conclusion
4. Search Queries vs Aggregate Queries
Search Queries
SELECT * FROM D WHERE ac1 = vc1 &···& acu = vcu
e.g., List 2006 Ford F-150 with 4WD and 5.4L engine in Cargiant’s inventory
Answered by hidden database with top-k restriction
Aggregate Queries
SELECT AGGR(*) FROM D WHERE ac1 = vc1 &···& acu = vcu
e.g., How many vehicles in Cargiant’s inventory have MPG > 30?
Cannot be answered through the public web interface
[Figure: search query, aggregate query, web interface, hidden database]
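The top-k restriction can be illustrated with a toy model. The sketch below is only an illustration: the `HiddenDatabase` class, its `search` method, and the overflow flag are hypothetical names, and the sample rows are made up. It assumes the interface accepts equality predicates, returns at most k tuples, and signals when more matches exist.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


class HiddenDatabase:
    """Toy stand-in for a hidden web database behind a top-k search form."""

    def __init__(self, rows: List[Row], k: int) -> None:
        self._rows = rows  # clients never see this list directly
        self.k = k

    def search(self, predicates: Dict[str, str]) -> Tuple[List[Row], bool]:
        """Return at most k matching rows plus a flag saying whether more exist."""
        matches = [
            row for row in self._rows
            if all(row.get(attr) == value for attr, value in predicates.items())
        ]
        return matches[: self.k], len(matches) > self.k


# Made-up inventory, for illustration only.
cars = [
    {"make": "Ford", "model": "F-150", "drive": "4WD"},
    {"make": "Ford", "model": "F-150", "drive": "2WD"},
    {"make": "Ford", "model": "Focus", "drive": "2WD"},
    {"make": "Audi", "model": "A4", "drive": "4WD"},
]
db = HiddenDatabase(cars, k=2)
ford_rows, ford_overflow = db.search({"make": "Ford"})  # three matches, only two shown
audi_rows, audi_overflow = db.search({"make": "Audi"})  # one match, fully shown
```

Because the client sees only the truncated answer, an aggregate such as COUNT cannot be read off a single overflowing query; it has to be estimated from many narrower ones.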
5. Challenges
Prior work assumes a static hidden database. The simple approach
to the dynamic case, repeatedly executing the existing “static”
algorithms at some time interval, has two problems:
The number of search queries per IP address is limited each day
Repeated executions waste many search queries
6. Outline of Technical Results
Baseline
Repeated executions of existing “static” algorithm [DJJ+10]
Two Algorithms
REISSUE-ESTIMATOR
Tries to infer whether and how the search query answers received in the
last round have changed in this round.
RS-ESTIMATOR
Automatically maintains a sample of a database according to how the
database changes.
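The slides do not spell out the baseline's static algorithm. As a rough illustration only, the following sketch shows a simplified random drill-down size estimator in the spirit of [DJJ+10]; it omits that paper's refinements. The helper names (`search`, `drill_down_size_estimate`), the fixed attribute order, and the toy data are assumptions. Each run adds one predicate at a time with a uniformly chosen value until the query stops overflowing, then scales the number of returned tuples by the inverse of the probability of reaching that query.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]


def search(rows: List[Row], predicates: Dict[int, int], k: int) -> Tuple[List[Row], bool]:
    """Toy top-k interface: at most k matching rows plus an overflow flag."""
    matches = [r for r in rows if all(r[i] == v for i, v in predicates.items())]
    return matches[:k], len(matches) > k


def drill_down_size_estimate(rows: List[Row], domains: Sequence[Sequence[int]],
                             k: int, rng: random.Random) -> float:
    """One random drill-down; returns a single estimate of the database size."""
    predicates: Dict[int, int] = {}
    prob = 1.0  # probability of reaching the current query
    result, overflow = search(rows, predicates, k)
    attr = 0
    while overflow and attr < len(domains):
        predicates[attr] = rng.choice(domains[attr])
        prob /= len(domains[attr])
        result, overflow = search(rows, predicates, k)
        attr += 1
    # Assumes no more than k tuples share all attribute values; otherwise the
    # last query still overflows and this estimate is biased low.
    return len(result) / prob


# Toy data: five tuples over three binary attributes, with k = 1.
rows = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
domains = [(0, 1), (0, 1), (0, 1)]
rng = random.Random(0)
estimates = [drill_down_size_estimate(rows, domains, 1, rng) for _ in range(10000)]
mean_estimate = sum(estimates) / len(estimates)  # should approach len(rows)
```

Every drill-down costs several search queries, which is why re-running this procedure from scratch in every round quickly exhausts a per-IP query budget.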
7. Model of Dynamic Hidden Web Databases
Hidden Web Database and Query Interface
A hidden database D with m attributes A1, …, Am. Let Ui be the
domain of attribute Ai. For a tuple t ∈ D, we use t[Ai] ∈ Ui to
denote the value of Ai for t.
SELECT * FROM D WHERE Ai1 = ui1 AND … AND Ais = uis
where i1, …, is ∈ [1, m] and uij ∈ Uij. Let Sel(q) ⊆ D be the
tuples matching q.
Dynamic Hidden Databases
In most of the paper, we consider a round-update model
in which modifications occur at the very beginning of each
round.
8. Objectives of Aggregate Estimation
In this paper, we consider two types of aggregate
estimation tasks over a dynamic hidden database:
Single-round aggregates
In one round
Average, Count, Sum
Trans-round aggregates
Span the current round and the previous round
e.g., |D_i| - |D_{i-1}| (the change in database size from round i-1 to round i)
9. Outline
Background and Motivation
REISSUE-ESTIMATOR
RS-ESTIMATOR
SYSTEM DESIGN
Experimental Results
Conclusion
15. Key Question: Reissue or Restart?
Example 1 (No change)
The queries issued by REISSUE-ESTIMATOR are always a
subset of those issued by RESTART-ESTIMATOR
16. Key Question: Reissue or Restart?
Example 2 (Total change)
REISSUE-ESTIMATOR might end up performing worse
than RESTART-ESTIMATOR
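The two examples can be made concrete with a toy query-cost comparison. This is one plausible reading of the reissue idea, not the authors' algorithm: the functions `overflows`, `restart`, and `reissue`, the fixed drill-down path, and the data are all hypothetical. Both strategies look for the first non-overflowing query along the same path; restart walks down from the root, while reissue begins at the node where the previous round stopped.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def overflows(rows: List[Row], path: Sequence[int], depth: int, k: int) -> bool:
    """Issue the query that fixes the first `depth` attributes to path[:depth]."""
    return sum(1 for r in rows if r[:depth] == tuple(path[:depth])) > k


def restart(rows: List[Row], path: Sequence[int], k: int) -> Tuple[int, int]:
    """Walk down from the root. Returns (terminal depth, queries issued)."""
    queries = 0
    for depth in range(len(path) + 1):
        queries += 1
        if not overflows(rows, path, depth, k):
            break
    return depth, queries


def reissue(rows: List[Row], path: Sequence[int], k: int, last_depth: int) -> Tuple[int, int]:
    """Start from last round's terminal node. Returns (terminal depth, queries issued)."""
    depth, queries = last_depth, 1
    if overflows(rows, path, depth, k):
        # The node now overflows: keep drilling down along the same path.
        while depth < len(path):
            depth += 1
            queries += 1
            if not overflows(rows, path, depth, k):
                break
        return depth, queries
    # The node still fits: roll up for as long as the parent also fits.
    while depth > 0:
        queries += 1
        if overflows(rows, path, depth - 1, k):
            break
        depth -= 1
    return depth, queries


path = (0, 0, 1)
old_rows = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
new_rows = [(1, 1, 1)]  # the database changed almost completely

no_change = (restart(old_rows, path, 1), reissue(old_rows, path, 1, last_depth=3))
total_change = (restart(new_rows, path, 1), reissue(new_rows, path, 1, last_depth=3))
```

In this toy setting, reissuing on an unchanged database touches only the old terminal node and its parent, both of which restart would also query. After the drastic change, reissue has to climb back to the root one query at a time, whereas restart finishes with a single root query.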
18. Outline
Background and Motivation
REISSUE-ESTIMATOR
RS-ESTIMATOR
SYSTEM DESIGN
Experimental Results
Conclusion
19. Problem of REISSUE-ESTIMATOR
Example (No Change)
One does not need to issue many queries before realizing that the
database has changed little; the remaining query budget can then be
reallocated to initiate new drill downs
Reservoir Sampling [V85]
How much the maintained sample should change depends on how
much new data is inserted into the database.
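For reference, classic reservoir sampling [V85] can be sketched as below. This is the textbook single-pass algorithm over a stream of insertions, shown only to illustrate the idea the slide borrows; it is not RS-ESTIMATOR itself, and the function name `reservoir_sample` is an assumption.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], size: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of `size` items from a stream of unknown length."""
    reservoir: List[T] = []
    for seen, item in enumerate(stream, start=1):
        if len(reservoir) < size:
            # Fill phase: the first `size` items are always kept.
            reservoir.append(item)
        else:
            # The new item replaces a random slot with probability size / seen.
            slot = rng.randrange(seen)
            if slot < size:
                reservoir[slot] = item
    return reservoir


sample = reservoir_sample(range(100), size=10, rng=random.Random(1))
```

The replacement probability shrinks as more items arrive, so the number of sample slots that change is tied to how much new data comes in relative to what has already been seen.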
24. Outline
Background and Motivation
REISSUE-ESTIMATOR
RS-ESTIMATOR
SYSTEM DESIGN
Experimental Results
Conclusion
25. CONCLUSION AND FUTURE WORK
A study of estimating aggregates over
dynamic hidden web databases
Query reissuing
Bootstrapping-based query-plan adjustment
Future Work
A study of how meta data such as COUNT can be used to guide
the design of drill downs in future rounds;
Given a workload of aggregate queries, how to minimize the
total query cost for estimating all of them;
How to leverage both keyword search and form-like search
interfaces provided by many web databases to further improve
the performance of aggregate estimations.
26. References
[DJJ+10] Arjun Dasgupta, Xin Jin, Bradley Jewell, Nan
Zhang, and Gautam Das. Unbiased Estimation of Size and
Other Aggregates Over Hidden Web Databases. In SIGMOD,
2010.
[V85] J. S. Vitter. Random Sampling with a Reservoir. ACM
Trans. Math. Softw., 11(1):37–57, Mar. 1985.