Aggregate Estimation Over Dynamic 
Hidden Web Databases 
Presenter: Weimo Liu (The George Washington University) 
Joint work with Saravanan Thirumuruganathan (University 
of Texas at Arlington), Nan Zhang (The George Washington 
University), and Gautam Das (University of Texas at Arlington) 
Outline 
 Background and Motivation 
 REISSUE-ESTIMATOR 
 RS-ESTIMATOR 
 SYSTEM DESIGN 
 Experimental Results 
 Conclusion 
Hidden Databases: Used Car Inventory 
 Form-like interface 
 Return top-k tuples 
Search Queries vs Aggregate Queries 
 Search Queries 
 SELECT * FROM D WHERE ac1 = vc1 &···& acu = vcu 
 e.g., List 2006 Ford F-150 with 4WD and 5.4L engine in Cargiant’s inventory 
 Answered by hidden database with top-k restriction 
 Aggregate Queries 
 SELECT AGGR(*) FROM D WHERE ac1 = vc1 &···& acu = vcu 
 e.g., How many vehicles in Cargiant’s inventory have MPG > 30? 
 Cannot be answered through the public web interface 
Search query 
Aggregate query 
Web interface 
Hidden database 
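The top-k restriction can be illustrated with a small simulation. This is only a sketch: the function name `search`, the toy `CARS` table, and the status labels are assumptions made for illustration, with the labels borrowed from the overflow/valid/underflow terminology used later in the slides.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def search(db: List[Row], predicates: Dict[str, str], k: int) -> Tuple[str, List[Row]]:
    """Simulate a form-like interface that returns at most k matching tuples."""
    matches = [row for row in db if all(row[a] == v for a, v in predicates.items())]
    if not matches:
        return "underflow", []           # nothing matches the query
    if len(matches) > k:
        return "overflow", matches[:k]   # only the top k are shown, the rest stay hidden
    return "valid", matches              # every matching tuple is returned


# Hypothetical inventory, used only for this example.
CARS: List[Row] = [
    {"make": "Ford", "year": "2006", "drive": "4WD"},
    {"make": "Ford", "year": "2006", "drive": "2WD"},
    {"make": "Ford", "year": "2007", "drive": "4WD"},
    {"make": "Toyota", "year": "2006", "drive": "2WD"},
]

if __name__ == "__main__":
    print(search(CARS, {"make": "Ford"}, k=2)[0])
    print(search(CARS, {"make": "Ford", "year": "2007"}, k=2)[0])
    print(search(CARS, {"make": "Honda"}, k=2)[0])
```

An aggregate such as COUNT cannot be read off an overflowing answer, because the interface never reveals how many tuples were cut off.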
Challenges 
 Prior work assumes a static hidden database. The simple 
approach to the dynamic case, repeatedly executing the 
existing “static” algorithms at a fixed time interval, 
has two problems: 
 Web interfaces impose a per-IP daily limit on the number of search queries 
 Repeated executions waste many search queries 
Outline of Technical Results 
 Baseline 
 Repeated executions of existing “static” algorithm [DJJ+10] 
 Two Algorithms 
 REISSUE-ESTIMATOR 
 Infers whether and how the search query answers received in the 
last round have changed in this round. 
 RS-ESTIMATOR 
 Automatically maintains a sample of a database according to how the 
database changes. 
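The baseline builds on drill-down-based size estimation. A minimal sketch in the spirit of [DJJ+10] follows, under stated assumptions: attributes are drilled in a fixed order, each value is chosen uniformly at random, and the names `matching`, `drill_down_estimate`, `DB`, and `DOMAINS` are hypothetical. The simulation counts matches directly, whereas a real interface would only reveal the query status and at most k tuples.

```python
import random
from typing import Dict, List

Row = Dict[str, str]


def matching(db: List[Row], predicates: Dict[str, str]) -> int:
    """Number of tuples matching a conjunctive query (simulation only)."""
    return sum(all(row[a] == v for a, v in predicates.items()) for row in db)


def drill_down_estimate(db: List[Row], domains: Dict[str, List[str]],
                        k: int, rng: random.Random) -> float:
    """One random drill-down; returns one estimate of the database size."""
    predicates: Dict[str, str] = {}
    prob = 1.0                          # probability of reaching the current query
    remaining = list(domains.items())   # attributes not yet used, in fixed order
    while True:
        hits = matching(db, predicates)
        if hits == 0:
            return 0.0                  # underflow: this path contributes nothing
        if hits <= k:
            return hits / prob          # valid: weight the count by 1 / probability
        if not remaining:
            raise ValueError("more than k identical tuples; cannot drill further")
        attr, values = remaining.pop(0)
        predicates[attr] = rng.choice(values)   # overflow: add one more predicate
        prob /= len(values)


# Hypothetical data, used only for this example.
DOMAINS = {"a": ["0", "1"], "b": ["0", "1"]}
DB = [{"a": "0", "b": "0"}, {"a": "0", "b": "1"}, {"a": "1", "b": "0"}]

if __name__ == "__main__":
    rng = random.Random(0)
    runs = [drill_down_estimate(DB, DOMAINS, k=1, rng=rng) for _ in range(4000)]
    print(sum(runs) / len(runs))        # averages out near the true size, 3
```

Each tuple sits under exactly one valid query on the tree, so weighting a valid answer by the inverse of its reach probability gives an expected value equal to the true size. Averaging many drill-downs reduces the variance, and every drill-down costs search queries, which is why repeating the whole procedure each round is expensive.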
Model of Dynamic Hidden Web Databases 
 Hidden Web Database and Query Interface 
 A hidden database D with m attributes A1, …, Am. Let Ui be the 
domain of attribute Ai. For a tuple t ∈ D, we use t[Ai] ∈ Ui to 
denote the value of Ai for t. 
 Search query q: SELECT * FROM D WHERE Ai1 = ui1 AND … AND Ais = uis 
where i1, …, is ∈ [1, m] and uij ∈ Uij. Let Sel(q) ⊆ D be the 
tuples matching q. 
 Dynamic Hidden Databases 
 In most of the paper, we consider a round-update model 
where modifications occur at the beginning of each 
round. 
Objectives of Aggregate Estimation 
 In this paper, we consider two types of aggregate 
estimation tasks over a dynamic hidden database: 
 Single-round aggregates 
 In one round 
 Average, Count, Sum 
 Trans-round aggregates 
 The current ROUND and the previous ROUND 
 |D_i| − |D_{i−1}|, where D_i denotes the database in round i
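To make the two task types concrete, the sketch below turns drill-down samples into single-round and trans-round estimates. It assumes each sample is a pair (tuples returned by a valid query, probability of reaching that query); the function names, the `price` attribute, and the numbers are hypothetical, and AVG is taken as the ratio of the SUM and COUNT estimates, which is a simplification.

```python
from statistics import mean
from typing import Dict, List, Tuple

Row = Dict[str, float]
Sample = Tuple[List[Row], float]   # (tuples of a valid query, probability of reaching it)


def estimate_count(samples: List[Sample]) -> float:
    """Single-round COUNT: average of inverse-probability-weighted counts."""
    return mean(len(rows) / p for rows, p in samples)


def estimate_sum(samples: List[Sample], attr: str) -> float:
    """Single-round SUM over one numeric attribute."""
    return mean(sum(row[attr] for row in rows) / p for rows, p in samples)


def estimate_avg(samples: List[Sample], attr: str) -> float:
    """Single-round AVG as a ratio of two estimates (an assumed simplification)."""
    return estimate_sum(samples, attr) / estimate_count(samples)


def estimate_size_change(prev: List[Sample], curr: List[Sample]) -> float:
    """Trans-round aggregate: estimated size now minus estimated size last round."""
    return estimate_count(curr) - estimate_count(prev)


# Hypothetical samples from two consecutive rounds.
PREV_ROUND: List[Sample] = [([{"price": 10.0}], 0.5), ([{"price": 30.0}], 0.25)]
CURR_ROUND: List[Sample] = [
    ([{"price": 10.0}, {"price": 20.0}], 0.5),
    ([{"price": 30.0}], 0.25),
]

if __name__ == "__main__":
    print(estimate_count(CURR_ROUND), estimate_avg(CURR_ROUND, "price"))
    print(estimate_size_change(PREV_ROUND, CURR_ROUND))
```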
Query Reissuing for Multiple Rounds
The Initial Round
Valid to Overflow
Valid to Underflow (1)
Valid to Underflow (2)
Key Question: Reissue or Restart? 
 Example 1 (No change) 
 The queries issued by REISSUE-ESTIMATOR are always a 
subset of those issued by RESTART-ESTIMATOR
Key Question: Reissue or Restart? 
 Example 2 (Total change) 
 REISSUE-ESTIMATOR might end up performing worse 
than RESTART-ESTIMATOR
Algorithm REISSUE-ESTIMATOR
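The slide itself carries no algorithm text, so the following is only one plausible reading of query reissuing, not the authors' pseudocode. It assumes a drill-down is stored as its list of predicates, that the terminal query must be the topmost non-overflowing query on its path, and that `hits` and `reissue` are hypothetical names; counts are simulated directly instead of going through a web interface.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, str]
Path = List[Tuple[str, str]]   # predicates of one drill-down, in drill order


def hits(db: List[Row], path: Path) -> int:
    """Number of tuples matching every predicate on the path (simulation only)."""
    return sum(all(row[a] == v for a, v in path) for row in db)


def reissue(db: List[Row], domains: Dict[str, List[str]], k: int,
            path: Path, rng: random.Random) -> Tuple[Path, float]:
    """Update last round's drill-down against this round's database."""
    path = list(path)
    order = list(domains)               # fixed attribute order (assumption)
    # Valid or underflow case: move up while the parent no longer overflows.
    while path and hits(db, path[:-1]) <= k:
        path.pop()
    # Overflow case: keep drilling down with fresh random choices.
    while hits(db, path) > k:
        if len(path) == len(order):
            raise ValueError("more than k identical tuples; cannot drill further")
        attr = order[len(path)]
        path.append((attr, rng.choice(domains[attr])))
    prob = 1.0
    for attr, _ in path:
        prob /= len(domains[attr])      # probability of reaching the terminal query
    return path, hits(db, path) / prob


# Hypothetical data, used only for this example.
DOMAINS = {"a": ["0", "1"], "b": ["0", "1"]}
ROUND_1 = [{"a": "0", "b": "0"}, {"a": "0", "b": "1"}, {"a": "1", "b": "0"}]

if __name__ == "__main__":
    rng = random.Random(0)
    old_path = [("a", "1")]             # valid in round 1 with k = 1
    print(reissue(ROUND_1, DOMAINS, 1, old_path, rng))                          # no change
    print(reissue(ROUND_1 + [{"a": "1", "b": "1"}], DOMAINS, 1, old_path, rng))  # now overflows
```

When nothing has changed, only the stored query and its parent need checking, which is cheaper than walking the whole path again from the root.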
Problem of REISSUE-ESTIMATOR 
 Example (No Change) 
 One does not need to issue many queries before realizing that the 
database has changed little; the remaining query budget can then be 
reallocated to initiate new drill-downs 
 Reservoir Sampling [V85] 
 How much the maintained sample should change depends on how much 
new data is inserted into the database.
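For reference, the basic reservoir algorithm discussed in [V85] (often called Algorithm R) can be sketched as below. It shows only the classical stream-sampling technique that motivates RS-ESTIMATOR, not RS-ESTIMATOR itself; the function name and parameters are chosen for illustration.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], size: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of `size` items from a stream of unknown length."""
    reservoir: List[T] = []
    for n, item in enumerate(stream, start=1):
        if n <= size:
            reservoir.append(item)      # fill the reservoir with the first items
        else:
            j = rng.randrange(n)        # uniform integer in [0, n)
            if j < size:                # happens with probability size / n
                reservoir[j] = item     # the new item replaces a random old one
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(range(1000), size=5, rng=random.Random(42)))
```

The replacement probability size / n shrinks as more items arrive, so the sample changes in proportion to how much new data has come in relative to what was already seen.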
Key Ideas of RS-ESTIMATOR
Algorithm RS-ESTIMATOR
Experimental Results
CONCLUSION AND FUTURE WORK 
A study of estimating aggregates over 
dynamic hidden web databases 
 Query reissuing 
 Bootstrapping-based query-plan adjustment 
Future Work 
 How metadata such as COUNT can be used to guide 
the design of drill-downs in future rounds; 
 Given a workload of aggregate queries, how to minimize the 
total query cost for estimating all of them; 
 How to leverage both keyword search and form-like search 
interfaces provided by many web databases to further improve 
the performance of aggregate estimations.
References 
 [DJJ+10] Arjun Dasgupta, Xin Jin, Bradley Jewell, Nan 
Zhang, and Gautam Das. Unbiased Estimation of Size and 
Other Aggregates Over Hidden Web Databases. In SIGMOD, 
2010. 
 [V85] J. S. Vitter. Random Sampling with a Reservoir. ACM 
Trans. Math. Softw., 11(1):37–57, Mar. 1985. 
THANK YOU 

More Related Content

What's hot

Firebird: cost-based optimization and statistics, by Dmitry Yemanov (in English)
Firebird: cost-based optimization and statistics, by Dmitry Yemanov (in English)Firebird: cost-based optimization and statistics, by Dmitry Yemanov (in English)
Firebird: cost-based optimization and statistics, by Dmitry Yemanov (in English)Alexey Kovyazin
 
Algorithms for Query Processing and Optimization of Spatial Operations
Algorithms for Query Processing and Optimization of Spatial OperationsAlgorithms for Query Processing and Optimization of Spatial Operations
Algorithms for Query Processing and Optimization of Spatial OperationsNatasha Mandal
 
4.2 spatial data mining
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data miningKrish_ver2
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Overview of query evaluation
Overview of query evaluationOverview of query evaluation
Overview of query evaluationavniS
 
SQL Optimization With Trace Data And Dbms Xplan V6
SQL Optimization With Trace Data And Dbms Xplan V6SQL Optimization With Trace Data And Dbms Xplan V6
SQL Optimization With Trace Data And Dbms Xplan V6Mahesh Vallampati
 
Database , 8 Query Optimization
Database , 8 Query OptimizationDatabase , 8 Query Optimization
Database , 8 Query OptimizationAli Usman
 
13. Query Processing in DBMS
13. Query Processing in DBMS13. Query Processing in DBMS
13. Query Processing in DBMSkoolkampus
 
CCLS Internship Presentation
CCLS Internship PresentationCCLS Internship Presentation
CCLS Internship PresentationCharles Naut
 
Parallel Processing Technique for Time Efficient Matrix Multiplication
Parallel Processing Technique for Time Efficient Matrix MultiplicationParallel Processing Technique for Time Efficient Matrix Multiplication
Parallel Processing Technique for Time Efficient Matrix MultiplicationIJERA Editor
 
Query processing and Query Optimization
Query processing and Query OptimizationQuery processing and Query Optimization
Query processing and Query OptimizationNiraj Gandha
 
Data visualization using R
Data visualization using RData visualization using R
Data visualization using RUmmiya Mohammedi
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsAlbert Bifet
 
Bond Graph of a One Stage Reduction Gearbox
Bond Graph of a One Stage Reduction GearboxBond Graph of a One Stage Reduction Gearbox
Bond Graph of a One Stage Reduction GearboxGehendra Sharma
 
Query-porcessing-& Query optimization
Query-porcessing-& Query optimizationQuery-porcessing-& Query optimization
Query-porcessing-& Query optimizationSaranya Natarajan
 
R and Visualization: A match made in Heaven
R and Visualization: A match made in HeavenR and Visualization: A match made in Heaven
R and Visualization: A match made in HeavenEdureka!
 
Cost estimation for Query Optimization
Cost estimation for Query OptimizationCost estimation for Query Optimization
Cost estimation for Query OptimizationRavinder Kamboj
 

What's hot (20)

Firebird: cost-based optimization and statistics, by Dmitry Yemanov (in English)
Firebird: cost-based optimization and statistics, by Dmitry Yemanov (in English)Firebird: cost-based optimization and statistics, by Dmitry Yemanov (in English)
Firebird: cost-based optimization and statistics, by Dmitry Yemanov (in English)
 
Algorithms for Query Processing and Optimization of Spatial Operations
Algorithms for Query Processing and Optimization of Spatial OperationsAlgorithms for Query Processing and Optimization of Spatial Operations
Algorithms for Query Processing and Optimization of Spatial Operations
 
4.2 spatial data mining
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data mining
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Overview of query evaluation
Overview of query evaluationOverview of query evaluation
Overview of query evaluation
 
SQL Optimization With Trace Data And Dbms Xplan V6
SQL Optimization With Trace Data And Dbms Xplan V6SQL Optimization With Trace Data And Dbms Xplan V6
SQL Optimization With Trace Data And Dbms Xplan V6
 
Dfg & sg ppt (1)
Dfg & sg ppt (1)Dfg & sg ppt (1)
Dfg & sg ppt (1)
 
Database , 8 Query Optimization
Database , 8 Query OptimizationDatabase , 8 Query Optimization
Database , 8 Query Optimization
 
13. Query Processing in DBMS
13. Query Processing in DBMS13. Query Processing in DBMS
13. Query Processing in DBMS
 
CCLS Internship Presentation
CCLS Internship PresentationCCLS Internship Presentation
CCLS Internship Presentation
 
Query trees
Query treesQuery trees
Query trees
 
Parallel Processing Technique for Time Efficient Matrix Multiplication
Parallel Processing Technique for Time Efficient Matrix MultiplicationParallel Processing Technique for Time Efficient Matrix Multiplication
Parallel Processing Technique for Time Efficient Matrix Multiplication
 
Query processing and Query Optimization
Query processing and Query OptimizationQuery processing and Query Optimization
Query processing and Query Optimization
 
Data visualization using R
Data visualization using RData visualization using R
Data visualization using R
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data Streams
 
Bond Graph of a One Stage Reduction Gearbox
Bond Graph of a One Stage Reduction GearboxBond Graph of a One Stage Reduction Gearbox
Bond Graph of a One Stage Reduction Gearbox
 
Agreggates i
Agreggates iAgreggates i
Agreggates i
 
Query-porcessing-& Query optimization
Query-porcessing-& Query optimizationQuery-porcessing-& Query optimization
Query-porcessing-& Query optimization
 
R and Visualization: A match made in Heaven
R and Visualization: A match made in HeavenR and Visualization: A match made in Heaven
R and Visualization: A match made in Heaven
 
Cost estimation for Query Optimization
Cost estimation for Query OptimizationCost estimation for Query Optimization
Cost estimation for Query Optimization
 

Viewers also liked

Lookbook ss13 lmc-updated showrooms
Lookbook ss13 lmc-updated showroomsLookbook ss13 lmc-updated showrooms
Lookbook ss13 lmc-updated showroomsSteven Bonamassa
 
Bonamassa New York Lookbook SS 2012
Bonamassa New York Lookbook SS 2012 Bonamassa New York Lookbook SS 2012
Bonamassa New York Lookbook SS 2012 Steven Bonamassa
 
Bod & Christensen Men's lookbook ss13 email
Bod & Christensen Men's lookbook ss13 emailBod & Christensen Men's lookbook ss13 email
Bod & Christensen Men's lookbook ss13 emailSteven Bonamassa
 
Scott Moore relevant accomplishments bio
Scott Moore relevant accomplishments bioScott Moore relevant accomplishments bio
Scott Moore relevant accomplishments bioScott Moore
 
Haskell in Green Land: Analyzing the Energy Behavior of a Purely Functional L...
Haskell in Green Land: Analyzing the Energy Behavior of a Purely Functional L...Haskell in Green Land: Analyzing the Energy Behavior of a Purely Functional L...
Haskell in Green Land: Analyzing the Energy Behavior of a Purely Functional L...GreenLabAtDI
 
La Pina fall 2013 lookbook (resized)
La Pina fall 2013 lookbook (resized)La Pina fall 2013 lookbook (resized)
La Pina fall 2013 lookbook (resized)Steven Bonamassa
 
Bonamassa New York Lookbook FW 2011
Bonamassa New York Lookbook FW 2011Bonamassa New York Lookbook FW 2011
Bonamassa New York Lookbook FW 2011Steven Bonamassa
 
LaMarque Collection Mens Lookbook SS 2014
LaMarque Collection Mens Lookbook SS 2014LaMarque Collection Mens Lookbook SS 2014
LaMarque Collection Mens Lookbook SS 2014Steven Bonamassa
 
Bergdorf goodman men's store
Bergdorf goodman men's storeBergdorf goodman men's store
Bergdorf goodman men's storeSteven Bonamassa
 
LaMarque Collection lookbook-
LaMarque Collection lookbook-LaMarque Collection lookbook-
LaMarque Collection lookbook-Steven Bonamassa
 
The Influence of the Java Collection Framework on Overall Energy Consumption
The Influence of the Java Collection Framework on Overall Energy ConsumptionThe Influence of the Java Collection Framework on Overall Energy Consumption
The Influence of the Java Collection Framework on Overall Energy ConsumptionGreenLabAtDI
 
LaMarque Collection Mens lookbook fw 2014
LaMarque Collection Mens lookbook fw 2014LaMarque Collection Mens lookbook fw 2014
LaMarque Collection Mens lookbook fw 2014Steven Bonamassa
 

Viewers also liked (18)

Lookbook ss13 lmc-updated showrooms
Lookbook ss13 lmc-updated showroomsLookbook ss13 lmc-updated showrooms
Lookbook ss13 lmc-updated showrooms
 
Bonamassa New York Lookbook SS 2012
Bonamassa New York Lookbook SS 2012 Bonamassa New York Lookbook SS 2012
Bonamassa New York Lookbook SS 2012
 
Bod & Christensen Men's lookbook ss13 email
Bod & Christensen Men's lookbook ss13 emailBod & Christensen Men's lookbook ss13 email
Bod & Christensen Men's lookbook ss13 email
 
Scott Moore relevant accomplishments bio
Scott Moore relevant accomplishments bioScott Moore relevant accomplishments bio
Scott Moore relevant accomplishments bio
 
Susan Rep SS 2013
Susan Rep SS 2013Susan Rep SS 2013
Susan Rep SS 2013
 
Haskell in Green Land: Analyzing the Energy Behavior of a Purely Functional L...
Haskell in Green Land: Analyzing the Energy Behavior of a Purely Functional L...Haskell in Green Land: Analyzing the Energy Behavior of a Purely Functional L...
Haskell in Green Land: Analyzing the Energy Behavior of a Purely Functional L...
 
La Pina fall 2013 lookbook (resized)
La Pina fall 2013 lookbook (resized)La Pina fall 2013 lookbook (resized)
La Pina fall 2013 lookbook (resized)
 
Bonamassa New York Lookbook FW 2011
Bonamassa New York Lookbook FW 2011Bonamassa New York Lookbook FW 2011
Bonamassa New York Lookbook FW 2011
 
Susan rep SS 2014
Susan rep SS 2014Susan rep SS 2014
Susan rep SS 2014
 
Payroll
PayrollPayroll
Payroll
 
Lamarque lookbook women
Lamarque lookbook womenLamarque lookbook women
Lamarque lookbook women
 
LaMarque Collection Mens Lookbook SS 2014
LaMarque Collection Mens Lookbook SS 2014LaMarque Collection Mens Lookbook SS 2014
LaMarque Collection Mens Lookbook SS 2014
 
Bergdorf goodman men's store
Bergdorf goodman men's storeBergdorf goodman men's store
Bergdorf goodman men's store
 
Digital i o
Digital i oDigital i o
Digital i o
 
LaMarque Collection lookbook-
LaMarque Collection lookbook-LaMarque Collection lookbook-
LaMarque Collection lookbook-
 
C language
C languageC language
C language
 
The Influence of the Java Collection Framework on Overall Energy Consumption
The Influence of the Java Collection Framework on Overall Energy ConsumptionThe Influence of the Java Collection Framework on Overall Energy Consumption
The Influence of the Java Collection Framework on Overall Energy Consumption
 
LaMarque Collection Mens lookbook fw 2014
LaMarque Collection Mens lookbook fw 2014LaMarque Collection Mens lookbook fw 2014
LaMarque Collection Mens lookbook fw 2014
 

Similar to Vldb14

E132833
E132833E132833
E132833irjes
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Databricks
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Julian Hyde
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Databricks
 
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)Peter Tröger
 
SAS Training session - By Pratima
SAS Training session  -  By Pratima SAS Training session  -  By Pratima
SAS Training session - By Pratima Pratima Pandey
 
Hybrid Knowledge Bases for Real-Time Robotic Reasoning
Hybrid Knowledge Bases for Real-Time Robotic ReasoningHybrid Knowledge Bases for Real-Time Robotic Reasoning
Hybrid Knowledge Bases for Real-Time Robotic ReasoningHassan Rifky
 
Dynamic_Cloud_Application_Redistribution_Performance_Optimization
Dynamic_Cloud_Application_Redistribution_Performance_OptimizationDynamic_Cloud_Application_Redistribution_Performance_Optimization
Dynamic_Cloud_Application_Redistribution_Performance_OptimizationSantiago Gómez Sáez
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningDatabricks
 
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data SetsHortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data SetsIJMER
 
Auto­matic Para­meter Tun­ing for Data­bases and Big Data Sys­tems
Auto­matic Para­meter Tun­ing for Data­bases and Big Data Sys­tems Auto­matic Para­meter Tun­ing for Data­bases and Big Data Sys­tems
Auto­matic Para­meter Tun­ing for Data­bases and Big Data Sys­tems Jiaheng Lu
 
probabilistic ranking
probabilistic rankingprobabilistic ranking
probabilistic rankingFELIX75
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET Journal
 
accessible-streaming-algorithms
accessible-streaming-algorithmsaccessible-streaming-algorithms
accessible-streaming-algorithmsFarhan Zaki
 
A Pragmatic Approach to Semantic Repositories Benchmarking
A Pragmatic Approach to Semantic Repositories BenchmarkingA Pragmatic Approach to Semantic Repositories Benchmarking
A Pragmatic Approach to Semantic Repositories BenchmarkingDhaval Thakker
 
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cRonald Francisco Vargas Quesada
 
Presentation_BigData_NenaMarin
Presentation_BigData_NenaMarinPresentation_BigData_NenaMarin
Presentation_BigData_NenaMarinn5712036
 

Similar to Vldb14 (20)

E132833
E132833E132833
E132833
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
 
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
 
SAS Training session - By Pratima
SAS Training session  -  By Pratima SAS Training session  -  By Pratima
SAS Training session - By Pratima
 
Elementary Concepts of data minig
Elementary Concepts of data minigElementary Concepts of data minig
Elementary Concepts of data minig
 
Hybrid Knowledge Bases for Real-Time Robotic Reasoning
Hybrid Knowledge Bases for Real-Time Robotic ReasoningHybrid Knowledge Bases for Real-Time Robotic Reasoning
Hybrid Knowledge Bases for Real-Time Robotic Reasoning
 
Dynamic_Cloud_Application_Redistribution_Performance_Optimization
Dynamic_Cloud_Application_Redistribution_Performance_OptimizationDynamic_Cloud_Application_Redistribution_Performance_Optimization
Dynamic_Cloud_Application_Redistribution_Performance_Optimization
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine Learning
 
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data SetsHortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
 
Auto­matic Para­meter Tun­ing for Data­bases and Big Data Sys­tems
Auto­matic Para­meter Tun­ing for Data­bases and Big Data Sys­tems Auto­matic Para­meter Tun­ing for Data­bases and Big Data Sys­tems
Auto­matic Para­meter Tun­ing for Data­bases and Big Data Sys­tems
 
probabilistic ranking
probabilistic rankingprobabilistic ranking
probabilistic ranking
 
Srikanta Mishra
Srikanta MishraSrikanta Mishra
Srikanta Mishra
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
 
accessible-streaming-algorithms
accessible-streaming-algorithmsaccessible-streaming-algorithms
accessible-streaming-algorithms
 
A Pragmatic Approach to Semantic Repositories Benchmarking
A Pragmatic Approach to Semantic Repositories BenchmarkingA Pragmatic Approach to Semantic Repositories Benchmarking
A Pragmatic Approach to Semantic Repositories Benchmarking
 
Chapter15
Chapter15Chapter15
Chapter15
 
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12c
 
Presentation_BigData_NenaMarin
Presentation_BigData_NenaMarinPresentation_BigData_NenaMarin
Presentation_BigData_NenaMarin
 

Recently uploaded

Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Intelisync
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 

Recently uploaded (20)

Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 

Vldb14

  • 1. Aggregate Estimation Over Dynamic Hidden Web Databases
      Presenter: Weimo Liu (The George Washington University)
      Joint work with Saravanan Thirumuruganathan (University of Texas at Arlington), Nan Zhang (The George Washington University), and Gautam Das (University of Texas at Arlington)
  • 2. Outline
      • Background and Motivation
      • REISSUE-ESTIMATOR
      • RS-ESTIMATOR
      • SYSTEM DESIGN
      • Experimental Results
      • Conclusion
  • 3. Hidden Databases: Used Car Inventory
      • Form-like interface
      • Return top-k tuples
  • 4. Search Queries vs Aggregate Queries
      • Search Queries
          • SELECT * FROM D WHERE a_{c1} = v_{c1} & ··· & a_{cu} = v_{cu}
          • e.g., List 2006 Ford F-150 with 4WD and 5.4L engine in Cargiant’s inventory
          • Answered by hidden database with top-k restriction
      • Aggregate Queries
          • SELECT AGGR(*) FROM D WHERE a_{c1} = v_{c1} & ··· & a_{cu} = v_{cu}
          • e.g., How many vehicles in Cargiant’s inventory have MPG > 30?
          • Cannot be answered through the public web interface
      • [Figure labels: Search query, Aggregate query, Web interface, Hidden database]
  • 5. Challenges
      • Prior work assumes a static hidden database. The simple approach to the dynamic case, repeatedly executing the existing “static” algorithms at fixed time intervals, has two problems:
          • The number of search queries allowed per IP address per day is limited
          • Repeated executions waste a large number of search queries
  • 6. Outline of Technical Results
      • Baseline
          • Repeated executions of the existing “static” algorithm [DJJ+10]
      • Two Algorithms
          • REISSUE-ESTIMATOR
              • Infers whether and how the search query answers received in the last round have changed in this round.
          • RS-ESTIMATOR
              • Automatically maintains a sample of the database according to how the database changes.
  • 7. Model of Dynamic Hidden Web Databases
      • Hidden Web Database and Query Interface
          • A hidden database D with m attributes A_1, …, A_m. Let U_i be the domain of attribute A_i. For a tuple t ∈ D, we use t[A_i] ∈ U_i to denote the value of A_i for t.
          • SELECT * FROM D WHERE A_{i_1} = u_{i_1} AND … AND A_{i_s} = u_{i_s}, where i_1, …, i_s ∈ [1, m] and u_{i_j} ∈ U_{i_j}. Let Sel(q) ⊆ D be the set of tuples matching q.
      • Dynamic Hidden Databases
          • In most of the paper, we consider a round-update model in which modifications occur at the beginning instant of each round.
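The query model on slide 7 can be illustrated with a toy simulation. The sketch below is not from the slides: the class `HiddenDB`, its methods `sel` and `search`, the overflow flag, and the sample car tuples are all hypothetical, and ranking by insertion order is an assumption standing in for whatever ranking function a real site uses. It only shows that a conjunctive query exposes at most k of the tuples in Sel(q).

```python
from dataclasses import dataclass


@dataclass
class HiddenDB:
    """Toy stand-in for a hidden database behind a top-k form interface."""
    tuples: list  # each tuple is a dict mapping attribute name -> value
    k: int        # the interface returns at most k tuples per query

    def sel(self, query):
        """Sel(q): every tuple matching all predicates A_i = u_i in `query`."""
        return [t for t in self.tuples
                if all(t.get(attr) == val for attr, val in query.items())]

    def search(self, query):
        """What a client sees: at most k matches, plus an overflow flag.

        Assumption: results are ranked by insertion order, and the
        interface reveals whether more than k tuples matched.
        """
        matches = self.sel(query)
        return matches[:self.k], len(matches) > self.k


db = HiddenDB(
    tuples=[
        {"make": "Ford", "year": 2006, "drive": "4WD"},
        {"make": "Ford", "year": 2006, "drive": "2WD"},
        {"make": "Ford", "year": 2007, "drive": "4WD"},
        {"make": "Audi", "year": 2006, "drive": "4WD"},
    ],
    k=2,
)

# Three Fords match, but only k = 2 are returned and the query overflows.
shown, overflow = db.search({"make": "Ford"})
```

Because `search` hides everything beyond the first k matches, a client cannot read COUNT or SUM directly from the answers, which is why aggregates have to be estimated from many narrower queries.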
  • 8. Objectives of Aggregate Estimation
      • In this paper, we consider two types of aggregate estimation tasks over a dynamic hidden database:
          • Single-round aggregates
              • Defined over a single round
              • AVG, COUNT, SUM
          • Trans-round aggregates
              • Defined over the current round and the previous round
              • e.g., |D_i| − |D_{i−1}|
  • 10. Query Reissuing for Multiple Rounds
  • 15. Key Question: Reissue or Restart?
      • Example 1 (No change)
          • The queries issued by REISSUE-ESTIMATOR are always a subset of those issued by RESTART-ESTIMATOR
  • 16. Key Question: Reissue or Restart?
      • Example 2 (Total change)
          • REISSUE-ESTIMATOR might end up performing worse than RESTART-ESTIMATOR
  • 19. Problem of REISSUE-ESTIMATOR
      • Example (No Change)
          • One does not need to issue many queries before realizing that the database has changed little; the remaining query budget can then be reallocated to initiate new drill-downs.
      • Reservoir Sampling [V85]
          • How much the maintained sample should change depends on how much new data is inserted into the database.
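For readers unfamiliar with the reservoir sampling idea that slide 19 cites, here is a minimal sketch of the classic one-pass reservoir algorithm. It is not RS-ESTIMATOR itself, whose details the slides do not give; the function name `reservoir_sample` and its parameters are illustrative assumptions. It shows the property the slide relies on: the more items arrive, the more of the maintained sample gets replaced.

```python
import random


def reservoir_sample(stream, size, rng=None):
    """Keep a uniform random sample of `size` items from a stream.

    Classic one-pass reservoir sampling: the n-th item (1-based) enters
    the reservoir with probability size / n, evicting a uniformly chosen
    current member. Illustrative only; not the RS-ESTIMATOR algorithm.
    """
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= size:
            # The first `size` items fill the reservoir directly.
            reservoir.append(item)
        else:
            # j is uniform over [0, n); it lands in [0, size) with
            # probability size / n, in which case slot j is overwritten.
            j = rng.randrange(n)
            if j < size:
                reservoir[j] = item
    return reservoir


# A seeded generator makes the run repeatable.
sample = reservoir_sample(range(1000), 10, random.Random(0))
```

When few new items are appended to the stream, few slots are overwritten, so the sample stays largely intact; this mirrors the slide's point that the amount of change to the sample should track the amount of inserted data.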
  • 20. Key Ideas of RS-ESTIMATOR
  • 25. Conclusion and Future Work
      • A study of estimating aggregates over dynamic hidden web databases
          • Query reissuing
          • Bootstrapping-based query-plan adjustment
      • Future Work
          • How metadata such as COUNT can be used to guide the design of drill-downs in future rounds
          • Given a workload of aggregate queries, how to minimize the total query cost of estimating all of them
          • How to leverage both the keyword search and the form-like search interfaces provided by many web databases to further improve the performance of aggregate estimation
  • 26. References
      • [DJJ+10] A. Dasgupta, X. Jin, B. Jewell, N. Zhang, and G. Das. Unbiased Estimation of Size and Other Aggregates Over Hidden Web Databases. In SIGMOD, 2010.
      • [V85] J. S. Vitter. Random Sampling with a Reservoir. ACM Trans. Math. Softw., 11(1):37–57, Mar. 1985.