SlideShare a Scribd company logo
1 of 20
BY
K.RAJASEKHAR REDDY
        (08Q61A0528)
Contents:
 Introduction
 Wrappers
 Clustering
 System Description
 Working
 Types
 Advantages and Disadvantages
 Conclusion
Introduction:
STAVIES is a system for
 Information Extraction through
 Automatic Web Wrapper Using
 clustering Techniques.
STAVIES is used in:
 Automatic Information Discovery.


 Extraction of structured web data.
WRAPPERS
 Piece of software to extract the
 useful information from web data
 sources.

 Data extracted is referred as Structural
 Tokens.
Categories of Wrappers:
 Site Specific:
    Extracts information from a web
 pages
    or family of web pages.
 Generic wrappers:
   Can be applied to almost any page
   regardless of the structures.
CLUSTERING
Process of recognizing input data
 set in such a way that data points in
 same cluster are similar other than
 in different clusters.
Quality Evaluation Measures:
 Cluster Compactness:
 Evaluates how the subsets of input are redistributed
 by clustering system, compared with whole input set.
 Cluster Separation:
 Indicates overall dissimilarity among the output
 clusters.
System Description
 Two modules


     1.Transformation module

     2.Extraction module
Phases:
 Preparation Phase:
   1.Validation correction and XHTML
   generation.

    2.Tree transformation and Terminal
      node selecton
• Segmentation Phase:
   1. Nodes Comparison.

   2. Hierarchical clustering.

   3. Cluster Evaluation and Target area
      Discover.

   4. Boundary selection.
• Information Retrieval Phase:

    1. Information Extraction component.
Working:
Experimental Results:
Types:
 OMINI



 MDR
Advantages:
 Executes in less than 0.4 sec.


 No human assistance is required.


 High performance.
Disadvantage:
 Hard to implement in free texts and
 non-template pages.
Conclusion
 STAVIES saves precious time and effort.
 Tested successfully in more than 63,000
  HTML pages from 50 different web
 data sources.
THANK YOU.
Queries????

More Related Content

Similar to stavies

Web Services: Encapsulation, Reusability, and Simplicity
Web Services: Encapsulation, Reusability, and SimplicityWeb Services: Encapsulation, Reusability, and Simplicity
Web Services: Encapsulation, Reusability, and Simplicityhannonhill
 
Cloud data management
Cloud data managementCloud data management
Cloud data managementambitlick
 
Web clustering engines
Web clustering enginesWeb clustering engines
Web clustering enginesYash Darak
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using wekaPrashant Menon
 
Automatic Analyzing System for Packet Testing and Fault Mapping
Automatic Analyzing System for Packet Testing and Fault MappingAutomatic Analyzing System for Packet Testing and Fault Mapping
Automatic Analyzing System for Packet Testing and Fault MappingIRJET Journal
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataeXascale Infolab
 
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICESKEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICESMykola Novik
 
Cluster computing ppt
Cluster computing pptCluster computing ppt
Cluster computing pptDC Graphics
 
clustering_classification.ppt
clustering_classification.pptclustering_classification.ppt
clustering_classification.pptHODECE21
 
Cassandra Applications Benchmarking
Cassandra Applications BenchmarkingCassandra Applications Benchmarking
Cassandra Applications Benchmarkingniallmilton
 
Database Analysis, OLAP, Aggregate Functions
Database Analysis, OLAP, Aggregate FunctionsDatabase Analysis, OLAP, Aggregate Functions
Database Analysis, OLAP, Aggregate FunctionsSaifur Rahman
 
Data Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large DeploymentsData Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large DeploymentsDenodo
 
Data has a better idea the in-memory data grid
Data has a better idea   the in-memory data gridData has a better idea   the in-memory data grid
Data has a better idea the in-memory data gridBogdan Dina
 

Similar to stavies (20)

Web Services: Encapsulation, Reusability, and Simplicity
Web Services: Encapsulation, Reusability, and SimplicityWeb Services: Encapsulation, Reusability, and Simplicity
Web Services: Encapsulation, Reusability, and Simplicity
 
Cluster computing
Cluster computingCluster computing
Cluster computing
 
Cloud data management
Cloud data managementCloud data management
Cloud data management
 
Clusters
ClustersClusters
Clusters
 
Web clustering engines
Web clustering enginesWeb clustering engines
Web clustering engines
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using weka
 
Automatic Analyzing System for Packet Testing and Fault Mapping
Automatic Analyzing System for Packet Testing and Fault MappingAutomatic Analyzing System for Packet Testing and Fault Mapping
Automatic Analyzing System for Packet Testing and Fault Mapping
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web Data
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
 
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICESKEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
 
Cluster computing ppt
Cluster computing pptCluster computing ppt
Cluster computing ppt
 
clustering_classification.ppt
clustering_classification.pptclustering_classification.ppt
clustering_classification.ppt
 
50120140504006
5012014050400650120140504006
50120140504006
 
Cassandra Applications Benchmarking
Cassandra Applications BenchmarkingCassandra Applications Benchmarking
Cassandra Applications Benchmarking
 
Database Analysis, OLAP, Aggregate Functions
Database Analysis, OLAP, Aggregate FunctionsDatabase Analysis, OLAP, Aggregate Functions
Database Analysis, OLAP, Aggregate Functions
 
Data Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large DeploymentsData Virtualization Deployments: How to Manage Very Large Deployments
Data Virtualization Deployments: How to Manage Very Large Deployments
 
Data has a better idea the in-memory data grid
Data has a better idea   the in-memory data gridData has a better idea   the in-memory data grid
Data has a better idea the in-memory data grid
 
cluster computing
cluster computingcluster computing
cluster computing
 
Intro to Databases
Intro to DatabasesIntro to Databases
Intro to Databases
 
Promostat original
Promostat originalPromostat original
Promostat original
 

More from Akhil Kumar

Edp section of solids
Edp  section of solidsEdp  section of solids
Edp section of solidsAkhil Kumar
 
Edp projection of solids
Edp  projection of solidsEdp  projection of solids
Edp projection of solidsAkhil Kumar
 
Edp projection of planes
Edp  projection of planesEdp  projection of planes
Edp projection of planesAkhil Kumar
 
Edp projection of lines
Edp  projection of linesEdp  projection of lines
Edp projection of linesAkhil Kumar
 
Edp ortographic projection
Edp  ortographic projectionEdp  ortographic projection
Edp ortographic projectionAkhil Kumar
 
Edp intersection
Edp  intersectionEdp  intersection
Edp intersectionAkhil Kumar
 
Edp ellipse by gen method
Edp  ellipse by gen methodEdp  ellipse by gen method
Edp ellipse by gen methodAkhil Kumar
 
Edp development of surfaces of solids
Edp  development of surfaces of solidsEdp  development of surfaces of solids
Edp development of surfaces of solidsAkhil Kumar
 
Edp typical problem
Edp  typical problemEdp  typical problem
Edp typical problemAkhil Kumar
 
Edp st line(new)
Edp  st line(new)Edp  st line(new)
Edp st line(new)Akhil Kumar
 
graphical password authentication
graphical password authenticationgraphical password authentication
graphical password authenticationAkhil Kumar
 

More from Akhil Kumar (20)

Edp section of solids
Edp  section of solidsEdp  section of solids
Edp section of solids
 
Edp scales
Edp  scalesEdp  scales
Edp scales
 
Edp projection of solids
Edp  projection of solidsEdp  projection of solids
Edp projection of solids
 
Edp projection of planes
Edp  projection of planesEdp  projection of planes
Edp projection of planes
 
Edp projection of lines
Edp  projection of linesEdp  projection of lines
Edp projection of lines
 
Edp ortographic projection
Edp  ortographic projectionEdp  ortographic projection
Edp ortographic projection
 
Edp isometric
Edp  isometricEdp  isometric
Edp isometric
 
Edp intersection
Edp  intersectionEdp  intersection
Edp intersection
 
Edp excerciseeg
Edp  excerciseegEdp  excerciseeg
Edp excerciseeg
 
Edp ellipse by gen method
Edp  ellipse by gen methodEdp  ellipse by gen method
Edp ellipse by gen method
 
Edp development of surfaces of solids
Edp  development of surfaces of solidsEdp  development of surfaces of solids
Edp development of surfaces of solids
 
Edp curves2
Edp  curves2Edp  curves2
Edp curves2
 
Edp curve1
Edp  curve1Edp  curve1
Edp curve1
 
Edp typical problem
Edp  typical problemEdp  typical problem
Edp typical problem
 
Edp st line(new)
Edp  st line(new)Edp  st line(new)
Edp st line(new)
 
graphical password authentication
graphical password authenticationgraphical password authentication
graphical password authentication
 
yii framework
yii frameworkyii framework
yii framework
 
cloud computing
cloud computingcloud computing
cloud computing
 
WORDPRESS
WORDPRESSWORDPRESS
WORDPRESS
 
AJAX
AJAXAJAX
AJAX
 

Recently uploaded

Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 

Recently uploaded (20)

Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

stavies