BY
K.RAJASEKHAR REDDY
        (08Q61A0528)
Contents:
 Introduction
 Wrappers
 Clustering
 System Description
 Working
 Types
 Advantages and Disadvantages
 Conclusion
Introduction:
STAVIES is a system for
 information extraction through
 automatic web wrapper generation using
 clustering techniques.
STAVIES is used in:
 Automatic information discovery.
 Extraction of structured web data.
WRAPPERS
 A wrapper is a piece of software that
 extracts useful information from web data
 sources.
 The extracted data is referred to as
 structural tokens.
Categories of Wrappers:
 Site-specific wrappers:
    Extract information from a single web
    page or a family of web pages.
 Generic wrappers:
    Can be applied to almost any page,
    regardless of its structure.
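
To make the site-specific category concrete, here is a minimal Python sketch of a wrapper that only understands one page layout. The class name `PriceWrapper`, the `price` CSS class and the sample markup are all hypothetical and are not part of STAVIES; a generic wrapper would have to discover this structure on its own instead of having it hard-coded.

```python
from html.parser import HTMLParser


class PriceWrapper(HTMLParser):
    """Site-specific wrapper: hard-codes one (hypothetical) page layout."""

    def __init__(self):
        super().__init__()
        self._in_price = False  # True while inside <span class="price">
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def extract_prices(html):
    """Run the wrapper over one page and return the extracted tokens."""
    parser = PriceWrapper()
    parser.feed(html)
    parser.close()
    return parser.prices


# Hypothetical page from the one site this wrapper was written for.
SAMPLE = (
    '<ul><li>Book <span class="price">$10</span></li>'
    '<li>Pen <span class="price">$2</span></li></ul>'
)
print(extract_prices(SAMPLE))
```

The wrapper returns nothing useful as soon as the site renames the `price` class, which is the usual weakness of site-specific wrappers.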
CLUSTERING
Process of organizing the input data
 set in such a way that data points in the
 same cluster are more similar to each other
 than to data points in different clusters.
Quality Evaluation Measures:
 Cluster Compactness:
 Evaluates how well the subsets of the input are redistributed
 by the clustering system, compared with the whole input set.
 Cluster Separation:
 Indicates the overall dissimilarity among the output
 clusters.
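
The two measures can be sketched in Python as follows. The slides do not give formulas, so the definitions below are assumptions chosen for illustration: compactness is taken as the average spread of each cluster relative to the spread of the whole input set, and separation as the average distance between cluster centroids. All function names are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(mean(axis) for axis in zip(*points))


def spread(points):
    """Mean distance of the points from their own centroid."""
    c = centroid(points)
    return mean(dist(p, c) for p in points)


def compactness(clusters):
    """Average cluster spread divided by the spread of the whole input set.

    Assumed formulation: lower values mean tighter clusters.
    Undefined (division by zero) if all input points are identical.
    """
    everything = [p for cluster in clusters for p in cluster]
    return mean(spread(c) for c in clusters) / spread(everything)


def separation(clusters):
    """Average distance between cluster centroids.

    Assumed formulation: higher values mean more dissimilar clusters.
    Needs at least two clusters.
    """
    centroids = [centroid(c) for c in clusters]
    return mean(dist(a, b) for a, b in combinations(centroids, 2))


# Two tight, well-separated clusters of 2-D points (made-up data).
CLUSTERS = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print("compactness:", compactness(CLUSTERS))
print("separation:", separation(CLUSTERS))
```

A good clustering under these assumed definitions combines a low compactness value with a high separation value.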
System Description
 Two modules:
     1. Transformation module
     2. Extraction module
Phases:
• Preparation Phase:
   1. Validation, correction and XHTML
      generation.
   2. Tree transformation and terminal
      node selection.
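
One way to picture the tree transformation and terminal node selection step is the Python sketch below. It assumes the page has already been corrected into well-formed XHTML (step 1), so a standard XML parser can build the tree, and it treats an element as terminal when it has no child elements and carries some text. The function name `terminal_nodes` and the sample page are hypothetical, and the real system may select nodes differently.

```python
import xml.etree.ElementTree as ET


def terminal_nodes(xhtml):
    """Return (path, text) for every element that has no child elements."""
    root = ET.fromstring(xhtml)
    found = []

    def walk(element, path):
        path = path + [element.tag]
        children = list(element)
        if not children:
            text = (element.text or "").strip()
            if text:  # ignore empty leaves such as <br/>
                found.append(("/".join(path), text))
        for child in children:
            walk(child, path)

    walk(root, [])
    return found


# Made-up, already well-formed page without XML namespaces.
PAGE = """<html><body>
  <div><h1>Catalog</h1></div>
  <table>
    <tr><td>Book</td><td>$10</td></tr>
    <tr><td>Pen</td><td>$2</td></tr>
  </table>
</body></html>"""

for path, text in terminal_nodes(PAGE):
    print(path, "->", text)
```

The path from the root to each terminal node is kept because the later phases compare nodes by their position in the tree.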
• Segmentation Phase:
   1. Node comparison.
   2. Hierarchical clustering.
   3. Cluster evaluation and target area
      discovery.
   4. Boundary selection.
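
The node comparison and hierarchical clustering steps can be sketched in Python as below. The slides do not define the similarity measure, so this sketch assumes one: two terminal nodes are more similar the more leading steps their root-to-node paths share. Clustering is plain single-linkage agglomerative merging with a stop threshold. The names `common_prefix_len`, `cluster_nodes` and `min_shared`, and the sample paths, are hypothetical.

```python
def common_prefix_len(a, b):
    """Number of leading path steps two nodes share (assumed similarity)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cluster_nodes(paths, min_shared):
    """Single-linkage agglomerative clustering of node paths.

    Repeatedly merges the two clusters whose closest members share the most
    path steps, and stops once no pair shares at least `min_shared` steps.
    """
    clusters = [[p] for p in paths]
    while len(clusters) > 1:
        best = None  # (similarity, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(common_prefix_len(a, b)
                          for a in clusters[i] for b in clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        if sim < min_shared:
            break
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


# Made-up terminal nodes: one heading and a two-row, two-column table.
NODES = [
    ("html", "body", "div", "h1"),
    ("html", "body", "table", "tr[1]", "td[1]"),
    ("html", "body", "table", "tr[1]", "td[2]"),
    ("html", "body", "table", "tr[2]", "td[1]"),
    ("html", "body", "table", "tr[2]", "td[2]"),
]

print(cluster_nodes(NODES, min_shared=3))  # coarse cut
print(cluster_nodes(NODES, min_shared=4))  # finer cut
```

In this toy example the coarse cut groups the four table cells apart from the heading, which loosely corresponds to finding a target area, while the finer cut splits the cells by row, which loosely corresponds to finding record boundaries.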
• Information Retrieval Phase:
   1. Information extraction component.
Working:
Experimental Results:
Types:
 OMINI
 MDR
Advantages:
 Executes in less than 0.4 sec.
 No human assistance is required.
 High performance.
Disadvantage:
 Hard to apply to free text and
 non-template pages.
Conclusion
 STAVIES saves precious time and effort.
 Tested successfully on more than 63,000
 HTML pages from 50 different web
 data sources.
THANK YOU.
Queries????
