Identifying Auxiliary Web Images Using Combinations of Analyses

Tewson Seeoun

Agenda
● Introduction
● Background
  ● Document Object Model (DOM) in HTML
  ● Support Vector Machine (SVM)
● Objective
● Methodology
● Results
● Discussion / Future Work
● Conclusion
● Acknowledgement

Introduction

● Websites contain images.
● Some images are not necessary.
  ● Search Engine Indexing
  ● Printing
● Ignoring them is sometimes economical and green.

Background - DOM

● Web browsers / layout engines parse HTML / CSS / JavaScript into the DOM.
● The DOM represents things (elements) in a Web page.
● An element has properties (position, size, etc.).
● JavaScript can read the DOM.
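As a rough illustration of elements carrying properties, the sketch below uses Python's standard `html.parser` module to list the `img` elements in a small HTML string together with their declared `width` and `height` attributes. This is a simplification: a layout engine such as the PyQtWebKit one named in the slides computes position and size after CSS and JavaScript have been applied, which a plain parser cannot do. The sample markup and the names `ImgCollector`, `collect_images`, and `SAMPLE_HTML` are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


def _to_int(value):
    """Return value as an int when it is a plain digit string, else None."""
    return int(value) if value is not None and value.isdigit() else None


class ImgCollector(HTMLParser):
    """Collects selected attributes of every <img> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img":
            a = dict(attrs)
            self.images.append({
                "src": a.get("src"),
                "width": _to_int(a.get("width")),
                "height": _to_int(a.get("height")),
            })


def collect_images(html):
    """Parse an HTML string and return one dict per <img> element."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.images


# Made-up markup for illustration only.
SAMPLE_HTML = """
<html><body>
  <img src="logo.png" width="120" height="40">
  <img src="spacer.gif" width="1" height="1">
  <img src="photo.jpg">
</body></html>
"""

images = collect_images(SAMPLE_HTML)
for image in images:
    print(image)
```

An image without declared dimensions (the third one here) yields `None` for both values, which is one reason a rendered DOM is more useful than raw markup for this task.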

Background - SVM

● SVM is a supervised machine learning algorithm.
● SVM is used for statistical pattern recognition.

Objective (for now)

To recognize patterns of auxiliary Web images quickly using DOM analysis and basic image processing

Methodology

[Pipeline diagram. Components shown: HTML, CSS, JS, PyQtWebKit, DOM, jQuery, IMG Files, Python, PIL, OpenCV, Tesseract, Page Level Features, Domain Level Features, Image Level Features, Labels, MySQL]

Methodology (continued)
● Image Level Features
  ● No. of Colors
  ● No. of Human Faces
  ● No. of Alphabetic Characters
● Page Level Features
  ● Position
  ● Dimension
  ● No. of Images with Similar Dimension
● Domain Level Features
  ● External / Internal Links
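Two of the listed features can be sketched in plain Python. The slides do not say how "similar dimension" is measured or how a link is judged external, so this sketch makes two assumptions: two images are similar when both width and height differ by at most a small pixel tolerance, and a link is external when it resolves to a different hostname than the page. The function names and sample values are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def count_similar_dimensions(dimensions, index, tolerance=2):
    """Count the other images whose width and height are both within
    `tolerance` pixels of dimensions[index]. The tolerance is an assumption."""
    width, height = dimensions[index]
    return sum(
        1
        for i, (other_w, other_h) in enumerate(dimensions)
        if i != index
        and abs(other_w - width) <= tolerance
        and abs(other_h - height) <= tolerance
    )


def is_external(page_url, link_url):
    """True when link_url, resolved against page_url, points to another host."""
    target = urljoin(page_url, link_url)
    return urlparse(target).hostname != urlparse(page_url).hostname


# Made-up (width, height) pairs: three button-sized images and one photo.
dims = [(88, 31), (88, 31), (90, 30), (600, 400)]
print(count_similar_dimensions(dims, 0))
print(count_similar_dimensions(dims, 3))

page = "http://example.com/a/index.html"
print(is_external(page, "img/photo.jpg"))
print(is_external(page, "http://ads.example.net/banner.gif"))
```

Comparing hostnames exactly treats `www.example.com` and `example.com` as different sites; a real system would probably need a more forgiving rule.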

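Methodology (continued)

[Diagram: the labeled data in MySQL is split at random. 80% (500/626) goes to SVM (Train), which produces a Model. The remaining 20% goes to SVM (Predict) with that Model, which produces the Results.]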
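The random split can be sketched with the standard library alone. The counts come from the slide (500 of 626 labeled images for training, which is just under 80%). The integer records, the fixed seed, and the name `split_train_test` are assumptions made for this example.

```python
import random


def split_train_test(records, n_train, seed=0):
    """Shuffle a copy of records and return (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Integers stand in for the labeled image rows stored in the database.
records = list(range(626))
train, test = split_train_test(records, n_train=500)
print(len(train), len(test))
```

Every record lands in exactly one of the two lists, so no test image is ever seen during training.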

Results

10-fold Cross-Validation (10 Experiments)
Average Accuracy = 84.92%
After Applying Grid-Search Technique
Average Accuracy = 93.17%
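The two evaluation steps, k-fold cross-validation and grid search over hyperparameters, can be sketched as follows. To keep the example self-contained, a small k-nearest-neighbour classifier stands in for the SVM, and its `k` plays the role that the SVM's hyperparameters would play in the real system. The synthetic two-cluster data, the parameter grid, and all function names are assumptions, so the printed accuracy has no relation to the figures on the slide.

```python
import itertools
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and cut it into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (stand-in classifier)."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def cross_val_accuracy(xs, ys, k_folds, **params):
    """Average accuracy over k_folds held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k_folds):
        held = set(held_out)
        train_x = [xs[i] for i in range(len(xs)) if i not in held]
        train_y = [ys[i] for i in range(len(xs)) if i not in held]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], **params) == ys[i]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def grid_search(xs, ys, grid, k_folds=10):
    """Evaluate every parameter combination; return (best_params, best_score)."""
    names = sorted(grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(xs, ys, k_folds, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic data: two Gaussian clusters labeled 0 and 1.
rng = random.Random(1)
xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
xs += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(50)]
ys = [0] * 50 + [1] * 50

best_params, best_score = grid_search(xs, ys, {"k": [1, 3, 5, 7]})
print(best_params, round(best_score, 3))
```

Because every candidate is scored on the same folds, the selected parameters are by construction at least as accurate as any other grid point under this measure, which is the kind of gain the slide reports after grid search.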

Discussion
● Some pages cannot be parsed.
  ● Frames and redirections
● Positions can be miscalculated.
  ● JavaScript used in displaying images
  ● CSS sprites
● Tesseract is not well-tuned.
  ● Small images have to be magnified, but how much?
● Downloading images for processing is a bottleneck.
● Features are not weighted.
● The definition of “auxiliary image” is subjective.

Future Work

● Context Analysis
● Weighted Features
● Adaptive Page Analysis (Website Categorization)
● Techniques Evaluation / Optimization

Conclusion

Layout analysis and basic image processing techniques alone perform well (93.17% average accuracy after grid search), but the system can still be improved.

Acknowledgement

● NSTDA, NECTEC, and YSTP program
● Dr. Choochart Haruechaiyasak
● Dr. Toshiaki Kondo
● Mr. Krikamol Muendet
● And Many Others...


1. Identifying Auxiliary Web Images Using Combination of Analyses
Tewson Seeoun, Sirindhorn International Institute of Technology
With Guidance From
● Asst. Prof. Dr. Toshiaki Kondo, Sirindhorn International Institute of Technology
● Dr. Choochart Haruechaiyasak, Human Language Technology Laboratory, NECTEC, NSTDA
