Identifying Legality of Japanese Online Advertisements
using Complex-valued Support Vector Machine
with DFT-based Document Features
The Graduate School of Arts and Sciences, The Open University of Japan
Satoshi Kawamoto
Background of this study
• Issues in Web Advertising
• Problematic expressions
• Violating the Pharmaceutical Affairs Law
• Wording such as "physician endorsed"
• Expressions related to patents
• Need for a system to determine the validity of advertisements
• As the market expands, manual screening is becoming difficult.
• Benefits of implementing a discriminant model
• Ad-serving companies
• Reduce the manual workload of screening
• Advertisers
• Reduce risk of brand damage
• Media
• Prevent users from leaving
Related Work
• Determining Legality (Chinese Advertisements)
• SVM + Weighting of word vectors (Y. Tang et al.)
• Word weighting using log-frequency ratio (weighted binary vector)
• Highlight words that occur frequently in problematic documents.
• Issues
• No word order information
• It’s unclear why weighting is effective.
• Dependency-based CNN (H. Huang et al.)
• Word Embedding + Syntactic Structure
• Overall, CNN works better than SVM when categorizing Chinese ads
• Issues
• Difficulty in tuning parameters
• Requires a relatively large amount of data
Definition of problematic advertisements (Prohibited Expressions)
• Problematic under the Pharmaceutical Affairs Law
• Restrictions on expressions related to efficacy and safety
• “Fine lines and wrinkles will disappear”, “Anti-aging effect will be obtained”
• Restrictions on efficacy guarantee expressions
• Historical phrases such as "proven to be effective over a period of 100 years"
• Presenting clinical data or experimental results as examples
• Wording that guarantees effectiveness.
• Restrictions on wording regarding ingredients and raw materials
• Mentioning ingredients and raw materials without indicating their purpose
• Wording that may imply pharmacological effects
• Restrictions on slanderous advertising of other companies' products
• Restrictions on recommendations from pharmaceutical professionals
Words likely to appear in problematic ads
[Figure: Tang's weighting]
Words related to pharmaceuticals are scattered throughout.
An advertisement containing these words may violate the Pharmaceutical Affairs Law;
however, merely containing these words does not make it illegal.
Distribution of log-frequency ratios for each part of speech
[Figure: distribution of log-frequency ratios by part of speech]
High variance in verbs and nouns.
Among nouns, there are many outliers.
(Problematic documents often include problematic nouns.)
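The log-frequency ratio behind this weighting can be sketched as follows. The slides do not give the exact formula, so the form used here (the log of the smoothed relative frequency of a word in problematic documents over its smoothed relative frequency in legal documents) is an assumption, as are the function name, the smoothing constant `alpha`, and the toy documents.

```python
import math
from collections import Counter

def log_frequency_ratio(problematic_docs, legal_docs, alpha=1.0):
    """Return {word: log-frequency ratio} for tokenized documents.

    Assumed form: log of the smoothed relative frequency in problematic
    documents divided by the smoothed relative frequency in legal documents.
    """
    pos = Counter(w for doc in problematic_docs for w in doc)
    neg = Counter(w for doc in legal_docs for w in doc)
    vocab = set(pos) | set(neg)
    # Additive smoothing keeps the ratio finite for words seen in one class only.
    pos_total = sum(pos.values()) + alpha * len(vocab)
    neg_total = sum(neg.values()) + alpha * len(vocab)
    return {
        w: math.log(((pos[w] + alpha) / pos_total) / ((neg[w] + alpha) / neg_total))
        for w in vocab
    }

# Toy, made-up token lists (not data from the study).
problematic = [["wrinkles", "disappear", "cream"], ["doctor", "recommends", "cream"]]
legal = [["moisturizing", "cream"], ["gentle", "cream", "skin"]]
ratios = log_frequency_ratio(problematic, legal)
```

In this toy example, words seen only in the problematic documents get a positive ratio, words seen only in the legal documents get a negative one, and a word common to both classes stays close to zero.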
Where do the words with a large log-frequency ratio appear?
[Figure: word frequency versus relative word position, from the start of a sentence to the end of a sentence]
Words used very frequently in problematic documents
tend to occur near the center of the sentence.
Where do the words with a large log-frequency ratio appear?
[Figure: word frequency versus relative word position, from the start of a sentence to the end of a sentence]
Words that are likely to appear in problematic documents
tend to appear near the start, the center, or the end of
the sentence.
There is no significant bias in the location of the words.
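A positional analysis of this kind can be sketched as a histogram over relative word positions. The definition of relative position (index divided by sentence length minus one), the number of bins, and the toy sentences are assumptions made for illustration; the slides do not state how the plots were produced.

```python
def relative_position_histogram(sentences, target_words, bins=5):
    """Count occurrences of target words per relative-position bin.

    Relative position is assumed to be index / (length - 1), so 0.0 is the
    start of a sentence and 1.0 is the end.
    """
    counts = [0] * bins
    for tokens in sentences:
        n = len(tokens)
        for i, word in enumerate(tokens):
            if word in target_words:
                rel = i / (n - 1) if n > 1 else 0.0
                # Clamp so that rel == 1.0 falls into the last bin.
                counts[min(int(rel * bins), bins - 1)] += 1
    return counts

# Toy, made-up sentences; "doctor" stands in for a heavily weighted word.
sentences = [
    ["doctor", "says", "this", "cream", "works"],
    ["this", "cream", "is", "doctor", "approved"],
    ["try", "the", "cream", "from", "doctor"],
]
hist = relative_position_histogram(sentences, {"doctor"}, bins=5)
```

Passing the set of words whose log-frequency ratio exceeds some threshold as `target_words` would give one histogram per threshold, which can then be compared across thresholds.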
Effective document vector for classification of advertising documents
1. Certain words (e.g., words related to medical science) are more likely to appear in problematic documents.
Statistical information is effective.
2. Words with a large log-frequency ratio tend to appear in characteristic locations.
- Likely to appear in specific locations in a sentence
- Some words appear periodically
Word order information and periodic information are effective.
Features combining word weighting and discrete Fourier transform
If "word weighting," "word order information," and "periodic information"
are embedded into document vectors, discriminant models will be able to
categorize advertisements accurately.
How to create DFT-based document vector
[Diagram: the example sentence 今 (now) 話題 (hot topic) の [particle] ふるさと (hometown) 納税 (tax payments) passes through word2vec, word weighting (giving weighted embeddings), DFT, and random projection]
DFT components:
• No rotation: statistical information (SWEM-Aver)
• One rotation: word-order information
• Two rotations: periodic information
• ...
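The DFT step can be sketched as below: each DFT component multiplies the weighted embedding at each word position by a point on the unit circle that completes k rotations over the sentence, then averages. This is a minimal sketch under assumptions, not the author's implementation. The normalization by sentence length, the number of components, the function name, and the toy vectors are all assumed, and the random projection stage is omitted.

```python
import cmath

def dft_document_vector(embeddings, weights, num_components=3):
    """Concatenate the first DFT components of a weighted embedding sequence.

    embeddings: list of N word vectors (lists of floats, equal length D)
    weights:    list of N word weights
    Returns a list of num_components * D complex numbers.
    Component k uses k rotations around the unit circle over the sentence.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    doc = []
    for k in range(num_components):
        for d in range(dim):
            total = 0j
            for pos in range(n):
                rotation = cmath.exp(-2j * cmath.pi * k * pos / n)
                total += weights[pos] * embeddings[pos][d] * rotation
            doc.append(total / n)
    return doc

# Toy 2-dimensional "embeddings" for a 4-word sentence (made-up numbers).
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 2.0]]
w = [1.0, 1.0, 1.0, 1.0]
vec = dft_document_vector(emb, w)
vec_reversed = dft_document_vector(emb[::-1], w)
```

With no rotation (k = 0) the component reduces to the weighted average of the embeddings, so it is identical for `vec` and `vec_reversed`. The components with one or more rotations differ when the word order changes, which is how word-order information enters the vector.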
Outline of Complex-valued SVM (CV-SVM)
Discriminant Function
[Equation: discriminant function, with labeled terms: document vector, basis function, bias]
[Figure: legal documents and illegal documents in the complex plane (axes Re and Im)]
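The symbols of the discriminant function are not legible in this transcript. A generic form consistent with the three labeled terms (document vector, basis function, bias) is sketched below. The symbols $\mathbf{z}$, $\boldsymbol{\phi}$, and $b$, and the weight vector $\mathbf{w}$, are assumptions and are not taken from the slide.

```latex
f(\mathbf{z}) = \mathbf{w}^{\mathsf{H}} \boldsymbol{\phi}(\mathbf{z}) + b,
\qquad \mathbf{z} \in \mathbb{C}^{d},\quad b \in \mathbb{C}
```

Under this assumed form the output $f(\mathbf{z})$ is itself complex, which fits a figure that places legal and illegal documents on Re and Im axes.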
Simulation using holdout method
• Data
• Cosmetics advertisements
• Illegal: 3008, Legal: 8103
• How to divide the data
• Training (50%)
• Negative examples are downsampled to the same number as positive examples.
• Validation (25%)
• Adjust SVM parameters for a higher F-measure (RBF kernel), balancing Precision and Recall
• Test (25%)
• Evaluation of discrimination performance
• Numerical evaluation (Accuracy, Precision, Recall, F-measure)
• Model & feature
• SVM: SWEM-Aver
• CNN: word2vec
• Complex-valued SVM (CV-SVM): SWEM-Aver, DFT-based feature
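The data division and the evaluation measures can be sketched as follows. This is a sketch under assumptions: illegal advertisements are treated as the positive class, the split is a random shuffle with a fixed seed, and the function names are hypothetical. Only the class counts and the 50/25/25 proportions come from the slide.

```python
import random

def holdout_split(labels, seed=0):
    """Split indices 50/25/25 and downsample negatives in the training part.

    labels: list of 1 (illegal, assumed positive) or 0 (legal, assumed negative).
    """
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    half, three_quarters = len(idx) // 2, (3 * len(idx)) // 4
    train, valid, test = idx[:half], idx[half:three_quarters], idx[three_quarters:]
    pos = [i for i in train if labels[i] == 1]
    neg = [i for i in train if labels[i] == 0]
    # Keep only as many negatives as there are positives in the training part.
    neg = rng.sample(neg, min(len(pos), len(neg)))
    return pos + neg, valid, test

def precision_recall_f(y_true, y_pred):
    """Precision, recall, and F-measure for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Class counts from the slide: 3008 illegal, 8103 legal.
labels = [1] * 3008 + [0] * 8103
train, valid, test = holdout_split(labels)
```

Only the training part is balanced. The validation and test parts keep the original class ratio, so Precision and Recall measured on them reflect the imbalance a deployed screening system would face.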
Simulation Results
Word order information and periodic information improve discrimination performance.
Accuracy improves when [...]
Both Precision and Recall are good.
Discussion of simulation results
• Combination of word vector weighting and DFT results in a high F-measure
• Benefits of weighting
• Higher Accuracy and Precision
• Benefits of DFT
• Achieves both Precision and Recall at a high level (>0.75)
• Why was this simulation result obtained?
• The position of words with a high log-frequency ratio is characteristic.
• They tend to appear at the beginning, at the center, or at the end of a sentence.
• Word order information is embedded.
• Words with a cycle of about half the sentence length are emphasized.
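The last point can be illustrated with a toy calculation. If heavily weighted words sit at the beginning, the center, and the end of a sentence, the weight profile repeats about every half sentence, and the DFT component with two rotations picks this up. The 8-word weight profile below is hypothetical and not taken from the study.

```python
import cmath

def dft_magnitudes(signal, num_components=4):
    """Magnitudes |X_k| of the first DFT components of a real sequence."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * pos / n)
                for pos, x in enumerate(signal)))
        for k in range(num_components)
    ]

# Hypothetical weight profile over an 8-word sentence: heavily weighted words
# at the beginning, the center, and the end.
weights = [1, 0, 0, 0, 1, 0, 0, 1]
mags = dft_magnitudes(weights)
```

For this profile the two-rotation component (`mags[2]`, period of half the sentence) is larger than the one-rotation component (`mags[1]`), so a feature built from it emphasizes exactly this placement pattern.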
Summary
• Survey of characteristics of advertising documents
• Characteristics of words that appear in problematic documents
• Certain nouns and verbs are more likely to appear.
• There is a large bias in the position of the words.
• Discrimination Simulations
• Word weighting is highly effective.
• Discrimination performance is high when DFT and word weighting are combined.
• CV-SVM can handle complex-valued vectors and has high generalization performance.
• Future work
• Can this model also discriminate non-cosmetic advertisements?
• Is this model effective for general discrimination tasks?
• Need to compare with more recent models, such as BERT
Thank you!

20211115 jsai international_symposia_slide


Editor's Notes

  1. Introduce Tang's metric as related work [TODO]