“Extra” by Jeremy Brooks https://flic.kr/p/4aKH3c
EXTRA and FRANCIS
Stuart Myles * Associated Press * 24th April 2018
© 2018 IPTC (www.iptc.org) All rights reserved
https://flic.kr/p/fBshW3
https://flic.kr/p/atFSAr
Rules-Based Classification
• Rules better for breaking news than statistical methods
– You don’t need 50 examples before you can start tagging
– A rule for a new topic doesn’t require other rules to change
• More consistent and scalable than hand tagging
• Easier to explain why rules classify content
– Machine learning methods can be “black boxes”
– Easier to precisely explain - and correct - mistakes
EXTRA
EXTraction Rules Apparatus
Rules-based classification of text
Open source software https://iptc.github.io/extra/
EXTRA was developed by the IPTC
€50,000 Grant from the Digital News Initiative
https://www.digitalnewsinitiative.com/fund/
You can use your own taxonomy, rules and formats
- Example rules help us drive development of the EXTRA system
- You can use the example rules to see how to develop your own
- Rules could apply IPTC Media Topics or any other taxonomy
Development Process
The EXTRA software was developed by Infalia
- All software is open source
Two linguists creating rules in English and German
- Sample rules to apply IPTC Media Topics
Example news corpora licensed for EXTRA
- English from Thomson Reuters
- German from APA
EXTRA Components
• Elasticsearch Percolator + Custom Code
• Classification
• Rule authoring
• Corpus Testing
• Schema Management
Classification using Percolator
• Elasticsearch
– A sophisticated, open source full-text search engine
– Lets you query documents stored in an index
• Elasticsearch Percolator
– Store queries in an index and match documents to queries
– Classification uses the percolator to match documents to rules
• EXTRA Rule Language
– Rule-writer-friendly language (easier than ES DSL)
– Access to all ES features, plus custom operators
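As a rough sketch of the percolator flow described above, the Python snippet below builds the three JSON request bodies involved: a mapping with a `percolator` field, a stored query standing in for a rule, and a `percolate` search that asks which stored queries match an incoming document. The index layout, the fields `topic` and `body`, and the sample rule are assumptions for illustration. The mapping is written in the typeless style of recent Elasticsearch versions; older 6.x releases nest it under a mapping type. How EXTRA translates its own rule language into such queries is not shown here.

```python
import json

# Hypothetical index layout: one percolator field holding the stored query,
# plus the document field(s) that the stored queries refer to.
RULES_INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "topic": {"type": "keyword"},
            "body": {"type": "text"},
        }
    }
}


def stored_rule(topic, phrase):
    """Body for indexing one rule: a phrase query tagged with its topic."""
    return {"topic": topic, "query": {"match_phrase": {"body": phrase}}}


def percolate_request(document):
    """Search body asking which stored queries match the given document."""
    return {"query": {"percolate": {"field": "query", "document": document}}}


rule = stored_rule("politics", "angela merkel")
request = percolate_request({"body": "Angela Merkel spoke in Berlin today."})
print(json.dumps(request, indent=2))
```

The point of the design is the inversion: rules are indexed once as documents, and each incoming story is sent as a query against them, so the hits returned are the rules (and therefore the topics) that apply.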
Schema and Rules Example
• Two fields - headline and body - with body allowed to be
queried by paragraph
headline
body
body_paragraph
• A rule to require that “angela merkel” and “us elections”
appear in the same paragraph
(prox/unit=paragraph/distance=1
(body adj "angela merkel")
(body adj "us elections")
)
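To make the intent of the rule above concrete, here is a minimal Python sketch of the check it describes: both phrases must occur, as adjacent words, within one paragraph of the body. This is not EXTRA's implementation, which runs rules through Elasticsearch. The helper names, the blank-line paragraph split and the simple tokenizer are all assumptions for illustration.

```python
import re


def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(paragraph_tokens, phrase):
    """True if the words of `phrase` occur consecutively and in order."""
    words = tokens(phrase)
    n = len(words)
    return any(
        paragraph_tokens[i:i + n] == words
        for i in range(len(paragraph_tokens) - n + 1)
    )


def same_paragraph(body, phrase_a, phrase_b):
    """True if some paragraph of `body` contains both phrases."""
    for paragraph in re.split(r"\n\s*\n", body):
        toks = tokens(paragraph)
        if contains_phrase(toks, phrase_a) and contains_phrase(toks, phrase_b):
            return True
    return False


# Both phrases in the first paragraph: the rule should match.
body = (
    "Angela Merkel commented on the US elections on Tuesday.\n\n"
    "Markets were calm."
)
# Phrases in different paragraphs: the rule should not match.
split_body = (
    "Angela Merkel visited Paris.\n\n"
    "The US elections dominated headlines."
)
print(same_paragraph(body, "angela merkel", "us elections"))
print(same_paragraph(split_body, "angela merkel", "us elections"))
```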
FRANCIS*
Using machine learning to empower rule-based
classification of news with semantics.
• “aboutness” evaluation
– Given that a story is about a topic, how much is it about it?
• Rule suggestion
– Suggest rules based on a pre-tagged corpus
• Enriched rule operators
– For example, nested “count” operators
– Using EXTRA as the foundation
* St Francis de Sales is the patron saint of writers and journalists

More Related Content

What's hot

Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)OpenAIRE
 
New PID developments
New PID developmentsNew PID developments
New PID developmentsOpenAIRE
 
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...OpenAIRE
 
Role of PIDs in connecting scholarly works
Role of PIDs in connecting scholarly worksRole of PIDs in connecting scholarly works
Role of PIDs in connecting scholarly worksOpenAIRE
 
IPTC Semantic Web 2012 Spring Working Group
IPTC Semantic Web 2012 Spring Working GroupIPTC Semantic Web 2012 Spring Working Group
IPTC Semantic Web 2012 Spring Working GroupStuart Myles
 
Fair - Interoperability - Keith Russell
Fair  - Interoperability - Keith RussellFair  - Interoperability - Keith Russell
Fair - Interoperability - Keith RussellARDC
 
ICIC 2013 New Product Introductions max.recall
ICIC 2013 New Product Introductions max.recallICIC 2013 New Product Introductions max.recall
ICIC 2013 New Product Introductions max.recallDr. Haxel Consult
 
EDI Training Module 12: Learn to Cite and Link Your Data
EDI Training Module 12:  Learn to Cite and Link Your DataEDI Training Module 12:  Learn to Cite and Link Your Data
EDI Training Module 12: Learn to Cite and Link Your DataEnvironmental Data Initiative
 
New Product Introductions - Minesoft
New Product Introductions - MinesoftNew Product Introductions - Minesoft
New Product Introductions - MinesoftDr. Haxel Consult
 
Rajendra Akerkar - LeMO Project
Rajendra Akerkar - LeMO ProjectRajendra Akerkar - LeMO Project
Rajendra Akerkar - LeMO ProjectBigData_Europe
 
ICIC 2014 New Product Introduction Minesoft
ICIC 2014 New Product Introduction MinesoftICIC 2014 New Product Introduction Minesoft
ICIC 2014 New Product Introduction MinesoftDr. Haxel Consult
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryDr. Haxel Consult
 
The importance of research data repositories
The importance of research data repositoriesThe importance of research data repositories
The importance of research data repositoriesVarsha Khodiyar
 
EDI Training Module 10: EDI Data Repository Overview
EDI Training Module 10:  EDI Data Repository OverviewEDI Training Module 10:  EDI Data Repository Overview
EDI Training Module 10: EDI Data Repository OverviewEnvironmental Data Initiative
 
2013 CrossRef Workshops Boot Camp CrossCheck Susan Collins
2013 CrossRef Workshops Boot Camp CrossCheck Susan Collins2013 CrossRef Workshops Boot Camp CrossCheck Susan Collins
2013 CrossRef Workshops Boot Camp CrossCheck Susan CollinsCrossref
 
Research Data Shared Services
Research Data Shared ServicesResearch Data Shared Services
Research Data Shared ServicesJisc RDM
 

What's hot (20)

II-SDV 2016 Minesoft
II-SDV 2016 MinesoftII-SDV 2016 Minesoft
II-SDV 2016 Minesoft
 
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
 
New PID developments
New PID developmentsNew PID developments
New PID developments
 
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
 
Role of PIDs in connecting scholarly works
Role of PIDs in connecting scholarly worksRole of PIDs in connecting scholarly works
Role of PIDs in connecting scholarly works
 
IPTC Semantic Web 2012 Spring Working Group
IPTC Semantic Web 2012 Spring Working GroupIPTC Semantic Web 2012 Spring Working Group
IPTC Semantic Web 2012 Spring Working Group
 
II-SDV 2016 RightsDirect
II-SDV 2016 RightsDirectII-SDV 2016 RightsDirect
II-SDV 2016 RightsDirect
 
Fair - Interoperability - Keith Russell
Fair  - Interoperability - Keith RussellFair  - Interoperability - Keith Russell
Fair - Interoperability - Keith Russell
 
ICIC 2013 New Product Introductions max.recall
ICIC 2013 New Product Introductions max.recallICIC 2013 New Product Introductions max.recall
ICIC 2013 New Product Introductions max.recall
 
EDI Training Module 12: Learn to Cite and Link Your Data
EDI Training Module 12:  Learn to Cite and Link Your DataEDI Training Module 12:  Learn to Cite and Link Your Data
EDI Training Module 12: Learn to Cite and Link Your Data
 
New Product Introductions - Minesoft
New Product Introductions - MinesoftNew Product Introductions - Minesoft
New Product Introductions - Minesoft
 
Rajendra Akerkar - LeMO Project
Rajendra Akerkar - LeMO ProjectRajendra Akerkar - LeMO Project
Rajendra Akerkar - LeMO Project
 
ICIC 2014 New Product Introduction Minesoft
ICIC 2014 New Product Introduction MinesoftICIC 2014 New Product Introduction Minesoft
ICIC 2014 New Product Introduction Minesoft
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
 
The importance of research data repositories
The importance of research data repositoriesThe importance of research data repositories
The importance of research data repositories
 
Feick "Institutional Identifier - Soon To Be a Reality - I2"
Feick "Institutional Identifier - Soon To Be a Reality - I2"Feick "Institutional Identifier - Soon To Be a Reality - I2"
Feick "Institutional Identifier - Soon To Be a Reality - I2"
 
EDI Training Module 10: EDI Data Repository Overview
EDI Training Module 10:  EDI Data Repository OverviewEDI Training Module 10:  EDI Data Repository Overview
EDI Training Module 10: EDI Data Repository Overview
 
2013 CrossRef Workshops Boot Camp CrossCheck Susan Collins
2013 CrossRef Workshops Boot Camp CrossCheck Susan Collins2013 CrossRef Workshops Boot Camp CrossCheck Susan Collins
2013 CrossRef Workshops Boot Camp CrossCheck Susan Collins
 
20110606 portal site
20110606 portal site20110606 portal site
20110606 portal site
 
Research Data Shared Services
Research Data Shared ServicesResearch Data Shared Services
Research Data Shared Services
 

Similar to IPTC EXTRA Spring 2018

IPTC EXTRA and EXTRA+ November 2017
IPTC EXTRA and EXTRA+ November 2017IPTC EXTRA and EXTRA+ November 2017
IPTC EXTRA and EXTRA+ November 2017Stuart Myles
 
EXTRA Open Source Rules Classification for News
EXTRA Open Source Rules Classification for NewsEXTRA Open Source Rules Classification for News
EXTRA Open Source Rules Classification for NewsStuart Myles
 
IPTC Rights Statements For News
IPTC Rights Statements For NewsIPTC Rights Statements For News
IPTC Rights Statements For NewsStuart Myles
 
IPTC Rights Expression Working Group Spring 2016
IPTC Rights Expression Working Group Spring 2016IPTC Rights Expression Working Group Spring 2016
IPTC Rights Expression Working Group Spring 2016Stuart Myles
 
Update on IPTC's EXTRA Open Source Classification Engine
Update on IPTC's EXTRA Open Source Classification EngineUpdate on IPTC's EXTRA Open Source Classification Engine
Update on IPTC's EXTRA Open Source Classification EngineStuart Myles
 
FIWARE Training: Introduction to Smart Data Models
FIWARE Training: Introduction to Smart Data ModelsFIWARE Training: Introduction to Smart Data Models
FIWARE Training: Introduction to Smart Data ModelsFIWARE
 
Semantic web technologies applied to bioinformatics and laboratory data manag...
Semantic web technologies applied to bioinformatics and laboratory data manag...Semantic web technologies applied to bioinformatics and laboratory data manag...
Semantic web technologies applied to bioinformatics and laboratory data manag...Toni Hermoso Pulido
 
The PeriCAT Framework
The PeriCAT FrameworkThe PeriCAT Framework
The PeriCAT FrameworkPERICLES_FP7
 
IPTC Rights Expression Working Group Spring 2014
IPTC Rights Expression Working Group Spring 2014IPTC Rights Expression Working Group Spring 2014
IPTC Rights Expression Working Group Spring 2014Stuart Myles
 
II-PIC 2017: Porduct presentation minesoft
II-PIC 2017: Porduct presentation minesoftII-PIC 2017: Porduct presentation minesoft
II-PIC 2017: Porduct presentation minesoftDr. Haxel Consult
 
IPTC EXTRA Open Source Classification Workshop
IPTC EXTRA Open Source Classification WorkshopIPTC EXTRA Open Source Classification Workshop
IPTC EXTRA Open Source Classification WorkshopStuart Myles
 
Introduction to IPTC Rights - RightsML and ODRL
Introduction to IPTC Rights - RightsML and ODRLIntroduction to IPTC Rights - RightsML and ODRL
Introduction to IPTC Rights - RightsML and ODRLStuart Myles
 
CLARIN Component Metadata Infrastructure
CLARIN Component Metadata Infrastructure CLARIN Component Metadata Infrastructure
CLARIN Component Metadata Infrastructure EOSC-hub project
 
Where Open Source Meets Audit Analytics - ISACA North America CACS 2017
Where Open Source Meets Audit Analytics - ISACA North America CACS 2017Where Open Source Meets Audit Analytics - ISACA North America CACS 2017
Where Open Source Meets Audit Analytics - ISACA North America CACS 2017Andrew Clark
 
Commodity Semantic Search: A Case Study of DiscoverEd
Commodity Semantic Search: A Case Study of DiscoverEdCommodity Semantic Search: A Case Study of DiscoverEd
Commodity Semantic Search: A Case Study of DiscoverEdNathan Yergler
 
Algorithm Procedure and Pseudo Code Mining
Algorithm Procedure and Pseudo Code MiningAlgorithm Procedure and Pseudo Code Mining
Algorithm Procedure and Pseudo Code MiningIRJET Journal
 
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...OSTHUS
 

Similar to IPTC EXTRA Spring 2018 (20)

IPTC EXTRA and EXTRA+ November 2017
IPTC EXTRA and EXTRA+ November 2017IPTC EXTRA and EXTRA+ November 2017
IPTC EXTRA and EXTRA+ November 2017
 
EXTRA Open Source Rules Classification for News
EXTRA Open Source Rules Classification for NewsEXTRA Open Source Rules Classification for News
EXTRA Open Source Rules Classification for News
 
IPTC Rights Statements For News
IPTC Rights Statements For NewsIPTC Rights Statements For News
IPTC Rights Statements For News
 
IPTC Rights Expression Working Group Spring 2016
IPTC Rights Expression Working Group Spring 2016IPTC Rights Expression Working Group Spring 2016
IPTC Rights Expression Working Group Spring 2016
 
Update on IPTC's EXTRA Open Source Classification Engine
Update on IPTC's EXTRA Open Source Classification EngineUpdate on IPTC's EXTRA Open Source Classification Engine
Update on IPTC's EXTRA Open Source Classification Engine
 
FIWARE Training: Introduction to Smart Data Models
FIWARE Training: Introduction to Smart Data ModelsFIWARE Training: Introduction to Smart Data Models
FIWARE Training: Introduction to Smart Data Models
 
Semantic web technologies applied to bioinformatics and laboratory data manag...
Semantic web technologies applied to bioinformatics and laboratory data manag...Semantic web technologies applied to bioinformatics and laboratory data manag...
Semantic web technologies applied to bioinformatics and laboratory data manag...
 
The PeriCAT Framework
The PeriCAT FrameworkThe PeriCAT Framework
The PeriCAT Framework
 
Presentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenbergPresentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenberg
 
IPTC Rights Expression Working Group Spring 2014
IPTC Rights Expression Working Group Spring 2014IPTC Rights Expression Working Group Spring 2014
IPTC Rights Expression Working Group Spring 2014
 
II-PIC 2017: Porduct presentation minesoft
II-PIC 2017: Porduct presentation minesoftII-PIC 2017: Porduct presentation minesoft
II-PIC 2017: Porduct presentation minesoft
 
IPTC EXTRA Open Source Classification Workshop
IPTC EXTRA Open Source Classification WorkshopIPTC EXTRA Open Source Classification Workshop
IPTC EXTRA Open Source Classification Workshop
 
Introduction to IPTC Rights - RightsML and ODRL
Introduction to IPTC Rights - RightsML and ODRLIntroduction to IPTC Rights - RightsML and ODRL
Introduction to IPTC Rights - RightsML and ODRL
 
CLARIN Component Metadata Infrastructure
CLARIN Component Metadata Infrastructure CLARIN Component Metadata Infrastructure
CLARIN Component Metadata Infrastructure
 
Hawkins, "Scitopia.org, A Discovery Tool Using Federated Search"
Hawkins, "Scitopia.org, A Discovery Tool Using Federated Search"Hawkins, "Scitopia.org, A Discovery Tool Using Federated Search"
Hawkins, "Scitopia.org, A Discovery Tool Using Federated Search"
 
File000162
File000162File000162
File000162
 
Where Open Source Meets Audit Analytics - ISACA North America CACS 2017
Where Open Source Meets Audit Analytics - ISACA North America CACS 2017Where Open Source Meets Audit Analytics - ISACA North America CACS 2017
Where Open Source Meets Audit Analytics - ISACA North America CACS 2017
 
Commodity Semantic Search: A Case Study of DiscoverEd
Commodity Semantic Search: A Case Study of DiscoverEdCommodity Semantic Search: A Case Study of DiscoverEd
Commodity Semantic Search: A Case Study of DiscoverEd
 
Algorithm Procedure and Pseudo Code Mining
Algorithm Procedure and Pseudo Code MiningAlgorithm Procedure and Pseudo Code Mining
Algorithm Procedure and Pseudo Code Mining
 
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
 

More from Stuart Myles

IPTC New Taxonomies Ideas
IPTC New Taxonomies IdeasIPTC New Taxonomies Ideas
IPTC New Taxonomies IdeasStuart Myles
 
IPTC Board Spring 2019
IPTC Board Spring 2019IPTC Board Spring 2019
IPTC Board Spring 2019Stuart Myles
 
IPTC Spring 2019 Conference
IPTC Spring 2019 ConferenceIPTC Spring 2019 Conference
IPTC Spring 2019 ConferenceStuart Myles
 
Photomation or Fauxtomation?
Photomation or Fauxtomation?Photomation or Fauxtomation?
Photomation or Fauxtomation?Stuart Myles
 
Image Tagging at the Associated Press
Image Tagging at the Associated PressImage Tagging at the Associated Press
Image Tagging at the Associated PressStuart Myles
 
IPTC Rights Working Group Toronto October 2018
IPTC Rights Working Group Toronto October 2018IPTC Rights Working Group Toronto October 2018
IPTC Rights Working Group Toronto October 2018Stuart Myles
 
IPTC AGM 2018 Welcome
IPTC AGM 2018 WelcomeIPTC AGM 2018 Welcome
IPTC AGM 2018 WelcomeStuart Myles
 
How Can We Make Algorithmic News More Transparent?
How Can We Make Algorithmic News More Transparent?How Can We Make Algorithmic News More Transparent?
How Can We Make Algorithmic News More Transparent?Stuart Myles
 
IPTC Machine Readable Rights for News and Media: Solving Three Challenges wit...
IPTC Machine Readable Rights for News and Media: Solving Three Challenges wit...IPTC Machine Readable Rights for News and Media: Solving Three Challenges wit...
IPTC Machine Readable Rights for News and Media: Solving Three Challenges wit...Stuart Myles
 
Ap Taxonomy Localization Requirements and Challenges
Ap Taxonomy Localization Requirements and ChallengesAp Taxonomy Localization Requirements and Challenges
Ap Taxonomy Localization Requirements and ChallengesStuart Myles
 
IPTC Spring Meeting Welcome To Athens April 2018
IPTC Spring Meeting Welcome To Athens April 2018IPTC Spring Meeting Welcome To Athens April 2018
IPTC Spring Meeting Welcome To Athens April 2018Stuart Myles
 
Sustaining Television News Technical Challenges
Sustaining Television News Technical ChallengesSustaining Television News Technical Challenges
Sustaining Television News Technical ChallengesStuart Myles
 
How to Train Your Classifier: Create a Serverless Machine Learning System wit...
How to Train Your Classifier: Create a Serverless Machine Learning System wit...How to Train Your Classifier: Create a Serverless Machine Learning System wit...
How to Train Your Classifier: Create a Serverless Machine Learning System wit...Stuart Myles
 
The Search for IPTC's Next Managing Director
The Search for IPTC's Next Managing DirectorThe Search for IPTC's Next Managing Director
The Search for IPTC's Next Managing DirectorStuart Myles
 
IPTC Approach to News in JSON
IPTC Approach to News in JSONIPTC Approach to News in JSON
IPTC Approach to News in JSONStuart Myles
 
IPTC News in JSON November 2017
IPTC News in JSON November 2017IPTC News in JSON November 2017
IPTC News in JSON November 2017Stuart Myles
 
Welcome to Barcelona - IPTC November 2017
Welcome to Barcelona - IPTC November 2017Welcome to Barcelona - IPTC November 2017
Welcome to Barcelona - IPTC November 2017Stuart Myles
 
Credibility Schema Working Group
Credibility Schema Working GroupCredibility Schema Working Group
Credibility Schema Working GroupStuart Myles
 
Rights for Photo and Video Archives at the Associated Press
Rights for Photo and Video Archives at the Associated PressRights for Photo and Video Archives at the Associated Press
Rights for Photo and Video Archives at the Associated PressStuart Myles
 
IPTC Welcome to IPTC's Spring 2017 Meeting
IPTC Welcome to IPTC's Spring 2017 MeetingIPTC Welcome to IPTC's Spring 2017 Meeting
IPTC Welcome to IPTC's Spring 2017 MeetingStuart Myles
 

More from Stuart Myles (20)

IPTC New Taxonomies Ideas
IPTC New Taxonomies IdeasIPTC New Taxonomies Ideas
IPTC New Taxonomies Ideas
 
IPTC Board Spring 2019
IPTC Board Spring 2019IPTC Board Spring 2019
IPTC Board Spring 2019
 
IPTC Spring 2019 Conference
IPTC Spring 2019 ConferenceIPTC Spring 2019 Conference
IPTC Spring 2019 Conference
 
Photomation or Fauxtomation?
Photomation or Fauxtomation?Photomation or Fauxtomation?
Photomation or Fauxtomation?
 
Image Tagging at the Associated Press
Image Tagging at the Associated PressImage Tagging at the Associated Press
Image Tagging at the Associated Press
 
IPTC Rights Working Group Toronto October 2018
IPTC Rights Working Group Toronto October 2018IPTC Rights Working Group Toronto October 2018
IPTC Rights Working Group Toronto October 2018
 
IPTC AGM 2018 Welcome
IPTC AGM 2018 WelcomeIPTC AGM 2018 Welcome
IPTC AGM 2018 Welcome
 
How Can We Make Algorithmic News More Transparent?
How Can We Make Algorithmic News More Transparent?How Can We Make Algorithmic News More Transparent?
How Can We Make Algorithmic News More Transparent?
 
IPTC Machine Readable Rights for News and Media: Solving Three Challenges wit...
IPTC Machine Readable Rights for News and Media: Solving Three Challenges wit...IPTC Machine Readable Rights for News and Media: Solving Three Challenges wit...
IPTC Machine Readable Rights for News and Media: Solving Three Challenges wit...
 
Ap Taxonomy Localization Requirements and Challenges
Ap Taxonomy Localization Requirements and ChallengesAp Taxonomy Localization Requirements and Challenges
Ap Taxonomy Localization Requirements and Challenges
 
IPTC Spring Meeting Welcome To Athens April 2018
IPTC Spring Meeting Welcome To Athens April 2018IPTC Spring Meeting Welcome To Athens April 2018
IPTC Spring Meeting Welcome To Athens April 2018
 
Sustaining Television News Technical Challenges
Sustaining Television News Technical ChallengesSustaining Television News Technical Challenges
Sustaining Television News Technical Challenges
 
How to Train Your Classifier: Create a Serverless Machine Learning System wit...
How to Train Your Classifier: Create a Serverless Machine Learning System wit...How to Train Your Classifier: Create a Serverless Machine Learning System wit...
How to Train Your Classifier: Create a Serverless Machine Learning System wit...
 
The Search for IPTC's Next Managing Director
The Search for IPTC's Next Managing DirectorThe Search for IPTC's Next Managing Director
The Search for IPTC's Next Managing Director
 
IPTC Approach to News in JSON
IPTC Approach to News in JSONIPTC Approach to News in JSON
IPTC Approach to News in JSON
 
IPTC News in JSON November 2017
IPTC News in JSON November 2017IPTC News in JSON November 2017
IPTC News in JSON November 2017
 
Welcome to Barcelona - IPTC November 2017
Welcome to Barcelona - IPTC November 2017Welcome to Barcelona - IPTC November 2017
Welcome to Barcelona - IPTC November 2017
 
Credibility Schema Working Group
Credibility Schema Working GroupCredibility Schema Working Group
Credibility Schema Working Group
 
Rights for Photo and Video Archives at the Associated Press
Rights for Photo and Video Archives at the Associated PressRights for Photo and Video Archives at the Associated Press
Rights for Photo and Video Archives at the Associated Press
 
IPTC Welcome to IPTC's Spring 2017 Meeting
IPTC Welcome to IPTC's Spring 2017 MeetingIPTC Welcome to IPTC's Spring 2017 Meeting
IPTC Welcome to IPTC's Spring 2017 Meeting
 

Recently uploaded

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

IPTC EXTRA Spring 2018

  • 1. “Extra” by Jeremy Brooks https://flic.kr/p/4aKH3c
  • 2. EXTRA and FRANCIS
      Stuart Myles * Associated Press * 24th April 2018
      © 2018 IPTC (www.iptc.org) All rights reserved
      https://flic.kr/p/fBshW3
      https://flic.kr/p/atFSAr
  • 3. Rules-Based Classification
      • Rules better for breaking news than statistical methods
        – You don’t need 50 examples before you can start tagging
        – A rule for a new topic doesn’t require other rules to change
      • More consistent and scalable than hand tagging
      • Easier to explain why rules classify content
        – Machine learning methods can be “black boxes”
        – Easier to precisely explain - and correct - mistakes
  • 4. EXTRA
      EXTraction Rules Apparatus
      Rules-based classification of text
      Open source software https://iptc.github.io/extra/
      EXTRA was developed by the IPTC
      €50,000 Grant from the Digital News Initiative https://www.digitalnewsinitiative.com/fund/
      You can use your own taxonomy, rules and formats
      - Example rules help us drive development of the EXTRA system
      - You can use the example rules to see how to develop your own
      - Rules could apply IPTC Media Topics or any other taxonomy
  • 5. Development Process
      The EXTRA software was developed by Infalia
      - All software is open source
      Two linguists creating rules in English and German
      - Sample rules to apply IPTC Media Topics
      Example news corpora licensed for EXTRA
      - English from Thomson Reuters
      - German from APA
  • 7. Classification using Percolator
      • Elasticsearch
        – A sophisticated, open source full-text search engine
        – Lets you query documents stored in an index
      • Elasticsearch Percolator
        – Store queries in an index and match documents to queries
        – Classification uses the percolator to match documents to rules
      • EXTRA Rule Language
        – Rule-writer-friendly language (easier than ES DSL)
        – Access to all ES features, plus custom operators
  • 8. Schema and Rules Example
      • Two fields, headline and body, with body allowed to be queried by paragraph
        headline
        body
        body_paragraph
      • A rule to require that “angela merkel” and “us elections” appear in the same paragraph
        (prox/unit=paragraph/distance=1
          (body adj "angela merkel")
          (body adj "us elections")
        )
  • 9. FRANCIS*
      Using machine learning to empower rule-based classification of news with semantics.
      • “aboutness” evaluation
        – Given that a story is about a topic, how much is it about it?
      • Rule suggestion
        – Suggest rules based on a pre-tagged corpus
      • Enriched rule operators
        – For example, nested “count” operators
        – Using EXTRA as the foundation
      * St Francis de Sales is the patron saint of writers and journalists
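
The percolator idea from slide 7 (store the rules, then match each incoming document against them) and the same-paragraph rule from slide 8 can be illustrated with a small, self-contained Python sketch. This is not EXTRA's implementation and does not use Elasticsearch: the function names, the topic label, and the assumption that paragraphs are separated by blank lines are all hypothetical, chosen only to show the matching logic.

```python
import re
from typing import Callable, Dict, List

Document = Dict[str, str]
Rule = Callable[[Document], bool]


def split_paragraphs(body: str) -> List[str]:
    """Split body text into paragraphs (assumption: blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]


def contains_phrase(text: str, phrase: str) -> bool:
    """Case-insensitive match of the phrase's words, adjacent and in order."""
    words = [re.escape(w) for w in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def same_paragraph(*phrases: str) -> Rule:
    """Build a rule that fires when all phrases occur in one body paragraph."""
    def rule(doc: Document) -> bool:
        return any(
            all(contains_phrase(paragraph, phrase) for phrase in phrases)
            for paragraph in split_paragraphs(doc.get("body", ""))
        )
    return rule


# Stored rules, keyed by a hypothetical topic label. A percolator works the
# same way round: the queries are stored, and documents are matched to them.
RULES: Dict[str, Rule] = {
    "example/merkel-us-elections": same_paragraph("angela merkel", "us elections"),
}


def classify(doc: Document, rules: Dict[str, Rule] = RULES) -> List[str]:
    """Return the sorted topic labels of every stored rule the document matches."""
    return sorted(topic for topic, rule in rules.items() if rule(doc))


if __name__ == "__main__":
    story = {
        "headline": "Chancellor reacts",
        "body": "Angela Merkel commented on the US elections today.\n\n"
                "Markets were calm.",
    }
    print(classify(story))
```

A story that mentions the two phrases in different paragraphs would not be tagged, which is the behaviour the slide describes for the paragraph-scoped rule.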