Update on IPTC's EXTRA Open Source Classification Engine

IPTC is creating an open source news classification engine. We have won a grant from Google DNI to create a rules-based engine and rules for at least two languages. We have worked up detailed requirements and are now moving on to implementation.

Published in: Technology

  1. “Extra” by Jeremy Brooks https://flic.kr/p/4aKH3c
  2. An Update on EXTRA
     Stuart Myles * Associated Press * 24th October 2016
     © 2016 IPTC (www.iptc.org) All rights reserved
     https://flic.kr/p/HMQ514
  3. EXTRA
     EXTraction Rules Apparatus
     Rules-based classification of text
     Open source software
     EXTRA is being developed by the IPTC
     Grant from the Digital News Initiative
     https://iptc.github.io/extra/
  4. Google DNI
     • Google’s €150 million Digital News Initiative fund
       – Stimulate innovation among European news organizations
       – https://www.digitalnewsinitiative.com/fund/
     • Multiple funding rounds
       – First funding of €27 million to projects in 23 countries
       – http://googlepolicyeurope.blogspot.gr/2016/02/digital-news-initiative-first-funding_24.html
     • IPTC’s EXTRA project funded in first round - October 2015
       – Developer to create the engine ~ €35,000
       – Linguists to develop sample rules ~ €14,000
       – Marketing to promote the work ~ €1,000
       – Total grant to IPTC from DNI = €50,000
  5. EXTRA
     EXTraction Rules Apparatus
     • Open source
       – IPTC always uses open licenses – in this case, the MIT license
     • Rules-based
       – Better for breaking news than statistical methods
       – More consistent and scalable than hand tagging
       – Easier to explain why rules classify content
     • Multilingual
       – Developing rules for two IPTC Media Topics languages
     • News classification
       – Rules will be developed using news content corpora
  6. EXTRA Requirements
     Weekly teleconferences and emails to document requirements
     https://iptc.org/events/
     https://groups.yahoo.com/neo/groups/iptc-extra/info
     https://goo.gl/EY4pMP
       – Use Cases
       – Performance
       – Internationalization and Character Encoding
       – Rule Language Operators and Functions
       – Input and Output Formats
       – Hit and miss highlighting
       – Relevance
       – Machine Learning
       – Sample Rules
  7. Seeking Developers
     Know anyone who might be qualified to develop EXTRA?
     Send them our way
     https://goo.gl/nUGrGT
     Qualifications?
     Proposed technical approach?
     Particular frameworks/languages/tools?
  8. Apache UIMA Ruta
     UIMA - Unstructured Information Management Architecture
     Ruta - Rule-based Text Annotation – consists of two parts:
       1. Analysis Engine for executing the rules
       2. Eclipse-based rule-writing workbench
     https://uima.apache.org/ruta.html
     Has many – but not all – of the features we require for EXTRA
     UIMA has a reputation for a steep learning curve
     ASF License is slightly more restrictive than MIT License
  9. Rules and News
     Securing news corpora in two+ Media Topics languages
     • English from Thomson Reuters
     • German from APA
     • French from AFP
     • English+ from Signal http://research.signalmedia.co/
     • Agreeing on licensing remains the stumbling block
  10. How Can You Get Involved?
      In order of increasing effort (and potential reward):
        1. Join the (low frequency) email list to stay up-to-date
           https://groups.yahoo.com/neo/groups/iptc-extra/info
        2. Suggest to someone they should apply to develop EXTRA
           https://goo.gl/nUGrGT
        3. Read and comment on the requirements
           https://goo.gl/EY4pMP
        4. Join the weekly teleconferences
           https://iptc.org/events/
  11. Date and Place of Next Meeting
      London, UK 15 – 17 May 2017
      https://flic.kr/p/suXCVH
      Thank you and goodbye!
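
Slide 5 describes EXTRA as rules-based and notes that rules make it easier to explain why content was classified. As a rough illustration of that idea only, here is a minimal Python sketch. It is not EXTRA's rule language or engine: the `Rule` shape, the `classify` function, and the topic labels are all hypothetical, and the labels are not real IPTC Media Topics codes.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical rule: a topic label plus terms that must or must not appear."""
    topic: str
    all_of: list = field(default_factory=list)   # every term must occur
    any_of: list = field(default_factory=list)   # at least one term must occur
    none_of: list = field(default_factory=list)  # no term may occur


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def classify(text, rules):
    """Return {topic: sorted matched terms} for every rule that fires."""
    tokens = set(tokenize(text))
    results = {}
    for rule in rules:
        # An excluded term vetoes the rule outright.
        if any(term in tokens for term in rule.none_of):
            continue
        # Every required term must be present.
        if not all(term in tokens for term in rule.all_of):
            continue
        any_hits = [term for term in rule.any_of if term in tokens]
        # If optional terms are listed, at least one must be present.
        if rule.any_of and not any_hits:
            continue
        # The matched terms double as the explanation for the classification.
        results[rule.topic] = sorted(set(rule.all_of) | set(any_hits))
    return results


# Made-up sample rules, for illustration only.
RULES = [
    Rule("sport/football", any_of=["goal", "striker", "penalty"], none_of=["election"]),
    Rule("politics/election", all_of=["election"], any_of=["vote", "ballot", "poll"]),
]

if __name__ == "__main__":
    print(classify("The striker scored a late goal from a penalty.", RULES))
```

Returning the matched terms alongside each topic is one simple way to support the kind of hit highlighting listed in the requirements on slide 6; a real engine would also need the operators, encodings, and formats listed there.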
