
AnnoMarket - Cloud-based text analytics

Presentation introducing AnnoMarket, which I gave at the Women in Data meetup at the Open Data Institute on 3rd December 2013.

Published in: Technology, Education
Transcript

  • 1. Extracting value from data: Introducing AnnoMarket - the cloud-based text annotation marketplace
      Helen Lippell, Press Association
      The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n°296322.
  • 2. Getting started
      • AnnoMarket is a European research project
      • PA is part of a consortium with the University of Sheffield, French start-up IMR and Bulgarian semantic specialists Ontotext
      • I work as a data wrangler within the PA technology team, working with linked data curation and semantic modelling
      • Agenda:
          • Brief overview of text analytics
          • Introducing the AnnoMarket platform
      Women in Data: NLP edition, Open Data Institute, 3 December 2013
  • 3. Text analytics isn’t that new
      • People have been trying to make sense of unstructured data for a long time
      • Rosetta Stone an early use case!
          • Experts compared patterns in the 3 texts and eventually could identify entities in the previously incomprehensible hieroglyphics
      • This 1950s definition is startlingly accurate:
          • H.P. Luhn, IBM Journal, 1958: "...utilize data-processing machines for auto-abstracting and auto-encoding of documents for creating interest profiles for each of the 'action points' in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points"
      • Now there’s an unprecedented buzz around text analytics
          • Big Data movement
          • Semantics gaining traction in business applications
  • 4. Who uses text analytics?
      • Anyone who wants to derive value from unstructured data at scale
      • Not just spooks…
      • Scientific and technical
      • Media and publishing
      • Open data community
      • Researchers
      • Data-driven businesses
      • Customer experience
  • 5. What text analytics can do
      • Named entity recognition
      • Disambiguation
          • E.g. Iceland!
      • Entity types
          • E.g. people, places, things, organisations
      • Relevance
      • Pattern-identified entities
          • E.g. amounts of money, postcodes
      • Co-occurrence
      • Classification and categorisation
      • Sentiment analysis
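To make two of the items on slide 5 concrete, here is a minimal Python sketch of pattern-identified entities (amounts of money, UK postcodes) and of co-occurrence counting. It is an illustration only and not how the GATE pipelines behind AnnoMarket work: the regular expressions are deliberately simplified, and the names `MONEY`, `POSTCODE`, `find_pattern_entities` and `count_cooccurrences` are made up for this example.

```python
import re
from collections import Counter
from itertools import combinations

# Simplified, illustrative patterns (assumptions, not production rules):
# a currency symbol plus digits, and a rough UK postcode shape.
MONEY = re.compile(r"[£$€]\d+(?:,\d{3})*(?:\.\d{2})?(?:\s?(?:million|billion)\b)?")
POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b")


def find_pattern_entities(text):
    """Return (entity type, matched text) pairs for every pattern match."""
    found = [("Money", m.group()) for m in MONEY.finditer(text)]
    found += [("Postcode", m.group()) for m in POSTCODE.finditer(text)]
    return found


def count_cooccurrences(documents):
    """Count how many documents mention each pair of entities together."""
    pairs = Counter()
    for doc in documents:
        # De-duplicate and sort so each pair has one canonical ordering.
        entities = sorted({surface for _, surface in find_pattern_entities(doc)})
        pairs.update(combinations(entities, 2))
    return pairs


if __name__ == "__main__":
    docs = [
        "The flat at SW1A 1AA sold for £1,250,000.",
        "Rent near SW1A 1AA is £2,000 a month.",
    ]
    for doc in docs:
        print(find_pattern_entities(doc))
    print(count_cooccurrences(docs))
```

Patterns like these only cover entities with a predictable surface form. Disambiguation (is "Iceland" the country or the supermarket?) needs context and a knowledge base, which is where services like those on the marketplace come in.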
  • 6. AnnoMarket
      • The marketplace - An “App store” for text analysis services
      • Breaking down barriers to entry for SMEs (developers and end-users alike)
      • Built on robust, mature GATE applications (open-source with global community supported by the University of Sheffield)
      • Benefits to end-users
          • Affordable, pay-for-what-you-need model
          • SaaS, cloud-based
          • Flexible input and output formats
      • Benefits to suppliers
          • Payments system
          • Access to user base
      • (A note on look and feel: It is basic at the moment!)
  • 7. Running an annotation job
      • Find a service
      • Test it on site
      • Upload documents or specify a custom crawl
      • Manage server (GATE Teamware or Mimir)
      • Platform handles execution of job, keeps user updated
      • Download results or export to a GATE Mimir instance
      • Formats include XML, HTML, PDF, DOC
      - GATE Teamware: web-based management platform for annotation
      - GATE Mimir: open-source framework for integrated semantic search
  • 8. Uploading pipelines
      • Straightforward process
      • Standard components:
          • Pipeline: GATE saved application state
          • Supporting files (e.g. gazetteers)
          • Metadata for the platform and user-facing pages
      • Files checked then put live
      • Platform tracks usage and handles payments
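The three standard components on slide 8 can be pictured as a small data structure with a pre-publication check. This is a hypothetical Python sketch: the slides do not say what the platform actually checks, so the `PipelinePackage` class, the `REQUIRED_METADATA` fields and the `check_package` function are all assumptions made for illustration. The only outside fact relied on is that GATE applications are customarily saved with a `.gapp` or `.xgapp` extension.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Assumed metadata fields; the real platform's requirements are not
# described in the slides.
REQUIRED_METADATA = ("name", "description")


@dataclass
class PipelinePackage:
    """Hypothetical bundle mirroring the three components on the slide."""
    pipeline: Path                                         # GATE saved application state
    supporting_files: list = field(default_factory=list)  # e.g. gazetteers
    metadata: dict = field(default_factory=dict)          # shown on user-facing pages


def check_package(pkg):
    """Return a list of problems; an empty list means the package passes."""
    problems = []
    if pkg.pipeline.suffix not in {".gapp", ".xgapp"}:
        problems.append(f"unexpected pipeline file type: {pkg.pipeline.name}")
    for key in REQUIRED_METADATA:
        if not pkg.metadata.get(key):
            problems.append(f"missing metadata: {key}")
    return problems


if __name__ == "__main__":
    pkg = PipelinePackage(
        pipeline=Path("news-pipeline.xgapp"),
        supporting_files=[Path("gazetteers/uk_places.lst")],
        metadata={"name": "News pipeline", "description": "NER for UK news"},
    )
    print(check_package(pkg) or "ready to go live")
```

Returning a list of problems rather than raising on the first one means a contributor could see everything that needs fixing in a single pass.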
  • 9. AnnoMarket screenshots
      • Browsable portal
      • Tag-based filtering
      • Input config
      • Output config
  • 10. News pipeline tool
      - Customised pipeline which annotates named entities in the news domain (optimised for the UK)
      - Leverages PA’s knowledge base and Linked Data references, as well as other entity types
  • 11. Get involved
      • Public beta
          • Register your interest now
          • We’ll email when it’s open
          • Free credit to early registrants
      • Ultimate aim:
          • A sustainable platform that generates revenue for contributors who wouldn’t have an outlet otherwise
      • Play with the platform
      • Feed back to us: bugs, functionality, finding resources, what more you’d like to see, etc!
  • 12. Get in touch
      • Public beta: http://annomarket.com
      • Project site: https://annomarket.eu
      • @AnnoMarket
      • helen.lippell@pressassociation.com
      • @octodude
