Using entity extraction extension with OpenRefine and Dandelion API

Food for thought on why you need entity extraction capabilities inside OpenRefine, with some examples and scenarios.

Published in: Technology
  • The performance of the Dandelion API is not good. For similarity analysis in particular, the word vector metrics are not modelled precisely, and the entity extraction performance is also poor.

  1. Using entity extraction extension with OpenRefine and Dandelion API: food for thought
  2. What we are talking about: OpenRefine (www.openrefine.org) and the NER extension integrated with the Dandelion API (dandelion.eu): http://freeyourmetadata.org/named-entity-extraction/
  3. What industries are using OpenRefine? https://groups.google.com/d/msg/openrefine/vA75Ac_XODo/AfG8IRlEfSAJ
  4. Data journalists, metadata curators, museums, libraries, research labs, SEO folks, data scientists, enterprises, universities, patent attorneys, Open Data hackers, social media specialists, civil servants.
  5. What does OpenRefine offer that other data-parsing tools don't? http://opendata.stackexchange.com/questions/515/what-does-openrefine-offer-that-other-data-parsing-tools-dont
  6. Reconciliation of text data against reference data services containing strong identifiers (Freebase, OpenCorporates, any SPARQL or RDF, etc.); simple linking of reconciled entities to other info sources like Wikipedia, MusicBrainz, IMDB, etc. […] […]
  7. How are we using it at SpazioDati?
  8. OpenRefine is inside our data curation controller.
  9. Normalize, clean and extract data from different sources; reconcile against internal reconciliation services (administrative regions, names and telephone numbers…); apply rules and transformations to the data, aligning it with our internal ontologies.
  10. A look at OpenRefine & reconciliation
  11. Why is reconciliation useful? "Instruments bla bla bla bla bla bla bla …": what kind of instruments?
  12. Reconciliation identifies keywords in flowing text and gives them a URL: from strings to things.
  13. Data column: "Instruments bla bla bla". Candidate matches for "instruments", each with its own URL: musical instruments, measuring instruments, aeronautical instruments.
  14. Reconciliation works great for those fields in your dataset that contain single terms: names of people, countries, works of art […]
  15. And what if we have a column with unstructured text, like this one?
  16. We need a new step in the data curation workflow: from a data column with some texts, extract named entities using the NER extension + Dandelion API into a new data column, labelled “dataTXT”.
  17. In this column there are named concepts linked to Wikipedia, each as a label + URI: “Collective action” + http://en.wikipedia.org/wiki/Collective_action
  18. Make a text filter looking for a concept; classify and categorize the content … things, not strings.
  19. Some scenarios
  20. Open Data community, real issues (http://opendata.stackexchange.com/search?page=3&tab=relevance&q=extraction). Using OpenRefine + NER extension with Dandelion API: extract meaningful information from some CVs, like names, organizations, skills, …; normalize organization names cited in some texts.
  21. Data journalists. Using OpenRefine + NER extension with Dandelion API: extract news relevant to a precise topic (a person, a brand or a company); write a summary of a politician's speech, starting from the main concepts extracted from the text; mine specific information in judicial decisions (judge's name, court, area of law and neutral citation number).
  22. Social media specialists. Using OpenRefine + NER extension with Dandelion API: text mining on tweets (extract brands, places and concepts easily from a Twitter stream related to an event); text mining on website content (extract concepts and places easily from a webpage, to improve website SEO ranking).
  23. Analytics on personal data (“Quantified Self” movement). Using OpenRefine + NER extension with Dandelion API: understand your own bank account statements by extracting useful information, like brands and places, to categorize and classify your own expenses.
  24. Do you know other use cases? Tell us on Twitter! @dandelionapi #refine #ner @spaziodati dandelion.eu
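
The extraction step in slides 16 and 17 (text in, label + URI pairs out) could be sketched in Python roughly as follows. This is a minimal sketch and not the NER extension's actual code. The endpoint URL, the `text`, `token` and `min_confidence` parameters, and the `annotations`, `label` and `uri` response fields are assumptions based on the public Dandelion API documentation and should be checked there. The function names and the sample response are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Dandelion entity extraction service; verify in the current docs.
NEX_ENDPOINT = "https://api.dandelion.eu/datatxt/nex/v1"


def build_nex_url(text, token, min_confidence=0.6):
    """Build a GET URL for an entity extraction request (parameter names are assumptions)."""
    query = urlencode({"text": text, "token": token, "min_confidence": min_confidence})
    return f"{NEX_ENDPOINT}?{query}"


def entities_from_response(response):
    """Turn a decoded JSON response into (label, uri) pairs, skipping incomplete annotations."""
    pairs = []
    for annotation in response.get("annotations", []):
        label = annotation.get("label")
        uri = annotation.get("uri")
        if label and uri:
            pairs.append((label, uri))
    return pairs


# Hand-written sample in the assumed response shape, not real API output.
sample_response = {
    "annotations": [
        {
            "spot": "collective action",
            "label": "Collective action",
            "uri": "http://en.wikipedia.org/wiki/Collective_action",
            "confidence": 0.8,
        },
        {"spot": "unlinked"},  # no label or URI, so it is skipped
    ]
}

print(build_nex_url("some text from the data column", "YOUR_TOKEN"))
print(entities_from_response(sample_response))
```

Keeping the request building separate from the response parsing means the parsing can be tried on saved responses without an API token.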

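Slide 18's idea of filtering and categorizing by concept rather than by string can be illustrated with a small Python sketch. It assumes each row already carries the (label, URI) pairs extracted from its text. The row layout, function names and sample data are hypothetical and only meant to show the "things, not strings" idea: matching is done on the URI, so different surface forms of the same concept end up together.

```python
from collections import Counter


def rows_mentioning(rows, concept_uri):
    """Keep only the rows whose extracted entities include the given concept URI."""
    return [row for row in rows if any(uri == concept_uri for _, uri in row["entities"])]


def concept_counts(rows):
    """Count in how many rows each concept URI appears (each URI counted once per row)."""
    counts = Counter()
    for row in rows:
        counts.update({uri for _, uri in row["entities"]})
    return counts


# Hypothetical rows: a text column plus the (label, URI) pairs extracted from it.
COLLECTIVE_ACTION = "http://en.wikipedia.org/wiki/Collective_action"
TRADE_UNION = "http://en.wikipedia.org/wiki/Trade_union"
rows = [
    {
        "text": "Workers organised a collective action.",
        "entities": [("Collective action", COLLECTIVE_ACTION)],
    },
    {
        "text": "The trade union called for collective action.",
        "entities": [("Trade union", TRADE_UNION), ("Collective action", COLLECTIVE_ACTION)],
    },
    {"text": "No entities here.", "entities": []},
]

matching = rows_mentioning(rows, TRADE_UNION)
print([row["text"] for row in matching])
print(concept_counts(rows).most_common(1))
```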