
WordLift for Digital Publishers and how to create an Open Database of Knowledge

InsideOut10 is delivering an innovative product called WordLift. WordLift brings the power of Artificial Intelligence to web publishers around the world, turning editorial content into actionable knowledge. WordLift integrates with WordPress, a well-known open-source CMS. This presentation also includes an introduction to MICO (Media in Context), an EU-funded research project.

  1. WordLift for Digital Publishers. Andrea Volpini (@cyberandy), @multilingweb, Dipartimento di Informatica, Sapienza Università di Roma, 6 July 2015
  2. Hello, I am @cyberandy. This fine event is hosted by @multilingweb // LIDER. This workshop is about: the future of journalism, open data, @wordliftit v3 and @mico_project. (Image: No. 8, Mark Rothko)
  3. Meet Your Audience
  4. Some are humans and some… are not. (Image: Astro Boy comic)
  5. “Hi Stacey! Would you like me to read your favourite news?”
  6. “OK Hound, when will the sun rise in Japan two days before Christmas in 2021?” Friendly, helpful and intelligent: a completely new class of voice-enabled assistants has just arrived.
  7. (Humans)… creating value with news. Banks and investors: anti-money-laundering compliance and investment strategies. Law firms: checking whether there are ongoing or past legal processes. Policy makers: news as a valuable input to the law-making process. Business: creating business value and taking decisions by reading news. (Image: Beta Testing the Apocalypse, Tom Kaczynski)
  8. Meet Your New Colleagues
  9. Text generation algorithms can interpret your data and turn it into meaningful, personalised content. Last year the Associated Press announced that corporate earnings stories and sports stories are written automatically. (Photo: Logan Ingalls / Flickr)
  10. Your new colleague, the algorithm, has just written a new piece: “Analysts expect higher profit for Paychex when the company reports its fourth quarter results on Tuesday, July 1, 2014. The consensus estimate is calling for profit of 40 cents a share, reflecting a rise from 38 cents per share a year ago.”
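The simplest kind of text generation algorithm behind stories like the one in slide 10 is template filling from structured data. The sketch below is not AP's actual system; it only shows the idea by regenerating the slide 10 sentence from a few fields. The names `EarningsPreview` and `render_preview` and the template wording are hypothetical, for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EarningsPreview:
    # Hypothetical structured input for one earnings preview.
    company: str
    quarter: str          # e.g. "fourth"
    report_date: date
    estimate_cents: int   # consensus estimate, cents per share
    year_ago_cents: int   # result a year ago, cents per share


def render_preview(p: EarningsPreview) -> str:
    """Fill a fixed sentence template from structured data (illustrative only)."""
    if p.estimate_cents > p.year_ago_cents:
        direction, change = "higher", "a rise from"
    elif p.estimate_cents < p.year_ago_cents:
        direction, change = "lower", "a decline from"
    else:
        direction, change = "unchanged", "no change from"
    # e.g. "Tuesday, July 1, 2014" (weekday/month names assume the default C locale)
    when = f"{p.report_date.strftime('%A, %B')} {p.report_date.day}, {p.report_date.year}"
    return (
        f"Analysts expect {direction} profit for {p.company} when the company "
        f"reports its {p.quarter} quarter results on {when}. "
        f"The consensus estimate is calling for profit of {p.estimate_cents} cents "
        f"a share, reflecting {change} {p.year_ago_cents} cents per share a year ago."
    )
```

Calling `render_preview(EarningsPreview("Paychex", "fourth", date(2014, 7, 1), 40, 38))` reproduces the quoted sentence. Real systems add many template variants and data checks, but the principle of choosing words from the data is the same.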
  11. But remember… you are still “Uniquely Human”. Pay a visit to http://nextdraft.com/
  12. The human factor is key! “If our role as journalists is to help communities better organize their knowledge and themselves, then it is apparent that we are in the service business and that we must draw on many tools, including content, and place value on the relationships we build with members of our communities, which will also take many forms. Thus we are in the relationship business.” (Jeff Jarvis)
  13. Introducing WordLift
  14. Meaningfully organise your content. A semantic editor for WordPress that helps journalists and bloggers to: assist the writing process with contextual information; add structured metadata; enrich content by suggesting images, links and widgets; recommend relevant content to readers; build an open dataset (entities + annotations + content).
  15. Assist the writing process with contextual information. Fact-based information is derived from open datasets and is contextually relevant to the article. Editors can choose which datasets are used for the enrichment.
  16. Enrich content by suggesting images, links and widgets: relevant, free-to-use photos and illustrations from the Commons community, and meaningful navigation systems for internal interlinking.
  17. Recommend relevant content and show its evolution over time. The chord widget brings the audience an overview of all the content being written around a specific topic. Introducing the navigator widget: it creates links to entity pages and related articles by using the WHO, WHERE, WHAT and WHEN classifications. For example, the /BlogPosting /2015/07/04/coopers-endurance-crew/ links to /entity/michael-caine (WHO, schema:Person), /entity/nasa (WHO, schema:Organization) and /entity/earth (WHERE, schema:Place).
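The navigator widget's grouping can be sketched as a lookup from an entity's schema.org type to a W-class, followed by collecting other posts that mention the same entity. This is a minimal sketch under assumptions: Person/Organization to WHO and Place to WHERE follow the slide's example, while Event to WHEN and the WHAT fallback are guesses; `classify`, `navigator_links` and the input dictionaries are hypothetical names, not WordLift's API.

```python
from collections import defaultdict

# Hypothetical mapping from schema.org types to the navigator's W-classes.
# WHO/WHERE follow the slide example; Event -> WHEN is an assumption.
W_CLASS = {
    "schema:Person": "WHO",
    "schema:Organization": "WHO",
    "schema:Place": "WHERE",
    "schema:Event": "WHEN",
}


def classify(entity_type: str) -> str:
    """Return the W-class for a type; anything unmapped falls back to WHAT (assumption)."""
    return W_CLASS.get(entity_type, "WHAT")


def navigator_links(post_entities, entity_types, entity_posts, current_post):
    """Group a post's entities by W-class and list related posts per entity.

    post_entities: entity paths annotated in the current post
    entity_types:  entity path -> schema.org type
    entity_posts:  entity path -> set of post paths that mention the entity
    """
    groups = defaultdict(list)
    for entity in post_entities:
        # Related articles are the other posts that mention the same entity.
        related = sorted(entity_posts.get(entity, set()) - {current_post})
        groups[classify(entity_types[entity])].append((entity, related))
    return dict(groups)
```

With the slide's post as `current_post`, its three entities land in WHO (michael-caine, nasa) and WHERE (earth), each paired with the other posts that share that entity.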
  18. Add structured metadata. The blog post, its entities (dct:references), publishing information (schema:datePublished and schema:dateModified), the author (schema:author) and the number of comments (schema:interactionCount) are published as Linked Open Data and printed on the page using schema.org markup for on-page SEO. Example: http://data.redlink.io/91/be2/post/Interstellar.html
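One common way to print schema.org metadata on a page is a JSON-LD `<script>` block. The sketch below builds such a block from the properties listed on the slide; it is an illustration, not WordLift's actual output, and the function names and sample values are hypothetical. Note that schema.org has since superseded `interactionCount` with `interactionStatistic`; the sketch keeps the slide's property and simply stores the comment count.

```python
import json


def blog_posting_jsonld(url, headline, author_name, published, modified,
                        comment_count, entity_uris):
    """Build a schema.org BlogPosting as a JSON-LD dict (illustrative sketch)."""
    return {
        "@context": {
            "@vocab": "http://schema.org/",       # bare keys are schema.org terms
            "dct": "http://purl.org/dc/terms/",   # Dublin Core terms prefix
        },
        "@id": url,
        "@type": "BlogPosting",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": published,
        "dateModified": modified,
        "interactionCount": comment_count,
        # Entities annotated in the post, linked with dct:references
        "dct:references": [{"@id": uri} for uri in entity_uris],
    }


def script_tag(doc) -> str:
    """Serialise the JSON-LD for embedding in the page."""
    return '<script type="application/ld+json">' + json.dumps(doc, indent=2) + "</script>"
```

Embedding the output of `script_tag(...)` in the page template lets search engines read the same metadata that is published as Linked Open Data.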
  19. Build an open dataset (entities + annotations + content). Editors identify the basic WHO, WHAT, WHEN and WHERE of an article and structure information around it by creating new entities in their custom vocabulary. Content, vocabulary and annotations constitute the publisher’s knowledge graph, which can be queried via SPARQL.
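A SPARQL query is built from triple patterns matched against the graph and joined on shared variables. The sketch below imitates that with plain Python over an in-memory list of triples; it is a teaching stand-in, not a SPARQL engine, and the prefixed identifiers and helper names (`match`, `posts_mentioning_type`) are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(triples: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None plays the role of a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def posts_mentioning_type(graph: List[Triple], rdf_type: str) -> List[str]:
    """Join two patterns, roughly like the SPARQL query:
    SELECT DISTINCT ?post WHERE { ?e a <rdf_type> . ?post dct:references ?e }
    """
    entities = {s for s, _, _ in match(graph, p="rdf:type", o=rdf_type)}
    return sorted({s for s, _, o in match(graph, p="dct:references") if o in entities})
```

For example, over a graph where two posts reference an entity typed `schema:Person`, `posts_mentioning_type(graph, "schema:Person")` returns both post identifiers; against a real WordLift dataset the same question would be asked in SPARQL.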
  20. What does a blog post look like in the knowledge graph? owl:sameAs connects the entities detected in the blog post, such as Wormhole, with the same entities on DBpedia and Freebase. Special thanks to @dvcama :)
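Because owl:sameAs is symmetric and transitive, a chain of sameAs links puts every connected URI in one equivalence class. A union-find structure is a standard way to compute those classes; the sketch below is an illustration under that reading, not WordLift's implementation. The function name is hypothetical, and in the usage the local entity URI and the Freebase identifier are placeholders.

```python
def same_as_clusters(pairs):
    """Group URIs into equivalence classes from owl:sameAs (a, b) pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in pairs:
        union(a, b)

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    # Deterministic output: each class sorted, classes sorted
    return sorted(sorted(c) for c in clusters.values())
```

Given the pairs (local Wormhole entity, `http://dbpedia.org/resource/Wormhole`) and (DBpedia Wormhole, a Freebase placeholder), all three URIs end up in a single class, so facts attached to any of them can be shown for the entity in the post.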
  21. Let’s now move to a real-world use case, where ecologists, journalists and visionaries stand up to defend the natural world and to promote peace. Starting this September, WordLift and the MICO technologies for cross-media analysis will be used and validated by Greenpeace Italy on its subscribers' magazine website (magazine.greenpeace.it).
  22. Technology stack: (1) content analysis, (2) content discovery and (3) linked data publishing, applied to text, legacy data and audio/images. MICO is a three-year EU-funded research project (grant no. 610480) that brings cross-media extraction, cross-media metadata publishing, cross-media querying and cross-media recommendation to the platform. Areas: enterprise linked data; content analysis; semantic search; semantic media analysis and search. Media extractors available in MICO today: animal detection, video quality, temporal segmentation, automatic speech recognition, speech-music discrimination, face detection and audio tampering detection.
  23. Multimedia retrieval with cross-media querying. Introducing SPARQL-MM, an extension that adds multimedia-specific features to SPARQL, the standard query language for the Semantic Web. Its spatio-temporal object model supports queries such as “Point me to scenes within videos where Barack Obama is standing to the left of the MD of Greenpeace while talking about whale hunting”. How can we help Greenpeace Italy? • Connect videos with text using cross-media recommendations • Provide compact contextual information for media assets • Create new discovery paths for their readers and subscribers. Find out more about SPARQL-MM in this presentation by Thomas Kurz.
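The “standing to the left of … while” part of that query combines a spatial relation with a temporal one. The sketch below does not use SPARQL-MM or its function names; it only shows the two underlying tests on regions described as W3C Media Fragments URIs (`#xywh=x,y,w,h&t=start,end`). It assumes face detection has already identified each person, handles only the bare pixel and seconds forms of the fragment, and leaves out the speech-topic condition. All function names are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Region:
    x: float
    y: float
    w: float
    h: float
    start: float
    end: float


def parse_fragment(uri: str) -> Region:
    """Parse '#xywh=x,y,w,h&t=start,end' (simplified: no 'pixel:'/'npt:' prefixes)."""
    params = dict(part.split("=", 1) for part in urlsplit(uri).fragment.split("&"))
    x, y, w, h = (float(v) for v in params["xywh"].split(","))
    start, end = (float(v) for v in params["t"].split(","))
    return Region(x, y, w, h, start, end)


def left_of(a: Region, b: Region) -> bool:
    """a's bounding box lies entirely to the left of b's (assumed definition)."""
    return a.x + a.w <= b.x


def overlaps_in_time(a: Region, b: Region) -> bool:
    """The two time intervals share at least some duration."""
    return a.start < b.end and b.start < a.end


def scenes_left_of(detections_a, detections_b):
    """Pairs of fragment URIs in the same video where a is left of b at overlapping times."""
    result = []
    for ua in detections_a:
        for ub in detections_b:
            if ua.split("#", 1)[0] != ub.split("#", 1)[0]:
                continue  # different videos
            ra, rb = parse_fragment(ua), parse_fragment(ub)
            if left_of(ra, rb) and overlaps_in_time(ra, rb):
                result.append((ua, ub))
    return result
```

In a query engine these checks would run as filter functions over annotation data; here they are plain predicates so the spatial and temporal logic is easy to inspect.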
  24. Lessons learned so far…
      • The bond between data and journalism is growing stronger, and even for an independent news organisation like Greenpeace, providing context and clarity and building relationships (and knowledge graphs) is vital.
      • Algorithms are great and AI has entered the newsroom, but journalists should preserve their authorship and role when crafting content: always leave control in the hands of humans.
      • Providing immediate added value in the UX of semantic apps like WordLift is key to engaging journalists, not only marketers and management.
      • Tags don’t help organise content; named entities are much better.
      • Linked Data is a service, NOT a technology: users want to see images, meaningful links, recommendations and interactive widgets; they don’t care about underlying technologies like RDF and SPARQL.
      • Creating datasets as a side effect of editing content helps journalists make an impact and connect with policy makers, businesses and other communities.
  25. Thank you! Join at join.wordlift.it. “[SLIDES] Creating an open database of knowledge by tagging the WHO, WHAT, WHERE, WHEN of your contents #journalism”. Click to share it on Twitter! mico-project.eu, wordlift.it, insideout.io
  26. Credits. This presentation is the result of many inspiring ideas and amazing work from media experts, journalists and technologists. Here is the list: Wilfried Runde of Deutsche Welle, “In Praise of Robots and Humans”; Justin Kosslyn from Google Ideas, on thinking about how journalists' work gets used; Luca Rosati, From News to Experience; BBC News Labs, A manifesto for structured journalism. Any idea, graphic or meme belonging to us is available for sharing, copying and remixing under a Creative Commons 3.0 license. This presentation and the work behind it were partially developed within the MICO project (Media in Context, European Commission 7th Framework Programme, grant agreement no. 610480). Find out more about our products: Video Hosting Platform, Semantic Editor, Semantic Search.
