
Boost your data analytics with open data and public news content

Get guidance through the gigantic sea of freely available Open Data and learn how it can empower your analysis of any kind of source.


This webinar is a live demo of news and data analytics, based on rich links within big knowledge graphs. It will show you how to:

Build ranking reports (e.g. for people and organisations)
View topics linked implicitly (e.g. daughter companies, key personnel, products …)
Draw trend lines
Extend your analytics with additional data sources

Published in: Data & Analytics

  1. Boost Your Data Analytics with Open Data and Public News Content. Ontotext Webinar, 24 Mar 2016
  2. Presentation Outline – PART I • Quick news-analytics case • Technology approach • FactForge-News: Data architecture • Sample queries on Linked Open Data • News analytics examples • Today’s News Map | Mar 2016, Open Data & News Analytics
  3. Quick news-analytics case • Our Dynamic Semantic Publishing platform already links text with big open data graphs • One can navigate from text to concepts and get trends, related entities and news • Try it at http://now.ontotext.com
  4. Presentation Outline • Quick news-analytics case • Technology approach • FactForge-News: Data architecture • Sample queries on Linked Open Data • News analytics examples • Today’s News Map
  5. Our approach to Big Data
     1. Integrate relevant data from many sources − Build a Big Knowledge Graph from proprietary databases and taxonomies integrated with millions of facts of Linked Data
     2. Infer new facts and unveil relationships − Perform reasoning across data from different sources
     3. Interlink text with big data − Use text-mining to automatically discover references to concepts and entities
     4. Use a NoSQL graph database for metadata management, querying and search
  6. NoSQL Graph Database [Diagram: example RDF graph. Node and edge labels: myData:Maria, myData:Ivan, ptop:Agent, ptop:Person, ptop:Woman, ptop:childOf, ptop:parentOf, owl:relativeOf, rdf:type, rdfs:range, rdfs:subPropertyOf, owl:inverseOf, owl:SymmetricProperty; some edges are marked “inferred”] • The hottest NoSQL trend • W3C standards • Efficient data integration − Using logical inference − For data integration and BI
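The inference idea on this slide can be sketched in a few lines of Python. This is only an illustration, not how GraphDB implements reasoning: the `infer` helper is hypothetical, the triple stating that Ivan is a child of Maria is an assumption (the diagram's edge directions are not recoverable from the transcript), and so is the choice of which properties are sub-properties of `owl:relativeOf` (the name is kept as it appears on the slide).

```python
# Minimal forward-chaining sketch over (subject, predicate, object) triples.
# Assumptions: the explicit triple and the property axioms below are
# illustrative guesses based on the labels visible on the slide.

def infer(triples, inverse_of, sub_property_of, symmetric):
    """Return all facts reachable by repeatedly applying three simple rules."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in facts:
            # owl:inverseOf: p(s, o) implies q(o, s), in both directions
            for a, b in inverse_of:
                if p == a:
                    new.add((o, b, s))
                if p == b:
                    new.add((o, a, s))
            # rdfs:subPropertyOf: p(s, o) implies super(s, o)
            for sub, sup in sub_property_of:
                if p == sub:
                    new.add((s, sup, o))
            # owl:SymmetricProperty: p(s, o) implies p(o, s)
            if p in symmetric:
                new.add((o, p, s))
        if not new <= facts:
            facts |= new
            changed = True
    return facts


explicit = {("myData:Ivan", "ptop:childOf", "myData:Maria")}
closure = infer(
    explicit,
    inverse_of=[("ptop:childOf", "ptop:parentOf")],
    sub_property_of=[
        ("ptop:childOf", "owl:relativeOf"),
        ("ptop:parentOf", "owl:relativeOf"),
    ],
    symmetric={"owl:relativeOf"},
)
inferred = closure - explicit
for triple in sorted(inferred):
    print(triple)
```

From one explicit statement the loop derives the inverse `ptop:parentOf` link and the `owl:relativeOf` link in both directions, which is the kind of "inferred" edge the diagram highlights.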
  7. Analyzing Text • Full spectrum of NLP weaponry • Semantic indexing − Tag references with entity IDs − Generate semantic metadata descriptions of documents − Store metadata in GraphDB
  8. Presentation Outline • Quick news-analytics case • Technology approach • FactForge-News: Data architecture • Sample queries on Linked Open Data • News analytics examples • Today’s News Map
  9. The Web of Linked Data in 2007 [Figure: Linked Data cloud with callouts “structured database version of Wikipedia”, “database of all locations on Earth”, “product reviews” and “semantic synonym dictionary”] Note: Each bubble represents a dataset. Arrows represent mappings across datasets, e.g. dbpedia:Paris owl:sameAs geo:2988507
  10. The Web of Linked Data is Gaining Mass
  11. The Web of Data is Gaining Mass (2011)
  12. The Web of Linked Data is Gaining Mass • 2013 stats: 2 289 public datasets − http://stats.lod2.eu/ • Growing exponentially − see the dotted trend line • Structured markup − Schema.org; semantic SEO • Enables better semantic tagging! − As there are more concepts and richer descriptions to refer to • Linked Data datasets per year: 2007: 27; 2008: 43; 2009: 89; 2010: 162; 2011: 295; 2012: 822; 2013: 2 289
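The "growing exponentially" claim can be checked against the yearly dataset counts quoted on this slide. The sketch below is a plain-Python estimate, not the method used to draw the slide's dotted trend line: it computes year-over-year factors, the average annual growth factor, and a least-squares line through the logarithm of the counts.

```python
import math

# Linked Data dataset counts per year, as listed on the slide.
datasets = {2007: 27, 2008: 43, 2009: 89, 2010: 162,
            2011: 295, 2012: 822, 2013: 2289}

years = sorted(datasets)

# Growth factor from each year to the next.
yearly_factors = {y: datasets[y] / datasets[y - 1] for y in years[1:]}

# Average annual growth factor over the whole period.
span = years[-1] - years[0]
average_factor = (datasets[years[-1]] / datasets[years[0]]) ** (1 / span)

# Least-squares fit of ln(count) = intercept + slope * (year - 2007).
xs = [y - years[0] for y in years]
ys = [math.log(datasets[y]) for y in years]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
fitted_factor = math.exp(slope)

print({y: round(f, 2) for y, f in yearly_factors.items()})
print(round(average_factor, 2), round(fitted_factor, 2))
```

Both estimates come out at roughly a doubling per year, which is consistent with an exponential trend over 2007 to 2013.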
  13. The FactForge Data • DBpedia (the English version only): 496M statements • Geonames: 150M statements − SameAs links between DBpedia and Geonames: 471K statements • NOW data – metadata about news: 128M statements • Total size: 938M statements − 656M explicit statements + 281M inferred statements − RDFRank and geo-spatial indices enabled to allow for ranking and efficient geo-region constraints
  14. News Metadata • Metadata from Ontotext’s Dynamic Semantic Publishing platform − Automatically generated as part of the NOW.ontotext.com semantic news showcase • News corpus from Google since Feb 2015, about 10k news articles/month • ~70 tags (annotations) per news article • Tags link text mentions of concepts to the knowledge graph − Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases
  15. News Metadata
  16. News Metadata

      | Category | Count |
      |---|---|
      | International | 52 074 |
      | Science and Technology | 23 201 |
      | Sports | 20 714 |
      | Business | 15 155 |
      | Lifestyle | 11 684 |
      | Total | 122 828 |

      | Mentions / entity type | Count |
      |---|---|
      | Keyphrase | 2 589 676 |
      | Organization | 1 276 441 |
      | Location | 1 260 972 |
      | Person | 1 248 784 |
      | Work | 309 093 |
      | Event | 258 388 |
      | RelationPersonRole | 236 638 |
      | Species | 180 946 |
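The figures on this slide can be cross-checked with simple arithmetic. The sketch below sums the category counts and divides the listed mention counts by the number of articles. It assumes each article belongs to exactly one category and that the eight entity types shown are not the complete set, which would explain why the result sits below the "~70 tags per article" quoted on slide 14.

```python
# Article counts per category, as listed on the slide.
category_counts = {
    "International": 52074,
    "Science and Technology": 23201,
    "Sports": 20714,
    "Business": 15155,
    "Lifestyle": 11684,
}

# Mention counts per entity type, as listed on the slide.
mention_counts = {
    "Keyphrase": 2589676,
    "Organization": 1276441,
    "Location": 1260972,
    "Person": 1248784,
    "Work": 309093,
    "Event": 258388,
    "RelationPersonRole": 236638,
    "Species": 180946,
}

total_articles = sum(category_counts.values())
total_mentions = sum(mention_counts.values())
mentions_per_article = total_mentions / total_articles

print(total_articles, total_mentions, round(mentions_per_article, 1))
```

The category counts add up exactly to the unlabeled 122 828 figure in the first table, so that number is the article total.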
  17. News Geographic Coverage • Quite focused on USA!
  18. Class Hierarchy Map (by number of instances) • Left: The big picture • Right: dbo:Agent class (2.7M organizations and persons)
  19. Presentation Outline • Quick news-analytics case • Technology approach • FactForge-News: Data architecture • Sample queries on Linked Open Data • News analytics examples • Today’s News Map
  20. Sample queries • There is a rich set of sample queries that allow exploration of this combination of DBpedia, GeoNames and news metadata • We will showcase a few of those, starting with the simple ones • The “parameters” of the queries are marked in bold on the slides
  21. Query: Big Cities in Eastern Europe

          # benefits from inference over transitive gn:parentFeature
          # benefits from owl:sameAs mapping between DBpedia and Geonames
          PREFIX dbr: <http://dbpedia.org/resource/>
          PREFIX onto: <http://www.ontotext.com/>
          PREFIX gn: <http://www.geonames.org/ontology#>
          PREFIX dbo: <http://dbpedia.org/ontology/>
          select * from onto:disable-sameAs
          where {
              ?loc gn:parentFeature dbr:Eastern_Europe ;
                   gn:featureClass gn:P .
              ?loc dbo:populationTotal ?population ;
                   dbo:country ?country .
              FILTER(?population > 300000)
          }
          order by ?country

  22. Query: People and Organizations related to Google

          # benefits from inference over transitive dbo:parent
          # RDFRank makes it easy to see the “top suspects” in a list of 93 entities
          PREFIX dbo: <http://dbpedia.org/ontology/>
          PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
          PREFIX dbr: <http://dbpedia.org/resource/>
          select distinct ?related_entity ?rank
          where {
              BIND (dbr:Google as ?entity)
              {
                  ?related_entity a dbo:Person ;
                                  ?p ?entity .
              } UNION {
                  ?related_entity a dbo:Organisation ;
                                  dbo:parent ?entity .
              }
              ?related_entity rank:hasRDFRank ?rank
          }
          order by desc(?rank)

  23. Query: Airports near London

          # GraphDB’s geo-spatial plug-in allows efficient evaluation of near-by
          # RDFRank brings the top 6 passenger airports to the top of a list of 80
          PREFIX dbr: <http://dbpedia.org/resource/>
          PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
          PREFIX gdb-geo: <http://www.ontotext.com/owlim/geo#>
          PREFIX dbo: <http://dbpedia.org/ontology/>
          PREFIX gdb: <http://www.ontotext.com/owlim/>
          SELECT distinct ?airport ?rrank
          WHERE {
              {
                  SELECT * {
                      dbr:London geo-pos:lat ?lat ;
                                 geo-pos:long ?long .
                  } LIMIT 10
              }
              ?airport gdb-geo:nearby(?lat ?long "50mi") ;
                       a dbo:Airport ;
                       gdb:hasRDFRank ?rrank .
          }
          ORDER BY DESC(?rrank)
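To make the "nearby within 50 miles, ordered by rank" idea concrete, here is a small Python sketch. It is not how GraphDB's geo-spatial plug-in works internally (the slide says the plug-in evaluates this efficiently, which a linear scan does not); it only shows the same filter-then-rank logic with a great-circle distance. The `haversine_miles` and `nearby` helpers are hypothetical names, the coordinates are approximate, and the rank values are made up for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearby(center, places, max_miles):
    """Names of places within max_miles of center, highest rank first."""
    lat, lon = center
    hits = [
        (rank, name)
        for name, (plat, plon, rank) in places.items()
        if haversine_miles(lat, lon, plat, plon) <= max_miles
    ]
    return [name for rank, name in sorted(hits, reverse=True)]


london = (51.5074, -0.1278)  # approximate city-centre coordinates
airports = {
    # name: (latitude, longitude, hypothetical rank score)
    "Heathrow": (51.4700, -0.4543, 0.9),
    "Gatwick": (51.1537, -0.1821, 0.8),
    "Manchester": (53.3537, -2.2750, 0.7),
}

result = nearby(london, airports, max_miles=50)
print(result)
```

Manchester drops out because it lies well beyond 50 miles, and the remaining airports are ordered by the stand-in rank score, mirroring the `ORDER BY DESC(?rrank)` clause.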
  24. Query: Top-level Industries by number of companies

          # benefits from mapping and consolidation of industry classifications
          # and predicates in DBpedia (ff-map)
          PREFIX dbo: <http://dbpedia.org/ontology/>
          PREFIX ff-map: <http://factforge.net/ff2016-mapping/>
          select distinct ?topIndustry (count(?company) as ?companies)
          where {
              ?company dbo:industry ?industry .
              ?industrySum ff-map:industryVariant ?industry .
              ?industrySum ff-map:industryCenter ?topIndustry .
          }
          group by ?topIndustry
          order by desc(?companies)
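The consolidation step in this query, where many industry identifier variants are mapped to one top-level industry before counting, can be illustrated in Python. The company names, industry labels and the `companies_per_top_industry` helper below are all hypothetical; the real mapping lives in the ff-map graph. The sketch counts each company once per top-level industry, which is a choice made here rather than something the slide specifies.

```python
from collections import defaultdict


def companies_per_top_industry(company_industries, variant_to_top):
    """Count distinct companies per top-level industry, largest first."""
    members = defaultdict(set)
    for company, industries in company_industries.items():
        for industry in industries:
            top = variant_to_top.get(industry)
            if top is not None:  # skip industries with no mapping
                members[top].add(company)
    return sorted(
        ((top, len(companies)) for top, companies in members.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical variant-to-top-level mapping.
variant_to_top = {
    "Automotive": "Automotive",
    "Automobile": "Automotive",
    "Car manufacturing": "Automotive",
    "Software": "Software",
    "Computer software": "Software",
}

# Hypothetical companies with their raw industry labels.
company_industries = {
    "Company A": ["Automobile", "Car manufacturing"],
    "Company B": ["Automotive"],
    "Company C": ["Computer software"],
    "Company D": ["Unmapped label"],
}

ranking = companies_per_top_industry(company_industries, variant_to_top)
print(ranking)
```

Company A carries two variant labels but is counted once under Automotive, and Company D is ignored because its label has no mapping.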
  25. Presentation Outline • Quick news-analytics case • Technology approach • FactForge-News: Data architecture • Sample queries on Linked Open Data • News analytics examples • Today’s News Map
  26. Semantic Press-Clipping • We can trace references to a specific company in the news − This is fairly standard; however, we can handle syntactic variations in names because state-of-the-art Named Entity Recognition technology is used − More importantly, we correctly determine whether a given mention of “Paris” refers to Paris (the capital of France), Paris in Texas, Paris Hilton or Paris (the Greek hero) • We can trace and consolidate references to daughter companies • We have a comprehensive industry classification − The one from DBpedia, but refined to accommodate identifier variations and specialization (e.g. a company classified as dbr:Bank is also treated as classified as dbr:FinancialServices)
  27. Query: News Mentioning IBM

          # technical example to demonstrate how news metadata can be accessed
          PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
          PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
          PREFIX dbr: <http://dbpedia.org/resource/>
          PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
          select distinct ?news ?title ?date ?pub_entity
          where {
              ?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
              ?pub_entity pub:exactMatch dbr:IBM .
              ?news pub-old:creationDate ?date ;
                    pub-old:title ?title .
              FILTER ( (?date > "2015-10-01T00:02:00Z"^^xsd:dateTime) &&
                       (?date < "2015-11-01T00:02:00Z"^^xsd:dateTime) )
          }
          limit 100

  28. Query: News Mentioning Gazprom and Its Related Entities

          # benefits from inference over transitive dbo:parent relation and mappings to it
          # prefix declarations repeated from the earlier query slides
          PREFIX dbr: <http://dbpedia.org/resource/>
          PREFIX dbo: <http://dbpedia.org/ontology/>
          PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
          PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
          select distinct ?news ?title ?date ?related_entity
          where {
              {
                  select distinct ?related_entity {
                      BIND (dbr:Gazprom as ?entity)
                      {
                          ?related_entity a dbo:Person ;
                                          ?p ?entity .
                          FILTER NOT EXISTS { ?related_entity dbo:club ?entity }
                      } UNION {
                          ?related_entity a dbo:Organisation ;
                                          dbo:parent ?entity .
                      } UNION {
                          BIND(?entity as ?related_entity)
                      }
                  }
              }
              ?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
              ?pub_entity pub:exactMatch ?related_entity .
              ?news pub-old:creationDate ?date ;
                    pub-old:title ?title .
          }
          order by desc(?date)
          limit 1000

  29. Query: Automotive Companies Most Popular in the News

          # benefits from mapping and consolidation of industry classifications
          # prefix declarations repeated from the earlier query slides
          PREFIX dbr: <http://dbpedia.org/resource/>
          PREFIX dbo: <http://dbpedia.org/ontology/>
          PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
          PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
          PREFIX ff-map: <http://factforge.net/ff2016-mapping/>
          select distinct ?pub_entity (max(?entity_label) as ?label) (count(?news) as ?news_count)
          where {
              ?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
              ?pub_entity pub:exactMatch ?entity ;
                          pub:preferredLabel ?entity_label .
              dbr:Automotive ff-map:industryVariant ?industry .
              ?entity dbo:industry ?industry .
              ?news pub-old:creationDate ?date .
          }
          group by ?pub_entity
          order by desc(?news_count)

  30. Query: Most Popular in the News, including children

          # benefits from mapping and consolidation of industry classifications
          # prefix declarations repeated from the earlier query slides
          PREFIX dbr: <http://dbpedia.org/resource/>
          PREFIX dbo: <http://dbpedia.org/ontology/>
          PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
          PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
          PREFIX ff-map: <http://factforge.net/ff2016-mapping/>
          select distinct ?parent (count(?news) as ?news_count)
          where {
              {
                  select distinct ?parent ?entity {
                      BIND(dbr:Software as ?industry)
                      ?industry ff-map:industryVariant ?industryVar .
                      ?parent dbo:industry ?industryVar .
                      ?parent a dbo:Company .
                      FILTER NOT EXISTS { ?parent dbo:parent / dbo:industry / ff-map:industryVariant ?industry }
                      {
                          ?entity dbo:parent ?parent .
                      } UNION {
                          BIND(?parent as ?entity)
                      }
                  }
              }
              ?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
              ?pub_entity pub:exactMatch ?entity .
              ?news pub-old:creationDate ?date .
          }
          group by ?parent
          order by desc(?news_count)
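The difference between ranking companies by direct mentions and ranking them with mentions of controlled companies rolled up can be sketched in Python. This is a simplified illustration, not a translation of the SPARQL above: the `rank_by_news` helper is hypothetical, the article IDs are invented, and the sketch follows parent links to the top-most company and counts each article once per company.

```python
from collections import defaultdict


def rank_by_news(mentions, parent_of, include_children):
    """Rank companies by number of distinct articles mentioning them.

    With include_children=True, a subsidiary's articles are credited to
    its top-most parent instead of to the subsidiary itself.
    """
    news_by_company = defaultdict(set)
    for company, articles in mentions.items():
        target = company
        if include_children:
            seen = {company}  # guards against cycles in parent links
            while target in parent_of and parent_of[target] not in seen:
                target = parent_of[target]
                seen.add(target)
        news_by_company[target] |= set(articles)
    return sorted(
        ((company, len(articles)) for company, articles in news_by_company.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Invented article IDs per mentioned company.
mentions = {
    "General Motors": {1, 2, 3},
    "Chevrolet": {3, 4, 9},
    "Tesla Motors": {5, 6, 7, 8},
}
parent_of = {"Chevrolet": "General Motors"}

direct = rank_by_news(mentions, parent_of, include_children=False)
rolled_up = rank_by_news(mentions, parent_of, include_children=True)
print(direct)
print(rolled_up)
```

In this toy data Tesla Motors leads on direct mentions, but General Motors moves to the top once Chevrolet's articles are credited to it. Article 3 mentions both and is counted only once.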
  31. News Popularity Ranking: Automotive

      | Rank | Company | News # | Rank | Company incl. mentions of controlled | News # |
      |---|---|---|---|---|---|
      | 1 | General Motors | 2722 | 1 | General Motors | 4620 |
      | 2 | Tesla Motors | 2346 | 2 | Volkswagen Group | 3999 |
      | 3 | Volkswagen | 2299 | 3 | Fiat Chrysler Automobiles | 2658 |
      | 4 | Ford Motor Company | 1934 | 4 | Tesla Motors | 2370 |
      | 5 | Toyota | 1325 | 5 | Ford Motor Company | 2125 |
      | 6 | Chevrolet | 1264 | 6 | Toyota | 1656 |
      | 7 | Chrysler | 1054 | 7 | Renault-Nissan Alliance | 1332 |
      | 8 | Fiat Chrysler Automobiles | 1011 | 8 | Honda | 864 |
      | 9 | Audi AG | 972 | 9 | BMW | 715 |
      | 10 | Honda | 717 | 10 | Takata Corporation | 547 |

  32. News Popularity: Finance

      | Rank | Company | News # | Rank | Company incl. mentions of controlled | News # |
      |---|---|---|---|---|---|
      | 1 | Bloomberg L.P. | 3203 | 1 | China Merchants Bank | 40940 |
      | 2 | Goldman Sachs | 1992 | 2 | Alphabet Inc. | 24219 |
      | 3 | JP Morgan Chase | 1712 | 3 | Capital Group Companies | 4379 |
      | 4 | Wells Fargo | 1688 | 4 | Bloomberg L.P. | 3893 |
      | 5 | Citigroup | 1557 | 5 | Exor (company) | 2775 |
      | 6 | HSBC Holdings | 1546 | 6 | JP Morgan Chase | 2715 |
      | 7 | Deutsche Bank | 1414 | 7 | Nasdaq, Inc. | 2178 |
      | 8 | Bank of America | 1335 | 8 | Oaktree Capital Management | 1757 |
      | 9 | Barclays | 1260 | 9 | Goldman Sachs | 1085 |
      | 10 | UBS | 694 | 10 | Sentinel Capital Partners | 1064 |

      Note: Including investment funds, stock exchanges, agencies, etc.

  33. News Popularity: Banking

      | Rank | Company | News # | Rank | Company incl. mentions of controlled | News # |
      |---|---|---|---|---|---|
      | 1 | Goldman Sachs | 996 | 1 | China Merchants Bank * | 38288 |
      | 2 | JP Morgan Chase | 856 | 2 | JP Morgan Chase | 1972 |
      | 3 | HSBC Holdings | 773 | 3 | Goldman Sachs | 1030 |
      | 4 | Deutsche Bank | 707 | 4 | HSBC | 966 |
      | 5 | Barclays | 630 | 5 | Bank of America | 771 |
      | 6 | Citigroup | 519 | 6 | Deutsche Bank | 742 |
      | 7 | Bank of America | 445 | 7 | Barclays | 681 |
      | 8 | Wells Fargo | 422 | 8 | Citigroup | 630 |
      | 9 | UBS | 347 | 9 | Wells Fargo | 428 |
      | 10 | Chase | 126 | 10 | UBS | 347 |

      Note: Including investment funds, stock exchanges, agencies, etc.

  34. Presentation Outline • Quick news-analytics case • Technology approach • FactForge-News: Data architecture • Sample queries on Linked Open Data • News analytics examples • Today’s News Map
  35. Today’s News Map: Business
  36. Today’s News Map: International
  37. Expect in Part II • Mentions of an entity and related entities by month • Most relevant co-occurring entities • Most relevant co-occurring entities per month • Related News • and more
  38. Thank you! • Experience the technology with NOW: Semantic News Portal http://now.ontotext.com • Start using GraphDB and text-mining with S4 in the cloud http://s4.ontotext.com • Learn more at our website or simply get in touch: info@ontotext.com, @ontotext
