euBusinessGraph Company and Economic Data

Presentation at Semantics 2017, Sep 2017, Amsterdam.
By Atanas Kiryakov, Vladimir Alexiev, Plamen Tarkalanov

Published in: Economy & Finance

  1. 1. euBusinessGraph Company and Economic Data Semantics Conference, Sep 2017
  2. 2. Presentation Outline
     • Ontotext Introduction
     • euBusinessGraph
     • FactForge: Open data and news about people and organizations
     • Relationship Discovery Examples
     • Media Monitoring Examples & Popularity Ranking
     • Global Legal Entity Identifier RDF-ization and DBpedia mapping
     Sep 2017 euBusinessGraph Company and Economic Data
  3. 3. Ontotext Introduction
  4. 4. History and Essential Facts
     • Started in 2000 as a Semantic Web pioneer
       − As an R&D lab within Sirma, one of the biggest Bulgarian software companies
       − Spun off and took VC investment in 2008
     • 65 staff, R&D center in Sofia; 80% of sales in the USA and UK
       − Serving BBC, FT, Springer Nature, Wiley, Elsevier, OUP, IET…
     • 400+ person-years invested in R&D
       − Multiple innovation & technology awards: Washington Post, BBC, FT, BAIT, etc.
     • Member of multiple industry bodies
       − W3C, EDMC, ODI, LDBC, STI, DBpedia Foundation
  5. 5. Link data! Reveal more!
     Diagram labels: Commercial Company Database (e.g. D&B), Social Media, News, Wikipedia, Private
     • Recognizing and linking entities across text and data requires knowledge and context
     • Knowledge Graphs incorporate semantic entity fingerprints for entities and concepts
     • Evolve knowledge graphs and interlink them with proprietary data
  7. 7. NOW: Linking News to Big Knowledge Graphs
     • The Ontotext platform links text to knowledge graphs
     • Navigate from news to concepts, entities and topics; from there to other news
     Try it at http://now.ontotext.com
  8. 8. Ontotext Portfolio
  9. 9. Technology Excellence Delivered
     • Powerful technology mix: Graph DB engine + Text mining
     • Robust technology: We run BBC.CO.UK/SPORT and parts of FT.COM
     • We serve some of the most knowledge-intensive enterprises
  10. 10. euBusinessGraph
  11. 11.
      • Integrate European company and economic data
      • euBusinessGraph will overcome barriers in company data provisioning
      • Technology and research partners: SINTEF (coord.), Ontotext, IJS, Uni. Milano
  12. 12. euBusinessGraph
  13. 13. Company Datasets and Ontologies
      • Global Legal Entity Identifier (GLEI)
      • Business Registers Interconnection System (BRIS)
      • Financial Industry Business Ontology (FIBO)
      • OpenCorporates schema
      • Bulgarian Trade Register schema
      • W3C: Organization ontology, Registered Organization ontology, Location ontology
      • Investigative journalism datasets: Panama Papers dataset, Linked Leaks, Trump World dataset
      • Wikidata properties for describing companies, especially company identifiers in various registers
      • Other ontologies and code lists: Schema.org, Dublin Core, IANA language tags, NUTS and LAU (EU administrative regions), NACE (EU economic activities), etc.
  14. 14. euBusinessGraph Semantic Data Model
      • The semantic data model combines various data artefacts
        − Includes detailed treatment of classes, properties, values, scope notes, data provider rules, URL conventions, etc.
      • Tools:
        − rdfpuml, used to generate the diagrams
        − Object-Role Modeling through NORMA
  15. 15. Object-Role Diagram of Part of the Semantic Model
  16. 16. euBusinessGraph Technologies
      • Ontotext Cognitive Cloud & GraphDB
      • DataGraft
      • Dandelion API
      • Wikifier
      • ABSTAT
      • TARQL
      • XSPARQL
  17. 17. FactForge: Open data and news about people and organizations http://factforge.net
  18. 18. FactForge: Data Integration
      Dataset                                                    Size
      DBpedia (the English version)                              496M
      Geonames (all geographic features on Earth)                150M
      owl:sameAs links between DBpedia and Geonames              471K
      Company registry data (GLEI)                               3M
      Panama Papers DB (#LinkedLeaks)                            20M
      Other datasets and ontologies: WordNet, WorldFacts, FIBO
      News metadata (2000 articles/day enriched by NOW)          473M
      Total size (1611M explicit + 328M inferred statements)     1 939M
  19. 19. News Metadata
      • Metadata from Ontotext’s Dynamic Semantic Publishing platform
        − News stream from Google
        − Automatically generated as part of the NOW.ontotext.com semantic news showcase
      • News stream from Google since Feb 2015, about 50k news articles/month
        − ~70 tags (annotations) per news article
      • Tags link text mentions of concepts to the knowledge graph
        − Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases
  20. 20. News Metadata
      Category                  Count
      International             52 074
      Science and Technology    23 201
      Sports                    20 714
      Business                  15 155
      Lifestyle                 11 684
      Total                     122 828
      Mentions / entity type    Count
      Keyphrase                 2 589 676
      Organization              1 276 441
      Location                  1 260 972
      Person                    1 248 784
      Work                      309 093
      Event                     258 388
      RelationPersonRole        236 638
      Species                   180 946
  21. 21. Class Hierarchy Map (by number of instances)
      Left: The big picture
      Right: dbo:Agent class (2.7M organizations and persons)
  22. 22. Sample queries at http://factforge.net
      • F1: Big cities in Eastern Europe
      • F2: Airports near London
      • F3: People and organizations related to Google
      • F4: Top-level industries by number of companies
      Available as Saved Queries at http://factforge.net/sparql
      Note: Open Saved Queries with the folder icon in the upper-right corner
  23. 23. Relationship Discovery Examples
  24. 24. Relation Discovery Case
      • Find suspicious relationships like:
        − Company in USA
        − Controls another company in USA
        − Through a company in an off-shore zone
      • Show news relevant to these companies
  25. 25. Offshore control example
      • Query: Find companies that control other companies in the same country through a company in an off-shore zone
      • How it works:
        − Establish a control relationship
        − Establish a company-country mapping
        − Establish an “off-shore criterion”
        − SPARQL it
  26. 26. Off-shore company control example
      SELECT *
      FROM onto:disable-sameAs
      WHERE {
          # c1 controls c3 indirectly, through the intermediate company c2
          ?c1 fibo-fnd-rel-rel:controls ?c2 .
          ?c2 fibo-fnd-rel-rel:controls ?c3 .
          # c1 and c3 are in the same country (both bind ?c1_country)
          ?c1 ff-map:orgCountry ?c1_country .
          ?c2 ff-map:orgCountry ?c2_country .
          ?c3 ff-map:orgCountry ?c1_country .
          # the intermediate company is in a different, off-shore country
          FILTER (?c1_country != ?c2_country)
          ?c2_country ff-map:hasOffshoreProvisions true .
      }
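The pattern in the query above can also be sketched outside a triple store. The minimal Python example below mirrors its three steps (control relationship, company-country mapping, off-shore criterion) over a small in-memory dataset. All company names, country codes and the off-shore set are made-up illustration data, not FactForge content, and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (controller, controlled) pairs.
CONTROLS = [
    ("AcmeHoldingUS", "ShellCoVG"),
    ("ShellCoVG", "AcmeRetailUS"),
    ("ShellCoVG", "OtherFR"),
    ("NordBankSE", "NordLeasingDE"),
    ("NordLeasingDE", "NordCardSE"),
]

# Hypothetical company -> country mapping (stands in for ff-map:orgCountry).
ORG_COUNTRY = {
    "AcmeHoldingUS": "US", "AcmeRetailUS": "US", "ShellCoVG": "VG",
    "OtherFR": "FR", "NordBankSE": "SE", "NordLeasingDE": "DE",
    "NordCardSE": "SE",
}

# Hypothetical off-shore criterion (stands in for ff-map:hasOffshoreProvisions).
OFFSHORE_COUNTRIES = {"VG"}


def offshore_control_chains(controls, org_country, offshore_countries):
    """Return (c1, c2, c3) where c1 controls c2, c2 controls c3,
    c1 and c3 share a country, and c2 sits in a different, off-shore country."""
    controlled_by = defaultdict(list)
    for controller, controlled in controls:
        controlled_by[controller].append(controlled)

    chains = []
    for c1, c2 in controls:
        c1_country = org_country.get(c1)
        c2_country = org_country.get(c2)
        if c1_country is None or c2_country == c1_country:
            continue
        if c2_country not in offshore_countries:
            continue
        for c3 in controlled_by.get(c2, []):
            if org_country.get(c3) == c1_country:
                chains.append((c1, c2, c3))
    return sorted(chains)


chains = offshore_control_chains(CONTROLS, ORG_COUNTRY, OFFSHORE_COUNTRIES)
```

Only the US-VG-US chain qualifies here: the SE-DE-SE chain fails the off-shore criterion, and the chain ending in FR fails the same-country condition.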
  27. 27. Media Monitoring Examples
  28. 28. Semantic Media Monitoring
      For each entity:
      • popularity trends
      • relevant news
      • related entities
      • knowledge graph information
      Try it at http://now.ontotext.com
  29. 29. Semantic Media Monitoring/Press-Clipping
      • We can trace references to a specific company in the news
        − This is fairly standard, but because we use state-of-the-art Named Entity Recognition technology, we can also handle syntactic variations in the names
        − More importantly, we correctly determine whether a given mention of “Paris” refers to Paris (the capital of France), Paris in Texas, Paris Hilton or Paris (the Greek hero)
      • We can trace and consolidate references to subsidiary companies
      • We have a comprehensive industry classification
        − It is the one from DBpedia, refined to accommodate identifier variations and specialization (e.g. a company classified as dbr:Bank is also considered classified as dbr:FinancialServices)
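The specialization rule in the last bullet above (a dbr:Bank is also a dbr:FinancialServices company) amounts to walking up a sector hierarchy. A minimal Python sketch follows, assuming the hierarchy is available as a child-to-parent mapping; only the dbr:Bank to dbr:FinancialServices link comes from the slide, while dbr:Investment_banking and the function name are hypothetical.

```python
# Child sector -> parent sector. The first entry is the slide's example;
# the second is a hypothetical extra level added for illustration.
SECTOR_PARENT = {
    "dbr:Bank": "dbr:FinancialServices",
    "dbr:Investment_banking": "dbr:Bank",
}


def expanded_sectors(sector, parent_of):
    """Return the sector followed by all of its ancestors, most specific first."""
    result = [sector]
    seen = {sector}
    while sector in parent_of:
        sector = parent_of[sector]
        if sector in seen:  # guard against accidental cycles in the mapping
            break
        seen.add(sector)
        result.append(sector)
    return result


bank_sectors = expanded_sectors("dbr:Bank", SECTOR_PARENT)
```

A company tagged with any sector in the returned list would then be counted under every broader sector as well.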
  30. 30. Media Monitoring Queries
      • F5: Mentions in the news of an organization and its related entities
      • F7: Most popular companies per industry, including children
      • F8: Regional exposition of company – normalized
  32. 32. News popularity ranking of companies
      • Rankings can be customized by specifying a geographic region, news category (e.g. business, sport, lifestyle) and time period
      • Unique features:
        − It is based on live streaming news
        − It also tracks mentions of subsidiaries
      • The rank uses the industry sectors of DBpedia with several refinements
        − About 40 top-level industry sectors
        − Sectors are linked in a hierarchical taxonomy (251 sectors in total)
        − Industry sectors are de-duplicated (Wikipedia uses about 9 000 distinct designators)
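One way to picture such a ranking is as a filtered count of news mentions, with each subsidiary's mentions credited to its parent. The Python sketch below is an illustration under that assumption, not the service's actual method; the company names, the parent mapping and the function name are all made up.

```python
from collections import Counter

# Hypothetical (company, news category) mention records.
MENTIONS = [
    ("ParentA", "Business"),
    ("SubA1", "Business"),
    ("SubA1", "Sports"),
    ("SubA2", "Business"),
    ("CompanyB", "Business"),
    ("CompanyB", "Business"),
]

# Hypothetical subsidiary -> parent mapping.
PARENT_OF = {"SubA1": "ParentA", "SubA2": "ParentA"}


def rank_companies(mentions, parent_of, category=None):
    """Count mentions per top-level company, optionally for one news category."""
    counts = Counter()
    for company, mention_category in mentions:
        if category is not None and mention_category != category:
            continue
        # Follow parent links so a subsidiary's mention counts for its parent.
        top, seen = company, {company}
        while top in parent_of and parent_of[top] not in seen:
            top = parent_of[top]
            seen.add(top)
        counts[top] += 1
    # Most mentioned first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


business_rank = rank_companies(MENTIONS, PARENT_OF, category="Business")
overall_rank = rank_companies(MENTIONS, PARENT_OF)
```

Region and time-period filters would work the same way as the category filter, given mention records that carry those fields.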
  33. 33. Rank uses NOW, FactForge and GraphDB
      • This ranking service is entirely based on FactForge
        − FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts, which is loaded in GraphDB
        − GraphDB is Ontotext’s semantic graph database engine
        − Unlike FactForge, this service is aimed at non-technical users, as it does not require any knowledge of SPARQL or other technology
        − It still lets users see the SPARQL query behind each ranking and customize it
      • Try http://rank.ontotext.com
  34. 34. rank.ontotext.com demonstrator
      Try it at http://rank.ontotext.com
  36. 36. Global Legal Entity Identifier as Open Data
  37. 37. Global Legal Entity Identifier (GLEI) data
      • Global Legal Entity Identifier Foundation (GLEIF) Utility data
        − The Global Legal Entity Identifier Foundation (GLEIF) is tasked with supporting the implementation and use of the Legal Entity Identifier (LEI)
        − The foundation is backed and overseen by the LEI Regulatory Oversight Committee
      • The data dump
        − We downloaded the XML data dump from https://www.gleif.org/en/lei-data/gleif-concatenated-file/download-the-concatenated-file
        − We used these 2 provided dumps:
          ‣ Level 1 Data (Who is who)
          ‣ Level 2 Data (Who Owns Whom)
  38. 38. Global Legal Entity Identifier (GLEI) data
      • RDF-ized company records
        − 20M explicit statements for 505 thousand organizations
          ▪ For comparison, there are 296,544 organizations in DBpedia, and D&B covers 200+ million
          ▪ A year ago GLEI had only 3M statements about 211 thousand organizations
        − 9,105 parent/child relationships, 16,150 associated organizations
      • 9,705 organizations from GLEI mapped to DBpedia
      • Modeling the company data to FIBO
      • XSPARQL as the transformation engine
      https://github.com/Ontotext-AD/GLEI
  39. 39. GLEI Data Model
  40. 40. GLEI Company Data Sample: ABN-AMRO
      lei:businessRegistry          Kamer van Koophandel
      lei:businessRegistryNumber    34334259
      lei:duplicateReference        data:549300T5O0D0T4V2ZB28
      lei:entityStatus              ACTIVE
      lei:headquartersCity          Amsterdam
      lei:headquartersState         Noord-Holland
      lei:legalForm                 NAAMLOZE VENNOOTSCHAP
      lei:legalName                 ABN AMRO Bank N.V.
      lei:lei                       BFXS5XCH7N0Y05NIXW11
      lei:registeredCity            Amsterdam
      lei:registeredCountry         NL
      lei:registeredPostCode        1082 PP
      lei:registeredState           Noord-Holland
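To illustrate what RDF-izing such a record means, the Python sketch below serializes a few of the sample's fields as N-Triples. It is a stand-in for illustration only: the project itself used XSPARQL, and the two namespace URIs here are placeholders on example.org, not the namespaces behind the lei: and data: prefixes.

```python
# Placeholder namespaces (assumptions, NOT the project's real URIs).
LEI_NS = "http://example.org/lei#"
DATA_NS = "http://example.org/data/"

# A subset of the ABN AMRO sample record from the slide.
RECORD = {
    "lei": "BFXS5XCH7N0Y05NIXW11",
    "legalName": "ABN AMRO Bank N.V.",
    "legalForm": "NAAMLOZE VENNOOTSCHAP",
    "entityStatus": "ACTIVE",
    "registeredCity": "Amsterdam",
    "registeredCountry": "NL",
}


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def record_to_ntriples(record):
    """Emit one triple per field, using the LEI code to build the subject URI."""
    subject = f"<{DATA_NS}{record['lei']}>"
    return [
        f'{subject} <{LEI_NS}{key}> "{escape_literal(value)}" .'
        for key, value in record.items()
    ]


triples = record_to_ntriples(RECORD)
```

Every value is emitted as a plain string literal here; a fuller model would turn fields such as the country into links to other resources.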
  41. 41. GLEI Data Stats: 2016 (OLD)
      #    Ultimate parent                    Children    Country
      1    The Goldman Sachs Group, Inc.      1 851       US
      2    United Technologies Corporation    427         US
      3    Honeywell International Inc.       341         US
      4    Morgan Stanley                     228         US
      5    Cargill, Incorporated              217         US
      6    1832 Asset Management L.P.         202         CA
      7    Aegon N.V.                         174         NL
      8    Union Bancaire Privée, UBP SA      138         CH
      9    Citigroup Inc.                     135         US
      10   State Street Corporation           128         US
      #    Country               Companies
      1    dbr:United_States     103 548
      2    dbr:Canada            17 425
      3    dbr:Luxembourg        13 984
      4    dbr:Sweden            7 934
      5    dbr:United_Kingdom    7 421
      6    dbr:Belgium           6 868
      7    dbr:Ireland           4 762
      8    dbr:Australia         4 385
      9    dbr:Germany           3 039
      10   dbr:Netherlands       2 561
  42. 42. GLEI Data Stats: 2017
      #    Ultimate parent                                       Children    Country
      1    LLOYDS BANKING GROUP PLC                              619         GB
      2    HSBC HOLDINGS PLC                                     542         GB
      3    THE ROYAL BANK OF SCOTLAND …                          378         GB
      4    DEUTSCHE BANK AKTIENGESELLSCHAFT                      174         DE
      5    BANK OF SCOTLAND PLC                                  111         GB
      6    LLOYDS BANK PLC                                       93          GB
      7    Swedbank AB (Publ)                                    90          SE
      8    ROYAL LONDON MUTUAL INSURANCE SOCIETY,LIMITED(THE)    89          GB
      9    Lincoln Investment Advisors Corporation               88          US
      10   Swedbank Robur AB                                     85          SE
      #    Country    Companies
      1    US         136 889
      2    IT         50 021
      3    DE         48 850
      4    FR         33 412
      5    GB         32 015
      6    CA         22 107
      7    LU         22 075
      8    NL         20 327
      9    ES         19 569
      10   SE         11 272
  43. 43. Mapping Datasets to DBpedia with the GraphDB Lucene Connector
  44. 44. Mapping datasets to DBpedia
      • The task: map people, organizations and locations to IDs in DBpedia
        − This lets us analyze the original data with the extra information available in DBpedia and in datasets linked to it, e.g. Geonames
        − For instance, the GLEI data contains no extra information about the companies, such as industry sector or products
      • Specific conditions: we had to map by names and locations
        − The GLEI and DBpedia data have few features in common
          ▪ Address and country attributes are present, but they proved only marginally useful for mapping
        − We mapped locations only at the level of countries and/or cities, not finer-grained locations
          ▪ For this purpose DBpedia geographic data is sufficient, and it is also well mapped to GeoNames
  45. 45. Mapping datasets to DBpedia (2)
      • We used the GraphDB connector to Lucene for these mappings
        − Using the GraphDB connector, a Lucene index was created for Organizations and People from DBpedia, indexing all sorts of names, descriptions and other textual information for each entity
        − The mapping process consists mostly of using the name of the entity from the 3rd-party dataset (in this case Panama Papers or GLEI) as a full-text search (FTS) query, embedded in a SPARQL query
      • What does Lucene do better than SPARQL?
        − When there is little information other than the name, we benefit from Lucene's free-text indexing, because it deals well with minor syntactic variations and sorts the results by relevance
        − When mapping 300 000 organizations against another 500 000 organizations without a key, a SPARQL query has to consider 300 000 x 500 000 pairs, which is slower than 300 000 Lucene queries
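The cost argument above can be made concrete with a toy inverted index. The Python sketch below is not Lucene and not the GraphDB connector API; it only illustrates why one index lookup per source name beats comparing every pair, and how tokenized matching tolerates small spelling variations. The function names and the overlap-based score are assumptions chosen for illustration.

```python
import re
from collections import Counter, defaultdict

# The slide's numbers: a keyless join considers every pair of organizations.
PAIRWISE_COMPARISONS = 300_000 * 500_000  # 150 billion candidate pairs
INDEX_QUERIES = 300_000                   # one index lookup per source name


def tokens(name):
    """Lower-case a name and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def build_index(names):
    """Map each token to the positions of the names that contain it."""
    index = defaultdict(set)
    for position, name in enumerate(names):
        for token in tokens(name):
            index[token].add(position)
    return index


def search(query, names, index, limit=3):
    """Return up to `limit` (score, name) pairs, best first.
    The score is the token overlap ratio, a crude stand-in for Lucene relevance."""
    query_tokens = tokens(query)
    shared = Counter()
    for token in query_tokens:
        for position in index.get(token, ()):
            shared[position] += 1
    scored = [
        (count / len(query_tokens | tokens(names[position])), names[position])
        for position, count in shared.items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:limit]


TARGET_NAMES = ["ABN AMRO Bank N.V.", "Deutsche Bank", "Swedbank AB"]
INDEX = build_index(TARGET_NAMES)
results = search("ABN-AMRO Bank NV", TARGET_NAMES, INDEX)
```

The query only touches names that share at least one token with it, so unrelated entries such as "Swedbank AB" are never scored at all.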
  46. 46. Mapping GLEI to DBpedia
      • Data pre-processing in DBpedia
        − We generated a primary city and a primary country for each organization in DBpedia
          ▪ We also cleaned up data about HQ locations, etc.
          ▪ We used a series of SPARQL queries for this
      • Iterative matching
        − First match the candidates that have high relevance and best satisfy the location and country constraints
      • Matching outcome
        − skos:exactMatch: 3880 matches
        − skos:closeMatch: 5825 matches
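The iterative idea can be sketched as two passes: a strict pass that yields skos:exactMatch, then a relaxed pass over whatever is still unmatched that yields skos:closeMatch. The Python example below is a guess at the general shape, not the authors' procedure: the criteria (normalized-name equality, then a difflib similarity threshold within the same country), the 0.7 threshold and all identifiers are assumptions, and the nested loops are only acceptable at toy scale.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case, drop dots and commas, and collapse whitespace."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


def match_iteratively(sources, targets, close_threshold=0.7):
    """Map source ids to (target id, SKOS relation) in two passes."""
    matches = {}

    # Pass 1 (strict): identical normalized name and identical country.
    for source in sources:
        for target in targets:
            same_name = normalize(source["name"]) == normalize(target["name"])
            if same_name and source["country"] == target["country"]:
                matches[source["id"]] = (target["id"], "skos:exactMatch")
                break

    # Pass 2 (relaxed): best similar name in the same country, unmatched only.
    for source in sources:
        if source["id"] in matches:
            continue
        best = None
        for target in targets:
            if source["country"] != target["country"]:
                continue
            score = SequenceMatcher(
                None, normalize(source["name"]), normalize(target["name"])
            ).ratio()
            if score >= close_threshold and (best is None or score > best[0]):
                best = (score, target["id"])
        if best is not None:
            matches[source["id"]] = (best[1], "skos:closeMatch")
    return matches


# Hypothetical records; the ids are illustrative, not real LEI or DBpedia ids.
SOURCES = [
    {"id": "lei:1", "name": "ABN AMRO Bank N.V.", "country": "NL"},
    {"id": "lei:2", "name": "Swedbank AB (Publ)", "country": "SE"},
    {"id": "lei:3", "name": "Unrelated Widgets Ltd", "country": "GB"},
]
TARGETS = [
    {"id": "dbr:ABN_AMRO", "name": "ABN AMRO Bank NV", "country": "NL"},
    {"id": "dbr:Swedbank", "name": "Swedbank AB", "country": "SE"},
]
matches = match_iteratively(SOURCES, TARGETS)
```

Running the strict pass first keeps the high-confidence links separate from the weaker ones, which is what the two SKOS relations in the outcome express.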
  47. 47. Thank you!
      Experience the technology with our demonstrators:
      • NOW: Semantic News Portal, http://now.ontotext.com
      • RANK: News popularity ranking for companies, http://rank.ontotext.com
      • FactForge: Hub for open data and news about People and Organizations, http://factforge.net