SlideShare a Scribd company logo
Digital Enterprise Research Institute                                                            www.deri.ie




                  Challenges Ahead For Converging Financial
                                    Data
                                                Edward Curry1, Andreas Harth2, Sean O’Riain1

                                                DERI, NUI Galway, Ireland1
                 2   Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB),
                                        Karlsruher Institut für Technologie (KIT)




W3C Workshop on Improving Access to
Financial Data on the Web

October 2009, Arlington, Virginia USA
© Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Agenda
Digital Enterprise Research Institute                          www.deri.ie




       n    Motivation - Financial Data Ecosystem
              ¨    Data Providers
              ¨    Data Formats
              ¨    Data Consumers
       n    Converging Financial Data from Multiple Sources
              ¨    Entity Centric Approach
              ¨    Architecture
              ¨    Identity Mismatch
              ¨    Data Query
       n    Data Integration Challenges
       n    Recommendations
Financial Data Ecosystem
Digital Enterprise Research Institute                www.deri.ie


                    Information             Information
                    Providers               Consumers




                                        ?
Financial Information Providers
Digital Enterprise Research Institute                          www.deri.ie



      n    Individuals: e.g. CEOs reporting equity sale
      n    Companies: e.g. 10-K filing
      n    NGOs: e.g. sector-wide lobbying groups
      n    Government: e.g. regulators, central banks, statistics
            offices
      n    Worldwide organisations: UN, OECD
      n    Academics: various economists, public policy
      n    ...

      n    Publicly available datasets, purchased datasets or in-
            house sources
Various Data Formats
Digital Enterprise Research Institute                                         www.deri.ie



      n    Unstructed Text
             ¨    News articles, press releases, raw transcripts of investor calls
      n    Hypertext
             ¨    Coporate websites, goverment websites, ...
      n    Spreadsheets, et al.
             ¨    CSV files, word docs, pdf, powerpoint, ...
      n    Strucutred Data
             ¨    XML, XBRL, CSV, SDMX, ...
      n    Graph Structured Data in RDF
             ¨    DBPedia, CrunchBase, RSS-CB, ...
Financial Information Consumers
Digital Enterprise Research Institute                                        www.deri.ie



      n    Competitive Analysis
             ¨    Mash-up of financial figures and analyst commentary for
                   decision support

      n    Regulatory Compliance
             ¨    Forensic Economics
             ¨    Spotting patterns or conditions that support fraud or money
                   laundering

      n    Investment Analysis
             ¨    Individual/Institutional investors
             ¨    Transparent fund comparisons
             ¨    Evaluate potential fund return
Goal
Digital Enterprise Research Institute                       www.deri.ie



      n    Integrate data for:
             ¨    Central access
             ¨    Cross document analysis


      n    Our group works in data integration and have applied
            our approach to pilots in the financial services
            industry

      n    Report on experiences and lessons learned
Converging Financial Data from Multiple
         Sources
Digital Enterprise Research Institute                                        www.deri.ie



   n    Provide common data platform for search, browsing,
         analysis, and interactive visualisations across sources

   n    Entity centric approach
           ¨    Single data view allowing information filtering and cross
                 analysis
           ¨    Consolidate data into coherent graph 'mashed up' from
                 potentially thousands of sources

   n    Key challenge is semantic integration of structured
         and unstructured data from the open Web and internal
         corporate data sources
Converging Financial Data from Multiple
         Sources
Digital Enterprise Research Institute                                        www.deri.ie



   n    Large graph of RDF entities

   n    Entities typed according to what they describe
           ¨    People, locations, organizations, publications as well as
                 documents
           ¨    Inter-relations and structured descriptions of entities

   n    Entities have specified relations to other entities
           ¨    People can work for companies, people know other people,
                 people author documents, organisations are based in
                 locations, and so on
Data Integration Approach
Digital Enterprise Research Institute                            www.deri.ie



      n    Lifting data sources to common format, in our case
            RDF (Resource Description Format)

      n    Integrating the disparate datasets into a holistic
            dataset by aligning entities and concepts

      n    Run domain/task specific analysis algorithms on
            integrated data

      n    Interactive browsing and exploration of integrated
            data or results of algorithmic analysis
Data Integration Approach
Digital Enterprise Research Institute   www.deri.ie
Architecture
Digital Enterprise Research Institute   www.deri.ie
Identity Mismatch
Digital Enterprise Research Institute                                      www.deri.ie


  n    Need to connect sources that may describe the same
        data on a particular entity

  n    Case studies analyzing connections between people
        and organizations
         ¨    SEC filings (Form 4) identified 69K people connected to 80K
               organizations
         ¨    Same analysis on database describing companies produced
               122K people connected to 140K organizations
         ¨    Data needed to be enrich and interlinked using entity
               consolidation (a.k.a. object consolidation) to avoid having the
               knowledge split over numerous instances
         ¨    Ontology-based disambiguation
Data Query
Digital Enterprise Research Institute                                   www.deri.ie



  n    SPARQL, the semantic query language allows queries/
        questions to be asked:

         ¨    What do the companies ‘Microsoft’ and ‘IBM’ have in common?


         ¨    What competitors of ‘HP’ are in ‘Arlington’?


         ¨    What’s the relationship between ‘Microsoft’ and ‘IBM’?
Data Integration Challenges
Digital Enterprise Research Institute                                    www.deri.ie



  n    Text/Data Mismatch
         ¨    Human language often ambiguous
         ¨    Same company might be referred to in several variations
               (e.g. IBM, International Business Machines, Big Blue)
         ¨    Ambiguity makes cross-linking with structured data difficult
  n    Object Identity and Separate Schema
         ¨    Sources differ in how they state the same fact
         ¨    Differences on level of individual objects and schema
         ¨    SEC Central Index Key (CIK) to identify people (CEOs, CFOs),
               companies, and financial instruments
         ¨    DBpedia use URIs to identify same entities
         ¨    Methods have to be in place for reconciling different
               representations of objects and schema
Data Integration Challenges
Digital Enterprise Research Institute                                          www.deri.ie



      n    Abstraction Levels (Data Context)
             ¨    Financial data sources provide data at incompatible levels of
                   abstraction
             ¨    Classify data in taxonomies pertinent to a certain sector
             ¨    Differences in legislation on book-keeping (e.g. Indicators
                   from Euro regulators may not be directly comparable with
                   indicators from US-based regulators)
             ¨    Differences in geographic aggregation (e.g. region data from
                   one source and country-level data from another, IBM Ireland
                   Ltd, IBM Europe, IBM Global,…)
Data Integration Challenges
Digital Enterprise Research Institute                                        www.deri.ie



      n    Data Quality
             ¨    General challenge integrating data from multiple sources
             ¨    Errors in signage, amounts, labelling, and classification can
                   seriously impede utility of systems operating on such data
             ¨    Combining erroneous data aggravates the problem
             ¨    Within open environment data aggregator has little or no
                   influence on the data publisher
             ¨    Challenge for data publishers/consumers to coordinate to fix
                   problems in data or blacklist sites providing unreliable data
Recommendations
Digital Enterprise Research Institute                                         www.deri.ie


      n    Agree approach to the specification and use of
            common identifiers or at least their mappings

      n    Adhering to common publishing method reduces
            integration effort and facilitates data reuse
             ¨    Linked Data principles


      n    Convergence between data providers requires
            coordination and time
             ¨    No need for “Big Bang” integration
             ¨    Follow a pay-as-you-go iterative approach to integration
Digital Enterprise Research Institute                                         www.deri.ie




                                   Thank you for listening

            E. Curry, A. Harth, and S. O’Riain, “Challenges Ahead for Converging
            Financial Data,” in Proceedings of the XBRL/W3C Workshop on
            Improving Access to Financial Data on the Web, 2009.

More Related Content

What's hot

Using Linked Data and the Internet of Things for Energy Management
Using Linked Data and the Internet of Things for Energy ManagementUsing Linked Data and the Internet of Things for Energy Management
Using Linked Data and the Internet of Things for Energy Management
Edward Curry
 
Developing an Sustainable IT Capability: Lessons From Intel's Journey
Developing an Sustainable IT Capability: Lessons From Intel's JourneyDeveloping an Sustainable IT Capability: Lessons From Intel's Journey
Developing an Sustainable IT Capability: Lessons From Intel's Journey
Edward Curry
 
Dealing with Semantic Heterogeneity in Real-Time Information
Dealing with Semantic Heterogeneity in Real-Time InformationDealing with Semantic Heterogeneity in Real-Time Information
Dealing with Semantic Heterogeneity in Real-Time Information
Edward Curry
 
Big Data Public Private Forum (BIG) @ European Data Forum 2013
Big Data Public Private Forum (BIG) @ European Data Forum 2013Big Data Public Private Forum (BIG) @ European Data Forum 2013
Big Data Public Private Forum (BIG) @ European Data Forum 2013
Edward Curry
 
Crowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth SciencesCrowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth SciencesEdward Curry
 
Towards a BIG Data Public Private Partnership
Towards a BIG Data Public Private PartnershipTowards a BIG Data Public Private Partnership
Towards a BIG Data Public Private Partnership
Edward Curry
 
Querying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data WebQuerying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data Web
Edward Curry
 
Enterprise Energy Management using a Linked Dataspace for Energy Intelligence
Enterprise Energy Management using a Linked Dataspace for Energy IntelligenceEnterprise Energy Management using a Linked Dataspace for Energy Intelligence
Enterprise Energy Management using a Linked Dataspace for Energy Intelligence
Edward Curry
 
The Big Data Value PPP: A Standardisation Opportunity for Europe
The Big Data Value PPP: A Standardisation Opportunity for EuropeThe Big Data Value PPP: A Standardisation Opportunity for Europe
The Big Data Value PPP: A Standardisation Opportunity for Europe
Edward Curry
 
Linked Water Data For Water Information Management
Linked Water Data For Water Information ManagementLinked Water Data For Water Information Management
Linked Water Data For Water Information Management
Edward Curry
 
Towards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing SystemsTowards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing Systems
Edward Curry
 
Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...
Edward Curry
 
Sustainable IT for Energy Management: Approaches, Challenges, and Trends
Sustainable IT for Energy Management: Approaches, Challenges, and TrendsSustainable IT for Energy Management: Approaches, Challenges, and Trends
Sustainable IT for Energy Management: Approaches, Challenges, and Trends
Edward Curry
 
Collaborative Data Management: How Crowdsourcing Can Help To Manage Data
Collaborative Data Management: How Crowdsourcing Can Help To Manage DataCollaborative Data Management: How Crowdsourcing Can Help To Manage Data
Collaborative Data Management: How Crowdsourcing Can Help To Manage Data
Edward Curry
 
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Edward Curry
 
Key Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in EuropeKey Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in Europe
Edward Curry
 
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in CrowdsourcingSLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
Edward Curry
 
Interactive Water Services: The Waternomics Approach
Interactive Water Services: The Waternomics ApproachInteractive Water Services: The Waternomics Approach
Interactive Water Services: The Waternomics Approach
Edward Curry
 
Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Edward Curry
 
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
Edward Curry
 

What's hot (20)

Using Linked Data and the Internet of Things for Energy Management
Using Linked Data and the Internet of Things for Energy ManagementUsing Linked Data and the Internet of Things for Energy Management
Using Linked Data and the Internet of Things for Energy Management
 
Developing an Sustainable IT Capability: Lessons From Intel's Journey
Developing an Sustainable IT Capability: Lessons From Intel's JourneyDeveloping an Sustainable IT Capability: Lessons From Intel's Journey
Developing an Sustainable IT Capability: Lessons From Intel's Journey
 
Dealing with Semantic Heterogeneity in Real-Time Information
Dealing with Semantic Heterogeneity in Real-Time InformationDealing with Semantic Heterogeneity in Real-Time Information
Dealing with Semantic Heterogeneity in Real-Time Information
 
Big Data Public Private Forum (BIG) @ European Data Forum 2013
Big Data Public Private Forum (BIG) @ European Data Forum 2013Big Data Public Private Forum (BIG) @ European Data Forum 2013
Big Data Public Private Forum (BIG) @ European Data Forum 2013
 
Crowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth SciencesCrowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth Sciences
 
Towards a BIG Data Public Private Partnership
Towards a BIG Data Public Private PartnershipTowards a BIG Data Public Private Partnership
Towards a BIG Data Public Private Partnership
 
Querying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data WebQuerying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data Web
 
Enterprise Energy Management using a Linked Dataspace for Energy Intelligence
Enterprise Energy Management using a Linked Dataspace for Energy IntelligenceEnterprise Energy Management using a Linked Dataspace for Energy Intelligence
Enterprise Energy Management using a Linked Dataspace for Energy Intelligence
 
The Big Data Value PPP: A Standardisation Opportunity for Europe
The Big Data Value PPP: A Standardisation Opportunity for EuropeThe Big Data Value PPP: A Standardisation Opportunity for Europe
The Big Data Value PPP: A Standardisation Opportunity for Europe
 
Linked Water Data For Water Information Management
Linked Water Data For Water Information ManagementLinked Water Data For Water Information Management
Linked Water Data For Water Information Management
 
Towards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing SystemsTowards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing Systems
 
Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...
 
Sustainable IT for Energy Management: Approaches, Challenges, and Trends
Sustainable IT for Energy Management: Approaches, Challenges, and TrendsSustainable IT for Energy Management: Approaches, Challenges, and Trends
Sustainable IT for Energy Management: Approaches, Challenges, and Trends
 
Collaborative Data Management: How Crowdsourcing Can Help To Manage Data
Collaborative Data Management: How Crowdsourcing Can Help To Manage DataCollaborative Data Management: How Crowdsourcing Can Help To Manage Data
Collaborative Data Management: How Crowdsourcing Can Help To Manage Data
 
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
 
Key Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in EuropeKey Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in Europe
 
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in CrowdsourcingSLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
 
Interactive Water Services: The Waternomics Approach
Interactive Water Services: The Waternomics ApproachInteractive Water Services: The Waternomics Approach
Interactive Water Services: The Waternomics Approach
 
Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...
 
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
 

Viewers also liked

A Capability Maturity Framework for Sustainable ICT
A Capability Maturity Framework for Sustainable ICTA Capability Maturity Framework for Sustainable ICT
A Capability Maturity Framework for Sustainable ICT
Edward Curry
 
The Role of Community-Driven Data Curation for Enterprises
The Role of Community-Driven Data Curation for EnterprisesThe Role of Community-Driven Data Curation for Enterprises
The Role of Community-Driven Data Curation for Enterprises
Edward Curry
 
Influenciencia del mundo emocional en el aprendizaje
Influenciencia del mundo emocional en el aprendizajeInfluenciencia del mundo emocional en el aprendizaje
Influenciencia del mundo emocional en el aprendizaje
Instituto Familia y Adopción
 
Big Data Analytics: A New Business Opportunity
Big Data Analytics: A New Business OpportunityBig Data Analytics: A New Business Opportunity
Big Data Analytics: A New Business Opportunity
Edward Curry
 
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupCrowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Edward Curry
 
Crowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data ManagementCrowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data Management
Edward Curry
 
Open Data Innovation in Smart Cities: Challenges and Trends
Open Data Innovation in Smart Cities: Challenges and TrendsOpen Data Innovation in Smart Cities: Challenges and Trends
Open Data Innovation in Smart Cities: Challenges and Trends
Edward Curry
 

Viewers also liked (7)

A Capability Maturity Framework for Sustainable ICT
A Capability Maturity Framework for Sustainable ICTA Capability Maturity Framework for Sustainable ICT
A Capability Maturity Framework for Sustainable ICT
 
The Role of Community-Driven Data Curation for Enterprises
The Role of Community-Driven Data Curation for EnterprisesThe Role of Community-Driven Data Curation for Enterprises
The Role of Community-Driven Data Curation for Enterprises
 
Influenciencia del mundo emocional en el aprendizaje
Influenciencia del mundo emocional en el aprendizajeInfluenciencia del mundo emocional en el aprendizaje
Influenciencia del mundo emocional en el aprendizaje
 
Big Data Analytics: A New Business Opportunity
Big Data Analytics: A New Business OpportunityBig Data Analytics: A New Business Opportunity
Big Data Analytics: A New Business Opportunity
 
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupCrowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
 
Crowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data ManagementCrowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data Management
 
Open Data Innovation in Smart Cities: Challenges and Trends
Open Data Innovation in Smart Cities: Challenges and TrendsOpen Data Innovation in Smart Cities: Challenges and Trends
Open Data Innovation in Smart Cities: Challenges and Trends
 

Similar to Challenges Ahead for Converging Financial Data

Innovation ecosystems value co-creation
Innovation ecosystems value co-creationInnovation ecosystems value co-creation
Innovation ecosystems value co-creation
Innovation Ecosystems Network (IEN)
 
The open semantic enterprise enterprise data meets web data
The open semantic enterprise   enterprise data meets web dataThe open semantic enterprise   enterprise data meets web data
The open semantic enterprise enterprise data meets web data
Georg Guentner
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsRichard Cyganiak
 
Hello Open World - Semtech 2009
Hello Open World - Semtech 2009Hello Open World - Semtech 2009
Hello Open World - Semtech 2009
Alexandre Passant
 
Reading Group 2013 (DERI NUIG)
Reading Group 2013 (DERI NUIG)Reading Group 2013 (DERI NUIG)
Reading Group 2013 (DERI NUIG)
Bianca Pereira
 
Introduction to Open Data
Introduction to Open DataIntroduction to Open Data
Introduction to Open Data
Derilinx
 
Innovation Ecosystems at EBRF 2010, Nokia, Finland
Innovation Ecosystems at EBRF 2010, Nokia, FinlandInnovation Ecosystems at EBRF 2010, Nokia, Finland
Innovation Ecosystems at EBRF 2010, Nokia, FinlandJukka Huhtamäki
 
How to Publish Open Data
How to Publish Open DataHow to Publish Open Data
How to Publish Open Data
Richard Cyganiak
 
i4Trust - Overview
i4Trust - Overviewi4Trust - Overview
i4Trust - Overview
FIWARE
 
i4Trust - Overview
i4Trust - Overviewi4Trust - Overview
i4Trust - Overview
FIWARE
 
Smart analytics - Data Visualisation and Predictive Analytics solutions for F...
Smart analytics - Data Visualisation and Predictive Analytics solutions for F...Smart analytics - Data Visualisation and Predictive Analytics solutions for F...
Smart analytics - Data Visualisation and Predictive Analytics solutions for F...
Srini Bezwada
 
DERI Overview March 2009
DERI Overview March 2009DERI Overview March 2009
DERI Overview March 2009
Liam Ó Móráin
 
1. The Importance of Graphs in Government
1. The Importance of Graphs in Government1. The Importance of Graphs in Government
1. The Importance of Graphs in Government
Neo4j
 
Envisioning a discussion dashboard for collective intelligence of web convers...
Envisioning a discussion dashboard for collective intelligence of web convers...Envisioning a discussion dashboard for collective intelligence of web convers...
Envisioning a discussion dashboard for collective intelligence of web convers...
jodischneider
 
Hello Open World - The Web of Data for the Pragmatic Developer
Hello Open World - The Web of Data for the Pragmatic DeveloperHello Open World - The Web of Data for the Pragmatic Developer
Hello Open World - The Web of Data for the Pragmatic Developer
Alexandre Passant
 
How to Monetize Your Data Assets and Gain a Competitive Advantage
How to Monetize Your Data Assets and Gain a Competitive AdvantageHow to Monetize Your Data Assets and Gain a Competitive Advantage
How to Monetize Your Data Assets and Gain a Competitive Advantage
CCG
 
Why CxOs care about Data Governance; the roadblock to digital mastery
Why CxOs care about Data Governance; the roadblock to digital masteryWhy CxOs care about Data Governance; the roadblock to digital mastery
Why CxOs care about Data Governance; the roadblock to digital mastery
Coert Du Plessis (杜康)
 
Federating Distributed Social Data to Build an Interlinked Online Information...
Federating Distributed Social Data to Build an Interlinked Online Information...Federating Distributed Social Data to Build an Interlinked Online Information...
Federating Distributed Social Data to Build an Interlinked Online Information...
Alexandre Passant
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
Denodo
 

Similar to Challenges Ahead for Converging Financial Data (20)

Innovation ecosystems value co-creation
Innovation ecosystems value co-creationInnovation ecosystems value co-creation
Innovation ecosystems value co-creation
 
The open semantic enterprise enterprise data meets web data
The open semantic enterprise   enterprise data meets web dataThe open semantic enterprise   enterprise data meets web data
The open semantic enterprise enterprise data meets web data
 
What is SDMX-RDF?
What is SDMX-RDF?What is SDMX-RDF?
What is SDMX-RDF?
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF Datasets
 
Hello Open World - Semtech 2009
Hello Open World - Semtech 2009Hello Open World - Semtech 2009
Hello Open World - Semtech 2009
 
Reading Group 2013 (DERI NUIG)
Reading Group 2013 (DERI NUIG)Reading Group 2013 (DERI NUIG)
Reading Group 2013 (DERI NUIG)
 
Introduction to Open Data
Introduction to Open DataIntroduction to Open Data
Introduction to Open Data
 
Innovation Ecosystems at EBRF 2010, Nokia, Finland
Innovation Ecosystems at EBRF 2010, Nokia, FinlandInnovation Ecosystems at EBRF 2010, Nokia, Finland
Innovation Ecosystems at EBRF 2010, Nokia, Finland
 
How to Publish Open Data
How to Publish Open DataHow to Publish Open Data
How to Publish Open Data
 
i4Trust - Overview
i4Trust - Overviewi4Trust - Overview
i4Trust - Overview
 
i4Trust - Overview
i4Trust - Overviewi4Trust - Overview
i4Trust - Overview
 
Smart analytics - Data Visualisation and Predictive Analytics solutions for F...
Smart analytics - Data Visualisation and Predictive Analytics solutions for F...Smart analytics - Data Visualisation and Predictive Analytics solutions for F...
Smart analytics - Data Visualisation and Predictive Analytics solutions for F...
 
DERI Overview March 2009
DERI Overview March 2009DERI Overview March 2009
DERI Overview March 2009
 
1. The Importance of Graphs in Government
1. The Importance of Graphs in Government1. The Importance of Graphs in Government
1. The Importance of Graphs in Government
 
Envisioning a discussion dashboard for collective intelligence of web convers...
Envisioning a discussion dashboard for collective intelligence of web convers...Envisioning a discussion dashboard for collective intelligence of web convers...
Envisioning a discussion dashboard for collective intelligence of web convers...
 
Hello Open World - The Web of Data for the Pragmatic Developer
Hello Open World - The Web of Data for the Pragmatic DeveloperHello Open World - The Web of Data for the Pragmatic Developer
Hello Open World - The Web of Data for the Pragmatic Developer
 
How to Monetize Your Data Assets and Gain a Competitive Advantage
How to Monetize Your Data Assets and Gain a Competitive AdvantageHow to Monetize Your Data Assets and Gain a Competitive Advantage
How to Monetize Your Data Assets and Gain a Competitive Advantage
 
Why CxOs care about Data Governance; the roadblock to digital mastery
Why CxOs care about Data Governance; the roadblock to digital masteryWhy CxOs care about Data Governance; the roadblock to digital mastery
Why CxOs care about Data Governance; the roadblock to digital mastery
 
Federating Distributed Social Data to Build an Interlinked Online Information...
Federating Distributed Social Data to Build an Interlinked Online Information...Federating Distributed Social Data to Build an Interlinked Online Information...
Federating Distributed Social Data to Build an Interlinked Online Information...
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 

Recently uploaded

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 

Challenges Ahead for Converging Financial Data

  • 1. Digital Enterprise Research Institute www.deri.ie Challenges Ahead For Converging Financial Data Edward Curry1, Andreas Harth2, Sean O’Riain1 DERI, NUI Galway, Ireland1 2 Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB), Karlsruher Institut für Technologie (KIT) W3C Workshop on Improving Access to Financial Data on the Web October 2009, Arlington, Virginia USA © Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
  • 2. Agenda Digital Enterprise Research Institute www.deri.ie n  Motivation - Financial Data Ecosystem ¨  Data Providers ¨  Data Formats ¨  Data Consumers n  Converging Financial Data from Multiple Sources ¨  Entity Centric Approach ¨  Architecture ¨  Identity Mismatch ¨  Data Query n  Data Integration Challenges n  Recommendations
  • 3. Financial Data Ecosystem Digital Enterprise Research Institute www.deri.ie Information Information Providers Consumers ?
  • 4. Financial Information Providers Digital Enterprise Research Institute www.deri.ie n  Individuals: e.g. CEOs reporting equity sale n  Companies: e.g. 10-K filing n  NGOs: e.g. sector-wide lobbying groups n  Government: e.g. regulators, central banks, statistics offices n  Worldwide organisations: UN, OECD n  Academics: various economists, public policy n  ... n  Publicly available datasets, purchased datasets or in- house sources
  • 5. Various Data Formats Digital Enterprise Research Institute www.deri.ie n  Unstructed Text ¨  News articles, press releases, raw transcripts of investor calls n  Hypertext ¨  Coporate websites, goverment websites, ... n  Spreadsheets, et al. ¨  CSV files, word docs, pdf, powerpoint, ... n  Strucutred Data ¨  XML, XBRL, CSV, SDMX, ... n  Graph Structured Data in RDF ¨  DBPedia, CrunchBase, RSS-CB, ...
  • 6. Financial Information Consumers Digital Enterprise Research Institute www.deri.ie n  Competitive Analysis ¨  Mash-up of financial figures and analyst commentary for decision support n  Regulatory Compliance ¨  Forensic Economics ¨  Spotting patterns or conditions that support fraud or money laundering n  Investment Analysis ¨  Individual/Institutional investors ¨  Transparent fund comparisons ¨  Evaluate potential fund return
  • 7. Goal Digital Enterprise Research Institute www.deri.ie n  Integrate data for: ¨  Central access ¨  Cross document analysis n  Our group works in data integration and have applied our approach to pilots in the financial services industry n  Report on experiences and lessons learned
  • 8. Converging Financial Data from Multiple Sources Digital Enterprise Research Institute www.deri.ie n  Provide common data platform for search, browsing, analysis, and interactive visualisations across sources n  Entity centric approach ¨  Single data view allowing information filtering and cross analysis ¨  Consolidate data into coherent graph 'mashed up' from potentially thousands of sources n  Key challenge is semantic integration of structured and unstructured data from the open Web and internal corporate data sources
  • 9. Converging Financial Data from Multiple Sources Digital Enterprise Research Institute www.deri.ie n  Large graph of RDF entities n  Entities typed according to what they describe ¨  People, locations, organizations, publications as well as documents ¨  Inter-relations and structured descriptions of entities n  Entities have specified relations to other entities ¨  People can work for companies, people know other people, people author documents, organisations are based in locations, and so on
  • 10. Data Integration Approach Digital Enterprise Research Institute www.deri.ie n  Lifting data sources to common format, in our case RDF (Resource Description Format) n  Integrating the disparate datasets into a holistic dataset by aligning entities and concepts n  Run domain/task specific analysis algorithms on integrated data n  Interactive browsing and exploration of integrated data or results of algorithmic analysis
  • 11. Data Integration Approach Digital Enterprise Research Institute www.deri.ie
  • 13. Identity Mismatch Digital Enterprise Research Institute www.deri.ie n  Need to connect sources that may describe the same data on a particular entity n  Case studies analyzing connections between people and organizations ¨  SEC filings (Form 4) identified 69K people connected to 80K organizations ¨  Same analysis on database describing companies produced 122K people connected to 140K organizations ¨  Data needed to be enrich and interlinked using entity consolidation (a.k.a. object consolidation) to avoid having the knowledge split over numerous instances ¨  Ontology-based disambiguation
  • 14. Data Query Digital Enterprise Research Institute www.deri.ie n  SPARQL, the semantic query language allows queries/ questions to be asked: ¨  What do the companies ‘Microsoft’ and ‘IBM’ have in common? ¨  What competitors of ‘HP’ are in ‘Arlington’? ¨  What’s the relationship between ‘Microsoft’ and ‘IBM’?
  • 15. Data Integration Challenges Digital Enterprise Research Institute www.deri.ie n  Text/Data Mismatch ¨  Human language often ambiguous ¨  Same company might be referred to in several variations (e.g. IBM, International Business Machines, Big Blue) ¨  Ambiguity makes cross-linking with structured data difficult n  Object Identity and Separate Schema ¨  Sources differ in how they state the same fact ¨  Differences on level of individual objects and schema ¨  SEC Central Index Key (CIK) to identify people (CEOs, CFOs), companies, and financial instruments ¨  DBpedia use URIs to identify same entities ¨  Methods have to be in place for reconciling different representations of objects and schema
  • 16. Data Integration Challenges Digital Enterprise Research Institute www.deri.ie n  Abstraction Levels (Data Context) ¨  Financial data sources provide data at incompatible levels of abstraction ¨  Classify data in taxonomies pertinent to a certain sector ¨  Differences in legislation on book-keeping (e.g. Indicators from Euro regulators may not be directly comparable with indicators from US-based regulators) ¨  Differences in geographic aggregation (e.g. region data from one source and country-level data from another, IBM Ireland Ltd, IBM Europe, IBM Global,…)
  • 17. Data Integration Challenges Digital Enterprise Research Institute www.deri.ie n  Data Quality ¨  General challenge integrating data from multiple sources ¨  Errors in signage, amounts, labelling, and classification can seriously impede utility of systems operating on such data ¨  Combining erroneous data aggravates the problem ¨  Within open environment data aggregator has little or no influence on the data publisher ¨  Challenge for data publishers/consumers to coordinate to fix problems in data or blacklist sites providing unreliable data
  • 18. Recommendations Digital Enterprise Research Institute www.deri.ie n  Agree approach to the specification and use of common identifiers or at least their mappings n  Adhering to common publishing method reduces integration effort and facilitates data reuse ¨  Linked Data principles n  Convergence between data providers requires coordination and time ¨  No need for “Big Bang” integration ¨  Follow a pay-as-you-go iterative approach to integration
  • 19. Digital Enterprise Research Institute www.deri.ie Thank you for listening E. Curry, A. Harth, and S. O’Riain, “Challenges Ahead for Converging Financial Data,” in Proceedings of the XBRL/W3C Workshop on Improving Access to Financial Data on the Web, 2009.