Digital Enterprise Research Institute                                                            www.deri.ie




                  Challenges Ahead For Converging Financial
                                    Data
                                                Edward Curry (1), Andreas Harth (2), Sean O’Riain (1)

                 (1) DERI, NUI Galway, Ireland
                 (2) Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB),
                     Karlsruher Institut für Technologie (KIT)




W3C Workshop on Improving Access to
Financial Data on the Web

October 2009, Arlington, Virginia USA
© Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Agenda




       n    Motivation - Financial Data Ecosystem
              ¨    Data Providers
              ¨    Data Formats
              ¨    Data Consumers
       n    Converging Financial Data from Multiple Sources
              ¨    Entity Centric Approach
              ¨    Architecture
              ¨    Identity Mismatch
              ¨    Data Query
       n    Data Integration Challenges
       n    Recommendations
Financial Data Ecosystem


                    Information             Information
                    Providers               Consumers




                                        ?
Financial Information Providers



      n    Individuals: e.g. CEOs reporting equity sale
      n    Companies: e.g. 10-K filing
      n    NGOs: e.g. sector-wide lobbying groups
      n    Government: e.g. regulators, central banks, statistics
            offices
      n    Worldwide organisations: UN, OECD
      n    Academics: various economists, public policy
      n    ...

      n    Publicly available datasets, purchased datasets or in-
            house sources
Various Data Formats



      n    Unstructed Text
             ¨    News articles, press releases, raw transcripts of investor calls
      n    Hypertext
             ¨    Coporate websites, goverment websites, ...
      n    Spreadsheets, et al.
             ¨    CSV files, word docs, pdf, powerpoint, ...
      n    Strucutred Data
             ¨    XML, XBRL, CSV, SDMX, ...
      n    Graph Structured Data in RDF
             ¨    DBPedia, CrunchBase, RSS-CB, ...
Financial Information Consumers



      n    Competitive Analysis
             ¨    Mash-up of financial figures and analyst commentary for
                   decision support

      n    Regulatory Compliance
             ¨    Forensic Economics
             ¨    Spotting patterns or conditions that support fraud or money
                   laundering

      n    Investment Analysis
             ¨    Individual/Institutional investors
             ¨    Transparent fund comparisons
             ¨    Evaluate potential fund return
Goal



      n    Integrate data for:
             ¨    Central access
             ¨    Cross document analysis


      n    Our group works in data integration and have applied
            our approach to pilots in the financial services
            industry

      n    Report on experiences and lessons learned
Converging Financial Data from Multiple
         Sources



   n    Provide common data platform for search, browsing,
         analysis, and interactive visualisations across sources

   n    Entity centric approach
           ¨    Single data view allowing information filtering and cross
                 analysis
           ¨    Consolidate data into coherent graph 'mashed up' from
                 potentially thousands of sources

   n    Key challenge is semantic integration of structured
         and unstructured data from the open Web and internal
         corporate data sources
Converging Financial Data from Multiple
         Sources



   n    Large graph of RDF entities

   n    Entities typed according to what they describe
           ¨    People, locations, organizations, publications as well as
                 documents
           ¨    Inter-relations and structured descriptions of entities

   n    Entities have specified relations to other entities
           ¨    People can work for companies, people know other people,
                 people author documents, organisations are based in
                 locations, and so on
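
A minimal sketch of such an entity graph, using plain Python tuples as stand-ins for RDF triples. The `ex:` identifiers and predicate names are hypothetical illustrations, not vocabulary taken from the pilots.

```python
# Hypothetical entity graph held as (subject, predicate, object) triples.
# Every ex: identifier and predicate name below is an illustrative assumption.
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:acme", "rdf:type", "ex:Organisation"),
    ("ex:galway", "rdf:type", "ex:Location"),
    ("ex:report1", "rdf:type", "ex:Document"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:authorOf", "ex:report1"),
    ("ex:acme", "ex:basedIn", "ex:galway"),
]


def describe(entity, triples=TRIPLES):
    """Return every (predicate, object) pair stated about one entity."""
    return [(p, o) for s, p, o in triples if s == entity]


def entities_of_type(type_name, triples=TRIPLES):
    """Return the subjects typed with the given class, sorted by identifier."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == type_name)
```

Calling `describe("ex:alice")` lists the type of the entity together with its relations to other entities, which is the single entity-centric view the slides describe.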
Data Integration Approach



      n    Lifting data sources to common format, in our case
            RDF (Resource Description Format)

      n    Integrating the disparate datasets into a holistic
            dataset by aligning entities and concepts

      n    Run domain/task specific analysis algorithms on
            integrated data

      n    Interactive browsing and exploration of integrated
            data or results of algorithmic analysis
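
The first step, lifting a source into the common format, might look roughly like the sketch below. The CSV layout, the column names, and the `ex:` namespace are assumptions made for illustration; plain tuples stand in for RDF triples.

```python
import csv
import io

# Hypothetical CSV extract; column names and values are made up for illustration.
SAMPLE_CSV = """cik,name,city
0000000001,Example Corp,Arlington
0000000002,Sample Ltd,Galway
"""


def lift_rows(csv_text, base="ex:company/"):
    """Turn each CSV row into triples keyed by an identifier built from the 'cik' column."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = base + row["cik"]
        triples.append((subject, "rdf:type", "ex:Organisation"))
        triples.append((subject, "ex:name", row["name"]))
        triples.append((subject, "ex:basedIn", row["city"]))
    return triples
```

Once every source is lifted this way, alignment of entities and concepts can operate on one uniform triple structure rather than on each native format.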
Data Integration Approach
[Figure]
Architecture
[Figure]
Identity Mismatch


  n    Need to connect sources that may describe the same
        data on a particular entity

  n    Case studies analyzing connections between people
        and organizations
         ¨    SEC filings (Form 4) identified 69K people connected to 80K
               organizations
         ¨    Same analysis on database describing companies produced
               122K people connected to 140K organizations
         ¨    Data needed to be enrich and interlinked using entity
               consolidation (a.k.a. object consolidation) to avoid having the
               knowledge split over numerous instances
         ¨    Ontology-based disambiguation
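
One possible shape for entity consolidation is sketched below: identifiers known to denote the same entity are merged with a union-find structure, and every triple is rewritten to a single canonical identifier. This is an illustrative technique, not necessarily the method used in the case studies; the identifiers are hypothetical, and how the equivalence pairs are discovered is out of scope here.

```python
def consolidate(triples, same_as_pairs):
    """Merge identifiers declared equivalent and rewrite triples to one canonical id."""
    parent = {}

    def find(x):
        # Locate the representative of x, shortening the chain as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller id so the result is deterministic.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    # Values that never appear in an equivalence pair map to themselves.
    return sorted({(find(s), p, find(o)) for s, p, o in triples})


# Hypothetical identifiers for one person seen in two sources.
example = consolidate(
    [("sec:cik/123", "ex:name", "Jane Doe"),
     ("db:person/77", "ex:worksFor", "db:org/9")],
    [("sec:cik/123", "db:person/77")],
)
```

After consolidation both statements hang off one identifier, so the knowledge is no longer split over separate instances.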
Data Query



  n    SPARQL, the semantic query language allows queries/
        questions to be asked:

         ¨    What do the companies ‘Microsoft’ and ‘IBM’ have in common?


         ¨    What competitors of ‘HP’ are in ‘Arlington’?


         ¨    What’s the relationship between ‘Microsoft’ and ‘IBM’?
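
The first question can be sketched as a pattern match over triples. The facts, the `ex:` names, and the predicates below are toy assumptions; the SPARQL text is included only as one way the question could be phrased and is not executed by this snippet.

```python
# Toy facts for illustration only; identifiers and predicates are assumptions.
FACTS = {
    ("ex:Microsoft", "ex:industry", "ex:InformationTechnology"),
    ("ex:Microsoft", "ex:listedOn", "ex:NASDAQ"),
    ("ex:IBM", "ex:industry", "ex:InformationTechnology"),
    ("ex:IBM", "ex:listedOn", "ex:NYSE"),
}

# A possible SPARQL phrasing of the same question (shown for orientation, not run here).
COMMON_QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?p ?o WHERE {
  ex:Microsoft ?p ?o .
  ex:IBM ?p ?o .
}
"""


def in_common(a, b, facts=FACTS):
    """Return the (predicate, object) pairs stated about both entities."""
    about_a = {(p, o) for s, p, o in facts if s == a}
    about_b = {(p, o) for s, p, o in facts if s == b}
    return sorted(about_a & about_b)
```

Both the Python function and the query express the same idea: find predicate and object pairs that hold for the two subjects at once.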
Data Integration Challenges



  n    Text/Data Mismatch
         ¨    Human language often ambiguous
         ¨    Same company might be referred to in several variations
               (e.g. IBM, International Business Machines, Big Blue)
         ¨    Ambiguity makes cross-linking with structured data difficult
  n    Object Identity and Separate Schema
         ¨    Sources differ in how they state the same fact
         ¨    Differences on level of individual objects and schema
         ¨    SEC Central Index Key (CIK) to identify people (CEOs, CFOs),
               companies, and financial instruments
         ¨    DBpedia use URIs to identify same entities
         ¨    Methods have to be in place for reconciling different
               representations of objects and schema
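
A small sketch of reconciling different representations: name variants from text and a numeric identifier from a structured source are both resolved into one canonical identifier space. The lookup tables, the `ex:IBM` identifier, and the CIK value are placeholders, not real identifiers; the zero-padding assumes CIKs are written as ten-digit strings.

```python
# Hypothetical lookup tables; the identifier and the CIK value are placeholders.
ALIASES = {
    "ibm": "ex:IBM",
    "international business machines": "ex:IBM",
    "big blue": "ex:IBM",
}
CIK_TO_URI = {"0000000000": "ex:IBM"}  # placeholder CIK, not a real one


def resolve_mention(text):
    """Map a free-text company mention to a canonical identifier, or None if unknown."""
    key = " ".join(text.lower().split())  # normalise case and whitespace
    return ALIASES.get(key)


def resolve_cik(cik):
    """Map a CIK string into the same identifier space, padding to ten digits."""
    return CIK_TO_URI.get(cik.zfill(10))
```

In practice such tables would have to be curated or learned; the point is only that text mentions and source-specific keys end up pointing at one shared identifier.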
Data Integration Challenges



      n    Abstraction Levels (Data Context)
             ¨    Financial data sources provide data at incompatible levels of
                   abstraction
             ¨    Classify data in taxonomies pertinent to a certain sector
             ¨    Differences in legislation on book-keeping (e.g. Indicators
                   from Euro regulators may not be directly comparable with
                   indicators from US-based regulators)
             ¨    Differences in geographic aggregation (e.g. region data from
                   one source and country-level data from another, IBM Ireland
                   Ltd, IBM Europe, IBM Global,…)
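
For the geographic aggregation case, one option is to roll the finer-grained source up to the coarser level before comparing. The sketch below assumes a hypothetical country-to-region mapping and made-up figures; it also assumes the indicator is additive, which does not hold for every financial measure.

```python
from collections import defaultdict

# Hypothetical mapping, purely illustrative.
COUNTRY_TO_REGION = {"IE": "Europe", "DE": "Europe", "US": "North America"}


def roll_up(country_values, mapping=COUNTRY_TO_REGION):
    """Sum country-level values into region totals; report unmapped countries separately."""
    totals = defaultdict(float)
    unmapped = []
    for country, value in country_values.items():
        region = mapping.get(country)
        if region is None:
            unmapped.append(country)  # surfaced to the caller rather than silently dropped
        else:
            totals[region] += value
    return dict(totals), sorted(unmapped)
```

Returning the unmapped countries keeps the gap visible, so an incomplete mapping is not mistaken for a complete regional total.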
Data Integration Challenges



      n    Data Quality
             ¨    General challenge integrating data from multiple sources
             ¨    Errors in signage, amounts, labelling, and classification can
                   seriously impede utility of systems operating on such data
             ¨    Combining erroneous data aggravates the problem
             ¨    Within open environment data aggregator has little or no
                   influence on the data publisher
             ¨    Challenge for data publishers/consumers to coordinate to fix
                   problems in data or blacklist sites providing unreliable data
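
Checks of this kind could be applied by an aggregator before data is combined. The sketch below is a hedged illustration: the record layout, the field names, the rule that revenue should not be negative, and the blacklisted host are all assumptions.

```python
# Hypothetical blacklist of sources judged unreliable.
BLACKLIST = {"unreliable.example"}


def check_record(record):
    """Return a list of human-readable problems found in one reported figure."""
    problems = []
    if record.get("source") in BLACKLIST:
        problems.append("source is blacklisted")
    # Assumed rule: a revenue figure is not expected to carry a negative sign.
    if record.get("concept") == "Revenue" and record.get("amount", 0) < 0:
        problems.append("revenue has a negative sign")
    if not record.get("label"):
        problems.append("label is missing")
    return problems
```

Records with a non-empty problem list can be held back or reported to the publisher instead of being merged into the integrated dataset.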
Recommendations


      n    Agree approach to the specification and use of
            common identifiers or at least their mappings

      n    Adhering to common publishing method reduces
            integration effort and facilitates data reuse
             ¨    Linked Data principles


      n    Convergence between data providers requires
            coordination and time
             ¨    No need for “Big Bang” integration
             ¨    Follow a pay-as-you-go iterative approach to integration




                                   Thank you for listening

            E. Curry, A. Harth, and S. O’Riain, “Challenges Ahead for Converging
            Financial Data,” in Proceedings of the XBRL/W3C Workshop on
            Improving Access to Financial Data on the Web, 2009.

More Related Content

What's hot

Using Linked Data and the Internet of Things for Energy Management
Using Linked Data and the Internet of Things for Energy ManagementUsing Linked Data and the Internet of Things for Energy Management
Using Linked Data and the Internet of Things for Energy ManagementEdward Curry
 
Developing an Sustainable IT Capability: Lessons From Intel's Journey
Developing an Sustainable IT Capability: Lessons From Intel's JourneyDeveloping an Sustainable IT Capability: Lessons From Intel's Journey
Developing an Sustainable IT Capability: Lessons From Intel's JourneyEdward Curry
 
Dealing with Semantic Heterogeneity in Real-Time Information
Dealing with Semantic Heterogeneity in Real-Time InformationDealing with Semantic Heterogeneity in Real-Time Information
Dealing with Semantic Heterogeneity in Real-Time InformationEdward Curry
 
Big Data Public Private Forum (BIG) @ European Data Forum 2013
Big Data Public Private Forum (BIG) @ European Data Forum 2013Big Data Public Private Forum (BIG) @ European Data Forum 2013
Big Data Public Private Forum (BIG) @ European Data Forum 2013Edward Curry
 
Crowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth SciencesCrowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth SciencesEdward Curry
 
Towards a BIG Data Public Private Partnership
Towards a BIG Data Public Private PartnershipTowards a BIG Data Public Private Partnership
Towards a BIG Data Public Private PartnershipEdward Curry
 
Querying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data WebQuerying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data WebEdward Curry
 
Enterprise Energy Management using a Linked Dataspace for Energy Intelligence
Enterprise Energy Management using a Linked Dataspace for Energy IntelligenceEnterprise Energy Management using a Linked Dataspace for Energy Intelligence
Enterprise Energy Management using a Linked Dataspace for Energy IntelligenceEdward Curry
 
The Big Data Value PPP: A Standardisation Opportunity for Europe
The Big Data Value PPP: A Standardisation Opportunity for EuropeThe Big Data Value PPP: A Standardisation Opportunity for Europe
The Big Data Value PPP: A Standardisation Opportunity for EuropeEdward Curry
 
Linked Water Data For Water Information Management
Linked Water Data For Water Information ManagementLinked Water Data For Water Information Management
Linked Water Data For Water Information ManagementEdward Curry
 
Towards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing SystemsTowards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing SystemsEdward Curry
 
Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...Edward Curry
 
Sustainable IT for Energy Management: Approaches, Challenges, and Trends
Sustainable IT for Energy Management: Approaches, Challenges, and TrendsSustainable IT for Energy Management: Approaches, Challenges, and Trends
Sustainable IT for Energy Management: Approaches, Challenges, and TrendsEdward Curry
 
Collaborative Data Management: How Crowdsourcing Can Help To Manage Data
Collaborative Data Management: How Crowdsourcing Can Help To Manage DataCollaborative Data Management: How Crowdsourcing Can Help To Manage Data
Collaborative Data Management: How Crowdsourcing Can Help To Manage DataEdward Curry
 
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Edward Curry
 
Key Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in EuropeKey Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in EuropeEdward Curry
 
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in CrowdsourcingSLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in CrowdsourcingEdward Curry
 
Interactive Water Services: The Waternomics Approach
Interactive Water Services: The Waternomics ApproachInteractive Water Services: The Waternomics Approach
Interactive Water Services: The Waternomics ApproachEdward Curry
 
Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...Edward Curry
 
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...Edward Curry
 

What's hot (20)

Using Linked Data and the Internet of Things for Energy Management
Using Linked Data and the Internet of Things for Energy ManagementUsing Linked Data and the Internet of Things for Energy Management
Using Linked Data and the Internet of Things for Energy Management
 
Developing an Sustainable IT Capability: Lessons From Intel's Journey
Developing an Sustainable IT Capability: Lessons From Intel's JourneyDeveloping an Sustainable IT Capability: Lessons From Intel's Journey
Developing an Sustainable IT Capability: Lessons From Intel's Journey
 
Dealing with Semantic Heterogeneity in Real-Time Information
Dealing with Semantic Heterogeneity in Real-Time InformationDealing with Semantic Heterogeneity in Real-Time Information
Dealing with Semantic Heterogeneity in Real-Time Information
 
Big Data Public Private Forum (BIG) @ European Data Forum 2013
Big Data Public Private Forum (BIG) @ European Data Forum 2013Big Data Public Private Forum (BIG) @ European Data Forum 2013
Big Data Public Private Forum (BIG) @ European Data Forum 2013
 
Crowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth SciencesCrowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth Sciences
 
Towards a BIG Data Public Private Partnership
Towards a BIG Data Public Private PartnershipTowards a BIG Data Public Private Partnership
Towards a BIG Data Public Private Partnership
 
Querying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data WebQuerying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data Web
 
Enterprise Energy Management using a Linked Dataspace for Energy Intelligence
Enterprise Energy Management using a Linked Dataspace for Energy IntelligenceEnterprise Energy Management using a Linked Dataspace for Energy Intelligence
Enterprise Energy Management using a Linked Dataspace for Energy Intelligence
 
The Big Data Value PPP: A Standardisation Opportunity for Europe
The Big Data Value PPP: A Standardisation Opportunity for EuropeThe Big Data Value PPP: A Standardisation Opportunity for Europe
The Big Data Value PPP: A Standardisation Opportunity for Europe
 
Linked Water Data For Water Information Management
Linked Water Data For Water Information ManagementLinked Water Data For Water Information Management
Linked Water Data For Water Information Management
 
Towards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing SystemsTowards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing Systems
 
Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...
 
Sustainable IT for Energy Management: Approaches, Challenges, and Trends
Sustainable IT for Energy Management: Approaches, Challenges, and TrendsSustainable IT for Energy Management: Approaches, Challenges, and Trends
Sustainable IT for Energy Management: Approaches, Challenges, and Trends
 
Collaborative Data Management: How Crowdsourcing Can Help To Manage Data
Collaborative Data Management: How Crowdsourcing Can Help To Manage DataCollaborative Data Management: How Crowdsourcing Can Help To Manage Data
Collaborative Data Management: How Crowdsourcing Can Help To Manage Data
 
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
 
Key Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in EuropeKey Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in Europe
 
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in CrowdsourcingSLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
 
Interactive Water Services: The Waternomics Approach
Interactive Water Services: The Waternomics ApproachInteractive Water Services: The Waternomics Approach
Interactive Water Services: The Waternomics Approach
 
Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...
 
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
 

Viewers also liked

A Capability Maturity Framework for Sustainable ICT
A Capability Maturity Framework for Sustainable ICTA Capability Maturity Framework for Sustainable ICT
A Capability Maturity Framework for Sustainable ICTEdward Curry
 
The Role of Community-Driven Data Curation for Enterprises
The Role of Community-Driven Data Curation for EnterprisesThe Role of Community-Driven Data Curation for Enterprises
The Role of Community-Driven Data Curation for EnterprisesEdward Curry
 
Influenciencia del mundo emocional en el aprendizaje
Influenciencia del mundo emocional en el aprendizajeInfluenciencia del mundo emocional en el aprendizaje
Influenciencia del mundo emocional en el aprendizajeInstituto Familia y Adopción
 
Big Data Analytics: A New Business Opportunity
Big Data Analytics: A New Business OpportunityBig Data Analytics: A New Business Opportunity
Big Data Analytics: A New Business OpportunityEdward Curry
 
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupCrowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupEdward Curry
 
Crowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data ManagementCrowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data ManagementEdward Curry
 
Open Data Innovation in Smart Cities: Challenges and Trends
Open Data Innovation in Smart Cities: Challenges and TrendsOpen Data Innovation in Smart Cities: Challenges and Trends
Open Data Innovation in Smart Cities: Challenges and TrendsEdward Curry
 

Viewers also liked (7)

A Capability Maturity Framework for Sustainable ICT
A Capability Maturity Framework for Sustainable ICTA Capability Maturity Framework for Sustainable ICT
A Capability Maturity Framework for Sustainable ICT
 
The Role of Community-Driven Data Curation for Enterprises
The Role of Community-Driven Data Curation for EnterprisesThe Role of Community-Driven Data Curation for Enterprises
The Role of Community-Driven Data Curation for Enterprises
 
Influenciencia del mundo emocional en el aprendizaje
Influenciencia del mundo emocional en el aprendizajeInfluenciencia del mundo emocional en el aprendizaje
Influenciencia del mundo emocional en el aprendizaje
 
Big Data Analytics: A New Business Opportunity
Big Data Analytics: A New Business OpportunityBig Data Analytics: A New Business Opportunity
Big Data Analytics: A New Business Opportunity
 
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupCrowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
 
Crowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data ManagementCrowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data Management
 
Open Data Innovation in Smart Cities: Challenges and Trends
Open Data Innovation in Smart Cities: Challenges and TrendsOpen Data Innovation in Smart Cities: Challenges and Trends
Open Data Innovation in Smart Cities: Challenges and Trends
 

Similar to Challenges Ahead for Converging Financial Data

The open semantic enterprise enterprise data meets web data
The open semantic enterprise   enterprise data meets web dataThe open semantic enterprise   enterprise data meets web data
The open semantic enterprise enterprise data meets web dataGeorg Guentner
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsRichard Cyganiak
 
Hello Open World - Semtech 2009
Hello Open World - Semtech 2009Hello Open World - Semtech 2009
Hello Open World - Semtech 2009Alexandre Passant
 
Reading Group 2013 (DERI NUIG)
Reading Group 2013 (DERI NUIG)Reading Group 2013 (DERI NUIG)
Reading Group 2013 (DERI NUIG)Bianca Pereira
 
Introduction to Open Data
Introduction to Open DataIntroduction to Open Data
Introduction to Open DataDerilinx
 
Innovation Ecosystems at EBRF 2010, Nokia, Finland
Innovation Ecosystems at EBRF 2010, Nokia, FinlandInnovation Ecosystems at EBRF 2010, Nokia, Finland
Innovation Ecosystems at EBRF 2010, Nokia, FinlandJukka Huhtamäki
 
i4Trust - Overview
i4Trust - Overviewi4Trust - Overview
i4Trust - OverviewFIWARE
 
i4Trust - Overview
i4Trust - Overviewi4Trust - Overview
i4Trust - OverviewFIWARE
 
Smart analytics - Data Visualisation and Predictive Analytics solutions for F...
Smart analytics - Data Visualisation and Predictive Analytics solutions for F...Smart analytics - Data Visualisation and Predictive Analytics solutions for F...
Smart analytics - Data Visualisation and Predictive Analytics solutions for F...Srini Bezwada
 
1. The Importance of Graphs in Government
1. The Importance of Graphs in Government1. The Importance of Graphs in Government
1. The Importance of Graphs in GovernmentNeo4j
 
Envisioning a discussion dashboard for collective intelligence of web convers...
Envisioning a discussion dashboard for collective intelligence of web convers...Envisioning a discussion dashboard for collective intelligence of web convers...
Envisioning a discussion dashboard for collective intelligence of web convers...jodischneider
 
Hello Open World - The Web of Data for the Pragmatic Developer
Hello Open World - The Web of Data for the Pragmatic DeveloperHello Open World - The Web of Data for the Pragmatic Developer
Hello Open World - The Web of Data for the Pragmatic DeveloperAlexandre Passant
 
How to Monetize Your Data Assets and Gain a Competitive Advantage
How to Monetize Your Data Assets and Gain a Competitive AdvantageHow to Monetize Your Data Assets and Gain a Competitive Advantage
How to Monetize Your Data Assets and Gain a Competitive AdvantageCCG
 
Why CxOs care about Data Governance; the roadblock to digital mastery
Why CxOs care about Data Governance; the roadblock to digital masteryWhy CxOs care about Data Governance; the roadblock to digital mastery
Why CxOs care about Data Governance; the roadblock to digital masteryCoert Du Plessis (杜康)
 
Federating Distributed Social Data to Build an Interlinked Online Information...
Federating Distributed Social Data to Build an Interlinked Online Information...Federating Distributed Social Data to Build an Interlinked Online Information...
Federating Distributed Social Data to Build an Interlinked Online Information...Alexandre Passant
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 

Similar to Challenges Ahead for Converging Financial Data (20)

Innovation ecosystems value co-creation
Innovation ecosystems value co-creationInnovation ecosystems value co-creation
Innovation ecosystems value co-creation
 
The open semantic enterprise enterprise data meets web data
The open semantic enterprise   enterprise data meets web dataThe open semantic enterprise   enterprise data meets web data
The open semantic enterprise enterprise data meets web data
 
What is SDMX-RDF?
What is SDMX-RDF?What is SDMX-RDF?
What is SDMX-RDF?
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF Datasets
 
Hello Open World - Semtech 2009
Hello Open World - Semtech 2009Hello Open World - Semtech 2009
Hello Open World - Semtech 2009
 
Reading Group 2013 (DERI NUIG)
Reading Group 2013 (DERI NUIG)Reading Group 2013 (DERI NUIG)
Reading Group 2013 (DERI NUIG)
 
Introduction to Open Data
Introduction to Open DataIntroduction to Open Data
Introduction to Open Data
 
Innovation Ecosystems at EBRF 2010, Nokia, Finland
Innovation Ecosystems at EBRF 2010, Nokia, FinlandInnovation Ecosystems at EBRF 2010, Nokia, Finland
Innovation Ecosystems at EBRF 2010, Nokia, Finland
 
How to Publish Open Data
How to Publish Open DataHow to Publish Open Data
How to Publish Open Data
 
i4Trust - Overview
i4Trust - Overviewi4Trust - Overview
i4Trust - Overview
 
i4Trust - Overview
i4Trust - Overviewi4Trust - Overview
i4Trust - Overview
 
Smart analytics - Data Visualisation and Predictive Analytics solutions for F...
Smart analytics - Data Visualisation and Predictive Analytics solutions for F...Smart analytics - Data Visualisation and Predictive Analytics solutions for F...
Smart analytics - Data Visualisation and Predictive Analytics solutions for F...
 
DERI Overview March 2009
DERI Overview March 2009DERI Overview March 2009
DERI Overview March 2009
 
1. The Importance of Graphs in Government
1. The Importance of Graphs in Government1. The Importance of Graphs in Government
1. The Importance of Graphs in Government
 
Envisioning a discussion dashboard for collective intelligence of web convers...
Envisioning a discussion dashboard for collective intelligence of web convers...Envisioning a discussion dashboard for collective intelligence of web convers...
Envisioning a discussion dashboard for collective intelligence of web convers...
 
Hello Open World - The Web of Data for the Pragmatic Developer
Hello Open World - The Web of Data for the Pragmatic DeveloperHello Open World - The Web of Data for the Pragmatic Developer
Hello Open World - The Web of Data for the Pragmatic Developer
 
How to Monetize Your Data Assets and Gain a Competitive Advantage
How to Monetize Your Data Assets and Gain a Competitive AdvantageHow to Monetize Your Data Assets and Gain a Competitive Advantage
How to Monetize Your Data Assets and Gain a Competitive Advantage
 
Why CxOs care about Data Governance; the roadblock to digital mastery
Why CxOs care about Data Governance; the roadblock to digital masteryWhy CxOs care about Data Governance; the roadblock to digital mastery
Why CxOs care about Data Governance; the roadblock to digital mastery
 
Federating Distributed Social Data to Build an Interlinked Online Information...
Federating Distributed Social Data to Build an Interlinked Online Information...Federating Distributed Social Data to Build an Interlinked Online Information...
Federating Distributed Social Data to Build an Interlinked Online Information...
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 

Challenges Ahead for Converging Financial Data

  • 1. Challenges Ahead For Converging Financial Data
      - Digital Enterprise Research Institute, www.deri.ie
      - Edward Curry (1), Andreas Harth (2), Sean O’Riain (1)
      - (1) DERI, NUI Galway, Ireland
      - (2) Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB), Karlsruher Institut für Technologie (KIT)
      - W3C Workshop on Improving Access to Financial Data on the Web, October 2009, Arlington, Virginia USA
      - © Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
  • 2. Agenda
      - Motivation: Financial Data Ecosystem
          - Data Providers
          - Data Formats
          - Data Consumers
      - Converging Financial Data from Multiple Sources
          - Entity Centric Approach
          - Architecture
          - Identity Mismatch
          - Data Query
      - Data Integration Challenges
      - Recommendations
  • 3. Financial Data Ecosystem
      - Information Providers  ?  Information Consumers
  • 4. Financial Information Providers
      - Individuals: e.g. CEOs reporting equity sale
      - Companies: e.g. 10-K filing
      - NGOs: e.g. sector-wide lobbying groups
      - Government: e.g. regulators, central banks, statistics offices
      - Worldwide organisations: UN, OECD
      - Academics: various economists, public policy
      - ...
      - Publicly available datasets, purchased datasets or in-house sources
  • 5. Various Data Formats
      - Unstructured Text
          - News articles, press releases, raw transcripts of investor calls
      - Hypertext
          - Corporate websites, government websites, ...
      - Spreadsheets, et al.
          - CSV files, Word docs, PDF, PowerPoint, ...
      - Structured Data
          - XML, XBRL, CSV, SDMX, ...
      - Graph Structured Data in RDF
          - DBpedia, CrunchBase, RSS-CB, ...
  • 6. Financial Information Consumers
      - Competitive Analysis
          - Mash-up of financial figures and analyst commentary for decision support
      - Regulatory Compliance
          - Forensic Economics
          - Spotting patterns or conditions that support fraud or money laundering
      - Investment Analysis
          - Individual/Institutional investors
          - Transparent fund comparisons
          - Evaluate potential fund return
  • 7. Goal
      - Integrate data for:
          - Central access
          - Cross-document analysis
      - Our group works in data integration and has applied its approach to pilots in the financial services industry
      - Report on experiences and lessons learned
  • 8. Converging Financial Data from Multiple Sources
      - Provide common data platform for search, browsing, analysis, and interactive visualisations across sources
      - Entity centric approach
          - Single data view allowing information filtering and cross analysis
          - Consolidate data into coherent graph 'mashed up' from potentially thousands of sources
      - Key challenge is semantic integration of structured and unstructured data from the open Web and internal corporate data sources
  • 9. Converging Financial Data from Multiple Sources
      - Large graph of RDF entities
      - Entities typed according to what they describe
          - People, locations, organizations, publications as well as documents
          - Inter-relations and structured descriptions of entities
      - Entities have specified relations to other entities
          - People can work for companies, people know other people, people author documents, organisations are based in locations, and so on
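The typed-entity graph on slide 9 can be pictured as a set of subject/predicate/object triples. The following is a minimal sketch in plain Python rather than an RDF library; the `ex:` prefix, the entity names, and the `describe` helper are all hypothetical and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical toy graph: each triple is (subject, predicate, object).
# The "ex:" names are illustrative only.
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:acme", "rdf:type", "ex:Organisation"),
    ("ex:galway", "rdf:type", "ex:Location"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:basedIn", "ex:galway"),
]


def describe(entity, triples):
    """Collect every predicate and its objects stated about one entity."""
    description = defaultdict(list)
    for subject, predicate, obj in triples:
        if subject == entity:
            description[predicate].append(obj)
    return dict(description)


print(describe("ex:alice", triples))
```

Keeping the type statement and the relations in the same triple list is what lets one entity-centric lookup return both what a thing is and how it connects to other things.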
  • 10. Data Integration Approach
      - Lifting data sources to a common format, in our case RDF (Resource Description Framework)
      - Integrating the disparate datasets into a holistic dataset by aligning entities and concepts
      - Run domain/task specific analysis algorithms on integrated data
      - Interactive browsing and exploration of integrated data or results of algorithmic analysis
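The first step on slide 10, lifting a source to a common triple format, might look roughly like the sketch below for a CSV source. It assumes one column holds the row identifier; the `lift_csv` function, the `ex:` prefix, and the sample row are hypothetical and only illustrate the idea.

```python
import csv
import io


def lift_csv(text, id_column, prefix="ex:"):
    """Turn each CSV row into (subject, predicate, object) triples.

    The value in `id_column` becomes the subject; every other non-empty
    cell becomes one triple whose predicate is the column name.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = prefix + row[id_column]
        for column, value in row.items():
            if column != id_column and value:
                triples.append((subject, prefix + column, value))
    return triples


# Made-up sample data, not taken from any real filing.
sample = "cik,name,city\n0001,Acme Corp,Galway\n"
print(lift_csv(sample, "cik"))
```

Once every source is expressed this way, the later alignment and analysis steps can operate on one data model regardless of where a fact came from.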
  • 11. Data Integration Approach
  • 13. Identity Mismatch
      - Need to connect sources that may describe the same data on a particular entity
      - Case studies analyzing connections between people and organizations
          - SEC filings (Form 4) identified 69K people connected to 80K organizations
          - Same analysis on a database describing companies produced 122K people connected to 140K organizations
          - Data needed to be enriched and interlinked using entity consolidation (a.k.a. object consolidation) to avoid having the knowledge split over numerous instances
          - Ontology-based disambiguation
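One common way to approach the entity consolidation mentioned on slide 13 is to merge records that share the value of an identifying property, applied transitively. The sketch below uses a union-find structure for that; it is an assumption about how such a step could work, not the authors' implementation, and the record ids, the `cik` and `ticker` keys, and all values are made up.

```python
def consolidate(records, keys):
    """Group record ids that share a value for any of the identifying keys."""
    parent = {rid: rid for rid in records}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for rid, fields in records.items():
        for key in keys:
            value = fields.get(key)
            if value is None:
                continue
            # The first record seen with this (key, value) anchors the group.
            owner = first_seen.setdefault((key, value), rid)
            parent[find(rid)] = find(owner)

    groups = {}
    for rid in records:
        groups.setdefault(find(rid), set()).add(rid)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical records from three sources describing overlapping entities.
records = {
    "sec:1": {"cik": "0001"},
    "db:7": {"cik": "0001", "ticker": "ACME"},
    "web:3": {"ticker": "ACME"},
    "db:9": {"cik": "0002"},
}
print(consolidate(records, ["cik", "ticker"]))
```

Here `web:3` never states a `cik`, yet it ends up in the same group as `sec:1` because both are linked through `db:7`, which is the kind of split knowledge the slide warns about.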
  • 14. Data Query
      - SPARQL, the semantic query language, allows questions such as the following to be asked:
          - What do the companies ‘Microsoft’ and ‘IBM’ have in common?
          - What competitors of ‘HP’ are in ‘Arlington’?
          - What’s the relationship between ‘Microsoft’ and ‘IBM’?
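To make the first question on slide 14 concrete, the sketch below answers "what do two entities have in common?" over a toy triple set in plain Python. It is an analogue of a graph query, not SPARQL itself; the `in_common` helper, the `ex:` names, and the toy facts are hypothetical and should not be read as real company data.

```python
# Toy data only; these statements are illustrative, not verified facts.
triples = {
    ("ex:Microsoft", "ex:sector", "ex:Technology"),
    ("ex:Microsoft", "ex:rival", "ex:CompanyA"),
    ("ex:IBM", "ex:sector", "ex:Technology"),
    ("ex:IBM", "ex:rival", "ex:CompanyB"),
}


def in_common(a, b, triples):
    """Return the (predicate, object) pairs stated about both a and b.

    A roughly equivalent SPARQL pattern would be:
        SELECT ?p ?o WHERE { ex:Microsoft ?p ?o . ex:IBM ?p ?o . }
    """
    about_a = {(p, o) for s, p, o in triples if s == a}
    about_b = {(p, o) for s, p, o in triples if s == b}
    return about_a & about_b


print(in_common("ex:Microsoft", "ex:IBM", triples))
```

The query only works across sources if both companies are identified by the same node in the integrated graph, which is why identity consolidation comes before querying.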
  • 15. Data Integration Challenges
      - Text/Data Mismatch
          - Human language is often ambiguous
          - Same company might be referred to in several variations (e.g. IBM, International Business Machines, Big Blue)
          - Ambiguity makes cross-linking with structured data difficult
      - Object Identity and Separate Schema
          - Sources differ in how they state the same fact
          - Differences on level of individual objects and schema
          - SEC uses the Central Index Key (CIK) to identify people (CEOs, CFOs), companies, and financial instruments
          - DBpedia uses URIs to identify the same entities
          - Methods have to be in place for reconciling different representations of objects and schema
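The name-variation problem on slide 15 is often handled, at its simplest, with a lookup from normalised surface forms to one canonical identifier. The sketch below uses the slide's own IBM example; the `ALIASES` table, the `resolve` function, and the `ex:IBM` identifier are hypothetical, and real text would need far more than exact matching.

```python
# Hypothetical alias table: normalised mention -> canonical identifier.
ALIASES = {
    "ibm": "ex:IBM",
    "international business machines": "ex:IBM",
    "big blue": "ex:IBM",
}


def resolve(mention):
    """Map a textual mention to a canonical identifier, or None if unknown."""
    # Lower-case and collapse runs of whitespace before the lookup.
    key = " ".join(mention.lower().split())
    return ALIASES.get(key)


for mention in ["IBM", "International  Business Machines", "Big Blue", "Unknown Co"]:
    print(mention, "->", resolve(mention))
```

Returning `None` for unknown mentions, instead of guessing, keeps ambiguous text from being cross-linked to the wrong structured record.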
  • 16. Data Integration Challenges
      - Abstraction Levels (Data Context)
          - Financial data sources provide data at incompatible levels of abstraction
          - Classify data in taxonomies pertinent to a certain sector
          - Differences in legislation on book-keeping (e.g. indicators from Euro regulators may not be directly comparable with indicators from US-based regulators)
          - Differences in geographic aggregation (e.g. region data from one source and country-level data from another; IBM Ireland Ltd, IBM Europe, IBM Global, …)
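For the geographic aggregation mismatch on slide 16, one possible reconciliation is to roll the finer-grained source up to the coarser level before comparing. The sketch below assumes the measure is additive (a sum of regions equals the country total), which does not hold for ratios or averages; the `REGION_TO_COUNTRY` mapping, the `roll_up` function, and the figures are made up.

```python
from collections import defaultdict

# Hypothetical mapping from region names to the country they belong to.
REGION_TO_COUNTRY = {
    "Connacht": "Ireland",
    "Leinster": "Ireland",
    "Bavaria": "Germany",
}


def roll_up(region_values):
    """Sum region-level values into country-level totals (additive measures only)."""
    totals = defaultdict(float)
    for region, value in region_values.items():
        totals[REGION_TO_COUNTRY[region]] += value
    return dict(totals)


# Made-up figures for illustration.
print(roll_up({"Connacht": 1.5, "Leinster": 2.5, "Bavaria": 4.0}))
```

A region missing from the mapping raises `KeyError` here on purpose, so an unmapped area is noticed instead of silently dropped from the total.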
  • 17. Data Integration Challenges
      - Data Quality
          - General challenge integrating data from multiple sources
          - Errors in signage, amounts, labelling, and classification can seriously impede utility of systems operating on such data
          - Combining erroneous data aggravates the problem
          - Within an open environment the data aggregator has little or no influence on the data publisher
          - Challenge for data publishers/consumers to coordinate to fix problems in data or blacklist sites providing unreliable data
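Two of the quality measures hinted at on slide 17, catching sign errors and blacklisting unreliable sources, could be sketched as simple checks applied before a fact enters the integrated dataset. The `check_fact` function, the field names, the concept names, and the example site below are all hypothetical.

```python
def check_fact(fact, blacklist, non_negative):
    """Return a list of problems found in one reported fact."""
    problems = []
    if fact["source"] in blacklist:
        problems.append("blacklisted source")
    # Some concepts should never be negative; a minus sign suggests a sign error.
    if fact["concept"] in non_negative and fact["value"] < 0:
        problems.append("unexpected negative value")
    return problems


# Made-up configuration and facts for illustration.
blacklist = {"unreliable.example"}
non_negative = {"TotalAssets"}

good = {"source": "filer.example", "concept": "TotalAssets", "value": 100.0}
bad = {"source": "unreliable.example", "concept": "TotalAssets", "value": -100.0}
print(check_fact(good, blacklist, non_negative))
print(check_fact(bad, blacklist, non_negative))
```

Reporting problems as a list, instead of rejecting the fact outright, leaves room for the publisher and consumer to coordinate on a fix as the slide suggests.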
  • 18. Recommendations
      - Agree on an approach to the specification and use of common identifiers, or at least their mappings
      - Adhering to a common publishing method reduces integration effort and facilitates data reuse
          - Linked Data principles
      - Convergence between data providers requires coordination and time
          - No need for “Big Bang” integration
          - Follow a pay-as-you-go iterative approach to integration
  • 19. Thank you for listening
      - E. Curry, A. Harth, and S. O’Riain, “Challenges Ahead for Converging Financial Data,” in Proceedings of the XBRL/W3C Workshop on Improving Access to Financial Data on the Web, 2009.