HiTiME project description




Christian Roosendaal (christian.roosendaal@gmail.com),
Vyacheslav Tykhonov (vty@iisg.nl),
HiTiME system developers, IISH Amsterdam
HiTiME prototype data flow
(Diagram: numbered arrows 1 to 7 connect the components listed below.)

Source data
     CMS (Drupal, WordPress, …)

Input DB

Processing module
 ●    Check for new documents
 ●    Split into words
 ●    Store in DB

Knowledge Base

Entity Recognize module
 ●    Retrieve document tokens
 ●    Send to NER by telnet
 ●    If token is recognized entity → store in DB

NER (several instances)
     Training sets from IISH archives

Meanings module
 ●    Look for sequences of entities
 ●    Replace with known composite entities
IISH systems integration

HiTiME application
- Persons
- Organizations
- Locations
- Dates
- Professions

LINKS
Database with 8000+ professions
● Create training sets

OCR application
● Scans, posters, archives

Evergreen library system
● Create training sets for authority records
● Improve MARC21 records

PID service
● Store entities

External applications
● BWSA
● Timeline
● Visual Mets

Search (search.iisg.nl)
● Improve metadata
● Extend functionality with new filters

Knowledge base
Export data to e.g. RDF, OWL, XML

Clio-Infrastructure
● Infrastructure to store data from different systems
● Connect dates and locations with datasets
● Find relevant documents in time/location domain
● Visualize trends relevant to documents
System design
●   HiTiME core checks for new or updated documents in input database
●   Input database can be any type of database with timestamps



Input data:

doc_id       last_modified    data
Document 1   12-13-12 12:04   “Petrus Alma is great...”
Document 2   12-13-12 11:37   “...”

Input data → HiTiME core → KB

KB:

doc_id       last_modified    data
Document 1   12-13-12 12:04   “<person>Petrus Alma</person> is great...”
Document 2   12-13-12 11:37   “...”
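
The check for new or updated documents could be sketched as a timestamp query. This is a minimal illustration, not the HiTiME implementation: the table name `documents`, the function `fetch_changed_documents`, the use of SQLite, and the ISO-style timestamps (chosen so that string comparison matches chronological order) are all assumptions.

```python
import sqlite3

def fetch_changed_documents(conn, since):
    """Return (doc_id, last_modified, data) rows modified after `since`."""
    cur = conn.execute(
        "SELECT doc_id, last_modified, data FROM documents "
        "WHERE last_modified > ? ORDER BY last_modified",
        (since,),
    )
    return cur.fetchall()

# Hypothetical input database with the two example documents.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "doc_id INTEGER PRIMARY KEY, last_modified TEXT, data TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        (1, "2012-12-13 12:04", "Petrus Alma is great..."),
        (2, "2012-12-13 11:37", "..."),
    ],
)

# Only documents changed since the last run need to be processed again.
changed = fetch_changed_documents(conn, "2012-12-13 12:00")
print(changed)
```

Any database that exposes a modification timestamp per document would support the same query pattern.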
Database design (1/2)
Example string: “Petrus Alma is great”

Split text into words and store words separately in table:

  doc_id   word_id   word
  0        0         Petrus
  0        1         Alma
  0        2         is
  0        3         great

Store coordinates of each word in coordinate table:

  doc_id   sentence_id   position   word_id   meaning_flag   identity_id
  0        0             0          0         0
  0        0             1          1         0
  0        0             2          2         0
  0        0             3          3         0
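
A minimal sketch of the splitting step, producing rows shaped like the two tables above. The function name `tokenize_document` and the regular expressions are assumptions standing in for the prototype's crude splitting. The sketch also assumes that `position` counts words within a sentence and that `word_id` numbers word occurrences within a document.

```python
import re

def tokenize_document(doc_id, text):
    """Split text into sentences and words; return (words, coordinates) rows."""
    words = []        # rows: (doc_id, word_id, word)
    coordinates = []  # rows: (doc_id, sentence_id, position, word_id,
                      #        meaning_flag, identity_id)
    word_id = 0
    # Crude sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for sentence_id, sentence in enumerate(sentences):
        for position, word in enumerate(re.findall(r"\w+", sentence)):
            words.append((doc_id, word_id, word))
            # meaning_flag starts at 0; identity_id is unknown until later steps.
            coordinates.append((doc_id, sentence_id, position, word_id, 0, None))
            word_id += 1
    return words, coordinates

words, coordinates = tokenize_document(0, "Petrus Alma is great")
for row in words:
    print(row)
for row in coordinates:
    print(row)
```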
Database design (2/2)

Processing of text by NER.

Output of NER:

  “Petrus” → PERS
  “Alma”   → PERS
  “is”     → 0
  “great”  → 0

Store in decision table:

  word_id   NER    Frog   Heidel   UCTO   Decision
  0         PERS                          PERS
  1         PERS                          PERS

Update meaning_flag in coordinate table:

  doc_id   sentence_id   position   word_id   meaning_flag   identity_id
  0        0             0          0         1
  0        0             1          1         1
  0        0             2          2         0
  0        0             3          3         0
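
The two steps above (fill the decision table, then update `meaning_flag`) might look like this in code. It is a sketch under assumptions: the dictionary-based row layout and the names `build_decision_table` and `update_meaning_flags` are hypothetical, and the NER output is hard-coded rather than fetched from a running NER service.

```python
def build_decision_table(ner_output):
    """Create one decision row per word that NER recognized (label != "0")."""
    table = {}
    for word_id, label in ner_output.items():
        if label != "0":
            # With only NER available, its label is also the final decision.
            table[word_id] = {"NER": label, "Frog": None, "Heidel": None,
                              "UCTO": None, "Decision": label}
    return table

def update_meaning_flags(coordinates, decision_table):
    """Set meaning_flag to 1 for every word that has a decision row."""
    for row in coordinates:
        row["meaning_flag"] = 1 if row["word_id"] in decision_table else 0
    return coordinates

# NER output for "Petrus Alma is great", keyed by word_id.
ner_output = {0: "PERS", 1: "PERS", 2: "0", 3: "0"}
coordinates = [
    {"doc_id": 0, "sentence_id": 0, "position": i, "word_id": i,
     "meaning_flag": 0, "identity_id": None}
    for i in range(4)
]

decision_table = build_decision_table(ner_output)
update_meaning_flags(coordinates, decision_table)
flags = [row["meaning_flag"] for row in coordinates]
print(flags)
```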
Improvement: Integration of FROG, UCTO and HeidelTime

● The prototype uses only NER, plus crude methods to split raw text into sentences and words
● Splitting can be made more reliable with UCTO and FROG
● Time expressions are not recognized in the prototype → HeidelTime
Improvement: Disambiguation of recognized entities (1/2)

  Word        NER   Frog   Heidel   ...   Decision
  Amsterdam   LOC                         LOC

Amsterdam is a location. Seems right, but what if the text means the VOC ship “Amsterdam”?
Improvement: Disambiguation of recognized entities (2/2)

NER can be trained to improve accuracy. By using differently trained NER models, we can build an expert system:

  Word        NER   Frog   Heidel   NER2   NER3   Decision
  Amsterdam   LOC                   SHIP   BAND   ?

The final decision can be based on the priorities of the trained models.
Our idea is to assign the lowest priorities to wide-scope models.

Ships
Amsterdam (VOC ship), an 18th century cargo ship
MS Amsterdam, a cruise ship owned and operated by Holland America Line

Music
Amsterdam (band), a pop band from the United Kingdom
"Amsterdam" (Jacques Brel song), a song by Jacques Brel
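
One way to read the priority idea is sketched below. The slide gives no concrete priority values, so the numbers and the name `maritime_priority` are purely illustrative. The function `decide` is hypothetical too, as is its rule of returning `None` (the “?” in the table) when the top priority is shared by models that disagree.

```python
def decide(labels, priority):
    """Pick the label of the highest-priority model that produced one.

    Returns None when no model produced a label, or when the highest
    priority is shared by models that disagree.
    """
    fired = {model: label for model, label in labels.items() if label}
    if not fired:
        return None
    top = max(priority.get(model, 0) for model in fired)
    candidates = {label for model, label in fired.items()
                  if priority.get(model, 0) == top}
    return candidates.pop() if len(candidates) == 1 else None

# Decision-table row for "Amsterdam"; None means the tool gave no label.
row = {"NER": "LOC", "Frog": None, "Heidel": None, "NER2": "SHIP", "NER3": "BAND"}

# Illustrative priorities for a maritime corpus: the wide-scope NER model
# gets the lowest priority, the ship-trained model the highest.
maritime_priority = {"NER": 1, "NER2": 3, "NER3": 2}
decision = decide(row, maritime_priority)

# Equal priorities for the specialised models leave the conflict unresolved.
tied = decide(row, {"NER": 1, "NER2": 2, "NER3": 2})
print(decision, tied)
```

When only the wide-scope model produces a label, that label is kept, so the general model still acts as a fallback.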
Improvement: “composite” entities (1/2)

In our prototype:

    “Petrus Alma is great”

    “Petrus” → recognized as person
    “Alma”   → recognized as person

Should be:

    “Petrus Alma is great”

    “Petrus Alma” → recognized as one person
Improvement: “composite” entities (2/2)

Possible solution: keep track of known entities in a separate entities table.

Search for sequences of recognized entities in coordinate table:

  doc_id   sentence_id   position   word_id   meaning_flag   identity_id
  0        0             0          0         1              0
  0        0             1          1         1              0
  0        0             2          2         0
  0        0             3          3         0

Sequence found: “Petrus Alma”

Compare these sequences with entities in entities table:

  identity_id   name               type
  0             Petrus Alma        PERS
  1             Aron van Dam       PERS
  2             Frederik Feringa   PERS

Final decision about entity:

  identity_id   name          type
  0             Petrus Alma   PERS
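
The sequence search and lookup could be sketched as follows. The function `find_composites` and its in-memory data layout are assumptions. As a simplification, only a complete run of adjacent flagged words is compared with the entities table; sub-sequences of a longer run are not tried.

```python
def find_composites(coordinates, words, entities):
    """Group adjacent flagged words and match each group to a known entity.

    coordinates: row dicts sorted by (doc_id, sentence_id, position)
    words:       dict mapping word_id -> word
    entities:    dict mapping name -> (identity_id, type)
    Returns a list of (identity_id, name, type, [word_ids]).
    """
    matches = []
    run = []  # current sequence of adjacent rows with meaning_flag == 1

    def flush():
        if run:
            name = " ".join(words[r["word_id"]] for r in run)
            if name in entities:
                identity_id, entity_type = entities[name]
                for r in run:
                    r["identity_id"] = identity_id
                matches.append((identity_id, name, entity_type,
                                [r["word_id"] for r in run]))
        run.clear()

    prev_key = None
    for row in coordinates:
        key = (row["doc_id"], row["sentence_id"])
        adjacent = (run and key == prev_key
                    and row["position"] == run[-1]["position"] + 1)
        if row["meaning_flag"] == 1:
            if not adjacent:
                flush()  # a new sentence or a gap starts a new sequence
            run.append(row)
        else:
            flush()
        prev_key = key
    flush()
    return matches

words = {0: "Petrus", 1: "Alma", 2: "is", 3: "great"}
coordinates = [
    {"doc_id": 0, "sentence_id": 0, "position": i, "word_id": i,
     "meaning_flag": flag, "identity_id": None}
    for i, flag in enumerate([1, 1, 0, 0])
]
entities = {"Petrus Alma": (0, "PERS"), "Aron van Dam": (1, "PERS"),
            "Frederik Feringa": (2, "PERS")}

matches = find_composites(coordinates, words, entities)
print(matches)
```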
BWSA application before processing
BWSA application after processing

More Related Content

What's hot

Hadoop file system
Hadoop file systemHadoop file system
Hadoop file systemJohn Veigas
 
XA Secure | Whitepaper on data security within Hadoop
XA Secure | Whitepaper on data security within HadoopXA Secure | Whitepaper on data security within Hadoop
XA Secure | Whitepaper on data security within Hadoopbalajiganesan03
 
Academic Workflows with iRODS FINAL
Academic Workflows with iRODS FINALAcademic Workflows with iRODS FINAL
Academic Workflows with iRODS FINALRandy Splinter
 
Implementation of Multi-node Clusters in Column Oriented Database using HDFS
Implementation of Multi-node Clusters in Column Oriented Database using HDFSImplementation of Multi-node Clusters in Column Oriented Database using HDFS
Implementation of Multi-node Clusters in Column Oriented Database using HDFSIJEACS
 
Maximize the Business Value of Your Information
Maximize the Business Value of Your Information Maximize the Business Value of Your Information
Maximize the Business Value of Your Information Iron Mountain
 
SSDs Deliver More at the Point-of-Processing
SSDs Deliver More at the Point-of-ProcessingSSDs Deliver More at the Point-of-Processing
SSDs Deliver More at the Point-of-ProcessingSamsung Business USA
 
Presentation Ispass 2012 Session6 Presentation1
Presentation Ispass 2012 Session6 Presentation1Presentation Ispass 2012 Session6 Presentation1
Presentation Ispass 2012 Session6 Presentation1sairahul321
 
Advanced OpenSplice Programming - Part II
Advanced OpenSplice Programming - Part IIAdvanced OpenSplice Programming - Part II
Advanced OpenSplice Programming - Part IIAngelo Corsaro
 
Cloud Computing Ambiance using Secluded Access Control Method
Cloud Computing Ambiance using Secluded Access Control MethodCloud Computing Ambiance using Secluded Access Control Method
Cloud Computing Ambiance using Secluded Access Control MethodIRJET Journal
 
Storage Characteristics Of Call Data Records In Column Store Databases
Storage Characteristics Of Call Data Records In Column Store DatabasesStorage Characteristics Of Call Data Records In Column Store Databases
Storage Characteristics Of Call Data Records In Column Store DatabasesDavid Walker
 
Technologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic RecordsTechnologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic Recordspbajcsy
 
Data Storage and Management project Report
Data Storage and Management project ReportData Storage and Management project Report
Data Storage and Management project ReportTushar Dalvi
 
VO Course 10: Big data challenges in astronomy
VO Course 10: Big data challenges in astronomyVO Course 10: Big data challenges in astronomy
VO Course 10: Big data challenges in astronomyJoint ALMA Observatory
 
Classical Distributed Algorithms with DDS
Classical Distributed Algorithms with DDSClassical Distributed Algorithms with DDS
Classical Distributed Algorithms with DDSAngelo Corsaro
 

What's hot (20)

NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
 
Whither Small Data?
Whither Small Data?Whither Small Data?
Whither Small Data?
 
NISO Forum, Denver, Sept. 24, 2012: Data Equivalence
NISO Forum, Denver, Sept. 24, 2012: Data EquivalenceNISO Forum, Denver, Sept. 24, 2012: Data Equivalence
NISO Forum, Denver, Sept. 24, 2012: Data Equivalence
 
Hadoop file system
Hadoop file systemHadoop file system
Hadoop file system
 
Lee oracle
Lee oracleLee oracle
Lee oracle
 
XA Secure | Whitepaper on data security within Hadoop
XA Secure | Whitepaper on data security within HadoopXA Secure | Whitepaper on data security within Hadoop
XA Secure | Whitepaper on data security within Hadoop
 
Academic Workflows with iRODS FINAL
Academic Workflows with iRODS FINALAcademic Workflows with iRODS FINAL
Academic Workflows with iRODS FINAL
 
Implementation of Multi-node Clusters in Column Oriented Database using HDFS
Implementation of Multi-node Clusters in Column Oriented Database using HDFSImplementation of Multi-node Clusters in Column Oriented Database using HDFS
Implementation of Multi-node Clusters in Column Oriented Database using HDFS
 
Maximize the Business Value of Your Information
Maximize the Business Value of Your Information Maximize the Business Value of Your Information
Maximize the Business Value of Your Information
 
SSDs Deliver More at the Point-of-Processing
SSDs Deliver More at the Point-of-ProcessingSSDs Deliver More at the Point-of-Processing
SSDs Deliver More at the Point-of-Processing
 
Presentation Ispass 2012 Session6 Presentation1
Presentation Ispass 2012 Session6 Presentation1Presentation Ispass 2012 Session6 Presentation1
Presentation Ispass 2012 Session6 Presentation1
 
Advanced OpenSplice Programming - Part II
Advanced OpenSplice Programming - Part IIAdvanced OpenSplice Programming - Part II
Advanced OpenSplice Programming - Part II
 
Cloud Computing Ambiance using Secluded Access Control Method
Cloud Computing Ambiance using Secluded Access Control MethodCloud Computing Ambiance using Secluded Access Control Method
Cloud Computing Ambiance using Secluded Access Control Method
 
Storage Characteristics Of Call Data Records In Column Store Databases
Storage Characteristics Of Call Data Records In Column Store DatabasesStorage Characteristics Of Call Data Records In Column Store Databases
Storage Characteristics Of Call Data Records In Column Store Databases
 
Technologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic RecordsTechnologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic Records
 
Data ware house
Data ware houseData ware house
Data ware house
 
Data Storage and Management project Report
Data Storage and Management project ReportData Storage and Management project Report
Data Storage and Management project Report
 
VO Course 10: Big data challenges in astronomy
VO Course 10: Big data challenges in astronomyVO Course 10: Big data challenges in astronomy
VO Course 10: Big data challenges in astronomy
 
Ddn 2017 10_dse_primer
Ddn 2017 10_dse_primerDdn 2017 10_dse_primer
Ddn 2017 10_dse_primer
 
Classical Distributed Algorithms with DDS
Classical Distributed Algorithms with DDSClassical Distributed Algorithms with DDS
Classical Distributed Algorithms with DDS
 

Viewers also liked

Data analysis in dataverse & visualization of datasets on historical maps
Data analysis in dataverse & visualization of datasets on historical mapsData analysis in dataverse & visualization of datasets on historical maps
Data analysis in dataverse & visualization of datasets on historical mapsvty
 
The recovery of netherlands geographic information system (nlgis 2)
The recovery of netherlands geographic information system (nlgis 2)The recovery of netherlands geographic information system (nlgis 2)
The recovery of netherlands geographic information system (nlgis 2)vty
 
Bridging research and collections
Bridging research and collectionsBridging research and collections
Bridging research and collectionsvty
 
FAIR Dataverse
FAIR DataverseFAIR Dataverse
FAIR Dataversevty
 
API economy
API economyAPI economy
API economyvty
 
Clio infra Collabs data analysis tools
Clio infra Collabs data analysis toolsClio infra Collabs data analysis tools
Clio infra Collabs data analysis toolsvty
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunitiesvty
 

Viewers also liked (7)

Data analysis in dataverse & visualization of datasets on historical maps
Data analysis in dataverse & visualization of datasets on historical mapsData analysis in dataverse & visualization of datasets on historical maps
Data analysis in dataverse & visualization of datasets on historical maps
 
The recovery of netherlands geographic information system (nlgis 2)
The recovery of netherlands geographic information system (nlgis 2)The recovery of netherlands geographic information system (nlgis 2)
The recovery of netherlands geographic information system (nlgis 2)
 
Bridging research and collections
Bridging research and collectionsBridging research and collections
Bridging research and collections
 
FAIR Dataverse
FAIR DataverseFAIR Dataverse
FAIR Dataverse
 
API economy
API economyAPI economy
API economy
 
Clio infra Collabs data analysis tools
Clio infra Collabs data analysis toolsClio infra Collabs data analysis tools
Clio infra Collabs data analysis tools
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
 

Similar to HiTIME project

Using hadoop to expand data warehousing
Using hadoop to expand data warehousingUsing hadoop to expand data warehousing
Using hadoop to expand data warehousingDataWorks Summit
 
Overview of Big Data by Sunny
Overview of Big Data by SunnyOverview of Big Data by Sunny
Overview of Big Data by SunnyDignitasDigital1
 
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xModule 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xNPN Training
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作James Chen
 
A unified data modeler in the world of big data
A unified data modeler in the world of big dataA unified data modeler in the world of big data
A unified data modeler in the world of big dataWilliam Luk
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time ApplicationsDataWorks Summit
 
Big data sketch-and-possible-usecases2
Big data sketch-and-possible-usecases2Big data sketch-and-possible-usecases2
Big data sketch-and-possible-usecases2Dmitri Apassov
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage CCG
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data trainingagiamas
 
How Data Virtualization Adds Value to Your Data Science Stack
How Data Virtualization Adds Value to Your Data Science StackHow Data Virtualization Adds Value to Your Data Science Stack
How Data Virtualization Adds Value to Your Data Science StackDenodo
 
Getting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solvesGetting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solvesDenodo
 
The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)Cloudera, Inc.
 
DataLogix Hadoop Solution
DataLogix Hadoop SolutionDataLogix Hadoop Solution
DataLogix Hadoop SolutionDataLogix B.V.
 
Ceph Days 2014 Paul Evans Slide Deck
Ceph Days 2014 Paul Evans Slide DeckCeph Days 2014 Paul Evans Slide Deck
Ceph Days 2014 Paul Evans Slide DeckDaystromTech
 
Database Management System
Database Management SystemDatabase Management System
Database Management SystemAbishek V S
 
Relational
RelationalRelational
Relationaldieover
 
INFINISTORE(tm) - Scalable Open Source Storage Arhcitecture
INFINISTORE(tm) - Scalable Open Source Storage ArhcitectureINFINISTORE(tm) - Scalable Open Source Storage Arhcitecture
INFINISTORE(tm) - Scalable Open Source Storage ArhcitectureThomas Uhl
 
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...Red_Hat_Storage
 

Similar to HiTIME project (20)

Using hadoop to expand data warehousing
Using hadoop to expand data warehousingUsing hadoop to expand data warehousing
Using hadoop to expand data warehousing
 
Overview of Big Data by Sunny
Overview of Big Data by SunnyOverview of Big Data by Sunny
Overview of Big Data by Sunny
 
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xModule 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作
 
A unified data modeler in the world of big data
A unified data modeler in the world of big dataA unified data modeler in the world of big data
A unified data modeler in the world of big data
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time Applications
 
Big data sketch-and-possible-usecases2
Big data sketch-and-possible-usecases2Big data sketch-and-possible-usecases2
Big data sketch-and-possible-usecases2
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data training
 
How Data Virtualization Adds Value to Your Data Science Stack
How Data Virtualization Adds Value to Your Data Science StackHow Data Virtualization Adds Value to Your Data Science Stack
How Data Virtualization Adds Value to Your Data Science Stack
 
Getting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solvesGetting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solves
 
The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)
 
DataLogix Hadoop Solution
DataLogix Hadoop SolutionDataLogix Hadoop Solution
DataLogix Hadoop Solution
 
Ceph Days 2014 Paul Evans Slide Deck
Ceph Days 2014 Paul Evans Slide DeckCeph Days 2014 Paul Evans Slide Deck
Ceph Days 2014 Paul Evans Slide Deck
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
 
Relational
RelationalRelational
Relational
 
INFINISTORE(tm) - Scalable Open Source Storage Arhcitecture
INFINISTORE(tm) - Scalable Open Source Storage ArhcitectureINFINISTORE(tm) - Scalable Open Source Storage Arhcitecture
INFINISTORE(tm) - Scalable Open Source Storage Arhcitecture
 
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
 
Dbms9
Dbms9Dbms9
Dbms9
 

More from vty

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs vty
 
Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs vty
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure vty
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museumvty
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataversevty
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...vty
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7vty
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyvty
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes vty
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21vty
 
Controlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryControlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryvty
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...vty
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligencevty
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Projectvty
 
External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataversevty
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research datavty
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in DataverseClariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataversevty
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution vty
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataversevty
 
CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse vty
 

More from vty (20)

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs
 
Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museum
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataverse
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
 
HiTiME project

  • 1. HiTiME project description Christian Roosendaal (christian.roosendaal@gmail.com), Vyacheslav Tykhonov (vty@iisg.nl), HiTiME System developers IISH Amsterdam
  • 2. HiTiME prototype data flow
       Diagram components, connected by arrows numbered 1 to 7: Source data, CMS (Drupal, WordPress, …), Input DB, Knowledge Base, NER (three instances shown), and Training sets from IISH archives.
       Processing module
       ● Check for new documents
       ● Split into words
       ● Store in DB
       Entity Recognize module
       ● Retrieve document tokens
       ● Send to NER by telnet
       ● If token is recognized entity → store in DB
       Meanings module
       ● Look for sequences of entities
       ● Replace with known composite entities
  • 3. IISH systems integration
       LINKS: Database with 8000+ professions
       ● Create training sets
       OCR application
       ● Scans, posters, archives
       Evergreen library system
       ● Create training sets for authority records
       ● Improve MARC21 records
       HiTiME application
       - Persons
       - Organizations
       - Locations
       - Dates
       - Professions
       PID service
       Knowledge base
       ● Store entities
       Export data to e.g. RDF, OWL, XML
       External applications
       ● BWSA
       ● Timeline
       ● Visual Mets
       ● Extend functionality with new filters
       Search (search.iisg.nl)
       ● Improve metadata
       ● Find relevant documents in time/location domain
       Clio-Infrastructure
       ● Infrastructure to store data from different systems
       ● Connect dates and locations with datasets
       ● Visualize trends relevant to documents
  • 4. System design
       ● HiTiME core checks for new or updated documents in the input database
       ● The input database can be any type of database with timestamps
       Input data:
       doc_id      last_modified   data
       Document 1  12-13-12 12:04  “Petrus Alma is great...”
       Document 2  12-13-12 11:37  “...”
       Output of HiTiME core, stored in the KB:
       doc_id      last_modified   data
       Document 1  12-13-12 12:04  “<person>Petrus Alma</person> is great...”
       Document 2  12-13-12 11:37  “...”
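The check for new or updated documents can be sketched as a timestamp query. This is a minimal illustration, not the HiTiME implementation: the `documents` table, the `fetch_changed` helper, and the use of SQLite are assumptions, and the slide's `12-13-12` dates are assumed to mean 13 December 2012 and are rewritten in ISO form so that plain string comparison orders them correctly.

```python
import sqlite3

def fetch_changed(conn, since):
    """Return (doc_id, last_modified, data) rows modified after `since`.

    Hypothetical helper: assumes ISO-formatted timestamps stored as text.
    """
    cur = conn.execute(
        "SELECT doc_id, last_modified, data FROM documents "
        "WHERE last_modified > ? ORDER BY last_modified",
        (since,),
    )
    return cur.fetchall()

# In-memory stand-in for the input database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (doc_id TEXT PRIMARY KEY, last_modified TEXT, data TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        ("Document 1", "2012-12-13 12:04", "Petrus Alma is great..."),
        ("Document 2", "2012-12-13 11:37", "..."),
    ],
)

# Only documents touched since the last run need to be processed again.
changed = fetch_changed(conn, "2012-12-13 12:00")
```

Any database that exposes a comparable last-modified column could be queried the same way; only the connection and SQL dialect would change.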
  • 5. Database design (1/2)
       Example string: “Petrus Alma is great”
       Split text into words and store words separately in table:
       doc_id  word_id  word
       0       0        Petrus
       0       1        Alma
       0       2        is
       0       3        great
       Store coordinates of each word in coordinate table (identity_id is still empty):
       doc_id  sentence_id  position  word_id  meaning_flag  identity_id
       0       0            0         0        0
       0       0            1         1        0
       0       0            2         2        0
       0       0            3         3        0
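A minimal sketch of how the word table and coordinate table could be filled for the example string. The `split_document` function is hypothetical, the sentence split on full stops is deliberately crude, and treating `position` as the index within the sentence is an assumption (the single-sentence example does not settle it).

```python
def split_document(doc_id, text):
    """Split text into word rows and coordinate rows (hypothetical helper)."""
    words = []        # rows of (doc_id, word_id, word)
    coordinates = []  # rows of (doc_id, sentence_id, position, word_id,
                      #          meaning_flag, identity_id)
    word_id = 0
    # Crude sentence split on full stops; empty fragments are skipped.
    sentences = (s for s in text.split(".") if s.strip())
    for sentence_id, sentence in enumerate(sentences):
        # Crude word split on whitespace.
        for position, word in enumerate(sentence.split()):
            words.append((doc_id, word_id, word))
            # meaning_flag starts at 0 and identity_id is empty (None)
            # until later processing steps fill them in.
            coordinates.append((doc_id, sentence_id, position, word_id, 0, None))
            word_id += 1
    return words, coordinates

words, coordinates = split_document(0, "Petrus Alma is great")
```

For the example string this yields the four word rows and four coordinate rows shown in the tables, with every `meaning_flag` still 0.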
  • 6. Database design (2/2)
       Processing of text by NER. Output of NER:
       “Petrus” → PERS
       “Alma” → PERS
       “is” → 0
       “great” → 0
       Store in decision table:
       word_id  NER   Frog  Heidel  UCTO  Decision
       0        PERS                      PERS
       1        PERS                      PERS
       Update meaning_flag in coordinate table:
       doc_id  sentence_id  position  word_id  meaning_flag  identity_id
       0       0            0         0        1
       0       0            1         1        1
       0       0            2         2        0
       0       0            3         3        0
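The decision step and the meaning_flag update can be sketched as follows. This is an illustration under assumptions: the `decide` rule (take the first non-empty label) and the dictionary-based rows are hypothetical, and only the NER column is filled, as in the prototype.

```python
def decide(votes):
    """Return the first non-empty label among the tool outputs, else None.

    Hypothetical rule: with a single NER column it simply passes the label on.
    """
    for label in votes:
        if label:
            return label
    return None

# NER output per word_id; None stands for "not recognized" (0 on the slide).
ner_output = {0: "PERS", 1: "PERS", 2: None, 3: None}

# Coordinate rows for "Petrus Alma is great" before the update.
coordinates = [
    {"doc_id": 0, "sentence_id": 0, "position": i, "word_id": i,
     "meaning_flag": 0, "identity_id": None}
    for i in range(4)
]

# Decision table: only words with a recognized label get a row.
decisions = {}
for word_id, ner_label in ner_output.items():
    label = decide([ner_label])
    if label is not None:
        decisions[word_id] = label

# Flag every word that received a decision.
for row in coordinates:
    if row["word_id"] in decisions:
        row["meaning_flag"] = 1
```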
  • 7. Improvement: Integration of FROG, UCTO and HeidelTime
       ● The prototype only uses NER, plus crude methods to split raw text into sentences and words
       ● Splitting can be made more reliable with UCTO and FROG
       ● Time expressions are not recognized in the prototype → HeidelTime
  • 8. Improvement: Disambiguation of recognized entities (1/2)
       Word       NER  Frog  Heidel  ...  Decision
       Amsterdam  LOC                     LOC
       Amsterdam is a location. That seems right, but what if the text means the VOC ship “Amsterdam”?
  • 9. Improvement: Disambiguation of recognized entities (2/2)
       NER can be trained to improve accuracy. By making use of differently trained NERs we can build an expert system:
       Word       NER  Frog  Heidel  NER2  NER3  Decision
       Amsterdam  LOC                SHIP  BAND  ?
       The final decision can be made based on the priorities of the trained models. Our idea is to assign the lowest priorities to wide-scope models.
       Ships
       ● Amsterdam (VOC ship), an 18th century cargo ship
       ● MS Amsterdam, a cruise ship owned and operated by Holland America Line
       Music
       ● Amsterdam (band), a pop band from the United Kingdom
       ● "Amsterdam" (Jacques Brel song), a song by Jacques Brel
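A priority-based final decision could look like the sketch below. The `final_decision` function and the priority numbers are purely illustrative assumptions; the slide leaves the actual decision open ("?") and only states that wide-scope models should get the lowest priority.

```python
def final_decision(labels, priorities):
    """Pick the label of the highest-priority model that produced one.

    labels:     {model_name: label or None}
    priorities: {model_name: int}, higher means more trusted (assumed scale)
    On equal priority the model listed first in `labels` wins.
    """
    candidates = [
        (priorities[model], label)
        for model, label in labels.items()
        if label
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]

# Outputs for the word "Amsterdam"; Frog and Heidel gave no label.
labels = {"NER": "LOC", "Frog": None, "Heidel": None,
          "NER2": "SHIP", "NER3": "BAND"}

# Hypothetical priorities: the wide-scope general NER ranks lowest.
priorities = {"NER": 0, "Frog": 0, "Heidel": 0, "NER2": 2, "NER3": 1}

decision = final_decision(labels, priorities)
```

With these made-up priorities the narrowly trained ship model overrides the general location label; a different priority assignment would give a different outcome.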
  • 10. Improvement: “composite” entities (1/2)
       In our prototype: “Petrus Alma is great”, where “Petrus” is recognized as a person and “Alma” is recognized as a person.
       Should be: “Petrus Alma is great”, where “Petrus Alma” is recognized as one person.
  • 11. Improvement: “composite” entities (2/2)
       Possible solution:
       Search for sequences of recognized entities in coordinate table:
       doc_id  sentence_id  position  word_id  meaning_flag  identity_id
       0       0            0         0        1             0
       0       0            1         1        1             0
       0       0            2         2        0
       0       0            3         3        0
       The sequence found is “Petrus Alma”.
       Keep track of known entities in a separate entities table and compare these sequences with the entities in it:
       identity_id  name              type
       0            Petrus Alma       PERS
       1            Aron van Dam      PERS
       2            Frederik Feringa  PERS
       Final decision about entity:
       identity_id  name         type
       0            Petrus Alma  PERS
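The composite-entity lookup can be sketched as below. It is a minimal illustration under assumptions: `find_composites` is a hypothetical helper, rows are plain dictionaries sorted by document, sentence and position, and only a complete run of flagged words is compared with the entities table (partial matches inside a longer run are not attempted).

```python
from itertools import groupby

def find_composites(coordinates, words, entities):
    """Assign identity_id to runs of flagged words that match a known entity.

    coordinates: list of dict rows, sorted by doc_id, sentence_id, position
    words:       {word_id: word}
    entities:    {name: identity_id}
    Returns a list of (name, identity_id) for every match found.
    """
    matches = []
    by_sentence = groupby(coordinates, key=lambda r: (r["doc_id"], r["sentence_id"]))
    for _, rows in by_sentence:
        # Consecutive rows with meaning_flag == 1 form one candidate sequence.
        for flagged, run in groupby(rows, key=lambda r: r["meaning_flag"] == 1):
            if not flagged:
                continue
            run = list(run)
            name = " ".join(words[r["word_id"]] for r in run)
            identity_id = entities.get(name)
            if identity_id is not None:
                for r in run:
                    r["identity_id"] = identity_id
                matches.append((name, identity_id))
    return matches

words = {0: "Petrus", 1: "Alma", 2: "is", 3: "great"}
entities = {"Petrus Alma": 0, "Aron van Dam": 1, "Frederik Feringa": 2}
coordinates = [
    {"doc_id": 0, "sentence_id": 0, "position": i, "word_id": i,
     "meaning_flag": flag, "identity_id": None}
    for i, flag in enumerate([1, 1, 0, 0])
]

matches = find_composites(coordinates, words, entities)
```

After the call, the two rows for “Petrus” and “Alma” share identity_id 0, while the unflagged words keep an empty identity_id.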