SlideShare a Scribd company logo
1 of 20
Download to read offline
Using KB APIs to collect data
KB websites like Delpher.nl or GeheugenvanNederland.nl use the KBGA
infrastructure to search, retrieve and display digital content.
This infrastructure is also available to Digital Humanities scholars.
KBGA can be described as a set of datatypes and APIs, providing
dataservices.
Example dataset: ANP Radiobulletins
• About 1.5 million digitized typoscripts from radio news broadcasts between
1937 and 1984
• Available through the Delpher website as Radiobulletins collection
• KB offers the data under (semi) open licenses:
• CC0-license for the metadata, CC-BY-NC-ND-licenses for images and
full-text objects
Datatypes in the KBGA Infrastructure
• Structure metadata in MPEG21 DIDL format
Datatypes in the KBGA Infrastructure
• Structure metadata in MPEG21 DIDL format
Datatypes in the KBGA Infrastructure
• Structure metadata in MPEG21 DIDL format
Datatypes in the KBGA Infrastructure
• Structure metadata in MPEG21 DIDL format
• Descriptive metadata in the Dublin Core (DC) vocabulary, 

with some KB-specific extensions (‘DCX’)
• ALTO files with content and layout information
• OCR files containing only the textual content of the item
• One or more images of the object, for example in JPEG format
KBGA APIs
KBGA consists of three main functional components / groups of APIs:
1. The metadata associated with objects (descriptive, structural) is stored in a
OAI-PMH compatible metadata storage (KB-MDO).
2. The digital objects or files referred to by the structure metadata (images,
ALTO, OCR) are stored in a central web storage (KB-WO) accessible
through resolver links.
3. In order to search, the KB provides a search index from metadata and full-
text objects (jSRU).
KBGA overview
1. KB-MDO - OAI-PMH repository
• Based on the OAI-PMH (Open Archives Initiative Protocol for Metadata
Harvesting) protocol
• Harvesting metadata of a single record or an entire set
• Administrative information to keep copy up to date.
KB-MDO OAI-PMH examples 1/2
• http://services.kb.nl/mdo/oai?verb=ListSets

ListSets: Provides an overview of all 'sets'.
• http://services.kb.nl/mdo/oai?verb=ListIdentifiers&set=anp&metadataPrefix=didl

ListIdentifiers: Gives the first 'batch' of MPEG21 DIDL records of the ANP set.
• http://services.kb.nl/mdo/oai?
verb=ListIdentifiers&set=anp&metadataPrefix=didl&from=2008-10-01

Similar to the previous request, but now restricted to records that were last updated
on 2008-10-01 or later.
KB-MDO OAI-PMH examples 2/2
• http://services.kb.nl/mdo/oai?verb=ListIdentifiers&resumptionToken=anp!
2008-10-03T11:11:50.639Z!!didl!3932926

Returns the next 'batch' of identifiers for the previous request.
• http://services.kb.nl/mdo/oai?verb=GetRecord&identifier=anp:anp:
1937:07:08:8:mpeg21&metadataPrefix=didl

GetRecord: Retrieves one specific MPEG21 DIDL record.
Tip: Harvesting KB-MDO
A simple tool for harvesting metadata from KB-MDO (or any other OAI-PMH
repository) is oai2linerec (https://github.com/renevoorburg/oai2linerec)
./oai2linerec.sh -s anp -p didl -b http://services.kb.nl/mdo/oai -o output.txt
KBGA overview
2. Using resolver links - ANP example
• Persistent identifier in the form of a resolver link:

http://resolver.kb.nl/resolve?urn=anp:1973:10:18:44:mpeg21
• Image can be retrieved by adding the :image suffix:

http://resolver.kb.nl/resolve?urn=anp:1973:10:18:44:mpeg21:image
• OCR can be retrieved by adding the :ocr suffix:

http://resolver.kb.nl/resolve?urn=anp:1973:10:18:44:mpeg21:ocr
• ALTO file can be retrieved using the :alto suffix:

http://resolver.kb.nl/resolve?urn=anp:1973:10:18:44:mpeg21:alto
KBGA overview
3. jSRU for search
• If you do not wish to retrieve an entire set, but are more interested in getting
results for a particular search query
• Based on the SRU (Search and Retrieval via URL) standard protocol
• Uses CQL (Contextual Query Language) as its query language
Using jSRU 1/3 - basic example
http://jsru.kb.nl/sru/sru?operation=searchRetrieve&x-
collection=ANP&query=nobelprijs
Base URL followed by query string with a number of parameters:
• Parameter operation with value searchRetrieve to specify the operation
• Parameter x-collection with value ANP to select the collection to search
• Parameter query to enter the search query ‘nobelprijs’
Using jSRU 2/3 - more parameters
• With the operation parameter set to explain, some general information
about the service can be obtained
• Adjust the number of results returned with the maximumRecords
parameter
• Use the startRecord parameter if you want to start viewing the result set
from a particular record onwards
Using jSRU 3/3 - CQL
• Keyword string: query="nobelprijs literatuur"
• OR- or AND-queries: query=nobelprijs AND literatuur
• Wildcards: query=nobelpr*
• Restrict to a particular field: query=date=01-01-1960
• Restrict to period in time: query=date within "01-01-1960 01-01-1961"
Exercise - Collect ANP data
To play with the Frame generator, collect OCR files from the ANP-collection.
1. Define a basic jSRU search query that defines a matching subset for your
research.
2. Perform this query for various periods in time.
3. Use the resolver to get the OCR (~ 3 per period)

Use ‘Save as…’ in your browser.

Apply the .xml-extension.
See http://lab.kb.nl/workshop-using-frame-generator

More Related Content

What's hot

A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce Fabio Fumarola
 
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Ivan Ermilov
 
Hive integration: HBase and Rcfile__HadoopSummit2010
Hive integration: HBase and Rcfile__HadoopSummit2010Hive integration: HBase and Rcfile__HadoopSummit2010
Hive integration: HBase and Rcfile__HadoopSummit2010Yahoo Developer Network
 
Report on the Crawl and Harvest of the Whole Australian Web ...
Report on the Crawl and Harvest of the Whole Australian Web ...Report on the Crawl and Harvest of the Whole Australian Web ...
Report on the Crawl and Harvest of the Whole Australian Web ...webhostingguy
 

What's hot (20)

Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4
 
ICESat-2 Metadata and Status
ICESat-2 Metadata and StatusICESat-2 Metadata and Status
ICESat-2 Metadata and Status
 
Incorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product DesignerIncorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product Designer
 
Open-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDFOpen-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDF
 
SPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth ObservationSPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth Observation
 
HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)
 
Advanced HDF5 Features
Advanced HDF5 FeaturesAdvanced HDF5 Features
Advanced HDF5 Features
 
Product Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the WebProduct Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the Web
 
HDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFView
HDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFViewHDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFView
HDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFView
 
Bridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data ProductsBridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data Products
 
HDF Product Designer: Using Templates to Achieve Interoperability
HDF Product Designer: Using Templates to Achieve InteroperabilityHDF Product Designer: Using Templates to Achieve Interoperability
HDF Product Designer: Using Templates to Achieve Interoperability
 
Working with Scientific Data in MATLAB
Working with Scientific Data in MATLABWorking with Scientific Data in MATLAB
Working with Scientific Data in MATLAB
 
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
 
MATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and CapabilitiesMATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and Capabilities
 
Improved Methods for Accessing Scientific Data for the Masses
Improved Methods for Accessing Scientific Data for the MassesImproved Methods for Accessing Scientific Data for the Masses
Improved Methods for Accessing Scientific Data for the Masses
 
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
 
Hive integration: HBase and Rcfile__HadoopSummit2010
Hive integration: HBase and Rcfile__HadoopSummit2010Hive integration: HBase and Rcfile__HadoopSummit2010
Hive integration: HBase and Rcfile__HadoopSummit2010
 
GDAL Enhancement for ESDIS Project
GDAL Enhancement for ESDIS ProjectGDAL Enhancement for ESDIS Project
GDAL Enhancement for ESDIS Project
 
Report on the Crawl and Harvest of the Whole Australian Web ...
Report on the Crawl and Harvest of the Whole Australian Web ...Report on the Crawl and Harvest of the Whole Australian Web ...
Report on the Crawl and Harvest of the Whole Australian Web ...
 
Pilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOTPilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOT
 

Similar to Rene Voorburg - Using KB APIs to collect data

The new CIARD RING , a machine-readable directory of datasets for agriculture
The new CIARD RING, a machine-readable directory of datasets for agricultureThe new CIARD RING, a machine-readable directory of datasets for agriculture
The new CIARD RING , a machine-readable directory of datasets for agricultureValeria Pesce
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsJulien Nioche
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friendslucenerevolution
 
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?BIOVIA
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackRich Lee
 
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...Spark Summit
 
Introduction to firebidSQL 3.x
Introduction to firebidSQL 3.xIntroduction to firebidSQL 3.x
Introduction to firebidSQL 3.xFabio Codebue
 
Apache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysisApache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysisliang chen
 
Scalable Preservation Workflows
Scalable Preservation WorkflowsScalable Preservation Workflows
Scalable Preservation WorkflowsSCAPE Project
 
The CIARD RING , a global directory of datasets for agriculture, by Valeria P...
The CIARD RING, a global directory of datasets for agriculture, by Valeria P...The CIARD RING, a global directory of datasets for agriculture, by Valeria P...
The CIARD RING , a global directory of datasets for agriculture, by Valeria P...CIARD Movement
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overviewBigData_Europe
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...t_ivanov
 
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreOpen for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreAndy Powell
 
Environment Canada's Data Management Service
Environment Canada's Data Management ServiceEnvironment Canada's Data Management Service
Environment Canada's Data Management ServiceSafe Software
 
ResourceSync: Web-based Resource Synchronization
ResourceSync: Web-based Resource SynchronizationResourceSync: Web-based Resource Synchronization
ResourceSync: Web-based Resource SynchronizationSimeon Warner
 
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817Ben Busby
 
spatial data infrastructure : data modelling and web services for data access
spatial data infrastructure : data modelling and web services for data accessspatial data infrastructure : data modelling and web services for data access
spatial data infrastructure : data modelling and web services for data accessDesconnets Jean-Christophe
 
Kubernetes Internals
Kubernetes InternalsKubernetes Internals
Kubernetes InternalsShimi Bandiel
 

Similar to Rene Voorburg - Using KB APIs to collect data (20)

The new CIARD RING , a machine-readable directory of datasets for agriculture
The new CIARD RING, a machine-readable directory of datasets for agricultureThe new CIARD RING, a machine-readable directory of datasets for agriculture
The new CIARD RING , a machine-readable directory of datasets for agriculture
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
 
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
 
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
 
Introduction to firebidSQL 3.x
Introduction to firebidSQL 3.xIntroduction to firebidSQL 3.x
Introduction to firebidSQL 3.x
 
Apache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysisApache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysis
 
Scalable Preservation Workflows
Scalable Preservation WorkflowsScalable Preservation Workflows
Scalable Preservation Workflows
 
The CIARD RING , a global directory of datasets for agriculture, by Valeria P...
The CIARD RING, a global directory of datasets for agriculture, by Valeria P...The CIARD RING, a global directory of datasets for agriculture, by Valeria P...
The CIARD RING , a global directory of datasets for agriculture, by Valeria P...
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overview
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
 
Oracle archi ppt
Oracle archi pptOracle archi ppt
Oracle archi ppt
 
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreOpen for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
 
Environment Canada's Data Management Service
Environment Canada's Data Management ServiceEnvironment Canada's Data Management Service
Environment Canada's Data Management Service
 
ResourceSync: Web-based Resource Synchronization
ResourceSync: Web-based Resource SynchronizationResourceSync: Web-based Resource Synchronization
ResourceSync: Web-based Resource Synchronization
 
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
 
XDC demo: CTA
XDC demo: CTAXDC demo: CTA
XDC demo: CTA
 
spatial data infrastructure : data modelling and web services for data access
spatial data infrastructure : data modelling and web services for data accessspatial data infrastructure : data modelling and web services for data access
spatial data infrastructure : data modelling and web services for data access
 
Kubernetes Internals
Kubernetes InternalsKubernetes Internals
Kubernetes Internals
 

More from KBNLResearch

Automating rights decision elag 2017
Automating rights decision elag 2017Automating rights decision elag 2017
Automating rights decision elag 2017KBNLResearch
 
Theo van Veen - Verrijkingen in Delpher: genereren, gebruiken en corrigeren
Theo van Veen - Verrijkingen in Delpher: genereren, gebruiken en corrigerenTheo van Veen - Verrijkingen in Delpher: genereren, gebruiken en corrigeren
Theo van Veen - Verrijkingen in Delpher: genereren, gebruiken en corrigerenKBNLResearch
 
Melvin Wevers - Computer vision and advertisements
Melvin Wevers - Computer vision and advertisementsMelvin Wevers - Computer vision and advertisements
Melvin Wevers - Computer vision and advertisementsKBNLResearch
 
Lotte Wilms - KB Lab as hub
Lotte Wilms - KB Lab as hubLotte Wilms - KB Lab as hub
Lotte Wilms - KB Lab as hubKBNLResearch
 
Martijn Kleppe - KBK-1M dataset
Martijn Kleppe - KBK-1M datasetMartijn Kleppe - KBK-1M dataset
Martijn Kleppe - KBK-1M datasetKBNLResearch
 
Pim Huijnen - Keyword generator & dictionary viewer
Pim Huijnen - Keyword generator & dictionary viewerPim Huijnen - Keyword generator & dictionary viewer
Pim Huijnen - Keyword generator & dictionary viewerKBNLResearch
 
Frank Harbers - Automatic genre classification of historical newspaper articles
Frank Harbers - Automatic genre classification of historical newspaper articles Frank Harbers - Automatic genre classification of historical newspaper articles
Frank Harbers - Automatic genre classification of historical newspaper articles KBNLResearch
 

More from KBNLResearch (7)

Automating rights decision elag 2017
Automating rights decision elag 2017Automating rights decision elag 2017
Automating rights decision elag 2017
 
Theo van Veen - Verrijkingen in Delpher: genereren, gebruiken en corrigeren
Theo van Veen - Verrijkingen in Delpher: genereren, gebruiken en corrigerenTheo van Veen - Verrijkingen in Delpher: genereren, gebruiken en corrigeren
Theo van Veen - Verrijkingen in Delpher: genereren, gebruiken en corrigeren
 
Melvin Wevers - Computer vision and advertisements
Melvin Wevers - Computer vision and advertisementsMelvin Wevers - Computer vision and advertisements
Melvin Wevers - Computer vision and advertisements
 
Lotte Wilms - KB Lab as hub
Lotte Wilms - KB Lab as hubLotte Wilms - KB Lab as hub
Lotte Wilms - KB Lab as hub
 
Martijn Kleppe - KBK-1M dataset
Martijn Kleppe - KBK-1M datasetMartijn Kleppe - KBK-1M dataset
Martijn Kleppe - KBK-1M dataset
 
Pim Huijnen - Keyword generator & dictionary viewer
Pim Huijnen - Keyword generator & dictionary viewerPim Huijnen - Keyword generator & dictionary viewer
Pim Huijnen - Keyword generator & dictionary viewer
 
Frank Harbers - Automatic genre classification of historical newspaper articles
Frank Harbers - Automatic genre classification of historical newspaper articles Frank Harbers - Automatic genre classification of historical newspaper articles
Frank Harbers - Automatic genre classification of historical newspaper articles
 

Recently uploaded

Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 

Recently uploaded (20)

Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 

Rene Voorburg - Using KB APIs to collect data

  • 1. Using KB APIs to collect data KB websites like Delpher.nl or GeheugenvanNederland.nl use the KBGA infrastructure to search, retrieve and display digital content. This infrastructure is also available to Digital Humanities scholars. KBGA can be described as a set of datatypes and APIs, providing dataservices.
  • 2. Example dataset: ANP Radiobulletins • About 1.5 million digitized typoscripts from radio news broadcasts between 1937 and 1984 • Available through the Delpher website as Radiobulletins collection • KB offers the data under (semi) open licenses: • CC0-license for the metadata, CC-BY-NC-ND-licenses for images and full-text objects
  • 3. Datatypes in the KBGA Infrastructure • Structure metadata in MPEG21 DIDL format
  • 4. Datatypes in the KBGA Infrastructure • Structure metadata in MPEG21 DIDL format
  • 5. Datatypes in the KBGA Infrastructure • Structure metadata in MPEG21 DIDL format
  • 6. Datatypes in the KBGA Infrastructure • Structure metadata in MPEG21 DIDL format • Descriptive metadata in the Dublin Core (DC) vocabulary, 
 with some KB-specific extensions (‘DCX’) • ALTO files with content and layout information • OCR files containing only the textual content of the item • One or more images of the object, for example in JPEG format
  • 7. KBGA APIs KBGA consists of three main functional components / groups of APIs: 1. The metadata associated with objects (descriptive, structural) is stored in a OAI-PMH compatible metadata storage (KB-MDO). 2. The digital objects or files referred to by the structure metadata (images, ALTO, OCR) are stored in a central web storage (KB-WO) accessible through resolver links. 3. In order to search, the KB provides a search index from metadata and full- text objects (jSRU).
  • 9. 1. KB-MDO - OAI-PMH repository • Based on the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) protocol • Harvesting metadata of a single record or an entire set • Administrative information to keep copy up to date.
  • 10. KB-MDO OAI-PMH examples 1/2 • http://services.kb.nl/mdo/oai?verb=ListSets
 ListSets: Provides an overview of all 'sets'. • http://services.kb.nl/mdo/oai?verb=ListIdentifiers&set=anp&metadataPrefix=didl
 ListIdentifiers: Gives the first 'batch' of MPEG21 DIDL records of the ANP set. • http://services.kb.nl/mdo/oai? verb=ListIdentifiers&set=anp&metadataPrefix=didl&from=2008-10-01
 Similar to the previous request, but now restricted to records that were last updated on 2008-10-01 or later.
  • 11. KB-MDO OAI-PMH examples 2/2 • http://services.kb.nl/mdo/oai?verb=ListIdentifiers&resumptionToken=anp! 2008-10-03T11:11:50.639Z!!didl!3932926
 Returns the next 'batch' of identifiers for the previous request. • http://services.kb.nl/mdo/oai?verb=GetRecord&identifier=anp:anp: 1937:07:08:8:mpeg21&metadataPrefix=didl
 GetRecord: Retrieves one specific MPEG21 DIDL record.
  • 12. Tip: Harvesting KB-MDO A simple tool for harvesting metadata from KB-MDO (or any other OAI-PMH repository) is oai2linerec (https://github.com/renevoorburg/oai2linerec) ./oai2linerec.sh -s anp -p didl -b http://services.kb.nl/mdo/oai -o output.txt
  • 14. 2. Using resolver links - ANP example • Persistent identifier in the form of a resolver link:
 http://resolver.kb.nl/resolve?urn=anp:1973:10:18:44:mpeg21 • Image can be retrieved by adding the :image suffix:
 http://resolver.kb.nl/resolve?urn=anp:1973:10:18:44:mpeg21:image • OCR can be retrieved by adding the :ocr suffix:
 http://resolver.kb.nl/resolve?urn=anp:1973:10:18:44:mpeg21:ocr • ALTO file can be retrieved using the :alto suffix:
 http://resolver.kb.nl/resolve?urn=anp:1973:10:18:44:mpeg21:alto
  • 16. 3. jSRU for search • If you do not wish to retrieve an entire set, but are more interested in getting results for a particular search query • Based on the SRU (Search and Retrieval via URL) standard protocol • Uses CQL (Contextual Query Language) as its query language
  • 17. Using jSRU 1/3 - basic example http://jsru.kb.nl/sru/sru?operation=searchRetrieve&x- collection=ANP&query=nobelprijs Base URL followed by query string with a number of parameters: • Parameter operation with value searchRetrieve to specify the operation • Parameter x-collection with value ANP to select the collection to search • Parameter query to enter the search query ‘nobelprijs’
  • 18. Using jSRU 2/3 - more parameters • With the operation parameter set to explain, some general information about the service can be obtained • Adjust the number of results returned with the maximumRecords parameter • Use the startRecord parameter if you want to start viewing the result set from a particular record onwards
  • 19. Using jSRU 3/3 - CQL • Keyword string: query="nobelprijs literatuur" • OR- or AND-queries: query=nobelprijs AND literatuur • Wildcards: query=nobelpr* • Restrict to a particular field: query=date=01-01-1960 • Restrict to period in time: query=date within "01-01-1960 01-01-1961"
  • 20. Exercise - Collect ANP data To play with the Frame generator, collect OCR files from the ANP-collection. 1. Define a basic jSRU search query that defines a matching subset for your research. 2. Perform this query for various periods in time. 3. Use the resolver to get the OCR (~ 3 per period)
 Use ‘Save as…’ in your browser.
 Apply the .xml-extension. See http://lab.kb.nl/workshop-using-frame-generator