Humanities Networked Infrastructure (HuNI)
Presentation by Toby Burrows and Deb Verhoeven to the Fifth National Forum of AeRO (the Australian eResearch Organization), held in Perth on 26 July 2013. The presentation gives an overview of the HuNI Project as at July 2013. Topics covered include: data ingest and alignment from 28 Australian humanities datasets; building HuNI’s discovery functionality; and designing Virtual Laboratory tools for researchers.

Speaker notes
  • Presenting on behalf of Professor Deb Verhoeven, the Project Director
  • HuNI is one of the Virtual Laboratories (VLs) funded under the NeCTAR VL programme. Don’t need to explain NeCTAR to this audience? Focus on “data-centred workflows” being a challenge for the humanities.
  • Different kinds of interoperability – use these to structure the rest of the presentation
  • Organizational interoperability
  • Currently funded until 31 January 2014 – funding began June 2012
  • The HuNI VL needs an active community of early adopters and advocates – beyond the current partnership
  • Semantic interoperability
  • This was the initial (Phase 1) ingestion workflow, subsequently revised. Data sources are ingested into an RDF Triple Store and structured using components from existing ontologies.
  • Building a core ontology to which partner data can be aligned and mapped. Components of the CIDOC-CRM, FOAF and FRBR-OO ontologies were re-used for the integration of the initial datasets. The initial focus was on people. More components were then added, especially in relation to events, and to works and expressions. Work is underway to plug in vocabularies using SKOS. (A minimal mapping sketch follows these notes.)
  • This section of the HuNI ontology shows the "joins" and class relationships where the CIDOC-CRM and FRBR-OO ontologies align. The green bubbles record the CIDOC entities and the red bubbles record the FRBR entities. The bidirectional arrows indicate a "sameAs" relationship; the unidirectional arrows indicate a sub-class relationship.
  • That was "semantic interoperability". The third angle is technical interoperability. The diagram shows a high-level view of the various processes; we will look at these separately.
  • XML publishing options for partners: OAI-PMH harvesting, plus a custom-built solution for non-OAI sites. We are not harvesting all the data – only the primary entity classes common to most partners: people, places, events and objects. The lowest common denominator is a flat XML file per class entity, together with uniquely identifying information. For the person class entity: first name, last name, date of birth/death, bio, occupation. Solr search server: aggregation of harvested XML records. Jena RDF Triple Store: aggregation of stored RDF graphs. (A harvesting sketch follows these notes.)
  • Integration into RDF has proven to be semantically and technically complex, because: the publishing format necessary to allow us to do the mappings is too high a technical barrier for most data custodians; the data analysis and mapping to a common data model is time-consuming and complex; and there are software performance issues. That is why only 6 partner data sources have been aggregated into the RDF Triple Store so far. Work is continuing on this approach. We have also developed a Solr index: XML records are harvested from the partner feeds, transformed, and submitted to the Solr search server. 24 datasets have been aggregated so far; the remainder are in process.
  • But this is not just a data integration project – the VL also requires tools for researchers to use. We are building a suite of tools for researchers to work with the aggregated data. This is the technology stack being used for the VL tools.
  • Tasks which can be carried out by researchers against the Solr index
  • Tasks which can be carried out against the Linked Data aggregate
  • The VL will support a workflow centred around discovery, analysis and sharing. Here’s the cartoon version of this workflow!
  • Researchers will be able to: display existing connections between relevant records held within their virtual collection, and add further links between particular records, with commentary describing the relationship between them. The LORE tool (developed at UQ) will be modified to work with HuNI in this way.
  • Researchers can also export their Virtual Collections and undertake further analysis in their own tool environment. HuNI will also include a Tool Integration Framework specifying how third-party tools can integrate with the lab and work with HuNI data.
  • Researchers will have the option to share their virtual collections, and their analyses, with other researchers
  • Currently in alpha. A link will be made available on huni.net.au soon for testing and feedback.
  • Fourth element of interoperability – the project level. A collaborative governance structure is in place: a Steering Committee plus advisory groups. Staff cover various functions (including some in-kind contributions). A formal project management methodology (PRINCE2) is used. Some challenges: HuNI staff are spread across four states; finding the most effective communication methods; deciding when to use face-to-face.
  • Detailed technical information
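
The ontology notes above describe aligning partner records to components of CIDOC-CRM, FOAF and FRBR-OO, with "sameAs" links joining equivalent entities. The following is a minimal sketch, not the actual HuNI ingestion code, of how a harvested person record might be expressed this way using Python's rdflib; the HuNI namespace, record identifiers and the sameAs target are illustrative assumptions, while the CIDOC-CRM, FOAF and OWL terms are standard.

```python
# Minimal, illustrative sketch (not the actual HuNI mapping code) of expressing
# a harvested "person" record with CIDOC-CRM and FOAF components, plus a
# "sameAs" alignment link. URIs under example.org are assumptions.
from rdflib import Graph, Namespace, Literal, RDF, URIRef
from rdflib.namespace import FOAF, OWL

CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")   # CIDOC-CRM
HUNI = Namespace("http://example.org/huni/")              # hypothetical HuNI namespace

g = Graph()
g.bind("crm", CRM)
g.bind("foaf", FOAF)

# One flat XML person record (first name, last name, dates, bio, occupation)
# reduced to a Python dict after harvesting and transformation.
record = {
    "id": "adb-person-1234",      # hypothetical identifier
    "first_name": "Example",
    "last_name": "Person",
}

person = HUNI[record["id"]]
g.add((person, RDF.type, CRM.E21_Person))   # CIDOC-CRM person class
g.add((person, RDF.type, FOAF.Person))      # aligned FOAF class
g.add((person, FOAF.givenName, Literal(record["first_name"])))
g.add((person, FOAF.familyName, Literal(record["last_name"])))

# A "sameAs" link of the kind used to align equivalent entities across
# partner datasets (target URI is illustrative).
g.add((person, OWL.sameAs, URIRef("http://example.org/daao/person/5678")))

print(g.serialize(format="turtle"))
```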
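The technical-interoperability notes above describe harvesting flat XML records per entity class from partner OAI-PMH feeds, transforming them, and aggregating them in a Solr search server. Below is a minimal sketch of that harvest-transform-ingest pattern using the Sickle OAI-PMH client and Solr's JSON update API; the feed URL, Solr core name and document fields are assumptions for illustration, not HuNI's actual configuration.

```python
# Sketch of a harvest-transform-ingest step: pull records from a partner's
# OAI-PMH feed and post simplified documents to a Solr core. Endpoint URLs,
# core name and field names are illustrative assumptions.
import requests
from sickle import Sickle

OAI_ENDPOINT = "https://partner.example.org/oai"        # hypothetical partner feed
SOLR_UPDATE = "http://localhost:8983/solr/huni/update"  # hypothetical Solr core

sickle = Sickle(OAI_ENDPOINT)
docs = []
for record in sickle.ListRecords(metadataPrefix="oai_dc"):
    if record.deleted:
        continue
    meta = record.metadata  # dict of Dublin Core fields -> list of values
    docs.append({
        "id": record.header.identifier,
        "entity_class": "person",                        # person / place / event / object
        "name": (meta.get("title") or [""])[0],
        "description": (meta.get("description") or [""])[0],
    })

# Submit the transformed records to Solr in one batch and commit.
resp = requests.post(SOLR_UPDATE, params={"commit": "true"}, json=docs, timeout=30)
resp.raise_for_status()
print(f"Indexed {len(docs)} records")
```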

Transcript

  • 1. Humanities Networked Infrastructure (HuNI) Professor Deb Verhoeven, Deakin University Dr Toby Burrows, University of Western Australia
  • 2. VIRTUAL LABORATORIES
  • 3. • Ensure that Australian cultural datasets and the research associated with them become part of the emerging international Linked Open Data environment • Enable research enquiries to move easily from: what is? to where is? • Support the role of annotation and metadata in discovery of new knowledge or the means to elucidate new knowledge • Position the idea of data as both a subject and an object of analysis in humanities • Contribute to debates around standards for development and implementation HuNI: BROAD BENEFITS
  • 4. • Enable humanities researchers to work with cultural datasets more efficiently and effectively, and on a larger scale; • Encourage the systematic sharing of research data between humanities researchers (including the cultural dataset curators themselves), the community and cultural institutions; • Encourage a greater level of cross-disciplinary and interdisciplinary research, both within the humanities and creative arts and between the humanities/creative arts and other disciplines, and the wider public; • Support innovative methodologies such as network analysis, game theory and ‘virtual history’ that rely on large-scale datasets HuNI: SPECIFIC BENEFITS
  • 5. 1. Organizational level: aligning the goals and processes of the institutions involved 2. Semantic level: aligning the meaning of the exchanged digital resources 3. Technical level: implementing data interoperability requires both data integration and data exchange processes as well as enabling effective use of the data that becomes available Pasquale Pagano, ‘Data Interoperability’ (GRDI2020) 4. Project level: The advent of more complex ‘big humanities’ projects requires multi-disciplinary personnel, which in turn entails the management of different workflows and expectations: developing a consortial approach, arriving at a common definition of project methods, etc. INTEROPERABILITY
  • 6. 1. The PARTNERSHIP Consortium led by Deakin University • Cultural data providers (10) – project co-operators • Humanities software developer (1) – project co-developers • eResearch organisations (2) – lead development agencies – VeRSI and Intersect
  • 7. HuNI PARTNER DATASETS Media (film, cinema, theatre, newspapers, magazines, advertising, music, live performances): AMHD, MAP, CAARP, Bonza, AFIRC, Circus Oz, AusStage. Biographical (artists, designers, writers, significant people, scientists, Sydney demographics): DAAO, AustLit, AWR, ADB, DoS. Indigenous languages: EOAS, AUSTLANG, Mura.
  • 8. Welcome to the Cinema and Audiences Research Project (CAARP) database: an online encyclopaedia of cinema-going in Australia. Data: this site contains information on film screenings and venues in Australia – 430,137 screenings; 10,256 films; 1,978 cinemas; 1,649 companies; from 1846 to now.
  • 9. • NeCTAR investment of $1.33M • Partner contributions of $480,000 • Partner in-kind contributions amounting to >$1M FINANCIAL COLLABORATION
  • 10. COMMUNITY BUILDING • Collated user-stories (20) • Online showcase events – next one is 4th September 2013 • Link to the alpha prototype available shortly on huni.net.au; feedback buttons • Wider beta launch at eResearch Australasia in October 2013 • Stay up to date through our monthly newsletter and blog feed • Follow us on Twitter - @HuNIVL
  • 11. Information design challenge: to use Linked Data and ontologies / vocabularies for data to be aligned and mapped. • Reading the data: characteristics of the data determine the ontological components selected and the major entities • Major entities identified as: people, organizations, events, relationships, places, dates, resources, and subjects • Components from ontologies already available and being reused or considered: CIDOC-CRM, FOAF, FRBR, FRBR-OO, BibFrame and PROV-O 2. INTEGRATING MEANING
  • 12. INGESTION WORKFLOW
  • 13. HuNI ONTOLOGY March 2013
  • 14. ALIGNING ONTOLOGIES
  • 15. 3. HuNI FUNCTIONALITY [Architecture diagram] Partner side: data update and publish (ADB, DAAO, CAARP, AFIRC, AusStage). HuNI side: data harvest, transform and ingest via Corbicula; data analysis and mapping; Solr Search Server [HuNI Data]; RDF Triple Store [HuNI Linked Data]. HuNI Virtual Laboratory: scholarly, public and citizen researcher workflow tasks – data discovery (simple search, advanced search, deep SPARQL-based search, save search results as a private collection, refine / expand collection), data analysis (analyse and annotate collection, export collection), data sharing (share collection and analysis, share search results) – plus admin tasks (registration and login, profile management, history recording, project management).
  • 16. DATA INTEGRATION • Live data feeds are deployed at the partner sites to expose updated partner data as XML • 28 Australian datasets are being harvested for integration into HuNI • HuNI gateway components are deployed on the NeCTAR Research Cloud. • They harvest the XML feeds and transform them for ingestion into two HuNI data aggregates: a Solr search server and a Jena RDF Triple Store. [Data integration diagram as on slide 15.]
  • 17. TWO HuNI DATA AGGREGATES [Bar chart: of the 28 partner datasets, 24 have been aggregated into the Solr aggregate and 6 into the RDF aggregate.]
  • 18. TECHNOLOGY STACK for VL TOOLS • Front-end frameworks – AngularJS and Twitter Bootstrap single page Web app • Tools hosting framework – Open Social via Apache Shindig • Back-end framework – SpringMVC via Roo • Layer integration – RESTful Web services
  • 19. TOOLS for RESEARCHERS A researcher with a HuNI account will be able to: • Search the HuNI data • Save their search results as a private collection • Refine their collection through additional searches • Analyse and annotate their collection with their own assertions and commentary • Export their collection for further analysis • Publish and share their collections and analyses [Workflow diagram as on slide 15, highlighting the Solr Search Server (HuNI Data) tasks.]
  • 20. TOOLS for RESEARCHERS (2) Researchers will be able to: • perform a "deep search" of the graphs in the RDF Triple Store; • browse by high-level facets. The large-scale aggregation of Linked Data makes explicit the relationships and connections between records across all the partner datasets, enabling the researcher to construct more complex semantic queries. [Workflow diagram as on slide 15, highlighting the deep (SPARQL-based) search against the RDF Triple Store (HuNI Linked Data).] (A query sketch follows this transcript.)
  • 21. RESEARCHER WORKFLOW: Discovery (part 1)
  • 22. VIRTUAL LABORATORY RESEARCHER WORKFLOW: Discovery (part 2)
  • 23. VIRTUAL LABORATORY RESEARCHER WORKFLOW: Discovery (part 3)
  • 24. VIRTUAL LABORATORY RESEARCHER WORKFLOW: Analysis (part 1)
  • 25. VIRTUAL LABORATORY RESEARCHER WORKFLOW – Analysis (part 2)
  • 26. VIRTUAL LABORATORY RESEARCHER WORKFLOW: Sharing
  • 27. VL PROTOTYPE
  • 28. 4. The PROJECT HuNI staff: • project director/community liaison (20%) • project manager (100%) • technical coordinator (100%) • information services coordinator (90%) • community engagement (30%) • communication coordinator (20%) • administrative support (20%) • software developer(s) [Governance diagram: NeCTAR Directorate; HuNI Steering Committee; HuNI Team; HuNI Technical Working Group; Expert Advisory Group; Expert Data Group.]
  • 29. WEB SITE: huni.net.au
  • 30. WIKI: apidictor.huni.net.au
  • 31. HuNI: a virtual laboratory for the humanities http://huni.net.au/ @HuNIVL
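
Slide 20 above describes a "deep (SPARQL-based) search" against the aggregated Linked Data in the RDF Triple Store. The sketch below shows, purely as an illustration, what such a query might look like when issued from Python with SPARQLWrapper; the endpoint URL is an assumption, and only the CIDOC-CRM and FOAF terms are standard vocabulary (the actual HuNI graph structure may differ).

```python
# Illustrative "deep search" over the Linked Data aggregate: list person
# entities and their names. The endpoint URL is a hypothetical Jena/Fuseki
# SPARQL endpoint, not HuNI's actual service.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://localhost:3030/huni/sparql"  # assumed endpoint

query = """
PREFIX crm:  <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name WHERE {
  ?person a crm:E21_Person ;
          foaf:name ?name .
}
LIMIT 25
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for row in results["results"]["bindings"]:
    print(row["person"]["value"], row["name"]["value"])
```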