Arches Webinars: Intro to the Arches Platform
Part 1: Arches Capabilities
Intro to the Arches Platform is a four-part webinar series. This series is ideal for anyone interested in learning about and exploring the key capabilities, tools, concepts and software architecture of the Arches cultural heritage data management platform.
- - -
Arches is an open source software platform freely available for cultural heritage organizations to manage their heritage resource data.
More info about Arches here: www.archesproject.org
The document outlines Rik van den Bosch's work plan for developing the Soil Data Facility (SDF) from 2018 to 2020. Key aspects of the plan include developing technical specifications for the Global Soil Information System (GloSIS) and its data products in 2017–2018, and building GloSIS infrastructure and populating its data products from 2018 to 2020. It also discusses specifications for GloSIS point databases and grids, a vision document outlining GloSIS, engaging data providers, and connecting providers and users. The document proposes roles for various organizations in developing GloSIS, with SDF leading technical development, FAO hosting and developing the discovery hub, and all organizations working together.
Innovative methods for data integration: Linked Data and NLP (ariadnenetwork)
Linked Data (LD) + Natural Language Processing (NLP)
Two technologies that open up new possibilities for semantic integration of archaeological datasets and fieldwork reports.
Overview
• Illustrative early examples
  - a flavour of progress and challenges to date
• NLP of grey literature (English – Dutch)
• Mapping between multilingual vocabularies
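None of the series' code is included here, but the last point can be sketched in a few lines: multilingual vocabulary mapping hinges on linking equivalent concepts across thesauri, in the spirit of SKOS exactMatch. Everything below (the URIs, the mapping table, the helper name) is a hypothetical illustration, not a real vocabulary:

```python
# Toy sketch of multilingual vocabulary mapping via shared concept URIs,
# in the spirit of SKOS exactMatch links between English and Dutch thesauri.
# All URIs are hypothetical placeholders, not real vocabulary entries.
mappings = {
    "http://vocab.example.org/en/hillfort": "http://vocab.example.org/nl/walburcht",
}

def translate_concept(uri: str) -> str:
    """Follow an exactMatch-style mapping to the equivalent concept,
    returning the input unchanged when no mapping exists."""
    return mappings.get(uri, uri)

print(translate_concept("http://vocab.example.org/en/hillfort"))
```

A real implementation would resolve these links from published SKOS data rather than a hard-coded table, but the lookup step is the same.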
Implementing an AMS on a National Scale – A Model (Axiell ALM)
Chezkie Kasnett, Digital Projects Manager, IT Division, The National Library of Israel
The National Library of Israel is leading a national project to deploy a centralized online network of historical archives holding cultural heritage material. The project's objectives are, first, to provide free online access to hundreds of historical archives and their holdings through cataloging and digitization, and second, to ensure the long-term digital preservation of those holdings.
This document discusses how aerial robotics techniques can be used to solve real-world problems. It describes how aerial robots can tackle dangerous, dull, or difficult tasks instead of humans. It then discusses Qua.R.K., an unmanned aerial vehicle system, and its stable aerial platform, protected rotors, and customizable pods for different missions. The document also mentions the ground control software used for mission planning, routing, and data collection. Finally, it lists several industries like research, mining, warehousing, and law enforcement that could benefit from Qua.R.K.'s aerial robotics capabilities.
What does it take to create a web of government Linked Data? The UK government is finding out. Our story is one of pioneers. You will hear how we are moving out of existing settlements to the wide plains of government data. How we are starting to build the first railroads across this vast territory to open new lands of opportunity. All the while, of course, having to avoid both outlaws and the Civil War back east.
ARIADNE is an EU-funded project that aims to integrate archaeological data repositories across Europe by overcoming fragmentation and fostering data sharing. It involves 24 partners from 17 countries. The project conducts networking activities to build community and standards, provides trans-national access to online resources and training, and performs research on data integration, management, and new tools. In its first nine months, ARIADNE has established special interest groups, collected information on partners' datasets and metadata schemas, and begun designing an integrated infrastructure and catalog data model.
Project update: A collaborative approach to "filling the digital preservation... (Jenny Mitcham)
A presentation given by Julie Allinson at the UK Archivematica group meeting on 6th November 2015 in Leeds. It describes work underway in the "Filling the Digital Preservation Gap" project using Archivematica to preserve research data
How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,... (4Science)
Presented by Susanna Mornati at the 2019 DSpace North American User Group Meeting September 23 & 24, 2019 at the University of Minnesota in Minneapolis.
Abstract: DSpace-CRIS is a free, open-source platform based on DSpace for research data and information management, adopted by a wide international community of universities and research centers (see DSpace-CRIS Home). It complies with open standards and technologies such as OAI-PMH, Signposting, and ResourceSync (recommended by the COAR Next Generation Repositories working group), and features complete ORCID integration, compliance with the CERIF model and the IIIF framework, and support for the OpenAIRE Guidelines for Literature Repositories, Data Archives, and CRIS Managers, improving the findability, accessibility, interoperability, and reuse of digital assets for research and cultural heritage. DSpace-CRIS collects and disseminates information about researchers' profiles, organizations, publications, patents, grants, awards, and the other entities that populate the research domain, along with their relationships, in addition to storing and exposing full-text publications, datasets, and other relevant digital objects, with persistent identifiers and long-term preservation capabilities. DSpace-RDM exposes datasets for visual exploration and machine-to-machine streaming for analysis thanks to its integration with CKAN. DSpace-GLAM improves access to cultural heritage through the (crowd-funded) IIIF image viewer, offering remote access to collections and a great user experience. These flavors of DSpace make it possible to expose and share open data, open information, and open digital objects in a collaborative, interoperable, and sustainable way. Use cases from a variety of institutions in different countries and continents will be shared to show this powerful technology in action.
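As a hedged illustration of one of the standards named above: an OAI-PMH harvester begins by issuing a ListRecords request against a repository's OAI endpoint. The endpoint URL below is a made-up placeholder (DSpace installations commonly expose OAI-PMH under a path like /oai/request, but check your instance):

```python
from urllib.parse import urlencode

def oai_listrecords_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for harvesting a repository.
    The verb and metadataPrefix arguments are defined by the OAI-PMH protocol."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional selective harvesting by set
    return f"{base_url}?{urlencode(params)}"

# Hypothetical endpoint, for illustration only:
print(oai_listrecords_url("https://repo.example.org/oai/request"))
```

A harvester would fetch this URL, parse the XML response, and follow resumptionToken elements to page through the full record set.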
"Filling the Digital Preservation Gap" with Archivematica (Jenny Mitcham)
A webinar given by Jenny Mitcham and Simon Wilson to Digital Preservation Coalition members on 25th November 2015. It describes work underway in the "Filling the Digital Preservation Gap" project using Archivematica to preserve research data
The document discusses research data services in Australia including Research Data Services (RDS), the Australian National Data Service (ANDS), National eResearch Collaboration Tools and Resources (NeCTAR), the Australian Access Federation (AAF), and Australia's Academic Research Network (AARNET). It outlines the research data lifecycle and proposes components for a national research data system, including provisioning, storage, processing, and archiving capabilities. Examples are provided for how different types of research projects could interact with such a system.
“Filling the digital preservation gap” – an update from the Jisc Research Data ... (Jenny Mitcham)
This document summarizes the findings of the Jisc Research Data Spring project at the Universities of York and Hull, which investigated how Archivematica could be used to provide digital preservation for research data. The project tested Archivematica, explored how it handles different file formats and research data, and identified ways to improve Archivematica and integrate it into research data management workflows. The next phases will develop Archivematica further and implement proofs of concept at York and Hull to preserve research data.
This document discusses various strategies and resources for archiving internet content for research purposes. It describes several existing large-scale web archives like the Internet Archive and Common Crawl, as well as national and institutional archives. It also outlines how researchers can collect targeted web archives using open-source tools or subscription-based services.
The document discusses a project to investigate using Archivematica, an open-source digital preservation system, to provide digital preservation functionality for research data at the Universities of Hull and York. The project involved three phases: exploring Archivematica and research data needs, developing Archivematica features, and implementing proof-of-concept systems at both universities. Key findings included that Archivematica could meet many preservation needs but had limitations identifying research file formats, and that collaboration was important for addressing challenges in preserving research data long-term.
Making good command decisions is increasingly underpinned by data and the insights it can deliver. In our rapidly changing world, the sheer volume of data becomes overwhelming, and that volume can lead to indecision instead of better decision making. In this session, we will cover how artificial intelligence, machine learning, and intelligent data routing can enhance and support the decision-making process in times of crisis.
Ross King, Project Director of SCAPE, gave a short presentation of the EU funded project SCAPE, including descriptions of tools for planning and monitoring digital preservation, scalable computation and repositories, SCAPE Testbeds and where to learn more.
The presentation was given at the workshop ‘Preservation at Scale’ (http://bit.ly/17ppAln) in connection with the iPres2013 conference in Lisbon, Portugal, in September 2013.
Galaxia is a universal monitoring framework that supports monitoring infrastructure, applications, and containers across on-premise and cloud deployments. It addresses challenges around monitoring distributed and microservices applications. Galaxia supports Docker containers, VMs, applications and more through a single API and UI. It exports metrics for auto-scaling and alerting and has a roadmap to add more analytics and predictive capabilities. Galaxia's architecture includes components like the Galaxia API, exporter, and renderer that work with Prometheus and store data in MySQL.
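The summary doesn't include Galaxia's code, but since the architecture feeds Prometheus, a sketch of what any exporter ultimately serves may help: samples in the Prometheus text exposition format. The metric and label names here are illustrative, not Galaxia's actual metrics:

```python
def prometheus_sample(name, value, labels=None):
    """Render one sample in the Prometheus text exposition format,
    the line format a /metrics endpoint serves to the Prometheus scraper.
    Metric and label names in the example are illustrative only."""
    label_str = ""
    if labels:
        label_str = "{" + ",".join(f'{k}="{v}"' for k, v in labels.items()) + "}"
    return f"{name}{label_str} {value}"

print(prometheus_sample("container_cpu_usage_seconds_total", 12.5,
                        {"container": "web", "host": "node-1"}))
```

Prometheus scrapes lines like this on a fixed interval; alerting and auto-scaling rules are then written against the stored time series.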
Values & Vision - Cloud Sandboxes for BIG Earth Sciences (terradue)
Terradue is an Italian SME focused on providing cloud services for earth science research. They have developed an open platform to help scientists access and analyze large datasets through web and cloud technologies. Their goal is to stimulate new scientific applications and help researchers adapt to increasing data volumes. The platform allows scientists to share data access points, processing chains, and collaborate across distributed systems delivered as a service. Terradue is focusing on new services like data and software as a service to create marketplaces and leverage linked open data. They are also exploring how to use analytics and human resources like data scientists to help optimize the platform.
This presentation was provided by Priscilla Caplan of the Florida Center for Library Automation and Jeremy York of the University of Michigan Library during the NISO webinar "What It Takes To Make It Last: E-Resources Preservation", held on February 10, 2011.
Open source GLAM tools for building sustainable cultural heritage and digital... (LIBIS)
On Thursday 16 December 2016, Roxanne Wyns of LIBIS gave a guest lecture for the course Online Publishing in the MA Cultural Studies / MA Digital Humanities at KU Leuven, on the theme ‘Open source GLAM tools for building sustainable Cultural Heritage and Digital Humanities infrastructures’.
Description:
This session focuses on a number of Cultural Heritage and Humanities infrastructure projects in which gallery, library, archive, and museum (GLAM) tools have been used in combination with other open source and proprietary systems to provide sustainable and innovative environments for managing and researching diverse cultural heritage collections. The session will introduce software such as CollectiveAccess, Omeka, and the IIIF Mirador high-resolution viewer. Attention will also be paid to the opportunities and challenges of open source projects, and to best practices in standards, data interoperability, and safe data storage to achieve good data management.
Implementing Archivematica, research data network (Jisc RDM)
This presentation discusses implementing Archivematica for preserving research data at the Universities of York and Hull. It covers background on the project, challenges in implementing Archivematica, issues with identifying unknown file formats in research data, and future plans to move from proof of concept to production. The project tested pulling metadata from other systems into Archivematica for ingest and explored packaging data for long-term preservation and access. A major challenge was the large number of unidentified file formats, which the project is addressing by developing new file format signatures.
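The project's signature work targets the PRONOM registry used by format-identification tools; purely as an illustration of the underlying principle (not the project's actual signatures), identification starts from magic bytes at the front of a file:

```python
def identify_by_magic(data: bytes) -> str:
    """Toy file-format identification by leading magic bytes, the same
    principle behind the PRONOM signatures that tools like DROID rely on.
    This tiny table is illustrative, not a real signature registry."""
    signatures = {
        b"%PDF": "PDF document",
        b"\x89PNG": "PNG image",
        b"PK\x03\x04": "ZIP-based container",
    }
    for magic, fmt in signatures.items():
        if data.startswith(magic):
            return fmt
    return "unidentified"

print(identify_by_magic(b"%PDF-1.7"))
```

Real signatures are richer (offsets, byte sequences with wildcards, container inspection), which is why research data formats without published signatures come back "unidentified".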
Discovery Systems Used in Academic Libraries Projects & Case Study – Hong (Jenny) Jing
This document discusses discovery systems used in academic libraries and provides projects and case studies using different discovery systems. It begins with an overview of what discovery systems are and key vendors like Primo, Summon, and EDS. It then describes projects using Summon that involved user experience studies and improvements. The case study on migrating to Summon 2 discusses the planning, analysis including surveys, design including prototypes, and implementation. Finally, it reviews implementing EDS and using its API, comparing features of Primo, Summon, and EDS.
The document discusses the CORE system, which aggregates open access research publications from multiple sources and provides text mining and analytics services. It describes CORE's three main phases: harvesting metadata and content, semantic enrichment of the data, and providing various services to users. These services include search, citation analysis, and classification tools via a public portal, mobile apps, and APIs. The document also outlines CORE's goals of improving discovery and enabling new research through large-scale analysis of publications, such as identifying trends, impact, and relationships between authors and papers.
A collaborative approach to "filling the digital preservation gap" for Resear... (Jenny Mitcham)
A presentation given by Chris Awre, Jenny Mitcham and Sarah Romkey at RDMF14 (the DCC's Research Data Management Forum) on 9th November 2015 in York. It describes work underway in the "Filling the Digital Preservation Gap" project using Archivematica to preserve research data
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA... – Micah Altman
The WorldMap platform http://worldmap.harvard.edu is the largest open source collaborative mapping system in the world, with over 13,000 map layers contributed by thousands of users from Harvard and around the world. Researchers may upload large spatial datasets to the system, create data-driven visualizations, edit data, and control access. Users may keep their data private, share it in groups, or publish to the world.
The user base is interdisciplinary, including scholars from the humanities, social sciences, sciences, public health, design, planning, etc. All are able to access, view, and use one another’s data, either online, via map services, or by downloading.
Current work is underway to create and maintain a global registry of map services, taking us a step closer to one-stop access for public geospatial data. Another project is building tools to support the visualization of spatial datasets with over a billion features. Collaborations are underway with groups inside Harvard, such as Dataverse, HarvardX, and various departments, and with groups outside Harvard, such as Cornell University and the University of Pennsylvania. Major additional contributors to the underlying source code include the World Bank, the U.S. State Department, and the United Nations.
The source code for the WorldMap platform is available on GitHub https://github.com/cga-harvard/cga-worldmap.
Location: E25-202
Discussant: Ben Lewis is system architect and project manager for WorldMap, an open source infrastructure that supports collaborative research centered on geospatial information. Before joining Harvard, Ben was a project manager with Advanced Technology Solutions of Pennsylvania, where he led the company in adopting platform-independent approaches to GIS system development. Ben studied Chinese at the University of Wisconsin and has a Master's in Planning from the University of Pennsylvania. After Penn, Ben helped start the GIS Lab at U.C. Berkeley, founded the GIS group for transportation engineering firm McCormick Taylor, and coordinated the Land Acquisition Mapping System for the South Florida Water Management District. Ben is especially interested in technologies that lower the barrier to spatial technology access.
Information Science Brown Bag talks, hosted by the Program on Information Science, consists of regular discussions and brainstorming sessions on all aspects of information science and uses of information science and technology to assess and solve institutional, social and research problems. These are informal talks. Discussions are often inspired by real-world problems being faced by the lead discussant.
OpenAIRE Open Innovation call: Next Generation Repositories (OpenAIRE)
1) Current institutional repositories have issues with usability and interoperability, and act primarily as silos for individual institutions' data.
2) The vision for next generation repositories is to position them as part of a globally networked infrastructure for scholarly communication, with the resources themselves, rather than the repositories, becoming the focus of services.
3) Key areas discussed for next generation repositories include improved resource discovery and content transfer using ResourceSync and Signposting, generating open usage metrics through a usage hub, and enabling annotation of content through web annotation protocols.
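As a hedged sketch of the Signposting pattern mentioned above: a repository advertises machine-readable links in the HTTP Link header (per RFC 8288 web linking), which a client can parse into typed links such as cite-as and item. The URLs below are placeholders, not real records:

```python
import re

def parse_link_header(value):
    """Parse an HTTP Link header into (target, rel) pairs, as used by the
    Signposting pattern to point at identifiers and content resources.
    This minimal parser only handles the simple <url>; rel="..." form."""
    return [(m.group(1), m.group(2))
            for m in re.finditer(r'<([^>]+)>\s*;\s*rel="([^"]+)"', value)]

# Placeholder header a Signposting-enabled repository might emit:
header = ('<https://doi.org/10.5555/12345>; rel="cite-as", '
          '<https://repo.example.org/item/1/files/data.csv>; rel="item"')
print(parse_link_header(header))
```

A crawler following cite-as links can attribute usage to the persistent identifier rather than the landing page, which is the kind of behavior the proposed usage hub depends on.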
Information Science Brown Bag talks, hosted by the Program on Information Science, consists of regular discussions and brainstorming sessions on all aspects of information science and uses of information science and technology to assess and solve institutional, social and research problems. These are informal talks. Discussions are often inspired by real-world problems being faced by the lead discussant.
OpenAIRE Open Innovation call: Next Generation RepositoriesOpenAIRE
1) Current institutional repositories have issues with usability, interoperability, and acting primarily as silos for individual institutions' data.
2) The vision for next generation repositories is to position them as part of a globally networked infrastructure for scholarly communication, with the resources themselves, rather than the repositories, becoming the focus of services.
3) Key areas discussed for next generation repositories include improved resource discovery and content transfer using ResourceSync and Signposting, generating open usage metrics through a usage hub, and enabling annotation of content through web annotation protocols.
Similar to Intro to the Arches Platform - Part 1: Capabilities (20)
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Intro to the Arches Platform - Part 1: Capabilities
1. Welcome!
Intro to the Arches Platform
Part 1 of 4 – Arches Capabilities
Thursday, October 17, 2019
2. Intro to the Arches Platform Upcoming Webinars
Part 1 – Capabilities
Part 2 – Data Management Tools
Part 3 – Data Technology Concepts
Part 4 – Information Architecture
3. Intro to the Arches Platform, Part 1 of 4: Arches Capabilities
Webinar Agenda
● Background
● Arches Platform Overview
● Arches Platform Capabilities
- Data Management Overview
- Data Visualization & Discovery
● Questions & Answers
4. Intro to the Arches Platform, Part 1 of 4: Arches Capabilities
Arches Project Background
Developed jointly by the Getty Conservation Institute and World Monuments Fund, Arches is an open-source software platform purpose-built for cultural heritage data management.
Arches is supported by a growing community of heritage, preservation and technology professionals.
5. Intro to the Arches Platform, Part 1 of 4: Arches Capabilities
About the Arches Platform
● Open Source and free – no licensing fees and unlimited users.
● Enterprise-level software.
● Independent deployment by an organization or institution.
6. Arches Platform Overview
Part 1 – Capabilities
• Data Management
• Data Discovery & Visualization (Data, Search, GIS, Relationships, Time, Reports)
Part 2 – Data Management Tools
• Arches Collector – Mobile App
• Resource Manager
• Arches Designer
• Reference Data Manager
Part 3 – Data Technology Concepts
• Data Standards
• Semantics/Ontologies (incl. CIDOC CRM)
• Controlled Vocabularies
• Fuzzy Dates
Part 4 – Information Architecture
• Software Stack
• Software Standards
Intro to the Arches Platform, Part 1 of 4: Arches Capabilities
11. HIGHLIGHTED FEATURES: Interactive Map, Basemaps, Historic Maps, Map Overlays, Geospatial Search
ARCADE, the Arches-powered online heritage information and management system for the City of Lincoln, UK: https://arcade.lincoln.gov.uk
12. HIGHLIGHTED FEATURES: Saved Searches, Semantic Search, Timewheel
Arches Sample Instance using data from the Valley of the Queens Conservation Project of the Getty Conservation Institute
13. HIGHLIGHTED FEATURES: Saved Search, Related Resources, Reports
Arches Sample Instance using data about Frank Lloyd Wright's legacy in Los Angeles
14. HIGHLIGHTED FEATURES: Geospatial Search, Semantic Search, Advanced Search
Arches Sample Instance using data from the City of Los Angeles
15. HIGHLIGHTED FEATURES: Extending core Arches, Reports, 3D Viewers
Global Digital Heritage, https://www.globaldigitalheritage.org/
17. Intro to the Arches Platform Upcoming Webinars
Part 1 – Capabilities
• Data Management
• Data Discovery & Visualization
Part 2 – Data Management Tools
• Arches Collector – Mobile App
• Resource Manager
• Arches Designer
• Reference Data Manager
Part 3 – Data Technology Concepts
• Data Standards
• Semantics/Ontologies (incl. CIDOC CRM)
• Controlled Vocabularies
• Fuzzy Dates
Part 4 – Information Architecture
• Software Stack
• Software Standards
19. - Thank you -
Share your feedback: contact@archesproject.org
Community Support – Arches Forum: www.groups.google.com/group/archesproject
Webinar repository – www.archesproject.org/videos
Editor's Notes
As mentioned earlier, Part 2 will cover the data management tools that allow you to collect, structure, store and control access to your data. Part 3 will cover the underlying data technology concepts that help to explain how Arches deals with your data and why this is important. Part 4 will be an overview of Arches Technology & Information Architecture, including a look at the various software components and standards. We will be announcing dates for these in the future. Be sure to subscribe to our mailing list, follow us on social media, or check our events page on archesproject.org for more information as it is released.
For insights, analysis, monitoring, risk mitigation, planning and research – and ultimately to improve decision making.
Thanks, Alina! And thank you to everyone out there for attending this webinar. Alina has given you a quick overview of Arches, and I wanted to remind you that if you have any questions about anything she mentioned, about what I'm about to cover, or about anything else, please use the Q&A feature to log your question and we will try to get to as many questions as possible at the end of the presentation portion of the webinar. As Alina mentioned, the Arches Platform is a robust, fully featured system for cultural organizations to manage their data.
And by fully featured and robust, we mean that we created a platform that can both manage your data behind the scenes, using the latest data standards and technologies, and, if you wish, publish that data in a controlled way. The users to whom you've given access to your data (this can be just you and your staff, anyone on the web, or any group in between) can then discover information visually in many different ways, or really interrogate the database using the search tools that come standard with Arches. For this webinar, I'll briefly go over the data management capabilities at a high level and then go into a little more depth on the data discovery and visualization tools within Arches. As Alina mentioned, the second webinar in this series will cover more of the data management aspects.
So regarding data management, Arches gives you the ability to collect and ingest data in several different ways: you can bulk import existing legacy data, you can enter new information via the Arches interface, and if you're out in the field, you can use the Arches Collector on your mobile device to collect data and sync it to your Arches instance. And not only is that data stored in Arches, but you also have the opportunity to define how that data is structured and optimized for use, and who gets to access that data. Which leads to the search, discovery and visualization capabilities…
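For the bulk-import path mentioned above, legacy data is typically prepared as a flat file whose columns are matched against the fields of a resource model. As a minimal, purely illustrative sketch (the column names and example values here are hypothetical, not taken from any real Arches model or from the webinar), here is how such a CSV might be generated from existing records:

```python
import csv

# Hypothetical legacy records to be bulk imported into an Arches instance.
# In a real migration, the column names must match the mapping defined for
# your own resource model -- these are placeholders for illustration only.
rows = [
    {"ResourceID": "1", "Name": "Chamber Tomb QV66", "Resource Type": "Heritage Resource"},
    {"ResourceID": "2", "Name": "Hollyhock House", "Resource Type": "Heritage Resource"},
]

fieldnames = ["ResourceID", "Name", "Resource Type"]

# Write the records to a CSV file ready for a bulk-import workflow.
with open("legacy_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

The exact import mechanism (file layout, field mapping, and loader command) depends on the Arches version you deploy, so consult the Arches documentation for your release before preparing real data.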
I'll be discussing most of the data discovery and visualization capabilities today by looking at various use cases in which you might use some of the features that you see in the slide. What I will not be going over in this presentation are fuzzy dates, viewers for the IIIF standard, and spectra charts, which will be covered in other webinars.
The Arches Platform is meant to be deployed by cultural heritage organizations independently, meaning that each organization makes decisions about how they want to brand their Arches-powered system, how the data is organized, and generally, how to configure the system to meet their own use case. What I’m showing you now is a screen capture of how the City of Lincoln in the UK is using Arches to manage data on both archaeological and heritage conservation data.
(START video)
ARCADE is the online heritage information and management system for the City of Lincoln, and a distinctive feature of ARCADE is the use of maps in various different ways to tell the story of the richness of heritage in Lincoln.
Basemaps > Overlays > Historic Maps > Geospatial Search
Moving on from ARCADE, we've set up a sample Arches instance here with information loaded from the Getty Conservation Institute's Valley of the Queens Conservation project. If you've been to the Arches demo site, this is a similar instance of Arches. In the last video, we saw how you can use the geospatial features to explore data in an Arches instance. But in addition, the Arches platform has features that give you the power to make more direct queries, and to do so in any combination of ways. So perhaps I have a research project in which I need to understand how many Chamber Tombs in the Valley of the Queens were constructed in the 12th century BCE; here's a way to do that. (start video) First, this particular instance of Arches has some searches saved so that users can access a particular set of data with one click. So, I'm going to start by selecting the Valley of the Queens saved search. (A note here: these are easy for your Arches administrator to create and are a great way to point users to the curated datasets that you want them to be aware of.) Next, I'm going to search for Chamber Tombs, which should further narrow down my search. And then I'm going to use the Timewheel feature to find out how many of these chamber tombs in the Valley of the Queens were constructed in the 13th century BCE. And then let's see how many were constructed in the 12th century BCE to answer the initial question I posed. The Timewheel is a graphical representation of the time and date values in this particular Arches instance; here, the outer ring represents the century distribution of the data.
Often, it is important to understand the various relationships between whatever resources you are recording. This year, a collection of eight heritage sites was inscribed on the UNESCO World Heritage List as a showcase of American architect Frank Lloyd Wright's legacy in modern architecture. In this example, we'll explore his impact on Los Angeles. First, we'll click on the saved search, Frank Lloyd Wright in Los Angeles, and we see that we have an assortment of buildings and people that are associated with the architect. By clicking on Related Resources, we are presented with the Related Resources tool, which visualizes the network of relationships present in the data. So in this case, we will explore the relationships of various buildings and people with Frank Lloyd Wright. You can navigate through the network graph, or you can use the right-hand panel to navigate through all the relationships and potentially reveal additional relationships beyond the initial ones.
Here's another question that might be asked in the course of development planning: if there is a development project, what are the potential impacts on the historic fabric? To answer this question, let's look at data from the City of Los Angeles. The more specific question here is: what is the potential impact of new subway line construction on Art Deco historic resources in the Miracle Mile area of Los Angeles? First, let's define the impact area by drawing a line, using the geospatial search tools, and specifying a buffer of 1000 feet. (As a note, this is an arbitrary number, and I'm sure that there are impact guidelines for tunnel boring machines.) Okay, now let's do a search for Art Deco resources in the impact area. And let's further whittle it down by resource type. I know that I'm not interested in multifamily residential buildings, so I'll search for that first and then double-click the term in the search box to get the reverse search. So there are ??? Art Deco historic resources that are not multifamily residential buildings. Now, how many of these are on the National Register? Let's use the Advanced Search for this. The Advanced Search gives you a snapshot of all of the available data fields in the system to search against. This is a great tool if you're visiting an Arches-powered system with no advance knowledge of what kind of data that particular instance of Arches holds. This is sample data, so only one resource is returned… and if you're familiar with LA's Miracle Mile, you know that number would be higher with the full set of LA's data.
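Conceptually, the buffered geospatial search described above keeps only the resources that fall within a given distance of the drawn geometry. As a simplified, hypothetical sketch (using planar coordinates in feet, not the projected-CRS computation a real GIS engine such as the one inside Arches would perform), a point-within-buffer test against a drawn line can be written as:

```python
import math

def point_to_segment_distance(px, py, ax, ay, bx, by):
    """Shortest planar distance from point (px, py) to segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment: distance to the point a
        return math.hypot(px - ax, py - ay)
    # Projection parameter of the point onto the segment, clamped to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(point, line, buffer_ft):
    """True if the point lies within buffer_ft of any segment of the polyline."""
    return any(
        point_to_segment_distance(*point, *a, *b) <= buffer_ft
        for a, b in zip(line, line[1:])
    )

# A hypothetical subway alignment and two resource locations, in feet
alignment = [(0, 0), (5000, 0), (9000, 2000)]
print(within_buffer((2500, 800), alignment, 1000))   # 800 ft away -> True
print(within_buffer((2500, 1500), alignment, 1000))  # 1500 ft away -> False
```

A production system would run the equivalent query in the database (e.g. against a spatial index) rather than in application code, but the inclusion test is the same idea.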
We started these examples looking at ARCADE, the City of Lincoln's Arches implementation, and we'll end looking at another implementation of Arches. Global Digital Heritage is a non-profit organization dedicated to using digital visualization, 3D virtualization, geospatial informatics, and open access solutions to make digital data and 3D models of cultural heritage freely available. In order to serve their mission of making all of their scan data and results accessible, they extended Arches and modified the standard Arches report template to include various interactive 3D and video viewers. In the video, we explore the Molino de ??? and a Sketchfab model of the windmill mechanism, a Potree point cloud of the site, and a video that brings all of the visualizations together. One thing that Alina touched upon is that Arches is an open source platform, and the license under which we operate requires that any improvements to the core Arches code be made available to the rest of the Arches community. So the ability to embed these kinds of viewers is now available to the entire Arches community, and the instructions on how to incorporate them are accessible via a post on our forum, made by Arches community member Vincent Meijer.
So, that was a brief tour of Arches' more forward-facing capabilities. In the next parts of this webinar series, we'll take a deeper dive and look under the hood, so to speak…
As mentioned earlier, Part 2 will cover the data management tools that allow you to collect, structure, store and control access to your data. Part 3 will cover the underlying data technology concepts that help to explain how Arches deals with your data and why this is important. Part 4 will be an overview of Arches Technology & Information Architecture, including a look at the various software components and standards. We will be announcing dates for these in the future. Be sure to subscribe to our mailing list, follow us on social media, or check our events page on archesproject.org for more information as it is released.
And now, we’ll move into the Q&A portion of the webinar….