The document discusses open data and the CKAN open data catalog. It provides an overview of CKAN, including its data model and API. It also discusses open data initiatives like data.gov.uk and how CKAN is used to power open data portals around the world.
CKAN is an open-source data management solution for open data. It provides a platform for publishing and exposing metadata through an API and front-end interface. Major governments and communities use CKAN to organize large numbers of datasets. While it has advantages like organizing data in a structured way and providing APIs, its data model does not work for all use cases and there are no strict guidelines for dataset publishing. Extensions allow additional functionality and it can be deployed in various ways.
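CKAN's Action API mentioned above is plain JSON over HTTP. As a minimal sketch (assuming the Python requests library; the portal URL is a placeholder for any CKAN instance), a dataset search via the package_search action looks like this:

```python
import requests

CKAN_URL = "https://demo.ckan.org"  # placeholder: any CKAN portal

def search_datasets(query, rows=5):
    """Search datasets via CKAN's package_search action."""
    resp = requests.get(
        f"{CKAN_URL}/api/3/action/package_search",
        params={"q": query, "rows": rows},
        timeout=30,
    )
    resp.raise_for_status()
    body = resp.json()
    if not body.get("success"):
        raise RuntimeError(body.get("error"))
    return body["result"]

result = search_datasets("transport")
print(result["count"], "datasets found")
for pkg in result["results"]:
    # Each package carries its metadata plus a list of attached resources.
    print("-", pkg["name"], "|", len(pkg.get("resources", [])), "resources")
```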
Data Management Systems for Government Agencies - with CKAN (Steven De Costa)
Over the last two days (5th and 6th of November 2015) I was very happy to present to a range of Victorian Government agencies and give them some context on what data management can do for their organisations.
From first principles we went through why data was important and what infrastructure was already in place via data.vic.gov.au for them to leverage. We covered examples of how other agencies, such as the Office of Environment and Heritage in NSW, are rebuilding their data management system to provide a more efficient pipeline for publishing internal and public data.
As always, I could not help highlighting the awesome leadership of WA Parks and Wildlife and the work done by Florian Mayer as the best example of reducing the costs and friction often involved in publishing data as contextually marked-up knowledge.
We covered a number of scenarios in which the concept of resource containers for data was considered. This generated valuable feedback that has further galvanized my thoughts about how to extend CKAN to meet the needs of both private and open data portals, as well as other forms of real-time or unstructured data.
CKAN is an open source data portal platform that is widely used around the world. It provides features like publishing and managing datasets and metadata, powerful search and API access, support for standards like DCAT, and extensive customization options. CKAN is developed by the Open Knowledge Foundation and a team of developers, with deployments in governments and organizations to make their data accessible and usable.
Drupal, CKAN and Public Data. DrupalGov, 8 February 2016 (Steven De Costa)
The document discusses the relationship between Drupal and CKAN, which are both open source platforms used for managing public data. It notes that DKAN is not the same as CKAN, and that some Australian governments use Drupal and CKAN together. It provides examples of how the two systems can integrate, such as single sign-on and pulling taxonomies from CKAN into Drupal websites. The document also outlines key features of CKAN including its data structure, user interface, and API.
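One of the integration points mentioned above, pulling taxonomies from CKAN into Drupal, starts with reading CKAN's group list over the same Action API. A hedged sketch of the CKAN side (the portal URL is just an example; the Drupal import step is omitted):

```python
import requests

CKAN_URL = "https://data.gov.au"  # example CKAN portal

def fetch_ckan_groups():
    """Fetch CKAN groups, a natural source for a Drupal taxonomy vocabulary."""
    resp = requests.get(
        f"{CKAN_URL}/api/3/action/group_list",
        params={"all_fields": "true"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["result"]

for group in fetch_ckan_groups():
    # name/title pairs map cleanly onto Drupal taxonomy terms.
    print(group["name"], "->", group.get("title", ""))
```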
This document summarizes a presentation about the CKAN data management system. It introduces CKAN, describes its main features for publishing, finding, storing and managing datasets, engaging with users, and customizing the system. It provides examples of how CKAN is used by sites like data.gov.uk and discusses installation instructions, including required packages, configuring PostgreSQL, and deployment options.
CKAN and Australian open data updates for Wikimedia - 7 October 2015Steven De Costa
Another joint presentation covering CKAN, along with updates on what is happening in Australia around open data initiatives.
This was also my first chance to showcase the work of WA Parks and Wildlife and Florian Mayer's approach as a data pipeline plumber.
CKANCon 2016 and IODC16 were conferences about open data. CKANCon 2016 was a one-day conference for CKAN developers that included case studies, lightning talks, and discussions around the CKAN roadmap and the move to the Flask framework. IODC16 was the 4th International Open Data Conference, which brought together the global open data community to discuss topics such as data in education, agriculture, and more. Selected sessions included videos on tracking earthquake relief funds in Nepal and unlocking private sector data for public good.
Metadata & brokering - a modern approach #2 (Daniele Bailo)
The second episode of the metadata and brokering series.
Topics covered:
1. Additional definitions (ontology, relational database, and others)
2. The wide picture: data fabric elements from the Research Data Alliance (RDA) and possible concrete implementations of those guidelines
Getting to Know CKAN, 24 June 2015, Singapore (Steven De Costa)
Presented in Singapore on 24 June 2015 as part of the Infocomm Development Authority Data 101 series.
Provides an overview of what CKAN is and which organisations are using it. The session also covered a number of topics related to the organisation of published data.
Geospatial Querying in Apache Marmotta - Apache Big Data North America 2016 (Sergio Fernández)
Sergio Fernández gave a presentation on geospatial querying in Apache Marmotta. He explained that Marmotta is an open platform for linked data that allows publishing and building applications on linked data. It includes features like a read-write linked data server and SPARQL querying. He discussed how GeoSPARQL allows representing and querying geospatial data on the semantic web by defining a vocabulary and SPARQL extension. Marmotta implements GeoSPARQL by materializing geospatial data and supports topological relations and functions through PostGIS. He demonstrated example GeoSPARQL queries on municipalities in Madrid, rivers bordering Austria, and mountain bike routes crossing cities.
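For a flavour of the query style, here is a sketch of a GeoSPARQL query issued from Python with the SPARQLWrapper library. The endpoint URL, graph contents, and bounding polygon are illustrative assumptions, not taken from the presentation:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical Marmotta SPARQL endpoint.
ENDPOINT = "http://localhost:8080/marmotta/sparql/select"

# Features whose WKT geometry falls inside a bounding polygon,
# using the OGC GeoSPARQL vocabulary and filter functions.
QUERY = """
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?feature ?wkt WHERE {
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt,
    "POLYGON((-3.9 40.2, -3.9 40.6, -3.3 40.6, -3.3 40.2, -3.9 40.2))"^^geo:wktLiteral))
}
LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["feature"]["value"])
```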
Apache Marmotta is a linked data platform that provides a linked data server, SPARQL server, and development environment for building linked data applications. It uses modular components including a triplestore backend, SPARQL endpoint, LDCache for remote data access, and an optional reasoner. Marmotta is implemented as a Java web application and uses services, dependency injection, and REST APIs.
Extending DSpace 7: DSpace-CRIS and DSpace-GLAM for empowered repositories an... (4Science)
DSpace-CRIS is an extended version of DSpace that offers a powerful and flexible data model to describe not only publications but all research entities and their relationships. DSpace-CRIS 7 will feature a new Angular UI and REST API, in addition to functionality for OpenAIRE compliance, integrating publications from external sources, bidirectional ORCID integration, and synchronizing with other systems. DSpace-CRIS also extends data modeling capabilities and provides tools for data quality, metadata management, and extensibility.
Linked Media Management with Apache Marmotta (Thomas Kurz)
The document introduces Apache Marmotta, an open source linked data platform. It provides a linked data server, SPARQL endpoint, and libraries for building linked data applications. Marmotta allows users to easily publish and query RDF data on the web. It also includes features for multimedia management such as semantic annotation of media and extensions for querying over media fragments.
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015 (Sergio Fernández)
This document summarizes a presentation about querying geospatial data in Apache Marmotta. It introduces Apache Marmotta as an open source linked data platform, describes linked data and RDF, and explains how Marmotta supports the GeoSPARQL standard for representing and querying geospatial data on the semantic web through materialization of geospatial data and PostGIS. It provides examples of GeoSPARQL queries in Marmotta and outlines the topological relations and functions supported.
Semantic Media Management with Apache Marmotta (Thomas Kurz)
Thomas Kurz gives a presentation on semantic media management using Apache Marmotta. He plans to create a new Marmotta module that supports storing images, annotating image fragments, and retrieving images and fragments based on annotations. This will make use of the Linked Data Platform, Media Fragment URIs, the Open Annotation model, and SPARQL-MM. The goal is to create a Marmotta module and webapp that extends LDP for image fragments and provides a UI for image annotation and retrieval.
Linked Data from a Digital Object Management System (Uldis Bojars)
Lightning talk about generating Linked Data from a digital object management system at the National Library of Latvia. Conference: http://swib.org/swib12/programme.php
Introduction | Categories for Description of Works of Art | CDWA-LITE (Kymberly Keeton)
The CDWA-Lite XML schema was created by three organizations to describe cultural works and their related information. It aims to provide core descriptive metadata for works while allowing flexibility in the level of detail provided. The schema includes 22 elements, with 19 for descriptive metadata, 3 for administrative metadata, and 9 required elements. It utilizes controlled vocabularies, XML syntax rules, and semantic elements. Records encoded in CDWA-Lite can be harvested through the Open Archives Initiative protocol.
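Harvesting such records relies on the OAI-PMH protocol named above. A minimal sketch in Python; the repository base URL is a placeholder, and the "cdwalite" metadataPrefix is an assumption, since the prefix a repository advertises can vary:

```python
import requests
import xml.etree.ElementTree as ET

BASE = "https://example.org/oai"  # placeholder OAI-PMH endpoint
OAI = "{http://www.openarchives.org/OAI/2.0/}"

resp = requests.get(
    BASE,
    params={"verb": "ListRecords", "metadataPrefix": "cdwalite"},  # prefix assumed
    timeout=60,
)
resp.raise_for_status()
root = ET.fromstring(resp.content)

for record in root.iter(f"{OAI}record"):
    # Print each record's identifier and datestamp from the OAI header.
    header = record.find(f"{OAI}header")
    print(header.findtext(f"{OAI}identifier"), header.findtext(f"{OAI}datestamp"))
```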
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters (New York University)
This presentation, given to the University of Iowa Libraries on Nov. 17, 2014, discusses 1) the alphabet soup of metadata standards (e.g., CDM, VRA, CCO, METS, MODS, RDF), including sample tagging and their applications for digital libraries, and 2) why metadata matters. It does not address metadata issues and tools for metadata creation, extraction, transformation, quality control, syndication, and ingest.
Enabling access to Linked Media with SPARQL-MM (Thomas Kurz)
The amount of audio, video and image data on the web is growing immensely, which leads to data management problems rooted in the opaque character of multimedia. Interlinking semantic concepts and media data, with the aim of bridging the gap between the document web and the Web of Data, has therefore become common practice and is known as Linked Media. However, the value of connecting media to its semantic metadata is limited by the lack of access methods specialized for media assets and fragments, as well as by the variety of description models in use. With SPARQL-MM we extend SPARQL, the standard query language for the Semantic Web, with media-specific concepts and functions to unify access to Linked Media. In this paper we describe the motivation for SPARQL-MM, present the state of the art of Linked Media description formats and multimedia query languages, and outline the specification and implementation of the SPARQL-MM function set.
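To illustrate what a media-specific SPARQL extension looks like, here is a sketch of a SPARQL-MM-style query held in a Python string. The mm: namespace version and the spatial relation function name are assumptions to be checked against the SPARQL-MM release in use:

```python
# Fragments annotated with two bodies where one fragment lies to the right of
# the other; mm:rightBeside is one of SPARQL-MM's spatial relation functions
# (exact name and namespace assumed here).
SPARQL_MM_QUERY = """
PREFIX mm: <http://linkedmultimedia.org/sparql-mm/ns/2.0.0/function#>
PREFIX oa: <http://www.w3.org/ns/oa#>
SELECT ?f1 ?f2 WHERE {
  ?a1 oa:hasBody ?person1 ; oa:hasTarget ?f1 .
  ?a2 oa:hasBody ?person2 ; oa:hasTarget ?f2 .
  FILTER mm:rightBeside(?f1, ?f2)
}
"""
print(SPARQL_MM_QUERY)
```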
The document discusses the Linked Data Platform (LDP), which provides best practices and a simple approach for a read-write Linked Data architecture based on HTTP access to web resources described using RDF. LDP defines two types of resources - those whose state is represented in RDF (LDP-RS) and those using other formats (LDP-NR). It also defines different types of containers (LDP-BC, LDP-DC, LDP-IC) that organize contained resources and support creation, modification, and enumeration of members. LDP aims to clarify and extend existing Linked Data principles for standardized access, update, creation and deletion of resources from servers exposing their data as Linked Data.
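The container behaviour described here can be exercised with plain HTTP. A minimal sketch (the container URL is hypothetical) that creates a member resource in an LDP Basic Container and reads back its location:

```python
import requests

CONTAINER = "https://example.org/ldp/datasets/"  # hypothetical LDP-BC

# Body of the new resource, as Turtle (an LDP-RS).
turtle = """@prefix dcterms: <http://purl.org/dc/terms/> .
<> dcterms:title "Dataset description" .
"""

# POST to a container creates a contained resource; Slug suggests a name.
resp = requests.post(
    CONTAINER,
    data=turtle.encode("utf-8"),
    headers={
        "Content-Type": "text/turtle",
        "Slug": "dataset-1",
        "Link": '<http://www.w3.org/ns/ldp#Resource>; rel="type"',
    },
    timeout=30,
)
resp.raise_for_status()
print("Created:", resp.headers.get("Location"))
# A subsequent GET on the container enumerates members via ldp:contains.
```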
Sergio Fernández gave a presentation on Marmotta, an open platform for linked data. He discussed Marmotta's main features like supporting read-write linked data and SPARQL/LDPath querying. He also covered Marmotta's architecture, timeline including joining the Apache incubator in 2012, and how its team of 11 committers from 6 organizations work using the Apache Way process. Fernández encouraged participation to help contribute code and documentation to the project.
Development of the iRODS RADOS plugin @ iRODS User Group Meeting 2014 (mgrawinkel)
- The document discusses the development of an iRODS-RADOS resource plugin to integrate an iRODS archival system with a Ceph storage cluster for research data management.
- A key goal is to minimize layers between the iRODS resource server and the RADOS object store for efficient access while maintaining iRODS namespace and access control capabilities.
- The plugin maps the iRODS logical file system tree to the flat RADOS object namespace, handling file operations like creation, reading, writing and deletion via the RADOS API.
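The tree-to-flat-namespace mapping can be pictured with a few lines of illustrative Python; this is a sketch of the idea, not the plugin's actual implementation, and the key scheme is an assumption:

```python
import hashlib

def logical_to_object_key(logical_path: str) -> str:
    """Derive a flat, collision-resistant RADOS object key from an iRODS path."""
    digest = hashlib.sha256(logical_path.encode("utf-8")).hexdigest()
    return f"irods.{digest}"

# The logical hierarchy stays in the iRODS catalog; only the payload lives in
# RADOS under the derived key.
print(logical_to_object_key("/tempZone/home/alice/experiment/run-042.dat"))
```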
This document summarizes new features and upcoming releases for Ceph. In the Jewel release in April 2016, CephFS became more stable with improvements to repair and disaster recovery tools. The BlueStore backend was introduced experimentally to replace Filestore. Future releases Kraken and Luminous will include multi-active MDS support for CephFS, erasure code overwrites for RBD, management tools, and continued optimizations for performance and scalability.
This document provides an overview of the HathiTrust Research Center (HTRC) architecture. It describes the key components including a portal for access, an agent for application submission, a registry for storing metadata, a secure API for programmatic access, and storage of data in a Cassandra cluster with indexing in Solr. It also outlines use cases and discusses how the architecture enables secure, non-consumptive research on copyrighted works stored in the HathiTrust digital library.
Data center SDN is located in St. Petersburg, Russia. It is one of the largest and most modern data centers in the North-West of Russia, constructed and operated in accordance with the Uptime Institute TIER III level recommendations, and it is PCI DSS certified.
St. Petersburg is one of the main gateways of IP connectivity between Russia and Europe. Data center SDN can provide you the best connectivity with the largest Russian telecom operators (Beeline, Megafon, MTS) as well as with the international operators Orange and RETN. The data center has direct interconnections with all the main St. Petersburg and Moscow internet exchange points.
Owned land area: 7.5 acres
Design capacity: 1,437 racks (42-48U)
Administrative building: 1,500 sq m
Utility power supply: 10 MW, expandable to 14 MW (2 feeders at 10 kV); 10 kV high-voltage distribution station
Up to 8 diesel-rotary UPS units, 1,600 kVA each
Load per rack: up to 40 kW
Total cooling capacity: 8.4 MW
Fuel storage: 2 × 50 m³
5 security perimeters
Estimated power usage effectiveness (PUE): 1.03-1.2
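PUE is total facility power divided by IT equipment power, so the quoted 1.03-1.2 range corresponds to an overhead (cooling, distribution losses, and so on) of roughly 3% to 20% of the IT load. A worked example with illustrative figures:

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power usage effectiveness: total facility power over IT power."""
    return total_facility_kw / it_load_kw

# 1,000 kW of IT load plus 60 kW of overhead gives PUE = 1.06,
# inside the quoted 1.03-1.2 band.
print(pue(total_facility_kw=1060.0, it_load_kw=1000.0))  # 1.06
```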
Information Technology Innovator David Ward 2011 (ward2dr)
David Ward is an experienced senior technology executive with over 20 years of experience in leadership roles at major financial institutions. He has expertise in areas such as technology strategies, business transformation, enterprise systems, infrastructure, and mergers and acquisitions. Throughout his career, he has delivered value to shareholders and improved customer satisfaction. Currently, he is seeking new opportunities to apply his experience and drive innovation.
The document introduces CALIENT Technologies' VPOD (Virtual POD) solution using their S-Series 3D-MEMS Optical Circuit Switch. The VPOD allows for compute resources to be shared between physical PODs at the optical layer, improving server utilization from typically less than 40% to over 50%. This can save tens or hundreds of millions of dollars in CAPEX for large data centers. The CALIENT LightConnect switching fabric and manager allow dynamic allocation of resources between VPODs on demand, improving efficiency without impacting performance.
Presentation data center and cloud architecture (xKinAnx)
This document outlines an enterprise network architecture with IBM networking equipment. It includes small branch offices connected via SOHO routers, large branches with branch routers, teleworkers connecting remotely via SSL VPN, and a headquarters site connecting various IBM servers and systems. The network uses MPLS/VPLS WAN connectivity between sites with firewalls, intrusion prevention, and security management provided at each location.
3D IT Architecture is a new revolutionary way to visualize your IT architecture to your stakeholders. Build a complete 3D model of your IT landscape by putting together a series of 3D objects. Just print it, cut it and glue it together. In this presentation: Data Center
These slides will cover the essential characteristics of cloud computing in the data center. Why should you consider adopting cloud architecture? We'll show you.
Data-Ed Webinar: Data Architecture Requirements (DATAVERSITY)
Data architecture is foundational to an information-based operational environment. It is your data architecture that organizes your data assets so they can be leveraged in your business strategy to create real business value. Even though this is important, not all data architectures are used effectively. This webinar describes the use of data architecture as a basic analysis method. Various uses of data architecture to inform, clarify, understand, and resolve aspects of a variety of business problems will be demonstrated. As opposed to showing how to architect data, your presenter Dr. Peter Aiken will show how to use data architecting to solve business problems. The goal is for you to be able to envision a number of uses for data architectures that will raise the perceived utility of this analysis method in the eyes of the business.
Takeaways:
Understanding how to contribute to organizational challenges beyond traditional data architecting
How to utilize data architectures in support of business strategy
Understanding foundational data architecture concepts based on the DAMA DMBOK
Data architecture guiding principles & best practices
We are excited to announce the release of MetaFabric, a new end-to-end architecture that encompasses Juniper Networks' switching, routing, software and security portfolios. It provides the freedom and flexibility to use any technology, any protocol, any orchestration platform, and any SDN controller, for any data center application, with the assurance of interoperability and minimal disruption.
The walls of the data center are dissolving. Every device on the planet will eventually contain a computer, be connected to the net & be managed via intelligent software. IoT will change everything, especially how products are manufactured, sold & serviced. This presentation covers the impact that technology will have on society over the next 7 to 10 years.
Architectural Evolution Starting from Hadoop (SpagoWorld)
Speech given by Monica Franceschini, Solution Architecture Manager at the Big Data Competency Center of Engineering Group, on the occasion of the Data Driven Innovation Rome 2016 - Open Summit.
The document discusses using AWS services like EC2, VPC, Auto Scaling and others to build a hybrid architecture that integrates an organization's on-premises data center with the AWS cloud. It provides overviews of EC2 instance types, Auto Scaling capabilities, and how to use VPC to connect networks and define routing and security rules. The hybrid model allows leveraging AWS' elastic infrastructure while integrating it with existing IT systems, enabling innovation without being constrained by data center capacity limits or costs.
Delivering Apache Hadoop for the Modern Data Architecture (Hortonworks)
Join Hortonworks and Cisco as we discuss trends and drivers for a modern data architecture. Our experts will walk you through some key design considerations when deploying a Hadoop cluster in production. We'll also share practical best practices around Cisco-based big data architectures and Hortonworks Data Platform to get you started on building your modern data architecture.
A Scalable, Commodity Data Center Network Architecture (Gunawan Jusuf)
A Scalable, Commodity Data Center Network Architecture
source: http://pages.cs.wisc.edu/~akella/CS740/F08/DataCenters.ppt
Data Center Floor Design - Your Layout Can Save or Kill Your PUE & Cooling Ef... (Maria Demitras)
Implementing data center best practices and using CFD models allowed Great Lakes to suggest a data center layout that would improve PUE and efficiency. Jason Hallenbeck, DCDC, explains the concepts behind how data center floor design can save or kill your PUE and cooling efficiency—as found in this proposal. Find Jason presenting at the BICSI Fall Conference on September 14th at 1:30 pm.
Finding efficient methods of deploying redundancy is paramount especially in multi-tenant and colocation data centers. Here we cover the importance of a design that allows for uptime and the ability to cost-effectively "grow as you go".
Data Center Free Cooling in the Middle East (SyskaHennessy)
Analysis of the options to provide cost effective data cooling in Middle East countries.
By Noriel Ong, ASEAN Eng., MIET, PMP, PQP, ATD; Syska Hennessy Group.
DataCite – Bridging the gap and helping to find, access and reuse data – Herb... (OpenAIRE)
OpenAIRE Interoperability Workshop (8 Feb. 2013).
DataCite – Bridging the gap and helping to find, access and reuse data – Herbert Gruttemeier, INIST-CNRS
EUDAT is a cross-disciplinary data infrastructure project in Horizon 2020 that aims to provide a collaborative framework for managing the exponential growth of research data. It brings together several European research communities and over 25 user communities to develop shared services and solutions for storing, finding, accessing, and analyzing large amounts of complex research data. Some key services EUDAT provides include a metadata catalogue, persistent identifiers, data staging between storage and high performance computing, a simple store for uploading and sharing data, and safe replication of data across multiple sites. The goals of EUDAT in Horizon 2020 are to consolidate and improve its core services, ensure financial sustainability, enhance interoperability with other e-infrastructures, and help bridge national and European data solutions.
Lecture at an event "SEEDS Kick-off meeting", FORS, Lausanne, Switzerland.
Related materials: http://www.snf.ch/en/funding/programmes/scopes/Pages/default.aspx
http://seedsproject.ch/?page_id=368
Leveraging Open Source Technologies to Enable Scientific Archiving and Discovery; Steve Hughes, NASA; Data Publication Repositories
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
This document discusses the 5 year evolution of Dataverse, an open source data repository platform. It began as a tool for collaborative data curation and sharing within research teams. Over time, features were added like dataset version control, APIs, and integration with other systems. The document outlines challenges around maintenance and sustainability. It also covers efforts to improve Dataverse's interoperability, such as integrating metadata standards and controlled vocabularies, and making datasets FAIR compliant. The goal is to establish Dataverse as a core component of the European Open Science Cloud by improving areas like software quality, integration with tools, and standardization.
This presentation gives details on technologies and approaches towards exploiting Linked Data by building LD applications. In particular, it gives an overview of popular existing applications and introduces the main technologies that support implementation and development. Furthermore, it illustrates how data exposed through common Web APIs can be integrated with Linked Data in order to create mashups.
The global need to securely derive (instant) insights has motivated data architectures ranging from distributed storage to data lakes, data warehouses and lakehouses. In this talk we describe Tag.bio, a next-generation data mesh platform that embeds vital elements such as domain centricity/ownership, Data as Products, and self-serve architecture, with a federated computational layer. Tag.bio data products combine data sets, smart APIs, and statistical and machine learning algorithms into decentralized data products that let users discover insights using FAIR principles. Researchers can use its point-and-click (no-code) system to instantly perform analysis and share versioned, reproducible results. The platform combines a dynamic cohort builder with analysis protocols and applications (low-code) to drive complex analysis workflows. Applications within data products are fully customizable via R and Python plugins (pro-code), and the platform supports notebook-based developer environments with individual workspaces.
Join us for a talk/demo session on the Tag.bio data mesh platform and learn how major pharma companies and university health systems are using this technology to promote value-based healthcare and precision healthcare, find cures for disease, and foster collaboration (without explicitly moving data around). The talk also outlines Tag.bio's secure data exchange features for real-world evidence datasets, privacy-centric data products (confidential computing), as well as integration with cloud services.
Meeting today’s dissemination challenges – Implementing International Standar... (Jonathan Challener)
This document discusses the .Stat system, which serves as a central repository for validated statistics and metadata. .Stat connects data production, sharing, and dissemination processes. It provides three key functional areas: a data upload engine, a data delivery engine, and a data browser. .Stat can be mapped to stages in the Generic Statistical Business Process Model and incorporates standards like SDMX for dissemination, data exchange, and internal data sharing. The document outlines .Stat's current role and future plans to further support SDMX artifacts, ingest, registries, and semantic web opportunities.
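Because .Stat disseminates through SDMX, a consumer typically pulls observations via the SDMX RESTful web service pattern. A hedged sketch, in which the service base URL, dataflow id, and dimension key are all placeholders rather than a real .Stat instance:

```python
import requests

BASE = "https://example-stat.org/rest"  # placeholder SDMX REST service
FLOW = "QNA"       # hypothetical dataflow id
KEY = "AUS.GDP.Q"  # hypothetical dimension key (dot-separated dimensions)

resp = requests.get(
    f"{BASE}/data/{FLOW}/{KEY}",
    headers={"Accept": "application/vnd.sdmx.data+json"},
    timeout=60,
)
resp.raise_for_status()
# The SDMX-JSON payload carries structure metadata alongside the observations.
print(resp.json().get("header", {}))
```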
Tim Davies, SMART Infrastructure Facility E-Research Coordinator, presented the SMART Data Management System as part of the SMART Seminar Series on Thursday, 5th March 2015.
The document provides an introduction to PREMIS (Preservation Metadata: Implementation Strategies) and its application in audiovisual archives. It discusses the challenges of digital preservation and the need for preservation metadata to ensure long-term access. It then summarizes the key aspects of PREMIS, including the PREMIS Data Dictionary, its relationship to the OAIS reference model, the five interacting entities in the PREMIS data model, and issues around implementing PREMIS in archives.
Vinod Chachra discussed improving discovery systems through post-processing harvested data. He outlined key players like data providers, service providers, and users. The harvesting, enrichment, and indexing processes were described. Facets, knowledge bases, and branding were discussed as ways to enhance discovery. Chachra concluded that progress has been made but more work is needed, and data and service providers should collaborate on standards.
This is a talk that I gave at BioIT World West on March 12, 2019. The talk was called: A Gen3 Perspective of Disparate Data: From Pipelines in Data Commons to AI in Data Ecosystems.
The document proposes a five-layer semantic grid architecture with a knowledge layer on top of the Gridbus broker. The knowledge layer includes two modules: semantic description and discovery. Semantic description represents grid domain knowledge in an ontology template using Protege-OWL APIs. Semantic discovery uses the Algernon inference engine to retrieve resource information based on user queries. This knowledge layer is implemented in the Gridbus broker and can support popular middleware like Globus and Alchemi.
EUDAT Collaborative Data Infrastructure: Data Access and Re-use Service Area (EUDAT)
The document summarizes services provided by EUDAT, a collaborative data infrastructure funded by the European Union. It describes EUDAT's B2ACCESS identity and access management service, which allows users to access EUDAT and other services using their existing credentials. It also summarizes EUDAT's B2FIND data discovery service, B2SHARE data sharing and preservation service, and B2DROP file sharing service. It outlines their key features and integration with each other. Future plans include further deployments, improved interoperability, and enhanced user experiences across EUDAT services.
Modern Data Management for Federal Modernization (Denodo)
Watch full webinar here: https://bit.ly/2QaVfE7
Faster, more agile data management is at the heart of government modernization. However, traditional data delivery systems are limited in their ability to realize a modernized, future-proof data architecture.
This webinar will address how data virtualization can modernize existing systems and enable new data strategies. Join this session to learn how government agencies can use data virtualization to:
- Enable governed, inter-agency data sharing
- Simplify data acquisition, search and tagging
- Streamline data delivery for transition to cloud, data science initiatives, and more
In an expert webinar on April 15th 2020 we discussed (in Finnish) how the FAIR data principles affect service development in RDM services. I presented some relevant outputs from the FAIRsFAIR project. These are the slides (in English). The webinar will be published on the fairdata.fi service site https://www.fairdata.fi/koulutus/koulutuksen-tallenteet/
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Technical integration of data repositories: status and challenges (vty)
This document discusses technical integration of data repositories, including:
- Previous integration initiatives focused on metadata integration using OAI-PMH and ResourceSync protocols, as well as aggregators like OpenAIRE.
- Challenges to integration include different levels of software/service maturity, maintenance of distributed applications, and use of common standards and vocabularies.
- Potential integration efforts could focus on improving FAIRness, metadata/data flexibility, and connections between repositories, software, and computing resources to better enable reuse of EOSC data and services.
Similar to EUDAT data architecture and interoperability aspects – Daan Broeder
10th OpenAIRE Content Providers Community Call (OpenAIRE)
The document discusses OpenAIRE's Usage Counts service, which tracks usage and collects COUNTER reports to provide analytics on the usage of research outputs. It introduces the new architecture and workflows that power the service, and shows examples of usage counts data in action for content providers and individual research items. Finally, it outlines the future plans for the service, including counting more research products, moving to the latest COUNTER standards, offering additional analytics, and building a Usage Counts Hub.
OpenAIRE Content Providers Community Call, November 4th, 2020
This call was focused on the PROVIDE future developments, functionalities wishlist and PROVIDE service in EOSC.
Was also an opportunity to share the most recent updates and novelties in the OpenAIRE Content Provider Dashboard, and to get feedback from community.
Recordings: https://youtu.be/wY4fOS767Us
Follow the Community activities at https://www.openaire.eu/provide-community-calls
OpenAIRE in the European Open Science Cloud (EOSC) (OpenAIRE)
Openness is the success factor for EOSC. OpenAIRE has been working on delivering open access scholarly communication in Europe for the past 10 years, and we now present how our work fits into the EOSC core developments.
OpenAIRE Content Providers Community Call, October 7th, 2020
This call was focused on the OpenAIRE Broker Service, specifying how the service works to deploy the enrichment events to the Content Providers managers.
Was also an opportunity to share the most recent updates and novelties in the OpenAIRE Content Provider Dashboard, and to get feedback from community.
Recording: https://youtu.be/3sF4B58EGcs
Follow the Community activities at https://www.openaire.eu/provide-community-calls
OpenAIRE Content Providers Community Call, July 1st, 2020
This call was focused on Data Repositories namely the OpenAIRE Research Graph and Data Repositories, the OpenAIRE Content Acquisition Policy, and the Guidelines for Data Archive Managers.
Was also an opportunity to share the most recent updates and novelties in the OpenAIRE Content Provider Dashboard, and to get feedback from community.
Follow the Community activities at https://www.openaire.eu/provide-community-calls
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 3) (OpenAIRE)
This document provides an overview of the Open Research Gateway for the ELIXIR-GR infrastructure. It discusses how the gateway acts as a single entry point to all research products from ELIXIR-GR, including publications, datasets, software, and more. Researchers can deposit and link their work through the gateway to practice open science. Statistics, reporting, and APIs are also available to monitor impact and advance open research. The team behind the gateway is working to improve customization and user guidance to better support the ELIXIR-GR community.
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 2) (OpenAIRE)
OpenAIRE is a European infrastructure that helps stakeholders comply with open access policies by providing tools and services. It operates repositories, dashboards, and tools to help share and reuse research outputs in accordance with FAIR principles. OpenAIRE also coordinates activities through national open access desks and outreach to promote open science practices. Researchers can use OpenAIRE to publish open access works, deposit data, write data management plans, and link research outputs.
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1) (OpenAIRE)
The Research Data Alliance (RDA) is an international organization focused on data sharing across disciplines. It has over 8,600 members from 137 countries working to reduce barriers to data sharing through developing infrastructure and community activities. RDA has numerous active interest groups and working groups focused on issues like specific scientific domains, data reference and sharing, community needs, data stewardship, and basic infrastructure. One recent focus is guidelines for data sharing during the COVID-19 pandemic.
1) A new version of the OpenAIRE Provide dashboard demo is available.
2) Several speakers shared use cases of the OpenAIRE Provide service, including from OpenstarTs, Serbian repositories, the University of Minho, and the Universidade Católica Portuguesa.
3) The agenda concluded with an invitation for comments and questions.
20200504_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data (OpenAIRE)
Presentation by Jacques Flores Dourojeanni (Research Data Management Consultant Utrecht University Library), as delivered during the OpenAIRE Legal Policy Webinar series on May 4th 2020.
More information and recordings: https://www.openaire.eu/item/openaire-legal-policy-webinars
20200504_Research Data & the GDPR: How Open is Open? (OpenAIRE)
Presentation by Prodromos Tsiavos (Senior Legal Advisor - ARC/ Director - Onassis Group) as delivered during the OpenAIRE Legal Policy Webinar series on May 4th 2020.
More information and recordings: https://www.openaire.eu/item/openaire-legal-policy-webinars
20200504_Data, Data Ownership and Open Science (OpenAIRE)
Presentation by Thomas Margoni (Senior Lecturer in Intellectual Property and Internet Law, Co-director, CREATe, University of Glasgow) as delivered during the OpenAIRE Legal Policy Webinar series on May 4th 2020.
More information and recordings: https://www.openaire.eu/item/openaire-legal-policy-webinars
20200429_Research Data & the GDPR: How Open is Open? (updated version) (OpenAIRE)
This document discusses how the General Data Protection Regulation (GDPR) applies to scientific research. It defines key GDPR concepts, explains how scientific research is defined under the regulation, and discusses the legal bases and purposes that can justify data processing for research. It also addresses how data subject rights may be limited for research purposes, and analyzes several cases involving issues like data sharing, further processing of data, and handling of health and publicly available data in the context of research.
20200429_Data, Data Ownership and Open Science (OpenAIRE)
Presentation by Thomas Margoni (Senior Lecturer in Intellectual Property and Internet Law, Co-director, CREATe, University of Glasgow) as delivered during the OpenAIRE Legal Policy Webinar series on April 29th 2020.
More information and recordings: https://www.openaire.eu/item/openaire-legal-policy-webinars
20200429_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data (OpenAIRE)
Presentation by Jacques Flores Dourojeanni (Research Data Management Consultant Utrecht University Library), as delivered during the OpenAIRE Legal Policy Webinar series on April 29th 2020.
More information and recordings: https://www.openaire.eu/item/openaire-legal-policy-webinars
COVID-19: Activities, tools, best practice and contact points in Greece (OpenAIRE)
Presentation from the webinar organized by the Greek OpenAIRE and RDA Nodes (Athena RC) and ELIXIR-GR to inform participants of EU and national efforts, in collaboration with the following research organizations: Fleming, CERTH, HEAL-Link, Demokritos, and the Univ. of Athens (Medical School).
Presentation of the 2nd Content Providers Community Call, targeting the following topics: 1) OpenAIRE Content Provider Dashboard updates; 2) Main topic: DSpace-CRIS for OpenAIRE, implementation of the CRIS guidelines and beyond; 3) Community questions & comments.
Presentation of the 2nd Content Providers Community Call, targeting the following topics: 1) OpenAIRE Content provider dashboard updates;
2) OpenAIRE aggregation and enrichment processes: specifications and good practices;
3) Community questions & comments.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
This talk will cover ScyllaDB Architecture from the cluster-level view and zoom in on data distribution and internal node architecture. In the process, we will learn the secret sauce used to get ScyllaDB's high availability and superior performance. We will also touch on the upcoming changes to ScyllaDB architecture, moving to strongly consistent metadata and tablets.
QA or the Highway - Component Testing: Bridging the gap between frontend appl... (zjhamm304)
These are the slides for the presentation, "Component Testing: Bridging the gap between frontend applications" that was presented at QA or the Highway 2024 in Columbus, OH by Zachary Hamm.
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge Capture & Transfer
From Natural Language to Structured Solr Queries using LLMs (Sease)
This talk draws on experimentation to enable AI applications with Solr. One important use case is to use AI for better accessibility and discoverability of the data: while User eXperience techniques, lexical search improvements, and data harmonization can take organizations to a good level of accessibility, a structural (or “cognitive” gap) remains between the data user needs and the data producer constraints.
That is where AI – and most importantly, Natural Language Processing and Large Language Model techniques – could make a difference. This natural language, conversational engine could facilitate access and usage of the data leveraging the semantics of any data source.
The objective of the presentation is to propose a technical approach and a way forward to achieve this goal.
The key concept is to enable users to express their search queries in natural language, which the LLM then enriches, interprets, and translates into structured queries based on the Solr index’s metadata.
This approach leverages the LLM’s ability to understand the nuances of natural language and the structure of documents within Apache Solr.
The LLM acts as an intermediary agent, offering a transparent experience to users automatically and potentially uncovering relevant documents that conventional search methods might overlook. The presentation will include the results of this experimental work, lessons learned, best practices, and the scope of future work that should improve the approach and make it production-ready.
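A minimal sketch of that pattern is shown below. All names are hypothetical: `call_llm` stands in for whatever chat-completion client is used, and the Solr core, fields, and URL are invented for illustration; the actual experiments in the talk may differ.

```python
import json
import requests  # assumes a reachable Solr instance

# Hypothetical core and field metadata used to ground the LLM's translation
SOLR_URL = "http://localhost:8983/solr/products/select"
INDEX_FIELDS = {"title": "text_general", "brand": "string",
                "price": "pfloat", "release_date": "pdate"}

PROMPT = """Translate the user question into Solr query parameters.
Known fields and their types: {fields}
Return JSON with keys "q" and "fq" (a list of filter queries).
Question: {question}"""

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion call. The canned answer
    # shows the JSON shape the prompt asks for.
    return '{"q": "laptop", "fq": ["price:[* TO 800]"]}'

def nl_to_solr(question: str) -> dict:
    raw = call_llm(PROMPT.format(fields=json.dumps(INDEX_FIELDS),
                                 question=question))
    params = json.loads(raw)  # the LLM's structured interpretation
    resp = requests.get(SOLR_URL, params={"q": params["q"],
                                          "fq": params["fq"],
                                          "wt": "json"})
    return resp.json()

# nl_to_solr("laptops under 800 dollars")
```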
[O'Reilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
Essentials of Automations: Exploring Attributes & Automation ParametersSafe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
The way information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language has adapted accordingly: we no longer talk about information systems but about applications. Applications have evolved to break data into diverse fragments, tightly coupled with the applications themselves and expensive to integrate. The result is technical debt, which is repaid by taking even bigger "loans", producing ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready, whose client coverage is growing, and for which scaling and performance are life-and-death questions. The system has Redis, MongoDB, and stream processing based on ksqlDB. In this talk we will first analyze scaling approaches and then select the proper ones for our system.
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
"What does it really mean for your system to be available, or how to define w...Fwdays
We will talk about system monitoring from a few different angles. We will start by covering the basics, then discuss SLOs, how to define them, and why understanding the business well is crucial for success in this exercise.
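To make the SLO part concrete, here is a minimal, generic sketch of the arithmetic behind an availability SLO and its error budget; the target, window, and counts are invented numbers, and a real setup would pull the counters from a monitoring system.

```python
# Invented numbers; in practice these counters come from your monitoring stack.
SLO_TARGET = 0.999           # 99.9% availability over the window
TOTAL_REQUESTS = 1_250_000   # requests observed in the window
BAD_REQUESTS = 900           # requests violating the SLI (e.g. 5xx or too slow)

availability = 1 - BAD_REQUESTS / TOTAL_REQUESTS
error_budget = (1 - SLO_TARGET) * TOTAL_REQUESTS  # failures we may "spend"
budget_left = error_budget - BAD_REQUESTS

print(f"availability: {availability:.5f} (target {SLO_TARGET})")
print(f"error budget left: {budget_left:.0f} of {error_budget:.0f} requests")
```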
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how it impacts the CoE structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect, Anika Systems
Must Know Postgres Extension for DBA and Developer during MigrationMydbops
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
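As a hedged sketch of what enabling such extensions looks like in practice (connection details are placeholders, the extensions must already be installed on the server, and oracle_fdw additionally requires Oracle client libraries):

```python
import psycopg2  # pip install psycopg2-binary

# Placeholder DSN; the extensions must be installed on the server first.
conn = psycopg2.connect("dbname=migrated user=postgres")
conn.autocommit = True
with conn.cursor() as cur:
    # The talk says "pg_audit"; the packaged extension name is pgaudit
    for ext in ("oracle_fdw", "pgtt", "pgaudit"):
        cur.execute(f"CREATE EXTENSION IF NOT EXISTS {ext};")

    # oracle_fdw: point a foreign server at the legacy Oracle database so
    # its tables can be queried from PostgreSQL during migration checks
    cur.execute("""
        CREATE SERVER ora FOREIGN DATA WRAPPER oracle_fdw
        OPTIONS (dbserver '//oracle-host:1521/ORCL');
    """)
```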
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: https://www.mydbops.com/
Follow us on LinkedIn: https://in.linkedin.com/company/mydbops
For more details and updates, please follow the links below.
Meetup Page: https://www.meetup.com/mydbops-databa...
Twitter: https://twitter.com/mydbopsofficial
Blogs: https://www.mydbops.com/blog/
Facebook(Meta): https://www.facebook.com/mydbops/
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
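To give a flavour of the idea (this is an illustrative toy, not the paper's operators or tool), the sketch below applies one plausible mutation operator, deleting a training phrase from an intent, to a hand-written intent definition; a test suite "kills" the mutant if some scenario now fails against it.

```python
import copy
import random

# Toy intent definition, not tied to any concrete chatbot platform
intent = {
    "name": "book_flight",
    "training_phrases": ["book a flight", "I need a plane ticket",
                         "fly me to Boston"],
    "response": "Where would you like to fly?",
}

def delete_training_phrase(intent: dict, rng: random.Random) -> dict:
    """Mutation operator: drop one training phrase, emulating an
    under-specified intent that may cause misclassification."""
    mutant = copy.deepcopy(intent)
    idx = rng.randrange(len(mutant["training_phrases"]))
    mutant["training_phrases"].pop(idx)
    return mutant

mutant = delete_training_phrase(intent, random.Random(42))
# Run the test scenarios against the mutant; the fraction of mutants
# killed estimates the strength of the scenario suite.
```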
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsScyllaDB
ScyllaDB monitoring provides a lot of useful information. But sometimes it's not easy to find the root of the problem if something is wrong, or even to estimate the remaining capacity by the load on the cluster. This talk shares our team's practical tips on:
- How to find the root of the problem by metrics if ScyllaDB is slow
- How to interpret the load and plan capacity for the future
- Compaction strategies and how to choose the right one
- Important metrics which aren't available in the default monitoring setup
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...GlobalLogic Ukraine
In this talk we answer why application performance needs to be improved and which methods are the most effective for doing so. We also discuss what a cache is, which kinds of caches exist, and, most importantly, how to find a performance bottleneck.
Video and event details: https://bit.ly/45tILxj
EUDAT data architecture and interoperability aspects – Daan Broeder
1. EUDAT – A European Collaborative Data Infrastructure
Daan Broeder
The Language Archive – MPI for Psycholinguistics
CLARIN, DASISH, EUDAT
OpenAire Interoperability Workshop, Braga, Feb. 8, 2013
2. EUDAT Key facts
- Project name: EUDAT – European Data
- Start date: 1st October 2011
- Duration: 36 months
- Budget: 16.3 M€ (including 9.3 M€ from the EC)
- EC call: Call 9 (INFRA-2011-1.2.2): Data infrastructure for e-Science (11.2010)
- Participants: 25 partners from 13 countries (national data centers, technology providers, research communities, and funding agencies)
- Objectives: "To deliver cost-efficient and high quality Collaborative Data Infrastructure (CDI) with the capacity and capability for meeting researchers' needs in a flexible and sustainable way, across geographical and disciplinary boundaries."
5. EUDAT Core Service Areas
Core services are the building blocks of EUDAT's Common Data Infrastructure and sit mainly on the bottom layer of data services.
Community-oriented services:
- Simple data access and upload
- Long-term preservation
- Shared workspaces
- Execution and workflow (data mining, etc.)
- Joint metadata and data visibility
Enabling services (making use of existing services where possible):
- Persistent identifier service (EPIC, DataCite)
- Federated AAI service
- Network services
- Monitoring and accounting
6. First Service Cases
A list of six service/use cases has been identified:
- Safe replication: allow communities to safely replicate data to selected data centers for storage, in a robust, reliable and highly available way.
- Dynamic replication: perform (HPC) computations on the replicated data; move (part of) the safely replicated data to a workspace close to powerful machines and move the results back into the archives.
- Metadata: a joint metadata domain for all data stored by EUDAT data centers, built by harvesting metadata records for all data objects from the communities; it provides a catalogue demonstrating what EUDAT stores and a registry that can be used for automatic operations such as data mining.
- Research data store: a function that helps researchers, mediated by the participating communities, to upload and store data which is not part of the officially handled data sets of the community.
- AAI: a solution for a working AAI system in a federation scenario.
- PID: a robust, highly available and effective PID system that can be used within the communities and by EUDAT.
8. EUDAT Data & Community Centers
[Architecture diagram: EUDAT community centers hold data objects ("D") and have their metadata harvested into the EUDAT Metadata Service; data objects are registered with the EUDAT PID Service and replicated to EUDAT data centers with long-term archiving (LTA) facilities, to the EUDAT Simple-store, and to PRACE/EUDAT HPC centers with HPC workspaces; an EUDAT center registry lists the participating centers.]
9. EUDAT Architecture – Joint Metadata Catalogue
The joint metadata catalogue will be developed as a series of prototypes:
- Using available XML-formatted metadata via OAI-PMH
- Starting with the ENES and CLARIN communities
- Simple mapping of a limited set of metadata elements
[Diagram: metadata from EUDAT and non-EUDAT communities is harvested via OAI-PMH into a metadata database (incl. PIDs/URLs); mapping rules feed an index behind the search interface, alongside the Safe Replication and Simple Store services and the metadata store.]
10. User perspective on the joint metadata catalogue
From the user's perspective the following points are considered important:
- What metadata terminology can be used in the catalogue interface
- What the browsable/searchable dimensions are
- How the data is presented, e.g. what visualization options are available
- What added value there can be:
  - Cross-community data
  - Commenting facility (also for metadata quality improvement)
  - Direct visibility of data uploaded in the EasyStore
  - Good visualization of different community data types
  - Linking to original metadata records and resources
  - Harmonized data citation facility
  - Export as LoD
- Consider also its use for the communities' own purposes, e.g. starting communities that do not yet have a proper catalogue
11. Joint metadata catalogue – harvesting architecture
[Diagram: the metadata (MD) repositories of Community A and Community B expose records through OAI-PMH servers; an OAI harvester collects them, per-community adapters (schema A, schema B) map them to EUDAT XML, and the result is loaded into a CKAN-based metadata catalogue backed by PostgreSQL and published on the WWW.]
We need fast results:
- Based on community practices
- No new schema developments
- Using metadata that is already out there
- Using existing and proven technology if possible
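A minimal sketch of that pipeline in Python is shown below, under stated assumptions: the Sickle library for OAI-PMH harvesting, Dublin Core records, and CKAN's standard package_create action; all URLs, the API key, and the field mapping are placeholders, and real adapters would do far more validation.

```python
import requests
from sickle import Sickle  # pip install sickle

CKAN_URL = "https://catalogue.example.org/api/3/action/package_create"
API_KEY = "..."  # placeholder CKAN API key

def adapt_oai_dc(record) -> dict:
    """One 'adapter': map a small subset of Dublin Core to CKAN fields."""
    md = record.metadata
    return {
        "name": record.header.identifier.split(":")[-1].lower(),
        "title": (md.get("title") or ["untitled"])[0],
        "notes": (md.get("description") or [""])[0],
    }

# Harvest a community repository and push each record into CKAN
sickle = Sickle("https://repository.example.org/oai")
for record in sickle.ListRecords(metadataPrefix="oai_dc"):
    requests.post(CKAN_URL, json=adapt_oai_dc(record),
                  headers={"Authorization": API_KEY})
```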
13. CLARIN particulars
CLARIN uses CMDI as its metadata infrastructure. Users can specify metadata schemas as they see fit to describe a particular resource type. This results in many (>>100) metadata schemas.
[Diagram: the OAI harvester collects records in many different schemas (A ... Z); one adapter per schema maps them into the CKAN catalogue backed by PostgreSQL, published on the WWW.]
14. CLARIN particulars (continued)
Because CMDI yields so many schemas, adapters are not written by hand: an adapter generator derives them from the CLARIN metadata framework (the metadata schema registry, metadata concept registry, and relation registry). This is also a test of whether this "pragmatic ontology" approach can be applied in general.
[Diagram: the OAI harvester feeds records in different schemas to the generated adapters, which populate the EUDAT Primary MD Store (EPS) and the CKAN catalogue backed by PostgreSQL, published on the WWW.]
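The adapter-generator idea can be illustrated with a toy sketch (entirely hypothetical data and names): instead of hand-writing one adapter per CMDI schema, each schema's elements are mapped to shared concepts in a registry, and adapters are generated from that mapping.

```python
# Toy concept registry: schema element name -> shared concept (invented data)
concept_registry = {
    "cmdi-schema-A": {"ResourceTitle": "title", "Narrative": "description"},
    "cmdi-schema-B": {"Name": "title", "Abstract": "description"},
}

def make_adapter(schema: str):
    """Generate an adapter function from the registry entry for one schema."""
    mapping = concept_registry[schema]
    def adapter(record: dict) -> dict:
        return {concept: record[element]
                for element, concept in mapping.items() if element in record}
    return adapter

adapt_a = make_adapter("cmdi-schema-A")
print(adapt_a({"ResourceTitle": "Corpus X", "Narrative": "Speech recordings"}))
# -> {'title': 'Corpus X', 'description': 'Speech recordings'}
```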
15. EUDAT & Interoperability
In this context we use a 'narrow' definition from the IEEE Glossary, which defines interoperability as: the ability of two or more systems or components to exchange information and to use the information that has been exchanged.
Interoperability has consequences within EUDAT between data and community centers, with perhaps some consequences for community practices:
- Use of iRODS as the data replication software component
  - Used within a set of (general) policies governing the exchange of data between the centers
- Use of FIM and/or X.509 certificates to access EUDAT services
- Use of a specific PID system, with some added information items
  - EPIC and DataCite, both based on Handle System technology
- Use of OAI-PMH harvesting of metadata
  - Possibly developing an EUDAT-standardized use of a metadata concept and relation registry
Other technology choices for data-infrastructure components, such as CKAN as the metadata catalogue and INVENIO as the CMS for the SimpleStore, are not so intrusive, since they are central facilities that can be replaced.
16. Interoperability of ideas
- There are many communities active in data management
- Developing ideas, tools & infrastructure, or both
- To make ideas come together, or at least to get informed about differences, we need to talk!
- RDA – Research Data Alliance
  - Supported by the EC, US, and Australia
  - EUDAT, OpenAire via the iCordi project
  - Bottom-up, community-driven (including data scientists)
19. Interlinking data and publications
[Diagram: actors (data curator, data depositor, author, reviewer, editor) are identified by ORCID identifiers and interact through APIs with datasets & metadata on one side and publications on the other; the data and publications themselves carry Handle System (HS), DOI, and URN identifiers.]
21. Interoperability
From Wikipedia, the free encyclopedia:
Interoperability is the ability of diverse systems and organizations to work together (inter-operate). The term is often used in a technical systems engineering sense, or alternatively in a broad sense taking into account social, political, and organizational factors that impact system-to-system performance.
While interoperability was initially defined for IT systems or services and only allows for information to be exchanged (see definition below), a more generic definition could be this one:
Interoperability is a property of a product or system, whose interfaces are completely understood, to work with other products or systems, present or future, without any restricted access or implementation.
This generalized definition can be used for any system, not only information technology systems. It defines several criteria that can be used to discriminate between systems that are "really" interoperable and systems that are sold as such but are not, because they do not respect one of the aforementioned criteria, namely:
- non-disclosure of one or several interfaces
- implementation or access restrictions built into the product/system/service
The IEEE Glossary defines interoperability as: the ability of two or more systems or components to exchange information and to use the information that has been exchanged.[1]
James A. O'Brien and George M. Marakas define interoperability as:[2] being able to accomplish end-user applications using different types of computer systems, operating systems, and application software, interconnected by different types of local and wide area networks.