Information Consolidation and Concentration (WP4 ForgetIT 1st year review) - ForgetIT Project
The document discusses techniques for information condensation and consolidation developed as part of the ForgetIT project. It describes the role of the Extractor and Condensator components in extracting and processing information from textual and multimedia data. The Extractor performs tasks like named entity extraction and visual feature extraction from images. The Condensator then uses the extracted information to generate summaries, for example by performing text summarization or clustering images. The document also provides examples of the project's achievements in year 1, which included developing services for text and image analysis and integrating some techniques into other work packages.
Joint Information and Preservation Management (WP5 ForgetIT 1st year review) - ForgetIT Project
The document discusses work done in Year 1 of the ForgetIT project to improve preservation by combining managed forgetting and contextualized remembering. Key achievements include identifying the need for a context-aware preservation manager, automatically preparing submission information packages (SIPs), and enabling smooth transitions between systems using CMIS. Focus for Year 2 includes further improving information and preservation management workflows, designing and implementing the context-aware preservation manager, and handling preservation information exchange.
Contextualization / Decontextualization (WP6 ForgetIT 1st year review) - ForgetIT Project
This document discusses contextualization as part of the ForgetIT project. It presents a formal model of contextualization that defines context, interpretation, and the contextualization process. The focus of the first year is to review the state-of-the-art, develop a formal ForgetIT model of contextualization, and create prototype contextualization components. Examples are provided of contextualizing images by finding similar collections and adding context, and contextualizing text by disambiguating concepts and storing surrounding context from an ontology.
Personal Preservation (WP9 ForgetIT 1st year review) - ForgetIT Project
This document summarizes the focus and achievements of the first year of the ForgetIT project, which aims to develop techniques for personal information preservation and managed forgetting. In the first year, the project focused on identifying scenarios and requirements, extending the underlying Personal Information Model ontology, and developing initial prototypes and mock-ups. Prototypes included extending the Semantic Desktop infrastructure and developing a photo organization tool leveraging the Personal Information Model. The document outlines the role of the Semantic Desktop components in the overall architecture and demonstrates early prototypes and datasets.
The Preserve-or-Forget Reference Model and Framework (WP8 ForgetIT 1st year review) - ForgetIT Project
Design of the Preserve-or-Forget framework architecture, definition of the integration approach for all the components developed in the other technical work packages, and definition of a preliminary reference model.
Managed Forgetting (WP3 - ForgetIT 1st year review) - ForgetIT Project
Data model and a computation method based on Semantic Web technologies; integration with the PIMO semantic desktop and the Preserve-or-Forget middleware; exploratory studies: collective memory analysis of public events in Wikipedia, high-impact feature analysis for content retention in the Social Web, and feature selection for efficiency and scalability.
Digital dark age - Are we doing enough to preserve our website heritage? - Olivier Dobberkau
When creating web sites, we often plan for a lifespan of only three to five years. With every relaunch and overhaul we are confronted with content migration and short-term motives to delete potentially valuable content. On the other hand, what is the value of our content? Can we assess it meaningfully? Do we really know in which context it is used?
Scientists have pointed out that while we produce more and more digital artifacts, we fail to preserve them in a manner that will let us find and use them more than a few years into the future.
This talk will introduce you to the aspects of digital preservation, with a special look at how TYPO3 is preparing to help its users create a digital heritage.
This talk is part of the "Concise Preservation by combining Managed Forgetting and Contextualized Remembering" project, ForgetIT. The ForgetIT project is funded by the EC within the 7th Framework Programme under the objective "Digital Preservation" (GA 600826).
Foundations of Forgetting and Remembering (WP2 - ForgetIT 1st year review) - ForgetIT Project
Conceptual foundations of human and organizational remembering and forgetting in order to identify aspects of human memory and forgetting that might be helpful in the design of a digital preservation and managed forgetting system.
This document outlines a project between the Odum Institute and IQSS Dataverse team to integrate the Dataverse data repository system with iRODS, an open source data management system. The goals are to expand storage options for Dataverse, integrate curation workflows, and connect Dataverse to national research data infrastructure. A prototype will be developed to enable automated ingest of data from Dataverse to iRODS using rules and APIs. Challenges include migrating both systems to newer versions while maintaining authentication between them. An initial prototype is expected in August 2015.
The document outlines the history of building a big data platform from 2014 to 2016, starting with building a Hadoop cluster in 2014, creating the first data report page in 2015, launching products based on big data also in 2015, developing data analysis products in 2016, and making changes to the platform in 2016. It then transitions to discussing the current state of the big data platform.
The document discusses the NIF Data Federation and Concept Mapping Tool. The NIF Data Federation provides the ability to search across individually hosted neuroscience databases and datasets. It currently indexes over 232 databases containing over 358 million records. The Concept Mapping Tool was developed to manage federated resources by setting up database mappings and exporting data to Google Refine for concept mapping. The document also lists several integrated virtual databases created by NIF that combine related data from multiple sources into a single view.
iRODS is an open source data management software developed by DICE at UNC and UCSD as a follow-on to SRB. It provides a customizable, policy-driven framework for implementing data grids and managing data across heterogeneous storage resources. Key features include modularity, extensibility through microservices and rules, and interoperability with systems like HDF5, NetCDF, and storage systems through integration extensions. RENCI provides support and commercial offerings around iRODS through their E-iRODS distribution.
2016 URISA track: NHD Hydro Linked Data Registry by Michael Tinker - GIS in the Rockies
Michael Tinker presented on using ScienceBase and linked data to share hydrological event data beyond the standard USGS point event domains currently included in the National Hydrography Dataset (NHD). ScienceBase allows users to store and share hydro linked data in communities, generates web services, and honors FGDC-compliant metadata. A pilot project used ScienceBase to model a hydro linked data community for sharing events in the Lower Colorado River System beyond what is contained in the NHD. ScienceBase offers benefits like web services, metadata, and a place to store and share NHD hydro linked data with downstream applications.
This document discusses a project investigating the use of Archivematica to preserve research data long-term. The project team includes representatives from a library, archives, and IT. In phase one, the team produced reports on Archivematica and how to enhance it for research data management. Planned enhancements in phase two include improved workflows for research data, allowing access copies to work with different repositories, and reducing bottlenecks for large data. The deliverables will be enhancements to Archivematica, implementation plans, and presentations/papers.
Numerous scientific teams use the HDF5 format to store very large datasets. Efficient use of this data in a distributed environment depends on client applications being able to read any subset of the data without transferring the entire file to the local machine. The goal of the HDF5-iRODS Project was to develop an HDF5-iRODS module for the iRODS datagrid server that supported this capability, and to apply the technology to an NCSA/SDSC Strategic Applications Program (SAP) project, FLASH.
A joint team from The HDF Group (representing NCSA) and the SDSC SRB group collaborated to accomplish the project goal. The team implemented five HDF5 microservices functions on the iRODS server, and developed an iRODS FLASH slice client application. The client implementation also includes a JNI interface that allows HDFView, a standard tool for browsing HDF5 files, to access HDF5 files stored remotely in iRODS. Finally, three new collection client/server calls were added to the iRODS APIs, making it easier for users to query the content of an iRODS collection.
Efficient and effective: can we combine both to realize high-value, open, sca... - Research Data Alliance
The document discusses the INDIGO-DataCloud project, which aims to develop an open source cloud platform for computing and data management tailored for science. It seeks to address gaps in interoperability, scalability, and data handling across public and private clouds. The project defined requirements from various scientific communities and developed components implementing its architecture to provide solutions for distributed computing and data resources.
Aashish Chaudhary gave a presentation on Kitware's work with scientific computing and visualization using HDF. HDF is a widely used data format at Kitware for domains like climate modeling, geospatial visualization, and information visualization. Kitware is looking to improve HDF support for cloud and web environments to enable streaming analytics and web-based data analysis. The company also aims to further open source collaboration and scientific computing.
The document outlines Rik van den Bosch's work plan for developing the Soil Data Facility (SDF) from 2018-2020. Key aspects of the plan include developing technical specifications for the Global Soil Information System (GloSIS) and its data products from 2017-2018, building GloSIS infrastructure and populating data products from 2018-2020. It also discusses specifications for GloSIS point databases and grids, a vision document outlining GloSIS, engaging data providers, and connecting providers and users. The document proposes roles for various organizations in developing GloSIS, with SDF leading technical development, FAO hosting and developing the discovery hub, and all organizations working together.
The document discusses Esri's tools and roadmap for working with multi-dimensional (MD) scientific data in ArcGIS. It outlines Esri's efforts to directly read HDF, GRIB, and netCDF files as raster layers or feature/table views in ArcGIS. MD mosaic datasets allow users to manage variables and dimensions across multiple files and perform on-the-fly computations and visualization of MD data. New functions have been added to improve MD data analysis and visualization, including a vector field renderer to depict raster data as vectors. Esri is also working to better support OPeNDAP data sources.
This document outlines a 3-phase project to define minimum research data management infrastructure components required for EPSRC compliance. Phase 1 will develop profiles of institutional approaches to be more discoverable and comparable. Phase 2 will gather more detailed information on approaches, infrastructure needs, and propose standards. It will also identify shared service needs and challenges to compliance. Phase 3 will identify metrics for evaluating research data management service delivery and quality.
The document summarizes the DEEP-Hybrid-DataCloud project, which received EU Horizon 2020 funding. The project aims to develop intensive computing techniques and services for extremely large datasets using specialized hardware. It will implement pilot applications in deep learning, post-processing, and online data analysis. The consortium includes 9 academic and 1 industrial partner from 6 countries. The work is organized into work packages focused on applications, testbeds, accelerated computing, hybrid cloud solutions, and delivering services. The project held its kickoff meeting in January 2018 and outlined its work program and initial design phases.
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub - Björn Backeberg
This presentation was given during the Japan Geosciences Union 2019. Session details can be found at http://www.jpgu.org/meeting_e2019/SessionList_en/detail/M-GI31.htm
Harris Corporation provides geospatial software and analytics tools to access and analyze scientific data from remote sensing platforms. Their ENVI and IDL software support common data formats like HDF and NetCDF and provide capabilities for calibration, bowtie correction, reprojection, and visualization of data from sensors including GOES-16, VIIRS, and ocean and weather satellites. The tools allow scientists and analysts to efficiently process large volumes of earth observation data and extract valuable information to support applications in weather forecasting, agriculture, infrastructure monitoring, and more.
Kitware uses HDF as a widely adopted data format for scientific computing and visualization across several domains. HDF supports climate modeling, geospatial data, medical imaging, and more. Kitware is looking to improve HDF support for streaming big data, cloud computing, and web applications to enable more advanced analytics and sharing of scientific data. Future work may include pure JavaScript implementations of HDF tools and optimizing performance for cloud storage.
This document describes web-based on-demand NDVI data services that provide global NDVI imagery summaries in GeoTIFF format. The services utilize MODIS satellite data from 2000-2012 processed using GDAL utilities and a C++ program. The services include NDVImax, NDVImin, and VCI metrics. The services have been tested and provide fast access. An ongoing project is developing the VWCS service using similar technologies to also be available on a web portal.
Repositories are systems mainly used to store and publish academic content. This presentation discusses why repository contents should be published as Linked (Open) Data and how repositories can be extended to do so.
Ross King, Project Director of SCAPE, gave a short presentation of the EU-funded project SCAPE, including descriptions of tools for planning and monitoring digital preservation, scalable computation and repositories, SCAPE Testbeds, and where to learn more.
The presentation was given at the workshop ‘Preservation at Scale’ http://bit.ly/17ppAln in connection with the iPres2013 conference in Lisbon, Portugal, in September 2013.
Rainer Schmidt, AIT Austrian Institute of Technology, presented Scalable Preservation Workflows from SCAPE at the five-day ‘Digital Preservation Advanced Practitioner Training’ event (http://bit.ly/1fYCvMO), hosted by DPC, in Glasgow on 15-19 July 2013.
The presentation gives an introduction to the SCAPE Platform, presents scenarios from SCAPE Testbeds, and describes how to create scalable workflows and execute them on the SCAPE Platform.
Case Study: Implementing Hadoop and Elastic MapReduce on Scale-out Object Storage - Cloudian
This document discusses implementing Hadoop and Elastic MapReduce on Cloudian's scale-out object storage platform. It describes Cloudian's hybrid cloud storage capabilities and how their approach reduces costs and provides faster analytics by analyzing log and event data directly on their storage platform without needing to transform the data for HDFS. Key benefits highlighted include no redundant storage, scaling analytics with storage capacity by adding nodes, and taking advantage of multi-core CPUs for MapReduce tasks.
SCAPE Presentation at the Elag2013 conference in Gent/Belgium - Sven Schlarb
Presentation of the European project SCAPE (www.scape-project.eu) at the Elag2013 conference in Gent, Belgium. The presentation includes details about use cases and implementation at the Austrian National Library.
Combining logs, metrics, and traces for unified observability - Elasticsearch
Learn how Elasticsearch efficiently combines data in a single store and how Kibana is used to analyze it. Also, see how recent developments help identify and resolve operational problems faster.
Hunk - Unlocking The Power of Big Data Breakout Session - Splunk
This document discusses Splunk's Hunk product and how it allows users to analyze data stored in Hadoop using Splunk. Hunk runs natively in Hadoop using MapReduce, supports mixed mode searching that allows previewing data, and auto-deploys Splunk components to Hadoop data nodes for real-time indexing. It also provides role-based security and supports connecting to data in NoSQL databases and SQL databases through Splunk's DB Connect product.
This document provides an overview of HPE solutions for challenges in AI and big data. It discusses HPE storage solutions including aggregated storage-in-compute using NVMe devices, tiered storage using flash, disk, and object storage, and zero watt storage to reduce power usage. It also covers the Scality object storage platform and WekaIO parallel file system for all-flash environments. The document aims to illustrate how HPE technologies can provide efficient, scalable storage for challenging AI and big data workloads.
SE training: StorageGRID Webscale technical overview - solarisyougood
The document provides an overview of StorageGRID Webscale, an object storage solution from NetApp. It discusses key concepts including how StorageGRID Webscale uses a distributed architecture with different node types to provide a global object namespace and scale to support billions of objects and petabytes of storage. The document also describes how StorageGRID Webscale leverages extensive metadata and policy-driven management to intelligently distribute and tier data across storage pools.
How Open Source Will Change How You Think about Storage - LGI Tech Summit - Scott Ryan
As software eats the world, open source always follows. Storage is being fundamentally disrupted by open source. This presentation covers software-defined storage and open source trends and how they affect traditional storage.
Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar... - BigData_Europe
H2020 BigDataEurope is a flagship project of the European Union's Horizon 2020 framework programme for research and innovation. In this talk we present the Docker-based BigDataEurope platform, which integrates a variety of Big Data processing components such as Hive, Cassandra, Apache Flink and Spark. Particularly supporting the variety dimension of Big Data, it adds a semantic data processing layer, which makes it possible to ingest, map, transform and exploit semantically enriched data. We will present the innovative technical architecture as well as applications of the BigDataEurope platform for life sciences (OpenPhacts), mobility, food & agriculture, and industrial analytics (predictive maintenance). We demonstrate how societal value can be generated by Big Data analytics, e.g. making transportation networks more efficient or facilitating drug research.
Rachel Heery, Julie Allinson, Jim Downing, Christopher Gutteridge and Martin Morrey, UKOLN, University of Bath, will update attendees on a three-year UK program that is developing repository infrastructure aimed at increasing open access to scholarly material, while improving management of assets in higher education institutions. This effort is designed to ensure that the emerging network of JISC (Joint Information Services Committee) Digital Repositories is well populated with content. They will present their work towards defining a lightweight Common Repository Deposit Service Description.
Matthew Hale - Open Source at the King's Fund - Tracy Kent
The King's Fund Information and Library Service migrated from its proprietary library management system SirsiDynix Unicorn to the open source system Koha in January 2010. The migration was a natural choice given the library's philosophy of adopting open source solutions. It was completed with no downtime and involved splitting local fields and mapping data. The implementation of Koha has provided opportunities to further develop applications and embrace the open source community.
Object Storage promises many things - unlimited scalability, both in terms of capacity and file count, low-cost but highly redundant capacity, and excellent connectivity to legacy NAS. But despite these promises, object storage has not caught on in the enterprise like it has in the cloud. It seems that, for the enterprise, object storage just isn't a good fit. The problem is that most object storage systems' starting capacity is too large. And while connectivity to legacy NAS systems is available, seamless integration is not. Can object storage be sized so that it is a better fit for the enterprise?
OpenAIRE Open Innovation call: Next Generation Repositories - OpenAIRE
1) Current institutional repositories have issues with usability, interoperability, and acting primarily as silos for individual institutions' data.
2) The vision for next generation repositories is to position them as part of a globally networked infrastructure for scholarly communication, with the resources themselves, rather than the repositories, becoming the focus of services.
3) Key areas discussed for next generation repositories include improved resource discovery and content transfer using ResourceSync and Signposting, generating open usage metrics through a usage hub, and enabling annotation of content through web annotation protocols.
Archiving as a Service - A Model for the Provision of Shared Archiving Servic... - janaskhoj
The document proposes a four-layer model for providing cloud-based archiving services that enables long-term digital preservation. The model builds on the OAIS reference model and adds a preservation layer to capture preservation metadata and package digital objects early in their lifecycle. A case study on archiving challenges in the Japanese government demonstrates how the model could integrate systems and provide automated preservation functionality across agencies using a shared cloud platform and services.
DEVNET-1140 InterCloud MapReduce and Spark Workload Migration and Sharing: Fi... - Cisco DevNet
Data gravity is a reality when dealing with massive amounts of data in globally distributed systems. Processing this data requires distributed analytics processing across InterCloud. In this presentation we will share our real-world experience with storing, routing, and processing big data workloads on Cisco Cloud Services and Amazon Web Services clouds.
Application scenarios of the SCAPE project at the Austrian National Library - Sven Schlarb
An overview of the different application scenarios at the Austrian National Library related to Web Archiving and the Austrian Books Online project.
Similar to Computational Storage Services (WP7 ForgetIT 1st year review)
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency - ScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Generating privacy-protected synthetic data using Secludy and Milvus - Zilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application... - Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Essentials of Automations: Exploring Attributes & Automation Parameters - Safe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
AppSec PNW: Android and iOS Application Security with MobSF - Ajin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors - DianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers - akankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
Monitoring and Managing Anomaly Detection on OpenShift.pdf - Tosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
HCL Notes and Domino license cost reduction in the world of DLAU - panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefits it brings you. Above all, you certainly want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts in order to save money. There are also approaches that can lead to unnecessary expenses, for example when a person document is used instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to keep track. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices you can apply immediately
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an... - Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
What is an RPA CoE? Session 1 – CoE Vision - DianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Introduction of Cybersecurity with OSS at Code Europe 2024 - Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
3. Simona Rabinovici-Cohen
IBM Research - Haifa
WP 7 Presentation
Computational Storage Services
ForgetIT 1st Review Meeting, April 29-30, 2014
Kaiserslautern, Germany
4. WP Objectives
• Increase the value and outcome of preserved information over time
–Provide additional incentive for preservation
–Increase return-on-investment (ROI)
• Transform the generic storage service into a richer service with potentially higher business value and automated preservation processes
Focus of Year 1
• Build a consolidated platform for objects and computational processes (storlets) that will be defined, triggered and executed close to the data (see the sketch below)
• Utilize the open source OpenStack Swift for cloud storage
Objectives of WP and Year 1 Focus
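To make "executed close to the data" concrete, here is a minimal sketch of how a client might trigger a storlet on a Swift object download. The X-Run-Storlet header and storlet name are assumptions borrowed from the conventions of the later OpenStack Storlets project, not the WP7 API described in this deck; the endpoint, credentials, container, and object names are placeholders.

    # Hypothetical sketch: run a storlet inside the object store on GET,
    # so only the computed result crosses the WAN.
    from swiftclient import client as swift_client

    conn = swift_client.Connection(
        authurl="http://swift.example.org/auth/v1.0",  # placeholder endpoint
        user="tenant:user",                            # placeholder credentials
        key="secret",
    )

    # The (assumed) X-Run-Storlet header asks the proxy/object node to execute
    # the named storlet over the object's bytes before returning them.
    headers, body = conn.get_object(
        "photos",
        "costa-rica-2013/img001.jpg",
        headers={"X-Run-Storlet": "thumbnail-1.0.jar"},
    )

The point of the design is that only the thumbnail, not the full-size image, travels back to the client.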
5. Role in Preserve-or-Forget Architecture
6. Leveraged PDS and Storlet Engine, adding:
Adapted Preservation Engine for ForgetIT
Rules mechanism
Storlets at interface proxy servers and local object servers
Multiple programming languages for storlets
New storlets:
image transformation storlet
fixity storlet (see the sketch below)
concept detection storlet
Searchable metadata contributions to the OpenStack community
Integration with the whole ForgetIT framework
Co-chair of the LTR group in SNIA to develop SIRF
Achievements in Year 1
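As an illustration of one of the new storlets listed above, here is a minimal sketch of a fixity storlet written against a hypothetical stream-in/stream-out storlet interface; the __call__ signature, parameter name, and metadata key are illustrative assumptions, not the Storlet Engine's actual API.

    import hashlib

    class FixityStorlet:
        """Minimal fixity-storlet sketch (hypothetical interface).

        Streams the object's bytes, computes a digest close to the data,
        and returns it as metadata so later audits can detect silent
        corruption without pulling the object out of the store.
        """

        def __call__(self, in_stream, out_stream, params):
            algorithm = params.get("algorithm", "sha256")  # assumed parameter
            digest = hashlib.new(algorithm)
            for chunk in iter(lambda: in_stream.read(65536), b""):
                digest.update(chunk)
                out_stream.write(chunk)  # pass the object through unchanged
            # The engine would attach the returned metadata to the object.
            return {"fixity-" + algorithm: digest.hexdigest()}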
7. Preservation DataStores (PDS)
PDS offloads some archiving functionality to:
Decrease probability of data loss
Simplify the applications
Provide improved performance and robustness
Supports automation of archiving processes
Provides computational storage via the Storlet Engine
PDS was also the storage infrastructure of the EU research projects CASPAR and ENSURE, with partners: European Space Agency, Maccabi HMO, Tessella, Philips and more
8. PDS in OAIS Functional Model
• OAIS is the ISO standard reference model for preservation (ISO 14721:2002)
• Provides fundamental ideas, concepts and a reference model for long-term archives
• Archival Information Package (AIP) - a logical structure for the preservation object that needs to be stored to enable future interpretation
• Content Data Object (CDO) - the raw data to be preserved
[Figure: PDS positioned within the OAIS functional model, managing the AIP]
10. PDS Data Model
Hierarchical data model: Tenant > Aggregation > Docket > Object (AIP)
Flexible organization of assets in collections with varied preservation policies (gold, silver, bronze)
Aggregations support dynamic and transparent configuration of data management (see the sketch below)
[Figure: example hierarchy - tenant "Peter Stainer" with aggregation "Private Photos" (gold) holding docket "Costa Rica 2013" and aggregation "Business Photos" (silver) holding docket "Edinburgh"; tenant "Spielwarenmessen" with aggregation "Press Releases" (gold) holding docket "Toy Conference 2014"; each object (AIP) carries metadata such as aggregation=Private, aggregation=Business, aggregation=Press]
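The hierarchy can be pictured as plain nested records. The sketch below recreates the "Peter Stainer" example from the figure in Python; the class names follow the slide's terminology, while the fields and construction code are illustrative only, not the PDS data model's actual representation.

    from dataclasses import dataclass, field

    @dataclass
    class Docket:
        name: str
        objects: list = field(default_factory=list)  # AIPs in this docket

    @dataclass
    class Aggregation:
        name: str
        policy: str  # preservation policy: "gold", "silver", or "bronze"
        dockets: list = field(default_factory=list)

    @dataclass
    class Tenant:
        name: str
        aggregations: list = field(default_factory=list)

    # The example hierarchy shown in the figure:
    peter = Tenant("Peter Stainer", aggregations=[
        Aggregation("Private Photos", "gold",
                    dockets=[Docket("Costa Rica 2013")]),
        Aggregation("Business Photos", "silver",
                    dockets=[Docket("Edinburgh")]),
    ])

Because the policy lives on the aggregation, moving a docket between aggregations transparently changes how its objects are preserved.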
11. The Need for Computational Object Storage
• “Data is the new oil.” (Clive Humby)
– In its raw form, oil has little value
– Once processed and refined, it helps power the world
• Data deluge of content depots and unstructured data
– Documents, medical images, photos, videos, etc.
– The fastest growing type of storage by volume
– Object storage is ideal for this type of data
• Object storage for content depots generally:
– Utilizes large bandwidth to serve big data over the WAN
– Uses server-based storage with underutilized CPUs
• Process and refine the data where it is stored
– Create computational object storage with storlets
12. Client Value for Using Storlets
Reduce bandwidth – reduce the number of bytes transferred over the WAN (e.g. analytics storlet)
Enhance security – reduce exposure of sensitive data (e.g. de-identification storlet)
Save costs – consolidate generic functions that can be used by many applications while saving infrastructure at the client side (e.g. curation storlet)
Support compliance – monitor and document the changes to the objects and improve provenance tracking (e.g. transformation storlet)
14. Rules Mechanism
Enables automatic conditional invocation of storlets
Explicit storlet activation overrides implicit activation
Rules kept as a per-tenant editable object, with specified access control
Configured by tenant, user, role, container, object, content_type
Wildcards (“*”) allowed in a rule (high flexibility)
The first rule that matches the input is activated – prioritized list of rules
Examples (see the sketch below):
De-Identification (per Role)
Transformation (per Content Type)
Fixity (per docket)
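A minimal sketch of how such a prioritized, wildcard-capable rule list could be evaluated. The match fields (tenant, user, role, container, object, content_type) come from the slide; the rule representation, storlet names, and matching code are illustrative assumptions, not the actual rules-object format.

    import fnmatch

    # Prioritized rule list: the first rule whose fields all match wins.
    # "*" in any field matches anything.
    RULES = [
        {"tenant": "*", "user": "*", "role": "external", "container": "*",
         "object": "*", "content_type": "*", "storlet": "deidentify-1.0"},
        {"tenant": "*", "user": "*", "role": "*", "container": "*",
         "object": "*", "content_type": "image/*", "storlet": "transform-1.0"},
        {"tenant": "*", "user": "*", "role": "*", "container": "docket-*",
         "object": "*", "content_type": "*", "storlet": "fixity-1.0"},
    ]

    FIELDS = ("tenant", "user", "role", "container", "object", "content_type")

    def select_storlet(request):
        """Return the storlet named by the first matching rule, or None."""
        for rule in RULES:
            if all(fnmatch.fnmatch(request.get(f, ""), rule[f]) for f in FIELDS):
                return rule["storlet"]
        return None  # no rule matched: no implicit activation

    # Example: an editor uploads a press photo; the content-type rule
    # matches before the docket fixity rule, so "transform-1.0" is chosen.
    print(select_storlet({"tenant": "spielwarenmessen", "user": "anna",
                          "role": "editor", "container": "docket-press",
                          "object": "img.jpg", "content_type": "image/jpeg"}))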
15. Storlets at proxy node and object node
[Figure: Swift cluster topology - proxy nodes (each running the proxy service plus a Storlet Engine) behind a virtual IP and L3 switches (10GB Ethernet); L2 rack switches (1GB Ethernet) connecting an account node (SSD), a container node (SSD), and object nodes (HDD), with each Swift object node running the object service plus a Storlet Engine]
17. • Papers
• S. Rabinovici-Cohen, E. Henis, J. Marberg, K. Nagin, “Storlet Engine: Performing Computations in Cloud Storage”, to be submitted
• S. Rabinovici-Cohen, R. Cummings, S. Fineberg, “Self-contained Information Retention Format for the …”, to be submitted
• Posters
• S. Rabinovici-Cohen (IBM), M. Baker (HP), R. Cummings (Antesignanus), S. Fineberg (HP), E. Henis (IBM), "Self-contained Information Retention Format (SIRF) in ForgetIT EU Project", 6th International Systems and Storage Conference (SYSTOR), 2013
• Other Dissemination Activities
• The Storage Networking Industry Association (SNIA) published in its March 2013 newsletter that the SNIA Long Term Retention group has formed a liaison with ForgetIT
• The tutorial "Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data" has been given at several SNIA conferences
• Deliverables
• D7.1: Foundation of Computational Storage Services
• D7.2: Computational Storage Services First Release
Publications