The document discusses DataONE, a project aimed at improving data repository interoperability and advancing best practices in data lifecycle management. It focuses on enabling access to multiple external data repositories from within a HUB environment. This would allow users to aggregate and integrate disparate datasets for new analyses, and enable reproducible workflows. The goal is to address issues around scattered and dispersed data by improving discovery, integration and long-term preservation of datasets.
Presentation Title: Grand Challenges and Big Data: Implications for Public Participation in Scientific Research
Presenter: William Michener, Professor and PI/Director of DataONE, University Libraries, University of New Mexico
Presentation given at a topic meeting of the Main Library of the University of Zurich on "New Open Access topics of relevance to academic libraries", 23 July 2012
This document discusses using MapReduce and HDFS to efficiently process large remote sensing images in parallel. It provides context on the large volume of data from remote sensing (e.g. 1.2GB for a 1km resolution image with 0.5 billion pixels) and challenges of storage, transport and processing. It reviews literature on related projects processing large datasets and key concepts of HDFS for robust distributed storage and MapReduce for parallel processing. Finally, it outlines a planned approach involving initial simple algorithms and expanding to more complex spatial and temporal processing.
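The storage figure quoted above is easy to sanity-check: 0.5 billion pixels at roughly 2.4 bytes per pixel gives 1.2 GB. A minimal sketch (plain Python, my own illustration rather than anything from the slides):

```python
# Back-of-envelope check of the remote sensing image size mentioned above.
# Assumption (not stated in the slides): bytes-per-pixel is a free parameter.

def image_size_gb(pixels: float, bytes_per_pixel: float) -> float:
    """Return raw image size in gigabytes (10**9 bytes)."""
    return pixels * bytes_per_pixel / 1e9

# 0.5 billion pixels at ~2.4 bytes/pixel reproduces the quoted 1.2 GB.
size = image_size_gb(0.5e9, 2.4)
```

At higher resolutions or with multiple spectral bands the same formula quickly reaches terabyte scale, which is what motivates splitting such images across HDFS blocks for parallel processing.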
The document outlines an agenda for a computing facilities event, including a morning session on high performance computing and NeSI with demonstrations, and an afternoon session featuring a local researcher case study and group discussions on how NeSI can benefit projects. There will also be representatives available to discuss the Science DMZ and research identity federation.
Are cloud based virtual labs cost effective? (CSEDU 2012), Nane Kratzke
Cost efficiency is an often-cited strength of cloud computing. In times of shrinking educational budgets, virtual labs provided via cloud computing might be an interesting alternative for higher-education organizations or IT training facilities. This contribution analyzes the cost advantage of virtual educational labs provided by cloud computing means and compares these costs with those of classical educational labs provided in a dedicated manner. It develops a four-step decision-making model that may interest colleges, universities, and other IT training facilities planning to implement cloud-based training facilities, and it presents findings on when cloud computing has economic advantages in education and when it does not. The four-step decision-making model, being of general IaaS applicability, can be used to determine whether an IaaS cloud-based virtual IT lab approach is more cost efficient than a dedicated approach.
The document discusses the Wf4Ever project, which aims to create a technological infrastructure for preserving and enabling efficient retrieval and reuse of scientific workflows across disciplines. The project will develop complex research objects that account for the static and dynamic nature of workflows. It will also semantically archive workflows and associated materials to allow for advanced search and recommendation. The project aims to support scientific communities in collaboratively sharing, reusing, and evolving workflows. Key challenges include ensuring quality, preservation, sharing/reuse, classification of workflows and associated resources.
This document summarizes key aspects of computational research methods and the myExperiment platform. It discusses how myExperiment allows researchers to automate, share, and reuse workflows and other methods. It also addresses challenges around reproducibility, provenance, collaboration, and incentives for sharing methods. MyExperiment provides social features and aims to build a community around openly exchanging and improving computational research techniques.
Tim Malthus, Towards standards for the exchange of field spectral datasets (TERN Australia)
This document discusses the development of standards for the exchange of field spectral datasets. It notes the importance of metadata for determining the quality and representativeness of spectral data obtained in the field. A workshop was held in 2012 to discuss best practices for data collection and exchange and key conclusions included the need for standards to facilitate accurate comparison across studies and the role of thorough metadata. Work is ongoing to enhance the SPECCHIO system for hosting spectral libraries and metadata and establishing it as the international tool for storage and exchange of spectral datasets.
The explosion of data creation across all scholarly disciplines necessitates corresponding efforts to create new solutions for its management and use. Ever-growing repositories and datasets within require organization, identification, description, publication, discovery, citation, preservation, and curation to allow these materials to realize their potential in support of data-driven, often interdisciplinary research. What infrastructures and technical environments are required for this work? Can new approaches, specifications, standards and best practices be created? Are there partnerships and collaborations that exist or can be pursued? This webinar, Part 2 of a two-part NISO series on data, will explore these and other questions
Publication and long term archival of observational data in the field of environmental sciences is a challenging topic of today's eScience research. The amount of effort that goes into technical and scientific quality assurance prior to publication is considerable and might well turn out to be a barrier to data publication. Our project's goal is to lower the amount of manual effort and, at the same time, increase data quality in the process of submitting observational data for publication – in this case meteorological observational data. This goal is divided into the following subgoals:
Establish a standard procedure for the publication of observational data in the area of meteorology including quality information.
Develop a workflow system for the automation of the publication process.
Make the procedure usable for environmental sciences in general.
Integrate the procedure into an existing central data repository for meteorology (the CERA database at the World Data Center for Climate).
This talk is about the current state of the project from an eResearch and technical point of view.
Big Data and Advanced Data Intensive Computing, Jongwook Woo
MapReduce does not work well for real-time processing or iterative algorithms, which are common in machine learning and graph analytics. This slide deck presents Spark, Giraph, and Hadoop use cases in science rather than in business.
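The iteration problem can be made concrete with a toy example. The sketch below (plain Python, not actual Spark or Hadoop code) runs a PageRank-style loop; in MapReduce each pass would be a separate job that rereads its input from disk, whereas Spark keeps the working set (`ranks`) cached in memory between passes.

```python
# Toy illustration of an iterative graph algorithm (classic PageRank update).
# The same dataset is refined 10 times; this repeated reuse is what Spark's
# in-memory caching accelerates and what per-job disk I/O makes slow in
# MapReduce.  The three-page graph is illustrative only.

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {page: 1.0 for page in links}

for _ in range(10):                      # 10 passes over the SAME data
    contribs = {page: 0.0 for page in links}
    for page, outs in links.items():
        share = ranks[page] / len(outs)  # each page splits its rank evenly
        for out in outs:
            contribs[out] += share
    # damping factor 0.85, as in the standard PageRank formulation
    ranks = {p: 0.15 + 0.85 * c for p, c in contribs.items()}
```

Giraph avoids the same overhead differently, by keeping vertex state resident across supersteps in its bulk-synchronous-parallel model.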
Paper ref:
Missier, Paolo, Bertram Ludascher, Saumen Dey, M. Wang, Timothy McPhillips, Shawn Bowers, and Michael Agun. "GoldenTrail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository." In Procs. 7th International Digital Curation Conference (IDCC), 2011.
Introduction to Research Data Management for postgraduate students, Marieke Guy
The document provides an introduction to research data management for postgraduate students, outlining what research data is, the research process, what research data management involves and why it is important, and how students can start thinking about good research data management practices. It discusses defining and organizing data, storage and security, and maintaining findable and understandable data throughout the research lifecycle. The goal is to explain the importance of research data management and the roles students play in effective data management.
Cloud Economics in Training and Simulation, Nane Kratzke
This document discusses a presentation on cloud economics in training and simulation. It begins with defining cloud computing and outlining its essential characteristics like on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. Some postulated use cases for cloud computing are then discussed, including training and education. Real-world data is then presented from a course that utilized Amazon Web Services, analyzing costs, cost drivers, and server usage. The findings provide insights into the economics of educational cloud usage.
1. The document discusses the challenges of widespread adoption of e-research technologies by everyday researchers. While early adopters found success, most researchers are not using the infrastructure services that have been created.
2. It argues that repositories and other e-research tools need to focus on the needs and perspectives of researchers. Researchers work with data, so tools should emphasize data sharing and metadata. They should also support collaboration and open participation in the scientific process.
3. For technologies to truly enable new forms of research, their use needs to become integrated into the everyday work of all researchers, not just a specialized few. Systems must be easy to use, empower researchers' autonomy, and intersect seamlessly with digital and physical environments.
White Paper: Hadoop in Life Sciences — An Introduction (EMC)
This White Paper reviews the Apache Hadoop technology, its components — MapReduce and Hadoop Distributed File System — and its adoption in the life sciences with an example in Genomics data analysis.
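A genomics workload that maps naturally onto this model is k-mer counting. The sketch below (plain Python standing in for Hadoop streaming mapper and reducer scripts; the reads and the value of k are illustrative, not taken from the white paper) shows the two phases:

```python
from collections import defaultdict

# Illustrative MapReduce pattern for k-mer counting, a common Hadoop
# workload in genomics.  In a real cluster, mapper and reducer would be
# separate processes and Hadoop would shuffle/group keys between them.

def mapper(read: str, k: int = 3):
    """Emit (kmer, 1) for every k-length substring of a sequencing read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1

def reducer(pairs):
    """Sum the counts per k-mer (the grouping Hadoop performs by key)."""
    counts = defaultdict(int)
    for kmer, n in pairs:
        counts[kmer] += n
    return dict(counts)

reads = ["GATTACA", "TACAGAT"]   # toy reads, not real sequencing output
counts = reducer(pair for read in reads for pair in mapper(read))
```

Because each read is processed independently in the map phase, the work parallelizes across however many HDFS blocks hold the input.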
This document contains 100 frequently asked questions about financial concepts. Most of the questions have clear answers, though some are open to nuance. The aim is to help the reader recall, clarify, and discuss useful concepts in finance such as cash flows, book value, value creation, discount rates, valuation of companies and divisions, and capital structure. Each question has a brief answer at the end.
This document proposes a unified framework for language teaching in schools (L1, L2, and foreign languages) based on four pillars: 1) competency- and task-based teaching, 2) an integrated curriculum, 3) cooperative and interactive learning, and 4) the use of ICT. The aim is to overcome the isolation of language subjects and enable the inclusion of all students.
Côte d'Ivoire ranks 167 out of 183 economies on the ease of doing business. Starting a business and dealing with construction permits are particularly difficult, taking over a month and over 500 days respectively. While getting credit has relatively strong legal rights, the depth of credit information and coverage of public registries are low compared to the best performers globally.
This document describes the evolution of administrative thought from its origins to the classical school. It began with practical administration in organizations and later developed as a field of study influenced by the Industrial Revolution and commercialism. The scientific management and classical theory schools dominated the twentieth century, the former focusing on worker efficiency and the latter on organizational structure. Both sought to apply scientific methods to improve administration.
ESI Supplemental Webinar 2 - DataONE presentation slides (DuraSpace)
This document provides an overview of a webinar on DataONE, a project that aims to provide tools and approaches for supporting the data life cycle. The webinar covered three key challenges in data management: preservation and planning, discovery, and innovation. It discussed how DataONE is working to address these challenges through its coordinated network of member nodes that allow for data preservation, sharing and discovery. The webinar also demonstrated some of DataONE's tools like the DMPTool for data management planning and the Investigator Toolkit for data analysis and visualization.
Scientific discovery and innovation in an era of data-intensive science
William (Bill) Michener, Professor and Director of e-Science Initiatives for University Libraries, University of New Mexico; DataONE Principal Investigator
The scope and nature of biological, environmental and earth sciences research are evolving rapidly in response to environmental challenges such as global climate change, invasive species and emergent diseases. Scientific studies are increasingly focusing on long-term, broad-scale, and complex questions that require massive amounts of diverse data collected by remote sensing platforms and embedded environmental sensor networks; collaborative, interdisciplinary science teams; and new tools that promote scientific data preservation, discovery, and innovation. This talk describes the challenges facing scientists as they transition into this new era of data intensive science, presents current solutions, and lays out a roadmap to the future where new information technologies significantly increase the pace of scientific discovery and innovation.
This document discusses research objects as a framework for facilitating the exchange and reuse of digital knowledge. Research objects are defined as semantically rich aggregations of resources that support a research objective. They allow for workflows, data, documents and other resources to be bundled together and shared. The document outlines several motivating projects, challenges in developing research object models and vocabularies, and a vision for how research objects could allow research to be more efficient, effective and ethical through increased reuse of digital knowledge.
This document summarizes Rob Grim's presentation on e-Science, research data, and the role of libraries. It discusses the Open Data Foundation's work in promoting metadata standards like DDI and SDMX. It also outlines the research data lifecycle and how metadata management can help libraries support research through services like data registration, archiving, discovery and access. Finally, it provides examples of how Tilburg University library supports research data through services aligned with data availability, discovery, access and delivery.
This document discusses Juan de Dios Santander Vela's work on the Wf4Ever project to preserve scientific workflows. The Wf4Ever project aims to develop technological infrastructure for preserving, retrieving, and reusing scientific workflows across disciplines. Mr. Santander Vela has worked on making radio astronomy archives and tools interoperable with the Virtual Observatory and is now applying his expertise to the Wf4Ever project goals of archiving, classifying, indexing, and providing access to scientific workflows and materials in semantic repositories. Preserving workflows is important for astronomy research as it allows experiments to be reproduced, repeated, reused, re-purposed, and collaborated on.
Stuart Phinn, Many kinds of infrastructure: resolving and advancing ecosystem ... (TERN Australia)
This document discusses infrastructure for ecosystem science in Australia. It begins by outlining the multi-disciplinary nature of ecosystem science and challenges in funding infrastructure to support data collection, storage, analysis and sharing across disciplines. It promotes a collaborative approach through the TERN network to establish shared infrastructure and standards. Examples are given of coordinated data collection, processing, storage and analysis projects enabled by TERN. The document argues that infrastructure like TERN improves the efficiency and effectiveness of ecosystem science in Australia.
Libby Bishop, Ethics of Data Sharing, NCeSS, June 2009 (final), a.carusi
This document summarizes an overview of ethical frameworks for sharing and reusing qualitative research data presented at a workshop. It discusses the role of archives in facilitating ethical data sharing and building trust. Formal procedures for sharing confidential research data, such as obtaining informed consent and restricting access, are described. The need to consider duties to others beyond direct research participants in the ethical debate is also highlighted.
The Digital Curation Centre was created to help build skills and capabilities around research data management in UK higher education by providing support and guidance to address challenges that individual institutions cannot tackle alone. The document discusses why managing research data has become important due to factors like large datasets, funder requirements, and the need for open science. It also examines some of the challenges around issues like scale, infrastructure needs, policies, and developing skills and incentives around data management.
This document discusses the need to make research data more discoverable and usable by connecting disparate data through metadata. Currently, the majority of research data is stored in isolated locations like personal hard drives, resulting in lost opportunities for analysis across experiments. The document advocates for culture change where researchers curate and share their data in centralized repositories to enable new insights from aggregating and comparing data in connected ways. This would help address challenges like variability between specimens and complexity in living systems that reductionist approaches cannot capture alone. Ensuring long-term sustainability of data repositories and defining roles for libraries and institutions are also discussed.
On demand access to Big Data through Semantic Technologies, Peter Haase
The document discusses enabling on-demand access to big data through semantic technologies. It describes how semantic technologies like Linked Data and ontologies can be used to virtually integrate and provide access to large, heterogeneous datasets across different data silos. The key points are that semantic technologies allow for big data to be accessed and analyzed on-demand in a self-service manner through a "Linked Data as a Service" approach, providing scalable end user access to big data.
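In practice, "Linked Data as a Service" access of this kind means end users pose SPARQL-style triple-pattern queries against a virtual graph. The sketch below (plain Python over an in-memory list; all identifiers are hypothetical) mimics the basic pattern-matching step:

```python
# Toy triple store (illustrative only; a real deployment would expose a
# SPARQL endpoint over virtualized data sources, not an in-memory list).
triples = [
    ("ex:sensor1", "ex:locatedIn", "ex:Zurich"),
    ("ex:sensor1", "ex:measures", "ex:Temperature"),
    ("ex:sensor2", "ex:locatedIn", "ex:Berlin"),
]

def match(pattern):
    """Return triples matching a pattern; None acts as a SPARQL variable."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Which subjects are located in Zurich?"
# (roughly: SELECT ?s WHERE { ?s ex:locatedIn ex:Zurich })
hits = match((None, "ex:locatedIn", "ex:Zurich"))
```

An ontology layer adds value on top of this matching by rewriting user-level terms into the source schemas, so the same query can span heterogeneous silos.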
This document discusses the intersection of machine learning and search-based software engineering (ML & SBSE). It provides examples of how data miners can find signals in software engineering artifacts using machine learning techniques. It then argues that better algorithms do not yet necessarily lead to better mining, and emphasizes the importance of sharing data, models, and analysis methods. Finally, it outlines a vision for "discussion mining" to guide teams in walking across the space of local models, with the goal of building a science of localism in ML and SBSE.
The document discusses the Neuroscience Information Framework (NIF), which aims to provide a portal for finding and utilizing web-based neuroscience resources. NIF provides a consistent framework for describing various resources like databases, literature, and images. It allows simultaneous searches across these different data types and is supported by neuroscience ontologies. NIF currently catalogs over 5,000 resources and is working to integrate these diverse data sources to help answer questions and discover gaps in our knowledge about the brain.
Supporting Libraries in Leading the Way in Research Data Management, Marieke Guy
Marieke Guy, Institutional Support Officer, Digital Curation Centre, UKOLN, University of Bath, UK presents on Supporting Libraries in Leading the Way in Research Data Management at Online Information, London 20th -21st November 2012
The document discusses the ISA (Investigation/Study/Assay) framework for enabling data reuse and reproducibility in bioscience research. The ISA framework provides a generic format for rich experimental descriptions and an infrastructure of open source software tools. It aims to minimize the burden of reporting, curating, sharing data and metadata from bioscience experiments to enable comprehension, reuse of data, and reproducibility. The framework promotes community engagement to develop community standards and document use cases.
Stuart Phinn and Andy Lowe: TERN's national ecosystem data infrastructure is d... (TERN Australia)
This presentation outlines how Australia's ecosystem research network TERN can improve ecosystem science and management through long-term data collection and sharing. It discusses the need for sustained ecosystem data infrastructure to address challenges like how ecosystems are changing over time. TERN aims to build a collaborative network where data publication and reuse is standard practice. This will allow large-scale, coordinated data collection and analysis across disciplines. Sustaining long-term essential data collection, modeling, and synthesis through TERN can better inform decision-making and implement evidence-based environmental policy.
If Big Data is data that exceeds the processing capacity of conventional systems, thereby necessitating alternative processing measures, we are looking at an essentially technological challenge that IT managers are best equipped to address.
The DCC is currently working with 18 HEIs to support and develop their capabilities in managing research data. While the aforementioned challenge is not usually core to their expressed concerns, are there particular issues of curation inherent to Big Data that might force a different perspective?
We have some understanding of Big Data from our contacts in the Astronomy and High Energy Physics domains, and the scale and speed of development in Genomics data generation is well known, but the inability to provide sufficient processing capacity is not one of their more frequent complaints.
That’s not to say that Big Science and its Big Data are free of challenges in data curation; only that they are shared with their lesser cousins, where one might say that the real challenge is less one of size than diversity and complexity.
This brief presentation explores those aspects of data curation that go beyond the challenges of processing power but which may lend a broader perspective to the technology selection process.
Curation of scientific data: Challenges for repositories (Chris Rusbridge)
This document discusses challenges related to curating scientific data in repositories. It notes that data is increasingly important as evidence and for verifying scientific results. However, data loses meaning without proper context and curation beginning in the research workflow. The document examines issues like data formats, metadata, access and reuse, citation, and technological challenges for repositories in dealing with diverse data. It also explores who performs data curation roles like individuals, institutions, communities, publishers and national services.
DataONE_cobb_hubbub2012_20120924_v05
1. DataONE: An interoperable data repositories case study
John W. Cobb, R&D Staff and DataONE Leadership Team Member, Oak Ridge National Laboratory
HUBbub 2012, the HUBzero conference, Indianapolis, IN, 24 September 2012
2. Acknowledgment:
• Authorship: This talk represents work of the entire DataONE extended team.
• It especially draws upon slide material from:
• Bill Michener, UNM (esp. recent DataONE AHM, Sept. 18, 2012)
• Amber Budden, DataONE Assoc. Dir. for CE
• DataONE is an NSF supported project (OCI-0830944)
3. Hubs and data repositories
• A personal view (apologies for a possibly mis-informed speaker)
• HUB roots (history and pre-history):
• PUNCH: web portal for running tools (DOI: 10.1109/40.846308)
• -> NanoHUB: application orchestration environment
• + RAPPTURE: rapid application porting and development
• + Frameless VNC windows: seamless hosted environment on clients!
• + Rich collaborative environment and rich user experience!! ("wishlist")
• Repurpose: HUBzero -> hubs explode (ex. NEEShub, a critical advantage for the largest research award in Purdue history)
• Now (and in the recent past), the turn to Hub+Data integration; some successes already
• Opportunity: richer interactions between HUBs and multiple data repositories
• Perhaps, for example: enable multi-project collaboration within PURR?
• Or: integrate NEES DBs with SCEC simulations and IRIS waveforms?
4. Multiple data repository access?
• HUB + database exists
• HUB + external data repository access is a known use case
• But ... what if?
• Access multiple (possibly external) repositories from within a HUB environment?
• Access multiple external repositories with similar data? Say, aggregate all data from state hydrologists? Cf. driNET, http://drinet.hubzero.org
• Integrate disparate data sets for new and novel analysis. Recall Noshir Contractor's comments this morning: teaming and interdisciplinary work has increased impact (Wuchty, Jones, Uzzi)
• Enable reproducible analysis and synthesis via an automated workflow to create synthetic data products
• Programmatic access
• More integration (more than just raw search terms a la Google)
• ...
• What do you want to discover today? (to paraphrase Microsoft)
5. DataONE motivation
• DataONE is a project to address these issues
• Build (assemble/aggregate) data repository interoperability
• Advance the state of the practice in data lifecycle management:
• Planning
• Deposition
• Metadata generation
• Semantic integration
• Workflow and provenance
• Analysis
• Synthesis
• Focus on a broad science area
• Deploy a working CI and grow it
• DataONE: Data Observation Network for Earth
(Diagram: the data lifecycle - Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze)
7. Pressing issues for the digital data lifecycle
(Diagram: the data lifecycle - Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze)
8. Multiple data sources: mutually reinforcing
(Diagram: ordered by increasing process knowledge and decreasing spatial coverage - intensive science sites and experiments; extensive science sites; volunteer & education networks; remote sensing. Adapted from CENR-OSTP.)
9. Scattered data sources: "finding the needle in the haystack"
Data are massively dispersed:
• Ecological field stations and research centers (100s)
• Natural history museums and biocollection facilities (100s)
• Agency data collections (100s to 1000s)
• Individual scientists (1000s to 10,000s to 100,000s)
11. Preservation: poor data practice, "data entropy"
(Figure: information content declining over time - specific details fade after the time of publication, then general details, with further losses at retirement or career change, accident, and death. Michener et al. 1997.)
12. Preservation: data longevity
Study / resource type / half-life:
• Rumsey (2002): legal citations, 1.4 years
• Harter and Kim (1996): scholarly article citations, 1.5 years
• Koehler (1999 and 2002): random web pages, 2.0 years
• Spinellis (2003): computer science citations, 4.0 years
• Markwell and Brooks (2002): biological science education resources, 4.6 years
• Nelson and Allen (2002): digital library objects, 24.5 years
Source: Koehler, W. (2004) Information Research 9(2): 174.
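Since a half-life implies exponential decay, the figures above can be turned into rough survival estimates. A minimal sketch, assuming a simple exponential model; the function name and the ten-year horizon are illustrative, not from the slide:

```python
def surviving_fraction(years: float, half_life_years: float) -> float:
    """Fraction of resources still resolvable after `years`,
    assuming simple exponential decay with the given half-life."""
    return 0.5 ** (years / half_life_years)

# Two half-lives from the table above (years).
half_lives = {
    "Random web pages (Koehler)": 2.0,
    "Digital library objects (Nelson and Allen)": 24.5,
}

for name, hl in half_lives.items():
    print(f"{name}: {surviving_fraction(10, hl):.1%} survive a decade")
```

Under this model a random web page has about a 3% chance of surviving ten years, while a curated digital library object survives with about 75% probability, which is the slide's point about curation in one number.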
13. The Long Tail of Orphan Data
"Most of the bytes are at the high end, but most of the datasets are at the low end" (Jim Gray)
(Figure: volume vs. rank frequency of datatype - specialized repositories (e.g. GenBank, PDB) at the high end ("the ultra-violet divergence"), orphan data in the long tail ("the infrared catastrophe", B. Heidorn).)
14. Data deluge and interoperability: "the flood of increasingly heterogeneous data"
Data are heterogeneous in:
• Syntax (format)
• Schema (model)
• Semantics (meaning)
(Jones et al. 2007)
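The three levels of heterogeneity can be made concrete with a toy harmonization step. Everything below is hypothetical, invented only to illustrate syntax (CSV vs. JSON), schema (different field names), and semantics (Fahrenheit vs. Celsius) in one place:

```python
import csv
import io
import json

# Two hypothetical repository records for the same observation.
site_a_csv = "species,temp_f\nPasserina cyanea,68.0\n"          # Fahrenheit
site_b_json = '{"taxon": "Passerina cyanea", "temp_c": 20.0}'   # Celsius

def to_common(record: dict) -> dict:
    """Map either schema onto one common model with Celsius temperatures."""
    if "temp_f" in record:  # site A's schema and units
        return {"species": record["species"],
                "temp_c": (float(record["temp_f"]) - 32) * 5 / 9}
    return {"species": record["taxon"], "temp_c": record["temp_c"]}

rows = list(csv.DictReader(io.StringIO(site_a_csv)))   # syntax: parse CSV
records = [to_common(rows[0]),                          # schema + semantics
           to_common(json.loads(site_b_json))]          # syntax: parse JSON
print(records)
```

Real interoperability work pushes the `to_common` step out of ad hoc code and into shared metadata and semantics, which is exactly the gap DataONE targets.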
15. Metadata universe (multi-verse)
• There are a multitude of metadata standards
• Discipline and sub-discipline specific
• Each with different terms and context
Source: Jenn Riley, Indiana U. Digital Librarian, http://www.dlib.indiana.edu/~jenlrile/metadatamap/ via John Kunze, Cal. Dig. Lib.
16. Each dot is its own standard! "...billions and billions of worlds..." (Carl Sagan)
17. DataONE CI architectural elements
• Hard-core cyberinfrastructure (CI):
• CI Member Node (MN) data repositories
• Coordinating Node (CN) global metadata repositories
• Simple but powerful REST API/SPI for universal access
• Investigator Toolkit (ITK): software tools to allow access to the data repository collective via familiar access idioms
• Cultural and wetware issues:
• Best practices
• Educational materials
• Workshops and tutorials
• Surveys and assessments
• Scientist, policymaker, citizen engagement
• Collaboration, governance, and sustainability
(Diagram: the Investigator Toolkit (web interface; analysis and visualization; data management; Java, Python, and command-line client libraries) sits atop Member Nodes (service interfaces for resolution and replication, a bridge to non-DataONE member node services, and a data repository object store) and Coordinating Nodes (service interfaces for discovery and registration, plus a coordination layer of identifiers, catalog, preservation, monitoring, and indexing).)
http://mule1.dataone.org/ArchitectureDocs-current/
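A client-side sketch of how that REST API might be addressed. The base URL and the resolve/query endpoint paths are assumptions modeled on the DataONE architecture documentation linked above, not something stated on the slide; the sketch only constructs request URLs, it does not contact the network:

```python
from urllib.parse import quote, urlencode

# Assumed Coordinating Node base URL (illustrative, check the arch docs).
CN_BASE = "https://cn.dataone.org/cn/v2"

def resolve_url(pid: str) -> str:
    """URL asking a Coordinating Node which Member Nodes hold object `pid`."""
    return f"{CN_BASE}/resolve/{quote(pid, safe='')}"

def search_url(query: str, rows: int = 10) -> str:
    """URL for a federated metadata search across all Member Nodes."""
    return f"{CN_BASE}/query/solr/?{urlencode({'q': query, 'rows': rows})}"

print(resolve_url("doi:10.5063/EXAMPLE"))   # hypothetical identifier
print(search_url("abstract:migration"))
```

The point of the unified API/SPI is that the same two calls work regardless of which Member Node actually stores the bytes; resolution and search are Coordinating Node services.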
19. Key cyberinfrastructure elements
• Unique identifiers
• Search and deliver
• Replication
• Federated identity
Usable by people and their agents.
20. Supporting the data lifecycle
(Diagram: ORC, UCSB, and UNM nodes.)
The data lifecycle:
1. Deposition/acquisition/ingest
2. Curation and metadata management
3. Protection, including privacy
4. Discovery, access, use, and dissemination
5. Interoperability, standards, and integration
6. Evaluation, analysis, and visualization
21. DataONE supports data preservation
Three major components for a flexible, scalable, sustainable network:
• Member Nodes: diverse institutions; serve the local community; provide resources for managing their data; retain copies of data
• Coordinating Nodes: retain the complete metadata catalog; indexing for search; network-wide services; ensure content availability (preservation); replication services
• Investigator Toolkit
22. DataONE satisfies architecture requirements
• Enables integration of multiple geographically diverse and metadata-diverse repositories
• Presents collective search results across multiple repositories
• Provides a unified API/SPI for search and programmatic interface: http://mule1.dataone.org/ArchitectureDocs-current/
• DataONE content has unique identifiers (DOIs) for referencable/citable data objects
• Supports both large datasets and the long tails
23. DataONE spurs innovation
• Enables new analysis and synthesis efforts by integrating tasks across repositories
• Provides means for data replication and a basis for repositories to build "data wills" or "data trust" plans
• Provides a platform to develop advanced interoperable workflow tools and semantic integration tools
25. DataONE: supporting scientific data preservation, discovery, and innovation
(Logo wall: current member nodes, member nodes coming soon, current tools, and tools coming soon; includes Queensland University of Technology.)
29. Plans per template (as of June 2012)
Approximate number of plans per template; templates of greatest interest to the DataONE community in red; 2,302 unique users to date.
(Bar chart: per-template plan counts of 339, 287, 197, 159, 133, 133, 124, 101, 71, 65, 60, 46, 37, 36, 34, 17, 15, and 6.)
http://dmptool.org
30. ✔ Check for best practices ✔ Create metadata ✔ Connect to ONEShare
Data & metadata (EML)
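A hedged sketch of the "create metadata" step above: generating a minimal EML-style record. The element names follow the EML schema only loosely; a real record needs the full schema, namespaces, and persistent identifiers:

```python
import xml.etree.ElementTree as ET

def minimal_eml(title: str, creator_surname: str) -> str:
    """Build a skeletal EML-flavored metadata document as a string.

    Illustrative only: real EML requires the eml namespace, a packageId,
    and many more required elements than are shown here.
    """
    eml = ET.Element("eml")
    dataset = ET.SubElement(eml, "dataset")
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    individual = ET.SubElement(creator, "individualName")
    ET.SubElement(individual, "surName").text = creator_surname
    return ET.tostring(eml, encoding="unicode")

print(minimal_eml("Indigo Bunting occurrences, 2008", "Cobb"))
```

Tooling like the checklist above exists precisely so that researchers do not hand-assemble records like this; the sketch just shows what the artifact being created looks like structurally.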
37. Investigator Toolkit support
(Diagram: the data lifecycle - Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze - with DMP-Tool supporting the Plan step and Kepler supporting Analyze.)
38. Exploration, visualization, and analysis
• Diverse bird observations and environmental data from 300,000 locations in the US, integrated and analyzed using high performance computing resources
• Inputs: land cover, meteorology, and MODIS remote sensing data
• Model results: occurrence of Indigo Bunting (2008), shown for Jan, Apr, Jun, Sep, and Dec
• The Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration
• Examine patterns of migration
• Infer how climate change may affect bird migration
39. Public Participation in Scientific Research Conference: 4-5 August 2012 in Portland, Oregon, USA, prior to the Ecological Society of America meeting (6-10 Aug.): http://www.birds.cornell.edu/citscitoolkit/conference/2012
40. User assessments
(Chart: baseline (BL) and follow-up (FU) assessments over Years 1-5 for scientists, library policies, librarians, policy makers, and educators.)
41. What standard do you currently use?
(Chart: responses by metadata language across DIF, DwC, DC, EML, FGDC, OpenGIS, ISO, My Lab, and none; reported counts include 676, 266, 97, 96, 95, 95, 26, 21, and 12.)
42. Many are interested in sharing data (percent agree):
• Willing to share data across a broad group of researchers: 81%
• Willing to place at least some of my data into a central data repository with no restrictions: 78%
• Appropriate to create new datasets from shared data: 76%
• Willing to place all of my data into a central data repository with no restrictions: 41%
43. Data Service User Matrix
(Diagram: users - modeler, scientist, resource manager, ecological data librarians - mapped against services: Investigator ToolKit, data management planning, best practices, tools database, training curricula.)
47. DataONE: next steps
• Member node growth:
• Number of member nodes
• Increase the number and size of data sets
• Sustainably, both in terms of resource needs from MNs and in terms of resource demands on DataONE
• New Investigator Toolkit tools (strategically)
• An increasing number of science use cases, with more breakthrough science
• Also, re-purposing DataONE CI outside of Bio/Eco/Env areas in strategic collaborative partnerships
48. Ack: DataONE team and sponsors
Amber Budden, Roger Dahl, Rebecca Koskela, Bill Michener, Robert Nahf, Skye Roseboom, Mark Servilla, Dave Vieglais; Suzie Allard, Nick Dexter, Kimberly Douglass, Carol Tenopir, Robert Waltz, Bruce Wilson; John Cobb, Bob Cook, Ranjeet Devarakonda, Giri Palanismy, Line Pouchard; Patricia Cruse, John Kunze; Sky Bristol, Mike Frame, Richard Huffine, Viv Hutchison, Jeff Morisette, Jake Weltzin, Lisa Zolly; Stephanie Hampton, Chris Jones, Matt Jones, Ben Leinfelder, Andrew Pippin; Paul Allen, Rick Bonney, Steve Kelling; Ryan Scherle, Todd Vision; Ewa Deelman; Deborah McGuinness; Jeff Horsburgh; Robert Sandusky; Bertram Ludaescher; Peter Honeyman; Cliff Duke; Carole Goble; Donald Hobern; Randy Butler; David DeRoure
LEON LEVY FOUNDATION
49. Questions? Contact points:
John W. Cobb, Ph.D., Oak Ridge National Lab
cobbjw@ornl.gov, 865.576.5439
http://www.dataone.org/
http://docs.dataone.org