Ilya Zaslavsky, David Valentine, Amarnath Gupta, Stephen Richard, Tanu Malik
Presentation given in the afternoon Architecture Forum Session on Day 1, June 24 at the EarthCube All-Hands Meeting
Data, Data Everywhere: What's A Publisher to Do? - Anita de Waard
The document discusses publishers' roles in data sharing and challenges in open science. It notes that while most scientists agree access to others' data would benefit research, fewer are willing to share their own data due to lack of training and incentives. Publishers are working to establish data sharing guidelines and integrate platforms to store, share, and analyze research data and tools. However, many questions remain around publishing data science given distributed and interconnected data, tools, and knowledge networks. Publishers will need to transition from pipelines to platforms and enable these new network effects.
This document outlines recommendations from a project investigating institutional data management at a UK university. It finds that while some data management capabilities exist, practices are largely ad hoc with significant variation between departments. Researchers desire more storage and backup support. Recommendations include developing a university-wide data repository, comprehensive backup services, research data lifecycle training, and embedding exemplary practices. Pilot projects in archaeology and chemistry aim to test training and metadata frameworks. A sustainable business model is needed to provide coherent, affordable data management support across all disciplines over the long term.
Big Data As a service - Sethuonline.com | Sathyabama University Chennai - Sethuraman R
An Efficient Framework for Data As A Service in the Hadoop Ecosystem.
R. Sethuraman, M.E., (PhD),
Assistant Professor,
Faculty of Computing,
Dept of Computer Science Engineering,
Sathyabama University
http://Sethuonline.com
The NSF DataNet Program aims to create exemplar data infrastructure organizations called DataNet Partners to provide researchers with access to data and advance research. SEAD is one such DataNet Partner that provides lightweight data services for sustainability science. It acts as an active content repository and curation service, and is developing tools for community exploration of data. The current focus is on an end-user workshop, conference demonstrations, and interface redesign to refine models for supporting the full lifecycle of research data objects.
Clinical Decision Support Systems (CDSS) were explicitly introduced in the 1990s with the aim of providing knowledge to clinicians in order to influence their decisions and, therefore, improve patients' health care. There are different architectural approaches for implementing CDSS. Some of these approaches are based on cloud computing, which provides on-demand computing resources over the internet. The goal of this paper is to determine and discuss key issues and approaches involving architectural designs in implementing a CDSS using cloud computing. To this end, we performed a standard Systematic Literature Review (SLR) of primary studies showing the use of cloud computing in CDSS implementations. Twenty-one primary studies were reviewed. We found that CDSS architectural components are similar in most of the studies. Cloud-based CDSS are most used in Home Healthcare and Emergency Medical Systems. Alerts/Reminders and Knowledge Service are the most common implementations. Major challenges are around security, performance, and compatibility. We conclude that a cloud-based CDSS is beneficial since it allows cost-efficient, ubiquitous, and elastic computing resources. We highlight that some studies show weaknesses regarding the conceptualization of a cloud-based computing approach and lack a formal methodology in the architectural design process.
This document discusses the need for improved scientific data management systems to support data-driven discovery. It proposes adopting a digital asset management (DAM) approach used in creative fields like photography. Key points:
- Current scientific data management is manual and cannot scale with increasing data volumes and complexity, slowing the pace of discovery.
- A DAM framework is proposed to automate data acquisition, organization, access and sharing using metadata and models tailored for each scientific domain.
- The framework would transform how scientists interact with data, facilitating analysis and reproducibility.
- An initial DAM platform called DERIVA is presented and has been evaluated positively in early use cases.
NITRD Big Data Interagency Working Group Workshop: Pioneering the Future of Federally Supported Data Repositories Jan 13, 2021 - Opening comments on where we are and one suggestion of where we might go with an International Data Science Institute (IDSI) - A blue sky view.
Infrastructure for Supporting Computational Social Science - Derek Hansen
This document discusses the need for infrastructure research to support computational social science. It notes current limitations with relying solely on corporate or third-party tools for data access and analysis. Specifically, these tools are not designed for research needs, duplication of effort is required, APIs are limited and changing, and maintaining third-party tools is challenging. The document proposes a large-scale collaborative solution involving data handling and processing, human-computer interaction, and legal/social considerations to better enable social science research. Collaboration with groups like CASCI and DSST is suggested.
Incentivising the uptake of reusable metadata in the survey production process - Louise Corti
This document discusses incentivizing the uptake of reusable metadata in survey production. It notes that there is no universal language used to document survey questions and variables, leading to wasted resources. The Data Documentation Initiative (DDI) is proposed as a standard. Barriers to adopting metadata best practices include legacy systems, manual processes, and reluctance to change. The document outlines ideas to incentivize metadata use such as specifying documentation requirements in funding calls and improving documentation tools and workflows. Showing tangible benefits through applications like question banks and data exploration systems is also suggested.
Philip Bourne presented on the NIH's Big Data to Knowledge (BD2K) initiative and the Associate Director for Data Science (ADDS) office. The goals of BD2K are to use data science to accelerate biomedical research and enhance health outcomes. BD2K supports various centers, projects, and training programs related to data discovery, standards, cloud computing, sustainability, and workforce development. The ADDS office oversees BD2K and aims to establish a sustainable data science ecosystem and well-trained workforce to enable major scientific discoveries through data-driven research.
This document discusses using data mining techniques for energy resource management. It outlines objectives like classification, regression, forecasting and anomaly detection. Techniques covered include cluster analysis, classification trees, neural networks, genetic algorithms and Bayesian models. Applications involve meeting strategic objectives, making business/engineering decisions and energy budgets. Challenges with large data include integration, wasted time and disconnects. The document proposes solutions like Extract-Transform-Load, Hadoop and cloud computing. The methodology uses a geographic information system, forecasting engines and an application programming interface to transfer and analyze data.
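None of the deck's specific tooling is reproduced here; purely as a hedged sketch of the anomaly-detection objective it lists, the snippet below flags unusual readings in synthetic hourly energy-load data with scikit-learn's IsolationForest (the data, injected spikes, and contamination rate are all invented for illustration).

```python
# Hypothetical illustration of the "anomaly detection" objective:
# flag unusual hourly energy-load readings with an Isolation Forest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
load = rng.normal(100.0, 10.0, size=(500, 1))   # synthetic hourly load (kW)
load[::97] += 60.0                              # inject a few spikes

model = IsolationForest(contamination=0.02, random_state=0).fit(load)
flags = model.predict(load)                     # -1 = anomaly, 1 = normal
print(f"{(flags == -1).sum()} anomalous readings out of {len(load)}")
```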
Introduction to Big Data and its Potential for Dementia Research - David De Roure
Presentation at Dementia Conference (Evington Initiative) held at Wellcome Trust, 22-23 October 2012. Acknowledgements to McKinsey & Company, as well as Tim Clark (MGH) and Iain Buchan (University of Manchester), for input to the slides.
NIH Data Commons (Note: Presentation has animations) - Vivien Bonazzi
Presented at the Data Commons & Data Science Workshop (University of Chicago - Centre for Data Intensive Science):
NB: these slides contain animations, so static versions might not display well.
This document proposes a new model for biomedical computing called the NIH Commons that aims to improve data sharing, reduce costs, and increase access to computational resources. It would provide investigators with credits that can be used at multiple cloud computing providers. This 3-year pilot program seeks to test if this approach can enhance data sharing and lower costs compared to current practices. Key aspects that will need to be defined include the requirements for cloud providers to participate, criteria for approving credit requests, and metrics to evaluate the model's effectiveness. Feedback will be sought from experts to help design and implement this new biomedical computing framework.
• Improve Data Management with Semantic Data Integration
• Discuss the issues of data variety and data uncertainty
• Moving from Big Data to Big Analysis
• How to apply Analysis to Big Data (Big Analysis)
• Benefits of Advanced Analytics in Life Science
Conceptual Architecture for USDA and NSF Terrestrial Observation Network Inte... - Brian Wee
This document discusses the need for interoperability between research infrastructures like NEON and LTAR to better understand the impacts of climate change on agro-ecosystems. It proposes a requirements-driven interoperability framework to integrate data from these organizations. This framework is based on defining science requirements and measurements, establishing common algorithms and protocols, ensuring traceability of measurements, and developing supporting informatics. The goal is to seamlessly integrate data to help address challenges around food security and sustainability under a changing climate.
Life science requirements from e-infrastructure: initial results from a joint... - Rafael C. Jimenez
This document summarizes a workshop on life science requirements from e-infrastructure held by BioMedBridges. It discusses how big data is affecting challenges like data growth outpacing storage and transfer speeds. Potential solutions proposed include improving storage, compression, networking, partitioning data, and computing approaches like clouds. The workshop concluded that e-infrastructures need to better understand research infrastructure problems, evaluate bottlenecks, discuss solutions, and define requirements as big data will change current approaches to data sharing and management.
FAIR for the future: embracing all things data - ARDC
FAIR for the future: embracing all things data - Natasha Simons, Keith Russell and Liz Stokes, presented at Taylor & Francis Scholarly Summits in Sydney 11 Feb 2019 and Melbourne 14 Feb 2019.
Table of Content - International Journal of Managing Information Technology (... - IJMIT JOURNAL
The International Journal of Managing Information Technology (IJMIT) is a quarterly open access peer-reviewed journal that publishes articles that contribute new results in all areas of the strategic application of information technology (IT) in organizations. The journal focuses on innovative ideas and best practices in using IT to advance organizations - for-profit, non-profit, and governmental. The goal of this journal is to bring together researchers and practitioners from academia, government and industry to focus on understanding both how to use IT to support the strategy and goals of the organization and how to employ IT in new ways to foster greater collaboration, communication, and information sharing both within the organization and with its stakeholders. The International Journal of Managing Information Technology seeks to establish new collaborations, new best practices, and new theories in these areas.
This talk will provide a means to discuss the capture, integration and dissemination of data across large enterprises. We will show how data variety is continuing to grow, meaning new data sources are steadily becoming available for use in analysis. Data veracity is also of importance since a large amount of data is fuzzy (uncertain) in nature. The ability to integrate these various data sources and provide improved capabilities to understand and use it is of increasing importance in today’s pharma climate. We call this Reference Master Data Management (RMDM).
This talk will span an arc of data lifecycle management, beginning with instrument data, moving across to clinical studies, production, regulatory affairs and finally e-archiving (see Fig. 1). I will show how these systems can use a common semantics for modeling of important metadata, which can apply the FAIR principles of Findability, Accessibility, Interoperability and Reusability to a common “semantic hub” that can connect data sources of different varieties across the enterprise. ADF files, for example, use their Data Description layer to provide semantic metadata about file contents. Similarly, semantics can be used to describe clinical trials data, regulatory data, etc., through to archiving, for improved storage and search over long periods of time.
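As a minimal, hypothetical sketch of the "semantic hub" idea (the namespace, URIs, and property choices below are illustrative, not the speaker's actual model; DCAT and DCTERMS are standard vocabularies), dataset-level metadata can be expressed as RDF so that instrument, clinical, and archive records share one queryable vocabulary:

```python
# Minimal sketch: describe one dataset with FAIR-oriented RDF metadata.
# The hub namespace and URIs are hypothetical placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

HUB = Namespace("http://example.org/semantic-hub/")  # hypothetical

g = Graph()
ds = URIRef(HUB["dataset/instrument-run-42"])
g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCTERMS.title, Literal("Instrument run 42, raw spectra")))
g.add((ds, DCTERMS.identifier, Literal("hub:instrument-run-42")))        # findable
g.add((ds, DCAT.accessURL, URIRef("http://example.org/archive/run-42"))) # accessible
g.add((ds, DCTERMS.conformsTo, URIRef("http://example.org/formats/adf")))# interoperable (stand-in for an ADF format id)
g.add((ds, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))  # reusable

print(g.serialize(format="turtle"))
```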
Trust threads: Active Curation and Publishing in SEAD - Beth Plale
Describes Trust Threads, a minimalist approach to provenance capture to enhance the trustworthiness of published data. Implemented as part of SEAD's Active Curation and Publishing Services. At National Data Integrity Conference, Ft. Collins, Colorado, May 2015.
Building Data Ecosystems for Accelerated Discovery - adamkraut
Large federated data ecosystems require diverse teams that can design, build, and integrate a broad range of services to support scientific workflows. Our collaborative team operates at the intersection of science, technology, and data to assess, implement, and teach the key capabilities and capacities that modern healthcare and life science need. Learn the data management techniques, tools, platforms, and frameworks that are proven to be effective at solving complex problems at scale.
Cyberenvironments integrate shared and custom cyberinfrastructure resources into a process-oriented framework to support scientific communities and allow researchers to focus on their work rather than managing infrastructure. They enable more complex multi-disciplinary challenges to be tackled through enhanced knowledge production and application. Key challenges include coordinating distributed resources and users without centralization and evolving systems rapidly to keep pace with advancing science.
The document summarizes presentations from three perspectives on progress towards open and interoperable research data service workflows:
1) Angus Whyte of the Digital Curation Centre discussed new DCC guidance and design principles for integrating research data service workflows.
2) Rory Macneil of Research Space discussed integrating their ELN with University of Edinburgh's DataShare and Harvard's Dataverse repositories using open standards.
3) Stuart Lewis of University of Edinburgh discussed their DataVault prototype for packaging data to be archived from a Jisc Research Data Spring project. The case studies illustrate challenges and opportunities for improving integration between active data management and long-term preservation services.
This document discusses several studies on user engagement in research data curation. It finds that institutional repositories for data were developed without input from researchers, leading to systems that did not meet researchers' needs. Barriers to open data sharing included concerns over commercial use and maintaining ownership. Successful data curation requires understanding disciplinary differences and developing trusted relationships with researchers through dialogue early in projects.
Just a few slides I put together to quickly introduce the idea of Virtual Research Environments (VRE) at the University of Lincoln. All content was borrowed from JISC's work on VREs.
The document discusses the need for an NIH Data Commons to address challenges with data sharing and storage. It describes how factors like increasing data volumes, availability of cloud technologies, and emphasis on FAIR data principles are driving the need for a centralized data platform. The proposed NIH Data Commons would provide findable, accessible, interoperable and reusable data through cloud-based services and tools. It would enable data-driven science by facilitating discovery, access and analysis of biomedical data across different sources. Plans are outlined to develop and test an initial Data Commons pilot using existing genomic and other biomedical datasets.
The document discusses the Materials Genome Initiative (MGI) and the High-Throughput Experimental Materials Collaboratory (HTE-MC). It describes NIST's role in supporting MGI through developing a materials innovation infrastructure. It outlines the vision for HTE-MC, which would integrate high-throughput synthesis and characterization tools across multiple institutions through a shared network and data management platform. This would provide broader access to experimental facilities and materials data to support accelerated materials discovery. A workshop was held in 2018 to discuss establishing the HTE-MC concept and defining its technical, operational and business models.
The document describes a project between Mendeley and Symplectic to increase rates of unmandated deposit into institutional repositories. By integrating repository deposit directly into the Mendeley research collaboration tool, researchers will be able to easily sync their publications from Mendeley into their local institutional repository with a single click. This is expected to greatly increase deposit rates by removing barriers like copyright uncertainty and the time needed to submit publications manually.
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o... - Ilkay Altintas, Ph.D.
SDSC is a leader in high performance computing, data-intensive computing, and scientific data management. It focuses on "Big Data", "versatile computing", and "life sciences applications". The SDSC Data Science Office provides expertise, systems, and training for data science applications. Genomic analysis poses big data and computing challenges, including data management, integration, coordination, and workflow management. New tools are needed to address these challenges. bioKepler is an example of a Kepler module for data-parallel bioinformatics. Training is also needed at the interface of domains to build the next generation of interdisciplinary scientists. SDSC works with industry partners through various strategies like sponsored research and providing access to systems and expertise.
Leveraging Open Source Technologies to Enable Scientific Archiving and Discovery; Steve Hughes, NASA; Data Publication Repositories
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
The document discusses the need to transition from digital preservation projects to infrastructure by integrating preservation into the data production process. It outlines gaps between repositories and producer communities and suggests bridging these gaps by developing interoperable tools across production and preservation workflows, and embedding preservation awareness and skills into research training and practice. Building comprehensive digital preservation infrastructure requires considering both technical and social aspects across local, global, and disciplinary contexts.
1. The document discusses using eResearch approaches like shared data, analyses, and cyberinfrastructure to support collaborative research on free/libre and open source software (FLOSS).
2. The authors are replicating and extending several FLOSS research papers using workflow tools to make the analyses reusable, flexible, and easy to share.
3. Preliminary results found that eResearch approaches show promise for advancing social science research by facilitating analysis extension, replication, and sensitivity testing.
A Big Picture in Research Data Management - Carole Goble
A personal view of the big picture in Research Data Management, given at GFBio - de.NBI Summer School 2018 Riding the Data Life Cycle! Braunschweig Integrated Centre of Systems Biology (BRICS), 03 - 07 September 2018
Bridging Gaps and Broadening Participation in Today's and Future Research Com... - Sandra Gesing
Research computing is in an exciting era and has never evolved as fast as in the last 20 years. We can nowadays answer research questions that we could not even ask two decades ago. This has led to discoveries such as the analyses of DNA from Next-Generation Sequencing technologies. The increased complexity of software, data, hardware and lab instruments demands more openness and sharing of data and methods. Researchers and educators are not necessarily IT specialists though. Thus, a further trend in research computing is the shift from system-centric design to user-centric design and interdisciplinary teams – complex solutions are offered in self-explanatory user interfaces, so-called science gateways or virtual research environments. I will present solutions and projects supporting users to be able to focus on their research questions without the need to become acquainted with the nitty-gritty details of the complex research computing infrastructure. Key aspects of the presented projects are usability and interoperability of computational methods, reproducibility of research results, as well as sustainability of research software. Sustainability of research software has many facets. I advocate for improving diversity in workforce development and career paths for research software engineers, and for incentivizing their work via means beyond the traditional academic rewarding system.
Research process and research data management. Many universities are looking at how they can better serve the needs of researchers. Ken Chad Consulting worked with the University of Westminster to look at the needs and attitudes of researchers and admin staff in terms of research data management (RDM). The result led the University to look first at the whole lifecycle and workflows of research administration. This in turn led to the innovative, rapid development of a system to support researchers and admin staff. Presented by Suzanne Enright (University of Westminster) and Ken Chad at the annual UKSG conference in April 2014
The document summarizes research into developing a single research portal at Westminster University to improve research processes. It found that researchers were unaware of formal research data management practices and struggled with disconnected systems. A proposed solution is a central portal allowing easier identification of support needs, visibility of research, and collaboration. An initial focus on doctoral projects saw time savings. Next steps involve managing research outputs through a single interface. Key lessons are that researchers prefer easy solutions and involvement in development.
This document outlines a Ph.D. proposal to examine the use of workflow engines and coupling frameworks in developing hydrologic modeling systems. Specifically, it will develop hydrologic models within the TRIDENT workflow engine and OpenMI coupling framework to evaluate their capabilities for building community modeling systems. The research will include developing component models, building sample workflows, and testing models on three sites. The goal is to contribute optimized hydrologic modeling tools and assess the suitability of these approaches for collaborative hydrologic modeling.
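To make the coupling idea concrete, here is a toy Python sketch of two linked components in the spirit of a coupling framework like OpenMI (these interfaces are invented for illustration and are not the actual OpenMI or TRIDENT APIs):

```python
# Toy illustration of model coupling: a rainfall component feeds a runoff
# component one time step at a time. Interfaces are hypothetical, not OpenMI.
class RainfallComponent:
    def __init__(self, series):
        self.series = series  # mm of rain per time step

    def get_value(self, step):
        return self.series[step]

class RunoffComponent:
    def __init__(self, runoff_coefficient=0.4):
        self.c = runoff_coefficient

    def update(self, rainfall_mm):
        return self.c * rainfall_mm  # simple rational-method style estimate

rain = RainfallComponent([0.0, 5.0, 12.0, 3.0])
runoff = RunoffComponent()
for step in range(4):
    q = runoff.update(rain.get_value(step))
    print(f"step {step}: rainfall={rain.get_value(step):.1f} mm, runoff={q:.1f} mm")
```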
The document describes Dr. Mahdi Fahmideh's background and research interests which include disruptive technologies like cloud computing, IoT, blockchain, and data analytics. It provides examples of Dr. Fahmideh's research output, including a process model developed using design science research for migrating legacy applications to the cloud. The document identifies knowledge gaps in the current literature around cloud migration processes and outlines Dr. Fahmideh's research objective to develop a generic, customizable cloud migration process model.
Australia's Environmental Predictive Capability - TERN Australia
Federating world-leading research, data and technical capabilities to create Australia’s National Environmental Prediction System (NEPS).
Community consultation presentation.
3-12 February 2020
Dr Michelle Barker (Facilitator)
(Presentation v5)
Similar to AHM 2014: Enterprise Architecture for Transformative Research and Collaboration Across Geosciences (20)
EarthCube Community Webinar held Tuesday, Dec. 9th at 11:00 PST/2:00 EST for a virtual kick-off of the new 'Demonstration Phase' of EarthCube, including statements from your Leadership Council members and an update from NSF Program Officer, Eva Zanzerkia.
Engagement Team monthly meeting 10.10.2014 - EarthCube
The document outlines the agenda and priorities for an EarthCube Demonstration Governance Engagement Team meeting in October 2014. The agenda includes an introduction, announcing a team representative to the Leadership Council, developing internal leadership, reviewing priorities and logistical functions, and discussing future meeting schedules. Key priorities and deliverables for the team are to develop an outreach and communications plan to engage the EarthCube community and stakeholders through compiling science use cases. Housekeeping, meeting leadership, point of contact roles, work management, and collaboration with other groups are listed as important logistical functions for the team.
The document summarizes the agenda and priorities for an October meeting of the Science Standing Committee. The agenda includes an introduction, announcing committee representatives, developing internal leadership, and reviewing priorities and logistical functions. The committee's year 1 intended outcome is to support work to complete the year 1 deliverable of developing science use cases. Their priorities are housekeeping tasks like assigning a meeting lead and point of contact for the oversight office.
This document summarizes an EarthCube meeting to discuss funded demonstration projects and governance. It outlines the agenda, including introductions from new project teams and a discussion of the role of funded projects. Key points include that the Test Governance project will coordinate the demonstration governance process and report outcomes to NSF. Both the Technology & Architecture Committee and Science Committee outlined initial steps, including forming subcommittees to analyze use cases and gaps. The meeting concluded with a discussion of how funded projects can best work with standing committees through formal work plans, representatives, and regular communication.
Technology and Architecture Committee meeting slides 10.06.14 - EarthCube
The October meeting agenda of the EarthCube Technology and Architecture Standing Committee included:
1) Welcome and introductions
2) Announcement of new committee representatives
3) Discussion of the committee's internal leadership structure and responsibilities, including coordinating with other groups, monitoring working groups, and sponsoring new working groups.
4) Review of timelines for upcoming milestones and deliverables and discussion of future meeting schedules.
EarthCube Governance Intro for Solar Terrestrial End-user Workshop - EarthCube
Presentation by the EarthCube Test Enterprise Governance project for the Solar Terrestrial Research End-User Workshop, Newark, New Jersey, August 14, 2014.
AHM 2014: The CSDMS Standard Names, Cross-Domain Naming Conventions for Descr... - EarthCube
The document discusses the CSDMS Standard Names, which provide unambiguous naming conventions for describing process models, data sets, and their associated variables. The standard names aim to avoid ambiguity and domain-specific terminology. They support naming quantities, processes, mathematical operations, assumptions, and more. Developing and applying standard names helps different models to automatically match variables and understand each other.
AHM 2014: Addressing Data Heterogeneity, Semantic Building Blocks & CI Pe... - EarthCube
This panel will address data heterogeneity issues in EarthCube from the perspective of semantic building blocks and cyberinfrastructure. The panel, convened by Gary Berg-Cross of SOCoP, will feature co-conveners Pascal Hitzler of Wright State University, Kerstin Lehnert of LDEO, Columbia University, and Peter Wiebe of Woods Hole Oceanographic Institution. Additional panelists will include Scott Peckham of University of Colorado Boulder, Anthony Aufdenkampe of Stroud Water Research Center, Tim Finin of University of Maryland Baltimore County, and Krzysztof Janowicz of University of California Santa Barbara.
AHM 2014: Revisiting Governance Model, Preparing for Next Steps - EarthCube
The document lists several potential priorities for EarthCube including developing an emergent architecture, identifying and promoting success stories, providing guidelines for shared services, developing common end user training, benchmarking progress against scientific needs, creating a prototype to demonstrate connectivity and functionality, documenting scientific workflows, and coordinating projects. Additional options mentioned are scoping and articulating a vision, identifying collaborations, documenting use cases, engaging academia in education, improving data management plans and data discovery, establishing light governance led by scientists, tying different design efforts together, determining funding mechanisms, adopting standards, enabling participation from diverse fields, and engaging stakeholders.
AHM 2014: Integrated Data Management System for Critical Zone Observatories - EarthCube
Presentation by Anthony Aufdenkampe during the Addressing Data Heterogeneity, Semantic Building Blocks & CI Perspective Session on Day 2, June 25 at the EarthCube All-Hands Meeting
The document discusses the CSDMS Standard Names, which are naming conventions developed by the Community Surface Dynamics Modeling System (CSDMS) modeling framework to facilitate automatic coupling of models and data sets from different contributors. The naming conventions follow an object-oriented approach where each standard variable name is composed of an object name and quantity name joined by double underscores. This allows framework software to retrieve numerical values for variables based on their standardized names. The naming conventions were designed according to criteria such as avoiding ambiguity, using widely understood terminology, and supporting mathematical operations and assumptions. They address challenges of automatic semantic mediation when coupling diverse resources that use different naming systems.
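The double-underscore convention lends itself to mechanical composition and parsing. A small sketch follows; the helper functions are ours, and only the object__quantity pattern comes from the CSDMS convention described above:

```python
# Compose and parse CSDMS-style standard names: an object name and a
# quantity name joined by a double underscore. Helpers are illustrative.
def compose_standard_name(object_name: str, quantity_name: str) -> str:
    return f"{object_name}__{quantity_name}"

def parse_standard_name(standard_name: str) -> tuple[str, str]:
    object_name, _, quantity_name = standard_name.partition("__")
    return object_name, quantity_name

name = compose_standard_name("atmosphere_bottom_air", "temperature")
print(name)                       # atmosphere_bottom_air__temperature
print(parse_standard_name(name))  # ('atmosphere_bottom_air', 'temperature')
```

Because single underscores are reserved for word separation within each part, framework software can split on the first double underscore and match variables between models unambiguously.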
The document discusses a watershed modeling system called BCube that aims to decrease the effort of watershed initialization by brokering various global geospatial and environmental data required for watershed modeling. BCube allows researchers to focus on scientific research by providing a single access point to the different data formats and sources for elevation, soils, land use, weather, and other data needed to set up and run watershed models. The document provides an overview of the types of data BCube can broker and the workflow where a scientist requests data for a watershed area and BCube returns the available options to choose from.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Building RAG with self-deployed Milvus vector database and Snowpark Container... - Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
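The talk's own demo is not reproduced here; purely as a hedged sketch of the retrieval side of a RAG app against a self-deployed Milvus (the collection name, dimension, and random vectors are placeholders; a real app would use an embedding model), using the pymilvus client:

```python
# Sketch: connect to a Milvus instance running in Docker, store document
# embeddings, and retrieve nearest neighbors for a query.
import numpy as np
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # self-deployed Milvus
client.create_collection(collection_name="rag_docs", dimension=384)

docs = ["Milvus runs as a Docker container.",
        "RAG grounds LLM answers in retrieved text."]
data = [
    {"id": i, "vector": np.random.rand(384).tolist(), "text": t}
    for i, t in enumerate(docs)
]
client.insert(collection_name="rag_docs", data=data)

hits = client.search(
    collection_name="rag_docs",
    data=[np.random.rand(384).tolist()],  # stand-in for an embedded query
    limit=1,
    output_fields=["text"],
)
print(hits[0][0]["entity"]["text"])
```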
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
A tale of scale & speed: How the US Navy is enabling software delivery from l... - sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 6. In this session, we will cover Test Automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
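UiPath's actual integration is not shown here; purely as an illustrative sketch of the generative-AI side (the model name and prompt are placeholders), a draft of test cases could be requested through OpenAI's Python client like so:

```python
# Hypothetical sketch: ask an OpenAI model to draft test cases for review.
# This is not UiPath's integration; model and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You write concise software test cases."},
        {"role": "user", "content": "Draft three test cases for a login form "
                                    "with username, password, and 'remember me'."},
    ],
)
print(response.choices[0].message.content)
```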
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and OpenAI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
TrustArc Webinar - 2024 Global Privacy Survey - TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI - Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
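Part of that adoption story is an API that takes only a few lines to try. A minimal usage sketch follows (the image is synthetic and the transform choices are arbitrary examples, not recommendations from the talk):

```python
# Minimal Albumentations usage: compose two transforms and apply them.
import albumentations as A
import numpy as np

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]
print(augmented.shape)  # (224, 224, 3)
```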
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
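As a rough sketch of what "runs on notebooks and laptops" means in practice: passing a local file path to MilvusClient starts Milvus Lite in-process, and the same calls work against a full Milvus server. The collection name, dimension, and data below are illustrative.

from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")  # local file => Milvus Lite
client.create_collection(collection_name="demo", dimension=4)

client.insert(
    collection_name="demo",
    data=[{"id": 0, "vector": [0.1, 0.2, 0.3, 0.4], "text": "hello"}],
)

hits = client.search(
    collection_name="demo",
    data=[[0.1, 0.2, 0.3, 0.4]],  # query vector(s)
    limit=3,
    output_fields=["text"],
)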
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of the global responsibility to help mitigate climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
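The talk does not prescribe a tool, but as one possible way to measure a test run's footprint continuously, here is a sketch using the codecarbon package (the tool choice is mine; project name and test path are placeholders):

from codecarbon import EmissionsTracker
import pytest

tracker = EmissionsTracker(project_name="test-suite")  # placeholder name
tracker.start()
try:
    pytest.main(["tests/"])  # placeholder test directory
finally:
    emissions_kg = tracker.stop()  # estimated kg CO2-eq for this run
    print(f"Estimated emissions: {emissions_kg:.6f} kg CO2-eq")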
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
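A back-of-envelope check of what that target implies, assuming (my simplification, not the talk's parametric model) a constant average frequency offset over the whole holdover window:

holdover_s = 100 * 86_400   # 100 days in seconds
max_error_s = 100e-9        # 100 ns time-error budget
avg_frac_freq = max_error_s / holdover_s
print(f"{avg_frac_freq:.2e}")  # ~1.16e-14 average fractional frequency offset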
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
AHM 2014: Enterprise Architecture for Transformative Research and Collaboration Across Geosciences
1. EarthCube Conceptual Design: Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
http://workspace.earthcube.org/transformative-research-collaboration
ILYA ZASLAVSKY, DAVID VALENTINE, AMARNATH GUPTA
San Diego Supercomputer Center/UCSD
STEPHEN RICHARD
Arizona Geological Survey
TANU MALIK
University of Chicago
2. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
The Science Enterprise
• Ask questions
• Collect information
• Formulate hypotheses
• Test hypotheses to determine which (if any) provide a satisfactory answer
• Document, curate, and disseminate data and results
… AND INCREASINGLY:
• Integrate data, analyses, and models across domains
• Collaborate: leverage pooled expertise and resources
…increasing amount of data produced in modern science. LSDMA bridges the gap between data production and data analysis with a novel approach that combines specific community support with generic, cross-community development. In the Data Life Cycle Labs (DLCL), experts from the data domain work closely with scientific groups of selected research domains in joint R&D, where community-specific data life cycles are iteratively optimized, data and metadata formats are defined and standardized, simple access and use is established, and data and scientific insights are preserved in long-term, openly accessible archives.
Keywords: data management, data life cycle, data intensive computing, data analysis, data exploration, LSDMA, support, data infrastructure
I. INTRODUCTION
Today data is knowledge: data exploration has become the 4th pillar of modern science besides experiment, theory, and simulation, as postulated by Jim Gray in 2007 [1]. Rapidly increasing data rates in experiments, measurements, and simulations are limiting the speed of scientific production in various research communities, and the gap between the generated data and the data entering the data life cycle (cf. Fig. 1) is widening. Providing high-performance data management components, analysis tools, computing resources, storage, and services can address this challenge, but realizing a data-intensive infrastructure at institutes and universities is usually time consuming and always expensive. The "Large Scale Data Management and Analysis" (LSDMA) project introduced here extends the research services of the Helmholtz Association of research centers in Germany with community-specific Data Life Cycle Laboratories (DLCL). The LSDMA project, initiated at the Karlsruhe Institute of Technology (KIT), builds on the experience of supporting local scientists at a computing center, of running the Grid Computing Centre Karlsruhe (GridKa) [2] as the German Tier 1 hub in the Worldwide LHC Computing Grid [3] and the Large Scale Data Facility (LSDF) [4], and of the very successful Simulation Labs [5], which specialize in supporting HPC users.
Figure 1. The scientific data life cycle
3. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Design Framework: Federation of Systems
The research enterprise includes subsystems at the project, program, and agency level, many of which are independent of NSF.
• Requirements are a moving target
• Emergent behavior is to be expected
• Technology is constantly changing
• Community governance within the constraints of funding agencies
• Evolutionary process and adaptation: lots of variation; a mechanism to select the 'fittest'; composability
• Technology must foster delegation of responsibilities and communication: promote self-organization, cultivate ideas, maintain feedback between subsystems
• Reliability: responsiveness, robustness, correctness
• Identity of the system is based on shared goals and practices
4. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Communication loops
[Diagram: communication loops among Cross-Domain Scientists, Data Providers, Scientific Governance, and Technical Governance, fed by bottom-up and top-down studies. The exchanged items include trends and patterns, data-interoperability best practices, success stories, feasibility, priorities, strategies, data products, options, costs, problems and issues, related work, and questions and clarifications.]
5. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Communication metrics
7. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Converging on reference architecture semantics
• Analysis of existing building blocks and their variability, each described by: Component, System, Function, Description, Interfaces, Implementation, Steward Organization, Availability, Reference (see the sketch after this list)
• Developing cross-domain vocabularies, connecting domain models
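To make the field list concrete, here is a sketch (mine, not from the deck) of how one building-block registry entry could be captured; every value below is hypothetical.

from dataclasses import dataclass, field

@dataclass
class BuildingBlock:
    # One registry record, using the slide's description fields.
    component: str
    system: str
    function: str
    description: str
    interfaces: list[str] = field(default_factory=list)
    implementation: str = ""
    steward_organization: str = ""
    availability: str = ""
    reference: str = ""

entry = BuildingBlock(
    component="Catalog Service",
    system="Data Discovery",
    function="Register and search dataset descriptions",
    description="Catalog over domain metadata records",
    interfaces=["OGC CSW", "OpenSearch"],
    implementation="Hypothetical web service",
    steward_organization="Example Data Facility",
    availability="Production",
    reference="https://example.org/catalog",
)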
8. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Requirements Process
• Workshop summaries
• Surveys
• Architecture designs
• Analyze what worked
• Incorporate social technologies
• Inventory CI building blocks
9. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Concerns
• Hitting the right level of granularity in the design
• Identifying necessary communication channels
• Accounting for all key perspectives
• Fixing the scope and technologies
• Balancing current and future requirements
• Harmonizing technical and social subsystems and managing interactions between them
• Uneven standardization and convergence across domains and functional components
• Constructing a self-organizing plug-and-play system
• Inventorying building blocks
10. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Summary
The system is defined by:
• Specifications for interfaces and interchange formats (the gateways)
• Definitions of key functional components at an abstract level: discovery, workflows, data processing, annotation, documentation
Technology needs to support:
• Communication between subsystems (people and machines)
• Collection of the metrics required to assess what is working (selection of the fittest; see the sketch below)
• Assembly of components
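As an illustrative sketch (not from the deck) of the kind of metric collection the summary calls for, a registry could count component invocations and rank them by adoption; the component names below are invented.

from collections import Counter

usage = Counter()

def record_use(component: str) -> None:
    """Record one invocation of a registered component."""
    usage[component] += 1

record_use("Catalog Service")
record_use("Catalog Service")
record_use("Vocabulary Service")

# Rank components by adoption -- the 'selection of the fittest' signal.
for name, count in usage.most_common():
    print(name, count)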