Some Frameworks for Improving Analytic Operations at Your Company - Robert Grossman
I review three frameworks for analytic operations that are designed to improve the value obtained when deploying analytic models into products, services and internal operations.
This is a talk that I gave at BioIT World West on March 12, 2019. The talk was called: A Gen3 Perspective of Disparate Data: From Pipelines in Data Commons to AI in Data Ecosystems.
This is an overview of the Data Biosphere Project, its goals, its architecture, and the three core projects that form its foundation. We also discuss data commons.
What is a Data Commons and How Can Your Organization Build One? - Robert Grossman
This is a talk that I gave at the Molecular Medicine Tri-Conference on data commons and data sharing to accelerate research discoveries and improve patient outcomes. It also covers how your organization can build a data commons using the Open Commons Consortium's Data Commons Framework and the University of Chicago's Gen3 data commons platform.
Crossing the Analytics Chasm and Getting the Models You Developed Deployed - Robert Grossman
There are two cultures in data science and analytics: those who develop analytic models and those who deploy analytic models into operational systems. In this talk, we review the life cycle of analytic models and provide an overview of some of the approaches that have been developed for managing analytic models and workflows and for deploying them, including using analytic engines and analytic containers. We give a quick overview of languages for analytic models (PMML) and analytic workflows (PFA). We also describe the emerging discipline of AnalyticOps, which has borrowed some of the techniques of DevOps.
FAIR Data Knowledge Graphs: From Theory to Practice - Tom Plasterer
FAIR data has flown up the hype curve without a clear sense of return from the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph. It enables you to richly address novel questions of your own and the world's data. We started with data catalogues (findability) which exploited linked/referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability) to make this happen. Our processes enable simple creation of dataset records and linking to source data, providing a seamless federated knowledge graph for novice and advanced users alike.
Presented May 7th, 2019 at the Knowledge Graph Conference, Columbia University.
Making Data FAIR (Findable, Accessible, Interoperable, Reusable) - Tom Plasterer
What to do About FAIR…
In the experience of most pharma professionals, FAIR remains fairly abstract, bordering on inconclusive. This session will outline specific case studies – real problems with real data – and address opportunities and real concerns.
Why making data Findable, Accessible, Interoperable and Reusable is important.
Talk presented at the Data Driven Drug Development (D4) conference on March 20th, 2019.
BROWN BAG TALK WITH MICAH ALTMAN: INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS - Micah Altman
This talk is part of the MIT Program on Information Science brown bag series (http://informatics.mit.edu).
This talk discusses findings from an analysis of data sharing and citation policies in Open Access journals and describes a set of novel tools for open data publication in open access journal workflows. Bring your lunch and enjoy a discussion fit for scholars, Open Access fans, and students alike.
Dr Micah Altman is Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, at the Massachusetts Institute of Technology.
DataTags: Sharing Privacy Sensitive Data by Michael Bar-Sinai - datascienceiqss
The DataTags framework makes it easy for data producers to deposit, data publishers to store and distribute, and data users to access and use datasets containing confidential information, in a standardized and responsible way. The talk will first introduce the concepts and tools behind DataTags, and then focus on the user-facing component of the system, the Tagging Server (available today at datatags.org). We will conclude by describing how future versions of Dataverse will use DataTags to automatically handle sensitive datasets that can only be shared under some restrictions.
Lesson 8 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
FAIR data has flown up the hype curve without a clear sense of return from the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph. It enables you to richly address novel questions of your own and the world's data. We started with data catalogues (findability) which exploited linked/referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability) to make this happen.
This talk was presented at The Molecular Medicine Tri-Conference/Bio-IT West on March 11, 2019.
Lesson 2 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
BioPharma and FAIR Data, a Collaborative Advantage - Tom Plasterer
The concept of FAIR (Findable, Accessible, Interoperable and Reusable) data is becoming a reality as stakeholders from industry, academia, funding agencies and publishers embrace this approach. For BioPharma, being able to effectively share and reuse data is a tremendous competitive advantage: within a company, with peer organizations, key opinion leaders and regulatory agencies. A few key drivers, success stories and preliminary results of an industry data stewardship survey are presented.
BROWN BAG TALK WITH MICAH ALTMAN: SOURCES OF BIG DATA FOR SOCIAL SCIENCES - Micah Altman
This talk is part of the MIT Program on Information Science brown bag series (http://informatics.mit.edu).
This talk reviews emerging big data sources for social scientific analysis and explores the challenges these present. Many of these sources pose distinct challenges for acquisition, processing, analysis, inference, sharing, and preservation.
Dr Micah Altman is Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, at the Massachusetts Institute of Technology. Dr. Altman is also a Non-Resident Senior Fellow at The Brookings Institution. Prior to arriving at MIT, Dr. Altman served at Harvard University for fifteen years as the Associate Director of the Harvard-MIT Data Center, Archival Director of the Henry A. Murray Archive, and Senior Research Scientist in the Institute for Quantitative Social Sciences.
Dr. Altman conducts research in social science, information science and research methods -- focusing on the intersections of information, technology, privacy, and politics; and on the dissemination, preservation, reliability and governance of scientific knowledge.
Big Data Repository for Structural Biology: Challenges and Opportunities by P... - datascienceiqss
SBGrid (Morin et al., 2013, eLIFE and www.sbgrid.org) is a Harvard based structural biology global computing consortium with a primary focus on the curation of research software. Dr. Sliz will discuss a recent SBGrid project that aims to establish a repository for experimental datasets from SBGrid laboratories. Issues of handling large data volumes, data validation and repository sustainability will be addressed in this talk.
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ... - Tom Plasterer
As scientists in the life sciences, we are trained to pursue singular goals around a publication, a validated target or a drug submission. Our failure rates are exceedingly high, especially as we move closer to patients in the attempt to collect sufficient clinical evidence to demonstrate the value of novel therapeutics. This wastes resources as well as time for patients depending upon us for the next breakthrough.
Edge Informatics is an approach to ameliorate these failures. By using technical and social solutions together, knowledge can be shared and leveraged across the drug development process. This is accomplished by making data assets discoverable, accessible, self-described, reusable and annotatable. The Open PHACTS project pioneered this approach and has provided a number of the technical and social solutions that enable Edge Informatics. A number of pre-competitive consortia and some content providers have also embraced this approach, facilitating networks of collaborators within and outside a given organization. Taken together, these efforts foster more accurate, timely and inclusive decision-making.
Dataset Catalogs as a Foundation for FAIR* Data - Tom Plasterer
BioPharma and the broader research community are faced with the challenge of simply finding the appropriate internal and external datasets for downstream analytics, knowledge generation and collaboration. With datasets as the core asset, we wanted to promote both human and machine exploitability, using web-centric data cataloguing principles as described in the W3C Data on the Web Best Practices. To do so, we adopted DCAT (Data CATalog Vocabulary) and VoID (Vocabulary of Interlinked Datasets) for both RDF and non-RDF datasets at the summary, version and distribution levels. Further, we've described datasets using a limited set of well-vetted public vocabularies, focused on cross-omics analytes and clinical features of the catalogued datasets.
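To make the cataloguing approach concrete, here is a minimal sketch of a DCAT dataset record with a single distribution, built with Python's rdflib. The dataset URI, title and distribution values are hypothetical placeholders, not entries from the catalog described above.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

# Hypothetical dataset and distribution URIs.
ds = URIRef("https://example.org/catalog/omics-study-001")
dist = URIRef("https://example.org/catalog/omics-study-001/csv")

g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCTERMS.title, Literal("Example cross-omics study")))  # summary-level description
g.add((ds, DCAT.distribution, dist))                              # distribution-level record
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.downloadURL, URIRef("https://example.org/data/omics-study-001.csv")))
g.add((dist, DCTERMS.format, Literal("text/csv")))

print(g.serialize(format="turtle"))

Serializing to Turtle keeps the record both human-readable and machine-exploitable, which is the point of the catalog-first approach.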
DataONE Education Module 01: Why Data Management? - DataONE
Lesson 1 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
Dataverse, Cloud Dataverse, and DataTags - Merce Crosas
Talk given at Two Sigma:
The Dataverse project, developed at Harvard's Institute for Quantitative Social Science since 2006, is a widely used software platform to share and archive data for research. There are currently more than 20 Dataverse repository installations worldwide, with the Harvard Dataverse repository alone hosting more than 60,000 datasets. Dataverse provides incentives to researchers to share their data, giving them credit through data citation and control over terms of use and access. In this talk, I'll discuss the Dataverse project, as well as related projects such as DataTags to share sensitive data and Cloud Dataverse to share Big Data.
As BioPharma adapts to incorporate nimble networks of suppliers, collaborators, and regulators, the ability to link data is critical for dynamic interoperability. Adoption of the linked data paradigm allows BioPharma to focus on its core business: delivering valuable therapeutics in a timely manner.
Open science is yielding active efforts to make data from research available for broader use. But data often carry restrictions (privacy or sensitivity restrictions, whether regulated by statute or otherwise) that can limit how broadly they can be made available. In this talk we argue that the spectrum of data sharing options includes alternate approaches that offer more control over data than full sharing yet contribute more than no sharing at all. We offer the controlled compute environment, or capsule, as a viable new approach for computational analysis of data that have restrictions. The compute environment increases the range of possibilities for facilitating science through data reuse, an objective of open science. This talk frames the capsule and provides experience based on one such capsule used in HathiTrust for research with copyrighted materials.
dkNET Webinar: FAIR Data & Software in the Research Life Cycle 01/22/2021 - dkNET
Abstract
Good data stewardship is the cornerstone of knowledge, discovery, and innovation in research. The FAIR Data Principles address data creators, stewards, software engineers, publishers, and others to promote maximum use of research data. The principles can be used as a framework for fostering and extending research data services.
This talk will provide an overview of the FAIR principles and the drivers behind their development by a broad community of international stakeholders. We will explore a range of topics related to putting FAIR data into practice, including how and where data can be described, stored, and made discoverable (e.g., data repositories, metadata); methods for identifying and citing data; interoperability of (meta)data; best-practice examples; and tips for enabling data reuse (e.g., data licensing). Practical examples of how FAIR is applied will be provided along the way.
Presenter: Christopher Erdmann, engagement, support, and training expert on the NHLBI BioData Catalyst project at the University of North Carolina Renaissance Computing Institute
dkNET Webinars Information: https://dknet.org/about/webinar
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ... - Tom Plasterer
Edge Informatics is an approach to accelerate collaboration in the BioPharma pipeline. By combining technical and social solutions, knowledge can be shared and leveraged across the multiple internal and external silos participating in the drug development process. This is accomplished by making data assets findable, accessible, interoperable and reusable (FAIR). Public consortia and internal efforts embracing FAIR data and Edge Informatics are highlighted, in both preclinical and clinical domains.
This talk was presented at the Molecular Medicine Tri-Conference in San Francisco, CA on February 20, 2017
DataONE Education Module 03: Data Management Planning - DataONE
Lesson 3 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
Data Citation Implementation Guidelines by Tim Clark - datascienceiqss
This talk presents a set of detailed technical recommendations for operationalizing the Joint Declaration of Data Citation Principles (JDDCP) - the most widely agreed set of principle-based recommendations for direct scholarly data citation.
We will provide initial recommendations on identifier schemes, identifier resolution behavior, required metadata elements, and best practices for realizing programmatic machine actionability of cited data.
We hope that these recommendations along with the new NISO JATS document schema revision, developed in parallel, will help accelerate the wide adoption of data citation in scholarly literature. We believe their adoption will enable open data transparency for validation, reuse and extension of scientific results; and will significantly counteract the problem of false positives in the literature.
Lesson 7 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
Data Mesh is a decentralized architecture in which the unit of architecture is a domain-driven dataset treated as a product. Each data product is owned by the domain or team that knows the data most intimately, either because they create it or because they consume and re-share it, with specific roles allocated the accountability and responsibility for providing that data as a product. Complexity is abstracted away into a self-serve infrastructure layer so that teams can create these products much more easily.
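As a rough illustration of these ideas (not taken from the talk itself), the sketch below models a data product with an accountable owning domain, plus a self-serve platform layer that hides infrastructure concerns behind a single publish call; all names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str
    owning_domain: str      # the team that creates or consumes/re-shares the data
    owner: str              # role accountable for the product's quality and availability
    schema: dict = field(default_factory=dict)

class SelfServePlatform:
    """Infrastructure layer that hides complexity so domains can publish easily."""
    def __init__(self):
        self._catalog = {}

    def publish(self, product: DataProduct):
        # Registration is all a domain team has to do; storage, access control
        # and discovery would be handled by the platform behind this call.
        self._catalog[product.name] = product

    def discover(self, name: str) -> DataProduct:
        return self._catalog[name]

platform = SelfServePlatform()
platform.publish(DataProduct("clinical-visits", "care-delivery", "data-product-owner",
                             {"visit_id": "str", "date": "date"}))
print(platform.discover("clinical-visits").owning_domain)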
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ... - David Peyruc
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons Learned in Academic and Life Science Settings
Dan Housman, Recombinant by Deloitte
The Recombinant by Deloitte team has worked with organizations such as Kimmel Cancer Center as a model to adapt existing mature i2b2 implementations to meet business and scientific needs. Other organizations are increasingly focused on how to use cloud and high performance computing models to achieve different performance levels. Advanced initiatives are progressing to link commercial tools such as Qlikview to explore tranSMART data and to solve key gaps in scientific pipelines. Dan will present recent lessons learned, new capabilities, and some of the impact on the path forward for future tranSMART updates.
A Framework for Geospatial Web Services for Public Health by Dr. Leslie Lenert - Wansoo Im
A Framework for Geospatial Web Services for Public Health
by Leslie Lenert, MD, MS, FACMI, Director
National Center for Public Health Informatics, CCHIS, CDC
June 8, 2009, URISA Public Health Conference
uploaded by Wansoo Im, Ph.D.
URISA Membership Committee Chair
http://www.gisinpublichealth.org
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ... - Editor IJAIEM
Dr. G. Anandharaj (1), Dr. P. Srimanchari (2)
(1) Associate Professor and Head, Department of Computer Science, Adhiparasakthi College of Arts and Science (Autonomous), Kalavai, Vellore (Dt) - 632506
(2) Assistant Professor and Head, Department of Computer Applications, Erode Arts and Science College (Autonomous), Erode (Dt) - 638001
ABSTRACT
With the unpredictable increase in mobile apps, more and more threats migrate from the outmoded PC client to the mobile device. Compared with the traditional Windows-Intel alliance on the PC, the Android alliance dominates in the Mobile Internet, and apps replace PC client software as the foremost target of malicious usage. In this paper, to improve the security status of recent mobile apps, we propose a methodology to evaluate mobile apps based on a cloud computing platform and data mining. Compared with traditional methods, such as permission-pattern-based methods, it combines dynamic and static analysis methods to comprehensively evaluate an Android application.
The Internet of Things (IoT) indicates a worldwide network of interconnected items uniquely addressable via standard communication protocols. Accordingly, to prepare us for the forthcoming invasion of things, a tool called data fusion can be used to manipulate and manage such data in order to improve processing efficiency and provide advanced intelligence. In this paper, we propose an efficient multidimensional fusion algorithm for IoT data based on partitioning. Finally, attribute reduction and rule extraction methods are used to obtain the synthesis results. By means of proving a few theorems and by simulation, the correctness and effectiveness of this algorithm are illustrated.
This paper also introduces and investigates large iterative multitier ensemble (LIME) classifiers specifically tailored for big data. These classifiers are very hefty, but are quite easy to generate and use. They can be so large that it makes sense to use them only for big data. Our experiments compare LIME classifiers with various base classifiers and standard ordinary ensemble meta classifiers. The results obtained demonstrate that LIME classifiers can significantly increase the accuracy of classifications: LIME classifiers performed better than the base classifiers and the standard ensemble meta classifiers.
Keywords: LIME classifiers, ensemble meta classifiers, Internet of Things, Big data
Project 1 Template
(Due on Week 4)
Name:
Date:
Executive summary
Briefly (no more than one page) describe the requirements and introduce the topics of the project.
List three benefits of using Azure
Explain what the company is struggling with and how the company could benefit from using the cloud.
Explain Azure cloud types and deployment models
Provide a detailed overview of Azure cloud types and deployment models.
Define Azure terms
Explain each term in detail:
· tenant
· management group
· subscription
· resource group
· resources
Explain the significance of FedRAMP
Explain the need for certification by the Federal Risk and Authorization Management Program (FedRAMP) and what security assurance it provides.
Propose an Azure governance model
The governance model should include the following:
· identity management
· access management groups
· security controls
· network services
· blueprints
References
Include at least five references using 7th edition APA citation style.
Cyber Domain Consultants
Cloud Services and Technology
Olugbenga Adeyemo
University of Maryland Global Campus
CCA 610 9042 Cloud Services and Technologies
Professor Mike Varnado
October 15, 2022
IT Business Requirements
Requirement analysis, also referred to as requirements engineering, is the process of identifying users' expectations for a new product.
As Ballot Online transitions to cloud functionality, one route to determining the requirements is to break business requirements into functional and nonfunctional requirements.
Functional requirements are the origin, or heartbeat, of a system; they specify what the system needs to fulfill. Examples of functional requirements are authenticating users logging into the system and a sales system allowing users to record customer sales.
Nonfunctional requirements describe the supporting elements of functional requirements: they state how the system should behave and place constraints on that behavior. An example of a nonfunctional requirement is speed, which determines how the system manages workloads when different applications are used at the same time.
Functional Requirements for Ballot Online
These requirements describe what the Ballot Online application ought to do.
The functional requirements for Ballot Online will look like:
1. Client focus: a. Clients should be able to log in and out of the application.
b. Unauthorized clients or users should not be able to log in or access the application.
c. The system should allow new clients to register without any issues.
2. Transaction, correction, adjustment, and cancellation:
a. Voters should be able to adjust, make corrections, and cancel their vote before they cast their votes.
3. Authentication: a. The system should restrict voters from casting multiple votes.
b. There should be a notification when a vote is cast.
c. Voters should see the ballot displayed on one screen.
4. Authorization level: During the voting session the system will give access to.
Presentation given at the Consorcio Madrono conference on Data Management Plans in Horizon 2020 http://www.consorciomadrono.es/info/web/blogs/formacion/217.php
Learn more about Hitachi Content Platform Anywhere by visiting http://www.hds.com/products/file-and-content/hitachi-content-platform-anywhere.html
and more information on the Hitachi Content Platform is at http://www.hds.com/products/file-and-content/content-platform
In this paper, the authors describe an approach for sharing sensitive medical data with the consent of the data owner. The framework builds on the advantages of Semantic Web technologies and makes it secure and robust to share sensitive information in a controlled environment. The framework uses a combination of Role-Based and Rule-Based Access Policies to provide security to a medical data repository as per the FAIR guidelines. A lightweight ontology was developed to collect consent from users, indicating which part of their data they want to share with another user having a particular role. Here, the authors have considered the scenario of the owner of the data, say the patient, sharing medical data with relevant persons such as physicians, researchers, pharmacists, etc. To prove this concept, the authors developed a prototype and validated it using the Sesame OpenRDF Workbench with 202,908 triples and a consent graph stating consents per patient.
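A hedged sketch of what such a consent check could look like over an RDF graph, using Python's rdflib: the vocabulary terms (consentsToShare, visibleToRole) are hypothetical stand-ins for the paper's ontology, which the abstract does not spell out.

from rdflib import Graph, Namespace

EX = Namespace("https://example.org/consent#")

g = Graph()
patient = EX.patient42
# Patient consents to share their lab results; the role Physician may see them.
g.add((patient, EX.consentsToShare, EX.labResults))
g.add((EX.labResults, EX.visibleToRole, EX.Physician))

def may_access(graph, patient_uri, data_part, role):
    """True only if the patient consented to share data_part AND the role is permitted."""
    consented = (patient_uri, EX.consentsToShare, data_part) in graph
    role_ok = (data_part, EX.visibleToRole, role) in graph
    return consented and role_ok

print(may_access(g, patient, EX.labResults, EX.Physician))   # True
print(may_access(g, patient, EX.labResults, EX.Pharmacist))  # False: no consent for this role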
SURVEY ON DYNAMIC DATA SHARING IN PUBLIC CLOUD USING MULTI-AUTHORITY SYSTEM - ijiert bestjournal
With the continuous development of cloud computing, several trends are opening up new forms of outsourcing. Public data integrity auditing is not secure and efficient for shared dynamic data. An existing scheme addresses the collusion attack and provides efficient public integrity auditing, with secure group user revocation based on vector commitment and verifier-local revocation group signatures; it supports public checking and efficient user revocation. A problem with the existing work is that it uses a TPA (third-party auditor) for key generation and key agreement: with the TPA as a central system, if it fails, the whole system fails. Moreover, when working with the cloud, user identity is a major concern, because users do not want to reveal their personal information to the public, and this concept is not included in the existing scheme. In this paper, based on these cons, we propose dynamic data sharing in a public cloud using a multi-authority system. The proposed scheme is able to protect users' privacy against each single authority.
Architectures for Data Commons (XLDB 15 Lightning Talk) - Robert Grossman
These are the slides from a 5 minute Lightning Talk that I gave at XLDB 2015 on May 19, 2015 at Stanford. It is based in part on our experiences developing the NCI Genomic Data Commons (GDC).
Practical Methods for Identifying Anomalies That Matter in Large Datasets - Robert Grossman
Robert L. Grossman, Practical Methods for Identifying Anomalies That Matter in Large Datasets, O’Reilly, Strata + Hadoop World, San Jose, California, February 20, 2015.
Adversarial Analytics - 2013 Strata & Hadoop World Talk - Robert Grossman
This is a talk I gave at the Strata Conference and Hadoop World in New York City on October 28, 2013. It describes predictive modeling in the context of modeling an adversary's behavior.
The Matsu Project - Open Source Software for Processing Satellite Imagery Data - Robert Grossman
The Matsu Project is an Open Cloud Consortium project that is developing open source software for processing satellite imagery data using Hadoop, OpenStack and R.
Using the Open Science Data Cloud for Data Science Research - Robert Grossman
The Open Science Data Cloud is a petabyte scale science cloud for managing, analyzing, and sharing large datasets. We give an overview of the Open Science Data Cloud and how it can be used for data science research.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with the precondition that the input graph have no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads and is expected to be a non-issue when the computation is performed on massive graphs.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps reduce duplicate computations and thus could reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
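As an illustration of the first of these optimizations, here is a minimal sketch of PageRank that skips vertices whose ranks have already converged. It assumes the toy graph has no dangling nodes; it illustrates the convergence-skipping idea only and is not the STICD implementation itself.

def pagerank_skip_converged(graph, damping=0.85, tol=1e-10, max_iter=100):
    # graph: dict of vertex -> list of out-neighbors; every vertex is a key.
    verts = list(graph)
    n = len(verts)
    rank = {v: 1.0 / n for v in verts}
    # Build the reverse adjacency (in-links) once.
    inlinks = {v: [] for v in verts}
    for u, outs in graph.items():
        for v in outs:
            inlinks[v].append(u)
    converged = set()
    for _ in range(max_iter):
        new_rank = {}
        for v in verts:
            if v in converged:            # skip vertices that have settled
                new_rank[v] = rank[v]
                continue
            s = sum(rank[u] / len(graph[u]) for u in inlinks[v])
            new_rank[v] = (1 - damping) / n + damping * s
            if abs(new_rank[v] - rank[v]) < tol:
                converged.add(v)          # freeze this vertex from now on
        rank = new_rank
        if len(converged) == n:
            break
    return rank

print(pagerank_skip_converged({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))

Note that freezing converged vertices is a heuristic: their ranks could still drift slightly as neighbors keep updating, which is the trade-off the paragraph above alludes to.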
StarCompliance is a leading firm specializing in the recovery of stolen cryptocurrency. Our comprehensive services are designed to assist individuals and organizations in navigating the complex process of fraud reporting, investigation, and fund recovery. We combine cutting-edge technology with expert legal support to provide a robust solution for victims of crypto theft.
Our Services Include:
Reporting to Tracking Authorities:
We immediately notify all relevant centralized exchanges (CEX), decentralized exchanges (DEX), and wallet providers about the stolen cryptocurrency. This ensures that the stolen assets are flagged as scam transactions, making it impossible for the thief to use them.
Assistance with Filing Police Reports:
We guide you through the process of filing a valid police report. Our support team provides detailed instructions on which police department to contact and helps you complete the necessary paperwork within the critical 72-hour window.
Launching the Refund Process:
Our team of experienced lawyers can initiate lawsuits on your behalf and represent you in various jurisdictions around the world. They work diligently to recover your stolen funds and ensure that justice is served.
At StarCompliance, we understand the urgency and stress involved in dealing with cryptocurrency theft. Our dedicated team works quickly and efficiently to provide you with the support and expertise needed to recover your assets. Trust us to be your partner in navigating the complexities of the crypto world and safeguarding your investments.
[Internal study session materials: Octo: An Open-Source Generalist Robot Policy]
Some Proposed Principles for Interoperating Cloud Based Data Platforms
1. Some Proposed Principles for Interoperating Cloud Based Data Platforms
Robert L. Grossman
Center for Translational Data Science, University of Chicago, and Open Commons Consortium
NIH Workshop on Cloud-Based Platforms Interoperability
October 3, 2019
Draft 1.5
2. Josh Denny (Vanderbilt), David Glazer (Verily Life Sciences), Robert L. Grossman (University of Chicago), Benedict Paten (University of California at Santa Cruz), Anthony Philippakis (Broad Institute)
3. Data Biosphere Principles:
1. modular, composed of functional components with well-specified interfaces;
2. community-driven, created by many groups to foster a diversity of ideas;
3. open, developed under open-source licenses that enable extensibility and reuse, with users able to add custom, proprietary modules as needed; and
4. standards-based.
[Figure: Examples of Data Environments - ingest, store, explore and analysis components for the HCA, CRDC (methods repo, workspaces, analysis engine, use in cloud) and AoU platforms, linking data generators, portals and researchers. Courtesy of Anthony Philippakis, Broad Institute.]
4. The question today: how do we go from building data commons to building data ecosystems of interoperating data resources, computational resources, and applications that explore, analyze, visualize and share data and knowledge?
[Figure: from cloud-based platforms to cloud-based data ecosystems of multiple platforms.]
6. Some Problems Today
• Platforms that refuse to expose any API and instead require all users to use their platform or application, usually for competitive reasons.
• Platforms that bring data from other resources and platforms into their system, but don't let your data out.
• Platforms that don't interoperate with other systems with the same or greater security and compliance, and blame security and compliance.
7. Incentives / Disincentives for Interoperating
[Figure: stakeholder groups (USG / NFP / for-profits; platform builders / platform operators; researchers / research consortiums; patients / data generators; patient-partnered research) annotated with many, some, or fewer incentives to interoperate.]
8. Let's Distinguish: Technical Guidelines vs Operating Principles
• Common vision: we have a common vision of interoperating to accelerate research, improve patient outcomes and leverage resources.
• Operating principles include questions about which platforms can interoperate, whether a platform will expose an API, whether a platform will be open and support different applications or will be closed and only support a single application, etc.
• Technical guidelines can follow technical best practices (e.g. use a persistent digital ID not tied to a particular domain or location within a domain) or standards (e.g. GA4GH TES).
It may be helpful to think of policies as on an orthogonal axis.
9. Principles To Support a Data Ecosystem
Please:
• Use digital IDs
• Interoperate with third party authentication and authorization services
• Expose your data through an API
• Expose your data model through an API
• Interoperate with other trusted data platforms with similar security & compliance
• Process authorized queries and computations from other systems and return the results (scatter / gather)
Please don't:
• Refuse to expose any API and instead require all users to use your platform or application
• Bring data from other resources and platforms into your system, but don't let your data out
• Refuse to interoperate with other systems with the same or greater security and compliance
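As a hypothetical illustration of the "expose your data through an API" principle (not part of the original slides), this sketch serves dataset records as JSON using only the Python standard library; a real platform would layer authentication, authorization and persistent digital IDs on top, as the slides recommend.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy in-memory catalog; a real commons would back this with a data store.
DATASETS = {"ds-001": {"id": "ds-001", "title": "Example dataset"}}

class DataAPI(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /datasets/ds-001 returns the record as JSON.
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "datasets" and parts[1] in DATASETS:
            body = json.dumps(DATASETS[parts[1]]).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), DataAPI).serve_forever()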
10. Narrow Middle Architecture
*Robert L. Grossman, Progress Towards Cancer Data Ecosystems, The Cancer Journal: The Journal of Principles & Practice of Oncology, May/June 2018.
11. Architectures for Data Ecosystems
• A simple data ecosystem can be built when a data commons exposes an API that can support a collection of third party applications that can access data from the commons.
• More complex data ecosystems arise when multiple data commons and data clouds can interoperate and support a collection of third party applications by using a common set of core services (called framework services) that provide support for authentication, authorization, digital IDs, metadata, and the importing, exporting and harmonization of phenotype data, etc.
[Figure: bioinformaticians curating and submitting data, and researchers analyzing data and making discoveries, connected through cloud-based platforms, container-based workspaces, ML/AI apps, notebooks and data commons. Framework services cover authentication, authorization, digital IDs, and the importing, exporting & harmonization of clinical data, and can have multiple implementations that trust each other and interoperate.]
12. Towards a Definition of a Trusted Platform
• Before we discuss the operating principles, we need one definition. Let's say that Platform A trusts Platform B (so that B is a trusted platform) if: i) Platform B operates with a set of policies, procedures and controls that have been reviewed and approved by Platform A; and ii) the organizations associated with Platform A and Platform B have a formal signed agreement describing any costs, liabilities, intellectual property issues, data or data use limitations, etc. that may be associated with the interoperation of the two platforms.
• As an example, two data commons that both operate with FISMA Moderate security and compliance (or more generally follow NIST 800-53) and are operated by two different NIH Institutes or Centers would, in general, treat each other as trusted platforms.
• With this definition, two platforms directly trust each other. At the end we look at more general trust relationships among members of a consortium or other larger organization.
13. Proposed Operating Principles (Draft 1.5)
1. Interoperate with other trusted platforms: if another trusted platform is part of your data ecosystem or wants to create an ecosystem with you, then interoperate with it.
2. Follow the golden rule of data resources: if you take someone else's data, let them have access to your data (assuming you have, or can establish, a trust relationship with them).
14. Proposed Operating Principles (Draft 1.5)
3. Support the principle of least restrictive access: provide another trusted platform access to your data in the least restrictive manner possible.
- With rare exceptions, a data resource should provide an API so that applications in other trusted platforms can access data directly.
- If this is not possible due to the sensitivity of your data, then support the ability for approved queries or analyses to be run over your data and the results returned. Sometimes this is called an analysis or query gateway.
15. Proposed Operating Principles (Draft 1.5)
4. Agree on standards, compete on implementations:
- It is important to open up your ecosystem to competition, lest it stagnate.
- What this principle means is that a platform should expose its data and resources via APIs so that other applications and systems can be part of your ecosystem.
- It is not necessary for the sponsor of a data resource to fund other systems or applications, but it is important not to implicitly create a monopoly by requiring all users of your data to use a particular application or system.
- Remember that not all researchers have the same requirements, or the same preferences, and in general a mix of applications, systems and platforms is better than requiring the use of a single application or system.
16. Proposed Operating Principles (Draft 1.5)
5. Support patient partnered research: support patient partnered research so that individuals can provide their data and have control over it within your system. If you cannot do this today, add this to your platform roadmap.
17. Trusted Platforms
• A trust relationship between two resources in a data ecosystem requires agreements between two organizations about a number of matters, including security, compliance, liability, data egress charges, and infrastructure costs.
• For this reason, a formal agreement between two different organizations, or a memo between two different units within an organization or agency, is usually required.
• As an example, an Interconnection Security Agreement (ISA) between two platforms would serve this purpose.
• A consortium of platforms can also sign formal agreements, for example, the Open Commons Consortium agreements for the BloodPAC Consortium.
[Figure: bilateral trust relationships, consortium trust relationships, federated trust relationships, and an isolated platform.]
19. For More Information
Robert L. Grossman, Some Proposed Principles for Interoperating Data Commons, Medium, October 1, 2019, http://bit.ly/222QYY
Robert L. Grossman
robert.grossman@uchicago.edu
@BobGrossman