I review three frameworks for analytic operations that are designed to improve the value obtained when deploying analytic models into products, services and internal operations.
This is an overview of the Data Biosphere Project, its goals, its architecture, and the three core projects that form its foundation. We also discuss data commons.
This is a talk that I gave at BioIT World West on March 12, 2019. The talk was called: A Gen3 Perspective of Disparate Data: From Pipelines in Data Commons to AI in Data Ecosystems.
Crossing the Analytics Chasm and Getting the Models You Developed Deployed - Robert Grossman
There are two cultures in data science and analytics: those who develop analytic models and those who deploy analytic models into operational systems. In this talk, we review the life cycle of analytic models and provide an overview of some of the approaches that have been developed for managing analytic models and workflows and for deploying them, including analytic engines and analytic containers. We give a quick overview of languages for analytic models (PMML) and analytic workflows (PFA). We also describe the emerging discipline of AnalyticOps, which has borrowed some of the techniques of DevOps.
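The separation between model developers and model deployers is the motivation for interchange formats like PMML and PFA: the modeling team exports a declarative model specification, and a generic scoring engine in operations executes it. The sketch below illustrates only the idea, not actual PMML (which is XML) or PFA (which is JSON-based); the model spec, field names, and numbers are all hypothetical.

```python
# Minimal sketch of the model-interchange idea behind PMML/PFA (illustrative
# only). A model is shipped as a declarative spec; a generic "analytic engine"
# deployed in operations can score records against any spec it is handed,
# so model updates never require redeploying engine code.

def score(spec, record):
    """Apply a linear-model spec to one input record."""
    total = spec["intercept"]
    for feature, weight in spec["coefficients"].items():
        total += weight * record.get(feature, 0.0)
    return 1 if total >= spec["threshold"] else 0

# Hypothetical spec the modeling team hands over to operations.
churn_model = {
    "intercept": -1.0,
    "coefficients": {"logins_per_week": -0.5, "support_tickets": 0.8},
    "threshold": 0.0,
}

print(score(churn_model, {"logins_per_week": 1, "support_tickets": 3}))  # 1
print(score(churn_model, {"logins_per_week": 6, "support_tickets": 0}))  # 0
```

Because the spec is data rather than code, the same engine can host many models side by side, which is the property that makes analytic engines and containers practical to operate.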
What is a Data Commons and How Can Your Organization Build One? - Robert Grossman
This is a talk that I gave at the Molecular Medicine Tri Conference on data commons and data sharing to accelerate research discoveries and improve patient outcomes. It also covers how your organization can build a data commons using the Open Commons Consortium's Data Commons Framework and the University of Chicago's Gen3 data commons platform.
Making Data FAIR (Findable, Accessible, Interoperable, Reusable) - Tom Plasterer
What to do About FAIR…
In the experience of most pharma professionals, FAIR remains fairly abstract, bordering on inconclusive. This session outlines specific case studies (real problems with real data) and addresses opportunities and real concerns.
Why making data Findable, Accessible, Interoperable and Reusable is important.
Talk presented at the Data Driven Drug Development (D4) conference on March 20th, 2019.
FAIR Data Knowledge Graphs – From Theory to Practice - Tom Plasterer
FAIR data has flown up the hype curve without a clear sense of return from the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph. It enables you to richly address novel questions of your and the world’s data. We started with data catalogues (findability) which exploited linked/referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability) to make this happen. Our processes enable simple creation of dataset records and linking to source data, providing a seamless federated knowledge graph for novice and advanced users alike.
Presented May 7th, 2019 at the Knowledge Graph Conference, Columbia University.
FAIR data has flown up the hype curve without a clear sense of return from the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph. It enables you to richly address novel questions of your and the world’s data. We started with data catalogues (findability) which exploited linked/referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability) to make this happen.
This talk was presented at The Molecular Medicine Tri-Conference/Bio-IT West on March 11, 2019.
Brown Bag Talk with Micah Altman: Integrating Open Data into Open Access Journals - Micah Altman
This talk is part of the MIT Program on Information Science brown bag series (http://informatics.mit.edu).
This talk discusses findings from an analysis of data sharing and citation policies in Open Access journals and describes a set of novel tools for open data publication in open access journal workflows. Bring your lunch and enjoy a discussion fit for scholars, Open Access fans, and students alike.
Dr. Micah Altman is Director of Research and Head Scientist of the Program on Information Science for the MIT Libraries, at the Massachusetts Institute of Technology.
Adversarial Analytics - 2013 Strata & Hadoop World Talk - Robert Grossman
This is a talk I gave at the Strata Conference and Hadoop World in New York City on October 28, 2013. It describes predictive modeling in the context of modeling an adversary's behavior.
BioPharma and FAIR Data, a Collaborative Advantage - Tom Plasterer
The concept of FAIR (Findable, Accessible, Interoperable and Reusable) data is becoming a reality as stakeholders from industry, academia, funding agencies and publishers embrace this approach. For BioPharma, the ability to share and reuse data effectively is a tremendous competitive advantage, within a company as well as with peer organizations, key opinion leaders and regulatory agencies. A few key drivers, success stories and preliminary results of an industry data stewardship survey are presented.
Lesson 8 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
DataTags: Sharing Privacy Sensitive Data by Michael Bar-Sinai - datascienceiqss
The DataTags framework makes it easy for data producers to deposit, data publishers to store and distribute, and data users to access and use datasets containing confidential information, in a standardized and responsible way. The talk will first introduce the concepts and tools behind DataTags, and then focus on the user-facing component of the system, the Tagging Server (available today at datatags.org). We will conclude by describing how future versions of Dataverse will use DataTags to automatically handle sensitive datasets that can only be shared under some restrictions.
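The core of the approach is mapping answers about a dataset to one of a small set of standardized handling levels. The sketch below is our own rough illustration of that idea, not the actual DataTags decision graphs; the screening questions and the tag assignments are made up for the example.

```python
# Illustrative sketch (NOT the real DataTags policy logic): a tiny
# questionnaire that maps properties of a dataset to a handling tag,
# in the spirit of the standardized levels described at datatags.org.

TAGS = ["blue", "green", "yellow", "orange", "red"]  # least -> most restrictive

def tag_dataset(contains_identifiers, consent_allows_sharing, harm_if_disclosed):
    """Return a handling tag from simple yes/no screening answers."""
    if not contains_identifiers:
        return "blue"                      # public data, no restrictions
    if consent_allows_sharing and harm_if_disclosed == "minimal":
        return "green"                     # shareable with attribution
    if harm_if_disclosed == "minimal":
        return "yellow"                    # credentialed access
    if harm_if_disclosed == "moderate":
        return "orange"                    # access under agreement
    return "red"                           # strongest controls

print(tag_dataset(False, True, "minimal"))   # blue
print(tag_dataset(True, False, "severe"))    # red
```

A real tagging server encodes far richer legal and policy logic, but the payoff is the same: a repository can attach one machine-readable tag to a dataset and enforce storage and sharing rules automatically.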
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ... - Tom Plasterer
As scientists in the life sciences, we are trained to pursue singular goals around a publication, a validated target or a drug submission. Our failure rates are exceedingly high, especially as we move closer to patients in the attempt to collect sufficient clinical evidence to demonstrate the value of novel therapeutics. This wastes resources as well as the time of patients depending on us for the next breakthrough.
Edge Informatics is an approach to ameliorating these failures. By using technical and social solutions together, knowledge can be shared and leveraged across the drug development process. This is accomplished by making data assets discoverable, accessible, self-described, reusable and annotatable. The Open PHACTS project pioneered this approach and has provided a number of the technical and social solutions that enable Edge Informatics. A number of pre-competitive consortia and some content providers have also embraced this approach, facilitating networks of collaborators within and outside a given organization. Taken together, more accurate, timely and inclusive decision-making is fostered.
Lesson 2 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/education-modules. Released under a CC0 license; attribution and citation requested.
Dataset Catalogs as a Foundation for FAIR* Data - Tom Plasterer
BioPharma and the broader research community are faced with the challenge of simply finding the appropriate internal and external datasets for downstream analytics, knowledge generation and collaboration. With datasets as the core asset, we wanted to promote both human and machine exploitability, using the web-centric data cataloguing principles described in the W3C Data on the Web Best Practices. To do so, we adopted DCAT (Data Catalog Vocabulary) and VoID (Vocabulary of Interlinked Datasets) for both RDF and non-RDF datasets at the summary, version and distribution levels. Further, we described datasets using a limited set of well-vetted public vocabularies, focused on the cross-omics analytes and clinical features of the catalogued datasets.
As BioPharma adapts to incorporate nimble networks of suppliers, collaborators, and regulators, the ability to link data is critical for dynamic interoperability. Adoption of the linked data paradigm allows BioPharma to focus on its core business: delivering valuable therapeutics in a timely manner.
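A DCAT-style catalogue record of the kind described above might look roughly like the following. This is a hand-rolled sketch with hypothetical URIs and dataset names; a real deployment would build DCAT/VoID records with an RDF library rather than string formatting.

```python
# Rough sketch of emitting a DCAT-style dataset record in Turtle syntax.
# All URIs and dataset details below are hypothetical examples.

def dcat_record(uri, title, version, distributions):
    """Render one dataset entry with its distributions as Turtle text."""
    lines = [
        f"<{uri}> a dcat:Dataset ;",
        f'    dct:title "{title}" ;',
        f'    dct:hasVersion "{version}" ;',
    ]
    for dist_uri in distributions:
        lines.append(f"    dcat:distribution <{dist_uri}> ;")
    # Close the final statement with "." instead of ";".
    lines[-1] = lines[-1].rstrip(" ;") + " ."
    return "\n".join(lines)

record = dcat_record(
    "http://example.org/dataset/rnaseq-001",    # hypothetical identifier
    "RNA-seq expression study",
    "1.2",
    ["http://example.org/dataset/rnaseq-001/csv"],
)
print(record)
```

Records with this shape are what make datasets findable by machines as well as humans: a crawler or catalogue UI only needs to understand the shared vocabulary, not each dataset's internal format.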
Data Harmonization for a Molecularly Driven Health System - Warren Kibbe
Maximizing the value of data, computing, and data science in an academic medical center, or "towards a molecularly informed Learning Health System." Given in October at the University of Florida in Gainesville.
Big Data Repository for Structural Biology: Challenges and Opportunities by P... - datascienceiqss
SBGrid (Morin et al., 2013, eLIFE and www.sbgrid.org) is a Harvard-based structural biology global computing consortium with a primary focus on the curation of research software. Dr. Sliz will discuss a recent SBGrid project that aims to establish a repository for experimental datasets from SBGrid laboratories. Issues of handling large data volumes, data validation and repository sustainability will be addressed in this talk.
DataONE Education Module 01: Why Data Management? - DataONE
Lesson 1 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/education-modules. Released under a CC0 license; attribution and citation requested.
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ... - Tom Plasterer
Edge Informatics is an approach to accelerating collaboration in the BioPharma pipeline. By combining technical and social solutions, knowledge can be shared and leveraged across the multiple internal and external silos participating in the drug development process. This is accomplished by making data assets findable, accessible, interoperable and reusable (FAIR). Public consortia and internal efforts embracing FAIR data and Edge Informatics are highlighted, in both preclinical and clinical domains.
This talk was presented at the Molecular Medicine Tri-Conference in San Francisco, CA on February 20, 2017.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/09/responsible-ai-tools-and-frameworks-for-developing-ai-solutions-a-presentation-from-intel/
Mrinal Karvir, Senior Cloud Software Engineering Manager at Intel, presents the “Responsible AI: Tools and Frameworks for Developing AI Solutions” tutorial at the May 2023 Embedded Vision Summit.
Over 90% of businesses using AI say trustworthy and explainable AI is critical to business, according to Morning Consult’s IBM Global AI Adoption Index 2021. If not designed with responsible considerations of fairness, transparency, preserving privacy, safety and security, AI systems can cause significant harm to people and society and result in financial and reputational damage for companies.
How can we take a human-centric approach to design AI solutions? How can we identify different types of bias and what tools can we use to mitigate those? What are model cards, and how can we use them to improve transparency? What tools can we use to preserve privacy and improve security? In this talk, Karvir discusses practical approaches to adoption of responsible AI principles. She highlights relevant tools and frameworks and explores industry case studies. She also discusses building a well-defined response plan to help address an AI incident efficiently.
A top-down look at current industry and technology trends for Big Data, Data Analytics and Machine Learning (cognitive technologies, AI, etc.). New slides added for an Ark Group presentation on 1st December 2016.
Explainability for Natural Language Processing - Yunyao Li
Tutorial at AACL'2020 (http://www.aacl2020.org/program/tutorials/#t4-explainability-for-natural-language-processing).
More recent version: https://www.slideshare.net/YunyaoLi/explainability-for-natural-language-processing-249912819
Title: Explainability for Natural Language Processing
@article{aacl2020xaitutorial,
  title={Explainability for Natural Language Processing},
  author={Dhanorkar, Shipi and Li, Yunyao and Popa, Lucian and Qian, Kun and Wolf, Christine T. and Xu, Anbang},
  journal={AACL-IJCNLP 2020},
  year={2020}
}
Presenters: Shipi Dhanorkar, Christine Wolf, Kun Qian, Anbang Xu, Lucian Popa and Yunyao Li
Video: https://www.youtube.com/watch?v=3tnrGe_JA0s&feature=youtu.be
Abstract:
We propose a cutting-edge tutorial that investigates the issues of transparency and interpretability as they relate to NLP. Both the research community and industry have been developing new techniques to render black-box NLP models more transparent and interpretable. Reporting from an interdisciplinary team of social science, human-computer interaction (HCI), and NLP researchers, our tutorial has two components: an introduction to explainable AI (XAI) and a review of the state-of-the-art for explainability research in NLP; and findings from a qualitative interview study of individuals working on real-world NLP projects at a large, multinational technology and consulting corporation. The first component will introduce core concepts related to explainability in NLP. Then, we will discuss explainability for NLP tasks and report on a systematic literature review of the state-of-the-art literature in AI, NLP, and HCI conferences. The second component reports on our qualitative interview study which identifies practical challenges and concerns that arise in real-world development projects which include NLP.
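One of the simplest techniques in the family this tutorial surveys is occlusion, or leave-one-out, attribution. The sketch below is our own toy illustration rather than anything from the tutorial: it removes each token in turn and treats the resulting change in the score of a trivial lexicon "model" as that token's importance.

```python
# Toy leave-one-out (occlusion) attribution for a lexicon-based sentiment
# scorer. Real XAI-for-NLP work applies this idea to neural models, where
# the score is the model's output probability; the lexicon here just keeps
# the example self-contained.

LEXICON = {"great": 2.0, "good": 1.0, "bad": -1.0, "awful": -2.0}

def sentiment(tokens):
    """Sum of lexicon weights; unknown tokens contribute 0."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)

def occlusion_attributions(tokens):
    """Attribution of each token = score drop when that token is removed.
    (Assumes distinct tokens; duplicates would share one dict key.)"""
    full = sentiment(tokens)
    return {
        t: full - sentiment(tokens[:i] + tokens[i + 1:])
        for i, t in enumerate(tokens)
    }

attrs = occlusion_attributions(["the", "movie", "was", "great"])
print(attrs)  # only "great" carries weight; every other token gets 0.0
```

The appeal of occlusion is that it is model-agnostic: it needs only the ability to re-score a perturbed input, which is why it recurs across the NLP explainability literature despite its cost of one forward pass per token.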
Better Living Through Analytics - Strategies for Data Decisions - Product School
Data is king! Get ready to understand how a successful analytics team can empower managers from product, marketing, and other areas to make effective, data-driven decisions.
Louis Cialdella, a data scientist at ZipRecruiter, shared some case studies and successful strategies that he has used at ZipRecruiter as well as previous experiences. The purpose of this data talk was to enlighten people on how to make sure that analysts can successfully partner with other departments and get them the information they need to do great things.
Machine Learning: Addressing the Disillusionment to Bring Actual Business Ben... - Jon Mead
‘Machine learning’ is one of those cringy phrases, almost (if not already) taboo in the world of high-tech SaaS. Applying true machine learning to an organization’s product(s), however, can have real benefit for the business, its clients, and the industry as a whole. From credit card fraud investigations to the way that a car is built, machine learning has permeated our everyday life without a common understanding of what it is and how to implement it.
Explainability for Natural Language Processing - Yunyao Li
NOTE: Please check out the final version here with small but important updates and links to downloadable version and recording: https://www.slideshare.net/YunyaoLi/explainability-for-natural-language-processing-249992241
Updated version of our popular tutorial on "Explainability for Natural Language Processing", given as a tutorial at KDD'2021.
Title: Explainability for Natural Language Processing
@article{kdd2021xaitutorial,
  title={Explainability for Natural Language Processing},
  author={Danilevsky, Marina and Dhanorkar, Shipi and Li, Yunyao and Popa, Lucian and Qian, Kun and Xu, Anbang},
  journal={KDD},
  year={2021}
}
Presenters: Marina Danilevsky, Shipi Dhanorkar, Yunyao Li, Lucian Popa, Kun Qian and Anbang Xu
Website: http://xainlp.github.io/
Abstract:
This lecture-style tutorial, which mixes in an interactive literature browsing component, is intended for the many researchers and practitioners working with text data and on applications of natural language processing (NLP) in data science and knowledge discovery. The focus of the tutorial is on the issues of transparency and interpretability as they relate to building models for text and their applications to knowledge discovery. As black-box models have gained popularity for a broad range of tasks in recent years, both the research and industry communities have begun developing new techniques to render them more transparent and interpretable. Reporting from an interdisciplinary team of social science, human-computer interaction (HCI), and NLP/knowledge management researchers, our tutorial has two components: an introduction to explainable AI (XAI) in the NLP domain and a review of the state-of-the-art research; and findings from a qualitative interview study of individuals working on real-world NLP projects as they are applied to various knowledge extraction and discovery tasks at a large, multinational technology and consulting corporation. The first component will introduce core concepts related to explainability in NLP. Then, we will discuss explainability for NLP tasks and report on a systematic literature review of the state-of-the-art literature in AI, NLP and HCI conferences. The second component reports on our qualitative interview study, which identifies practical challenges and concerns that arise in real-world development projects that require the modeling and understanding of text data.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2J5SAcT.
Detlef Nauck explains why the testing of data is essential, as it not only drives the machine learning phase itself, but it is paramount for producing reliable predictions after deployment. Testing the decisions made by a deployed machine learning model is equally important to understand if it delivers the expected business value. Filmed at qconlondon.com.
Detlef Nauck is Chief Research Scientist for Data Science with BT's Research and Innovation Division. He is leading a group of scientists working on research into Data Science, ML and AI. He focuses on establishing best practices in data science for conducting analytics professionally and responsibly, leading to new ways of analysing data and achieving better insights.
Similar to Some Frameworks for Improving Analytic Operations at Your Company
Architectures for Data Commons (XLDB 15 Lightning Talk) - Robert Grossman
These are the slides from a 5 minute Lightning Talk that I gave at XLDB 2015 on May 19, 2015 at Stanford. It is based in part on our experiences developing the NCI Genomic Data Commons (GDC).
Practical Methods for Identifying Anomalies That Matter in Large Datasets - Robert Grossman
Robert L. Grossman, Practical Methods for Identifying Anomalies That Matter in Large Datasets, O’Reilly, Strata + Hadoop World, San Jose, California, February 20, 2015.
The Matsu Project - Open Source Software for Processing Satellite Imagery Data - Robert Grossman
The Matsu Project is an Open Cloud Consortium project that is developing open source software for processing satellite imagery data using Hadoop, OpenStack and R.
Using the Open Science Data Cloud for Data Science Research - Robert Grossman
The Open Science Data Cloud is a petabyte scale science cloud for managing, analyzing, and sharing large datasets. We give an overview of the Open Science Data Cloud and how it can be used for data science research.
These are the slides from a plenary panel that I participated in at IEEE Cloud 2011 on July 5, 2011 in Washington, D.C. I discussed the Open Science Data Cloud and concluded the talk with three research questions.
This is a talk I gave at a Northwestern University - Complete Genomics Workshop on April 21, 2011 about using clouds to support research in genomics and related areas.
Opendatabay - Open Data Marketplace - Opendatabay
Opendatabay.com unlocks the power of data for everyone. The Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Cutting-edge AI technologies enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
StarCompliance is a leading firm specializing in the recovery of stolen cryptocurrency. Our comprehensive services are designed to assist individuals and organizations in navigating the complex process of fraud reporting, investigation, and fund recovery. We combine cutting-edge technology with expert legal support to provide a robust solution for victims of crypto theft.
Our Services Include:
Reporting to Tracking Authorities:
We immediately notify all relevant centralized exchanges (CEX), decentralized exchanges (DEX), and wallet providers about the stolen cryptocurrency. This ensures that the stolen assets are flagged as scam transactions, making it impossible for the thief to use them.
Assistance with Filing Police Reports:
We guide you through the process of filing a valid police report. Our support team provides detailed instructions on which police department to contact and helps you complete the necessary paperwork within the critical 72-hour window.
Launching the Refund Process:
Our team of experienced lawyers can initiate lawsuits on your behalf and represent you in various jurisdictions around the world. They work diligently to recover your stolen funds and ensure that justice is served.
At StarCompliance, we understand the urgency and stress involved in dealing with cryptocurrency theft. Our dedicated team works quickly and efficiently to provide you with the support and expertise needed to recover your assets. Trust us to be your partner in navigating the complexities of the crypto world and safeguarding your investments.
Some Frameworks for Improving Analytic Operations at Your Company
1. Some Frameworks to Improve Analytic Operations
Robert L. Grossman
Analytic Strategy Partners LLC
& University of Chicago
June 6, 2019
2. 1. Case Study: IBM Oncology*
*This section is adapted from Robert L. Grossman, The Strategy and Practice of Analytics, 2020, to appear.
3. On Aug 11, 2018, the Wall Street Journal reported: “In many cases, [Watson] didn’t add much value. In some cases, Watson wasn’t accurate. Watson can be tripped up by a lack of data in rare or recurring cancers, and treatments are evolving faster than Watson’s human trainers can update the system.”
4. IBM Watson for Oncology
• Based upon the success of IBM Watson winning Jeopardy! in 2011, IBM branded a new type of computing called cognitive computing and launched a new division of IBM that has thus far spent over $15B.
• The technology is based upon natural language processing and includes a framework for answering questions.
• Watson Oncology is critically dependent upon access to cancer data, expert oncologists, and a smart team of software engineers working for several years.
• Watson Oncology business model: does Watson replace the oncologist, double check the oncologist, or augment the oncologist for difficult cases? IBM charges $200 - $1,000 per patient.
• The Wall Street Journal reports: “In many cases, the tools didn’t add much value. In some cases, Watson wasn’t accurate. Watson can be tripped up by a lack of data in rare or recurring cancers, and treatments are evolving faster than Watson’s human trainers can update the system.”
Source: Daniela Hernandez and Ted Greenwald, IBM Has a Watson Dilemma, Wall Street Journal, August 11, 2018.
5. Framework: Develop, Deploy and Extract (DDE)
Develop the model:
• Build oncology models using IBM’s cognitive computing technology.
• Develop models to determine what type and sub-type of cancer the patient has.
Deploy the model:
• Is the model integrated into an existing system or stood up as a separate system?
• Does the model make a diagnosis (Dx), recommend a treatment, or both?
Extract value from the model:
• Does the model replace the oncologist making the diagnosis (Dx)?
• Does the model double check the oncologist?
• Does the model handle difficult cases?
6. 2. What Are Analytic Operations?
This section is adapted from Robert L. Grossman, Developing an AI Strategy: a Primer, Open Data Press, 2020, available online at analyticstrategy.com
7. The Analytic Diamond
The four corners of the diamond: analytic strategy; analytic algorithms & models; analytic operations; analytic infrastructure.
“Amateurs talk about tactics, but professionals study logistics." - Gen. Robert H. Barrow, USMC (Commandant of the Marine Corps)
Amateurs talk about analytic models, but professionals study analytic operations and analytic infrastructure.
9. (Diagram: analytic algorithms and models; analytic operations: deploy the model, extract value; analytic infrastructure.)
1. How do you deploy the model into operational systems?
2. How do you quickly detect drift and update a deployed model?
3. What actions are associated with a model and what value do they provide?
4. Are there “hooks” that can increase the value of the actions?
10. (Diagram: analytic algorithms and models; analytic operations: extract value, protect; analytic infrastructure.)
5. Are there segments in which more specialized actions can increase the value?
6. How do you measure and report the value generated to the product/service owner and other stakeholders?
7. How do you provide the required data and model security?
8. How do you provide the required privacy and compliance?
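Question 2 above asks how to quickly detect drift in a deployed model. One common measure, offered here as a minimal sketch rather than anything from the talk, is the Population Stability Index (PSI), which compares the score distribution observed at training time with the distribution seen in production; the function name, bin count, and rule-of-thumb thresholds are illustrative choices.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample of scores
    (e.g. from training time) and a production sample.

    A common rule of thumb: PSI < 0.1 suggests little drift, 0.1-0.25
    moderate drift, and > 0.25 drift worth investigating.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins

    def frac(sample, i):
        # Fraction of the sample falling in bin i; the last bin includes hi.
        count = sum(1 for x in sample
                    if lo + i * width <= x < lo + (i + 1) * width)
        if i == bins - 1:
            count += sum(1 for x in sample if x == hi)
        return max(count / len(sample), 1e-6)  # floor to avoid log(0)

    return sum((frac(actual, i) - frac(expected, i))
               * math.log(frac(actual, i) / frac(expected, i))
               for i in range(bins))

# A distribution compared with itself yields PSI near zero; a shifted
# production distribution yields a large PSI that should trigger review.
train_scores = [i / 100 for i in range(100)]
shifted_scores = [0.5 + i / 200 for i in range(100)]
```

In practice the PSI of each model input, not just the output score, would be tracked on a schedule, with an alert feeding back into the model-update process.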
11. (Chart: effort over time across three phases: get the data, build a model, deploy the model.)
• Get the data: set up the infrastructure, put in place the compliance and security, etc.
• Build a model: analyze & model the data.
• Deploy the model: deploy the solution with the model in a manner that provides value to the organization.
12. 3. Scores, Actions and Measures: The SAM Framework*
This section is adapted from Robert L. Grossman, Developing an AI Strategy: a Primer, Open Data Press, 2020, available online at analyticstrategy.com
13. Scores vs Actions
• Credit. Score: likelihood to default. Action: do you offer a prescreened card? If so, what is the credit line and interest rate?
• Response. Score: likelihood to respond to an offer. Action: which ad from an inventory do I offer, and how many impressions?
• Hospital readmission. Score: likelihood to be readmitted to a hospital within 90 days. Action: delay release from the hospital; follow up after release; etc.
• Election model. Scores: likelihood of support (which candidate does the voter support); likelihood of voting. Action: if support is high, ask for $ or to volunteer; if likelihood of voting is below a threshold, help them make a plan to get out to vote (GOTV).
15. Presidential Campaigns
• Goal: win 270 electoral votes
• Three models:
o Support: probability that a voter supports your candidate
o Persuasion: can someone be persuaded to vote for your candidate?
o Turnout: will someone vote?
• Actions:
o Send email, knock on their door
o Ask for dollars, ask them to volunteer (if they are likely to vote for your candidate)
o Help them make a plan to get out and vote
o Have them talk to their Facebook friends (build FB apps)
o Target actions around specific events in specific states
16. Framework: Segment by Score, Action by Cell (SSAC)
Segment by two scores: support for the candidate (low/high) and likelihood of turnout (low/high). Then associate an action with each cell:
• Low support, low turnout: ignore
• Low support, high turnout: persuade
• High support, low turnout: help with GOTV plans
• High support, high turnout: ask for $, get to volunteer
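The SSAC grid above can be sketched as a small lookup table keyed by score cells. The threshold value and the exact action strings below are illustrative assumptions, not figures from the talk:

```python
# Illustrative cutoff; a real campaign would calibrate thresholds per score.
THRESHOLD = 0.5

# Actions keyed by (support cell, turnout cell), mirroring the 2x2 grid.
ACTIONS = {
    ("low", "low"):   "ignore",
    ("low", "high"):  "persuade",
    ("high", "low"):  "help with a GOTV plan",
    ("high", "high"): "ask for $ or to volunteer",
}

def cell(score, threshold=THRESHOLD):
    """Segment a single score into a low/high cell."""
    return "high" if score >= threshold else "low"

def action_for(support_score, turnout_score):
    """Segment a voter by two scores, then look up the cell's action."""
    return ACTIONS[(cell(support_score), cell(turnout_score))]
```

The point of the pattern is that scores only segment; the value comes from the action attached to each cell, which can be changed without retraining any model.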
17. 4. Ways to Deploy Models*
*This section is adapted from Robert L. Grossman, The Strategy and Practice of Analytics, 2020, to appear.
18. Life cycle of a model
Analytic modeling (data scientists, ModelDev): select analytic problem & approach → get and clean the data → exploratory data analysis → build model in dev/modeling environment.
Analytic operations (enterprise IT, AnalyticOps): deploy model → initial deployment → scale up deployment → use champion-challenger methodology, driven by performance data, to improve the model → retire model and deploy improved model.
*Source: Robert L. Grossman, The Strategy and Practice of Analytics, 2020, to appear.
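The champion-challenger methodology in the life cycle can be sketched as routing a small fraction of scoring traffic to a challenger model and promoting it only when it shows a clear lift. The function names, the 10% default split, and the minimum-lift rule here are assumptions for illustration:

```python
import random

def route(record, champion, challenger, challenger_fraction=0.1, rng=random):
    """Score most traffic with the champion; divert a small fraction to
    the challenger so its live performance can be measured side by side."""
    if rng.random() < challenger_fraction:
        return "challenger", challenger(record)
    return "champion", champion(record)

def promote_if_better(champion_metric, challenger_metric, min_lift=0.01):
    """Retire the champion only when the challenger shows a clear lift
    on the agreed performance metric (higher is assumed better)."""
    return challenger_metric >= champion_metric + min_lift
```

Keeping the routing split and promotion rule outside the models themselves is what lets enterprise IT retire a model and deploy an improved one without involving the modeling team.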
19. The Five Main Approaches (E3RW)
Approaches:
1. Embed analytics in databases
2. Scoring engines (import/export models)
3. Encapsulate models using containers (and virtual machines)
4. Read a table of parameters
5. Wrap algorithm code or an analytic system (and perhaps create a service)
Techniques:
• Use languages for analytics, such as PMML and PFA, with analytic engines
• Use languages for workflows, such as CWL, with workflow engines
• Use containers and container-orchestration systems, such as Docker and Kubernetes, for automating software deployment and scale-out
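As one concrete sketch of approach 4 (read a table of parameters), a deployed scorer can read its coefficients from a table that operations can update without redeploying code. The feature names, weights, and logistic form below are invented for illustration:

```python
import math

# The "table of parameters": in production this might be a database table
# or a CSV that operations can swap without touching the scoring code.
PARAMETERS = {
    "intercept": -1.0,
    "days_since_last_purchase": -0.02,
    "purchases_last_year": 0.15,
}

def score(record, params=PARAMETERS):
    """Logistic score driven entirely by the parameter table:
    sigmoid(intercept + sum of weight * feature value)."""
    z = params["intercept"] + sum(
        weight * record.get(feature, 0.0)
        for feature, weight in params.items()
        if feature != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))
```

The same separation of model from code is what the scoring-engine approach (approach 2) generalizes: PMML and PFA are standard serializations of the "table of parameters" plus the transformations around it.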
21. Summary
• It’s important to understand the differences between the data scientists building the model, the enterprise IT team deploying the model, and the product team ensuring the model provides value.
• We introduced three frameworks:
o The Develop, Deploy and Extract (DDE) Framework
o The Scores-Actions-Measures (SAM) Framework
o The Segment by Score, Action by Cell (SSAC) Framework
• AnalyticOps: i) deploying models and workflows with analytic engines (PMML & PFA); ii) analytic containers with software deployment automation environments (e.g. Kubernetes).
• We discussed the IBM Watson for Oncology case study.
22. Develop, Deploy and Extract (DDE) Framework
Develop the model: train the model; validate the model; package the model.
Deploy the model: compute scores; take actions; measure the responses.
Extract value from the model: estimate the value provided by the model; find “hooks” and strategies to increase the value; gain consensus on the value and communicate it.
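The deploy-and-extract loop above (scores, then actions, then measures of the responses, then an estimate of value) can be sketched end to end. The threshold, action names, and dollar figures below are hypothetical, chosen to echo the hospital-readmission example from the Scores vs Actions slide:

```python
def choose_action(score, threshold=0.7):
    """Turn a readmission-risk score into an action (threshold is illustrative)."""
    return "follow_up_call" if score >= threshold else "no_action"

def estimate_value(measures):
    """Measures of the responses -> estimated value of the model:
    the value of each measured outcome minus the cost of the action taken."""
    return sum(m["value"] - m["action_cost"] for m in measures)

# Two measured responses to follow-up calls triggered by high scores.
measures = [
    {"value": 120.0, "action_cost": 15.0},  # readmission avoided
    {"value": 0.0,   "action_cost": 15.0},  # follow-up did not help
]
```

Reporting a number like this back to the product owner is what closes the loop: the model is judged by the net value of the actions it drives, not by its accuracy alone.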
23. Questions
Additional information about some of the topics discussed here can be found in my book: Developing an AI
Strategy: a Primer, Open Data Press, 2020, available online at analyticstrategy.com
25. Abstract: There is a lot of information and there are many best practices available to help data scientists build analytic models, but much less about how analytic models can best be integrated into a company's products, services or operations, which we call analytic operations. We describe three frameworks that a company or organization can use to improve its analytic operations, and we explain the frameworks using case studies.
About RLG: Robert L. Grossman is a Partner at Analytic Strategy Partners LLC, which
he founded in 2016. From 2002-2015, he was the Founder and Managing Partner at
Open Data Group, which built and deployed predictive models over big data for
Fortune 500 companies. He is also the Frederick H. Rawson Distinguished Service
Professor of Medicine and Computer Science and the Jim and Karen Frank Director
of the Center for Translational Data Science (CTDS) at the University of Chicago.