Digital biomarkers for preventive personalised healthcare | Paolo Missier
A talk given to the Alan Turing Institute, UK, Oct 2021, reporting on the preliminary results and ongoing research in our lab, on self-monitoring using accelerometers for healthcare applications
ReComp and P4@NU: Reproducible Data Science for Health | Paolo Missier
A brief overview of the ReComp project (http://recomp.org.uk) on selective recurring re-computation of complex analytics, and a brief outlook for the P4@NU project on seeking digital biomarkers for age-related metabolic diseases.
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U... | Paolo Missier
Talk for the paper published at ICWE 2019:
Primo F, Missier P, Romanovsky A, Mickael F, Cacho N. A customisable pipeline for continuously harvesting socially-minded Twitter users. In: Procs. ICWE’19. Daejeon, Korea; 2019.
Machine Learning for Medical Image Analysis: What, where and how? | Debdoot Sheet
Career advice for EECS (electrical, electronics and computer science) graduates interested in machine vision, and some advice for a PhD career in medical image analysis.
In this work, we describe the field research, design, and comparative deployment of a multimodal medical imaging user interface for breast screening. The main contributions described here are threefold: 1) The design of an advanced visual interface for multimodal diagnosis of breast cancer (BreastScreening); 2) Insights from the field comparison of Single-Modality vs Multi-Modality screening of breast cancer diagnosis with 31 clinicians and 566 images; and 3) The visualization of the two main types of breast lesions in the following image modalities: (i) MammoGraphy (MG) in both Craniocaudal (CC) and Mediolateral oblique (MLO) views; (ii) UltraSound (US); and (iii) Magnetic Resonance Imaging (MRI).
Supervised Multi Attribute Gene Manipulation For Cancer | paperpublications3
Abstract: Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems.
They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery.
BREAST CANCER DIAGNOSIS USING MACHINE LEARNING ALGORITHMS - A SURVEY | ijdpsjournal
Breast cancer has become common nowadays. Despite this, not all general hospitals
have the facilities to diagnose breast cancer through mammograms. A long wait for a breast
cancer diagnosis may increase the possibility of the cancer spreading. Therefore, computerized
breast cancer diagnosis has been developed to reduce the time taken to diagnose breast cancer and
to reduce the death rate. This paper surveys breast cancer diagnosis using various machine
learning algorithms and methods, which are used to improve the accuracy of predicting cancer. The survey
also gives an indication of the number of papers that implement breast cancer diagnosis.
DENGUE DETECTION AND PREDICTION SYSTEM USING DATA MINING WITH FREQUENCY ANALYSIS | csandit
Clinical documents are a repository of information about patients' conditions. However, this
wealth of data is not properly tapped by the existing analysis tools. Dengue is one of the most
widespread water borne diseases known today. Every year, dengue has been threatening lives
the world over. Systems already developed have concentrated on extracting disorder mentions
using dictionary look-up, or supervised learning methods. This project aims at performing
Named Entity Recognition to extract disorder mentions, time expressions and other relevant
features from clinical data. These can be used to build a model, which can in turn be used to
predict the presence or absence of the disease, dengue. Further, we perform a frequency
analysis which correlates the occurrence of dengue and the manifestation of its symptoms over
the months. The system produces appreciable accuracy and serves as a valuable tool for
medical experts.
Assessing Effectiveness of Information Presentation Using Wearable Augmented ... | CSCJournals
A technological intervention that supports sending a summary of the patient's vitals through the transfer of care would be of great benefit to the trauma care department. This paper focuses on the effectiveness of information presentation using wearable augmented reality devices to improve human decision making during transfer of care for surgical trauma, and to improve user experience and reduce cognitive workload. The results of this experiment can make significant contributions to design guidelines for information presentation on small form factors, especially in time-critical decision-making scenarios. This could potentially help medical responders in the trauma care center to prepare treatment materials such as medicines and diagnostic procedures, bring in specialized doctors or consult experienced doctors, call in support staff as required, and so on.
by Riccardo Bellazzi
Università di Pavia
ICS Maugeri Pavia
Slides for the meeting titled "Big data, machine learning e medicina di precisione" (Big data, machine learning and precision medicine).
10 May 2018, Milan, Fondazione Giannino Bassetti
Full video: https://www.fondazionebassetti.org/it/focus/2018/08/big_data_machine_learning_e_me.html
Detection of myocardial infarction on recent dataset using machine learning | IJICTJOURNAL
In developing countries such as India, with a large aging population and limited access to medical facilities, remote and timely diagnosis of myocardial infarction (MI) has the potential to save many lives. An electrocardiogram is the primary clinical tool used to detect the onset of MI or a previous MI incident. Artificial intelligence has made a great impact on every area of research, including medical diagnosis. In medical diagnosis, the hypothesis might be doctors' experience, which would be used as input to predict a disease. It has been observed that a properly cleaned and pruned dataset provides far better accuracy than an unclean one with missing values. Selecting suitable data cleaning techniques alongside proper classification algorithms leads to prediction systems with enhanced accuracy. This proposal describes the detection of myocardial infarction using new parameters, with increased accuracy and efficiency over the existing model. Additional parameters are used to predict MI more accurately. The proposed model is used for early diagnosis of MI with the help of expert experience and data gathered from hospitals.
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical... | Jake Chen
COVID-19 has profoundly impacted all our lives. Not all such impacts in science are negative. For example, how we adapt to online learning, remote mentorship, and online teamwork may become new “norms” of future scientific collaborations, breaking down institutional boundaries to communication. The COVID-19 pandemic has united the scientific community more than ever, through more than 3600 clinical trials, 60,000 peer-reviewed publications, 80,000 SARS-CoV-2 genome sequences, 100,000 COVID-19 open software tools, and a global community of scientists, with which all of us are working hard to find epidemiological patterns, diagnosis, therapeutics, and vaccines in a “War Against COVID-19”. In this talk, I will define and characterize data-driven medicine primarily through my personal journey in the past ten months, having witnessed the rapid “weaponizing of data science tools” in our community’s fight against COVID-19 (including ours, at http://covid19.ubrite.org/). I will review up-to-date COVID-19 literature, especially those related to how biomedical informatics, data science, and artificial intelligence have been applied in accelerating COVID-19 breakthrough discoveries, from basic research to clinical practice. I will end by sharing my thoughts on how the future of medicine in cancer and other translational areas can benefit from the proactive incorporation of new “data science engines.”
A Data-centric perspective on Data-driven healthcare: a short overview | Paolo Missier
a brief intro on the data challenges associated with working with Health Care data, with a few examples, both from literature and our own, of traditional approaches (Latent Class Analysis, Topic Modelling) and a perspective on Language-based modelling for Electronic Health Records (EHR).
probably more references than actual content in here!
Realising the potential of Health Data Science: opportunities and challenges ... | Paolo Missier
A guest lecture given to a group of healthcare professionals as part of an Information Management course at Newcastle University, on working with healthcare data to generate disease risk prediction models
Predictive Analytics and Machine Learning for Healthcare - Diabetes | Dr Purnendu Sekhar Das
Machine Learning on clinical datasets to predict the risk of chronic disease conditions like Type 2 Diabetes mellitus beforehand; as well as predicting outcomes like hospital readmission using EMR RWE data.
Proposed Model for Chest Disease Prediction using Data Analytics | vivatechijri
Chest diseases can be fatal if not properly diagnosed in their early stages. Because practitioners may lack
specialist knowledge or experience, one chest disease is often wrongly diagnosed as
another, which leads to the wrong treatment. As a result, the actual disease keeps progressing and becomes fatal. For
example, muscular chest pain can be treated as heart disease, or COPD can be treated as asthma. Early
prediction of chest disease is crucial but is not an easy task. Consequently, a computer-based prediction system
for chest disease may play a significant role as a pre-stage detection step, so that proper action can be taken towards
recovery. Choosing the proper data mining classification method makes it possible to predict the disease effectively at an
early stage, when it can still be cured. In this paper, three commonly used classification techniques,
support vector machine (SVM), k-nearest neighbour (KNN) and artificial neural network (ANN), have been studied
with a view to evaluating them for chest disease prediction.
Will Yu of Lumiata provides an overview of using real-time big analytics with ever-learning graph combining hundreds of healthcare data sets. Presented at YTH Live 2014 plenary session "Mapping Big Data, Infographics and other Good Stuff."
K-Nearest Neighbours based diagnosis of hyperglycemia | ijtsrd
AI or artificial intelligence is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using the rules to reach approximate or definite conclusions), and self-correction. As a result, Artificial Intelligence is gaining importance in science and engineering fields. The use of Artificial Intelligence in medical diagnosis too is becoming increasingly common and has been used widely in the diagnosis of cancers, tumors, hepatitis, lung diseases, etc. The main aim of this paper is to build an Artificial Intelligent System that, after analysis of certain parameters, can predict whether a person is diabetic or not. Diabetes is the name used to describe a metabolic condition of having higher than normal blood sugar levels. Diabetes is becoming increasingly common throughout the world, due to increased obesity, which can lead to metabolic syndrome or pre-diabetes leading to higher incidences of type 2 diabetes. Authors have identified 10 parameters that play an important role in diabetes and prepared a rich database of training data which served as the backbone of the prediction algorithm. Keeping in view this training data, authors developed a system that uses the artificial neural networks algorithm to serve the purpose. These are capable of predicting new observations (on specific variables) from previous observations (on the same or other variables) after executing a process of so-called learning from existing training data (Haykin 1998). The results indicate that the performance of KNN method when compared with the medical diagnosis system was found to be 91%. This system can be used to assist medical programs, especially in geographically remote areas where expert human diagnosis is not possible, with the advantage of minimal expenses and faster results.
Abid Sarwar, "K-Nearest Neighbours based diagnosis of hyperglycemia", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-1, December 2017, URL: http://www.ijtsrd.com/papers/ijtsrd7046.pdf http://www.ijtsrd.com/computer-science/artificial-intelligence/7046/k-nearest-neighbours-based-diagnosis-of-hyperglycemia/abid-sarwar
A 45-minute talk on the basics of Web 2, IT and medicine, particularly focussing on Web 2 tools that can be used by doctors and patients. Also a brief look at accessing these and other tools via portable means, demonstrated with my iPhone.
WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ... | DATAVERSITY
In this presentation, our speaker, Dr. Michel Dumontier, will explore the use of Semantic Web technologies to reduce the overwhelming burden of integrating clinical data with public biomedical data, and enabling a new generation of translational research and their clinical application.
Background: The digital twin paradigm holds great promise for healthcare, most importantly efficiently integrating many disparate healthcare data sources and servicing complex tasks like personalizing care, predicting health outcomes, and planning patient care, even though many technical and scientific challenges remain to be overcome.
Objective: As part of the QUALITOP project, we conducted a comprehensive analysis of diverse healthcare data, encompassing both prospective and retrospective datasets, along with an in-depth examination of the advanced analytical needs of medical institutions across five European Union countries. Through these endeavors, we have systematically developed and refined a formal Personal Medical Digital Twin (PMDT) model subjected to iterative validation by medical institutions to ensure its applicability, efficacy, and utility.
Findings: The PMDT is based on an interconnected set of expressive knowledge structures that are calibrated to capture an individual patient’s psychosomatic, cognitive, biometrical and genetic information in one personal digital footprint in a manner that allows medical professionals to run various models to predict an individual’s health issues over time and intervene early with personalized preventive care.
Conclusion: At the forefront of digital transformation, the PMDT emerges as a pivotal entity, positioned at the convergence of Big Data and Artificial Intelligence. This paper introduces a PMDT environment that lays the foundation for the application of comprehensive big data analytics, continuous monitoring, cognitive simulations, and AI techniques. By integrating stakeholders across the care continuum, including patients, this system enables the derivation of insights and facilitates informed decision-making for personalized preventive care.
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca | fionabrinkman
Talk at GenomeTrakr network meeting Sept 23 2015 in Washington DC. On Canada's open source Integrated Rapid Infectious Disease Analysis (IRIDA) bioinformatics platform - aiding genomic epidemiology analysis for public health agencies with planned open data release and linkage to GenomeTrakr. Discussed perspectives, challenges, solutions for getting more GenomeTrakr participation internationally.
Design and Development of a Provenance Capture Platform for Data Science | Paolo Missier
A talk given at the DATAPLAT workshop, co-located with the IEEE ICDE conference (May 2024, Utrecht, NL).
Data Provenance for Data Science is our attempt to provide a foundation to add explainability to data-centric AI.
It is a prototype, with lots of work still to do.
Towards explanations for Data-Centric AI using provenance records | Paolo Missier
In this presentation, given to graduate students at Universita' RomaTre, Italy, we suggest that concepts well-known in Data Provenance can be exploited to provide explanations in the context of data-centric AI processes. Through use cases (incremental data cleaning, training set pruning), we build up increasingly complex provenance patterns, culminating in an open question:
how to describe "why" a specific data item has been manipulated as part of data processing, when such processing may consist of a complex data transformation algorithm.
Interpretable and robust hospital readmission predictions from Electronic Hea... | Paolo Missier
A talk given at the BDA4HM workshop, IEEE BigData conference, Dec. 2023
please see paper here:
https://drive.google.com/file/d/1vN08G0FWxOSH1Yeak5AX6a0sr5-EBbAt/view
Data-centric AI and the convergence of data and model engineering: opportunit... | Paolo Missier
A keynote talk given to the IDEAL 2023 conference (Evora, Portugal Nov 23, 2023).
Abstract.
The past few years have seen the emergence of what the AI community calls "Data-centric AI", namely the recognition that some of the limiting factors in AI performance are in fact in the data used for training the models, as much as in the expressiveness and complexity of the models themselves. One analogy is that of a powerful engine that will only run as fast as the quality of the fuel allows. A plethora of recent literature has started to explore the connection between data and models in depth, along with startups that offer "data engineering for AI" services. Some concepts are well-known to the data engineering community, including incremental data cleaning, multi-source integration, or data bias control; others are more specific to AI applications, for instance the realisation that some samples in the training space are "easier to learn from" than others. In this "position talk" I will suggest that, from an infrastructure perspective, there is an opportunity to efficiently support patterns of complex pipelines where data and model improvements are entangled in a series of iterations. I will focus in particular on end-to-end tracking of data and model versions, as a way to support MLDev and MLOps engineers as they navigate through a complex decision space.
Tracking trajectories of multiple long-term conditions using dynamic patient... | Paolo Missier
Momentum has been growing into research to better understand the dynamics of multiple long-term conditions-multimorbidity (MLTC-M), defined as the co-occurrence of two or more long-term or chronic conditions within an individual. Several research efforts make use of Electronic Health Records (EHR), which represent patients' medical histories. These range from discovering patterns of multimorbidity, namely by clustering diseases based on their co-occurrence in EHRs, to using EHRs to predict the next disease or other specific outcomes. One problem with the former approach is that it discards important temporal information on the co-occurrence, while the latter requires "big" data volumes that are not always available from routinely collected EHRs, limiting the robustness of the resulting models. In this paper we take an intermediate approach, where initially we use about 143,000 EHRs from UK Biobank to perform time-independent clustering using topic modelling, and Latent Dirichlet Allocation specifically. We then propose a metric to measure how strongly a patient is "attracted" into any given cluster at any point through their medical history. By tracking how such gravitational pull changes over time, we may then be able to narrow the scope for potential interventions and preventative measures to specific clusters, without having to resort to full-fledged predictive modelling. In this preliminary work we show exemplars of these dynamic associations, which suggest that further exploration may lead to actionable insights into patients' medical trajectories.
On behalf of the AI-MULTIPLY consortium. Funded by NIHR AIM Development grant to AI-MULTIPLY.
Capturing and querying fine-grained provenance of preprocessing pipelines in ... | Paolo Missier
a talk given at the VLDB 2021 conference, August, 2021, presenting our paper:
Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science. Chapman, A., Missier, P., Simonelli, G., & Torlone, R. PVLDB, 14(4):507–520, January, 2021.
http://doi.org/10.14778/3436905.3436911
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo... | Paolo Missier
A talk given at the 2nd IEEE Blockchain conference, Atlanta, US, July 2019.
here is the paper: http://homepages.cs.ncl.ac.uk/paolo.missier/doc/Decentralised_Marketplace_USA_Conference___Accepted_Version_.pdf
The Art of the Pitch: WordPress Relationships and Sales | Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Neuro-symbolic is not enough, we need neuro-*semantic* | Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Connector Corner: Automate dynamic content and events by pushing a button | DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality | Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Data Science for (Health) Science: tales from a challenging front line, and how to cross a few T's
1. Data Science for (Health) Science:
tales from a challenging front line, and how to cross a few T's
Paolo Missier
School of Computing
Newcastle University, UK
March 2021
A talk given to
The School of Information Sciences
Center for Informatics Research in Science and Scholarship
University of Illinois Urbana-Champaign
paolo.missier@ncl.ac.uk
LinkedIn: paolomissier
Twitter: @PMissier
2. 2
The message:
1. “Data Science” for Health is hard. The hard part is the data
2. “AI for Health” is (Deep) Machine Learning
3. Ethics. Fairness. Trust. Acceptance.
4. Data Provenance for Data Science: Solution or distraction?
• Transparency
• Trustworthiness
• Traceability
4. 4
AI for healthcare – the UK landscape
https://www.turing.ac.uk/research/research-programmes/health-and-medical-sciences
AI and data science will improve the detection, diagnosis, and treatment of
illness. They will optimise the provision of services, and support health service
providers to anticipate demand and deliver improved patient care.
• Explainability / Interpretability
• Exploiting EHR (Electronic Health Records)
• Image interpretation
• Fairness, Bias
• Ethical issues in …
• Predicting <disease / critical event> …
5. 5
Personalised, Predictive, Preventive, Participatory Medicine (P4)
Price ND, Magis AT, Earls JC, et al. A wellness study of 108 individuals using personal, dense, dynamic data clouds.
Nat Biotechnol. 2017;35:747.
6. 6
D2P4 (*) Healthcare research
(*) Data-Driven, Personalised, Predictive, Preventive, Participatory
• Cleaning
• Integration
• Alignment
• Imputation
• NLP
• …
Physical Activity monitoring
(wearables)
In-patient hospital records
Primary care health records
+ prescriptions
Clinical protocols
Multi-omics
(genomics, transcriptomics,
proteomics, metabolomics…)
Images -- Histology, X-ray, …
Early detection of Type 2 Diabetes /
Metabolic / age-related diseases
Early detection of Parkinson’s
Frailty / intrinsic capacity assessment
Multi Morbidity Long Term Conditions (MLTC)
Covid risk / Post-Acute Covid Syndrome (PACS)
Liver disease progression: NAFLD / NASH
Liquid biopsy
Programming:
Scripting: python, R, …
Workflows: Knime, RapidMiner..
Methods
Clustering (ML)
Predictive modelling (ML)
Image interpretation / Deep Learning
… “AI”…
(plus traditional statistics!)
7. 7
Big Data for Health Care
Genomics for
personalized medicine
personal monitors /
wearables
Medical Records
Article Source: Big Data: Astronomical or Genomical?
Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, et al. (2015) Big Data: Astronomical or Genomical?. PLOS Biology 13(7):
e1002195. https://doi.org/10.1371/journal.pbio.1002195
8. 9
D2P4 Accelerometry
9. 10
Digital biomarkers
Digital biomarkers come from “novel sensing systems capable of continuously tracking behavioral signals […] capture people's everyday routines, actions, and physiological changes that can explain outcomes related to health, cognitive abilities, and more” (Choudhury 2018).
Choudhury, Tanzeem. 2018. “Making Sleep Tracking More User Friendly.” Communications of the ACM 61 (11): 156–156. https://doi.org/10.1145/3266285
- physical activity
- glucose levels
- blood oxygen levels
- …
Inexpensive scalable personalised self-monitoring
10. 11
A first project: markers from accelerometers?
Initial study Digital biomarkers + UK Biobank Dataset + Type 2 Diabetes outcome
- physical activity
- glucose levels
- blood oxygen levels
- …
Aligned with the P4 agenda
Readily available dataset
(+) 3,500+ features
(+) multi-omics coverage
(+) genomics
(+) links to EHR
(+) Activity monitors made in Newcastle!
(-) Limited follow-ups: little longitudinal data
(-) Population not random
(-) Activity data / person very limited
100K
Activity traces
11. 13
Using wearable activity trackers to predict Type-2 Diabetes
Objective: To determine the extent to which accelerometer traces can be used to distinguish individuals with
Type-2 Diabetes (T2D) from normoglycaemic controls, and to quantify their limitations.
Lam B, Catt M, Cassidy S, Bacardit J, Darke P, Butterfield S, Alshabrawy O, Trenell M, Missier P
Using Wearable Activity Trackers to Predict Type 2 Diabetes: Machine Learning–Based Cross-sectional Study of the UK Biobank Accelerometer
Cohort -- JMIR Diabetes. 20/01/2021:23364 (forthcoming/in press)
Feature
extraction
Clustering
Classification
??
13. 15
All UK Biobank participants: 502,664
Filter: Accelerometry study? 103,712
Split criteria: Type 2 Diabetes?
- T2D at baseline: 2,755
- T2D through EHR analysis: 1,321
- T2D total: 4,076
- Non-Diabetes: 99,636
Filter: EHR data available? 19,852
Filter: QC on activity traces. Positives: 3,103
Physical impairment analysis:
- Severe impairment: 1,666
- No impairment: 8,463
Comparisons: T2D vs Norm-0, T2D vs Norm-1
A great UG project!
Your (biomedical) dataset may not be as big as it looks
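The first split on this slide can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted on the slide; the variable names are illustrative, and the later filters (EHR availability, trace QC, impairment analysis) are left out because the slide does not say which branch each of them applies to.

```python
# Counts quoted on the slide (UK Biobank); variable names are illustrative.
all_participants = 502_664
with_accelerometry = 103_712

t2d_at_baseline = 2_755
t2d_from_ehr = 1_321

# The slide reports these two derived figures as 4,076 and 99,636.
t2d_total = t2d_at_baseline + t2d_from_ehr
non_diabetes = with_accelerometry - t2d_total

# Share of the full cohort with accelerometry data, and T2D prevalence
# inside that accelerometry sub-cohort.
accelerometry_share = with_accelerometry / all_participants
t2d_prevalence = t2d_total / with_accelerometry

print(f"T2D total: {t2d_total:,}  non-diabetes: {non_diabetes:,}")
print(f"accelerometry share of cohort: {accelerometry_share:.1%}")
print(f"T2D prevalence in accelerometry sub-cohort: {t2d_prevalence:.1%}")
```

The point the slide makes follows directly: only about a fifth of the cohort has activity traces at all, and the positive class is a small fraction of that subset before any quality control is applied.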
15. 17
Ongoing work
Are there better embedded representations for accelerometry data?
Can they be used as predictors for other outcomes?
Representation learning
Embedded
feature space
LSTM Autoencoder
Outcome:
Insulin sensitivity
DIRECT
DB
Standard classification
16. 19
D2P4 COVID
17. Predicting respiratory failure in patients with COVID-19 pneumonia: a case study from Northern Italy
D. Ferrari, Prof. F. Mandreoli, Prof. G. Guaraldi, Prof. P. Missier
Peak of Italian Covid crisis (March 2020 onwards)
Issue: ICU capacity
Question: will my next patient require ICU resources? How soon?
Ferrari D, Milic J, Tonelli R, Ghinelli F, Meschiari M, et al. (2020) Machine learning in predicting respiratory failure in patients with COVID-19 pneumonia: Challenges, strengths, and opportunities in a global health emergency. PLOS ONE 15(11): e0239172. https://doi.org/10.1371/journal.pone.0239172
18. 21
Study structure
Applied Machine Learning driven by a clinical question
An example of a typical data science pattern:
• Data selection: inclusion and exclusion criteria
• Data preparation / cleaning
• Variable selection
• Model learning: multiple models
• Model evaluation
With additional challenges:
“Live” evolving dataset with multiple versions of a patient database
• Changes in recording practices
• Inconsistencies
• Lots of missing data
Small data: 198 patients, 1,068 observations, 31-90 variables (symptoms, lab biomarkers)
In the data collection period, the dataset was growing daily with the average of 84 new records per day, with a mean of 10 new data points/patient.
Out of the initial sample of 295 patients and 2,889 data points available, 198 patients contributed to generate 1,068 valuable observations. In detail, 603 observations contributed to the definition of respiratory failure (PaO2/FiO2 < 150 mmHg) and 465 did not meet this definition.
Each data point included a complex record of observations from multiple categories: (1) signs and symptoms, (2) blood biomarkers, (3) respiratory assessment with PaO2/FiO2, (4) history of comorbidities (available in a subset of 119 patients). Some variables were collected daily, and others were recorded upon clinical indications.
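The outcome definition quoted above (PaO2/FiO2 < 150 mmHg) can be written as a small labelling function. This is a minimal sketch: the function names are hypothetical, FiO2 is assumed to be recorded as a fraction (0.21 for room air), and the choice to leave observations with a missing measurement unlabelled is an assumption rather than the study's documented rule.

```python
from typing import Optional

RESPIRATORY_FAILURE_THRESHOLD = 150.0  # mmHg, the PaO2/FiO2 cut-off quoted on the slide


def pf_ratio(pao2_mmhg: float, fio2_fraction: float) -> float:
    """PaO2/FiO2 ratio, with FiO2 given as a fraction in (0, 1]."""
    if not 0.0 < fio2_fraction <= 1.0:
        raise ValueError("FiO2 must be a fraction in (0, 1]")
    return pao2_mmhg / fio2_fraction


def label_observation(
    pao2_mmhg: Optional[float], fio2_fraction: Optional[float]
) -> Optional[bool]:
    """True = respiratory failure, False = no failure, None = cannot be labelled."""
    if pao2_mmhg is None or fio2_fraction is None:
        # Assumption: a missing measurement leaves the observation unlabelled.
        return None
    return pf_ratio(pao2_mmhg, fio2_fraction) < RESPIRATORY_FAILURE_THRESHOLD


# Class balance reported on the slide.
positives, negatives = 603, 465
total_observations = positives + negatives
positive_share = positives / total_observations
print(f"{total_observations} observations, {positive_share:.1%} labelled as respiratory failure")
```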
20. 24
Modelling Requirements
• Parsimonious: few variables
• Robust to missing data: imputation not an option
• Explainable: trust
• The model reveals the relative importance of each variable for each prediction it makes
• Minimize the number of false negatives
• Risk of under-estimating the severity of a patient’s condition
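The explainability requirement asks for the relative importance of each variable for each individual prediction, and the talk later names Shapley values as the technique. The sketch below computes exact Shapley values by enumerating all feature orderings, which is only feasible for a handful of variables. The linear risk score, the variable names and the reference patient are all hypothetical stand-ins, not the study's model or data.

```python
from itertools import permutations
from typing import Callable, Dict

Features = Dict[str, float]


def shapley_values(
    model: Callable[[Features], float], instance: Features, baseline: Features
) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all orderings.

    Features that are "absent" take their value from the baseline.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)
        previous_output = model(current)
        for name in ordering:
            current[name] = instance[name]  # switch this feature on
            output = model(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical linear risk score over three illustrative variables.
WEIGHTS = {"age": 0.02, "crp": 0.01, "respiratory_rate": 0.05}


def risk_score(x: Features) -> float:
    return sum(WEIGHTS[name] * value for name, value in x.items())


patient = {"age": 70.0, "crp": 120.0, "respiratory_rate": 30.0}
reference = {"age": 60.0, "crp": 20.0, "respiratory_rate": 18.0}
contributions = shapley_values(risk_score, patient, reference)
for name, value in sorted(contributions.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: {value:+.2f}")
```

For a linear model each contribution reduces to the weight times the difference from the reference value, which makes the toy example easy to check by hand. The contributions also sum to the difference between the patient's score and the reference score.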
21. 26
Approach
• Parsimonious: feature ranking and selection
• Robust to missing data
• Explainable: Shapley values
• Minimize FN: bespoke loss function
Ensemble of decision trees
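The slide does not spell out the bespoke loss function. One common way to penalise false negatives more than false positives is a class-weighted log loss, sketched below as a stand-in; the function name and the weight of 5 are assumptions chosen for illustration, not the values used in the study.

```python
import math
from typing import Sequence


def fn_weighted_log_loss(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    fn_weight: float = 5.0,
    eps: float = 1e-12,
) -> float:
    """Mean log loss where errors on true positives cost `fn_weight` times more.

    With fn_weight=1 this is the ordinary binary cross-entropy.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    total = 0.0
    for label, prob in zip(y_true, y_prob):
        prob = min(max(prob, eps), 1.0 - eps)  # keep log() finite
        if label == 1:
            total += -fn_weight * math.log(prob)   # a missed positive is expensive
        else:
            total += -math.log(1.0 - prob)          # a false alarm has normal cost
    return total / len(y_true)


missed_positive = fn_weighted_log_loss([1], [0.1])
false_alarm = fn_weighted_log_loss([0], [0.9])
print(f"missed positive: {missed_positive:.3f}  false alarm: {false_alarm:.3f}")
```

A learner trained against such a loss is pushed towards higher predicted risk for borderline patients, which trades some false alarms for fewer under-estimated cases.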
22. 27
Testing multiple models - Results
Parsimony:
Model 1 - suboptimal prediction accuracy
Model 2:
Adding biomarkers including respiratory variables increased performance
Model 3:
boosted mixed model - still requires about 20 variables
From a physician’s perspective, a cluster of 20 variables may be difficult to manage in routine clinical practice.
What our approach offers in support of the decision-making process is a simple interpretation of the predictions.
24. 29
Summary
Good results on “live” data, predicting a useful outcome for the purpose of ICU management
Major selling points:
• Variables (relatively) easy to collect in routine visits and in-hospital
• Models are explainable, medics can reality-check against their own understanding
… Opened the door to further collaborations:
New project on PACS: Post-Acute Covid Syndrome:
Following up recovery paths for 300 patients across 5 hospitals
25. 30
D2P4 EHR analysis for dynamic risk prediction
Survival analysis
Longitudinal prediction models
27. 32
Clinical Risk Prediction Models
Healthy participant or
missing data/under-
reported conditions?
Number/pattern of
records is a proxy
for health?
Informed presence bias
Individuals in EHR data are systematically different to those who are not (Goldstein et al, 2016)
28. 36
Case study: Type 2 Diabetes
[Figure 17: Example output of the phenotyping tool. Participant: REDACTED. Panels: glycated hemoglobin HbA1c (mmol/mol) with pre-diabetes, diabetes and remission ranges; fasting plasma glucose (mmol/l) with normoglycaemic, pre-diabetic and diabetic ranges; timeline of electronic health records (primary care, secondary care; event types Obs, Drug, Diag, Op) from 1987 to 2015, showing the estimated observation period, records and diabetes records.]
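The HbA1c panel of the phenotyping figure separates pre-diabetic and diabetic ranges. A minimal sketch of such a banding follows, assuming the widely used cut-offs of 48 mmol/mol for the diabetic range and 42 mmol/mol for the pre-diabetic range. The slide does not state which rules the phenotyping tool applies, so the thresholds, function names and example readings here are assumptions, and remission (which also depends on medication history) is deliberately not modelled.

```python
from typing import List, Tuple

# Assumed cut-offs in mmol/mol; not taken from the slide.
DIABETIC_THRESHOLD = 48.0
PRE_DIABETIC_THRESHOLD = 42.0


def classify_hba1c(hba1c_mmol_mol: float) -> str:
    """Band a single HbA1c reading (mmol/mol)."""
    if hba1c_mmol_mol < 0:
        raise ValueError("HbA1c cannot be negative")
    if hba1c_mmol_mol >= DIABETIC_THRESHOLD:
        return "diabetic range"
    if hba1c_mmol_mol >= PRE_DIABETIC_THRESHOLD:
        return "pre-diabetic range"
    return "normoglycaemic range"


def band_timeline(readings: List[Tuple[int, float]]) -> List[Tuple[int, str]]:
    """Band a list of (year, HbA1c) readings in chronological order."""
    return [(year, classify_hba1c(value)) for year, value in sorted(readings)]


# Hypothetical readings for one participant.
example = [(2009, 52.0), (2005, 44.0), (2013, 39.0)]
for year, band in band_timeline(example):
    print(year, band)
```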
29. 37
Case study: Type 2 Diabetes – remission study
Type 2 diabetes remission
Longitudinal phenotyping with large–scale observational data
Philip Darke
EPSRC Centre for Doctoral Training in Cloud Computing for Big Data, Newcastle University
UK Biobank (ukbiobank.ac.uk) is a UK–based prospective study into illness in middle and old age with over 500,000 participants. Diabetes is one of the most prevalent conditions in the cohort with nearly 70,000 diagnoses [2] expected by 2027. Study data is collected at participant visits and via linkage to national datasets including EHR data. These data have been used to longitudinally phenotype over 200,000 participants for diabetes as illustrated in figure 1. The approach will be expanded to all participants when further data is released.
[2] Naomi Allen, et al. UK Biobank: Current status and what it means for epidemiology. Health Policy and Technology, 1(3):123–126, September 2012. doi:10.1016/j.hlpt.2012.07.003
[Figure 1: Model output showing HbA1c (mmol/mol), weight (kg), periods of medication (biguanides) and inferred diabetic status (pre-diabetes, type 2 diabetes, remission) for an example participant, 2004 to 2016. Long–term remission was achieved by sustained weight loss post diagnosis.]
Many of those diagnosed with type 2 diabetes experience a subsequent period of remission. Some relapse whilst others achieve long–term remission and cease anti–diabetes medication. This project will examine the pathways to remission at scale using ob-
30. 38
D2P4 MLTC-M
NLP
31. 39
Multimorbidity and Long-Term Conditions
Patients with multimorbidities have the greatest healthcare needs and generate the
highest expenditure in the health system.
There is an increasing focus on identifying specific disease combinations for
addressing poor outcomes.
Matrix factorization / factor analysis
Clustering
Multiple correspondence analysis
Network analysis
…
Which data?
Fragmented / disconnected data sources
Data access
Data governance
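Identifying specific disease combinations usually starts from a co-occurrence count, which is also the edge list for the network analysis mentioned above. The sketch below counts how many patients share each pair of conditions; the patient records and condition names are made up for illustration, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def cooccurrence_counts(patients: Dict[str, Set[str]]) -> Counter:
    """Count, for every unordered pair of conditions, the patients who have both."""
    counts: Counter = Counter()
    for conditions in patients.values():
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(sorted(conditions), 2):
            counts[pair] += 1
    return counts


# Hypothetical patient records: patient id -> set of long-term conditions.
records = {
    "p1": {"T2D", "hypertension", "CKD"},
    "p2": {"T2D", "hypertension"},
    "p3": {"hypertension"},
    "p4": {"T2D", "CKD"},
}

pair_counts = cooccurrence_counts(records)
for (first, second), n in sorted(pair_counts.items()):
    print(f"{first} + {second}: {n} patients")
```

Each pair with its count can be read as a weighted edge between two condition nodes. The data questions on the slide still apply: the counts are only as complete as the linked sources behind each patient record.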
32. 40
D2P4 NAFLD / non-alcoholic fatty liver disease
33. 41
D2P4 NAFLD / NASH
NASH = non-alcoholic steatohepatitis
Aims:
- Integrate cross-sectional and longitudinal clinical outcome data with a multi-dimensional ‘omics’ record
- Hypothesis: a precision medicine approach leads to better understanding of individuals’ trajectories
- Personalised biomarkers: liquid biopsy
Dataset: European NAFLD Registry
7,750 patients with histologically proven NAFLD/NASH
- Omics (cross-sectional)
- Longitudinal follow ups
Methods:
- Precision: clustering
- Anticipating progression: Learn cluster-specific longitudinal models
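The two-step method above (cluster first, then learn one longitudinal model per cluster) can be sketched as follows. Cluster labels are assumed to have been assigned already, and an ordinary least-squares line per cluster stands in for whatever longitudinal model the project actually uses. The observations, cluster names and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("all time points are identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def cluster_trends(
    observations: Iterable[Tuple[str, float, float]]
) -> Dict[str, Tuple[float, float]]:
    """Fit one trend per cluster from (cluster, years_since_baseline, marker) rows."""
    by_cluster: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for cluster, time, value in observations:
        by_cluster[cluster].append((time, value))
    return {cluster: fit_line(points) for cluster, points in by_cluster.items()}


# Hypothetical follow-up measurements of a progression marker.
follow_ups = [
    ("fast-progressing", 0.0, 1.0), ("fast-progressing", 1.0, 1.5), ("fast-progressing", 2.0, 2.0),
    ("stable", 0.0, 3.0), ("stable", 1.0, 3.0), ("stable", 2.0, 3.0),
]
trends = cluster_trends(follow_ups)
for cluster, (slope, intercept) in trends.items():
    print(f"{cluster}: slope={slope:.2f} per year, baseline={intercept:.2f}")
```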
34. 42
DP4DS: Data Provenance for Data Science
D2P4
+
DP4DS(*)
35. 43
Data → Model → Predictions
Model
pre-processing
Raw
datasets
features
Predicted you:
- Ranking
- Score
- Class
Data
collection
Instances
Key decisions are made during data selection and
processing:
- Where does the data come from?
- What’s in the dataset?
- What transformations were applied?
Complementing current ML approaches to model interpretability
1. Can we explain these decisions?
2. Are these explanations useful?
36. 44
Explaining data preparation
Data
collection
Model
Population data pre-processing
Raw
datasets
features
Predicted you:
- Ranking
- Score
- Class
- Integration
- Cleaning
- Outlier removal
- Normalisation
- Feature selection
- Class rebalancing
- Sampling
- Stratification
- …
Data acquisition and wrangling:
- How were datasets acquired?
- How recently?
- For what purpose?
- Are they being reused /
repurposed?
- What is their quality?
Instances
- Scripts: Python / TensorFlow, Pandas, Spark
- Workflows: Knime, …
Provenance → Transparency
37. 46
Recent early results
A small grassroots project… [1]
- Formalisation of provenance patterns for pipeline operators
- Systematic collection of fine-grained provenance from (nearly) arbitrary pipelines
- Reality check:
- How much does it cost? Provenance volume
- Does it help? Queries against the provenance database
[1]. Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science. Chapman, A., Missier,
P., Simonelli, G., & Torlone, R. PVLDB, 14(4):507-520, January, 2021.
38. 47
Operators
op
Data reduction
- Feature selection
- Instance selection
Data augmentation
- Space transformation
- Instance generation
- Encoding (eg one-hot…)
Data transformation
- Data repair
- Binarisation
- Normalisation
- Discretisation
- Imputation
Ex.: vertical augmentation, i.e. adding columns
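Vertical augmentation derives new columns by applying a function to existing ones, so fine-grained provenance can record, for each new cell, which input cells it came from. The sketch below shows the idea on rows held as plain dictionaries. The record layout, the function name and the BMI example are illustrative assumptions and do not reproduce the data model or API of the cited PVLDB paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Cell = Tuple[int, str]  # (row index, column name)


def vertical_augmentation(
    rows: List[Row],
    input_features: Sequence[str],
    new_feature: str,
    f: Callable[..., float],
) -> Tuple[List[Row], List[dict]]:
    """Add `new_feature` = f(input feature values) to every row.

    Returns the augmented rows and one provenance record per new cell.
    """
    augmented: List[Row] = []
    provenance: List[dict] = []
    for index, row in enumerate(rows):
        new_row = dict(row)  # leave the input rows untouched
        new_row[new_feature] = f(*(row[name] for name in input_features))
        augmented.append(new_row)
        provenance.append(
            {
                "operator": "vertical_augmentation",
                "derived": (index, new_feature),
                "used": [(index, name) for name in input_features],
            }
        )
    return augmented, provenance


# Hypothetical example: derive BMI from weight and height.
people = [{"weight_kg": 80.0, "height_m": 2.0}, {"weight_kg": 54.0, "height_m": 1.5}]
with_bmi, records = vertical_augmentation(
    people, ["weight_kg", "height_m"], "bmi", lambda weight, height: weight / (height * height)
)
print(with_bmi)
print(f"{len(records)} provenance records for {len(people)} rows")
```

The record count makes the cost question concrete: at this granularity provenance grows with the number of rows times the number of derived columns.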
39. 48
Code instrumentation
Create a provlet for
a specific
transformation
Initialize provenance
capture
…code injection is now being automated!
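The slide's actual instrumentation calls are not reproduced in the transcript. As a rough illustration of what wrapping a transformation for provenance capture can look like, the sketch below uses a Python decorator that logs one operator-level record per call. The decorator, the log structure and the example operator are hypothetical and are not the project's provlet API.

```python
import functools
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
PROVENANCE_LOG: List[dict] = []  # hypothetical in-memory provenance store


def capture_provenance(operator_name: str) -> Callable:
    """Wrap a rows -> rows transformation and log what it did."""

    def decorator(func: Callable[..., List[Row]]) -> Callable[..., List[Row]]:
        @functools.wraps(func)
        def wrapper(rows: List[Row], *args, **kwargs) -> List[Row]:
            result = func(rows, *args, **kwargs)
            PROVENANCE_LOG.append(
                {
                    "operator": operator_name,
                    "function": func.__name__,
                    "rows_in": len(rows),
                    "rows_out": len(result),
                    "columns_in": sorted(rows[0]) if rows else [],
                    "columns_out": sorted(result[0]) if result else [],
                    "recorded_at": datetime.now(timezone.utc).isoformat(),
                }
            )
            return result

        return wrapper

    return decorator


@capture_provenance("instance selection")
def drop_missing(rows: List[Row], column: str) -> List[Row]:
    """Keep only rows where `column` has a value."""
    return [row for row in rows if row.get(column) is not None]


cleaned = drop_missing([{"hba1c": 41.0}, {"hba1c": None}, {"hba1c": 52.0}], "hba1c")
print(PROVENANCE_LOG[-1])
```

The analyst's code stays unchanged apart from the decorator line, which is the property that makes automated injection plausible.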
47. 56
Summary
Multiple hypotheses regarding Data Provenance for Data Science:
1. Is it practical to collect fine-grained provenance?
   a. To what extent can it be done automatically?
   b. How much does it cost?
2. Is it also useful? What is the benefit to data analysts?
Work in progress! Interest? Ideas?
48. 57
Acknowledgments
Prof. Mike Catt
PhD Students: Ben Lam, Philip Darke
MSc student: Sam Butterfield
Prof. Guaraldi
Prof. Mandreoli
MSc student: Davide Ferrari
Prof. Torlone
MSc student: Giulia Simonelli
Prof. Chapman
Editor's Notes
CVD was the leading cause of death for males (15.5%) and the second for females (8.8%) in 2015 (*)
How about the data used to train / build the model?
Relatively easy to keep track of data pre-processing provenance
Features $X = [\mathbf{a}_1 \ldots \mathbf{a}_k]$; new features $Y = [\mathbf{a}'_1 \ldots \mathbf{a}'_l]$. New values for each row are obtained by applying $f$ to the values in the $X$ features.