
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI Webinar 1 :May 2018

Pistoia Alliance launched its Centre of Excellence for Artificial Intelligence (AI) in Life Sciences where we hope to bring together best practice, adoption strategy and hackathons covering a range of challenges.
Over the coming months we will be hosting a series of topics and speakers giving their perspectives on the role of Artificial & Augmented Intelligence in Life Sciences and Healthcare.
The topics will cover some of the current challenges, user stories & value in using AI in life sciences. If you want to get involved in this series as a speaker or suggest topics, please get in touch.
Webinar 1 will focus on the following:
A Brief History
Big Data/ML/DL/AI - fundamentals and concepts
Data Fidelity importance
Some best practices

Published in: Health & Medicine

  1. 1. 23 May, 2018 Demystifying AI An Introduction to AI in Life Sciences Pistoia Alliance Centre of Excellence for AI in Life Sciences Webinar 1 Prashant Natarajan (@BigDataCXO)
  2. 2. This webinar is being recorded
  3. 3. Poll Question 1: Are you or your organisation using AI (including machine learning/deep learning/chat bots)? A. Yes B. Plan to do in next 12 months C. No
  4. 4. ©PistoiaAlliance AI for Life Science Centre of Excellence 2018 (timeline) • Phase 1: AI Intro & Views from the Trenches • 2018: April, August • Phase 2: AI Topics Cont’d • Community Meetings, led by industry experts • Oct: Boston Workshop • Educate • Develop • 2019 • Datathon Challenges/Comps
  5. 5. Webinars: AI in Life Sciences – Q2/Q3 2018 • Webinar 1 (23 May 2018) Prashant Natarajan – A Brief History – Big Data/ML/DL/AI - fundamentals and concepts – Data Fidelity & NFR Framework – Best Practices from the Trenches – Q&A • Webinar 2 June 2018 Prashant Natarajan – Big Data Analytics & AI - 2 sides of the same coin – A guided tour of learning algorithms for Healthcare – Real-life use cases in health & life sciences from the book Q & A – AI Solutions - Going Beyond Algorithms – Q & A • Webinar 3 – Real World Evidence, the Big Data Connection – The 3 P’s of RWE: Persons, Providers, and Pharma • Webinar 4 – State of the Art in AI with working examples • Etc – monthly Community will guide themes Like to give a talk or panel?
  6. 6. Poll Question 2: Are your or your organisation’s AI* project(s) to date providing meaningful outcomes? A. Yes B. Not yet C. Don’t Know (AI* = including machine learning/deep learning/chat bots)
  7. 7. Prashant Natarajan • Senior Director of AI Applications at H2O.ai, Mountain View, CA, USA (www.h2o.ai) • Undergraduate degree in Chemical Engineering; Master’s in Technical Communications & Linguistics; PhD courses in Logic & Cognitive Psychology; AT&T-Yahoo Chancellor’s Fellow • 18+ years in health sciences industry – providers, pharma, payers, patients • H2O.ai; Oracle Health Sciences; McKesson; Healthways; Siemens • Lead author or contributor to books on big data analytics, business intelligence, cancer, machine learning, AI (best-sellers in 2012, 2017, 2018) • Co-Faculty Instructor, Stanford University School of Medicine, Palo Alto, CA • Industry Advisor, CA Initiative to Advance Precision Medicine/San Francisco VA @BigDataCXO | Prashant@h2o.ai | www.BigDataCXO.com
  8. 8. Agenda • Intelligent Machines: Perspectives on AI • Multi-disciplinary Foundations • Winters, Summers & More: a brief history of AI • To be or not to be • Machine Learning: the Basics • Best Practices • Considerations for Life Sciences • Webinar 2 content review
  9. 9. Intelligent Machines: Perspectives on AI • Acting Humanly (Turing Test) • A computer passes the test if a human interrogator, after posing some written questions, cannot tell whether the written responses come from a person or from a computer. • An intelligent machine would need to demonstrate the following capabilities: – NLP: natural language processing – Knowledge representation: store what it knows or hears – Automated reasoning: use the stored information to answer questions and to draw new conclusions – Machine learning: adapt to new circumstances and detect and predict using data • The Total Turing Test includes physical simulation; hence, capabilities must include computer vision (to perceive objects) & robotics • Thinking Humanly (Cognitive Modeling) • If we are going to say that a given program thinks like a human, we must have some way of determining how humans think. We need to get inside the actual workings of human minds. • Accomplished via – Introspection: observing one's own thoughts – Psychological experiments: observing a person in action – Brain imaging: observing the brain in action • Cognitive modeling brings together computer models from AI and experimental techniques from psychology to construct precise and testable theories of the human mind • Source: Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig, 2010 “The difference between the almost right word and the right word is a large matter…” ~ Mark Twain
  10. 10. Intelligent Machines: Perspectives on AI • Thinking Rationally (Logic) • Aristotle tried to codify “right thinking,” that is, irrefutable logical reasoning processes. • Solve problems and create intelligent systems using “laws of thought” logical notations. Emphasis on making “correct” inferences • It is not easy to take informal knowledge and state it in the formal terms required by logical notation, particularly when the knowledge is less than 100% certain. • There are differences between solving a problem “in principle” with logic and solving it in practice. • Acting Rationally (Rational Agents/CIAs) • Contextually-Intelligent Agents – act/do things • A rational agent is one that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome. • An agent must – Operate autonomously – Perceive its environment – Persist over a prolonged time period – Adapt to change – Create and pursue goals • Sources • Demystifying Big Data & Machine Learning for Healthcare, Prashant Natarajan, Bob Rogers, et al., 2017 • Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig, 2010
  11. 11. Multi-disciplinary Foundations
  12. 12. Winters, Summers and other Seasons • A Brief History of AI • Desire for thinking machines almost as old as written human history • Talos > Rhetorical Algebra (India, Greece, Egypt) > Linear Algebra (al-Khwarizmi) > Calculating Machines > • Science Fiction > Game AI > Scientific Research (Dartmouth) • Rosenblatt’s Perceptron > Minsky’s Winter > Expert Systems > Winter > Hinton’s DBN/RBM > “Modern” AI • Singularity (ahem!) • We live in an era that is poised for success of ML driven by compute power, datafication, sharing/collaboration
  13. 13. To Be or Not To Be • Reality • Summers • Business and use case driven • Augmented & Artificial • Data Fidelity • Jobs gained • White Box • Conversational AI • ML + DL • Automated & biz friendly • Hype • Winters • Artificial General Intelligence – AGI • Artificial • Data Quality • Jobs Lost • Black Box • Chat bots • DL • Manual & geek friendly
  14. 14. Poll Question 3: Do you feel that life sciences is ahead or behind in the development of AI in comparison to other industries? A. Considerably behind B. Somewhat behind C. Equal D. Ahead E. Don’t Know
  15. 15. Machine Learning 101 Mastering the Basics Source: “Demystifying Big Data and Machine Learning for Healthcare” (Taylor & Francis, 2017), Natarajan et al. Prashant Natarajan
  16. 16. Why Machine Learning? • Machine learning enables new use cases by – Ameliorating the effects of certain human limitations - cognitive (repetitive accuracy, human limitations & information overload), physical (fatigue), emotional (mood, human biases, etc) – Enabling new knowledge creation or data reduction via learning and prediction – Learning to generate computational biomarkers - finding hidden patterns/insights that are not visible to the eye – Processing repetitive data management tasks more efficiently, consistently, and with greater performance – Serving as the foundation for clinical workflows and comprehensive secondary use that includes predictive and prescriptive analytics, intelligent search, speech to text conversion, real-time image processing among other uses
  17. 17. What is Machine Learning? • “…field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel) • “…searching a very large space of possible hypotheses to determine one that best fits the observed data and any prior knowledge held by the learner” (Tom Mitchell) Source: Self, “Demystifying Big Data & Machine Learning for Healthcare” (2017)
  18. 18. Learning Algorithms • “A learning algorithm is an algorithm that is able to learn from data” (Ian Goodfellow et al.) • “Learning algorithms create new knowledge or demonstrate new skills by learning from old (training data) and new (generalized) data. A learning algorithm uses data and experience to self-learn and also perform better over time. During the process, a learner also optimizes itself to progressively come up with better predictions” (Prashant Natarajan et al.) • Generalization: Making predictions (using the data you have to create data you don’t have via extrapolation) and creating new knowledge and insights based on extrapolated data. Generalization is the ability to perform well on previously unobserved inputs • Training Dataset: the data you have that is used as input to the learner to train the model • Test Dataset: a dataset that MUST NOT be the same as the training data and that’s used by the learner for validation and optimization • A well-defined learning problem requires a well-specified task, T; performance metric, P; and source of training experience, E • A formal definition of learning (and a personal favorite) is "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E" (Tom Mitchell)
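The T/P/E framing and the train/test separation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the webinar: the task (deciding whether a number is at or above 50), the function names, and the 25% test fraction are all assumptions chosen to keep the example small.

```python
import random

def split_train_test(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples, then cut it into disjoint train and test lists."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def learn_threshold(train):
    """Experience E: place the cut-off midway between the two classes seen in training."""
    highest_negative = max(x for x, label in train if not label)
    lowest_positive = min(x for x, label in train if label)
    return (highest_negative + lowest_positive) / 2

def accuracy(threshold, examples):
    """Performance P: proportion of examples classified correctly."""
    correct = sum((x >= threshold) == label for x, label in examples)
    return correct / len(examples)

# Task T (hypothetical): decide whether a measurement is at or above 50.
examples = [(x, x >= 50) for x in range(100)]
train, test = split_train_test(examples)

threshold = learn_threshold(train)
train_accuracy = accuracy(threshold, train)
test_accuracy = accuracy(threshold, test)  # estimate of generalization
```

Only `test_accuracy` says anything about generalization, because the learner never saw those examples; `train_accuracy` is perfect here by construction.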
  19. 19. Delving Deeper • A Task is something that we want the machine learning system to do – the process of learning itself is not the task. Learning is our means of attaining the ability to perform the task. • A Dataset is a collection of Examples, each of which is a collection of Features • A Task is then defined in terms of how the learning system should process an Example. • An Example is defined as a collection of features that have been quantitatively measured from some object or event that we want the system to process. A Feature is the combination of an attribute and its value. • Dimensionality is the number of features that contain most useful or actionable information • Parameters are attribute values (of high-value features) that control the behavior of the learning system.
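One way to picture the vocabulary on this slide is as a small data structure. The sketch below is only an illustration: the class names mirror the slide's terms, while the attribute names and values are invented.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Feature:
    attribute: str  # what was measured
    value: float    # the measured value

@dataclass(frozen=True)
class Example:
    features: Tuple[Feature, ...]  # one object or event, described by its features

# Hypothetical patient records; attribute names and values are invented.
dataset = [
    Example((Feature("age_years", 54.0), Feature("systolic_bp_mmhg", 131.0))),
    Example((Feature("age_years", 37.0), Feature("systolic_bp_mmhg", 118.0))),
]

def attributes_in(dataset):
    """Distinct attribute names across all examples (the raw feature count)."""
    return sorted({f.attribute for example in dataset for f in example.features})
```

Note that `attributes_in` counts every attribute present. The slide's notion of dimensionality is narrower, since it counts only the features that carry useful information.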
  20. 20. Delving Deeper • Dimensionality Reduction is reducing the number of features required in an example. Dimensionality reduction can be accomplished via Feature Engineering: Extraction and Selection • Weights determine how each feature affects the prediction • One of the primary purposes of machine learning is for the system to perform on "unknown unknowns," or new, previously-unseen data - not just the training dataset on which the model was trained. A good machine learning system generalizes well from the training dataset to any data from the problem domain. • Overfitting happens when a learner mimics random fluctuations, anomalies, and noise in the training dataset, thus adversely impacting the performance of the system on new data.
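Overfitting is easier to see on a toy dataset. The sketch below is an illustration under stated assumptions, not material from the webinar: the "true rule" (label is True when x >= 5), the single noisy label at x = 3, and both learners are invented. A memorizing learner (1-nearest-neighbor) reproduces the noise, while a simpler threshold learner does not.

```python
def nearest_neighbor_predict(train, x):
    """Memorizing learner: copy the label of the closest training point."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label

def fit_threshold(train):
    """Simple learner: cut midway between the mean of each class."""
    negatives = [x for x, label in train if not label]
    positives = [x for x, label in train if label]
    return (sum(negatives) / len(negatives) + sum(positives) / len(positives)) / 2

def accuracy(predict, examples):
    """Proportion of examples for which the prediction matches the label."""
    return sum(predict(x) == label for x, label in examples) / len(examples)

# True rule (assumed): label is True when x >= 5. The label at x = 3 is noise.
train = [(1, False), (2, False), (3, True), (4, False),
         (6, True), (7, True), (8, True), (9, True)]
# New, previously unseen data that follows the true rule.
test = [(1.2, False), (2.2, False), (2.8, False), (3.3, False),
        (4.2, False), (6.2, True), (7.2, True), (8.2, True)]

threshold = fit_threshold(train)
memorizer = lambda x: nearest_neighbor_predict(train, x)
simple = lambda x: x >= threshold

memorizer_scores = (accuracy(memorizer, train), accuracy(memorizer, test))
simple_scores = (accuracy(simple, train), accuracy(simple, test))
```

Here `memorizer_scores` is `(1.0, 0.75)` and `simple_scores` is `(0.875, 1.0)`: the memorizer is perfect on the training data but loses accuracy on new data near the noisy point, while the simpler model gets the noisy training point "wrong" and generalizes better.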
  21. 21. Delving Deeper • Performance, P, is measured on the task being carried out - and is typically measured in terms of Accuracy, which is the "proportion of examples for which the model produces the correct output" or Error Rate, which is the "proportion of examples for which the model produces the incorrect output" • Noise, in machine learning, refers to "errors in the training data for machine learning algorithms" • Experience, E, in machine learning is primarily determined by the amount of supervision (during the learning process) and the availability of labeled data in the dataset. • In Supervised Learning, the learning algorithm is provided with a set of inputs along with the corresponding correct outputs; learning involves the algorithm comparing its current actual output with the correct or target outputs, so that it knows what its error is and can modify its behavior accordingly.
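The supervised loop described above (compare the actual output with the target, then adjust by the error) can be sketched with the classic perceptron learning rule. The perceptron is used here only as a familiar stand-in; the toy task (logical AND), the function names, and the epoch limit are assumptions made for illustration.

```python
def predict(weights, bias, x):
    """Output 1 when the weighted sum of the features is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(examples, max_epochs=20):
    """Adjust weights by the error (target minus output) on each labeled example."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in examples:
            error = target - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias

def accuracy(weights, bias, examples):
    """Proportion of examples for which the model produces the correct output."""
    correct = sum(predict(weights, bias, x) == target for x, target in examples)
    return correct / len(examples)

# Labeled training data for a toy task: logical AND of two binary inputs.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(examples)
acc = accuracy(weights, bias, examples)
error_rate = 1 - acc
```

Accuracy and error rate are measured on the training examples here only to keep the sketch short; in practice they would be reported on held-out data.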
  22. 22. Delving Deeper • In supervised learning, input data is labeled based on existing knowledge (for example, is the email in the training dataset spam or not-spam?) • The model continues to train until it achieves a desired level of performance on the training dataset - and the trained model is then fed new and unknown data. • In Unsupervised Learning, input data is not labeled, and further, "the system is not told the 'right answer' - for example, it is not trained on pairs consisting of an input and the desired output. Instead the system is given the input patterns and is left to find interesting patterns, regularities, or clusterings among them.” • In Semi-supervised Learning, input data may be only partially labeled, and the expected results may or may not be known. The machine learning system will include both supervised and unsupervised learners. • Active Learning is a semi-supervised learning experience where the model chooses by itself what unlabelled data would be most informative for it, and asks an external “oracle” (for example, a human annotator) for a label for the new data points
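The active learning bullet can be made concrete with uncertainty sampling, one common way for a model to choose which unlabelled point to send to the oracle. The slide does not name a selection strategy, so this choice is an assumption, as are the stand-in model and the oracle below.

```python
import math

def predicted_probability(x):
    """Hypothetical stand-in for a trained model: P(label is positive | x)."""
    return 1 / (1 + math.exp(-(x - 5.0)))

def most_informative(pool):
    """Uncertainty sampling: pick the point whose prediction is closest to 0.5."""
    return min(pool, key=lambda x: abs(predicted_probability(x) - 0.5))

def oracle(x):
    """Stand-in for a human annotator who knows the true label."""
    return x >= 5.0

pool = [1.0, 4.8, 9.0, 5.5]  # unlabelled data points
labeled = []
for _ in range(2):           # ask the oracle for two labels
    query = most_informative(pool)
    pool.remove(query)
    labeled.append((query, oracle(query)))
```

The two queries go to 4.8 and then 5.5, the points the model is least sure about, while the confident predictions at 1.0 and 9.0 stay unlabelled. A real loop would retrain the model after each new label; the model is held fixed here to keep the sketch short.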
  23. 23. Delving Deeper • Deep Learning is a type of machine learning experience that uses learning algorithms called artificial neural networks that attempt to simulate or replicate the functioning of the human brain. Think of deep neural networks as “ANNs with lotsa depth” • Think of "deep" in deep learning as having many more layers (or Depth) than were possible with ANNs and as the ability to deal with very large datasets due to Moore’s law and data availability. The principle driving deep learning is “guiding the training of intermediate levels of representation using unsupervised learning, which can be performed locally at each level.” (Bengio) • Some types of deep neural nets – including Feed Forward Neural Networks; Recurrent Neural Networks; Convolutional Neural Networks; and Reinforcement Neural Networks among others
  24. 24. Combine ML + Other Analytics Source: Exhibit from “The Age of Analytics: Competing in a Data-Driven World,” December 2016, McKinsey Global Institute, www.mckinsey.com. Copyright © 2016 McKinsey&Company. All rights reserved. Reprinted by permission.
  25. 25. Best Practices • Ask a specific question: the best first question is something you already know the answer to, so that you have a reference and some intuition to compare your results with. • Start Simple: in both model selection and the data you consider using. You want your results to be robust, so less model complexity and fewer parameters are always beneficial. • Regarding data, don’t start by building a huge data lake with every kind of data you could possibly get your hands on. Instead, start with the minimal set of data that could get you to a good result. • Try many algorithms to see how they work. Ensembles are useful. Remember that data is more important than the exact algorithm you use. More training data is always desirable.
  26. 26. Best Practices • Treat Your Data with Suspicion: Look at your data, dig into its details, look for correlations, suspicious gaps, systematic biases, errors, and flaws. Use statistics and visualizations here. • Text has transcription errors, misspellings, and abbreviations. These challenges often exist for structured data as well: you will find that data is recorded inconsistently both across your data set and even within a single field. • Data fidelity is more important and useful than “1 Size Fits All” Data Quality – use the NFR framework in our book to inform data fidelity • Validate your models - Separate your data into training, test, and validation sets; be aware of your own biases and of biases in the data
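A first pass at "treating your data with suspicion" can be as simple as counting what actually appears in a single field. The sketch below is an illustration only: the `audit_field` helper and the sample values are hypothetical, and the normalization (trim whitespace, lowercase) is deliberately crude.

```python
from collections import Counter

def audit_field(values):
    """Summarize missing entries and spelling variants in one field."""
    def is_missing(v):
        return v is None or str(v).strip() == ""

    present = [str(v) for v in values if not is_missing(v)]
    raw_counts = Counter(present)
    normalized_counts = Counter(v.strip().lower() for v in present)
    return {
        "missing": sum(1 for v in values if is_missing(v)),
        "raw_variants": len(raw_counts),
        "normalized_variants": len(normalized_counts),
        "normalized_counts": normalized_counts,
    }

# Hypothetical "sex" field recorded inconsistently within a single column.
sex_field = ["M", "male", "Male", " male", "F", "female", "", None, "m"]
report = audit_field(sex_field)
```

For this sample the report shows 2 missing entries and 7 raw spellings that collapse to 4 after trimming and lowercasing (`m`, `male`, `f`, `female`), which still hides only two underlying categories. Gaps like this are worth finding before any model is trained.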
  27. 27. Best Practices • Set Up a Feedback Loop - think through how you will use the output errors of your machine-learning system to improve it. Downstream users can provide feedback on when your algorithm got it wrong. How are you capturing this feedback so you can bring it back into training? • White boxes using Transparency, Interpretability, Explainability - Healthcare Doesn’t Trust Black Boxes • Correlation Is Not Causation - It’s easy to convince yourself that two factors that move together imply that one causes the other. Just remember that in many cases there is a hidden factor that could be causing both factors to move together.
  28. 28. Best Practices • Monitor Ongoing Performance - How will you monitor the performance of your algorithm on an ongoing basis? Data drifts and systems evolve. • Keep Track Of Your Model Changes - Always track the revision of your model and report it with your results. As you improve different parts of your data analytics pipeline, you will want to go back and re-analyze data. Recording which model was used at which time helps you understand what to recalculate. • Don’t be Fooled by “Accuracy” - If you’re looking for a rare event that only happens 1% of the time, and you never actually find it, you can report your accuracy as 99%. Obviously, that’s meaningless. Instead, figure out before you start your project what precision and recall your application requires to be useful. Build your application to these metrics.
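The "99% accuracy" trap is easy to reproduce. The sketch below is a worked illustration with invented numbers matching the slide's scenario: 1,000 cases, of which 10 (1%) are the rare event, scored against a model that never predicts the event. Function names are assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Precision is undefined when the model never predicts the event.
        "precision": tp / (tp + fp) if tp + fp else None,
        # Recall is undefined when the data contains no events.
        "recall": tp / (tp + fn) if tp + fn else None,
    }

y_true = [1] * 10 + [0] * 990   # the rare event happens 1% of the time
always_negative = [0] * 1000    # a "model" that never finds the event
report = metrics(y_true, always_negative)
```

The report shows accuracy 0.99, recall 0.0, and an undefined precision: the model looks excellent on accuracy while finding none of the events it was built to find.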
  29. 29. Considerations for Life Sciences • Regulations and policy • Innovation in a regulated environment • TIE it up (Transparency, Interpretability, Explainability) • Organization and structural challenges in Life Sciences • Resourcing • Data fidelity and labeling • Keep minds open on deep learning • MDM is critical as is data governance • Ethics and privacy – human and machine morality are not the same. Does a machine have morals? • Clear demarcation or sharing of human & machine-learning/CIA responsibilities when failure happens
  30. 30. Poll Question 4: Are you planning collaborations in AI in any of the following fields over the next 12-18 months? A. Academia B. Technology/data provider C. Healthcare D. Government/public sector E. Don’t know
  31. 31. Audience Q&A Please use the Question function in GoToWebinar
  32. 32. Big Data Analytics, ML & AI The next Pistoia Alliance CoE AI Webinar: Date: June 2018 check http://www.pistoiaalliance.org/events/ for the latest information
  33. 33. info@pistoiaalliance.org @pistoiaalliance www.pistoiaalliance.org Thank You
