Philips John Huffman

BDE2016

Published in: Data & Analytics

  1. Philips Healthcare Informatics
     A Perspective on Big Data, Analytics and AI
     John Huffman, CTO, Philips Healthcare Informatics
     September 2016, Utrecht, NL
  2. A Little Bit About My Background
     35 years or so of AI, reasoning and knowledge integration
     • Started at Thinking Machines when it was founded in the early 80’s
       – Worked with Danny Hillis and Brewster Kahle on the Connection Machine
     • MCC (the US answer to Japan’s Fifth Generation Project)
       – Worked with Doug Lenat on AI and CYC (comprehensive common sense knowledge and reasoning project)
         ▪ Liaison to NLP and CHI groups
     • Progressively worked on systems of integrated information, knowledge representation, workflow and integrated decision support through start-ups (usually my own) and finally larger companies
       – Aware, SGI, Stentor, Poiesis Informatics, Philips
  3. Lots of Hype Around Big Data…
     Many companies getting into the fray…
  4. Many Opinions on Where We Are
     Has anyone actually leveraged this in healthcare?
  5. Advanced Analytics Process*
     Multi-Stage Process
     *CRISP-DM – Cross Industry Standard Process for Data Mining
  6. Too much focus on one component…
     Multi-Stage Process*
     *CRISP-DM – Cross Industry Standard Process for Data Mining
  7. Steps
  8. Analytics Lifecycle Overview
     [Diagram; labels: Data Ingestion, Data Processing, Data Cleaning, Model Training, Model Evaluation, Production, Data Scientist, Data Science, Model Repository, Big Data Platform, Landing Zone, ETLed Processed Zone, Anonymized Data Repository, Cleaned Data]
  9. Analytics Lifecycle (more detail)
     [Diagram; labels: Hosted Solution, Data Preparation Phase, Predictive Model Creation, Create Model, Evaluate Model, Operationalize Model, Data Science Hosted Cluster (Create Model), Hosted Cluster (Evaluation), Production Cluster, Model Staging, Big Data Platform, Data Science Platform (Analytics and ML), REST ML APIs, ML Algos, IPs, ETLs, ML R lib, ML Py lib, ML Framework, Models, ML Scoring Service, Feature Engineering, Data Access, Processing, Predictive Analytical Apps, Predictive Model Evaluator, Model Evaluator Service, Proposition Owner, Domain Services, Original Raw Data, ETLed Data, Anonymized Data, Scripts and Model Rep.]
  10. Challenges in Data Collection and Processing
      Before any analytics can start…
      • Data identification, collection and preparation
        – Domain knowledge is important to discriminate relevant data
      • ETL – extracting relevant data from raw data
      • Massaging – pre-processing the data
        – [Automatic] annotation of data (e.g. masking of bones in a chest X-ray)
      • Normalization of the data
        – Especially complex when data is received from multiple sources
      • Aggregation of data
        – For the purpose of statistical analysis
      • Note – All of the above steps must be done on the same set of technologies that will be present during deployment of the resulting model
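
The normalization and aggregation steps above can be sketched as follows. This is only a minimal illustration, not the pipeline used on the platform: the two sources, their field names (`patient`, `id`, `temp_c`, `temp_f`) and the sample values are all hypothetical, chosen to show why multi-source data needs a common schema before any statistics are computed.

```python
from statistics import mean

# Hypothetical records from two sources that use different
# field names and different units for the same measurement.
SOURCE_A = [{"patient": "p1", "temp_c": 37.0}, {"patient": "p2", "temp_c": 38.5}]
SOURCE_B = [{"id": "p1", "temp_f": 100.4}, {"id": "p3", "temp_f": 98.6}]


def normalize_a(rec):
    """Map a source A record onto the common schema (already in Celsius)."""
    return {"patient_id": rec["patient"], "temp_c": rec["temp_c"]}


def normalize_b(rec):
    """Map a source B record onto the common schema, converting F to C."""
    celsius = round((rec["temp_f"] - 32) * 5 / 9, 2)
    return {"patient_id": rec["id"], "temp_c": celsius}


def aggregate(records):
    """Aggregate normalized records into a mean temperature per patient."""
    by_patient = {}
    for rec in records:
        by_patient.setdefault(rec["patient_id"], []).append(rec["temp_c"])
    return {pid: mean(values) for pid, values in by_patient.items()}


normalized = [normalize_a(r) for r in SOURCE_A] + [normalize_b(r) for r in SOURCE_B]
mean_temp_by_patient = aggregate(normalized)
```

Keeping one normalizer per source makes the unit and naming assumptions explicit, so the same functions can be reused unchanged when the model is deployed.
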
  11. Training and Validating the Model
      Which method is appropriate?
      • Effective model creation requires an understanding of the nuances and strengths of different methods
        – Selection of the right method depends on the task
          ▪ Classification/Regression/Clustering/Dimensionality reduction…
      • Identification and computation of the metric(s) to evaluate the model
        – Requires training and test data
      • Ensure there is no overfitting
      • Validate the model
        – On extended data sets, cohort variation
      • Fine-tune the parameters of the model
      • Note – All of the above steps must be done on the same set of technologies that will be present during deployment
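
A minimal sketch of the train/test and overfitting checks above, assuming synthetic one-dimensional data with 20% label noise. The data generator, the 1-nearest-neighbour model and the fitted-threshold model are illustrative choices, not methods named in the slides. The 1-nearest-neighbour model memorizes its training set, so a large gap between its training and test accuracy is the overfitting signal to look for.

```python
import random


def make_data(n, rng, noise=0.2):
    """Synthetic data: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def split(data, test_fraction, rng):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, data):
    """Metric: fraction of examples the predictor labels correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)


def one_nn_predict(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def fit_threshold(train):
    """Simple model: pick the cut-off with the best training accuracy."""
    candidates = [i / 20 for i in range(21)]
    return max(candidates, key=lambda t: accuracy(lambda x: int(x > t), train))


rng = random.Random(0)
train, test = split(make_data(400, rng), 0.25, rng)

nn_train_acc = accuracy(lambda x: one_nn_predict(train, x), train)
nn_test_acc = accuracy(lambda x: one_nn_predict(train, x), test)

threshold = fit_threshold(train)
simple_train_acc = accuracy(lambda x: int(x > threshold), train)
simple_test_acc = accuracy(lambda x: int(x > threshold), test)
```

Comparing `nn_train_acc - nn_test_acc` with `simple_train_acc - simple_test_acc` shows the pattern: the memorizing model scores perfectly on data it has seen and typically much worse on held-out data, while the simpler model scores about the same on both.
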
  12. Challenges in Deployment and Operations
      • Installation (On-Premise, Cloud, Hybrid)
      • Configuration
      • Health Monitoring
      • Auto-Scaling
      • Multi-Tenancy
      • Disaster Recovery
      • Licensing
      • Performance Monitoring
      • Metering and Billing
      • Upgrades
      • Snapshots
      • Certificate Management
      • Resource Utilization and Trending
      • Privacy and Security
  13. These Methods Are Not New
      Decades- to centuries-old technologies
      • Neural Networks – (1943) Warren McCulloch and Walter Pitts; originally called threshold logic
      • Deep Learning – (1965) Ivakhnenko and Lapa; papers in 1971 already described deep networks with 8 layers trained by the group method of data handling (GMDH) algorithm
      • Random Decision Forest – (1995) Ho
      • Big Data (MapReduce) – various papers 2000-2004; the underlying methods were well known in the mid-90’s. Apache Hadoop (open source) was first released in 2006 and reached version 1.0 at the end of 2011
      • Bayesian methods – Bayes lived in the 1700’s; Naïve Bayes methods have been in use since the 50’s
  14. Some Lessons from AI History
      It is well known that data is much more important than method…
      • Just Google – “More data and simple algorithms beat complex analytics methods”
      • This is well known from expert system and AI experience
        – “Brittleness”
          ▪ Applying models to data outside the training domain frequently fails in unusual, unexpected ways
        – Marvin Minsky, “The Society of Mind”
          ▪ Complex and intelligent behavior comes from the orchestration of simple agents
      • Without a broad, semantically interoperable, clean data repository, complex analytics, decision support algorithms, and workflow optimizations cannot be derived
      • Data is the intellectual property in this domain
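
One crude guard against the brittleness described above is to record the range of each feature seen during training and flag inputs that fall outside it before trusting a prediction. The sketch below is an illustration only: the feature names and values are hypothetical, and a per-feature range check catches only the simplest kind of out-of-domain input.

```python
def fit_ranges(rows):
    """Record the minimum and maximum seen for each feature during training."""
    features = rows[0].keys()
    return {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in features}


def out_of_domain(ranges, row):
    """Return the features whose value lies outside the training range."""
    return [f for f, (lo, hi) in ranges.items() if not lo <= row[f] <= hi]


# Hypothetical training cohort: adults only.
training_rows = [
    {"age": 34, "heart_rate": 62},
    {"age": 71, "heart_rate": 88},
    {"age": 55, "heart_rate": 75},
]
ranges = fit_ranges(training_rows)

flags_adult = out_of_domain(ranges, {"age": 60, "heart_rate": 70})
flags_child = out_of_domain(ranges, {"age": 6, "heart_rate": 110})
```

Here `flags_adult` is empty, while `flags_child` names both features, signalling that a model trained on this cohort should not be applied to that patient without further validation.
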
  15. Analytics Stack
      Analytics is a set of tools – not a solution
      [Diagram; labels: Analytical Apps, Clinical Image Analytics, Clinical Text Analytics, 3rd Party Apps, R SDK, Py SDK, JDBC/ODBC, REST Machine Learning APIs, General ML Algorithms, IPs, Deep Learning libraries, NLP building blocks, Distributed Processing framework, Data Repositories (S3, HDFS, Hive…), Model Rep., Scripts Rep.]
      • Provide easy-to-use SDKs (R and Python)
      • Prebaked thin-client development environments
        – RStudio and Jupyter
      • All ML capabilities are exposed via RESTful APIs
      • Provide higher-level abstraction APIs for clinical text and clinical images
      • Provide building blocks for NLP and DL frameworks
      • Host research IP assets
      • Persist the models and scripts in repositories (shared across development and deployment clusters)
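
To make "ML capabilities exposed via RESTful APIs" concrete, here is a minimal sketch of a scoring endpoint built only on the Python standard library. It is not the platform's actual API: the `/score` path, the `MODEL` artefact, its weights and the `serve` helper are all hypothetical, and the model is a toy logistic score standing in for whatever would be loaded from the shared model repository.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model artefact, as it might be loaded from a model repository.
MODEL = {
    "name": "demo-risk",
    "version": 1,
    "bias": -1.0,
    "weights": {"age": 0.02, "heart_rate": 0.01},
}


def score(model, features):
    """Toy logistic score over the features named in the model."""
    z = model["bias"] + sum(w * features[name] for name, w in model["weights"].items())
    return 1.0 / (1.0 + math.exp(-z))


def handle_score(body, model=MODEL):
    """Turn a JSON request body into an (HTTP status, JSON response) pair."""
    try:
        features = json.loads(body)
        result = {
            "model": model["name"],
            "version": model["version"],
            "score": score(model, features),
        }
        return 200, json.dumps(result)
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON or a missing or invalid feature is a client error.
        return 400, json.dumps({"error": str(exc)})


class ScoringHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper: POST /score with a JSON object of features."""

    def do_POST(self):
        if self.path != "/score":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_score(self.rfile.read(length))
        data = payload.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Run the scoring service locally (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), ScoringHandler).serve_forever()
```

Calling `serve()` starts the service. Keeping `handle_score` separate from the HTTP handler means the same scoring logic can be exercised in tests or reused by R and Python client SDKs without a running server, and returning the model name and version with every score makes it clear which repository artefact produced a result.
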
  16. Philips Approach - HSDP
      Analytics and Big Data are an integrated component of the platform
      A tailored set of capabilities and tools, optimized for rapid prototyping and development of healthcare and health-related applications
      • Connect – Manages, updates, monitors and remotely controls smart devices
      • Store – Acquire, access and manage personal data from devices and applications through a cloud-hosted repository
      • Authorize – Securely identifies users, authorizes consent, ensures data privacy and tracks user activity
      • Share – Standardizes interfaces between HealthSuite enabled applications and devices with third-party systems
      • Orchestrate – Provides functionality to help complete routine tasks and coordinate communications among users
      • Host – Provides managed infrastructure to monitor the health of systems and performance of applications
      • Analyze – Offers the foundational infrastructure to build decision support algorithms and machine learning applications