
HealthCare.AI Applied To Cancer Data Sets



This presentation was given at the November 28, 2017 meeting of The Chicago Technology For Value-Based Healthcare Meetup. It updates the presentation given at the previous month's meeting.

Published in: Healthcare

  1. HealthCare.AI (Python version), based on the Python scikit-learn library (Cancer Data Sets). Chicago Technology For Value-Based Healthcare, 11/28/2017. Dan Wellisch, Meetup Organizer. Healthcare.AI is an open source machine learning toolkit for healthcare. It has a dedicated team of healthcare data scientists and is part of Health Catalyst.
  2. Introduction
     • The Healthcare.AI toolkit also has an R version.
     • Data Input: Feature Columns
     • Data Output: Prediction Column
     • Types Of Problems: Classification (Binary or Multiclass) and Regression (Numeric)
     • Classification Examples: 2 Different Cancer Data Sets
  3. Machine Learning Terms
     • Feature/Independent Variable: A measurable property of a phenomenon being observed. Example: Characteristics of a tumor: radius, color, etc.
     • Prediction/Dependent Variable: Given the features, what do we predict the outcome to be? Example: Benign or Malignant.
     • Model: An algorithm such as linear regression, logistic regression, SVM, or a neural network.
  4. Machine Learning Cycle
     • Model Selection: Choose a model.
     • Feature Engineering: The process of choosing features. The goal is to choose features
         • That change independently of each other. In other words, given Feature A, select a Feature B that does not track the level of Feature A.
         • That are not composed of other features. For example, if you have length and width, then don’t use area as another feature.
     • Run Model.
     • Ready For Production? No: improve the model and repeat the cycle. Yes: save the model and run it in production.
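
The feature-selection goal on this slide can be checked numerically. The following is a minimal sketch in plain Python, not part of Healthcare.AI or scikit-learn: it flags pairs of features whose values track each other closely. The measurements, the feature names `perimeter` and `texture`, and the 0.9 threshold are all assumptions made up for illustration. A correlation check like this only catches features that move together; spotting a derived feature such as area (from length and width) may still need knowledge of how the columns were computed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(features, threshold=0.9):
    """Return the name pairs whose absolute correlation reaches the threshold."""
    names = sorted(features)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(pearson(features[first], features[second])) >= threshold:
                pairs.append((first, second))
    return pairs


# Hypothetical tumor measurements, invented for illustration only.
radius = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "radius": radius,
    # Perimeter is fully determined by radius, so it tracks Feature A exactly.
    "perimeter": [2 * math.pi * r for r in radius],
    # Texture varies on its own and should not be flagged.
    "texture": [3.0, 1.0, 4.0, 1.0, 5.0],
}

pairs = redundant_pairs(features)
print(pairs)
```

Any pair the check reports is a candidate for dropping one of its two features before training.
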
  5. Training the Model
     1. Set up a pipeline. Each “pipe” in the pipeline is either a transformer or an estimator. A transformer implements fit and transform methods. An estimator implements fit and predict methods.
     2. Clean the data by processing it through the transformers of the pipeline.
  6. Training the Model
     2. (cont.) Example Transformers.

         custom_pipeline = Pipeline([
             ('remove_DTS_columns', hcai_filters.DataframeColumnSuffixFilter()),
             ('remove_grain_column', hcai_filters.DataframeColumnRemover('ID')),
             # Perform one of two basic imputation methods
             # TODO we need to think about making this optional to solve the problem of rare and very predictive values
             ('imputation', hcai_transformers.DataFrameImputer(impute=False, verbose=False)),
             ('null_row_filter', hcai_filters.DataframeNullValueFilter(excluded_columns=None)),
             ('convert_target2_to_binary', hcai_transformers.DataFrameConvertTarget2ToBinary('classification', 'Diagnosis')),
             ('prediction_to_numeric', hcai_transformers.DataFrameConvertColumnToNumeric('Diagnosis')),
             ('create_dummy_variables', hcai_transformers.DataFrameCreateDummyVariables(excluded_columns=['Diagnosis']))
         ])
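
To make the fit/transform convention concrete, here is a small plain-Python sketch of a cleaning pipeline. It is not the Healthcare.AI implementation: the classes `ColumnRemover`, `NullRowFilter`, `TargetToBinary`, and `DummyVariables`, the helper `run_pipeline`, the `Shape` and `Radius` columns, and the "M"/"B" codes are all hypothetical stand-ins that only loosely mirror the kinds of steps named above. Only the `ID` and `Diagnosis` column names come from the slide.

```python
class ColumnRemover:
    """Transformer that drops one named column from every row."""

    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        return self  # nothing to learn from the data

    def transform(self, rows):
        return [{k: v for k, v in row.items() if k != self.column} for row in rows]


class NullRowFilter:
    """Transformer that drops any row containing a missing (None) value."""

    def fit(self, rows):
        return self

    def transform(self, rows):
        return [row for row in rows if all(v is not None for v in row.values())]


class TargetToBinary:
    """Transformer that maps the target column to 1 (positive code) or 0."""

    def __init__(self, column, positive):
        self.column = column
        self.positive = positive

    def fit(self, rows):
        return self

    def transform(self, rows):
        return [
            {**row, self.column: 1 if row[self.column] == self.positive else 0}
            for row in rows
        ]


class DummyVariables:
    """Transformer that replaces a categorical column with 0/1 columns."""

    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        # Learn the categories once, so later data gets the same columns.
        self.categories_ = sorted({row[self.column] for row in rows})
        return self

    def transform(self, rows):
        result = []
        for row in rows:
            new_row = {k: v for k, v in row.items() if k != self.column}
            for category in self.categories_:
                new_row[f"{self.column}.{category}"] = 1 if row[self.column] == category else 0
            result.append(new_row)
        return result


def run_pipeline(steps, rows):
    """Fit each (name, transformer) step in order and pass its output along."""
    for _name, step in steps:
        rows = step.fit(rows).transform(rows)
    return rows


# Hypothetical records, invented for illustration only.
records = [
    {"ID": 1, "Shape": "round", "Radius": 1.2, "Diagnosis": "M"},
    {"ID": 2, "Shape": "irregular", "Radius": None, "Diagnosis": "B"},
    {"ID": 3, "Shape": "irregular", "Radius": 0.7, "Diagnosis": "B"},
]

steps = [
    ("remove_grain_column", ColumnRemover("ID")),
    ("null_row_filter", NullRowFilter()),
    ("target_to_binary", TargetToBinary("Diagnosis", positive="M")),
    ("create_dummy_variables", DummyVariables("Shape")),
]

cleaned = run_pipeline(steps, records)
print(cleaned)
```

An estimator would sit at the end of such a chain and expose fit and predict instead of fit and transform. scikit-learn's `Pipeline` follows the same convention.
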
  7. Training the Model
     3. Split the data into train and test sets.
     4. Train a model. Example models to train:
        a) linear regression model
        b) logistic regression model
        c) lasso regression model
        d) ensemble regression model
        e) ensemble logistic model
        f) random forest regression model
        g) random forest classification model
        h) knn model
     5. Tweak the parameters of the models you investigate on your training set. Go back to step 4 until you have found your best performing model on your training set.
     6. Save your best performing model (for your training set).
     7. The saved model can now be put into production.
     Demo: Model Training
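
Steps 3 through 6 can be sketched end to end with the standard library alone. This is an illustrative sketch, not the demo code from the talk: the synthetic two-cluster data, the tiny k-nearest-neighbours "model" stored as a dictionary, the candidate values of `k`, and the file name `best_model.pkl` are all assumptions. One deliberate choice here differs from the wording of step 5: candidates are scored on the held-out rows, because a 1-nearest-neighbour model would score perfectly on its own training rows whenever those rows are distinct.

```python
import math
import os
import pickle
import random
import tempfile


def make_data(n_per_class, rng):
    """Synthetic two-feature rows: class 0 near (1, 1), class 1 near (5, 5)."""
    rows, labels = [], []
    for label, center in ((0, 1.0), (1, 5.0)):
        for _ in range(n_per_class):
            rows.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5)])
            labels.append(label)
    return rows, labels


def split_rows(rows, labels, test_fraction, rng):
    """Step 3: shuffle the row indices and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def knn_predict(model, rows):
    """Majority vote among the k stored training rows nearest to each query."""
    predictions = []
    for query in rows:
        ranked = sorted(
            zip(model["rows"], model["labels"]),
            key=lambda pair: math.dist(query, pair[0]),
        )
        votes = sum(label for _, label in ranked[: model["k"]])
        predictions.append(1 if 2 * votes > model["k"] else 0)
    return predictions


def accuracy(model, rows, labels):
    predicted = knn_predict(model, rows)
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows, labels = make_data(20, rng)
(train_rows, train_labels), (test_rows, test_labels) = split_rows(rows, labels, 0.25, rng)

# Steps 4 and 5: "training" k-NN just stores the training rows; k is the
# parameter being tweaked. Keep the candidate with the best held-out score.
best_model, best_score = None, -1.0
for k in (1, 3, 5):
    candidate = {"k": k, "rows": train_rows, "labels": train_labels}
    score = accuracy(candidate, test_rows, test_labels)
    if score > best_score:
        best_model, best_score = candidate, score

# Step 6: save the best performing model to disk.
model_path = os.path.join(tempfile.mkdtemp(), "best_model.pkl")
with open(model_path, "wb") as handle:
    pickle.dump(best_model, handle)

print(best_model["k"], best_score, model_path)
```

With scikit-learn, the split would typically use `train_test_split` and the model would be one of the estimators listed in step 4.
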
  8. Using the Saved Model To Predict Classes. Demo: Model Prediction
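
Using a saved model amounts to loading it from disk and calling its predict step on new rows. The sketch below is a plain-Python illustration under stated assumptions, not the Healthcare.AI prediction API: the nearest-centroid model, its centroid values, the helpers `load_model` and `predict_classes`, and the file name `model.pkl` are hypothetical. The class names Benign and Malignant follow the example given earlier in the slides. The sketch first writes a small model file so that it can run on its own.

```python
import math
import os
import pickle
import tempfile

# Hypothetical saved model: one centroid per class, values invented here.
saved_model = {
    "classes": ["Benign", "Malignant"],
    "centroids": [[1.0, 1.0], [5.0, 5.0]],
}
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as handle:
    pickle.dump(saved_model, handle)


def load_model(path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def predict_classes(model, rows):
    """Assign each row the class whose centroid is nearest to it."""
    predictions = []
    for row in rows:
        distances = [math.dist(row, centroid) for centroid in model["centroids"]]
        predictions.append(model["classes"][distances.index(min(distances))])
    return predictions


model = load_model(model_path)
new_rows = [[0.8, 1.4], [4.6, 5.3]]  # hypothetical unseen feature rows
predictions = predict_classes(model, new_rows)
print(predictions)
```

New rows must pass through the same cleaning steps that were applied during training before they reach the model.
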