These slides show how to integrate scikit-learn with Spark, a powerful tool in the big data space. Using Spark for data preprocessing and then handing the resulting training data set to scikit-learn can cause performance issues, so I share some tips on how to overcome them.