VTREAT: A Package for Automating Variable Treatment in R
Data characterization, treatment, and cleaning are necessary (though not always glamorous) components of machine learning and data science projects. While there is no substitute for getting your hands dirty in the data, there are many data issues that repeat from project to project. In particular, how do you deal with missing data values? How do you deal with previously unobserved categorical values?
In this talk, I will discuss some typical data problems, and describe VTREAT, our in-progress R package for automating the treatment of these common data issues.
Bio:
Nina Zumel is a Principal Consultant with Win-Vector LLC, a data science consulting firm based in San Francisco. Her technical interests include data science, statistics, statistical learning, and data visualization. She is also the co-author with John Mount of Practical Data Science with R. This book presents the process and principles of data science from a practitioner’s perspective, and complements existing texts on machine learning, statistics, big data, and R.