Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.
Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.
Scribd will begin operating the SlideShare business on December 1, 2020 As of this date, Scribd will manage your SlideShare account and any content you may have on SlideShare, and Scribd's General Terms of Use and Privacy Policy will apply. If you wish to opt out, please close your SlideShare account. Learn more.
Published on
VTREAT: A Package for Automating Variable Treatment in R
Data characterization, treatment, and cleaning are necessary (though not always glamorous) components of machine learning and data science projects. While there is no substitute for getting your hands dirty in the data, there are many data issues that repeat from project to project. In particular, how do you deal with missing data values? How do you deal with previously unobserved categorical values?
In this talk, I will discuss some typical data problems, and describe VTREAT, our in-progress R package for automating the treatment of these common data issues.
Bio:
Nina Zumel is a Principal Consultant with Win-Vector LLC, a data science consulting firm based in San Francisco. Her technical interests include data science, statistics, statistical learning, and data visualization. She is also the co-author with John Mount of Practical Data Science with R. This book presents the process and principles of data science from a practitioner’s perspective, and complements existing texts on machine learning, statistics, big data, and R.