Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Like this presentation? Why not share!

- Job applicant rating algorithm by Anna Wright 218 views
- Using R to apply decision tree algo... by YITONG GAO 182 views
- Mining Facebook for Feelings by CBS Competitivene... 328 views
- Content Recommendation Based on Dat... by Marcel Caraciolo 5120 views
- Data mining with Google analytics by Greg Bray 7879 views
- Google Analytics Data Mining with R by Tatvic 4335 views

593 views

Published on

License: CC Attribution License

No Downloads

Total views

593

On SlideShare

0

From Embeds

0

Number of Embeds

1

Shares

0

Downloads

13

Comments

0

Likes

2

No embeds

No notes for slide

- 1. Using R for Predictive Analytics Szil´rd Pafka a Predictive Analytics World / DC useR Group October 20, 2009
- 2. Introduction and agenda Introduction: • physics, ﬁnance, statistical modeling (PhD) • data mining at a credit card processor (Chief Scientist) • co-organizing the Los Angeles area R users group Agenda: • using R for predictive analytics • beneﬁts from R users groups
- 3. R and predictive analytics R: • data manipulation • statistical analysis, exploratory data analysis • computations, simulations • modeling • visualization Predictive analytics (data mining for prediction problems): • understanding the domain problem and related questions • data exploration and cleaning • building predictive models (supervised learning) • model diagnostics, selection and evaluation • deployment
- 4. R vs alternatives R: • open source • packages • latest cutting-edge methods • command line interaction (vs GUI) • ﬂexible • results easier to reproduce • learning curve • books • R-help • cost, multiple platforms alternatives: • spreadsheets • programming languages • data mining software • similar software environments
- 5. Data exploration • data frames • import from various sources (csv ﬁles, database connection) • ﬂexible data manipulation functionalities • subscripting • numerous transformations • text manipulation (e.g. regexp) • data aggregation • specialized packages (reshape, plyr etc.) • graphics • sample R code
- 6. Visualization • powerful tool for data exploration, diagnostics etc. • R graphics: base, lattice, ggplot2 • visual perception, principles of visual design • ggplot2 • based on the grammar of graphics • nice defaults • elements: mappings, geoms, stats, scales, facets • very expressive (sample code)
- 7. Building predictive models • supervised learning: y = f (x1 , x2 , . . . , xp ) • regression (numeric y ), classiﬁcation (categorical y ) • spam detection example: classiﬁcation (2 classes, Y/N) • numerous methods: logistic regression, LDA, nearest neighbors, decision trees, bagging, boosting, random forests, neural networks, support vector machines etc. • example: neural networks with nnet (sample code) • weight decay as regularization • helps convergence and against overﬁtting
- 8. Model evaluation 99% accuracy? • package caret: uniﬁed interface/wrapper for • training models, tuning (complexity parameters) • diagnosing, selecting, evaluating models • model deployment • Conclusion: R is a viable tool for predictive analytics
- 9. R users groups • R users groups around the world, in the US: SF, NY, LA, DC... • Los Angeles: 150+ members, 4 meetings so far • talks, Q&A, discussions, exchange of knowledge • pointing people where to look for (packages, docs etc.) • also more generally for statistical methodology issues • networking opportunities

No public clipboards found for this slide

×
### Save the most important slides with Clipping

Clipping is a handy way to collect and organize the most important slides from a presentation. You can keep your great finds in clipboards organized around topics.

Be the first to comment