The Do's and Don'ts of Data Mining

The do's and don'ts of data mining and predictive modeling from experts in the field.

  1. The Do’s and Don’ts of Data Mining, based on real-world experiences. A publication of …
  2. Data mining has come a long way over the past 300 years…
  3. Over time, data practitioners have had their fair share of
  4. Change the way you do things, or KEEP doing what works!
  5. The Do’s of Data Mining ✓
  6. …According to Scott Terry, President, Rapid Progress Marketing and Modeling, LLC (www.RPMSquared.com)
  7. Do Create a Clearly Defined, Measurable Objective for Every Project
  8. Do Simplify the Solution to Increase Your Chances of Success
  9. …According to Gregory Piatetsky-Shapiro, Editor, www.kdnuggets.com (@kdnuggets)
  10. Do Ask Questions. Understanding the problem and asking the right question are more important than using an advanced algorithm.
  11. …According to Jim Kenyon, Director of IT Services, Optimization Group (www.optimizationgroup.com)
  12. Do Plan for Data to Be Messy. While data is available for mining projects in ever-increasing amounts, it rarely arrives in a tidy, mining-ready format. More typically, it shows up in multiple spreadsheets that vary in format and granularity, and reconciling them frequently requires hours (and hours) of ETL (extract, transform, load) work.
  13. Do use more than one technique or algorithm.
  14. Do cross-check data coming out of the ETL process against the original values, and review it with project stakeholders.
  15. …According to Falk Huettmann, Wildlife Ecologist. His work is explicit in space and time and looks closely at the global effects of the economy.
  16. Do Be Informed. Stay fluent in the latest data mining concepts and approaches, as well as in data mining history.
  17. The Don’ts of Data Mining ✗
  18. …According to Scott Terry, President, Rapid Progress Marketing and Modeling, LLC (www.RPMSquared.com)
  19. Do not ever… I mean EVER… underestimate the power of good data preparation.
  20. Do Not Ascribe Mystical Powers to Algorithms and Wrongly Think “It’s All About the Algorithms”
  21. …According to Dean Abbott, Founder & President, Abbott Analytics/Abbott Consulting (www.abbottanalytics.com, @deanabb)
  22. Don’t Use the Default Model Accuracy Metric
  23. …According to Gregory Piatetsky-Shapiro, Editor, www.kdnuggets.com (@kdnuggets)
  24. Don’t Overfit. With big data, it is easy to find patterns even in random data. Use appropriate tests, such as randomization tests, to avoid finding false patterns in test data that will not hold up later on.
  25. …According to Jim Kenyon, Director of IT Services, Optimization Group (www.optimizationgroup.com)
  26. Do not just collect a pile of data and “toss it into the big data mining engine” to see what comes out. Domain knowledge is an important cross-check on the variables being used, and extraneous data can reduce model accuracy.
  27. Do not underestimate the power of a simpler-to-understand solution that is slightly less accurate. A model a client cannot grasp will not be trusted as much as one that “makes sense.”
  28. …According to Falk Huettmann, Wildlife Ecologist. His work is explicit in space and time and looks closely at the global effects of the economy.
  29. Don’t forget to document all modeling steps and the underlying data.
  30. Do not blindly trust the assumptions made to satisfy frequentist statistics, or p-values and AIC.
  31. Sign up for more fun data mining articles: subscribe to Salford Systems’ blog.
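Slide 12's point about spreadsheets that differ in format and granularity can be illustrated with a small consolidation step. The sketch below is a minimal example under stated assumptions: the two sources, their column names (`date`, `amount`, `month`, `target`) and their values are hypothetical, and it uses only the Python standard library rather than any particular ETL tool. It normalizes dates and currency strings, rolls daily rows up to months, and joins the two sources on a shared month key.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical source 1: daily transactions with ISO dates and currency-formatted amounts.
daily_rows = [
    {"date": "2023-01-05", "amount": "$1,200.50"},
    {"date": "2023-01-19", "amount": "$799.50"},
    {"date": "2023-02-02", "amount": "$2,000.00"},
]

# Hypothetical source 2: monthly targets keyed by "Mon YYYY" labels.
monthly_rows = [
    {"month": "Jan 2023", "target": "1800"},
    {"month": "Feb 2023", "target": "2500"},
]


def parse_amount(text: str) -> float:
    """Strip the currency symbol and thousands separators, then convert."""
    return float(text.replace("$", "").replace(",", ""))


def iso_date_to_month(iso_date: str) -> str:
    """Map '2023-01-05' to the shared key '2023-01'."""
    return datetime.strptime(iso_date, "%Y-%m-%d").strftime("%Y-%m")


def label_to_month(label: str) -> str:
    """Map 'Jan 2023' to '2023-01' (%b assumes English month abbreviations)."""
    return datetime.strptime(label, "%b %Y").strftime("%Y-%m")


def consolidate(daily, monthly):
    # Transform: roll the daily rows up to monthly granularity.
    actuals = defaultdict(float)
    for row in daily:
        actuals[iso_date_to_month(row["date"])] += parse_amount(row["amount"])

    # Join on the shared month key; months missing from `monthly` are dropped.
    merged = []
    for row in monthly:
        key = label_to_month(row["month"])
        merged.append({
            "month": key,
            "actual": round(actuals.get(key, 0.0), 2),
            "target": float(row["target"]),
        })
    return merged


print(consolidate(daily_rows, monthly_rows))
```

Here January's two daily rows sum to an actual of 2000.0 against a target of 1800.0. Months that appear only in the daily data are silently dropped by this join, which is exactly the kind of loss worth catching when cross-checking ETL output.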
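Slide 14 recommends cross-checking ETL output against the original values. One simple way to do that, sketched here as an assumption rather than a prescribed method, is to compare per-key totals between the source rows and the loaded rows and list every key whose totals disagree or that exists on only one side. The field names (`region`, `sales`) and all records are hypothetical.

```python
import math


def reconcile(source_rows, loaded_rows, key, value, tol=1e-6):
    """Compare per-key totals of `value` before and after ETL.

    Returns (key, source_total, loaded_total) for every key whose totals differ
    by more than `tol`, or that is present on only one side (total shown as None).
    """
    def totals(rows):
        out = {}
        for row in rows:
            out[row[key]] = out.get(row[key], 0.0) + float(row[value])
        return out

    source, loaded = totals(source_rows), totals(loaded_rows)
    problems = []
    for k in sorted(set(source) | set(loaded)):
        s, d = source.get(k), loaded.get(k)
        if s is None or d is None or not math.isclose(s, d, rel_tol=0.0, abs_tol=tol):
            problems.append((k, s, d))
    return problems


# Hypothetical raw spreadsheet rows (strings) and the rows the ETL job loaded.
source_rows = [
    {"region": "East", "sales": "100"},
    {"region": "East", "sales": "50"},
    {"region": "West", "sales": "75"},
]
loaded_rows = [
    {"region": "East", "sales": 150.0},
    {"region": "West", "sales": 70.0},
    {"region": "North", "sales": 5.0},
]

for region, before, after in reconcile(source_rows, loaded_rows, "region", "sales"):
    print(f"{region}: source={before} loaded={after}")
```

In this example East reconciles, West lost 5 along the way, and North appeared from nowhere. A short discrepancy list like this is also a concrete artifact to review with project stakeholders.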
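Slide 22 warns against relying on the default model accuracy metric. The sketch below shows one reason, using hypothetical data: when only 5 of 100 customers respond, a "model" that predicts no one will respond scores higher accuracy than a model that actually finds most responders. Precision and recall are used here as one illustrative alternative; the slide does not name a specific replacement metric, and lift, cost-weighted error, or others may suit a given project better.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Of the cases flagged positive, how many really were positive?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the real positives, how many were flagged?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical data: 5 responders among 100 customers.
y_true = [1] * 5 + [0] * 95

# Baseline that predicts "no response" for everyone.
always_no = [0] * 100

# Hypothetical model: flags 10 customers, catching 4 of the 5 responders.
model = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

print("baseline:", evaluate(y_true, always_no))
print("model:   ", evaluate(y_true, model))
```

The baseline reaches 95% accuracy while finding no responders at all; the model's 93% accuracy looks worse, yet it recovers 80% of the responders. Which metric is right depends on the clearly defined objective from slide 7.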
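Slide 24 recommends randomization tests as a guard against patterns that appear by chance. A minimal sketch of one such test, a two-sided permutation test on the difference between two group means, is below. The segment data are hypothetical; the test shuffles the pooled values many times to estimate how often a difference at least as large as the observed one arises when the group labels carry no information.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labeling itself, so the estimate is never 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two customer segments.
segment_a = [10, 11, 12, 13, 14, 15]
segment_b = [0, 1, 2, 3, 4, 5]
print(permutation_test(segment_a, segment_b))
```

A small result suggests the difference is unlikely to be produced by random relabeling; a large one says the "pattern" is about what chance delivers anyway. The same idea extends to models: shuffling the target column and refitting shows what validation score random data alone can reach, which is closer to the big-data overfitting case the slide describes.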
