The Do's and Don'ts of Data Mining
 

Like this? Share it with your network

Share

The Do's and Don'ts of Data Mining

on

  • 1,146 views

The do's and don'ts of data mining and predictive modeling from experts in the field.

The do's and don'ts of data mining and predictive modeling from experts in the field.

Statistics

Views

Total Views
1,146
Views on SlideShare
416
Embed Views
730

Actions

Likes
1
Downloads
26
Comments
0

5 Embeds 730

http://1.salford-systems.com 722
http://feedly.com 5
https://www.linkedin.com 1
http://plus.url.google.com 1
http://www.google.com.my 1

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

The Do's and Don'ts of Data Mining Presentation Transcript

  • 1. The Do’s and Don’ts of DATA MINING Based on real-world experiences Published By A publicatio n of
  • 2. Data mining has come a long way over the past 300 years…
  • 3. Over time, data practitioners have had their fair share of
  • 4. Change the way you do things, or KEEP doing what works!
  • 5. The Do’s of ✓ Data Mining
  • 6. …According to Scott Terry President Rapid Progress Marketing and Modeling, LLC www.RPMSquared.com
  • 7. Do Create a Clearly-Defined, Measurable Objective for Every Project
  • 8. Do Simplify The Solution To Increase Your Chances of Success
  • 9. …According to Gregory Piatetsky-Shapiro Editor www.kdnuggets.com @kdnuggets
  • 10. DO ASK QUESTIONS. Understanding the problem and asking the right question is more important than using an advanced algorithm.
  • 11. …According to Jim Kenyon Director of IT Services Optimization Group www.optimizationgroup.com
  • 12. Do Plan For Data To Be Messy While data is available for mining projects in everincreasing amounts, it is the rare occasion when it will arrive in a tidy, mining-ready format. More typically, it will show up in multiple spreadsheets that vary in format and granularity. These varied formats frequently require hours (and hours) of ETL (Extract, Transform, Load) time.
  • 13. Do use more than 1 technique/algorithm.
  • 14. Do cross-check data coming out of the ETL process with the original values, and with project stakeholders.
  • 15. …According to Falk Huettmann Wildlife Ecologist His work is explicit in space and time, and looks closely at the global effects of the economy.
  • 16. DO BE INFORMED Stay fluent on the latest data mining concepts and approaches, as well as data mining history.
  • 17. The Don’ts of ✗ Data Mining
  • 18. …According to Scott Terry President Rapid Progress Marketing and Modeling, LLC www.RPMSquared.com
  • 19. DO NOT EVER… I MEAN EVER UNDERESTIMATE THE POWER OF GOOD DATA PREPARATION
  • 20. Do Not Ascribe Them Mystical Powers and Wrongly Think “It’s All About the Algorithms”
  • 21. …According to Dean Abbott Founder & President Abbott Analytics/Abbott Consulting www.abbottanalytics.com @deanabb
  • 22. DON’T USE THE DEFAULT MODEL ACCURACY METRIC
  • 23. …According to Gregory Piatetsky-Shapiro Editor www.kdnuggets.com @kdnuggets
  • 24. Don’t Overfit With Big Data, it is easy to find patterns even in random data. Use appropriate tests such as randomization tests to avoid finding false patterns in test data, which will not hold later on.
  • 25. …According to Jim Kenyon Director of IT Services Optimization Group www.optimizationgroup.com
  • 26. Do not just collect a pile of data and “toss it into the big data mining engine” to see what comes out. Domain knowledge is an important cross-check on the variables being used. Extraneous data can reduce model accuracy.
  • 27. Do not underestimate the power of a simpler-tounderstand solution that is slightly less accurate. A model a client cannot grasp is one that will not be trusted as much as one that “makes sense.”
  • 28. …According to Falk Huettmann Wildlife Ecologist His work is explicit in space and time, and looks closely at the global effects of the economy.
  • 29. Don’t forget to document all modeling steps and underlying data
  • 30. Do not blindly trust assumptions made to satisfy frequency statistics, as well as p-values and AIC
  • 31. Sign up for more Fun data mining articles SUBSCRIBE TO SALFORD SYSTEMS’ BLOG