Data Mining Example: ‘Tips’ dataset. Irish Centre for High End Computing. Dr. Eoin Brazil, Technology Transfer.
Outline of Presentation. Tips in a US restaurant for 2 1/2 months. 244 cases, 8 variables: Obs num, totbill, tip, sex, smoker, day (Thur-Sun), time, size of party. Source: Bryant, P. G. and Smith, M. A. (1995), Practical Data Analysis: Case Studies in Business Statistics. Irish Centre for High End Computing (ICHEC) - Data Mining Example
Exploring the ‘tips’ dataset. Q: What are the factors that affect tipping behaviour? Data restructuring: calculate tiprate = tip/totbill. Avenues to explore: regression modeling.
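The restructuring step and the regression avenue can be sketched in a few lines of Python. This is a minimal illustration only: the four (totbill, tip) pairs are made-up values, not rows from the dataset, and `tip_rates` and `fit_line` are hypothetical helper names. The fit is plain ordinary least squares of tip on totbill, which is one possible reading of "regression modeling", not necessarily the model used in the talk.

```python
# Hypothetical (totbill, tip) values in dollars; NOT taken from the real dataset.
bills = [10.0, 20.0, 30.0, 40.0]
tips = [2.0, 3.0, 5.0, 6.0]


def tip_rates(bills, tips):
    """Derived variable from the slide: tiprate = tip / totbill."""
    return [tip / bill for bill, tip in zip(bills, tips)]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


rates = tip_rates(bills, tips)
slope, intercept = fit_line(bills, tips)
print("tip rates:", [round(r, 3) for r in rates])
print(f"tip = {intercept:.2f} + {slope:.2f} * totbill")
```

The slope reads as the extra tip per extra dollar on the bill, while tiprate expresses each tip as a fraction of its own bill.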
Large bins show global features: tips fall off quickly, so perhaps this is not a very expensive restaurant.
Smaller bins show local features: tips tend to be rounded to the nearest 50 cents or dollar.
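The effect of bin width can be sketched as follows. The tip amounts are invented for illustration and stored in whole cents so that binning uses exact integer arithmetic; `bin_counts` is a hypothetical helper, and the 100-cent and 10-cent widths are assumptions, since the slides do not state the widths they used.

```python
from collections import Counter

# Hypothetical tips in cents; NOT taken from the real dataset.
tips_cents = [100, 150, 200, 200, 250, 300, 300, 327, 350, 500]


def bin_counts(values, width):
    """Count values per bin, keyed by the lower edge of each bin."""
    return Counter((value // width) * width for value in values)


# Wide bins (1 dollar) give the overall shape of the distribution.
coarse = bin_counts(tips_cents, 100)
# Narrow bins (10 cents) expose spikes at round amounts.
fine = bin_counts(tips_cents, 10)

# Share of tips that land exactly on a multiple of 50 cents.
rounded_share = sum(1 for t in tips_cents if t % 50 == 0) / len(tips_cents)

for edge in sorted(coarse):
    print(f"${edge / 100:.2f}+ {'#' * coarse[edge]}")
print("share on a 50-cent multiple:", rounded_share)
```

With the wide bins, the 3.27 tip is absorbed into the 3.00 bin; with the narrow bins it shows up on its own next to the spikes at round amounts.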
Looking at the correlation between variables (r) and checking to see who gives more or less than the ‘average’ tip of 18% of the bill.
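A rough sketch of both checks, assuming r is the Pearson correlation coefficient between totbill and tip, and treating the 18% figure from the slide as a fixed reference rate. The data values are invented for illustration and `pearson_r` is a hypothetical helper name.

```python
import math

# Hypothetical (totbill, tip) values in dollars; NOT taken from the real dataset.
bills = [10.0, 20.0, 30.0, 40.0]
tips = [2.0, 3.0, 5.0, 6.0]

REFERENCE_RATE = 0.18  # the 'average' tip rate quoted on the slide


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(bills, tips)
# Flag each party as tipping above (True) or at/below (False) the reference rate.
above = [tip / bill > REFERENCE_RATE for bill, tip in zip(bills, tips)]

print(f"r = {r:.3f}")
print("above 18%:", above)
```

A high r says that tips rise with the bill; the above/below flags show which individual parties depart from the reference rate.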
‘Conditioning’ or ‘drilling down’ to explore more complex relationships within the data.
‘Conditioning’ or ‘drilling down’ to explore more complex relationships within the data: smokers round their tip more.
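Conditioning amounts to splitting the cases by one or more categorical variables and recomputing a summary within each group. The sketch below does this for the share of tips that fall on a 50-cent multiple. The records are invented purely to show the mechanics and are not evidence for the smoker finding above; `rounded_share` and the `tip_cents` field are hypothetical names.

```python
from collections import defaultdict

# Hypothetical records; NOT taken from the real dataset.
records = [
    {"sex": "F", "smoker": "Yes", "tip_cents": 300},
    {"sex": "F", "smoker": "Yes", "tip_cents": 200},
    {"sex": "M", "smoker": "Yes", "tip_cents": 250},
    {"sex": "M", "smoker": "Yes", "tip_cents": 312},
    {"sex": "F", "smoker": "No", "tip_cents": 171},
    {"sex": "F", "smoker": "No", "tip_cents": 200},
    {"sex": "M", "smoker": "No", "tip_cents": 348},
    {"sex": "M", "smoker": "No", "tip_cents": 423},
]


def rounded_share(records, keys, step=50):
    """Share of tips on a multiple of `step` cents, per combination of `keys`."""
    totals = defaultdict(int)
    rounded = defaultdict(int)
    for rec in records:
        group = tuple(rec[key] for key in keys)
        totals[group] += 1
        if rec["tip_cents"] % step == 0:
            rounded[group] += 1
    return {group: rounded[group] / totals[group] for group in totals}


by_smoker = rounded_share(records, ["smoker"])
by_sex_and_smoker = rounded_share(records, ["sex", "smoker"])
print(by_smoker)
print(by_sex_and_smoker)
```

Passing more keys drills down further, at the cost of fewer cases per group, so small groups should be read with caution.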
Beyond ‘toy’ datasets. More complex methods: Classification (Supervised/Unsupervised), Machine Learning, Statistical techniques, Visualisation.
Acknowledgements: Supported by Science Foundation Ireland under grant 08/HEC/I1450 and by HEA’s PRTLI-C4.

An example of discovering simple patterns using basic data mining
