Pentaho Meeting 2008 - Statistics & BI

MAINZ 2008 STATISTICIANS AND BI

Our work Someone said tortura i dati abbastanza a lungo e confesseranno qualsiasi cosa ( Anonimo )

Our proposal ,[object Object],[object Object],[object Object],[object Object],The knowledge process has to be well defined and well thought since inception. To have an optimal technical instrument available is not enough! “ statistician” surplus

[object Object],[object Object],[object Object],Our proposal 1. The Data Quality

[object Object],[object Object],[object Object],[object Object],[object Object],Our proposal 1. The Data Quality Cluster analysis Regression analysis Rasch Analysis PCA/CatPCA

[object Object],[object Object],[object Object],[object Object],Our proposal 1. The Data Quality Missing values analysis Listwise deletion vs data imputation Outliers Detection b. Data Validation:

[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Our proposal 2. Statistics Use/Abuse Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write (H.G. Wells)

[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Our proposal 2. Statistics Use/Abuse Different treatment Plots and Graphics, descriptive statistics, inferential models

[object Object],[object Object],[object Object],[object Object],[object Object],Our proposal 2. Statistics Use/Abuse Linear regression vs non linear High correlation? Structural Equation Models? Multicollinearity?

[object Object],[object Object],[object Object],[object Object],Our proposal 2. Statistics Use/Abuse High correlation?

[object Object],[object Object],[object Object],Our proposal 2. Statistics Use/Abuse linear regressione vs non linear (probit, logit,…)

[object Object],[object Object],[object Object],[object Object],[object Object],Our proposal 2. Statistics Use/Abuse Structural Equation Models?

[object Object],[object Object],[object Object],[object Object],Our proposal 2. Statistics Use/Abuse Multicollinearity?

[object Object],[object Object],[object Object],[object Object],Our proposal 2. Statistics Use/Abuse Interval estimation Vs point estimation

[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Our proposal 2. Statistics Use/Abuse GLS regression Mixed Effects Models

[object Object],[object Object],[object Object],Our proposal 2. Statistics Use/Abuse Monte Carlo simulation

[object Object],[object Object],[object Object],Our proposal 3. Interpretation

Conclusions Knowledge discovery rests on the three balanced legs of computer science, statistics and client knowledge. It will not stand either on one leg or on two legs, or even on three unbalanced legs. Successful knowledge discovery needs a substantial to collaboration from all three. DATA MINING = STATISTICS + CLIENT DOMAIN

Conclusions There is the opportunity for an immensely rewarding synergy between data miners and statisticians. However, most data miners tend to be ignorant of statistics and client's domain; statisticians tend to be ignorant of data mining and client's domain; and clients tend to be ignorant of data mining and statistics.

Pentaho Meeting 2008 - Statistics & BI

Recommended

Recommended

More Related Content

What's hot

What's hot (19)

Viewers also liked

Viewers also liked (7)

Similar to Pentaho Meeting 2008 - Statistics & BI

Similar to Pentaho Meeting 2008 - Statistics & BI (20)

Recently uploaded

Recently uploaded (20)

Pentaho Meeting 2008 - Statistics & BI