
Statistics for Computational Linguistics Subject at University of Seville

Published in: Education, Technology


  1. 1. Statistics<br />By Carmelo Establier Sánchez<br />
  2. 2. Descriptive Statistics<br />
  3. 3. Descriptive vs. Inferential<br />Discrete data are whole numbers, and are usually a count of objects<br />Measured data are continuous and may take any real value<br />Numerical data are numbers of any kind<br />Categorical data are made of words (e.g. apple, grapes, bananas…)<br />
  4. 4. Means, medians and modes<br />Median:<br />The median is the middle number of a set of numbers arranged in numerical order. When the set has an even number of values, it is the mean of the two middle numbers.<br />Mode:<br />The most frequent value in a set.<br />Mean:<br />The sum of all the values of a set divided by the number of values.<br />
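The three measures above can be checked on a small data set. This is a minimal sketch using Python's standard `statistics` module; the list `values` is a made-up sample, not data from the slides.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

# Mean: sum of all values divided by the number of values.
mean = statistics.mean(values)

# Median: middle value of the sorted data; with an even count,
# the two middle values (3 and 5 here) are averaged.
median = statistics.median(values)

# Mode: the most frequent value.
mode = statistics.mode(values)

print("mean:", mean, "median:", median, "mode:", mode)
```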
  5. 5. Variability<br />Range:<br />The length of the smallest interval which contains all the data.<br />It is calculated by subtracting the smallest observation (sample minimum) from the greatest (sample maximum) and provides an indication of statistical dispersion.<br />
  6. 6. Variability<br />Variance:<br />The variance is a measure of how far items are dispersed about their mean.<br />If a random variable X has the expected value (mean) μ = E[X], then the variance of X is given by:<br />$\operatorname{Var}(X) = E[(X - \mu)^2]$<br />
  7. 7. Variability<br />The standard deviation of a statistical population, a data set, or a probability distribution is the square root of its variance<br />
  8. 8. Variability<br />Relative variability<br />The relative variability of a set is its standard deviation divided by its mean.<br />
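The variability measures from the last four slides can be computed together. The sketch below assumes the data are a complete population, so it uses the population forms `pvariance` and `pstdev` from the standard `statistics` module; for a sample, `variance` and `stdev` would be the usual choice. The list `values` is hypothetical.

```python
import statistics

# Hypothetical population, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: greatest observation minus smallest observation.
data_range = max(values) - min(values)

# Variance: mean of the squared deviations from the mean.
variance = statistics.pvariance(values)

# Standard deviation: square root of the variance.
std_dev = statistics.pstdev(values)

# Relative variability: standard deviation divided by the mean.
relative_variability = std_dev / statistics.mean(values)

print(data_range, variance, std_dev, relative_variability)
```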
  9. 9. Linear transformations<br />A linear transformation of a data set is one where each element is increased by or multiplied by a constant.<br />Addition:<br />If a constant C is added to each member of a set, the mean will be C more than it was before.<br />The standard deviation will not be affected.<br />The range will not be affected either.<br />
  10. 10. Linear transformation<br />Multiplication:<br />If each member of a set is multiplied by a constant C, then:<br />The mean will be C times its value before the constant was applied.<br />The standard deviation and the range will each be |C| times their values before the constant was applied.<br />
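The addition and multiplication rules can be verified numerically. The sketch below uses a hypothetical data set and a negative constant `C`, so the role of the absolute value |C| is visible; `summarize` is a helper introduced here for illustration only.

```python
import statistics

def summarize(data):
    """Return (mean, population standard deviation, range) of data."""
    return statistics.mean(data), statistics.pstdev(data), max(data) - min(data)

# Hypothetical data and constant, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]
C = -3

shifted = [x + C for x in values]  # addition of a constant
scaled = [x * C for x in values]   # multiplication by a constant

base_mean, base_sd, base_range = summarize(values)
shift_mean, shift_sd, shift_range = summarize(shifted)
scale_mean, scale_sd, scale_range = summarize(scaled)

# Addition moves the mean by C and leaves the spread unchanged.
print(shift_mean - base_mean, shift_sd, shift_range)
# Multiplication scales the mean by C and the spread by |C|.
print(scale_mean, scale_sd, scale_range)
```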
  11. 11. Inferential Statistics<br />
  12. 12. Inferential Statistics<br />Inferential Statistics comprises the use of statistics and random sampling to make inferences concerning some unknown aspect of a population. It is distinguished from descriptive statistics.<br />Includes:<br />Estimation<br />Point estimation<br />Interval estimation<br />Prediction<br />Hypothesis testing<br />
  13. 13. Estimation<br />Point estimation:<br />In statistics, point estimation involves the use of sample data to calculate a single value (known as a statistic) which is to serve as a "best guess" for an unknown (fixed or random) population parameter.<br />
  14. 14. Estimation<br />Interval estimation:<br />It is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter, in contrast to point estimation, which is a single number.<br />
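As an illustration of the two kinds of estimation, the sketch below estimates a population mean from a hypothetical sample. The point estimate is the sample mean. The interval is a 95% interval built with a normal approximation, which is an assumption made here for simplicity; with a sample this small, a t-based interval would be wider.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration.
sample = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 5.2]
n = len(sample)

# Point estimation: a single "best guess" for the population mean.
point_estimate = statistics.mean(sample)

# Interval estimation: a range of plausible values around that guess.
standard_error = statistics.stdev(sample) / math.sqrt(n)
z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% critical value
interval = (point_estimate - z * standard_error,
            point_estimate + z * standard_error)

print("point estimate:", point_estimate)
print("95% interval:", interval)
```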
  15. 15. Hypothesis testing<br />Whilst all pieces of quantitative research have some dilemma, issue or problem that they are trying to investigate, the focus in hypothesis testing is to find ways to structure these in such a way that we can test them effectively. Typically, it is important to:<br />1. Define the research hypothesis and set the parameters for the study.<br />2. Set out the null and alternative hypotheses (or more than one pair, if the study tests a number of hypotheses).<br />3. Explain how you are going to measure what you are studying, and set out the variables to be studied.<br />4. Set the significance level.<br />5. Make a one- or two-tailed prediction.<br />6. Determine whether the distribution that you are studying is normal (this has implications for the types of statistical tests that you can run on your data).<br />7. Select an appropriate statistical test based on the variables you have defined and whether the distribution is normal or not.<br />8. Run the statistical tests on your data and interpret the output.<br />9. Reject or fail to reject the null hypothesis.<br />
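Several of the steps above can be sketched with a two-tailed one-sample z-test. Everything specific here is an assumption made for illustration: the sample, the hypothesized mean `mu_0`, the known population standard deviation `sigma`, and the premise that the data are normally distributed (step 6 is taken as given, not checked).

```python
import math
import statistics

# Hypothetical sample of measurements (step 3).
sample = [108, 112, 95, 120, 104, 117, 99, 125, 110]

# Step 2: null hypothesis "population mean equals mu_0",
# alternative "population mean differs from mu_0".
mu_0 = 100
sigma = 15  # assumed known population standard deviation

# Step 4: significance level. Step 5: two-tailed prediction.
alpha = 0.05

# Steps 7-8: z statistic and two-tailed p-value.
n = len(sample)
z = (statistics.mean(sample) - mu_0) / (sigma / math.sqrt(n))
p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))

# Step 9: reject the null hypothesis only if p falls below alpha.
reject_null = p_value < alpha

print("z =", z, "p =", p_value, "reject H0:", reject_null)
```

When the population standard deviation is not known, a t-test is the more usual choice for a sample of this size.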
  16. 16. Prediction<br />Prediction or Predictive Inference:<br />It is an interpretation of probability that emphasizes the prediction of future observations based on past observations.<br />
  17. 17. Regression <br />
  18. 18. Regression<br />Linear regression refers to any approach to modeling the relationship between a scalar variable y and one or more variables denoted X, such that the model depends linearly on the unknown parameters to be estimated from the data. Such a model is called a "linear model".<br />Most commonly, linear regression refers to a model in which the conditional mean of y given the value of X is an affine function of X. Less commonly, it refers to a model in which the median, or some other quantile of the conditional distribution of y given X, is expressed as a linear function of X.<br />Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of y given X, rather than on the joint probability distribution of y and X, which is the domain of multivariate analysis.<br />
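For the most common case, where the conditional mean of y is an affine function of a single variable x, the parameters can be estimated by ordinary least squares. The sketch below uses the closed-form formulas for the slope and intercept; the data points and the helper `predict` are hypothetical.

```python
import statistics

# Hypothetical paired observations, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

x_mean = statistics.mean(xs)
y_mean = statistics.mean(ys)

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations.
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))

# The fitted line passes through the point (x_mean, y_mean).
intercept = y_mean - slope * x_mean

def predict(x):
    """Estimated conditional mean of y at x: an affine function of x."""
    return intercept + slope * x

print("slope:", slope, "intercept:", intercept, "prediction at x=6:", predict(6))
```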