This deck was used in the IDA facilitation of the Johns Hopkins Data Science Specialization course for Statistical Inference. It covers the topics in week 3 (hypothesis testing and t-tests).
The data and R script for the lab session can be found here: https://github.com/eugeneyan/Statistical-Inference
This presentation will address the issue of sample size determination for the social sciences. A simple example is provided to help everyone understand and explain sample size determination.
The ppt introduces the basic concepts of estimation: point and interval. Properties of a good estimator are also covered. Confidence intervals for a single mean, the difference between two means, a proportion, and the difference between two proportions for different sample sizes are included, along with case studies.
Correlation & Regression Analysis using SPSS (Parag Shah)
Concepts of correlation, simple linear regression, and multiple linear regression, and their analysis using SPSS, including how to check the validity of the assumptions in regression.
Standard Error & Confidence Intervals
Let's delve into the concept of **standard error**.
## What Is Standard Error?
The **standard error (SE)** is a statistical measure that quantifies the **variability** of a sample statistic (such as the mean) around the corresponding population parameter. Specifically, it estimates how much the sample mean would **vary** if we were to repeat the study using **new samples** from the same population. Here are the key points:
1. **Purpose**: Standard error helps us understand how well our **sample data** represents the entire population. Even with **probability sampling**, where elements are randomly selected, some **sampling error** remains. Calculating the standard error allows us to estimate the representativeness of our sample and draw valid conclusions.
2. **High vs. Low Standard Error**:
- **High Standard Error**: Indicates that sample means are **widely spread** around the population mean. In other words, the sample may not closely represent the population.
- **Low Standard Error**: Suggests that sample means are **closely distributed** around the population mean, indicating that the sample is representative of the population.
3. **Decreasing Standard Error**:
- To decrease the standard error, **increase the sample size**. Using a large, random sample minimizes **sampling bias** and provides a more accurate estimate of the population parameter.
## Standard Error vs. Standard Deviation
- **Standard Deviation (SD)**: Describes variability **within a single sample**. It can be calculated directly from sample data.
- **Standard Error (SE)**: Estimates variability across **multiple samples** from the same population. It is an **inferential statistic** that can only be estimated (unless the true population parameter is known).
### Example:
Suppose we have a random sample of 200 students, and we calculate the mean math SAT score to be 550. In this case:
- **Sample**: The 200 students
- **Population**: All test takers in the region
The standard error helps us understand how well this sample represents the entire population's math SAT scores.
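As a sketch of the calculation, the usual formula for the standard error of a sample mean is SE = s / √n, where s is the sample standard deviation and n is the sample size. The scores below are made-up numbers for illustration (a small sample rather than the 200 students in the example):

```python
import math
import statistics

# Hypothetical sample of 10 math SAT scores (illustrative numbers only)
scores = [480, 520, 550, 600, 470, 510, 590, 560, 530, 540]

n = len(scores)
s = statistics.stdev(scores)   # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)          # standard error of the mean: SE = s / sqrt(n)

print(round(se, 2))
```

Because SE shrinks with √n, quadrupling the sample size halves the standard error, which is why larger samples give more precise estimates.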
Remember, the standard error is crucial for making valid statistical inferences. By understanding it, researchers can confidently draw conclusions based on sample data.
---
Source: Conversation with Copilot, 5/31/2024
(1) What Is Standard Error? | How to Calculate (Guide with Examples) - Scribbr. https://www.scribbr.com/statistics/standard-error/.
(2) Standard Error (SE) Definition: Standard Deviation in ... - Investopedia. https://www.investopedia.com/terms/s/standard-error.asp.
(3) Standard error Definition & Meaning - Merriam-Webster. https://www.merriam-webster.com/dictionary/standard%20error.
Confidence Interval Module
One of the key concepts of statistics enabling statisticians to make incredibly accurate predictions is called the Central Limit Theorem. The Central Limit Theorem is defined in this way:
· For samples of a sufficiently large size, the real distribution of means is almost always approximately normal.
· The distribution of means gets closer and closer to normal as the sample size gets larger and larger, regardless of what the original variable looks like (positively or negatively skewed).
· In other words, the original variable does not have to be normally distributed.
· This is because, if we, as eccentric researchers, drew an almost infinite number of random samples from a single population (such as the student body of NMSU), the means calculated from those many samples would be normally distributed, and the mean calculated from all of those samples would be a very close approximation to the true population mean. It is this very characteristic that makes it possible for us, using sound probability-based sampling techniques, to make highly accurate statements about the characteristics of a population based upon statistics calculated on a sample drawn from that population.
· Furthermore, we can calculate a statistic known as the standard error of the mean (abbreviated s.e.) that describes the variability of the distribution of all possible sample means in the same way that we used the standard deviation to describe the variability of a single sample. We will use the standard error of the mean (s.e.) to calculate the statistic that is the topic of this module, the confidence interval.
The formula that we use to calculate the standard error of the mean is:
s.e. = s / √(N − 1)
where s = the standard deviation calculated from the sample; and
N = the sample size.
So the formula tells us that the standard error of the mean is equal to the standard deviation divided by the square root of (the sample size minus 1).
This is the preferred formula for practicing professionals as it accounts for errors that may be a function of the particular sample we have selected.
THE CONFIDENCE INTERVAL (CI)
The formula for the CI is a function of the sample size (N).
For sample sizes ≥ 100, the formula for the CI is:
CI = (the sample mean) ± Z(s.e.)
Let’s look at an example to see how this formula works.
* Please use the pdf doc “how to solve the problem” that I have provided for you under the “notes” link.
Example 1
Suppose that we conducted interviews with 140 randomly selected individuals (N = 140) in a large metropolitan area. We assured these individuals that their answers would remain confidential, and we asked them about their law-breaking behavior. Among other questions the individuals were asked to self-report the number of times per month they exceeded the speed limit. One of the objectives of the study was to estimate (make an inference about) the average number of times per month that individuals exceeded the speed limit.
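The module's formulas can be sketched in code. The sample size N = 140 comes from the example, but the mean and standard deviation below are hypothetical numbers for illustration (the text truncates before giving them); Z = 1.96 is the standard value for a 95% confidence interval:

```python
import math

N = 140      # sample size from Example 1
mean = 8.2   # hypothetical sample mean (times per month exceeding the limit)
s = 3.4      # hypothetical sample standard deviation
z = 1.96     # Z value for a 95% confidence interval

# Module's formula: s.e. = s / sqrt(N - 1)
se = s / math.sqrt(N - 1)

# CI = sample mean +/- Z * s.e.
lower = mean - z * se
upper = mean + z * se

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The interval is interpreted as: we are 95% confident that the true population mean lies between the lower and upper bounds.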
What is statistical analysis? It's the science of collecting, exploring and presenting large amounts of data to discover underlying patterns and trends. Statistics are applied every day – in research, industry and government – to become more scientific about decisions that need to be made.
In the previous lesson we discussed a measure of location known as the measure of central tendency. There are other measures of location which are useful in describing the distribution of the data set. These measures of location include the maximum, minimum, percentiles, deciles and quartiles. How to compute and interpret these measures are also discussed in this lesson.
Statistical inference: Statistical Power, ANOVA, and Post Hoc tests (Eugene Yan Ziyou)
This deck was used in the IDA facilitation of the Johns Hopkins Data Science Specialization course for Statistical Inference. It covers the topics in week 4 (statistical power, ANOVA, and post hoc tests).
The data and R script for the lab session can be found here: https://github.com/eugeneyan/Statistical-Inference
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Chapter 7: Estimating Parameters and Determining Sample Sizes
7.1: Estimating a Population Proportion
SAMPLING MEAN:
DEFINITION:
The term sampling mean is a statistical term used to describe the properties of statistical distributions. In statistical terms, the sample mean from a group of observations is an estimate of the population mean. Given a sample of size n, consider n independent random variables X1, X2, ..., Xn, each corresponding to one randomly selected observation. Each of these variables has the distribution of the population, with mean μ and standard deviation σ. The sample mean is defined to be x̄ = (X1 + X2 + ... + Xn) / n.
WHAT IT IS USED FOR:
It is used to measure the central tendency of the numbers in a data set. It can also be described as a balance point between the high numbers and the low numbers.
HOW TO CALCULATE IT:
To calculate this, just add up all the numbers, then divide by how many numbers there are.
Example: what is the mean of 2, 7, and 9?
Add the numbers: 2 + 7 + 9 = 18
Divide by how many numbers (i.e., we added 3 numbers): 18 ÷ 3 = 6
So the Mean is 6
SAMPLE VARIANCE:
DEFINITION:
The sample variance, s2, is used to calculate how varied a sample is. A sample is a select number of items taken from a population. For example, if you are measuring American people’s weights, it wouldn’t be feasible (from either a time or a monetary standpoint) for you to measure the weights of every person in the population. The solution is to take a sample of the population, say 1000 people, and use that sample size to estimate the actual weights of the whole population.
WHAT IT IS USED FOR:
The sample variance helps you to figure out the spread in the data you have collected or are going to analyze. In statistical terminology, it can be defined as the average of the squared differences from the mean.
HOW TO CALCULATE IT:
Given below are steps of how a sample variance is calculated:
· Determine the mean
· Then for each number: subtract the Mean and square the result
· Then work out the mean of those squared differences.
To work out the mean, add up all the values then divide by the number of data points.
First add up all the values from the previous step.
But how do we say "add them all up" in mathematics? We use the Greek capital letter sigma: Σ
The handy Sigma Notation says to sum up as many terms as we want.
· Next we need to divide by the number of data points, which is simply done by multiplying by "1/N":
Statistically it can be stated by the following:
· s² = (1/N) Σ (xᵢ − x̄)²
· This value is the variance
EXAMPLE:
Sam has 20 Rose Bushes.
The number of flowers on each b.
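The steps above can be sketched as follows, reusing the numbers from the earlier mean example. Note that this follows the text's step of multiplying by 1/N; many texts instead divide by N − 1 (Bessel's correction) when computing a sample variance:

```python
data = [2, 7, 9]  # numbers from the mean example, reused for illustration

# Step 1: determine the mean
mean = sum(data) / len(data)                # (2 + 7 + 9) / 3 = 6

# Step 2: for each number, subtract the mean and square the result
sq_diffs = [(x - mean) ** 2 for x in data]  # [16, 1, 9]

# Step 3: work out the mean of those squared differences (multiply by 1/N)
variance = sum(sq_diffs) / len(data)        # 26 / 3 ≈ 8.67

print(round(variance, 2))
```

With the N − 1 denominator instead, the result would be 26 / 2 = 13; which denominator is appropriate depends on whether the data are treated as the whole population or as a sample.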
Statistical inference: Probability and Distribution (Eugene Yan Ziyou)
This deck was used in the IDA facilitation of the Johns Hopkins Data Science Specialization course for Statistical Inference. It covers the topics in week 1 (probability) and week 2 (distribution).
A Study on the Relationship between Education and Income in the US (Eugene Yan Ziyou)
What is the relationship between education and income? Is education truly the great equalizer or do factors such as gender and family income at the age of 16 affect current income?
As part of the Coursera Data Analysis and Statistical Inference course, these issues were examined using data from the US General Social Survey in R.
Diving into Twitter data on consumer electronic brands (Eugene Yan Ziyou)
Which consumer electronic brands get tweeted about most? Which brands have more positive/negative sentiment? To find out, 15.3 gb of tweets was downloaded from 13 - 25 May using Python and then analysed in R.
2. Central Limit Theorem
What is the mean height (μ) of all primary school children in Singapore?
[Diagram: Population = all primary school children in SG, with samples drawn
from Anderson Primary, Damai Primary, Red Swastika Primary, and Zhenghua
Primary]
x̄(Anderson Primary) = mean height of 100 children from Anderson Primary
x̄(Damai Primary) = mean height of 100 children from Damai Primary
x̄(Red Swastika) = mean height of 100 children from Red Swastika Primary
x̄(Zhenghua Primary) = mean height of 100 children from Zhenghua Primary
Distribution of mean height ~ N(mean = μ, standard error = σ/√100)
From the sampling distribution:
− Mean(x̄) ≈ μ
− SD(x̄) < σ: as sample size increases, the SD of the sampling distribution
decreases
3. Central Limit Theorem (CLT)
The distribution of sample statistics (e.g., the mean) is approximately
normal, regardless of the underlying distribution, with mean = μ and
variance = σ²/n:
x̄ ~ N(mean = μ, standard error = σ/√n)
Further experimentation: http://bitly.com/clt_mean
From the applet: the sampling distribution is normal, its mean equals the
population mean, and its sd equals the population sd divided by the square
root of the sample size.
Applet source: Mine Çetinkaya-Rundel, Duke University
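The behaviour on this slide is easy to verify by simulation. Below is a minimal sketch (not part of the deck; the lab itself uses R) that samples from a skewed exponential population and checks that the sample means are centred at μ with spread σ/√n:

```python
# Simulate the CLT: draw many samples from a skewed population and
# inspect the distribution of their means. Illustrative sketch only.
import random
import statistics

random.seed(42)
n, n_samples = 100, 5000  # sample size, number of repeated samples
# Exponential population with rate 1: mean = 1, sd = 1 (heavily skewed)
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

# CLT prediction: Mean(x̄) ≈ μ = 1, SD(x̄) ≈ σ/√n = 1/√100 = 0.1
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

Despite the skewed population, a histogram of `sample_means` would look approximately normal, matching the applet's demonstration.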
4. Conditions for CLT
Independence: Sampled observations must be independent:
− Random sample/assignment
− If sampling without replacement, n < 10% of population
Sample Size/Skew:
− Population should be normal
− If not, sample size should be large (rule of thumb: n > 30)
5. Confidence Interval
An interval estimate of a population parameter
− Computed as the sample mean +/- a margin of error:
x̄ ± z × SE, where SE = s/√n
− A 95% confidence interval is x̄ ± 1.96 × s/√n (roughly x̄ ± 2SE); 95% of
intervals constructed this way will contain the true population mean
CLT: x̄ ~ N(mean = μ, standard error = σ/√n)
6. Confidence Interval
You have taken a random sample of 100 primary school children in
Singapore. Their heights had mean = 150cm and sd = 10cm.
Estimate the true average height of primary school children based on
this sample using a 95% confidence interval.
We are 95% confident that the mean height of primary school children is
between 148.04cm and 151.96cm
Confidence Interval: x̄ ± z × SE
n = 100, x̄ = 150, sd = 10
SE = sd/√n = 10/√100 = 1
x̄ ± z × SE = 150 ± 1.96 × 1 = 150 ± 1.96 = (148.04, 151.96)
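The slide's arithmetic can be reproduced in a few lines. A sketch in Python (the lab session uses R; this is purely illustrative):

```python
# 95% confidence interval for the mean height example on this slide
import math

n, xbar, sd = 100, 150, 10
z = 1.96                     # critical value for a 95% confidence level
se = sd / math.sqrt(n)       # standard error = 10 / 10 = 1
lower, upper = xbar - z * se, xbar + z * se
print(round(lower, 2), round(upper, 2))  # 148.04 151.96
```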
7. Required sample size for margin of error
Given a target margin of error and confidence level, and information
on the standard deviation of sample (or population), we can work
backwards to determine the required sample size.
Previous measurements of primary school children heights show sd =
15cm. What should be the sample size in order to get a 95%
confidence interval with a margin of error less than or equal 1cm?
Margin of error: ≤ 1cm
Confidence level: 95%, so z = 1.96
sd = 15
ME = z × SE
1 = 1.96 × 15/√n
n = ((1.96 × 15) / 1)² = (29.4)² = 864.36
Thus, we need a sample size of at least 865 primary school children
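Working backwards from the margin of error takes one rearrangement of ME = z × sd/√n. An illustrative Python sketch:

```python
# Required sample size for ME <= 1cm at 95% confidence (sd = 15)
import math

sd, me, z = 15, 1, 1.96
n_required = (z * sd / me) ** 2   # n = (z * sd / ME)^2
print(round(n_required, 2))       # 864.36
print(math.ceil(n_required))      # always round up: 865
```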
8. Hypothesis Testing
Null hypothesis (H0)
− The status quo that is assumed to be true
Alternative hypothesis (Ha)
− An alternative claim under consideration; accepting it requires statistical
evidence strong enough to reject the null hypothesis
We consider H0 to be true and accept it unless the evidence in favour of Ha
is so strong that we reject H0 in favour of Ha.
9. Hypothesis Testing
Earlier, we found the sample of 100 primary school children had
mean height = 150cm and sd = 10cm. Based on this statistic,
does the data support the hypothesis that primary school children
on average are shorter than 151cm?
H0: μ = 151 #primary school students have mean height = 151
Ha: μ < 151 #primary school students have mean height < 151
10. P-value
Probability of obtaining the observed result or results that are
more “extreme”, given that the null hypothesis is true
− P(observed or more extreme outcome | 𝐻0 is true)
− If the p-value is low (i.e., lower than the significance level (α), usually 5%),
observing the data would be very unlikely if the null hypothesis were true,
so we reject H0
− If the p-value is high (i.e., higher than α), the data is likely to be
observed even if the null hypothesis were true, so we do not reject H0
11. Hypothesis Testing and P-value
Recall that the sample of 100 primary school children had mean
height = 150cm and sd = 10cm. Also take sig. level = 0.05
x̄ = 150cm; sd = 10cm; SE = 10/√100 = 1 #what we know from the sample
X̄ ~ N(μ = 151, SE = 1) #null hypothesis of the population
Test Statistic:
Z = (150 − 151) / 1 = −1
P-value:
P(Z < −1) = 1 − 0.8413 = 0.1587
Since the p-value is higher than 0.05, we do not reject H0
[Figure: normal curve centred at μ = 151, with the area below x̄ = 150
(0.1587) shaded]
12. Hypothesis Testing and P-value
Interpreting p-value
− If, in fact, primary school children have a mean height of 151cm, there is a
15.9% chance that a random sample of 100 children would yield a sample
mean of 150cm or lower
− This is a pretty high probability
− Thus, the sample mean of 150 could likely have occurred by chance
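The one-sided test above can be checked with Python's standard-library `statistics.NormalDist` (an illustrative sketch; the lab uses R, where `pnorm` plays the same role):

```python
# One-sided z-test: H0: mu = 151 vs Ha: mu < 151
from statistics import NormalDist

xbar, mu0, sd, n = 150, 151, 10, 100
se = sd / n ** 0.5              # standard error = 1
z = (xbar - mu0) / se           # test statistic = -1
p_value = NormalDist().cdf(z)   # P(Z < -1)
print(round(p_value, 4))        # 0.1587
print(p_value < 0.05)           # False: do not reject H0
```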
13. Two-sided Hypothesis Testing
Does the data support the hypothesis that the children have a mean height
different from 151cm?
𝐻0: μ = 151 #primary school students have mean height = 151
𝐻 𝑎: 𝜇 ≠ 151 #primary school students have mean height ≠ 151
P-value:
P(Z < −1) + P(Z > 1) = 2 × (1 − 0.8413) = 0.3174
[Figure: normal curve centred at μ = 151, with both tails shaded: 0.1587
below 150 and 0.1587 above 152]
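For the two-sided test, the p-value simply doubles the one-tail area. A short illustrative sketch (note that 2 × 0.158655 = 0.3173; the slide's 0.3174 comes from doubling the rounded table value 0.1587):

```python
# Two-sided p-value: P(Z < -1) + P(Z > 1) = 2 * P(Z < -|z|)
from statistics import NormalDist

z = (150 - 151) / 1                          # same test statistic as before
p_two_sided = 2 * NormalDist().cdf(-abs(z))  # both tails
print(round(p_two_sided, 4))                 # 0.3173
```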
14. Hypothesis Testing and Confidence Intervals
If the confidence interval contains the null value, don’t reject 𝐻0. If
the confidence interval does not contain the null value, reject 𝐻0.
− Previously, we found the 95% confidence interval for the heights of primary
school children to be (148, 152). Since our null value (μ0 = 151cm) falls
within this 95% CI, we do not reject H0.
A two-sided hypothesis with significance level 𝛼 is equivalent to a confidence
interval with 𝐶𝐿 = 1 − 𝛼
A one-sided hypothesis with a significance level 𝛼 is equivalent to a
confidence interval with 𝐶𝐿 = 1 − 2𝛼
[Figure: interval from 148 cm to 152 cm; we are 95% confident that the
average height is between 148 and 152 cm]
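The duality on this slide can be checked numerically: the null value sits inside the 95% CI exactly when the two-sided test at α = 0.05 fails to reject. An illustrative sketch:

```python
# Agreement between the 95% CI and the two-sided test at alpha = 0.05
from statistics import NormalDist

xbar, sd, n, mu0, alpha = 150, 10, 100, 151, 0.05
se = sd / n ** 0.5
z_crit = 1.96
inside_ci = (xbar - z_crit * se) < mu0 < (xbar + z_crit * se)
p_two_sided = 2 * NormalDist().cdf(-abs((xbar - mu0) / se))
print(inside_ci, p_two_sided > alpha)  # True True: do not reject H0
```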
15. Decision Errors
Which error is worse to commit (in a research/business context)?
− Type II: Declaring the defendant innocent when they are actually guilty
− Type I: Declaring the defendant guilty when they are actually innocent
“Better that ten guilty persons escape than that one innocent suffer”
- William Blackstone
             Fail to reject H0    Reject H0
H0 is True   correct decision     Type I error
H0 is False  Type II error        correct decision
16. Type I Error rate
We reject 𝐻0 when the p-value is less than 0.05 (𝛼=0.05)
− I.e., Should 𝐻0 actually be true, we do not want to incorrectly reject it
more than 5% of the time
− Thus, using a 0.05 significance level is equivalent to having a 5% chance
of making a Type I error
Choosing significance levels
− If Type I Error is costly, we choose a lower significance level (e.g., 0.01)
− E.g., spam filtering
− If Type II Error is costly, we choose a higher significance level (e.g., 0.10)
− E.g., airport baggage screening
             Fail to reject H0    Reject H0
H0 is True   correct decision     Type I error (α)
H0 is False  Type II error (β)    correct decision
17. Student’s t Distribution
According to CLT, the distribution of sample statistics is
approximately normal, if:
− Population is normal, or
− Sample size is large (n > 30)
If so, we can use the population sd (𝜎) to compute a z-score
However, sample sizes are sometimes small and we often do not
know the standard deviation of the population (𝜎)
− Thus, the normal distribution may not be appropriate
Thus, we rely on the t distribution
18. Shape of the t distribution
Bell shaped but with thicker tails than the normal
− Thus, observations are more likely to fall beyond 2sd from the mean
− The thicker tails adjust for the extra uncertainty in estimating the
standard deviation (when n is small and/or σ is unknown)
19. Shape of the t distribution
Has one parameter, degrees of freedom (df), which determines
the thickness of the tails
− df refers to the number of independent observations in the data set
− For a one-sample t statistic, df = sample size minus 1
− E.g., in a sample of size 8, there are (8 − 1) = 7 degrees of freedom
What happens to the shape of the
t distribution when df increases?
− It approaches the normal distribution
20. When to use the t distribution
In general, we use the t distribution when:
− n is small (n < 30) and/or
− σ is unknown
However, nowadays, our sample sizes are usually above 30
− Thus, why bother with the t distribution?
− Because 95% of the world prefers the t distribution to the normal and
you’ll definitely encounter it eventually
− If you’re unsure, use the t distribution, since it approaches the normal
distribution as the sample size grows
21. Independent and Dependent t-tests
When to use independent and dependent t-tests?
− Dependent: when evaluating the effect between two related samples
− You feed a group of 100 people fast food every day
− Did they gain weight after 30 days?
− Independent: when evaluating the effect between two independent samples
− You feed 50 males and 50 females fast food every day
− Did males or females gain more weight after 30 days?
You conduct a study with two groups and have them exercise three
times a day for 30 days (group A = crossfit, group B = yoga).
− How would you test the difference between crossfit and yoga participants?
− How would you test the difference in weight between day 0 and day 30 for
yoga participants?
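The two study designs lead to two different t statistics. The sketch below (with made-up weight data, not from the deck) computes both by hand; in the lab these would be `t.test(...)` calls in R:

```python
# Independent vs dependent (paired) t statistics, computed by hand.
# All data below is made up for illustration.
import math
import statistics

# Independent: weight gain (kg) of two separate groups
males   = [2.1, 1.8, 2.5, 3.0, 2.2]
females = [1.2, 1.6, 1.1, 1.9, 1.4]
m1, m2 = statistics.mean(males), statistics.mean(females)
v1, v2 = statistics.variance(males), statistics.variance(females)
n1, n2 = len(males), len(females)
# Pooled sd assumes equal variances in the two groups
sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
t_ind = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Dependent: the same yoga participants measured at day 0 and day 30,
# so the test is run on the within-person differences
day0  = [70.0, 65.2, 80.1, 75.4, 68.9]
day30 = [69.1, 64.8, 78.9, 74.0, 68.5]
diffs = [a - b for a, b in zip(day0, day30)]
t_dep = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

print(round(t_ind, 2), round(t_dep, 2))
```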
22. Effect Size
When samples become large enough, you often get significant results
− However, is it practically significant?
Effect size is a simple way to quantify difference between two groups
− Emphasizes the size of the difference (without effect of sample size)
− Cohen’s d is one of the most common ways to measure effect size
Effect size (Cohen’s d): d = (x̄1 − x̄2) / SDpooled
Proper calculation for SDpooled: √(((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2))
Simple calculation for SDpooled: √((s1² + s2²) / 2)
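As a sketch of the calculation (with hypothetical summary statistics, not from the deck), Cohen's d with the pooled standard deviation looks like:

```python
# Cohen's d = (mean1 - mean2) / SD_pooled; all numbers are hypothetical
import math

m1, s1, n1 = 2.32, 0.46, 50   # group 1 mean, sd, size
m2, s2, n2 = 1.44, 0.32, 50   # group 2 mean, sd, size

# "Proper" pooled sd weights each group's variance by its df
sd_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
# "Simple" pooled sd; identical to the proper one here because n1 == n2
sd_simple = math.sqrt((s1**2 + s2**2) / 2)

d = (m1 - m2) / sd_pooled
print(round(d, 2))
```

Note that d is expressed in standard-deviation units, so it is comparable across studies regardless of sample size.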
23. Time for practice
In this lab session we will cover:
− Independent t-tests
− Dependent (paired) t-tests
− Effect size (Cohen’s d)
GitHub repository: https://github.com/eugeneyan/Statistical-Inference