Case study: Probability and Statistics
1. Institute of Technology of Cambodia 2016-2017 Case study: Smartphone Preference
1). Khen Chanthorn 2). Kech Sengthai 3). Ken Keomhong 4). Kam Chanreaksmee
5). Kry Reothea
Assignment Content
I. PURPOSE OF SURVEY
1. Objective
2. Domain: (ITC students from year 1 to year 5.)
3. Data
II. ANALYSIS OF DATA
1. THEORY AND ASSUMPTION
2. TEST HYPOTHESIS AND CONFIDENCE INTERVAL CONDUCTING
III. STATISTICAL ANALYSIS
a) For students in year 1:
b) For students in year 2:
c) For students in year 3:
d) For students in year 4:
e) For students in year 5:
f) For entire students in ITC:
IV. CONCLUSION
3. Our interest in this case study:
I. PURPOSE OF SURVEY
1. Objective
We conducted this survey to estimate the percentage of ITC students who use each smartphone model, especially the iPhone.
We aim to use the results of this survey to:
- Estimate the true proportion of ITC students using each smartphone model, for each year from 1 to 5 as well as for the entire institute.
- Estimate the trend in smartphone use among ITC students from first year to fifth year.
- Test the hypothesis that the true proportion of students using iPhone, in each year and across all of ITC, exceeds 50%.
2. Domain: (ITC students from year 1 to year 5.)
3. Data
The data were collected by surveying students directly (survey sheet) and through an online survey (online survey sheet).
Case Study: The Telephone Preference in Institute of Technology of Cambodia
Problems:
To learn which smartphones ITC students prefer, we surveyed a random sample of 307 students from year 1 to year 5. The numbers of students who use iPhone, Samsung, Nokia, Sony and other models are listed in the table below.
Data of students using different phone models
Students   iPhone   Samsung   Nokia   Sony   Other*   Total
Year 1       52       18        5      0       7       82
Year 2       45       14        7      4       6       76
Year 3       35       13        6      1       6       61
Year 4       33        6        5      2       5       51
Year 5       31        3        1      0       2       37
Total       196       54       24      7      26      307
* Others = Huawei, LG, ASUS
[Figure: Number of students using various phone models. Horizontal axis: Year 1 to Year 5; vertical axis: number of students (0 to 60); series: iPhone, Samsung, Nokia, Sony, other.]
II. ANALYSIS OF DATA
1. THEORY AND ASSUMPTION
1.1 Estimate the percentage of the students using iPhone in each year and the entire institute:
The number $x$ of iPhone users among $n$ surveyed students follows a Binomial distribution with pmf
$$p(x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$
To obtain an estimator of the proportion, we apply the Maximum Likelihood Estimator (MLE) method to the pmf above. The binomial coefficient does not depend on $p$, so up to an additive constant,
$$\ln[p(x)] = x\ln(p) + (n - x)\ln(1 - p)$$
$$\Rightarrow \frac{d}{dp}\ln[p(x)] = \frac{x}{p} - \frac{n - x}{1 - p} = 0$$
$$\Leftrightarrow \frac{1}{p} - 1 = \frac{n}{x} - 1$$
$$\Rightarrow \hat{p} = \frac{x}{n}$$
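As a brief check that the stationary point found above is a maximum of the log-likelihood rather than a minimum, the second derivative can be examined. This step is a standard supplement and is not part of the original derivation; it uses the same symbols $x$, $n$ and $p$.

```latex
\frac{d^{2}}{dp^{2}}\ln[p(x)]
  = -\frac{x}{p^{2}} - \frac{n - x}{(1 - p)^{2}} < 0
  \qquad \text{for } 0 < p < 1,\ n \ge 1,
```

Since $x$ and $n - x$ cannot both be zero, the log-likelihood is strictly concave and $\hat{p} = x/n$ is its unique maximizer. For example, for the whole sample, $\hat{p} = 196/307 \approx 0.64$.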
Thus, from the counts in the table, we can estimate the proportion of students using each phone model as follows:
The estimations of students using various phones in ITC (%)
Students     iPhone   Samsung   Nokia   Sony   Others*   TOTAL
Year 1        63%      22%       6%      0%      9%      100%
Year 2        59%      18%       9%      5%      8%      100%
Year 3        57%      21%      10%      2%     10%      100%
Year 4        65%      12%      10%      4%     10%      100%
Year 5        84%       8%       3%      0%      5%      100%
Entire ITC    64%      18%       8%      2%      8%      100%
* Others = LG, Huawei, Asus
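As an illustrative sketch (not part of the original study), the percentages in this table can be recomputed from the raw survey counts. The dictionary layout and the helper name `percentages` are assumptions made for this example only.

```python
# Counts copied from the survey table, in the order
# iPhone, Samsung, Nokia, Sony, Others.
models = ["iPhone", "Samsung", "Nokia", "Sony", "Others"]
counts = {
    "Year 1": [52, 18, 5, 0, 7],
    "Year 2": [45, 14, 7, 4, 6],
    "Year 3": [35, 13, 6, 1, 6],
    "Year 4": [33, 6, 5, 2, 5],
    "Year 5": [31, 3, 1, 0, 2],
}

# The whole-institute row is the column-wise sum of the five years.
counts["Entire ITC"] = [sum(column) for column in zip(*counts.values())]


def percentages(row):
    """Return the estimate p_hat = x / n for each model, in whole percent."""
    n = sum(row)
    return [round(100 * x / n) for x in row]


estimates = {group: percentages(row) for group, row in counts.items()}

for group, row in estimates.items():
    print(group, dict(zip(models, row)))
```

Because each entry is rounded separately, a row of rounded percentages is not guaranteed to add up to exactly 100.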
2. TEST HYPOTHESIS AND CONFIDENCE INTERVAL CONDUCTING
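The estimator $\hat{p} = x/n$ is unbiased ($E(\hat{p}) = p$), has approximately a normal distribution, and its standard deviation is $\sigma_{\hat{p}} = \sqrt{p(1 - p)/n}$.
When $H_0$ is true, $E(\hat{p}) = p_0$ and $\sigma_{\hat{p}} = \sqrt{p_0(1 - p_0)/n}$, so $\sigma_{\hat{p}}$ does not involve any unknown parameters.
When $n$ is large and $H_0$ is true, the test statistic has approximately a standard normal distribution:
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \sim \mathcal{N}(0, 1)$$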
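A minimal sketch of this test statistic in Python is given here for illustration. The function name `z_statistic` and its argument names are assumptions of this example, not something defined in the study.

```python
import math


def z_statistic(x, n, p0):
    """Large-sample z statistic for H0: p = p0, given x successes in n trials."""
    p_hat = x / n
    # Under H0 the standard deviation uses p0, not the estimate p_hat.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


# Whole sample: 196 iPhone users among 307 students, tested against p0 = 0.5.
print(round(z_statistic(196, 307, 0.5), 2))
```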
[Figure: The tendency of students using certain phone models from year 1 to 5. Horizontal axis: Year; vertical axis: percentage of students (0% to 90%); series: iPhone, Samsung, Nokia, Sony, Others. The labelled values 63%, 59%, 57%, 65% and 84% are the iPhone estimates $\hat{p} = x/n$ for years 1 to 5.]
If the alternative hypothesis is $H_a: p > p_0$ and the upper-tailed rejection region $z > z_\alpha$ is used, then
$$P(\text{type I error}) = P(\text{reject } H_0 \text{ when } H_0 \text{ is true})$$
$$= P(Z > z_\alpha \text{ when } Z \text{ has approximately a standard normal distribution}) = \alpha$$
Here $z_\alpha$ is the critical value, found from the standard normal table, that satisfies
$$P(Z < z_\alpha) = 1 - \alpha$$
$$\Phi(z_\alpha) = 1 - \alpha$$
Note: The approximation above holds under the condition that $n p_0 \geq 10$ and $n(1 - p_0) \geq 10$.
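Instead of reading the critical value from a printed table, it can be obtained numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are chosen for this example.

```python
from statistics import NormalDist

alpha = 0.05

# z_alpha is the point with Phi(z_alpha) = 1 - alpha (upper-tailed test).
z_alpha = NormalDist().inv_cdf(1 - alpha)

# Sanity check: the standard normal cdf at z_alpha returns 1 - alpha.
coverage = NormalDist().cdf(z_alpha)

print(round(z_alpha, 3), round(coverage, 2))
```

Rounded to three decimals this agrees with the tabulated value 1.645.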
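Standardizing $\hat{p}$ by subtracting $p$ and dividing by $\sigma_{\hat{p}}$ then implies that
$$P\left(-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} < z_{\alpha/2}\right) \approx 1 - \alpha$$
Solving this inequality for $p$ gives the confidence interval for the population proportion $p$ with confidence level approximately $100(1 - \alpha)\%$:
$$\tilde{p} - z_{\alpha/2}\,\frac{\sqrt{\hat{p}\hat{q}/n + z_{\alpha/2}^{2}/(4n^{2})}}{1 + z_{\alpha/2}^{2}/n} < p < \tilde{p} + z_{\alpha/2}\,\frac{\sqrt{\hat{p}\hat{q}/n + z_{\alpha/2}^{2}/(4n^{2})}}{1 + z_{\alpha/2}^{2}/n}$$
where
$$\tilde{p} = \frac{\hat{p} + z_{\alpha/2}^{2}/(2n)}{1 + z_{\alpha/2}^{2}/n}, \qquad \hat{q} = 1 - \hat{p}$$
Or, written compactly,
$$100(1 - \alpha)\%\ \text{CI} = \left(\tilde{p} \pm z_{\alpha/2}\,\frac{\sqrt{\hat{p}\hat{q}/n + z_{\alpha/2}^{2}/(4n^{2})}}{1 + z_{\alpha/2}^{2}/n}\right)$$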
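The interval formula above can be sketched in Python as follows. The function name `score_interval` and the `confidence` parameter are assumptions of this example; the critical value is taken from `statistics.NormalDist` rather than from a table, so it is 1.95996 instead of the rounded 1.96.

```python
import math
from statistics import NormalDist


def score_interval(x, n, confidence=0.95):
    """Confidence interval for a proportion, following the formula above."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    p_hat = x / n
    q_hat = 1 - p_hat
    denominator = 1 + z**2 / n
    # Centre of the interval (p tilde in the text).
    p_tilde = (p_hat + z**2 / (2 * n)) / denominator
    # Half-width of the interval.
    half_width = z * math.sqrt(p_hat * q_hat / n + z**2 / (4 * n**2)) / denominator
    return p_tilde - half_width, p_tilde + half_width


# Whole sample: 196 iPhone users among 307 students.
low, high = score_interval(196, 307)
print(f"({low:.2f}, {high:.2f})")
```

Unlike the simpler interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$, this form is centred on $\tilde{p}$ rather than on $\hat{p}$, which pulls the centre slightly towards 0.5.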
III. STATISTICAL ANALYSIS
1.1 Test whether the true proportion of students using iPhone in each year exceeds 50% at level $\alpha = 0.05$
Test hypothesis: $H_0: p = 0.5$ vs $H_a: p > 0.5$
Test statistic:
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$
These test procedures are valid provided that $n p_0 \geq 10$ and $n(1 - p_0) \geq 10$.
Since $n_i p_0 \geq 10$ and $n_i(1 - p_0) \geq 10$ for every year $i$ (the smallest sample, year 5, gives $37 \times 0.5 = 18.5$), the assumption above holds.
Under $H_0$:
$$z = \frac{\hat{p} - 0.5}{\sqrt{0.5(1 - 0.5)/n}}$$
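The validity condition can be checked for every group with a short script. This is a sketch for illustration: the sample sizes are the row totals of the survey table, and the helper name `large_sample_ok` is an assumption of this example.

```python
# Sample sizes (row totals) from the survey table.
sample_sizes = {
    "Year 1": 82,
    "Year 2": 76,
    "Year 3": 61,
    "Year 4": 51,
    "Year 5": 37,
    "Entire ITC": 307,
}
P0 = 0.5  # proportion stated in the null hypothesis


def large_sample_ok(n, p0, threshold=10):
    """True when both n*p0 and n*(1 - p0) reach the threshold."""
    return n * p0 >= threshold and n * (1 - p0) >= threshold


checks = {group: large_sample_ok(n, P0) for group, n in sample_sizes.items()}
print(checks)
```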
a) For students in year 1:
$$\hat{p} = \frac{x}{n} = \frac{52}{82} \approx 0.6341$$
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.6341 - 0.5}{\sqrt{0.5(1 - 0.5)/82}} \approx 2.43$$
Find $z_\alpha$ at level $\alpha = 0.05$:
$$\Phi(z_\alpha) = 1 - \alpha = 1 - 0.05 = 0.95$$
From the table, $z_\alpha = 1.645$.
Rejection region: $RR = \{z \mid z > z_\alpha\} = \{z \mid z > 1.645\}$
Since $z = 2.43 > z_\alpha = 1.645$, we reject $H_0$.
Conclusion: There is strong evidence to conclude that the true proportion of students using iPhone in year 1 exceeds 50% at level $\alpha = 0.05$.
Confidence Interval for proportion at level $\alpha = 0.05$
We have
$$\tilde{p} \pm z_{\alpha/2}\,\frac{\sqrt{\hat{p}\hat{q}/n + z_{\alpha/2}^{2}/(4n^{2})}}{1 + z_{\alpha/2}^{2}/n}, \quad \text{where } \tilde{p} = \frac{\hat{p} + z_{\alpha/2}^{2}/(2n)}{1 + z_{\alpha/2}^{2}/n} \text{ and } z_{\alpha/2} = 1.96$$
$$\tilde{p} = \frac{0.6341 + 1.96^{2}/(2 \times 82)}{1 + 1.96^{2}/82} \approx 0.628$$
$$z_{\alpha/2}\,\frac{\sqrt{\hat{p}\hat{q}/n + z_{\alpha/2}^{2}/(4n^{2})}}{1 + z_{\alpha/2}^{2}/n} = 1.96\,\frac{\sqrt{0.6341 \times 0.3659/82 + 1.96^{2}/(4 \times 82^{2})}}{1 + 1.96^{2}/82} \approx 0.102$$
$$\Rightarrow 95\%\ \text{CI}(p) = (0.53, 0.73)$$
b) For students in year 2:
$$\hat{p} = \frac{x}{n} = \frac{45}{76} \approx 0.5921$$
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.5921 - 0.5}{\sqrt{0.5(1 - 0.5)/76}} \approx 1.61$$
Rejection region: $RR = \{z \mid z > z_\alpha\} = \{z \mid z > 1.645\}$
Since $z \notin RR$, we do not reject $H_0$.
Conclusion: There is no compelling evidence to conclude that the true proportion of students in year 2 using iPhone exceeds 50% at level $\alpha = 0.05$.
Confidence Interval for proportion at level $\alpha = 0.05$
$$\tilde{p} = \frac{0.5921 + 1.96^{2}/(2 \times 76)}{1 + 1.96^{2}/76} \approx 0.588$$
$$z_{\alpha/2}\,\frac{\sqrt{\hat{p}\hat{q}/n + z_{\alpha/2}^{2}/(4n^{2})}}{1 + z_{\alpha/2}^{2}/n} = 1.96\,\frac{\sqrt{0.5921 \times 0.4079/76 + 1.96^{2}/(4 \times 76^{2})}}{1 + 1.96^{2}/76} \approx 0.108$$
$$\Rightarrow 95\%\ \text{CI}(p) = (0.48, 0.70)$$
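c) For students in year 3:
$$\hat{p} = \frac{x}{n} = \frac{35}{61} \approx 0.5738$$
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.5738 - 0.5}{\sqrt{0.5(1 - 0.5)/61}} \approx 1.15$$
Rejection region: $RR = \{z \mid z > z_\alpha\} = \{z \mid z > 1.645\}$
Since $z \notin RR$, we do not reject $H_0$.
Conclusion: There is no compelling evidence to conclude that the true proportion of students in year 3 using iPhone exceeds 50% at level $\alpha = 0.05$.
Confidence Interval for proportion at level $\alpha = 0.05$
$$\tilde{p} = \frac{0.5738 + 1.96^{2}/(2 \times 61)}{1 + 1.96^{2}/61} \approx 0.569$$
$$z_{\alpha/2}\,\frac{\sqrt{\hat{p}\hat{q}/n + z_{\alpha/2}^{2}/(4n^{2})}}{1 + z_{\alpha/2}^{2}/n} = 1.96\,\frac{\sqrt{0.5738 \times 0.4262/61 + 1.96^{2}/(4 \times 61^{2})}}{1 + 1.96^{2}/61} \approx 0.120$$
$$\Rightarrow 95\%\ \text{CI}(p) = (0.45, 0.69)$$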
d) For students in year 4:
$$\hat{p} = \frac{x}{n} = \frac{33}{51} \approx 0.6471$$
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.6471 - 0.5}{\sqrt{0.5(1 - 0.5)/51}} \approx 2.10$$
Rejection region: $RR = \{z \mid z > z_\alpha\} = \{z \mid z > 1.645\}$
Since $z \in RR$, we reject $H_0$.
Conclusion: There is compelling evidence to conclude that the true proportion of students in year 4 using iPhone exceeds 50% at level $\alpha = 0.05$.
Confidence Interval for proportion at level $\alpha = 0.05$
$$\tilde{p} = \frac{0.6471 + 1.96^{2}/(2 \times 51)}{1 + 1.96^{2}/51} \approx 0.637$$
$$z_{\alpha/2}\,\frac{\sqrt{\hat{p}\hat{q}/n + z_{\alpha/2}^{2}/(4n^{2})}}{1 + z_{\alpha/2}^{2}/n} = 1.96\,\frac{\sqrt{0.6471 \times 0.3529/51 + 1.96^{2}/(4 \times 51^{2})}}{1 + 1.96^{2}/51} \approx 0.127$$
$$\Rightarrow 95\%\ \text{CI}(p) = (0.51, 0.76)$$
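e) For students in year 5:
$$\hat{p} = \frac{x}{n} = \frac{31}{37} \approx 0.8378$$
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.8378 - 0.5}{\sqrt{0.5(1 - 0.5)/37}} \approx 4.11$$
Rejection region: $RR = \{z \mid z > z_\alpha\} = \{z \mid z > 1.645\}$
Since $z \in RR$, we reject $H_0$.
Conclusion: There is compelling evidence to conclude that the true proportion of students in year 5 using iPhone exceeds 50% at level $\alpha = 0.05$.
Confidence Interval for proportion at level $\alpha = 0.05$
$$\tilde{p} = \frac{0.8378 + 1.96^{2}/(2 \times 37)}{1 + 1.96^{2}/37} \approx 0.806$$
$$z_{\alpha/2}\,\frac{\sqrt{\hat{p}\hat{q}/n + z_{\alpha/2}^{2}/(4n^{2})}}{1 + z_{\alpha/2}^{2}/n} = 1.96\,\frac{\sqrt{0.8378 \times 0.1622/37 + 1.96^{2}/(4 \times 37^{2})}}{1 + 1.96^{2}/37} \approx 0.117$$
$$\Rightarrow 95\%\ \text{CI}(p) = (0.69, 0.92)$$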
f) For entire students in ITC:
$$\hat{p} = \frac{x}{n} = \frac{196}{307} \approx 0.6384$$
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.6384 - 0.5}{\sqrt{0.5(1 - 0.5)/307}} \approx 4.85$$
Rejection region: $RR = \{z \mid z > z_\alpha\} = \{z \mid z > 1.645\}$
Since $z \in RR$, we reject $H_0$.
Conclusion: There is enough evidence to conclude that the true proportion of students in ITC using iPhone, from year 1 to year 5 combined, exceeds 50% at level $\alpha = 0.05$.
Confidence Interval for proportion at level $\alpha = 0.05$
$$\tilde{p} = \frac{0.6384 + 1.96^{2}/(2 \times 307)}{1 + 1.96^{2}/307} \approx 0.637$$
$$z_{\alpha/2}\,\frac{\sqrt{\hat{p}\hat{q}/n + z_{\alpha/2}^{2}/(4n^{2})}}{1 + z_{\alpha/2}^{2}/n} = 1.96\,\frac{\sqrt{0.6384 \times 0.3616/307 + 1.96^{2}/(4 \times 307^{2})}}{1 + 1.96^{2}/307} \approx 0.053$$
$$\Rightarrow 95\%\ \text{CI}(p) = (0.58, 0.69)$$
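The six analyses in this section all follow the same steps, so they can be reproduced in one loop. The sketch below is an illustration, not the authors' own procedure: the counts come from the survey table, the critical values 1.645 and 1.96 are the tabulated ones used in the text, and the function name `analyse` is an assumption of this example. Small differences from hand-worked figures can arise from intermediate rounding.

```python
import math

# (iPhone users x, sample size n) for each group, from the survey table.
groups = {
    "Year 1": (52, 82),
    "Year 2": (45, 76),
    "Year 3": (35, 61),
    "Year 4": (33, 51),
    "Year 5": (31, 37),
    "Entire ITC": (196, 307),
}
P0 = 0.5             # proportion under H0
Z_ALPHA = 1.645      # upper-tail critical value at alpha = 0.05
Z_HALF_ALPHA = 1.96  # two-sided critical value for a 95% interval


def analyse(x, n):
    """Return the estimate, z statistic, test decision and 95% interval."""
    p_hat = x / n
    z = (p_hat - P0) / math.sqrt(P0 * (1 - P0) / n)
    denominator = 1 + Z_HALF_ALPHA**2 / n
    p_tilde = (p_hat + Z_HALF_ALPHA**2 / (2 * n)) / denominator
    half_width = (
        Z_HALF_ALPHA
        * math.sqrt(p_hat * (1 - p_hat) / n + Z_HALF_ALPHA**2 / (4 * n**2))
        / denominator
    )
    return {
        "p_hat": p_hat,
        "z": z,
        "reject": z > Z_ALPHA,  # upper-tailed test of H0: p = 0.5
        "ci": (p_tilde - half_width, p_tilde + half_width),
    }


results = {name: analyse(x, n) for name, (x, n) in groups.items()}

for name, r in results.items():
    low, high = r["ci"]
    print(f"{name}: p_hat={r['p_hat']:.3f}, z={r['z']:.2f}, "
          f"reject H0={r['reject']}, 95% CI=({low:.2f}, {high:.2f})")
```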
IV. CONCLUSION
After conducting the survey, the estimation and the hypothesis tests, we found that the most popular smartphone model is the iPhone. However, the hypothesis tests at level $\alpha = 0.05$, together with the 95% confidence intervals, showed that the true proportion of students using iPhone exceeds 50% in year 1, year 4 and year 5, whereas there is not enough evidence to conclude that the true proportion exceeds 50% in year 2 and year 3. For ITC as a whole, the 95% confidence interval for the proportion of iPhone users is (0.58, 0.69).
3. Our interest in this case study:
Even though this is just a small case study, it gave us good experience in teamwork and in techniques for collecting data. Furthermore, this case study also served as a revision of the lessons that we have learnt.