YOUNG INDIA FELLOWSHIP




                        Statistics Course
                                        Group Project
                                                Members :

                                              Abhishek Chopra
                                              Adhiraj Sarmah
                                                Kshitij Garg
                                              Mahesh Jakhotia
                                          Tulasi Prasad Chaudhary




                                                 7/25/2011




This group project is based on a real case study of test papers from primary schools in Atlanta. Growing
pressure on teachers to improve their classes' test performance has led to malpractice. Our task is to find
methods that detect whether fraud was committed in the following case.
Contents

  1) Problem Statement
  2) Logical Analysis
  3) Inference
  4) Our Interpretation of the Cheating Process
  5) Statistical Approaches
  6) ANOVA
  7) Pictorial Method
  8) The Wilcoxon Rank Sum Test
  9) Appendix
         a. Table A.1 : Division of questions into groups based on Approach 1 used in the ANOVA test
         b. Table A.2 : Class A Results
         c. Table A.3 : Class B Results
         d. Table A.4 : Class A Results
         e. Table A.5 : Class B Results




GROUP PROJECT STATISTICS – FRAUD DETECTION


Problem Statement: We have been given two data sets from two different classrooms and are required to
devise a strategy and analyze the data to determine whether teacher fraud occurred in one or both of the
classrooms.

There are 4 possible scenarios:
1) Both A and B have been tampered with.
2) Neither A nor B has been tampered with.
3) A is fraudulent, B is not.
4) B is fraudulent, A is not.

This document summarizes our reasoning, and the Excel sheets attached in the folder demonstrate the
calculations. We used several approaches to reach a solution; each has its own assumptions, strengths and
weaknesses.

Logical Analysis:

       STEP 1: We calculate the total number of correct answers for every question in both classes.
       Because we analysed the data student by student and question by question, scoring each correct
       answer as '1', this total is also the number of students who answered each question correctly.

       STEP 2: We then find the total number of correct answers for the entire class and divide it by the
       number of students to obtain the mean number of correct answers per student for each of the two
       classes.
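STEP 1 and STEP 2 amount to column sums and a grand mean over a 0/1 answer matrix. A minimal Python sketch, assuming a made-up matrix of four students and five questions (the real student-level data live in the Excel sheets and are not reproduced here); the function names are illustrative only:

```python
# Made-up 0/1 answer matrix for illustration: rows are students, columns are questions.
# 1 = correct answer, 0 = wrong answer.
answers = [
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
]


def correct_per_question(matrix):
    """STEP 1: number of students who answered each question correctly."""
    return [sum(column) for column in zip(*matrix)]


def mean_correct_per_student(matrix):
    """STEP 2: total correct answers in the class divided by the number of students."""
    total_correct = sum(sum(row) for row in matrix)
    return total_correct / len(matrix)


print(correct_per_question(answers))      # [3, 3, 2, 3, 1]
print(mean_correct_per_student(answers))  # 3.0
```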

       STEP 3: Using the per-question totals from STEP 1, we plot a line graph for each class with the
       questions on the X-axis and class performance on the Y-axis. This gives a broad picture of whether
       there is any evidence of fraud.

       # We found that in Class A, questions 30 to 36 clearly show an anomaly.




       STEP 4: We then focused on the anomalous region, examining questions 30-36 in both classes for
       any abnormal answer patterns.

# In Class A, a particular group of 16 students answered questions 30-36 correctly in an exact, uniform
pattern; there was no such pattern in Class B.

STEP 5: We calculated the average score (i.e. the average number of correct answers) for each of these
16 students in class A, including questions 30-36. The mean score of these 16 students is 46%.

For Class B, the mean score of all the students is 38%.




STEP 6: We repeated the calculation for the same 16 students in class A, this time EXCLUDING
      questions 30-36. Their mean score DECREASED to 42% (a decrease of 4 percentage points).

      For Class B, the mean score of all the students INCREASED to 40% (an increase of 2 percentage points).

      INFERENCE: Questions 30 to 36 therefore give reasonable grounds to believe that some form of
      cheating or tampering was carried out on these questions.
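STEP 5 and STEP 6 compare each flagged student's percentage score with and without the suspect block of questions. The sketch below is illustrative only: the two ten-question answer rows are invented, questions 6-8 stand in for the report's questions 30-36, and the helper names are hypothetical.

```python
# Hypothetical data: ten-question 0/1 answer rows for two "flagged" students.
flagged_students = [
    [1, 0, 0, 1, 0, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 1, 1, 1, 0, 1],
]
SUSPECT = frozenset(range(6, 9))  # question numbers 6, 7, 8 (1-based), standing in for 30-36


def percent_correct(row, excluded=frozenset()):
    """Percentage of a student's answers that are correct, skipping `excluded` questions."""
    kept = [score for q, score in enumerate(row, start=1) if q not in excluded]
    return 100.0 * sum(kept) / len(kept)


def group_mean(rows, excluded=frozenset()):
    """Mean percentage score over a group of students."""
    return sum(percent_correct(row, excluded) for row in rows) / len(rows)


with_block = group_mean(flagged_students)              # STEP 5: all questions
without_block = group_mean(flagged_students, SUSPECT)  # STEP 6: suspect block removed
print(f"{with_block:.1f}% -> {without_block:.1f}% "
      f"(change: {without_block - with_block:+.1f} percentage points)")
```

A large drop when the block is removed, as in STEP 6, is what points to that block.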


      Our interpretation of the Cheating Process

   1) From questions 30 to 36, the graphs visually show 16 students consistently performing above the
      average trend of the other students. This amounts to 16 students x 6 questions = 96 answers that
      have probably been tampered with.

   2) The reasons for choosing this particular set of questions (30 to 36) could be:

      a) Since the questions are given to increase in difficulty, it is logical to expect more correct
         answers in the first half, where the questions are easier, than in the second half, where they
         are harder.
      b) It would therefore be smart on the teacher's part to tamper with answers in the second half,
         since most students would be expected to answer the first half correctly anyway. Even within the
         second half, it would be wiser to avoid the last few questions: they are the most difficult, so a
         rise in correct answers there would be detected immediately. The logical choice is a block of
         questions near the beginning of the second half, well before the last few questions.

   3) Choosing consecutive questions also reduces the time needed to edit the answers by hand, which
      matters given the limited time generally available to an invigilator or teacher. And 96 answers are
      enough to raise the class's average performance significantly; our analysis found an increase of
      about 4 percentage points.


Statistical Approaches used:

   1) ANOVA Method: We first divided the questions of each class into groups and applied ANOVA to test
      whether the groups have the same mean. If one group differed from the others, we could conclude
      that its data had been tampered with, since it disturbs the distribution of the whole class.
      We used two approaches to form the groups. The Tukey method can then identify which groups have
      deviating means.



2) Pictorial distribution: We plotted a graph with the questions on the X-axis and class performance on
      the Y-axis. In the class A graph, questions 30-36 form a flat stretch whose results are higher than
      on the other questions. On this visual basis we conclude that fraud occurred on these questions.

   3) The Wilcoxon Rank Sum Test: To avoid the normality assumption, we can use the rank sum approach
      (suitable for non-normal distributions) discussed in section 9.2 of the textbook. The other tests
      rely on mathematical assumptions that the given data do not satisfy, whereas this approach requires
      weaker ones.


Approach 1 : ANOVA Approach

To compare the means of several groups, ANOVA is preferred to multiple t-tests because it yields a single
test statistic for comparing all the means, so the overall risk of a Type I error can be controlled. If we
ran many t-tests, each at a given alpha level, we would not know the overall risk of a Type I error; the
more tests we run, the greater the risk of a false positive conclusion somewhere among them.
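To make the multiple-testing risk concrete: if k tests were independent and each run at level alpha, the chance of at least one false positive would be 1 - (1 - alpha)^k. Pairwise t-tests on the same data are not independent, so treat the figure below as a rough illustration, assuming alpha = 0.05 and all 28 pairs among eight groups:

```python
from math import comb


def familywise_error(alpha, n_tests):
    """Chance of at least one false positive among n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


n_pairs = comb(8, 2)  # pairwise t-tests needed to compare 8 group means: 28
print(n_pairs, round(familywise_error(0.05, n_pairs), 3))  # 28 0.762
```

Under this simplification, roughly three times in four at least one of the 28 comparisons would come out "significant" even if no group had been tampered with.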

We first divided the questions of class A into groups according to their difficulty, measured by the
number of correct answers. In class A the 44 questions received 445 correct answers in total, so we
divided the questions into eight groups, each covering a roughly equal share (445/8 ≈ 56) of the
cumulative total of correct answers.

The grouping is shown in Appendix Table A.1. We then applied an ANOVA test to these groups to find out
whether the group means are equal.
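One way to automate an equal-area split is sketched below, using the class A per-question counts from Table A.2. The assignment rule (question q goes to group ceil(8 * cum_q / total) - 1, where cum_q is the running total of correct answers) is an assumption made for this sketch, not necessarily how Table A.1 was built: it reproduces the cumulative sums of Table A.1 exactly, but yields group sizes 5, 4, 4, 6, 5, 7, 3, 10 instead of the table's 5, 4, 4, 6, 6, 6, 4, 9, because questions 25 and 35 fall just past a boundary.

```python
from itertools import accumulate

# Correct answers per question for class A, questions 1-44 in order (Table A.2).
CLASS_A = [13, 9, 4, 14, 12, 10, 13, 9, 15, 19, 15, 14, 14, 14, 12, 10, 7, 7, 8,
           14, 11, 9, 9, 3, 14, 2, 12, 3, 3, 16, 17, 16, 16, 18, 17, 2, 8, 9, 8,
           1, 6, 4, 6, 2]


def equal_area_groups(counts, n_groups=8):
    """Label each question with a group 0..n_groups-1 so that groups cover roughly
    equal shares of the total number of correct answers.

    Assumed rule: group = ceil(n_groups * cumulative / total) - 1, computed with
    integer arithmetic to avoid floating-point edge cases.
    """
    cumulative = list(accumulate(counts))
    total = cumulative[-1]
    labels = [max(0, (n_groups * c + total - 1) // total - 1) for c in cumulative]
    return cumulative, labels


cumulative, labels = equal_area_groups(CLASS_A)
sizes = [labels.count(g) for g in range(8)]
print(cumulative[-1], sizes)  # 445 [5, 4, 4, 6, 5, 7, 3, 10]
```

Boundaries placed by hand, as in Table A.1, are equally valid; the point of the rule is only to make the split reproducible.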



Test Hypothesis

$H_0: \mu_1 = \mu_2 = \cdots = \mu_8$

$H_a$: The means are not all equal (which would suggest that one or more groups were tampered with,
shifting their mean away from the other groups)

Results for CLASS A

Anova: Single Factor (CLASS A)
SUMMARY
Groups      Count   Sum   Average    Variance
Column 1    5       52    10.4       16.3
Column 2    4       47    11.75      7.583333
Column 3    4       62    15.5       5.666667
Column 4    6       58    9.666667   8.266667
Column 5    6       60    10         16.8
Column 6    6       53    8.833333   48.56667
Column 7    4       67    16.75      0.916667
Column 8    9       46    5.111111   8.861111



ANOVA
 Source of Variation      SS         df         MS        F     P-value   F crit
Between Groups         539.6763            7 77.09661 5.076268 0.000444 2.277143
Within Groups          546.7556           36 15.18765

Total                  1086.432           43


ANOVA Results for CLASS B
 Source of Variation      SS         df         MS        F     P-value   F crit
Between Groups          287.308            7   41.044 3.695126 0.004144 2.277143
Within Groups          399.8738           36 11.10761

Total                  687.1818           43


In the results, the F statistic for "between groups" in class A is 5.08, which exceeds the critical F value
of 2.28 (p = 0.0004), so we reject the null hypothesis $H_0$ that the group means are equal. The same holds
for class B (F = 3.70 > 2.28, p = 0.004).
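The F statistic in the table can be recomputed from the raw group data of Table A.2. A minimal pure-Python sketch of single-factor ANOVA; it stops at F and does not compute a p-value, which needs the F distribution (SciPy's `scipy.stats.f_oneway` returns both, if SciPy is available):

```python
def one_way_anova(groups):
    """Single-factor ANOVA: returns (SS_between, SS_within, df_between, df_within, F)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between groups: spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of each value around its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# Class A, grouped by difficulty as in Table A.2 (groups A to H).
class_a_by_difficulty = [
    [13, 9, 4, 14, 12],
    [10, 13, 9, 15],
    [19, 15, 14, 14],
    [14, 12, 10, 7, 7, 8],
    [14, 11, 9, 9, 3, 14],
    [2, 12, 3, 3, 16, 17],
    [16, 16, 18, 17],
    [2, 8, 9, 8, 1, 6, 4, 6, 2],
]
ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(class_a_by_difficulty)
print(f"SS_between={ss_b:.4f} SS_within={ss_w:.4f} df=({df_b}, {df_w}) F={f_stat:.4f}")
print("reject H0 at 5%:", f_stat > 2.277143)  # F crit taken from the table above
```

Applied to the class B groups of Table A.3, the same function gives F of about 3.695, matching the Class B table.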

A weakness of this argument is that the groups have different sizes. ANOVA assumes equal group variances
and tolerates departures from that assumption well only when the groups are of equal size; here the group
variances also differ widely (from 0.92 to 48.57 in class A). A second requirement is that the groups be
independent. We therefore need a different grouping that better satisfies these assumptions. We regrouped
the questions so that each group contains questions of every difficulty level, using a circular approach
to divide them into four groups: questions 1 to 4 go to groups 1 to 4 respectively, questions 5 to 8 again
go to groups 1 to 4, and so on. Each group thus contains questions of all types, making the groups
homogeneous. See Appendix Table A.4 - Class A Results for details of the grouping.
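The circular assignment is a simple round-robin deal. A sketch using the class A per-question counts from Table A.2; the four resulting groups match the columns of Table A.4 (the function name is illustrative):

```python
# Correct answers per question for class A, questions 1-44 in order (Table A.2).
CLASS_A = [13, 9, 4, 14, 12, 10, 13, 9, 15, 19, 15, 14, 14, 14, 12, 10, 7, 7, 8,
           14, 11, 9, 9, 3, 14, 2, 12, 3, 3, 16, 17, 16, 16, 18, 17, 2, 8, 9, 8,
           1, 6, 4, 6, 2]


def round_robin_groups(counts, n_groups=4):
    """Deal questions into groups in turn: Q1 -> group 1, Q2 -> group 2, ...,
    Q5 -> group 1 again, so every group gets easy, medium and hard questions."""
    groups = [[] for _ in range(n_groups)]
    for index, count in enumerate(counts):
        groups[index % n_groups].append(count)
    return groups


groups = round_robin_groups(CLASS_A)
print([len(g) for g in groups], [sum(g) for g in groups])  # [11, 11, 11, 11] [119, 117, 121, 88]
```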

Assumptions for ANOVA:

   1) The sample measurements are drawn from normal populations.
   2) The samples are independent.
   3) The unknown population mean and variance of the measurements in sample $i$ are $\mu_i$ and $\sigma^2$
      respectively, i.e. all groups share a common variance.

We now explain why the current approach approximately satisfies these assumptions.

Normality is the least crucial assumption. ANOVA is a test on means, so the central limit theorem helps:
group means are approximately normal even when individual observations are not. The central limit theorem
may not work well for small samples, so we used larger groups than in the first approach, 11 questions
each; at this size the central limit theorem can be applied only approximately. An alternative that does
not require normality is the Kruskal-Wallis rank test, discussed in section 10.2 of the textbook. Since this
method was not taught in class, we set it aside and use the ANOVA test for the current problem.



The assumption of equal variances matters most when the sample sizes differ substantially. Since all four
groups have the same size, unequal variances are not a serious problem here: when all the $n_i$ are equal,
even grossly unequal variances have minimal effect.

As for independence, each group contains a homogeneous mix of questions, from easy to hard, so we treat
the groups as independent of one another. The circular approach ensures that each group's set of
questions is similar to that of the other groups.

Test Hypothesis

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$

$H_a$: The means are not all equal (which would suggest that one or more groups were tampered with,
shifting their mean away from the other groups)

        CLASS A Results

         1                2       3         4

        13                9       4         14
        12                10     13         9
        15                19     15         14
        14                14     12         10
        7                 7       8         14
        11                9       9         3
        14                2      12         3
        3                 16     17         16
        16                18     17         2
        8                 9       8         1
        6                 4       6         2

Anova: Single Factor

SUMMARY
    Groups             Count    Sum   Average Variance
Column 1                   11     119 10.81818 17.76364
Column 2                   11     117 10.63636 30.45455
Column 3                   11     121       11       19
Column 4                   11      88        8     34.8



ANOVA
Source of Variation      SS        df         MS        F     P-value   F crit
Between Groups         66.25           3 22.08333 0.865859 0.466735 2.838745
Within Groups       1020.182          40 25.50455

Total               1086.432          43

ANOVA Results for Class B
Source of Variation      SS          df         MS         F     P-value   F crit
Between Groups    51.72727             3 17.24242   1.08536 0.366324 2.838745
Within Groups     635.4545            40 15.88636

Total             687.1818            43


The ANOVA F test only tells us whether to reject $H_0$. Rejecting the null hypothesis that the means are
equal does not indicate which means differ. For that, the Tukey method can be used to compare the group
means pairwise and pinpoint the group in which tampering occurred.
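For equal group sizes, Tukey's honestly significant difference is HSD = q * sqrt(MS_within / n), where q is the studentized-range critical value. A sketch applied to the circular grouping of class A (Table A.4: MS_within = 25.50455, n = 11). The value q = 3.79 (alpha = 0.05, 4 groups, 40 error df) is taken from a standard studentized-range table and should be checked against the textbook's table; the function name is illustrative.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd_equal_n(means, ms_within, n_per_group, q_crit):
    """Return Tukey's HSD and the (group i, group j, |difference|) pairs exceeding it.

    Assumes equal group sizes; q_crit is the studentized-range critical value
    looked up for the chosen alpha, number of groups and within-group df.
    """
    hsd = q_crit * sqrt(ms_within / n_per_group)
    flagged = [
        (i + 1, j + 1, abs(means[i] - means[j]))
        for i, j in combinations(range(len(means)), 2)
        if abs(means[i] - means[j]) > hsd
    ]
    return hsd, flagged


# Class A, circular grouping: group means (Sum / Count) and MS_within from Table A.4.
means_a = [119 / 11, 117 / 11, 121 / 11, 88 / 11]
hsd, flagged = tukey_hsd_equal_n(means_a, ms_within=25.50455, n_per_group=11, q_crit=3.79)
print(f"HSD = {hsd:.2f}, flagged pairs: {flagged}")
```

Here the largest difference in means (121/11 - 88/11 = 3.0) is well below the HSD of about 5.77, so no pair is flagged, consistent with the non-significant F test for this grouping. For unequal group sizes (Approach 1), the Tukey-Kramer variant replaces MS_within / n with (MS_within / 2)(1/n_i + 1/n_j); statsmodels offers `pairwise_tukeyhsd` for this case.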

Approach 2: The Pictorial Method

We can see that the frequency curve is approximately normal for class B, but skewed towards the higher
side in class A. We attribute this skew to tampering or cheating by the teacher. The mean for class A
(20.23) is much higher than that for class B (16.78).

The "question vs. number of students who answered it correctly" plot also shows that Q30 to Q36 in class A
contain the tampered data: these questions break the normal trend and form a peak in the middle of an
otherwise decreasing curve.

We therefore trimmed Q30 to Q36 from both classes and re-plotted the remaining questions. This time both
curves are approximately normal and class A shows no skew. The class A mean also drops to 16.32, comparable
to class B (15.44). This strongly suggests that tampering occurred in Q30 to Q36 in class A.
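The "question vs. number of students correct" plot can be approximated as a text bar chart. A sketch using the class A per-question counts from Table A.2, with questions 30-36 marked '*' (the function name is illustrative):

```python
# Correct answers per question for class A, questions 1-44 in order (Table A.2).
CLASS_A = [13, 9, 4, 14, 12, 10, 13, 9, 15, 19, 15, 14, 14, 14, 12, 10, 7, 7, 8,
           14, 11, 9, 9, 3, 14, 2, 12, 3, 3, 16, 17, 16, 16, 18, 17, 2, 8, 9, 8,
           1, 6, 4, 6, 2]


def text_plot(counts, highlight=range(30, 37)):
    """One line per question: a bar of '#' characters, one per correct answer.

    Questions in `highlight` (1-based numbers) are marked with '*'.
    """
    lines = []
    for q, count in enumerate(counts, start=1):
        marker = "*" if q in highlight else " "
        lines.append(f"Q{q:>2}{marker} {'#' * count} {count}")
    return lines


for line in text_plot(CLASS_A):
    print(line)
```

In the output, the run of 16 to 18 correct answers from question 30 onwards stands out against the low counts of the surrounding second-half questions.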




Approach 3 : The Wilcoxon Rank Sum Test

To avoid the normality assumption, we can use the rank sum approach (suitable for non-normal distributions)
discussed in section 9.2 of the textbook. The other tests rely on mathematical assumptions that the given
data do not satisfy, whereas this approach requires weaker ones.

This test requires the following condition:

   1) The population distributions need not be normal (under the null hypothesis they are identical).

The null hypothesis is that the two population distributions are identical; the alternative hypothesis is
that one population tends to have larger values than the other. Rejecting the null hypothesis implies that
the two groups are not identically distributed, which would suggest that one of them was tampered with.
The decision is made by comparing the rank sum statistic with its critical values.




Here the two groups could be the data from the two classes, or different groups of questions divided in a
homogeneous manner. Since this method has not been covered in the syllabus, we have not worked the problem
with it.
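Although the rank sum test was not carried out here, a minimal sketch of the statistic looks like this: average ranks for ties, plus a normal-approximation z score that ignores the tie correction (so z is only approximate when ties occur). Function names are illustrative; SciPy's `scipy.stats.mannwhitneyu` implements the equivalent Mann-Whitney U test with tie handling.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def rank_sum_test(sample_1, sample_2):
    """Wilcoxon rank sum W for sample_1 and its normal-approximation z score."""
    n1, n2 = len(sample_1), len(sample_2)
    ranks = average_ranks(list(sample_1) + list(sample_2))
    w = sum(ranks[:n1])
    expected_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    return w, (w - expected_w) / sd_w


# Example: groups 1 and 4 of the circular grouping for class A (Table A.4).
group_1 = [13, 12, 15, 14, 7, 11, 14, 3, 16, 8, 6]
group_4 = [14, 9, 14, 10, 14, 3, 3, 16, 2, 1, 2]
w, z = rank_sum_test(group_1, group_4)
print(f"W = {w}, z = {z:.2f}")
```

With a two-sided alternative at the 5% level, a z score beyond ±1.96 would lead to rejecting the null hypothesis that the two groups are identically distributed.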

APPENDIX

Table A.1 : Division of questions into groups based on Approach 1 used in the ANOVA test (class A)

Group   Questions   Cumulative sum of correct answers
A       1-5         13, 22, 26, 40, 52
B       6-9         62, 75, 84, 99
C       10-13       118, 133, 147, 161
D       14-19       175, 187, 197, 204, 211, 219
E       20-25       233, 244, 253, 262, 265, 279
F       26-31       281, 293, 296, 299, 315, 332
G       32-35       348, 364, 382, 399
H       36-44       401, 409, 418, 426, 427, 433, 437, 443, 445
Table A.2 : Class A Results

           CLASS A (correct answers per question, grouped by difficulty)

Group   Questions   Correct answers per question
A       1-5         13, 9, 4, 14, 12
B       6-9         10, 13, 9, 15
C       10-13       19, 15, 14, 14
D       14-19       14, 12, 10, 7, 7, 8
E       20-25       14, 11, 9, 9, 3, 14
F       26-31       2, 12, 3, 3, 16, 17
G       32-35       16, 16, 18, 17
H       36-44       2, 8, 9, 8, 1, 6, 4, 6, 2


SUMMARY
     Groups             Count       Sum   Average Variance
Column 1                        5      52     10.4     16.3
Column 2                        4      47    11.75 7.583333
Column 3                        4      62     15.5 5.666667
Column 4                        6      58 9.666667 8.266667
Column 5                        6      60       10     16.8
Column 6                        6      53 8.833333 48.56667
Column 7                        4      67    16.75 0.916667
Column 8                        9      46 5.111111 8.861111



ANOVA
 Source of Variation      SS         df         MS        F     P-value   F crit
Between Groups         539.6763            7 77.09661 5.076268 0.000444 2.277143
Within Groups          546.7556           36 15.18765

Total                  1086.432           43




Table A.3: Class B Results


        CLASS B Results (correct answers per question, grouped by difficulty)

Group   Questions   Correct answers per question
A       1-4         13, 5, 6, 14
B       5-10        2, 6, 9, 4, 10, 9
C       11-13       15, 10, 14
D       14-17       7, 10, 12, 6
E       18-23       3, 10, 10, 4, 4, 12
F       24-27       4, 12, 10, 8
G       28-34       4, 10, 11, 6, 1, 3, 1
H       35-44       8, 4, 5, 2, 5, 4, 2, 3, 3, 1


Anova: Single Factor

SUMMARY
     Groups               Count     Sum   Average Variance
Column 1                        4      38      9.5 21.66667
Column 2                        6      40 6.666667 10.26667
Column 3                        3      39       13        7
Column 4                        4      35     8.75 7.583333
Column 5                        6      43 7.166667 15.36667
Column 6                        4      34      8.5 11.66667
Column 7                        7      36 5.142857 16.47619
Column 8                       10      37      3.7 4.011111



ANOVA
 Source of Variation      SS         df         MS        F     P-value   F crit
Between Groups          287.308            7   41.044 3.695126 0.004144 2.277143
Within Groups          399.8738           36 11.10761

Total                  687.1818           43




Table A.4 - Class A Results

Class A :

        1               2        3         4

        13              9        4         14
        12              10      13         9
        15              19      15         14
        14              14      12         10
        7               7        8         14
        11              9        9         3
        14              2       12         3
        3               16      17         16
        16              18      17         2
        8               9        8         1
        6               4        6         2



Anova: Single Factor

SUMMARY
    Groups             Count    Sum   Average Variance
Column 1                   11     119 10.81818 17.76364
Column 2                   11     117 10.63636 30.45455
Column 3                   11     121       11       19
Column 4                   11      88        8     34.8



ANOVA
Source of Variation    SS         df         MS        F     P-value   F crit
Between Groups       66.25            3 22.08333 0.865859 0.466735 2.838745
Within Groups     1020.182           40 25.50455

Total             1086.432           43




Table A.5 : Class B Results

             CLASS B



        1                 2        3           4

        13               5         6          14
        2                6         9          4
        10               9        15          10
        14               7        10          12
        6                3        10          10
        4                4        12          4
        12               10        8          4
        10               11        6          1
        3                1         8          4
        5                2         5          4
        2                3         3          1

Anova: Single Factor

SUMMARY
    Groups              Count     Sum       Average    Variance
Column 1                    11       81     7.363636   20.65455
Column 2                    11       61     5.545455   11.27273
Column 3                    11       92     8.363636   11.45455
Column 4                    11       68     6.181818   20.16364



ANOVA
Source of Variation       SS      df         MS            F     P-value   F crit
Between Groups         51.72727         3 17.24242      1.08536 0.366324 2.838745
Within Groups          635.4545        40 15.88636

Total                  687.1818        43




 
Creative Problem Solving Model for Promoting Achievement among Higher Seconda...
Creative Problem Solving Model for Promoting Achievement among Higher Seconda...Creative Problem Solving Model for Promoting Achievement among Higher Seconda...
Creative Problem Solving Model for Promoting Achievement among Higher Seconda...
 
Use the 5-step procedure for all problemsEach of the 5 steps.docx
Use the 5-step procedure for all problemsEach of the 5 steps.docxUse the 5-step procedure for all problemsEach of the 5 steps.docx
Use the 5-step procedure for all problemsEach of the 5 steps.docx
 
EFFECTIVENESS OF CO-OPERATIVE LEARNING METHOD IN LEARNING OF MATHEMATICS AMON...
EFFECTIVENESS OF CO-OPERATIVE LEARNING METHOD IN LEARNING OF MATHEMATICS AMON...EFFECTIVENESS OF CO-OPERATIVE LEARNING METHOD IN LEARNING OF MATHEMATICS AMON...
EFFECTIVENESS OF CO-OPERATIVE LEARNING METHOD IN LEARNING OF MATHEMATICS AMON...
 
The Effect of Problem-Solving Instructional Strategies on Students’ Learning ...
The Effect of Problem-Solving Instructional Strategies on Students’ Learning ...The Effect of Problem-Solving Instructional Strategies on Students’ Learning ...
The Effect of Problem-Solving Instructional Strategies on Students’ Learning ...
 
PSYC 354Homework 8Single-Sample T-TestWhen submitting this f.docx
PSYC 354Homework 8Single-Sample T-TestWhen submitting this f.docxPSYC 354Homework 8Single-Sample T-TestWhen submitting this f.docx
PSYC 354Homework 8Single-Sample T-TestWhen submitting this f.docx
 
Mb0050 research methodology
Mb0050   research methodologyMb0050   research methodology
Mb0050 research methodology
 
Day 11 t test for independent samples
Day 11 t test for independent samplesDay 11 t test for independent samples
Day 11 t test for independent samples
 

Recently uploaded

شرح الدروس المهمة لعامة الأمة للشيخ ابن باز
شرح الدروس المهمة لعامة الأمة  للشيخ ابن بازشرح الدروس المهمة لعامة الأمة  للشيخ ابن باز
شرح الدروس المهمة لعامة الأمة للشيخ ابن بازJoEssam
 
Call Girls in majnu ka tila Delhi 8264348440 ✅ call girls ❤️
Call Girls in majnu ka tila Delhi 8264348440 ✅ call girls ❤️Call Girls in majnu ka tila Delhi 8264348440 ✅ call girls ❤️
Call Girls in majnu ka tila Delhi 8264348440 ✅ call girls ❤️soniya singh
 
CALL ON ➥8923113531 🔝Call Girls Indira Nagar Lucknow Lucknow best Night Fun s...
CALL ON ➥8923113531 🔝Call Girls Indira Nagar Lucknow Lucknow best Night Fun s...CALL ON ➥8923113531 🔝Call Girls Indira Nagar Lucknow Lucknow best Night Fun s...
CALL ON ➥8923113531 🔝Call Girls Indira Nagar Lucknow Lucknow best Night Fun s...anilsa9823
 
The King Great Goodness Part 2 ~ Mahasilava Jataka (Eng. & Chi.).pptx
The King Great Goodness Part 2 ~ Mahasilava Jataka (Eng. & Chi.).pptxThe King Great Goodness Part 2 ~ Mahasilava Jataka (Eng. & Chi.).pptx
The King Great Goodness Part 2 ~ Mahasilava Jataka (Eng. & Chi.).pptxOH TEIK BIN
 
Pradeep Bhanot - Friend, Philosopher Guide And The Brand By Arjun Jani
Pradeep Bhanot - Friend, Philosopher Guide And The Brand By Arjun JaniPradeep Bhanot - Friend, Philosopher Guide And The Brand By Arjun Jani
Pradeep Bhanot - Friend, Philosopher Guide And The Brand By Arjun JaniPradeep Bhanot
 
Call Girls In East Of Kailash 9654467111 Short 1500 Night 6000
Call Girls In East Of Kailash 9654467111 Short 1500 Night 6000Call Girls In East Of Kailash 9654467111 Short 1500 Night 6000
Call Girls In East Of Kailash 9654467111 Short 1500 Night 6000Sapana Sha
 
(NISHA) Call Girls Sanath Nagar ✔️Just Call 7001035870✔️ HI-Fi Hyderabad Esco...
(NISHA) Call Girls Sanath Nagar ✔️Just Call 7001035870✔️ HI-Fi Hyderabad Esco...(NISHA) Call Girls Sanath Nagar ✔️Just Call 7001035870✔️ HI-Fi Hyderabad Esco...
(NISHA) Call Girls Sanath Nagar ✔️Just Call 7001035870✔️ HI-Fi Hyderabad Esco...Sanjna Singh
 
CALL ON ➥8923113531 🔝Call Girls Singar Nagar Lucknow best Night Fun service 👔
CALL ON ➥8923113531 🔝Call Girls Singar Nagar Lucknow best Night Fun service  👔CALL ON ➥8923113531 🔝Call Girls Singar Nagar Lucknow best Night Fun service  👔
CALL ON ➥8923113531 🔝Call Girls Singar Nagar Lucknow best Night Fun service 👔anilsa9823
 
Lucknow 💋 (Call Girls) in Lucknow | Book 8923113531 Extreme Naughty Call Girl...
Lucknow 💋 (Call Girls) in Lucknow | Book 8923113531 Extreme Naughty Call Girl...Lucknow 💋 (Call Girls) in Lucknow | Book 8923113531 Extreme Naughty Call Girl...
Lucknow 💋 (Call Girls) in Lucknow | Book 8923113531 Extreme Naughty Call Girl...anilsa9823
 
Sawwaf Calendar, 2024
Sawwaf Calendar, 2024Sawwaf Calendar, 2024
Sawwaf Calendar, 2024Bassem Matta
 
Top Astrologer, Kala ilam expert in Multan and Black magic specialist in Sind...
Top Astrologer, Kala ilam expert in Multan and Black magic specialist in Sind...Top Astrologer, Kala ilam expert in Multan and Black magic specialist in Sind...
Top Astrologer, Kala ilam expert in Multan and Black magic specialist in Sind...baharayali
 
Amil baba contact number Amil baba Kala jadu Best Amil baba Amil baba ki loca...
Amil baba contact number Amil baba Kala jadu Best Amil baba Amil baba ki loca...Amil baba contact number Amil baba Kala jadu Best Amil baba Amil baba ki loca...
Amil baba contact number Amil baba Kala jadu Best Amil baba Amil baba ki loca...Amil Baba Naveed Bangali
 
madina book to learn arabic part1
madina   book   to  learn  arabic  part1madina   book   to  learn  arabic  part1
madina book to learn arabic part1JoEssam
 
Top Astrologer in UK Best Vashikaran Specialist in England Amil baba Contact ...
Top Astrologer in UK Best Vashikaran Specialist in England Amil baba Contact ...Top Astrologer in UK Best Vashikaran Specialist in England Amil baba Contact ...
Top Astrologer in UK Best Vashikaran Specialist in England Amil baba Contact ...Amil Baba Naveed Bangali
 
Study of the Psalms Chapter 1 verse 2 - wanderean
Study of the Psalms Chapter 1 verse 2 - wandereanStudy of the Psalms Chapter 1 verse 2 - wanderean
Study of the Psalms Chapter 1 verse 2 - wandereanmaricelcanoynuay
 
Dgital-Self-UTS-exploring-the-digital-self.pptx
Dgital-Self-UTS-exploring-the-digital-self.pptxDgital-Self-UTS-exploring-the-digital-self.pptx
Dgital-Self-UTS-exploring-the-digital-self.pptxsantosem70
 
The_Chronological_Life_of_Christ_Part_98_Jesus_Frees_Us
The_Chronological_Life_of_Christ_Part_98_Jesus_Frees_UsThe_Chronological_Life_of_Christ_Part_98_Jesus_Frees_Us
The_Chronological_Life_of_Christ_Part_98_Jesus_Frees_UsNetwork Bible Fellowship
 

Recently uploaded (20)

Rohini Sector 21 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 21 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 21 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 21 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
شرح الدروس المهمة لعامة الأمة للشيخ ابن باز
شرح الدروس المهمة لعامة الأمة  للشيخ ابن بازشرح الدروس المهمة لعامة الأمة  للشيخ ابن باز
شرح الدروس المهمة لعامة الأمة للشيخ ابن باز
 
Call Girls in majnu ka tila Delhi 8264348440 ✅ call girls ❤️
Call Girls in majnu ka tila Delhi 8264348440 ✅ call girls ❤️Call Girls in majnu ka tila Delhi 8264348440 ✅ call girls ❤️
Call Girls in majnu ka tila Delhi 8264348440 ✅ call girls ❤️
 
CALL ON ➥8923113531 🔝Call Girls Indira Nagar Lucknow Lucknow best Night Fun s...
CALL ON ➥8923113531 🔝Call Girls Indira Nagar Lucknow Lucknow best Night Fun s...CALL ON ➥8923113531 🔝Call Girls Indira Nagar Lucknow Lucknow best Night Fun s...
CALL ON ➥8923113531 🔝Call Girls Indira Nagar Lucknow Lucknow best Night Fun s...
 
The King Great Goodness Part 2 ~ Mahasilava Jataka (Eng. & Chi.).pptx
The King Great Goodness Part 2 ~ Mahasilava Jataka (Eng. & Chi.).pptxThe King Great Goodness Part 2 ~ Mahasilava Jataka (Eng. & Chi.).pptx
The King Great Goodness Part 2 ~ Mahasilava Jataka (Eng. & Chi.).pptx
 
Pradeep Bhanot - Friend, Philosopher Guide And The Brand By Arjun Jani
Pradeep Bhanot - Friend, Philosopher Guide And The Brand By Arjun JaniPradeep Bhanot - Friend, Philosopher Guide And The Brand By Arjun Jani
Pradeep Bhanot - Friend, Philosopher Guide And The Brand By Arjun Jani
 
Call Girls In East Of Kailash 9654467111 Short 1500 Night 6000
Call Girls In East Of Kailash 9654467111 Short 1500 Night 6000Call Girls In East Of Kailash 9654467111 Short 1500 Night 6000
Call Girls In East Of Kailash 9654467111 Short 1500 Night 6000
 
Call Girls In Nehru Place 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In Nehru Place 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICECall Girls In Nehru Place 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In Nehru Place 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
 
(NISHA) Call Girls Sanath Nagar ✔️Just Call 7001035870✔️ HI-Fi Hyderabad Esco...
(NISHA) Call Girls Sanath Nagar ✔️Just Call 7001035870✔️ HI-Fi Hyderabad Esco...(NISHA) Call Girls Sanath Nagar ✔️Just Call 7001035870✔️ HI-Fi Hyderabad Esco...
(NISHA) Call Girls Sanath Nagar ✔️Just Call 7001035870✔️ HI-Fi Hyderabad Esco...
 
CALL ON ➥8923113531 🔝Call Girls Singar Nagar Lucknow best Night Fun service 👔
CALL ON ➥8923113531 🔝Call Girls Singar Nagar Lucknow best Night Fun service  👔CALL ON ➥8923113531 🔝Call Girls Singar Nagar Lucknow best Night Fun service  👔
CALL ON ➥8923113531 🔝Call Girls Singar Nagar Lucknow best Night Fun service 👔
 
Lucknow 💋 (Call Girls) in Lucknow | Book 8923113531 Extreme Naughty Call Girl...
Lucknow 💋 (Call Girls) in Lucknow | Book 8923113531 Extreme Naughty Call Girl...Lucknow 💋 (Call Girls) in Lucknow | Book 8923113531 Extreme Naughty Call Girl...
Lucknow 💋 (Call Girls) in Lucknow | Book 8923113531 Extreme Naughty Call Girl...
 
Sawwaf Calendar, 2024
Sawwaf Calendar, 2024Sawwaf Calendar, 2024
Sawwaf Calendar, 2024
 
young Whatsapp Call Girls in Adarsh Nagar🔝 9953056974 🔝 escort service
young Whatsapp Call Girls in Adarsh Nagar🔝 9953056974 🔝 escort serviceyoung Whatsapp Call Girls in Adarsh Nagar🔝 9953056974 🔝 escort service
young Whatsapp Call Girls in Adarsh Nagar🔝 9953056974 🔝 escort service
 
Top Astrologer, Kala ilam expert in Multan and Black magic specialist in Sind...
Top Astrologer, Kala ilam expert in Multan and Black magic specialist in Sind...Top Astrologer, Kala ilam expert in Multan and Black magic specialist in Sind...
Top Astrologer, Kala ilam expert in Multan and Black magic specialist in Sind...
 
Amil baba contact number Amil baba Kala jadu Best Amil baba Amil baba ki loca...
Amil baba contact number Amil baba Kala jadu Best Amil baba Amil baba ki loca...Amil baba contact number Amil baba Kala jadu Best Amil baba Amil baba ki loca...
Amil baba contact number Amil baba Kala jadu Best Amil baba Amil baba ki loca...
 
madina book to learn arabic part1
madina   book   to  learn  arabic  part1madina   book   to  learn  arabic  part1
madina book to learn arabic part1
 
Top Astrologer in UK Best Vashikaran Specialist in England Amil baba Contact ...
Top Astrologer in UK Best Vashikaran Specialist in England Amil baba Contact ...Top Astrologer in UK Best Vashikaran Specialist in England Amil baba Contact ...
Top Astrologer in UK Best Vashikaran Specialist in England Amil baba Contact ...
 
Study of the Psalms Chapter 1 verse 2 - wanderean
Study of the Psalms Chapter 1 verse 2 - wandereanStudy of the Psalms Chapter 1 verse 2 - wanderean
Study of the Psalms Chapter 1 verse 2 - wanderean
 
Dgital-Self-UTS-exploring-the-digital-self.pptx
Dgital-Self-UTS-exploring-the-digital-self.pptxDgital-Self-UTS-exploring-the-digital-self.pptx
Dgital-Self-UTS-exploring-the-digital-self.pptx
 
The_Chronological_Life_of_Christ_Part_98_Jesus_Frees_Us
The_Chronological_Life_of_Christ_Part_98_Jesus_Frees_UsThe_Chronological_Life_of_Christ_Part_98_Jesus_Frees_Us
The_Chronological_Life_of_Christ_Part_98_Jesus_Frees_Us
 

Statistics Group Project: Fraud Detection

YOUNG INDIA FELLOWSHIP
Statistics Course: Group Project
Members: Abhishek Chopra, Adhiraj Sarmah, Kshitij Garg, Mahesh Jakhotia, Tulasi Prasad Chaudhary
7/25/2011

This group project is based on a real case study of test papers from Atlanta primary schools. Growing pressure on teachers to improve their classes' test performance has led to malpractice. Our task is to devise methods for detecting fraud, if any was committed, in the case below.
Contents
1) Problem Statement
2) Logical Analysis
3) Inference
4) Our Interpretation of the Cheating Process
5) Statistical Approaches
6) ANOVA
7) Pictorial Method
8) The Wilcoxon Rank Sum Test
9) Appendix
   a. Table A.1: Division of questions into groups based on Approach 1 used in the ANOVA test
   b. Table A.2: Class A Results (Approach 1, eight groups)
   c. Table A.3: Class B Results (Approach 1, eight groups)
   d. Table A.4: Class A Results (circular grouping, four groups)
   e. Table A.5: Class B Results (circular grouping, four groups)
GROUP PROJECT STATISTICS – FRAUD DETECTION

Problem Statement:
We were given two data sets from two different classrooms and asked to devise and carry out an analysis to determine whether a teacher committed fraud in one or both of the classrooms. There are four possible scenarios:
1) The data of both A and B have been tampered with.
2) The data of neither A nor B have been tampered with.
3) A is fraudulent; B is not.
4) B is fraudulent; A is not.
We have summarized our reasoning in this document and demonstrated it with the Excel sheets attached in the folder. We used several approaches to reach a conclusion; each methodology has its own assumptions, strengths and weaknesses.

Logical Analysis:
STEP 1: We calculated the total number of correct answers to every question in both classes. Because we analysed the data student by student and question by question, scoring each correct answer as '1', this total is also the number of students who answered each question correctly.
STEP 2: We then divided the total number of correct answers of the entire class by the number of students, giving the mean number of correct answers per student for both classes.
STEP 3: Using this analysis, we plotted a line graph for each class with the questions on the X-axis and class performance on the Y-axis. This gives a broad picture of whether there is any evidence of fraud.
# We found that in Class A, questions 30 to 36 clearly show an anomaly.
STEP 4: We then focused on the anomaly region, examining questions 30-36 in both classes for abnormal answer patterns.
# In Class A, one particular group of 16 students showed a very clear, uniform pattern of correct answers on questions 30-36. No such pattern appeared in Class B.
STEP 5: We calculated the average score (i.e. the average number of correct answers) of each of these 16 Class A students, including questions 30-36. The mean score of these 16 students was 46%. For Class B, the mean score of all students was 38%.
STEP 6: We then recalculated the average score of each of these 16 Class A students EXCLUDING questions 30-36. Their mean score DECREASED to 42% (a decrease of 4 percentage points). For Class B, the mean score of all students INCREASED to 40% (an increase of 2 percentage points).

INFERENCE:
Questions 30 to 36 therefore give reasonable grounds to believe that some form of cheating or tampering took place on these questions.

Our Interpretation of the Cheating Process
1) On questions 30 to 36, the graphs show 16 students performing consistently above the average of the other students. This amounts to 16 × 6 = 96 answers that have probably been tampered with.
2) Possible reasons for choosing that particular set of questions (30 to 36):
a) Since the difficulty of the questions is given to increase through the test, more students would be expected to answer the first half correctly, where difficulty is low, than the second half, where it is higher.
b) It would therefore be shrewd for a teacher to tamper with answers in the second half, since most students would be expected to answer the first half correctly anyway. Even within the second half, it would be wiser to avoid the last few questions: they are the hardest, and an unusually high number of correct answers there would be exposed immediately. The logical choice is a block near the start of the second half, well before the last few questions.
3) Choosing a consecutive block of questions also reduces the time needed to change the answers by hand, which matters given the limited time a teacher or invigilator usually has. And 96 altered answers are enough to shift the class performance noticeably, by about 4%, as our analysis later showed.

Statistical Approaches Used:
1) ANOVA method: We first divided the questions of each class into groups and applied ANOVA to test whether the groups share the same mean. If one group's mean differed, we could conclude that its data had been tampered with, since it disturbs the distribution of the whole class. We used two approaches to form the groups. The Tukey method can then be used to identify the groups whose means deviate.
2) Pictorial distribution: We plotted a graph with the questions on the X-axis and class performance on the Y-axis. In the Class A graph, the plot between questions 30 and 36 was flat and higher than the performance on the other questions. On a pictorial basis, we conclude that fraud was committed on these questions.
3) The Wilcoxon rank sum test: To use the samples without assuming normality, we can apply the rank sum approach for non-normal distributions discussed in section 9.2 of the textbook. The other tests rest on mathematical assumptions that the given data do not satisfy, whereas this test requires weaker assumptions.

Approach 1: ANOVA Approach
To compare the means of several groups, ANOVA is preferred to multiple t-tests because it produces a single test statistic for comparing all the means, so the overall risk of a Type I error can be controlled. If we ran many t-tests, each at a given α level, we would not know the overall risk of a Type I error; the more tests one runs, the greater the risk of a false positive conclusion somewhere among them.
Initially we divided the questions of Class A into groups according to their difficulty, measured by the number of correct answers they received. The class gave 445 correct answers in total, so we split the questions into eight groups of roughly equal area, about 445/8 ≈ 56 correct answers each. Consecutive questions were grouped so that the group boundaries fall near the points where the cumulative number of correct answers reaches each multiple of 56. The grouping is shown in Appendix Table A.1. We then applied ANOVA to these groups to test whether their means are equal.
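Both the pictorial check and the equal-area grouping start from the number of students who answered each question correctly. The Python sketch below recovers those counts from the Class A cumulative sums in Appendix Table A.1, then (a) finds the run of consecutive questions lying furthest above a straight-line trend, and (b) splits the questions into eight groups of roughly 56 correct answers each. The straight-line trend (standing in for "difficulty increases with question number"), the run length of 6, and the rule "question i joins group ⌈cumulative sum / 56⌉" are our own assumptions, not the exact procedure in the project's Excel sheets. In particular, this grouping rule reproduces Table A.1 except that it places question 35 in the eighth group rather than the seventh.

```python
# Cumulative number of correct answers after each question, Class A (Table A.1).
CUMULATIVE_A = [13, 22, 26, 40, 52, 62, 75, 84, 99, 118, 133, 147, 161, 175, 187,
                197, 204, 211, 219, 233, 244, 253, 262, 265, 279, 281, 293, 296,
                299, 315, 332, 348, 364, 382, 399, 401, 409, 418, 426, 427, 433,
                437, 443, 445]


def per_question_counts(cumulative):
    """Undo the running total: count for question i = cum[i] - cum[i - 1]."""
    return [cur - prev for cur, prev in zip(cumulative, [0] + cumulative[:-1])]


def most_anomalous_run(counts, width):
    """Return (first, last) 1-based question numbers of the `width` consecutive
    questions whose counts lie furthest above a least-squares straight line
    fitted to count vs. question number (assumed model of rising difficulty)."""
    n = len(counts)
    x_bar = (n + 1) / 2
    y_bar = sum(counts) / n
    xs = range(1, n + 1)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, counts))
             / sum((x - x_bar) ** 2 for x in xs))
    residuals = [y - (y_bar + slope * (x - x_bar)) for x, y in zip(xs, counts)]
    start = max(range(n - width + 1), key=lambda s: sum(residuals[s:s + width]))
    return start + 1, start + width


def equal_area_groups(cumulative, n_groups, target):
    """Hypothetical rule: question i joins group ceil(cum[i] / target),
    capped at n_groups. Returns one list of question numbers per group."""
    groups = [[] for _ in range(n_groups)]
    for question, cum in enumerate(cumulative, start=1):
        group = min(-(-cum // target), n_groups)  # integer ceiling division
        groups[group - 1].append(question)
    return groups


counts_a = per_question_counts(CUMULATIVE_A)
print(most_anomalous_run(counts_a, width=6))  # (30, 35)
print([len(g) for g in equal_area_groups(CUMULATIVE_A, 8, 56)])
# [5, 4, 4, 6, 6, 6, 3, 10]
```

With this data the run found is questions 30 to 35, each answered correctly by 16 to 18 students, while question 36 drops back to 2 correct answers; six questions per student is consistent with the 16 × 6 count used above.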
Test hypotheses:
H0: μ1 = μ2 = … = μ8
Ha: The means are not all equal (showing that one or more groups may have been tampered with, shifting its mean away from those of the other groups).

Results for Class A (Anova: Single Factor)

SUMMARY
Groups    Count  Sum  Average   Variance
Column 1  5      52   10.4      16.3
Column 2  4      47   11.75     7.583333
Column 3  4      62   15.5      5.666667
Column 4  6      58   9.666667  8.266667
Column 5  6      60   10        16.8
Column 6  6      53   8.833333  48.56667
Column 7  4      67   16.75     0.916667
Column 8  9      46   5.111111  8.861111

ANOVA
Source of Variation  SS        df  MS        F         P-value   F crit
Between Groups       539.6763  7   77.09661  5.076268  0.000444  2.277143
Within Groups        546.7556  36  15.18765
Total                1086.432  43

ANOVA Results for Class B
Source of Variation  SS        df  MS        F         P-value   F crit
Between Groups       287.308   7   41.044    3.695126  0.004144  2.277143
Within Groups        399.8738  36  11.10761
Total                687.1818  43

For Class A, the F statistic for "between groups" is 5.08, higher than the critical F value of 2.28 (α = 0.05), so the null hypothesis that the means are equal can be rejected. (The same test also rejects H0 for Class B: F = 3.70, p = 0.004.) However, this argument has a flaw: the group sizes differ, and ANOVA is robust to unequal group variances only when the groups are of equal size, so the equal-variance assumption matters here. A second requirement is that the groups be independent. Hence we need a different approach that satisfies these assumptions.

We therefore divided the questions so that every group contains questions of all difficulty levels, using a circular approach to form four groups: questions 1 to 4 go to groups 1 to 4 respectively, questions 5 to 8 again to groups 1 to 4, and so on. Each group thus contains questions of every type, making the groups homogeneous. See Appendix Table A.4 (Class A Results) for details of the grouping.

Assumptions for ANOVA:
1) The sample measurements are selected from normal populations.
2) The samples are independent.
3) The unknown population mean and variance of the measurements in sample i are μi and σ² respectively (i.e. all groups share a common variance σ²).

We now explain why our current approach approximately satisfies these assumptions. Normality is the least crucial: ANOVA is a test on means, so the central limit theorem comes into effect. The central limit theorem may not work for small samples, so we used as large a sample per group as the data allow.
With a sample size of 11 per group, the central limit theorem can be applied approximately. One alternative is the Kruskal-Wallis rank test discussed in section 10.2 of the textbook, which does not require the populations to be normal. Since this method was not taught in class, we set it aside and use ANOVA for the current problem.
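The single-factor ANOVA tables above were produced in Excel. As a cross-check, a minimal Python sketch of the same computation (between- and within-group sums of squares and the F ratio) is shown below, applied to the eight Approach 1 groups listed in Appendix Tables A.2 and A.3. The function name is ours; the p-value and critical value are left to an F table or a statistics package, since the Python standard library has no F distribution.

```python
# Correct answers per question in each Approach 1 group (Appendix Tables A.2, A.3).
CLASS_A_GROUPS = [
    [13, 9, 4, 14, 12], [10, 13, 9, 15], [19, 15, 14, 14], [14, 12, 10, 7, 7, 8],
    [14, 11, 9, 9, 3, 14], [2, 12, 3, 3, 16, 17], [16, 16, 18, 17],
    [2, 8, 9, 8, 1, 6, 4, 6, 2],
]
CLASS_B_GROUPS = [
    [13, 5, 6, 14], [2, 6, 9, 4, 10, 9], [15, 10, 14], [7, 10, 12, 6],
    [3, 10, 10, 4, 4, 12], [4, 12, 10, 8], [4, 10, 11, 6, 1, 3, 1],
    [8, 4, 5, 2, 5, 4, 2, 3, 3, 1],
]


def one_way_anova(groups):
    """Single-factor ANOVA. Returns (SS_between, SS_within, df_between,
    df_within, F), the quantities in Excel's 'Anova: Single Factor' table."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


for name, groups in [("A", CLASS_A_GROUPS), ("B", CLASS_B_GROUPS)]:
    ss_b, ss_w, df_b, df_w, f = one_way_anova(groups)
    print(f"Class {name}: SSB={ss_b:.4f} SSW={ss_w:.4f} df=({df_b}, {df_w}) F={f:.4f}")
```

Run on the data above, this should reproduce the Excel F values (5.076 for Class A and 3.695 for Class B).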
The assumption of equal variances is important when the sample sizes differ substantially. Because all our groups have the same size, unequal variances are not a serious problem here: when all the n's are equal, the effect of even grossly unequal variances is minimal.
As for independence, each group contains a homogeneous mix of questions from easy to hard, so we treat the groups as independent of one another. The circular approach ensures that each group has a set of questions similar to those of the other groups.

Test hypotheses:
H0: μ1 = μ2 = μ3 = μ4
Ha: The means are not all equal (showing that one or more groups may have been tampered with, shifting its mean away from those of the other groups).

CLASS A Results (correct answers per question; group k holds questions k, k+4, k+8, …)
Group 1 (Q1, Q5, …, Q41): 13 12 15 14 7 11 14 3 16 8 6
Group 2 (Q2, Q6, …, Q42): 9 10 19 14 7 9 2 16 18 9 4
Group 3 (Q3, Q7, …, Q43): 4 13 15 12 8 9 12 17 17 8 6
Group 4 (Q4, Q8, …, Q44): 14 9 14 10 14 3 3 16 2 1 2

Anova: Single Factor
SUMMARY
Groups    Count  Sum  Average   Variance
Column 1  11     119  10.81818  17.76364
Column 2  11     117  10.63636  30.45455
Column 3  11     121  11        19
Column 4  11     88   8         34.8

ANOVA
Source of Variation  SS        df  MS        F         P-value   F crit
Between Groups       66.25     3   22.08333  0.865859  0.466735  2.838745
Within Groups        1020.182  40  25.50455
Total                1086.432  43
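The circular grouping amounts to dealing the questions out like cards: question 1 to group 1, question 2 to group 2, and so on, wrapping around after group 4. A minimal Python sketch is shown below, assuming the per-question counts are listed in question order (for Class A this is the table above read row by row, which agrees with the cumulative sums in Table A.1). The helper names are ours, and the F ratio uses the usual single-factor formula.

```python
# Correct answers per question for questions 1..44, Class A (read row by row).
COUNTS_A = [13, 9, 4, 14, 12, 10, 13, 9, 15, 19, 15, 14, 14, 14, 12, 10, 7, 7,
            8, 14, 11, 9, 9, 3, 14, 2, 12, 3, 3, 16, 17, 16, 16, 18, 17, 2,
            8, 9, 8, 1, 6, 4, 6, 2]


def circular_groups(values, n_groups):
    """Deal values round-robin: item k (0-based) goes to group k % n_groups."""
    return [values[g::n_groups] for g in range(n_groups)]


def f_ratio(groups):
    """Single-factor ANOVA F = MS_between / MS_within."""
    values = [x for g in groups for x in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_b / (len(groups) - 1)) / (ss_w / (len(values) - len(groups)))


groups = circular_groups(COUNTS_A, 4)
print([sum(g) for g in groups])   # [119, 117, 121, 88]
print(round(f_ratio(groups), 4))  # 0.8659
```

Because every group receives one question from each block of four, each group spans the full range of difficulty, which is the homogeneity the circular approach relies on.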
ANOVA Results for Class B
Source of Variation  SS        df  MS        F         P-value   F crit
Between Groups       51.72727  3   17.24242  1.08536   0.366324  2.838745
Within Groups        635.4545  40  15.88636
Total                687.1818  43

With this homogeneous grouping, the F statistic is below the critical value of 2.84 for both classes (0.87 for Class A and 1.09 for Class B), so H0 is not rejected at α = 0.05.

The ANOVA F test only tells us whether or not to reject H0; rejecting the hypothesis that the means are equal does not indicate which means differ. For that we can use the Tukey method, which compares the group means pairwise and can point out the specific group in which tampering was done.

Approach 2: The Pictorial Method
The frequency curve is approximately normal for Class B, but skewed towards higher scores for Class A. We attribute this skew to tampering or cheating by the teacher. The mean for Class A (mean = 20.23) is much higher than that for Class B (mean = 16.78).
The plot of "question vs. number of students who answered it correctly" shows that questions 30 to 36 in Class A contain the tampered data: these questions do not follow the general trend and form a peak in the middle of an otherwise decreasing curve.
We therefore removed questions 30 to 36 from both classes and plotted the remaining questions again. Both curves now come out approximately normal, with no skew in Class A. The mean for Class A (mean = 16.32) has also fallen and is now comparable to that of Class B (mean = 15.44). It is therefore reasonable to conclude that tampering was done on questions 30 to 36 in Class A.

Approach 3: The Wilcoxon Rank Sum Test
As noted above, when the normality assumption cannot be relied on we can use the rank sum approach for non-normal distributions discussed in section 9.2 of the textbook, which requires weaker mathematical assumptions than the other tests.
This test requires the following condition:
1) Under the null hypothesis the two distributions are identical, but not necessarily normal.
The null hypothesis is that the two population distributions are identical; the alternative is that the values of one group tend to be larger than those of the other (its mean is larger). Rejecting the null hypothesis means the two groups are not identically distributed, which would suggest that fraud was committed in one of them. The decision is made by comparing the rank sum statistic with the tabulated critical values.
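The project did not carry out this test. For illustration only, a minimal Python sketch of the rank sum statistic is given below: the two samples are pooled and ranked (tied values receive the average of their ranks), W is the sum of the ranks of the first sample, and W is standardised with its null mean n1(n1 + n2 + 1)/2 and variance n1·n2(n1 + n2 + 1)/12. The large-sample normal approximation without a tie correction is an assumption of this sketch; for small samples the critical values tabulated in the textbook (section 9.2) should be used instead. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the block of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def rank_sum_test(sample_1, sample_2):
    """Return (W, z): W is the rank sum of sample_1 in the pooled data and
    z its large-sample standard score (normal approximation, no tie correction)."""
    n1, n2 = len(sample_1), len(sample_2)
    ranks = average_ranks(list(sample_1) + list(sample_2))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return w, (w - mean_w) / sd_w


# Toy example (not the project data): every value in sample_1 is smaller.
w, z = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(w, round(z, 3))  # 15.0 -2.611
```

For a one-sided test at α = 0.05, H0 would be rejected when z lies beyond 1.645 in the direction of the alternative.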
Here the two groups could be the data from the two classes, or different groups of questions formed in a homogeneous manner. However, since this test was not covered in the syllabus, we have not solved the problem with this method.

APPENDIX

Table A.1: Division of questions into groups based on Approach 1 used in the ANOVA test
Cumulative sum of correct answers after each question (Class A), by group:
Group 1 (Q1-Q5): 13 22 26 40 52
Group 2 (Q6-Q9): 62 75 84 99
Group 3 (Q10-Q13): 118 133 147 161
Group 4 (Q14-Q19): 175 187 197 204 211 219
Group 5 (Q20-Q25): 233 244 253 262 265 279
Group 6 (Q26-Q31): 281 293 296 299 315 332
Group 7 (Q32-Q35): 348 364 382 399
Group 8 (Q36-Q44): 401 409 418 426 427 433 437 443 445
Table A.2: Class A Results
CLASS A (correct answers per question in each group)
Group A (Q1-Q5): 13 9 4 14 12
Group B (Q6-Q9): 10 13 9 15
Group C (Q10-Q13): 19 15 14 14
Group D (Q14-Q19): 14 12 10 7 7 8
Group E (Q20-Q25): 14 11 9 9 3 14
Group F (Q26-Q31): 2 12 3 3 16 17
Group G (Q32-Q35): 16 16 18 17
Group H (Q36-Q44): 2 8 9 8 1 6 4 6 2

SUMMARY
Groups    Count  Sum  Average   Variance
Column 1  5      52   10.4      16.3
Column 2  4      47   11.75     7.583333
Column 3  4      62   15.5      5.666667
Column 4  6      58   9.666667  8.266667
Column 5  6      60   10        16.8
Column 6  6      53   8.833333  48.56667
Column 7  4      67   16.75     0.916667
Column 8  9      46   5.111111  8.861111

ANOVA
Source of Variation  SS        df  MS        F         P-value   F crit
Between Groups       539.6763  7   77.09661  5.076268  0.000444  2.277143
Within Groups        546.7556  36  15.18765
Total                1086.432  43
Table A.3: Class B Results
CLASS B (correct answers per question in each group)
Group A: 13 5 6 14
Group B: 2 6 9 4 10 9
Group C: 15 10 14
Group D: 7 10 12 6
Group E: 3 10 10 4 4 12
Group F: 4 12 10 8
Group G: 4 10 11 6 1 3 1
Group H: 8 4 5 2 5 4 2 3 3 1

Anova: Single Factor
SUMMARY
Groups    Count  Sum  Average   Variance
Column 1  4      38   9.5       21.66667
Column 2  6      40   6.666667  10.26667
Column 3  3      39   13        7
Column 4  4      35   8.75      7.583333
Column 5  6      43   7.166667  15.36667
Column 6  4      34   8.5       11.66667
Column 7  7      36   5.142857  16.47619
Column 8  10     37   3.7       4.011111

ANOVA
Source of Variation  SS        df  MS        F         P-value   F crit
Between Groups       287.308   7   41.044    3.695126  0.004144  2.277143
Within Groups        399.8738  36  11.10761
Total                687.1818  43
Table A.4: Class A Results
Class A (circular grouping; group k holds questions k, k+4, k+8, …)
Group 1 (Q1, Q5, …, Q41): 13 12 15 14 7 11 14 3 16 8 6
Group 2 (Q2, Q6, …, Q42): 9 10 19 14 7 9 2 16 18 9 4
Group 3 (Q3, Q7, …, Q43): 4 13 15 12 8 9 12 17 17 8 6
Group 4 (Q4, Q8, …, Q44): 14 9 14 10 14 3 3 16 2 1 2

Anova: Single Factor
SUMMARY
Groups    Count  Sum  Average   Variance
Column 1  11     119  10.81818  17.76364
Column 2  11     117  10.63636  30.45455
Column 3  11     121  11        19
Column 4  11     88   8         34.8

ANOVA
Source of Variation  SS        df  MS        F         P-value   F crit
Between Groups       66.25     3   22.08333  0.865859  0.466735  2.838745
Within Groups        1020.182  40  25.50455
Total                1086.432  43
Table A.5: Class B Results
CLASS B (circular grouping)
Group 1: 13 2 10 14 6 4 12 10 3 5 2
Group 2: 5 6 9 7 3 4 10 11 1 2 3
Group 3: 6 9 15 10 10 12 8 6 8 5 3
Group 4: 14 4 10 12 10 4 4 1 4 4 1

Anova: Single Factor
SUMMARY
Groups    Count  Sum  Average   Variance
Column 1  11     81   7.363636  20.65455
Column 2  11     61   5.545455  11.27273
Column 3  11     92   8.363636  11.45455
Column 4  11     68   6.181818  20.16364

ANOVA
Source of Variation  SS        df  MS        F         P-value   F crit
Between Groups       51.72727  3   17.24242  1.08536   0.366324  2.838745
Within Groups        635.4545  40  15.88636
Total                687.1818  43