Math ia farmville final

IB Math SL Internal Assessment: Farmville Statistics
Arielle Tenorio, Period 6

Farmville is a popular computer game hosted by the social networking website Facebook. The game allows players to manage a virtual farm by plowing, planting, growing, and harvesting on their virtual farmland. Crops, trees, and livestock can be purchased with the “FarmCoins” that are earned by harvesting. The game also has levels, which are reached by earning a certain number of experience points. Players at higher levels tend to have larger farms, more crops, and more FarmCoins than those at lower levels.

This assignment will examine the relationship between the number of trees a Farmville player has and the level they have reached in the game. It is predicted that there will be a positive relationship. This prediction can be confirmed or rejected by analyzing the collected data. First, a scatter plot with a line of linear regression will be produced to display the trend of the data, and the correlation coefficient for the two variables will be determined. Next, a box and whisker plot will compare the number of trees owned by the highest-ranking and the lowest-ranking players surveyed. Finally, a chi-squared test for independence will determine whether the two variables are associated or independent.
Data were collected from 25 Farmville players. After logging onto Facebook and opening the Farmville game, the virtual farms of 25 “Friends” were visited and the number of trees on each farm was counted. The collected values were organized into a table.

Figure 1: Collected Data

 #    Farmville Level    Number of Trees
 1           7                  7
 2           8                  9
 3           9                  9
 4           9                 16
 5          10                 11
 6          13                 17
 7          13                 23
 8          13                 45
 9          15                 19
10          15                 20
11          16                 33
12          16                 35
13          16                 16
14          18                 21
15          19                 18
16          20                 20
17          22                 79
18          22                 35
19          23                 41
20          23                 28
21          25                 62
22          26                 35
23          28                 44
24          31                 94
25          34                 40

Figure 1: This table displays the data that was collected.

From the table, it can be observed that the number of trees generally increases as the level increases. These values will be plotted on a scatter plot.
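The column totals that the regression and correlation calculations rely on can be double-checked with a short Python sketch. The list `data` simply re-enters Figure 1, and the variable names are illustrative choices rather than anything from the original assessment.

```python
# Figure 1 data: (Farmville level x, number of trees y) for the 25 surveyed players
data = [
    (7, 7), (8, 9), (9, 9), (9, 16), (10, 11),
    (13, 17), (13, 23), (13, 45), (15, 19), (15, 20),
    (16, 33), (16, 35), (16, 16), (18, 21), (19, 18),
    (20, 20), (22, 79), (22, 35), (23, 41), (23, 28),
    (25, 62), (26, 35), (28, 44), (31, 94), (34, 40),
]

n = len(data)

# Column totals of x, y, xy, x^2 and y^2
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_xy = sum(x * y for x, y in data)
sum_x2 = sum(x * x for x, _ in data)
sum_y2 = sum(y * y for _, y in data)

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # → 25 451 777 16671 9413 35299
print(sum_x / n, sum_y / n)                     # → 18.04 31.08 (means of x and y)
```

The printed totals match the Σ and mean rows used in the hand calculation of the regression line.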
A scatter plot visually displays the relationship between two variables on a two-dimensional graph. A line of linear regression, or trend line, can be fitted to confirm the observed relationship: the more closely the data points cluster around the trend line, the stronger the correlation between the variables.

Figure 2: Scatter Plot and Linear Regression Line
[Chart: "The Relationship Between Level and Number of Trees"; x-axis: Farmville Level; y-axis: Number of Trees; trend line y = 2.0783x − 6.4127]
Figure 2: This scatter plot shows a positive relationship between a player's Farmville level and the number of trees the player has. The line of linear regression was produced using Microsoft Excel. The calculations to find this equation by hand are shown below.

Line of Linear Regression: The formula for the regression line of y on x is

$$y - \bar{y} = \frac{S_{xy}}{S_x^2}(x - \bar{x})$$

where $\bar{y}$ is the mean of the y values, $\bar{x}$ is the mean of the x values, $S_{xy}$ is the covariance of x and y, and $S_x^2$ is the variance of x (the square of the standard deviation of x). To find these values, the data was organized into a table, shown in Figure 3.
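As a cross-check on the hand calculation, the regression formula above can be evaluated directly from the Figure 1 data. This is a minimal sketch, assuming population (divide-by-n) covariance and variance, which is what the hand calculation uses; the same quantities also give Pearson's correlation coefficient, which is worked out by hand further on. Variable names are illustrative.

```python
from math import sqrt

# Figure 1 data: (Farmville level x, number of trees y)
data = [
    (7, 7), (8, 9), (9, 9), (9, 16), (10, 11),
    (13, 17), (13, 23), (13, 45), (15, 19), (15, 20),
    (16, 33), (16, 35), (16, 16), (18, 21), (19, 18),
    (20, 20), (22, 79), (22, 35), (23, 41), (23, 28),
    (25, 62), (26, 35), (28, 44), (31, 94), (34, 40),
]

n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n

# Covariance S_xy and variances S_x^2, S_y^2 (divide by n, as in the hand calculation)
s_xy = sum(x * y for x, y in data) / n - x_bar * y_bar
s_x2 = sum(x * x for x, _ in data) / n - x_bar ** 2
s_y2 = sum(y * y for _, y in data) / n - y_bar ** 2

# Regression line of y on x: y - y_bar = (S_xy / S_x^2)(x - x_bar)
slope = s_xy / s_x2
intercept = y_bar - slope * x_bar

# Pearson's correlation coefficient r = S_xy / (S_x * S_y)
r = s_xy / sqrt(s_x2 * s_y2)

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")  # → slope = 2.0783, intercept = -6.4127
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")                    # → r = 0.7033, r^2 = 0.4947
```

The slope and intercept agree with the Excel trend line in Figure 2.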
Figure 3: Table for Linear Regression Line

  #    Level (x)   Trees (y)      xy       x²       y²
  1        7           7           49       49       49
  2        8           9           72       64       81
  3        9           9           81       81       81
  4        9          16          144       81      256
  5       10          11          110      100      121
  6       13          17          221      169      289
  7       13          23          299      169      529
  8       13          45          585      169     2025
  9       15          19          285      225      361
 10       15          20          300      225      400
 11       16          33          528      256     1089
 12       16          35          560      256     1225
 13       16          16          256      256      256
 14       18          21          378      324      441
 15       19          18          342      361      324
 16       20          20          400      400      400
 17       22          79         1738      484     6241
 18       22          35          770      484     1225
 19       23          41          943      529     1681
 20       23          28          644      529      784
 21       25          62         1550      625     3844
 22       26          35          910      676     1225
 23       28          44         1232      784     1936
 24       31          94         2914      961     8836
 25       34          40         1360     1156     1600
  Σ      451         777        16671     9413    35299
mean    18.04       31.08      666.84   376.52  1411.96

Figure 3: The sums and means of x, y, xy, x², and y² were found and listed. Organizing the data in this manner made it quicker to find $S_{xy}$ and $S_x^2$. The calculations are shown below.

$$\sum x = 451, \quad \sum y = 777, \quad \sum xy = 16671, \quad \sum x^2 = 9413, \quad n = 25$$

To find the mean of x:

$$\bar{x} = \frac{\sum x}{n} = \frac{451}{25} = 18.04$$

To find the mean of y:
$$\bar{y} = \frac{\sum y}{n} = \frac{777}{25} = 31.08$$

To find $S_{xy}$:

$$S_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{16671}{25} - (18.04)(31.08) \approx 106.16$$

To find $S_x^2$:

$$S_x^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{9413}{25} - 18.04^2 \approx 51.08$$

To find the equation of the line of linear regression, substitute $\bar{x} = 18.04$, $\bar{y} = 31.08$, $S_{xy} = 106.16$, and $S_x^2 = 51.08$:

$$y - 31.08 = \frac{106.16}{51.08}(x - 18.04)$$

$$y - 31.08 = 2.078x - 37.493$$

$$y = 2.078x - 6.413$$

This agrees with the Excel trend line, y = 2.0783x − 6.4127.

The correlation between the two variables can also be found using Pearson's correlation coefficient, r. If r = 1, x and y are perfectly positively correlated; if r = 0, x and y are not linearly correlated; and if r = −1, x and y are perfectly negatively correlated. Calculating the correlation coefficient therefore shows the degree of linearity between x and y.

Pearson's Correlation Coefficient Formula: The formula for finding the correlation coefficient is
$$r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}}$$

Most of the values have already been determined while finding the equation of the linear regression line. To find the correlation coefficient, r, substitute $n = 25$, $\sum xy = 16671$, $\sum x^2 = 9413$, $\sum y^2 = 35299$, $\bar{x} = 18.04$, $\bar{y} = 31.08$, $\bar{x}^2 = 325.44$, and $\bar{y}^2 = 965.97$:

$$r = \frac{16671 - 25 \cdot 18.04 \cdot 31.08}{\sqrt{(9413 - 25 \cdot 325.44)(35299 - 25 \cdot 965.97)}} \approx 0.70334, \qquad r^2 \approx 0.49468$$

The correlation value can be rounded to 0.703, so there is a moderate, positive correlation between x and y. The positive r value means that as the level of a Farmville player (x) increases, the number of trees (y) also tends to increase. The graph also shows this positive relationship. However, it should be noted that some data points, such as (22, 79) and (31, 94), do not cluster as closely to the trend line as the other data points. These points are considered outliers. They might appear because every player is free to purchase a wide variety of items other than trees (animals, seeds, decorations, buildings, etc.), and not all players have the same desire to purchase trees.

Parallel boxplots can be used to display descriptive statistics of the data. They give a visual comparison of the distribution of the data along with the median, range, interquartile range, minimum, and maximum. The spread of the number of trees owned by the lower-level players surveyed (levels 7-15, 10 players) will be compared to that of the higher-level players (levels 16-34, 15 players). It is predicted that the lower-level players will have fewer trees and the higher-level players will have more trees, although there may be some overlap.

Figure 4: Number of Trees for Levels 7-15 and 16-34

Statistic      Levels 7-15    Levels 16-34
Minimum             7              16
Quartile 1          9              21
Median             16.5            35
Quartile 3         20              44
Maximum            45              94

Figure 4: This table shows the five-number summaries of the number of trees for each level group. These values are shown in the box and whisker plot.

Figure 5: Box and Whisker Plot
[Chart: parallel box plots of the number of trees for Levels 7-15 and Levels 16-34; legend: Minimum, Quartile 1, Median, Quartile 3, Maximum]
Figure 5: The box and whisker plot compares the spread of the number of trees owned by the two groups of players. The middle fifty percent of the higher-level players own anywhere from 21 to 44 trees, whereas the middle fifty percent of the lower-level players own from 9 to 20 trees. Some beginner players, however, seem to own as many trees as the higher-level players. Comparing the descriptive statistics for the two groups shows that, while higher-level players tend to have more trees, some lower-level players can own more trees than some higher-level players. This can be seen on the plot: the top twenty-five percent of the lower-level players (20 to 45 trees) own about as many trees as the higher-level group's middle fifty percent (21 to 44 trees). However, the higher-level group has a greater median than the lower-level group (35 versus 16.5), which suggests that a typical higher-level player owns more trees than most of the beginner players.
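The five-number summaries in Figure 4 can be reproduced with a small Python sketch. It assumes the quartile convention that matches Figure 4: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the middle value left out when the count is odd. Other software (Excel's QUARTILE function, for example) may use a different convention and give slightly different quartiles. The helper name `five_number_summary` is hypothetical.

```python
from statistics import median

# Figure 1 data: (Farmville level, number of trees)
data = [
    (7, 7), (8, 9), (9, 9), (9, 16), (10, 11),
    (13, 17), (13, 23), (13, 45), (15, 19), (15, 20),
    (16, 33), (16, 35), (16, 16), (18, 21), (19, 18),
    (20, 20), (22, 79), (22, 35), (23, 41), (23, 28),
    (25, 62), (26, 35), (28, 44), (31, 94), (34, 40),
]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are medians of the lower and upper halves; the middle
    value is excluded from both halves when the count is odd.
    """
    v = sorted(values)
    half = len(v) // 2
    lower = v[:half]
    upper = v[half + len(v) % 2:]
    return v[0], median(lower), median(v), median(upper), v[-1]

# Split the players into the two level groups used in Figure 4
low_group = [trees for level, trees in data if 7 <= level <= 15]
high_group = [trees for level, trees in data if 16 <= level <= 34]

print("Levels 7-15: ", five_number_summary(low_group))   # → (7, 9, 16.5, 20, 45)
print("Levels 16-34:", five_number_summary(high_group))  # → (16, 21, 35, 44, 94)
```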
A chi-squared test will now be performed to determine whether the number of trees a player has and the player's level in the game are independent or dependent. The formula for the chi-squared statistic is

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where $f_o$ is the observed frequency and $f_e$ is the expected frequency. Contingency tables are constructed to show the results for the 25 surveyed players: one table displays the observed values, and the other displays the expected values.

Observed values table:

                      Trees
Level          7-30     >30     Total
7-15             9        1       10
16-34            5       10       15
Total           14       11       25

Expected values table:

                      Trees
Level          7-30     >30     Total
7-15            5.6      4.4      10
16-34           8.4      6.6      15
Total           14       11       25

Each expected value is (row total × column total) / grand total. For example, for the cell Levels 7-15 × Trees 7-30:

$$f_e = \frac{10 \cdot 14}{25} = 5.6$$

Before performing the chi-squared test, the null and alternative hypotheses are formed, the degrees of freedom are calculated, and the significance level is stated.

H0 (null hypothesis): game level and number of trees are independent.
H1 (alternative hypothesis): game level and number of trees are not independent.

To find the degrees of freedom for a 2 × 2 contingency table:

$$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$$

At the 5% (0.05) significance level with df = 1, the critical value is $\chi^2_{0.05} = 3.84$.

Using the contingency tables, $\chi^2$ is found with the formula above. The table below organizes the values needed for the calculation.

Figure 6: χ² Calculation

 fo     fe     fo − fe    (fo − fe)²    (fo − fe)²/fe
  9    5.6       3.4        11.56           2.064
  1    4.4      −3.4        11.56           2.627
  5    8.4      −3.4        11.56           1.376
 10    6.6       3.4        11.56           1.752
                                    Total = 7.819

Figure 6: This table shows how the chi-squared value was found.

$$\chi^2 \approx 7.82$$

Because $\chi^2 \approx 7.82$ is greater than the critical value of 3.84, the null hypothesis that a Farmville player's level and number of trees are independent is rejected.

According to the scatter plot and the line of linear regression, there is a positive relationship between the number of trees a Farmville player has and the level they are on in the game. Pearson's correlation coefficient (r ≈ 0.703) showed that there is a moderate correlation between the two variables. This could be because more experienced players tend to have more FarmCoins with which to purchase trees, while lower-level players and beginners are more likely to buy smaller, cheaper plants. The box plot also showed that higher-level players tend to own more trees, but suggested that some lower-level players can own more trees than some higher-level players. The chi-squared test indicated, at the 5% significance level, that the level of a Farmville player and the number of trees they own are not independent. They have a positive correlation, which suggests that as players rise in level, they buy more trees.

A couple of data points did not cluster as closely to the linear regression line as the others did; these are considered outliers. Each player is free to spend FarmCoins on various items for the farm, such as animals, seeds, and decorations, and not all players are interested in buying the same items for their virtual farm. Some players may buy more trees than seeds or animals.
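One way to make "did not cluster as closely to the regression line" concrete is to compute each point's residual (observed trees minus the number predicted by y = 2.078x − 6.413) and rank the points by its size. This residual ranking is an illustration added here, not a step in the original assessment; the coefficients are taken from the hand calculation and the helper name `residual` is hypothetical.

```python
# Figure 1 data: (Farmville level, number of trees)
data = [
    (7, 7), (8, 9), (9, 9), (9, 16), (10, 11),
    (13, 17), (13, 23), (13, 45), (15, 19), (15, 20),
    (16, 33), (16, 35), (16, 16), (18, 21), (19, 18),
    (20, 20), (22, 79), (22, 35), (23, 41), (23, 28),
    (25, 62), (26, 35), (28, 44), (31, 94), (34, 40),
]

# Regression line from the hand calculation: y = 2.078x - 6.413
SLOPE, INTERCEPT = 2.078, -6.413

def residual(point):
    """Observed number of trees minus the number predicted by the trend line."""
    level, trees = point
    return trees - (SLOPE * level + INTERCEPT)

# Rank points from largest to smallest distance (in trees) from the trend line
ranked = sorted(data, key=lambda p: abs(residual(p)), reverse=True)
for point in ranked[:4]:
    print(point, f"{residual(point):+.2f}")
```

The two points named in the text, (22, 79) and (31, 94), come out on top with residuals of roughly +40 and +36 trees, well ahead of the next-largest residuals of about 24 trees.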
To determine whether these outliers significantly skew the results, the chi-squared test will be performed again with the outliers removed. The table below displays the data without the two outliers, (22, 79) and (31, 94).
Figure 7: Data without Outliers

Farmville Level    Number of Trees
       7                  7
       8                  9
       9                  9
       9                 16
      10                 11
      13                 17
      13                 23
      13                 45
      15                 19
      15                 20
      16                 33
      16                 35
      16                 16
      18                 21
      19                 18
      20                 20
      22                 35
      23                 41
      23                 28
      25                 62
      26                 35
      28                 44
      34                 40

Figure 7: This data will be used to perform a second chi-squared test.

Observed values table:

                      Trees
Level          7-30     >30     Total
7-15             9        1       10
16-34            5        8       13
Total           14        9       23

Expected values table:

                        Trees
Level            7-30         >30       Total
7-15           6.086957     3.913043      10
16-34          7.913043     5.086957      13
Total             14            9         23

H0 (null hypothesis): game level and number of trees are independent.
H1 (alternative hypothesis): game level and number of trees are not independent.

There is 1 degree of freedom. At the 5% (0.05) significance level with df = 1, the critical value is $\chi^2_{0.05} = 3.84$.

Using the contingency tables, $\chi^2$ is found with the same formula as before. The table below organizes the values needed for the calculation.

Figure 8: χ² Calculation without Outliers

 fo     fe       fo − fe    (fo − fe)²    (fo − fe)²/fe
  9    6.087      2.913        8.486          1.394
  1    3.913     −2.913        8.486          2.169
  5    7.913     −2.913        8.486          1.072
  8    5.087      2.913        8.486          1.668
                                      Total = 6.303

Figure 8: This table shows how the chi-squared value was found.

$$\chi^2 \approx 6.30$$

Because $\chi^2 \approx 6.30$ is greater than 3.84, the null hypothesis that a Farmville player's level and number of trees are independent is again rejected. This indicates that the outliers did not have a significant effect on the outcome of the test: removing them does not change the conclusion.
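Both chi-squared tests can be reproduced from the raw data with a short Python sketch, which also guards against counting slips when the contingency tables are filled in by hand. The bins are the ones used in the contingency tables (levels 7-15 vs 16-34, 7-30 vs more than 30 trees); the function name `chi_squared_independence` and the hard-coded critical value 3.84 (quoted in the text) are illustrative choices, not part of the original assessment.

```python
# Figure 1 data: (Farmville level, number of trees)
data = [
    (7, 7), (8, 9), (9, 9), (9, 16), (10, 11),
    (13, 17), (13, 23), (13, 45), (15, 19), (15, 20),
    (16, 33), (16, 35), (16, 16), (18, 21), (19, 18),
    (20, 20), (22, 79), (22, 35), (23, 41), (23, 28),
    (25, 62), (26, 35), (28, 44), (31, 94), (34, 40),
]

def chi_squared_independence(points):
    """Tally the 2 x 2 level-by-trees table and return (observed, chi-squared).

    Rows: levels 7-15, 16-34. Columns: 7-30 trees, more than 30 trees.
    No continuity correction is applied, matching the hand calculation.
    """
    observed = [[0, 0], [0, 0]]
    for level, trees in points:
        observed[0 if level <= 15 else 1][0 if trees <= 30 else 1] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi_sq = 0.0
    for i in range(2):
        for j in range(2):
            f_e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi_sq += (observed[i][j] - f_e) ** 2 / f_e
    return observed, chi_sq

CRITICAL_5PCT_DF1 = 3.84  # critical value quoted in the text (5% level, df = 1)
trimmed = [p for p in data if p not in {(22, 79), (31, 94)}]

for label, points in [("all 25 players", data), ("outliers removed", trimmed)]:
    observed, chi_sq = chi_squared_independence(points)
    verdict = "reject H0" if chi_sq > CRITICAL_5PCT_DF1 else "do not reject H0"
    print(f"{label}: observed={observed}, chi-squared={chi_sq:.3f}, {verdict}")
```

If SciPy is installed, `scipy.stats.chi2_contingency(observed, correction=False)` should report the same statistic. By default SciPy applies Yates' continuity correction to 2 × 2 tables, which the hand calculation does not use.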
