The document provides solutions to statistical problems related to data types, scales of measurement, and hypothesis testing. It defines key concepts like population, sample, structured vs unstructured data, and scales of measurement. Various problems are solved related to identifying the appropriate scale of measurement for different variables, performing appropriate mathematical operations, and determining if a sample is representative of the population. R code is also provided for problems related to testing normality and constructing chi-square plots.
Data Science Assignment Help
For any Assignment related queries, call us at : - +1 678 648 4277
visit : - https://www.statisticsassignmentexperts.com/, or
Email : - info@statisticsassignmentexperts.com
1. A quality engineer wants to check the quality of steel rods produced in a steel
factory. For this, 40 pieces of steel rods are randomly selected from the steel
factory to assess their quality. Based on this, choose the correct option.
A. The population is all steel rods produced in all factories; the sample is the 40
steel rods selected from the given steel factory’s production.
B. The population is all steel rods produced in all factories; the sample is all the
steel rods produced in the given steel factory.
C. The population is all steel rods produced in the given steel factory; the sample
is the 40 steel rods selected from the given steel factory’s production.
D. All the steel rods in the given steel factory are population; the sample is all
steel rods in the given steel factory.
Answer: C. By definition, the population is the entire collection of elements we are
interested in. Here, the engineer wants to check the quality of steel rods
produced in the given steel factory. Hence, the population is all steel rods
produced in the given factory. A sample is a subset of the population that is
being studied. Since the engineer studies only a set of 40 rods collected
from that factory, the sample is the set of 40 selected steel rods.
Options A and B are wrong because the engineer is interested only in the given factory’s
rods, not in the rods of all factories. Option D is wrong because it takes the
sample to be the whole population rather than the 40 selected rods.
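The distinction between population and sample can be sketched in R. The rod lengths below are simulated purely for illustration: the factory output of 5000 rods and the length distribution are assumptions, not part of the problem.

```r
# Hypothetical population: lengths (cm) of every rod produced in the given factory.
set.seed(1)
population <- rnorm(5000, mean = 100, sd = 0.5)

# The sample: 40 rods drawn at random, without replacement, from that population.
sample_rods <- sample(population, size = 40)

length(sample_rods)                # number of rods inspected
all(sample_rods %in% population)   # every sampled rod belongs to the population
```

Any statement about quality is then computed on `sample_rods` and used to learn about `population`.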
2. Which of the following statements is(are) true?
A. All basic mathematical operations can be performed on some structured data.
B. All basic mathematical operations can be performed on unstructured data.
C. Email contents, text messages, and audio files are usually unstructured data.
D. Height(cm), Weight(Kg), Age(years) are structured data.
Answer: A, C, D
Option (A) Consider the income of employees of a certain company. It is certainly
structured data, and we can perform mathematical operations such as mean
and/or sum on the income of employees.
Option (B) Unstructured data may not necessarily have numeric properties;
therefore, in general, mathematical operations cannot be performed on
unstructured data.
Option (C) Email contents, text messages, and audio files cannot be organised in
any specific way as these do not have any inherent order; therefore, these are
unstructured data.
Option (D) Height, weight, and age have a definite order and can always be
arranged in order. Hence, they are structured data.
3. Values of temperature and humidity of a room are measured for 24 hours at a
regular time interval of 30 minutes. Based on this, choose the correct option.
A. It is cross-sectional data.
B. It is time series data.
C. None of the above.
Answer: B
Since the temperature and humidity are recorded over a period of time at regular
intervals, the data collected is time series data.
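A minimal sketch of how such readings could be stored as a time series in R. The temperature and humidity values are simulated placeholders, and the count of 48 readings assumes one reading at the start of each 30-minute interval.

```r
# 24 hours sampled every 30 minutes gives 48 readings per variable.
n_obs <- 24 * 60 / 30

set.seed(1)
# Simulated readings; the values themselves are placeholders.
readings <- cbind(
  temperature = 25 + rnorm(n_obs, sd = 0.5),
  humidity    = 60 + rnorm(n_obs, sd = 2)
)

# frequency = 2 means two observations per unit of time (here, per hour).
room <- ts(readings, start = 0, frequency = 2)

as.numeric(time(room))[1:4]   # observation times in hours
```

The time index is what separates this from cross-sectional data, where the observations would carry no ordering in time.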
4. Which of the following is(are) numerical variable(s)?
A. Height(cm)
B. Day of the week
C. Jersey number of sports player
D. Mobile number
E. Email address
F. Age in years
Answer: A, F
Option (A) Since height has numeric properties and can have arithmetic
operations performed on it, it follows that height is a numerical variable.
Option (B) The days of the week belong to a category in the set {Sunday,
Monday, ..., Saturday}. Hence, it is a categorical variable and not a
numerical variable.
Option (C) Jersey numbers are just labels assigned to players for identification.
Dhoni’s jersey number 7 is in no way greater or less than
Gambhir’s jersey number 5. Similarly, a player with jersey number 12 is not
the sum of two players whose jersey numbers add up to 12 (say jersey
numbers 5 and 7). That is, it is meaningless to perform mathematical
operations on any two jersey numbers. Therefore, it is not a numerical
variable.
Option (D) Mobile numbers neither have any order nor can we perform any
standard arithmetic operations on them. Hence, mobile number is also not a
numerical variable.
Option (E) Since the email address also does not have any numeric property, it is
also not a numerical variable.
Option (F) Age has numeric properties and we can perform arithmetic operations
on age. For instance, we can calculate the average age of a group of people.
Hence, it is a numerical variable.
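The difference between a numerical variable and a numeric-looking label can be illustrated in R. The ages and jersey numbers below are made-up values, and storing the jersey numbers as a factor is one possible modelling choice, not something the problem prescribes.

```r
age    <- c(23, 31, 45, 52)         # numerical: arithmetic is meaningful
jersey <- factor(c(7, 5, 12, 18))   # labels stored as a factor, not as numbers

mean(age)         # average age of the group
nlevels(jersey)   # counting distinct labels is still meaningful

# mean(jersey) returns NA with a warning: R does not average labels.
```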
5. Which of the following variable(s) has(have) ratio scale of measurement?
A. Temperature in Kelvin
B. Temperature in Centigrade
C. Year
D. Angle measured in degrees
Answer: A, D
Option (A) Temperature in Kelvin has a ratio scale of measurement because
it has meaningful intervals and also an absolute zero.
Option (B) Temperature in Centigrade has no absolute zero; however, we can
perform addition and subtraction operations on it. Therefore, it comes under
the interval scale of measurement.
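Option (C) Year does not have a ratio scale of measurement. Years can be arranged
in increasing or decreasing order and differences between years are meaningful,
but year 0 is an arbitrary reference point rather than an absolute zero, so
calendar year is usually treated as an interval scale.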
Option (D) An angle measured in degrees has a ratio scale of measurement, as we can
compare the intervals or differences between different angles and it also has an
absolute zero.
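A short numerical check of why an absolute zero matters for ratios. The two temperatures are arbitrary example values.

```r
celsius <- c(10, 20)
kelvin  <- celsius + 273.15

celsius[2] / celsius[1]   # 2, but only because of where the Celsius zero sits
kelvin[2] / kelvin[1]     # about 1.035: the physically meaningful ratio

# Differences agree on both scales, which is what an interval scale guarantees.
all.equal(diff(celsius), diff(kelvin))
```

Twenty degrees Centigrade is therefore not "twice as hot" as ten, although the 10-degree difference is the same on both scales.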
6. Which of the following mathematical operation(s) can be performed on
interval variables?
A. Addition
B. Subtraction
C. Multiplication
D. Division
Answer: A, B. Addition and subtraction can be performed on variables with an
interval scale of measurement because the differences between their values are
well defined. Multiplication and division are not meaningful for these variables
because they do not have an absolute zero, so ratios of their values cannot be
compared.
7. Pin code is a numerical variable.
A. True
B. False
Answer: B
Pin code is not a numerical variable as there is no ordering possible among
various pin codes. For example, there is no order among pin codes 100002,
500001, 500002 i.e., pin code 100002 is neither greater nor lesser than
500001 or 500002.
8. Which of the following is(are) expected while selecting a sample for a
population?
A. Sample should be a subset of the population.
B. Sample can contain data that is not from the population.
C. Sample should be representative of the characteristics of different elements in
the population.
D. Sample need not be representative of the characteristics of different elements
in the population.
Answer: A, C
By definition, a sample must be a subset of the population and must be
representative of the characteristics of different elements of population.
The purpose of a sample is to get information about the population.
9. In the 2011 Cricket ODI World Cup quarter-final match between India and
Australia, a media organization estimated that Australia would beat India by 50
runs if Australia bats first, based on the information of matches played between
the two teams previously. Which branch of statistics does the above analysis
belong to?
A. Descriptive Statistics
B. Inferential Statistics
Answer: B
Making predictions from data comes under inferential statistics. Here, the media
organization makes a prediction based on the information it has about the matches
previously played between the two teams; therefore, the given analysis belongs to
inferential statistics.
10. A class teacher wants to collect feedback from students of the class. The
teacher hands out a blank sheet to each student to obtain descriptive input
and suggestions on the class. The data collected by the class teacher is:
A. Structured Data
B. Unstructured Data
Answer: B
Students will write their feedback and suggestions as free text, not in an
organised or tabular format. Hence, the data generated will be unstructured.
11. Variables with an interval scale of measurement may be converted into a ratio
scale of measurement by performing?
A. Addition operation
B. Subtraction operation
C. Multiplication operation
D. Cannot be converted to ratio variables.
Answer: B
Variables with interval scale of measurement can be converted into other
variables with ratio scale of measurement by performing subtraction. If we
make a new variable by subtracting a variable with an interval scale, the new
variable will have absolute zero as one of its values. We obtain absolute zero
when we subtract one value from itself. So, the new variable has a ratio scale
of measurement. For example, in restaurant 1 and restaurant 2, the rating
given by users is noted. The rating given by the user should be an integer
from 1 to 5. So, the rating given by user to restaurant 1 and 2 is an interval
scale since it has fixed measure and no absolute zero. But if we are interested
in the difference (absolute value) between ratings given by the same user,
then the new variable can take values from 0 to 4. This variable has absolute
zero. Hence, it has a ratio scale of measurement.
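The restaurant example can be sketched in R. The ratings below are hypothetical values for five users.

```r
# Hypothetical ratings (integers from 1 to 5) given by five users.
restaurant1 <- c(5, 3, 4, 1, 2)
restaurant2 <- c(1, 3, 5, 4, 2)

# New variable: absolute difference between the two ratings of each user.
gap <- abs(restaurant1 - restaurant2)

gap          # 4 0 1 3 0
range(gap)   # values lie between 0 and 4
```

A value of 0 here means "no difference at all", which is the absolute zero the explanation refers to.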
12. Which of the following operations can be valid for categorical variables?
A. Addition
B. Subtraction
C. Comparison ( >, <, =)
D. Multiplication
E. Division
Answer: C
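Arithmetic operations cannot be performed on categorical variables because they
do not have numeric properties. Of the given operations, the only one
applicable to categorical variables is comparison: equality (=) can be checked
for any categorical variable, while ordering (>, <) is meaningful only when the
categories are ordinal.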
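R's factor type reflects this behaviour. In the sketch below, `colour` and `size` are hypothetical variables chosen for illustration.

```r
colour <- factor(c("red", "blue", "red"))            # categories without order
size   <- factor(c("small", "large", "medium"),
                 levels = c("small", "medium", "large"),
                 ordered = TRUE)                      # categories with order

colour[1] == colour[3]   # TRUE: equality can always be checked
size[1] < size[2]        # TRUE: "small" comes before "large"

# colour[1] < colour[2] returns NA with a warning, because '<' is not
# meaningful for an unordered factor.
```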
13. What is the scale of measurement for the amount of money you have?
A. Nominal
B. Ordinal
C. Ratio
D. Interval
Answer: C
Amount of money can have a meaningful interval. It also has an absolute zero.
Hence, it comes under the ratio scale of measurement.
14. What is the scale of measurement for the military titles - Major, Captain,
Colonel?
A. Nominal
B. Ordinal
C. Ratio
D. Interval
Answer: B
Military titles have a definite rank but they do not have numeric properties;
therefore, they have ordinal scale of measurement.
Using the information from the data, we have rQ = 0.9351. The R code for this
calculation is given in the Appendix. From Table 4.2 in the textbook, the
critical point for the test of normality at the 10% level of significance
(α = 0.1) with n = 9 lies between 0.9032 and 0.9351. Since rQ = 0.9351 is
greater than the critical point, we do not reject the hypothesis of normality.
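A self-contained sketch of this check, using the nine observations listed in the Appendix code for Problem 9. The variable `crit_bounds` is introduced here only to hold the two table values quoted above.

```r
x <- c(-0.6, 3.1, 25.3, -16.8, -7.1, -6.2, 25.2, 22.6, 26.0)
n <- length(x)

# Standard normal quantiles at probability levels (j - 1/2)/n.
q <- qnorm(((1:n) - 0.5) / n)

# Correlation between the ordered observations and the normal quantiles,
# i.e. the correlation of the points in the Q-Q plot.
rQ <- cor(sort(x), q)

# Bounds on the 10% critical point for n = 9 quoted in the text.
crit_bounds <- c(0.9032, 0.9351)

round(rQ, 4)
rQ > crit_bounds[1]   # TRUE: rQ exceeds the lower bound on the critical point
```

A value of rQ close to 1 means the Q-Q plot is nearly a straight line, which is what normality predicts.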
10. Exercise 1.2 gives the age x1, measured in years, as well as the selling price x2,
measured in thousands of dollars, for n = 10 used cars. These data are
reproduced as follows:
x1:  1      2      3      3      4      5      6     8     9     11
x2:  18.95  19.00  17.95  15.54  14.00  12.95  8.94  7.49  6.00  3.99
(a) Use the results of Exercise 1.2 to calculate the squared statistical distances
$(x_j - \bar{x})^T S^{-1} (x_j - \bar{x})$, $j = 1, 2, \ldots, 10$, where
$x_j^T = (x_{j1}, x_{j2})$.
(b) Using the distances in Part (a), determine the proportion of the observations
falling within the estimated 50% probability contour of a bivariate normal
distribution.
(c) Order the distances in Part (a) and construct a chi-square plot.
(d) Given the results in Parts (b) and (c), are these data approximately bivariate
normal? Explain.
Sol. (a) From Exercise 1.2 we have $\bar{x}$ = [value missing in the source].
The squared statistical distances $d_j^2 = (x_j - \bar{x})^T S^{-1} (x_j - \bar{x})$,
$j = 1, \ldots, 10$, are calculated and listed below.
[Table of distances missing in the source.]
(b) We plot the data points and the estimated 50% probability contour (the blue
ellipse) in Figure 3. Subjects 4, 5, 6, 8, and 9 fall within the estimated 50%
probability contour, so the proportion is 5/10 = 0.5.
Figure 3: Contour of a bivariate normal
(c) The squared distances in Part (a) are ordered as below. The chi-square plot is
shown in Figure 4.
Figure 4: Chi-square plot
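(d) Given the results in Parts (b) and (c), we conclude these data are
approximately bivariate normal. The proportion of observations inside the 50%
contour is 0.5, as expected under normality, and most of the points in the
chi-square plot lie around the theoretical line.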
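Parts (a) to (c) can be sketched in a few lines of R, using the data listed in the Appendix code for Problem 10. This sketch uses the base R function `mahalanobis()` as a shortcut for the squared statistical distance. The names `d2`, `inside`, and `q` are introduced here for illustration.

```r
x1 <- c(1, 2, 3, 3, 4, 5, 6, 8, 9, 11)
x2 <- c(18.95, 19.00, 17.95, 15.54, 14.00, 12.95, 8.94, 7.49, 6.00, 3.99)
X  <- cbind(x1, x2)
n  <- nrow(X)

# (a) squared statistical distances (x_j - xbar)' S^{-1} (x_j - xbar)
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# (b) proportion of observations inside the estimated 50% contour,
#     i.e. with squared distance at most the chi-square(2) median
inside <- d2 <= qchisq(0.5, df = 2)
mean(inside)

# (c) chi-square plot: ordered distances against chi-square(2) quantiles
q <- qchisq(((1:n) - 0.5) / n, df = 2)
plot(q, sort(d2),
     xlab = "Chi-square quantile (df = 2)",
     ylab = "Ordered squared distance")
abline(0, 1)   # theoretical line
```

Points that stay close to the line through the origin with slope 1 support approximate bivariate normality.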
Appendix
R code for Problem 1 (c).
> library(ellipse)   # ellipse(): points on a probability contour
> library(MASS)      # mvrnorm(): multivariate normal sampling
> library(mvtnorm)
> set.seed(123)
>
> mu <- c(0,2)
> Sigma <- matrix(c(2,sqrt(2)/2,sqrt(2)/2,1), nrow=2, ncol=2)
> X <- mvrnorm(n=10000,mu=mu, Sigma=Sigma)
> lambda <- eigen(Sigma)$values
> Gamma <- eigen(Sigma)$vectors
> # 50% probability contour, shifted so that it is centred at mu
> elps <- t(t(ellipse(Sigma, level=0.5, npoints=1000))+mu)
> chi <- qchisq(0.5,df=2)
> c <- sqrt(chi)
> # half-lengths of the ellipse axes: c * sqrt(eigenvalue)
> factor <- c*sqrt(lambda)
> plot(X[,1],X[,2])
> lines(elps)
> points(mu[1], mu[2])
> # axes of the ellipse, drawn from the centre along each eigenvector
> segments(mu[1],mu[2],factor[1]*Gamma[1,1]+mu[1],factor[1]*Gamma[2,1]+mu[2])
> segments(mu[1],mu[2],factor[2]*Gamma[1,2]+mu[1],factor[2]*Gamma[2,2]+mu[2])
R code for Problem 9.
> x <- c(-0.6, 3.1, 25.3, -16.8, -7.1, -6.2, 25.2, 22.6, 26.0)
> # (a) normal Q-Q plot
> qqnorm(x)
> qqline(x)
> # (b) correlation coefficient of the Q-Q plot
> y <- sort(x)
> n <- length(y)
> p <- ((1:n)-0.5)/n
> q <- qnorm(p)
> rQ <- cor(y,q)
R code for Problem 10.
> n <- 10
> x1 <- c(1,2,3,3,4,5,6,8,9,11)
> x2 <- c(18.95, 19.00, 17.95, 15.54, 14.00, 12.95, 8.94, 7.49, 6.00, 3.99)
> X <- cbind(x1,x2)
> Xbar <- colMeans(X)
> S <- cov(X)
> Sinv <- solve(S)
>
> # (a) squared statistical distances: diagonal of the 10 x 10 product
> d <- diag(t(t(X)-Xbar)%*%Sinv%*%(t(X)-Xbar))