STATISTICS I (in English) Course: Variability Measures Questions and Answers:

12 Questions & Answers in Total

#1

QUESTION:

Define range.


ANSWER:

The range of a data set, denoted R, is the difference between its largest and smallest values. It is calculated as follows:

R = Largest Value - Smallest Value


#2

QUESTION:

A researcher conducts a survey of kids and teenagers whose ages are as follows: 7, 11, 5, 12, 17, 6, 13, 9, 8, 4, 13, 12, 7, 6, 11, 5, 18, 9. What is the age range for the survey?


ANSWER:

The oldest respondent is 18 years old and the youngest is 4 years old, so the range is 18 - 4 = 14 years.
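
As a quick check, the same calculation can be sketched in Python. The variable names are illustrative only and are not part of the course material.

```python
# Ages reported in the survey question.
ages = [7, 11, 5, 12, 17, 6, 13, 9, 8, 4, 13, 12, 7, 6, 11, 5, 18, 9]

# Range = largest value - smallest value.
oldest = max(ages)
youngest = min(ages)
age_range = oldest - youngest

print(oldest, youngest, age_range)  # 18 4 14
```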


#3

QUESTION:

What is the disadvantage of range?


ANSWER:

The disadvantage of the range is that it depends only on the highest and lowest observations and tells us nothing about the variability of the observations that fall between the two extremes. If there are outliers (observations that lie far from the other observations), the range will be heavily affected by these extreme values.


#4

QUESTION:

What is the most important and widely used measure of variability in statistics?


ANSWER:

The most important and widely used measure of variability in statistics is the standard deviation.


#5

QUESTION:

A farmer wants to collect some statistical information about the lambs he feeds and picks 10 of them. The weights of the lambs are 8, 10, 7, 12, 14, 6, 9, 7, 13, and 14 kg. Calculate the sample mean, variance, and standard deviation.


ANSWER:

We can compute these by preparing the following table, where X = 100/10 = 10 kg is the sample mean.

  Observation (xi)   Deviation (xi - X)   Squared Deviation (xi - X)^2
         8                  -2                        4
        10                   0                        0
         7                  -3                        9
        12                   2                        4
        14                   4                       16
         6                  -4                       16
         9                  -1                        1
         7                  -3                        9
        13                   3                        9
        14                   4                       16
Totals 100                   0                       84

Sample mean = 100/10 = 10 kg

Variance = 84/(10-1) = 9.33 kg^2

Standard deviation = sqrt(84/9) = 3.06 kg
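
The table can be reproduced with a short Python sketch. It uses the standard library `statistics` module, whose `variance` and `stdev` functions divide by n - 1 as the answer does. The variable names are illustrative.

```python
import statistics

# Lamb weights in kg, as given in the question.
weights = [8, 10, 7, 12, 14, 6, 9, 7, 13, 14]

sample_mean = statistics.mean(weights)

# Columns of the table: deviations from the mean and their squares.
deviations = [x - sample_mean for x in weights]
squared_deviations = [d ** 2 for d in deviations]
sum_of_squares = sum(squared_deviations)

# Sample variance divides by n - 1; the standard deviation is its square root.
sample_variance = statistics.variance(weights)
sample_sd = statistics.stdev(weights)

print(sample_mean, sum_of_squares)
print(round(sample_variance, 2), round(sample_sd, 2))
```

Without intermediate rounding the standard deviation comes out near 3.06 kg. Taking the square root of the already rounded variance 9.33 would give 3.05 kg instead.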



#6

QUESTION:

The grouped age distribution of a sample of 225 workers in a factory is given in the table below. Compute the variance and standard deviation.

Class Interval   Frequency (f)
18-24            30
24-30            40
30-36            42
36-42            38
42-48            30
48-54            20
54-60            15
60-66            10
Total            225


ANSWER:

The grouped mean is computed from the class midpoints M as X = SUM(f*M) / Sample Size = 8433/225 = 37.48.

Class Interval   Frequency (f)   Midpoint (M)   f*M    Deviation (M - X)   Squared Deviation (M - X)^2   f*(M - X)^2
18-24            30              21             630    -16.48              271.6                         8147.7
24-30            40              27             1080   -10.48              109.8                         4393.2
30-36            42              33             1386   -4.48               20.1                          843.0
36-42            38              39             1482   1.52                2.3                           87.8
42-48            30              45             1350   7.52                56.6                          1696.5
48-54            20              51             1020   13.52               182.8                         3655.8
54-60            15              57             855    19.52               381.0                         5715.5
60-66            10              63             630    25.52               651.3                         6512.7
Total            225                            8433                                                     31052.16

Variance = SUM(f*(M - X)^2) / (Sample Size - 1) = 31052.16/224 = 138.6

Standard deviation = sqrt(138.6) = 11.8
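
The grouped calculation can be sketched in Python as follows. Each class is represented by its midpoint, which is the usual approximation for grouped data. The tuple layout and variable names are assumptions made for this illustration.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each age class in the question.
classes = [
    (18, 24, 30), (24, 30, 40), (30, 36, 42), (36, 42, 38),
    (42, 48, 30), (48, 54, 20), (54, 60, 15), (60, 66, 10),
]

n = sum(f for _, _, f in classes)  # sample size

# Every observation in a class is approximated by the class midpoint M.
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Frequency-weighted sum of squared deviations from the grouped mean.
weighted_ss = sum(f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints))

grouped_variance = weighted_ss / (n - 1)
grouped_sd = sqrt(grouped_variance)

print(n, round(grouped_mean, 2))
print(round(weighted_ss, 2), round(grouped_variance, 1), round(grouped_sd, 1))
```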

#7

QUESTION:

Define percentile.


ANSWER:

Percentiles are generally denoted P(m), where m is a number between 0 and 100. Intuitively, the P(m) percentile of a set of n measurements, arranged in order of magnitude, is the value such that m percent of the measurements are less than or equal to it.
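
One way to turn this intuitive definition into a calculation is the nearest-rank convention, sketched below in Python. This convention is an assumption made for illustration; textbooks and software often interpolate between neighboring values instead, so results can differ slightly. The function name `percentile_nearest_rank` is hypothetical.

```python
import math

def percentile_nearest_rank(values, m):
    """Return P(m) using the nearest-rank convention (assumed here)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= m <= 100:
        raise ValueError("m must be between 0 and 100")
    ordered = sorted(values)  # arrange in order of magnitude
    # Smallest position whose value has at least m percent of the data <= it.
    rank = max(1, math.ceil(m * len(ordered) / 100))
    return ordered[rank - 1]

# Lamb weights (kg) from question #5, reused purely as sample data.
weights = [8, 10, 7, 12, 14, 6, 9, 7, 13, 14]
p25 = percentile_nearest_rank(weights, 25)
p50 = percentile_nearest_rank(weights, 50)
p75 = percentile_nearest_rank(weights, 75)
print(p25, p50, p75)  # 7 9 13
```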


#8

QUESTION:

Define quartile.


ANSWER:

Quartiles are the specific percentiles most frequently used as variability measures: the 25th, 50th, and 75th percentiles, called the first quartile, the second quartile (the median), and the third quartile, and denoted by Q1, Q2, and Q3, respectively.


#9

QUESTION:

Explain the term "interquartile range".


ANSWER:

The interquartile range (IQR) is a variability measure defined as the difference between the third and the first quartiles. It can be calculated as follows:

IQR = Q3 - Q1 = P(75) - P(25)

The interquartile range can be thought of as the spread of the middle 50% of the data: when it is calculated, the smallest 25% and the largest 25% of the data are automatically discarded. Therefore, the IQR gives a good indication of the variability of the central part of the data when the data are sorted from smallest to largest. The interquartile range has the advantage over the range of being less sensitive to outliers, and it is not greatly affected by the sample size. Although the IQR has some advantages over the range, it can be very misleading when the measurements are highly concentrated about the median.
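
A minimal Python sketch of the IQR calculation is given below. It relies on `statistics.quantiles` from the standard library (Python 3.8 or later), whose default "exclusive" method is only one of several quartile conventions, so other tools may report slightly different quartiles for the same data.

```python
import statistics

# Lamb weights (kg) from question #5, reused purely as sample data.
weights = [8, 10, 7, 12, 14, 6, 9, 7, 13, 14]

# With n=4 the function returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(weights, n=4)

# IQR = Q3 - Q1, the spread of the middle 50% of the data.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

Under this convention the quartiles are 7.0, 9.5, and 13.25, which gives IQR = 6.25 kg.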


#10

QUESTION:

Define box plot and explain how the box plots are drawn.


ANSWER:

A variety of graphical techniques can be used to give an effective visual summary of the key descriptive statistics and the shape of the distribution of a data set. A box plot, also called a box-and-whisker plot, is one of these. In a box plot, a rectangle (box) with lower and upper edges at the 25th (Q1) and 75th (Q3) percentiles is drawn, with a line inside the box at the 50th percentile (Q2). Lines, called whiskers, are drawn from the box to the highest value that is within 1.5xIQR of Q3 and to the lowest value that is within 1.5xIQR of Q1.


#11

QUESTION:

How do we detect the outliers and extreme values from a box plot?


ANSWER:

Any observations greater than Q3+1.5xIQR or less than Q1-1.5xIQR are plotted individually and called outliers. In the same manner, observations greater than Q3+3xIQR or less than Q1-3xIQR are plotted individually and called extreme outliers.
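
These two rules can be sketched in Python as shown below. The helper name `find_outliers` and the sample data are hypothetical and made up for illustration. The quartiles come from `statistics.quantiles` with its default method, which is one of several possible conventions.

```python
import statistics

def find_outliers(values):
    """Return (outliers, extreme_outliers) using the 1.5xIQR and 3xIQR rules."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    # Beyond 1.5 x IQR from the box edges: outliers.
    outliers = [x for x in values
                if x > q3 + 1.5 * iqr or x < q1 - 1.5 * iqr]
    # Beyond 3 x IQR from the box edges: extreme outliers.
    extreme = [x for x in values
               if x > q3 + 3 * iqr or x < q1 - 3 * iqr]
    return outliers, extreme

# Made-up sample: ten ordinary values plus two unusually large ones.
sample = [6, 7, 7, 8, 9, 10, 12, 13, 14, 14, 25, 40]
outliers, extreme = find_outliers(sample)
print(outliers)  # [25, 40]
print(extreme)   # [40]
```

Here Q1 = 7.25 and Q3 = 14.0, so IQR = 6.75 and the upper limits are 24.125 and 34.25. The value 25 is an outlier, while 40 also exceeds the 3xIQR limit and is an extreme outlier.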


#12

QUESTION:

Define skewness. What information can one gather from the skewness of a distribution?


ANSWER:

Measures of skewness are another kind of descriptive statistic and give information about the shape of the distribution of the observations. A data set that is not symmetrically distributed is called skewed. The most commonly observed shapes of distribution are symmetric, left skewed (negatively skewed), and right skewed (positively skewed). If the distribution is unimodal and symmetric, the mean, median, and mode are all the same. If the distribution is left skewed, having a long tail in the negative direction and a single peak, the mean is pulled in the direction of the tail, and the median falls between the mode and the mean (typically mean < median < mode). If the distribution is right skewed, having a long tail in the positive direction and a single peak, the mean is pulled in the direction of the tail, and the median again falls between the mode and the mean (typically mode < median < mean).
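
The relationship between the mean, median, and mode can be illustrated with a small Python sketch. The two data sets are made up for this purpose: the first has a long tail toward large values, and the second is its mirror image.

```python
import statistics

# Made-up right-skewed sample: single peak at 2, long tail to the right.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# Mirror image of the same sample, giving a long tail to the left.
left_skewed = [16 - x for x in right_skewed]

def summarize(values):
    """Return (mean, median, mode) of a sample."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.mode(values))

r_mean, r_median, r_mode = summarize(right_skewed)
l_mean, l_median, l_mode = summarize(left_skewed)

# Right skew: the mean is pulled toward the large values.
print(r_mode < r_median < r_mean)  # True
# Left skew: the mean is pulled toward the small values.
print(l_mean < l_median < l_mode)  # True
```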