Midterm
Morgan Holland
5/22/2020
Instructions
You will need the alcohol.RData dataset to complete the Midterm.
The dataset includes data on 9,822 individuals. Each individual reports various demographic and health characteristics, as well as a variable that equals 1 if the person reports abusing alcohol.
Question 1
Load the alcohol.RData dataset. Convert it to a table named “alcohol.” Use print(head(alcohol)) to show me the first six rows of the dataset. Copy and paste your code and output into the answer box.
print(head(alcohol))
Question 2
unemrate is the unemployment rate in a respondent’s state. At what level of measurement is this variable (nominal, ordinal, interval, or ratio scale)?
Ratio scale since it has a defined zero point.
Question 3-6
The famsize variable measures how many people are in the respondent’s family. Fill in the missing values in the following relative frequency table:
Family Size   Number of Respondents   Relative Frequency   Cumulative Relative Frequency
1             2595                    0.264                0.264
2             2311                    0.235                0.499
3             1766                    0.180                0.679
4             2000                    0.204                0.883
5             740                     0.075                0.958
6             257                     0.026                0.984
7             99                      0.010                0.995
8             32                      0.003                0.998
9             9                       0.001                0.999
10            6                       0.001                0.999
11            5                       0.001                1.000
12            1                       0.000                1.000
13            1                       0.000                1.000
Total number of respondents = 9822
Family size 2: 2311/9822 = 0.235
Family size 4: 2000/9822 = 0.204
Cumulative Relative Frequency Family size 3: 0.499+0.180 = 0.679
Cumulative Relative Frequency Family size 4: 0.679+0.204 = 0.883
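The hand calculations above can be checked in R. This is a minimal sketch that retypes the counts from the table; the variable names (counts, rel_freq, cum_rel_freq, freq_table) are chosen here for illustration. With the dataset loaded, the counts could presumably come from table(alcohol$famsize) instead, assuming the column is named famsize as the question states.

```r
# Counts copied from the frequency table above (family sizes 1 through 13).
counts <- c(2595, 2311, 1766, 2000, 740, 257, 99, 32, 9, 6, 5, 1, 1)

n <- sum(counts)                  # total number of respondents
rel_freq <- counts / n            # relative frequency of each family size
cum_rel_freq <- cumsum(rel_freq)  # running total of the relative frequencies

# Assemble the table, rounding to three decimals as in the answer above.
freq_table <- data.frame(
  famsize      = 1:13,
  count        = counts,
  rel_freq     = round(rel_freq, 3),
  cum_rel_freq = round(cum_rel_freq, 3)
)
print(freq_table)
```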
Question 7-8
Create a barplot of the famsize variable. Copy and paste your code in the answer box. In the next question, upload your barplot.
# Simple bar plot of family size
counts <- table(alcohol$famsize)
barplot(counts, main = "Number of Respondents", xlab = "Family Size")
Question 9
Does family size appear to be right-skewed, left-skewed, or not skewed at all?
Family size appears to be right-skewed: most respondents have small families, and a long tail stretches toward the larger family sizes.
Question 10-11
educ is the respondent’s level of education, in years. Calculate the sample mean and sample standard deviation of educ.
Sample mean = AVERAGE(E2:E9823) = 13.30961
Sample standard deviation = STDEV.S(E2:E9823) = 2.898751
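The spreadsheet formulas above have direct counterparts in R: mean() and sd(), where sd() uses the sample (n - 1) denominator. The vector below is a made-up illustration, not the midterm data. With the dataset loaded, the calls would presumably be mean(alcohol$educ) and sd(alcohol$educ), assuming the column is named educ as the question states.

```r
# Hypothetical years-of-education values, for illustration only.
educ_example <- c(12, 16, 12, 14, 18)

educ_mean <- mean(educ_example)  # sample mean
educ_sd   <- sd(educ_example)    # sample standard deviation (n - 1 denominator)

print(educ_mean)
print(educ_sd)
```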
Question 12
Suppose we assume that unemrate is normally distributed with mean 5.57% and standard deviation 1.51%. What is the probability of picking a person at random who faces an unemployment rate less than 4%?
µ = 0.0557 and σ = 0.0151
P(x < 0.04) = P(x − µ < 0.04 − 0.0557) = P((x − µ)/σ < (0.04 − 0.0557)/0.0151)
(0.04 − 0.0557)/0.0151 = −1.04
P(z < −1.04) = 0.1492
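As a cross-check on the z-table lookup, R's pnorm() returns the same lower-tail probability directly. This is a small sketch using the mean and standard deviation given in the question; the variable names are chosen here for illustration.

```r
mu    <- 0.0557  # assumed mean of unemrate
sigma <- 0.0151  # assumed standard deviation of unemrate

z <- (0.04 - mu) / sigma                          # standardized value
p_below_4 <- pnorm(0.04, mean = mu, sd = sigma)   # P(x < 0.04)

print(round(z, 2))
print(round(p_below_4, 4))
```

Because pnorm() does not round z to two decimals first, its result can differ slightly from the table value in the later decimal places.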
Question 13
What is the probability of picking a person who faces an unemployment rate greater than 7%, assuming the unemployment rate is normally distributed as in question 12?
µ = 0.0557 and σ = 0.0151
P(x > 0.07) = P(x − µ > 0.07 − 0.0557) = P((x − µ)/σ > (0.07 − 0.0557)/0.0151)
(0.07 − 0.0557)/0.0151 = 0.95
P(z > 0.95) = 0.1711
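The upper-tail probability can be checked the same way with pnorm() and lower.tail = FALSE. This is a sketch under the same normality assumption; the variable names are illustrative.

```r
mu    <- 0.0557  # assumed mean of unemrate
sigma <- 0.0151  # assumed standard deviation of unemrate

# P(x > 0.07): area to the right of 0.07 under the assumed normal curve.
p_above_7 <- pnorm(0.07, mean = mu, sd = sigma, lower.tail = FALSE)

print(p_above_7)
```

The result is slightly larger than 0.1711 because the hand calculation rounds z up to 0.95 before the table lookup; the two agree to about two decimal places.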
Question 14
What is the 98th percentile of unemrate, assuming it is normally distributed as in question 12?
Convert the 98th percentile to a z-score:
P(z < ?) = 0.98
z = 2.05
2.05 = (x − 0.0557)/0.0151
x = 2.05(0.0151) + 0.0557
x = 0.0867 = 8.67%
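The percentile can also be read off with qnorm(), the inverse of pnorm(). This is a sketch using the mean and standard deviation assumed in the question; the variable names are illustrative.

```r
mu    <- 0.0557  # assumed mean of unemrate
sigma <- 0.0151  # assumed standard deviation of unemrate

z_98 <- qnorm(0.98)                        # z-score with 98% of the area below it
x_98 <- qnorm(0.98, mean = mu, sd = sigma) # same percentile on the unemrate scale

print(z_98)
print(x_98)
```

qnorm() uses an unrounded z-score, so x_98 matches the hand-calculated 8.67% to about three decimal places rather than exactly.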
Question 15-16
Now let’s informally test our assumption that the unemployment rate is normally distributed.
Plot a histogram of unemrate, using a binwidth of 0.1. Copy and paste the code you used in the answer box. In the next question, upload your histogram as a .png or .jpeg.
Question 17
Based on your histogram, do you think the unemployment rate is normally distributed? Justify your reasoning in 2-3 sentences.
The unemployment rate does not appear to be normally distributed.
The histogram is nowhere close to a bell shape. It peaks somewhere in the middle and then again further to the right.
Question 18
Suppose you are rolling a six-sided die. Let event A = {2,4,6} and B = {1,3,5}. True or False: these events are mutually exclusive.
True. The two events share no outcomes (A∩B is empty), so they cannot happen at the same time.
Question 19
Suppose you are rolling a six-sided die. Let event A= {2,3,4} and event B= {1,2,3}. What is the intersection of these two events? That is, what is A∩B?
A∩B = {2,3}
Question 20
Suppose you are rolling a six-sided die. Let event A= {2,3,4} and event B= {1,2,3}. What is the union of these two events? That is, what is A∪B?
A∪B = {1,2,3,4}
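The set questions above can be checked with R's built-in intersect() and union(). In this sketch each event is a numeric vector of die faces; the variable names are chosen here for illustration.

```r
# Question 18: events with no outcomes in common are mutually exclusive.
A18 <- c(2, 4, 6)
B18 <- c(1, 3, 5)
mutually_exclusive <- length(intersect(A18, B18)) == 0

# Questions 19 and 20: intersection and union of A = {2,3,4} and B = {1,2,3}.
A <- c(2, 3, 4)
B <- c(1, 2, 3)
a_and_b <- intersect(A, B)    # outcomes in both events
a_or_b  <- sort(union(A, B))  # outcomes in at least one event, sorted

print(mutually_exclusive)
print(a_and_b)
print(a_or_b)
```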
Question 21
Suppose you flip a biased coin two times. The probability of getting heads with this coin is p = 0.2. What is the probability of getting two heads in the two flips?
Hint: There are two ways to solve this. You can use R’s built-in Binomial distribution (type help(dbinom) for more information) or you can write out each event and calculate the probability using the probability rules.
The probability of getting heads on one flip is 0.2.
Because the two flips are independent events, the probability of getting two heads in two flips = 0.2 x 0.2 = 0.04
Thus, the answer is: 0.04
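The hint's other route, R's binomial density dbinom(), gives the same number. This is a short sketch comparing both approaches; the variable names are illustrative.

```r
p <- 0.2  # probability of heads on a single flip

p_rule   <- p * p                         # multiplication rule for independent flips
p_dbinom <- dbinom(2, size = 2, prob = p) # P(exactly 2 heads in 2 flips)

print(p_rule)
print(p_dbinom)
```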
Question 22
Suppose X is Bernoulli distributed with parameter p=0.8. What is the mean of X?
The Bernoulli distribution is a discrete probability distribution with two possible outcomes: x = 0 (failure) with probability 1 − p, and x = 1 (success) with probability p.
Mean = Σ x·p(x) = 0(1 − p) + 1(p) = p = 0.8
Therefore, mean(X) = 0.8
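The expected-value sum can be written out in R as a quick check. A Bernoulli variable is a binomial with size = 1, so dbinom() supplies the two probabilities; the variable names are chosen here for illustration.

```r
p <- 0.8       # Bernoulli success probability
x <- c(0, 1)   # possible outcomes: failure and success

px <- dbinom(x, size = 1, prob = p)  # P(X = 0) and P(X = 1)
bernoulli_mean <- sum(x * px)        # expected value: sum of x * p(x)

print(px)
print(bernoulli_mean)
```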
Question 23
Suppose you know that the number of customers that arrive at a grocery store in an hour is a Poisson random variable with λ=200. That is, you know that on average 200 customers enter the store every hour. On average, how many customers can you expect to arrive in the next ten minutes?
Here λ = 200 customers per hour.
Equivalently, λ = 200/60 = 10/3 customers per minute.
Expected number of customers in the next ten minutes = λt = (10/3)(10) = 100/3 ≈ 33 customers
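The rate conversion can be reproduced in R by scaling the hourly rate to a ten-minute window. This sketch uses only the numbers given in the question; the variable names are illustrative.

```r
lambda_per_hour   <- 200                   # average arrivals per hour
lambda_per_minute <- lambda_per_hour / 60  # average arrivals per minute (10/3)

minutes <- 10
expected_arrivals <- lambda_per_minute * minutes  # Poisson mean for a 10-minute window

print(expected_arrivals)
```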