IB Questionbank


Date	May 2021	Marks available	1	Reference code	21M.3.AHL.TZ2.1
Level	Additional Higher Level	Paper	Paper 3	Time zone	Time zone 2
Command term	Suggest	Question number	1	Adapted from	N/A

Question

Juliet is a sociologist who wants to investigate if income affects happiness amongst doctors. This question asks you to review Juliet’s methods and conclusions.

Juliet obtained a list of email addresses of doctors who work in her city. She contacted them and asked them to fill in an anonymous questionnaire. Participants were asked to state their annual income and to respond to a set of questions. The responses were used to determine a happiness score out of $100$. Of the $415$ doctors on the list, $11$ replied.

Juliet’s results are summarized in the following table.

For the remaining ten responses in the table, Juliet calculates the mean happiness score to be $52.5$.

Juliet decides to carry out a hypothesis test on the correlation coefficient to investigate whether increased annual income is associated with greater happiness.

Juliet wants to create a model to predict how changing annual income might affect happiness scores. To do this, she assumes that annual income in dollars, $X$ , is the independent variable and the happiness score, $Y$ , is the dependent variable.

She first considers a linear model of the form

$Y = aX + b$.

Juliet then considers a quadratic model of the form

$Y = cX^{2} + dX + e$.

After presenting the results of her investigation, a colleague questions whether Juliet’s sample is representative of all doctors in the city.

A report states that the mean annual income of doctors in the city is $\$80\,000$. Juliet decides to carry out a test to determine whether her sample could realistically be taken from a population with a mean of $\$80\,000$.

Describe one way in which Juliet could improve the reliability of her investigation.

[1]

a.i.

Describe one criticism that can be made about the validity of Juliet’s investigation.

[1]

a.ii.

Juliet classifies response $K$ as an outlier and removes it from the data. Suggest one possible justification for her decision to remove it.

[1]

b.

Calculate the mean annual income for these remaining responses.

[2]

c.i.

Determine the value of $r$, Pearson’s product-moment correlation coefficient, for these remaining responses.

[2]

c.ii.

State why the hypothesis test should be one-tailed.

[1]

d.i.

State the null and alternative hypotheses for this test.

[2]

d.ii.

The critical value for this test, at the $5\%$ significance level, is $0.549$. Juliet assumes that the population is bivariate normal.

Determine whether there is significant evidence of a positive correlation between annual income and happiness. Justify your answer.

[2]

d.iii.

Use Juliet’s data to find the value of $a$ and of $b$.

[1]

e.i.

Interpret, referring to income and happiness, what the value of $a$ represents.

[1]

e.ii.

Find the value of $c$, of $d$ and of $e$.

[1]

e.iii.

Find the coefficient of determination for each of the two models she considers.

[2]

e.iv.

Hence compare the two models.

[1]

e.v.

Juliet decides to use the coefficient of determination to choose between these two models.

Comment on the validity of her decision.

[1]

e.vi.

State the name of the test which Juliet should use.

[1]

f.i.

State the null and alternative hypotheses for this test.

[1]

f.ii.

Perform the test, using a $5\%$ significance level, and state your conclusion in context.

[3]

f.iii.

Markscheme

Any one from: R1

increase sample size / increase response rate / repeat process
check whether sample is representative
test-retest participants or do a parallel test
use a stratified sample
use a random sample

Note: Do not condone:
Ask different types of doctor
Ask for proof of income
Ask for proof of being a doctor
Remove anonymity
Remove response $K$.

[1 mark]

a.i.

Any one from: R1

non-random sampling means a subset of population might be responding
self-reported happiness is not the same as happiness
happiness is not a constant / cannot be quantified / is difficult to measure
income might include external sources
Juliet is only sampling doctors in her city
correlation does not imply causation
sample might be biased

Note: Do not condone the following common but vague responses unless they make a clear link to validity:
Sample size is too small
Result is not generalizable
There may be other variables Juliet is ignoring
Sample might not be representative

[1 mark]

a.ii.

because the income is very different / implausible / clearly contrived R1

Note: Answers must explicitly reference "income" to get credit.

[1 mark]

b.

$(\$)\,90\,200$ (M1)A1

[2 marks]

c.i.

$r = 0.558 (0.557723 \dots)$ A2

[2 marks]

c.ii.

EITHER
only looking for change in one direction R1

OR
only looking for greater happiness with greater income R1

OR
only looking for evidence of positive correlation R1

[1 mark]

d.i.

$H_{0}: \rho = 0$; $H_{1}: \rho > 0$ A1A1

Note: Award A1 for $\rho$ seen (do not accept $r$), A1 for both correct hypotheses, using their $\rho$ or $r$. Accept an equivalent statement in words; however, reference to “correlation for the population” or “association for the population” must be explicit for the first A1 to be awarded.

Watch out for a null hypothesis in words similar to “Annual income is not associated with greater happiness”. This is effectively saying $\rho \leq 0$ and should not be condoned.

[2 marks]

d.ii.

METHOD 1 – using critical value of $r$

$0.558 > 0.549 (0.557723 \dots > 0.549)$ R1

(therefore significant evidence of) a positive correlation A1

Note: Do not award R0A1.

METHOD 2 – using $p$-value

$0.0469 < 0.05 (0.0469463 \dots < 0.05)$ A1

Note: Follow through from their $r$-value from part (c)(ii).

(therefore significant evidence of) a positive correlation A1

Note: Do not award A0A1.

[2 marks]

d.iii.
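
The two methods above are two readings of the same test. As a minimal sketch (not part of the markscheme), the Python below converts the sample $r$ into a $t$-statistic with $n - 2$ degrees of freedom and integrates the $t$ density numerically to obtain the one-tailed $p$-value. The helper names `t_pdf` and `t_upper_tail` are illustrative assumptions, and $n = 10$ is taken from the ten remaining responses.

```python
import math

def t_pdf(t: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so P(T > 0) = 0.5; subtract the area on [0, t].
    return 0.5 - total * h / 3

r = 0.557723  # sample correlation quoted in part (c)(ii)
n = 10        # assumption: the ten responses left after K is removed

t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
p_value = t_upper_tail(t_stat, n - 2)
print(round(t_stat, 3), round(p_value, 4))
```

The printed $p$-value should agree with the markscheme's $0.0469$ to the quoted precision, so both methods lead to the same decision at the $5\%$ level.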

$a = 0.000126 (0.000125842 \dots), b = 41.1 (41.1490 \dots)$ A1

[1 mark]

e.i.

EITHER
the amount the happiness score increases for every $\$1$ increase in (annual) income A1

OR
rate of change of happiness with respect to (annual) income A1

Note: Accept equivalent responses e.g. an increase of $1.26$ in happiness for every $\$10\,000$ increase in salary.

[1 mark]

e.ii.
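
As a quick arithmetic check of the equivalent response in the note, using only the value of $a$ quoted in part (e)(i):

$10\,000 \times a \approx 10\,000 \times 0.000125842 = 1.25842 \approx 1.26$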

$c = -2.06 \times 10^{-9}\ (-2.06191 \dots \times 10^{-9})$,

$d = 7.05 \times 10^{-4}\ (7.05272 \dots \times 10^{-4})$,

$e = 12.6\ (12.5878 \dots)$ A1

[1 mark]

e.iii.

for quadratic model: $R^{2} = 0.659 (0.659145 \dots)$ A1

for linear model: $R^{2} = 0.311 (0.311056 \dots)$ A1

Note: Follow through from their $r$ value from part (c)(ii).

[2 marks]

e.iv.
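
For a linear model fitted by least squares with an intercept, the coefficient of determination equals the square of Pearson's $r$, which is why the note allows follow-through from part (c)(ii). A quick check with the value of $r$ quoted there:

$R^{2} = r^{2} \approx 0.557723^{2} \approx 0.311$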

EITHER
quadratic model is a better fit to the data / more accurate A1

OR
quadratic model explains a higher proportion of the variance A1

[1 mark]

e.v.

EITHER
not valid, $R^{2}$ not a useful measure to compare models with different numbers of parameters A1

OR
not valid, quadratic model will always have a better fit than a linear model A1

Note: Accept any other sensible critique of the validity of the method. Do not accept any answers which focus on the conclusion rather than the method of model selection.

[1 mark]

e.vi.
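
The second critique can be illustrated numerically: because the linear model is the quadratic model with $c = 0$, a least-squares quadratic fit can never have a lower $R^{2}$ than the linear fit on the same data. The sketch below uses hypothetical data (not Juliet's table, which is not reproduced here) and illustrative helper names `fit_poly` and `r_squared`.

```python
def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first) via normal equations."""
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(m)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda k: abs(rows[k][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for k in range(m):
            if k != col:
                factor = rows[k][col] / rows[col][col]
                rows[k] = [a - factor * b for a, b in zip(rows[k], rows[col])]
    return [rows[i][m] / rows[i][i] for i in range(m)]

def r_squared(xs, ys, coeffs):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    predicted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: income in units of $10,000 and happiness score out of 100.
income = [3.0, 4.5, 5.0, 6.5, 7.0, 8.0, 9.5, 11.0, 12.0, 15.0]
happiness = [40, 48, 55, 50, 62, 58, 60, 57, 63, 52]

r2_linear = r_squared(income, happiness, fit_poly(income, happiness, 1))
r2_quadratic = r_squared(income, happiness, fit_poly(income, happiness, 2))
print(r2_linear, r2_quadratic)
```

Whatever data are substituted, `r2_quadratic` should be at least `r2_linear` (up to rounding), so a higher $R^{2}$ alone does not show that the extra parameter is justified.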

(single sample) $t$-test A1

[1 mark]

f.i.

EITHER

$H_{0}: \mu = 80\,000$; $H_{1}: \mu \neq 80\,000$ A1

OR

$H_{0}:$ (sample is drawn from a population where) the population mean is $\$80\,000$
$H_{1}:$ the population mean is not $\$80\,000$ A1

Note: Do not allow FT from an incorrect test in part (f)(i) other than a $z$-test.

[1 mark]

f.ii.

$p = 0.610 (0.610322 \dots)$ A1

Note: For a $z$-test follow through from part (f)(i), either $0.578$ (from biased estimate of variance) or $0.598$ (from unbiased estimate of variance).

$0.610 > 0.05$ R1

EITHER

no (significant) evidence that mean differs from $\$80\,000$ A1

OR

the sample could plausibly have been drawn from the quoted population A1

Note: Allow R1FTA1FT from an incorrect $p$-value, but the final A1 must still be in the context of the original research question.

[3 marks]

f.iii.
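
For reference, a sketch of the statistic behind the $p$-value, assuming the test is applied to the ten remaining responses with sample mean $\bar{x} = 90\,200$ from part (c)(i). The unbiased sample standard deviation $s_{n-1}$ comes from Juliet's table, which is not reproduced here, so it is left symbolic; the symbols $\bar{x}$, $\mu_{0}$, $s_{n-1}$ and $T_{9}$ are notation introduced for this sketch.

$t = \dfrac{\bar{x} - \mu_{0}}{s_{n-1} / \sqrt{n}} = \dfrac{90\,200 - 80\,000}{s_{n-1} / \sqrt{10}}$, with $n - 1 = 9$ degrees of freedom.

The two-tailed $p$-value is then $2\,\mathrm{P}(T_{9} > |t|)$.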

Examiners report

[N/A]

Syllabus sections

Topic 4—Statistics and probability » SL 4.1—Concepts, reliability and sampling techniques
