\documentclass[12pt]{article}
\usepackage[margin=1.06in]{geometry}                
\geometry{letterpaper}                  
\usepackage{graphicx}
\usepackage{amsmath, amssymb, amsthm}
\usepackage{hyperref, multicol}
\usepackage{pifont}
\usepackage{marvosym}
\hypersetup{colorlinks=false, allcolors=blue}
\newcommand{\noin}{\noindent}        
\newcommand{\SD}{\textnormal{SD}}
\newcommand{\var}{\textnormal{Var}}            
\newcommand{\Bern}{\textnormal{Bern}}
\newcommand{\Bin}{\textnormal{Bin}}
\newcommand{\Geom}{\textnormal{Geom}}
\newcommand{\FS}{\textnormal{FS}}
\newcommand{\HGeom}{\textnormal{HGeom}}
\newcommand{\NBin}{\textnormal{NBin}}
\newcommand{\Pois}{\textnormal{Pois}}
\newcommand{\Expo}{\textnormal{Expo}}
\newcommand{\Unif}{\textnormal{Unif}}
\newcommand{\N}{\mathcal{N}}

                     
\begin{document}
 
\noindent {\large  \textbf{Stat 110 Homework 5, Fall 2017}} 

\bigskip

\noindent \textbf{Due}: Friday 10/20 at 5:00 pm, submitted as a PDF via the  \href{https://canvas.harvard.edu/courses/27764}{{course webpage}}. Please check carefully to make sure you upload the correct file. Your submission must be a single PDF file, no more than $20$ MB in size. It can be typeset or scanned, but must be clear and easily legible (not blurry or faint) and correctly rotated. No submissions on paper or by email will be accepted. Please show your work and give clear, careful, convincing justifications. See the syllabus for the collaboration policy. 

\bigskip

\noindent 1. Use Poisson approximations to investigate the following types of coincidences. 
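
\smallskip

\noin Reminder (a sketch of the standard Poisson paradigm, stated here for convenience; the symbols $I_j$, $p_j$, and $N$ are notation assumed for this note only): if $I_1,\dots,I_N$ are indicator r.v.s of events that are each unlikely and at most weakly dependent, with $p_j = P(I_j = 1)$, then the count $X = I_1 + \dots + I_N$ is approximately $\Pois(\lambda)$ with
\[\lambda = E(X) = \sum_{j=1}^{N} p_j, \qquad \textnormal{so} \qquad P(X \geq 1) \approx 1 - e^{-\lambda}.\]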

\medskip

\noin (a) Suppose that there are $1600$ sophomores at Harvard. What is the probability that there are two Harvard sophomores who were born not only on the same day of the year, but also at the same hour \emph{and} the same minute (e.g., both sophomores were born at 2:37 pm on March 31, not necessarily in the same year)? Assume that all 365 days of the year are equally likely (excluding February 29), that all times of day are equally likely for a birth, and that different students' births are independent (despite the existence of twins; see
\url{http://news.harvard.edu/gazette/story/2016/05/doubling-up-at-harvard/}).

\medskip

\noin (b) With assumptions as in (a), what is the probability that there are \emph{four} Harvard sophomores who were born not only on the same day, but also at the same hour (e.g., all four were born between 2 pm and 3 pm on  March 31, not necessarily in the same year)?

Give two different Poisson approximations for this value, one based on creating an indicator r.v.~for each quadruplet of sophomores, and the other based on creating an indicator r.v.~for each possible day-hour. Which do you think is more accurate? Why? 

\bigskip

\noindent 2. (BH 4.83)  The legendary Caltech physicist Richard Feynman and two editors of \emph{The Feynman Lectures on Physics}  (Michael Gottlieb and Ralph Leighton) posed the following problem about how to decide what to order at a restaurant. You plan to eat $m$ meals at a certain restaurant, where you have never eaten before. Each time, you will order one dish.

The restaurant has $n$ dishes on the menu, with $n \geq m$. Assume that if you had tried all the dishes, you would have a definite ranking of them from $1$ (your least favorite) to $n$ (your favorite). If you knew which your favorite was, you would be happy to order it always (you never get tired of it). 

Before you've eaten at the restaurant, this ranking is completely unknown to you. After you've tried some dishes, you can rank those dishes amongst themselves, but don't know how they compare with the dishes you haven't yet tried. There is thus an \emph{exploration-exploitation tradeoff}: should you try new dishes, or should you order your favorite among the dishes you have tried before?

A natural strategy is to have two phases in your series of visits to the restaurant: an \emph{exploration phase}, where you try different dishes each time, and an \emph{exploitation phase}, where you always order the best dish you obtained in the exploration phase. Let $k$ be the length of the exploration phase (so $m-k$ is the length of the exploitation phase). 

Your goal is to maximize the expected sum of the ranks of the dishes you eat there (the rank of a dish is the ``true'' rank from $1$ to $n$ that you would give that dish if you could try all the dishes). Show that the optimal choice is $$k = \sqrt{2(m+1)}-1,$$ 
or this rounded up or down to an integer if needed. Do this in the following steps:

\medskip

\noin (a) Let $X$ be the rank of the best dish that you find in the exploration phase. Find the expected sum of the ranks of the dishes you eat, in terms of $E(X)$.

\medskip

\noin (b) Find the PMF of $X$, as a simple expression in terms of binomial coefficients. 

\medskip

\noin (c) Show that $$E(X) =  \frac{k(n+1)}{k+1}.$$

\smallskip

\noin Hint: Use Example 1.5.2 (about the team captain) and Exercise 18 from Chapter 1 (about the hockey stick identity).
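
\smallskip

\noin For reference, in notation assumed here (please check the exact statements against the text), the two identities named in the hint can be written, for integers $1 \leq k \leq j$ and $k \leq n$, as
\[ j\binom{j-1}{k-1} = k\binom{j}{k} \qquad \textnormal{and} \qquad \sum_{j=k}^{n} \binom{j}{k} = \binom{n+1}{k+1}. \]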

\medskip

\noin (d) Use calculus to find the optimal value of $k$.

\bigskip

\noin 3. (BH 5.1) The Rayleigh distribution from Example 5.1.7 has PDF 
\[f(x) = xe^{-x^2 / 2}, \quad x > 0.\]
Let $X$ have the Rayleigh distribution.

\medskip

\noin (a) Find $P(1<X<3)$.

\medskip

\noin (b) Find the first quartile, median, and third quartile of $X$; these are defined to be the values $q_1,q_2, q_3$ (respectively) such that $P(X \leq q_j) = j/4$ for $j=1,2,3$. 
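
\smallskip

\noin Equivalently, writing $F$ for the CDF of $X$ (notation introduced here for illustration): because $f(x) > 0$ for all $x > 0$, $F$ is continuous and strictly increasing on $(0,\infty)$, so each $q_j$ is the unique solution of
\[F(q_j) = \int_0^{q_j} f(x)\,dx = \frac{j}{4}, \quad j = 1,2,3.\]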

\bigskip

\noin 4. (BH 5.5) A circle with a random radius $R \sim \Unif(0,1)$ is generated. Let $A$ be its area.

\medskip

\noin (a) Find the mean and variance of $A$, without first finding the CDF or PDF of $A$.
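
\smallskip

\noin One possible route, assuming LOTUS (the law of the unconscious statistician) is the intended tool: since the $\Unif(0,1)$ PDF equals $1$ on $(0,1)$, for a function $g$ (a placeholder symbol introduced here) we have
\[E(g(R)) = \int_0^1 g(r)\,dr, \qquad \textnormal{and} \qquad \var(A) = E(A^2) - (E(A))^2.\]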

\medskip

\noin (b) Find the CDF and PDF of $A$. 

\bigskip

\noin 5. (BH  5.28) Walter and Carl both often need to travel from Location A to Location B. Walter walks, and his travel time is Normal with mean $w$ minutes and standard deviation $\sigma$ minutes (travel time can't be negative without using a tachyon beam, but assume that $w$ is so much larger than $\sigma$ that the chance of a negative travel time is negligible). 

Carl drives his car, and his travel time is Normal with mean $c$ minutes and standard deviation $2 \sigma$ minutes (the standard deviation is larger for Carl due to variability in traffic conditions). Walter's travel time is independent of Carl's. On a certain day, Walter and Carl leave Location A for Location B at the same time.

\medskip

\noin (a) Find the probability that Carl arrives first (in terms of $\Phi$ and the parameters). For this you can use the important fact, proven in the next chapter, that if $X_1$ and $X_2$ are independent with $X_i \sim \N(\mu_i, \sigma^2_i)$, then $X_1+X_2 \sim \N(\mu_1+\mu_2, \sigma^2_1 + \sigma^2_2)$.
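
\smallskip

\noin A consequence that may be useful here, sketched under the assumptions of the fact above: since $-X_2 \sim \N(-\mu_2, \sigma_2^2)$ and is independent of $X_1$,
\[X_1 - X_2 \sim \N(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2),\]
and for any $Z \sim \N(\mu, \sigma^2)$ (with $Z$, $\mu$, $z$ generic symbols used only in this note),
\[P(Z \leq z) = \Phi\left(\frac{z - \mu}{\sigma}\right).\]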

\medskip

\noin (b) Give a fully simplified criterion (\emph{not} in terms of $\Phi$) such that Carl has more than a $50\%$ chance of arriving first if and only if the criterion is satisfied.

\medskip

\noin (c)  Walter and Carl want to make it to a meeting at Location B that is scheduled to begin $w+10$ minutes after they depart from Location A. Give a fully simplified criterion (\emph{not} in terms of $\Phi$)  such that Carl is more likely than Walter to make it on time for the meeting if and only if the criterion is satisfied.


\bigskip

\noin 6. (BH  5.40) Let $T$ be the time until a radioactive particle decays, and suppose (as is often done in physics and chemistry) that $T \sim \Expo(\lambda)$. 
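
\smallskip

\noin Recall, assuming the rate parametrization of the Exponential (as the notation $\Expo(\lambda)$ suggests), that $T$ has PDF and CDF
\[f(t) = \lambda e^{-\lambda t}, \qquad F(t) = 1 - e^{-\lambda t}, \qquad t > 0,\]
with $E(T) = 1/\lambda$ and $\var(T) = 1/\lambda^2$.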

\medskip

\noin (a) The \emph{half-life} of the particle is the time at which there is a $50\%$ chance that the particle has decayed (in statistical terminology, this is the \emph{median} of the distribution of $T$). Find the half-life of the particle.

\medskip

\noin (b) Show that for $\epsilon$ a small, positive constant, the probability that the particle decays in the time interval $[t,t+\epsilon]$, given that it has survived until time $t$, does not depend on $t$ and is approximately proportional to $\epsilon$.

\smallskip

\noin Hint: $e^x \approx 1+x$ if $x \approx 0$.
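
\smallskip

\noin The approximation in the hint is the first-order truncation of the Taylor series
\[e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots,\]
so the error in $e^x \approx 1 + x$ is of order $x^2$ as $x \to 0$.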

\medskip

\noin (c) Now consider $n$ radioactive particles, with i.i.d.~times until decay $T_1,\dots,T_n \sim \Expo(\lambda)$. Let $L$ be the first time at which one of the particles decays. Find the CDF of $L$. Also, find $E(L)$ and $\var(L)$.

\medskip


\noin (d) Continuing (c), find the mean and variance of $M=\max(T_1,\dots,T_n)$, the \emph{last} time at which one of the particles decays, \emph{without using calculus}.  

\smallskip

\noin Hint: Draw a timeline, apply (c), and remember the memoryless property.
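
\smallskip

\noin For reference, the memoryless property of $T \sim \Expo(\lambda)$ can be stated as follows (with $s$ and $t$ generic nonnegative times introduced for this note):
\[P(T \geq s + t \mid T \geq s) = P(T \geq t) \quad \textnormal{for all } s, t \geq 0.\]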

\bigskip

\noin 7. (BH 5.42)  (a) Fred visits Blotchville again. He finds that the city has installed an electronic display at the bus stop, showing the time when the previous bus arrived. The times between arrivals of buses are still independent Exponentials with mean $10$ minutes. Fred waits for the next bus, and then records the time between that bus and the previous bus. On average, what length of time between buses does he see?

\medskip

\noin (b) Fred then visits Blunderville, where the times between buses are also 10 minutes on average, and independent. Yet to his dismay, he finds that on average he has to wait more than 1 hour for the next bus when he arrives at the bus stop! How is it possible that the average Fred-to-bus time is greater than the average bus-to-bus time even though Fred arrives at some time between two bus arrivals? Explain this intuitively, and construct a specific discrete distribution for the times between buses showing that this is possible.

\end{document}