\documentclass[12pt]{article}
\usepackage[margin=1.06in]{geometry}                
\geometry{letterpaper}                  
\usepackage{graphicx}
\usepackage{amsmath, amssymb, amsthm}
\usepackage{hyperref, multicol}
\usepackage{pifont}
\usepackage{marvosym}
\hypersetup{colorlinks=false, allcolors=blue}
\newcommand{\noin}{\noindent}        
\newcommand{\SD}{\textnormal{SD}}
\newcommand{\var}{\textnormal{Var}}    
\newcommand{\cov}{\textnormal{Cov}}                                 
\newcommand{\corr}{\textnormal{Corr}}                  
\newcommand{\Bern}{\textnormal{Bern}}
\newcommand{\Bin}{\textnormal{Bin}}
\newcommand{\Geom}{\textnormal{Geom}}
\newcommand{\FS}{\textnormal{FS}}
\newcommand{\HGeom}{\textnormal{HGeom}}
\newcommand{\NBin}{\textnormal{NBin}}
\newcommand{\Pois}{\textnormal{Pois}}
\newcommand{\Expo}{\textnormal{Expo}}
\newcommand{\Unif}{\textnormal{Unif}}
\newcommand{\N}{\mathcal{N}}

                     
\begin{document}
 
\noindent {\large  \textbf{Stat 110 Homework 7, Fall 2017}} 

\bigskip

\noindent \textbf{Due}: Friday 11/3 at 5:00 pm, submitted as a PDF via the  \href{https://canvas.harvard.edu/courses/27764}{{course webpage}}. Please check carefully to make sure you upload the correct file. Your submission must be a single PDF file, no more than $20$ MB in size. It can be typeset or scanned, but must be clear and easily legible (not blurry or faint) and correctly rotated. No submissions on paper or by email will be accepted. Please show your work and give clear, careful, convincing justifications. See the syllabus for the collaboration policy. 

\bigskip

\noindent 1. (BH 7.25) Two companies, Company 1 and Company 2, have just been founded. Stock market crashes occur according to a Poisson process with rate $\lambda_0$. Such a crash would put both companies out of business. For $j \in \{1,2\}$, there may be an adverse event ``of type $j$,'' which puts Company $j$ out of business (if it is not already out of business) but does not affect the other company; such events occur according to a Poisson process with rate $\lambda_j$. If there has not been a stock market crash or an adverse event of type $j$, then Company $j$ remains in business. The three Poisson processes are independent of each other. Let $X_1$ and $X_2$ be how long Company 1 and Company 2 stay in business, respectively.

\medskip

\noin (a) Find the marginal distributions of $X_1$ and $X_2$.

\medskip

\noin (b) Find $P(X_1 > x_1, X_2 > x_2)$, and use this to find the joint CDF of $X_1$ and $X_2$. 


\bigskip

\noindent 2. (BH 7.45) A random triangle is formed in some way, such that all pairs of angles have the same joint distribution. What is the correlation between two of the angles (assuming that the variance of the angles is nonzero)?

\bigskip

\noindent 3. (BH 7.46) Each of $n \geq 2$ people puts his or her name on a slip of paper (no two have the same name). The slips of paper are shuffled in a hat, and then each person draws one (uniformly at random at each stage, without replacement). Find the standard deviation of the number of people who draw their own names.

\bigskip

\noindent 4. (BH 7.58) A statistician is trying to estimate an unknown parameter $\theta$ based on some data. She has available two independent estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ (an estimator is a function of the data, used to estimate a parameter). For example, $\hat{\theta}_1$ could be the sample mean of a subset of the data and $\hat{\theta}_2$ could be the sample mean of another subset of the data, disjoint from the subset used to calculate $\hat{\theta}_1$. Assume that both of these estimators are unbiased, i.e., $E(\hat{\theta}_j) = \theta$. 

Rather than having a bunch of separate estimators, the statistician wants one combined estimator. It may not make sense to give equal weights to $\hat{\theta}_1$ and $\hat{\theta}_2$ since one could be much more reliable than the other, so she decides to consider combined estimators of the form
$$\hat{\theta} = w_1 \hat{\theta}_1  + w_2 \hat{\theta}_2,$$
a weighted combination of the two estimators. The weights $w_1$ and $w_2$ are nonnegative and satisfy $w_1+w_2=1$. 

\medskip

\noin (a) Check that $\hat{\theta}$ is also unbiased, i.e., $E(\hat{\theta})=\theta$. 

\medskip

\noin (b) Determine the optimal weights $w_1,w_2$, in the sense of minimizing the mean squared error $E\big((\hat{\theta}-\theta)^2\big)$. Express your answer in terms of the variances of $\hat{\theta}_1$ and $\hat{\theta}_2$. The optimal weights are known as \emph{Fisher weights}. 

\smallskip

\noin Hint: As discussed in Exercise 55 from Chapter 5, mean squared error is variance plus squared bias, so in this case the mean squared error of $\hat{\theta}$ is $\var(\hat{\theta})$. Note that there is no need for multivariable calculus here, since $w_2=1-w_1$. 
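
\smallskip

\noin As a reminder of the decomposition cited in the hint (stated here for a generic estimator $\hat{\theta}$ of $\theta$ with finite variance), one way to write it is
$$E\big((\hat{\theta}-\theta)^2\big) = \var(\hat{\theta}) + \big(E(\hat{\theta})-\theta\big)^2,$$
where the second term is the squared bias; it vanishes when $\hat{\theta}$ is unbiased.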

\medskip

\noin (c) Give a simple description of what the estimator found in (b) amounts to if the data are i.i.d.~random variables $X_1,\dots,X_n,Y_1,\dots,Y_m$, $\hat{\theta}_1$ is the sample mean of $X_1,\dots,X_n$, and $\hat{\theta}_2$ is the sample mean of $Y_1,\dots,Y_m$.

\bigskip

\noindent 5. (BH 7.63) There will be $X \sim \Pois(\lambda)$ courses offered at a certain school next year. 

\medskip

\noin (a) Find the expected number of choices of $4$ courses (in terms of $\lambda$, fully simplified), assuming that simultaneous enrollment is allowed if there are time conflicts.

\medskip

\noin (b) Now suppose that simultaneous enrollment is not allowed. Suppose that most faculty only want to teach on Tuesdays and Thursdays, and most students only want to take courses that start at 10 am or later, and as a result there are only four possible time slots: 10 am, 11:30 am, 1 pm, 2:30 pm (each course meets Tuesday-Thursday for an hour and a half, starting at one of these times). Rather than trying to avoid major conflicts, the school schedules the courses completely randomly: after the list of courses for next year is determined, they randomly get assigned to time slots, independently and with probability $1/4$ for each time slot. 

\medskip

\noin Let $X_{\textrm{am}}$ and $X_{\textrm{pm}}$ be the number of morning and afternoon courses for next year, respectively (where ``morning'' means starting before noon). Find the joint PMF of $X_{\textrm{am}}$ and $X_{\textrm{pm}}$, i.e., find $P(X_{\textrm{am}} = a, X_{\textrm{pm}} = b)$ for all $a,b$. 

\medskip

\noin (c) Continuing as in (b), let $X_1,X_2,X_3,X_4$ be the number of 10 am, 11:30 am, 1 pm, 2:30 pm courses for next year,  respectively. What is the joint distribution of $X_1,X_2,X_3,X_4$? (The result is completely analogous to that of $X_{\textrm{am}}, X_{\textrm{pm}}$; you can derive it by thinking conditionally, but for this part you are also allowed to just use the fact that the result is analogous to that of (b).) Use this to find the expected number of choices of 4 non-conflicting courses (in terms of $\lambda$, fully simplified). What is the ratio of the expected value from (a) to this expected value?

\bigskip

\noindent 6. (BH 7.70)  In humans (and many other organisms), genes come in pairs. Consider a gene of interest, which comes in two types (\emph{alleles}): type $a$ and type $A$. The \emph{genotype} of a person for that gene is the types of the two genes in the pair: $AA, Aa,$ or $aa$ ($aA$ is equivalent to $Aa$).  According to the Hardy-Weinberg law, for a population in equilibrium the frequencies of $AA,Aa,aa$ will be $p^2,2p(1-p),(1-p)^2$ respectively, for some $p$ with $0<p<1$. Suppose that the Hardy-Weinberg law holds, and that $n$ people are drawn randomly from the population, independently. Let $X_1,X_2,X_3$ be the number of people in the sample with genotypes $AA,Aa,aa,$ respectively.
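
\smallskip

\noin As a quick sanity check that these three frequencies form a valid probability distribution, note that
$$p^2 + 2p(1-p) + (1-p)^2 = \big(p + (1-p)\big)^2 = 1.$$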

\medskip

\noin (a) What is the joint PMF of $X_1,X_2,X_3$? 

\medskip

\noin (b) What is the distribution of the number of people in the sample who have an $A$?

\medskip

\noin (c) What is the distribution of how many of the $2n$ genes among the people are  $A$'s?

\medskip

\noin (d) Now suppose that $p$ is unknown, and must be estimated using the observed data $X_1,X_2,X_3$. The \emph{maximum likelihood estimator} (MLE) of $p$ is the value of $p$ for which the observed data are as likely as possible. Find the MLE of $p$. 

\medskip

\noin (e) Now suppose that $p$ is unknown, and that our observations can't distinguish between $AA$ and $Aa$. So for each person in the sample, we just know whether or not that person is an $aa$ (in genetics terms, $AA$ and $Aa$ have the same \emph{phenotype}, and we only get to observe the phenotypes, not the genotypes). Find the MLE of $p$.

\bigskip

\noindent 7. (BH 7.76) Let $(X,Y)$ be Bivariate Normal with $X \sim \N(0,\sigma^2_1)$ and $Y \sim \N(0, \sigma^2_2)$ marginally and with $\corr(X,Y)=\rho$. Find a constant $c$ such that $Y-cX$ is independent of $X$. 

\smallskip

\noin Hint: First find $c$ (in terms of $\rho, \sigma_1, \sigma_2$) such that $Y-cX$ and $X$ are uncorrelated.

\end{document}