Lecture 1 Introduction
Statistics links physical theories with experiments
- Theory + response of measurement apparatus = model prediction
- Observations have uncertainty on many levels
- Statistics allows us to quantify this uncertainty with probability
Definition of probability
- Kolmogorov Axioms (1933)
- Defined in set theory
- Can be compacted to three axioms
- For all A ⊆ S, P(A) >= 0
- P(S) = 1
- If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)
- Conditional probability is also defined; it cannot be derived from the three axioms (see the formulas below)
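Written out (a compact summary of the statements above, with A and B subsets of the sample space S):

```latex
% Kolmogorov axioms
P(A) \ge 0 \quad \text{for all } A \subseteq S, \qquad
P(S) = 1, \qquad
A \cap B = \emptyset \;\Rightarrow\; P(A \cup B) = P(A) + P(B)
% Definition of conditional probability (for P(B) > 0)
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
```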
Interpretations of Probability
- The axioms provide no interpretations of the elements of the sample space.
- Frequentist statistics treats probability as a limiting frequency
- A, B, … are outcomes of an experiment
- That can be repeated an infinite number of times
- Probability is limiting frequency
- What does it mean to say whether a theory is favoured or disfavoured?
- Preferred theories predict a high probability for the data that is ‘like’ the data observed
- Bayesian or Subjective statistics treats probability as a degree of belief
- A, B, … are hypotheses
- P(A) = degree of belief that A is true
- S is sometimes called the hypothesis space
- In contrast to the frequentist interpretation, Bayes' theorem says: if your prior probabilities were p, then it tells you how these probabilities should change in light of the data
- There is NO recipe for finding the prior probabilities. This is the subjective nature of Bayesian statistics.
- In practice you can't enumerate all the hypotheses (needed for the denominator in Bayes' theorem)
- Both these interpretations are consistent with the Kolmogorov axioms
- and therefore also satisfy Bayes' theorem
Bayes theorem
- Relates P(A|B) to P(B|A) (written out below)
- link the essay
- Both interpretations of probability are consistent with Bayes' theorem
- add these things
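Written out (standard form, with A and B events or hypotheses as in the interpretations above):

```latex
% Bayes' theorem
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```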
Law of total probability
- We express P(B) as a sum of P(B|A_i) P(A_i) over a set of A_i that partition the sample space
- This is often used in the denominator of Bayes' theorem (see below)
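Written out (the A_i are disjoint and together cover the sample space):

```latex
% Law of total probability
P(B) = \sum_i P(B \mid A_i)\, P(A_i)
% Combined with Bayes' theorem (the usual form of the denominator)
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{\sum_i P(B \mid A_i)\, P(A_i)}
```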
Probability density and mass fn
- A probability density function describes a continuous variable; a probability mass function describes a discrete one
Cumulative Distribution fn
- We can integrate the density fn to get the cumulative distribution,
- or differentiate the cumulative distribution to get the density fn (see below)
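The relation, written out for a continuous variable x with pdf f and cumulative distribution F:

```latex
F(x) = P(X \le x) = \int_{-\infty}^{x} f(x')\, \mathrm{d}x',
\qquad
f(x) = \frac{\partial F(x)}{\partial x}
```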
Expectation values
- can be used to summarise a complicated pdf
- E(X) = mu “centre of gravity” of the pdf
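Written out for a continuous variable with pdf f(x); the variance V (used later for estimators) is the corresponding measure of width:

```latex
E[X] = \int x\, f(x)\, \mathrm{d}x = \mu,
\qquad
V[X] = E[(X - \mu)^2] = E[X^2] - \mu^2
```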
Lecture 2: Parameter estimation
Probabilities are assigned to outcomes of the data (the frequentist approach); there are also Bayesian methods for parameter estimation.
Hypotheses and likelihoods
- A hypothesis is a rule that assigns a probability to each data outcome
- P(x|H): "what is the probability of x under the assumption of some hypothesis H?"
- Viewed as a function of the hypothesis (or its parameters) with the data fixed, this is the likelihood function L
- We fix x in L, so the x dependence is hidden
- L is not a pdf for the parameter; P(x|H) is a pdf for the data! (see the sketch below)
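A minimal sketch (not from the lecture; the Gaussian pdf, the data values and sigma = 1 are assumed purely for illustration) showing that L is evaluated with the data held fixed and only the parameter varied:

```python
import numpy as np

# Hypothetical observed data, held fixed (assumed values for illustration only)
x = np.array([4.2, 5.1, 4.8, 5.5, 4.9])
sigma = 1.0  # assume the width is known

def log_likelihood(mu):
    """log L(mu) = sum_i log f(x_i; mu) for a Gaussian pdf with known sigma.
    A function of the parameter mu; the data x do not vary."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2))

# Evaluate for a few parameter values: a function of mu, not a pdf in mu
for mu in (4.0, 4.9, 6.0):
    print(mu, log_likelihood(mu))
```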
Parameters
- of a pdf are any constants that characterise it.
- we want a function, or estimator, to estimate the parameters.
- written θ̂(x), a function of the data x
Properties of estimators
- we want small bias
- E(theta hat) - theta
- we want small variance
- V(theta hat)
- These are often conflicting criteria; optimising both at once is difficult (see the formulas below)
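Written out (b is the bias; the mean squared error is one common way of combining the two criteria, an addition not stated explicitly above):

```latex
b = E[\hat{\theta}] - \theta,
\qquad
V[\hat{\theta}] = E[\hat{\theta}^{2}] - \big(E[\hat{\theta}]\big)^{2},
\qquad
\mathrm{MSE} = E[(\hat{\theta} - \theta)^{2}] = V[\hat{\theta}] + b^{2}
```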
Maximum likelihood estimators
- Find the value θ̂ of θ for which the likelihood L(θ) is maximised
- Equivalent to maximising the log-likelihood (see below)
- MLEs are not guaranteed to have optimal properties.
- ??
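Written out (for independent measurements x_1, ..., x_n with pdf f(x; θ)):

```latex
L(\theta) = \prod_{i=1}^{n} f(x_i; \theta),
\qquad
\hat{\theta} = \arg\max_{\theta} L(\theta)
\;\Leftrightarrow\;
\frac{\partial \ln L}{\partial \theta}\bigg|_{\theta = \hat{\theta}} = 0
% (the derivative condition assumes a maximum in the interior of the parameter space)
```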
Properties of MLE
- for the example estimator considered in the lecture, the bias is 0
- and the variance is θ²/n (see the worked example below)
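These numbers match the standard exponential decay-time example (an assumption here, not stated explicitly in the notes): data t_1, ..., t_n with pdf f(t; τ) = (1/τ) e^{-t/τ}, for which the ML estimator is the sample mean:

```latex
\hat{\tau} = \frac{1}{n}\sum_{i=1}^{n} t_i,
\qquad
E[\hat{\tau}] = \tau \;\;(\text{bias } b = 0),
\qquad
V[\hat{\tau}] = \frac{\tau^{2}}{n}
```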
Monte carlo variance
- In most cases, calculating the variance of estimators is not easy. One way is to simulate the experiment many times
- The distribution of ML estimates is approximately Gaussian in the large-sample limit (sketch below)
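A minimal Monte Carlo sketch (assuming, for illustration only, the exponential decay-time example with true τ = 2 and n = 50 measurements per experiment; not taken from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)

tau_true = 2.0   # assumed true parameter value (illustration only)
n = 50           # measurements per pseudo-experiment
n_exp = 10000    # number of simulated experiments

# Simulate the experiment many times; the ML estimator of tau is the sample mean
estimates = np.array([rng.exponential(tau_true, n).mean() for _ in range(n_exp)])

print("mean of estimates:", estimates.mean())        # ~ tau_true (small bias)
print("std. dev. of estimates:", estimates.std())    # ~ tau_true / sqrt(n)
print("expected std. dev.:", tau_true / np.sqrt(n))
# A histogram of `estimates` would look approximately Gaussian (large-sample limit)
```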
Variance of estimators from information inequality
- Sets lower bound on variance for any estimator.
- For small bias, and for the MLE in the large-sample limit, the bound is (approximately) attained and the inequality becomes an equality
- ⇒ the MLE is 'efficient' (bound written out below)
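The bound (the Rao-Cramér-Frechet inequality), written out for a single parameter θ with bias b:

```latex
V[\hat{\theta}] \;\ge\;
\frac{\left(1 + \dfrac{\partial b}{\partial \theta}\right)^{2}}
{E\!\left[ -\,\dfrac{\partial^{2} \ln L}{\partial \theta^{2}} \right]}
```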
Lecture 3: Hypothesis testing & confidence intervals
Suppose a measurement produces some data x; consider a hypothesis H0 (and possibly an alternative H1). We can reject or accept H0.
Set up a critical region such that the probability of finding the data there, under H0, is less than or equal to (exact equality may not be possible for discrete data) a small probability alpha.
If the data x are observed in the critical region, reject H0.
The alternative hypothesis motivates the placement of the critical region
Test significance / goodness of fit
’Discovery’ at 5 sigma in particle physics, corresponding to a p-value of about 2.9 × 10⁻⁷ (see below)
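A minimal sketch of the conversion between a significance in sigma and a one-sided p-value (uses scipy; the values shown are standard conventions, not specific to this course):

```python
from scipy.stats import norm

# One-sided tail probability of a standard Gaussian beyond Z sigma
for z in (3, 5):
    print(f"{z} sigma -> p = {norm.sf(z):.2e}")
# 5 sigma corresponds to p ~ 2.9e-7, the conventional 'discovery' threshold
```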
Lecture 4: Machine Learning
Curve fitting
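A minimal least-squares curve-fitting sketch (the straight-line model, the fake data and the uncertainties are assumed purely for illustration; the lecture's own examples may differ):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Hypothetical data: straight line plus Gaussian noise with known sigma = 1
x = np.linspace(0, 10, 20)
y = 1.5 * x + 2.0 + rng.normal(0.0, 1.0, x.size)
sigma_y = np.full(x.size, 1.0)

def line(x, a, b):
    """Model to fit: y = a*x + b."""
    return a * x + b

# Least-squares fit; popt are the fitted parameters, pcov their covariance matrix
popt, pcov = curve_fit(line, x, y, sigma=sigma_y, absolute_sigma=True)
print("fitted a, b:", popt)
print("parameter uncertainties:", np.sqrt(np.diag(pcov)))
```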