Lecture 1 Introduction

Statistics links physical theories with experiments

  • Theory + response of measurement apparatus = model prediction
  • Observations have uncertainty on many levels
  • Statistics allows us to quantify this with probability

Definition of probability

  • Kolmogorov Axioms (1933)
  • Defined in set theory
  • Can be compacted to three axioms
    • For all A ⊆ S, P(A) ≥ 0
    • P(S) = 1
    • If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)
  • Conditional probability is also defined; it cannot be derived from the first three axioms (written out below)
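Written out (a standard formulation; the content is just the three axioms above plus the usual definition of conditional probability):

```latex
% Kolmogorov axioms for a sample space S
\[
  P(A) \ge 0 \ \text{ for all } A \subseteq S, \qquad
  P(S) = 1, \qquad
  P(A \cup B) = P(A) + P(B) \ \text{ if } A \cap B = \emptyset .
\]
% Conditional probability is defined (not derived) as
\[
  P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0 .
\]
```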

Interpretations of Probability

  • The axioms provide no interpretations of the elements of the sample space.
  • Frequentist statistics treats probability as the limiting frequency
    • A, B, … are outcomes of an experiment
    • That can be repeated an infinite number of times
    • Probability is limiting frequency
    • What does it mean to say whether a theory is favoured or disfavoured?
      • Preferred theories predict a high probability for the data that is ‘like’ the data observed
  • Bayesian or Subjective statistics treats probability as a degree of belief
    • A, B, … are hypotheses
    • P(A) = degree of belief that A is true
    • S is sometimes called the hypothesis space
    • In contrast to the frequentist interpretation, Bayes’ theorem says: if your prior probabilities were P, it tells you how those probabilities should change in light of the data.
    • There is NO recipe for finding the prior probabilities. This is the subjective nature of Bayesian statistics.
      • You can’t enumerate all the hypotheses (needed for the normalising denominator)!
  • Both of these interpretations are consistent with the Kolmogorov axioms
  • Any probability satisfying the axioms also satisfies Bayes’ theorem

Bayes theorem

  • Relates P(A|B) to P(B|A)
  • link the essay
  • Both interpretations of probability are consistent with Bayes’ theorem
    • add these things
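In symbols, the theorem follows directly from the definition of conditional probability:

```latex
% Bayes' theorem: relates P(A|B) to P(B|A)
\[
  P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
\]
```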

Law of total probability

  • We express P(B) as a sum of P(B|A_i)P(A_i)
  • This is often used in bayes theorem
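In symbols, for disjoint subsets A_i whose union is S, and as it appears in the denominator of Bayes’ theorem:

```latex
% Law of total probability
\[
  P(B) = \sum_i P(B \mid A_i)\, P(A_i)
\]
% Bayes' theorem with the denominator expanded
\[
  P(A \mid B) = \frac{P(B \mid A)\, P(A)}{\sum_i P(B \mid A_i)\, P(A_i)}
\]
```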

Probability density and mass fn

  • The density applies to the continuous case, the mass function to the discrete case
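In symbols (continuous case first, then discrete):

```latex
% Probability density f(x) for a continuous variable
\[
  P(x \le X \le x + dx) = f(x)\, dx, \qquad \int_{-\infty}^{\infty} f(x)\, dx = 1
\]
% Probability mass function for a discrete variable
\[
  P(X = x_i) = p_i, \qquad \sum_i p_i = 1
\]
```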

Cumulative Distribution fn

  • We can integrate the density fn to get the cumulative distribution
  • Or differentiate the cumulative distribution to get the density fn
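In symbols:

```latex
% Cumulative distribution from the density, and back again
\[
  F(x) = \int_{-\infty}^{x} f(x')\, dx', \qquad f(x) = \frac{\partial F(x)}{\partial x}
\]
```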

Expectation values

  • can be used to summarise a complicated pdf
  • E(X) = mu “centre of gravity” of the pdf
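In symbols (the variance is included here because it is used in the estimator sections below):

```latex
% Mean ("centre of gravity") and variance of a pdf
\[
  E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx = \mu, \qquad
  V[X] = E[(X - \mu)^2] = \sigma^2
\]
```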

Lecture 2: Parameter estimation

From outcomes to probabilities: the treatment here is frequentist; there are also Bayesian methods.

Hypotheses and likelihoods

  • A hypothesis H is a rule that assigns a probability to each possible data outcome
    • P(x given H): “what’s the probability of x under the assumption of some hypothesis?”
    • This is the likelihood function
      • We fix x in L, so x is hidden; L is a function of the hypothesis only
    • It is not a pdf for the parameter; it’s a pdf for the data! (see the sketch after this list)
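A minimal sketch of the idea in Python, assuming (purely for illustration) Gaussian data with unknown mean; the data values and names are mine, not from the lecture:

```python
import numpy as np
from scipy.stats import norm

x_obs = np.array([1.2, 0.7, 1.9, 1.4])   # observed data, held fixed

def likelihood(mu, sigma=1.0, x=x_obs):
    """L(mu) = product over i of P(x_i | mu): a function of the hypothesis,
    evaluated at the fixed observed data. It is a pdf in x, not in mu."""
    return np.prod(norm.pdf(x, loc=mu, scale=sigma))

# Hypotheses that make the observed data probable score higher
print(likelihood(1.0), likelihood(5.0))
```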

Parameters

  • of a pdf are any constants that characterise it.
  • we want a function, or estimator, to estimate the parameters.
    • theta hat of x

Properties of estimators

  • we want small bias
    • E(theta hat) - theta
  • we want small variance
    • V(theta hat)
  • These are conflicting criteria, optimising for both is difficult.
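In symbols:

```latex
% Bias and variance of an estimator \hat{\theta}(x)
\[
  b = E[\hat{\theta}] - \theta, \qquad
  V[\hat{\theta}] = E[\hat{\theta}^2] - \left( E[\hat{\theta}] \right)^{2}
\]
```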

Maximum likelihood estimators

  • The estimate theta hat is the value of theta that maximises L(theta)
  • Equivalent to maximising the log likelihood (condition written out below)
  • MLEs are not guaranteed to have optimal properties.
    • e.g. they can be biased for finite samples, though under mild conditions they are consistent and asymptotically efficient
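The defining condition in symbols:

```latex
% ML estimate: the value of theta that maximises L (equivalently ln L),
% usually found from the stationary point of the log-likelihood
\[
  \hat{\theta} = \operatorname*{arg\,max}_{\theta} L(\theta), \qquad
  \left. \frac{\partial \ln L}{\partial \theta} \right|_{\theta = \hat{\theta}} = 0
\]
```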

Properties of MLE

  • For the lecture’s example the bias is 0: E[theta hat] = theta
  • and the variance is theta squared / n (a worked case is sketched below)
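One standard example with exactly these properties (my assumption that this is the case meant here) is the exponential pdf with mean theta, for which the ML estimator is the sample mean:

```latex
% f(t; \theta) = (1/\theta) \, e^{-t/\theta}, data t_1 ... t_n
\[
  \hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} t_i, \qquad
  E[\hat{\theta}] = \theta \ \ (\text{zero bias}), \qquad
  V[\hat{\theta}] = \frac{\theta^{2}}{n}
\]
```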

Monte carlo variance

  • In most cases, calculating the variance of estimators is not easy. One way is to simulate the experiment many times (see the sketch below)
  • The distribution of the estimates is Gaussian for ML in the large-sample limit
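A sketch of the Monte Carlo approach in Python, assuming the exponential example above; the sample sizes and names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(42)
theta_true, n, n_experiments = 1.0, 50, 10_000

# Simulate the experiment many times and apply the ML estimator (the sample mean) to each
estimates = np.array([
    rng.exponential(scale=theta_true, size=n).mean()
    for _ in range(n_experiments)
])

print("MC mean of estimates:", estimates.mean())   # close to theta_true (bias ~ 0)
print("MC variance:", estimates.var())             # close to theta_true**2 / n
# A histogram of `estimates` is approximately Gaussian (large-sample limit)
```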

Variance of estimators from information inequality

  • Sets a lower bound on the variance of any estimator (written out below)
  • For small bias, and for the ML estimator in the large-sample limit, the inequality becomes (approximately) an equality
  • In this sense the MLE is ‘efficient’
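The bound (the Rao-Cramer-Frechet inequality) in symbols, with bias b = E[theta hat] - theta:

```latex
% Information (RCF) inequality: lower bound on the variance of any estimator
\[
  V[\hat{\theta}] \ \ge\
  \frac{\left( 1 + \frac{\partial b}{\partial \theta} \right)^{2}}
       {E\!\left[ -\, \frac{\partial^{2} \ln L}{\partial \theta^{2}} \right]}
\]
```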

Lecture 3: Hypothesis testing & confidence intervals

Suppose a measurement produces some data x; consider hypotheses H0, H1, …. We can reject or accept H0.

Set up a critical region such that the probability of finding the data there, under the hypothesis H0, is equal to (or, for discrete data, less than or equal to) a small probability alpha.

If x is observed in the critical region, reject H0.

The alternative hypothesis motivates the placement of the critical region
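In symbols, for a critical region w (the symbol w is my notation):

```latex
% Size of the test: the probability, under H0, to observe the data x in the critical region w
\[
  P(x \in w \mid H_0) \le \alpha
\]
```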

Test significance / goodness of fit

’Discovery’ at 5 sigma in particle physics
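For reference, the corresponding one-sided Gaussian tail probability, computed with scipy (a quick check, not from the notes):

```python
from scipy.stats import norm

# p-value of a 5 sigma excess (one-sided upper tail)
p = norm.sf(5.0)
print(p)   # about 2.9e-07
```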

Lecture 4: Machine Learning

Curve fitting
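A minimal curve-fitting sketch in Python: a straight-line least-squares fit with scipy.optimize.curve_fit. The model and toy data are my own illustration, not the lecture's example:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    """Straight-line model y = a*x + b."""
    return a * x + b

# Toy data: a line plus Gaussian noise with known sigma
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

popt, pcov = curve_fit(model, x, y, sigma=np.full(x.size, 0.5), absolute_sigma=True)
print("fitted parameters:", popt)                  # close to [2.0, 1.0]
print("parameter errors:", np.sqrt(np.diag(pcov)))
```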