Dealing with noisy count data: strategies for tackling over-dispersion, using R

Chris Mainey

PhD Student - UCL

Intelligence Analyst - University Hospital Birmingham NHS FT


  • Count data
  • Poisson model
  • Over-dispersion
    • Scaling
    • Mixture models
    • Multi-level models
  • I use R mostly, so most examples will be in R!

Regression (simplified)


Want to predict y using x:

\[ y = \alpha + \beta x + \varepsilon \]

  • \(\alpha\) = Intercept
  • \(\beta\) = Coefficient (how much x affects y)
  • \(\varepsilon\) = Error (residual)
  • This is linear regression, which has an exact (closed-form) solution. The models in the following slides have no exact solution, so their parameters are estimated iteratively (e.g. by maximum likelihood).
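A minimal sketch of the difference, on simulated data (the values of \(\alpha\), \(\beta\) and the sample size are arbitrary assumptions): the linear model's coefficients come straight from the closed-form least-squares formula \((X^\top X)^{-1} X^\top y\), while a Poisson GLM is fitted by iteratively reweighted least squares.

```r
# Linear regression: exact (closed-form) least-squares solution vs lm()
set.seed(1)
x <- rnorm(100)
y <- 2 + 0.5 * x + rnorm(100, sd = 0.3)             # simulated: alpha = 2, beta = 0.5

X <- cbind(1, x)                                     # design matrix: intercept column + x
beta_exact <- solve(crossprod(X), crossprod(X, y))   # (X'X)^{-1} X'y
fit_lm <- lm(y ~ x)
cbind(exact = as.vector(beta_exact), lm = coef(fit_lm))

# Poisson GLM: no closed form, so glm() iterates (IRLS) until convergence
counts <- rpois(100, exp(0.5 + 0.3 * x))
fit_glm <- glm(counts ~ x, family = poisson)
fit_glm$iter                                         # number of IRLS iterations used
```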

Properties of count data

  • Discrete
    • E.g. 10 or 11 patients, but not 10.5
  • Range from zero to infinity
    • Usually can't have a negative count
  • Occur in a fixed time period, with a known average rate.
  • Not normally distributed

Poisson Distribution, increasing mean count
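The figure for this slide can be approximated with base R; this is a sketch in which the three means are assumed for illustration. It also simulates large samples to show the Poisson property that the variance tracks the mean.

```r
# Poisson probabilities for increasing mean counts
lambdas <- c(1, 4, 10)
k <- 0:25
probs <- sapply(lambdas, function(l) dpois(k, l))    # one column of P(Y = k) per mean
matplot(k, probs, type = "b", pch = 16, lty = 1,
        xlab = "Count", ylab = "Probability")
legend("topright", legend = paste("mean =", lambdas), col = 1:3, lty = 1, pch = 16)

# Simulated draws: the sample variance is close to the sample mean
set.seed(10)
sims <- lapply(lambdas, function(l) rpois(1e5, l))
tab <- data.frame(lambda = lambdas,
                  mean = sapply(sims, mean),
                  variance = sapply(sims, var))
print(tab)
```

As the mean grows, the distribution shifts right, spreads out and becomes more symmetric.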

What is overdispersion (OD)?

  • The Poisson distribution assumes mean = variance
  • If variance > mean, a Poisson model will underestimate the variance:
    • SEs & CIs too small, 'significance' overstated
  • Many mechanisms, including:
    • Mis-specification (lack of predictors, poorly parameterised)
    • Presence of outliers
    • Variation between response probabilities
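One common check, sketched here on simulated data (the negative binomial generator and its size = 2 are assumptions chosen to create OD), is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom. It should be near 1 for Poisson data and clearly above 1 when the data are over-dispersed.

```r
set.seed(123)
n <- 500
x <- rnorm(n)
mu <- exp(2 + 0.3 * x)
y_pois <- rpois(n, mu)                   # variance = mean
y_od <- rnbinom(n, mu = mu, size = 2)    # variance = mu + mu^2 / 2, i.e. over-dispersed

# Pearson dispersion statistic: sum of squared Pearson residuals / residual df
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}
phi_pois <- pearson_dispersion(glm(y_pois ~ x, family = poisson))
phi_od <- pearson_dispersion(glm(y_od ~ x, family = poisson))
c(poisson_data = phi_pois, overdispersed_data = phi_od)
```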

Options with Poisson models:

  • Fit simple model and ignore OD
  • Fit model, then use techniques to scale/estimate OD and correct
    • Robust SE or Bootstrap
  • Use a model that accounts for this:
    • Scaled Poisson or related
    • Complex variance structure
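A sketch of two of these corrections on simulated over-dispersed data (the data-generating values are assumptions). The quasipoisson family keeps the Poisson coefficients but multiplies the SEs by the square root of the estimated dispersion. The robust SEs are computed by hand in the Huber (1967) "sandwich" form, here the simple HC0 version for a log-link Poisson model; in practice a package such as sandwich (e.g. sandwich::vcovHC()) would usually be used instead.

```r
set.seed(99)
n <- 500
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1.5 + 0.4 * x), size = 1.5)   # over-dispersed counts

fit_p  <- glm(y ~ x, family = poisson)
fit_qp <- glm(y ~ x, family = quasipoisson)   # same coefficients, scaled SEs
phi <- summary(fit_qp)$dispersion             # Pearson-based dispersion estimate
se_p  <- sqrt(diag(vcov(fit_p)))
se_qp <- sqrt(diag(vcov(fit_qp)))

# HC0 sandwich for a Poisson GLM with log link: bread %*% meat %*% bread
X  <- model.matrix(fit_p)
mu <- fitted(fit_p)
bread <- solve(crossprod(X * sqrt(mu)))       # (X' W X)^{-1}, with W = diag(mu)
meat  <- crossprod(X * (y - mu))              # sum_i (y_i - mu_i)^2 x_i x_i'
se_robust <- sqrt(diag(bread %*% meat %*% bread))

rbind(poisson = se_p, quasipoisson = se_qp, robust = se_robust)
```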


Bootstrap

  • Resampling with replacement (Efron 1979)
  • Builds an empirical sampling distribution of a statistic, e.g. the mean or a model coefficient
  • Handy, because this distribution is often approximately normal, giving simple SEs and CIs
  • R: 'boot' package, or car::Boot(), a convenience wrapper that works with glm objects
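A minimal case-resampling sketch with the boot package, assuming simulated over-dispersed data and a single predictor x: each resample refits the Poisson GLM and keeps the coefficient of x. The spread of those estimates gives an SE that does not rely on the Poisson mean = variance assumption, so it comes out larger than the model-based SE here.

```r
library(boot)

set.seed(42)
n <- 300
x <- runif(n)
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 2)   # over-dispersed counts
dat <- data.frame(x = x, y = y)

# Statistic for boot(): refit on the resampled rows, return the slope
coef_x <- function(data, idx) {
  fit <- glm(y ~ x, family = poisson, data = data[idx, ])
  coef(fit)[["x"]]
}

b <- boot(dat, statistic = coef_x, R = 999)
boot_se  <- sd(b$t[, 1])
model_se <- summary(glm(y ~ x, family = poisson, data = dat))$coefficients["x", "Std. Error"]
ci <- boot.ci(b, type = "perc")                    # percentile bootstrap CI
c(bootstrap_se = boot_se, poisson_se = model_se)
```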

Funnel plot with OD-adjusted limits

  • Funnel control limits at 2 and 3 \(\sigma\) in the left panel, inflated by an additive scale factor \(\tau^2\) in the right panel
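A sketch of additive OD-adjusted funnel limits for an observed/expected ratio, following my reading of the random-effects approach in Spiegelhalter (2005b); the data are hypothetical and the usual winsorising of z-scores is omitted for brevity. Under the target ratio of 1, each unit's SE is \(\sqrt{1/E_i}\); \(\tau^2\) is estimated from the excess of the z-scores over 1 and is added to every unit's variance, which widens the limits most where \(E_i\) is large.

```r
# Hypothetical data: 50 units with expected counts E_i and observed counts O_i
set.seed(1)
n_units <- 50
expected <- runif(n_units, 20, 500)
true_ratio <- exp(rnorm(n_units, 0, 0.15))       # genuine between-unit variation
observed <- rpois(n_units, expected * true_ratio)

ratio <- observed / expected                     # indicator, target = 1
s <- sqrt(1 / expected)                          # Poisson SE of O/E when the target holds
z <- (ratio - 1) / s
phi <- mean(z^2)                                 # dispersion estimate (not winsorised here)
w <- 1 / s^2
tau2 <- max(0, (n_units * phi - (n_units - 1)) /
               (sum(w) - sum(w^2) / sum(w)))     # additive between-unit variance

funnel_limits <- function(E, z_crit, tau2 = 0) {
  se <- sqrt(1 / E + tau2)
  data.frame(E = E, lower = 1 - z_crit * se, upper = 1 + z_crit * se)
}
E_grid <- seq(20, 500, length.out = 100)
lim_2    <- funnel_limits(E_grid, 2)             # unadjusted 2-sigma limits
lim_3    <- funnel_limits(E_grid, 3)             # unadjusted 3-sigma limits
lim_3_od <- funnel_limits(E_grid, 3, tau2)       # 3-sigma limits inflated by tau^2

plot(expected, ratio, pch = 16, xlab = "Expected count", ylab = "Observed / expected")
abline(h = 1)
lines(lim_2$E, lim_2$upper, lty = 3); lines(lim_2$E, lim_2$lower, lty = 3)
lines(lim_3$E, lim_3$upper, lty = 2); lines(lim_3$E, lim_3$lower, lty = 2)
lines(lim_3_od$E, lim_3_od$upper, col = "red"); lines(lim_3_od$E, lim_3_od$lower, col = "red")
```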

'Mixture' models (Cameron & Trivedi, 2013; Rabe-Hesketh & Skrondal, 2012)

  • Two distributions combined
  • 'Between' and 'within' variance
  • Commonly negative binomial (NB):
    • NB1: group-specific mean, multiplicative; variance \(\mu(1 + \alpha)\)
    • NB2: gamma/Poisson mixture, quadratic; variance \(\mu + \alpha\mu^2\)
  • The two weight observations differently: NB2 gives relatively higher weight to smaller counts
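A sketch of fitting NB2 with MASS::glm.nb() on simulated data (the true \(\theta\) = 2 is an assumption). Note that MASS parameterises the NB2 variance as \(\mu + \mu^2/\theta\), so \(\alpha = 1/\theta\). MASS does not fit NB1; other packages offer it (e.g. the nbinom1 family in glmmTMB).

```r
library(MASS)

set.seed(2024)
n <- 500
x <- rnorm(n)
y <- rnbinom(n, mu = exp(2 + 0.3 * x), size = 2)   # NB2 data with theta = 2

fit_pois <- glm(y ~ x, family = poisson)
fit_nb2  <- glm.nb(y ~ x)                          # estimates theta by maximum likelihood

fit_nb2$theta                                      # estimated theta (alpha = 1 / theta)
AIC(fit_pois, fit_nb2)                             # NB2 should fit far better here
```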

From mixture to mixed models (Goldstein, 2010)

  • Sometimes the data have implicit levels (a hierarchical structure)
  • Variance can be partitioned between levels:
    • E.g. patients within GP practices submitting to a trial
    • Patients followed up at several points over time
  • Breaks the usual regression assumptions of 'independence' and 'homoscedasticity'
    • Can lead to OD

Random intercept Model
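A random-intercept Poisson model can be written \( \log(\mu_{ij}) = \beta_0 + \beta_1 x_{ij} + u_j \), with \( u_j \sim N(0, \sigma_u^2) \) for group j. The sketch below uses simulated, hypothetical groups and fits the random intercept with mgcv's "re" smoother (a penalised equivalent of a Gaussian random effect); with lme4 the equivalent call would be glmer(y ~ x + (1 | group), family = poisson). Accounting for the group effect brings the dispersion back towards 1.

```r
library(mgcv)

set.seed(7)
n_groups <- 30
n_per <- 20
group <- factor(rep(seq_len(n_groups), each = n_per))
u <- rnorm(n_groups, 0, 0.5)                       # true random intercepts (sd = 0.5)
x <- rnorm(n_groups * n_per)
y <- rpois(n_groups * n_per, exp(1 + 0.4 * x + u[as.integer(group)]))
dat <- data.frame(y, x, group)

fit_glm <- glm(y ~ x, family = poisson, data = dat)    # ignores clustering
fit_ri  <- gam(y ~ x + s(group, bs = "re"), family = poisson,
               data = dat, method = "REML")            # random intercept per group

pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}
phi_glm <- pearson_dispersion(fit_glm)
phi_ri  <- pearson_dispersion(fit_ri)
c(glm = phi_glm, random_intercept = phi_ri)
gam.vcomp(fit_ri)                                      # estimated sd of the random intercept
```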

Other options

  • Principal Components / Factor Analysis
  • Generalised Additive Models
  • Tree-based models / Random Forests
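As a sketch of the first option, principal components can replace nearly collinear predictors with uncorrelated scores before fitting a Poisson model. The predictors here are simulated and keeping two components is an arbitrary assumption.

```r
set.seed(8)
n <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)                      # nearly collinear with x1
x3 <- rnorm(n)
y <- rpois(n, exp(0.5 + 0.3 * x1 + 0.3 * x2 + 0.2 * x3))

pcs <- prcomp(cbind(x1, x2, x3), center = TRUE, scale. = TRUE)
summary(pcs)                                       # proportion of variance per component

scores <- as.data.frame(pcs$x[, 1:2])              # keep the first two components
fit_pc <- glm(y ~ PC1 + PC2, family = poisson, data = cbind(scores, y = y))
summary(fit_pc)
```

The trade-off is interpretability: coefficients now refer to components rather than to the original variables.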

Generalised Additive Models (Hastie & Tibshirani, 1986)

  • Smooth functions of variables: \( g(\mu) = \beta_0 + f_1(x_1) + \dots + f_p(x_p) \), with link function g (log for Poisson)
  • Lots of options for smoothers:
    • Cubic Spline
    • Thin-plate Splines
    • Tensor products
  • Need to estimate degree of smoothness and penalty term


  • In R, the most popular package is mgcv (Wood, 2017)
    • mgcv::gam(y ~ s(x, bs = "cr"))
    • s() is a smoother construct; bs = "cr" selects a cubic regression spline basis
  • Estimates smoothing parameters as part of model fitting
  • Parametric terms can still be used
  • Random effects can be included if simple
    • More complex ones via gamm4::gamm4() (using lme4) or mgcv::gamm() (using nlme)
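A minimal mgcv sketch on simulated data: x has a non-linear (sine-shaped) effect and z is a hypothetical binary covariate kept as an ordinary parametric term. With method = "REML", the smoothing parameter for s(x) is estimated as part of the fit.

```r
library(mgcv)

set.seed(3)
n <- 400
x <- runif(n, 0, 10)
z <- rbinom(n, 1, 0.5)                             # hypothetical binary covariate
y <- rpois(n, exp(1 + sin(x) + 0.3 * z))           # non-linear effect of x

fit_gam <- gam(y ~ z + s(x, bs = "cr"), family = poisson, method = "REML")
summary(fit_gam)                                   # parametric coefficient for z, edf for s(x)
fit_gam$sp                                         # estimated smoothing parameter
plot(fit_gam, shade = TRUE)                        # estimated smooth of x
```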

GAM Pros and Cons

  • Smooth functions often represent data better than raw values
  • Requires choosing a smoother; mgcv estimates the smoothing penalty, though the basis size (number of knots, k) is set by the user
  • Can reduce overdispersion due to noise
  • Can be heavy on degrees of freedom
  • Need to be careful of over-fitting
  • More complex regression mechanisms
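Two of these points can be sketched on simulated data (the sine-shaped truth is an assumption). When a linear term mis-specifies a non-linear effect, the leftover structure shows up as apparent over-dispersion, which a smooth removes. The effective degrees of freedom show the cost: a penalised smooth uses only what it needs, whereas an unpenalised spline with a large basis (fx = TRUE, k = 40) spends far more and risks over-fitting.

```r
library(mgcv)

set.seed(11)
n <- 400
x <- runif(n, 0, 10)
y <- rpois(n, exp(1 + sin(x)))
dat <- data.frame(x, y)

pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}
fit_lin    <- glm(y ~ x, family = poisson, data = dat)               # mis-specified linear term
fit_gam    <- gam(y ~ s(x, bs = "cr"), family = poisson, data = dat, method = "REML")
fit_wiggly <- gam(y ~ s(x, bs = "cr", k = 40, fx = TRUE), family = poisson, data = dat)

phi_lin <- pearson_dispersion(fit_lin)
phi_gam <- pearson_dispersion(fit_gam)
c(linear = phi_lin, gam = phi_gam)
c(penalised_edf = sum(fit_gam$edf), unpenalised_edf = sum(fit_wiggly$edf))
```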

Random Forests (2)

  • Combine tree-based methods with bootstrapping
    • Random samples of both observations and predictors (a random subset of predictors is tried at each split)
  • Pros:
    • Predict very well
    • Less likely to over-fit
    • Linearity or distribution not really an issue
  • Cons:
    • Hard to visualise/understand
    • Not able to use random effects
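A simplified sketch of the bootstrap half of the idea, using rpart: regression trees are fitted to bootstrap resamples and their predictions averaged (bagging). This is not a full random forest, because rpart cannot sample a random subset of predictors at each split; for that, randomForest::randomForest() is the usual R choice. The data are simulated, with an assumed step-shaped effect of x1.

```r
library(rpart)

set.seed(5)
n <- 500
dat <- data.frame(x1 = runif(n), x2 = runif(n))
mu_true <- exp(1 + 2 * (dat$x1 > 0.5) + dat$x2)    # step in x1, smooth in x2
dat$y <- rpois(n, mu_true)

n_trees <- 100
# Each column holds one tree's predictions, fitted on a bootstrap resample
preds <- sapply(seq_len(n_trees), function(b) {
  idx <- sample(n, replace = TRUE)
  tree <- rpart(y ~ x1 + x2, data = dat[idx, ], method = "anova")
  predict(tree, newdata = dat)
})
bagged <- rowMeans(preds)                          # average over trees
cor(bagged, mu_true)
```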

Best options for Incident data?

  • Definite clustering:
    • Random-intercepts
  • Collinearity / Noisy data
    • Generalised Additive Models
  • Marginal model: use an additive-OD model


References

  • GOLDSTEIN, H. 2010. Multilevel Statistical Models, John Wiley & Sons Inc.
  • GREVEN, S. & KNEIB, T. 2010. On the behaviour of marginal and conditional AIC in linear mixed models. Biometrika, 97, 773-789.
  • HASTIE, T. & TIBSHIRANI, R. 1986. Generalized Additive Models. Statistical Science, 1 (3), 297-310. doi:10.1214/ss/1177013604.
  • HUBER, P. J. 1967. The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, Berkeley, Calif.: University of California Press, 221-233.
  • MCCULLAGH, P. & NELDER, J. A. 1983. Generalized Linear Models, London, Chapman & Hall.
  • NELDER, J. A. & WEDDERBURN, R. W. M. 1972. Generalized Linear Models. Journal of the Royal Statistical Society. Series A (General), 135, 370-384.
  • RABE-HESKETH, S. & SKRONDAL, A. 2012. Multilevel and Longitudinal Modeling Using Stata, Volumes I and II, 3rd ed., Taylor & Francis.
  • SPIEGELHALTER, D. J. 2005a. Funnel plots for comparing institutional performance. Stat Med, 24 (8), 1185-1202.
  • SPIEGELHALTER, D. J. 2005b. Handling over-dispersion of performance indicators. Qual Saf Health Care, 14 (5), 347-351.
  • VER HOEF, J. M. & BOVENG, P. L. 2007. Quasi-Poisson vs. negative binomial regression: how should we model overdispersed count data? Ecology, 88, 2766-2772.
  • WOOD, S. N. 2017. Generalized Additive Models: An Introduction with R, Second Edition, Florida, USA, CRC Press.