Statistics for Epidemics
Elective: Statistical considerations for epidemics

School of Medicine and Surgery F, Sapienza University of Rome
Prof. Andrea Bellelli



      Epidemics considerably change the statistical reasoning we usually apply to medical (laboratory) diagnosis, because they suddenly change the pre-test probability that appears in Bayes' formula. An epidemic is a transient condition, which appears, lasts for some time and then vanishes (if the disease is constantly present in the population we call it endemic or sub-endemic, rather than epidemic). Thus the pre-test probability is correlated to the expected or measured number of new cases registered in the period in which the laboratory test is run.
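      To make the role of the pre-test probability concrete, the following short Python sketch applies Bayes' formula to a positive test result. The function name and the test characteristics (95% sensitivity and 95% specificity) are illustrative assumptions, not values taken from any real laboratory test.

```python
def post_test_probability(pre_test, sensitivity, specificity):
    """Probability of disease given a positive test (Bayes' formula)."""
    true_positives = sensitivity * pre_test
    false_positives = (1.0 - specificity) * (1.0 - pre_test)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: 95% sensitivity, 95% specificity (assumed values).
# The pre-test probabilities mimic a disease that is rare before the
# epidemic and frequent at its peak.
for pre_test in (0.001, 0.01, 0.10, 0.30):
    post_test = post_test_probability(pre_test, 0.95, 0.95)
    print(f"pre-test {pre_test:5.3f} -> post-test {post_test:5.3f}")
```

      The same positive result is weak evidence of disease when cases are rare and strong evidence at the peak of the epidemic, although the test itself has not changed.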

      Some important concepts:
      1) The serial generation time is the clockwork of the epidemic. It is defined as the average time between contagion and the ability of the patient to transmit the disease; it may be slightly shorter than the incubation time of the disease. E.g. a person who has become infected with flu becomes infectious approx. 2-3 days later.
      2) The reproductive index R is the average number of persons to whom a single infected patient transmits the disease. In practice R measures the initial exponential growth rate of the epidemic over each serial generation time.
      3) The attack rate is the fraction of the population that has been affected at the end of the epidemic. Since every disease may give severe and mild cases, and the latter may escape diagnosis, one should distinguish the case attack rate (CAR), which is estimated on the number of diagnosed (thus severe) cases, from the infection attack rate (IAR), which is estimated on the total number of infections, including asymptomatic ones (e.g. estimated from random antibody tests in the population carried out at the end of the epidemic). Obviously: CAR < IAR.
      4) The lethality of the disease and the mortality of the epidemic are two different concepts. Lethality is the ratio of deaths to cases, and as in point 3 above, we distinguish the case fatality ratio (CFR) from the infection fatality ratio (IFR), with CFR >> IFR. Mortality is the number of deaths due to the epidemic per 100 thousand or per 1 million population. The relationship between mortality and lethality is: M = CAR x CFR = IAR x IFR. Mortality is usually measured per week, month or year; lethality is time independent.
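      The relationship M = CAR x CFR = IAR x IFR can be checked with a few lines of arithmetic. In the Python sketch below all the numbers are hypothetical (population size, IAR, IFR and the fraction of infections that are diagnosed), and it is assumed for simplicity that all deaths occur among the diagnosed cases.

```python
population = 1_000_000
iar = 0.30                  # infection attack rate (hypothetical)
ifr = 0.002                 # infection fatality ratio (hypothetical)
diagnosed_fraction = 0.20   # share of infections that are diagnosed (hypothetical)

infections = population * iar
cases = infections * diagnosed_fraction   # diagnosed cases only
deaths = infections * ifr                 # assumed to occur among diagnosed cases

car = cases / population    # case attack rate
cfr = deaths / cases        # case fatality ratio
mortality = deaths / population

print(f"CAR = {car:.3f}, IAR = {iar:.3f}")
print(f"CFR = {cfr:.4f}, IFR = {ifr:.4f}")
print(f"CAR x CFR = {car * cfr:.6f}, IAR x IFR = {iar * ifr:.6f}")
print(f"mortality = {mortality * 100_000:.1f} deaths per 100 thousand population")
```

      The fewer infections are diagnosed, the lower the CAR and the higher the CFR, while their product (the mortality) does not change.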
      Some of the best studied examples of diseases causing epidemics are as follows:
      Measles: an RNA virus of the family Paramyxoviridae transmitted by airborne spread (droplets), it has a very high transmission rate (estimated as R0, the average number of secondary cases produced by each affected individual; for measles R0 > 12) and all the population is susceptible (i.e. there is essentially no genetic resistance). Lethality in developed countries is around 0.03%. The virus does not mutate easily and thus immunity lasts for life. In the rare instances where measles was introduced into a naive community it caused attack rates higher than 90%. The vast majority of cases are clinically evident. Before the vaccine was introduced, in most countries over 90% of the population above 15 years of age presented antibodies against measles, indicative of previous disease and immunity. Measles epidemics typically occur every 5-7 years; this periodicity arises because after each epidemic the immune fraction of the population is raised above 90%, a level at which herd immunity supervenes. Thus the next epidemic has to wait for new births to restore a sufficient fraction of susceptible people in the population. Between epidemics the disease is maintained in the population under a sub-endemic condition. No animal reservoir exists. In conclusion, measles is an excellent example of an epidemic that is limited only by the availability of non-immune members of the population, and a demonstration of herd immunity.
      Influenza (flu): an RNA virus of the family Orthomyxoviridae transmitted by airborne spread (droplets), it has a high transmission rate (estimated R0 ~ 4). There is probably some genetic resistance, because the disease preferentially affects some groups over others (e.g. individuals of blood group B are affected to a higher extent). The virus causes an epidemic every winter; however, since it mutates frequently, the antigenic serotype changes, and having been affected in the preceding year may not confer immunity against the strain of the next year. Each epidemic has an attack rate of about 10% of the population; lethality is 0.1%, mostly observed in elderly people (note: measles spares elderly people because of its lifelong immunity; thus when comparing the lethality of measles and flu one should refer to specific age cohorts, rather than to the overall lethality of the two diseases). It is estimated that in winter epidemics about one third of the population may be susceptible, the remainder being at least partially immune because of some past epidemic. Flu presents a case where multiple factors may limit the attack rate; moreover, since a fraction of cases may be clinically mild, the attack rate may often be underestimated. In some cases severe pandemics were caused by the influenza virus: e.g. the Spanish flu of 1918 had a worldwide attack rate of approx. 30% (500 million cases, corresponding to one third of the world population at the time), with a lethality of 5-10% (between 20 and 50 million deaths). Notice that the experimentally determined R value at the beginning of the epidemic (Reff) equals R0 times the fraction of susceptibles; thus for most flu epidemics:
Reff = R0 * Fsusceptibles ~ 4 * 0.33 ~ 1.3.
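      The same product can be evaluated for other combinations of R0 and susceptible fraction. The Python sketch below is only illustrative: the function names are hypothetical, the measles figures (R0 = 12, 10% susceptible) are round values chosen for the example, and the threshold 1/R0 is simply the susceptible fraction at which the formula gives Reff = 1, i.e. each case produces on average one new case.

```python
def effective_r(r0, susceptible_fraction):
    """Reff = R0 * fraction of the population that is susceptible."""
    return r0 * susceptible_fraction


def critical_susceptible_fraction(r0):
    """Susceptible fraction at which Reff equals 1 (from Reff = R0 * F)."""
    return 1.0 / r0


# Flu, as in the formula above: R0 ~ 4, about one third susceptible.
print(f"flu:     Reff = {effective_r(4, 0.33):.2f}")

# Measles with illustrative round numbers: R0 = 12, 10% susceptible.
print(f"measles: Reff = {effective_r(12, 0.10):.2f}")
print(f"measles: Reff = 1 when susceptibles are "
      f"{critical_susceptible_fraction(12):.1%} of the population")
```

      Under these assumptions, a disease with R0 = 12 stops spreading only when fewer than about one person in twelve is susceptible, in line with the immune fraction above 90% quoted for measles.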
      The simplest statistical model of epidemics was developed in the early 1920s, for teaching purposes, by Lowell Reed and Wade Hampton Frost, then working at Johns Hopkins Medical School. To explore this model, on which the following discussion will be based, the student may visit the interactive program that simulates the time course of an epidemic (use the link at the end of this form).
      The model is based on the following series of assumptions:
1) The epidemic develops in a community of size N.
2) Each member of the population has equal probability of meeting any other member. The number of potentially contagious encounters (K) corresponds to the theoretical upper limit of the parameter R0 of the epidemic. This assumption is an obvious simplification because in real populations most encounters between individuals are non-random, and mainly occur in well characterized environments (family, school, work, etc.). Thus real communities are made up of more tightly integrated sub-communities, which loosely interact with each other.
3) Each member of the population is assigned only one of three possible states: S: susceptible; I: infected; R: recovered (removed, immune): SIR. The state of the individual may change along the irreversible sequence S → I → R. Notice that the state removed includes all subjects who got the disease, and does not distinguish those who healed from those who died of it, as both are incapable of transmitting the disease and further propagating the epidemic. This hypothesis is again a simplification and does not include other possible states that may occur in real epidemics, such as the state of the healthy carrier or the presence of an animal reservoir of the germ, which may participate in the transmission of the disease to humans.
4) The model calculates the probability of each individual to become affected (and subsequently to become immune) iteratively; each iteration occurs over a time window t corresponding to the average serial generation time (SGT) of the disease, i.e. the time elapsing between the contagion of a person and the transmission of the disease by that person to a further member of the community. The model assumes that the disease duration equals the SGT (i.e. infectivity by that person ends after one SGT). Real values for the SGT range from less than 3 days for flu to over 2 weeks for measles; one may imagine that in the simulation the SGT equals one week. If the duration of the disease (and its infectivity) is longer than one SGT, the model requires a correction.
      The model iteratively calculates at each SGT the probability that every susceptible member of the population becomes affected. The probability that each individual meets another is p = K / (N-1). Thus the probability that each individual does not meet a specific other is: q = 1-p.
      In the original, deterministic version of the model, employed here, the disease is always transmitted if an individual in state S meets an individual in state I; every other encounter is irrelevant to disease transmission. The probability that an individual in state S does not meet any of the I infected individuals during the time interval SGT is q^I (q raised to the power I). If this is the case, S remains S in the next SGT; otherwise S becomes I. The number of S who are converted to I in every SGT is therefore: S * (1 - q^I).
      At every SGT, however, every individual who was I in the preceding SGT is converted to R; thus individuals in state I at any SGT are only those who were just converted from S. The iterative formula that calculates the state of the population at the ith SGT is:
R_i = R_{i-1} + I_{i-1}
I_i = S_{i-1} x (1 - q^{I_{i-1}})
S_i = S_{i-1} - I_i
      At the beginning of the simulation the whole population is in state S, except one individual who is assigned state I (the zero case); a correction may be introduced to take into account vaccinated people or people made immune by a preceding epidemic.
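      The iterative formula and the initial conditions described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of the interactive program: the function name, its parameters, the stopping rule (less than half an expected infected individual) and the values N = 1000 and K = 2 are all assumptions made for the example.

```python
def reed_frost(n, k, initial_infected=1, initial_immune=0, max_generations=200):
    """Deterministic Reed-Frost model.

    Returns a list of (S, I, R) tuples, one per serial generation time.
    """
    q = 1.0 - k / (n - 1)   # probability of NOT meeting a specific other person
    s = float(n - initial_infected - initial_immune)
    i = float(initial_infected)
    r = float(initial_immune)
    history = [(s, i, r)]
    for _ in range(max_generations):
        if i < 0.5:         # assumed stopping rule: epidemic is over
            break
        new_i = s * (1.0 - q ** i)   # I_i = S_{i-1} x (1 - q^{I_{i-1}})
        r = r + i                    # R_i = R_{i-1} + I_{i-1}
        s = s - new_i                # S_i = S_{i-1} - I_i
        i = new_i
        history.append((s, i, r))
    return history


# Hypothetical community: N = 1000, K = 2 contagious encounters per SGT.
history = reed_frost(1000, 2)
for generation, (s, i, r) in enumerate(history):
    print(f"SGT {generation:2d}:  S = {s:7.1f}  I = {i:6.1f}  R = {r:7.1f}")

attack_rate = 1.0 - history[-1][0] / 1000
print(f"attack rate = {attack_rate:.2f}")
```

      Because the model is deterministic, S, I and R are expected values and need not be whole numbers; their sum remains equal to N at every SGT.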
      It will be appreciated that the model includes several simplifications:
(i) The main reason why the epidemic ends its course is the exhaustion of susceptible individuals in the population. This implies that a vast majority of the population is affected (indeed it may be demonstrated that the model is not compatible with epidemics that affect less than 50% of the susceptible individuals in the population). This is consistent with some infectious diseases (e.g. measles or smallpox) but surely not with the majority of them.
(ii) The model is incompatible with diseases that require more states than S(usceptible), I(nfected), and R(ecovered). For example it is incompatible with typhoid fever, for which the state of the healthy carrier exists, or with diseases having an animal reservoir, e.g. plague by Yersinia pestis, which affects many species of rodents.
(iii) The model cannot describe epidemics due to diseases in which the I state is prolonged and overlaps successive generation times, e.g. AIDS.
      More complex models are available to describe the above cases, but they lack the immediate simplicity of the Reed-Frost model and are difficult to grasp for physicians not specializing in medical statistics.

      The Reed-Frost model grasps some essential characteristics of epidemics, that we may summarize as follows:
(i) An epidemic has a beginning, a peak phase, and then vanishes. It may either disappear or be maintained as a sub-endemic condition. The duration of the epidemic is expressed as a multiple of its characteristic serial generation time.
(ii) Containment (e.g. quarantine) operates by reducing the number of potentially infectious encounters K; you can explore the effect of quarantine using the appropriate interactive program (link at the end of this form). This effectively reduces the average number of transmissions per infected individual (R0). Vaccination operates by directly converting S to R (recovered, immune), bypassing the I state. You can explore the effect of vaccination using the appropriate interactive program (use the link at the end of this form).
(iii) The time evolution of the epidemic dictates the pre-test probability of disease.
(iv) The actual duration and extension of the epidemic in a country may be envisaged as the sum of several interrelated Reed-Frost episodes. This is because the model assumes an equal probability of encounter among the members of the population and is suitable to describe an epidemic affecting a village or a small city. Over larger expanses of space the epidemic has to be carried from one village to another. Each village or city independently follows the Reed-Frost model but starts at a different time. In some cases it is possible to trace an actual path of the diffusion of the epidemic, which sequentially affects villages and cities along a communication pathway. Obviously, in this case the duration of the epidemic is longer than one would expect on the basis of its serial generation time. The student may explore this type of progression using the interactive program for the "two villages" case (use the link at the end of this form).
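      The effects of containment and vaccination described in point (ii) can be illustrated with the same iteration. The Python sketch below is not the code of the interactive programs: the function name and all parameter values (N = 1000, K = 3, K reduced to 1.5 or 0.5 by quarantine, 50% of the population vaccinated) are hypothetical choices made for the example.

```python
def attack_rate(n, k, immune=0, generations=200):
    """Fraction of the whole population infected in a deterministic
    Reed-Frost epidemic started by a single index case."""
    q = 1.0 - k / (n - 1)             # probability of not meeting a given person
    s, i = float(n - 1 - immune), 1.0
    for _ in range(generations):
        new_i = s * (1.0 - q ** i)    # newly infected in this SGT
        s, i = s - new_i, new_i
    return (n - immune - s) / n       # everyone who left state S via I


n = 1000
scenarios = {
    "no intervention (K = 3)":      attack_rate(n, 3),
    "quarantine (K = 1.5)":         attack_rate(n, 1.5),
    "strict quarantine (K = 0.5)":  attack_rate(n, 0.5),
    "50% vaccinated (K = 3)":       attack_rate(n, 3, immune=500),
}
for label, value in scenarios.items():
    print(f"{label:30s} attack rate = {value:.3f}")
```

      In this sketch quarantine acts only through K, while vaccination acts only through the initial number of susceptibles; the two interventions could of course be combined in the same call.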

      The fraction of the members of the population that contracted the disease is called the attack rate of the epidemic. A feature of the Reed-Frost model is that an epidemic ends because of the decrease of the Susceptible population below the threshold required for disease transmission. This threshold depends on the parameter K, but is never lower than half of N, i.e. an epidemic that obeys this mechanism cannot affect less than 50% of the population. This is plausible for some diseases, e.g. measles and smallpox occurring in a previously naive population, but many epidemics stop long before reaching this threshold. Several factors may contribute to lowering the attack rate; an incomplete list is as follows:
1) The probability of disease transmission depends on variable external factors (e.g. climate, in which case the epidemic follows a seasonal course).
2) Contagion affects only or preferentially a fraction of the population because of some risk factor (e.g. old age, risky lifestyle, professional exposure, living under poor hygienic conditions, etc.).
3) Some members of the population, though not immune, present different proneness to develop the disease (a paradigmatic case is that of the different sensitivity to HIV due to genetic polymorphism of the CCR5 receptor). A very minor modification of the model can be developed to take into account the last possibility, as the student may verify using the interactive program for the "two sub-populations" case (link at the end of this form).
4) A fraction of the population is immune because of vaccination or because of a previous epidemic by the same or a related germ. For example, the usual condition for measles in the pre-vaccine era was to cause epidemics in populations that were immune for 80% or more because of previous epidemics.
5) A significant fraction of the cases of disease may run an asymptomatic course and not be diagnosed. In this case the attack rate may be quite high, but it appears low because many cases are not correctly diagnosed. The measure of the fraction of the population presenting specific antibodies at the end of the epidemic may provide an estimate of the effective attack rate. A very interesting case occurs if the mild cases are also poorly contagious, and behave like a sort of vaccination experiment occurring in parallel with the disease. The effect of subclinical cases on the attack rate is simulated by the appropriate interactive program (use the link at the end of this form).

Interactive exercises (it is strongly suggested that you follow the order indicated below):
the Reed-Frost model
the effect of quarantine
the case of two villages
the effect of vaccination
the presence of clinically silent cases
the case of two sub-populations
