Statistics for Epidemics
Elective: Statistical considerations for epidemics

School of Medicine and Surgery F, Sapienza University of Rome
Prof. Andrea Bellelli



      Epidemics considerably change the statistical reasoning we usually apply to medical (laboratory) diagnosis because they suddenly change the pre-test probability that appears in Bayes' formula. An epidemic is a transient condition, which appears, lasts for some time and then vanishes (if the disease is constantly present in the population we call it endemic or sub-endemic, rather than epidemic). Thus the pre-test probability depends on the expected or measured number of new cases registered in the period in which the laboratory test is run.
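      The dependence of the test result on the pre-test probability can be illustrated with a minimal sketch of Bayes' formula for a positive test. The function name and the sensitivity and specificity values (95% each) are hypothetical choices made for illustration only; they do not refer to any specific laboratory test.

```python
def post_test_probability(pre_test, sensitivity, specificity):
    """Probability of disease given a positive test (Bayes' formula)."""
    true_positives = sensitivity * pre_test
    false_positives = (1.0 - specificity) * (1.0 - pre_test)
    return true_positives / (true_positives + false_positives)


# Hypothetical test with 95% sensitivity and 95% specificity, applied at
# three pre-test probabilities (outside, at the start of, and during an epidemic).
for pre_test in (0.001, 0.01, 0.10):
    post = post_test_probability(pre_test, 0.95, 0.95)
    print(f"pre-test probability {pre_test:.3f} -> post-test probability {post:.3f}")
```

      The same positive result is much more convincing when the test is run during the epidemic peak than when the disease is rare.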

      Some important concepts:
      1) The serial generation time is the clockwork of the epidemic. It is defined as the average time between contagion and the ability of the patient to transmit the disease; it may be slightly shorter than the incubation time of the disease. E.g. a person who has become infected with flu becomes infectious approx. 2-3 days later.
      2) The reproductive index R is the average number of persons to whom a single infected patient transmits the disease. In practice R measures the initial exponential growth rate of the epidemic over each serial generation time.
      3) The attack rate is the fraction of the population that has been affected at the end of the epidemic. Since every disease may give severe and mild cases, and the latter may escape diagnosis, one should distinguish the case attack rate (CAR), which is estimated on the number of diagnosed (thus mostly severe) cases, from the infection attack rate (IAR), which is estimated on the total number of infections, including asymptomatic ones (e.g. estimated from random antibody tests in the population carried out at the end of the epidemic). Obviously: CAR < IAR.
      4) The lethality of the disease and the mortality of the epidemic are two different concepts. Lethality is the ratio of deaths to cases, and as in point 3 above, we distinguish the case fatality ratio (CFR) from the infection fatality ratio (IFR), with CFR >> IFR. Mortality is the number of deaths due to the epidemic per 100 thousand or per 1 million population. The relationship between mortality and lethality is: M = CAR x CFR = IAR x IFR. Mortality is usually measured per week, month or year; lethality is time independent.
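      The relationship M = CAR x CFR = IAR x IFR can be checked with a short worked example. The numbers below are purely illustrative assumptions (a case attack rate of 10%, a case fatality ratio of 0.1% and an infection attack rate of 25%), and the function name is hypothetical.

```python
def mortality_per_100k(attack_rate, fatality_ratio):
    """Deaths per 100,000 population: attack rate times fatality ratio."""
    return attack_rate * fatality_ratio * 100_000


# Illustrative (assumed) values, not measured data.
car = 0.10    # case attack rate: diagnosed cases / population
cfr = 0.001   # case fatality ratio: deaths / diagnosed cases
iar = 0.25    # infection attack rate: all infections / population

mortality = mortality_per_100k(car, cfr)

# Since the deaths are the same, the IFR follows from M = IAR x IFR.
ifr = car * cfr / iar

print(f"mortality: {mortality:.1f} deaths per 100,000")
print(f"implied IFR: {ifr:.4%} (lower than the CFR of {cfr:.4%})")
```

      Because the same deaths are divided by a larger number of infections, the IFR is always lower than the CFR whenever some infections escape diagnosis.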
      MODE OF TRANSMISSION OF EPIDEMIC DISEASES: Here we consider only epidemics due to infectious diseases that can be transmitted from the sick to the healthy. These diseases can be due to parasites, prokaryotes and viruses. Some epidemics are not due to transmissible diseases, but to environmental conditions that affect the whole population at the same time, e.g. scurvy, iodine-deficiency hypothyroidism, rickets; these are not considered here. The modes of transmission of epidemic infectious diseases vary:
- airborne diseases are infectious diseases affecting the respiratory tree; the patient emits droplets of saliva charged with the germ when coughing; the healthy persons surrounding the patient inhale these droplets and their respiratory tree is colonized by the germ they contain. Examples: flu, measles, smallpox, ...
- fecal-oral route: the germ is ingested along with contaminated water or, more rarely, food, and colonizes the digestive tract. It is emitted with the feces and contaminates the soil or water reservoirs from which it finds its way to a new host. This type of contagion is effectively counteracted by public hygiene measures (safe drainage systems). Examples: cholera, typhoid fever, dysentery, poliomyelitis, ...
- sexually transmitted diseases and diseases transmitted by contact with contaminated blood. Examples: syphilis, AIDS, Ebola, ...
- vector-borne diseases. Examples: malaria (mosquito), plague (flea), yellow fever (mosquito), ... In some cases reduction of the vector population may have a protective effect.

Accessory modes of transmission:
- fomites are inanimate objects that may host the germ (or its vector) in a living and infectious state. Example: blankets may become contaminated with smallpox virus.
- animal reservoirs: in some cases animals may become infected and transmit the disease to humans, directly or via vectors. Examples: rodents are the natural reservoirs of plague; Ebola is an infection of free-living primates; ...
- Poor public hygiene: inefficient drainage and disposal of sewage; unsafe sources of drinking water, ...
- Poor private hygiene: absence of such measures as bathing and frequent handwashing, poor housing conditions, ...
- Overcrowding
- Insufficient caloric intake: famine, poverty, natural or man-made catastrophes, ...
- Cultural factors: promiscuity, social gatherings, ...
Epidemics are the field of medicine that is most strongly dependent on political and social issues.
      A very important instrument to follow the course of a severe epidemic is the statistics of mortality for all causes, where epidemics appear as sudden peaks. The reason for the importance of mortality tables is that diagnoses are uncertain, mild cases may escape detection, causes of death may be recorded incorrectly, etc.; however, we have no doubts about general mortality. The figure below reports the general mortality in Europe, as monitored by the European Mortality Monitor (Euromomo), a facility provided by the Statens Serum Institut (Copenhagen, DK; link at the end of this page):

      Some of the best-studied examples of diseases causing epidemics are as follows:
      Measles: an RNA virus of the family Paramyxoviridae transmitted by airborne spread (droplets); it has a very high transmission rate (estimated as R0, the average number of secondary cases produced by each affected individual; for measles R0 > 12) and the whole population is susceptible (i.e. there is essentially no genetic resistance). Lethality in developed countries is around 0.03%. The virus does not mutate easily and thus immunity lasts for life. In the rare instances where measles was introduced into a naive community it caused attack rates higher than 90%. The vast majority of cases are clinically evident. Before the vaccine was introduced, in most countries over 90% of the population above 15 years of age presented antibodies against measles, indicative of previous disease and immunity. Measles epidemics typically occur every 5-7 years; this periodicity is explained by the fact that after each epidemic the immune fraction of the population is raised above 90%, a level at which herd immunity supervenes. Thus the next epidemic has to wait for new births to restore a sufficient fraction of susceptible people in the population. Between epidemics the disease is maintained in the population in a sub-endemic condition. No animal reservoir exists. In conclusion, measles is an excellent example of an epidemic that is limited only by the availability of non-immune members of the population, and a demonstration of herd immunity.
      Influenza (flu): an RNA virus of the family Orthomyxoviridae transmitted by airborne spread (droplets); it has a high transmission rate (estimated R0 ~ 4). Probably there is some genetic resistance, because the disease preferentially affects some groups over others (e.g. individuals of blood group B are affected to a higher extent). The virus causes an epidemic every winter; however, since it mutates frequently, the antigenic serotype changes and having been affected in the preceding year may not confer immunity to the strain of the next year. Each epidemic has an attack rate of about 10% of the population; lethality is 0.1%, mostly observed in elderly people (note: measles spares elderly people because of its lifelong immunity; thus when comparing the lethality of measles and flu one should refer to specific age cohorts, rather than to the overall lethality of the two diseases). It is estimated that in winter epidemics about one third of the population may be susceptible, the remainder being at least partially immune because of some past epidemic. Flu presents a case where multiple factors may limit the attack rate; moreover, since a fraction of cases may be clinically mild, the attack rate may often be underestimated. In some cases severe pandemics were caused by the influenza virus: e.g. the Spanish flu of 1918 had a worldwide attack rate of approx. 30% (500 million cases, corresponding to one third of the world population at the time), with a lethality of 5-10% (between 20 and 50 million deaths). Notice that the experimentally determined R value at the beginning of the epidemic (Reff) equals R0 times the fraction of susceptibles; thus for most flu epidemics:
Reff = R0 * Fsusceptibles ~ 4 * 0.33 ~ 1.3.
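      The arithmetic above, together with the herd immunity level mentioned for measles, can be sketched in a few lines. The function names are hypothetical, and the threshold 1 - 1/R0 is the standard condition Reff < 1 under the assumption of a randomly mixing population; it is an addition for illustration, not a formula stated in this text.

```python
def effective_r(r0, susceptible_fraction):
    """Reff = R0 times the fraction of susceptibles in the population."""
    return r0 * susceptible_fraction


def herd_immunity_threshold(r0):
    """Immune fraction above which Reff < 1, i.e. 1 - 1/R0 (random mixing assumed)."""
    if r0 <= 1.0:
        return 0.0  # the disease cannot spread even in a fully susceptible population
    return 1.0 - 1.0 / r0


# Values quoted in the text: flu R0 ~ 4 with one third susceptible; measles R0 > 12.
print(f"flu Reff: {effective_r(4.0, 0.33):.2f}")
print(f"flu herd immunity threshold: {herd_immunity_threshold(4.0):.0%}")
print(f"measles herd immunity threshold (R0 = 12): {herd_immunity_threshold(12.0):.0%}")
```

      For R0 = 12 the threshold is slightly above 90%, consistent with the immune fraction quoted for measles.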
      The crucial rule that governs the course of epidemics is: the disease heals because the patient develops an immune response, whose duration is long with respect to the course of the epidemic; the epidemic ends because a significant fraction of the population has become immune. This rule has exceptions (e.g. cholera heals because the severe diarrhoea flushes away the germ; immunity is not necessary, and reinfections are possible). The simplest models of epidemics assume this rule.
      The probability of disease transmission is a function of the number of sick people and that of susceptible people. At the beginning of the epidemic, the number of susceptibles is large (very few immunes), and the epidemic grows exponentially because of the increase of contagious sick people. However, the exponential growth phase is rapidly replaced by an almost linear growth process due to: (i) the decrease of susceptibles (who become sick and then immune), and (ii) the uneven mixing of the population, which is constituted by many loosely interacting small groups, whose members interact strongly with each other (and transmit the disease very efficiently within the group; see the exercise of two villages, below).
      When a sufficient fraction of the population has been immunized (because of the disease or of vaccination) the probability that a sick person encounters a non-immune person is reduced and the epidemic ends before having attacked 100% of the population: this is the so-called population (or herd) immunity, whereby the immunes protect the susceptibles. Note that the end of an epidemic does not eradicate the causative germ, which remains active in the population in a sporadic form and may cause successive episodes of the same epidemic.
      The simplest statistical model of epidemics was developed in the 1920s, for teaching purposes, by Lowell Reed and Wade Hampton Frost, then working at Johns Hopkins Medical School. To explore this model, on which the following discussion will be based, the student may visit the interactive program that simulates the time course of an epidemic (use the link at the end of this form).

      The model is based on the following series of assumptions:
1) The epidemic develops in a community of size N.
2) Each member of the population has equal probability of meeting any other member. The number of potentially contagious encounters (K) corresponds to the theoretical upper limit of the parameter R0 of the epidemic. This assumption is an obvious simplification because in real populations most encounters between individuals are non-random, and mainly occur in well characterized environments (family, school, work, etc.). Thus real communities are made up of more tightly integrated sub-communities, which loosely interact with each other.
3) Each member of the population is assigned only one of three possible states: S: susceptible; I: infected; R: recovered (removed, immune): SIR. The state of the individual may change along the irreversible sequence S → I → R. Notice that the state removed includes all subjects who got the disease, and does not distinguish between those who healed and those who died of it, as both are incapable of transmitting the disease and of further propagating the epidemic. This hypothesis is again a simplification and does not include other possible states that may occur in real epidemics, such as the state of the healthy carrier or the presence of an animal reservoir of the germ, which may participate in the transmission of the disease to humans.
4) The model calculates the probability of each individual becoming affected (and subsequently becoming immune) iteratively; each iteration occurs over a time window t corresponding to the average serial generation time (SGT) of the disease, i.e. the time elapsing between the contagion of a person and the transmission of the disease by that person to a further member of the community. The model assumes that the disease duration equals the SGT (i.e. the end of infectivity by that person). Real values for the SGT vary between less than 3 days for flu to over 2 weeks for measles; one may imagine that in the simulation the SGT equals one week. If the duration of the disease (and its infectivity) is longer than one SGT, the model requires a correction.

      The model iteratively calculates at each SGT the probability that every susceptible member of the population becomes affected. The probability that each individual meets another is p = K / (N-1). Thus the probability that each individual does not meet a specific other is: q = 1-p.
      In the original, deterministic version of the model, employed here, disease is always transmitted if an individual in state S meets an individual in state I; every other encounter is irrelevant to disease transmission. The probability that an individual in state S does not meet any of the I infected individuals during the time interval SGT is q^I (q raised to the power I). If this is the case, S remains S in the next SGT; otherwise S becomes I. The number of S who are converted to I in every SGT is therefore: S * (1 - q^I).
      At every SGT, however, every individual who was I in the preceding SGT is converted to R; thus individuals in state I at any SGT are only those who were just converted from S. The iterative formula that calculates the state of the population at the ith SGT is:
R_i = R_{i-1} + I_{i-1}
I_i = S_{i-1} x (1 - q^{I_{i-1}})
S_i = S_{i-1} - I_i
      At the beginning of the simulation the whole population is in state S, except one individual who is assigned state I (the zero case); a correction may be introduced to take into account vaccinated people or people made immune by a preceding epidemic.
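      The iterative scheme described above can be sketched in a few lines of code. This is a minimal illustration of the deterministic recursion, not the interactive program linked from this page; the function name, its parameters and the example values (N = 1000, K = 2) are assumptions chosen for the example.

```python
def reed_frost(n, k, generations, initially_immune=0.0):
    """Deterministic Reed-Frost recursion.

    n: community size; k: potentially contagious encounters per individual
    per serial generation time (0 <= k <= n - 1); initially_immune: number of
    individuals placed in state R at the start (hypothetical extra parameter).
    Returns a list of (S, I, R) tuples, one per serial generation time.
    """
    if n < 2 or not 0.0 <= k <= n - 1:
        raise ValueError("need n >= 2 and 0 <= k <= n - 1")
    q = 1.0 - k / (n - 1)              # probability of NOT meeting a specific individual
    s = n - 1.0 - initially_immune     # everybody is susceptible, except...
    i = 1.0                            # ...the zero case...
    r = float(initially_immune)        # ...and those already immune
    history = [(s, i, r)]
    for _ in range(generations):
        new_i = s * (1.0 - q ** i)     # S who met at least one infected individual
        r = r + i                      # last generation's infected are removed
        s = s - new_i
        i = new_i
        history.append((s, i, r))
    return history


for generation, (s, i, r) in enumerate(reed_frost(n=1000, k=2.0, generations=20)):
    print(f"SGT {generation:2d}: S = {s:7.1f}  I = {i:6.1f}  R = {r:7.1f}")
```

      The states are kept as real numbers (expected values) rather than integers, which is one possible reading of the deterministic version; the sum S + I + R stays equal to N at every step.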

      It will be appreciated that the model includes several simplifications:
(i) the main reason why the epidemic ends its course is the exhaustion of susceptible individuals in the population. This implies that a vast majority of the population is affected (indeed it may be demonstrated that, for K ≥ 2, the model is not compatible with epidemics that affect less than 50% of the susceptible individuals in the population). This is consistent with some infectious diseases (e.g. measles or smallpox) but surely not with the majority of them.
(ii) The model is incompatible with diseases that require more states than S(usceptible), I(nfected), and R(ecovered). For example it is incompatible with typhoid fever, for which the state of the healthy carrier exists, or with diseases having an animal reservoir, e.g. plague by Yersinia pestis, which affects many species of rodents.
(iii) The model cannot describe epidemics due to diseases in which the I state is prolonged and overlaps successive generation times, e.g. AIDS.
      More complex models are available to describe the above cases, but they lack the immediate simplicity of the Reed-Frost model and are difficult to grasp for physicians not specializing in medical statistics.

      The Reed-Frost model captures some essential characteristics of epidemics, which we may summarize as follows:
(i) an epidemic has a beginning, a peak phase, and then vanishes. It may either disappear or be maintained as a sub-endemic condition. The duration of the epidemic is expressed as a multiple of its characteristic serial generation time.
(ii) Containment (e.g. quarantine) operates by reducing the number of potentially infectious encounters K; you can explore the effect of quarantine using the appropriate interactive program (link at the end of this form). This effectively reduces the average number of transmissions per infected individual (R0). Vaccination operates by directly converting S to R (recovered, immune), bypassing the I state. You can explore the effect of vaccination using the appropriate interactive program (use the link at the end of this form).
(iii) The time evolution of the epidemic dictates the pre-test probability of disease.
(iv) The actual duration and extension of the epidemic in a country may be envisaged as the sum of several interrelated Reed-Frost episodes. This is because the model assumes an equal probability of encounter among the members of the population and is suitable to describe an epidemic affecting a village or a small city. Over larger expanses of space the epidemic has to be carried from one village to another. Each village or city independently follows the Reed-Frost model but starts at a different time. In some cases it is possible to trace an actual path of diffusion of the epidemic, which sequentially affects villages and cities along a communication pathway. Obviously, in this case the duration of the epidemic is longer than one would expect on the basis of its serial generation time. The student may explore this type of progression using the interactive program for the two villages case (use the link at the end of this form).
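      The effects of quarantine and vaccination described in point (ii) can be compared with a small, self-contained sketch of the deterministic recursion. This is not the interactive program linked from this page: the function name, the community size and the values of K and of the vaccinated fraction are assumptions chosen for illustration, and the attack rate is computed here as the fraction of the whole population that went through state I.

```python
def final_attack_rate(n, k, immune_fraction=0.0, max_generations=500):
    """Fraction of the population infected by the end of a Reed-Frost epidemic.

    Quarantine is represented by a lower k; vaccination by a non-zero
    immune_fraction (individuals moved from S to R before the epidemic).
    """
    q = 1.0 - k / (n - 1)                    # probability of not meeting a given individual
    s = n - 1.0 - immune_fraction * n        # susceptibles, excluding the zero case
    i = 1.0                                  # the zero case
    infected_total = 1.0
    for _ in range(max_generations):
        new_i = s * (1.0 - q ** i)
        s -= new_i
        i = new_i
        infected_total += new_i
        if i < 1e-6:                         # epidemic is over
            break
    return infected_total / n


n = 10_000
scenarios = {
    "baseline (K = 3)": final_attack_rate(n, 3.0),
    "quarantine (K = 1.5)": final_attack_rate(n, 1.5),
    "vaccination of 50% (K = 3)": final_attack_rate(n, 3.0, immune_fraction=0.5),
    "vaccination of 70% (K = 3)": final_attack_rate(n, 3.0, immune_fraction=0.7),
}
for name, rate in scenarios.items():
    print(f"{name}: attack rate {rate:.1%}")
```

      In the last scenario the immune fraction exceeds 1 - 1/K, so each case produces on average less than one new case and the epidemic dies out after a handful of infections.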

      The fraction of the members of the population that contracted the disease is called the attack rate of the epidemic. A feature of the Reed-Frost model is that an epidemic ends because of the decrease of the Susceptible population below the threshold required for disease transmission. This threshold depends on the parameter K, but for K ≥ 2 it is never lower than half of N, i.e. an epidemic that obeys this mechanism cannot affect less than 50% of the population. This is plausible for some diseases, e.g. measles and smallpox occurring in a previously naive population, but many epidemics stop long before reaching this threshold. There are several possible reasons that contribute to lowering the attack rate, an incomplete list of which is as follows:
1) The probability of disease transmission depends on variable external factors (e.g. climate, in which case the epidemic follows a seasonal course).
2) Contagion affects only or preferentially a fraction of the population because of some risk factor (e.g. old age, risky lifestyle, professional exposure, living under poor hygienic conditions, etc.).
3) Some members of the population, though not immune, present different proneness to develop the disease (a paradigmatic case is that of the different sensitivity to HIV due to genetic polymorphism of the CCR5 receptor). A very minor modification of the model can be developed to take this possibility into account, as the student may verify using the interactive program for the two sub-populations case (link at the end of this form).
4) A fraction of the population is immune because of vaccination or because of a previous epidemic by the same or a related germ. For example, the usual condition for measles in the pre-vaccine era was to cause epidemics in populations that were 80% or more immune because of previous epidemics.
5) A significant fraction of the cases of disease may run an asymptomatic course and not be diagnosed. In this case the attack rate may be quite high, but it appears low because many cases are not correctly diagnosed. The measurement of the fraction of the population presenting specific antibodies at the end of the epidemic may provide an estimate of the effective attack rate. A very interesting case occurs if the mild cases are also poorly contagious, and behave like a sort of vaccination experiment occurring in parallel with the disease. The effect of subclinical cases on the attack rate is simulated by the appropriate interactive program (use the link at the end of this form).
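      Point 3 above (sub-populations with different proneness to the disease) can be sketched as follows. The specific modification used here is an assumption made for illustration and may differ from the one implemented in the interactive program: a "resistant" group is infected, at each encounter with an infected individual, with a reduced probability (relative_susceptibility), while the rest of the population is infected at every such encounter. All names and values are hypothetical.

```python
def two_group_attack_rate(n, k, resistant_fraction, relative_susceptibility,
                          max_generations=2000):
    """Final attack rate of a Reed-Frost epidemic with two sub-populations.

    resistant_fraction: fraction of the population in the less susceptible group.
    relative_susceptibility: probability (0..1) that an encounter with an infected
    individual transmits the disease to a member of that group (assumed mechanism).
    """
    p = k / (n - 1)                          # probability of meeting a given individual
    s_resistant = resistant_fraction * n
    s_normal = n - 1.0 - s_resistant         # the zero case belongs to the normal group
    i = 1.0
    infected_total = 1.0
    for _ in range(max_generations):
        new_normal = s_normal * (1.0 - (1.0 - p) ** i)
        new_resistant = s_resistant * (1.0 - (1.0 - relative_susceptibility * p) ** i)
        s_normal -= new_normal
        s_resistant -= new_resistant
        i = new_normal + new_resistant       # both groups are assumed equally infectious
        infected_total += i
        if i < 1e-6:
            break
    return infected_total / n


homogeneous = two_group_attack_rate(10_000, 2.0, 0.0, 1.0)
mixed = two_group_attack_rate(10_000, 2.0, 0.5, 0.1)
print(f"homogeneous population: attack rate {homogeneous:.1%}")
print(f"half of the population ten times less susceptible: attack rate {mixed:.1%}")
```

      With these assumed values the attack rate falls well below 50%, although nobody in the population is immune at the start.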

Attendance will be registered after completion of the exercises; you can repeat each of them as many times as you like (remember to press the send button at the end of each page). It is strongly suggested that you follow the order indicated below.
the Reed-Frost model
the effect of quarantine
the case of two villages
the effect of vaccination
the presence of clinically silent cases
the case of two sub-populations

      Home of this course; a more refined epidemic model by Kermack and McKendrick that you can also try. You can also visit the web page of the European Mortality Monitor (Euromomo).