SARS-CoV-2 Superspreading Events Database

Koen Swinkels
Jun 12, 2020

2,000+ Superspreading Events From Around the World

Latest article update: November 21. Latest database update: January 9, 2021.

SARS-CoV-2 “superspreading events” (SSEs) occur when a large number of people are infected with SARS-CoV-2 during the same event.

SSEs appear to be the driving force behind the current pandemic. Most infected people don’t infect anybody else, and the ones that do typically only infect one or a few others. But a small number of people infect a lot of others. Multiple studies show that 70–80% of transmissions can be traced back to just 10–20% of cases.
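The 70–80% / 10–20% pattern describes how concentrated transmission is. Suppose we know, for each case, how many people it infected. The minimal sketch below computes the share of transmissions caused by the most infectious cases. The offspring counts are made up for illustration and are not taken from the database or from the studies mentioned. The function name `top_share_of_transmissions` is hypothetical.

```python
from typing import Sequence


def top_share_of_transmissions(secondary_cases: Sequence[int], top_fraction: float) -> float:
    """Share of all transmissions caused by the most infectious `top_fraction` of cases.

    secondary_cases[i] is the number of people infected by case i.
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    total = sum(secondary_cases)
    if total == 0:
        return 0.0
    # Rank cases from most to least infectious.
    ranked = sorted(secondary_cases, reverse=True)
    # Number of cases in the top fraction, rounded to the nearest whole case (at least one).
    n_top = max(1, round(top_fraction * len(ranked)))
    return sum(ranked[:n_top]) / total


# Hypothetical offspring counts for 20 cases: 13 infect nobody, 2 infect many.
cases = [0] * 13 + [1, 1, 1, 2, 3, 9, 15]
print(top_share_of_transmissions(cases, 0.10))  # 0.75
print(top_share_of_transmissions(cases, 0.20))  # 0.90625
```

With these invented numbers, 65% of cases infect nobody, while the top 10% of cases account for 75% of transmissions. This is the same qualitative pattern the studies describe.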

Therefore, in order to have a better chance of containing the virus it is important to investigate the features of SSEs. Specifically, knowing in what types of settings SSEs typically occur may help the public, public health professionals, policy makers, organizations and industry prevent SSEs from happening in the future.

To that end, we have compiled this database, which now contains more than 1,600 SSEs from around the world. For each SSE, the following features are identified:

  • what the location of the event was
  • when the event occurred
  • how many people were infected directly and/or indirectly
  • what kind of setting the event took place in
  • what kind of activity took place
  • whether the event took place indoors or outdoors
  • whether the event occurred during flu season in that location
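A single entry with the features listed above can be represented as a small record type. This is only a sketch. The class name `SuperspreadingEvent`, its field names and the `Environment` values are hypothetical and do not necessarily match the spreadsheet's actual columns.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Environment(Enum):
    INDOOR = "indoor"
    OUTDOOR = "outdoor"
    BOTH = "both"  # indoor and outdoor elements; place of transmission unclear


@dataclass(frozen=True)
class SuperspreadingEvent:
    location: str                  # where the event took place
    index_date: Optional[date]     # when it occurred; None if unknown
    direct_cases: int              # people infected at the event itself
    indirect_cases: Optional[int]  # secondary/tertiary cases, if reported
    setting: str                   # kind of setting, e.g. "nursing home"
    activity: str                  # kind of activity, e.g. "singing"
    environment: Environment       # indoors, outdoors or both
    flu_season: bool               # during flu season in that location?

    def total_cases(self) -> int:
        """Direct plus (if known) indirect infections linked to the event."""
        return self.direct_cases + (self.indirect_cases or 0)


# Hypothetical example entry (not a real record from the database).
event = SuperspreadingEvent(
    location="Example City",
    index_date=date(2020, 3, 10),
    direct_cases=30,
    indirect_cases=5,
    setting="choir rehearsal",
    activity="singing",
    environment=Environment.INDOOR,
    flu_season=True,
)
print(event.total_cases())  # 35
```

Keeping `indirect_cases` separate from `direct_cases` makes it explicit when a total includes secondary or tertiary infections. Whether such infections are counted varies between entries in the database.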

The term “events” in this context should be understood in a broad sense, in that the superspreading can take place:

  • at the same venue during a short period of time (e.g. at a party)
  • at the same venue during an extended period of time (e.g. in a nursing home or prison over the course of several days or weeks)
  • at different venues visited by the same superspreader(s) (e.g. an infectious person going on a pub crawl)

Database

Click here to go to the database. Please read the Notes sheet for more information about the database and its limitations.

The database as well as this article will continue to be updated with new SSEs and new information.

The project now also has a website.


Bubble Map with Animated Timeline

Click here to go to the SSE bubble map. Zoom in and click on a bubble to see more information about an SSE.


And click here to see the same map but with an animated timeline that shows the superspreading events as they occurred in time. You can play and pause the animation, and click on a bubble to find more information about an event.

Please note that for most of the large SSEs in the US as well as in some other countries the index dates in the database are placeholders (in the database these cells are marked light red in the Index Date column).

The focus of the database is on the features of the settings in which SSEs took place, and on the general period in which they occurred, specifically whether they occurred in flu season or not. Finding the exact dates for more than 900 SSEs would require a disproportionate amount of time. This does mean, however, that no conclusions about the spread of the virus in the US and those other countries should be drawn from the timeline.

Preliminary Results

  • Nearly all SSEs in the database took place indoors: the exceptions are SSEs that took place in settings with both indoor and outdoor elements and where it is not clear whether transmission occurred indoors or outdoors
  • The vast majority of SSE transmissions took place in settings where people were essentially confined together for a prolonged period (for example, nursing homes, prisons, cruise ships, worker housing)
  • A feature of these settings is that it is typically outsiders, rather than the people who live or work in them (or their relatives), who control the circumstances in which they live or work (nursing home residents, hospital patients and inmates typically have little control over the precautions they can take)
  • The great majority of SSEs happened during flu season in that location
  • Food processing plants where temperatures are kept very low (meat, dairy, frozen foods) seem particularly vulnerable to SSEs compared to other types of factories and plants where very few SSEs occurred
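Results like these come down to simple tallies over the entries. The minimal sketch below assumes each event is a plain dict with hypothetical keys `setting`, `environment`, `flu_season` and `cases`. The four example events are invented.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def summarize(events: Iterable[Mapping[str, Any]]) -> dict[str, Any]:
    """Tally indoor share, flu-season share and infections per setting."""
    events = list(events)
    if not events:
        raise ValueError("no events to summarize")
    n = len(events)
    # Only events classified as purely indoor count here; "both" is kept separate.
    indoor_share = sum(e["environment"] == "indoor" for e in events) / n
    flu_share = sum(bool(e["flu_season"]) for e in events) / n
    # Weight settings by number of infections, not just number of events.
    cases_by_setting: Counter = Counter()
    for e in events:
        cases_by_setting[e["setting"]] += e["cases"]
    return {
        "indoor_share": indoor_share,
        "flu_season_share": flu_share,
        "cases_by_setting": cases_by_setting,
    }


events = [
    {"setting": "nursing home", "environment": "indoor", "flu_season": True, "cases": 40},
    {"setting": "meat processing plant", "environment": "indoor", "flu_season": False, "cases": 100},
    {"setting": "wedding", "environment": "both", "flu_season": True, "cases": 20},
    {"setting": "nursing home", "environment": "indoor", "flu_season": True, "cases": 25},
]
summary = summarize(events)
print(summary["indoor_share"], summary["flu_season_share"])  # 0.75 0.75
print(summary["cases_by_setting"].most_common(1))  # [('meat processing plant', 100)]
```

Weighting by infections rather than by events matters for statements about SSE transmissions, because a handful of very large outbreaks can dominate the case totals.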

These results also point to a theoretical conclusion about the nature of transmission. It seems difficult to explain superspreading events, and the fact that they are vastly more common indoors than outdoors, without acknowledging that airborne transmission is key:

  1. If transmission is primarily through large droplets that don’t travel farther than 6 feet and that fall to the ground almost immediately, we shouldn’t expect superspreading events to be so much more common indoors than outdoors. The behavior of a large droplet is not meaningfully different in an outdoor vs. an indoor setting. It will just fall to the ground almost instantly and not travel farther than 6 feet. Aerosols, on the other hand, are smaller droplets that linger in the air and can travel farther than 6 feet. Aerosols are diluted much faster outdoors than indoors. So the concentration of infectious aerosols in the air people breathe will be much lower outdoors than it is indoors. As a result, superspreading should be much less likely outdoors than indoors. And that is what we see.
  2. Moreover, at a superspreading event one person (or a small number of people) infects dozens or more others. If transmission happens through large droplets, that person has to be in close contact with each of those other people and talk, sing, shout, laugh or breathe in their direction. In many situations it seems more likely that an infected person gradually raises the concentration of infectious aerosols in the air of a poorly ventilated space. The risk will still be highest for people in close contact with the index case, because the concentration of infectious aerosols is highest directly in front of that person (similar to standing close to a smoker). But people farther away can become infected as well, because of the increased concentration of infectious aerosols in the air they breathe.
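The second point can be made concrete with a well-mixed room model. The sketch below uses the Wells-Riley approach, a standard model of airborne infection risk; the database itself does not compute it. The model assumes one infected person starts emitting into clean air, and the aerosol concentration then builds up toward a steady state set by the ventilation rate. All parameter values (quanta emission rate, breathing rate, room size, air changes per hour) are illustrative assumptions. The very high air-change rate standing in for "outdoors" is only a crude proxy for outdoor dilution. The function name `infection_probability` is hypothetical.

```python
import math


def infection_probability(
    quanta_per_hour: float,        # infectious dose units emitted per hour per infector (assumed)
    breathing_m3_per_hour: float,  # air inhaled per hour by each susceptible person
    hours: float,                  # exposure duration
    room_volume_m3: float,
    air_changes_per_hour: float,
    infectors: int = 1,
) -> float:
    """Wells-Riley risk for a well-mixed room that starts with clean air.

    Concentration builds up as C(t) = (I*q/Q) * (1 - exp(-lam*t)),
    with Q = ventilation flow (m^3/h) and lam = air changes per hour.
    """
    if air_changes_per_hour <= 0:
        raise ValueError("air_changes_per_hour must be positive")
    flow = room_volume_m3 * air_changes_per_hour  # Q in m^3/h
    lam = air_changes_per_hour
    steady_state = infectors * quanta_per_hour / flow  # quanta per m^3
    # Integral of C(t) from 0 to T, i.e. exposure while the room air "fills up".
    integrated_conc = steady_state * (hours - (1 - math.exp(-lam * hours)) / lam)
    dose = breathing_m3_per_hour * integrated_conc  # expected quanta inhaled
    return 1 - math.exp(-dose)


# Hypothetical 2-hour gathering in a 100 m^3 room; only the ventilation differs.
for label, ach in [("poorly ventilated", 0.5), ("well ventilated", 6.0), ("outdoor proxy", 1000.0)]:
    p = infection_probability(25, 0.5, 2, 100, ach)
    print(f"{label:>17}: {p:.4f}")
```

In this model, risk falls as ventilation rises for everyone in the room, regardless of distance. The well-mixed assumption ignores the higher concentration directly in front of the infected person, so it understates the risk for close contacts.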

Limitations

When assessing these conclusions it is important to keep in mind that the database has some severe limitations. For example, in response to the pandemic, societies have taken all sorts of measures to try to contain the virus. These include closing various types of settings, such as restaurants, schools and offices, which were then no longer able to give rise to any superspreading events. As a result, certain types of high-risk settings could be significantly underrepresented in the database, in the sense that superspreading would have occurred there more frequently if they had been open. To take an obvious example, the fact that there are no indoor concerts in the database does not mean that indoor concerts do not pose a serious risk.

Moreover, determining whether an event took place indoors or outdoors (or both) can involve some uncertainty. For categories such as “nursing homes” and “hospitals” we assume that the setting was indoors. But for some other settings such as restaurants or wedding venues we look for more information to make a determination. For example, whenever possible, we:

  • look for images of the venue online
  • try to find detailed descriptions of the event
  • check what the weather was like in that period (e.g. it is unlikely that a family dinner in Wuhan in January took place outdoors)

This method does not guarantee that the determination is correct 100% of the time. When there is considerable uncertainty about a data point, the corresponding cell in the database is marked light red. We welcome any corrections.

Another limitation of the database involves selection biases that may cause certain types of settings to be overrepresented or underrepresented in the database. For example:

  • People may be more likely to remember or mention certain types of settings than other types
  • People who are more likely to have been in certain types of settings may be more (or less) easily traceable by contact tracers, or more (or less) willing to be interviewed by them
  • In institutional settings such as nursing homes, prisons, hospitals and meat processing plants there will typically be more frequent, systematic and comprehensive testing (especially once one or more people have shown symptoms or tested positive) than in other types of settings such as bars or parties.

There may also be a confirmation bias problem once a specific SSE has been identified. The mere fact that people who now have the virus attended an event that was subsequently identified as an SSE does not mean that they were infected at that event. They may have been infected elsewhere, before or after. It is far from always possible to establish with a high degree of confidence when and where transmission occurred. The higher the infection rate in a community, the bigger this problem becomes.

And there is a risk that once an event has been designated as an SSE, it develops its own gravitational pull. With the SSE in mind, researchers may more easily assume that newly infected people with a direct or indirect link to the event acquired the infection there (or through somebody who was there) rather than in another way.

In the aggregate this could also lead to an overestimation of the role SSEs play in the pandemic in general.
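A back-of-the-envelope calculation shows why higher community infection rates make misattribution more likely. The sketch assumes attendees face the same per-person risk as the wider community and that infections occur independently. Under those assumptions, the expected number of attendees infected elsewhere during the window in which their infection could be mistakenly attributed to the event is roughly attendees × daily incidence × window length. The function name, the 14-day window and the incidence figures are hypothetical.

```python
def expected_background_infections(
    attendees: int,
    daily_cases_per_100k: float,
    window_days: float,
) -> tuple[float, float]:
    """Expected attendees infected elsewhere, and P(at least one such attendee).

    Assumes each attendee's risk equals the community average and that
    infections occur independently.
    """
    # Linear approximation of cumulative risk; reasonable while it stays small.
    per_person_risk = min(daily_cases_per_100k / 100_000 * window_days, 1.0)
    expected = attendees * per_person_risk
    p_at_least_one = 1 - (1 - per_person_risk) ** attendees
    return expected, p_at_least_one


# Hypothetical event with 200 attendees and a 14-day attribution window.
for incidence in (10, 100):
    expected, p_any = expected_background_infections(200, incidence, 14)
    print(f"{incidence:>3} cases/100k/day: expected {expected:.2f}, P(any) {p_any:.2f}")
```

With ten times the incidence, the expected number of attendees infected elsewhere rises tenfold, here from 0.28 to 2.8. It also becomes likely that at least one attendee was infected outside the event.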

Furthermore, the lower the incidence of Covid in a population, the more problematic the issue of false positives becomes. False positives can occur in two ways:

  1. An error during testing and analysis (including contamination issues in a lab). This is rare.
  2. The use of high cycle thresholds in PCR testing causes tests to detect insignificant amounts of virus that are not reliable indicators of a recent SARS-CoV-2 infection. This may create an inflated impression of the number of current infections in a community, as this New York Times article explains:

Officials at the Wadsworth Center, New York’s state lab, have access to C.T. values from tests they have processed, and analyzed their numbers at The Times’s request. In July, the lab identified 872 positive tests, based on a threshold of 40 cycles.

With a cutoff of 35, about 43 percent of those tests would no longer qualify as positive. About 63 percent would no longer be judged positive if the cycles were limited to 30.

In Massachusetts, from 85 to 90 percent of people who tested positive in July with a cycle threshold of 40 would have been deemed negative if the threshold were 30 cycles, Dr. Mina said. “I would say that none of those people should be contact-traced, not one,” he said.
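The quoted figures amount to a simple reclassification. A sample counts as positive only if the target is detected at or below the cycle threshold, so lowering the cutoff turns high-Ct results (those with little viral material) negative. For the Wadsworth numbers, 43 percent of 872 is about 375 tests and 63 percent is about 550. The sketch below applies the same idea to a list of made-up Ct values. The function names are hypothetical.

```python
from typing import Sequence


def positives_at_cutoff(ct_values: Sequence[float], cutoff: float) -> int:
    """Count samples whose Ct value is at or below the cutoff (i.e. called positive)."""
    return sum(1 for ct in ct_values if ct <= cutoff)


def share_reclassified(ct_values: Sequence[float], old_cutoff: float, new_cutoff: float) -> float:
    """Fraction of positives at old_cutoff that would be negative at the lower new_cutoff."""
    before = positives_at_cutoff(ct_values, old_cutoff)
    if before == 0:
        return 0.0
    after = positives_at_cutoff(ct_values, new_cutoff)
    return (before - after) / before


# Hypothetical Ct values for ten samples that were positive at a 40-cycle cutoff.
cts = [18.2, 22.5, 24.0, 29.1, 31.5, 33.8, 36.2, 37.5, 38.9, 39.4]
for cutoff in (35, 30):
    print(cutoff, share_reclassified(cts, 40, cutoff))
```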

The higher the rate of false positives, the more the size of superspreading events may be systematically overstated.
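The link between low incidence and false positives follows from Bayes' rule. The positive predictive value (the share of positive results that are true positives) falls as prevalence falls, even for a very specific test. The minimal sketch below illustrates this. The sensitivity and specificity values are illustrative assumptions, not measured figures for any particular test, and the function name is hypothetical.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(infected | positive test) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed test characteristics: 95% sensitivity, 99.8% specificity.
for prevalence in (0.001, 0.01, 0.1):
    ppv = positive_predictive_value(prevalence, 0.95, 0.998)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

Under these assumptions, roughly two out of three positives would be false at 0.1% prevalence. At 10% prevalence, fewer than one in fifty would be.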

Lastly, it should be noted that for settings such as nursing homes and prisons the database typically takes the cumulative number of cases. In reality these infections could have occurred over the course of several days or weeks. Moreover, some people included in this number may have been infected outside of this setting. If the number of infections in a community is significant then the number of staff who were infected outside of this setting may also be significant. The reason these settings are nonetheless included in a database of superspreading events is that due to the nature of these facilities — residents cannot leave and residents typically have been there for a while — the large majority of residents will likely have been infected in the facility itself.

Incomplete and Imperfect

Note that while the goal for the database is to eventually include all SSEs found by researchers and authorities, currently that project is far from completion. Many more SSEs than just these 1,500+ have taken place.

The information in the database is also by no means fully accurate or complete. Dates often had to be guesstimated based on the information available in publications about the events. And, as noted before, placeholders are sometimes used for dates, as well as for the number of cases associated with an SSE. Cells that contain placeholder data or data that needs to be checked for accuracy are marked in light red.

In addition, GPS data for some of the American SSEs may be inaccurate, because they were generated with a bulk conversion method.

Also note that for some SSEs only the number of infections at the initial event is included while for others more detailed information about secondary or even tertiary infections was available (via this smaller database, for example) and included in the total number of infections associated with an event. This obviously makes direct quantitative comparisons between SSEs problematic.

Lastly, whenever there are differing estimates of the number of cases associated with an SSE the database always uses the lowest number.

So the database has numerous serious limitations. It is very much imperfect and a work in progress. Please do not assume it is a representative sample of SSEs and please do not draw hasty conclusions from the data.

Feedback and Help

Any help (corrections, additions, suggestions) would be much appreciated. Send your information to info@superspreadingdatabase.com or add them via the Google Form on this page. Assistance with improving the visualization (for example, to make it look more like this) is also welcome.

If you are a researcher who would like to join the project or cooperate in some other way, please send an email to info@superspreadingdatabase.com.

If you'd rather go it alone, the database and all data in it may be freely used by anyone, in whatever way. A link to this article and the database is always appreciated.

How to cite the database

Swinkels, K. (2020). SARS-CoV-2 Superspreading Events Around the World [Google Sheet]. Retrieved from https://docs.google.com/spreadsheets/d/1c9jwMyT1lw2P0d6SDTno6nHLGMtpheO9xJyGHgdBoco/edit?usp=sharing

References

A substantial number of SSEs in this database come from the following sources:
