The WCota dataset is pulled from the GitHub repository maintained by researcher Wesley Cota (W. Cota, “Monitoring the number of COVID-19 cases and deaths in Brazil at municipal and federative units level”, SciELOPreprints:362, 2020). The numbers of confirmed cases and deaths caused by SARS-CoV-2 infections, aggregated at the state and municipal levels, are compiled from data from the Ministry of Health and State Health Departments.
The author gathers data from publicly available state and municipal secretariat reports prior to registration in the Brazilian Ministry of Health database. This process helps to make data on COVID-19 available in real time, as it takes a long time to register cases from state and municipal secretariats in the Brazilian single system.
In addition, the data provided by the Ministry of Health are updated slowly and infrequently, the site goes down frequently, and the data are unstructured.
We collected the data from February 1st, 2020 onward. The data can be updated daily. Links to publications that use the WCota data or provide other publicly accessible locations of the data can be found in (Jorge et al., 2021, 35).
The collected, cleaned and formatted data are freely available from WCota and can be accessed and downloaded from (WCota, 2020), under the Creative Commons Attribution ShareAlike (CC-BY-SA 4.0) licence.
Python code for downloading data from the WCota project is available in our GitHub directory; see GitHub for details. Pamepi uses the files named cases-brazil-cities-time, which contain the time series of new COVID-19 cases and deaths.
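As an illustration of this download step, the following is a minimal sketch and not the code from our GitHub directory. The URL, the gzip compression, and the column names (date, state, newCases, newDeaths) are assumptions about the WCota repository layout and should be checked against the current files; the function names are hypothetical.

```python
import csv
import gzip
import urllib.request
from collections import defaultdict

# Assumed location of the file; verify against the WCota repository.
WCOTA_URL = (
    "https://raw.githubusercontent.com/wcota/covid19br/"
    "master/cases-brazil-cities-time.csv.gz"
)


def iter_rows(url=WCOTA_URL):
    """Stream the gzipped CSV and yield one dict per row."""
    with urllib.request.urlopen(url) as response:
        # Decompress on the fly so the whole file is never held in memory.
        with gzip.open(response, mode="rt", encoding="utf-8", newline="") as text:
            yield from csv.DictReader(text)


def daily_totals_by_state(rows):
    """Sum new cases and new deaths for each (date, state) pair.

    Assumes each row has the columns date, state, newCases and newDeaths,
    with the two counts stored as integers.
    """
    totals = defaultdict(lambda: {"newCases": 0, "newDeaths": 0})
    for row in rows:
        key = (row["date"], row["state"])
        totals[key]["newCases"] += int(row["newCases"])
        totals[key]["newDeaths"] += int(row["newDeaths"])
    return dict(totals)
```

Calling daily_totals_by_state(iter_rows()) would aggregate the municipal time series to the state level in a single pass. Streaming is used here because the file is large.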
The WCota dataset has a total of 12 columns and had a size of 311 MB as of the last update on May 11th, 2022. Code with more details about the variables, data processing and analysis methods is available in our GitHub directory.
The WCota dataset depends on the quality of the information reported by the state and municipal health secretariats. When files are provided as PDFs or images, real-time tabulation of the data can be compromised. Additionally, the deaths and cases of COVID-19 are tabulated according to the date when the data were collected.
Therefore, the resulting epidemiological curve can show a delay of one to seven weeks relative to the date of first symptoms or the date of the laboratory test of the case (Observatório COVID-19, 2020). Still, the dataset is considered an excellent source for measuring the course of the pandemic in real time.