Brasil io

Description

The Brasil io dataset is extracted from Brasil io, a non-governmental, volunteer-run project that collects the number of confirmed cases and deaths caused by SARS-CoV-2 infections, aggregated at the state and municipal levels (Brasil io, 2020).

The project's contributors gather the data from publicly available reports issued by the state and municipal health secretariats, before the cases are registered in the Brazilian Ministry of Health database. This makes it possible to provide COVID-19 data in real time, since it takes a long time for cases reported by the state and municipal secretariats to be registered in the unified Brazilian system.

In addition, the data provided by the Ministry of Health is updated slowly and infrequently, the Ministry's site goes down frequently, and the data is unstructured. The Brasil io dataset covers confirmed cases and deaths caused by SARS-CoV-2 infections, aggregated at the state and municipal levels (Brasil io, 2020).

We collected the data from February 1st, 2020 onward. The data can be updated daily. Links to publications that use the Brasil io data, or that provide other publicly accessible locations of the data, can be found in (Jorge et al., 2021, 35).

Data access information

The collected, cleaned and formatted data from Brasil io is freely available and can be accessed and downloaded from (Brasil io, 2020), under the Creative Commons Attribution-ShareAlike (CC BY-SA) licence.

Methods of data collection

Python code to download the data from the Brasil io project is available in our GitHub repository; see GitHub for details.

Data-specific information for Brasil io

The Brasil io dataset has 18 columns and, as of the last update on August 19th, 2021, contained 2,606,897 records (rows) with a size of 285 MB. Code with more details about the variables, data processing and analysis methods is available in our GitHub repository.
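Because the dataset holds both state-level and municipal-level rows, summing every row would count the same cases twice. The sketch below illustrates one way to aggregate daily counts per state from municipal rows only. It is a minimal example, not the code from our repository: the sample rows are invented, and the column names (date, state, city, place_type, new_confirmed, new_deaths) are assumptions that should be checked against the data dictionary.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows for illustration only. The column names are
# assumptions and should be checked against the data dictionary.
SAMPLE_CSV = """date,state,city,place_type,new_confirmed,new_deaths
2020-03-01,SP,Campinas,city,2,0
2020-03-01,SP,Santos,city,1,0
2020-03-01,SP,,state,3,0
2020-03-02,SP,Campinas,city,4,1
2020-03-02,RJ,Rio de Janeiro,city,5,0
"""


def daily_totals_by_state(csv_file, level="city"):
    """Sum new cases and deaths per (date, state) using rows of one place type.

    Filtering on a single place type avoids double counting, since the
    same cases appear in both the municipal rows and the state rows.
    """
    totals = defaultdict(lambda: {"new_confirmed": 0, "new_deaths": 0})
    for row in csv.DictReader(csv_file):
        if row["place_type"] != level:
            continue
        key = (row["date"], row["state"])
        # Treat empty cells as zero.
        totals[key]["new_confirmed"] += int(row["new_confirmed"] or 0)
        totals[key]["new_deaths"] += int(row["new_deaths"] or 0)
    return dict(totals)


totals = daily_totals_by_state(io.StringIO(SAMPLE_CSV))
for (day, state), counts in sorted(totals.items()):
    print(day, state, counts["new_confirmed"], counts["new_deaths"])
```

To apply the same function to a downloaded file, pass an open file object instead of the in-memory sample, for example open(path, newline="", encoding="utf-8"), where path is wherever the file was saved.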

Limitations of the Brasil io dataset

The Brasil io dataset depends on the quality of the information reported by the state and municipal health secretariats. When reports are provided as PDF files or images, tabulating the data in real time can be compromised. Additionally, COVID-19 cases and deaths are tabulated according to the date on which the data was collected.

Therefore, the resulting epidemiological curve can lag by one to seven weeks relative to the date of first symptoms or the date of the case's laboratory test (Observatório COVID-19, 2020). Still, the dataset is considered an excellent source for following the course of the pandemic in real time.
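To make the delay concrete, the following sketch converts a collection date into the range of dates on which the underlying event (first symptoms or laboratory test) may have occurred, assuming the one-to-seven-week lag described above. The function name onset_window and its parameters are hypothetical and used here only for illustration.

```python
from datetime import date, timedelta


def onset_window(report_date, min_weeks=1, max_weeks=7):
    """Return (earliest, latest) plausible event dates for a collection date.

    The default bounds reflect the one-to-seven-week lag cited in the
    text; they are an illustration, not a fitted delay distribution.
    """
    earliest = report_date - timedelta(weeks=max_weeks)
    latest = report_date - timedelta(weeks=min_weeks)
    return earliest, latest


earliest, latest = onset_window(date(2020, 6, 1))
print(earliest.isoformat(), latest.isoformat())
```

A window like this only bounds the timing of individual records. Reconstructing a curve by date of onset would require a model of the delay distribution, which the dataset itself does not provide.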


Data and Resources

Data and variable dictionary (Only in Portuguese)

Data explorer

Data dictionary