We gathered data covering individual infection status related to the COVID-19 pandemic, socio-economic conditions, and human behaviour. We will integrate health and socio-economic determinant data available for each of the 5,570 Brazilian municipalities.
Additionally, we will jointly collect information on the interventions implemented and on social mobility patterns. The data, harmonised at the municipal level, will be a foundational resource for the application and development of statistical analyses, nonlinear mathematical models, computational models, data visualisation, and scientific dissemination about the COVID-19 pandemic in Brazil.
The data are collected from open-access platforms such as Google Mobility, WCota, OpenDatasus, JusBrasil, and IPB Cidacs Bahia. Furthermore, a Python script was developed to automate downloading, organizing, and updating the files, which makes it easy to align the datalake update period with the update schedule of each original platform.
Our datalake is currently configured to update weekly. The project's data can be downloaded either with the Python script mentioned above or from Google Drive. In addition, an application is being developed that will let users choose which information to pull for their research.
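The weekly refresh can be sketched as a staleness check: a source is re-downloaded when its local copy is older than its configured period. This is a minimal sketch, not the project's script; the source names, the `UPDATE_PERIODS` table, and the `sources_due` function are hypothetical, and the download step itself is omitted.

```python
from datetime import datetime, timedelta

# Hypothetical per-source refresh periods. The project currently uses a weekly
# period, but each entry can be tuned to its source's release cadence.
UPDATE_PERIODS = {
    "vaccination": timedelta(days=7),
    "wcota": timedelta(days=7),
    "google_mobility": timedelta(days=7),
}


def sources_due(last_updated, now, periods=UPDATE_PERIODS):
    """Return the sorted names of sources whose local copy is older than its period.

    last_updated maps source name -> datetime of the last successful download;
    sources never downloaded (missing from the dict) are always due.
    """
    due = []
    for name, period in periods.items():
        previous = last_updated.get(name)
        if previous is None or now - previous >= period:
            due.append(name)
    return sorted(due)


if __name__ == "__main__":
    now = datetime(2021, 8, 4)
    last = {
        "vaccination": datetime(2021, 7, 28),  # exactly 7 days ago -> due
        "wcota": datetime(2021, 8, 1),         # 3 days ago -> not due
    }
    # google_mobility was never downloaded, so it is due as well.
    print(sources_due(last, now))  # ['google_mobility', 'vaccination']
```

Keeping the period per source, rather than one global setting, is what allows the update schedule to follow each original platform.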
The vaccination data occupy 63 gigabytes, FS 14 gigabytes, SARS 694 megabytes, Google Mobility 593 megabytes, hospital occupancy 192 megabytes, WCota 311 megabytes, and the CIDACS census sectors 42 megabytes. These sizes may change as the original data sources are continually updated.
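As a quick consistency check, the component sizes listed above can be summed. Decimal units (1 GB = 1000 MB) are assumed here, since the text does not state which convention is used:

```latex
\begin{aligned}
\text{gigabyte-sized files} &= 63 + 14 = 77\ \text{GB},\\
\text{megabyte-sized files} &= 694 + 593 + 192 + 311 + 42 = 1832\ \text{MB} \approx 1.83\ \text{GB},\\
\text{total} &\approx 77 + 1.83 = 78.83\ \text{GB} \approx 79\ \text{GB}.
\end{aligned}
```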
Thus, the database totals approximately 80 gigabytes for the version updated on August 4th, 2021. All files are originally distributed in .csv format, except the census-sector and WCota data, which come in .xlsx (Excel 2007) and gzip formats, respectively. We therefore added a conversion and extraction step so that all files are distributed as .csv.
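The conversion step can be sketched as a small dispatcher on the file extension. This is an illustrative sketch under stated assumptions, not the project's code: the function name `to_csv` is hypothetical, each gzip file is assumed to contain a single CSV, and only the first sheet of an Excel workbook is exported.

```python
import gzip
import shutil
from pathlib import Path


def to_csv(src):
    """Convert one downloaded file to .csv and return the path of the .csv file.

    - '.csv'    : already in the target format, returned unchanged.
    - '.gz'     : gzip-compressed CSV, decompressed next to the source.
    - '.xlsx'   : Excel 2007 workbook; the first sheet is exported via pandas
                  (requires pandas and openpyxl, imported lazily so the gzip
                  path works with the standard library alone).
    """
    src = Path(src)
    if src.suffix == ".csv":
        return src
    if src.suffix == ".gz":
        dst = src.with_suffix("")  # 'data.csv.gz' -> 'data.csv'
        if dst.suffix != ".csv":
            dst = dst.with_suffix(".csv")  # 'data.gz' -> 'data.csv'
        with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
            # Stream in chunks so large files are never fully loaded into memory.
            shutil.copyfileobj(fin, fout)
        return dst
    if src.suffix == ".xlsx":
        import pandas as pd  # optional dependency, only needed for Excel input

        dst = src.with_suffix(".csv")
        pd.read_excel(src).to_csv(dst, index=False)
        return dst
    raise ValueError(f"unsupported file type: {src.name}")
```

Dispatching on the suffix keeps the rest of the pipeline unaware of the original formats: downstream code only ever sees .csv paths.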
With the emergence of the pandemic caused by SARS-CoV-2, the Ministry of Health implemented a surveillance system to record mild to moderate suspected COVID-19 cases of Flu Syndrome (FS, also denoted e-SUS VE). The dataset comes from the e-SUS NOTIFICA system, which is filled in by public and private low-complexity health units such as primary care units, offices, clinics, service centres, and emergency care units.
The Severe Acute Respiratory Syndrome (SARS) dataset contains health data obtained from the Ministry of Health of Brazil through the Secretariat of Health Surveillance (Ministério da Saúde, 2020). SARS surveillance started with the Influenza A(H1N1) pandemic in 2009. Since then, it has also been used to report Influenza and other respiratory viruses, which were previously reported only through flu-syndrome sentinel surveillance.
The WCota dataset is pulled from the GitHub repository maintained by researcher Wesley Cota. It compiles the number of confirmed cases and deaths caused by SARS-CoV-2 infections, aggregated at the state and municipal levels, from Ministry of Health and State Health Department data. The author gathers data from publicly available state and municipal secretariat reports before they are registered in the Brazilian Ministry of Health database.
The COVID-19 vaccination data relate to the National Vaccination Campaign against COVID-19. The Ministry of Health provides them through the Information System of the National Immunization Program (SI-PNI), and they are available for download from OpenDatasus.
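Because the vaccination data are large (63 GB in the version described here) and recorded per dose, a practical first step is to stream the records and reduce them to one count per municipality and day without loading the file into memory. The sketch below assumes a semicolon-delimited CSV and uses hypothetical column names (`municipality_code`, `date`); the real OpenDatasus headers should be checked against the source's own data dictionary.

```python
import csv
import io
from collections import Counter


def daily_counts(lines, city_col="municipality_code", date_col="date", delimiter=";"):
    """Count records per (municipality, date) by streaming a CSV one row at a time.

    `lines` is any iterable of text lines, e.g. an open file object, so memory use
    grows with the number of distinct (municipality, date) pairs, not file size.
    The column names and delimiter are hypothetical defaults.
    """
    counts = Counter()
    for row in csv.DictReader(lines, delimiter=delimiter):
        counts[(row[city_col], row[date_col])] += 1
    return counts


if __name__ == "__main__":
    sample = io.StringIO(
        "municipality_code;date;dose\n"
        "3550308;2021-08-01;1\n"
        "3550308;2021-08-01;2\n"
        "3304557;2021-08-01;1\n"
    )
    for (city, day), n in sorted(daily_counts(sample).items()):
        print(city, day, n)
    # 3304557 2021-08-01 1
    # 3550308 2021-08-01 2
```

With a real file, `open(path, newline="")` can be passed as `lines`, as the `csv` module recommends.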
The CIDACS census-sector dataset provides deprivation measures for each Brazilian municipality and census sector and is used to evaluate health inequalities across the country. The deprivation measure, available from CIDACS, is calculated from the 2010 Brazilian Population Census.
The Google Mobility dataset describes human mobility using Google's COVID-19 Community Mobility Reports, which show how mobility has been affected by the spread of COVID-19 since February 15th, 2020.
The flux dataset contains the historical average daily flux throughout the country over road, air, and fluvial networks. The data are available from the Brazilian Institute of Geography and Statistics (IBGE) and were measured using the 2010 Brazilian Population Census.
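Models that use such fluxes typically arrange them as an origin-destination matrix. A minimal sketch, assuming the data can be read as (origin, destination, mean daily flux) records with possibly one record per transport mode; the record layout and the functions `build_flux_matrix` and `outflow` are hypothetical. A sparse dict-of-dicts is used because most of the roughly 31 million municipality pairs are likely to have no direct link.

```python
from collections import defaultdict


def build_flux_matrix(records):
    """Build a sparse origin-destination matrix from (origin, destination, daily_flux) records.

    Flows are summed per (origin, destination) pair, e.g. when road, air and
    fluvial links between the same two municipalities are listed separately.
    Self-loops (origin == destination) are skipped by assumption.
    """
    matrix = defaultdict(dict)
    for origin, destination, flux in records:
        if origin == destination:
            continue
        matrix[origin][destination] = matrix[origin].get(destination, 0.0) + flux
    return dict(matrix)


def outflow(matrix):
    """Total average daily flux leaving each origin."""
    return {origin: sum(row.values()) for origin, row in matrix.items()}


if __name__ == "__main__":
    records = [
        ("A", "B", 120.0),  # road
        ("A", "B", 30.0),   # air, same pair -> summed with the road flow
        ("A", "C", 50.0),
        ("B", "A", 90.0),
    ]
    m = build_flux_matrix(records)
    print(m["A"]["B"], outflow(m))  # 150.0 {'A': 200.0, 'B': 90.0}
```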
The last dataset is a metric that summarizes the level of governmental measures enacted by the states from March 2020 onwards. It is divided into two files. The first contains textual descriptions of the measures applied, in chronological order, for each state. The second contains the calculated global stringency metric, a combination of sub-indexes for different types of enforcement, such as restrictions on events and school closures.
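The text does not specify how the sub-indexes are combined. Purely as an illustration, the sketch below assumes a scheme similar in spirit to the Oxford COVID-19 Government Response Tracker: each sub-index level $v_j$ is rescaled to 0-100 by its maximum level $N_j$, and the global metric is the unweighted mean $I = \frac{1}{k}\sum_{j=1}^{k} 100\, v_j / N_j$. The sub-index names and scales are hypothetical.

```python
def stringency(levels, max_levels):
    """Unweighted mean of sub-indexes, each rescaled to 0-100 by its maximum level.

    levels     : dict sub-index name -> ordinal level observed on a given day
    max_levels : dict sub-index name -> highest possible level of that sub-index
    Sub-indexes missing from `levels` are treated as level 0 (no measure in place).
    """
    if not max_levels:
        raise ValueError("at least one sub-index is required")
    scores = []
    for name, max_level in max_levels.items():
        level = levels.get(name, 0)
        if not 0 <= level <= max_level:
            raise ValueError(f"level {level} out of range for {name}")
        scores.append(100.0 * level / max_level)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical sub-indexes with hypothetical ordinal scales.
    max_levels = {"event_restrictions": 2, "school_closure": 3}
    print(stringency({"event_restrictions": 2, "school_closure": 0}, max_levels))  # 50.0
    print(stringency({"event_restrictions": 1, "school_closure": 3}, max_levels))  # 75.0
```

Rescaling each sub-index before averaging keeps a sub-index with more ordinal levels from dominating the global metric.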
From the original databases described in detail in the Overview, namely Vaccination, WCota, FS, and SARS, we created a single file containing the main information from each. This combined file simplifies use of the data because the sources are already cross-matched and each row is a single entry for a specific city and day. A data dictionary was developed to explain each column, since column names were changed from the original sources to fit this homogeneous set.
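The cross-matching can be sketched as an outer join on the (city, date) key combined with a column-renaming dictionary like the one described above. This is a minimal standard-library sketch under stated assumptions: each source has already been aggregated to one row per city and day, the key columns are called `city` and `date`, harmonised names are unique across sources, and all source and column names shown are hypothetical rather than the project's actual dictionary.

```python
from collections import defaultdict


def merge_sources(sources, renames):
    """Outer-join per-source daily tables into one row per (city, date).

    sources : dict source name -> list of row dicts, each with 'city' and 'date'
    renames : dict source name -> {original column: harmonised column}
    Columns absent from `renames` are dropped; a (city, date) pair missing from
    a source simply lacks that source's columns in the merged row.
    """
    merged = defaultdict(dict)
    for source, rows in sources.items():
        mapping = renames.get(source, {})
        for row in rows:
            target = merged[(row["city"], row["date"])]  # creates the entry if new
            for original, harmonised in mapping.items():
                if original in row:
                    target[harmonised] = row[original]
    return [
        {"city": city, "date": date, **values}
        for (city, date), values in sorted(merged.items())
    ]


if __name__ == "__main__":
    sources = {
        "wcota": [{"city": "3550308", "date": "2021-08-01", "cases": 10}],
        "vaccination": [
            {"city": "3550308", "date": "2021-08-01", "doses": 500},
            {"city": "3304557", "date": "2021-08-01", "doses": 300},
        ],
    }
    renames = {
        "wcota": {"cases": "cases_wcota"},
        "vaccination": {"doses": "vaccine_doses"},
    }
    for row in merge_sources(sources, renames):
        print(row)
```

Using an outer join rather than an inner one keeps city-days that appear in only some sources, which matters when the sources start on different dates.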