The Severe Acute Respiratory Syndrome (SARS) dataset contains health data obtained from the Ministry of Health of Brazil through the Secretariat of Health Surveillance (Ministério da Saúde, 2021a).
SARS surveillance started with the Influenza A (H1N1) pandemic in 2009. Since then, it has also been used to report Influenza and other respiratory viruses, which were previously reported only through the flu-syndrome sentinel surveillance.
Furthermore, in 2020, human infections caused by SARS-CoV-2 were incorporated into SARS surveillance. Given the severity of the disease, all individuals recorded in SARS require hospitalisation.
The dataset contains individual-level, non-identifiable information such as the city of residence, age, gender, infection status, type of hospitalisation (clinical bed or ICU), and whether the case had a fatal outcome, among others (Ministério da Saúde, 2021a).
Therefore, the data can be aggregated to perform risk analysis and to study the dynamics of severe COVID-19 cases at the municipal level, and can also be explored for other purposes.
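The municipal-level aggregation described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from our repository. The column names `CO_MUN_RES` (municipality of residence) and `DT_NOTIFIC` (notification date), the `;` delimiter and the `dd/mm/yyyy` date format are assumptions about the file layout and should be checked against the official data dictionary. ISO weeks are used here as a simplification and may differ from the epidemiological weeks used by the surveillance system.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def weekly_counts_by_municipality(csv_file, delimiter=";"):
    """Count records per (municipality code, ISO year, ISO week).

    Column names and date format are assumptions, not confirmed
    by the dataset documentation.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file, delimiter=delimiter):
        municipality = (row.get("CO_MUN_RES") or "").strip()
        raw_date = (row.get("DT_NOTIFIC") or "").strip()
        if not municipality or not raw_date:
            continue  # skip records missing either key field
        try:
            notified = datetime.strptime(raw_date, "%d/%m/%Y").date()
        except ValueError:
            continue  # skip records with an unparseable date
        iso_year, iso_week, _ = notified.isocalendar()
        counts[(municipality, iso_year, iso_week)] += 1
    return counts


# Tiny in-memory sample standing in for the real file (made-up rows).
sample = io.StringIO(
    "CO_MUN_RES;DT_NOTIFIC\n"
    "355030;04/01/2021\n"
    "355030;06/01/2021\n"
    "330455;06/01/2021\n"
    "330455;\n"
)
counts = weekly_counts_by_municipality(sample)
```

For the real dataset, the in-memory sample would be replaced by an open file handle; the resulting counter can then be joined with population data to compute incidence rates per municipality.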
In addition, the dataset contains notifications from January 1st, 2020 to the current date. Data is updated weekly in our project.
The SARS data is licensed under a Creative Commons Attribution License, CC BY (version 4.0); see (Ministério da Saúde, 2021a).
Additionally, the SARS dataset is publicly available and published by the Ministry of Health of Brazil. Therefore, no approval by an ethics committee is required to use this data, according to Resolutions 466/2012 and 510/2016 (article 1, sections III and V) from the National Health Council (CNS), Brazil.
Python code to download the SARS data from OpenDatasus is available in our GitHub directory; see details in (Ministério da Saúde, 2021a) and on GitHub. We pull the SARS data from 2019 onwards. Each time the code is run, the full dataset is downloaded again, because OpenDatasus does not provide an API token for SARS.
The database is updated weekly in our project, and each version is identified by its update date. Additional information on data collection methods can be found in (Observatório COVID-19, 2020).
As of the last update, on August 19th, 2021, the SARS dataset had 162 columns and 1,264,480 records (rows), with a size of 694 MB.
Code with more details about the variables, data processing and analysis methods is available in our GitHub directory.
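The download-and-version step described above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the script in our repository: the function names, the `sars_<date>.csv` naming scheme and the `url` argument are hypothetical, and the actual OpenDatasus file URL is deliberately not filled in.

```python
import shutil
import urllib.request
from datetime import date
from pathlib import Path


def versioned_path(dest_dir, update_date, stem="sars", suffix=".csv"):
    """Build a file path that encodes the update date (hypothetical naming scheme)."""
    return Path(dest_dir) / f"{stem}_{update_date.isoformat()}{suffix}"


def download_snapshot(url, dest_dir, update_date=None):
    """Download the full file at `url` and store it as a dated snapshot.

    The whole file is fetched on every call, mirroring the fact that
    the dataset cannot be updated incrementally.
    """
    update_date = update_date or date.today()
    target = versioned_path(dest_dir, update_date)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk without loading into memory
    return target


# Example of the naming scheme only; no network access is needed for this call.
example = versioned_path("data", date(2021, 8, 19))
```

Streaming the response to disk keeps memory use low for a file of several hundred megabytes, and keeping one file per update date makes it possible to reproduce an analysis against a specific snapshot.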
Every case in SARS should receive a final classification from the epidemiological surveillance teams of the Secretariat of Health. However, given the number of registered patients, cases may not be classified in time, and some are closed without ever being analysed.
To overcome this difficulty, our team will apply a classification algorithm to give a pre-diagnosis of the cases in SARS that have no final classification. The algorithm is applied to the symptom information available in the dataset.
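To make the idea of a symptom-based pre-diagnosis concrete, the sketch below trains a small Bernoulli naive Bayes classifier on cases that already have a final classification and applies it to unclassified ones. The choice of naive Bayes is an assumption made purely for illustration and is not necessarily the algorithm used in our project. The symptom names (`FEBRE`, `TOSSE`, `PERD_OLFT`), the 0/1 encoding, the class labels and the toy records are likewise hypothetical.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(rows, labels, features):
    """Fit a Bernoulli naive Bayes model on binary (0/1) symptom indicators."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for f in features:
            feature_counts[label][f] += row[f]
    total = len(labels)
    model = {}
    for label, n in class_counts.items():
        model[label] = {
            "log_prior": math.log(n / total),
            # Laplace smoothing keeps every probability strictly between 0 and 1.
            "p": {f: (feature_counts[label][f] + 1) / (n + 2) for f in features},
        }
    return model


def predict(model, row, features):
    """Return the class with the highest log-posterior for one case."""
    best_label, best_score = None, -math.inf
    for label, params in model.items():
        score = params["log_prior"]
        for f in features:
            p = params["p"][f]
            score += math.log(p if row[f] else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data: cases that already have a final classification.
features = ["FEBRE", "TOSSE", "PERD_OLFT"]
classified_rows = [
    {"FEBRE": 1, "TOSSE": 1, "PERD_OLFT": 1},
    {"FEBRE": 1, "TOSSE": 0, "PERD_OLFT": 1},
    {"FEBRE": 0, "TOSSE": 1, "PERD_OLFT": 0},
    {"FEBRE": 1, "TOSSE": 1, "PERD_OLFT": 0},
]
classified_labels = ["covid", "covid", "other", "other"]

model = train_bernoulli_nb(classified_rows, classified_labels, features)
pre_diagnosis = predict(model, {"FEBRE": 1, "TOSSE": 1, "PERD_OLFT": 1}, features)
```

In practice, the symptom fields would first need to be recoded to 0/1, with missing or "ignored" answers handled explicitly, and the pre-diagnosis should be validated against held-out classified cases before being used.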