Commit bfe07cc

avallecam and claude committed
update contact-matrices.Rmd with contactsurveys/socialmixr pattern
- Add library(contactsurveys) and replace socialmixr::polymod loading with contactsurveys::download_survey() + socialmixr::load_survey() using the Zenodo DOI for POLYMOD and Zambia surveys
- Add return_demography = TRUE to all contact_matrix() calls
- Standardize object names: contacts_byage, contacts_byage_matrix
- Add callout warning users to always specify countries = argument
- Add text explaining return_demography requirement for {epidemics}

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent fdae8b4 commit bfe07cc

1 file changed

Lines changed: 47 additions & 24 deletions

episodes/contact-matrices.Rmd

@@ -38,6 +38,7 @@ Some groups of individuals have more contacts than others; the average schoolchi
3838

3939

4040
```{r,message=FALSE,warning=FALSE}
41+
library(contactsurveys)
4142
library(socialmixr)
4243
```
4344

@@ -80,29 +81,47 @@ For a contact matrix with rows $i$ and columns $j$:
8081

8182
Contact matrices are commonly estimated from studies that use diaries to record interactions. For example, the POLYMOD survey measured contact patterns in 8 European countries using data on the location and duration of contacts reported by the study participants [(Mossong et al. 2008)](https://doi.org/10.1371/journal.pmed.0050074).
8283

83-
The R package `{socialmixr}` contains functions which can estimate contact matrices from POLYMOD and other surveys. We can load the POLYMOD survey data:
84+
The R package `{socialmixr}` contains functions which can estimate contact matrices from POLYMOD and other surveys. We can download and load the POLYMOD survey data directly from Zenodo using `{contactsurveys}` and `{socialmixr}`:
8485

86+
```{r polymod_, echo = TRUE, message = FALSE}
87+
survey_files <- contactsurveys::download_survey(
88+
survey = "https://doi.org/10.5281/zenodo.3874557",
89+
verbose = FALSE
90+
)
91+
92+
survey_load <- socialmixr::load_survey(files = survey_files)
93+
```
94+
95+
::::::::::::::::::::::::::::::::::::: callout
96+
### Specify the country name
97+
98+
A single survey file can contain data from multiple countries. You can inspect the available countries with:
8599

86-
```{r polymod_, echo = TRUE}
87-
polymod <- socialmixr::polymod
100+
```{r polymod_countries, echo = TRUE}
101+
levels(survey_load$participants$country)
88102
```
89103

90-
Then we can obtain the contact matrix for the age categories we want by specifying `age_limits`.
104+
Always pass the `countries =` argument to `contact_matrix()` to make sure you use data from the intended country only.
105+
106+
::::::::::::::::::::::::::::::::::::::::::::::::
107+
108+
Then we can obtain the contact matrix for the age categories we want by specifying `age_limits`. We also add `return_demography = TRUE` to include demographic information in the output, which is required when using the contact matrix with `{epidemics}`.
91109

92110
```{r polymod_uk, echo = TRUE}
93-
contact_data <- socialmixr::contact_matrix(
94-
survey = polymod,
111+
contacts_byage <- socialmixr::contact_matrix(
112+
survey = survey_load,
95113
countries = "United Kingdom",
96114
age_limits = c(0, 20, 40),
97-
symmetric = TRUE
115+
symmetric = TRUE,
116+
return_demography = TRUE
98117
)
99-
contact_data
118+
contacts_byage
100119
```
101120

102121

103122

104-
**Note: although the contact matrix `contact_data$matrix` is not itself mathematically symmetric, it satisfies the condition that the total number of contacts of one group with another is the same as the reverse. In other words:
105-
`contact_data$matrix[j,i]*contact_data$demography$proportion[j] = contact_data$matrix[i,j]*contact_data$demography$proportion[i]`.
123+
**Note: although the contact matrix `contacts_byage$matrix` is not itself mathematically symmetric, it satisfies the condition that the total number of contacts of one group with another is the same as the reverse. In other words:
124+
`contacts_byage$matrix[j,i]*contacts_byage$demography$proportion[j] = contacts_byage$matrix[i,j]*contacts_byage$demography$proportion[i]`.
106125
For the mathematical explanation see [the corresponding section in the socialmixr documentation](https://epiforecasts.io/socialmixr/articles/socialmixr.html#symmetric-contact-matrices).**
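
The reciprocity condition in the note above can be checked numerically. The sketch below uses a small made-up 2 x 2 matrix and made-up population proportions (`toy_matrix` and `toy_proportion` are hypothetical names and values, not survey output), so it runs without downloading any data. It assumes, as in the note, that rows index the participant group and columns the contact group.

```r
# Toy example with hypothetical values (not taken from any survey)
# Mean contacts per participant: rows = participant group, columns = contact group
toy_matrix <- matrix(
  c(3, 2,
    1, 4),
  nrow = 2, byrow = TRUE
)

# Hypothetical share of the population in each group
toy_proportion <- c(1 / 3, 2 / 3)

# Multiply row i by the proportion of group i to get
# (rescaled) total contacts between each pair of groups
total_contacts <- toy_matrix * toy_proportion

# The per-capita matrix is not symmetric ...
isSymmetric(toy_matrix)

# ... but the total contacts between groups are
isTRUE(all.equal(total_contacts, t(total_contacts)))
```

The same check could be applied to `contacts_byage$matrix` and `contacts_byage$demography$proportion`, provided the matrix was built with `symmetric = TRUE`.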
107126

108127

@@ -122,13 +141,16 @@ If `symmetric` is set to TRUE, the `contact_matrix()` function will internally u
122141

123142
::::::::::::::::::::::::::::::::::::::::::::::::
124143

125-
The example above uses the POLYMOD survey. There are a number of surveys available in `socialmixr`. To list the available surveys, use `socialmixr::list_surveys()`. To download a survey, we can use `socialmixr::get_survey()`
144+
The example above uses the POLYMOD survey. There are a number of surveys available in `socialmixr`. To list the available surveys, use `socialmixr::list_surveys()`. To download a survey from Zenodo and load it, we use `contactsurveys::download_survey()` followed by `socialmixr::load_survey()`:
126145

127146
```{r, message = FALSE, warning = FALSE}
128-
# Access the contact survey data from Zenodo
129-
zambia_sa_survey <- socialmixr::get_survey(
130-
"https://doi.org/10.5281/zenodo.3874675"
147+
# Download and load the contact survey data for Zambia from Zenodo
148+
zambia_survey_files <- contactsurveys::download_survey(
149+
survey = "https://doi.org/10.5281/zenodo.3874675",
150+
verbose = FALSE
131151
)
152+
153+
zambia_sa_survey <- socialmixr::load_survey(files = zambia_survey_files)
132154
```
133155

134156
:::::::::::::::::: spoiler
@@ -179,20 +201,21 @@ Similar to the code above, to access vector values within a dataframe, you can u
179201

180202
:::::::::::::::::::::::: instructor
181203

182-
```{r polymod_poland}
204+
```{r zambia_solution}
183205
# Generate the contact matrix for Zambia only
184-
contact_data_zambia <- socialmixr::contact_matrix(
206+
contacts_byage_zambia <- socialmixr::contact_matrix(
185207
survey = zambia_sa_survey,
186208
countries = "Zambia", # key argument
187209
age_limits = c(0, 20),
188-
symmetric = TRUE
210+
symmetric = TRUE,
211+
return_demography = TRUE
189212
)
190213
191214
# Print the contact matrix for Zambia only
192-
contact_data_zambia
215+
contacts_byage_zambia
193216
194217
# Print the vector of population size for {epidemics}
195-
contact_data_zambia$demography$population
218+
contacts_byage_zambia$demography$population
196219
```
197220
:::::::::::::::::::::::::::::::::
198221

@@ -290,12 +313,12 @@ When simulating an epidemic, we often want to ensure that the average number of
290313

291314
Rather than just using the raw number of contacts, we can instead normalise the contact matrix to make it easier to work in terms of $R_0$. In particular, we normalise the matrix by scaling it so that if we were to calculate the average number of secondary cases based on this normalised matrix, the result would be 1 (in mathematical terms, we are scaling the matrix so the largest eigenvalue is 1). This transformation scales the entries but preserves their relative values.
292315

293-
In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified value of $R_0$. If the entry of the contact matrix $C[i,j]$ represents the contacts of population $i$ with $j$, it is equivalent to `contact_data$matrix[i,j]`, and the maximum eigenvalue of this matrix represents the typical magnitude of contacts, not the typical magnitude of transmission. We must therefore normalise the matrix $C$ so the maximum eigenvalue is one; we call this matrix $C_{normalised}$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ as a model input is calculated from $R_0$, the scaling factor and the value of $\gamma$ (i.e. mathematically we use the fact that the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is equal to $\beta / \gamma$).
316+
In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified value of $R_0$. If the entry of the contact matrix $C[i,j]$ represents the contacts of population $i$ with $j$, it is equivalent to `contacts_byage$matrix[i,j]`, and the maximum eigenvalue of this matrix represents the typical magnitude of contacts, not the typical magnitude of transmission. We must therefore normalise the matrix $C$ so the maximum eigenvalue is one; we call this matrix $C_{normalised}$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ as a model input is calculated from $R_0$, the scaling factor and the value of $\gamma$ (i.e. mathematically we use the fact that the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is equal to $\beta / \gamma$).
294317

295318
```{r}
296-
contact_matrix <- t(contact_data$matrix)
297-
scaling_factor <- 1 / max(eigen(contact_matrix)$values)
298-
normalised_matrix <- contact_matrix * scaling_factor
319+
contacts_byage_matrix <- t(contacts_byage$matrix)
320+
scaling_factor <- 1 / max(eigen(contacts_byage_matrix)$values)
321+
normalised_matrix <- contacts_byage_matrix * scaling_factor
299322
```
300323

301324
As a result, if we multiply the scaled matrix by $R_0$, then converting to the number of expected secondary cases would give us $R_0$, as required.
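
To see this scaling at work without any survey data, here is a minimal sketch on a made-up 2 x 2 matrix. The names `toy_matrix`, `toy_transposed` and `r0`, and all values, are hypothetical. `Re()` is added as a precaution, since `eigen()` can return complex values for non-symmetric matrices; for this toy matrix the eigenvalues are real.

```r
# Toy contact matrix with hypothetical values (not taken from any survey)
toy_matrix <- matrix(
  c(3, 2,
    1, 4),
  nrow = 2, byrow = TRUE
)

# Hypothetical target basic reproduction number
r0 <- 1.5

# Same steps as in the lesson code: transpose, then scale by the
# inverse of the largest eigenvalue
toy_transposed <- t(toy_matrix)
scaling_factor <- 1 / max(Re(eigen(toy_transposed)$values))
normalised_matrix <- toy_transposed * scaling_factor

# Dominant eigenvalue of the normalised matrix
dominant_normalised <- max(Re(eigen(normalised_matrix)$values))

# Dominant eigenvalue after multiplying by the target R0
dominant_scaled <- max(Re(eigen(r0 * normalised_matrix)$values))

dominant_normalised
dominant_scaled
```

Up to floating-point error, `dominant_normalised` should be 1 and `dominant_scaled` should equal `r0`. The ratios between entries of `normalised_matrix` are the same as in `toy_transposed`, so only the overall magnitude changes.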
@@ -317,7 +340,7 @@ Normalisation can be performed by the function `contact_matrix()` in `{socialmix
317340

318341
```{r, message = FALSE}
319342
contact_data_split <- socialmixr::contact_matrix(
320-
survey = polymod,
343+
survey = survey_load,
321344
countries = "United Kingdom",
322345
age_limits = c(0, 20, 40),
323346
symmetric = TRUE,
