update contact-matrices.Rmd with contactsurveys/socialmixr pattern

avallecam · claude · avallecam · commit bfe07ccfdb61 · 2026-03-20T20:04:23.000Z
- Add library(contactsurveys) and replace socialmixr::polymod loading
  with contactsurveys::download_survey() + socialmixr::load_survey()
  using the Zenodo DOI for POLYMOD and Zambia surveys
- Add return_demography = TRUE to all contact_matrix() calls
- Standardize object names: contacts_byage, contacts_byage_matrix
- Add callout warning users to always specify countries = argument
- Add text explaining return_demography requirement for {epidemics}
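
For reference, a minimal base-R sketch of how the standardized name `contacts_byage_matrix` is used in the normalisation step. The 2x2 matrix below is a made-up toy example (an assumption for illustration), not output from `socialmixr::contact_matrix()`:

```r
# Toy contact matrix with invented values (hypothetical, not survey data)
contacts_byage_matrix <- matrix(
  c(8, 2,
    3, 5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("[0,20)", "20+"), c("[0,20)", "20+"))
)

# Scale the matrix so that its largest eigenvalue equals 1
scaling_factor <- 1 / max(Re(eigen(contacts_byage_matrix)$values))
normalised_matrix <- contacts_byage_matrix * scaling_factor

# Dominant eigenvalue of the normalised matrix (should be 1 up to rounding)
dominant_eigenvalue <- max(Re(eigen(normalised_matrix)$values))
dominant_eigenvalue
```

Scaling changes the magnitude of the entries but keeps their ratios, so the relative mixing between age groups is unchanged.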

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
diff --git a/episodes/contact-matrices.Rmd b/episodes/contact-matrices.Rmd
@@ -38,6 +38,7 @@ Some groups of individuals have more contacts than others; the average schoolchi
 
 
 ```{r,message=FALSE,warning=FALSE}
+library(contactsurveys)
 library(socialmixr)
 ```
 
@@ -80,29 +81,47 @@ For a contact matrix with rows $i$ and columns $j$:
 
 Contact matrices are commonly estimated from studies that use diaries to record interactions. For example, the POLYMOD survey measured contact patterns in 8 European countries using data on the location and duration of contacts reported by the study participants [(Mossong et al. 2008)](https://doi.org/10.1371/journal.pmed.0050074).
 
-The R package `{socialmixr}` contains functions which can estimate contact matrices from POLYMOD and other surveys. We can load the POLYMOD survey data:
+The R package `{socialmixr}` contains functions which can estimate contact matrices from POLYMOD and other surveys. We can download and load the POLYMOD survey data directly from Zenodo using `{contactsurveys}` and `{socialmixr}`:
 
+```{r polymod_, echo = TRUE, message = FALSE}
+survey_files <- contactsurveys::download_survey(
+  survey = "https://doi.org/10.5281/zenodo.3874557",
+  verbose = FALSE
+)
+
+survey_load <- socialmixr::load_survey(files = survey_files)
+```
+
+::::::::::::::::::::::::::::::::::::: callout
+### Specify the country name
+
+A single survey file can contain data from multiple countries. You can inspect the available countries with:
 
-```{r polymod_, echo = TRUE}
-polymod <- socialmixr::polymod
+```{r polymod_countries, echo = TRUE}
+levels(survey_load$participants$country)
 ```
 
-Then we can obtain the contact matrix for the age categories we want by specifying `age_limits`. 
+Always pass the `countries =` argument to `contact_matrix()` to make sure you use data from the intended country only.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+Then we can obtain the contact matrix for the age categories we want by specifying `age_limits`. We also add `return_demography = TRUE` to include demographic information in the output, which is required when using the contact matrix with `{epidemics}`.
 
 ```{r polymod_uk, echo = TRUE}
-contact_data <- socialmixr::contact_matrix(
-  survey = polymod,
+contacts_byage <- socialmixr::contact_matrix(
+  survey = survey_load,
   countries = "United Kingdom",
   age_limits = c(0, 20, 40),
-  symmetric = TRUE
+  symmetric = TRUE,
+  return_demography = TRUE
 )
-contact_data
+contacts_byage
 ```
 
 
 
-**Note: although the contact matrix `contact_data$matrix` is not itself mathematically symmetric, it satisfies the condition that the total number of contacts of one group with another is the same as the reverse. In other words:
-`contact_data$matrix[j,i]*contact_data$demography$proportion[j] = contact_data$matrix[i,j]*contact_data$demography$proportion[i]`.
+**Note: although the contact matrix `contacts_byage$matrix` is not itself mathematically symmetric, it satisfies the condition that the total number of contacts of one group with another is the same as the reverse. In other words:
+`contacts_byage$matrix[j,i]*contacts_byage$demography$proportion[j] = contacts_byage$matrix[i,j]*contacts_byage$demography$proportion[i]`.
 For the mathematical explanation see [the corresponding section in the socialmixr documentation](https://epiforecasts.io/socialmixr/articles/socialmixr.html#symmetric-contact-matrices).**
 
 
@@ -122,13 +141,16 @@ If `symmetric` is set to TRUE, the `contact_matrix()` function will internally u
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
-The example above uses the POLYMOD survey. There are a number of surveys available in `socialmixr`. To list the available surveys, use `socialmixr::list_surveys()`. To download a survey, we can use `socialmixr::get_survey()`
+The example above uses the POLYMOD survey. There are a number of surveys available in `socialmixr`. To list the available surveys, use `socialmixr::list_surveys()`. To download a survey from Zenodo and load it, we use `contactsurveys::download_survey()` followed by `socialmixr::load_survey()`:
 
 ```{r, message = FALSE, warning = FALSE}
-# Access the contact survey data from Zenodo
-zambia_sa_survey <- socialmixr::get_survey(
-  "https://doi.org/10.5281/zenodo.3874675"
+# Download and load the contact survey data for Zambia from Zenodo
+zambia_survey_files <- contactsurveys::download_survey(
+  survey = "https://doi.org/10.5281/zenodo.3874675",
+  verbose = FALSE
 )
+
+zambia_sa_survey <- socialmixr::load_survey(files = zambia_survey_files)
 ```
 
 :::::::::::::::::: spoiler
@@ -179,20 +201,21 @@ Similar to the code above, to access vector values within a dataframe, you can u
 
 :::::::::::::::::::::::: instructor 
 
-```{r polymod_poland}
+```{r zambia_solution}
 # Generate the contact matrix for Zambia only
-contact_data_zambia <- socialmixr::contact_matrix(
+contacts_byage_zambia <- socialmixr::contact_matrix(
   survey = zambia_sa_survey,
   countries = "Zambia", # key argument
   age_limits = c(0, 20),
-  symmetric = TRUE
+  symmetric = TRUE,
+  return_demography = TRUE
 )
 
 # Print the contact matrix for Zambia only
-contact_data_zambia
+contacts_byage_zambia
 
 # Print the vector of population size for {epidemics}
-contact_data_zambia$demography$population
+contacts_byage_zambia$demography$population
 ```
 :::::::::::::::::::::::::::::::::
 
@@ -290,12 +313,12 @@ When simulating an epidemic, we often want to ensure that the average number of
 
 Rather than just using the raw number of contacts, we can instead normalise the contact matrix to make it easier to work in terms of $R_0$. In particular, we normalise the matrix by scaling it so that if we were to calculate the average number of secondary cases based on this normalised matrix, the result would be 1 (in mathematical terms, we are scaling the matrix so the largest eigenvalue is 1). This transformation scales the entries but preserves their relative values.
 
-In the case of the above model, we want to define $\beta  C_{i,j}$ so that the model has a specified valued of $R_0$. If the entry of the contact matrix $C[i,j]$ represents the contacts of population $i$ with $j$, it is equivalent to `contact_data$matrix[i,j]`, and the maximum eigenvalue of this matrix represents the typical magnitude of contacts, not typical magnitude of transmission. We must therefore normalise the matrix $C$ so the maximum eigenvalue is one; we call this matrix $C_{normalised}$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ as a model input is calculated from $R_0$, the scaling factor and the value of $\gamma$  (i.e. mathematically we use the fact that the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is equal to $\beta / \gamma$). 
+In the case of the above model, we want to define $\beta  C_{i,j}$ so that the model has a specified value of $R_0$. If the entry of the contact matrix $C[i,j]$ represents the contacts of population $i$ with $j$, it is equivalent to `contacts_byage$matrix[i,j]`, and the maximum eigenvalue of this matrix represents the typical magnitude of contacts, not typical magnitude of transmission. We must therefore normalise the matrix $C$ so the maximum eigenvalue is one; we call this matrix $C_{normalised}$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ as a model input is calculated from $R_0$, the scaling factor and the value of $\gamma$  (i.e. mathematically we use the fact that the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is equal to $\beta / \gamma$).
 
 ```{r}
-contact_matrix <- t(contact_data$matrix)
-scaling_factor <- 1 / max(eigen(contact_matrix)$values)
-normalised_matrix <- contact_matrix * scaling_factor
+contacts_byage_matrix <- t(contacts_byage$matrix)
+scaling_factor <- 1 / max(eigen(contacts_byage_matrix)$values)
+normalised_matrix <- contacts_byage_matrix * scaling_factor
 ```
 
 As a result, if we multiply the scaled matrix by $R_0$, then converting to the number of expected secondary cases would give us $R_0$, as required.
@@ -317,7 +340,7 @@ Normalisation can be performed by the function `contact_matrix()` in `{socialmix
 
 ```{r, message = FALSE}
 contact_data_split <- socialmixr::contact_matrix(
-  survey = polymod,
+  survey = survey_load,
   countries = "United Kingdom",
   age_limits = c(0, 20, 40),
   symmetric = TRUE,