Code
This repository contains the code used to produce the results presented in the paper. All code files are located in the code/ folder of this repository.
Open the R project file (softimpute.Rproj) in the main repository folder. This will automatically set the working directory so the scripts run properly. From there, you can run the relevant scripts to reproduce the results. All scripts source code/common.R, which defines shared parameters, themes, and utility functions.
Abramov_et_al_spatial_prediction_analysis.R is the main script and produces the principal results in the paper. It contains:
- Island-scale prediction based on SVD
- Evaluation of predictive performance (based on binary and weighted metrics)
- Ecological inference, including distance decay analysis
- Visualizations of existing and potential missing links
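The idea behind SVD-based link prediction can be illustrated with a minimal sketch. It uses base R svd() on a made-up toy matrix as a simplified stand-in, not the repository's actual pipeline; the matrix, the function name low_rank_scores, and the choice k = 2 are all assumptions for illustration.

```r
# Toy binary interaction matrix (rows = plants, columns = pollinators).
# All values are made up for illustration.
A <- matrix(c(1, 1, 0, 0,
              1, 1, 1, 0,
              0, 1, 1, 1,
              0, 0, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("plant", 1:4), paste0("poll", 1:4)))

# Rank-k reconstruction from a truncated SVD (hypothetical helper).
low_rank_scores <- function(m, k) {
  s <- svd(m, nu = k, nv = k)
  scores <- s$u %*% diag(s$d[seq_len(k)], nrow = k) %*% t(s$v)
  dimnames(scores) <- dimnames(m)
  scores
}

scores <- low_rank_scores(A, k = 2)

# Candidate missing links: unobserved pairs ranked by reconstructed score.
unobserved <- which(A == 0, arr.ind = TRUE)
candidates <- data.frame(
  plant = rownames(A)[unobserved[, "row"]],
  pollinator = colnames(A)[unobserved[, "col"]],
  score = scores[unobserved]
)
candidates <- candidates[order(-candidates$score), ]
```

Unobserved pairs with high reconstructed scores are the ones a low-rank model considers most compatible with the observed structure, which is why they are treated as potential missing links.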
emln dataset no. 60: plant–pollinator interactions collected in the Canary Islands, archived in the emln R package. Used to build the layers of the network.
data/distance_between_sites_canary.csv: geographic distances between sites, used for distance-decay analysis. Taken from the original data publication.
results/predictions_island_scale.rds: the link prediction results. Used for further analysis in this script and in other scripts.
Plots (saved to results/paper_figs/):
| File | Figure |
|---|---|
| subset_heatmap_hist_v2.pdf | Fig. 2 (composite: panels a, b from subset analysis, panels c, d from main) |
| island_heatmap_f05.pdf | Fig. 2c |
| hist_f05a_legend_bottom.pdf | Fig. 2d |
| missing_interactions_degree2.pdf | Fig. 3 (composite: panels a, b, c) |
| map_missing_links_merged.pdf | Fig. 3a (individual panel) |
| pie_chart.pdf | Fig. 3b (individual panel) |
| degree_unobserved_links.pdf | Fig. 3c (individual panel) |
| isl_jaccard_distance.pdf | Fig. 4 |
An interactive version of Figure 3, with full species labels, information, and threshold controls, is available at https://ecological-complexity-lab.github.io/svd_based_spatial_prediction/
| File | Figure |
|---|---|
| pr_roc.pdf | Fig. S11 |
| roc_curve.pdf | Fig. S12 |
| predicted_original.png | Fig. S13 |
| degree_occurrence.pdf | Fig. S14 |
| local_degree_predicted_links.pdf | Fig. S15 |
| degree_binning.pdf | Fig. S16 |
| plant_island_degree.pdf | Fig. S17 |
| netdensity_f05_nnse.pdf | Fig. S18 |
| netsize_f05_nnse.pdf | Fig. S19 |
Sensitivity analysis examining how the choice of the maximum number of latent dimensions (k) and regularization parameter (lambda) affects predictive performance.
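As a rough illustration of what such a sensitivity analysis involves, the sketch below scans a small grid of k and lambda values on simulated data. It is a simplification under stated assumptions: soft_svd_scores is a hypothetical one-shot soft-thresholded SVD in base R (not the iterative softImpute algorithm), the grid values are arbitrary, and held-out mean squared error stands in for the performance metrics used in the paper.

```r
set.seed(1)
# Hypothetical toy matrix of 0/1 interactions (8 x 10).
A <- matrix(rbinom(8 * 10, 1, 0.4), nrow = 8)

# Withhold a random 20% of cells; here they are simply set to 0 before fitting.
held <- sample(length(A), size = round(0.2 * length(A)))
A_train <- A
A_train[held] <- 0

# One-shot soft-thresholded SVD: shrink singular values by lambda,
# keep at most k of them (hypothetical helper).
soft_svd_scores <- function(m, k, lambda) {
  s <- svd(m)
  k <- min(k, length(s$d))
  d <- pmax(s$d[seq_len(k)] - lambda, 0)
  s$u[, seq_len(k), drop = FALSE] %*% diag(d, nrow = k) %*%
    t(s$v[, seq_len(k), drop = FALSE])
}

# Evaluate every (k, lambda) combination on the withheld cells.
grid <- expand.grid(k = c(1, 2, 4, 8), lambda = c(0, 0.5, 1))
grid$held_out_mse <- mapply(function(k, lambda) {
  scores <- soft_svd_scores(A_train, k, lambda)
  mean((scores[held] - A[held])^2)
}, grid$k, grid$lambda)

# Parameter combinations ordered from best to worst held-out error.
grid[order(grid$held_out_mse), ]
```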
results/predictions_island_scale.rds: island-scale predictions from Abramov_et_al_spatial_prediction_analysis.R.
Plots (saved to results/paper_figs/):
| File | Figure |
|---|---|
| k_overall_sensitivity_v2.pdf | Fig. S8 |
This script applies the same SVD-based link prediction pipeline at the site scale (finer spatial resolution within islands) and compares predictive performance across scales. It contains:
- Site-scale link prediction
- Evaluation of predictive performance (based on binary and weighted metrics)
- Ecological inference, including distance decay analysis
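A distance decay analysis relates the similarity of link composition between layers to the geographic distance separating them. The output file names suggest a Jaccard-based comparison; the sketch below assumes Jaccard similarity of link sets regressed on distance with lm(). The link sets, layer names, and distances are made up, and the real distances would come from data/distance_between_sites_canary.csv.

```r
# Hypothetical link sets per layer (island or site), as "plant_pollinator" keys.
links <- list(
  A = c("p1_a", "p1_b", "p2_a", "p3_c"),
  B = c("p1_a", "p1_b", "p2_a"),
  C = c("p1_a", "p3_c", "p4_d")
)

# Jaccard similarity: shared links divided by all links found in either layer.
jaccard <- function(x, y) length(intersect(x, y)) / length(union(x, y))

# All unordered pairs of layers: A-B, A-C, B-C.
pairs <- as.data.frame(t(combn(names(links), 2)), stringsAsFactors = FALSE)
names(pairs) <- c("layer_from", "layer_to")
pairs$similarity <- unname(mapply(
  function(a, b) jaccard(links[[a]], links[[b]]),
  pairs$layer_from, pairs$layer_to
))

# Made-up geographic distances (km) for the three pairs, in the same order.
pairs$distance_km <- c(50, 90, 120)

# A negative slope indicates that link similarity decays with distance.
decay <- lm(similarity ~ distance_km, data = pairs)
coef(decay)
```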
emln dataset no. 60: plant–pollinator interactions from the Canary Islands, archived in the emln R package.
data/distance_between_sites_canary.csv: geographic distances between sites (from the original publication).
results/predictions_site_scale.rds: link prediction results at site scale.
Plots (saved to results/paper_figs/):
| File | Figure |
|---|---|
| nnse_f05_scales.pdf | Fig. S1 |
| site_heatmap_f05.pdf | Fig. S2 |
| hist_f05_site.pdf | Fig. S3 |
| jaccard_site_f05.pdf | Fig. S4 |
| cor_plot_site_dif_f05.pdf | Fig. S5 |
| site_netsize_f05_nnse.pdf | Fig. S6 |
| site_netdensity_f05_nnse.pdf | Fig. S7 |
This script compares prediction quality when using the full species pools of auxiliary (A) and target (P) networks versus subsetting to only the species they share. Results form panels a and b of Fig. 2.
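Subsetting to shared species amounts to keeping only the rows and columns present in both matrices. A minimal sketch, assuming both networks are stored as matrices with species names as dimnames (the toy matrices and species labels below are hypothetical):

```r
# Hypothetical auxiliary (A) and target (P) interaction matrices
# with partly overlapping species pools.
A <- matrix(1, nrow = 3, ncol = 3,
            dimnames = list(c("plant1", "plant2", "plant3"),
                            c("poll1", "poll2", "poll3")))
P <- matrix(0, nrow = 3, ncol = 2,
            dimnames = list(c("plant2", "plant3", "plant4"),
                            c("poll2", "poll4")))

# Species present in both networks.
shared_plants <- intersect(rownames(A), rownames(P))
shared_pollinators <- intersect(colnames(A), colnames(P))

# drop = FALSE keeps the matrix structure even if only one species is shared.
A_sub <- A[shared_plants, shared_pollinators, drop = FALSE]
P_sub <- P[shared_plants, shared_pollinators, drop = FALSE]
```

After subsetting, both matrices have identical dimensions and species order, so they can be compared or stacked directly.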
emln dataset no. 60: plant–pollinator interactions from the Canary Islands, archived in the emln R package.
results/predictions_island_scale.rds: island-scale predictions from Abramov_et_al_spatial_prediction_analysis.R (loaded if the file already exists).
Plots (saved to results/paper_figs/):
| File | Figure |
|---|---|
| island_subset_f05.pdf | Fig. 2a |
| island_subset_nnse.pdf | Fig. 2b |
This script examines three alternatives to the random link withholding strategy used in the main code:
- Positive degree effect: generalist species (high degree) are preferentially withheld.
- Negative degree effect: specialist species (low degree) are preferentially withheld.
- Class-imbalance-preserving: links and non-links are withheld proportionally to their prevalence.
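One way to implement degree-biased withholding is weighted sampling of observed links. The sketch below is an assumption about how such a scheme could look, not the script's actual code: withhold_links is a hypothetical helper, and the weight (the summed degree of the two species in a link, or its inverse) is an illustrative choice. The class-imbalance-preserving strategy is not shown.

```r
set.seed(42)
# Hypothetical binary network (6 x 8), simulated for illustration.
A <- matrix(rbinom(6 * 8, 1, 0.5), nrow = 6)

withhold_links <- function(m, frac = 0.2,
                           effect = c("random", "positive", "negative")) {
  effect <- match.arg(effect)
  links <- which(m == 1, arr.ind = TRUE)
  # Combined degree of the two species forming each link.
  deg <- rowSums(m)[links[, "row"]] + colSums(m)[links[, "col"]]
  # Sampling weights: uniform, proportional to degree, or inverse to degree.
  w <- switch(effect,
              random = rep(1, nrow(links)),
              positive = deg,
              negative = 1 / deg)
  n_hold <- round(frac * nrow(links))
  held <- links[sample(nrow(links), n_hold, prob = w), , drop = FALSE]
  m_train <- m
  m_train[held] <- 0  # withheld links are hidden from the training matrix
  list(train = m_train, held = held)
}

split <- withhold_links(A, frac = 0.25, effect = "positive")
```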
For each strategy, the optimal classification threshold is selected data-adaptively by maximising the mean F0.5 score, matching the approach in the main analysis.
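Threshold selection by F0.5 can be sketched as a scan over candidate thresholds. The example uses the standard F-beta definition with beta = 0.5, which weights precision more heavily than recall. The truth and score vectors and the threshold grid are made up, and the averaging across repetitions implied by "mean F0.5" is omitted for brevity.

```r
# F-beta score from binary truth and prediction vectors.
f_beta <- function(truth, pred, beta = 0.5) {
  tp <- sum(pred == 1 & truth == 1)
  fp <- sum(pred == 1 & truth == 0)
  fn <- sum(pred == 0 & truth == 1)
  denom <- (1 + beta^2) * tp + beta^2 * fn + fp
  if (denom == 0) return(0)
  (1 + beta^2) * tp / denom
}

# Hypothetical withheld entries: true status and predicted scores.
truth <- c(1, 1, 1, 0, 0, 0, 0, 1)
score <- c(0.91, 0.83, 0.36, 0.42, 0.22, 0.11, 0.58, 0.71)

# Scan a grid of thresholds and keep the one with the highest F0.5.
thresholds <- seq(0.05, 0.95, by = 0.05)
f05 <- sapply(thresholds, function(t) f_beta(truth, as.integer(score >= t)))
best_threshold <- thresholds[which.max(f05)]
```

When several thresholds tie, which.max() returns the first (lowest) one; a different tie-breaking rule may be preferable depending on the application.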
emln dataset no. 60: plant–pollinator interactions from the Canary Islands, archived in the emln R package.
Plots (saved to results/paper_figs/):
| File | Figure |
|---|---|
| fig_imbalance_pr_roc.pdf | Fig. S9 |
| island_heatmap_degree_holdout_combined.pdf | Fig. S10 |
This script applies the SVD-based link prediction framework to a temporal host–parasite network, treating different sampling years as network layers. It demonstrates the generalisability of the framework beyond spatial plant–pollinator systems. It contains:
- Year-scale link prediction using SVD (softImpute)
- Threshold selection for binary classification
- Non-thresholded evaluation (ROC-AUC, PR-AUC)
- Mapping of potential missing host–parasite interactions
- Temporal distance decay analysis
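Non-thresholded evaluation scores the ranking of predictions rather than a single cut-off. The sketch below computes ROC-AUC through the rank-sum (Mann-Whitney) identity and uses average precision as one common estimate of PR-AUC. Both functions are hypothetical base R helpers on made-up data; the script may rely on a dedicated package instead.

```r
# ROC-AUC via the rank-sum identity; average ranks handle tied scores.
roc_auc <- function(truth, score) {
  r <- rank(score)
  n_pos <- sum(truth == 1)
  n_neg <- sum(truth == 0)
  (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Average precision: mean of the precision values at each true link,
# scanning predictions from highest to lowest score.
average_precision <- function(truth, score) {
  ord <- order(score, decreasing = TRUE)
  hits <- truth[ord] == 1
  precision_at_k <- cumsum(hits) / seq_along(hits)
  mean(precision_at_k[hits])
}

# Hypothetical withheld entries.
truth <- c(1, 0, 1, 0)
score <- c(0.9, 0.8, 0.7, 0.1)

auc <- roc_auc(truth, score)
ap <- average_precision(truth, score)
```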
data/41559_2017_BFs415590170101_MOESM36_ESM.xlsx: host–parasite interaction data across years. Source: Krasnov et al. (2010), archived on Dryad.
results/hp_analysis/predictions_year_scale.rds: link prediction results across years. Computed on first run and reloaded on subsequent runs.
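The compute-once, reload-later behaviour can be sketched with readRDS() and saveRDS(). The helper name cached_rds, the temporary file path, and the placeholder result are assumptions for illustration; the script's own caching logic may differ.

```r
# Hypothetical helper: compute once, then reload from disk on later runs.
cached_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, path)
  result
}

# Demo in a temporary directory so no project files are touched.
cache_file <- file.path(tempdir(), "hp_analysis", "predictions_demo.rds")
predictions <- cached_rds(cache_file, function() {
  data.frame(year = c(1, 2), score = c(0.8, 0.3))  # placeholder result
})
```

Deleting the cached .rds file forces the expensive computation to run again on the next call.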
Plots (saved to results/hp_analysis/paper_figs/):
| File | Figure |
|---|---|
| pr_roc_host_parasite.pdf | Fig. S20 |
| map_missing_links_host_parasite.pdf | Fig. S21 |
| year_jaccard_distance.pdf | Fig. S22 |