Kesem Abramov edited this page Apr 8, 2026 · 13 revisions

This page describes the code used to produce the results presented in the paper. All code files are located in this repository under the code folder.

How to run the code

Open the R project file (softimpute.Rproj) in the main repository folder. This will automatically set the working directory so the scripts run properly. From there, you can run the relevant scripts to reproduce the results. All scripts source code/common.R, which defines shared parameters, themes, and utility functions.

Scripts

Main analysis (Abramov_et_al_spatial_prediction_analysis.R)

Description

This script contains the main code that produces the results in the paper. It includes:

  1. Island-scale prediction based on SVD
  2. Evaluation of predictive performance (based on binary and weighted metrics)
  3. Ecological inference, including distance decay analysis
  4. Visualizations of existing and potential missing links
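The distance decay step in item 3 can be illustrated with a small sketch. It is written in Python for brevity (the repository's scripts are in R) and assumes that similarity between layers is measured with the Jaccard index on link sets, as the output file name isl_jaccard_distance.pdf suggests. The island names, links, and distances below are hypothetical toy values, not data from the paper.

```python
def jaccard_similarity(links_a, links_b):
    """Jaccard similarity between two sets of (plant, pollinator) links."""
    a, b = set(links_a), set(links_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty layers
    return len(a & b) / len(union)


# Hypothetical toy layers: each island is a set of observed links.
layers = {
    "island_1": {("p1", "a1"), ("p1", "a2"), ("p2", "a1")},
    "island_2": {("p1", "a1"), ("p2", "a1"), ("p3", "a3")},
    "island_3": {("p3", "a3"), ("p4", "a4")},
}

# Hypothetical pairwise geographic distances (km).
distance_km = {
    ("island_1", "island_2"): 50.0,
    ("island_1", "island_3"): 200.0,
    ("island_2", "island_3"): 150.0,
}

# Pair each distance with the similarity of the two layers, sorted by distance.
decay = sorted(
    (d, jaccard_similarity(layers[i], layers[j]))
    for (i, j), d in distance_km.items()
)

for d, s in decay:
    print(f"{d:6.1f} km  similarity = {s:.2f}")
```

In this toy example similarity falls as distance grows, which is the pattern a distance decay analysis tests for.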

Input

emln dataset no. 60: plant–pollinator interactions collected in the Canary Islands, archived in the emln R package. Used to build the layers of the network.

data/distance_between_sites_canary.csv: geographic distances between sites, used for distance-decay analysis. Taken from the original data publication.

Output

results/predictions_island_scale.rds: the results of the link prediction. Used for further analysis in this script and in other scripts.

Plots (saved to results/paper_figs/):

Main figures

| File | Figure |
| --- | --- |
| subset_heatmap_hist_v2.pdf | Fig. 2 (composite: panels a, b from subset analysis, panels c, d from main) |
| island_heatmap_f05.pdf | Fig. 2c |
| hist_f05a_legend_bottom.pdf | Fig. 2d |
| missing_interactions_degree2.pdf | Fig. 3 (composite: panels a, b, c) |
| map_missing_links_merged.pdf | Fig. 3a (individual panel) |
| pie_chart.pdf | Fig. 3b (individual panel) |
| degree_unobserved_links.pdf | Fig. 3c (individual panel) |
| isl_jaccard_distance.pdf | Fig. 4 |

An interactive version of Figure 3, with full species labels, information, and threshold controls, is available at https://ecological-complexity-lab.github.io/svd_based_spatial_prediction/

Supplementary figures

| File | Figure |
| --- | --- |
| pr_roc.pdf | Fig. S11 |
| roc_curve.pdf | Fig. S12 |
| predicted_original.png | Fig. S13 |
| degree_occurrence.pdf | Fig. S14 |
| local_degree_predicted_links.pdf | Fig. S15 |
| degree_binning.pdf | Fig. S16 |
| plant_island_degree.pdf | Fig. S17 |
| netdensity_f05_nnse.pdf | Fig. S18 |
| netsize_f05_nnse.pdf | Fig. S19 |

k sensitivity analysis (k_swap.R)

Description

Sensitivity analysis examining how the choice of the maximum number of latent dimensions (k) and regularization parameter (lambda) affects predictive performance.

Input

results/predictions_island_scale.rds: island-scale predictions from Abramov_et_al_spatial_prediction_analysis.R.

Output

Plots (saved to results/paper_figs/):

| File | Figure |
| --- | --- |
| k_overall_sensitivity_v2.pdf | Fig. S8 |

Scale analysis (site_scale_spatial_prediction_analysis.R)

Description

This script applies the same SVD-based link prediction pipeline at the site scale (finer spatial resolution within islands) and compares predictive performance across scales. It contains:

  1. Site-scale link prediction
  2. Evaluation of predictive performance (based on binary and weighted metrics)
  3. Ecological inference, including distance decay analysis

Input

emln dataset no. 60: plant–pollinator interactions from the Canary Islands, archived in the emln R package.

data/distance_between_sites_canary.csv: geographic distances between sites (from the original publication).

Output

results/predictions_site_scale.rds: link prediction results at site scale.

Plots (saved to results/paper_figs/):

| File | Figure |
| --- | --- |
| nnse_f05_scales.pdf | Fig. S1 |
| site_heatmap_f05.pdf | Fig. S2 |
| hist_f05_site.pdf | Fig. S3 |
| jaccard_site_f05.pdf | Fig. S4 |
| cor_plot_site_dif_f05.pdf | Fig. S5 |
| site_netsize_f05_nnse.pdf | Fig. S6 |
| site_netdensity_f05_nnse.pdf | Fig. S7 |

Subset analysis (subset_analysis_comparison.R)

Description

This script compares prediction quality when using the full species pools of auxiliary (A) and target (P) networks versus subsetting to only the species they share. Results form panels a and b of Fig. 2.
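The subsetting step can be sketched as follows. This is a Python illustration, not the repository's R code. It assumes each network is represented as a set of (plant, pollinator) links and that a species belongs to a network's pool if it has at least one link there. The function names and toy links are hypothetical.

```python
def species_of(links):
    """Return the plant and pollinator pools implied by a set of links."""
    plants = {plant for plant, _ in links}
    pollinators = {pollinator for _, pollinator in links}
    return plants, pollinators


def subset_to_shared(aux_links, target_links):
    """Keep only links whose plant AND pollinator occur in both networks."""
    aux_plants, aux_pollinators = species_of(aux_links)
    tgt_plants, tgt_pollinators = species_of(target_links)
    shared_plants = aux_plants & tgt_plants
    shared_pollinators = aux_pollinators & tgt_pollinators

    def keep(links):
        return {
            (p, a) for p, a in links
            if p in shared_plants and a in shared_pollinators
        }

    return keep(aux_links), keep(target_links)


# Hypothetical toy networks: auxiliary (A) and target (P).
aux = {("p1", "a1"), ("p2", "a2"), ("p3", "a1")}
target = {("p1", "a2"), ("p2", "a1"), ("p4", "a4")}

aux_sub, target_sub = subset_to_shared(aux, target)
print(sorted(aux_sub))     # links of A restricted to shared species
print(sorted(target_sub))  # links of P restricted to shared species
```

Here p3 occurs only in A, and p4 and a4 occur only in P, so their links are dropped. The full-pool variant would keep both link sets unchanged.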

Input

emln dataset no. 60: plant–pollinator interactions from the Canary Islands, archived in the emln R package.

results/predictions_island_scale.rds: island-scale predictions from Abramov_et_al_spatial_prediction_analysis.R (loaded if it already exists).

Output

Plots (saved to results/paper_figs/):

| File | Figure |
| --- | --- |
| island_subset_f05.pdf | Fig. 2a |
| island_subset_nnse.pdf | Fig. 2b |

Alternative link withholding strategies (alternative_link_withholding_strategies.R)

Description

This script examines three alternatives to the random link withholding strategy used in the main code:

  1. Positive degree effect: links of generalist (high-degree) species are preferentially withheld.
  2. Negative degree effect: links of specialist (low-degree) species are preferentially withheld.
  3. Class-imbalance-preserving: links and non-links are withheld proportionally to their prevalence.
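The three strategies above can be sketched in Python (the repository's scripts are in R, so this is an illustration only). How the scripts weight links by degree is not stated on this page. The sketch assumes a link's weight is the sum of its two endpoint degrees for the positive effect, and the inverse of that sum for the negative effect. All function names and the toy network are hypothetical.

```python
import random
from collections import Counter


def withhold_by_degree(links, fraction, effect="positive", seed=0):
    """Withhold a fraction of links, biased by the degree of their endpoints.

    effect="positive" favours links of generalists, "negative" favours links
    of specialists, and any other value gives uniform (random) withholding.
    """
    rng = random.Random(seed)
    links = sorted(set(links))
    degree = Counter()
    for plant, pollinator in links:
        degree[("plant", plant)] += 1
        degree[("pollinator", pollinator)] += 1

    def weight(link):
        # Assumption: link weight is based on the sum of endpoint degrees.
        d = degree[("plant", link[0])] + degree[("pollinator", link[1])]
        if effect == "positive":
            return float(d)
        if effect == "negative":
            return 1.0 / d
        return 1.0

    n_withheld = round(fraction * len(links))
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # each link gets key u ** (1 / w); the links with the largest keys are taken.
    keyed = sorted(
        links, key=lambda l: rng.random() ** (1.0 / weight(l)), reverse=True
    )
    withheld = set(keyed[:n_withheld])
    return set(links) - withheld, withheld


def withhold_stratified(plants, pollinators, links, fraction, seed=0):
    """Withhold the same fraction of links and of non-links."""
    rng = random.Random(seed)
    links = set(links)
    cells = [(p, a) for p in sorted(plants) for a in sorted(pollinators)]
    present = [c for c in cells if c in links]
    absent = [c for c in cells if c not in links]
    held_links = set(rng.sample(present, round(fraction * len(present))))
    held_non_links = set(rng.sample(absent, round(fraction * len(absent))))
    return held_links, held_non_links


# Hypothetical toy network: 3 plants x 4 pollinators, 6 links.
toy_links = {
    ("p1", "a1"), ("p1", "a2"), ("p1", "a3"),
    ("p2", "a1"), ("p2", "a2"), ("p3", "a4"),
}
kept, withheld = withhold_by_degree(toy_links, 0.5, effect="positive", seed=1)
held_links, held_non_links = withhold_stratified(
    {"p1", "p2", "p3"}, {"a1", "a2", "a3", "a4"}, toy_links, 0.5, seed=1
)
print(len(kept), len(withheld), len(held_links), len(held_non_links))
```

Sampling the same fraction from links and from non-links keeps the withheld set at the same link prevalence as the full matrix, which is the point of the class-imbalance-preserving strategy.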

For each strategy, the optimal classification threshold is selected data-adaptively by maximising the mean F0.5 score, matching the approach in the main analysis.
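A minimal Python sketch of this threshold selection follows (the repository's implementation is in R and may differ in detail). It assumes each withholding run yields true labels and predicted scores for the withheld entries, and that the selected threshold is the candidate with the highest F0.5 averaged over runs. The function names and toy values are hypothetical.

```python
def f_beta(y_true, y_pred, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_threshold(runs, thresholds, beta=0.5):
    """Return the threshold that maximises the mean F-beta across runs."""
    def mean_score(threshold):
        per_run = [
            f_beta(y_true, [s >= threshold for s in scores], beta)
            for y_true, scores in runs
        ]
        return sum(per_run) / len(per_run)

    return max(thresholds, key=mean_score)


# Hypothetical toy runs: (true labels, predicted scores) for withheld entries.
runs = [
    ([1, 0, 1, 0], [0.9, 0.4, 0.6, 0.2]),
    ([1, 1, 0, 0], [0.8, 0.3, 0.5, 0.1]),
]
candidates = [0.25, 0.45, 0.55, 0.7]

chosen = best_threshold(runs, candidates)
print(chosen)
```

With beta = 0.5 a false positive costs more than a false negative, so the selected threshold tends to be conservative about predicting new links.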

Input

emln dataset no. 60: plant–pollinator interactions from the Canary Islands, archived in the emln R package.

Output

Plots (saved to results/paper_figs/):

| File | Figure |
| --- | --- |
| fig_imbalance_pr_roc.pdf | Fig. S9 |
| island_heatmap_degree_holdout_combined.pdf | Fig. S10 |

Temporal host-parasite analysis (temporal_analysis_clean_version.R)

Description

This script applies the SVD-based link prediction framework to a temporal host–parasite network, treating different sampling years as network layers. It demonstrates the generalisability of the framework beyond spatial plant–pollinator systems. It contains:

  1. Year-scale link prediction using SVD (softImpute)
  2. Threshold selection for binary classification
  3. Non-thresholded evaluation (ROC-AUC, PR-AUC)
  4. Mapping of potential missing host–parasite interactions
  5. Temporal distance decay analysis
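The non-thresholded evaluation in item 3 can be illustrated with a short Python sketch (the script itself is in R and may compute these quantities differently). ROC-AUC is computed here as the probability that a true link is scored above a non-link. PR-AUC is approximated by average precision, which is one common estimator and is an assumption here, not necessarily the one used in the paper. The labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (link, non-link) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("need at least one link and one non-link")
    # Ties between a link and a non-link count as half a correct ranking.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each true link.

    Tied scores are not treated specially in this sketch.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one link")
    return total / hits


# Hypothetical withheld entries: 1 = link, 0 = non-link.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
print(f"ROC-AUC = {auc:.3f}, average precision = {ap:.3f}")
```

Neither metric depends on a classification threshold, which is why they complement the thresholded F0.5 evaluation.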

Input

data/41559_2017_BFs415590170101_MOESM36_ESM.xlsx: host–parasite interaction data across years. Source: Krasnov et al. (2010), archived on Dryad.

Output

results/hp_analysis/predictions_year_scale.rds: link prediction results across years. Computed on first run and reloaded on subsequent runs.

Plots (saved to results/hp_analysis/paper_figs/):

| File | Figure |
| --- | --- |
| pr_roc_host_parasite.pdf | Fig. S20 |
| map_missing_links_host_parasite.pdf | Fig. S21 |
| year_jaccard_distance.pdf | Fig. S22 |