Unbounded: core/src/exchanges/metaculus/fetchMarkets.ts — fetchPostPages accumulates up to 20,000 posts (~100 MB) in a single request #688

@realfishsam

Location

core/src/exchanges/metaculus/fetchMarkets.ts:41-70 (fetchPostPages)

Code

const MAX_PAGES = 200;    // safety cap (~20 000 posts)
const BATCH_SIZE = 100;   // max per page

async function fetchPostPages(
    callApi: CallApi,
    apiParams: Record<string, any>,
    targetCount?: number,
): Promise<any[]> {
    let all: any[] = [];   // ← accumulates ALL pages in memory
    let offset = 0;
    let page = 0;

    do {
        const data = await callApi("GetPosts", { ...apiParams, limit: BATCH_SIZE, offset });
        const results: any[] = data.results ?? [];
        if (results.length === 0) break;

        all.push(...results);    // ← no eviction, all pages kept alive
        offset += results.length;
        page++;

        if (targetCount && all.length >= targetCount * 1.5) break;
        if (!data.next) break;
    } while (page < MAX_PAGES);

    return all;
}

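searchMarkets() calls this with targetCount = Math.max(limit * 5, 500) (line 121). For the default limit=200 that is a target of 1,000 posts, and because the loop only stops once all.length >= targetCount * 1.5, a single call can overshoot to 1,500 before breaking. The module-level cachedPosts reference then holds everything that was fetched for 5 minutes.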

fetchMarketsDefault() calls it with fetchLimit = 2000 when sort === "volume" or sort === "liquidity" (lines 166-168), meaning roughly 2,000 posts are loaded and cached at module level.

Growth Pattern

  • MAX_PAGES × BATCH_SIZE = 200 × 100 = 20,000 posts maximum per invocation
  • Each raw Metaculus post object contains nested question data, community predictions, and metadata — typically 2–10 KB
  • expandPosts() further expands group-of-questions posts into multiple UnifiedMarket objects, multiplying memory
  • The module-level cachedPosts variable holds a reference preventing GC for 5 minutes
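
The bounds above can be turned into a quick back-of-the-envelope calculation. The sketch below is not code from the repository: worstCasePosts and worstCaseBytes are hypothetical helpers, the 5 KB average is an assumption taken from the middle of the 2–10 KB range, and the bound assumes every page comes back full (BATCH_SIZE results).

```typescript
// Constants quoted from fetchMarkets.ts in this issue.
const MAX_PAGES = 200;
const BATCH_SIZE = 100;

// Hypothetical helper: upper bound on posts buffered by one fetchPostPages call,
// assuming every page returns exactly BATCH_SIZE results.
function worstCasePosts(targetCount?: number): number {
    const pageCap = MAX_PAGES * BATCH_SIZE; // 20,000
    // Mirrors the falsy check in the loop: no targetCount means no early exit.
    if (!targetCount) return pageCap;
    // The loop stops on the first full page at or past targetCount * 1.5.
    const stopAt = Math.ceil((targetCount * 1.5) / BATCH_SIZE) * BATCH_SIZE;
    return Math.min(stopAt, pageCap);
}

// Assumed average raw post size in bytes (5 KB); adjust to measured values.
function worstCaseBytes(targetCount?: number, bytesPerPost: number = 5 * 1024): number {
    return worstCasePosts(targetCount) * bytesPerPost;
}

console.log(worstCasePosts(1000), worstCasePosts(), worstCaseBytes());
```

Under these assumptions a targetCount of 1,000 buffers 1,500 posts, and a call without targetCount can reach the full 20,000 posts, or about 100 MB at 5 KB each, before expandPosts() adds its own objects.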

OOM Estimate

| Scenario | Posts | Est. size per post | Total |
| --- | --- | --- | --- |
| Default fetchMarkets() | ~1,000 | 5 KB | ~5 MB |
| sort="volume" path | 2,000 | 5 KB | ~10 MB |
| fetchMarkets({ query }) | up to 20,000 | 5 KB | ~100 MB |
| Multiple concurrent fetchMarkets() | N × above | | N × above |

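The 100 MB scenario occurs when searchMarkets() is called with a limit large enough that targetCount * 1.5 reaches the 20,000-post MAX_PAGES ceiling, or when targetCount is omitted so the early-exit check never fires. fetchPostPages also has no concurrency guard, so concurrent requests each buffer their own full result set and peak memory scales with the number of in-flight calls.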
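
One way to keep concurrent callers from each running their own fetch is to share a single in-flight promise. This is a minimal sketch, not the module's actual cache code: getPostsOnce, inFlight, and the shape of cachedPosts are assumptions for illustration, and a real cache would probably need one entry per distinct set of apiParams.

```typescript
type Post = Record<string, unknown>;

const CACHE_TTL_MS = 5 * 60 * 1000; // 5-minute lifetime, as described in this issue

// Assumed cache shape; the real module-level cachedPosts may differ.
let cachedPosts: { posts: Post[]; expiresAt: number } | null = null;
let inFlight: Promise<Post[]> | null = null;

// Hypothetical wrapper: concurrent callers share one load() instead of
// each buffering a full result set.
async function getPostsOnce(load: () => Promise<Post[]>): Promise<Post[]> {
    if (cachedPosts && cachedPosts.expiresAt > Date.now()) return cachedPosts.posts;
    if (inFlight) return inFlight; // a fetch is already running: reuse it

    const pending: Promise<Post[]> = load()
        .then((posts) => {
            cachedPosts = { posts, expiresAt: Date.now() + CACHE_TTL_MS };
            return posts;
        })
        .finally(() => {
            // Clear the marker on success or failure so a later call can retry.
            if (inFlight === pending) inFlight = null;
        });
    inFlight = pending;
    return pending;
}
```

With this in place, N simultaneous requests hold one copy of the result set rather than N. It does not reduce the size of a single fetch, so it complements a hard cap rather than replacing it.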

Suggested Fix

  • Apply a hard cap in fetchPostPages regardless of targetCount:
    const HARD_CAP = 5000; // posts
    if (all.length >= HARD_CAP) break;
  • Stream/yield results incrementally instead of buffering the full result set.
  • Protect the module-level cachedPosts write with a mutex to prevent concurrent fetches from each loading the full set.
  • For the keyword search path, use Metaculus's server-side search parameter if/when it becomes available, or implement a paginated response instead of client-side filtering.
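
The first two suggestions can be combined in an async generator that yields posts page by page and stops at a hard cap. The sketch below is an assumption-laden illustration, not a drop-in patch: iteratePosts, the Post and PostsPage types, and the CallApi signature are hypothetical, modeled on the code quoted above.

```typescript
type Post = Record<string, unknown>;
type PostsPage = { results?: Post[]; next?: string | null };
// Assumed signature, modeled on the callApi usage quoted in this issue.
type CallApi = (operation: string, params: Record<string, unknown>) => Promise<PostsPage>;

const BATCH_SIZE = 100;
const HARD_CAP = 5000; // posts, as proposed above

// Hypothetical replacement for fetchPostPages: yields posts one at a time,
// so at most one page is held here regardless of how many posts exist.
async function* iteratePosts(
    callApi: CallApi,
    apiParams: Record<string, unknown>,
    maxPosts: number = HARD_CAP,
): AsyncGenerator<Post, void, undefined> {
    const cap = Math.min(maxPosts, HARD_CAP); // the cap applies whatever the caller asks for
    let offset = 0;
    let yielded = 0;

    while (yielded < cap) {
        const data = await callApi("GetPosts", { ...apiParams, limit: BATCH_SIZE, offset });
        const results = data.results ?? [];
        if (results.length === 0) return;

        for (const post of results) {
            if (yielded >= cap) return;
            yielded++;
            yield post;
        }
        offset += results.length;
        if (!data.next) return;
    }
}
```

A caller such as the keyword search path could then filter inside `for await (const post of iteratePosts(callApi, params))` and keep only the matches, breaking out of the loop once it has enough, instead of holding every fetched post until filtering is done.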

Found by automated unbounded operations audit
