Feature: Add "code density" column (avg LOC +/- per commit) #125

@quotentiroler

Summary

Add a metric showing the average lines of code touched per commit, computed as (additions + deletions) / commit count, for each ranked user. This "code density" metric would complement the existing contribution count by showing how much code a user typically ships per contribution, not just how often they contribute.

Motivation

The current ranking sorts purely by contribution count. Two users with 1,000 commits could have very different impact — one averaging 5 LOC/commit (config tweaks, typo fixes) and another averaging 200 LOC/commit (features, refactors). A density column would surface this difference and make the leaderboard more informative without changing the primary ranking.

Feasibility (researched & verified)

The GitHub GraphQL Commit type exposes additions and deletions as two cheap integer scalar fields. These can be retrieved per-commit with an author filter to ensure only the target user's commits are counted.

Recommended approach: two-pass with batched author-filtered queries

Why not embed it in the existing search query? The history(author: {id: ...}) filter requires the user's node id, which can't be dynamically referenced from within the same search query. Embedding history() without an author filter returns commits from all contributors in that repo — unusable.

Solution: a lightweight second pass.

  1. Pass 1 (existing search query): Add the id field to the User fragment (zero cost increase). This gives you each user's node ID.
  2. Pass 2 (new): Batch 5 users per query using GraphQL aliases, each with their own author: {id} filter to get only their commits.
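
The two passes above could be wired together roughly as follows. This is a minimal sketch, not the project's actual code: the helper names `chunked` and `build_batch_query` are hypothetical, the GraphQL field names are copied from the proof-of-concept query in this issue, and JSON string escaping is assumed to be an adequate stand-in for GraphQL string quoting (logins and node IDs are plain ASCII).

```python
import json
from typing import Iterator, List, Sequence, Tuple

# One aliased block per user. Field names follow the proof-of-concept query
# in this issue; doubled braces are literal braces for str.format.
USER_BLOCK = """
  {alias}: user(login: {login}) {{
    contributionsCollection {{
      commitContributionsByRepository(maxRepositories: {repos}) {{
        repository {{
          nameWithOwner
          defaultBranchRef {{
            target {{
              ... on Commit {{
                history(first: {commits}, author: {{id: {node_id}}}) {{
                  nodes {{ additions deletions }}
                }}
              }}
            }}
          }}
        }}
      }}
    }}
  }}"""


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_query(users: List[Tuple[str, str]], repos: int = 5, commits: int = 10) -> str:
    """Build one pass-2 query for a batch of (login, node_id) pairs from pass 1."""
    blocks = [
        USER_BLOCK.format(
            alias=f"u{i}",
            login=json.dumps(login),      # assumption: JSON quoting is close enough
            node_id=json.dumps(node_id),  # to GraphQL string literals for ASCII input
            repos=repos,
            commits=commits,
        )
        for i, (login, node_id) in enumerate(users, start=1)
    ]
    return "query {\n  rateLimit { cost }" + "".join(blocks) + "\n}"
```

A caller would collect the `(login, id)` pairs from the pass-1 search results, then send `build_batch_query(batch)` for each `batch in chunked(pairs, 5)`.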

API cost — verified with real queries

All costs verified against GitHub's GraphQL API (May 2026):

| Config (per query, 5 users batched) | Samples per user | GraphQL cost | Status |
| 5 users × 5 repos × 5 commits | 25 | 1 | ✅ Stable |
| 5 users × 5 repos × 10 commits | 50 | 1 | ✅ Stable |
| 5 users × 8 repos × 8 commits | 64 | n/a | ❌ 502 |
| 5 users × 10 repos × 10 commits | 100 | n/a | ❌ 502 |

Sweet spot: 5 users × 5 repos × 10 commits = 50 samples per user, 1 GraphQL point per 5 users.

The 502s at the larger repo × commit configurations are GitHub server-side execution timeouts, not rate limit errors.

Total cost for 256 users per country:

| Queries needed | ceil(256 / 5) = 52 |
| GraphQL points | 52 per country |
| Time at 5,000 pts/hr | ~37 seconds |
| Total for ~200 countries | ~10,400 points ≈ 2 hours |
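
The arithmetic behind these figures can be reproduced with a few lines. This is only a sketch: the function name and parameters are illustrative, and "time" here means the share of the 5,000 points/hour rate-limit budget the queries consume, not measured wall-clock time.

```python
import math


def second_pass_budget(
    users_per_country: int = 256,
    batch_size: int = 5,
    points_per_query: int = 1,
    points_per_hour: int = 5000,
    countries: int = 200,
) -> dict:
    """Estimate the pass-2 rate-limit budget from the figures in this issue."""
    # The last batch is partial, so round up: 256 / 5 = 51.2 -> 52 queries.
    queries = math.ceil(users_per_country / batch_size)
    points = queries * points_per_query
    # Budget consumed per country, expressed as seconds of the hourly quota.
    seconds_per_country = points / points_per_hour * 3600
    total_points = points * countries
    total_hours = total_points / points_per_hour
    return {
        "queries": queries,
        "points": points,
        "seconds_per_country": seconds_per_country,
        "total_points": total_points,
        "total_hours": total_hours,
    }
```

With the defaults this gives 52 queries and 52 points per country (about 37.4 seconds of budget), and 10,400 points (about 2.08 hours of budget) across 200 countries.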

Proof of concept: batched second-pass query

query {
  rateLimit { cost }
  u1: user(login: "mitsuhiko") {
    contributionsCollection {
      commitContributionsByRepository(maxRepositories: 5) {
        repository {
          nameWithOwner
          defaultBranchRef {
            target {
              ... on Commit {
                history(first: 10, author: {id: "MDQ6VXNlcjczOTY="}) {
                  nodes { additions deletions }
                }
              }
            }
          }
        }
      }
    }
  }
  u2: user(login: "steipete") {
    contributionsCollection {
      commitContributionsByRepository(maxRepositories: 5) {
        repository {
          nameWithOwner
          defaultBranchRef {
            target {
              ... on Commit {
                history(first: 10, author: {id: "MDQ6VXNlcjU4NDkz"}) {
                  nodes { additions deletions }
                }
              }
            }
          }
        }
      }
    }
  }
  # ... u3, u4, u5 with their respective IDs
}

Verified: returns only each user's own commits. Cost = 1 GraphQL point for all 5 users.
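
A response to the query above could be flattened into per-user commit sizes along these lines. This is a hedged sketch: `commit_sizes_by_alias` is a hypothetical helper, it expects the `data` object of the GraphQL response, and the null handling (missing users, repositories without a default branch) is an assumption about what the API may return rather than something verified in this issue.

```python
from typing import Dict, List


def commit_sizes_by_alias(data: dict) -> Dict[str, List[int]]:
    """Map each user alias (u1, u2, ...) to additions + deletions per sampled commit."""
    sizes: Dict[str, List[int]] = {}
    for alias, user in data.items():
        # Skip the rateLimit field and users that resolved to null.
        if alias == "rateLimit" or user is None:
            continue
        per_commit: List[int] = []
        repos = user["contributionsCollection"]["commitContributionsByRepository"]
        for contribution in repos:
            ref = contribution["repository"].get("defaultBranchRef")
            if not ref:
                # Assumption: an empty repository has no default branch.
                continue
            history = (ref.get("target") or {}).get("history") or {}
            for node in history.get("nodes") or []:
                if node:
                    per_commit.append(node["additions"] + node["deletions"])
        sizes[alias] = per_commit
    return sizes
```

The resulting lists are the raw samples; filtering and averaging are a separate step.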

Change to existing search query (pass 1)

Simply add id to the User fragment in the existing search(type: USER) query:

... on User {
  id          # ← add this (free scalar field)
  login,
  avatarUrl,
  # ... rest unchanged
}

This has zero cost impact and provides the node IDs needed for pass 2.

Caveats

  • Noise filtering: Exclude commits with >2,000 LOC (additions + deletions) to remove auto-generated files, lockfiles, and bulk formatting commits. Simple threshold filter, no complex statistics needed.
  • Private repos: additions/deletions are only available for repos the token can access. Private contribution density would be excluded from the average.
  • Sample size: 50 commits per user (5 repos × 10 commits) is a sample from the user's top contributed repos, not exhaustive. Directionally accurate for a leaderboard metric.
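
The noise filter and the average described in these caveats amount to very little code. A minimal sketch, assuming the per-commit sizes (additions + deletions) have already been collected; `code_density` is an illustrative name, and returning `None` when no commits survive the filter is a design choice made here, not something specified in this issue.

```python
from statistics import mean
from typing import Iterable, Optional


def code_density(commit_sizes: Iterable[int], max_loc: int = 2000) -> Optional[float]:
    """Mean LOC touched per commit, excluding commits with more than max_loc lines."""
    # Drop oversized commits (generated files, lockfiles, bulk formatting).
    kept = [size for size in commit_sizes if size <= max_loc]
    # No usable sample: report "unknown" instead of a misleading 0.
    return mean(kept) if kept else None
```

For example, `code_density([10, 30, 5000])` averages only the first two commits, because the 5,000-line commit exceeds the threshold.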

Suggested display

A new column "Avg LOC" showing mean (additions + deletions) per commit (excluding commits >2,000 LOC):

| Rank | User | Commits | Avg LOC |
| 1 | alice | 5,000 | 142 |
| 2 | bob | 4,800 | 23 |

Purely informational — does not change the sort order.
