
Commit 2f7035f

JacekDabrowski1 authored and committed
Refine claims and clarify benchmark methodology for accuracy and transparency
Update API and documentation versions, correct algorithm counts, and clarify benchmark dataset origins and methodology.

Replit-Commit-Author: Agent
Replit-Commit-Session-Id: ec794acd-c4a5-47f6-b906-d70ac3c316ee
Replit-Commit-Checkpoint-Type: full_checkpoint
Replit-Commit-Event-Id: 5d1281c2-bed0-40e2-91e1-1f16ff952df7
Replit-Commit-Screenshot-Url: https://storage.googleapis.com/screenshot-production-us-central1/28ec11df-9ccf-40bc-9ff4-d0523e5b6a98/ec794acd-c4a5-47f6-b906-d70ac3c316ee/XemkZBb
Replit-Helium-Checkpoint-Created: true
1 parent 272a0cc commit 2f7035f

7 files changed: 43 additions & 38 deletions

File tree

website/static/benchmarks.js

Lines changed: 8 additions & 7 deletions
@@ -14,12 +14,13 @@ const COLORS = {
 };
 
 const ALGO_COLORS = {
-  'Cleora': '#6c63ff',
-  'ProNE': '#f59e0b',
-  'RandNE': '#ef4444',
-  'NetMF': '#3b82f6',
-  'DeepWalk': '#f472b6',
-  'Node2Vec': '#fb923c',
+  'Cleora': '#6c63ff',
+  'Cleora (whiten)': '#6c63ff',
+  'ProNE': '#f59e0b',
+  'RandNE': '#ef4444',
+  'NetMF': '#3b82f6',
+  'DeepWalk': '#f472b6',
+  'Node2Vec': '#fb923c',
 };
 
 const DATASETS = ['ego-Facebook', 'PPI-large', 'Flickr', 'ogbn-arxiv', 'Yelp'];
@@ -56,7 +57,7 @@ const MEMORY_DATA = {
 
 const SCATTER_DATA = {
   'ego-Facebook': {
-    'Cleora': { acc: 0.964, time: 0.740 },
+    'Cleora (whiten)': { acc: 0.964, time: 0.740 },
     'NetMF': { acc: 0.944, time: 17.920 },
     'Node2Vec': { acc: 0.918, time: 111.426 },
     'DeepWalk': { acc: 0.912, time: 32.352 },

website/templates/api.html

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ <h4>Tuning</h4>
 
 <div class="docs-content">
 <h1>API Reference</h1>
-<p>Complete API documentation for pycleora 3.0. All functions, parameters, and return values.</p>
+<p>Complete API documentation for pycleora 3.2. All functions, parameters, and return values.</p>
 
 <h2 id="sparse-matrix">pycleora.SparseMatrix</h2>
 <div class="api-method">

website/templates/base.html

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 <meta charset="UTF-8">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
 <title>{% block title %}pycleora{% endblock %} — Fast Graph Embeddings</title>
-<meta name="description" content="{% block meta_desc %}pycleora: The fastest CPU-only graph embedding library. Rust core, Python API. 5 graph generators, 7 alternative algorithms. Built-in MLP and Label Propagation classifiers — no PyTorch, no GPU.{% endblock %}">
+<meta name="description" content="{% block meta_desc %}pycleora: The fastest CPU-only graph embedding library. Rust core, Python API. 8 algorithms, 5 graph generators. Built-in MLP and Label Propagation classifiers — no PyTorch, no GPU.{% endblock %}">
 <meta property="og:title" content="pycleora — Fast Graph Embeddings">
 <meta property="og:description" content="The fastest CPU-only graph embedding library. 7 alternative algorithms, 5 graph generators, zero GPU required.">
 <meta property="og:type" content="website">

website/templates/benchmarks.html

Lines changed: 11 additions & 7 deletions
@@ -31,11 +31,15 @@ <h4>Analysis</h4>
 
 <div class="docs-content">
 <h1>Benchmark Results</h1>
-<p>7 algorithms compared across 7 datasets (3 SNAP + 4 standard benchmarks). Node classification accuracy using Nearest Centroid classifier on 80/20 train-test split.</p>
+<p>7 algorithms compared across 7 datasets (3 SNAP downloads + 4 scale-matched synthetic). Node classification accuracy using Nearest Centroid classifier (zero hyperparameters) on 80/20 train-test split. Absolute accuracy is deliberately low — we use the simplest possible classifier to isolate embedding quality, not model performance.</p>
+
+<div class="callout callout-info">
+<strong>Dataset note:</strong> ego-Facebook, roadNet-CA, and soc-LiveJournal1 are downloaded from SNAP. PPI-large, Flickr, ogbn-arxiv, and Yelp are <strong>scale-matched synthetic graphs</strong> (generated via SBM/Erdős–Rényi to reproduce node count, edge count, and community structure). They are not the original datasets. See <a href="#methodology">Methodology</a> for details.
+</div>
 
 <div class="bench-section" id="viz-accuracy">
 <h2 id="summary">Summary Table</h2>
-<p>Best accuracy per dataset. <span class="best-score">Green</span> = best on that dataset. "&mdash;" = excluded (too large or not applicable). Cleora results use <code>whiten=True</code> with auto-tuned iterations for optimal quality.</p>
+<p>Best accuracy per dataset. <span class="best-score">Green</span> = best on that dataset. "&mdash;" = excluded (too large or not applicable). * = uses <code>whiten=True</code> post-processing.</p>
 <div class="bench-toggle">
 <button class="bench-toggle-btn active" data-view="chart">Chart</button>
 <button class="bench-toggle-btn" data-view="table">Table</button>
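The "scale-matched synthetic" datasets described in the dataset note can be generated with a stochastic block model. The sketch below is illustrative only — the repository's actual generator is not shown in this commit, and the function name and parameters here are assumptions — but it demonstrates the idea: reproduce node count, edge count, and community structure without reusing any original dataset content.

```python
import numpy as np

def sbm_graph(block_sizes, p_in, p_out, seed=0):
    """Sample an undirected stochastic block model (SBM) graph.

    Nodes in the same block connect with probability p_in, nodes in
    different blocks with probability p_out. Choosing block sizes and
    probabilities lets you match a target node/edge scale and community
    structure, which is all a scale-matched benchmark needs.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    n = labels.size
    # Pairwise connection probability chosen by block membership
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    # One Bernoulli trial per unordered node pair (upper triangle only)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = upper | upper.T  # symmetrize: undirected, no self-loops
    return adj, labels

# Three communities of 50 nodes: dense inside, sparse across
adj, labels = sbm_graph([50, 50, 50], p_in=0.2, p_out=0.01)
edges = np.argwhere(np.triu(adj))  # edge list at the target scale
```

The within-block density (~0.2) dominates the cross-block density (~0.01), so embedding algorithms still face a realistic community-recovery task even though the graph content is synthetic.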
@@ -65,7 +69,7 @@ <h2 id="summary">Summary Table</h2>
 <td>ego-Facebook</td>
 <td>4,039</td>
 <td>88,234</td>
-<td class="best-score">0.964</td>
+<td class="best-score">0.964*</td>
 <td>0.021</td>
 <td>0.318</td>
 <td>0.944</td>
@@ -355,7 +359,7 @@ <h3>ego-Facebook</h3>
 <table class="bench-table">
 <thead><tr><th>Algorithm</th><th>Accuracy</th><th>Time</th></tr></thead>
 <tbody>
-<tr><td>Cleora (whiten)</td><td class="best-score">0.964</td><td>0.740s</td></tr>
+<tr><td>Cleora (whiten)*</td><td class="best-score">0.964</td><td>0.740s</td></tr>
 <tr><td>NetMF</td><td>0.944</td><td>17.920s</td></tr>
 <tr><td>Cleora-multiscale</td><td>0.942</td><td>0.593s</td></tr>
 <tr><td>Node2Vec</td><td>0.918</td><td>111.426s</td></tr>
@@ -414,7 +418,7 @@ <h2 id="when-to-use">When to Use What</h2>
 <tr>
 <td>Real-time / streaming</td>
 <td><strong>Cleora</strong></td>
-<td>Fastest for re-embedding. Constant memory. Incremental updates.</td>
+<td>Fastest for re-embedding. Lowest memory overhead. Incremental updates.</td>
 </tr>
 <tr>
 <td>Million-node graphs</td>
@@ -439,7 +443,7 @@ <h2 id="when-to-use">When to Use What</h2>
 <tr>
 <td>Memory constrained</td>
 <td><strong>Cleora</strong></td>
-<td>4 MB for 4k-node graph. 1.9 GB for 2M nodes. Best memory efficiency at every scale.</td>
+<td>16 MB for 4k nodes, 1.9 GB for 2M nodes. Lowest memory footprint at every scale tested.</td>
 </tr>
 <tr>
 <td>Maximum embedding speed</td>
@@ -457,7 +461,7 @@ <h2 id="methodology">Methodology</h2>
 <li><strong>Embedding dimension:</strong> 1024 for all algorithms</li>
 <li><strong>Metrics:</strong> Accuracy, Macro F1, wall-clock time, peak memory delta</li>
 <li><strong>Hardware:</strong> Single CPU core, no GPU</li>
-<li><strong>Datasets:</strong> Benchmarks use synthetically generated graphs (Erdős–Rényi, Barabási–Albert, SBM) with node and edge counts matching named real-world datasets. PPI-large, Flickr, ogbn-arxiv, and Yelp are simulated at matching scale; ego-Facebook, roadNet-CA, and soc-LiveJournal1 are downloaded from SNAP. Synthetic graphs are not the original datasets — they reproduce scale and community structure, not content.</li>
+<li><strong>Datasets:</strong> ego-Facebook, roadNet-CA, and soc-LiveJournal1 are downloaded from SNAP. PPI-large, Flickr, ogbn-arxiv, and Yelp are <strong>scale-matched synthetic graphs</strong> (generated via Erdős–Rényi, Barabási–Albert, SBM to reproduce node count, edge count, and community structure). Synthetic graphs are not the original datasets — they reproduce scale and structure, not content.</li>
 <li><strong>Walk-based params:</strong> num_walks=10, walk_length=20 (Facebook); excluded for larger graphs</li>
 <li><strong>Excluded algorithms:</strong> GraRep and HOPE (require dense n&times;n matrices, infeasible for 4k+ nodes)</li>
 </ul>
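The evaluation protocol the methodology list describes — Nearest Centroid with zero hyperparameters on an 80/20 stratified split — can be sketched in a few lines of scikit-learn. This is a minimal reconstruction from the description above, not the benchmark harness itself; the function name is illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid

def evaluate_embedding(emb: np.ndarray, labels: np.ndarray, seed: int = 0):
    """Score node embeddings as described in the Methodology section:
    Nearest Centroid classifier, 80/20 train-test split, accuracy + macro F1."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        emb, labels, test_size=0.2, random_state=seed, stratify=labels)
    clf = NearestCentroid()  # zero hyperparameters: isolates embedding quality
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    return accuracy_score(y_te, pred), f1_score(y_te, pred, average="macro")
```

Because Nearest Centroid has nothing to tune, differences in these scores reflect the embeddings, not classifier capacity — which is the point the benchmark page makes about deliberately low absolute accuracy.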

website/templates/docs.html

Lines changed: 2 additions & 2 deletions
@@ -38,7 +38,7 @@ <h4>Tools</h4>
 
 <div class="docs-content">
 <h1>Documentation</h1>
-<p>Complete guide to pycleora 3.0 — the fastest CPU-only graph embedding library.</p>
+<p>Complete guide to pycleora 3.2 — the fastest CPU-only graph embedding library.</p>
 
 <h2 id="installation">Installation</h2>
 <h3>From PyPI (recommended)</h3>
@@ -127,7 +127,7 @@ <h2 id="basic-embedding">Cleora Embedding</h2>
 
 <h2 id="algorithms">Alternative Algorithms</h2>
 <div class="callout callout-info">
-<strong>Note on negative sampling:</strong> DeepWalk, Node2Vec, NetMF, and GraRep all require negative sampling to approximate random walks. This introduces noise, stochastic variation, and reproducibility issues. Cleora eliminates negative sampling entirely — it computes all walks exactly via a single sparse matrix multiplication.
+<strong>Cleora vs walk-based methods:</strong> DeepWalk and Node2Vec sample random walks and train a skip-gram model (which uses negative sampling to approximate the softmax). NetMF factorizes the same co-occurrence matrix directly but still requires a negative sampling parameter. Cleora eliminates both walk sampling and skip-gram training entirely — it computes the full walk distribution via matrix powers.
 </div>
 <pre><code><span class="code-keyword">from</span> pycleora.algorithms <span class="code-keyword">import</span> (
 embed_deepwalk, embed_node2vec,
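The claim in the revised callout — that matrix powers yield the full walk distribution with no sampling — can be made concrete with a small numpy/scipy sketch. This is not pycleora's implementation (which runs in Rust); it just demonstrates the underlying identity: if M = D⁻¹A is the row-stochastic transition matrix, then applying M k times to a start distribution gives the exact distribution over endpoints of all length-k random walks.

```python
import numpy as np
from scipy import sparse

def walk_distribution(adj: sparse.csr_matrix, start: int, k: int) -> np.ndarray:
    """Exact endpoint distribution of all length-k random walks from `start`.

    Equivalent to averaging infinitely many sampled walks, but computed
    with one sparse matrix-vector product per step: deterministic, no
    sampling noise, no negative sampling.
    """
    deg = np.asarray(adj.sum(axis=1)).ravel()
    m = sparse.diags(1.0 / np.maximum(deg, 1)) @ adj  # row-stochastic M = D^-1 A
    dist = np.zeros(adj.shape[0])
    dist[start] = 1.0
    for _ in range(k):
        # One exact walk step: d_{t+1}[j] = sum_i d_t[i] * M[i, j]
        dist = m.T @ dist
    return dist

# Path graph 0 - 1 - 2: after two steps from node 0,
# the mass splits evenly between nodes 0 and 2.
path = sparse.csr_matrix(np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float))
```

DeepWalk-style methods estimate this same distribution by drawing finitely many walks, which is where their run-to-run variance comes from; the matrix-power form computes it exactly.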

website/templates/functions.html

Lines changed: 4 additions & 4 deletions
@@ -17,7 +17,7 @@ <h2>Everything Cleora Can Do</h2>
 </div>
 <div>
 <h3>Embedding Engine</h3>
-<p>9 algorithms unified under one API — spectral, walk-based, and matrix factorization methods</p>
+<p>8 algorithms unified under one API — spectral, walk-based, and matrix factorization methods</p>
 </div>
 </div>
 <div class="features-tag-grid">
@@ -48,7 +48,7 @@ <h3>Rust Performance Core</h3>
 <div class="features-highlights-row">
 <div class="features-highlight-chip"><strong>240x</strong> faster than GraphSAGE</div>
 <div class="features-highlight-chip"><strong>5 MB</strong> total footprint</div>
-<div class="features-highlight-chip"><strong>0</strong> external dependencies</div>
+<div class="features-highlight-chip"><strong>0</strong> heavy dependencies</div>
 </div>
 </div>
 
@@ -396,12 +396,12 @@ <h3>Deterministic &amp; Reproducible</h3>
 </div>
 <div>
 <h3>Tiny Footprint</h3>
-<p>Just 5 MB installed with zero dependencies. Compare to 500 MB+ for PyTorch Geometric or DGL. Installs in seconds, not minutes.</p>
+<p>Just 5 MB installed — only numpy and scipy required. Compare to 500 MB+ for PyTorch Geometric or DGL. Installs in seconds, not minutes.</p>
 </div>
 </div>
 <div class="features-highlights-row">
 <div class="features-highlight-chip"><strong>5 MB</strong> vs 500 MB+</div>
-<div class="features-highlight-chip"><strong>0</strong> dependencies</div>
+<div class="features-highlight-chip"><strong>0</strong> heavy dependencies</div>
 <div class="features-highlight-chip">Installs in <strong>seconds</strong></div>
 </div>
 </div>

website/templates/index.html

Lines changed: 16 additions & 16 deletions
@@ -8,8 +8,8 @@
 <div class="hero-badge animate-in">v3.1 Released</div>
 <h1 class="animate-in delay-1">Graph Embeddings.<br>Blazing Fast.</h1>
 <p class="subtitle animate-in delay-2">
-The only graph embedding library that performs <strong>all possible random walks in a single matrix multiplication</strong>.
-No negative sampling. No GPU. No noise. Just fast, deterministic, production-grade embeddings.
+The only graph embedding library that captures <strong>the equivalent of all random walks at each depth in one matrix power</strong>.
+No random walk sampling. No skip-gram training. No GPU. Just fast, deterministic, production-grade embeddings.
 </p>
 <div class="hero-buttons animate-in delay-3">
 <a href="/docs" class="btn btn-primary">Get Started</a>
@@ -62,8 +62,8 @@ <h3>Sparse Markov Matrix</h3>
 <div class="how-step scroll-reveal">
 <div class="step-number">02</div>
 <div class="step-content">
-<h3>Single Matrix Multiplication = All Random Walks</h3>
-<p>One sparse matrix multiplication captures <em>every possible random walk</em> of a given length. No sampling, no noise, no stochastic approximation. This is what makes Cleora deterministic and orders of magnitude faster.</p>
+<h3>Matrix Powers = All Walk Distributions</h3>
+<p>Each iteration multiplies the embedding matrix by the sparse transition matrix — M<sup>k</sup> captures <em>the full distribution of all walks of length k</em>. No sampling, no noise, no stochastic approximation. This is what makes Cleora deterministic and orders of magnitude faster.</p>
 </div>
 </div>
 <div class="how-connector scroll-reveal"></div>
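The step described in the hunk above — multiply the embedding matrix by the sparse transition matrix, then L2-normalize — can be sketched in a few lines. This is an illustrative numpy/scipy reconstruction of the iteration, not the library's Rust core; the initialization and normalization details here are simplifying assumptions.

```python
import numpy as np
from scipy import sparse

def cleora_iterate(adj: sparse.csr_matrix, dim: int = 8, iters: int = 3,
                   seed: int = 0) -> np.ndarray:
    """Cleora-style embedding sketch: repeated sparse multiply + L2 normalize.

    Each multiply by the row-stochastic Markov matrix averages every node's
    embedding with its neighbors' (one exact walk step for all nodes at
    once); normalization keeps the vectors on the unit sphere.
    """
    n = adj.shape[0]
    deg = np.asarray(adj.sum(axis=1)).ravel()
    m = sparse.diags(1.0 / np.maximum(deg, 1)) @ adj  # Markov matrix D^-1 A
    emb = np.random.default_rng(seed).standard_normal((n, dim))
    for _ in range(iters):
        emb = m @ emb  # weighted neighbor averaging via one sparse product
        emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12
    return emb
```

With a fixed seed the result is bit-for-bit reproducible, which is the determinism property the surrounding copy emphasizes.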
@@ -87,8 +87,8 @@ <h2>What Makes Cleora Different</h2>
 <div class="advantage-icon">
 <svg width="28" height="28" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><circle cx="12" cy="12" r="10"/><path d="M8 12l3 3 5-5"/></svg>
 </div>
-<h3>No Negative Sampling</h3>
-<p>Unlike DeepWalk, Node2Vec, and LINE, Cleora doesn't approximate random walks with negative sampling. It computes <strong>all walks exactly</strong> via matrix multiplication. Less noise, higher accuracy, perfect reproducibility.</p>
+<h3>No Sampling, No Training</h3>
+<p>Unlike DeepWalk, Node2Vec, and LINE, Cleora eliminates both random walk sampling AND skip-gram training entirely. It captures <strong>all walk distributions exactly</strong> via matrix powers. No noise, perfect reproducibility.</p>
 </div>
 <div class="advantage-card scroll-reveal" data-delay="100">
 <div class="advantage-icon">
@@ -102,7 +102,7 @@ <h3>240x Faster Than GraphSAGE</h3>
 <svg width="28" height="28" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><rect x="3" y="3" width="18" height="18" rx="2"/><path d="M3 9h18M9 3v18"/></svg>
 </div>
 <h3>Deterministic Embeddings</h3>
-<p>Same input always produces the same output. No random seeds, no stochastic variation, no "run it 5 times and average" workflows. Critical for reproducible research and production ML pipelines.</p>
+<p>Same input always produces the same output. Deterministic by default — no stochastic variation, no "run it 5 times and average" workflows. Critical for reproducible research and production ML pipelines.</p>
 </div>
 <div class="advantage-card scroll-reveal" data-delay="300">
 <div class="advantage-icon">
@@ -115,8 +115,8 @@ <h3>Heterogeneous Hypergraphs</h3>
 <div class="advantage-icon">
 <svg width="28" height="28" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M21 16V8a2 2 0 00-1-1.73l-7-4a2 2 0 00-2 0l-7 4A2 2 0 003 8v8a2 2 0 001 1.73l7 4a2 2 0 002 0l7-4A2 2 0 0021 16z"/></svg>
 </div>
-<h3>5 MB, Zero Dependencies</h3>
-<p>The entire library is ~5 MB. Compare: PyTorch Geometric is 500 MB+, DGL is 400 MB+. Cleora ships as a single compiled Rust extension. No CUDA, no cuDNN, no GPU driver headaches.</p>
+<h3>5 MB, No Heavy Dependencies</h3>
+<p>The entire library is ~5 MB with only numpy and scipy. Compare: PyTorch Geometric is 500 MB+, DGL is 400 MB+. Cleora ships as a single compiled Rust extension. No CUDA, no cuDNN, no GPU driver headaches.</p>
 </div>
 <div class="advantage-card scroll-reveal" data-delay="500">
 <div class="advantage-icon">
@@ -163,7 +163,7 @@ <h4>Customer-Restaurant Graph</h4>
 </div>
 <div>
 <h4>Cleora Embeddings <span class="flow-time">&lt; 5 minutes</span></h4>
-<p>197x faster than DeepWalk. No sampling of positive/negative examples. Purely structure-based — iterative weighted averaging of neighbor embeddings + L2 normalization.</p>
+<p>240x faster than GraphSAGE, 197x faster than DeepWalk (as measured by Zomato). No walk sampling, no skip-gram training. Purely structure-based — iterative weighted averaging of neighbor embeddings + L2 normalization.</p>
 </div>
 </div>
 <div class="flow-arrow scroll-reveal">&darr;</div>
@@ -235,10 +235,10 @@ <h2>Trusted in Production Worldwide</h2>
 </div>
 <div class="testimonial-card scroll-reveal" data-delay="200">
 <div class="testimonial-quote">
-"Cleora-powered solutions achieved top placements in KDD Cup 2021, WSDM WebTour 2021, and SIGIR eCom 2020 — beating deep learning approaches on travel, e-commerce, and web recommendation benchmarks."
+Cleora-powered solutions achieved top placements in KDD Cup 2021, WSDM WebTour 2021, and SIGIR eCom 2020 — beating deep learning approaches on travel, e-commerce, and web recommendation benchmarks.
 </div>
 <div class="testimonial-source">
-<div class="testimonial-company">ML Competitions</div>
+<div class="testimonial-company">Competition Results</div>
 <div class="testimonial-role">KDD Cup, WSDM, SIGIR</div>
 </div>
 </div>
@@ -374,11 +374,11 @@ <h3>Sparse Markov Matrix</h3>
 </div>
 <div class="pipeline-text">
 <div class="pipeline-step-num">04</div>
-<h3>Single Matrix Multiplication = All Walks</h3>
-<p>One sparse matrix multiplication captures <em>every possible random walk</em> of a given length. No sampling, no noise — this is the mathematical breakthrough that makes Cleora deterministic and fast.</p>
+<h3>Matrix Power = All Walk Distributions</h3>
+<p>Each iteration applies one sparse matrix power — M<sup>k</sup> captures <em>the full distribution of all walks of length k</em>. No sampling, no noise — this is what makes Cleora deterministic and fast.</p>
 <div class="pipeline-highlight-badge">
 <svg width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="var(--green)" stroke-width="2"><path d="M13 2L3 14h9l-1 8 10-12h-9l1-8z"/></svg>
-All walks captured in 1 multiplication
+Complete walk distributions, zero sampling
 </div>
 </div>
 </div>
@@ -1300,7 +1300,7 @@ <h2>From Edges to Embeddings in 5 Lines</h2>
 <section class="section cta-section">
 <div class="cta-card scroll-reveal">
 <h2>Ready to Embed Your Graph?</h2>
-<p>Join Zomato, Dailymotion, and hundreds of ML teams using Cleora in production. Install in seconds, embed in minutes.</p>
+<p>Join Zomato, Dailymotion, Synerise, and ML teams worldwide using Cleora in production. Install in seconds, embed in minutes.</p>
 <div class="cta-buttons">
 <a href="/docs" class="btn btn-primary btn-lg">Read the Docs</a>
 <a href="https://github.com/BaseModelAI/cleora" class="btn btn-secondary btn-lg" target="_blank">Star on GitHub</a>
