Update project documentation and code examples for clarity
Update README.md to reflect changes in embedding algorithms and dataset count, and refine code examples for better user understanding.
similar = find_most_similar(graph, embeddings, "alice", top_k=5)
for r in similar:
    print(f"{r['entity_id']}: {r['similarity']:.4f}")
```
### Step-by-Step Example

The high-level `embed()` function wraps the Markov propagation loop for convenience. Here's the full manual version, which gives you complete control over the process:
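The manual version itself is not visible in this excerpt, so here is a minimal pure-numpy sketch of the idea (illustrative only; the function and helper names below are not pycleora's API):

```python
import hashlib
import numpy as np

def markov_embed(edges, dim=8, iters=3):
    """Sketch of the manual loop: build a Markov transition matrix,
    initialize embeddings deterministically, then propagate."""
    nodes = sorted({n for edge in edges for n in edge})
    idx = {n: i for i, n in enumerate(nodes)}
    # Symmetric adjacency matrix of the (undirected) graph
    A = np.zeros((len(nodes), len(nodes)))
    for u, v in edges:
        A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0
    # Row-normalize: each row becomes a probability distribution
    M = A / A.sum(axis=1, keepdims=True)
    # Deterministic init: seed each row's RNG from a stable hash of the name
    emb = np.vstack([
        np.random.default_rng(
            int.from_bytes(hashlib.sha256(n.encode()).digest()[:8], "big")
        ).uniform(-1.0, 1.0, dim)
        for n in nodes
    ])
    for _ in range(iters):
        emb = M @ emb                                      # propagate
        emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize rows
    return {n: emb[idx[n]] for n in nodes}
```

Each propagation step averages a node's vector with its neighbors' vectors, so more iterations blend in structure from farther away in the graph.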
```
pycleora similar --input graph.tsv --entity alice --top-k 10
pycleora benchmark --dataset karate_club
```

---
Same input always produces the same output. No random seeds, no stochastic variation.
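The determinism claim can be illustrated with the standard trick of seeding from a content hash instead of process-dependent randomness (a sketch, not pycleora's actual initializer):

```python
import hashlib
import numpy as np

def deterministic_init(name, dim=4):
    # Seed the RNG from a stable hash of the entity name, so the same
    # name yields the same vector in every run on every machine
    seed = int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")
    return np.random.default_rng(seed).uniform(-1.0, 1.0, dim)

assert np.array_equal(deterministic_init("alice"), deterministic_init("alice"))
```

Python's built-in `hash()` would not work here, since it is salted per process.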
### Heterogeneous Hypergraphs
Natively handles multi-type nodes and edges, bipartite graphs, and hypergraphs. TSV input with typed columns like `complex::reflexive::product`. No graph preprocessing needed.
### ~5 MB, Zero Dependencies
The entire library is ~5 MB. Compare: PyTorch Geometric is 500 MB+, DGL is 400 MB+. Cleora ships as a single compiled Rust extension. No CUDA, no cuDNN, no GPU driver headaches.
### Stable & Inductive
Embeddings are stable across runs and support inductive learning: new nodes can be embedded on the fly, without recomputing the existing graph.
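One way the inductive property can work (a sketch consistent with Markov propagation, not necessarily the library's exact scheme): a new node's vector is a single propagation step over its neighbors' existing vectors.

```python
import numpy as np

def embed_new_node(neighbor_vectors):
    # One Markov propagation step for the new node alone:
    # average the neighbors' vectors, then L2-normalize
    v = np.mean(neighbor_vectors, axis=0)
    return v / np.linalg.norm(v)

# Hypothetical usage: vectors of two already-embedded neighbors
new_vec = embed_new_node([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
# ≈ [0.7071, 0.7071], a unit vector between the two neighbors
```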
|**GraRep**| Matrix Factorization | Graph Representations with Global Structural Information |
|**GCN**| Mini-GNN | 2-layer Graph Convolutional Network classifier in pure numpy/scipy — no PyTorch needed |
All algorithms are unified under a single API. Switch between methods by changing one parameter:
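The snippet that followed this sentence is not part of this excerpt; here is a hedged sketch of what a one-parameter switch can look like (the `embed` signature, the `method` argument, and both mini-algorithms are illustrative, not pycleora's real API):

```python
import numpy as np

def embed(edges, method="cleora", dim=2, iters=3):
    # One entry point, many algorithms: only `method` changes
    nodes = sorted({n for edge in edges for n in edge})
    idx = {n: i for i, n in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for u, v in edges:
        A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0

    def propagate():
        # Cleora-style: iterated Markov transition steps
        M = A / A.sum(axis=1, keepdims=True)
        emb = np.eye(len(nodes))[:, :dim]
        for _ in range(iters):
            emb = M @ emb
        return emb

    def spectral():
        # Factorization-style: top eigenvectors of the adjacency matrix
        _, vecs = np.linalg.eigh(A)
        return vecs[:, -dim:]

    return {"cleora": propagate, "spectral": spectral}[method]()
```

Switching methods is then `embed(edges)` versus `embed(edges, method="spectral")`; nothing else about the call changes.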
Beyond the standard algorithms, Cleora supports several advanced embedding strategies:
- **Multiscale embeddings** — concatenates embeddings from different iteration depths (e.g. scales `[1, 2, 4, 8]`) to capture both local and global graph structure simultaneously
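The multiscale idea can be sketched as follows (illustrative code, not the library's implementation): propagate to each depth in `scales`, snapshot the embedding, and concatenate the snapshots.

```python
import numpy as np

def multiscale(M, emb, scales=(1, 2, 4, 8)):
    # Snapshot the embedding at each requested depth, then concatenate:
    # shallow snapshots encode local structure, deep ones global structure
    snapshots, depth = [], 0
    for target in sorted(scales):
        while depth < target:
            emb = M @ emb
            emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12
            depth += 1
        snapshots.append(emb)
    return np.hstack(snapshots)
```

With a base dimension `d` and four scales, the result has `4 * d` columns per node.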
See the [full API reference](https://cleora.ai/api) for details on every function and parameter.
---
Zomato's ML team needed graph embeddings to power "People Like You" restaurant recommendations.
## Benchmarks
Benchmarked against **7 competing algorithms** on **5 real-world datasets** (ego-Facebook, Cora, CiteSeer, PubMed, PPI) plus a 2M-node scale test. All datasets are genuine academic benchmarks from SNAP, Planetoid, and DGL. Cleora wins on accuracy on **every single dataset**.
Full interactive benchmark results at [cleora.ai/benchmarks](https://cleora.ai/benchmarks).
> **Only 3 of 8 algorithms survive at 19.7K nodes.** HOPE, NetMF, GraRep, DeepWalk, and Node2Vec all crash or time out. Cleora achieves perfect accuracy on PPI (50 classes).
- **Drug Discovery** — Molecule and protein interaction networks
- **Supply Chain** — Supplier and logistics graph analysis
See [cleora.ai/use-cases](https://cleora.ai/use-cases) for detailed walkthroughs with code examples.
---
## How It Works
A: Any entities that interact with each other, co-occur, or can be said to be present in similar contexts.
**Q: How should I construct the input?**
A: What works best is grouping entities that co-occur in a similar context and feeding them in as whitespace-separated lines; using the `complex::reflexive` modifier is a good idea. E.g. if you have product data, you can group the products by shopping baskets or by users. If you have URLs, you can group them by browser sessions, or by (user, time window) pairs. Check out the usage example above. Grouping products by customers is just one possibility.
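For product data grouped by shopping basket, each basket becomes one whitespace-separated line, embedded with the column type `complex::reflexive::product` (the file contents below are invented for illustration):

```
milk bread butter
bread jam
milk coffee bread
```

Under `complex::reflexive`, every pair of products co-occurring on a line is treated as related.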
**Q: Can I embed users and products simultaneously, to compare them with cosine similarity?**
A: Not using negative sampling is a great boon. By constructing the (sparse) Markov transition matrix…