Nodes to run Hunyuan Image 3 locally with BF16 and NF4 quantized options in ComfyUI
Stateless LLM runtime that dynamically routes, loads, executes, and unloads models per request with bounded VRAM caching and intelligent model selection.
ComfyUI custom node that controls the order of node execution, linearly routing any data type through unlimited I/O slots, with the option to free VRAM and RAM at any point in a workflow. Its device-agnostic memory-management utilities, managed by ComfyUI, safely unload all models while preserving all connected data and models through to the next node.
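For orientation, a stripped-down sketch of a ComfyUI passthrough node in this spirit: it forwards a value unchanged while unloading models and emptying caches through ComfyUI's comfy.model_management. The node name and wildcard input are illustrative assumptions, the real node exposes far richer I/O, and the code only runs inside a ComfyUI install:

```python
# Minimal ComfyUI custom-node sketch (illustrative, not this repo's node):
# pass any value through while asking ComfyUI to unload models and free caches.
import comfy.model_management as mm

class FreeMemoryPassthrough:
    @classmethod
    def INPUT_TYPES(cls):
        # "*" is the common community wildcard convention for any-type inputs.
        return {"required": {"anything": ("*",)}}

    RETURN_TYPES = ("*",)
    FUNCTION = "run"
    CATEGORY = "utils"

    def run(self, anything):
        mm.unload_all_models()   # device-agnostic model unload
        mm.soft_empty_cache()    # release cached VRAM/RAM back to the system
        return (anything,)       # preserve connected data for the next node

NODE_CLASS_MAPPINGS = {"FreeMemoryPassthrough": FreeMemoryPassthrough}
```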
KeSSie: HUGE-context semantic recall for large language models
One-click installer for ACE-Step 1.5
Ultra-Low Bit KV-Cache Compression optimization layer built on top of llama.cpp for LLM inference. Reduces VRAM overhead by ~75-80% using custom CUDA kernels.
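The project's kernels are CUDA, but the core idea (group-wise low-bit quantization of the KV cache) can be sketched in plain PyTorch. The group size, bit width, and function names below are illustrative assumptions, not the repo's API; real kernels also pack four 2-bit codes per byte, while this sketch stores one code per uint8 for clarity:

```python
import torch

def quantize_kv(kv: torch.Tensor, group: int = 64, bits: int = 2):
    """Quantize the KV cache in groups: int codes plus per-group scale/offset."""
    levels = 2 ** bits - 1                      # 3 quantization steps for 2-bit
    x = kv.reshape(-1, group)
    lo = x.min(dim=1, keepdim=True).values
    hi = x.max(dim=1, keepdim=True).values
    scale = (hi - lo).clamp_min(1e-8) / levels  # per-group step size
    codes = ((x - lo) / scale).round().clamp(0, levels).to(torch.uint8)
    return codes, scale, lo, kv.shape

def dequantize_kv(codes, scale, lo, shape):
    return (codes.float() * scale + lo).reshape(shape)

kv = torch.randn(2, 8, 128, 64)                 # (batch, heads, seq, head_dim)
codes, scale, lo, shape = quantize_kv(kv)
err = (dequantize_kv(codes, scale, lo, shape) - kv).abs().mean()
print(f"mean abs reconstruction error: {err:.4f}")
```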
Predictive VRAM Virtualization Engine
NMOS (Neural Memory OS) is a predictive partial execution engine enabling 70B-level reasoning on 4GB VRAM. It uses the “Zero-Lag” hypothesis, leveraging typing latency as a compute window to mask memory limits via async layer prefetching and speculative decoding.
INT8 Sparse Tensor Core GEMM for PyTorch — built for Windows
LEMA (Layer-wise Efficient Memory Abstraction): A hardware-aware framework for fine-tuning LLMs in VRAM-constrained environments using asynchronous binary pre-fetching and triple-tier memory orchestration.
PyTorch/Hugging Face batching utility that sorts variable-length text by difficulty, then dynamically increases batch size on easier samples using a pre-trained VRAM predictor to improve GPU utilization and throughput while reducing OOM risk with fallback handling.
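A minimal sketch of that idea, assuming token count as the difficulty proxy and a hypothetical linear predict_vram() standing in for the utility's trained predictor:

```python
from typing import Iterator

def predict_vram(batch_size: int, max_len: int) -> float:
    # Hypothetical linear model (made-up coefficients), in GB.
    return batch_size * (0.002 * max_len + 0.05)

def dynamic_batches(texts: list[str], budget_gb: float = 10.0) -> Iterator[list[str]]:
    order = sorted(texts, key=len)        # "difficulty" proxied by length
    i = 0
    while i < len(order):
        bs, max_len = 1, len(order[i])
        # Grow the batch while the predictor says we stay under budget.
        while i + bs < len(order):
            nxt = max(max_len, len(order[i + bs]))
            if predict_vram(bs + 1, nxt) > budget_gb:
                break
            bs, max_len = bs + 1, nxt
        yield order[i : i + bs]
        i += bs

for batch in dynamic_batches(["hi", "a" * 50, "b" * 400, "c" * 5000]):
    print(len(batch), max(len(t) for t in batch))
```

On a real OOM the utility also falls back rather than crashing, e.g. by halving the batch and retrying; that path is omitted here.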
A Proof of Concept for the LEMA (Layer-wise Efficient Memory Abstraction) framework. Enables stable fine-tuning of Llama-2-7B on consumer-grade hardware (16GB VRAM) through layer-wise weight streaming and triple-buffer memory virtualization.
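The triple-tier orchestration is more involved, but the core streaming loop can be sketched with two CUDA streams: copy layer i+1 host-to-GPU while layer i computes. The double buffer and names are simplifying assumptions, and a CUDA device is required:

```python
import torch
import torch.nn as nn

layers = [nn.Linear(4096, 4096) for _ in range(8)]       # weights live on CPU
for layer in layers:
    for p in layer.parameters():
        p.data = p.data.pin_memory()                      # pinned RAM enables async copies

copy_stream = torch.cuda.Stream()

def prefetch(layer):
    with torch.cuda.stream(copy_stream):                  # overlaps with compute
        return [p.data.to("cuda", non_blocking=True) for p in layer.parameters()]

x = torch.randn(16, 4096, device="cuda")
staged = prefetch(layers[0])
for i in range(len(layers)):
    torch.cuda.current_stream().wait_stream(copy_stream)  # weights are ready
    w, b = staged
    if i + 1 < len(layers):
        staged = prefetch(layers[i + 1])                  # stream in the next layer
    x = torch.nn.functional.linear(x, w, b)               # compute the current layer
print(x.shape)
# A production version also needs record_stream() calls so the caching
# allocator does not reuse cross-stream memory too early.
```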
Turn your PC into a private, autonomous AI lab without melting your GPU.
Know before you train — VRAM estimation for LLM fine-tuning.
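A back-of-the-envelope version of what such an estimator computes, using standard rules of thumb (half-precision weights and gradients, two fp32 Adam states plus an fp32 master copy per parameter) rather than this repo's exact model; activation memory varies widely with architecture and checkpointing, so it is a flat assumption here:

```python
def estimate_vram_gb(params_b: float, dtype_bytes: int = 2,
                     optimizer_states: int = 2, activation_gb: float = 4.0) -> float:
    p = params_b * 1e9
    weights   = p * dtype_bytes            # bf16/fp16 weights
    grads     = p * dtype_bytes            # same-precision gradients
    optimizer = p * 4 * optimizer_states   # Adam: fp32 momentum + variance
    master    = p * 4                      # fp32 master copy (mixed precision)
    return (weights + grads + optimizer + master) / 1e9 + activation_gb

# 7B full fine-tune: 14 + 14 + 56 + 28 = 112 GB of states, plus activations.
print(f"{estimate_vram_gb(7):.0f} GB")     # ~116 GB, i.e. not a single-GPU job
```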
Accelerate INT8 sparse inference in PyTorch on Windows with minimal setup. Achieve high performance using Sparse Tensor Cores without Linux dependencies.
Compress the cache. Keep the quality.
ComfyUI Custom Node for running Transformer LLMs with zero dependency conflicts. Provides device-agnostic VRAM/RAM cleanup options post-run & optional dynamic LLM prompt formatting with variable inputs of any type.
Tiered GPU memory architecture for consumer AI inference. VRAM as execution cache, system RAM as passive staging layer.
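That split can be sketched as an LRU cache of GPU tensors backed by pinned host memory; the capacity, eviction policy, and names below are illustrative assumptions, and a CUDA device is required:

```python
from collections import OrderedDict
import torch

class TieredCache:
    """VRAM as execution cache, pinned system RAM as passive staging layer."""
    def __init__(self, capacity_bytes: int):
        self.capacity, self.used = capacity_bytes, 0
        self.gpu = OrderedDict()                # key -> GPU tensor (LRU order)
        self.ram = {}                           # key -> pinned host tensor

    def put(self, key, t):
        self.ram[key] = t.pin_memory()          # stage in host RAM

    def get(self, key):
        if key in self.gpu:                     # VRAM hit
            self.gpu.move_to_end(key)
            return self.gpu[key]
        t = self.ram[key].to("cuda", non_blocking=True)
        size = t.numel() * t.element_size()
        while self.used + size > self.capacity and self.gpu:
            _, old = self.gpu.popitem(last=False)          # evict least recent
            self.used -= old.numel() * old.element_size()  # host copy remains
        self.gpu[key] = t
        self.used += size
        return t

cache = TieredCache(capacity_bytes=256 << 20)   # 256 MiB VRAM budget
cache.put("w0", torch.randn(1024, 1024))
print(cache.get("w0").device)                   # cuda:0
```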
Sticky-block topology lottery scheduler for transformer fine-tuning. Less VRAM, less wall-clock, bigger models.
Predictive large language model inference with memory prefetching and speculative decoding for faster reasoning on low-VRAM hardware
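The acceptance rule at the heart of speculative decoding is easy to show at the distribution level. The toy below uses made-up distributions rather than real draft/target models, but the accept-with-min(1, p/q), resample-from-the-residual rule is the standard one, and the emitted tokens follow the target distribution p exactly:

```python
import torch

torch.manual_seed(0)
p = torch.tensor([0.6, 0.3, 0.1])    # target model's next-token distribution
q = torch.tensor([0.3, 0.5, 0.2])    # cheap draft model's distribution

def speculative_step(p, q):
    tok = torch.multinomial(q, 1).item()              # draft proposes a token
    if torch.rand(1).item() < min(1.0, (p[tok] / q[tok]).item()):
        return tok                                    # accepted
    residual = torch.clamp(p - q, min=0.0)            # rejected: correct the bias
    return torch.multinomial(residual / residual.sum(), 1).item()

counts = torch.zeros(3)
for _ in range(20000):
    counts[speculative_step(p, q)] += 1
print(counts / counts.sum())                          # ~ [0.60, 0.30, 0.10]
```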