#in-browser-llm — 1sec.ai

Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

Gemma 4 E2B ran in-browser at 255 tokens/s on an M4 Max, using WebGPU kernels developed with Fable 5 before its shutdown. The optimized kernels and a demo have been released on Hugging Face for public use. The result suggests that high-throughput LLM inference is feasible in the browser, giving builders a client-side deployment option, and the demo shows the optimized kernels in action.
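
To put the headline figure in context, here is a minimal sketch of how a decode-throughput number like this is typically computed and what it implies per token. It is not the benchmark code from the post. The helper names `tokensPerSecond` and `msPerToken` are hypothetical, and the post does not say how its figure was measured.

```typescript
// Hypothetical helpers, not taken from the released kernels or demo.

// Decode throughput: generated tokens divided by elapsed wall-clock seconds.
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  return tokenCount / (elapsedMs / 1000);
}

// Average time budget per generated token at a given throughput.
function msPerToken(tokensPerSec: number): number {
  if (tokensPerSec <= 0) {
    throw new RangeError("tokensPerSec must be positive");
  }
  return 1000 / tokensPerSec;
}

// Example: 510 tokens generated in 2000 ms is 255 tokens/s,
// which leaves a little under 4 ms per token on average.
const rate = tokensPerSecond(510, 2000);
const budgetMs = msPerToken(rate);
console.log(rate, budgetMs.toFixed(2));
```

At 255 tokens/s the average budget is just under 4 ms per token, and that budget has to cover every kernel dispatch in the forward pass for that token.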

Key takeaways

Gemma 4 E2B reaches 255 tokens/s in-browser on an M4 Max with WebGPU kernels.
Kernels and demo released on Hugging Face for public use.
Shows that high-performance LLM inference is feasible on web platforms.

r/LocalLLaMA #webgpu #in-browser-llm #local-llm #webml