Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5
Gemma 4 E2B was run in-browser at 255 tokens/s on an M4 Max using WebGPU kernels developed with Fable 5 before its shutdown. The optimized kernels, along with a demo that shows them in action, have been released for public use on Hugging Face. The result shows that high-throughput LLM inference is feasible in the browser, giving builders a client-side deployment option.
Key takeaways
- 255 tokens/s on M4 Max with WebGPU kernels.
- Kernels and demo released on Hugging Face for public use.
- Enables high-performance LLM inference on web platforms.
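To put the headline number in context, here is a minimal sketch of how a tokens/s figure can be computed and what it implies per token. It assumes the 255 tok/s refers to generation (decode) throughput, which the source does not specify. All function names are hypothetical and do not come from the released kernels or demo.

```typescript
// Hypothetical helpers for illustration only; none of these names
// come from the released kernels or demo.

/** Throughput in tokens per second for a token count and elapsed time. */
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  return (tokenCount * 1000) / elapsedMs;
}

/** Average time budget per token, in milliseconds, at a given throughput. */
function msPerToken(tokPerSec: number): number {
  if (tokPerSec <= 0) {
    throw new RangeError("tokPerSec must be positive");
  }
  return 1000 / tokPerSec;
}

/**
 * Times any async token stream. The clock starts at the first token,
 * so prompt processing (prefill) is excluded from the measured rate.
 * `now` defaults to Date.now; in a browser, passing
 * () => performance.now() gives finer resolution.
 */
async function measureDecode(
  stream: AsyncIterable<unknown>,
  now: () => number = Date.now,
): Promise<number> {
  let count = 0;
  let start = 0;
  let last = 0;
  for await (const _token of stream) {
    last = now();
    if (count === 0) start = last;
    count += 1;
  }
  // N tokens span N - 1 inter-token intervals.
  if (count < 2) {
    throw new RangeError("need at least two tokens to measure a rate");
  }
  return tokensPerSecond(count - 1, last - start);
}

// At 255 tok/s, each token has a budget of roughly 3.92 ms.
console.log(msPerToken(255).toFixed(2)); // prints "3.92"
```

Starting the clock at the first token is a design choice: it separates steady-state generation speed from prompt processing time, which typically scales with prompt length and would otherwise skew the rate.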