1sec.ai

Tag

#in-browser-llm

Every item tagged in-browser-llm, newest first.

1 item

Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

Gemma 4 E2B runs in the browser at 255 tokens/s on an M4 Max, using WebGPU kernels developed with Fable 5 before Fable 5 was shut down. The optimized kernels and a demo have been released for public use on Hugging Face. The result suggests that high-performance LLM inference is feasible on web platforms, giving builders a new deployment option.

Key takeaways
  • 255 tokens/s on M4 Max with WebGPU kernels.
  • Kernels and demo released on Hugging Face for public use.
  • Shows that high-performance LLM inference is feasible on web platforms.
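
To put the 255 tokens/s figure in context, the sketch below converts between decode throughput and the average time budget per token. It is an illustrative calculation only: the function names are hypothetical and are not taken from the released kernels or demo.

```typescript
// Hypothetical helpers for reading a tokens/s figure; not part of the released demo.

/** Decode throughput in tokens per second, from a token count and elapsed wall time. */
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) throw new RangeError("elapsedMs must be positive");
  return tokenCount / (elapsedMs / 1000);
}

/** Average time available per generated token, in milliseconds. */
function msPerToken(tokPerSec: number): number {
  if (tokPerSec <= 0) throw new RangeError("tokPerSec must be positive");
  return 1000 / tokPerSec;
}

// At 255 tokens/s, each token has an average budget of roughly 3.92 ms.
console.log(msPerToken(255).toFixed(2)); // 3.92
```

In a browser, the elapsed time would typically come from two `performance.now()` readings taken around the generation loop.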