1sec.ai

Tag

#webgpu

Every item tagged webgpu, newest first.

2 items

I made a FAQ Chatbot that runs completely in browser; Local AI in Two Clicks

A developer built an FAQ chatbot that runs entirely in the browser using WebLLM and a simple retrieval-augmented generation (RAG) setup. The chatbot is deployed on a static website, which makes its knowledge base easy to update. Because Chromium-based browsers support WebGPU, the chatbot can run on modest hardware, including some phones. The project shows how capable small models and fully client-side AI interfaces have become.

Key takeaways
  • Runs entirely in the browser with WebLLM and RAG.
  • Deployed on a static website for easy knowledge base updates.
  • Works on modest hardware, including some phones, thanks to WebGPU support.
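The summary does not describe how the "simple RAG" step works. The following is a minimal sketch only. It assumes the FAQ is a small in-memory list, which could be loaded from a JSON file served alongside the static site (also an assumption). It uses plain keyword overlap as the retriever, a stand-in for whatever the project actually does; a real setup might use embeddings instead. The output is an OpenAI-style `messages` array. WebLLM advertises an OpenAI-compatible chat API, but the model call itself is left out here. `FaqEntry`, `retrieve`, and `buildMessages` are hypothetical names, not taken from the project.

```typescript
// Hypothetical sketch: keyword-overlap retrieval over a small FAQ,
// then prompt assembly in OpenAI-style chat format.

interface FaqEntry {
  question: string;
  answer: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Lowercase, split on non-alphanumerics, and drop words of two letters or fewer.
function tokenize(text: string): Set<string> {
  return new Set(
    text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 2),
  );
}

// Score each entry by how many distinct query terms it contains,
// keep entries with a non-zero score, and return the top k.
function retrieve(query: string, faq: readonly FaqEntry[], k = 2): FaqEntry[] {
  const queryTerms = tokenize(query);
  return faq
    .map((entry) => {
      const entryTerms = tokenize(`${entry.question} ${entry.answer}`);
      let score = 0;
      queryTerms.forEach((t) => {
        if (entryTerms.has(t)) score += 1;
      });
      return { entry, score };
    })
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.entry);
}

// Put the retrieved entries into the system prompt so the model answers from them.
function buildMessages(query: string, faq: readonly FaqEntry[]): ChatMessage[] {
  const context = retrieve(query, faq)
    .map((e) => `Q: ${e.question}\nA: ${e.answer}`)
    .join("\n\n");
  return [
    {
      role: "system",
      content:
        "Answer using only the FAQ entries below. If they do not cover the question, say so.\n\n" +
        (context || "(no matching entries)"),
    },
    { role: "user", content: query },
  ];
}
```

This design keeps the knowledge base as plain data. Editing the FAQ file changes what the model sees, and the page code and model weights stay the same.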

Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

Gemma 4 E2B was run in the browser at 255 tokens/s on an Apple M4 Max, using WebGPU kernels written with Fable 5 before its shutdown. The optimized kernels and the demo have been released publicly on Hugging Face. The result shows that fast LLM inference is feasible in the browser, which gives builders a new deployment option, and the demo is a working example of optimized WebGPU kernels.

Key takeaways
  • 255 tokens/s on an Apple M4 Max with optimized WebGPU kernels.
  • Kernels and demo released on Hugging Face for public use.
  • Enables high-performance LLM inference on web platforms.
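As a point of reference, 255 tokens/s works out to roughly one token every 3.9 ms (1000 / 255 ≈ 3.92). The summary does not say how the figure was measured. One common approach to decode throughput is to timestamp each streamed token and divide the number of gaps between tokens by the elapsed time, which leaves out the time to first token. The sketch below assumes the runtime exposes generated tokens as an async iterable. `decodeTokensPerSecond` and `recordArrivals` are hypothetical helpers and are not part of the released demo.

```typescript
// Hypothetical helpers for measuring decode throughput in the browser.

// Tokens per second from per-token arrival times in milliseconds.
// Uses the gaps between tokens, so prefill / time-to-first-token is excluded.
function decodeTokensPerSecond(arrivalsMs: readonly number[]): number {
  if (arrivalsMs.length < 2) return NaN; // need at least one gap
  const spanMs = arrivalsMs[arrivalsMs.length - 1] - arrivalsMs[0];
  return spanMs > 0 ? ((arrivalsMs.length - 1) * 1000) / spanMs : NaN;
}

// Record when each token arrives. `stream` is an assumption: any async
// iterable that yields one generated token at a time.
async function recordArrivals(
  stream: AsyncIterable<string>,
  now: () => number = () => performance.now(),
): Promise<number[]> {
  const arrivals: number[] = [];
  for await (const _token of stream) {
    arrivals.push(now());
  }
  return arrivals;
}

// Usage (inside an async function):
// const tps = decodeTokensPerSecond(await recordArrivals(tokenStream));
```

Because the clock starts at the first token rather than at the request, a number measured this way reflects steady-state decoding only. It is not directly comparable with end-to-end response latency.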