Tag

#deployment

Every item tagged deployment, newest first.

8 items

My GLM-5.2-FP8 HGX-H200 SGLang docker deploy config

A user shared a Docker deployment configuration for running GLM-5.2-FP8 with SGLang on an HGX H200 system. The setup runs the model across 8 GPUs, most likely via tensor parallelism, and allocates only a fraction of memory to the model. This configuration may help others deploy GLM-5.2 locally on similar hardware.

Key takeaways

Uses lmsysorg/sglang:latest Docker image.
Configured for an HGX H200 system with 8 GPUs.
Allocates a fraction of system memory for the model.
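
A minimal sketch of what such a launch command could look like, not the poster's actual config. It assumes the 8-way split maps to SGLang's `--tp` flag and the memory fraction to `--mem-fraction-static` (which in SGLang applies to GPU memory, not system RAM). The model ID `ORG/GLM-5.2-FP8`, the `0.85` fraction, the port, and the helper name `sglang_docker_command` are all placeholders chosen for illustration.

```python
import shlex
from pathlib import Path


def sglang_docker_command(
    model_path: str,
    tensor_parallel: int = 8,           # assumed: one shard per GPU
    mem_fraction_static: float = 0.85,  # assumed value, tune per model
    port: int = 30000,
    image: str = "lmsysorg/sglang:latest",
) -> list[str]:
    """Build a `docker run` argument list that starts an SGLang server."""
    if not 0.0 < mem_fraction_static < 1.0:
        raise ValueError("mem_fraction_static must be between 0 and 1")
    # Mount the host Hugging Face cache so weights are not re-downloaded.
    hf_cache = Path.home() / ".cache" / "huggingface"
    return [
        "docker", "run", "--gpus", "all",
        "--shm-size", "32g",
        "--ipc=host",
        "-p", f"{port}:{port}",
        "-v", f"{hf_cache}:/root/.cache/huggingface",
        image,
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp", str(tensor_parallel),
        "--mem-fraction-static", str(mem_fraction_static),
        "--host", "0.0.0.0",
        "--port", str(port),
    ]


# Placeholder model ID; substitute the real repository or a local path.
print(shlex.join(sglang_docker_command("ORG/GLM-5.2-FP8")))
```

Building the command as a list keeps each flag and value separate, so it can be passed to `subprocess.run` directly or printed for review before running it by hand.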

r/LocalLLaMA · #docker #local-llm #deployment

other · Jul 10

Announcing New Hugging Face and KerasHub integration

Hugging Face and KerasHub have integrated so that models can be shared and deployed across both platforms. The collaboration aims to streamline model development and deployment for builders. You can now access and deploy models from KerasHub directly within Hugging Face.

Key takeaways

Hugging Face and KerasHub are now integrated for model sharing.
The integration enables easy model deployment across platforms.
Model developers can now access and deploy models from KerasHub within Hugging Face.

Hugging Face Blog · #model-sharing #deployment #integration

tools · Dec 5

Optimum-NVIDIA Unlocking blazingly fast LLM inference in just 1 line of code

Optimum-NVIDIA enables one-line deployment of optimized LLM inference on NVIDIA hardware. This integration streamlines deployment for builders targeting high-performance, low-latency applications. Optimum-NVIDIA abstracts away low-level optimization details, allowing developers to focus on model development. You can now deploy optimized models with minimal code changes.

Key takeaways

One-line deployment of optimized LLM inference on NVIDIA hardware.
Simplifies deployment for high-performance applications.
Abstracts low-level optimization details for developers.
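
The one-line change can be sketched as follows. The Optimum-NVIDIA announcement describes swapping the import of `AutoModelForCausalLM` from `transformers` to `optimum.nvidia`. The helper names `causal_lm_module` and `load_causal_lm` are hypothetical, and calling the loader assumes the chosen package is installed and, for the NVIDIA path, that a supported GPU is available.

```python
import importlib


def causal_lm_module(use_optimum_nvidia: bool) -> str:
    """Return the module that provides AutoModelForCausalLM."""
    # The "one line" is this import source: optimum.nvidia vs transformers.
    return "optimum.nvidia" if use_optimum_nvidia else "transformers"


def load_causal_lm(model_id: str, use_optimum_nvidia: bool = True, **kwargs):
    """Load a causal LM through either backend using the same call.

    The import happens lazily, so this file can be imported on machines
    that lack the optimum-nvidia package.
    """
    module = importlib.import_module(causal_lm_module(use_optimum_nvidia))
    return module.AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
```

Keeping the import behind a flag makes it easy to fall back to plain Transformers on machines without NVIDIA hardware while leaving the rest of the generation code unchanged.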

Hugging Face Blog · #inference-optimization #nvidia #deployment

tools · Jun 15

Deploy Livebook notebooks as apps to Hugging Face Spaces

Livebook notebooks can now be deployed as apps directly to Hugging Face Spaces. This integration allows developers to share interactive notebooks with a broader audience. Builders can focus on model development while leveraging Spaces for deployment and sharing. The integration streamlines the process of turning notebooks into web applications.

Key takeaways

Livebook notebooks deployable to Hugging Face Spaces.
Simplifies sharing interactive notebooks.
Streamlines notebook to web app conversion.

Hugging Face Blog · #livebook #hugging-face #deployment

models · May 15

Run a ChatGPT-like Chatbot on a Single GPU with ROCm

The Hugging Face Transformers library now supports AMD's ROCm platform, enabling ChatGPT-like chatbots built on LLaMA-family models to run on a single GPU. This lowers the hardware barrier for running large language models and makes deployment feasible on more affordable hardware. Developers can now use AMD GPUs for model deployment.

Key takeaways

Hugging Face Transformers supports AMD's ROCm platform.
Enables deployment of large language models on a single GPU.
Reduces hardware requirements for AI model deployment.

Hugging Face Blog · #gpu-acceleration #open-source #deployment

models · Oct 13

🧨 Stable Diffusion in JAX / Flax !

Hugging Face has released a JAX/Flax implementation of Stable Diffusion. This allows for faster and more efficient deployment on TPUs and GPUs. The new implementation enables builders to leverage JAX's performance optimizations and Flax's ease of use.

Key takeaways

Stable Diffusion now available in JAX/Flax.
Enables faster deployment on TPUs and GPUs.
Leverages JAX performance optimizations and Flax ease of use.

Hugging Face Blog · #jax #flax #stable-diffusion #deployment

other · May 13

Director of Machine Learning Insights [Part 2: SaaS Edition]

Part 2 of Hugging Face's Director of Machine Learning Insights series collects advice from machine learning directors at SaaS companies on building and deploying models in SaaS applications. The discussion covers guiding the development of predictive models and integrating them successfully into products. Builders should focus on model interpretability, scalability, and monitoring to drive business value. Effective collaboration between data science and engineering teams is also crucial.

Key takeaways

Model interpretability and scalability are key for SaaS applications.
Monitoring and feedback loops help drive business value.
Collaboration between data science and engineering is essential.

Hugging Face Blog · #saas #machine-learning #deployment

tools · Nov 30

Getting Started with Hugging Face Transformers for IPUs with Optimum

The Hugging Face Optimum library now supports Graphcore's IPUs for transformer inference. This integration enables developers to deploy and optimize transformer models on Graphcore hardware. You can get started with example code and tutorials. The collaboration aims to make transformer deployment more accessible.

Key takeaways

Hugging Face Optimum supports Graphcore IPUs for transformer inference.
Developers can deploy and optimize models on Graphcore hardware.
Example code and tutorials are available.

Hugging Face Blog · #hugging-face #ipu #transformers #deployment