1sec.ai

Tag

#pytorch

Every item tagged pytorch, newest first.

18 items

modelsJun 11

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

The Hugging Face blog details optimizing PyTorch performance by fusing MLP layers. Fusing nn.Linear layers into a single kernel improves inference speed and reduces memory usage. This technique can be applied to other PyTorch modules for similar performance gains. Builders can use these optimizations to deploy models more efficiently.

Key takeaways
  • Fusing nn.Linear layers improves inference speed.
  • Reduces memory usage.
  • Optimization technique applicable to other PyTorch modules.
tutorialsMay 29

Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

PyTorch's torch.profiler module provides a built-in profiling tool for analyzing model performance. It helps identify performance bottlenecks and optimizes code. You can use it to profile PyTorch models and understand where time is spent during execution. This guide provides a beginner's introduction to using torch.profiler.

Key takeaways
  • torch.profiler is a built-in PyTorch module for profiling.
  • Helps identify performance bottlenecks in PyTorch models.
  • Optimizes code by understanding execution time distribution.
otherApr 8

Safetensors is Joining the PyTorch Foundation

Safetensors, a secure serialization format for machine learning models, has joined the PyTorch Foundation as a new member. This move aims to promote safe and secure model deployment across the AI ecosystem. By integrating with PyTorch, Safetensors can enhance model security and facilitate collaboration among developers. You can expect increased adoption of secure practices in model development.

Key takeaways
  • Safetensors joins PyTorch Foundation.
  • Promotes secure model deployment.
  • Enhances model security in AI ecosystem.
otherOct 10

Arm will be @ PyTorch Conference, Join Us!

Arm will be attending the PyTorch Conference. The company will likely showcase its work on optimizing PyTorch for Arm-based hardware. You can join the conference to learn more about Arm's efforts in the AI and machine learning space. The event will cover various topics related to PyTorch and its applications.

Key takeaways
  • Arm attending PyTorch Conference
  • Optimizing PyTorch for Arm-based hardware
  • Conference covers PyTorch applications
toolsMay 21

nanoVLM: The simplest repository to train your VLM in pure PyTorch

The nanoVLM repository on Hugging Face provides a simple way to train vision-language models in pure PyTorch. It offers a minimalistic approach to VLM training, making it accessible to builders who want to experiment with VLMs without complex setups. The repository includes example code and a straightforward training loop. You can use it to train your own VLMs with ease.

Key takeaways
  • Train VLMs in pure PyTorch with minimal code.
  • nanoVLM provides example code and a simple training loop.
  • Accessible to builders who want to experiment with VLMs.
toolsDec 24

Visualize and understand GPU memory in PyTorch

The Hugging Face blog post explains how to visualize and understand GPU memory usage in PyTorch. It provides practical tips and tools for monitoring and optimizing memory allocation during model training. You can use these techniques to reduce memory usage and prevent out-of-memory errors. Effective memory management is crucial for large-scale model training.

Key takeaways
  • Use PyTorch's built-in memory tracking tools.
  • Visualize memory usage with external tools.
  • Optimize memory allocation to prevent errors.
modelsSep 13

Fine-tuning Llama 2 70B using PyTorch FSDP

The Hugging Face Transformers library now supports PyTorch FSDP for fine-tuning large models like Llama 2 70B. This integration enables efficient use of GPU memory during training, making it possible to fine-tune large models on a single machine. Builders can now optimize model performance and reduce memory usage. The approach has been shown to be effective in practice.

Key takeaways
  • PyTorch FSDP now supported in Hugging Face Transformers.
  • Enables fine-tuning of large models like Llama 2 70B on a single machine.
  • Reduces GPU memory usage during training.
modelsFeb 6

Accelerating PyTorch Transformers with Intel Sapphire Rapids - part 2

Intel and Hugging Face collaborated to optimize PyTorch transformer inference on Intel Sapphire Rapids processors. The work resulted in up to 2x faster inference performance for certain transformer models. You can reproduce the results and apply similar optimizations to your own models using the provided code and benchmarks.

Key takeaways
  • Up to 2x faster inference on Sapphire Rapids processors.
  • Optimizations available for PyTorch transformers.
  • Code and benchmarks provided for reproducibility.
modelsJan 2

Accelerating PyTorch Transformers with Intel Sapphire Rapids - part 1

Intel and Hugging Face collaborated to optimize PyTorch transformer performance on Intel Sapphire Rapids CPUs. The work resulted in significant speedups for transformer inference, making it more efficient for builders to deploy AI models. This optimization enables faster and more cost-effective model serving. You can leverage these improvements in your own applications.

Key takeaways
  • PyTorch transformer inference sped up on Intel Sapphire Rapids.
  • Optimization achieved through Intel and Hugging Face collaboration.
  • Faster inference enables more efficient model deployment.
otherOct 21

From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease

The Hugging Face blog post explains how to master distributed training with PyTorch DDP, Accelerate, and Trainer. It provides a step-by-step guide on using these tools for efficient model training. You can learn how to scale your training process with ease. The post targets developers looking to optimize their model training workflows.

Key takeaways
  • Hugging Face provides a guide on distributed training with PyTorch DDP and Accelerate.
  • The guide covers using Trainer for efficient model training.
  • It aims to help developers optimize their training workflows.
toolsSep 27

How 🤗 Accelerate runs very large models thanks to PyTorch

Hugging Face's Accelerate library leverages PyTorch to enable efficient training and inference of very large models. It provides a simple API for scaling model training across multiple GPUs and TPU devices. This allows builders to handle large-scale AI workloads without worrying about low-level hardware management. By using Accelerate, you can focus on model development rather than infrastructure.

Key takeaways
  • Accelerate simplifies large model training across GPUs and TPUs.
  • PyTorch integration enables efficient model scaling.
  • Abstracts away low-level hardware complexity.
modelsSep 16

Incredibly Fast BLOOM Inference with DeepSpeed and Accelerate

Hugging Face and Microsoft collaborated on optimized inference scripts for BLOOM, a large open LLM, using DeepSpeed and Accelerate. These scripts enable fast inference on consumer-grade hardware. You can access the scripts and benchmark results on the Hugging Face blog.

Key takeaways
  • BLOOM inference optimized using DeepSpeed and Accelerate.
  • Enables fast inference on consumer-grade hardware.
  • Scripts and benchmarks available on Hugging Face blog.
tutorialsJun 30

Policy Gradient with PyTorch

The Hugging Face blog post explains how to implement policy gradient methods using PyTorch. Policy gradient is a type of reinforcement learning algorithm. You can use it to train agents to make decisions in complex environments. The post provides a practical example of training an agent using PyTorch.

Key takeaways
  • Policy gradient is a type of reinforcement learning algorithm.
  • PyTorch can be used to implement policy gradient methods.
  • The Hugging Face blog post provides a practical example of training an agent.
toolsMay 2

Accelerate Large Model Training using PyTorch Fully Sharded Data Parallel

PyTorch's Fully Sharded Data Parallel (FSDP) algorithm can accelerate large model training by reducing communication overhead. FSDP shards model parameters across workers, reducing memory usage and enabling faster training. You can implement FSDP using PyTorch's native APIs or through integrations with libraries like Hugging Face's Transformers. This technique is particularly useful for training large language models and computer vision models.

Key takeaways
  • FSDP reduces communication overhead in large model training.
  • Shards model parameters across workers, lowering memory usage.
  • Enables faster training for large language and computer vision models.
toolsNov 19

Accelerating PyTorch distributed fine-tuning with Intel technologies

Intel and Hugging Face collaborated to optimize PyTorch distributed fine-tuning on Intel hardware. The work improves training speed and efficiency for large models. You can now fine-tune models faster and more efficiently on Intel-based infrastructure. This acceleration enables builders to explore more model configurations and experiment with larger models.

Key takeaways
  • PyTorch distributed fine-tuning optimized for Intel hardware.
  • Faster training speeds and improved efficiency for large models.
  • Enables exploration of more model configurations and larger models.
modelsFeb 9

Hugging Face on PyTorch / XLA TPUs

Hugging Face now supports PyTorch on XLA TPUs, enabling faster and more efficient model training. This integration allows for optimized performance on Google Cloud TPU hardware. You can leverage this setup to train large models at scale. The move expands Hugging Face's hardware compatibility.

Key takeaways
  • PyTorch on XLA TPUs now supported by Hugging Face.
  • Optimized performance on Google Cloud TPU hardware.
  • Faster and more efficient model training available.
researchSep 10

Block Sparse Matrices for Smaller and Faster Language Models

Researchers propose block sparse matrices for compressing language models, reducing memory usage and improving inference speed. This technique can be applied to various models, enabling smaller and faster deployments. By leveraging block sparsity, builders can create more efficient language model implementations. The approach has been integrated into PyTorch.

Key takeaways
  • Block sparse matrices reduce memory usage and improve inference speed.
  • Technique applicable to various language models.
  • Integrated into PyTorch for easier adoption.