A100 vs V100 Deep Learning Benchmarks | Lambda (2024)

Lambda is now shipping Tesla A100 servers. In this post, we benchmark the PyTorch training speed of the Tesla A100 and V100, both with NVLink. For more info, including multi-GPU training performance, see our GPU benchmark center.

For training convnets with PyTorch, the Tesla A100 is...
  • 2.2x faster than the V100 using 32-bit precision.*
  • 1.6x faster than the V100 using mixed precision.
For training language models with PyTorch, the Tesla A100 is...
  • 3.4x faster than the V100 using 32-bit precision.
  • 2.6x faster than the V100 using mixed precision.

* In this post, for A100s, 32-bit refers to FP32 + TF32; for V100s, it refers to FP32.

View Lambda's Tesla A100 server

A100 vs V100 convnet training speed, PyTorch

A100 vs V100 Deep Learning Benchmarks | Lambda (4)
  • All numbers are normalized by the 32-bit training speed of 1x Tesla V100.
  • The chart shows, for example: 32-bit training with 1x A100 is 2.17x faster than 32-bit training 1x V100; 32-bit training with 4x V100s is 3.88x faster than 32-bit training with 1x V100; and mixed precision training with 8x A100 is 20.35x faster than 32-bit training with 1x V100.
  • Results averaged across SSD, ResNet-50, and Mask RCNN.
  • For batch size info, see the raw data at our GPU benchmarking center.

View Lambda's Tesla A100 server

A100 vs V100 language model training speed, PyTorch

A100 vs V100 Deep Learning Benchmarks | Lambda (5)
  • All numbers are normalized by the 32-bit training speed of 1x Tesla V100.
  • The chart shows, for example, that 32-bit training with 1x A100 is 3.39x faster than 32-bit training with a 1x V100; mixed precision training with 4x V100 is 7.97x faster than 32-bit training with 1x V100; and mixed precision training with 8x A100 is 42.60x faster than 32-bit training with 1x V100.
  • Results averaged across Transformer-XL base, Transformer-XL large, Tacotron 2, and BERT-base SQuAD.
  • For batch size info, see the raw data at our GPU benchmarking center.

View Lambda's Tesla A100 server

Benchmark software stack

  • Lambda's benchmark code is available at the GitHub repo here.
  • The Tesla A100 was benchmarked using NGC's PyTorch 20.10 docker image with Ubuntu 18.04, PyTorch 1.7.0a0+7036e91, CUDA 11.1.0, cuDNN 8.0.4, NVIDIA driver 460.27.04, and NVIDIA's optimized model implementations.
  • The Tesla V100 was benchmarked using NGC's PyTorch 20.01 docker image with Ubuntu 18.04, PyTorch 1.4.0a0+a5b4d78, CUDA 10.2.89, cuDNN 7.6.5, NVIDIA driver 440.33, and NVIDIA's optimized model implementations.
  • Benchmarks using the same software versions for the A100 and V100 coming soon!

Read On

A100 vs V100 Deep Learning Benchmarks | Lambda (2024)
Top Articles
Latest Posts
Article information

Author: Carlyn Walter

Last Updated:

Views: 6226

Rating: 5 / 5 (70 voted)

Reviews: 85% of readers found this page helpful

Author information

Name: Carlyn Walter

Birthday: 1996-01-03

Address: Suite 452 40815 Denyse Extensions, Sengermouth, OR 42374

Phone: +8501809515404

Job: Manufacturing Technician

Hobby: Table tennis, Archery, Vacation, Metal detecting, Yo-yoing, Crocheting, Creative writing

Introduction: My name is Carlyn Walter, I am a lively, glamorous, healthy, clean, powerful, calm, combative person who loves writing and wants to share my knowledge and understanding with you.