Viper: A High-Performance I/O Framework for Transparently Updating, Storing, and Transferring Deep Neural Network Models
Jie Ye,
Jaime Cernuda,
Neeraj Rajesh,
Keith Bateman,
Orcun Yildiz,
Tom Peterka,
Arnur Nigmetov,
Dmitriy Morozov,
Xian-He Sun,
Anthony Kougkas,
Bogdan Nicolae.
Proceedings of the International Conference on Parallel Processing (ICPP), pages 812-821, 2024.
Abstract
Scientific workflows increasingly need to train a DNN model in real-time during an
experiment (e.g. using ground truth from a simulation), while using it at the same time
for inferences. Instead of sharing the same model instance, the training (producer) and
inference server (consumer) often use different model replicas that are kept
synchronized. In addition to efficient I/O techniques to keep the model replica of the
producer and consumer synchronized, there is another important trade-off: frequent model
updates enhance inference quality but may slow down training; infrequent updates may
lead to less precise inference results. To address these challenges, we introduce Viper:
a new I/O framework designed to determine a near-optimal checkpoint schedule and
accelerate the delivery of the latest model updates. Viper builds an inference
performance predictor to identify the optimal checkpoint schedule to balance the
trade-off between training slowdown and inference quality improvement. It also creates a
memory-first model transfer engine to accelerate model delivery through direct
memory-to-memory communication. Our experiments show that Viper can reduce the model update
latency by ≈ 9x using the GPU-to-GPU data transfer engine and ≈ 3x using the
DRAM-to-DRAM host data transfer. The checkpoint schedule obtained from Viper’s predictor also
demonstrates improved cumulative inference accuracy compared to the baseline of
epoch-based solutions.
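
The trade-off between update frequency and training slowdown described in the abstract can be illustrated with a toy model. The sketch below is not Viper's predictor or its API: the loss curve, the per-checkpoint stall of 5 time units, the 1000-unit time budget, and the function names `loss` and `cumulative_inference_loss` are all assumptions chosen only to make the effect visible. The consumer serves one inference per time unit with the most recently delivered model, and training stalls while a checkpoint is taken.

```python
def loss(trained_iters: int) -> float:
    """Hypothetical inference loss of a model after `trained_iters` iterations.

    Assumed shape only: loss falls monotonically as training progresses.
    """
    return 1.0 / (1.0 + 0.01 * trained_iters)


def cumulative_inference_loss(interval: int, budget: int = 1000,
                              ckpt_cost: int = 5) -> float:
    """Sum the consumer's loss over `budget` time units.

    One training iteration takes one time unit. After every `interval`
    iterations the producer stalls for `ckpt_cost` time units to checkpoint
    and deliver the model; the consumer keeps serving the old replica until
    delivery completes.
    """
    t = 0          # elapsed wall-clock time units
    trained = 0    # iterations completed by the producer
    served = 0     # iterations contained in the consumer's replica
    total = 0.0
    while t < budget:
        # One training iteration; the consumer serves one inference meanwhile.
        total += loss(served)
        t += 1
        trained += 1
        if trained % interval == 0:
            # Checkpoint stall: no training progress, inference uses old model.
            stall = min(ckpt_cost, budget - t)
            total += stall * loss(served)
            t += stall
            served = trained  # the new replica is now available to the consumer
    return total


if __name__ == "__main__":
    for interval in (1, 20, 1000):
        print(interval, round(cumulative_inference_loss(interval), 1))
```

Under these assumptions, checkpointing after every iteration spends most of the budget stalled, checkpointing once at the end leaves the consumer on the initial model throughout, and an intermediate interval yields the lowest cumulative loss. A real schedule would depend on measured checkpoint costs and the actual learning curve.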