The Smallest Loss Compute Can Buy

With Gaurav Sood and Chris Alexiuk. The most expensive part of training a model today is GPU time, so it is worth asking how best to spend a compute budget. More formally, the optimization problem is: minimize test loss subject to a FLOPs budget. There are many levers we can pull to reach the smallest loss, including: the amount of data; the number of parameters (for a fixed amount of compute, there is an implicit trade-off between this and the amount of data); optimization hyperparameters, e.g., learning rate, learning rate schedule, batch size, and optimizer; model architecture, from the width-to-depth ratio to deeper choices such as RETRO or mixture-of-experts (MoE) models like Switch Transformers and MoE with expert-choice routing; the precision in which the parameters and hyperparameters are stored; and data quality, which, as some recent work shows, matters a lot. We could also make the optimization problem more general. For instance, rather than budgeting in FLOPs or GPU time, we might budget in dollars, which opens up ways to buy GPU time more cheaply, e.g., using spot GPUs. We can abstract the problem further still: if we knew the ROI of the prediction task, we could ask what the profit-maximizing loss is given a latency constraint. Inference ROI is, roughly, a function of accuracy (or another performance metric of choice) and the compute cost of inference. ...
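The implicit trade-off between data and parameters under a fixed FLOPs budget can be made concrete with a small sketch. It rests on two assumptions that are not part of the post: training compute is approximated as C ≈ 6ND (N parameters, D tokens), and test loss follows the parametric form L(N, D) = E + A/N^α + B/D^β, with constants taken from the fit reported by Hoffmann et al. (2022) and used here only for illustration. The helper names (`predicted_loss`, `grid_search_split`, `closed_form_params`) are hypothetical. The code scans log-spaced model sizes, gives each the tokens the budget allows, and cross-checks the best size against the analytic optimum of the same loss.

```python
import math

# Assumed parametric loss, illustrative constants from Hoffmann et al. (2022):
#   L(N, D) = E + A / N**ALPHA + B / D**BETA
# N = parameter count, D = training tokens.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted test loss for n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def tokens_for_budget(flops: float, n_params: float) -> float:
    """Tokens affordable under the assumed approximation C ~= 6 * N * D."""
    return flops / (6.0 * n_params)


def grid_search_split(flops: float, n_min: float = 1e6, n_max: float = 1e13,
                      num_points: int = 2000) -> tuple[float, float, float]:
    """Try log-spaced model sizes and keep the one with the lowest predicted loss.

    Every candidate spends the same FLOPs, so a bigger model gets fewer tokens.
    Returns (n_params, n_tokens, loss).
    """
    best = None
    log_lo, log_hi = math.log10(n_min), math.log10(n_max)
    for i in range(num_points):
        n = 10 ** (log_lo + (log_hi - log_lo) * i / (num_points - 1))
        d = tokens_for_budget(flops, n)
        loss = predicted_loss(n, d)
        if best is None or loss < best[2]:
            best = (n, d, loss)
    return best


def closed_form_params(flops: float) -> float:
    """Analytic minimizer of the same loss: set dL/dN = 0 with D = C / (6N).

    Solving gives N* = (ALPHA*A / (BETA*B))**(1/(ALPHA+BETA)) * (C/6)**(BETA/(ALPHA+BETA)).
    """
    k = flops / 6.0
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    return g * k ** (BETA / (ALPHA + BETA))


if __name__ == "__main__":
    budget = 1e23  # example training budget in FLOPs
    n, d, loss = grid_search_split(budget)
    print(f"params ~ {n:.3g}, tokens ~ {d:.3g}, predicted loss ~ {loss:.3f}")
    print(f"closed-form params ~ {closed_form_params(budget):.3g}")
```

Under this loss form the objective is convex in log N, so the log-spaced grid lands within one grid step of the closed-form answer. Changing the constants or the compute approximation moves the optimum; the search itself stays the same.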

August 15, 2023 · Atul Dhingra

ML (O)Ops! Improving and Deploying On-Device Models With Confidence (Part 1)

With Gaurav Sood. It is well known that ML engineers today spend most of their time on work that has little to do with machine learning: technically unsophisticated but important operations such as deploying models and keeping track of experiments. Atul and I dive into the reasons behind this status quo and propose solutions, starting with issues specific to on-device deployment. ...

February 21, 2021 · Atul Dhingra