I don't see the point of codebases defining batch size at the per-GPU level. It means you have to change the batch size param manually every time you scale an experiment up or down. I guess it's historically done that way in codebases that don't have grad acc?
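For illustration, here is a rough sketch of the alternative: fix a global batch size in the config and derive the per-GPU micro-batch and grad acc steps from the GPU count. The function name `derive_batch_plan` and the `max_per_gpu_batch` memory cap are hypothetical, not taken from any particular codebase, and it assumes plain data parallelism where every GPU sees the same number of samples per optimizer step.

```python
def derive_batch_plan(global_batch_size, num_gpus, max_per_gpu_batch):
    """Return (per_gpu_micro_batch, grad_accum_steps) for a fixed global batch.

    Hypothetical helper: the invariant it keeps is
    per_gpu_micro_batch * grad_accum_steps * num_gpus == global_batch_size.
    """
    if global_batch_size <= 0 or num_gpus <= 0 or max_per_gpu_batch <= 0:
        raise ValueError("all arguments must be positive")
    if global_batch_size % num_gpus != 0:
        raise ValueError("global batch size must be divisible by the GPU count")

    # Samples each GPU must process per optimizer step.
    per_gpu_total = global_batch_size // num_gpus

    # Pick the largest micro-batch that fits the (assumed) memory cap
    # and divides the per-GPU share evenly; the rest becomes grad acc.
    for micro in range(min(max_per_gpu_batch, per_gpu_total), 0, -1):
        if per_gpu_total % micro == 0:
            return micro, per_gpu_total // micro


# Same global batch of 512, different cluster sizes: only grad acc changes.
for gpus in (4, 8, 16):
    micro, accum = derive_batch_plan(512, gpus, max_per_gpu_batch=16)
    print(f"{gpus} GPUs -> micro-batch {micro}, grad acc steps {accum}")
```

With this, scaling from 4 to 16 GPUs changes nothing in the experiment config; the effective batch size stays the same and only the derived grad acc steps move.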