Steps to reproduce:
1. Have a cluster with a heterogeneous distribution of cores across nodes (some with 24 cores, some with 8)
2. Process things at scale
3. Note that the lower-powered workers are frequently maxed out, while the higher-powered ones have plenty of headroom
This is caused by the dispatcher blindly assuming that the sum of the current jobs is the best heuristic to dispatch by: it sorts workers by current load, then dispatches to the lowest. Instead, we should calculate each worker's utilization ratio (current load / max load) at dispatch time and dispatch to the worker with the lowest ratio.
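To illustrate the difference, here is a minimal sketch comparing the two heuristics. It is not the dispatcher's actual code: `Worker`, `pick_by_load`, `pick_by_utilization`, and `dispatch` are hypothetical names, and it assumes each job costs one unit of load and that a worker's max load equals its core count.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Worker:
    # Hypothetical model of a worker node; field names are assumptions.
    name: str
    max_load: int          # assumed capacity, e.g. number of cores
    current_load: int = 0  # jobs currently assigned

    @property
    def utilization(self) -> float:
        # Ratio of current load to max load (0.0 = idle, 1.0 = full).
        return self.current_load / self.max_load


def pick_by_load(workers: List[Worker]) -> Optional[Worker]:
    # Current behavior as described: lowest absolute load wins.
    return min(workers, key=lambda w: w.current_load) if workers else None


def pick_by_utilization(workers: List[Worker]) -> Optional[Worker]:
    # Proposed behavior: lowest current/max ratio wins; full workers are skipped.
    candidates = [w for w in workers if w.current_load < w.max_load]
    return min(candidates, key=lambda w: w.utilization) if candidates else None


def dispatch(workers: List[Worker], n_jobs: int,
             pick: Callable[[List[Worker]], Optional[Worker]]) -> None:
    # Assign n_jobs one at a time, re-evaluating the heuristic before each job.
    for _ in range(n_jobs):
        worker = pick(workers)
        if worker is None:
            break  # no worker has spare capacity
        worker.current_load += 1


if __name__ == "__main__":
    for pick in (pick_by_load, pick_by_utilization):
        cluster = [Worker("big", 24), Worker("small", 8)]
        dispatch(cluster, 16, pick)
        print(pick.__name__, [(w.name, w.current_load, w.max_load) for w in cluster])
```

With one 24-core and one 8-core worker and 16 jobs, the load-based pick leaves both workers holding 8 jobs, so the small worker is full while the big one sits at a third of capacity. The ratio-based pick ends at 12 and 4 jobs, which is 50% utilization on each.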