We want to test the performance of different models within the inference server to understand how it scales with model size, for example (see the benchmarking sketch after this list):

* [distilgpt2](https://huggingface.co/distilgpt2)
* [pythia-12B](https://huggingface.co/theblackcat102/pythia-12B-dedup-1000)
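
As a rough illustration, the sketch below times single-request generation latency so the numbers can be compared across models. The server URL, the `/generate` route, and the payload shape are assumptions modeled on a text-generation-inference-style HTTP API, not a confirmed part of this setup; adapt them to whatever server is actually under test.

```python
import statistics
import time

import requests

# Hypothetical endpoint, modeled on a text-generation-inference-style API;
# the URL, route, and payload shape must match the server actually under test.
SERVER_URL = "http://localhost:8080/generate"


def time_generation(prompt: str, max_new_tokens: int = 64) -> float:
    """Send one generation request and return wall-clock latency in seconds."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    start = time.perf_counter()
    response = requests.post(SERVER_URL, json=payload, timeout=120)
    response.raise_for_status()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Launch the server with one model at a time (e.g. distilgpt2, then
    # pythia-12B-dedup-1000) and compare the latencies this script reports.
    samples = [time_generation("The quick brown fox") for _ in range(5)]
    print(f"median latency: {statistics.median(samples):.3f}s over {len(samples)} runs")
```

Comparing the median latency for a small model such as distilgpt2 against a much larger one such as pythia-12B gives a first-order picture of how the server's per-request cost scales with model size.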