We want to test the performance of different models within the inference server to understand how it scales with model size, for example (see the benchmarking sketch after this list):

* [distilgpt2](https://huggingface.co/distilgpt2)
* [pythia-12B](https://huggingface.co/theblackcat102/pythia-12B-dedup-1000)
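
As a rough illustration, the sketch below times single-request generation latency so the numbers can be compared across models. The server URL, the `/generate` route, and the payload shape are assumptions modeled on a text-generation-inference-style HTTP API, not a confirmed part of this setup; adapt them to whatever server is actually under test.

```python
import statistics
import time

import requests

# Hypothetical endpoint, modeled on a text-generation-inference-style API;
# the URL, route, and payload shape must match the server actually under test.
SERVER_URL = "http://localhost:8080/generate"


def time_generation(prompt: str, max_new_tokens: int = 64) -> float:
    """Send one generation request and return wall-clock latency in seconds."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    start = time.perf_counter()
    response = requests.post(SERVER_URL, json=payload, timeout=120)
    response.raise_for_status()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Launch the server with one model at a time (e.g. distilgpt2, then
    # pythia-12B-dedup-1000) and compare the latencies this script reports.
    samples = [time_generation("The quick brown fox") for _ in range(5)]
    print(f"median latency: {statistics.median(samples):.3f}s over {len(samples)} runs")
```

Comparing the median latency for a small model such as distilgpt2 against a much larger one such as pythia-12B gives a first-order picture of how the server's per-request cost scales with model size.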