US start-up Arthur claims its Bench tool offers a range of scoring metrics for evaluating and comparing large language models. With the surge in AI models launched this year, it seems inevitable that ...