We need to integrate JudgeArena into oellm-eval so that we can easily run LLM-as-a-judge evaluations for our instruction-tuned models.

Recommended steps:
- [ ] Look at the PR that integrated the evalchemy backend: https://github.com/OpenEuroLLM/oellm-eval/pull/56
- [ ] Iterate until a single task works in local mode
- [ ] Test on LUMI