Methodology
Each node within the validation framework is assigned a random batch of data for evaluation. This process is executed in parallel, employing four distinct algorithms: Cosine Similarity, LLM Evaluation, SAFE, and Human Feedback. The assessments from these methods are then aggregated using a weighted average approach to yield a comprehensive performance score for each LLM under evaluation. This procedure ensures a balanced evaluation, reflecting both automated metrics and human judgment, thereby capturing a holistic view of the LLM's performance.
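The per-node workflow above can be sketched as follows. This is a minimal illustration, not the framework's actual API: the four scorer functions are hypothetical placeholders standing in for the real Cosine Similarity, LLM Evaluation, SAFE, and Human Feedback implementations.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder scorers -- each would return a quality score in [0, 1]
# for the node's assigned batch. The real implementations differ.
def cosine_similarity_eval(batch):
    return 0.80

def llm_eval(batch):
    return 0.70

def safe_eval(batch):
    return 0.75

def human_feedback_eval(batch):
    return 0.90

METHODS = [cosine_similarity_eval, llm_eval, safe_eval, human_feedback_eval]

def evaluate_batch(batch):
    """Run all four evaluation methods in parallel on one batch of data."""
    with ThreadPoolExecutor(max_workers=len(METHODS)) as pool:
        # pool.map preserves method order, so results align with METHODS.
        return list(pool.map(lambda method: method(batch), METHODS))
```

Each node would call `evaluate_batch` on its randomly assigned batch, and the resulting per-method scores would then feed into the weighted-average aggregation.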
The evaluation of the Large Language Models (LLMs) "Mistral-7b-v1.0" and "Gemma-7b-it" was conducted using the flexstack.ai validation framework, which comprises four distinct methodologies: Cosine Similarity, GPT-3.5 Evaluation, the SAFE methodology, and Human Feedback. Each method was weighted equally at 0.25 to ensure a balanced assessment of the models' performance across these dimensions.
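The equal-weight aggregation described above amounts to a simple weighted average. The sketch below illustrates the arithmetic; the per-method scores are hypothetical example values, not results from the actual evaluation.

```python
def aggregate_score(scores, weights):
    """Combine per-method scores into one overall score via weighted average.

    `scores` and `weights` map method name -> value; weights must sum to 1.
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(scores[method] * weights[method] for method in scores)

# Equal weighting of 0.25 per method, as described above.
weights = {
    "cosine_similarity": 0.25,
    "gpt35_evaluation": 0.25,
    "safe": 0.25,
    "human_feedback": 0.25,
}

# Hypothetical per-method scores for one model (illustrative only).
scores = {
    "cosine_similarity": 0.82,
    "gpt35_evaluation": 0.75,
    "safe": 0.70,
    "human_feedback": 0.90,
}

overall = aggregate_score(scores, weights)  # (0.82 + 0.75 + 0.70 + 0.90) / 4
```

With equal weights, the overall score is just the arithmetic mean of the four method scores; adjusting the weights would let the framework emphasize, say, human judgment over automated metrics.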