
Methodology

Procedure

Each node within the validation framework is assigned a random batch of data for evaluation. The batches are processed in parallel using four distinct algorithms: Cosine Similarity, LLM Evaluation, SAFE, and Human Feedback. The scores from these methods are then aggregated with a weighted average to yield a single performance score for each LLM under evaluation (see the sketch below). This procedure balances automated metrics with human judgment, giving a holistic view of each LLM's performance.
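
The aggregation step can be summarised as a weighted average over the four per-method scores. The sketch below is illustrative only: the function name, score keys, placeholder values, and 0–1 score scale are assumptions for demonstration and are not part of the Flexstack AI API or its real results.

```python
# Minimal sketch of the weighted-average aggregation described above.
# All names and values are illustrative placeholders, not real Flexstack AI data.

from typing import Dict

def aggregate_scores(method_scores: Dict[str, float],
                     weights: Dict[str, float]) -> float:
    """Combine per-method scores into one performance score via weighted average."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[m] * method_scores[m] for m in weights)
    return weighted_sum / total_weight

# One node's batch scored by the four evaluation methods (placeholder values).
scores = {
    "cosine_similarity": 0.82,
    "llm_evaluation": 0.76,
    "safe": 0.71,
    "human_feedback": 0.88,
}
weights = {
    "cosine_similarity": 0.25,
    "llm_evaluation": 0.25,
    "safe": 0.25,
    "human_feedback": 0.25,
}

print(aggregate_scores(scores, weights))  # -> 0.7925
```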

Results

The Large Language Models (LLMs) Mistral-7b-v1.0 and Gemma-7b-it were evaluated with the flexstack.ai validation framework using four methods: Cosine Similarity, GPT-3.5 Evaluation, SAFE, and Human Feedback. Each method was weighted equally at 0.25 to ensure a balanced assessment of the models' performance across these dimensions.
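
With every method weighted at 0.25, the composite score reduces to a simple mean of the four method scores, i.e. score = 0.25 × (Cosine Similarity + GPT-3.5 Evaluation + SAFE + Human Feedback), where the symbols here are shorthand for each method's individual score.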

Comparative Evaluation Scores of LLMs: Mistral-7b-v1.0 vs. Gemma-7b-it