Models Directory

Welcome to the Open Source Models Directory! This document provides a comprehensive list of the open-source models supported by Flexstack AI for text generation (Large Language Models), text embeddings, image generation, audio generation, and video generation.

Large Language Models (LLMs)

| Model Name | Description | Document |
| --- | --- | --- |
| Gemma-7B-IT | Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Developed by Google DeepMind and other teams across Google, Gemma is named after the Latin gemma, meaning "precious stone". | Link |
| Mixtral-7B | The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. | Link |
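
The table above only names the models; the Flexstack AI endpoints for calling them are covered in the API section. As a purely illustrative sketch of how such a model is typically run, the snippet below loads Mistral-7B-v0.1 locally with the Hugging Face transformers library. The Hub ID, precision, and sampling settings are assumptions, not values taken from this directory.

```python
# Hypothetical local run of Mistral-7B-v0.1 via Hugging Face transformers
# (not the Flexstack AI API). Requires a GPU with enough memory for fp16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Open-source language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```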

Text Embeddings

| Model Name | Description | Document |
| --- | --- | --- |
| GTE-Large | The GTE (General Text Embedding) models, crafted by Alibaba DAMO Academy, are advanced text embedding models featuring a multi-stage contrastive learning approach. They are trained on a diverse mixture of datasets from multiple sources, including web pages, academic papers, social media, and code repositories. The model is particularly noted for its performance across a range of NLP and code-related tasks despite its modest parameter count of 110M. | Link |
| Mistral-Embedding | A text embedding model based on the Mistral-7B-v0.1 Large Language Model (LLM), a pretrained generative text model with 7 billion parameters. | Link |
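
For context on what an embedding model returns, the sketch below encodes two sentences with GTE-Large through the sentence-transformers library and compares them with cosine similarity. The Hub ID and the 1024-dimensional output are assumptions based on the public GTE-Large release, not on this directory.

```python
# Hypothetical embedding example with GTE-Large via sentence-transformers
# (not the Flexstack AI Text Embeddings endpoint).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("thenlper/gte-large")  # assumed Hub ID
sentences = [
    "Text embeddings map sentences to dense vectors.",
    "An embedding model turns text into a vector of numbers.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)  # shape (2, 1024)
print(util.cos_sim(embeddings[0], embeddings[1]))  # cosine similarity of the pair
```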

Image Generation Models

| Model Name | Description | Document |
| --- | --- | --- |
| Stable Diffusion 1.5 | The Stable-Diffusion-v1-5 checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned for 595k steps at resolution 512x512 on "laion-aesthetics v2 5+", with 10% dropping of the text-conditioning to improve classifier-free guidance sampling. | Link |
| Stable Diffusion XL | With Stable Diffusion XL you can make more realistic images with improved face generation, produce legible text within images, and create more aesthetically pleasing art using shorter prompts. | Link |
| Stable Diffusion XL-Lightning | SDXL-Lightning is a lightning-fast text-to-image generation model that can generate high-quality 1024px images in a few steps. For more information, refer to the research paper "SDXL-Lightning: Progressive Adversarial Diffusion Distillation"; the model is open-sourced as part of that research. | Link |
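
As a rough illustration of how these checkpoints are typically driven, the sketch below generates one image with Stable Diffusion XL through the diffusers library. The Hub ID, precision, and step count are assumptions; the Flexstack AI Create Image endpoint is documented separately.

```python
# Hypothetical text-to-image example with Stable Diffusion XL via diffusers
# (not the Flexstack AI Image Generation endpoint). Requires a CUDA GPU.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed Hub ID
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = pipe(
    prompt="a watercolor painting of a fox in a misty forest",
    num_inference_steps=30,
).images[0]
image.save("fox.png")
```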

Audio Generation Models

| Model Name | Description | Document |
| --- | --- | --- |
| AudioGen | AudioGen is an autoregressive transformer LM that synthesizes general audio conditioned on text (Text-to-Audio). Internally, AudioGen operates over discrete representations learnt from the raw waveform using an EnCodec tokenizer. AudioGen was presented in the paper "AudioGen: Textually Guided Audio Generation". | Link |
| MusicGen | MusicGen is a single-stage auto-regressive Transformer model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. The text descriptions are passed through a frozen text encoder to obtain a sequence of hidden-state representations, and MusicGen is trained to predict discrete audio tokens (audio codes) conditioned on these hidden states. The audio tokens are then decoded with an audio compression model, such as EnCodec, to recover the audio waveform. MusicGen was proposed in the paper "Simple and Controllable Music Generation". | Link |
| Suno/Bark | Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects, and can also produce nonverbal cues such as laughing, sighing, and crying. Pretrained model checkpoints are available for inference. | Link |
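
To make the text-conditioned generation flow concrete, the sketch below produces a short clip with MusicGen through the transformers library: the prompt is encoded, audio tokens are generated autoregressively, and the compression model's decoder turns them back into a waveform. The Hub ID and token budget are assumptions.

```python
# Hypothetical music generation example with MusicGen via transformers
# (not the Flexstack AI Audio Generation endpoint).
import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

model_id = "facebook/musicgen-small"  # assumed Hub ID
processor = AutoProcessor.from_pretrained(model_id)
model = MusicgenForConditionalGeneration.from_pretrained(model_id)

inputs = processor(
    text=["lo-fi hip hop beat with a mellow piano melody"],
    padding=True,
    return_tensors="pt",
)
# Autoregressively generate audio tokens, then decode them to a waveform.
audio_values = model.generate(**inputs, max_new_tokens=256)  # roughly 5 seconds

sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write(
    "musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy()
)
```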

Video Generation Models

| Model Name | Description | Document |
| --- | --- | --- |
| Damo Video Synthesis | This text-to-video generation diffusion model consists of three sub-networks: text feature extraction, a text-feature-to-video-latent-space diffusion model, and a mapping from the video latent space to the video visual space. The overall model has about 1.7 billion parameters and supports English input. The diffusion model adopts a UNet3D structure and generates video through an iterative denoising process starting from pure Gaussian noise. | Link |
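
For a sense of the iterative denoising loop described above, the sketch below runs the DAMO/ModelScope 1.7B text-to-video model through the diffusers library. The Hub ID and step count are assumptions, and the exact shape of the returned frames varies with the diffusers version.

```python
# Hypothetical text-to-video example with the DAMO/ModelScope 1.7B model via diffusers
# (not the Flexstack AI Video Generation endpoint). Requires a CUDA GPU.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",  # assumed Hub ID
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage manageable

result = pipe("a panda playing guitar on a beach", num_inference_steps=25)
frames = result.frames  # on recent diffusers versions the batch dim is kept: use result.frames[0]
video_path = export_to_video(frames)
print(video_path)
```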

