Text Completion

HTTP request

POST /ai/text_completion?stream={stream}

Authorization

Include your ACCESS TOKEN in the HTTP Authorization header:

Authorization: Bearer <ACCESS_TOKEN>
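
For example, with Python's requests library the header can be attached like this (a minimal sketch; YOUR_ACCESS_TOKEN is a placeholder for your actual token):

import requests

# Attach the access token to every API call (placeholder value)
headers = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}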

Request Parameters

  • Query Parameters

    stream (Boolean): If stream is set to true, the result is streamed back as it is generated. If set to false, the full result is returned only after generation completes. The default value is false.

  • JSON Body

    messages (List): A list containing the messages intended for text generation. Each message in the list is a small JSON object specifying the role (such as "user" or "assistant") and the actual content of the message (e.g., "Hello!"). This setup helps the AI understand the context and the kind of interaction taking place. Example: [{"role": "user", "content": "Hello!"}].

    configs (JSON): A JSON object containing the settings below, which you can adjust to customize the text generation process.

    model (String): Specifies the AI model used for generating the text. Default value is "gemma-7b".

    temperature (Float): Controls the randomness or diversity of the generation process. A higher temperature encourages the model to explore a wider range of possibilities, making the output more varied and sometimes more creative. Common range: 0 to 1. Default value is 0.7.

    top_k (Integer): Limits sampling to the k most likely next tokens at each step of the generation. A lower value of k focuses the model on higher-probability tokens, often leading to more predictable and coherent outcomes. Common range: 1 to 1000. Default value is 50.

    top_p (Float): Controls the breadth of token selection based on cumulative probability. A lower top_p means the model samples from a smaller, more likely set of tokens, which helps maintain the relevance and quality of the generated content. Common range: 0 to 1. Default value is 0.5.

    max_tokens (Integer): Sets the maximum number of tokens the model can generate. It acts as a cap, ensuring that generation does not exceed a certain length, which is crucial for keeping the content focused and within desired constraints. The maximum limit is 4096. Default value is 512.
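
The documented ranges can be enforced client-side before a request is sent. The helper below is a hypothetical sketch (the function name and the clamping behavior are illustrative, not part of the API), using the documented defaults:

# Hypothetical helper: builds a configs dict with the documented
# defaults and clamps each value to its documented range.
def build_configs(model="gemma-7b", temperature=0.7, top_k=50,
                  top_p=0.5, max_tokens=512):
    return {
        "model": model,
        "temperature": min(max(temperature, 0.0), 1.0),  # common range 0-1
        "top_k": min(max(top_k, 1), 1000),               # common range 1-1000
        "top_p": min(max(top_p, 0.0), 1.0),              # common range 0-1
        "max_tokens": min(max_tokens, 4096),             # hard limit 4096
    }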

User Guide

  1. Craft Your Input: Gather your messages, including who's speaking and what's said, e.g., {"role": "user", "content": "What's the weather like?"}. Keep messages clear and relevant.

  2. Choose a Model: Pick an AI model, like "gemma-7b". The model influences the quality and style of replies.

  3. Set Your Parameters:

  • temperature: Affects creativity. Higher values give more varied responses.

  • top_k and top_p: Control response diversity. Lower values give more focused answers.

  • max_tokens: Sets the maximum length of replies. Keep it practical for chatbot interactions.

  4. Fine-tune for Quality: Experiment with temperature, top_k, and top_p to find the sweet spot between creativity and relevance.

  5. Limit Response Length: Use max_tokens to ensure responses are concise and to the point.

  6. Evaluate and Adjust: Review the generated responses. If they don't meet your needs, tweak the input or settings.

Example Request

{
  "messages": [
    {
      "role": "user",
      "content": "Hello, have a good day!"
    }
  ],
  "configs": {
    "model": "gemma-7b",
    "max_tokens": 1024,
    "top_k": 10,
    "top_p": 0.9,
    "temperature": 0
  }
}
  • Response streaming with the Python requests library

import requests

# Parrot API Endpoint
url = "https://api.joinparrot.ai/v1/ai/text_completion?stream=true"

# Conversation messages and generation settings (documented defaults)
messages = [{"role": "user", "content": "Hello, have a good day!"}]
configs = dict(
    model="gemma-7b", temperature=0.7, top_k=50, top_p=0.5, max_new_tokens=512
)

# Payload
payload = {"messages": messages, "configs": configs}

# Headers
headers = {"Authorization": "Bearer " + "YOUR_TOKEN_HERE"}

# Get the response with streaming and print each line as it arrives
with requests.post(url, json=payload, stream=True, headers=headers) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if line:
            print(line.decode())

Parrot API

  • Streaming

# Initialize the text generation process with the Parrot API client.
generator = parrot.generate_text_stream(messages, model, top_k, top_p, temperature, max_tokens)

# Iterate over the generator object to fetch the generated text.
for data in generator:
    print(data.decode().strip())

  • Not Streaming

response = parrot.text_generation(messages, model, top_k, top_p, temperature, max_tokens)
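
Without the Parrot client, the same non-streaming call can be made with plain requests (a sketch; the payload mirrors the Example Request above, and YOUR_TOKEN_HERE is a placeholder):

import requests

# Same endpoint with streaming disabled
url = "https://api.joinparrot.ai/v1/ai/text_completion?stream=false"
headers = {"Authorization": "Bearer " + "YOUR_TOKEN_HERE"}
payload = {
    "messages": [{"role": "user", "content": "Hello, have a good day!"}],
    "configs": {
        "model": "gemma-7b",
        "max_tokens": 1024,
        "top_k": 10,
        "top_p": 0.9,
        "temperature": 0,
    },
}

# The full result is returned only after generation completes
response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()
print(response.json())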

Response

Returns the result of the task (only when stream=false):

{
    "data": {
        "is_success": true,
        "data": {
            "task_id": "cf86cbda217c481f8dbc9fb24b7e79e0",
            "total_tasks": 1,
            "percent": 100,
            "status": "COMPLETED",
            "response": "Hello, and thank you for stopping by! I hope you have a good day too!\n\nWould you like me to tell you what I can do today? I'm a large language model, and I'm here to help you with a variety of tasks."
        },
        "configs": {
            "model": "gemma-7b",
            "max_new_tokens": 2048,
            "top_k": 10,
            "temperature": 0.0,
            "task_type": "LLM-GEMMA-7B",
            "queue_name": "llm_gemma_7b_queue",
            "messages": [
                {
                    "role": "user",
                    "content": "Hello, have a good day!"
                }
            ]
        }
    },
    "errors": [],
    "error_description": "",
    "start_time": "2024-03-05 21:24:55.572492",
    "end_time": "2024-03-05 21:25:00.601529",
    "host_of_client_call_request": "103.186.100.36",
    "total_time_by_second": 5.029042,
    "status": "success"
}
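
The generated text is nested two levels deep under the data key. A minimal sketch for extracting it from the parsed JSON above (assuming response.json() from the non-streaming call):

# Parse the JSON body of a non-streaming response
result = response.json()

# Check both the outer status and the task-level success flag
if result["status"] == "success" and result["data"]["is_success"]:
    # The generated text lives at data -> data -> response
    print(result["data"]["data"]["response"])
else:
    print("Task failed:", result["errors"], result["error_description"])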