LiteLLM Hexagon

Description

The LiteLLM Hexagon processor incorporates the LiteLLM gateway to allow invoking external large language models and other AI models from within a workflow.

Input

A LiteLLM Hexagon block has these input variables:

  • api_base (string): the base URL—endpoint—of the API to send requests to. If set, this input variable overrides the value of the API base url functional property.
    If input variable model is set to a fully qualified model name, for example openai/gpt-4o, this variable is not needed because the API endpoint is inferred automatically. It must be set when:

    • Using Azure OpenAI custom endpoints
    • Pointing to a self-hosted model
    • Using a proxy or gateway
    • Connecting to any non-standard OpenAI-compatible API
  • api_key (string): authentication token used to access the model provider's API.

  • base64_files (array): one or more Base64-encoded files that are sent to the model. Used with image models, multimodal LLMs, audio transcription models, etc.
  • documents (array): used to provide textual context such as RAG-style extra passages, custom instructions, or reference material. Items can be plain text, Markdown, JSON, etc.
  • messages (array): input for the model in OpenAI standard role-content pairs. For example:

    [
        {
            "role": "system",
            "content": "You are an expert in physics"
        },
        {
            "role": "user",
            "content": "Explain quantum tunneling."
        }
    ]
    

    This input variable is incompatible with multi_content_messages.
    If this variable is set, input variable user_prompt and functional property User prompt are ignored.
    If a pair has role set to system, it acts as the system prompt, overriding input variable system_prompt and functional property System prompt.

  • model (string): the name of the model, for example openai/gpt-4o. If set, this input variable overrides the value of the Model name functional property.

  • multi_content_messages (array): used to send multimodal inputs, such as images or PDFs, alongside text.

    This input variable is incompatible with messages.
    If any item in the array contains text, input variable user_prompt and functional property User prompt are ignored.
    If an item in the array has role set to system, it acts as the system prompt, overriding input variable system_prompt and functional property System prompt.
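
    As an illustration, a multi_content_messages value combining text and an image might look like the following. This is a sketch assuming the OpenAI-style multi-content message layout; the exact content-part format and the URL are illustrative and may vary by model provider:

    [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/photo.png"
                    }
                }
            ]
        }
    ]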

  • params (array): for parameters like temperature, Top-P, Top-K, etc. Every item of the array is an object with two properties:

    • name: the name of the parameter
    • value: the value of the parameter, whose type—string, integer, boolean—varies based on the parameter
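
    For example, to set the sampling temperature and cap the number of output tokens, params could be set as follows (parameter names depend on the model provider; temperature and max_tokens are typical OpenAI-style names used here for illustration):

    [
        { "name": "temperature", "value": 0.2 },
        { "name": "max_tokens", "value": 512 }
    ]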
  • system_prompt (string): the instruction that defines the model role, behavior and overall tone. If set, this input variable overrides the value of the System prompt functional property.

  • text (string): used to provide textual context, like the documents input variable, but as a single value.
  • user_prompt (string): the main instruction for the model. If set, this input variable overrides the value of the User prompt functional property.
  • max_continuation_requests (integer): maximum number of continuations to get the full response. To use for models that can give partial responses due to a limit in the number of output tokens. If set, this input variable overrides the value of the Max response continuation requests functional property.

No input variable is mandatory, but the block will return an error if:

  • No input variable provides a user prompt and functional property User prompt is not set.
  • Neither input variable model nor functional property Model name is set.
  • Input variable api_key is not set and authentication/authorization information for the model has not been set in the runtime.
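
Putting these rules together, a minimal set of input variables for a working call might look like this (the model name is taken from the examples above; the API key is a placeholder):

{
    "model": "openai/gpt-4o",
    "api_key": "<your API key>",
    "user_prompt": "Explain quantum tunneling in one paragraph."
}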

Block properties

Block properties can be set by editing the block.
LiteLLM Hexagon workflow blocks have the following properties:

  • Basic properties:

    • Block name (editable)
    • Component version (read only)
    • Block ID (read only)
  • Functional:

    • Model name: the name of the model, for example openai/gpt-4o. It is overridden by the value of the model input variable, when set.
    • API base url: the base URL—endpoint—of the API to send requests to. It is overridden by the value of the api_base input variable, when set.
    • System prompt: the instruction that defines the model role, behavior and overall tone. It is overridden by the value of the system_prompt input variable, when set.
    • User prompt: the main instruction for the model. It is overridden by the value of the user_prompt input variable, when set.
    • Number of max. attempts when retrying LLM call: maximum number of retries when calling the model fails.
    • Delay between each retry: how much time the block waits between retries, see the Number of max. attempts when retrying LLM call functional property.
    • Max response continuation requests: maximum number of continuations to get the full response. It is overridden by the value of the max_continuation_requests input variable, when set.
    • Enable result caching: when enabled, every new request-response pair is stored in the block's memory; when a later request matches a stored request, the cached response is returned instead of invoking the model. The maximum number of request-response pairs that are stored is determined by the Cache size functional property.
    • Cache size: maximum number of request-response pairs that are cached when the Enable result caching functional property is enabled. Once this limit is reached, each new request that doesn't match the cache replaces the oldest cached pair, so the cache is recycled.
    • Use JSON Schema: when enabled, the model is asked to produce a JSON output that conforms to the schema specified by the Output JSON Schema functional property.
    • Output JSON Schema: output schema the model must conform to when functional property Use JSON Schema is enabled.
  • Deployment:

    • Timeout: execution timeout expressed in minutes (m) or seconds (s).
    • Replicas: number of required instances.
    • Memory: required memory.
    • CPU: thousandths of a CPU required (for example: 1000 = 1 CPU).
    • Consumer Number: number of threads of the consumer, the software module of the block that fetches input to process from the block's work queue.
  • Input: input properties correspond to the input variables of the component (see above).

  • Output: read-only, the output manifest of the component.
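
As an example of the Use JSON Schema and Output JSON Schema pair described above, the Output JSON Schema functional property could hold a schema like the following (the field names answer and confidence are illustrative):

{
    "type": "object",
    "properties": {
        "answer": { "type": "string" },
        "confidence": { "type": "number" }
    },
    "required": ["answer"]
}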

Output

The output of a LiteLLM Hexagon block is a JSON object with the following structure:

{
    "message": (string) can be Success or an error message,
    "model_name": (string) detailed name of the model,
    "output": (string) response of the model,
    "status_code": (string) HTTP status code of the call to the model API,
    "usage": {
        "completion_tokens": (integer) number of tokens consumed for the chat completion,
        "prompt_tokens": (integer) number of tokens consumed for the prompts,
        "total_tokens": (integer) total number of tokens consumed
    }
}
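
A successful call might therefore produce an output similar to the following (all values are illustrative):

{
    "message": "Success",
    "model_name": "openai/gpt-4o",
    "output": "Quantum tunneling is a phenomenon in which a particle passes through a potential barrier...",
    "status_code": "200",
    "usage": {
        "completion_tokens": 120,
        "prompt_tokens": 35,
        "total_tokens": 155
    }
}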