# Completion API

## completion

`completion(model, messages, temperature=None, max_tokens=None, stream=False, tools=None, tool_choice=None, response_format=None, num_retries=3, retry_strategy='exponential_backoff_retry', cache=None, api_key=None, api_base=None, timeout=600.0, **kwargs)`

Make a completion request to an LLM provider. Compatible with the `litellm.completion()` API.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str` | Model name in the format `"provider/model-name"` or just `"model-name"` | *required* |
| `messages` | `list[Dict[str, Any]]` | List of message dicts with `"role"` and `"content"` | *required* |
| `temperature` | `Optional[float]` | Sampling temperature (0-2) | `None` |
| `max_tokens` | `Optional[int]` | Maximum tokens to generate | `None` |
| `stream` | `bool` | Whether to stream the response | `False` |
| `tools` | `Optional[list[Tool]]` | List of tool/function definitions | `None` |
| `tool_choice` | `Optional[Union[str, Dict[str, Any]]]` | How to choose tools (`"auto"`, `"required"`, or a specific tool) | `None` |
| `response_format` | `Optional[ResponseFormat]` | Response format (dict or Pydantic model) | `None` |
| `num_retries` | `int` | Number of retries on rate limit or timeout | `3` |
| `retry_strategy` | `str` | Retry strategy (currently only `"exponential_backoff_retry"`) | `'exponential_backoff_retry'` |
| `cache` | `Optional[Dict[str, Any]]` | Cache control dict (accepted for compatibility; not used by ullm) | `None` |
| `api_key` | `Optional[str]` | API key (if not set in the environment) | `None` |
| `api_base` | `Optional[str]` | API base URL (if not the provider default) | `None` |
| `timeout` | `float` | Request timeout in seconds | `600.0` |
| `**kwargs` | `Any` | Additional provider-specific parameters | `{}` |
Returns:

| Type | Description |
|---|---|
| `Union[ModelResponse, Iterator[StreamChunk]]` | A `ModelResponse`, or an `Iterator[StreamChunk]` if streaming |
Raises:

| Type | Description |
|---|---|
| `AuthenticationError` | On authentication failure |
| `BadRequestError` | On an invalid request |
| `RateLimitError` | When the rate limit is exceeded |
| `Timeout` | On request timeout |
| `APIError` | On other API errors |
Source code in ullm/main.py
## acompletion (async)

`acompletion(model, messages, temperature=None, max_tokens=None, stream=False, tools=None, tool_choice=None, response_format=None, num_retries=3, retry_strategy='exponential_backoff_retry', cache=None, api_key=None, api_base=None, timeout=600.0, **kwargs)`

Make an async completion request to an LLM provider. Compatible with the `litellm.acompletion()` API.
Returns:

| Type | Description |
|---|---|
| `Union[ModelResponse, AsyncIterator[StreamChunk]]` | A `ModelResponse`, or an `AsyncIterator[StreamChunk]` if streaming |
Source code in ullm/main.py
(Full API reference coming soon)