tuneapi.apis package

Submodules

tuneapi.apis.model_anthropic module

Connect to the Anthropic API to use Claude series of LLMs

class tuneapi.apis.model_anthropic.Anthropic(id: str | None = 'claude-3-haiku-20240307', base_url: str = 'https://api.anthropic.com/v1/messages', api_token: str | None = None, extra_headers: Dict[str, str] | None = None)

Bases: ModelInterface

chat(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs): This is the blocking function to block chat with the model

async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs): This is the async function to block chat with the model

distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs): This is the blocking function to chat with the model in a distributed manner

async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs): This is the async function to chat with the model in a distributed manner

get_batch(batch_id: str, custom_ids: List[str] | None = None, usage: bool = False, token: str | None = None, raw: bool = False, verbose: bool = False) → Tuple[List[Any] | Dict, str | None]: This is the blocking function to get the batch results

set_api_token(token: str) → None: This are used to set the API token for the model

stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, debug: bool = False, usage: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 30), raw: bool = False, **kwargs) → Any: This is the blocking function to stream chat with the model where each token is iteratively generated

async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, debug: bool = False, usage: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 30), raw: bool = False, **kwargs) → Any: This is the async function to stream chat with the model where each token is iteratively generated

submit_batch(threads: List[Thread | str], model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, debug: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 30), raw: bool = False, **kwargs) → Tuple[str, List[str]] | Dict: This is the blocking function to submit a batch of threads. It will return the batch_id and custom_ids for ordering the responses

tuneapi.apis.model_gemini module

Connect to the Google Gemini API to their LLMs. See more Gemini.

class tuneapi.apis.model_gemini.Gemini(id: str | None = 'gemini-2.0-flash-exp', base_url: str = 'https://generativelanguage.googleapis.com/v1beta/models/{id}:{rpc}', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, emebdding_url: str | None = None)

Bases: ModelInterface

chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, timeout=(5, 60), **kwargs) → Any: This is the blocking function to block chat with the model

async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, extra_headers: Dict[str, str] | None = None, debug: bool = False, timeout=(5, 60), **kwargs) → Any: This is the async function to block chat with the model

distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs): This is the blocking function to chat with the model in a distributed manner

async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs): This is the async function to chat with the model in a distributed manner

embedding(chats: Thread | List[str] | str, model: str = 'text-embedding-004', extra_headers: Dict[str, str] | None = None, token: str | None = None, timeout: Tuple[int, int] = (5, 60), raw: bool = False) → EmbeddingGen: This is the blocking function to get embeddings for the chat

async embedding_async(chats: Thread | List[str] | str, model: str = 'text-embedding-004', extra_headers: Dict[str, str] | None = None, token: str | None = None, timeout: Tuple[float, float] = (5.0, 60.0), raw: bool = False) → EmbeddingGen: This is the async function to get embeddings for the chat

set_api_token(token: str) → None: This are used to set the API token for the model

stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, timeout=(5, 60), usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False, **kwargs): This is the blocking function to stream chat with the model where each token is iteratively generated

async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float = 1, token: str | None = None, raw: bool = False, debug: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 60), **kwargs): This is the async function to stream chat with the model where each token is iteratively generated

tuneapi.apis.model_gemini.get_structured_schema(model: type[BaseModel]) → Dict[str, Any]

Converts a Pydantic BaseModel to a JSON schema compatible with Gemini API, including anyOf for optional or union types and handling nested structures correctly.

Parameters:: model – The Pydantic BaseModel class to convert.
Returns:: A dictionary representing the JSON schema.

tuneapi.apis.model_openai module

Connect to the OpenAI API and use their LLMs.

class tuneapi.apis.model_openai.Groq(id: str = 'llama3-70b-8192', base_url: str = 'https://api.groq.com/openai/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, **kwargs)

Bases: OpenAIProtocol

A class to interact with Groq’s Large Language Models (LLMs) via their API. Note this class does not contain the embedding method.

id

Identifier for the Mistral model.

Type:: str

base_url

The base URL for the Mistral API. Defaults to “https://api.groq.com/openai/v1/chat/completions”.

Type:: str

extra_headers

Additional headers to include in API requests.

Type:: Optional[Dict[str, str]]

api_token

API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.

Type:: Optional[str]

Note

For more information, visit the Mistral API documentation at https://console.groq.com/

embedding(**k): If you pass a list then returned items are in the insertion order

class tuneapi.apis.model_openai.Mistral(id: str = 'mistral-small-latest', base_url: str = 'https://api.mistral.ai/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, **kwargs)

Bases: OpenAIProtocol

A class to interact with Mistral’s Large Language Models (LLMs) via their API. Note this class does not contain the embedding method.

id

Identifier for the Mistral model.

Type:: str

base_url

The base URL for the Mistral API. Defaults to “https://api.mistral.ai/v1/chat/completions”.

Type:: str

extra_headers

Additional headers to include in API requests.

Type:: Optional[Dict[str, str]]

api_token

API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.

Type:: Optional[str]

embedding(*a, **k): Raises NotImplementedError as Mistral does not support embeddings.

Note

For more information, visit the Mistral API documentation at https://console.mistral.ai/

embedding(**k): If you pass a list then returned items are in the insertion order

class tuneapi.apis.model_openai.Ollama(id: str, base_url: str = 'http://localhost:11434/v1/chat/completions', **kwargs): Bases: OpenAIProtocol

Bases: ModelInterface

chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, timeout=(5, 60), parallel_tool_calls: bool = False, **kwargs) → Any: This is the blocking function to block chat with the model

async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 60), **kwargs) → Any: This is the async function to block chat with the model

distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs): This is the blocking function to chat with the model in a distributed manner

async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs): This is the async function to chat with the model in a distributed manner

embedding(chats: Thread | List[str] | str, model: str = 'text-embedding-3-small', token: str | None = None, raw: bool = False, extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60)) → EmbeddingGen: If you pass a list then returned items are in the insertion order

async embedding_async(chats: Thread | List[str] | str, model: str = 'text-embedding-3-small', token: str | None = None, timeout: Tuple[int, int] = (10, 60), raw: bool = False, extra_headers: Dict[str, str] | None = None) → EmbeddingGen: If you pass a list then returned items are in the insertion order

get_batch(batch_id: str, custom_ids: List[str] | None = None, usage: bool = False, token: str | None = None, raw: bool = False, verbose: bool = False): This is the blocking function to get the batch results

image_gen(prompt: str, style: str = 'natural', model: str = 'dall-e-3', n: int = 1, size: str = '1024x1024', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) → ImageGen: This is the blocking function to generate images

async image_gen_async(prompt: str, style: str = 'natural', model: str = 'dall-e-3', n: int = 1, size: str = '1024x1024', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) → ImageGen: This is the async function to generate images

set_api_token(token: str) → None: This are used to set the API token for the model

speech_to_text(prompt: str, audio: str, model='whisper-1', timestamp_granularities=['segment'], token: str | None = None, timeout: Tuple[int, int] = (5, 300), **kwargs) → Transcript: This is the blocking function to convert speech to text

async speech_to_text_async(prompt: str, audio: str, model='whisper-1', timestamp_granularities=['segment'], token: str | None = None, timeout: Tuple[int, int] = (5, 300), **kwargs) → Transcript: This is the async function to convert speech to text

stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, timeout=(5, 60), usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False, parallel_tool_calls: bool = False, **kwargs): This is the blocking function to stream chat with the model where each token is iteratively generated

async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False, timeout=(5, 60), **kwargs): This is the async function to stream chat with the model where each token is iteratively generated

submit_batch(threads: List[Thread | str], model: str | None = None, endpoint: str = '/v1/chat/completions', max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, timeout=(5, 60), raw: bool = False, verbose: bool = False, **kwargs) → Tuple[str, List[str]] | Dict: This is the blocking function to submit a batch of threads. It will return the batch_id and custom_ids for ordering the responses

text_to_speech(prompt: str, voice: str = 'shimmer', model='tts-1', response_format='wav', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) → bytes

async text_to_speech_async(prompt: str, voice: str = 'shimmer', model='tts-1', response_format='wav', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) → bytes

class tuneapi.apis.model_openai.Openai(id: str = 'gpt-4o', base_url: str = 'https://api.openai.com/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, emebdding_url: str | None = None, image_gen_url: str | None = None, audio_transcribe: str | None = None, audio_gen_url: str | None = None, batch_url: str | None = None, files_url: str | None = None): Bases: OpenAIProtocol

class tuneapi.apis.model_openai.TuneModel(id: str = 'meta/llama-3.1-8b-instruct', base_url: str = 'https://proxy.tune.app/chat/completions', org_id: str | None = None, extra_headers: Dict[str, str] | None = None, api_token: str | None = None, **kwargs)

Bases: OpenAIProtocol

A class to interact with Groq’s Large Language Models (LLMs) via their API.

id

Identifier for the Mistral model.

Type:: str

base_url

The base URL for the Mistral API. Defaults to “https://proxy.tune.app/chat/completions”.

Type:: str

org_id

Organization ID for the Tune API.

Type:: Optional[str]

extra_headers

Additional headers to include in API requests.

Type:: Optional[Dict[str, str]]

api_token

API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.

Type:: Optional[str]

Note

For more information, visit the Mistral API documentation at https://tune.app/

embedding(chats: Thread | List[str] | str, model: str = 'openai/text-embedding-3-small', token: str | None = None, timeout: Tuple[int, int] = (5, 60), raw: bool = False, extra_headers: Dict[str, str] | None = None): If you pass a list then returned items are in the insertion order

async embedding_async(chats: Thread | List[str] | str, model: str = 'openai/text-embedding-3-small', token: str | None = None, timeout: Tuple[int, int] = (5, 60), raw: bool = False, extra_headers: Dict[str, str] | None = None): If you pass a list then returned items are in the insertion order

tuneapi.apis.turbo module

tuneapi.apis.turbo.distributed_chat(model: ModelInterface, prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, usage: bool = False, time_metrics: bool = False, **kwargs) → List | Tuple[List, Usage]

Distributes multiple chat prompts across a thread pool for parallel processing.

This function creates a pool of worker threads to process multiple chat prompts concurrently. It handles retry logic for failed requests and maintains the order of responses corresponding to the input prompts.
Args:

model (ModelInterface): The base model instance to clone for each worker thread. Each thread gets its own model
instance to ensure thread safety.

prompts (List[Thread]): A list of chat prompts to process. The order of responses will match the order of these
prompts.

post_logic (Optional[callable], default=None): A function to process each chat response before storing. If None,
raw responses are stored. Function signature should be: f(chat_response) -> processed_response

max_threads (int, default=10): Maximum number of concurrent worker threads. Adjust based on API rate limits and
system capabilities.

retry (int, default=3): Number of retry attempts for failed requests. Set to 0 to disable retries.

pbar (bool, default=True): Whether to display a progress bar.

debug (bool, default=False): Whether to log debug information.

usage (bool, default=False): Whether to return usage statistics. If True, the function will return a tuple of
(responses, usage) where usage is an instance of Usage.

time_metrics (bool, default=False): Whether to return time metrics. If True, the function will return a tuple of
(responses, time_metrics) where time_metrics is a list of time taken for each prompt.

Returns:

List[Any]: A list of responses or errors, maintaining the same order as input prompts.
Successful responses will be either raw or processed (if post_logic provided). Failed requests (after retries) will contain the last error encountered.

Raises:
ValueError: If max_threads < 1 or retry < 0 TypeError: If model is not an instance of ModelInterface

Example:
>>> from tuneapi import ta, tt
>>> model = ta.Gemini()
>>> prompts = [
...     tt.Thread([tt.human("What is 2+2?")]),
...     tt.Thread([tt.human("What is Python?")])
... ]
>>> responses = distributed_chat(model, prompts, max_threads=5)
>>> for prompt, response in zip(prompts, responses):
...     print(f"Q: {prompt}

A: {response} “)

Note:

Each worker thread gets its own model instance to prevent sharing state

Progress bar shows both initial processing and retries

The function maintains thread safety through message passing channels

async tuneapi.apis.turbo.distributed_chat_async(model: ModelInterface, prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, usage: bool = False, time_metrics: bool = False, **kwargs)

tuneapi.apis package

Submodules

tuneapi.apis.model_anthropic module

tuneapi.apis.model_gemini module

tuneapi.apis.model_openai module

tuneapi.apis.turbo module

Module contents