tuneapi.apis package

Submodules

tuneapi.apis.model_anthropic module

Connect to the Anthropic API to use the Claude series of LLMs.

class tuneapi.apis.model_anthropic.Anthropic(id: str | None = 'claude-3-haiku-20240307', base_url: str = 'https://api.anthropic.com/v1/messages', api_token: str | None = None, extra_headers: Dict[str, str] | None = None)

Bases: ModelInterface

chat(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)

This is the blocking function to chat with the model

async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)

This is the async function to chat with the model

distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs)

This is the blocking function to chat with the model in a distributed manner

async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs)

This is the async function to chat with the model in a distributed manner

get_batch(batch_id: str, custom_ids: List[str] | None = None, usage: bool = False, token: str | None = None, raw: bool = False, verbose: bool = False) Tuple[List[Any] | Dict, str | None]

This is the blocking function to get the batch results

set_api_token(token: str) None

This is used to set the API token for the model

stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, debug: bool = False, usage: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 30), raw: bool = False, **kwargs) Any

This is the blocking function to stream chat with the model where each token is iteratively generated

async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, debug: bool = False, usage: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 30), raw: bool = False, **kwargs) Any

This is the async function to stream chat with the model where each token is iteratively generated
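As a rough illustration of consuming a streamed reply, the sketch below iterates over chunks as they arrive and assembles the full text. It does not call the library: `fake_stream_chat` is a hypothetical stand-in, and it assumes the stream yields plain string chunks, which the signature above (return type `Any`) does not confirm.

```python
from typing import Iterable, Iterator


def fake_stream_chat(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming call: yields the reply in chunks."""
    for chunk in ["Streaming ", "replies ", "arrive ", "in ", "pieces."]:
        yield chunk


def collect_stream(chunks: Iterable[str]) -> str:
    """Print each chunk as soon as it arrives and return the assembled reply."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)  # show partial output immediately
        parts.append(chunk)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


reply = collect_stream(fake_stream_chat("hello"))
```

With a real model, the iterable passed to `collect_stream` would presumably be the value returned by `stream_chat`.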

submit_batch(threads: List[Thread | str], model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, debug: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 30), raw: bool = False, **kwargs) Tuple[str, List[str]] | Dict

This is the blocking function to submit a batch of threads. It will return the batch_id and custom_ids for ordering the responses
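The reason for keeping the custom_ids can be sketched as follows: batch results may come back in any order, and the ids recorded at submission time let the caller restore the original order. This is a minimal illustration, not the library's code. `order_by_custom_ids` is a hypothetical helper, and the result shape (a dict keyed by custom id) is an assumption.

```python
from typing import Any, Dict, List, Optional


def order_by_custom_ids(
    results: Dict[str, Any], custom_ids: List[str]
) -> List[Optional[Any]]:
    """Return results in submission order; ids with no result become None."""
    return [results.get(cid) for cid in custom_ids]


# Hypothetical data: ids as recorded at submission, results as they might arrive.
custom_ids = ["req-0", "req-1", "req-2"]
unordered = {"req-2": "third", "req-0": "first", "req-1": "second"}

ordered = order_by_custom_ids(unordered, custom_ids)
print(ordered)
```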

tuneapi.apis.model_gemini module

Connect to the Google Gemini API to use their LLMs.

class tuneapi.apis.model_gemini.Gemini(id: str | None = 'gemini-2.0-flash-exp', base_url: str = 'https://generativelanguage.googleapis.com/v1beta/models/{id}:{rpc}', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, emebdding_url: str | None = None)

Bases: ModelInterface

chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, timeout=(5, 60), **kwargs) Any

This is the blocking function to chat with the model

async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, extra_headers: Dict[str, str] | None = None, debug: bool = False, timeout=(5, 60), **kwargs) Any

This is the async function to chat with the model

distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs)

This is the blocking function to chat with the model in a distributed manner

async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs)

This is the async function to chat with the model in a distributed manner

embedding(chats: Thread | List[str] | str, model: str = 'text-embedding-004', extra_headers: Dict[str, str] | None = None, token: str | None = None, timeout: Tuple[int, int] = (5, 60), raw: bool = False) EmbeddingGen

This is the blocking function to get embeddings for the chat

async embedding_async(chats: Thread | List[str] | str, model: str = 'text-embedding-004', extra_headers: Dict[str, str] | None = None, token: str | None = None, timeout: Tuple[float, float] = (5.0, 60.0), raw: bool = False) EmbeddingGen

This is the async function to get embeddings for the chat

set_api_token(token: str) None

This is used to set the API token for the model

stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, timeout=(5, 60), usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False, **kwargs)

This is the blocking function to stream chat with the model where each token is iteratively generated

async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float = 1, token: str | None = None, raw: bool = False, debug: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 60), **kwargs)

This is the async function to stream chat with the model where each token is iteratively generated

tuneapi.apis.model_gemini.get_structured_schema(model: type[BaseModel]) Dict[str, Any]

Converts a Pydantic BaseModel to a JSON schema compatible with Gemini API, including anyOf for optional or union types and handling nested structures correctly.

Parameters:

model – The Pydantic BaseModel class to convert.

Returns:

A dictionary representing the JSON schema.
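The general idea of such a conversion can be sketched with the standard library alone. The example below is not the library's implementation: it uses dataclasses in place of Pydantic models so it runs without third-party packages, and `to_schema`, `Address`, and `User` are hypothetical names. The exact keys the Gemini API accepts may differ from what is shown; the sketch only illustrates emitting anyOf for optional or union types and recursing into nested structures.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict, List, Optional, Union, get_args, get_origin, get_type_hints

_PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_schema(tp: Any) -> Dict[str, Any]:
    """Convert a type annotation into a JSON-schema-like dictionary."""
    if tp in _PRIMITIVES:
        return {"type": _PRIMITIVES[tp]}
    if tp is type(None):
        return {"type": "null"}
    origin = get_origin(tp)
    if origin is Union:  # covers Optional[X], which is Union[X, None]
        return {"anyOf": [to_schema(arg) for arg in get_args(tp)]}
    if origin is list:
        (item_type,) = get_args(tp)
        return {"type": "array", "items": to_schema(item_type)}
    if is_dataclass(tp):  # nested structure: recurse into each field
        hints = get_type_hints(tp)
        return {
            "type": "object",
            "properties": {f.name: to_schema(hints[f.name]) for f in fields(tp)},
        }
    raise TypeError(f"unsupported type: {tp!r}")


@dataclass
class Address:
    city: str


@dataclass
class User:
    name: str
    age: Optional[int]
    addresses: List[Address]


schema = to_schema(User)
print(schema["properties"]["age"])
```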

tuneapi.apis.model_openai module

Connect to the OpenAI API and use their LLMs.

class tuneapi.apis.model_openai.Groq(id: str = 'llama3-70b-8192', base_url: str = 'https://api.groq.com/openai/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, **kwargs)

Bases: OpenAIProtocol

A class to interact with Groq’s Large Language Models (LLMs) via their API. Note this class does not contain the embedding method.

id

Identifier for the Groq model.

Type:

str

base_url

The base URL for the Groq API. Defaults to “https://api.groq.com/openai/v1/chat/completions”.

Type:

str

extra_headers

Additional headers to include in API requests.

Type:

Optional[Dict[str, str]]

api_token

API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.

Type:

Optional[str]

Note

For more information, visit the Groq console at https://console.groq.com/

embedding(**k)

If you pass a list then returned items are in the insertion order

class tuneapi.apis.model_openai.Mistral(id: str = 'mistral-small-latest', base_url: str = 'https://api.mistral.ai/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, **kwargs)

Bases: OpenAIProtocol

A class to interact with Mistral’s Large Language Models (LLMs) via their API. Note this class does not contain the embedding method.

id

Identifier for the Mistral model.

Type:

str

base_url

The base URL for the Mistral API. Defaults to “https://api.mistral.ai/v1/chat/completions”.

Type:

str

extra_headers

Additional headers to include in API requests.

Type:

Optional[Dict[str, str]]

api_token

API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.

Type:

Optional[str]

embedding(*a, **k)

Raises NotImplementedError as Mistral does not support embeddings.

Note

For more information, visit the Mistral API documentation at https://console.mistral.ai/

embedding(**k)

If you pass a list then returned items are in the insertion order

class tuneapi.apis.model_openai.Ollama(id: str, base_url: str = 'http://localhost:11434/v1/chat/completions', **kwargs)

Bases: OpenAIProtocol

class tuneapi.apis.model_openai.OpenAIProtocol(id: str, base_url: str, extra_headers: Dict[str, str] | None, api_token: str | None, emebdding_url: str | None, image_gen_url: str | None, audio_transcribe_url: str | None, audio_gen_url: str | None, batch_url: str | None, files_url: str | None)

Bases: ModelInterface

chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, timeout=(5, 60), parallel_tool_calls: bool = False, **kwargs) Any

This is the blocking function to chat with the model

async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 60), **kwargs) Any

This is the async function to chat with the model

distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs)

This is the blocking function to chat with the model in a distributed manner

async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs)

This is the async function to chat with the model in a distributed manner

embedding(chats: Thread | List[str] | str, model: str = 'text-embedding-3-small', token: str | None = None, raw: bool = False, extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60)) EmbeddingGen

If you pass a list then returned items are in the insertion order

async embedding_async(chats: Thread | List[str] | str, model: str = 'text-embedding-3-small', token: str | None = None, timeout: Tuple[int, int] = (10, 60), raw: bool = False, extra_headers: Dict[str, str] | None = None) EmbeddingGen

If you pass a list then returned items are in the insertion order

get_batch(batch_id: str, custom_ids: List[str] | None = None, usage: bool = False, token: str | None = None, raw: bool = False, verbose: bool = False)

This is the blocking function to get the batch results

image_gen(prompt: str, style: str = 'natural', model: str = 'dall-e-3', n: int = 1, size: str = '1024x1024', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) ImageGen

This is the blocking function to generate images

async image_gen_async(prompt: str, style: str = 'natural', model: str = 'dall-e-3', n: int = 1, size: str = '1024x1024', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) ImageGen

This is the async function to generate images

set_api_token(token: str) None

This is used to set the API token for the model

speech_to_text(prompt: str, audio: str, model='whisper-1', timestamp_granularities=['segment'], token: str | None = None, timeout: Tuple[int, int] = (5, 300), **kwargs) Transcript

This is the blocking function to convert speech to text

async speech_to_text_async(prompt: str, audio: str, model='whisper-1', timestamp_granularities=['segment'], token: str | None = None, timeout: Tuple[int, int] = (5, 300), **kwargs) Transcript

This is the async function to convert speech to text

stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, timeout=(5, 60), usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False, parallel_tool_calls: bool = False, **kwargs)

This is the blocking function to stream chat with the model where each token is iteratively generated

async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False, timeout=(5, 60), **kwargs)

This is the async function to stream chat with the model where each token is iteratively generated

submit_batch(threads: List[Thread | str], model: str | None = None, endpoint: str = '/v1/chat/completions', max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, timeout=(5, 60), raw: bool = False, verbose: bool = False, **kwargs) Tuple[str, List[str]] | Dict

This is the blocking function to submit a batch of threads. It will return the batch_id and custom_ids for ordering the responses

text_to_speech(prompt: str, voice: str = 'shimmer', model='tts-1', response_format='wav', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) bytes

async text_to_speech_async(prompt: str, voice: str = 'shimmer', model='tts-1', response_format='wav', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) bytes

class tuneapi.apis.model_openai.Openai(id: str = 'gpt-4o', base_url: str = 'https://api.openai.com/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, emebdding_url: str | None = None, image_gen_url: str | None = None, audio_transcribe: str | None = None, audio_gen_url: str | None = None, batch_url: str | None = None, files_url: str | None = None)

Bases: OpenAIProtocol

class tuneapi.apis.model_openai.TuneModel(id: str = 'meta/llama-3.1-8b-instruct', base_url: str = 'https://proxy.tune.app/chat/completions', org_id: str | None = None, extra_headers: Dict[str, str] | None = None, api_token: str | None = None, **kwargs)

Bases: OpenAIProtocol

A class to interact with Large Language Models (LLMs) served through the Tune API.

id

Identifier for the Tune model.

Type:

str

base_url

The base URL for the Tune API. Defaults to “https://proxy.tune.app/chat/completions”.

Type:

str

org_id

Organization ID for the Tune API.

Type:

Optional[str]

extra_headers

Additional headers to include in API requests.

Type:

Optional[Dict[str, str]]

api_token

API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.

Type:

Optional[str]

Note

For more information, visit the Tune website at https://tune.app/

embedding(chats: Thread | List[str] | str, model: str = 'openai/text-embedding-3-small', token: str | None = None, timeout: Tuple[int, int] = (5, 60), raw: bool = False, extra_headers: Dict[str, str] | None = None)

If you pass a list then returned items are in the insertion order

async embedding_async(chats: Thread | List[str] | str, model: str = 'openai/text-embedding-3-small', token: str | None = None, timeout: Tuple[int, int] = (5, 60), raw: bool = False, extra_headers: Dict[str, str] | None = None)

If you pass a list then returned items are in the insertion order

tuneapi.apis.turbo module

tuneapi.apis.turbo.distributed_chat(model: ModelInterface, prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, usage: bool = False, time_metrics: bool = False, **kwargs) List | Tuple[List, Usage]

Distributes multiple chat prompts across a thread pool for parallel processing.

This function creates a pool of worker threads to process multiple chat prompts concurrently. It handles retry logic for failed requests and maintains the order of responses corresponding to the input prompts.

Args:
model (ModelInterface): The base model instance to clone for each worker thread. Each thread gets its own model instance to ensure thread safety.

prompts (List[Thread]): A list of chat prompts to process. The order of responses will match the order of these prompts.

post_logic (Optional[callable], default=None): A function to process each chat response before storing. If None, raw responses are stored. Function signature should be: f(chat_response) -> processed_response

max_threads (int, default=10): Maximum number of concurrent worker threads. Adjust based on API rate limits and system capabilities.

retry (int, default=3): Number of retry attempts for failed requests. Set to 0 to disable retries.

pbar (bool, default=True): Whether to display a progress bar.

debug (bool, default=False): Whether to log debug information.

usage (bool, default=False): Whether to return usage statistics. If True, the function will return a tuple of (responses, usage) where usage is an instance of Usage.

time_metrics (bool, default=False): Whether to return time metrics. If True, the function will return a tuple of (responses, time_metrics) where time_metrics is a list of time taken for each prompt.

Returns:
List[Any]: A list of responses or errors, maintaining the same order as input prompts.

Successful responses will be either raw or processed (if post_logic provided). Failed requests (after retries) will contain the last error encountered.

Raises:

ValueError: If max_threads < 1 or retry < 0.

TypeError: If model is not an instance of ModelInterface.

Example:
>>> from tuneapi import ta, tt
>>> from tuneapi.apis.turbo import distributed_chat
>>> model = ta.Gemini()
>>> prompts = [
...     tt.Thread([tt.human("What is 2+2?")]),
...     tt.Thread([tt.human("What is Python?")])
... ]
>>> responses = distributed_chat(model, prompts, max_threads=5)
>>> for prompt, response in zip(prompts, responses):
...     print(f"Q: {prompt}\nA: {response}\n")

Note:
  • Each worker thread gets its own model instance to prevent sharing state

  • Progress bar shows both initial processing and retries

  • The function maintains thread safety through message passing channels
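The behaviour described above (bounded worker threads, per-item retries, results in input order, errors stored in place of failed items) can be sketched with the standard library. This is a simplified illustration, not the library's implementation: it uses `concurrent.futures.ThreadPoolExecutor` rather than message passing channels, it does not clone a model per thread, and `run_ordered` and `flaky_upper` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional


def run_ordered(
    fn: Callable[[Any], Any],
    items: List[Any],
    post_logic: Optional[Callable[[Any], Any]] = None,
    max_threads: int = 10,
    retry: int = 3,
) -> List[Any]:
    """Apply fn to every item concurrently; results keep the input order."""
    if max_threads < 1 or retry < 0:
        raise ValueError("max_threads must be >= 1 and retry must be >= 0")

    def attempt(item: Any) -> Any:
        last_error: Optional[Exception] = None
        for _ in range(retry + 1):  # one initial try plus `retry` retries
            try:
                out = fn(item)
                return post_logic(out) if post_logic else out
            except Exception as exc:
                last_error = exc
        return last_error  # a failed slot holds the last error encountered

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(attempt, items))  # map preserves input order


# Demo worker: "b" fails once and then succeeds, "c" always fails.
_calls: Dict[str, int] = {}
_lock = threading.Lock()


def flaky_upper(text: str) -> str:
    with _lock:
        _calls[text] = _calls.get(text, 0) + 1
        n = _calls[text]
    if text == "b" and n == 1:
        raise RuntimeError("transient failure")
    if text == "c":
        raise RuntimeError("permanent failure")
    return text.upper()


results = run_ordered(flaky_upper, ["a", "b", "c"], max_threads=2, retry=1)
print(results)
```

Returning the error object instead of raising lets one bad prompt fail without discarding the responses that did succeed.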

async tuneapi.apis.turbo.distributed_chat_async(model: ModelInterface, prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, usage: bool = False, time_metrics: bool = False, **kwargs)
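Assuming the async variant follows the same contract as the blocking one (bounded concurrency, responses in prompt order), the pattern might look like the sketch below. It is not the library's code: `gather_bounded` and `fake_chat` are hypothetical names, and a semaphore stands in for whatever mechanism the library actually uses to limit concurrency.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def gather_bounded(
    fn: Callable[[Any], Awaitable[Any]],
    items: List[Any],
    max_concurrency: int = 10,
) -> List[Any]:
    """Await fn(item) for every item with at most max_concurrency in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(item: Any) -> Any:
        async with sem:  # wait for a free slot before starting the call
            return await fn(item)

    # gather returns results in the order the awaitables were passed in
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_chat(prompt: str) -> str:
    """Hypothetical stand-in for an async chat call."""
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"


answers = asyncio.run(
    gather_bounded(fake_chat, ["one", "two", "three"], max_concurrency=2)
)
print(answers)
```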

Module contents