tuneapi.apis package
Submodules
tuneapi.apis.model_anthropic module
Connect to the Anthropic API to use the Claude series of LLMs.
- class tuneapi.apis.model_anthropic.Anthropic(id: str | None = 'claude-3-haiku-20240307', base_url: str = 'https://api.anthropic.com/v1/messages', api_token: str | None = None, extra_headers: Dict[str, str] | None = None)
Bases:
ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)
This is the blocking function to chat with the model
- async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, **kwargs)
This is the async function to chat with the model
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs)
This is the blocking function to chat with the model in a distributed manner
- async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs)
This is the async function to chat with the model in a distributed manner
- get_batch(batch_id: str, custom_ids: List[str] | None = None, usage: bool = False, token: str | None = None, raw: bool = False, verbose: bool = False) Tuple[List[Any] | Dict, str | None]
This is the blocking function to get the batch results
- set_api_token(token: str) None
This is used to set the API token for the model
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, debug: bool = False, usage: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 30), raw: bool = False, **kwargs) Any
This is the blocking function to stream chat with the model where each token is iteratively generated
- async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, debug: bool = False, usage: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 30), raw: bool = False, **kwargs) Any
This is the async function to stream chat with the model where each token is iteratively generated
- submit_batch(threads: List[Thread | str], model: str | None = None, max_tokens: int = 4096, temperature: float | None = None, token: str | None = None, debug: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 30), raw: bool = False, **kwargs) Tuple[str, List[str]] | Dict
This is the blocking function to submit a batch of threads. It will return the batch_id and custom_ids for ordering the responses
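The custom_ids returned by submit_batch exist so that responses can be put back in submission order. The following is a minimal sketch of that reordering, not tuneapi's own code. It assumes, hypothetically, that batch results arrive as (custom_id, response) pairs in arbitrary order; the helper name order_by_custom_ids and the result shape are illustrative only.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def order_by_custom_ids(
    results: Iterable[Tuple[str, Any]],
    custom_ids: List[str],
) -> List[Optional[Any]]:
    """Return responses in the order given by custom_ids.

    Hypothetical helper: `results` stands in for whatever a batch API hands
    back, modelled here as (custom_id, response) pairs in arbitrary order.
    IDs with no matching result map to None.
    """
    by_id: Dict[str, Any] = dict(results)
    return [by_id.get(cid) for cid in custom_ids]


# Results arrive out of order; custom_ids records the submission order.
custom_ids = ["req-0", "req-1", "req-2"]
results = [("req-2", "third"), ("req-0", "first"), ("req-1", "second")]
ordered = order_by_custom_ids(results, custom_ids)
```

Keeping the list of custom_ids from submission time is what makes this possible, since the batch itself carries no positional information in this sketch.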
tuneapi.apis.model_gemini module
Connect to the Google Gemini API to use their LLMs.
- class tuneapi.apis.model_gemini.Gemini(id: str | None = 'gemini-2.0-flash-exp', base_url: str = 'https://generativelanguage.googleapis.com/v1beta/models/{id}:{rpc}', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, emebdding_url: str | None = None)
Bases:
ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, timeout=(5, 60), **kwargs) Any
This is the blocking function to chat with the model
- async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, extra_headers: Dict[str, str] | None = None, debug: bool = False, timeout=(5, 60), **kwargs) Any
This is the async function to chat with the model
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs)
This is the blocking function to chat with the model in a distributed manner
- async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs)
This is the async function to chat with the model in a distributed manner
- embedding(chats: Thread | List[str] | str, model: str = 'text-embedding-004', extra_headers: Dict[str, str] | None = None, token: str | None = None, timeout: Tuple[int, int] = (5, 60), raw: bool = False) EmbeddingGen
This is the blocking function to get embeddings for the chat
- async embedding_async(chats: Thread | List[str] | str, model: str = 'text-embedding-004', extra_headers: Dict[str, str] | None = None, token: str | None = None, timeout: Tuple[float, float] = (5.0, 60.0), raw: bool = False) EmbeddingGen
This is the async function to get embeddings for the chat
- set_api_token(token: str) None
This is used to set the API token for the model
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, timeout=(5, 60), usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False, **kwargs)
This is the blocking function to stream chat with the model where each token is iteratively generated
- async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = 4096, temperature: float = 1, token: str | None = None, raw: bool = False, debug: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 60), **kwargs)
This is the async function to stream chat with the model where each token is iteratively generated
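The streaming methods produce the reply piece by piece instead of returning it whole. The following is a minimal sketch of consuming such a stream. It makes no API call: fake_stream is a hypothetical stand-in for stream_chat, and the sketch assumes each yielded item is a string fragment, which the signatures above do not confirm.

```python
from typing import Iterable, Iterator, List


def fake_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming call: yields one word at a time."""
    for word in text.split(" "):
        yield word + " "


def collect(stream: Iterable[str]) -> str:
    """Consume a token stream incrementally and return the joined text."""
    parts: List[str] = []
    for token in stream:
        # A real caller could print or forward each token here as it arrives.
        parts.append(token)
    return "".join(parts).strip()


reply = collect(fake_stream("streamed one token at a time"))
```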
- tuneapi.apis.model_gemini.get_structured_schema(model: type[BaseModel]) Dict[str, Any]
Converts a Pydantic BaseModel to a JSON schema compatible with Gemini API, including anyOf for optional or union types and handling nested structures correctly.
- Parameters:
model – The Pydantic BaseModel class to convert.
- Returns:
A dictionary representing the JSON schema.
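This reference does not show the output of get_structured_schema, so the following is only a sketch of the general idea: walking type annotations and emitting anyOf for optional or union types while recursing into nested structures. It uses plain annotated classes from the standard library instead of Pydantic, and the helper names type_to_schema and class_to_schema are hypothetical. The schema Gemini actually accepts may differ.

```python
from typing import Any, Dict, List, Optional, Union, get_args, get_origin, get_type_hints

_PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def type_to_schema(tp: Any) -> Dict[str, Any]:
    """Map a type annotation to a JSON-schema fragment (illustrative subset)."""
    if tp in _PRIMITIVES:
        return {"type": _PRIMITIVES[tp]}
    if tp is type(None):
        return {"type": "null"}
    origin = get_origin(tp)
    if origin is Union:  # also covers Optional[X], which is Union[X, None]
        return {"anyOf": [type_to_schema(arg) for arg in get_args(tp)]}
    if origin is list:
        (item,) = get_args(tp)
        return {"type": "array", "items": type_to_schema(item)}
    if isinstance(tp, type) and hasattr(tp, "__annotations__"):
        return class_to_schema(tp)  # nested structure
    raise TypeError(f"unsupported annotation: {tp!r}")


def class_to_schema(cls: type) -> Dict[str, Any]:
    """Build an object schema from a class's annotated fields."""
    hints = get_type_hints(cls)
    return {
        "type": "object",
        "properties": {name: type_to_schema(tp) for name, tp in hints.items()},
        # A field is required when the class defines no default for it.
        "required": [name for name in hints if not hasattr(cls, name)],
    }


class Address:
    city: str
    zip_code: Optional[str] = None


class Person:
    name: str
    age: int
    tags: List[str]
    address: Optional[Address] = None


schema = class_to_schema(Person)
```

Here the optional address field becomes an anyOf over the nested Address object schema and a null type, which is the pattern the description above refers to.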
tuneapi.apis.model_openai module
Connect to the OpenAI API and use their LLMs.
- class tuneapi.apis.model_openai.Groq(id: str = 'llama3-70b-8192', base_url: str = 'https://api.groq.com/openai/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, **kwargs)
Bases:
OpenAIProtocol
A class to interact with Groq’s Large Language Models (LLMs) via their API. Note this class does not contain the embedding method.
- id
Identifier for the Groq model.
- Type:
str
- base_url
The base URL for the Groq API. Defaults to “https://api.groq.com/openai/v1/chat/completions”.
- Type:
str
- extra_headers
Additional headers to include in API requests.
- Type:
Optional[Dict[str, str]]
- api_token
API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.
- Type:
Optional[str]
Note
For more information, visit the Groq API documentation at https://console.groq.com/
- embedding(**k)
If you pass a list, the returned items are in insertion order
- class tuneapi.apis.model_openai.Mistral(id: str = 'mistral-small-latest', base_url: str = 'https://api.mistral.ai/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, **kwargs)
Bases:
OpenAIProtocol
A class to interact with Mistral’s Large Language Models (LLMs) via their API. Note this class does not contain the embedding method.
- id
Identifier for the Mistral model.
- Type:
str
- base_url
The base URL for the Mistral API. Defaults to “https://api.mistral.ai/v1/chat/completions”.
- Type:
str
- extra_headers
Additional headers to include in API requests.
- Type:
Optional[Dict[str, str]]
- api_token
API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.
- Type:
Optional[str]
- embedding(*a, **k)
Raises NotImplementedError as Mistral does not support embeddings.
Note
For more information, visit the Mistral API documentation at https://console.mistral.ai/
- embedding(**k)
If you pass a list, the returned items are in insertion order
- class tuneapi.apis.model_openai.Ollama(id: str, base_url: str = 'http://localhost:11434/v1/chat/completions', **kwargs)
Bases:
OpenAIProtocol
- class tuneapi.apis.model_openai.OpenAIProtocol(id: str, base_url: str, extra_headers: Dict[str, str] | None, api_token: str | None, emebdding_url: str | None, image_gen_url: str | None, audio_transcribe_url: str | None, audio_gen_url: str | None, batch_url: str | None, files_url: str | None)
Bases:
ModelInterface
- chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, timeout=(5, 60), parallel_tool_calls: bool = False, **kwargs) Any
This is the blocking function to chat with the model
- async chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, timeout=(5, 60), **kwargs) Any
This is the async function to chat with the model
- distributed_chat(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs)
This is the blocking function to chat with the model in a distributed manner
- async distributed_chat_async(prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar: bool = True, debug: bool = False, time_metrics: bool = False, **kwargs)
This is the async function to chat with the model in a distributed manner
- embedding(chats: Thread | List[str] | str, model: str = 'text-embedding-3-small', token: str | None = None, raw: bool = False, extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60)) EmbeddingGen
If you pass a list, the returned items are in insertion order
- async embedding_async(chats: Thread | List[str] | str, model: str = 'text-embedding-3-small', token: str | None = None, timeout: Tuple[int, int] = (10, 60), raw: bool = False, extra_headers: Dict[str, str] | None = None) EmbeddingGen
If you pass a list, the returned items are in insertion order
- get_batch(batch_id: str, custom_ids: List[str] | None = None, usage: bool = False, token: str | None = None, raw: bool = False, verbose: bool = False)
This is the blocking function to get the batch results
- image_gen(prompt: str, style: str = 'natural', model: str = 'dall-e-3', n: int = 1, size: str = '1024x1024', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) ImageGen
This is the blocking function to generate images
- async image_gen_async(prompt: str, style: str = 'natural', model: str = 'dall-e-3', n: int = 1, size: str = '1024x1024', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) ImageGen
This is the async function to generate images
- set_api_token(token: str) None
This is used to set the API token for the model
- speech_to_text(prompt: str, audio: str, model='whisper-1', timestamp_granularities=['segment'], token: str | None = None, timeout: Tuple[int, int] = (5, 300), **kwargs) Transcript
This is the blocking function to convert speech to text
- async speech_to_text_async(prompt: str, audio: str, model='whisper-1', timestamp_granularities=['segment'], token: str | None = None, timeout: Tuple[int, int] = (5, 300), **kwargs) Transcript
This is the async function to convert speech to text
- stream_chat(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, token: str | None = None, timeout=(5, 60), usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False, parallel_tool_calls: bool = False, **kwargs)
This is the blocking function to stream chat with the model where each token is iteratively generated
- async stream_chat_async(chats: Thread | str, model: str | None = None, max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, raw: bool = False, timeout=(5, 60), **kwargs)
This is the async function to stream chat with the model where each token is iteratively generated
- submit_batch(threads: List[Thread | str], model: str | None = None, endpoint: str = '/v1/chat/completions', max_tokens: int = None, temperature: float = 1, parallel_tool_calls: bool = False, token: str | None = None, usage: bool = False, extra_headers: Dict[str, str] | None = None, debug: bool = False, timeout=(5, 60), raw: bool = False, verbose: bool = False, **kwargs) Tuple[str, List[str]] | Dict
This is the blocking function to submit a batch of threads. It will return the batch_id and custom_ids for ordering the responses
- text_to_speech(prompt: str, voice: str = 'shimmer', model='tts-1', response_format='wav', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) bytes
- async text_to_speech_async(prompt: str, voice: str = 'shimmer', model='tts-1', response_format='wav', extra_headers: Dict[str, str] | None = None, timeout: Tuple[int, int] = (5, 60), **kwargs) bytes
- class tuneapi.apis.model_openai.Openai(id: str = 'gpt-4o', base_url: str = 'https://api.openai.com/v1/chat/completions', extra_headers: Dict[str, str] | None = None, api_token: str | None = None, emebdding_url: str | None = None, image_gen_url: str | None = None, audio_transcribe: str | None = None, audio_gen_url: str | None = None, batch_url: str | None = None, files_url: str | None = None)
Bases:
OpenAIProtocol
- class tuneapi.apis.model_openai.TuneModel(id: str = 'meta/llama-3.1-8b-instruct', base_url: str = 'https://proxy.tune.app/chat/completions', org_id: str | None = None, extra_headers: Dict[str, str] | None = None, api_token: str | None = None, **kwargs)
Bases:
OpenAIProtocol
A class to interact with Tune’s Large Language Models (LLMs) via their API.
- id
Identifier for the Tune model.
- Type:
str
- base_url
The base URL for the Tune API. Defaults to “https://proxy.tune.app/chat/completions”.
- Type:
str
- org_id
Organization ID for the Tune API.
- Type:
Optional[str]
- extra_headers
Additional headers to include in API requests.
- Type:
Optional[Dict[str, str]]
- api_token
API token for authenticating requests. If not provided, it will use the token from the environment variable MISTRAL_TOKEN.
- Type:
Optional[str]
Note
For more information, visit the Tune API documentation at https://tune.app/
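The api_token attributes above describe a fallback: an explicit token is used when given, otherwise an environment variable is consulted. The following is a minimal sketch of that lookup order. The helper resolve_token and the variable name EXAMPLE_TOKEN are hypothetical, and raising when no token is found is an assumption; this reference does not say what tuneapi does in that case.

```python
import os
from typing import Optional


def resolve_token(api_token: Optional[str], env_var: str) -> str:
    """Prefer an explicit token; otherwise read it from the environment.

    Hypothetical helper that mirrors the documented fallback order.
    Raising on a missing token is an assumption made for this sketch.
    """
    if api_token:
        return api_token
    token = os.environ.get(env_var, "")
    if not token:
        raise ValueError(f"no API token given and {env_var} is not set")
    return token


# EXAMPLE_TOKEN is a made-up variable name with a placeholder value.
os.environ["EXAMPLE_TOKEN"] = "token-from-env"
explicit = resolve_token("token-from-arg", "EXAMPLE_TOKEN")
fallback = resolve_token(None, "EXAMPLE_TOKEN")
```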
tuneapi.apis.turbo module
- tuneapi.apis.turbo.distributed_chat(model: ModelInterface, prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, usage: bool = False, time_metrics: bool = False, **kwargs) List | Tuple[List, Usage]
Distributes multiple chat prompts across a thread pool for parallel processing.
This function creates a pool of worker threads to process multiple chat prompts concurrently. It handles retry logic for failed requests and maintains the order of responses corresponding to the input prompts.
- Args:
- model (ModelInterface): The base model instance to clone for each worker thread. Each thread gets its own model instance to ensure thread safety.
- prompts (List[Thread]): A list of chat prompts to process. The order of responses will match the order of these prompts.
- post_logic (Optional[callable], default=None): A function to process each chat response before storing. If None, raw responses are stored. Function signature should be: f(chat_response) -> processed_response
- max_threads (int, default=10): Maximum number of concurrent worker threads. Adjust based on API rate limits and system capabilities.
- retry (int, default=3): Number of retry attempts for failed requests. Set to 0 to disable retries.
- pbar (bool, default=True): Whether to display a progress bar.
- debug (bool, default=False): Whether to log debug information.
- usage (bool, default=False): Whether to return usage statistics. If True, the function will return a tuple of (responses, usage) where usage is an instance of Usage.
- time_metrics (bool, default=False): Whether to return time metrics. If True, the function will return a tuple of (responses, time_metrics) where time_metrics is a list of time taken for each prompt.
- Returns:
- List[Any]: A list of responses or errors, maintaining the same order as input prompts. Successful responses will be either raw or processed (if post_logic provided). Failed requests (after retries) will contain the last error encountered.
- Raises:
- ValueError: If max_threads < 1 or retry < 0
- TypeError: If model is not an instance of ModelInterface
- Example:
>>> from tuneapi import ta, tt
>>> from tuneapi.apis.turbo import distributed_chat
>>> model = ta.Gemini()
>>> prompts = [
...     tt.Thread([tt.human("What is 2+2?")]),
...     tt.Thread([tt.human("What is Python?")])
... ]
>>> responses = distributed_chat(model, prompts, max_threads=5)
>>> for prompt, response in zip(prompts, responses):
...     print(f"Q: {prompt}\nA: {response}\n")
- Note:
Each worker thread gets its own model instance to prevent sharing state
Progress bar shows both initial processing and retries
The function maintains thread safety through message passing channels
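The behaviour described above (parallel workers, per-request retries, responses kept in prompt order, the last error stored for requests that never succeed) can be sketched with the standard library. This is not tuneapi's implementation: it swaps in concurrent.futures.ThreadPoolExecutor where the Note mentions per-thread model clones and message passing channels, and the names run_in_order and flaky_echo are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional


def run_in_order(
    call: Callable[[str], Any],
    prompts: List[str],
    post_logic: Optional[Callable[[Any], Any]] = None,
    max_threads: int = 10,
    retry: int = 3,
) -> List[Any]:
    """Run `call` on every prompt in parallel, returning results in input order."""
    if max_threads < 1 or retry < 0:
        raise ValueError("max_threads must be >= 1 and retry must be >= 0")

    def worker(prompt: str) -> Any:
        last_error: Optional[Exception] = None
        for _ in range(retry + 1):  # one initial attempt plus `retry` retries
            try:
                out = call(prompt)
                # An error in post_logic also counts as a failed attempt here.
                return post_logic(out) if post_logic else out
            except Exception as exc:
                last_error = exc
        return last_error  # after all attempts fail, store the last error

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # Executor.map yields results in input order, whatever the finish order.
        return list(pool.map(worker, prompts))


# Stand-in for a model call: "b" fails once, "bad" always fails.
attempts: Dict[str, int] = {}
lock = threading.Lock()


def flaky_echo(prompt: str) -> str:
    with lock:
        attempts[prompt] = attempts.get(prompt, 0) + 1
        first_try = attempts[prompt] == 1
    if prompt == "bad" or (prompt == "b" and first_try):
        raise RuntimeError(f"failed on {prompt}")
    return prompt.upper()


responses = run_in_order(
    flaky_echo, ["a", "b", "bad"], post_logic=lambda s: s + "!", max_threads=2, retry=1
)
```

With retry=1, the prompt that fails once recovers on its second attempt, while the prompt that always fails leaves its last exception in the corresponding slot of the result list.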
- async tuneapi.apis.turbo.distributed_chat_async(model: ModelInterface, prompts: List[Thread], post_logic: callable | None = None, max_threads: int = 10, retry: int = 3, pbar=True, debug=False, usage: bool = False, time_metrics: bool = False, **kwargs)