gperc Configurations

Configs

PerceiverConfig is the final config object that is fed to the model, but using it directly requires knowing exactly how your data and the architecture map onto its parameters. To make this easier, there are simpler configs that are more convenient in common cases. They are:

TextConfig: A config for text classification tasks.
ImageConfig: A config for image tasks; supports classification and segmentation.
AudioConfig: A config for the audio modality.
BinaryConfig: A config for the binary modality.

Discussion

At its core the model either processes signals (image, audio, time-series) or consumes discrete inputs (tokens) that get converted to signals using embeddings. This simplicity and abstraction has to be brought to the config as well; currently each use case has its own config. We can take inspiration from:

PEP 518, which introduced pyproject.toml, a TOML file for specifying build-system requirements

Documentation

class gperc.configs.PerceiverConfig(input_len: int = 64, input_dim: int = 8, latent_len: int = 4, latent_dim: int = 16, output_len: int = 1, output_dim: int = 10, ffw_latent: int = 32, ffw_output: int = 32, num_heads: int = 2, num_layers: int = 2, input_type: str = 'raw', input_num_tokens: Optional[int] = None, decoder_reduction: str = 'mean', decoder_residual: bool = False, decoder_projection: bool = True, n_classes: Optional[int] = None, pos_init_std: float = 0.02, dropout: float = 0.1, seed: int = 4, **kwargs)

Bases: object

Since the Perceiver is such a powerful and versatile model, it needs a good config. For different applications we simply define different configurations and wrap them in a model-registry-like structure. The config has many attributes, and the user must understand what each one does.

I highly recommend reading examples before you start working with this.

Parameters

input_len (int, optional) – (m) The length of the input space
input_dim (int, optional) – (c) The dimension of the input space
latent_len (int, optional) – (n) The length of the latent space
latent_dim (int, optional) – (d) The dimension of the latent space
output_len (int, optional) – (o) The length of the output space
output_dim (int, optional) – (e) The dimension of the output space
ffw_latent (int, optional) – The dimension of the latent space in the feed-forward
ffw_output (int, optional) – The dimension of the output space in the feed-forward
num_heads (int, optional) – The number of heads in the multi-head attention
num_layers (int, optional) – The number of layers in the encoder and decoder
input_type (str, optional) – The type of the input space, either 'raw' or 'tokens'
input_num_tokens (int, optional) – The number of tokens, used when input_type == 'tokens'
decoder_reduction (str, optional) – How the output is reduced after the decoder; one of "mean", "max", "sum", "min", "last", "first", None
decoder_residual (bool, optional) – Whether output_array is combined with latent_array
decoder_projection (bool, optional) – Whether to apply a projection on output_array
n_classes (int, optional) – The number of classes in the classification task, must be set if decoder_projection == True
pos_init_std (float, optional) – The standard deviation of the position encoding
dropout (float, optional) – The dropout rate
seed (int, optional) – The seed for the random number generator
**kwargs – Any other arguments to be stored in the config
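
The letters in parentheses above (m, c, n, d, o, e) name the length and dimension of the three arrays the model works with. The sketch below is only a bookkeeping aid: ShapeSketch and reduce_output are hypothetical helpers, not part of gperc, and the choice to reduce along the length axis is an assumption about what decoder_reduction does.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ShapeSketch:
    """Hypothetical stand-in mirroring a few documented defaults; not gperc's class."""

    input_len: int = 64   # m
    input_dim: int = 8    # c
    latent_len: int = 4   # n
    latent_dim: int = 16  # d
    output_len: int = 1   # o
    output_dim: int = 10  # e

    def array_shapes(self) -> Dict[str, Tuple[int, int]]:
        # Per-sample shapes (no batch axis), written as (length, dimension).
        return {
            "input_array": (self.input_len, self.input_dim),
            "latent_array": (self.latent_len, self.latent_dim),
            "output_array": (self.output_len, self.output_dim),
        }


# Column-wise reducers for the options that aggregate over the length axis.
_REDUCERS = {
    "mean": lambda col: sum(col) / len(col),
    "max": max,
    "sum": sum,
    "min": min,
}


def reduce_output(rows: List[List[float]], how: Optional[str]):
    """Reduce a (length, dimension) list of rows along the length axis (assumed axis)."""
    if how is None:
        return rows          # no reduction: keep every position
    if how == "first":
        return rows[0]       # keep only the first position
    if how == "last":
        return rows[-1]      # keep only the last position
    if how not in _REDUCERS:
        raise ValueError(f"unknown reduction: {how!r}")
    return [_REDUCERS[how](col) for col in zip(*rows)]
```

With the defaults, the input array is 64 x 8, the latent array is 4 x 16 and the output array is 1 x 10, so any reduction over a length-1 output leaves the values unchanged.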

get_dict()

to_json(path=None)

from_json(path)
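
The three methods above are listed without descriptions, so their exact behaviour is not stated here. As a rough illustration of what a dict and JSON round trip for a config object can look like, the sketch below uses a hypothetical ConfigSketch class built only on the standard library; the return values and the in-place loading in from_json are assumptions, and gperc's own methods may behave differently.

```python
import json
import os
import tempfile


class ConfigSketch:
    """Hypothetical stand-in; gperc's own methods may behave differently."""

    def __init__(self, **kwargs):
        # Store every keyword argument as an attribute, like a plain config bag.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def get_dict(self):
        # Copy of the stored attributes as a plain dictionary.
        return dict(vars(self))

    def to_json(self, path=None):
        # Serialise to a JSON string; also write it to disk when a path is given.
        text = json.dumps(self.get_dict(), indent=2)
        if path is not None:
            with open(path, "w") as f:
                f.write(text)
        return text

    def from_json(self, path):
        # Load attributes from a JSON file into this object (assumed behaviour).
        with open(path) as f:
            for key, value in json.load(f).items():
                setattr(self, key, value)
        return self


def round_trip(config):
    """Write a config to a temporary file and read it back as a dictionary."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.json")
        config.to_json(path)
        return ConfigSketch().from_json(path).get_dict()
```

Values such as None, strings, numbers and booleans survive this round trip unchanged; tuples would come back as lists, which matters for fields like image_shape.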

class gperc.configs.TextConfig(latent_dim, vocab_size, max_len, latent_frac=0.25, ffw_ratio=1.0, **kwargs)

Bases: gperc.configs.PerceiverConfig

Config class specialized for the text modality

Parameters

latent_dim (int) – The dimension of the latent space
vocab_size (int) – The size of the vocabulary
max_len (int) – The maximum length of the input sequence
latent_frac (float, optional) – latent_len is set to latent_frac x max_len
ffw_ratio (float, optional) – The ratio of the feed-forward layer in Block to input dimension
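
To make the two ratio parameters concrete, here is a small sketch of the arithmetic they imply. The helper names are hypothetical, and the rounding (truncation with a floor of one latent) and the choice of which dimension ffw_ratio multiplies are assumptions; the description above only says that latent_len is latent_frac times max_len and that ffw_ratio relates the feed-forward layer to the input dimension of the Block.

```python
def derived_latent_len(max_len: int, latent_frac: float = 0.25) -> int:
    """latent_len = latent_frac x max_len.

    Assumption: truncate to an integer and keep at least one latent.
    """
    return max(1, int(latent_frac * max_len))


def derived_ffw_width(block_input_dim: int, ffw_ratio: float = 1.0) -> int:
    """Feed-forward width as a multiple of the Block's input dimension.

    Assumption: the width is ffw_ratio x block_input_dim, truncated to an integer.
    """
    return max(1, int(ffw_ratio * block_input_dim))
```

For example, a maximum sequence length of 512 with the default latent_frac of 0.25 gives 128 latents under these assumptions.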

class gperc.configs.ImageConfig(image_shape: Tuple, latent_len: int, latent_dim: int, n_classes: int, decoder_reduction: str = 'mean', ffw_ratio: float = 1.0, task: str = 'classification', **kwargs)

Bases: gperc.configs.PerceiverConfig

Config class specialized for the image modality

Parameters

image_shape (Tuple) – The shape of the image in [H, W, C]
latent_len (int) – The length of the latent space
latent_dim (int) – The dimension of the latent space
n_classes (int) – The number of classes after the output space
decoder_reduction (str, optional) – Read more in the PerceiverConfig documentation above
ffw_ratio (float, optional) – The ratio of the feed-forward layer in Block to input dimension
task (str, optional) – The task to be performed, either classification or segmentation
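
One plausible way to relate image_shape to the PerceiverConfig parameters is to treat every pixel as one input position and the channels as its features. The sketch below follows that reading; image_to_perceiver_dims is a hypothetical helper, and both the flattening and the per-pixel output for segmentation are assumptions rather than documented behaviour of ImageConfig.

```python
from typing import Dict, Sequence


def image_to_perceiver_dims(
    image_shape: Sequence[int],
    n_classes: int,
    task: str = "classification",
) -> Dict[str, int]:
    """Map an [H, W, C] image shape to Perceiver-style lengths and dimensions."""
    if task not in ("classification", "segmentation"):
        raise ValueError(f"unsupported task: {task!r}")
    height, width, channels = image_shape
    num_pixels = height * width
    return {
        # Assumption: flatten the pixel grid into one sequence of length H * W.
        "input_len": num_pixels,
        # Assumption: each position carries the C channel values.
        "input_dim": channels,
        # Assumption: one output per pixel for segmentation, one per image otherwise.
        "output_len": num_pixels if task == "segmentation" else 1,
        "n_classes": n_classes,
    }
```

Under this reading a 32 x 32 RGB image becomes an input array of 1024 positions with 3 features each.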

class gperc.configs.AudioConfig(sample_rate: int, duration: int, hop_length: int, num_mfcc: int, num_segments: int, num_channels: int, latent_len: int, latent_dim: int, n_classes: int, **kwargs)

Bases: gperc.configs.PerceiverConfig

Config class specialized for the audio modality

Parameters

sample_rate (int) – Sampling Rate of the audio in Hertz
duration (int) – Duration of the audio in seconds
hop_length (int) – Hop-length of sliding window for FFT in number of samples
num_mfcc (int) – The number of MFCC (Mel-frequency cepstral coefficients) values considered
num_segments (int) – The number of segments the audio is divided into
num_channels (int) – The number of channels in the audio sample (mono or stereo)
latent_len (int) – The length of the latent space
latent_dim (int) – The dimension of the latent space
n_classes (int) – The number of classes after the output space
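
The audio parameters above are the usual ingredients of an MFCC feature grid. As a rough sketch of how they interact, the hypothetical helper below computes how many MFCC frames one segment yields. It uses the common convention of one frame per hop, rounded up; how AudioConfig itself turns these numbers into input_len and input_dim is not stated here, so treat the mapping as an assumption.

```python
import math
from typing import Tuple


def mfcc_grid_per_segment(
    sample_rate: int,
    duration: int,
    hop_length: int,
    num_mfcc: int,
    num_segments: int,
) -> Tuple[int, int]:
    """Return (frames_per_segment, num_mfcc) for one audio segment."""
    # Total samples in the clip, split evenly across the segments.
    samples_per_segment = (sample_rate * duration) // num_segments
    # Assumption: one MFCC frame per hop, rounding up for the final partial hop.
    frames_per_segment = math.ceil(samples_per_segment / hop_length)
    return frames_per_segment, num_mfcc
```

For a 30 second clip sampled at 22050 Hz, split into 10 segments with a hop length of 512 and 13 coefficients, each segment yields a 130 x 13 grid.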

class gperc.configs.BinaryConfig(seqlen, vocab_size, latent_dim, latent_frac=0.1, n_classes=None, ffw_ratio=1.0, task='classification', **kwargs)

Bases: gperc.configs.PerceiverConfig

This is the config format for the binary modality

Parameters

seqlen (int) – The length of the sequence (input_array)
vocab_size (int) – The size of the vocabulary
latent_dim (int) – The dimension of the latent space
latent_frac (float, optional) – latent_len = latent_frac x seqlen
n_classes (int, optional) – The number of classes after the output space
ffw_ratio (float, optional) – The ratio of the feed-forward layer in Block to input dimension
task (str, optional) – The task to be performed, either classification or None