gperc Configurations

Configs

PerceiverConfig is the final config object that is fed to the model, but using it directly requires knowing every relevant detail of the data and the architecture. For convenience, there are simpler configs that cover common cases. They are:

  • TextConfig: A config that is used for text classification tasks.

  • ImageConfig: A config that is used for image tasks, supports classification and segmentation.

  • AudioConfig: A config that is used for audio tasks.

  • BinaryConfig: A config that is used for the binary modality.

Discussion

At its core, the model either processes signals (image, audio, time-series) or consumes discrete inputs (tokens) that get converted to signals using embeddings. This simplicity and abstraction should carry over to the config as well; currently each use case has its own config. We can take inspiration from:

  1. PEP 518, which introduces pyproject.toml and uses TOML as its format
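
The token-to-signal conversion mentioned in this discussion can be sketched in plain Python. This is a minimal illustration, not gperc's implementation: the helper names make_embedding_table and embed are hypothetical, and the table size, standard deviation and seed are arbitrary values chosen for the example.

```python
import random

def make_embedding_table(num_tokens, dim, std=0.02, seed=4):
    """Build a (num_tokens x dim) table of small random vectors.

    Hypothetical helper: std and seed are arbitrary illustration values.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(num_tokens)]

def embed(token_ids, table):
    """Turn a sequence of token ids into a signal of shape (len(token_ids), dim)."""
    return [table[t] for t in token_ids]

table = make_embedding_table(num_tokens=100, dim=8)
signal = embed([5, 17, 17, 42], table)  # 4 tokens become 4 vectors of length 8
```

After this step a token sequence has the same (length, dimension) layout as a raw signal, which is why both input kinds can share one model.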

Documentation

class gperc.configs.PerceiverConfig(input_len: int = 64, input_dim: int = 8, latent_len: int = 4, latent_dim: int = 16, output_len: int = 1, output_dim: int = 10, ffw_latent: int = 32, ffw_output: int = 32, num_heads: int = 2, num_layers: int = 2, input_type: str = 'raw', input_num_tokens: Optional[int] = None, decoder_reduction: str = 'mean', decoder_residual: bool = False, decoder_projection: bool = True, n_classes: Optional[int] = None, pos_init_std: float = 0.02, dropout: float = 0.1, seed: int = 4, **kwargs)

Bases: object

Since the Perceiver is such a powerful and versatile model, it needs a good config. For different applications, we simply define different configurations and wrap them in something like a model registry. The config has many attributes, and the user must understand what each of them does.

I highly recommend reading the examples before you start working with this.

Parameters
  • input_len (int, optional) – (m) The length of the input space

  • input_dim (int, optional) – (c) The dimension of the input space

  • latent_len (int, optional) – (n) The length of the latent space

  • latent_dim (int, optional) – (d) The dimension of the latent space

  • output_len (int, optional) – (o) The length of the output space

  • output_dim (int, optional) – (e) The dimension of the output space

  • ffw_latent (int, optional) – The dimension of the latent space in the feed-forward layer

  • ffw_output (int, optional) – The dimension of the output space in the feed-forward layer

  • num_heads (int, optional) – The number of heads in the multi-head attention

  • num_layers (int, optional) – The number of layers in the encoder and decoder

  • input_type (str, optional) – The type of the input space. Can be either raw or tokens

  • input_num_tokens (int, optional) – The number of tokens, used when input_type == 'tokens'

  • decoder_reduction (str, optional) – How the output should be reduced after the decoder; one of "mean", "max", "sum", "min", "last", "first", None

  • decoder_residual (bool, optional) – Whether output_array is combined with latent_array

  • decoder_projection (bool, optional) – Whether to apply a projection on output_array

  • n_classes (int, optional) – The number of classes in the classification task, must be set if decoder_projection == True

  • pos_init_std (float, optional) – The standard deviation of the position encoding

  • dropout (float, optional) – The dropout rate

  • seed (int, optional) – The seed for the random number generator

  • **kwargs – Any other arguments to be stored in the config
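
Several of the parameters above depend on each other. The following is a minimal sketch of those dependencies as stated in this list, using a hypothetical stand-in class (ConfigSketch) and a hypothetical helper (check_config); it is not gperc's own validation logic, and whether gperc raises, warns or ignores these cases is not covered here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values listed for decoder_reduction in the parameter list above.
ALLOWED_REDUCTIONS = ("mean", "max", "sum", "min", "last", "first", None)

@dataclass
class ConfigSketch:
    """Hypothetical stand-in holding only the interdependent fields."""
    input_type: str = "raw"
    input_num_tokens: Optional[int] = None
    decoder_reduction: Optional[str] = "mean"
    decoder_projection: bool = True
    n_classes: Optional[int] = None

def check_config(cfg: ConfigSketch) -> List[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    if cfg.input_type not in ("raw", "tokens"):
        problems.append("input_type must be 'raw' or 'tokens'")
    if cfg.input_type == "tokens" and cfg.input_num_tokens is None:
        problems.append("input_num_tokens is needed when input_type == 'tokens'")
    if cfg.decoder_reduction not in ALLOWED_REDUCTIONS:
        problems.append("decoder_reduction is not one of the listed options")
    if cfg.decoder_projection and cfg.n_classes is None:
        problems.append("n_classes must be set when decoder_projection is True")
    return problems

ok = check_config(ConfigSketch(n_classes=10))
bad = check_config(ConfigSketch(input_type="tokens", n_classes=10))
```

Note that the documented defaults leave n_classes as None while decoder_projection is True, so under this reading n_classes has to be passed explicitly.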

get_dict()

to_json(path=None)

from_json(path)
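
The three methods above suggest a save and restore round trip. The sketch below imitates that pattern with a hypothetical stand-in class so it runs without gperc installed. The method names and arguments follow the signatures listed here, but the bodies are assumptions: in particular, returning the JSON string from to_json and making from_json a classmethod are guesses, not documented behaviour.

```python
import json
import os
import tempfile

class ConfigSketch:
    """Hypothetical stand-in; not gperc's PerceiverConfig."""

    def __init__(self, **kwargs):
        # Store every keyword argument as an attribute.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def get_dict(self):
        # Copy of all stored attributes.
        return dict(vars(self))

    def to_json(self, path=None):
        # Assumption: always return the JSON text, and also write it if a path is given.
        text = json.dumps(self.get_dict(), indent=2)
        if path is not None:
            with open(path, "w") as f:
                f.write(text)
        return text

    @classmethod
    def from_json(cls, path):
        # Assumption: rebuild the config from the keys stored in the file.
        with open(path) as f:
            return cls(**json.load(f))

original = ConfigSketch(latent_len=4, latent_dim=16, decoder_reduction="mean")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    original.to_json(path)
    restored = ConfigSketch.from_json(path)
```
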
class gperc.configs.TextConfig(latent_dim, vocab_size, max_len, latent_frac=0.25, ffw_ratio=1.0, **kwargs)

Bases: gperc.configs.PerceiverConfig

Config class specialized for the text modality

Parameters
  • latent_dim (int) – The dimension of the latent space

  • vocab_size (int) – The size of the vocabulary

  • max_len (int) – The maximum length of the input sequence

  • latent_frac (float, optional) – latent_len is latent_frac multiplied by max_len

  • ffw_ratio (float, optional) – The ratio of the feed-forward layer dimension in Block to the input dimension
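
The relationship between latent_frac and latent_len can be worked through with a small helper. This is a sketch under assumptions: the function name is hypothetical, and truncating to an integer with a floor of one latent is a guess, since the rounding rule is not stated here.

```python
def latent_len_from_frac(max_len: int, latent_frac: float = 0.25) -> int:
    """latent_len = latent_frac * max_len, as described for TextConfig.

    Assumption: the product is truncated to an int and kept at 1 or above.
    """
    return max(1, int(latent_frac * max_len))

default_latents = latent_len_from_frac(512)       # 0.25 * 512
smaller_latents = latent_len_from_frac(512, 0.1)  # 0.1 * 512, truncated
```

A smaller latent_frac shrinks the latent array relative to the input sequence, trading capacity for compute.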

class gperc.configs.ImageConfig(image_shape: Tuple, latent_len: int, latent_dim: int, n_classes: int, decoder_reduction: str = 'mean', ffw_ratio: float = 1.0, task: str = 'classification', **kwargs)

Bases: gperc.configs.PerceiverConfig

Config class specialized for the image modality

Parameters
  • image_shape (Tuple) – The shape of the image in [H, W, C] order

  • latent_len (int) – The length of the latent space

  • latent_dim (int) – The dimension of the latent space

  • n_classes (int) – The number of classes after the output space

  • decoder_reduction (str, optional) – Read more in the PerceiverConfig documentation above

  • ffw_ratio (float, optional) – The ratio of the feed-forward layer dimension in Block to the input dimension

  • task (str, optional) – The task to be performed, either classification or segmentation
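
One plausible way an [H, W, C] image becomes a signal is to treat every pixel as one position with C features. The sketch below assumes exactly that mapping (input_len = H * W, input_dim = C); the source does not confirm that ImageConfig derives its fields this way, and both helper names are hypothetical.

```python
def image_to_signal_shape(image_shape):
    """Map [H, W, C] to (length, dim), assuming one position per pixel."""
    h, w, c = image_shape
    return h * w, c

def flatten_image(image):
    """Flatten a nested [H][W][C] list into a list of H*W pixels, row by row."""
    return [pixel for row in image for pixel in row]

signal_shape = image_to_signal_shape((32, 32, 3))

# A tiny 2 x 3 image with a single channel.
tiny = [[[0], [1], [2]],
        [[3], [4], [5]]]
pixels = flatten_image(tiny)
```
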

class gperc.configs.AudioConfig(sample_rate: int, duration: int, hop_length: int, num_mfcc: int, num_segments: int, num_channels: int, latent_len: int, latent_dim: int, n_classes: int, **kwargs)

Bases: gperc.configs.PerceiverConfig

Config class specialized for the audio modality

Parameters
  • sample_rate (int) – Sampling Rate of the audio in Hertz

  • duration (int) – Duration of the audio in seconds

  • hop_length (int) – Hop-length of sliding window for FFT in number of samples

  • num_mfcc (int) – The number of MFCC (Mel-frequency cepstral coefficients) values considered

  • num_segments (int) – The number of segments the audio is divided into

  • num_channels (int) – The number of channels in the audio sample (mono or stereo)

  • latent_len (int) – The length of the latent space

  • latent_dim (int) – The dimension of the latent space

  • n_classes (int) – The number of classes after the output space
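
The audio parameters above match a common MFCC preprocessing recipe, in which a clip is split into equal segments and each segment yields one MFCC vector per hop. The following sketch computes the resulting per-segment shape under that recipe. It is an assumption that AudioConfig uses the same arithmetic; the function name is hypothetical, and num_channels is left out because its role is not described here.

```python
import math

def mfcc_segment_shape(sample_rate, duration, hop_length, num_mfcc, num_segments):
    """Return (frames, num_mfcc) for one segment of a clip.

    Assumed recipe: equal-length segments, one MFCC frame per hop,
    with the final partial hop rounded up.
    """
    total_samples = sample_rate * duration
    samples_per_segment = total_samples // num_segments
    frames_per_segment = math.ceil(samples_per_segment / hop_length)
    return frames_per_segment, num_mfcc

# 30 s clip at 22050 Hz, 10 segments, hop of 512 samples, 13 coefficients.
shape = mfcc_segment_shape(22050, 30, 512, 13, 10)
```

Under this reading, the first number plays the role of the input length and the second the input dimension.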

class gperc.configs.BinaryConfig(seqlen, vocab_size, latent_dim, latent_frac=0.1, n_classes=None, ffw_ratio=1.0, task='classification', **kwargs)

Bases: gperc.configs.PerceiverConfig

This is the config format for the binary modality

Parameters
  • seqlen (int) – The length of the sequence (input_array)

  • vocab_size (int) – The size of the vocabulary

  • latent_dim (int) – The dimension of the latent space

  • latent_frac (float, optional) – latent_len = latent_frac x seqlen

  • n_classes (int, optional) – The number of classes after the output space

  • ffw_ratio (float, optional) – The ratio of the feed-forward layer dimension in Block to the input dimension

  • task (str, optional) – The task to be performed, either classification or None
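
If "binary" is read as raw bytes, each byte can serve as a token id and seqlen becomes a fixed window over the file. The sketch below illustrates that reading only; it is an assumption about the modality, and the helper name, the padding id and the resulting vocabulary size of 257 are all hypothetical choices for the example.

```python
PAD_ID = 256          # hypothetical padding id, one past the largest byte value
VOCAB_SIZE = 257      # 256 byte values plus the padding id (assumption)

def bytes_to_tokens(data: bytes, seqlen: int, pad_id: int = PAD_ID):
    """Truncate or right-pad raw bytes to exactly seqlen token ids."""
    tokens = list(data[:seqlen])          # iterating bytes yields ints in 0..255
    tokens += [pad_id] * (seqlen - len(tokens))
    return tokens

tokens = bytes_to_tokens(b"\x7fELF", seqlen=8)
latent_len = max(1, int(0.1 * 8192))      # latent_frac x seqlen with the default 0.1
```
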