gperc Configurations
Configs
PerceiverConfig is the final config object that is fed to the model, but using it directly requires knowing the exact details of both the data and the architecture. For this reason, there are some simpler configs that are more convenient to use in some cases. They are:
- TextConfig: A config that is used for text classification tasks.
- ImageConfig: A config that is used for image tasks; supports classification and segmentation.
Discussion
At its core, the model either processes signals (image, audio, time-series) or consumes discrete inputs (tokens) that get converted to signals using embeddings. This simplicity and abstraction should be carried over to the config as well; currently each use case has its own config. We can take inspiration from:
- PEP-518, which talks about using TOML for pyproject.toml
Documentation
- class gperc.configs.PerceiverConfig(input_len: int = 64, input_dim: int = 8, latent_len: int = 4, latent_dim: int = 16, output_len: int = 1, output_dim: int = 10, ffw_latent: int = 32, ffw_output: int = 32, num_heads: int = 2, num_layers: int = 2, input_type: str = 'raw', input_num_tokens: Optional[int] = None, decoder_reduction: str = 'mean', decoder_residual: bool = False, decoder_projection: bool = True, n_classes: Optional[int] = None, pos_init_std: float = 0.02, dropout: float = 0.1, seed: int = 4, **kwargs)
Bases: object
Since the Perceiver is such a powerful and versatile model, it needs a good config. For different applications we simply define different configurations and wrap them in a model registry of some kind. There are many attributes in the config, and the user must understand what each of them does.
I highly recommend reading the examples before you start working with this.
- Parameters
input_len (int, optional) – (m) The length of the input space
input_dim (int, optional) – (c) The dimension of the input space
latent_len (int, optional) – (n) The length of the latent space
latent_dim (int, optional) – (d) The dimension of the latent space
output_len (int, optional) – (o) The length of the output space
output_dim (int, optional) – (e) The dimension of the output space
ffw_latent (int, optional) – The dimension of the latent space in the feed-forward
ffw_output (int, optional) – The dimension of the output space in the feed-forward
num_heads (int, optional) – The number of heads in the multi-head attention
num_layers (int, optional) – The number of layers in the encoder and decoder
input_type (str, optional) – The type of the input space. Can be either raw or tokens
input_num_tokens (int, optional) – If input_type == 'tokens', the number of tokens
decoder_reduction (str, optional) – How the output should be reduced after the decoder; should be one of "mean", "max", "sum", "min", "last", "first", None
decoder_residual (bool, optional) – Whether output_array combines with latent_array
decoder_projection (bool, optional) – Whether to apply a projection on output_array
n_classes (int, optional) – The number of classes in the classification task, must be set if decoder_projection == True
pos_init_std (float, optional) – The standard deviation of the position encoding
dropout (float, optional) – The dropout rate
seed (int, optional) – The seed for the random number generator
**kwargs – Any other arguments to be stored in the config
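To make the decoder_reduction options concrete, here is a minimal pure-Python sketch of what each value could mean for an output_array of shape (output_len, output_dim). It assumes the reduction runs along the length axis (o), which the parameter description suggests but does not state. `reduce_output` is a hypothetical helper, not a gperc function.

```python
from statistics import fmean
from typing import List, Optional, Sequence, Union

Row = List[float]


def reduce_output(output_array: Sequence[Sequence[float]],
                  how: Optional[str]) -> Union[Row, List[Row]]:
    """Collapse an (output_len, output_dim) array along its length axis."""
    if how is None:  # no reduction: keep every output position
        return [list(row) for row in output_array]
    if how == "first":
        return list(output_array[0])
    if how == "last":
        return list(output_array[-1])
    reducers = {"mean": fmean, "max": max, "min": min, "sum": sum}
    if how not in reducers:
        raise ValueError(f"unsupported decoder_reduction: {how!r}")
    # zip(*rows) transposes the array into one tuple per output dimension.
    return [reducers[how](column) for column in zip(*output_array)]


output_array = [[1.0, 4.0], [3.0, 2.0]]  # output_len = 2, output_dim = 2
reduced = {how: reduce_output(output_array, how)
           for how in ("mean", "max", "sum", "min", "last", "first", None)}
```

Under this assumption every option except None turns the (o, e) array into a single vector of length e.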
- class gperc.configs.TextConfig(latent_dim, vocab_size, max_len, latent_frac=0.25, ffw_ratio=1.0, **kwargs)
Bases: gperc.configs.PerceiverConfig
Config class specialized for the text modality
- Parameters
latent_dim (int) – The dimension of the latent space
vocab_size (int) – The size of the vocabulary
max_len (int) – The maximum length of the input sequence
latent_frac (float, optional) – latent_len is computed as latent_frac multiplied by max_len
ffw_ratio (float, optional) – The ratio of the feed-forward layer in Block to input dimension
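The relationship between the TextConfig arguments and the base config fields can be sketched as follows. This is a guess at the mapping based on the parameter descriptions, not the library's code: it assumes max_len becomes input_len, vocab_size becomes input_num_tokens, and the product of latent_frac and max_len is truncated to an integer. `text_to_perceiver_fields` is a hypothetical helper.

```python
def text_to_perceiver_fields(latent_dim: int, vocab_size: int, max_len: int,
                             latent_frac: float = 0.25) -> dict:
    """Derive base-config fields from text-specific arguments (assumed mapping)."""
    return {
        "input_type": "tokens",          # text arrives as discrete token ids
        "input_num_tokens": vocab_size,  # assumption: size of the embedding table
        "input_len": max_len,            # assumption: one position per token
        "latent_dim": latent_dim,
        "latent_len": int(latent_frac * max_len),  # assumption: truncated to int
    }


fields = text_to_perceiver_fields(latent_dim=64, vocab_size=30000, max_len=512)
```

With the default latent_frac of 0.25, a 512-token input would be attended to by 128 latent positions under these assumptions.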
- class gperc.configs.ImageConfig(image_shape: Tuple, latent_len: int, latent_dim: int, n_classes: int, decoder_reduction: str = 'mean', ffw_ratio: float = 1.0, task: str = 'classification', **kwargs)
Bases: gperc.configs.PerceiverConfig
Config class specialized for the image modality
- Parameters
image_shape (Tuple) – The shape of the image in [H, W, C]
latent_len (int) – The length of the latent space
latent_dim (int) – The dimension of the latent space
n_classes (int) – The number of classes after the output space
decoder_reduction (str, optional) – Read more in the PerceiverConfig documentation above
ffw_ratio (float, optional) – The ratio of the feed-forward layer in Block to input dimension
task (str, optional) – The task to be performed, either classification or segmentation
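Because the model consumes a sequence of vectors, an [H, W, C] image has to be flattened before it reaches the encoder. A minimal sketch, assuming each pixel becomes one input position with its C channels as features (so input_len = H * W and input_dim = C). Both helpers are hypothetical, and the actual ImageConfig may derive these values differently.

```python
from typing import List, Sequence, Tuple


def sequence_shape(image_shape: Tuple[int, int, int]) -> Tuple[int, int]:
    """Return (input_len, input_dim) for an [H, W, C] image (assumed mapping)."""
    height, width, channels = image_shape
    return height * width, channels


def image_to_sequence(image: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Flatten a nested [H][W][C] image row by row into [H * W][C]."""
    return [list(pixel) for row in image for pixel in row]


tiny = [[[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]],
        [[0.6, 0.7, 0.8], [0.9, 1.0, 1.1]]]  # H = 2, W = 2, C = 3
seq = image_to_sequence(tiny)
input_len, input_dim = sequence_shape((32, 32, 3))
```

For a 32 x 32 RGB image this gives an input of 1024 positions with 3 features each.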
- class gperc.configs.AudioConfig(sample_rate: int, duration: int, hop_length: int, num_mfcc: int, num_segments: int, num_channels: int, latent_len: int, latent_dim: int, n_classes: int, **kwargs)
Bases: gperc.configs.PerceiverConfig
Config class specialized for the audio modality
- Parameters
sample_rate (int) – Sampling Rate of the audio in Hertz
duration (int) – Duration of the audio in seconds
hop_length (int) – Hop-length of sliding window for FFT in number of samples
num_mfcc (int) – The number of MFCC (Mel-frequency cepstral coefficients) values considered
num_segments (int) – The number of segments the audio is divided into
num_channels (int) – The number of channels in the audio sample (mono or stereo)
latent_len (int) – The length of the latent space
latent_dim (int) – The dimension of the latent space
n_classes (int) – The number of classes after the output space
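The audio arguments interact through some simple arithmetic. The sketch below follows a common MFCC preprocessing recipe and is an assumption about how these values relate, not a description of gperc internals: the clip is split into num_segments equal parts, and each part yields one MFCC vector per hop. `mfcc_frames_per_segment` is a hypothetical helper.

```python
import math


def mfcc_frames_per_segment(sample_rate: int, duration: int,
                            hop_length: int, num_segments: int) -> int:
    """Number of MFCC vectors produced for one segment of the clip."""
    total_samples = sample_rate * duration          # samples in the whole clip
    samples_per_segment = total_samples // num_segments
    # One frame per hop; the last, partially filled hop still counts.
    return math.ceil(samples_per_segment / hop_length)


frames = mfcc_frames_per_segment(
    sample_rate=22050, duration=30, hop_length=512, num_segments=10
)
```

Under these assumptions a 30-second clip sampled at 22050 Hz and split into 10 segments gives 130 frames per segment, each holding num_mfcc coefficients.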
- class gperc.configs.BinaryConfig(seqlen, vocab_size, latent_dim, latent_frac=0.1, n_classes=None, ffw_ratio=1.0, task='classification', **kwargs)
Bases: gperc.configs.PerceiverConfig
This is the config format for the binary modality
- Parameters
seqlen (int) – The length of the sequence (input_array)
vocab_size (int) – The size of the vocabulary
latent_dim (int) – The dimension of the latent space
latent_frac (float, optional) – latent_len = latent_frac x seqlen
n_classes (int, optional) – The number of classes after the output space
ffw_ratio (float, optional) – The ratio of the feed-forward layer in Block to input dimension
task (str, optional) – The task to be performed, either classification or None
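As an illustration of the binary modality, raw bytes can be treated as tokens. This sketch assumes a vocabulary of the 256 byte values plus one padding id (so vocab_size would be 257), which is an illustrative choice rather than anything gperc prescribes. `bytes_to_tokens` and `PAD_ID` are hypothetical. The latent_len line applies the formula given for latent_frac, truncating to an integer as a further assumption.

```python
from typing import List

PAD_ID = 256  # hypothetical padding id, one past the largest byte value


def bytes_to_tokens(data: bytes, seqlen: int) -> List[int]:
    """Truncate or right-pad raw bytes to exactly seqlen token ids."""
    ids = list(data[:seqlen])  # iterating over bytes yields ints in 0..255
    return ids + [PAD_ID] * (seqlen - len(ids))


seqlen = 20
tokens = bytes_to_tokens(b"gperc", seqlen)
latent_len = int(0.1 * seqlen)  # latent_len = latent_frac x seqlen
```

The token list always has length seqlen, so it matches the fixed input_array length that the config expects.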