gperc Architecture

Perceiver Model

This file contains the neural network code for the Perceiver architecture. gperc.models.Perceiver sits at the heart of this project. Use Perceiver for everyday use of the model; when you want to train really large models with model parallelism, read here.


gperc can handle distributed model-parallel training out of the box with get_distributed_model(). During distributed training and inference with torch.distributed.pipeline.sync.Pipe (read tutorial), the input has to be an nn.Sequential object.


gperc.models.build_position_encoding(position_encoding_type, config, num_index_items, emb_dim)[source]

Get the positional encoding matrix. If position_encoding_type == "trainable" then a random normal matrix is returned, if it is “sinusoid” then

  • position_encoding_type (str) – type of embedding, should be one of “trainable”, “sinusoid”

  • config (gperc.PerceiverConfig) – configuration object

  • num_index_items (int) – number of items in the embedding, e.g. vocab_size

  • emb_dim (int) – embedding dimension


Item that can be used as a parameter in a torch.nn.Embedding

Return type
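The two encoding types can be illustrated with a minimal sketch. The function name build_position_encoding_sketch is hypothetical and this is not gperc's implementation; in particular, the source does not spell out the "sinusoid" case, so the fixed sine/cosine table below is an assumption based on the common Transformer formulation.

```python
import math

import torch


def build_position_encoding_sketch(position_encoding_type, num_index_items, emb_dim):
    """Hypothetical stand-in for build_position_encoding (no config argument)."""
    if position_encoding_type == "trainable":
        # Random normal matrix that an optimizer can update.
        return torch.nn.Parameter(torch.randn(num_index_items, emb_dim))
    if position_encoding_type == "sinusoid":
        # Assumption: fixed sin/cos table, even columns sin, odd columns cos.
        position = torch.arange(num_index_items, dtype=torch.float32).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, emb_dim, 2, dtype=torch.float32)
            * (-math.log(10000.0) / emb_dim)
        )
        table = torch.zeros(num_index_items, emb_dim)
        table[:, 0::2] = torch.sin(position * div_term)
        table[:, 1::2] = torch.cos(position * div_term[: emb_dim // 2])
        # Frozen parameter: stored with the model but never trained.
        return torch.nn.Parameter(table, requires_grad=False)
    raise ValueError(f"unknown position_encoding_type: {position_encoding_type!r}")


trainable = build_position_encoding_sketch("trainable", 16, 8)
sinusoid = build_position_encoding_sketch("sinusoid", 16, 8)
```

Either result has shape (num_index_items, emb_dim), so it can be indexed by position or assigned as the weight of a torch.nn.Embedding of the same size.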


class gperc.models.Block(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module

Generic block with Attention and MLP layers

  • kv_dim (int) – dimension of the key-value embeddings

  • q_dim (int) – dimension of the query embeddings

  • num_heads (int) – number of heads in the multihead attention

  • ffw_dim (int) – dimension of the feed-forward layer

  • dropout (float, optional) – dropout rate

  • add_residual (bool, optional) – whether to add residual to the query

forward(kv, q, attn_mask=None)[source]

Forward pass of the block that takes in a key-value tensor and a query tensor and applies the attention and MLP layers. Since it consumes kv and q separately, the blocks are responsible for cross-attention-like features.

  • kv (torch.Tensor) – tensor to extract information from

  • q (torch.Tensor) – tensor for querying the information


tuple of output Tensor and Attention matrix

Return type

Tuple[torch.Tensor, torch.Tensor]
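A rough sketch of a block with this interface is shown below. BlockSketch is a hypothetical class built on torch.nn.MultiheadAttention; the layer order, the normalization, and the GELU activation are assumptions and may differ from what gperc does internally.

```python
import torch
from torch import nn


class BlockSketch(nn.Module):
    """Hypothetical attention + MLP block; q attends over kv."""

    def __init__(self, kv_dim, q_dim, num_heads, ffw_dim, dropout=0.0, add_residual=True):
        super().__init__()
        self.add_residual = add_residual
        # kdim/vdim let keys and values have a different width than the queries.
        self.attn = nn.MultiheadAttention(
            embed_dim=q_dim,
            num_heads=num_heads,
            dropout=dropout,
            kdim=kv_dim,
            vdim=kv_dim,
            batch_first=True,
        )
        self.norm = nn.LayerNorm(q_dim)
        self.mlp = nn.Sequential(
            nn.Linear(q_dim, ffw_dim),
            nn.GELU(),
            nn.Linear(ffw_dim, q_dim),
            nn.Dropout(dropout),
        )

    def forward(self, kv, q, attn_mask=None):
        # Cross attention: queries come from q, keys and values from kv.
        out, attn = self.attn(query=q, key=kv, value=kv, attn_mask=attn_mask)
        if self.add_residual:
            out = out + q
        out = out + self.mlp(self.norm(out))
        return out, attn


block = BlockSketch(kv_dim=32, q_dim=16, num_heads=4, ffw_dim=64)
kv = torch.randn(2, 10, 32)  # (batch, kv_len, kv_dim)
q = torch.randn(2, 4, 16)    # (batch, q_len, q_dim)
out, attn = block(kv, q)
```

Passing the same tensor as kv and q (with kv_dim == q_dim) turns the same block into self-attention, which is how one block type can serve encoder, processor, and decoder roles.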

class gperc.models.Embeddings(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module

forward(input_array, attention_mask=None, output_array=None)[source]

Takes in either the input_array or a tuple with 3 items (input_array, attention_mask, output_array) and returns a tuple with 4 values (input_array, attention_mask, latent_array, output_array). If configured, input_array can contain tokens, which are embedded automatically.


When using GPipe you need to pass in tensors, because it splits the inputs into microbatches for each GPU, and that requires every input to be a tensor. To work around this, a basic heuristic sets attention_mask and output_array to None if the average of the values in those tensors is -69 and -420 respectively.

An image classification task does not require any attention_mask, so you can pass it as a tensor with values attention_mask = torch.tensor([-69. for _ in range(batch_size)]). Similarly, you can send output_array as a tensor with values output_array = torch.tensor([-420. for _ in range(batch_size)]).
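The sentinel convention can be sketched as follows. The helper none_if_sentinel and the two constant names are hypothetical; only the values -69 and -420 and the "compare the mean" rule come from the description above.

```python
import math

import torch

ATTENTION_MASK_SENTINEL = -69.0  # mean value meaning "no attention_mask"
OUTPUT_ARRAY_SENTINEL = -420.0   # mean value meaning "no output_array"


def none_if_sentinel(tensor, sentinel):
    """Return None if the tensor's mean equals the sentinel, else the tensor."""
    if tensor is None:
        return None
    if math.isclose(tensor.float().mean().item(), sentinel):
        return None
    return tensor


batch_size = 4
# Placeholders that satisfy the "tensors only" requirement of the pipeline.
attention_mask = torch.tensor([-69.0 for _ in range(batch_size)])
output_array = torch.tensor([-420.0 for _ in range(batch_size)])

decoded_mask = none_if_sentinel(attention_mask, ATTENTION_MASK_SENTINEL)
decoded_output = none_if_sentinel(output_array, OUTPUT_ARRAY_SENTINEL)

# A real mask of ones and zeros has a mean in [0, 1] and passes through.
real_mask = torch.ones(batch_size, 6)
kept_mask = none_if_sentinel(real_mask, ATTENTION_MASK_SENTINEL)
```

Because real masks and outputs essentially never average to exactly -69 or -420, the check is cheap and does not interfere with genuine inputs.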

class gperc.models.EncoderBlock(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module

Encoder block with positional embeddings

forward(input_array, attention_mask, latent_array, output_array)[source]

Takes in a tuple with 4 values (input_array, attention_mask, latent_array, output_array) and returns a tuple with 3 items (latent_array, output_array, attentions).

class gperc.models.ProcessorBlock(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module

Processor Block without positional embeddings

forward(latent_array, output_array, attentions)[source]

Takes in a tuple with 3 values (latent_array, output_array, attentions) and returns a tuple with 3 items (latent_array, output_array, attentions).

class gperc.models.DecoderBlock(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module

forward(input_array, latent_array, output_array, attentions)[source]

Takes in a tuple with 3 values (latent_array, output_array, attentions) and returns a tuple with 2 items (output_logits, attentions).
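The way tuples flow from Embeddings through the encoder, processor, and decoder can be sketched with toy stages. Everything below (the Toy* classes, dot_attention, run_stages) is hypothetical and uses unparameterised attention; it only mirrors the tuple shapes described above, with the decoder following the three-value description and attentions assumed to be a list of tensors.

```python
import torch
from torch import nn


def dot_attention(q, kv, mask=None):
    """Unparameterised scaled dot-product attention; returns (output, weights)."""
    scores = q @ kv.transpose(1, 2) / kv.shape[-1] ** 0.5
    if mask is not None:
        # mask is (batch, kv_len); positions with value 0 are hidden.
        scores = scores.masked_fill(mask[:, None, :] == 0, float("-inf"))
    weights = scores.softmax(dim=-1)
    return weights @ kv, weights


class ToyEmbeddings(nn.Module):
    def __init__(self, num_latents, dim):
        super().__init__()
        self.latent = nn.Parameter(torch.randn(num_latents, dim))

    def forward(self, input_array, attention_mask, output_array):
        batch = input_array.shape[0]
        latent_array = self.latent.unsqueeze(0).expand(batch, -1, -1)
        return input_array, attention_mask, latent_array, output_array


class ToyEncoder(nn.Module):
    def forward(self, input_array, attention_mask, latent_array, output_array):
        # Latents query the (possibly masked) input.
        update, attn = dot_attention(latent_array, input_array, attention_mask)
        return latent_array + update, output_array, [attn]


class ToyProcessor(nn.Module):
    def forward(self, latent_array, output_array, attentions):
        # Self attention over the latents.
        update, attn = dot_attention(latent_array, latent_array)
        return latent_array + update, output_array, attentions + [attn]


class ToyDecoder(nn.Module):
    def forward(self, latent_array, output_array, attentions):
        # The output array queries the latents.
        output_logits, attn = dot_attention(output_array, latent_array)
        return output_logits, attentions + [attn]


def run_stages(stages, *inputs):
    """Feed each stage's output tuple to the next stage as positional args."""
    for stage in stages:
        outputs = stage(*inputs)
        inputs = outputs if isinstance(outputs, tuple) else (outputs,)
    return inputs


stages = [ToyEmbeddings(num_latents=3, dim=8), ToyEncoder(), ToyProcessor(), ToyDecoder()]
input_array = torch.randn(2, 6, 8)   # (batch, input_len, dim)
attention_mask = torch.ones(2, 6)    # 1 = attend
output_array = torch.randn(2, 1, 8)  # (batch, output_len, dim)
output_logits, attentions = run_stages(stages, input_array, attention_mask, output_array)
```

The tuple sizes shrink from 4 to 3 to 2 as the input and mask are consumed by the encoder and the latents by the decoder.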

class gperc.models.Perceiver(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module

Unassuming Perceiver architecture that sits at the heart of this project. In practice this is a thin wrapper around the model returned by get_sequential_from_config that automatically handles different types of input in a simple fashion. This is a good approach when running on a single GPU or performing data-parallel training on multiple GPUs. When using this for model-parallel training, you will need to write your own list etc.; read the story on distributed training for more details.


config – gperc.PerceiverConfig object

num_parameters(include_non_trainable: bool = True)[source]

Returns the number of parameters in the model.


include_non_trainable (bool, optional) – If True, also includes tensors that have requires_grad=False


number of parameters in the model

Return type

int
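A minimal sketch of such a count, written as a free function over any torch.nn.Module. It is an illustration, not gperc's code, and it assumes that only registered parameters are counted (buffers are ignored).

```python
from torch import nn


def num_parameters(model, include_non_trainable=True):
    """Count parameter elements, optionally skipping frozen ones."""
    return sum(
        p.numel()
        for p in model.parameters()
        if include_non_trainable or p.requires_grad
    )


model = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))
# Freeze the first layer: 4*3 weights + 3 biases = 15 frozen values.
for p in model[0].parameters():
    p.requires_grad = False

total = num_parameters(model)                                   # 15 + 8 = 23
trainable = num_parameters(model, include_non_trainable=False)  # 3*2 + 2 = 8
```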


save(path: str)[source]

Saves the model to a file.


path – path to save the model to
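The source does not say what format save writes. One common way to persist a PyTorch model, shown here purely as an assumption, is to store its state_dict with torch.save and restore it into a freshly built model of the same shape. The helper save_model is hypothetical.

```python
import os
import tempfile

import torch
from torch import nn


def save_model(model, path):
    """Assumed approach: write only the weights, not the Python object."""
    torch.save(model.state_dict(), path)


model = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pt")
    save_model(model, path)
    # Restoring requires constructing the same architecture first.
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path))

weights_match = torch.equal(model.weight, restored.weight)
```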

forward(input_array, attention_mask=None, output_array=None, return_attentions=False)[source]

Performs the forward pass of the Perceiver.

  • input_array (torch.Tensor) – Input array to the Perceiver, read paper for reference

  • attention_mask (torch.Tensor, optional) – Mask for the decoder; locations with value 1 are attended to

  • output_array (torch.Tensor, optional) – Output array to the Perceiver, read paper for reference

  • return_attentions (bool, optional) – If true returns the attentions as a list


The output of the Perceiver and the attention matrices

Return type

Tuple[torch.Tensor, List[torch.Tensor]] if return_attentions is True else torch.Tensor
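The "value 1 means attend" convention for attention_mask is the opposite of the boolean key_padding_mask used by torch.nn.MultiheadAttention, where True means ignore. The sketch below shows the conversion on a standalone attention layer; it is an illustration of the convention and not a description of how gperc applies the mask internally.

```python
import torch
from torch import nn

# 1 = attend, 0 = padding (the convention described above).
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [1, 1, 0, 0]])

# torch's key_padding_mask uses True for positions to ignore.
key_padding_mask = attention_mask == 0

attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
q = torch.randn(2, 3, 8)   # (batch, q_len, dim)
kv = torch.randn(2, 4, 8)  # (batch, kv_len, dim)
out, weights = attn(q, kv, kv, key_padding_mask=key_padding_mask)
```

The returned weights have shape (batch, q_len, kv_len), and every column that corresponds to a 0 in attention_mask receives zero weight.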


get_distributed_model() returns the model that is used for distributed training. It is not a wrapper around Perceiver; instead it returns a Pipe object.


config (PerceiverConfig) – Configuration object for the Perceiver


Model that can be used in place of Perceiver, but note that it can only take in torch.Tensor objects and not None.

Return type

torch.distributed.pipeline.sync.Pipe


class gperc.models.PerceiverMLM(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module

class gperc.models.PerceiverImage(*args: Any, **kwargs: Any)[source]

Bases: torch.nn.Module
