cvpods.layers package

class cvpods.layers.MemoryEfficientSwish[source]

Bases: torch.nn.modules.module.Module

forward(x)[source]
class cvpods.layers.Swish[source]

Bases: torch.nn.modules.module.Module

Implements the Swish activation function. See https://arxiv.org/abs/1710.05941 for more details.

forward(x)[source]
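
For reference, Swish is x * sigmoid(x). Below is a minimal sketch of both variants, assuming the memory-efficient version saves only the input and recomputes the sigmoid in the backward pass (illustrative code, not the cvpods internals):

import torch
import torch.nn as nn

class SwishSketch(nn.Module):
    # Plain Swish: autograd stores the intermediate sigmoid output.
    def forward(self, x):
        return x * torch.sigmoid(x)

class _SwishFunction(torch.autograd.Function):
    # Memory-efficient variant: keep only the input tensor and
    # recompute sigmoid(x) during the backward pass.
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        s = torch.sigmoid(x)
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        return grad_output * s * (1 + x * (1 - s))

class MemoryEfficientSwishSketch(nn.Module):
    def forward(self, x):
        return _SwishFunction.apply(x)
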
class cvpods.layers.FrozenBatchNorm2d(num_features, eps=1e-05)[source]

Bases: torch.nn.modules.module.Module

BatchNorm2d where the batch statistics and the affine parameters are fixed.

It contains non-trainable buffers called “weight”, “bias”, “running_mean”, and “running_var”, initialized to perform the identity transformation.

The pre-trained backbone models from Caffe2 only contain “weight” and “bias”, which are computed from the original four parameters of BN. The affine transform x * weight + bias will perform the equivalent computation of (x - running_mean) / sqrt(running_var) * weight + bias. When loading a backbone model from Caffe2, “running_mean” and “running_var” will be left unchanged as the identity transformation.

Other pre-trained backbone models may contain all 4 parameters.

The forward is implemented by F.batch_norm(…, training=False).

forward(x)[source]
classmethod convert_frozen_batchnorm(module)[source]

Convert BatchNorm/SyncBatchNorm in module into FrozenBatchNorm.

Parameters

module (torch.nn.Module) –

Returns

If module is BatchNorm/SyncBatchNorm, returns a new module. Otherwise, converts module in place and returns it.

Similar to convert_sync_batchnorm in https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/batchnorm.py
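
A typical use is to freeze all BN statistics in a pre-trained backbone before fine-tuning. A hedged usage sketch (the torchvision model is only an example stand-in for a backbone):

import torch
import torchvision
from cvpods.layers import FrozenBatchNorm2d

backbone = torchvision.models.resnet18()
# Recursively replaces every BatchNorm/SyncBatchNorm with a
# FrozenBatchNorm2d carrying the same statistics and affine terms.
backbone = FrozenBatchNorm2d.convert_frozen_batchnorm(backbone)

# BN statistics are now fixed buffers, so the output no longer
# depends on train/eval mode or on the batch composition.
x = torch.randn(2, 3, 224, 224)
assert torch.allclose(backbone.train()(x), backbone.eval()(x))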

class cvpods.layers.NaiveSyncBatchNorm(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)[source]

Bases: torch.nn.modules.batchnorm.BatchNorm2d

torch.nn.SyncBatchNorm has known bugs: it produces significantly worse AP (and sometimes produces NaN) when the batch size on each worker is quite different (e.g., when scale augmentation is used, or when it is applied to the mask head).

Use this implementation until nn.SyncBatchNorm is fixed. Note that it is slower than nn.SyncBatchNorm.

forward(input)[source]
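
The core idea, sketched below (an illustration of the approach, not the cvpods source): each worker computes its first and second moments, the moments are averaged across workers with all_reduce, and every worker then normalizes with identical global statistics.

import torch
import torch.distributed as dist

def naive_sync_stats(x, eps=1e-5):
    # x: (N, C, H, W); per-worker moments over the N, H, W dimensions.
    mean = x.mean(dim=[0, 2, 3])
    meansqr = (x * x).mean(dim=[0, 2, 3])
    if dist.is_available() and dist.is_initialized():
        vec = torch.cat([mean, meansqr])
        dist.all_reduce(vec)                 # sum over all workers
        vec = vec / dist.get_world_size()    # average the moments
        mean, meansqr = torch.split(vec, x.shape[1])
    var = meansqr - mean * mean              # Var[x] = E[x^2] - E[x]^2
    invstd = torch.rsqrt(var + eps)
    return mean, invstd
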
cvpods.layers.get_activation(activation)[source]
Parameters

activation (str or callable) –

Returns

nn.Module or None – the activation layer

cvpods.layers.get_norm(norm, out_channels)[source]
Parameters

norm (str or callable) –

Returns

nn.Module or None – the normalization layer
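
Typical usage (the exact set of accepted strings is defined by cvpods; "BN" below is assumed to map to BatchNorm2d, as in detectron2-style get_norm, and a callable may be passed directly instead of a string):

>>> from cvpods.layers import get_norm
>>> norm_layer = get_norm("BN", 64)   # assumed: a BatchNorm2d over 64 channels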

class cvpods.layers.DeformConv(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, deformable_groups=1, bias=False, norm=None, activation=None)[source]

Bases: torch.nn.modules.module.Module

__init__(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, deformable_groups=1, bias=False, norm=None, activation=None)[source]

Deformable convolution.

Arguments are similar to Conv2D. Extra arguments:

Parameters
  • deformable_groups (int) – number of groups used in deformable convolution.

  • norm (nn.Module, optional) – a normalization layer

  • activation (callable(Tensor) -> Tensor) – a callable activation function

forward(x, offset)[source]
extra_repr()[source]
class cvpods.layers.ModulatedDeformConv(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, deformable_groups=1, bias=True, norm=None, activation=None)[source]

Bases: torch.nn.modules.module.Module

__init__(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, deformable_groups=1, bias=True, norm=None, activation=None)[source]

Modulated deformable convolution.

Arguments are similar to Conv2D. Extra arguments:

Parameters
  • deformable_groups (int) – number of groups used in deformable convolution.

  • norm (nn.Module, optional) – a normalization layer

  • activation (callable(Tensor) -> Tensor) – a callable activation function

forward(x, offset, mask)[source]
extra_repr()[source]
class cvpods.layers.DeformConvWithOff(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1, deformable_groups=1)[source]

Bases: torch.nn.modules.module.Module

forward(input)[source]
class cvpods.layers.ModulatedDeformConvWithOff(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1, deformable_groups=1)[source]

Bases: torch.nn.modules.module.Module

forward(input)[source]
cvpods.layers.paste_masks_in_image(masks, boxes, image_shape, threshold=0.5)[source]

Paste a set of masks that are of a fixed resolution (e.g., 28 x 28) into an image. The location, height, and width for pasting each mask are determined by their corresponding bounding boxes in boxes.

Parameters
  • masks (tensor) – Tensor of shape (Bimg, Hmask, Wmask), where Bimg is the number of detected object instances in the image and Hmask, Wmask are the mask height and mask width of the predicted mask (e.g., Hmask = Wmask = 28). Values are in [0, 1].

  • boxes (Boxes or Tensor) – A Boxes of length Bimg or Tensor of shape (Bimg, 4). boxes[i] and masks[i] correspond to the same object instance.

  • image_shape (tuple) – height, width

  • threshold (float) – A threshold in [0, 1] for converting the (soft) masks to binary masks.

Returns

img_masks (Tensor) – A tensor of shape (Bimg, Himage, Wimage), where Bimg is the number of detected object instances and Himage, Wimage are the image height and width. img_masks[i] is a binary mask for object instance i.
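
A short usage sketch with illustrative shapes (28 x 28 soft masks for three instances pasted into a 480 x 640 image):

>>> import torch
>>> from cvpods.layers import paste_masks_in_image
>>> masks = torch.rand(3, 28, 28)                     # soft masks in [0, 1]
>>> boxes = torch.tensor([[ 10.,  20., 100., 200.],
...                       [ 50.,  60., 300., 400.],
...                       [200., 100., 640., 480.]])  # (x1, y1, x2, y2)
>>> img_masks = paste_masks_in_image(masks, boxes, image_shape=(480, 640))
>>> img_masks.shape
torch.Size([3, 480, 640])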

cvpods.layers.batched_nms(boxes, scores, idxs, iou_threshold)[source]

Same as torchvision.ops.boxes.batched_nms, but safer.
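
The usual way to implement category-aware NMS safely is the coordinate-offset trick: shift each category's boxes into a disjoint coordinate range so that one class-agnostic NMS call can never suppress across categories. A sketch under that assumption:

import torch
from torchvision.ops import nms

def batched_nms_sketch(boxes, scores, idxs, iou_threshold):
    if boxes.numel() == 0:
        return torch.empty((0,), dtype=torch.int64, device=boxes.device)
    # Shift every category into its own coordinate range so boxes from
    # different categories can never overlap.
    max_coordinate = boxes.max()
    offsets = idxs.to(boxes) * (max_coordinate + 1)
    return nms(boxes + offsets[:, None], scores, iou_threshold)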

cvpods.layers.batched_nms_rotated(boxes, scores, idxs, iou_threshold)[source]

Performs non-maximum suppression in a batched fashion.

Each index value corresponds to a category, and NMS will not be applied between elements of different categories.

Parameters
  • boxes (Tensor[N, 5]) – boxes where NMS will be performed. They are expected to be in (x_ctr, y_ctr, width, height, angle_degrees) format

  • scores (Tensor[N]) – scores for each one of the boxes

  • idxs (Tensor[N]) – indices of the categories for each one of the boxes.

  • iou_threshold (float) – discards all overlapping boxes with IoU > iou_threshold

Returns

Tensor – int64 tensor with the indices of the elements that have been kept by NMS, sorted in decreasing order of scores

cvpods.layers.batched_softnms(boxes, scores, idxs, iou_threshold, score_threshold=0.001, soft_mode='gaussian')[source]
cvpods.layers.batched_softnms_rotated(boxes, scores, idxs, iou_threshold, score_threshold=0.001, soft_mode='gaussian')[source]
cvpods.layers.cluster_nms(boxes, scores, iou_threshold)[source]
cvpods.layers.generalized_batched_nms(boxes, scores, idxs, iou_threshold, score_threshold=0.001, nms_type='normal')[source]
cvpods.layers.matrix_nms(seg_masks, cate_labels, cate_scores, kernel='gaussian', sigma=2.0, sum_masks=None)[source]

Matrix NMS for multi-class masks. See: https://arxiv.org/pdf/2003.10152.pdf for more details.

Parameters
  • seg_masks (Tensor) – shape: [N, H, W], binary masks.

  • cate_labels (Tensor) – shape: [N], mask labels in descending order.

  • cate_scores (Tensor) – shape [N], mask scores in descending order.

  • kernel (str) – ‘linear’ or ‘gaussian’.

  • sigma (float) – standard deviation used in the gaussian method.

  • sum_masks (Tensor) – The sum of seg_masks.

Returns

Tensor – cate_scores_update, tensors of shape [N].

cvpods.layers.nms(boxes, scores, iou_threshold)[source]

Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).

NMS iteratively removes lower scoring boxes which have an IoU greater than iou_threshold with another (higher scoring) box.

Parameters
  • boxes (Tensor[N, 4]) – boxes to perform NMS on. They are expected to be in (x1, y1, x2, y2) format

  • scores (Tensor[N]) – scores for each one of the boxes

  • iou_threshold (float) – discards all overlapping boxes with IoU > iou_threshold

Returns

keep (Tensor) – int64 tensor with the indices of the elements that have been kept by NMS, sorted in decreasing order of scores
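
A small usage example; the two heavily overlapping boxes collapse to the higher-scoring one:

>>> import torch
>>> from cvpods.layers import nms
>>> boxes = torch.tensor([[ 0.,  0., 10., 10.],
...                       [ 1.,  1., 11., 11.],
...                       [50., 50., 60., 60.]])
>>> scores = torch.tensor([0.9, 0.8, 0.7])
>>> nms(boxes, scores, iou_threshold=0.5)   # box 1 overlaps box 0 (IoU ~0.68)
tensor([0, 2])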

cvpods.layers.nms_rotated(boxes, scores, iou_threshold)[source]

Performs non-maximum suppression (NMS) on the rotated boxes according to their intersection-over-union (IoU).

Rotated NMS iteratively removes lower scoring rotated boxes which have an IoU greater than iou_threshold with another (higher scoring) rotated box.

Note that RotatedBox (5, 3, 4, 2, -90) covers exactly the same region as RotatedBox (5, 3, 4, 2, 90) does, and their IoU will be 1. However, they can represent completely different objects in certain tasks, e.g., OCR.

As for the question of whether rotated NMS should treat them as faraway boxes even though their IoU is 1, it depends on the application and/or ground truth annotation.

As an extreme example, consider a single character v and the square box around it.

If the angle is 0 degree, the object (text) would be read as ‘v’;

If the angle is 90 degrees, the object (text) would become ‘>’;

If the angle is 180 degrees, the object (text) would become ‘^’;

If the angle is 270/-90 degrees, the object (text) would become ‘<’

All of these cases have IoU of 1 to each other, and rotated NMS that only uses IoU as its criterion would keep only the one with the highest score - which, practically, still makes sense in most cases because typically only one of these orientations is the correct one. Also, it does not matter as much if the box is only used to classify the object (instead of transcribing it with a sequential OCR recognition model) later.

On the other hand, when we use IoU to filter proposals that are close to the ground truth during training, we should definitely take the angle into account if we know the ground truth is labeled with the strictly correct orientation (as in, upside-down words are annotated with -180 degrees even though they can be covered with a 0/90/-90 degree box, etc.)

The way the original dataset is annotated also matters. For example, if the dataset is a 4-point polygon dataset that does not enforce ordering of vertices/orientation, we can estimate a minimum rotated bounding box for this polygon, but there is no way we can tell the correct angle with 100% confidence (as shown above, there could be 4 different rotated boxes, with angles that differ by 90 degrees from each other, covering exactly the same region). In that case we have to just use IoU to determine box proximity (as many detection benchmarks, even for text, do) unless there are other assumptions we can make (e.g., width is always larger than height, or the object is not rotated by more than 90 degrees CCW/CW, etc.).

In summary, not considering angles in rotated NMS seems to be a good option for now, but we should be aware of its implications.

Parameters
  • boxes (Tensor[N, 5]) – Rotated boxes to perform NMS on. They are expected to be in (x_center, y_center, width, height, angle_degrees) format.

  • scores (Tensor[N]) – Scores for each one of the rotated boxes

  • iou_threshold (float) – Discards all overlapping rotated boxes with IoU > iou_threshold

Returns

keep (Tensor) – int64 tensor with the indices of the elements that have been kept by Rotated NMS, sorted in decreasing order of scores

cvpods.layers.softnms(boxes, scores, sigma, score_threshold, soft_mode='gaussian')[source]
cvpods.layers.softnms_rotated(boxes, scores, sigma, score_threshold, soft_mode='gaussian')[source]
class cvpods.layers.ROIAlign(output_size, spatial_scale, sampling_ratio, aligned=True)[source]

Bases: torch.nn.modules.module.Module

__init__(output_size, spatial_scale, sampling_ratio, aligned=True)[source]
Parameters
  • output_size (tuple) – h, w

  • spatial_scale (float) – scale the input boxes by this number

  • sampling_ratio (int) – number of input samples to take for each output sample. 0 to take samples densely.

  • aligned (bool) – if False, use the legacy implementation in Detectron. If True, align the results more perfectly.

Note

The meaning of aligned=True:

Given a continuous coordinate c, its two neighboring pixel indices (in our pixel model) are computed by floor(c - 0.5) and ceil(c - 0.5). For example, c=1.3 has pixel neighbors with discrete indices [0] and [1] (which are sampled from the underlying signal at continuous coordinates 0.5 and 1.5). But the original roi_align (aligned=False) does not subtract the 0.5 when computing neighboring pixel indices and therefore it uses pixels with a slightly incorrect alignment (relative to our pixel model) when performing bilinear interpolation.

With aligned=True, we first appropriately scale the ROI and then shift it by -0.5 prior to calling roi_align. This produces the correct neighbors; see cvpods/tests/test_roi_align.py for verification.

In practice, this difference does not affect the model’s performance if ROIAlign is used together with conv layers.

forward(input, rois)[source]
Parameters
  • input – NCHW images

  • rois – Bx5 boxes. First column is the index into N. The other 4 columns are xyxy.
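
A usage sketch (assuming the compiled cvpods ops are available; shapes are illustrative for a stride-16 feature map):

>>> import torch
>>> from cvpods.layers import ROIAlign
>>> pooler = ROIAlign(output_size=(7, 7), spatial_scale=1 / 16,
...                   sampling_ratio=2, aligned=True)
>>> feat = torch.randn(2, 256, 50, 68)            # NCHW features
>>> # Each roi is (batch_index, x1, y1, x2, y2) in input-image coordinates.
>>> rois = torch.tensor([[0., 32., 32., 160., 128.],
...                      [1., 64., 16., 320., 240.]])
>>> pooler(feat, rois).shape
torch.Size([2, 256, 7, 7])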

cvpods.layers.roi_align()
class cvpods.layers.ROIAlignRotated(output_size, spatial_scale, sampling_ratio)[source]

Bases: torch.nn.modules.module.Module

__init__(output_size, spatial_scale, sampling_ratio)[source]
Parameters
  • output_size (tuple) – h, w

  • spatial_scale (float) – scale the input boxes by this number

  • sampling_ratio (int) – number of input samples to take for each output sample. 0 to take samples densely.

Note

ROIAlignRotated supports continuous coordinate by default: Given a continuous coordinate c, its two neighboring pixel indices (in our pixel model) are computed by floor(c - 0.5) and ceil(c - 0.5). For example, c=1.3 has pixel neighbors with discrete indices [0] and [1] (which are sampled from the underlying signal at continuous coordinates 0.5 and 1.5).

forward(input, rois)[source]
Parameters
  • input – NCHW images

  • rois – Bx6 boxes. First column is the index into N. The other 5 columns are (x_ctr, y_ctr, width, height, angle_degrees).

cvpods.layers.roi_align_rotated()
class cvpods.layers.ShapeSpec[source]

Bases: cvpods.layers.shape_spec._ShapeSpec

A simple structure that contains basic shape specification about a tensor. It is often used as the auxiliary input/output of models, to provide shape inference ability among PyTorch modules.

channels
height
width
stride
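
Assuming the detectron2-style definition, where unspecified fields default to None, a module can describe only what it knows:

>>> from cvpods.layers import ShapeSpec
>>> spec = ShapeSpec(channels=256, stride=4)   # a stride-4, 256-channel map
>>> spec.channels, spec.stride, spec.height
(256, 4, None)
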
class cvpods.layers.SwapAlign2Nat(lambda_val, pad_val=- 6.0)[source]

Bases: torch.nn.modules.module.Module

The op SwapAlign2Nat described in https://arxiv.org/abs/1903.12174. Given an input tensor that predicts masks of shape (N, C=VxU, H, W), the op returns masks of shape (N, V’xU’, H’, W’), where the unit lengths of (V, U) and (H, W) are swapped, and the mask representation is transformed from aligned to natural.

Parameters
  • lambda_val (int) – the relative unit length ratio between (V, U) and (H, W). As we always have larger unit lengths for (V, U) than (H, W), lambda_val is always >= 1.

  • pad_val (float) – padding value for locations falling outside of the input tensor. Default -6, since sigmoid(-6) is ~0, i.e., no masks outside of the tensor.

forward(X)[source]
cvpods.layers.swap_align2nat()
class cvpods.layers.TreeFilterV2(guide_channels, in_channels, embed_channels, num_groups=1, eps=1e-08)[source]

Bases: torch.nn.modules.module.Module

num_groups = None

Embedding Layers

gamma = None

Core of Tree Filter

tree_filter_layer = None

Parameters init

reset_parameter()[source]
split_groups(x)[source]
expand_groups(x)[source]
forward(feature, guide)[source]
class cvpods.layers.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)[source]

Bases: torch.nn.modules.batchnorm._BatchNorm

Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

The mean and standard-deviation are calculated per-dimension over the mini-batches and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size). By default, the elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set to 0.

Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.

If track_running_stats is set to False, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well.

Note

This momentum argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.

Because the Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it’s common terminology to call this Spatial Batch Normalization.

Parameters
  • num_features – \(C\) from an expected input of size \((N, C, H, W)\)

  • eps – a value added to the denominator for numerical stability. Default: 1e-5

  • momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1

  • affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True

  • track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True

Shape:
  • Input: \((N, C, H, W)\)

  • Output: \((N, C, H, W)\) (same shape as input)

Examples:

>>> # With Learnable Parameters
>>> m = nn.BatchNorm2d(100)
>>> # Without Learnable Parameters
>>> m = nn.BatchNorm2d(100, affine=False)
>>> input = torch.randn(20, 100, 35, 45)
>>> output = m(input)
class cvpods.layers.Conv2d(*args, **kwargs)[source]

Bases: torch.nn.modules.conv.Conv2d

A wrapper around torch.nn.Conv2d to support empty inputs and more features.

__init__(*args, **kwargs)[source]

Extra keyword arguments supported in addition to those in torch.nn.Conv2d:

Parameters
  • norm (nn.Module, optional) – a normalization layer

  • activation (callable(Tensor) -> Tensor) – a callable activation function

It assumes that the norm layer is used before the activation.

forward(x)[source]
bias = None
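
A typical construction, with the norm and activation fused into the module (assuming "BN" is an accepted string for get_norm):

>>> import torch
>>> import torch.nn.functional as F
>>> from cvpods.layers import Conv2d, get_norm
>>> conv = Conv2d(3, 64, kernel_size=3, padding=1,
...               norm=get_norm("BN", 64), activation=F.relu)
>>> conv(torch.randn(2, 3, 32, 32)).shape   # conv -> norm -> activation
torch.Size([2, 64, 32, 32])
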
class cvpods.layers.Conv2dSamePadding(*args, **kwargs)[source]

Bases: torch.nn.modules.conv.Conv2d

A wrapper around torch.nn.Conv2d to support “SAME” padding mode and more features.

__init__(*args, **kwargs)[source]

Extra keyword arguments supported in addition to those in torch.nn.Conv2d:

Parameters
  • norm (nn.Module, optional) – a normalization layer

  • activation (callable(Tensor) -> Tensor) – a callable activation function

It assumes that the norm layer is used before the activation.

forward(x)[source]
bias = None
class cvpods.layers.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')[source]

Bases: torch.nn.modules.conv._ConvTransposeNd

Applies a 2D transposed convolution operator over an input image composed of several input planes.

This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).

  • stride controls the stride for the cross-correlation.

  • padding controls the amount of implicit zero padding on both sides, for dilation * (kernel_size - 1) - padding number of points. See note below for details.

  • output_padding controls the additional size added to one side of the output shape. See note below for details.

  • dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but the visualizations in https://github.com/vdumoulin/conv_arithmetic show nicely what dilation does.

  • groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,

    • At groups=1, all inputs are convolved to all outputs.

    • At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.

    • At groups= in_channels, each input channel is convolved with its own set of filters (of size \(\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor\)).

The parameters kernel_size, stride, padding, output_padding can either be:

  • a single int – in which case the same value is used for the height and width dimensions

  • a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension

Note

Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding.

Note

The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv2d and a ConvTranspose2d are initialized with same parameters, they are inverses of each other in regard to the input and output shapes. However, when stride > 1, Conv2d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find output shape, but does not actually add zero-padding to output (a numeric check follows the shape formulas below).

Parameters
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int or tuple) – Size of the convolving kernel

  • stride (int or tuple, optional) – Stride of the convolution. Default: 1

  • padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0

  • output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0

  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1

  • bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

  • dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1

Shape:
  • Input: \((N, C_{in}, H_{in}, W_{in})\)

  • Output: \((N, C_{out}, H_{out}, W_{out})\) where

\[H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) + \text{output\_padding}[0] + 1\]
\[W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) + \text{output\_padding}[1] + 1\]
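
As a quick arithmetic check of the formula: with H_in = 6, stride = 2, padding = 1, dilation = 1, and kernel_size = 3, H_out = (6 - 1) * 2 - 2 + 2 + output_padding + 1 = 11 + output_padding, so output_padding=1 reproduces the 12 -> 6 -> 12 round trip that the example below achieves via output_size:

>>> import torch
>>> import torch.nn as nn
>>> up = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1, output_padding=1)
>>> up(torch.randn(1, 16, 6, 6)).shape
torch.Size([1, 16, 12, 12])
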
weight

the learnable weights of the module of shape \((\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}},\) \(\text{kernel\_size[0]}, \text{kernel\_size[1]})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\_size}[i]}\)

Type

Tensor

bias

the learnable bias of the module of shape (out_channels). If bias is True, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\_size}[i]}\)

Type

Tensor

Examples:

>>> # With square kernels and equal stride
>>> m = nn.ConvTranspose2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = m(input)
>>> # exact output size can be also specified as an argument
>>> input = torch.randn(1, 16, 12, 12)
>>> downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1)
>>> upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1)
>>> h = downsample(input)
>>> h.size()
torch.Size([1, 16, 6, 6])
>>> output = upsample(h, output_size=input.size())
>>> output.size()
torch.Size([1, 16, 12, 12])
forward(input, output_size=None)[source]
bias = None
class cvpods.layers.DisAlignLinear(in_features: int, out_features: int, bias: bool = True)[source]

Bases: torch.nn.modules.linear.Linear

A wrapper for nn.Linear with support for the DisAlign method.

forward(input: torch.Tensor)[source]
class cvpods.layers.DisAlignNormalizedLinear(in_features: int, out_features: int, bias: bool = False, **args)[source]

Bases: cvpods.layers.wrappers.NormalizedLinear

A wrapper for nn.Linear with support for the DisAlign method.

forward(input: torch.Tensor)[source]
class cvpods.layers.MaxPool2dSamePadding(*args, **kwargs)[source]

Bases: torch.nn.modules.pooling.MaxPool2d

A wrapper around torch.nn.MaxPool2d to support “SAME” padding mode and more features.

See: https://github.com/pytorch/pytorch/issues/3867

forward(x)[source]
class cvpods.layers.NormalizedConv2d(*args, **kwargs)[source]

Bases: torch.nn.modules.conv.Conv2d

A wrapper around torch.nn.Conv2d to support empty inputs and more features.

__init__(*args, **kwargs)[source]

Extra keyword arguments supported in addition to those in torch.nn.Conv2d:

Parameters
  • norm (nn.Module, optional) – a normalization layer

  • activation (callable(Tensor) -> Tensor) – a callable activation function

It assumes that the norm layer is used before the activation.

extra_repr()[source]
forward(x)[source]
bias = None
class cvpods.layers.NormalizedLinear(in_features, out_features, bias=False, feat_norm=True, scale_mode='learn', scale_init=1.0)[source]

Bases: torch.nn.modules.module.Module

An advanced Linear layer which supports weight normalization or cosine normalization.

reset_parameters()[source]
forward(inputs)[source]
Parameters

inputs (torch.Tensor) – (N, C)

Returns

output (torch.Tensor) – (N, D)

extra_repr()[source]
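
A sketch of the cosine-normalization case (assumptions: feat_norm=True L2-normalizes the input, the weight rows are L2-normalized, and scale_mode='learn' makes the scale a learnable scalar; illustrative code, not the cvpods source):

import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineLinearSketch(nn.Module):
    def __init__(self, in_features, out_features, scale_init=1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)
        self.scale = nn.Parameter(torch.tensor(scale_init))  # scale_mode='learn'

    def forward(self, inputs):                 # (N, C) -> (N, D)
        x = F.normalize(inputs, dim=1)         # feat_norm=True
        w = F.normalize(self.weight, dim=1)    # unit-norm class vectors
        return self.scale * F.linear(x, w)     # scaled cosine similarities
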
class cvpods.layers.SeparableConvBlock(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, bias=True, norm=None, activation=None)[source]

Bases: torch.nn.modules.module.Module

Depthwise separable convolution block.

__init__(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, bias=True, norm=None, activation=None)[source]
Parameters
  • in_channels (int) – the number of input tensor channels.

  • out_channels (int) – the number of output tensor channels.

  • kernel_size (int) – the kernel size.

  • stride (int or tuple or list) – the stride.

  • bias (bool) – if True, the pointwise conv applies bias.

  • norm (nn.Module, optional) – a normalization layer

  • activation (callable(Tensor) -> Tensor) – a callable activation function

It assumes that the norm layer is used before the activation.

forward(inputs)[source]
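
The factorization this block is built on, as a sketch (illustrative module, not the cvpods internals): a per-channel depthwise KxK conv followed by a 1x1 pointwise conv that mixes channels.

import torch.nn as nn

def separable_conv_sketch(in_channels, out_channels, kernel_size=3, stride=1):
    return nn.Sequential(
        # Depthwise: one KxK filter per input channel (groups=in_channels).
        nn.Conv2d(in_channels, in_channels, kernel_size, stride,
                  padding=kernel_size // 2, groups=in_channels, bias=False),
        # Pointwise: 1x1 conv mixes channels and carries the bias.
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=True),
    )
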
cvpods.layers.cat(tensors, dim=0)[source]

Efficient version of torch.cat that avoids a copy if there is only a single element in the list.
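
The optimization is simply to return the lone tensor instead of concatenating; a minimal sketch:

import torch

def cat_sketch(tensors, dim=0):
    assert isinstance(tensors, (list, tuple))
    if len(tensors) == 1:
        return tensors[0]        # avoid the copy torch.cat would make
    return torch.cat(tensors, dim)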

cvpods.layers.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None)[source]

A wrapper around torch.nn.functional.interpolate() to support zero-size tensor.