
PyTorch Model Layers

王哲峰 / 2022-08-12



Introduction to Model Layers

A deep learning model is generally composed of various model layers. torch.nn provides a very rich set of built-in model layers; they are all subclasses of torch.nn.Module and come with parameter management built in.

If these built-in model layers do not meet your needs, you can also build custom model layers by subclassing torch.nn.Module. In fact, PyTorch does not distinguish between models and model layers: both are built by subclassing torch.nn.Module, so a custom layer only needs to inherit from the torch.nn.Module base class and implement the forward method.
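
For example, a minimal custom layer only needs a constructor and a forward method. The toy layer below is a hypothetical illustration (not a built-in layer): it simply scales its input by a learnable factor.

import torch
from torch import nn

class Scale(nn.Module):
    """Toy custom layer: multiplies the input by a learnable scalar."""
    def __init__(self):
        super().__init__()
        # registered as a parameter, so nn.Module manages it automatically
        self.alpha = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.alpha * x

layer = Scale()
print(layer(torch.randn(2, 3)).shape)   # torch.Size([2, 3])
print(list(layer.parameters()))         # the learnable alpha shows up here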

Basic Layers

Fully Connected Layer

nn.Linear: fully connected layer (a usage sketch of the basic layers follows at the end of this section)

nn.Flatten: flatten layer

Embedding Layer

nn.Embedding: embedding layer

Normalization Layers

Normalization layers

BatchNormalization layer

Other normalization layers

Dropout Layer

The Dropout layer is a form of regularization.

Padding Layer

Clipping layer
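
A minimal usage sketch of a few of the basic layers above (batch size, feature sizes and vocabulary size are chosen only for illustration):

import torch
from torch import nn

x = torch.randn(4, 10)                      # batch of 4 samples with 10 features

linear = nn.Linear(in_features = 10, out_features = 5)
print(linear(x).shape)                      # torch.Size([4, 5])

bn = nn.BatchNorm1d(num_features = 10)
print(bn(x).shape)                          # torch.Size([4, 10]), normalized per feature

dropout = nn.Dropout(p = 0.5)
print(dropout(x).shape)                     # torch.Size([4, 10]), elements zeroed during training

ids = torch.randint(0, 100, (4, 7))         # 4 sequences of 7 token ids
embedding = nn.Embedding(num_embeddings = 100, embedding_dim = 3)
print(embedding(ids).shape)                 # torch.Size([4, 7, 3])

flatten = nn.Flatten()
print(flatten(embedding(ids)).shape)        # torch.Size([4, 21])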

Convolutional Network Layers

Convolution Layers

1D convolution

nn.Conv1d: ordinary 1D convolution, commonly used for text

2D convolution

nn.Conv2d: ordinary 2D convolution, commonly used for images (see the shape sketch later in this section)

3D convolution

nn.Conv3d: ordinary 3D convolution, commonly used for video

Pooling Layers

Max pooling layer

Average pooling layer
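
A shape sketch of the convolution and pooling layers above (input sizes chosen only for illustration):

import torch
from torch import nn

# 1D: (batch, channels, length), e.g. embedded text
x1 = torch.randn(8, 3, 32)
conv1d = nn.Conv1d(in_channels = 3, out_channels = 16, kernel_size = 5)
print(conv1d(x1).shape)                                  # torch.Size([8, 16, 28])
print(nn.MaxPool1d(kernel_size = 2)(conv1d(x1)).shape)   # torch.Size([8, 16, 14])

# 2D: (batch, channels, height, width), e.g. images
x2 = torch.randn(8, 3, 32, 32)
conv2d = nn.Conv2d(in_channels = 3, out_channels = 16, kernel_size = 3)
print(conv2d(x2).shape)                                  # torch.Size([8, 16, 30, 30])
print(nn.AvgPool2d(kernel_size = 2)(conv2d(x2)).shape)   # torch.Size([8, 16, 15, 15])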

Other

Recurrent Network Layers

RNN Layer

LSTM Layer

GRU Layer
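
The RNN, LSTM and GRU layers above share a similar calling convention; a minimal sketch, assuming batch_first inputs of shape (batch, seq_len, features):

import torch
from torch import nn

x = torch.randn(4, 20, 10)                  # (batch = 4, seq_len = 20, features = 10)

lstm = nn.LSTM(input_size = 10, hidden_size = 32, num_layers = 2, batch_first = True)
output, (h_n, c_n) = lstm(x)
print(output.shape)                         # torch.Size([4, 20, 32]), hidden state at every step
print(h_n.shape, c_n.shape)                 # torch.Size([2, 4, 32]) each, final states per layer

gru = nn.GRU(input_size = 10, hidden_size = 32, batch_first = True)
output, h_n = gru(x)
print(output.shape, h_n.shape)              # torch.Size([4, 20, 32]) torch.Size([1, 4, 32])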

Transformer-Related Layers

The Transformer architecture is an alternative to recurrent networks; it addresses their drawbacks of being hard to parallelize and struggling to capture long-range dependencies. It is currently the main building block of mainstream NLP models.
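
A minimal sketch of the corresponding built-in layers, nn.TransformerEncoderLayer stacked by nn.TransformerEncoder (sizes chosen only for illustration):

import torch
from torch import nn

encoder_layer = nn.TransformerEncoderLayer(d_model = 64, nhead = 8, batch_first = True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers = 2)

x = torch.randn(4, 20, 64)                  # (batch, seq_len, d_model)
print(encoder(x).shape)                     # torch.Size([4, 20, 64]), same shape as the input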

Custom Model Layers

If the built-in model layers do not meet your needs, you can also build custom model layers. In fact, PyTorch does not distinguish between models and model layers, so a custom layer only needs to subclass torch.nn.Module and implement the forward method.

You can define a custom model layer by following the torch.nn.Linear source code below:

import math

import torch
from torch import nn
import torch.nn.functional as F


class Linear(nn.Module):
    __constants__ = ["in_features", "out_features"]

    def __init__(self, in_features, out_features, bias = True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        # weight is stored as (out_features, in_features), matching F.linear
        self.weight = nn.Parameter(
            torch.Tensor(out_features, in_features)
        )
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_features))
        else:
            # register "bias" as None so the attribute always exists
            self.register_parameter("bias", None)
        self.reset_parameters()

    def reset_parameters(self):
        # same Kaiming-uniform initialization as the built-in nn.Linear
        nn.init.kaiming_uniform_(self.weight, a = math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = nn.init.calculate_fan_in_and_fan_out(
                self.weight
            )
            bound = 1 / math.sqrt(fan_in)
            nn.init.uniform_(self.bias, -bound, bound)
    
    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self):
        return (
            f"in_features={self.in_features}, "
            f"out_features={self.out_features}, "
            f"bias={self.bias is not None}"
        )
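
Once defined, this custom Linear is used exactly like the built-in layer, for example:

layer = Linear(in_features = 20, out_features = 30)
x = torch.randn(128, 20)
print(layer(x).shape)   # torch.Size([128, 30])
print(layer)            # Linear(in_features=20, out_features=30, bias=True)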

functional and Module

Most of PyTorch's neural-network-related components live in the torch.nn module, and the vast majority of them are implemented both as functions and as classes:

functional

torch.nn.functional contains the function implementations of these components:

import torch.nn.functional as F

Example:

import torch
import torch.nn.functional as F

print(torch.relu(torch.tensor(-1.0)))
print(F.relu(torch.tensor(-1.0)))
tensor(0.)
tensor(0.)

Module

To make parameter management easier, most components are also converted into class implementations by subclassing torch.nn.Module and are exposed directly under the torch.nn module.

from torch import nn

In fact, torch.nn.Module can manage not only the parameters it references but also the submodules it references, which makes it very powerful.
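
For example, F.relu and nn.ReLU compute exactly the same thing; the class form becomes more convenient once a component carries parameters that need to be managed. A small sketch (shapes chosen arbitrarily):

import torch
from torch import nn
import torch.nn.functional as F

x = torch.tensor([-1.0, 2.0])
print(F.relu(x))        # tensor([0., 2.]), plain function call
print(nn.ReLU()(x))     # tensor([0., 2.]), same result via the module form

# a parameterized component is more natural as a module:
linear = nn.Linear(2, 3)                                # weight and bias are registered as parameters
print(linear(x).shape)                                  # torch.Size([3])
print(F.linear(x, linear.weight, linear.bias).shape)    # torch.Size([3]), functional form needs explicit parameters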

Using Module to Manage Parameters

In PyTorch, model parameters need to be trained by the optimizer, so they are usually tensors with requires_grad = True. A model often has a large number of parameters, and managing them by hand is not easy. PyTorch therefore represents parameters with nn.Parameter and uses nn.Module to manage all of the parameters under its structure.

import torch
from torch import nn
import torch.nn.functional as F

# nn.Parameter has requires_grad = True by default
w = nn.Parameter(torch.randn(2, 2))
print(w)
print(w.requires_grad)

# nn.ParameterList can hold multiple nn.Parameter objects
params_list = nn.ParameterList([
    nn.Parameter(torch.rand(8, i))
    for i in range(1, 3)
])
print(params_list)
print(params_list[0].requires_grad)

# nn.ParameterDict can hold multiple nn.Parameter objects
params_dict = nn.ParameterDict({
    "a": nn.Parameter(torch.rand(2, 2)),
    "b": nn.Parameter(torch.zeros(2)),
})
print(params_dict)
print(params_dict["a"].requires_grad)

# nn.Module can gather parameters, ParameterLists and ParameterDicts and manage them together
module = nn.Module()
module.w = nn.Parameter(
    torch.randn(2, 2)
)
module.params_list = nn.ParameterList([
    nn.Parameter(torch.rand(8, i))
    for i in range(1, 3)
])
module.params_dict = nn.ParameterDict({
    "a": nn.Parameter(torch.rand(2, 2)),
    "b": nn.Parameter(torch.zeros(2)),
})

# named_parameters() yields every parameter registered on the module
num_param = 0
for param in module.named_parameters():
    print(param, "\n")
    num_param = num_param + 1
print(f"Number of Parameters = {num_param}")

In practice, a module class is generally built by subclassing nn.Module, with all of the learnable parameters created as nn.Parameter inside the constructor, as in this simplified Linear layer:

class Linear(nn.Module):
    __constants__ = ["in_features", "out_features"]

    def __init__(self, in_features, out_features, bias = True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = nn.Parameter(
            torch.Tensor(out_features, in_features)
        )
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter("bias", None)
        
    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

Using Module to Manage Submodules

In general, models are rarely built by defining parameters directly with nn.Parameter; instead, they are constructed by assembling common model layers. These layers are themselves objects that inherit from nn.Module, contain their own parameters, and act as submodules of the module being defined.

nn.Module provides several methods for managing these submodules:

children(): returns a generator over the module's direct child modules.
named_children(): returns a generator over the direct child modules together with their names.
modules(): returns a generator over the modules at every level of the hierarchy, including the module itself.
named_modules(): returns a generator over the modules at every level of the hierarchy together with their names, including the module itself.

Among them, children() and named_children() are the most commonly used; modules() and named_modules() are needed less often, since their behaviour can be reproduced by nesting named_children() calls:

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()

        self.embedding = nn.Embedding(
            num_embeddings = 10000, 
            embedding_dim = 3, 
            padding_idx = 1
        )

        self.conv = nn.Sequential()
        self.conv.add_module(
            "conv_1", 
            nn.Conv1d(in_channels = 3, out_channels = 16, kernel_size = 5),
        )
        self.conv.add_module(
            "pool_1",
            nn.MaxPool1d(kernel_size = 2),
        )
        self.conv.add_module(
            "relu",
            nn.ReLU(),
        )
        self.conv.add_module(
            "conv_2",
            nn.Conv1d(in_channels = 16, out_channels = 128, kernel_size = 2),
        )
        self.conv.add_module(
            "pool_2",
            nn.MaxPool1d(kernel_size = 2),
        )
        self.conv.add_module(
            "relu_2",
            nn.ReLU(),
        )

        self.dense = nn.Sequential()
        self.dense.add_module("flatten", nn.Flatten())
        self.dense.add_module("linear", nn.Linear(6144, 1))
    
    def forward(self, x):
        # Conv1d expects (batch, channels, length), so move the embedding dim forward
        x = self.embedding(x).transpose(1, 2)
        x = self.conv(x)
        y = self.dense(x)
        return y

net = Net()

# children() iterates over the direct submodules only
i = 0
for child in net.children():
    i += 1
    print(child, "\n")
print("child number", i)
Embedding(10000, 3, padding_idx=1) 

Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
) 

Sequential(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear): Linear(in_features=6144, out_features=1, bias=True)
) 

child number 3
# named_children() also yields each submodule's name
i = 0
for name, child in net.named_children():
    i += 1
    print(name, ":", child, "\n")
print("child number", i)
embedding : Embedding(10000, 3, padding_idx=1) 

conv : Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
) 

dense : Sequential(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear): Linear(in_features=6144, out_features=1, bias=True)
) 

child number 3
# modules() traverses every level of the hierarchy, including net itself
i = 0
for module in net.modules():
    i += 1
    print(module)
print("module number:", i)
Net(
  (embedding): Embedding(10000, 3, padding_idx=1)
  (conv): Sequential(
    (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
    (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (relu_1): ReLU()
    (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
    (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (relu_2): ReLU()
  )
  (dense): Sequential(
    (flatten): Flatten(start_dim=1, end_dim=-1)
    (linear): Linear(in_features=6144, out_features=1, bias=True)
  )
)
Embedding(10000, 3, padding_idx=1)
Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
)
Conv1d(3, 16, kernel_size=(5,), stride=(1,))
MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
ReLU()
Conv1d(16, 128, kernel_size=(2,), stride=(1,))
MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
ReLU()
Sequential(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear): Linear(in_features=6144, out_features=1, bias=True)
)
Flatten(start_dim=1, end_dim=-1)
Linear(in_features=6144, out_features=1, bias=True)
module number: 12
children_dict = {
    name: module for name, module in net.named_children()
}
print(children_dict)

# freeze the embedding layer so its parameters are no longer trained
embedding = children_dict["embedding"]
embedding.requires_grad_(False)
{'embedding': Embedding(10000, 3, padding_idx=1), 'conv': Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
), 'dense': Sequential(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear): Linear(in_features=6144, out_features=1, bias=True)
)}
Embedding(10000, 3, padding_idx=1)
# the parameters of the first layer can no longer be trained
for param in embedding.parameters():
    print(param.requires_grad)
    print(param.numel())
False
30000
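
After freezing a submodule like this, a common next step is to hand only the parameters that still require gradients to the optimizer. A minimal sketch, assuming the net defined above (the learning rate is arbitrary):

import torch

optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, net.parameters()),   # skips the frozen embedding weights
    lr = 1e-3,
)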