# Implementing Regression

## Linear Regression

First, import the basic packages:
```python
import random

import numpy as np               # needed below for np.arange / np.array
import matplotlib.pyplot as plt
import torch as t
import seaborn as sns

sns.set()
sns.set_context("notebook", rc={"font.size": 16, "axes.titlesize": 20, "axes.labelsize": 18})

# A custom color palette for all plots
CB91_Blue = '#2CBDFE'
CB91_Green = '#47DBCD'
CB91_Pink = '#F3A0F2'
CB91_Purple = '#9D2EC5'
CB91_Violet = '#661D98'
CB91_Amber = '#F5B14C'
color_list = [CB91_Blue, CB91_Pink, CB91_Green,
              CB91_Amber, CB91_Purple, CB91_Violet]
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=color_list)
```
Generate some random data:
```python
x = np.arange(20)
y = np.array([5 * x[i] + random.randint(1, 20) for i in range(len(x))])  # y = 5x plus noise
```
```python
print(x)
print(y)
```

```
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[  2  20  24  24  30  43  40  55  45  48  66  73  80  76  75  79  88 104 100 107]
```
The scatter plot of the data is shown below.
To use this data for training, it must first be converted to Tensors:
```python
x_train = t.from_numpy(x).float()
y_train = t.from_numpy(y).float()
print(x_train)
print(y_train)
```

```
tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13.,
        14., 15., 16., 17., 18., 19.])
tensor([  2.,  20.,  24.,  24.,  30.,  43.,  40.,  55.,  45.,  48.,  66.,  73.,
         80.,  76.,  75.,  79.,  88., 104., 100., 107.])
```
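One detail worth knowing: `torch.from_numpy` does not copy data; the returned tensor shares memory with the NumPy array. The subsequent `.float()` call does create a copy, because it changes the dtype. A small demonstration:

```python
a = np.arange(3)
b = t.from_numpy(a)          # shares memory with a
a[0] = 100
print(b)                     # tensor([100, 1, 2]) -- the tensor sees the change

c = t.from_numpy(a).float()  # .float() creates a new float32 copy
a[0] = -1
print(c)                     # tensor([100., 1., 2.]) -- unaffected
```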
Define the model:
```python
class LinearRegression(t.nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        # One input feature, one output: y = wx + b
        self.linear = t.nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)
```
The `super()` call works as follows: it first finds the parent class `t.nn.Module`, treats the `LinearRegression` instance `self` as an instance of `t.nn.Module`, and then invokes `t.nn.Module`'s own `__init__` on it.
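In Python 3 the same call can be written without arguments. A minimal sketch (the class name `LinearRegressionV2` is made up for illustration; the behavior is identical):

```python
class LinearRegressionV2(t.nn.Module):
    def __init__(self):
        super().__init__()   # Python 3 shorthand for super(LinearRegressionV2, self).__init__()
        self.linear = t.nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)
```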
Train the model:
```python
model = LinearRegression()
criterion = t.nn.MSELoss()
optimizer = t.optim.SGD(model.parameters(), 0.001)
num_epochs = 1000

for i in range(num_epochs):
    # nn.Linear expects shape (N, 1), so add a feature dimension
    input_data = x_train.unsqueeze(1)
    target = y_train.unsqueeze(1)
    out = model(input_data)
    loss = criterion(out, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (i + 1) % 200 == 0:
        predict = model(input_data)
        plt.plot(x_train.data.numpy(), predict.squeeze(1).data.numpy())
        loss = criterion(predict, target)
        plt.title("Loss:{:.4f}".format(loss.item()))
        plt.xlabel("X")
        plt.ylabel("Y")
        plt.scatter(x_train, y_train)
        plt.show()
```
The model is trained for 1000 iterations, and the fitted line is plotted every 200 iterations.
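After training, the learned parameters can be read directly off the `nn.Linear` layer. Since the data was generated as $y = 5x$ plus noise drawn uniformly from $[1, 20]$, the learned slope should be close to 5, and the intercept should roughly absorb the mean of the noise (about 10.5). Exact values vary between runs:

```python
print(model.linear.weight.item())  # slope, should be close to 5
print(model.linear.bias.item())    # intercept, roughly the mean of the noise
```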
## Polynomial Regression

Linear regression fits a straight line, but its precision is limited: the final loss in the figure above is 43.2122, which is quite high. We can improve the model with polynomial regression, which raises the degree of the features. In the linear regression above, $x$ appears only to the first power; we can also use quadratic and cubic terms. This increases the model's capacity, though it may also cause overfitting.
### Example Ⅰ

Let's implement a simple polynomial regression, fitting the model to a more complex polynomial:

$$f(x) = -1.13x - 2.14x^2 + 3.15x^3 - 0.01x^4 + 0.512$$
Prepare the polynomial data:
```python
x = t.linspace(-3, 3, 50)
y = -1.13 * x - 2.14 * t.pow(x, 2) + 3.15 * t.pow(x, 3) - 0.01 * t.pow(x, 4) + 0.512
plt.scatter(x.data.numpy(), y.data.numpy())
```
This generates 50 evenly spaced points from the formula; the visualization is shown below.
The input is no longer a single scalar feature as in the linear case; each sample now has four features, so the input takes the form of a matrix:

$$X = \begin{bmatrix} x_1 & x_1^2 & x_1^3 & x_1^4 \\ x_2 & x_2^2 & x_2^3 & x_2^4 \\ \vdots & \vdots & \vdots & \vdots \\ x_n & x_n^2 & x_n^3 & x_n^4 \end{bmatrix}$$
where $x_n^4$ denotes the fourth feature of the $n$-th sample.
The following function assembles the data into this matrix form:
```python
def features(x):
    x = x.unsqueeze(1)                              # shape (n,) -> (n, 1)
    return t.cat([x ** i for i in range(1, 5)], 1)  # columns: x, x^2, x^3, x^4
```
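A quick sanity check of the shapes and values:

```python
print(features(t.linspace(-1, 1, 3)))
# tensor([[-1.,  1., -1.,  1.],
#         [ 0.,  0.,  0.,  0.],
#         [ 1.,  1.,  1.,  1.]])
print(features(x).shape)  # torch.Size([50, 4])
```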
Now we have the standard input matrix $X$, but we still need the target values $y$, which are computed from the function $f(x)$:
```python
x_weight = t.Tensor([-1.13, -2.14, 3.15, -0.01])  # true coefficients of f(x)
x_weight = x_weight.unsqueeze(1)                  # shape (4,) -> (4, 1)
b = t.Tensor([0.512])                             # true bias

def target(x):
    return x.mm(x_weight) + b.item()
```
The code above uses the Tensor method `mm`, which performs matrix multiplication (both operands must be 2-D, which is why `x_weight` was reshaped into a column vector). With these two functions we can now batch-generate training data:
```python
def get_data(batch_size):
    batch_x = t.randn(batch_size)  # sample random inputs
    feature_x = features(batch_x)  # build the (batch_size, 4) feature matrix
    target_y = target(feature_x)   # compute f(x) for each sample
    return feature_x, target_y
```
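As an aside, a toy illustration of the shape rule behind `mm` (the tensors here are made up for demonstration): an $(n \times m)$ matrix times an $(m \times p)$ matrix yields an $(n \times p)$ result.

```python
A = t.ones(3, 4)      # (3, 4), e.g. 3 samples with 4 features
w = t.ones(4, 1)      # (4, 1) weight column vector
print(A.mm(w).shape)  # torch.Size([3, 1]), one prediction per sample
print(A.mm(w))        # each entry is 4.0 (the sum of four ones)
```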
Next, create the polynomial regression model, again based on `torch.nn.Linear`:
```python
class PolynomialRegression(t.nn.Module):
    def __init__(self):
        super(PolynomialRegression, self).__init__()
        # Four input features (x, x^2, x^3, x^4), one output
        self.poly = t.nn.Linear(4, 1)

    def forward(self, x):
        return self.poly(x)
```
Once the model is built, start training. To show progress dynamically, the program makes a prediction on the test data every 200 epochs and visualizes the prediction together with its loss.
```python
epochs = 1000
batch_size = 32
model = PolynomialRegression()
criterion = t.nn.MSELoss()
optimizer = t.optim.SGD(model.parameters(), 0.001)

for epoch in range(epochs):
    batch_x, batch_y = get_data(batch_size)
    out = model(batch_x)
    loss = criterion(out, batch_y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch % 200 == 0:
        predict = model(features(x))
        plt.plot(x.data.numpy(), predict.squeeze(1).data.numpy(), color=CB91_Pink)
        loss = criterion(predict, y.unsqueeze(1))
        plt.title("Loss:{:.4f}".format(loss.item()))
        plt.xlabel("X")
        plt.ylabel("Y")
        plt.scatter(x, y)
        plt.show()
```
After enough iterations, the model fits the test data well.
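Because the model has exactly the same form as the generating function, we can also compare the learned parameters against the true coefficients $(-1.13, -2.14, 3.15, -0.01)$ and bias $0.512$; after convergence they should be close:

```python
print(model.poly.weight.data)  # should approach tensor([[-1.13, -2.14,  3.15, -0.01]])
print(model.poly.bias.data)    # should approach tensor([0.512])
```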
### Example Ⅱ

Now let's fit a more complex curve: a heart-shaped curve, given parametrically by

$$x = 16\sin^3(\theta), \qquad y = 13\cos(\theta) - 5\cos(2\theta) - 2\cos(3\theta) - \cos(4\theta)$$
First, generate 10000 points from the formula:
```python
e = t.linspace(-15, 15, 10000)  # the parameter theta
x1 = 16 * t.sin(e) ** 3
y1 = 13 * t.cos(e) - 5 * t.cos(2 * e) - 2 * t.cos(3 * e) - t.cos(4 * e)
plt.scatter(x1.data.numpy(), y1.data.numpy())
```
The visualization is shown below.
Unlike Example Ⅰ, there are now two output variables: we need to predict both X and Y.
Build the feature matrices for X and Y separately:
```python
def fea_x(e):
    e = e.unsqueeze(1)
    return t.sin(e) ** 3  # single feature: sin^3(theta)

def fea_y(e):
    e = e.unsqueeze(1)
    return t.cat([t.cos(i * e) for i in range(1, 5)], 1)  # features: cos(theta)..cos(4*theta)
```
With these two functions we can batch-generate training data for X and Y:
```python
# Target weights for y, read off the heart-curve formula
y_weight = t.Tensor([13., -5., -2., -1.]).unsqueeze(1)  # shape (4, 1)

def tg_y(y_fea):
    return y_fea.mm(y_weight)

def data(size):
    r = t.randn(size)
    x_fea = fea_x(r)
    y_fea = fea_y(r)
    x_tg = 16 * x_fea   # target for x
    y_tg = tg_y(y_fea)  # target for y
    return x_fea, y_fea, x_tg, y_tg
```
Next, define the models. Since there are two variables to predict, we define one model for X and one for Y:
```python
class HeartModel_x(t.nn.Module):
    def __init__(self):
        super(HeartModel_x, self).__init__()
        self.heartx = t.nn.Linear(1, 1)  # one feature: sin^3(theta)

    def forward(self, x):
        return self.heartx(x)

class HeartModel_y(t.nn.Module):
    def __init__(self):
        super(HeartModel_y, self).__init__()
        self.hearty = t.nn.Linear(4, 1)  # four features: cos(theta)..cos(4*theta)

    def forward(self, x):
        return self.hearty(x)
```
Finally, train the models. This is largely the same as Example Ⅰ, except that X and Y are trained simultaneously:
```python
epochs = 50000
batch_size = 10000
model_x = HeartModel_x()
model_y = HeartModel_y()
criterion = t.nn.MSELoss()
optimizer1 = t.optim.SGD(model_x.parameters(), 0.001)
optimizer2 = t.optim.SGD(model_y.parameters(), 0.001)

for epoch in range(epochs):
    x_train, y_train, x_target, y_target = data(batch_size)
    out_x = model_x(x_train)
    out_y = model_y(y_train)
    loss_x = criterion(out_x, x_target)
    loss_y = criterion(out_y, y_target)

    optimizer1.zero_grad()
    optimizer2.zero_grad()
    loss_x.backward()
    loss_y.backward()
    optimizer1.step()
    optimizer2.step()

    if epoch % 10000 == 0:
        predict_x = model_x(fea_x(e))
        predict_y = model_y(fea_y(e))
        plt.plot(predict_x.squeeze(1).data.numpy(),
                 predict_y.squeeze(1).data.numpy(), color=CB91_Pink)
        loss_x = criterion(predict_x, x1.unsqueeze(1))
        loss_y = criterion(predict_y, y1.unsqueeze(1))
        plt.title("Loss X:{:.4f},Loss Y:{:.4f}".format(loss_x.item(), loss_y.item()))
        plt.xlabel("X")
        plt.ylabel("Y")
        plt.scatter(x1, y1)
        plt.show()
```
The training result is plotted every 10000 iterations; the visualizations are shown below.
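As in Example Ⅰ, the learned parameters can be checked against the generating formula: the X model should learn a weight near 16, the Y model weights near $(13, -5, -2, -1)$, and both biases should be near 0:

```python
print(model_x.heartx.weight.item())  # should approach 16
print(model_y.hearty.weight.data)    # should approach tensor([[13., -5., -2., -1.]])
print(model_y.hearty.bias.item())    # should approach 0
```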