Neural Networks

Neurons

[Figure: a neuron]

A neuron is essentially the same as a perceptron, except that when we speak of a neuron, the activation function is usually chosen to be the $\text{Sigmoid}$ function or the $\tanh$ function.

The $\text{Sigmoid}$ function is defined as:

$$\text{Sigmoid}(x)=\frac{1}{1+e^{-x}}$$

Then for the output $y$:

$$y=\frac{1}{1+e^{-\omega^T\cdot x}}$$

[Figure: plot of the Sigmoid function]

Let $y=\text{Sigmoid}(x)$; then $y'=y(1-y)$.

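The derivative identity $y'=y(1-y)$ can be verified numerically with a central difference (a standalone sketch; `sigmoid_prime` is a helper name introduced here):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # Analytic derivative via the identity y' = y * (1 - y)
    y = sigmoid(x)
    return y * (1.0 - y)

# Central-difference approximation at a sample point
x, eps = 0.5, 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(abs(numeric - sigmoid_prime(x)) < 1e-8)  # True
```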
Neural Networks

A neural network is simply a collection of neurons connected according to certain rules.

[Figure: a fully connected network]

The figure shows a **fully connected** neural network.

  • Neurons are arranged in layers. The leftmost layer is the input layer, which receives the input data; the rightmost layer is the output layer, from which we read the network's output. The layers in between are called hidden layers, because they are invisible from the outside.
  • Neurons within the same layer are not connected to each other.
  • Every neuron in layer N is connected to all neurons in layer N-1, and the outputs of layer N-1 are the inputs of layer N.
  • Every connection has a weight.

Computing the Output of a Neural Network

[Figure: a neural network]

Take the computation at node 4 as an example:

$$a_4=\text{sigmoid}(\omega^Tx)=\text{sigmoid}(\omega_{41}x_1+\omega_{42}x_2+\omega_{43}x_3+\omega_{4b})$$

Here $\omega_{4b}$ is the bias term of node 4, which is not drawn in the figure. When numbering a weight $\omega_{ji}$, we put the index $j$ of the target node first and the index $i$ of the source node second.

Once the outputs of all nodes have been computed, we obtain the network's output vector $\vec y=\begin{bmatrix}y_1\\y_2\end{bmatrix}$ for the input vector $\vec x=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}$.

The dimension of the input vector equals the number of input-layer neurons, and the dimension of the output vector equals the number of output-layer neurons.
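As a concrete instance of the formula for node 4 (a minimal sketch; the weights, bias, and inputs below are made-up placeholder values, not taken from the figure):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical weights (w41, w42, w43), bias w4b, and inputs (x1, x2, x3)
w = [0.2, -0.5, 0.1]
w_b = 0.3
x = [1.0, 2.0, 3.0]

# net4 = w41*x1 + w42*x2 + w43*x3 + w4b, then a4 = sigmoid(net4)
net4 = sum(wi * xi for wi, xi in zip(w, x)) + w_b
a4 = sigmoid(net4)
print(round(a4, 4))
```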

Training a Neural Network

Parameters that are set by hand rather than learned, such as the network's connectivity, the number of layers, and the number of nodes per layer, are called hyperparameters.

[Figure: a neural network]

Derivation of the Backpropagation Algorithm

For the objective function

$$E_d\equiv\frac{1}{2}\sum_{i\in\text{outputs}}(t_i-y_i)^2$$

we again use stochastic gradient descent:

$$\omega_{ji}\leftarrow\omega_{ji}-\eta\frac{\partial E_d}{\partial \omega_{ji}}$$

Let $net_j$ be the weighted input of node $j$, i.e.

$$net_j=\vec{\omega_j}\cdot\vec{x_j}=\sum_i\omega_{ji}x_{ji}$$

We still need $\frac{\partial E_d}{\partial \omega_{ji}}$, and

$$\begin{aligned}\frac{\partial E_d}{\partial\omega_{ji}}&=\frac{\partial E_d}{\partial net_j}\frac{\partial net_j}{\partial \omega_{ji}}\\&=\frac{\partial E_d}{\partial net_j}\frac{\partial\sum_i\omega_{ji}x_{ji}}{\partial\omega_{ji}}\\&=\frac{\partial E_d}{\partial net_j}x_{ji}\end{aligned}$$

The problem thus reduces to computing $\frac{\partial E_d}{\partial net_j}$.

Training the Output-Layer Weights

For the output layer, $E_d$ is a function of $y_j$, and $y_j$ is in turn a function of $net_j$, so

$$\frac{\partial E_d}{\partial net_j}=\frac{\partial E_d}{\partial y_j}\frac{\partial y_j}{\partial net_j}$$

where

$$\begin{aligned}\frac{\partial E_d}{\partial y_j}&=\frac{\partial}{\partial y_j}\frac{1}{2}\sum_{i\in \text{outputs}}(t_i-y_i)^2\\&=-(t_j-y_j)\end{aligned}$$

$$\frac{\partial y_j}{\partial net_j}=y_j(1-y_j)$$

Substituting, we get:

$$\frac{\partial E_d}{\partial net_j}=-(t_j-y_j)y_j(1-y_j)$$

Let $\delta_j=-\frac{\partial E_d}{\partial net_j}$ be the error term of node $j$. Substituting gives:

$$\delta_j=(t_j-y_j)y_j(1-y_j)$$

This yields the update rule for $\omega_{ji}$:

$$\begin{aligned}\omega_{ji}&\leftarrow\omega_{ji}-\eta\frac{\partial E_d}{\partial \omega_{ji}}\\&=\omega_{ji}+\eta\delta_jx_{ji}\end{aligned}$$

Training the Hidden-Layer Weights

For a hidden layer, $E_d$ is a function of the weighted inputs $net_k$ of the downstream nodes, and each $net_k$ is a function of the hidden node's weighted input $net_j$, so

$$\begin{aligned}\frac{\partial E_d}{\partial net_j}&=\sum_{k\in\text{downstream}(j)}\frac{\partial E_d}{\partial net_k}\frac{\partial net_k}{\partial net_j}\\&=\sum_{k\in\text{downstream}(j)}-\delta_k\frac{\partial net_k}{\partial a_j}\frac{\partial a_j}{\partial net_j}\\&=\sum_{k\in\text{downstream}(j)}-\delta_k\omega_{kj}\frac{\partial a_j}{\partial net_j}\\&=\sum_{k\in\text{downstream}(j)}-\delta_k\omega_{kj}a_j(1-a_j)\\&=-a_j(1-a_j)\sum_{k\in\text{downstream}(j)}\delta_k\omega_{kj}\end{aligned}$$

Since $\delta_j=-\frac{\partial E_d}{\partial net_j}$, substituting gives:

$$\delta_j=a_j(1-a_j)\sum_{k\in\text{downstream}(j)}\delta_k\omega_{kj}$$
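The output-layer and hidden-layer error-term formulas can be exercised on made-up numbers (a standalone sketch: one hidden node $j$ feeding two output nodes, with hypothetical outputs, targets, and weights):

```python
# Hypothetical values: one hidden node j feeding two output nodes
y = [0.8, 0.3]        # outputs y_k of the two output nodes
t = [1.0, 0.0]        # targets t_k
a_j = 0.6             # output of hidden node j
w_kj = [0.4, -0.2]    # weights from node j to each output node

# Output-layer error terms: delta_k = (t_k - y_k) * y_k * (1 - y_k)
delta_out = [(tk - yk) * yk * (1 - yk) for tk, yk in zip(t, y)]

# Hidden-layer error term: delta_j = a_j * (1 - a_j) * sum_k(delta_k * w_kj)
delta_j = a_j * (1 - a_j) * sum(dk * wk for dk, wk in zip(delta_out, w_kj))
print(delta_out, delta_j)
```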

The Backpropagation Algorithm

[Figure: a neural network]

We have now derived how to compute $\delta_i$ for both the output layer and the hidden layers. The backpropagation algorithm runs as follows, using the network in the figure above as an example:

  1. First, compute the outputs $a_4,a_5,a_6,a_7$ of hidden-layer nodes 4, 5, 6, 7 from the inputs $x_1,x_2,x_3$ and the activation function $\text{sigmoid}(x)$.

  2. From $a_4,a_5,a_6,a_7$, compute the outputs $y_1,y_2$ of output-layer nodes 8 and 9.

  3. From $y_1,y_2$ and the labels $t_1,t_2$, compute the error terms $\delta_8,\delta_9$ of output-layer nodes 8 and 9.

    For an output-layer node $i$:

    $\delta_i=y_i(1-y_i)(t_i-y_i)$

  4. From $\delta_8,\delta_9$ and the hidden-to-output weights $\omega$, compute the error terms $\delta_4,\delta_5,\delta_6,\delta_7$ of hidden-layer nodes 4, 5, 6, 7.

    For a hidden-layer node $i$:

    $\delta_i=a_i(1-a_i)\sum_{k\in\text{outputs}}\delta_k\omega_{ki}$

  5. Finally, update the weight on every connection:

    $\omega_{ji}\leftarrow\omega_{ji}+\eta\delta_jx_{ji}$

    The input to a bias term is always 1, so the bias weight $\omega_{jb}$ is updated as:

    $\omega_{jb}\leftarrow\omega_{jb}+\eta\delta_j$
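The five steps above can be sketched end to end in NumPy for a 3-4-2 network like the one in the figure (a minimal standalone sketch: the weights `W1`, `W2`, the input, and the targets are random or made-up placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# A 3-4-2 network: W1 maps input -> hidden, W2 maps hidden -> output
W1 = rng.uniform(-0.1, 0.1, (4, 3)); b1 = np.zeros(4)
W2 = rng.uniform(-0.1, 0.1, (2, 4)); b2 = np.zeros(2)
x = np.array([1.0, 2.0, 3.0])      # inputs x1, x2, x3
t = np.array([1.0, 0.0])           # labels t1, t2
eta = 0.3

# Steps 1-2: forward pass
a = sigmoid(W1 @ x + b1)           # hidden outputs a4..a7
y = sigmoid(W2 @ a + b2)           # network outputs y1, y2

# Step 3: output-layer error terms delta = y(1-y)(t-y)
delta_y = y * (1 - y) * (t - y)
# Step 4: hidden-layer error terms delta = a(1-a) * sum_k(delta_k * w_kj)
delta_a = a * (1 - a) * (W2.T @ delta_y)

# Step 5: weight and bias updates
W2 += eta * np.outer(delta_y, a); b2 += eta * delta_y
W1 += eta * np.outer(delta_a, x); b1 += eta * delta_a
```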

Implementing the Neural Network

```python
import math
import random
from functools import reduce

def sigmoid(x):
    # Sigmoid function
    return 1 / (1 + math.exp(-x))

class Node(object):
    # A regular node
    def __init__(self, layer_index, node_index):
        self.layer_index = layer_index
        self.node_index = node_index
        self.downstream = []
        self.upstream = []
        self.output = 0.0
        self.delta = 0.0

    def set_output(self, output):
        self.output = output

    def append_downstream_connection(self, conn):
        self.downstream.append(conn)

    def append_upstream_connection(self, conn):
        self.upstream.append(conn)

    def calc_output(self):
        output = reduce(
            lambda ret, conn: ret + conn.upstream_node.output * conn.weight,
            self.upstream, 0.0)
        self.output = sigmoid(output)

    def calc_hidden_layer_delta(self):
        # Hidden-node error term: delta = a(1-a) * sum_k(delta_k * w_kj)
        downstream_delta = reduce(
            lambda ret, conn: ret + conn.downstream_node.delta * conn.weight,
            self.downstream, 0.0)
        self.delta = self.output * (1 - self.output) * downstream_delta

    def calc_output_layer_delta(self, label):
        # Output-node error term: delta = y(1-y)(t-y)
        self.delta = self.output * (1 - self.output) * (label - self.output)

    def __str__(self):
        node_str = '%u-%u: output:%f, delta:%f' % (
            self.layer_index, self.node_index, self.output, self.delta)
        downstream_str = reduce(lambda ret, conn: ret + '\n\t' + str(conn), self.downstream, '')
        upstream_str = reduce(lambda ret, conn: ret + '\n\t' + str(conn), self.upstream, '')
        return node_str + '\n\tdownstream:' + downstream_str + '\n\tupstream:' + upstream_str

class ConstNode(object):
    # Bias node: its output is always 1
    def __init__(self, layer_index, node_index):
        self.layer_index = layer_index
        self.node_index = node_index
        self.downstream = []
        self.output = 1.0
        self.delta = 0.0

    def append_downstream_connection(self, conn):
        self.downstream.append(conn)

    def calc_hidden_layer_delta(self):
        downstream_delta = reduce(
            lambda ret, conn: ret + conn.downstream_node.delta * conn.weight,
            self.downstream, 0.0)
        self.delta = self.output * (1 - self.output) * downstream_delta

    def __str__(self):
        node_str = '%u-%u: output:1' % (self.layer_index, self.node_index)
        downstream_str = reduce(lambda ret, conn: ret + '\n\t' + str(conn), self.downstream, '')
        return node_str + '\n\tdownstream:' + downstream_str

class Layer(object):
    # A layer: node_count regular nodes plus one bias node
    def __init__(self, layer_index, node_count):
        self.layer_index = layer_index
        self.nodes = []
        for i in range(node_count):
            self.nodes.append(Node(layer_index, i))
        self.nodes.append(ConstNode(layer_index, node_count))

    def set_output(self, data):
        for i in range(len(data)):
            self.nodes[i].set_output(data[i])

    def calc_output(self):
        # Skip the bias node, whose output is fixed at 1
        for node in self.nodes[:-1]:
            node.calc_output()

    def dump(self):
        for node in self.nodes:
            print(node)

class Connection(object):
    # A weighted connection between two nodes
    def __init__(self, upstream_node, downstream_node):
        self.upstream_node = upstream_node
        self.downstream_node = downstream_node
        self.weight = random.uniform(-0.1, 0.1)  # initialize the weight
        self.gradient = 0.0

    def calc_gradient(self):
        self.gradient = self.downstream_node.delta * self.upstream_node.output

    def get_gradient(self):
        return self.gradient

    def update_weight(self, rate):
        # w <- w + eta * delta * x
        self.calc_gradient()
        self.weight += rate * self.gradient

    def __str__(self):
        return '%u-%u -> %u-%u: weight:%f, gradient:%f' % (
            self.upstream_node.layer_index,
            self.upstream_node.node_index,
            self.downstream_node.layer_index,
            self.downstream_node.node_index,
            self.weight, self.gradient)

class Connections(object):
    def __init__(self):
        self.connections = []

    def add_connection(self, connection):
        self.connections.append(connection)

    def dump(self):
        for conn in self.connections:
            print(conn)

class Network(object):
    # The neural network
    def __init__(self, layers):
        self.connections = Connections()
        self.layers = []
        layer_count = len(layers)
        for i in range(layer_count):
            # build the layers
            self.layers.append(Layer(i, layers[i]))
        for layer in range(layer_count - 1):
            # fully connect adjacent layers (bias nodes receive no upstream connections)
            connections = [Connection(upstream_node, downstream_node)
                           for upstream_node in self.layers[layer].nodes
                           for downstream_node in self.layers[layer + 1].nodes[:-1]]
            for conn in connections:
                self.connections.add_connection(conn)
                conn.upstream_node.append_downstream_connection(conn)
                conn.downstream_node.append_upstream_connection(conn)

    def train(self, labels, data_set, rate, iteration):
        for i in range(iteration):
            for d in range(len(data_set)):
                self.train_one_sample(labels[d], data_set[d], rate)

    def train_one_sample(self, label, sample, rate):
        self.predict(sample)      # forward pass
        self.calc_delta(label)    # backward pass
        self.update_weight(rate)  # weight update

    def calc_delta(self, label):
        output_nodes = self.layers[-1].nodes
        for i in range(len(label)):
            output_nodes[i].calc_output_layer_delta(label[i])
        for layer in self.layers[-2::-1]:
            for node in layer.nodes:
                node.calc_hidden_layer_delta()

    def update_weight(self, rate):
        for layer in self.layers[:-1]:
            for node in layer.nodes:
                for conn in node.downstream:
                    conn.update_weight(rate)

    def calc_gradient(self):
        for layer in self.layers[:-1]:
            for node in layer.nodes:
                for conn in node.downstream:
                    conn.calc_gradient()

    def get_gradient(self, label, sample):
        self.predict(sample)
        self.calc_delta(label)
        self.calc_gradient()

    def predict(self, sample):
        self.layers[0].set_output(sample)
        for i in range(1, len(self.layers)):
            self.layers[i].calc_output()
        return list(map(lambda node: node.output, self.layers[-1].nodes[:-1]))

    def dump(self):
        for layer in self.layers:
            layer.dump()
```

Gradient Checking

How do we verify that the gradients are computed correctly?

For $\frac{\partial E_d}{\partial\omega_{ji}}$,

$$\frac{\partial E_d(\omega_{ji})}{\partial\omega_{ji}}=\lim_{\epsilon\to0}\frac{E_d(\omega_{ji}+\epsilon)-E_d(\omega_{ji}-\epsilon)}{2\epsilon}$$

Taking $\epsilon$ to be a small number such as $10^{-4}$,

$$\frac{\partial E_d(\omega_{ji})}{\partial\omega_{ji}}\approx\frac{E_d(\omega_{ji}+\epsilon)-E_d(\omega_{ji}-\epsilon)}{2\epsilon}$$

```python
from functools import reduce

def gradient_check(network, sample_feature, sample_label):
    # Squared error: 1/2 * sum((a - b)^2)
    network_error = lambda vec1, vec2: \
        0.5 * reduce(lambda a, b: a + b,
                     map(lambda v: (v[0] - v[1]) * (v[0] - v[1]),
                         zip(vec1, vec2)))

    # Compute the analytic gradients for this sample
    network.get_gradient(sample_label, sample_feature)

    # Check the gradient on every connection
    for conn in network.connections.connections:
        actual_gradient = conn.get_gradient()
        epsilon = 0.0001
        conn.weight += epsilon
        error1 = network_error(network.predict(sample_feature), sample_label)
        conn.weight -= 2 * epsilon
        error2 = network_error(network.predict(sample_feature), sample_label)
        expected_gradient = (error2 - error1) / (2 * epsilon)
        conn.weight += epsilon  # restore the original weight
        print('expected gradient: \t%f\nactual gradient: \t%f' % (
            expected_gradient, actual_gradient))
```

Neural Networks in Practice: Handwritten Digit Recognition

Choosing the Hyperparameters

The number of input-layer nodes is fixed: each MNIST training sample is a $28\times28$ image, i.e. $784$ pixels, so the input layer should have $784$ nodes, one per pixel. The number of output-layer nodes is also fixed: since a digit can only be one of 0-9, this is a $10$-way classification problem, so we can use $10$ output nodes. The class corresponding to the node with the largest output is the model's prediction.

The number of hidden-layer nodes is harder to choose. A few rules of thumb follow.

With $n$ the number of input nodes, $l$ the number of output nodes, and $\alpha$ a constant between 1 and 10:

$$\begin{aligned}m&=\sqrt{n+l}+\alpha\\m&=\log_2n\\m&=\sqrt{nl}\end{aligned}$$

Here we initially set the number of hidden nodes to $300$.

Model Training and Evaluation

MNIST has 10,000 test samples. We first train the network on the 60,000 training samples, then test it on the test samples. The error rate is

$$\text{error rate}=\frac{\text{number of misclassified samples}}{\text{total number of samples}}$$

We evaluate the accuracy every 10 epochs, and stop training once the accuracy starts to drop (i.e. overfitting sets in).

Code Implementation

First, the MNIST data must be converted into a form the network can consume. Each $28\times28$ image is flattened row-major into a 784-dimensional vector. Each label is a value from 0 to 9 and is converted into a 10-dimensional one-hot style vector: if the label is $n$, the $n$-th component is set to 0.9 and all other components to 0.1.
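Plugging $n=784$ and $l=10$ into the three rules of thumb (a quick sketch; the choice $\alpha=5$ is arbitrary within the 1-10 range):

```python
import math

# Rules of thumb for the hidden-layer size, with n inputs and l outputs
n, l, alpha = 784, 10, 5

m1 = math.sqrt(n + l) + alpha   # sqrt(n + l) + alpha
m2 = math.log2(n)               # log2(n)
m3 = math.sqrt(n * l)           # sqrt(n * l)
print(round(m1, 1), round(m2, 1), round(m3, 1))
```

The three heuristics disagree widely here, which is why the text simply fixes the hidden size at 300 and relies on evaluation to judge the choice.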

```python
import struct
from datetime import datetime
from bp import *   # the Network class implemented above, assumed saved as bp.py

# Base data loader
class Loader(object):
    def __init__(self, path, count):
        self.path = path
        self.count = count

    def get_file_content(self):
        with open(self.path, 'rb') as f:
            return f.read()

    def to_int(self, byte):
        # In Python 3, indexing a bytes object already yields an int
        if isinstance(byte, int):
            return byte
        return struct.unpack('B', byte)[0]

# Image data loader
class ImageLoader(Loader):
    def get_picture(self, content, index):
        start = index * 28 * 28 + 16   # skip the 16-byte IDX header
        picture = []
        for i in range(28):
            picture.append([])
            for j in range(28):
                picture[i].append(
                    self.to_int(content[start + i * 28 + j]))
        return picture

    def get_one_sample(self, picture):
        # Flatten the 28x28 image row-major into a 784-vector
        sample = []
        for i in range(28):
            for j in range(28):
                sample.append(picture[i][j])
        return sample

    def load(self):
        content = self.get_file_content()
        data_set = []
        for index in range(self.count):
            data_set.append(
                self.get_one_sample(
                    self.get_picture(content, index)))
        return data_set

# Label data loader
class LabelLoader(Loader):
    def load(self):
        content = self.get_file_content()
        labels = []
        for index in range(self.count):
            labels.append(self.norm(content[index + 8]))   # skip the 8-byte IDX header
        return labels

    def norm(self, label):
        # One-hot style encoding: 0.9 for the true class, 0.1 elsewhere
        label_vec = []
        label_value = self.to_int(label)
        for i in range(10):
            if i == label_value:
                label_vec.append(0.9)
            else:
                label_vec.append(0.1)
        return label_vec

def get_training_data_set():
    image_loader = ImageLoader('train-images-idx3-ubyte', 60000)
    label_loader = LabelLoader('train-labels-idx1-ubyte', 60000)
    return image_loader.load(), label_loader.load()

def get_test_data_set():
    image_loader = ImageLoader('t10k-images-idx3-ubyte', 10000)
    label_loader = LabelLoader('t10k-labels-idx1-ubyte', 10000)
    return image_loader.load(), label_loader.load()

def get_result(vec):
    # Index of the maximum output = predicted digit
    max_value_index = 0
    max_value = 0
    for i in range(len(vec)):
        if vec[i] > max_value:
            max_value = vec[i]
            max_value_index = i
    return max_value_index

def evaluate(network, test_data_set, test_labels):
    error = 0
    total = len(test_data_set)
    for i in range(total):
        label = get_result(test_labels[i])
        predict = get_result(network.predict(test_data_set[i]))
        if label != predict:
            error += 1
    return float(error) / float(total)

def now():
    return datetime.now().strftime('%c')

def train_and_evaluate():
    last_error_ratio = 1.0
    epoch = 0
    train_data_set, train_labels = get_training_data_set()
    test_data_set, test_labels = get_test_data_set()
    network = Network([784, 300, 10])
    while True:
        epoch += 1
        network.train(train_labels, train_data_set, 0.3, 1)
        print('%s epoch %d finished' % (now(), epoch))
        if epoch % 10 == 0:
            error_ratio = evaluate(network, test_data_set, test_labels)
            print('%s after epoch %d, error ratio is %f' % (now(), epoch, error_ratio))
            if error_ratio > last_error_ratio:
                break
            else:
                last_error_ratio = error_ratio

if __name__ == '__main__':
    train_and_evaluate()
```

Vectorized Programming

Below we reimplement the fully connected network with vectorized programming, expressing every computation in vector form.

Forward pass:

$$\vec a=\sigma(W\cdot\vec x)$$

Backpropagation, in vector form:

$$\vec\delta=\vec y(1-\vec y)(\vec t-\vec y)\\\vec\delta^{(l)}=\vec a^{(l)}(1-\vec a^{(l)})W^T\vec\delta^{(l+1)}$$

where $\vec\delta^{(l)}$ is the error-term vector of layer $l$ and $W^T$ is the transpose of the matrix $W$.

We also need vectorized expressions for the gradient updates of the weight matrix $W$ and the bias vector $\vec b$:

$$W\leftarrow W+\eta\vec\delta\,\vec x^T\\\vec b\leftarrow\vec b+\eta\vec\delta$$

Code Implementation

```python
import numpy as np

# Fully connected layer
class FullConnectedLayer(object):
    def __init__(self, input_size, output_size, activator):
        '''
        input_size: dimension of this layer's input vector
        output_size: dimension of this layer's output vector
        activator: activation function
        '''
        self.input_size = input_size
        self.output_size = output_size
        self.activator = activator
        # Weight matrix W
        self.W = np.random.uniform(-0.1, 0.1, (output_size, input_size))
        # Bias vector b
        self.b = np.zeros((output_size, 1))
        # Output vector
        self.output = np.zeros((output_size, 1))

    def forward(self, input_array):
        '''
        Forward pass; input_array is a column vector of shape (input_size, 1)
        '''
        self.input = input_array
        self.output = self.activator.forward(
            np.dot(self.W, input_array) + self.b)

    def backward(self, delta_array):
        '''
        Backward pass: compute the gradients of W and b, and the error
        term self.delta to propagate to the previous layer
        '''
        self.delta = self.activator.backward(self.input) * np.dot(
            self.W.T, delta_array)
        self.W_grad = np.dot(delta_array, self.input.T)
        self.b_grad = delta_array

    def update(self, learning_rate):
        '''
        Gradient-descent update (W_grad already carries the minus sign,
        hence the plus here)
        '''
        self.W += learning_rate * self.W_grad
        self.b += learning_rate * self.b_grad

class SigmoidActivator(object):
    def forward(self, weighted_input):
        '''Sigmoid forward pass'''
        return 1.0 / (1.0 + np.exp(-weighted_input))

    def backward(self, output):
        '''Sigmoid derivative expressed through the output: y(1-y)'''
        return output * (1.0 - output)

# Neural network
class Network(object):
    def __init__(self, layers):
        '''
        layers: number of nodes in each layer
        '''
        self.layers = []
        for i in range(len(layers) - 1):
            self.layers.append(
                FullConnectedLayer(
                    layers[i], layers[i + 1],
                    SigmoidActivator()
                )
            )

    def predict(self, sample):
        '''
        sample: input column vector of shape (input_size, 1)
        '''
        output = sample
        for layer in self.layers:
            layer.forward(output)
            output = layer.output
        return output

    def train(self, labels, data_set, rate, epoch):
        '''
        labels: sample labels
        data_set: input samples
        rate: learning rate
        epoch: number of training epochs
        '''
        for i in range(epoch):
            for d in range(len(data_set)):
                self.train_one_sample(labels[d], data_set[d], rate)

    def train_one_sample(self, label, sample, rate):
        '''Train the network on a single sample'''
        self.predict(sample)
        self.calc_gradient(label)
        self.update_weight(rate)

    def calc_gradient(self, label):
        '''Compute the gradient of every layer'''
        delta = self.layers[-1].activator.backward(
            self.layers[-1].output
        ) * (label - self.layers[-1].output)
        for layer in self.layers[::-1]:
            layer.backward(delta)
            delta = layer.delta
        return delta

    def update_weight(self, rate):
        '''Update the weights of every layer'''
        for layer in self.layers:
            layer.update(rate)

    def dump(self):
        for layer in self.layers:
            print(layer.W)

    def loss(self, output, label):
        '''Squared error'''
        return 0.5 * ((label - output) * (label - output)).sum()

    def gradient_check(self, sample_feature, sample_label):
        '''
        Gradient check
        sample_feature: feature vector of the sample
        sample_label: label vector of the sample
        '''
        # Analytic gradients for this sample
        self.predict(sample_feature)
        self.calc_gradient(sample_label)

        # Numerical check
        epsilon = 1e-4
        for fc in self.layers:
            for i in range(fc.W.shape[0]):
                for j in range(fc.W.shape[1]):
                    fc.W[i, j] += epsilon
                    output = self.predict(sample_feature)
                    err1 = self.loss(output, sample_label)
                    fc.W[i, j] -= 2 * epsilon
                    output = self.predict(sample_feature)
                    err2 = self.loss(output, sample_label)
                    # W_grad carries the minus sign, so compare with (err2 - err1)
                    expect_grad = (err2 - err1) / (2 * epsilon)
                    fc.W[i, j] += epsilon   # restore the weight
                    print('weights(%d,%d): expected - actual %.4e - %.4e' % (
                        i, j, expect_grad, fc.W_grad[i, j]))
```