TLDR
I've been trying to fit a simple neural network on MNIST. It works on a small debugging setup, but when I move to a subset of MNIST it trains extremely fast and the gradient approaches 0 very quickly, yet it then outputs the same value for every input and the final cost is quite high. I've been deliberately trying to overfit to make sure it actually works, but it won't do so on MNIST, which suggests a deep problem in the setup. I've checked my backpropagation implementation with gradient checking and it seems to match, so I'm not sure where the error is, or what to do now!
Many thanks for any help you can offer; I've been struggling to fix this!
Explanation
I've been trying to make a neural network in Numpy, based on this explanation:
http://ufldl.stanford.edu/wiki/index.php/Neural_Networks
http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm
Backpropagation seems to match gradient checking:
Backpropagation: [ 0.01168585, 0.06629858, -0.00112408, -0.00642625, -0.01339408,
-0.07580145, 0.00285868, 0.01628148, 0.00365659, 0.0208475 ,
0.11194151, 0.16696139, 0.10999967, 0.13873069, 0.13049299,
-0.09012582, -0.1344335 , -0.08857648, -0.11168955, -0.10506167]
Gradient Checking: [-0.01168585 -0.06629858 0.00112408 0.00642625 0.01339408
0.07580145 -0.00285868 -0.01628148 -0.00365659 -0.0208475
-0.11194151 -0.16696139 -0.10999967 -0.13873069 -0.13049299
0.09012582 0.1344335 0.08857648 0.11168955 0.10506167]
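Incidentally, the two vectors above agree in magnitude but are exact negatives of each other. A common way to compare them quantitatively is a relative-error check; here is a minimal sketch (the two short arrays are placeholders standing in for the full vectors above):

import numpy as np

backprop_grad = np.array([0.01168585, 0.06629858, -0.00112408])
numerical_grad = np.array([-0.01168585, -0.06629858, 0.00112408])

# Relative error: roughly 1e-7 or smaller for a correct implementation.
# A value near 2.0, as here, means the vectors are exact negatives of
# each other, i.e. a sign error in one of the two computations.
rel_error = (np.linalg.norm(backprop_grad - numerical_grad)
             / (np.linalg.norm(backprop_grad) + np.linalg.norm(numerical_grad)))
print(rel_error)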
When I train with this simple debug setup (a is a neural net with 2 inputs -> 5 hidden -> 2 outputs, and learning rate 0.5):
a.gradDesc(np.array([[0.1,0.9],[0.2,0.8]]),np.array([[0,1],[0,1]]))
i.e. x1 = [0.1, 0.9] and y1 = [0, 1]
I get these lovely training curves:

# Number of input, hidden and output nodes
# Input = 28 x 28 pixels
input_nodes=784
# Arbitrary number of hidden nodes, experiment to improve
hidden_nodes=200
# Output = one of the digits [0,1,2,3,4,5,6,7,8,9]
output_nodes=10
# Learning rate
learning_rate=0.4
# Regularisation parameter
lambd=0.0
Running this setup on the code below for 100 iterations, it seems to train at first but then quickly "flatlines" and never reaches a very good model:
Initial ===== Cost (unregularised): 2.09203670985 /// Cost (regularised): 2.09203670985 Mean Gradient: 0.0321241229793
Iteration 100 Cost (unregularised): 0.980999805477 /// Cost (regularised): 0.980999805477 Mean Gradient: -5.29639499854e-09
TRAINED IN 26.45932364463806
This gives very poor test accuracy and predicts the same output for every input; even when testing with all inputs set to 0.1 or all set to 0.9, I just get the same output (though the exact digit it outputs varies with the initial random weights):
Test accuracy: 8.92
Targets 2 2 1 7 2 2 0 2 3
Hypothesis 5 5 5 5 5 5 5 5 5
MNIST training curves:

# Import dependencies
import numpy as np
import time
import csv
import matplotlib.pyplot
import random
import math
# Read in training data
with open('MNIST/mnist_train_100.csv') as file:
    train_data=np.array([list(map(int,line.strip().split(','))) for line in file.readlines()])
# In[197]:
# Plot a sample of training data to visualise
displayData(train_data[:,1:], 25)
# In[198]:
# Read in test data
with open('MNIST/mnist_test.csv') as file:
    test_data=np.array([list(map(int,line.strip().split(','))) for line in file.readlines()])
# Main neural network class
class neuralNetwork:
    # Define the architecture
    def __init__(self, i, h, o, lr, lda):
        # Number of nodes in each layer
        self.i=i
        self.h=h
        self.o=o
        # Learning rate
        self.lr=lr
        # Lambda for regularisation
        self.lda=lda
        # Randomly initialise the parameters, input-> hidden and hidden-> output
        self.ih=np.random.normal(0.0,pow(self.h,-0.5),(self.h,self.i))
        self.ho=np.random.normal(0.0,pow(self.o,-0.5),(self.o,self.h))
    def predict(self, X):
        # GET HYPOTHESIS ESTIMATES/ OUTPUTS
        # Compute activation of the hidden layer
        # NOTE: the '+ 1' adds a constant to every pre-activation; no bias
        # column is actually appended to X, and backprop below omits this term
        z2=np.dot(X,self.ih.T) + 1
        a2=sigmoid(z2)
        # Compute activation of the output layer (same '+ 1' inconsistency)
        z3=np.dot(a2,self.ho.T) + 1
        h=sigmoid(z3)
        # Predicted class = output node with the highest activation
        outputs=np.argmax(h.T,axis=0)
        return outputs
    def backprop (self, X, y):
        try:
            m = X.shape[0]
        except:
            m=1
        # GET HYPOTHESIS ESTIMATES/ OUTPUTS
        # Forward pass: input -> hidden -> output (note: no bias terms here,
        # unlike predict above)
        z2=np.dot(X,self.ih.T)
        a2=sigmoid(z2)
        z3=np.dot(a2,self.ho.T)
        h=sigmoid(z3)
        # Compute error/ cost for this setup (unregularised and regularised)
        costUn=self.costFunc(h,y)
        costReg=self.costFuncReg(h,y)
        # Output error term
        d3=-(y-h)*sigmoidGradient(z3)
        # Hidden error term
        d2=np.dot(d3,self.ho)*sigmoidGradient(z2)
        # Partial derivatives for weights
        D2=np.dot(d3.T,a2)
        D1=np.dot(d2.T,X)
        # Partial derivatives of theta with regularisation
        T2Grad=(D2/m)+(self.lda/m)*(self.ho)
        T1Grad=(D1/m)+(self.lda/m)*(self.ih)
        # Update weights
        # Hidden layer (weights 1)
        self.ih-=self.lr*(((D1)/m) + (self.lda/m)*self.ih)
        # Output layer (weights 2)
        self.ho-=self.lr*(((D2)/m) + (self.lda/m)*self.ho)
        # Unroll gradients to one long vector
        grad=np.concatenate(((T1Grad).ravel(),(T2Grad).ravel()))
        return costUn, costReg, grad
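    # For reference, the error terms above follow the backpropagation
    # equations from the UFLDL page linked earlier (my paraphrase, for the
    # squared-error cost; '*' is element-wise multiplication):
    #   delta3 = -(y - a3) * f'(z3)
    #   delta2 = (W2.T dot delta3) * f'(z2)
    #   gradW2 = outer(delta3, a2) ; gradW1 = outer(delta2, x)
    # d3, d2, D2 and D1 above are the vectorised forms of these.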
    def backpropIter (self, X, y):
        try:
            m = X.shape[0]
        except:
            m=1
        # GET HYPOTHESIS ESTIMATES/ OUTPUTS
        # Forward pass: input -> hidden -> output
        z2=np.dot(X,self.ih.T)
        a2=sigmoid(z2)
        z3=np.dot(a2,self.ho.T)
        h=sigmoid(z3)
        # Compute error/ cost for this setup (unregularised and regularised)
        costUn=self.costFunc(h,y)
        costReg=self.costFuncReg(h,y)
        gradW1=np.zeros(self.ih.shape)
        gradW2=np.zeros(self.ho.shape)
        # Accumulate the gradients example by example; the weights are NOT
        # updated inside this loop
        for i in range(m):
            delta3 = -(y[i,:]-h[i,:])*sigmoidGradient(z3[i,:])
            delta2 = np.dot(self.ho.T,delta3)*sigmoidGradient(z2[i,:])
            gradW2 = gradW2 + np.outer(delta3,a2[i,:])
            gradW1 = gradW1 + np.outer(delta2,X[i,:])
        # Weight updates are left to the caller (gradDesc); the lines below
        # would instead update here, once per batch
        #self.ih-=self.lr*(((gradW1)/m) + (self.lda/m)*self.ih)
        #self.ho-=self.lr*(((gradW2)/m) + (self.lda/m)*self.ho)
        # Unroll gradients to one long vector
        grad=np.concatenate(((gradW1).ravel(),(gradW2).ravel()))
        return costUn, costReg, grad
    def gradDesc(self, X, y):
        # Backpropagate to get the accumulated gradients
        cost,costreg,grad=self.backpropIter(X,y)
        # Unroll parameters
        deltaW1=np.reshape(grad[0:self.h*self.i],(self.h,self.i))
        deltaW2=np.reshape(grad[self.h*self.i:],(self.o,self.h))
        # m = no. training examples
        m=X.shape[0]
        # NOTE: the summed gradients are applied as-is; the division by m and
        # the regularisation term are commented out
        self.ih -= self.lr * ((deltaW1))#/m) + (self.lda * self.ih))
        self.ho -= self.lr * ((deltaW2))#/m) + (self.lda * self.ho))
        return cost,costreg,grad
    # Gradient checking to compute the gradient numerically to debug backpropagation
    def gradCheck(self, X, y):
        # Unroll theta
        theta=np.concatenate(((self.ih).ravel(),(self.ho).ravel()))
        # perturb will add and subtract epsilon, numgrad will store answers
        perturb=np.zeros(len(theta))
        numgrad=np.zeros(len(theta))
        # epsilon, e is a small number
        e = 0.00001
        # Loop over all theta
        for i in range(len(theta)):
            # Perturb is zeros with one index being e
            perturb[i]=e
            loss1=self.costFuncGradientCheck(theta-perturb, X, y)
            loss2=self.costFuncGradientCheck(theta+perturb, X, y)
            # Compute numerical gradient and update vectors
            # NOTE: (loss1-loss2) is the NEGATIVE of the usual central
            # difference (loss2-loss1)/(2e), hence the flipped signs relative
            # to backpropagation in the output shown above
            numgrad[i]=(loss1-loss2)/(2*e)
            perturb[i]=0
        return numgrad
    def costFuncGradientCheck(self,theta,X,y):
        T1=np.reshape(theta[0:self.h*self.i],(self.h,self.i))
        T2=np.reshape(theta[self.h*self.i:],(self.o,self.h))
        m=X.shape[0]
        # GET HYPOTHESIS ESTIMATES/ OUTPUTS
        # Compute activation to hidden node
        z2=np.dot(X,T1.T)
        a2=sigmoid(z2)
        # Compute activation to output node
        z3=np.dot(a2,T2.T)
        h=sigmoid(z3)
        cost=self.costFunc(h, y)
        return cost #+ ((self.lda/2)*(np.sum(pow(T1,2)) + np.sum(pow(T2,2))))
    def costFunc(self, h, y):
        m=h.shape[0]
        return np.sum(pow((h-y),2))/m
    def costFuncReg(self, h, y):
        cost=self.costFunc(h, y)
        return cost #+ ((self.lda/2)*(np.sum(pow(self.ih,2)) + np.sum(pow(self.ho,2))))
# Helper functions to compute sigmoid and gradient for an input number or matrix
def sigmoid(Z):
    return np.divide(1,np.add(1,np.exp(-Z)))
def sigmoidGradient(Z):
    return sigmoid(Z)*(1-sigmoid(Z))
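# Aside (not in the original code): np.exp(-Z) overflows for large negative Z
# and NumPy emits a RuntimeWarning; a numerically stable variant splits on the
# sign of Z so the exponential is only ever taken of a non-positive number:
def sigmoidStable(Z):
    Z = np.atleast_1d(np.asarray(Z, dtype=float))
    out = np.empty_like(Z)
    pos = Z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-Z[pos]))
    expZ = np.exp(Z[~pos])  # safe: only evaluated where Z < 0
    out[~pos] = expZ / (1.0 + expZ)
    return out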
# Pre-processing helper functions
# Scale pixel values away from 0 (inputs of exactly 0 kill the weight updates);
# the mapping below sends 0 to 0.1 and 255 to 1.09
def scaleDataVec(data):
    return (np.asfarray(data[1:]) / 255.0 * 0.99) + 0.1
def scaleData(data):
    return (np.asfarray(data[:,1:]) / 255.0 * 0.99) + 0.1
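# For example (hypothetical values, just to show the resulting range):
#   scaleDataVec(np.array([5, 0, 128, 255]))   # label 5 + three pixel values
#   -> [0.1, 0.59694118, 1.09]
# Note the top of the range is 1.09, not 1.0.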
# DISPLAY DATA
# plot_data is the data to plot; num_ex must be a square number of examples
# to plot; random examples are plotted unless rand=0
def displayData(plot_data, num_ex, rand=1):
    if rand==0:
        data=plot_data
    else:
        rand_indexes=random.sample(range(plot_data.shape[0]),num_ex)
        data=plot_data[rand_indexes,:]
    # Useful variables, m= no. train ex, n= no. features
    m=data.shape[0]
    n=data.shape[1]
    # Shape for one example
    example_width=math.ceil(math.sqrt(n))
    example_height=math.ceil(n/example_width)
    # No. of items to display
    display_rows=math.floor(math.sqrt(m))
    display_cols=math.ceil(m/display_rows)
    # Padding between images
    pad=1
    # Setup blank display
    display_array = -np.ones((pad + display_rows * (example_height + pad), (pad + display_cols * (example_width + pad))))
    curr_ex=0
    for i in range(1,display_rows+1):
        for j in range(1,display_cols+1):
            # >= avoids an IndexError when the grid has more slots than examples
            if curr_ex>=m:
                break
            # Max value of this patch
            max_val=max(abs(data[curr_ex, :]))
            display_array[pad + (j-1) * (example_height + pad) : j*(example_height+1), pad + (i-1) * (example_width + pad) : i*(example_width+1)] = data[curr_ex, :].reshape(example_height, example_width)/max_val
            curr_ex+=1
    matplotlib.pyplot.imshow(display_array, cmap='Greys', interpolation='None')
# In[312]:
a=neuralNetwork(2,5,2,0.5,0.0)
print(a.backpropIter(np.array([[0.1,0.9],[0.2,0.8]]),np.array([[0,1],[0,1]])))
print(a.gradCheck(np.array([[0.1,0.9],[0.2,0.8]]),np.array([[0,1],[0,1]])))
D=[]
C=[]
for i in range(100):
    c,b,d=a.gradDesc(np.array([[0.1,0.9],[0.2,0.8]]),np.array([[0,1],[0,1]]))
    C.append(c)
    D.append(np.mean(d))
print(a.predict(np.array([[0.1,0.9]])))
# Debugging plot
matplotlib.pyplot.figure()
matplotlib.pyplot.plot(C)
matplotlib.pyplot.ylabel("Error")
matplotlib.pyplot.xlabel("Iterations")
matplotlib.pyplot.figure()
matplotlib.pyplot.plot(D)
matplotlib.pyplot.ylabel("Gradient")
matplotlib.pyplot.xlabel("Iterations")
#print(J)
# In[313]:
# Class instance
# Number of input, hidden and output nodes
# Input = 28 x 28 pixels
input_nodes=784
# Arbitrary number of hidden nodes, experiment to improve
hidden_nodes=200
# Output = one of the digits [0,1,2,3,4,5,6,7,8,9]
output_nodes=10
# Learning rate
learning_rate=0.4
# Regularisation parameter
lambd=0.0
# Create instance of Nnet class
nn=neuralNetwork(input_nodes,hidden_nodes,output_nodes,learning_rate,lambd)
# In[314]:
time1=time.time()
# Scale inputs
inputs=scaleData(train_data)
# 0.01-0.99 range as the sigmoid function can't reach 0 or 1, 0.01 for all except 0.99 for target
targets=(np.identity(output_nodes)*0.98)[train_data[:,0],:]+0.01
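# For instance (hypothetical 3-class version of the line above), labels [2, 0, 1]
# become rows of a scaled identity matrix, 0.99 for the target class and 0.01
# everywhere else:
#   (np.identity(3)*0.98)[[2,0,1],:]+0.01
#   -> [[0.01, 0.01, 0.99],
#       [0.99, 0.01, 0.01],
#       [0.01, 0.99, 0.01]]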
J=[]
JR=[]
Grad=[]
iterations=100
for i in range(iterations):
    j,jr,grad=nn.gradDesc(inputs, targets)
    grad=np.mean(grad)
    if i == 0:
        print("Initial ===== Cost (unregularised): ", j, "\t///", "Cost (regularised): ",jr," Mean Gradient: ",grad)
    print("\r", end="")
    print("Iteration ", i+1, "\tCost (unregularised): ", j, "\t///", "Cost (regularised): ", jr," Mean Gradient: ",grad,end="")
    J.append(j)
    JR.append(jr)
    Grad.append(grad)
time2 = time.time()
print ("\nTRAINED IN ",time2-time1)
# In[315]:
# Debugging plot
matplotlib.pyplot.figure()
matplotlib.pyplot.plot(J)
matplotlib.pyplot.plot(JR)
matplotlib.pyplot.ylabel("Error")
matplotlib.pyplot.xlabel("Iterations")
matplotlib.pyplot.figure()
matplotlib.pyplot.plot(Grad)
matplotlib.pyplot.ylabel("Gradient")
matplotlib.pyplot.xlabel("Iterations")
#print(J)
# In[316]:
# Scale inputs
inputs=scaleData(test_data)
# For testing, the targets are the raw digit labels
targets=test_data[:,0]
h=nn.predict(inputs)
score=[]
targ=[]
hyp=[]
for i,line in enumerate(targets):
    if line == h[i]:
        score.append(1)
    else:
        score.append(0)
    hyp.append(h[i])
    targ.append(line)
print("Test accuracy: ", sum(score)/len(score)*100)
indexes=random.sample(range(len(hyp)),9)
print("Targets ",end="")
for j in indexes:
    print (targ[j]," ",end="")
print("\nHypothesis ",end="")
for j in indexes:
    print (hyp[j]," ",end="")
displayData(test_data[indexes, 1:], 9, rand=0)
# In[277]:
nn.predict(0.9*np.ones((784,)))
Edit 1

Initial ===== Cost (unregularised): 4.07208963507 /// Cost (regularised): 4.07208963507 Mean Gradient: 0.0540251381858
Iteration 50 Cost (unregularised): 0.613310215166 /// Cost (regularised): 0.613310215166 Mean Gradient: -0.000133981500849
Initial ===== Cost (unregularised): 5.67535252616 /// Cost (regularised): 5.67535252616 Mean Gradient: 0.0644797515914
Iteration 50 Cost (unregularised): 0.381080434935 /// Cost (regularised): 0.381080434935 Mean Gradient: 0.000427866902699
Initial ===== Cost (unregularised): 3.54658422176 /// Cost (regularised): 3.54658422176 Mean Gradient: 0.0672211732868
Iteration 50 Cost (unregularised): 0.981 /// Cost (regularised): 0.981 Mean Gradient: 2.34515341943e-20
Initial ===== Cost (unregularised): 4.05269658215 /// Cost (regularised): 4.05269658215 Mean Gradient: 0.0469666696193
Iteration 50 Cost (unregularised): 0.980999999999 /// Cost (regularised): 0.980999999999 Mean Gradient: -1.0582706063e-14
Initial ===== Cost (unregularised): 2.40881492228 /// Cost (regularised): 2.40881492228 Mean Gradient: 0.0516056901574
Iteration 50 Cost (unregularised): 1.74539997258 /// Cost (regularised): 1.74539997258 Mean Gradient: 1.01955789614e-09
Initial ===== Cost (unregularised): 2.58498876008 /// Cost (regularised): 2.58498876008 Mean Gradient: 0.0388768685257
Iteration 3 Cost (unregularised): 1.72520399313 /// Cost (regularised): 1.72520399313 Mean Gradient: 0.0134040908157
Iteration 50 Cost (unregularised): 0.981 /// Cost (regularised): 0.981 Mean Gradient: -4.49319474346e-43
Initial ===== Cost (unregularised): 4.40141352357 /// Cost (regularised): 4.40141352357 Mean Gradient: 0.0689167742968
Iteration 50 Cost (unregularised): 0.981 /// Cost (regularised): 0.981 Mean Gradient: -1.01563966458e-22
A learning rate of 0.01, quite low, gave the best results, but exploring learning rates in this region I only came out with 30-40% accuracy. That is a big improvement on the 8% or even 0% I was seeing before, but still not what it should be achieving!

Test accuracy: 61.150000000000006
Targets 6 9 8 2 2 2 4 3 8
Hypothesis 6 9 8 4 7 1 4 3 8
Edit 3: which dataset?
Best answer
SOLVED
I solved my neural network. A brief description follows in case it helps anyone else. Thanks to everyone who helped with suggestions.
Basically, I had implemented it with a fully matrix approach, i.e. backpropagation uses all examples on every pass. I later tried implementing it as a vector approach, i.e. backpropagation on each example in turn. That's when I realised that the matrix approach doesn't update the parameters after each example, so running it this way is NOT the same as running example by example; effectively the whole training set is backpropagated as one batch. Therefore my matrix implementation does work, but only after many iterations, which ends up taking longer than the vector approach! I've opened a new question to learn more about this specific part, but there we go: it just needed either many iterations with the matrix approach, or the more incremental example-by-example approach.
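A minimal sketch of the difference (hypothetical names, not the class above: backprop_all and backprop_one stand for functions returning the summed gradients over all examples and the gradients for one example, respectively):

# Full-batch ("matrix") approach: ONE weight update per pass over all m examples
for it in range(iterations):
    gradW1, gradW2 = backprop_all(X, y)   # gradients summed over every example
    W1 -= lr * gradW1 / m
    W2 -= lr * gradW2 / m

# Per-example ("vector") approach: m weight updates per pass, so each example
# sees weights already nudged by the previous ones; far fewer passes are
# typically needed, at the cost of more updates per pass
for it in range(iterations):
    for i in range(m):
        gradW1, gradW2 = backprop_one(X[i], y[i])
        W1 -= lr * gradW1
        W2 -= lr * gradW2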
Original question on Stack Overflow: https://stackoverflow.com/questions/42140866/