So that's what an MLP is

尽整些没用的 2023-12-01

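A note to myself before anything else: NeRF uses ReLU activations. ReLU is about the cheapest activation to compute, and its gradient is exactly 1 for positive inputs, so it does not saturate there. That largely avoids the vanishing-gradient problem that slows training with sigmoid or tanh, although negative inputs still get a zero gradient.

For an explanation of this effect, see the figure below (ELU and PReLU keep even more of the gradient, because they stay non-zero for negative inputs; maybe try them in NeRF? hehe!):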
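As a rough numerical counterpart to that figure, here is a minimal sketch in plain Python that multiplies the same local gradient over several layers, the way the chain rule does in a deep network. The function names, the ELU `alpha=1.0` and the PReLU slope `a=0.25` are assumptions chosen for illustration (in a real PReLU the slope is learned), not values taken from NeRF.

```python
import math

def sigmoid_grad(x):
    """Derivative of the sigmoid; never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def elu_grad(x, alpha=1.0):
    """Derivative of ELU: 1 for positive inputs, alpha * exp(x) otherwise."""
    return 1.0 if x > 0 else alpha * math.exp(x)

def prelu_grad(x, a=0.25):
    """Derivative of PReLU: 1 for positive inputs, the slope a otherwise."""
    return 1.0 if x > 0 else a

def chained_grad(grad_fn, x, depth):
    """Product of `depth` identical local gradients (a crude chain-rule model)."""
    return grad_fn(x) ** depth

for name, fn in [("sigmoid", sigmoid_grad), ("relu", relu_grad),
                 ("elu", elu_grad), ("prelu", prelu_grad)]:
    # Compare a positive and a negative pre-activation over 10 layers.
    print(name, chained_grad(fn, 2.0, 10), chained_grad(fn, -1.0, 10))
```

For the positive input the sigmoid product collapses toward zero while ReLU, ELU and PReLU stay at 1. For the negative input ReLU gives exactly 0, which is the case ELU and PReLU are meant to soften.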

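OK, on to the MLP. An MLP (multilayer perceptron) is a feed-forward neural network with one or more hidden layers between its input and output. The original perceptron was a classifier, but an MLP can also do regression, which is how NeRF uses it.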

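As the figure above shows, the main thing an MLP gives you is non-linear classification. It can do this because the hidden layer, together with its non-linear activation, maps the input into a new space in which the classes can be separated linearly. This is the operation I described in my previous post as necessary for non-linearity.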
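The space transformation can be made concrete with XOR, the standard example of a problem no single linear boundary can solve. The sketch below uses hand-picked weights (an assumption for illustration, not learned values) for a two-unit ReLU hidden layer, followed by a purely linear read-out.

```python
def relu(z):
    return max(0.0, z)

def hidden(x1, x2):
    """Hand-picked hidden layer: maps the 2-D input into a new 2-D space."""
    h1 = relu(x1 + x2)        # grows with the number of active inputs
    h2 = relu(x1 + x2 - 1.0)  # fires only when both inputs are active
    return h1, h2

def output(h1, h2):
    """Linear read-out in the hidden space."""
    return h1 - 2.0 * h2

def predict(x1, x2):
    return 1 if output(*hidden(x1, x2)) > 0.5 else 0

xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
predictions = [predict(a, b) for a, b in xor_inputs]
print(predictions)  # [0, 1, 1, 0]
```

In the original (x1, x2) plane the four points are not linearly separable. In the (h1, h2) plane they are, and the single line h1 - 2*h2 = 0.5 separates them.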

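MLPs have plenty of drawbacks too, and speed is one of them: every dense layer costs one multiply-add per weight, and NeRF has to evaluate its MLP for every sample point along every ray. No wonder NeRF runs so slowly. Here is an MLP implementation reposted from someone else's blog: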

from __future__ import print_function, division
import numpy as np
import math
from sklearn import datasets

from mlfromscratch.utils import train_test_split, to_categorical, normalize, accuracy_score, Plot
from mlfromscratch.deep_learning.activation_functions import Sigmoid, Softmax
from mlfromscratch.deep_learning.loss_functions import CrossEntropy

class MultilayerPerceptron():
    """Multilayer Perceptron classifier. A fully-connected neural network with one hidden layer.
    Unrolled to display the whole forward and backward pass.
    Parameters:
    -----------
    n_hidden: int
        The number of processing nodes (neurons) in the hidden layer.
    n_iterations: int
        The number of training iterations the algorithm will tune the weights for.
    learning_rate: float
        The step length that will be used when updating the weights.
    """
    def __init__(self, n_hidden, n_iterations=3000, learning_rate=0.01):
        self.n_hidden = n_hidden
        self.n_iterations = n_iterations
        self.learning_rate = learning_rate
        self.hidden_activation = Sigmoid()
        self.output_activation = Softmax()
        self.loss = CrossEntropy()

    def _initialize_weights(self, X, y):
        n_samples, n_features = X.shape
        _, n_outputs = y.shape
        # Hidden layer
        limit = 1 / math.sqrt(n_features)
        self.W = np.random.uniform(-limit, limit, (n_features, self.n_hidden))
        self.w0 = np.zeros((1, self.n_hidden))
        # Output layer
        limit = 1 / math.sqrt(self.n_hidden)
        self.V = np.random.uniform(-limit, limit, (self.n_hidden, n_outputs))
        self.v0 = np.zeros((1, n_outputs))

    def fit(self, X, y):

        self._initialize_weights(X, y)

        for i in range(self.n_iterations):

            # ..............
            #  Forward Pass
            # ..............

            # HIDDEN LAYER
            hidden_input = X.dot(self.W) + self.w0
            hidden_output = self.hidden_activation(hidden_input)
            # OUTPUT LAYER
            output_layer_input = hidden_output.dot(self.V) + self.v0
            y_pred = self.output_activation(output_layer_input)

            # ...............
            #  Backward Pass
            # ...............

            # OUTPUT LAYER
            # Grad. w.r.t input of output layer
            grad_wrt_out_l_input = self.loss.gradient(y, y_pred) * self.output_activation.gradient(output_layer_input)
            grad_v = hidden_output.T.dot(grad_wrt_out_l_input)
            grad_v0 = np.sum(grad_wrt_out_l_input, axis=0, keepdims=True)
            # HIDDEN LAYER
            # Grad. w.r.t input of hidden layer
            grad_wrt_hidden_l_input = grad_wrt_out_l_input.dot(self.V.T) * self.hidden_activation.gradient(hidden_input)
            grad_w = X.T.dot(grad_wrt_hidden_l_input)
            grad_w0 = np.sum(grad_wrt_hidden_l_input, axis=0, keepdims=True)

            # Update weights (by gradient descent)
            # Move against the gradient to minimize loss
            self.V -= self.learning_rate * grad_v
            self.v0 -= self.learning_rate * grad_v0
            self.W -= self.learning_rate * grad_w
            self.w0 -= self.learning_rate * grad_w0

    # Use the trained model to predict labels of X
    def predict(self, X):
        # Forward pass:
        hidden_input = X.dot(self.W) + self.w0
        hidden_output = self.hidden_activation(hidden_input)
        output_layer_input = hidden_output.dot(self.V) + self.v0
        y_pred = self.output_activation(output_layer_input)
        return y_pred


def main():
    data = datasets.load_digits()
    X = normalize(data.data)
    y = data.target

    # Convert the nominal y values to binary (one-hot)
    y = to_categorical(y)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, seed=1)

    # MLP
    clf = MultilayerPerceptron(n_hidden=16,
                               n_iterations=1000,
                               learning_rate=0.01)

    clf.fit(X_train, y_train)
    y_pred = np.argmax(clf.predict(X_test), axis=1)
    y_test = np.argmax(y_test, axis=1)

    accuracy = accuracy_score(y_test, y_pred)
    print("Accuracy:", accuracy)

    # Reduce dimension to two using PCA and plot the results
    Plot().plot_in_2d(X_test, y_pred, title="Multilayer Perceptron", accuracy=accuracy, legend_labels=np.unique(y))


if __name__ == "__main__":
    main()
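The output-layer step of the backward pass multiplies the loss gradient by the activation gradient. For a softmax output trained with categorical cross-entropy, the gradient with respect to the output layer's input simplifies to `y_pred - y`. The sketch below checks that identity against a finite-difference estimate in plain Python. It is a standalone check with its own helper functions and made-up numbers (`logits`, `target`), and it does not call or describe the internals of the `mlfromscratch` classes used above.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, p):
    """Categorical cross-entropy for a one-hot target y and probabilities p."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

def analytic_grad(z, y):
    """Gradient of cross_entropy(y, softmax(z)) with respect to z."""
    return [pi - yi for pi, yi in zip(softmax(z), y)]

def numeric_grad(z, y, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grads = []
    for i in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        diff = cross_entropy(y, softmax(z_plus)) - cross_entropy(y, softmax(z_minus))
        grads.append(diff / (2 * eps))
    return grads

logits = [2.0, -1.0, 0.5]   # hypothetical output-layer inputs
target = [0.0, 1.0, 0.0]    # hypothetical one-hot label
print(analytic_grad(logits, target))
print(numeric_grad(logits, target))
```

The two printed vectors should agree to several decimal places, which is why many implementations skip the separate softmax derivative and use `y_pred - y` directly.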

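The hidden layer here is a fully connected layer: since it has to re-map the space that x lives in, it must act on every component of x. Convolutional networks have fully connected layers too, but they play a somewhat different role there. "Fully connected" only means that every neuron in the layer is connected to every neuron in the previous layer; what the layer actually does depends on its weights. In a CNN the fully connected layers are used for the final classification.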

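Each neuron in the fully connected layer applies an activation function. (I first thought this might be something different from sigmoid and friends, but I checked and it is the same thing.) The outputs of the previous layer first go through the layer's weighted sum and then through the activation function, and the activation function decides whether the neuron fires. AlexNet uses ReLU. My impression is that it leaves more units active on the same data, perhaps because it is non-saturating and its positive range is unbounded, though I have not verified this. Either way ReLU seems to do well under gradient descent, so I will leave it at that for now.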

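If the combination of the previous layer's outputs and the weights meets a certain condition, some neurons in this layer fire (the activation function's condition is met). The output layer then only has to put the fired activations together and see what they add up to. Mathematically the combined result is just an abstract value, as I said in an earlier post. Once I have that value I can compare it with the decision boundary I trained for "cat" and tell whether the input is a cat.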
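A small sketch of that "weighted sum, then activation" step for one fully connected layer, in plain Python. The weights, biases and input are made-up numbers for illustration, and `dense_relu` is a hypothetical helper name.

```python
def dense_relu(x, weights, biases):
    """One fully connected layer with ReLU.

    z_j = sum_i weights[j][i] * x[i] + biases[j]   (weighted sum)
    a_j = max(0, z_j)                              (activation)
    """
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    act = [max(0.0, z) for z in pre]
    return pre, act

x = [1.0, 2.0, -1.0]          # outputs of the previous layer
W = [[0.5, -0.25, 1.0],       # weights of neuron 0
     [1.0, 1.0, 1.0],         # weights of neuron 1
     [-1.0, 0.0, 2.0]]        # weights of neuron 2
b = [0.0, -1.0, 0.5]

pre, act = dense_relu(x, W, b)
fired = [j for j, a in enumerate(act) if a > 0]
print(pre)    # [-1.0, 1.0, -2.5]
print(act)    # [0.0, 1.0, 0.0]
print(fired)  # [1]
```

Only neuron 1 has a positive weighted sum, so it is the only one that fires. The next layer sees just that activation.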

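Someone told me that making a fully connected layer's output dimension smaller than its input dimension is done to get a better fit. (He called such a layer a "隐层", which I think is a different notion from the hidden layer above.) I think that makes sense: with fewer outputs than inputs, the original features are forced to combine, so the layer has to produce a combined result. That is a bit like the fitting process in mathematics, so calling it fitting is reasonable. Here is some reposted code with fully connected layers to make this easier to understand: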

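import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        # A subclass of nn.Module must call the parent constructor in its own constructor.
        # The line below is equivalent to nn.Module.__init__(self).
        super().__init__()
        # Convolution layer: 1 input channel (single-channel image), 6 output channels, 5x5 kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        # Convolution layer: 6 input channels, 16 output channels, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Fully connected layer, y = Wx + b.
        # Its input is the flattened output of the last conv/pool stage
        # (16 channels * 5 * 5); it has 120 output units.
        self.fc1 = nn.Linear(16*5*5, 120)
        # The remaining layers are plain fully connected layers: 120 -> 84 -> 10
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # convolution -> activation -> pooling
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # Flatten to (batch_size, features); -1 lets PyTorch infer the feature count
        x = x.view(x.size()[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # The last layer must be called on x (the original assigned the layer itself)
        x = self.fc3(x)
        return x

net = Net()
print(net)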
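The `16*5*5` in `fc1` comes from the sizes left over after the two convolution and pooling stages. The arithmetic can be traced in plain Python. The sketch assumes a 32x32 single-channel input, which the code above does not state but which is the size that yields a 5x5 feature map. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window):
    """Spatial output size of non-overlapping max pooling."""
    return size // window

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

side = 32                               # assumed input height and width
side = pool_out(conv_out(side, 5), 2)   # conv1: 32 -> 28, pool: 28 -> 14
side = pool_out(conv_out(side, 5), 2)   # conv2: 14 -> 10, pool: 10 -> 5
flat = 16 * side * side                 # 16 channels of 5x5 values

print(side, flat)                       # 5 400
for n_in, n_out in [(flat, 120), (120, 84), (84, 10)]:
    print(n_in, "->", n_out, "parameters:", linear_params(n_in, n_out))
```

Every fully connected layer here has fewer outputs than inputs (400 -> 120 -> 84 -> 10), so each one is forced to combine the features it receives, which is the situation described before the code.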

I think I should write down the characteristics of each of these functions, so that later on I stay clear about what each one does.
