Machine Learning (1): Regression Algorithms
A supervised learning algorithm models the mapping between attributes (X) and labels (Y): during training it looks for a function $h: R^d \to R$ that fits this relationship as well as possible. Each input (attribute value) is a d-dimensional numeric vector, and linear regression assumes

$$y^{(i)}=\theta^Tx^{(i)}+\epsilon^{(i)}$$

The errors $\epsilon^{(i)}$ ($1\le i \le m$) are independent and identically distributed, following a Gaussian distribution with mean 0 and some fixed variance $\delta^2$.
In practice, many random phenomena can be viewed as the combined effect of many independent factors, and therefore tend to follow a normal distribution.

Since $y^{(i)}=\theta^Tx^{(i)}+\epsilon^{(i)}$, the error density

$$p(\epsilon^{(i)})=\frac{1}{\delta \sqrt{2\pi}}e^{-\frac{(\epsilon^{(i)})^2}{2\delta^2}}$$

gives the conditional density of each label:

$$p(y^{(i)}|x^{(i)};\theta)=\frac{1}{\delta \sqrt{2\pi}}\exp\left(-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\delta^2}\right)$$

The likelihood of the whole sample is

$$L(\theta)=\prod_{i=1}^m p(y^{(i)}|x^{(i)};\theta)=\prod_{i=1}^m\frac{1}{\delta \sqrt{2\pi}}\exp\left(-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\delta^2}\right)$$

and we want $L(\theta)$ to be as large as possible.
Taking the logarithm:

$$\begin{aligned} \ell(\theta)=&\log L(\theta)\\=&\sum_{i=1}^m\log\left[\frac{1}{\delta \sqrt{2\pi}}\exp\left(-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\delta^2}\right)\right]\\=&m\log\frac{1}{\delta \sqrt{2\pi}}-\frac{1}{\delta^2}\cdot\frac{1}{2}\sum_{i=1}^m(y^{(i)}-\theta^Tx^{(i)})^2 \end{aligned}$$
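As a quick numerical check of the log-likelihood simplification, the sketch below sums the log of each Gaussian density directly and compares it with the closed form $m\log\frac{1}{\delta\sqrt{2\pi}}-\frac{1}{2\delta^2}\sum_i(y^{(i)}-\theta^Tx^{(i)})^2$. The function names and the toy data are illustrative assumptions, not part of the original text.

```python
import math

def gaussian_log_likelihood(theta, xs, ys, delta):
    """Sum of log N(y | theta^T x, delta^2) over all samples, term by term."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(t * xi for t, xi in zip(theta, x))
        density = math.exp(-(y - pred) ** 2 / (2 * delta ** 2)) / (delta * math.sqrt(2 * math.pi))
        total += math.log(density)
    return total

def closed_form_log_likelihood(theta, xs, ys, delta):
    """m*log(1/(delta*sqrt(2*pi))) - (1/delta^2) * 1/2 * sum of squared residuals."""
    m = len(ys)
    sse = sum((y - sum(t * xi for t, xi in zip(theta, x))) ** 2 for x, y in zip(xs, ys))
    return m * math.log(1 / (delta * math.sqrt(2 * math.pi))) - sse / (2 * delta ** 2)

# Toy data (assumed): first column is a constant 1 for the intercept.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
ys = [0.1, 0.9, 2.2]
theta = [0.0, 1.0]

direct = gaussian_log_likelihood(theta, xs, ys, delta=0.5)
closed = closed_form_log_likelihood(theta, xs, ys, delta=0.5)
print(direct, closed)  # the two values agree up to rounding
```

Only the sum of squared residuals depends on $\theta$, so a parameter vector with a smaller squared error always has a larger log-likelihood.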
Maximizing the log-likelihood is therefore equivalent to minimizing the squared-error loss:

$$loss(y_j,\hat{y_j})=J(\theta)=\frac{1}{2}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2$$

In matrix form:

$$\begin{aligned} J(\theta)=&\frac{1}{2}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2\\=&\frac{1}{2}(X\theta-Y)^T(X\theta-Y) \rightarrow \min_{\theta}J(\theta) \end{aligned}$$

Taking the gradient with respect to $\theta$:

$$\begin{aligned} \nabla J(\theta)=&\nabla_{\theta}\frac{1}{2}(X\theta-Y)^T(X\theta-Y) \\=& \nabla_{\theta}\frac{1}{2}\left((\theta ^TX^T-Y^T)(X\theta-Y)\right) \\=&\nabla_{\theta}\left(\frac{1}{2}(\theta^TX^TX\theta - \theta^TX^TY - Y^TX\theta+Y^TY)\right)\\ =& \frac{1}{2}(2X^TX\theta-X^TY-(Y^TX)^T) \\=& X^TX\theta-X^TY \end{aligned}$$

Setting the gradient to zero gives the closed-form solution for the parameters:

$$\theta=(X^TX)^{-1}X^TY$$
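Least squares requires the matrix $X^TX$ to be invertible. To guard against a non-invertible matrix or overfitting, an extra term can be added so that the resulting matrix is invertible:

$$\theta=(X^TX+\lambda I)^{-1}X^TY$$

The main difficulty in solving least squares directly is computing the matrix inverse.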
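The closed-form solution, with and without the $\lambda I$ term, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the helper names (`transpose`, `matmul`, `solve`, `fit_linear`) and the toy data are hypothetical, and instead of forming the inverse explicitly the sketch solves the linear system $(X^TX+\lambda I)\theta=X^TY$ by Gaussian elimination, which yields the same $\theta$.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def fit_linear(X, y, lam=0.0):
    """theta = (X^T X + lam * I)^{-1} X^T y; lam=0 is plain least squares."""
    Xt = transpose(X)
    gram = matmul(Xt, X)
    for j in range(len(gram)):
        gram[j][j] += lam  # adds lam * I, which also penalizes the intercept column
    rhs = [sum(xj * yi for xj, yi in zip(col, y)) for col in Xt]
    return solve(gram, rhs)

# Toy data (assumed): y = 1 + 2x exactly, first column is the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta_ols = fit_linear(X, y)
theta_ridge = fit_linear(X, y, lam=1.0)

# Duplicated columns make X^T X singular; a positive lam restores solvability.
X_dup = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
theta_dup = fit_linear(X_dup, [1.0, 2.0, 3.0], lam=0.1)
```

With `lam=0` the duplicated-column example would raise `ValueError`, which is the non-invertible case the $\lambda I$ term is meant to avoid.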
0-1 loss function

$$J(\theta)=\left\{ \begin{aligned}&1,\quad Y\neq f(X)\\ &0,\quad Y=f(X) \end{aligned} \right.$$

Perceptron loss function

$$J(\theta)=\left\{ \begin{aligned}&1,\quad |Y-f(X)|>t\\ &0,\quad |Y-f(X)|\leq t \end{aligned} \right.$$

Squared loss function

$$J(\theta)=\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2$$

Absolute loss function

$$J(\theta)=\sum_{i=1}^m|h_{\theta}(x^{(i)})-y^{(i)}|$$
Log loss function

$$J(\theta)=-\sum_{i=1}^m y^{(i)}\log h_{\theta}(x^{(i)})$$
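The 0-1, perceptron, squared and absolute losses listed above translate almost directly into code. The sketch below is illustrative only; the function names and sample values are assumptions, and the predictions are passed in directly rather than computed from a model.

```python
def zero_one_loss(y, pred):
    """1 when the prediction differs from the label, 0 otherwise."""
    return 0 if y == pred else 1

def perceptron_loss(y, pred, t):
    """1 when the absolute error exceeds the threshold t, 0 otherwise."""
    return 1 if abs(y - pred) > t else 0

def squared_loss(ys, preds):
    """Sum of squared differences between predictions and labels."""
    return sum((p - y) ** 2 for y, p in zip(ys, preds))

def absolute_loss(ys, preds):
    """Sum of absolute differences between predictions and labels."""
    return sum(abs(p - y) for y, p in zip(ys, preds))

ys = [1.0, 2.0, 3.0]
preds = [1.5, 2.0, 1.0]
print(squared_loss(ys, preds))   # 0.25 + 0 + 4 = 4.25
print(absolute_loss(ys, preds))  # 0.5 + 0 + 2 = 2.5
```

The squared loss penalizes the single large error (3.0 vs 1.0) far more heavily than the absolute loss does, which is why it is more sensitive to outliers.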
Objective function

$$J(\theta)=\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2$$

To prevent overfitting, that is, to keep the values of $\theta$ from becoming too large or too small, a sum-of-squares penalty can be added to the objective:

$$J(\theta)=\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2$$

Regularization term (norm): $\lambda\sum_{j=1}^n\theta_j^2$

L2-norm:

$$J(\theta)=\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\quad\lambda>0$$

L1-norm:

$$J(\theta)=\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n|\theta_j|\quad\lambda>0$$

A linear regression model that uses both L1 and L2 regularization is called the Elastic Net algorithm:

$$J(\theta)=\frac{1}{2}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2+\lambda\left(p\sum_{j=1}^n|\theta_j|+(1-p)\sum_{j=1}^n\theta_j^2\right)$$

$$\left\{\begin{aligned}&\lambda > 0\\&p\in [0,1]\end{aligned}\right.$$
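To make the role of $\lambda$ and $p$ concrete, here is a small sketch that evaluates the Elastic Net objective for a given $\theta$. The names `predict` and `elastic_net_objective` and the data are assumptions for illustration; for simplicity the sketch penalizes every component of $\theta$, whereas implementations often leave an intercept term unpenalized.

```python
def predict(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

def elastic_net_objective(theta, xs, ys, lam, p):
    """1/2 * squared error + lam * (p * L1 penalty + (1 - p) * L2 penalty)."""
    sse = sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys))
    l1 = sum(abs(t) for t in theta)
    l2 = sum(t * t for t in theta)
    return 0.5 * sse + lam * (p * l1 + (1 - p) * l2)

# Toy data (assumed) that theta fits exactly, so only the penalty remains.
xs = [[1.0, 0.0], [0.0, 1.0]]
ys = [1.0, -2.0]
theta = [1.0, -2.0]

mixed = elastic_net_objective(theta, xs, ys, lam=0.1, p=0.5)    # 0.1 * (1.5 + 2.5)
l1_only = elastic_net_objective(theta, xs, ys, lam=0.1, p=1.0)  # 0.1 * 3
l2_only = elastic_net_objective(theta, xs, ys, lam=0.1, p=0.0)  # 0.1 * 5
```

Setting `p=1` reduces the penalty to the pure L1 term and `p=0` to the pure L2 term, so `p` interpolates between the two regularizers.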
Common metrics for evaluating a regression model, where $\hat{y_i}$ is the predicted value and $\bar{y}$ is the mean of the true values:

$$\begin{aligned} &MSE=\frac{1}{m}\sum_{i=1}^m(y_i-\hat{y_i})^2\\ &RMSE=\sqrt{MSE}=\sqrt{\frac{1}{m}\sum_{i=1}^m(y_i-\hat{y_i})^2}\\ &R^2=1-\frac{RSS}{TSS}=1-\frac{\sum_{i=1}^m(y_i-\hat{y_i})^2}{\sum_{i=1}^m(y_i-\bar{y})^2} \end{aligned}$$
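The three metrics can be computed together from the true and predicted values. The following is a minimal sketch; the function name `regression_metrics` and the numbers are made up for illustration.

```python
import math

def regression_metrics(ys, preds):
    """Return MSE, RMSE and R^2 for paired true values and predictions."""
    m = len(ys)
    rss = sum((y - p) ** 2 for y, p in zip(ys, preds))  # residual sum of squares
    mean_y = sum(ys) / m
    tss = sum((y - mean_y) ** 2 for y in ys)            # total sum of squares
    mse = rss / m
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": 1 - rss / tss}

metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 2.0])
print(metrics)  # RSS = 4 and TSS = 5, so MSE = 1, RMSE = 1, R^2 = 0.2
```

An $R^2$ of 1 means the predictions match exactly, while a value of 0 means the model does no better than always predicting the mean.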
Gradient descent updates the parameters iteratively:

$$\theta=\theta-\alpha\cdot \frac{\partial J(\theta)}{\partial \theta}$$

$\alpha$: learning rate (step size)

For a single sample, the partial derivative is:

$$\begin{aligned} \frac{\partial}{\partial \theta_j}J(\theta)=&\frac{\partial}{\partial \theta_j}\frac{1}{2}(h_{\theta}(x)-y)^2\\ =&2\cdot \frac{1}{2}(h_{\theta}(x)-y)\cdot \frac{\partial}{\partial \theta_j}(h_{\theta}(x)-y)\\ =&(h_{\theta}(x)-y)\frac{\partial}{\partial \theta_j}\left(\sum_{i=1}^n\theta_ix_i-y\right)\\ =&(h_{\theta}(x)-y)x_j \end{aligned}$$
Over all $m$ samples (batch gradient descent), the objective is

$$J(\theta)=\frac{1}{2}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2$$

Using the single-sample result $\frac{\partial}{\partial \theta_j}J(\theta)=(h_{\theta}(x)-y)x_j$, the gradient over all samples is

$$\frac{\partial J(\theta)}{\partial \theta_j}=\sum_{i=1}^m\frac{\partial }{\partial \theta_j}\frac{1}{2}(h_{\theta}(x^{(i)})-y^{(i)})^2=\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x_j^{(i)}$$

so the update rule becomes

$$\theta_j=\theta_j+\alpha\sum_{i=1}^m(y^{(i)}-h_{\theta}(x^{(i)}))x_j^{(i)}$$
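The batch update rule can be sketched as a short loop. The function names, learning rate, iteration count and data below are illustrative assumptions; the step size was chosen small enough for this particular toy data set and would need tuning for other data.

```python
def predict(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

def batch_gradient_descent(xs, ys, alpha=0.05, iterations=2000):
    """Repeat theta_j <- theta_j + alpha * sum_i (y_i - h(x_i)) * x_ij."""
    theta = [0.0] * len(xs[0])
    for _ in range(iterations):
        # Errors are computed once per iteration from the current theta,
        # so all components theta_j are updated simultaneously.
        errors = [y - predict(theta, x) for x, y in zip(xs, ys)]
        theta = [
            t + alpha * sum(e * x[j] for e, x in zip(errors, xs))
            for j, t in enumerate(theta)
        ]
    return theta

# Toy data (assumed): y = 1 + 2x, first column is the intercept.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [1.0, 3.0, 5.0, 7.0]
theta = batch_gradient_descent(xs, ys)
print(theta)  # approaches [1.0, 2.0]
```

Each iteration touches every sample, which gives a stable descent direction at the cost of one full pass over the data per update.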
Alternatively, the parameters can be updated one sample at a time (stochastic gradient descent), using the single-sample gradient:

$$\frac{\partial}{\partial \theta_j}J(\theta)=(h_{\theta}(x)-y)x_j$$

for i=1 to m,{
$$\theta_j=\theta_j+\alpha(y^{(i)}-h_{\theta}(x^{(i)}))x_j^{(i)}$$
}
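The per-sample loop can be sketched as follows. This is a minimal illustration, not a tuned implementation: the names, step size and number of passes are assumptions, and the samples are visited in their stored order on every pass, whereas practical implementations usually shuffle them.

```python
def predict(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

def stochastic_gradient_descent(xs, ys, alpha=0.05, epochs=1000):
    """Update theta after every single sample instead of after a full pass."""
    theta = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # for i = 1 to m
            error = y - predict(theta, x)
            # theta_j <- theta_j + alpha * (y_i - h(x_i)) * x_ij for every j
            theta = [t + alpha * error * xj for t, xj in zip(theta, x)]
    return theta

# Toy data (assumed): y = 1 + 2x without noise, first column is the intercept.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [1.0, 3.0, 5.0, 7.0]
theta = stochastic_gradient_descent(xs, ys)
print(theta)  # approaches [1.0, 2.0] on this noise-free data
```

Because each update uses only one sample, the cost per step is independent of $m$; on noisy data the iterates fluctuate around the minimum unless the step size is decreased over time.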
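$w^{(i)}$ is a weight assigned to each point in the data set according to its distance from the point to be predicted.
The farther a point is from the point to be predicted, the smaller its weight; the closer it is, the larger its weight. A common choice is:

$$w^{(i)}=\exp\left(-\frac{(x^{(i)}-\bar{x})^2}{2k^2}\right)$$

This function is called the exponential decay function, where $\bar{x}$ is the point to be predicted and $k$ is the wavelength (bandwidth) parameter, which controls how quickly the weights fall off with distance.
This approach is mainly about the similarity between samples; it is discussed further in the context of SVMs (kernel functions).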
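The effect of $k$ on the weights is easy to see numerically. The sketch below handles scalar inputs only; the function name `gaussian_weight`, the sample points and the two values of `k` are assumptions chosen for illustration.

```python
import math

def gaussian_weight(x_i, x_query, k):
    """w = exp(-(x_i - x_query)^2 / (2 k^2)) for scalar inputs."""
    return math.exp(-((x_i - x_query) ** 2) / (2 * k ** 2))

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
x_query = 2.0  # the point to be predicted

wide = [gaussian_weight(x, x_query, k=2.0) for x in xs]
narrow = [gaussian_weight(x, x_query, k=0.5) for x in xs]

for x, w_wide, w_narrow in zip(xs, wide, narrow):
    print(x, round(w_wide, 4), round(w_narrow, 4))
```

The point that coincides with the query gets weight 1, and with the smaller `k` the weights of the other points drop toward 0 much faster, so the fit becomes more local.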
Logistic/sigmoid function:

$$p=h_{\theta}(x)=g(\theta^Tx)=\frac{1}{1+e^{-\theta^Tx}}$$

or, equivalently,

$$g(z)=\frac{1}{1+e^{-z}}$$
$$g'(z)=\left(\frac{1}{1+e^{-z}}\right)'=\frac{e^{-z}}{(1+e^{-z})^2}=\frac{1}{1+e^{-z}}\cdot\frac{e^{-z}}{1+e^{-z}}=\frac{1}{1+e^{-z}}\cdot \left(1-\frac{1}{1+e^{-z}}\right)=g(z)(1-g(z))$$
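The derivative identity $g'(z)=g(z)(1-g(z))$ can be checked numerically against a central finite difference. The helper names and the test points below are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Analytic derivative g'(z) = g(z) * (1 - g(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def numeric_grad(f, z, eps=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)

for z in (-2.0, 0.0, 3.0):
    print(z, sigmoid_grad(z), numeric_grad(sigmoid, z))
```

At $z=0$ the sigmoid equals 0.5, so the derivative reaches its maximum value of 0.25 there and shrinks toward 0 as $|z|$ grows.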
Assume:

$$P(y=1|x;\theta)=h_{\theta}(x)$$

$$P(y=0|x;\theta)=1-h_{\theta}(x)$$

These two cases can be combined into a single expression:

$$P(y|x;\theta)=(h_{\theta}(x))^y(1-h_{\theta}(x))^{(1-y)}$$

| | $y=1$ | $y=0$ |
|---|---|---|
| $p(y \mid x)$ | $h_{\theta}(x)$ | $1-h_{\theta}(x)$ |
Likelihood function:

$$\begin{aligned}L(\theta)=&p(\vec{y}|X;\theta)=\prod_{i=1}^m p(y^{(i)}|x^{(i)};\theta)\\=&\prod_{i=1}^m(h_{\theta}(x^{(i)}))^{y^{(i)}}(1-h_{\theta}(x^{(i)}))^{1-y^{(i)}}\end{aligned}$$

Log-likelihood function:

$$\ell(\theta)=\log L(\theta)=\sum_{i=1}^m\left(y^{(i)}\log h_{\theta}(x^{(i)})+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right)$$

Taking the partial derivative with respect to $\theta_j$ and using $g'(z)=g(z)(1-g(z))$:

$$\begin{aligned} \frac{\partial \ell(\theta)}{\partial \theta_j}=&\sum_{i=1}^m\left(\frac{y^{(i)}}{h_{\theta}(x^{(i)})}-\frac{1-y^{(i)}}{1-h_{\theta}(x^{(i)})}\right)\cdot \frac{\partial h_{\theta}(x^{(i)})}{\partial \theta_{j}}\\ =&\sum_{i=1}^m\left(\frac{y^{(i)}}{g(\theta^Tx^{(i)})}-\frac{1-y^{(i)}}{1-g(\theta^Tx^{(i)})}\right)\cdot \frac{\partial g(\theta^Tx^{(i)})}{\partial \theta_{j}}\\ =&\sum_{i=1}^m\left(\frac{y^{(i)}}{g(\theta^Tx^{(i)})}-\frac{1-y^{(i)}}{1-g(\theta^Tx^{(i)})}\right)\cdot g(\theta^Tx^{(i)})(1-g(\theta^Tx^{(i)}))\cdot \frac{\partial \theta^Tx^{(i)}}{\partial \theta_{j}}\\ =&\sum_{i=1}^m\left(y^{(i)}(1-g(\theta^Tx^{(i)}))-(1-y^{(i)})g(\theta^Tx^{(i)})\right)\cdot x_j^{(i)}\\ =&\sum_{i=1}^m\left(y^{(i)}-g(\theta^Tx^{(i)})\right)\cdot x_j^{(i)} \end{aligned}$$
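Since the gradient of the log-likelihood is $\sum_i (y^{(i)}-g(\theta^Tx^{(i)}))x_j^{(i)}$, the parameters can be fitted by gradient ascent, moving $\theta$ in the direction of the gradient. The sketch below is one way to do this under stated assumptions: the function names, step size, iteration count and the small non-separable data set are all illustrative choices, not taken from the original text.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def log_likelihood(theta, xs, ys):
    return sum(
        y * math.log(predict_proba(theta, x))
        + (1 - y) * math.log(1 - predict_proba(theta, x))
        for x, y in zip(xs, ys)
    )

def fit_logistic(xs, ys, alpha=0.05, iterations=5000):
    """Gradient ascent: theta_j <- theta_j + alpha * sum_i (y_i - g(theta^T x_i)) * x_ij."""
    theta = [0.0] * len(xs[0])
    for _ in range(iterations):
        errors = [y - predict_proba(theta, x) for x, y in zip(xs, ys)]
        theta = [
            t + alpha * sum(e * x[j] for e, x in zip(errors, xs))
            for j, t in enumerate(theta)
        ]
    return theta

# Toy data (assumed): first column is the intercept; the classes overlap
# around x = 2..3, so the maximum-likelihood estimate is finite.
xs = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0], [1.0, 3.5], [1.0, 4.5]]
ys = [0, 0, 1, 0, 1, 1]

theta = fit_logistic(xs, ys)
print(theta)
print(log_likelihood([0.0, 0.0], xs, ys), log_likelihood(theta, xs, ys))
```

The update has the same form as the linear regression rule; only the hypothesis $h_{\theta}(x)$ changes from $\theta^Tx$ to $g(\theta^Tx)$.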
Equivalently, writing $p_i$ for the predicted probability of sample $i$:

$$L(\theta)=\prod_{i=1}^m p(y^{(i)}|x^{(i)};\theta)=\prod_{i=1}^m p_i^{y^{(i)}}(1-p_i)^{(1-y^{(i)})}$$

$$p_{i}=h_{\theta}(x^{(i)})=\frac{1}{1+e^{-\theta^Tx^{(i)}}}$$

$$\ell(\theta)=\log L(\theta)=\sum_{i=1}^m\ln\left[p_i^{y^{(i)}}(1-p_i)^{(1-y^{(i)})}\right]$$

The loss is the negative log-likelihood:

$$\begin{aligned}loss=&-\ell(\theta)\\=&-\sum_{i=1}^m\left[y^{(i)}\ln(p_i)+(1-y^{(i)})\ln(1-p_i)\right]\\=&\sum_{i=1}^m\left[-y^{(i)}\ln(h_{\theta}(x^{(i)}))-(1-y^{(i)})\ln(1-h_{\theta}(x^{(i)}))\right]\end{aligned}$$
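The negative log-likelihood (cross-entropy) loss, $-\sum_i[y^{(i)}\ln p_i+(1-y^{(i)})\ln(1-p_i)]$, can be computed directly from labels and predicted probabilities. In this sketch the function name `cross_entropy` and the example values are assumptions, and the clipping with `eps` is an added safeguard against $\ln(0)$ that is not part of the formula itself.

```python
import math

def cross_entropy(ys, probs, eps=1e-12):
    """Negative log-likelihood of binary labels ys under predicted probabilities probs."""
    total = 0.0
    for y, p in zip(ys, probs):
        p = min(max(p, eps), 1 - eps)  # assumption: clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

labels = [1, 0, 1]
good = cross_entropy(labels, [0.9, 0.2, 0.8])  # confident and mostly right
bad = cross_entropy(labels, [0.4, 0.7, 0.3])   # leaning the wrong way
print(good, bad)
```

Predictions that put high probability on the correct class give a small loss, while predictions that lean toward the wrong class are penalized increasingly heavily as they become more confident.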