
Deep learning summary 1

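1. Use a proper initialization method to avoid vanishing or exploding gradients.
For example, with layer sizes [3, 5, 1], the second layer's weight matrix W[2] has shape (n[2], n[1]) = (1, 5) and is initialized as
W[2] = np.random.randn(1, 5) * np.sqrt(1/5)
In general, W[l] = np.random.randn(n[l], n[l-1]) * np.sqrt(1/n[l-1]) (Xavier initialization, suited to tanh); for ReLU, He initialization uses np.sqrt(2/n[l-1]) instead.
Scaling by the number of inputs n[l-1] keeps the initial weights from being too large or too small, so activations and gradients neither blow up nor shrink toward zero as they pass through the layers.
This can shorten training time significantly.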
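A minimal NumPy sketch of this scaled initialization for a whole network. The function name `initialize_parameters`, its `activation` switch between sqrt(1/n[l-1]) and He's sqrt(2/n[l-1]), and the zero biases are illustrative choices, not something the notes above prescribe.

```python
import numpy as np

def initialize_parameters(layer_dims, activation="relu", seed=0):
    """Hypothetical helper: scaled random initialization for a fully connected net.

    layer_dims: layer sizes with the input size first, e.g. [3, 5, 1].
    activation: "relu" uses He scaling sqrt(2 / n_in); anything else uses
    Xavier-style scaling sqrt(1 / n_in), which suits tanh.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_in, n_out = layer_dims[l - 1], layer_dims[l]
        scale = np.sqrt(2.0 / n_in) if activation == "relu" else np.sqrt(1.0 / n_in)
        # Standard normal weights times scale; shape (n[l], n[l-1]) so Z = W @ A_prev + b.
        params["W" + str(l)] = rng.standard_normal((n_out, n_in)) * scale
        # Zero biases are fine: the random weights already break symmetry.
        params["b" + str(l)] = np.zeros((n_out, 1))
    return params

params = initialize_parameters([3, 5, 1], activation="tanh")
print(params["W1"].shape, params["W2"].shape)  # (5, 3) (1, 5)
```

With `activation="tanh"`, `W2` is scaled by sqrt(1/5), matching the W[2] example above.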


2. Use regularization to reduce variance (overfitting).
L2 regularization: add (lambda/(2*m)) * (sum of the squares of all weights) to the cost function. The backward propagation formula must change too: each gradient dW[l] gains an extra term (lambda/m) * W[l].
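A small sketch of both changes, the extra cost term and the extra gradient term. The helper names and the parameter name `lambd` (since `lambda` is a Python keyword) are hypothetical, and the base cost is assumed to be computed elsewhere (for example cross-entropy).

```python
import numpy as np

def l2_regularized_cost(base_cost, weights, lambd, m):
    """Hypothetical helper: base_cost + (lambd / (2m)) * sum of squared entries of every W[l]."""
    l2_penalty = sum(np.sum(np.square(W)) for W in weights)
    return base_cost + lambd / (2 * m) * l2_penalty

def l2_regularized_grad(dW, W, lambd, m):
    """Hypothetical helper: the penalty adds (lambd / m) * W[l] to dW[l] in backprop."""
    return dW + (lambd / m) * W

# Illustrative numbers only: base cost 0.5, one 2x2 weight matrix of ones, m = 4 examples.
W1 = np.ones((2, 2))
cost = l2_regularized_cost(0.5, [W1], lambd=0.1, m=4)            # 0.5 + (0.1 / 8) * 4
dW1 = l2_regularized_grad(np.zeros((2, 2)), W1, lambd=0.1, m=4)  # every entry is 0.1 / 4
```

Because the extra gradient term is proportional to W[l], a gradient-descent step computes W[l] = (1 - learning_rate * lambda / m) * W[l] - learning_rate * dW[l], which is why L2 regularization is also called weight decay.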

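Dropout (inverted dropout): for each layer l, generate a mask with the same shape as the activations A[l]:
D[l] = np.random.rand(A[l].shape[0], A[l].shape[1])
D[l] = (D[l] < keep_prob).astype(int)

In forward propagation, use A[l] * D[l] / keep_prob instead of A[l]; in backward propagation, use dA[l] * D[l] / keep_prob instead of dA[l]. Dividing by keep_prob keeps the expected value of the activations unchanged, so dropout can simply be turned off at test time without any rescaling.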
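A minimal sketch of inverted dropout on one layer's activations, assuming A has shape (n[l], m) with one example per column. The function names are hypothetical, and a NumPy `Generator` is passed in so the mask is reproducible.

```python
import numpy as np

def dropout_forward(A, keep_prob, rng):
    """Hypothetical helper: inverted dropout on activations A of shape (n[l], m)."""
    # One keep/drop draw per unit per example: 1 with probability keep_prob.
    D = (rng.random(A.shape) < keep_prob).astype(A.dtype)
    # Zero the dropped units and rescale the rest so the expected value is unchanged.
    A_drop = A * D / keep_prob
    return A_drop, D

def dropout_backward(dA, D, keep_prob):
    """Hypothetical helper: apply the same mask and scaling to the gradient dA."""
    return dA * D / keep_prob

rng = np.random.default_rng(1)
A = np.ones((5, 10000))
A_drop, D = dropout_forward(A, keep_prob=0.8, rng=rng)
print(A_drop.mean())  # close to 1.0 thanks to the 1 / keep_prob rescaling
```

The mask D must be cached from the forward pass so the backward pass drops exactly the same units.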


3. Use a better optimization method to make the cost converge faster.
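 Batch gradient descent: the traditional way; each parameter update uses the whole training set.
 Mini-batch gradient descent: choose a mini-batch size such as 32, 64 or 128. Shuffle the training set and split it into mini-batches, run forward and backward propagation on each mini-batch and update the parameters after each one, then iterate over all mini-batches (one full pass is an epoch).
 Stochastic gradient descent: mini-batch size = 1.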
 Momentum: instead of W[l] = W[l] - learning_rate * dW[l], compute v[l] = beta * v[l] + (1 - beta) * dW[l] and update W[l] = W[l] - learning_rate * v[l].
 Adam: update W[l] = W[l] - learning_rate * v_corrected[l] / (sqrt(s_corrected[l]) + epsilon),
    where v[l] = beta1 * v[l] + (1 - beta1) * dW[l] and v_corrected[l] = v[l] / (1 - beta1^t),
             s[l] = beta2 * s[l] + (1 - beta2) * dW[l]^2 and s_corrected[l] = s[l] / (1 - beta2^t),
             t is the number of updates performed so far (starting at 1),
             epsilon is a small value (e.g. 1e-8) that avoids division by zero.
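The mini-batch splitting described above can be sketched as follows, assuming X has shape (n_x, m) and Y has shape (n_y, m) with one example per column. The function name `random_mini_batches` and its defaults are hypothetical.

```python
import numpy as np

def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Hypothetical helper: shuffle the columns of X and Y together, then split them.

    The last mini-batch is smaller when m is not a multiple of mini_batch_size.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    permutation = rng.permutation(m)           # same shuffle for inputs and labels
    X_shuffled, Y_shuffled = X[:, permutation], Y[:, permutation]
    batches = []
    for start in range(0, m, mini_batch_size):
        end = start + mini_batch_size          # slicing past m just stops at m
        batches.append((X_shuffled[:, start:end], Y_shuffled[:, start:end]))
    return batches

X = np.arange(2 * 150).reshape(2, 150)
Y = np.arange(150).reshape(1, 150)
batches = random_mini_batches(X, Y, mini_batch_size=64)
print([bx.shape[1] for bx, _ in batches])  # [64, 64, 22]
```

Setting `mini_batch_size=1` gives stochastic gradient descent, and `mini_batch_size=m` gives batch gradient descent, so one helper covers all three variants.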
All of these methods aim to make each update step more useful: mini-batches allow more frequent updates, while momentum and Adam average recent gradients to damp oscillations in the update direction.
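The momentum and Adam formulas from the list above, written as single-parameter update steps. The function names are hypothetical; beta1 = 0.9, beta2 = 0.999 and epsilon = 1e-8 are the defaults suggested with Adam, while the learning rates here are just assumptions for the toy problem f(w) = w^2.

```python
import numpy as np

def momentum_update(W, dW, v, learning_rate=0.01, beta=0.9):
    """Hypothetical helper: one gradient-descent-with-momentum step."""
    v = beta * v + (1 - beta) * dW      # exponentially weighted average of gradients
    W = W - learning_rate * v           # step along the averaged direction
    return W, v

def adam_update(W, dW, v, s, t, learning_rate=0.001,
                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Hypothetical helper: one Adam step; t is the 1-based update count."""
    v = beta1 * v + (1 - beta1) * dW               # first moment (momentum)
    s = beta2 * s + (1 - beta2) * np.square(dW)    # second moment (squared gradients)
    v_corrected = v / (1 - beta1 ** t)             # bias correction for the zero start
    s_corrected = s / (1 - beta2 ** t)
    W = W - learning_rate * v_corrected / (np.sqrt(s_corrected) + epsilon)
    return W, v, s

# Toy problem: minimize f(w) = w^2, whose gradient is 2w, starting from w = 5.
w = np.array([5.0])
v, s = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 1001):
    w, v, s = adam_update(w, 2 * w, v, s, t, learning_rate=0.1)
print(w)  # close to 0
```

Thanks to the bias correction, Adam's very first step has size close to `learning_rate` regardless of the gradient's scale, since v_corrected and sqrt(s_corrected) both equal |dW| at t = 1.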
 Besides, batch normalization can also reduce convergence time: it normalizes each layer's pre-activations over the mini-batch, which reduces internal covariate shift.
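A minimal sketch of the batch normalization forward pass for one layer, assuming pre-activations Z of shape (n[l], m) and learnable gamma and beta of shape (n[l], 1). The function name and epsilon = 1e-8 are assumptions; the sketch omits the backward pass and the running averages of the mean and variance that are used at test time.

```python
import numpy as np

def batchnorm_forward(Z, gamma, beta, epsilon=1e-8):
    """Hypothetical helper: normalize each unit over the mini-batch, then scale and shift."""
    mu = Z.mean(axis=1, keepdims=True)            # per-unit mean over the m examples
    var = Z.var(axis=1, keepdims=True)            # per-unit variance over the m examples
    Z_norm = (Z - mu) / np.sqrt(var + epsilon)    # zero mean, unit variance
    return gamma * Z_norm + beta                  # learnable scale and shift

rng = np.random.default_rng(0)
Z = rng.normal(loc=3.0, scale=10.0, size=(4, 256))
Z_tilde = batchnorm_forward(Z, gamma=np.ones((4, 1)), beta=np.zeros((4, 1)))
print(Z_tilde.mean(axis=1), Z_tilde.std(axis=1))  # means near 0, stds near 1
```

Because gamma and beta are learned, the network can still recover any mean and scale it needs; the normalization only keeps each layer's input distribution from drifting as earlier layers change.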
