Abstract
This chapter explores modifications and extensions to simple feed-forward neural networks, which can also be applied to other neural networks. The problem of local minima, one of the main problems in machine learning, is explored in detail. The main strategy presented against local minima is regularization, which adds a regularization parameter when learning. Both L1 and L2 regularization are explained in detail. The chapter also addresses the learning rate and shows how to implement it in backpropagation, in both the static and the dynamic setting. Momentum is explored as a further technique that helps against local minima by adding inertia to gradient descent. The chapter then covers stochastic gradient descent, in the form of learning with batches and pure online learning, and concludes with a look at the vanishing and exploding gradient problems, setting the stage for deep learning.
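As a rough illustration of how the learning rate, momentum and L2 regularization mentioned above can combine in a single weight update, here is a minimal sketch in plain Python. It is not the chapter's own code: the function names `update_step` and `minimize_quadratic`, the parameter names `eta`, `mu` and `lam`, and the toy cost \((w - 3)^2\) are all assumptions made for this example.

```python
def update_step(weights, velocity, grads, eta=0.1, mu=0.9, lam=0.0):
    """Return (new_weights, new_velocity) after one gradient-descent step.

    eta: learning rate, mu: momentum coefficient, lam: L2 strength.
    Assumption: the L2 penalty is (lam / 2) * w**2, so it adds lam * w
    to the gradient of each weight.
    """
    new_velocity = [
        mu * v - eta * (g + lam * w)
        for w, v, g in zip(weights, velocity, grads)
    ]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


def minimize_quadratic(target=3.0, steps=300, **kwargs):
    """Minimize the toy cost (w - target)**2, starting from w = 0."""
    weights, velocity = [0.0], [0.0]
    for _ in range(steps):
        grads = [2.0 * (weights[0] - target)]  # derivative of the toy cost
        weights, velocity = update_step(weights, velocity, grads, **kwargs)
    return weights[0]


if __name__ == "__main__":
    print(minimize_quadratic())         # ends close to the target 3.0
    print(minimize_quadratic(lam=0.5))  # ends closer to 0 because of the penalty
```

With `lam=0.5` the penalized cost \((w - 3)^2 + 0.25\,w^2\) has its minimum at \(w = 2.4\), so the second run settles there rather than at 3: the regularizer trades some fit for smaller weights.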
Keywords
- Pure Online Learning
- Stochastic Gradient Descent
- Simple Feed-forward Neural Network
- Sticky Objects
- Minibatch
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Notes
1. We will be using a modification of the explanation offered by [3]. Note that this book is available online at http://neuralnetworksanddeeplearning.com.
2. We take the idea for this abstraction from Geoffrey Hinton’s courses.
3. This is actually also a technique used to prevent overfitting, called early stopping.
4. You can use the learning rate to force a gradient explosion, so if you want to see gradient explosion for yourself, try an \(\eta\) value of 5 or 10.
5. We have been clumsy around several things, and this section is intended to redefine them a bit to make them more precise.
6. We could also use a non-random selection. One of the most interesting ideas here is that of learning the simplest instances first and then proceeding to the trickier ones; this approach is called curriculum learning. For more on this see [13].
7. This is similar to reinforcement learning, which is, along with supervised and unsupervised learning, one of the three main areas of machine learning, but we have decided against including it in this volume, since it falls outside the idea of a first introduction to deep learning. If the reader wishes to learn more, we refer her to [14].
8. Suppose for the sake of clarification it is non-randomly divided: the first batch contains training samples 1 to 1000, the second 1001 to 2000, etc.
9. A single hidden layer with two neurons in it. If it were (3, 2, 4, 1), we would know it has two hidden layers, the first one with two neurons and the second one with four.
10. OK, we have adjusted the values to make this statement true. Several of the derivatives we need will soon become values between 0 and 1, but the sigmoid derivatives are mathematically bounded between 0 and 1, and if we have many layers (e.g. 8), the sigmoid derivatives would dominate backpropagation.
11. If the regular approach were something like making a clay statue (removing clay, but sometimes adding), the intuition behind initializing the weights to large values would be taking a block of stone or wood and starting to chip away pieces.
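The remark in the notes that sigmoid derivatives come to dominate backpropagation in a network with many layers can be checked numerically. The sketch below is an illustration under stated assumptions, not the chapter's code: it follows a single path through the network and multiplies one weight and one sigmoid derivative per layer, and the helper names `sigmoid_prime` and `path_factor` are hypothetical. The sigmoid derivative \(\sigma'(z) = \sigma(z)(1 - \sigma(z))\) never exceeds 0.25, which is a tighter bound than 1. Note also that the explosion here is produced with large weights rather than with a large learning rate.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    """Derivative of the logistic function; its maximum is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def path_factor(weights, pre_activations):
    """Product of w_k * sigmoid'(z_k) along one path through the layers.

    Simplifying assumption: one neuron per layer, so the gradient reaching
    the first layer is scaled by exactly this product.
    """
    product = 1.0
    for w, z in zip(weights, pre_activations):
        product *= w * sigmoid_prime(z)
    return product


if __name__ == "__main__":
    layers = 8
    # Weights of 1: each layer scales the gradient by at most 0.25.
    print(path_factor([1.0] * layers, [0.0] * layers))   # about 1.5e-05
    # Weights of 10: each layer scales the gradient by 2.5 instead.
    print(path_factor([10.0] * layers, [0.0] * layers))  # about 1525.9
```

Even in the most favourable case (every pre-activation at 0), eight layers with unit weights shrink the gradient by a factor of \(0.25^8\), while weights of 10 make it grow by \(2.5^8\).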
References
1. A.N. Tikhonov, On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39(5), 195–198 (1943)
2. A.N. Tikhonov, Solution of incorrectly formulated problems and the regularization method. Sov. Math. 4, 1035–1038 (1963)
3. M.A. Nielsen, Neural Networks and Deep Learning (Determination Press, 2015)
4. R. Tibshirani, Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B (Methodol.) 58(1), 267–288 (1996)
5. A. Ng, Feature selection, L1 versus L2 regularization, and rotational invariance, in Proceedings of the International Conference on Machine Learning (2004)
6. D.L. Donoho, Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
7. E.J. Candes, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
8. J. Wen, J.L. Zhao, S.W. Luo, Z. Han, The improvements of BP neural network learning algorithm, in Proceedings of 5th International Conference on Signal Processing (IEEE Press, 2000), pp. 1647–1649
9. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning internal representations by error propagation. Parallel Distrib. Process. 1, 318–362 (1986)
10. G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors (2012)
11. G.E. Dahl, T.N. Sainath, G.E. Hinton, Improving deep neural networks for LVCSR using rectified linear units and dropout, in IEEE International Conference on Acoustic Speech and Signal Processing (IEEE Press, 2013), pp. 8609–8613
12. N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
13. Y. Bengio, J. Louradour, R. Collobert, J. Weston, Curriculum learning, in Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, New York, NY, USA (ACM, 2009), pp. 41–48
14. R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, 1998)
15. S. Hochreiter, Untersuchungen zu dynamischen neuronalen Netzen, Diploma thesis, Technische Universität Munich, 1991
16. S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
17. S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in A Field Guide to Dynamical Recurrent Neural Networks, ed. by S.C. Kremer, J.F. Kolen (IEEE Press, 2001)
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Skansi, S. (2018). Modifications and Extensions to a Feed-Forward Neural Network. In: Introduction to Deep Learning. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-73004-2_5
DOI: https://doi.org/10.1007/978-3-319-73004-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73003-5
Online ISBN: 978-3-319-73004-2
eBook Packages: Computer Science, Computer Science (R0)