A Damped Hessian-Free Newton–Conjugate Gradient Method for Weighted Multiclass Neural Classification
DOI: https://doi.org/10.18860/cauchy.v11i1.40243
Copyright (c) 2026 Andy Irawan, Zainal Abidin, Mohammad Jamhuri

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Editorial Office
Mathematics Department,
Universitas Islam Negeri Maulana Malik Ibrahim Malang
Gajayana Street 50 Malang, East Java, Indonesia 65144
Fax (+62) 341 558933
e-mail: cauchy@uin-malang.ac.id

CAUCHY: Jurnal Matematika Murni dan Aplikasi is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.







