Inexact Generalized Gauss–Newton–CG for Binary Cross-Entropy Minimization
DOI: https://doi.org/10.18860/jrmm.v5i2.34739