scipy.optimize.minimize handles objectives with vector parameters: ML loss surfaces, calibration of several constants at once, and inverse problems in science and engineering.
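Here is a minimal sketch of calibrating several constants at once. It fits both parameters of a hypothetical decay model y = a * exp(-b * t) to synthetic, noise-free data by minimizing the sum of squared residuals over the vector [a, b]. The model, the data, and the starting point are illustrative assumptions and do not come from a real dataset.

```python
import numpy as np
from scipy import optimize

# Synthetic, noise-free data from a hypothetical decay model y = a * exp(-b * t)
t = np.linspace(0.0, 5.0, 20)
y_obs = 2.0 * np.exp(-0.5 * t)

def sse(params):
    # Sum of squared residuals for the parameter vector [a, b]
    a, b = params
    return np.sum((a * np.exp(-b * t) - y_obs) ** 2)

res = optimize.minimize(sse, x0=np.array([1.0, 1.0]), method='BFGS')
print(res.x, res.fun)
```

The data are noise-free, so res.x should land close to the generating values (2.0, 0.5). With real measurements, some residual error remains.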
Choosing a method
- BFGS: smooth, unconstrained problems; the gradient is approximated by finite differences unless jac is given
- Nelder-Mead: derivative-free; slow in high dimensions
- L-BFGS-B: large problems, optionally with box bounds
- Provide jac (an analytic gradient) when available, for speed and accuracy
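The next sketch combines the last two bullets. It uses SciPy's built-in optimize.rosen as the objective and optimize.rosen_der as its analytic gradient, and runs L-BFGS-B with a box bound that excludes the unconstrained optimum at (1, 1). The bound values are arbitrary and chosen only for illustration.

```python
import numpy as np
from scipy import optimize

x0 = np.array([-1.2, 1.0])
# Box bounds (illustrative): x[0] is capped at 0.5, so the optimum (1, 1) is infeasible
bounds = [(-2.0, 0.5), (-2.0, 2.0)]

res = optimize.minimize(
    optimize.rosen,          # SciPy's built-in Rosenbrock function
    x0,
    method='L-BFGS-B',
    jac=optimize.rosen_der,  # analytic gradient instead of finite differences
    bounds=bounds,
)
print(res.x, res.fun, res.nfev)
```

With x[0] capped at 0.5, the best feasible point lies on the valley floor x[1] = x[0]**2. The solver should therefore stop near (0.5, 0.25) with f = 0.25.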
Rosenbrock toy problem
import numpy as np
from scipy import optimize
def rosen(x):
    # Rosenbrock function: global minimum f = 0 at x = (1, 1, ..., 1)
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1 - x[:-1])**2)
x0 = np.array([-1.2, 1.0])
res = optimize.minimize(rosen, x0, method='BFGS')
print(res.x, res.fun)
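The same problem can be solved with an analytic gradient. rosen_grad is a hand-derived helper introduced here for illustration; SciPy ships an equivalent as optimize.rosen_der. optimize.check_grad compares it against a finite-difference estimate before the gradient is trusted.

```python
import numpy as np
from scipy import optimize

def rosen(x):
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1 - x[:-1])**2)

def rosen_grad(x):
    # Hypothetical helper: analytic gradient of rosen() for any dimension >= 2
    grad = np.zeros_like(x)
    xm, xm_prev, xm_next = x[1:-1], x[:-2], x[2:]
    grad[1:-1] = (200.0 * (xm - xm_prev**2)
                  - 400.0 * (xm_next - xm**2) * xm
                  - 2.0 * (1 - xm))
    grad[0] = -400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1 - x[0])
    grad[-1] = 200.0 * (x[-1] - x[-2]**2)
    return grad

x0 = np.array([-1.2, 1.0])
# check_grad returns the norm of (analytic gradient - finite-difference gradient)
grad_err = optimize.check_grad(rosen, rosen_grad, x0)
print(grad_err)

res_fd = optimize.minimize(rosen, x0, method='BFGS')                  # finite differences
res_an = optimize.minimize(rosen, x0, method='BFGS', jac=rosen_grad)  # analytic gradient
print(res_fd.nfev, res_an.nfev)
print(res_an.x)
```

Comparing res_fd.nfev with res_an.nfev shows the cost of finite differences. The analytic run typically needs fewer function evaluations.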
Diagnostics
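Pass a callback, for methods that accept one, to record the loss at each iteration and plot it. Check res.success and res.message, inspect the final x, and verify that any bounds or constraints are satisfied. Ill-conditioned problems may need their variables rescaled.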
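Here is a minimal diagnostics sketch for a BFGS run on SciPy's built-in Rosenbrock helpers. A callback (record is a hypothetical name) appends the loss after each iteration. The returned OptimizeResult carries success, message, and nit for checking convergence. The plotting lines are commented out so the block runs without matplotlib.

```python
import numpy as np
from scipy import optimize

x0 = np.array([-1.2, 1.0])
history = []  # loss value after each iteration

def record(xk):
    # Called by minimize once per iteration with the current parameter vector
    history.append(optimize.rosen(xk))

res = optimize.minimize(optimize.rosen, x0, method='BFGS',
                        jac=optimize.rosen_der, callback=record)

# Convergence diagnostics stored on the OptimizeResult
print(res.success, res.message)
print("iterations:", res.nit, "final x:", res.x)

# Optional plot (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.semilogy(history); plt.xlabel("iteration"); plt.ylabel("loss"); plt.show()
```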
Important interview questions and answers
- Q: Why scale variables?
A: Optimizers work best when variables have similar magnitudes; mixing mm and km makes the problem ill-conditioned and slows convergence.
- Q: Nelder-Mead trade-off?
A: No gradients are needed, but it scales poorly to high-dimensional ML losses.
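The next sketch illustrates the Nelder-Mead trade-off. It runs Nelder-Mead and gradient-based BFGS on a 10-dimensional Rosenbrock function from the same start. The dimension and starting point are assumptions chosen for illustration, and exact evaluation counts depend on the SciPy version.

```python
import numpy as np
from scipy import optimize

n = 10
x0 = np.full(n, 1.3)  # illustrative start, away from the minimum at (1, ..., 1)

res_nm = optimize.minimize(optimize.rosen, x0, method='Nelder-Mead')
res_bfgs = optimize.minimize(optimize.rosen, x0, method='BFGS',
                             jac=optimize.rosen_der)

for name, r in [("Nelder-Mead", res_nm), ("BFGS", res_bfgs)]:
    print(f"{name:12s} fun={r.fun:.2e} nfev={r.nfev} success={r.success}")
```

With the default iteration limits, Nelder-Mead may stop before reaching the minimum in 10 dimensions, so check its success flag. BFGS with an analytic gradient typically converges with far fewer evaluations.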
Self-check
- Name two minimize methods and when to use each.
- What is the Rosenbrock function used for in teaching?
Tip: Scale variables to similar magnitudes before BFGS or L-BFGS-B.
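Here is a minimal sketch of the scaling tip. The loss is hypothetical: x[0] is a length in mm (around 1e3) and x[1] is a length in km (around 1e-3). In raw units its Hessian diagonal is about 2e-6 and 2e6, which makes the problem very ill-conditioned. Dividing by a vector of typical magnitudes, scale (also an assumption), lets the optimizer work on O(1) variables. The result is then mapped back to physical units.

```python
import numpy as np
from scipy import optimize

def loss(x):
    # Hypothetical calibration loss: x[0] in mm, x[1] in km;
    # optimum at x = [2500 mm, 0.004 km]
    return (x[0] / 1000.0 - 2.5)**2 + (1000.0 * x[1] - 4.0)**2

scale = np.array([1e3, 1e-3])   # typical magnitude of each variable
x0 = np.array([1000.0, 0.001])  # initial guess in physical units

def loss_scaled(z):
    # The optimizer works on z = x / scale, whose entries are all O(1)
    return loss(z * scale)

res = optimize.minimize(loss_scaled, x0 / scale, method='BFGS')
x_opt = res.x * scale            # map back to physical units
print(x_opt)
```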
Interview prep
- BFGS?
Quasi-Newton method for smooth, unconstrained problems; it builds an approximation of the inverse Hessian from successive gradients.
- Nelder-Mead?
Derivative-free; converges slowly in high dimensions.
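To see the Hessian approximation that BFGS maintains, this sketch reads res.hess_inv from a BFGS run on SciPy's Rosenbrock helpers. It compares that value with the exact inverse Hessian from optimize.rosen_hess at the solution. The approximation is a by-product of the updates and is not guaranteed to match the exact inverse closely.

```python
import numpy as np
from scipy import optimize

x0 = np.array([-1.2, 1.0])
res = optimize.minimize(optimize.rosen, x0, method='BFGS', jac=optimize.rosen_der)

# BFGS keeps an approximation of the inverse Hessian, updated from gradient differences
H_inv_approx = res.hess_inv
# Exact inverse Hessian at the solution, for comparison
H_inv_exact = np.linalg.inv(optimize.rosen_hess(res.x))

print(np.round(H_inv_approx, 3))
print(np.round(H_inv_exact, 3))
```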