Production pipelines using NumPy need dtype discipline, shape validation, reproducible RNG, and explicit copies at boundaries—especially before Pandas handoff or model serving.
Before shipping numeric code
- Assert expected shape and dtype at API boundaries
- Document axis conventions (samples × features)
- Seed RNG (default_rng) in tests and training
- Use np.save / versioned artifacts for array checkpoints
- Handle NaN/inf explicitly; don't silently propagate them
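The checklist above can be sketched as one small boundary check plus a seeded generator and a save/load round trip. This is a minimal illustration, not a prescribed implementation: the function name validate_features, the (samples × features) layout with 3 features, the float32 contract, and the seed 42 are all assumptions made for the example.

```python
import io

import numpy as np


def validate_features(x, n_features, dtype=np.float32):
    """Hypothetical boundary check: 2-D (samples x features), fixed dtype, finite values."""
    x = np.asarray(x)
    if x.ndim != 2 or x.shape[1] != n_features:
        raise ValueError(f"expected shape (n_samples, {n_features}), got {x.shape}")
    if x.dtype != dtype:
        raise TypeError(f"expected dtype {np.dtype(dtype)}, got {x.dtype}")
    if not np.isfinite(x).all():
        # Reject NaN/inf here instead of letting them propagate downstream.
        raise ValueError("input contains NaN or inf")
    return x


# A seeded Generator gives the same draws on every run (seed value is arbitrary).
rng = np.random.default_rng(42)
batch = rng.standard_normal((8, 3), dtype=np.float32)
validated = validate_features(batch, n_features=3)

# Round trip through np.save / np.load; an in-memory buffer stands in for a
# versioned artifact file.
buffer = io.BytesIO()
np.save(buffer, validated)
buffer.seek(0)
restored = np.load(buffer)
print(restored.shape, restored.dtype)
```

Raising at the boundary keeps the failure close to its cause; whether to raise, coerce, or log is a design choice that depends on the pipeline.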
Performance in production
- Avoid object arrays in hot paths
- Minimize copies between Pandas ↔ NumPy ↔ model runtime
- Profile with realistic batch sizes
- Consider float32 where its precision is sufficient
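A few of these points can be checked directly in NumPy. The sketch below uses made-up array sizes and is only meant to show how to inspect memory use, copies, and accidental object arrays; real profiling should still use realistic batch sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
x64 = rng.standard_normal((1000, 16))   # float64 by default
x32 = x64.astype(np.float32)            # explicit copy using half the bytes
print(x64.nbytes, x32.nbytes)

# astype(copy=False) returns the input itself when the dtype already matches,
# which avoids a redundant copy at a boundary.
same = x32.astype(np.float32, copy=False)
print(same is x32)

# Basic slicing returns a view; fancy (integer-list) indexing returns a copy.
view = x32[:100]
picked = x32[[0, 1, 2]]
print(np.shares_memory(view, x32), np.shares_memory(picked, x32))

# Mixing None into numeric data silently produces an object array, which
# falls back to slow per-element Python operations.
mixed = np.array([1.0, None, 3.0])
print(mixed.dtype)
```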
Testing
import numpy as np

# Exact comparison: array_equal checks the shape and every element.
expected = np.array([1, 2, 3])
actual = np.array([1, 2, 3])
assert np.array_equal(expected, actual)
print('arrays match')
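Exact equality works for the integer arrays above, but floating-point results usually need a tolerance. A possible variant using NumPy's testing helpers is sketched here; the sample values and the rtol setting are illustrative assumptions, not recommended defaults.

```python
import numpy as np

expected = np.array([0.3, 0.6, 1.0])
actual = np.array([0.1 + 0.2, 0.2 + 0.4, 1.0])

# Exact comparison is too strict here because of float rounding.
exact = np.array_equal(expected, actual)
print("exact match:", exact)

# Tolerance-based assertion; raises AssertionError with a readable diff on failure.
np.testing.assert_allclose(actual, expected, rtol=1e-7, atol=0.0)

# For integer or otherwise exact data, assert_array_equal reports mismatches
# in more detail than a bare assert.
np.testing.assert_array_equal(np.array([1, 2, 3]), np.array([1, 2, 3]))
print("tolerance checks passed")
```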
Important interview questions and answers
- Q: array_equal vs (a==b).all()?
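A: np.array_equal returns False on a shape mismatch and accepts equal_nan=True to treat NaNs in matching positions as equal; a raw == comparison broadcasts and is always False wherever a value is NaN.
- Q: Train-serve dtype mismatch?
A: Training in float64 and serving in float32 can shift predictions; standardize on one dtype for both.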
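Both answers can be reproduced in a few lines. The arrays below are invented for illustration, and the size of the float32 discrepancy will vary with the data.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = a.copy()

elementwise = bool((a == b).all())                 # NaN != NaN, so this is False
strict = np.array_equal(a, b)                      # also False without equal_nan
nan_aware = np.array_equal(a, b, equal_nan=True)   # NaNs in the same slots match
print(elementwise, strict, nan_aware)

# Shape mismatch: == broadcasts (1, 3) against (3, 1), array_equal does not.
row = np.ones((1, 3))
col = np.ones((3, 1))
broadcast_all = bool((row == col).all())
shape_aware = np.array_equal(row, col)
print(broadcast_all, shape_aware)

# Same weights and inputs, different precision: the results need not agree.
rng = np.random.default_rng(0)
w = rng.standard_normal(1000)
x = rng.standard_normal(1000)
y64 = w @ x
y32 = w.astype(np.float32) @ x.astype(np.float32)
diff = abs(float(y64) - float(y32))
print("float64 vs float32 difference:", diff)
```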
Self-check
- List five production NumPy checklist items.
- Why assert shapes at boundaries?
- How to compare arrays in tests?
Tip: Assert shapes at every pipeline stage boundary.
Interview prep
- Boundaries?
Assert shape/dtype at API inputs and outputs.
- array_equal?
Exact comparison helper for tests; pass equal_nan=True when float arrays may contain NaN.
- dtype mismatch?
float64 train vs float32 serve can shift predictions.