ML libraries expect a feature matrix as a 2D float ndarray in which rows are samples and columns are features. Labels are a 1D array with one entry per sample. NumPy is the common interchange format before data is moved into tensors on a GPU.
Shape conventions
- X.shape == (n_samples, n_features)
- y.shape == (n_samples,) for classification/regression
- Images: (n_samples, height, width, channels)
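As a rough sketch of these conventions, the snippet below builds arrays with made-up sizes and flattens an image batch into a 2D feature matrix. The sizes and the names `images` and `X_flat` are illustrative assumptions, not taken from any dataset. Flattening is one common way to pass images to estimators that expect (n_samples, n_features).

```python
import numpy as np

n_samples, n_features = 5, 3  # illustrative sizes only

X = np.zeros((n_samples, n_features), dtype=float)  # feature matrix
y = np.zeros(n_samples, dtype=int)                  # one label per sample

# Basic sanity checks on the conventions
assert X.ndim == 2 and y.ndim == 1
assert X.shape[0] == y.shape[0]

# Image batch in channels-last layout: (n_samples, height, width, channels)
images = np.zeros((n_samples, 8, 8, 3), dtype=float)

# Flatten each image into a single row of height * width * channels values
X_flat = images.reshape(n_samples, -1)

print(X.shape, y.shape, images.shape, X_flat.shape)
```

The `-1` in `reshape` asks NumPy to infer the remaining dimension, so the same line works for any image size.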
Training matrix example
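import numpy as np

# Three samples with two features each; dtype=float gives float64
X = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
# One integer class label per sample
y = np.array([0, 0, 1])
print(X.shape, y.shape)  # (3, 2) (3,)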
From NumPy to frameworks
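PyTorch's torch.from_numpy returns a tensor that shares memory with the source array, so an in-place change to one is visible in the other. TensorFlow and JAX wrap similar array concepts with automatic differentiation and device placement.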
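The memory sharing can be checked with a short experiment. This is a minimal sketch that assumes PyTorch is installed. The import guard and the `shares_memory` flag are illustrative additions so the snippet still runs without it.

```python
import numpy as np

try:
    import torch  # assumption: PyTorch is installed
except ImportError:
    torch = None

X = np.array([[1, 0], [0, 1], [1, 1]], dtype=np.float32)
shares_memory = None  # stays None if PyTorch is unavailable

if torch is not None:
    t = torch.from_numpy(X)  # no copy: the tensor views the same buffer
    X[0, 0] = 5.0            # modify the NumPy array in place
    # The tensor sees the change because both objects share one buffer
    shares_memory = bool(t[0, 0].item() == 5.0)
    print(t.dtype, shares_memory)
```

Because the buffer is shared, copy the array first (for example with `X.copy()`) when the tensor should be independent of later NumPy edits.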
Important interview questions and answers
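- Q: Why float64 vs float32 in ML?
A: Training often uses float32 for speed and lower memory use; NumPy defaults to float64, so cast before GPU transfer.
- Q: How do you standardize features?
A: Subtract the mean and divide by the standard deviation. Fit both on the training set only, so that no information from validation or test data leaks into preprocessing.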
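Both answers can be illustrated in a few lines of NumPy. This is a sketch on synthetic data: the random arrays, the split sizes, and the zero-variance guard are assumptions added for illustration, not part of any particular library's workflow.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; synthetic data for illustration
X_train = rng.normal(loc=10.0, scale=2.0, size=(100, 3))
X_test = rng.normal(loc=10.0, scale=2.0, size=(20, 3))

# Fit the statistics on the training split only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
std[std == 0] = 1.0  # guard against constant features (division by zero)

# Apply the same training statistics to both splits
X_train_std = (X_train - mean) / std
X_test_std = (X_test - mean) / std

# NumPy defaults to float64; cast to float32 before a GPU transfer
X_train_32 = X_train_std.astype(np.float32)
X_test_32 = X_test_std.astype(np.float32)

print(X_train.dtype, X_train_32.dtype)
```

The test split is scaled with the training mean and standard deviation, so its columns will generally not have exactly zero mean or unit variance, which is expected.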
Self-check
- Expected shape of sklearn feature matrix X?
- What dtype is common for neural net training?
Tip: Keep X as (n_samples, n_features) float arrays.
Interview prep
- X shape?
(n_samples, n_features) float matrix.
- Standardize?
Fit mean/std on the training set only, then apply them to val/test.