scipy.sparse stores mostly-zero matrices efficiently by keeping only the nonzero entries. Its main formats are CSR, CSC, and COO, used for graphs, finite-element systems, and bag-of-words text features.
Formats
- CSR — fast row ops and matrix-vector multiply
- CSC — fast column ops
- COO — easy construction from (row, col, data) triplets
- Convert formats with .tocsr(), .tocsc()
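The format choices above can be sketched as follows. This is a minimal illustration, assuming a small 3x3 matrix; the variable names are chosen for the example only.

```python
import numpy as np
import scipy.sparse as sp

# Build a small matrix in COO form, then convert for computation.
row = np.array([0, 0, 1, 2])
col = np.array([0, 2, 1, 2])
data = np.array([1.0, 2.0, 3.0, 4.0])
coo = sp.coo_matrix((data, (row, col)), shape=(3, 3))

csr = coo.tocsr()  # row-oriented: cheap row slicing and matvec
csc = coo.tocsc()  # column-oriented: cheap column slicing

x = np.ones(3)
y = csr @ x                      # matrix-vector product, dense 1-D result
first_row = csr[0, :].toarray()  # small slice, safe to densify
last_col = csc[:, 2].toarray()
```

A common pattern is to assemble in COO and convert once to CSR or CSC before the compute-heavy part, so the conversion cost is paid a single time.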
Construction
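import numpy as np
import scipy.sparse as sp
# Triplets: data[k] is stored at position (row[k], col[k]).
row = np.array([0, 0, 1, 2])
col = np.array([0, 2, 1, 2])
data = np.array([1, 2, 3, 4], dtype=float)
M = sp.coo_matrix((data, (row, col)), shape=(3, 3))
# toarray() densifies the matrix; fine for a 3x3 demo.
print(M.toarray())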
Density
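As a rough rule of thumb, when fewer than about 1–5% of entries are nonzero, a sparse format often beats a dense array in both memory and time. The exact break-even point depends on the operation and the sparsity pattern. scikit-learn's text vectorizers (for example CountVectorizer and TfidfVectorizer) return scipy.sparse matrices for this reason.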
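To make the memory argument concrete, the sketch below compares the storage of a tridiagonal matrix (the kind a 1-D grid produces) in CSR form against the size of the equivalent dense array. The size n = 2000 and the stencil values are assumptions picked for illustration, and the exact byte counts depend on the index dtype SciPy selects.

```python
import numpy as np
import scipy.sparse as sp

n = 2000
# Tridiagonal matrix with 3n - 2 nonzeros.
A = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n),
             format="csr", dtype=np.float64)

# CSR keeps three arrays: values, column indices, and row pointers.
sparse_bytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
dense_bytes = n * n * np.dtype(np.float64).itemsize  # size if densified

density = A.nnz / (n * n)
print(f"density={density:.4%}, sparse={sparse_bytes} B, dense={dense_bytes} B")
```

Here the density is well under 1%, so the CSR arrays take a small fraction of the dense footprint.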
Important interview questions and answers
- Q: CSR vs CSC?
A: CSR for row-wise matvec; CSC for column-wise. Pick the format matching your hot loops.
- Q: toarray() cost?
A: It allocates the full dense array, so memory is O(rows × cols) regardless of nnz. Use it only for small debugging cases, not huge systems.
Self-check
- Name three sparse formats.
- When is a sparse matrix worth it?
Pitfall: Calling .toarray() on huge sparse matrices blows RAM—stay sparse until export size is safe.
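One way to guard against this pitfall is to estimate the dense size before converting. The helper below is a hypothetical sketch, not part of SciPy; the name to_dense_if_small and the 100 MiB default limit are assumptions made for the example.

```python
import numpy as np
import scipy.sparse as sp

def to_dense_if_small(matrix, max_bytes=100 * 1024**2):
    """Densify only when the dense array would fit within max_bytes.

    Hypothetical helper for illustration; the default limit is arbitrary.
    """
    rows, cols = matrix.shape
    needed = rows * cols * matrix.dtype.itemsize
    if needed > max_bytes:
        raise MemoryError(f"dense form needs {needed} bytes, limit is {max_bytes}")
    return matrix.toarray()

small = sp.eye(3, format="csr")
big = sp.eye(100_000, format="csr")  # dense float64 form would need about 80 GB

dense_small = to_dense_if_small(small)
try:
    to_dense_if_small(big)
    refused = False
except MemoryError:
    refused = True  # the guard fired before any dense allocation
```

The check uses only the shape and dtype, so it costs nothing even for very large matrices.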
Interview prep
- CSR?
Compressed sparse row—fast row operations and matvec.
- When sparse?
When nnz much smaller than n²—graphs, grids, text features.
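To see why CSR makes row operations and matvec cheap, it can help to look at its three underlying arrays. The sketch below builds a CSR matrix directly from them and compares a hand-written reference matvec with SciPy's own product. The function csr_matvec is illustrative only and is not how SciPy implements it internally.

```python
import numpy as np
import scipy.sparse as sp

# CSR arrays for [[1, 0, 2], [0, 3, 0], [0, 0, 4]].
data = np.array([1.0, 2.0, 3.0, 4.0])  # nonzero values, row by row
indices = np.array([0, 2, 1, 2])       # column index of each value
indptr = np.array([0, 2, 3, 4])        # row i spans data[indptr[i]:indptr[i + 1]]
A = sp.csr_matrix((data, indices, indptr), shape=(3, 3))

def csr_matvec(data, indices, indptr, x):
    """Reference matvec over raw CSR arrays (illustrative, not optimized)."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        # Only the stored entries of row i are touched.
        y[i] = data[start:end] @ x[indices[start:end]]
    return y

x = np.array([1.0, 2.0, 3.0])
y_manual = csr_matvec(data, indices, indptr, x)
y_scipy = A @ x
```

Each row's entries sit in one contiguous slice, so the work is proportional to nnz rather than to n².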