Iterative Closest Point (ICP) - SVD Derivation

Motivation and Use of ICP in Robotics

The Iterative Closest Point (ICP) algorithm is a cornerstone in 3D data alignment, crucial for robotics, SLAM (Simultaneous Localization and Mapping), and 3D reconstruction. The goal is to align two sets of 3D points (usually captured from LIDAR, stereo cameras, or depth sensors) by estimating a rigid transformation (rotation and translation).

ICP is used extensively in:

Mapping: Aligning point clouds from successive frames to build a global map.
Localization: Matching current sensor readings to a pre-built map to estimate the robot's pose.
Loop closure detection: Recognizing revisited places by aligning new scans to older ones.
Registration: Aligning 3D scans in CAD or medical imaging.

The beauty of ICP is that it does not require camera intrinsics or explicit correspondences — just raw 3D data and an initial guess (or not). The most computationally elegant step is solving for the best rigid transform using SVD.

Problem Setup

Given two sets of corresponding 3D points:

\[ \mathcal{P} = \{ \mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_n \}, \quad \mathcal{P}' = \{ \mathbf{p}'_1, \mathbf{p}'_2, \dots, \mathbf{p}'_n \} \]

We seek a rigid transform $(\mathbf{R}, \mathbf{t})$ such that:

\[ \mathbf{p}'_i = \mathbf{R} \mathbf{p}_i + \mathbf{t} \]

Note: No camera intrinsics are required; only 3D correspondences. Applicable to general SLAM setups.

Objective Function

We formulate the problem as minimizing the least-squares error:

\[ \min_{\mathbf{R}, \mathbf{t}} \frac{1}{2} \sum_{i=1}^{n} \| \mathbf{p}'_i - (\mathbf{R} \mathbf{p}_i + \mathbf{t}) \|^2 \]

Centroid Subtraction

Let:

\[ \bar{\mathbf{p}} = \frac{1}{n} \sum_{i=1}^n \mathbf{p}_i, \quad \bar{\mathbf{p}}' = \frac{1}{n} \sum_{i=1}^n \mathbf{p}'_i \]

Define centered points:

\[ \mathbf{q}_i = \mathbf{p}_i - \bar{\mathbf{p}}, \quad \mathbf{q}'_i = \mathbf{p}'_i - \bar{\mathbf{p}}' \]

Substitute into the cost function:

\[ \min_{\mathbf{R}, \mathbf{t}} \frac{1}{2} \sum_{i=1}^{n} \| \mathbf{q}'_i - \mathbf{R} \mathbf{q}_i + (\bar{\mathbf{p}}' - \mathbf{R} \bar{\mathbf{p}} - \mathbf{t}) \|^2 \]

Set:

\[ \mathbf{t} = \bar{\mathbf{p}}' - \mathbf{R} \bar{\mathbf{p}} \]

Then the cost reduces to:

\[ \min_{\mathbf{R}} \frac{1}{2} \sum_{i=1}^{n} \| \mathbf{q}'_i - \mathbf{R} \mathbf{q}_i \|^2 \]

Optimal Rotation via SVD

Given:

\[ \min_{\mathbf{R}} \frac{1}{2} \sum_{i=1}^{n} \| \mathbf{q}'_i - \mathbf{R} \mathbf{q}_i \|^2 \]

we expand using the identity:

\[ \| \mathbf{a} - \mathbf{b} \|^2 = \| \mathbf{a} \|^2 + \| \mathbf{b} \|^2 - 2 \mathbf{a}^T \mathbf{b} \]

So the cost becomes:

\[ \frac{1}{2} \sum_{i=1}^{n} \left( \| \mathbf{q}'_i \|^2 + \| \mathbf{R} \mathbf{q}_i \|^2 - 2 \mathbf{q}'_i{}^T \mathbf{R} \mathbf{q}_i \right) \]

Note: Since $\mathbf{R}$ is a rotation matrix, it is orthogonal: $\mathbf{R}^T \mathbf{R} = \mathbf{I}$. Therefore:

\[ \| \mathbf{R} \mathbf{q}_i \|^2 = \mathbf{q}_i^T \mathbf{R}^T \mathbf{R} \mathbf{q}_i = \| \mathbf{q}_i \|^2 \]

Thus, the terms $\| \mathbf{q}'_i \|^2$ and $\| \mathbf{q}_i \|^2$ are constant with respect to $\mathbf{R}$, so the optimization reduces to:

\[ \max_{\mathbf{R}} \sum_{i=1}^{n} \mathbf{q}'_i{}^T \mathbf{R} \mathbf{q}_i \]

Why the Trace?

We use the cyclic property of the trace operator:

\[ \mathrm{tr}(\mathbf{A} \mathbf{B}) = \mathrm{tr}(\mathbf{B} \mathbf{A}) \]

and the identity:

\[ \mathbf{a}^T \mathbf{M} \mathbf{b} = \mathrm{tr}(\mathbf{M} \mathbf{b} \mathbf{a}^T) \]

So:

\[ \sum_{i=1}^{n} \mathbf{q}'_i{}^T \mathbf{R} \mathbf{q}_i = \sum_{i=1}^{n} \mathrm{tr}(\mathbf{q}'_i{}^T \mathbf{R} \mathbf{q}_i) = \mathrm{tr} \left( \mathbf{R} \sum_{i=1}^{n} \mathbf{q}_i \mathbf{q}'_i{}^T \right) \]

Let:

\[ \mathbf{W} = \sum_{i=1}^{n} \mathbf{q}_i \mathbf{q}'_i{}^T \]

Then the problem becomes:

\[ \max_{\mathbf{R} \in SO(3)} \mathrm{tr}(\mathbf{R} \mathbf{W}) \]

Why SVD?

Perform singular value decomposition (SVD) on $\mathbf{W}$:

\[ \mathbf{W} = \mathbf{U} \Sigma \mathbf{V}^T \]

Let:

\[ \mathbf{R} = \mathbf{V} \mathbf{U}^T \]

Then $\mathbf{R}$ is orthogonal because $\mathbf{U}, \mathbf{V}$ are orthogonal (from the properties of SVD):

\[ \mathbf{R}^T \mathbf{R} = (\mathbf{V} \mathbf{U}^T)^T (\mathbf{V} \mathbf{U}^T) = \mathbf{U} \mathbf{V}^T \mathbf{V} \mathbf{U}^T = \mathbf{U} \mathbf{I} \mathbf{U}^T = \mathbf{I} \]

Why $\det(\mathbf{R}) = +1$?

We want a proper rotation (element of the special orthogonal group $SO(3)$):

\[ SO(3) = \{ \mathbf{R} \in \mathbb{R}^{3 \times 3} \mid \mathbf{R}^T \mathbf{R} = \mathbf{I},\ \det(\mathbf{R}) = +1 \} \]

Reflections (which have $\det(\mathbf{R}) = -1$) do not preserve orientation or chirality of the coordinate system. In rigid body motion (e.g., robotics, SLAM), we only want rotations that preserve handedness.

Fixing $\det(\mathbf{R}) = -1$: If the computed $\mathbf{R} = \mathbf{V} \mathbf{U}^T$ has determinant $-1$, negate the last column of $\mathbf{V}$ (or $\mathbf{U}$) and recompute:

\[ \tilde{\mathbf{V}} = [\mathbf{v}_1, \mathbf{v}_2, -\mathbf{v}_3], \quad \Rightarrow \mathbf{R} = \tilde{\mathbf{V}} \mathbf{U}^T \]

Conclusion

We obtain the optimal rotation:

\[ \mathbf{R}^* = \mathbf{V} \mathbf{U}^T \quad \text{with} \quad \det(\mathbf{R}) = +1 \]

This solution maximizes the alignment between centered point sets using SVD of their cross-covariance matrix.

Summary of Key Points

ICP estimates rigid transform $(\mathbf{R}, \mathbf{t})$ that minimizes point-to-point distance.
Centroid subtraction simplifies the translation estimation.
Optimal rotation is obtained by maximizing $\mathrm{tr}(\mathbf{R} \mathbf{W})$.
The optimal rotation $\mathbf{R}$ is derived from the SVD of cross-covariance matrix $\mathbf{W} = \sum \mathbf{q}_i \mathbf{q}'_i{}^T$.
Ensure $\det(\mathbf{R}) = +1$ to preserve chirality (handedness).