Iterative Closest Point (ICP) - SVD Derivation

Motivation and Use of ICP in Robotics

The Iterative Closest Point (ICP) algorithm is a cornerstone in 3D data alignment, crucial for robotics, SLAM (Simultaneous Localization and Mapping), and 3D reconstruction. The goal is to align two sets of 3D points (usually captured from LIDAR, stereo cameras, or depth sensors) by estimating a rigid transformation (rotation and translation).

ICP is used extensively in:

The beauty of ICP is that it does not require camera intrinsics or explicit correspondences — just raw 3D data and an initial guess (or not). The most computationally elegant step is solving for the best rigid transform using SVD.

Problem Setup

Given two sets of corresponding 3D points:

\[ \mathcal{P} = \{ \mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_n \}, \quad \mathcal{P}' = \{ \mathbf{p}'_1, \mathbf{p}'_2, \dots, \mathbf{p}'_n \} \]

We seek a rigid transform $(\mathbf{R}, \mathbf{t})$ such that:

\[ \mathbf{p}'_i = \mathbf{R} \mathbf{p}_i + \mathbf{t} \]

Note: No camera intrinsics are required; only 3D correspondences. Applicable to general SLAM setups.

Objective Function

We formulate the problem as minimizing the least-squares error:

\[ \min_{\mathbf{R}, \mathbf{t}} \frac{1}{2} \sum_{i=1}^{n} \| \mathbf{p}'_i - (\mathbf{R} \mathbf{p}_i + \mathbf{t}) \|^2 \]

Centroid Subtraction

Let:

\[ \bar{\mathbf{p}} = \frac{1}{n} \sum_{i=1}^n \mathbf{p}_i, \quad \bar{\mathbf{p}}' = \frac{1}{n} \sum_{i=1}^n \mathbf{p}'_i \]

Define centered points:

\[ \mathbf{q}_i = \mathbf{p}_i - \bar{\mathbf{p}}, \quad \mathbf{q}'_i = \mathbf{p}'_i - \bar{\mathbf{p}}' \]

Substitute into the cost function:

\[ \min_{\mathbf{R}, \mathbf{t}} \frac{1}{2} \sum_{i=1}^{n} \| \mathbf{q}'_i - \mathbf{R} \mathbf{q}_i + (\bar{\mathbf{p}}' - \mathbf{R} \bar{\mathbf{p}} - \mathbf{t}) \|^2 \]

Set:

\[ \mathbf{t} = \bar{\mathbf{p}}' - \mathbf{R} \bar{\mathbf{p}} \]

Then the cost reduces to:

\[ \min_{\mathbf{R}} \frac{1}{2} \sum_{i=1}^{n} \| \mathbf{q}'_i - \mathbf{R} \mathbf{q}_i \|^2 \]

Optimal Rotation via SVD

Given:

\[ \min_{\mathbf{R}} \frac{1}{2} \sum_{i=1}^{n} \| \mathbf{q}'_i - \mathbf{R} \mathbf{q}_i \|^2 \]

we expand using the identity:

\[ \| \mathbf{a} - \mathbf{b} \|^2 = \| \mathbf{a} \|^2 + \| \mathbf{b} \|^2 - 2 \mathbf{a}^T \mathbf{b} \]

So the cost becomes:

\[ \frac{1}{2} \sum_{i=1}^{n} \left( \| \mathbf{q}'_i \|^2 + \| \mathbf{R} \mathbf{q}_i \|^2 - 2 \mathbf{q}'_i{}^T \mathbf{R} \mathbf{q}_i \right) \]

Note: Since $\mathbf{R}$ is a rotation matrix, it is orthogonal: $\mathbf{R}^T \mathbf{R} = \mathbf{I}$. Therefore:

\[ \| \mathbf{R} \mathbf{q}_i \|^2 = \mathbf{q}_i^T \mathbf{R}^T \mathbf{R} \mathbf{q}_i = \| \mathbf{q}_i \|^2 \]

Thus, the terms $\| \mathbf{q}'_i \|^2$ and $\| \mathbf{q}_i \|^2$ are constant with respect to $\mathbf{R}$, so the optimization reduces to:

\[ \max_{\mathbf{R}} \sum_{i=1}^{n} \mathbf{q}'_i{}^T \mathbf{R} \mathbf{q}_i \]

Why the Trace?

We use the cyclic property of the trace operator:

\[ \mathrm{tr}(\mathbf{A} \mathbf{B}) = \mathrm{tr}(\mathbf{B} \mathbf{A}) \]

and the identity:

\[ \mathbf{a}^T \mathbf{M} \mathbf{b} = \mathrm{tr}(\mathbf{M} \mathbf{b} \mathbf{a}^T) \]

So:

\[ \sum_{i=1}^{n} \mathbf{q}'_i{}^T \mathbf{R} \mathbf{q}_i = \sum_{i=1}^{n} \mathrm{tr}(\mathbf{q}'_i{}^T \mathbf{R} \mathbf{q}_i) = \mathrm{tr} \left( \mathbf{R} \sum_{i=1}^{n} \mathbf{q}_i \mathbf{q}'_i{}^T \right) \]

Let:

\[ \mathbf{W} = \sum_{i=1}^{n} \mathbf{q}_i \mathbf{q}'_i{}^T \]

Then the problem becomes:

\[ \max_{\mathbf{R} \in SO(3)} \mathrm{tr}(\mathbf{R} \mathbf{W}) \]

Why SVD?

Perform singular value decomposition (SVD) on $\mathbf{W}$:

\[ \mathbf{W} = \mathbf{U} \Sigma \mathbf{V}^T \]

Let:

\[ \mathbf{R} = \mathbf{V} \mathbf{U}^T \]

Then $\mathbf{R}$ is orthogonal because $\mathbf{U}, \mathbf{V}$ are orthogonal (from the properties of SVD):

\[ \mathbf{R}^T \mathbf{R} = (\mathbf{V} \mathbf{U}^T)^T (\mathbf{V} \mathbf{U}^T) = \mathbf{U} \mathbf{V}^T \mathbf{V} \mathbf{U}^T = \mathbf{U} \mathbf{I} \mathbf{U}^T = \mathbf{I} \]

Why $\det(\mathbf{R}) = +1$?

We want a proper rotation (element of the special orthogonal group $SO(3)$):

\[ SO(3) = \{ \mathbf{R} \in \mathbb{R}^{3 \times 3} \mid \mathbf{R}^T \mathbf{R} = \mathbf{I},\ \det(\mathbf{R}) = +1 \} \]

Reflections (which have $\det(\mathbf{R}) = -1$) do not preserve orientation or chirality of the coordinate system. In rigid body motion (e.g., robotics, SLAM), we only want rotations that preserve handedness.

Fixing $\det(\mathbf{R}) = -1$: If the computed $\mathbf{R} = \mathbf{V} \mathbf{U}^T$ has determinant $-1$, negate the last column of $\mathbf{V}$ (or $\mathbf{U}$) and recompute:

\[ \tilde{\mathbf{V}} = [\mathbf{v}_1, \mathbf{v}_2, -\mathbf{v}_3], \quad \Rightarrow \mathbf{R} = \tilde{\mathbf{V}} \mathbf{U}^T \]

Conclusion

We obtain the optimal rotation:

\[ \mathbf{R}^* = \mathbf{V} \mathbf{U}^T \quad \text{with} \quad \det(\mathbf{R}) = +1 \]

This solution maximizes the alignment between centered point sets using SVD of their cross-covariance matrix.

Summary of Key Points