Estimation Fundamentals

A comprehensive guide to probability theory and statistical estimation for robotics

Last Updated: July 2023 · Reading Time: 15 minutes

Introduction

In this tutorial, I will go over some of the fundamental statistics that every roboticist will encounter at some point. I assume the reader knows basic high-school statistics and probability theory.

Motivation

No sensor is perfect! Imagine Rami bot, a Roomba-like robot with a sonar distance sensor attached. Using the sensor, we want to predict where obstacles are with respect to the world frame (Rami's starting point). Since the sensor is not perfect, its measurements are noisy and might report an obstacle 1 ft closer than it actually is.

To help Rami avoid crashes, it is important to quantify the uncertainty in an obstacle's location. For example, it would be helpful if we could say that the obstacle is within ±0.5 ft of the measured distance with 90% confidence. This is much more useful for predicting and planning a route than the single deterministic value we get from a noisy measurement. Probability theory gives us exactly this framework.

Note: Throughout this tutorial, we'll use Rami bot as our running example to illustrate how these statistical concepts apply to real-world robotics problems.

Axiomatic Probability Theory

Sample space: the universe of all possible outcomes of an experiment
\(S = \cup_i s_i\)

Let \(s_i\) denote a possible outcome of an experiment. The collection of the \(s_i\), denoted by \(S\), should be such that the \(s_i\) are:

  • Mutually exclusive: \(s_i \cap s_j = \emptyset,\; i \neq j\)
  • Exhaustive: together they cover all possible outcomes
Event: Any subset of the sample space \(S\)

Example: Coin flip
Possible samples in the experiment are either heads or tails

\(S =\) {heads, tails}

\(A =\) event heads → {heads}

Random Variable: Mapping from a sample space to a number. Can take multiple values
\(X : S \rightarrow \mathbb{R}\)

Notation: In literature, it is conventional to use capital letters to indicate random variables.

Example: Coin flip

\(X \in\) {heads, tails}

When \(X\) takes a specific value, we write \(P(X =\) heads), i.e., the probability of getting heads.

Properties:

  • The probability distribution must sum to 1
    \[\sum_{x} P(X = x) = 1\]
  • Always non-negative
    \[P(X = x) \geq 0\]

For simplicity, instead of writing \(P(X=x)\) from here on, I will use \(P(x)\).
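These two properties are easy to check numerically. A minimal sketch in Python, using the fair coin from the example above:

```python
# Probability mass function for a fair coin, X in {heads, tails}.
pmf = {"heads": 0.5, "tails": 0.5}

# Property 1: the probabilities sum to 1.
total = sum(pmf.values())

# Property 2: every probability is non-negative.
all_nonneg = all(p >= 0 for p in pmf.values())

print(total, all_nonneg)
```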

Probability Density

The probability density function is defined as:

\[f(x) = \frac{\partial F(x)}{\partial x}\] (1)

where \(F(x)\) is the cumulative distribution function (CDF).

In the robotics community, the Gaussian/normal density function is widely used.

Parametrized by \(\mu\): mean, \(\sigma^{2}\): variance

\[p(X=x)=p(x)=\frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-\frac{(x-\mu)^{2}}{2 \sigma^{2}}}\] (2)

Multi-variate case:
Here \(\boldsymbol{x}\) is a vector instead of a scalar

\[P(\boldsymbol{x})=\frac{1}{\sqrt{(2 \pi)^{N} \operatorname{det}(\mathbf{\Sigma})}} \exp \left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)\] (3)

\(\boldsymbol{\Sigma}\) is called the covariance matrix.
Properties:

  • Positive semi-definite
  • Symmetric

Don't worry if the above equation looks daunting. You'll see it in various books and will get comfortable with it over time. The key takeaways are the properties of the covariance matrix, and that the two parameters—mean and variance—are all that's needed to define a normal distribution for a state \(x\).
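As a concrete sketch, eq. (3) can be evaluated directly with NumPy. The mean and covariance below are hypothetical numbers for a 2D obstacle-position estimate:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate normal density of eq. (3)."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    N = x.size
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** N * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

# Hypothetical 2D obstacle-position estimate.
mu = np.array([1.0, 2.0])
Sigma = np.array([[0.25, 0.05],
                  [0.05, 0.10]])  # symmetric, positive semi-definite

p = gaussian_pdf(mu, mu, Sigma)  # density is largest at the mean
print(p)
```

Note that the density peaks at the mean and falls off as the quadratic form in the exponent grows.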

Conditional Probability

Often random variables carry information about other random variables. For example, LIDAR measurements can carry information about the robot's position.

\(p(x\mid y) = p(X = x\mid Y = y)\) i.e., probability of x given y

\[p(x\mid y) = \frac{p(x,y)}{p(y)}, \quad p(y) > 0\] (4)

If x and y are independent:

\[p(x,y) = p(x) \cdot p(y)\] (5)

Therefore:

\[p(x\mid y) = \frac{p(x) \cdot p(y)}{p(y)} = p(x)\] (6)

This makes sense, right? If \(y\) has no information about \(x\), then querying \(y\) won't contribute anything.
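A small numerical sketch of eqs. (4)–(6), using a hypothetical pair of independent variables (the names and probabilities are made up for illustration):

```python
px = {"near": 0.7, "far": 0.3}    # hypothetical obstacle-distance prior
py = {"beep": 0.4, "quiet": 0.6}  # hypothetical unrelated event

# Independent joint distribution: p(x, y) = p(x) * p(y), eq. (5).
p_joint = {(x, y): px[x] * py[y] for x in px for y in py}

def conditional(x, y):
    """p(x | y) = p(x, y) / p(y), eq. (4)."""
    p_y = sum(p_joint[(xp, y)] for xp in px)
    return p_joint[(x, y)] / p_y

# Because X and Y are independent, conditioning on y changes nothing (eq. 6).
print(conditional("near", "beep"))  # equals px["near"] = 0.7
```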

Most Used Theorems

Below are the theorems you will most often use when solving probabilistic proofs and problems:

Total Probability

\[p(x) = \sum_{Y} p(x\mid y) p(y)\] (7)
[Figure: total probability over disjoint sets \(A_i\) covering an event \(B\)]

If the sets \(A_i\), \(i = 1, \ldots, n\), are disjoint and cover the sample space, then \(B\) is the disjoint union of the intersections \(B \cap A_{i}\), i.e.,

\[B = \cup_{i = 1}^{n} (B \cap A_{i})\] (8)
\[p(B) = p(\cup_{i = 1}^{n} (B \cap A_{i}))\] (9)
\[= \sum_{i = 1}^{n} p(B, A_{i})\] (10)
\[= \sum_{i = 1}^{n} p(B\mid A_{i}) p(A_{i})\] (11)
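A quick sketch of eq. (7) with a hypothetical sonar detection model (all numbers made up): the probability that Rami's sensor reports a detection is a weighted average over surface types.

```python
# Hypothetical prior over surface types A_i and sensor model p(detect | A_i).
p_A = {"wall": 0.5, "glass": 0.2, "curtain": 0.3}
p_detect_given_A = {"wall": 0.95, "glass": 0.40, "curtain": 0.70}

# Total probability, eq. (7): p(B) = sum_i p(B | A_i) p(A_i).
p_detect = sum(p_detect_given_A[a] * p_A[a] for a in p_A)
print(p_detect)
```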

Chain Rule

\[p\left(x_{1}, \ldots, x_{n}\right)=p\left(x_{n} \mid x_{n-1}, \ldots, x_{1}\right) p\left(x_{n-1}, \ldots, x_{1}\right)\] (12)

Generally, we can formulate as follows:

\[p\left(\cap_{k=1}^{n} x_{k}\right)=\prod_{k=1}^{n} p\left(x_{k} \mid \cap_{j=1}^{k-1} x_{j}\right)\] (13)

Bayes Rule

\[p(x\mid y) = \frac{p(y\mid x) \cdot p(x)}{p(y)} = \frac{p(y\mid x) \cdot p(x)}{\sum_{x'} p(y\mid x') p(x')} = \frac{\text{likelihood} \cdot \text{prior}}{\text{total probability}}\] (14)

Sometimes the above can be written as follows:

\[p(x\mid y) = \eta \cdot p(y\mid x) \cdot p(x)\] (15)
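Eqs. (14)–(15) in code: a hypothetical sonar "hit" updates Rami's prior belief that an obstacle is present. The likelihood and prior numbers are made up for illustration; \(\eta\) is simply the reciprocal of the normalizer.

```python
# Hypothetical posterior over obstacle presence given a sonar "hit".
prior = {"obstacle": 0.3, "free": 0.7}       # p(x)
likelihood = {"obstacle": 0.9, "free": 0.2}  # p(hit | x)

# Unnormalized posterior: p(hit | x) * p(x), eq. (15) before applying eta.
unnorm = {x: likelihood[x] * prior[x] for x in prior}

# eta = 1 / p(hit), where p(hit) comes from total probability (eq. 14).
eta = 1.0 / sum(unnorm.values())
posterior = {x: eta * v for x, v in unnorm.items()}
print(posterior)
```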

The Bayes rule can also be used for more than one random variable:

\[p(x \mid y, z)=\frac{p(y \mid x, z) p(x \mid z)}{p(y \mid z)}=\frac{p(z \mid x, y) p(x \mid y)}{p(z \mid y)}\] (16)

Conditional Independence

Random variables \(X\) and \(Y\) are conditionally independent given \(Z\) if

\[p(x, y \mid z) = p(x\mid z) \cdot p(y \mid z)\] (17)

where, by definition, \(p(x, y \mid z) = \frac{p(x, y, z)}{p(z)}\).

Expectation

\[E[X] = \sum_{x} x \cdot p(x)\] (18)

Property:

  • Linearity
    \[E[aX + b] = a \cdot E[X] + b\] (19)
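Linearity holds exactly even for sample averages, which makes a quick numerical check easy (a sketch with arbitrary constants \(a\), \(b\)):

```python
import random

random.seed(0)
samples = [random.gauss(2.0, 1.0) for _ in range(10_000)]

def mean(xs):
    return sum(xs) / len(xs)

a, b = 3.0, -1.0
lhs = mean([a * x + b for x in samples])  # estimate of E[aX + b]
rhs = a * mean(samples) + b               # a * E[X] + b
print(lhs, rhs)
```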

Variance

Variance measures the expected squared deviation from the mean:

\[\sigma^{2}(X) = \Sigma(X, X) = E\left[(X - E[X])^{2}\right]\] (20)

More generally, the covariance of two random variables is \(\Sigma(X, Y)=E[(X-E[X])(Y-E[Y])]\), and for a pair of random variables these entries are collected into the covariance matrix:

\[\Sigma(X, Y)=\left[\begin{array}{cc} \sigma^{2}(x) & \sigma(x, y) \\ \sigma(y, x) & \sigma^{2}(y) \end{array}\right]\]

Properties:

  • Bilinear: \(\sigma(ax + by, z) = a\sigma(x,z) + b\sigma(y,z)\)
  • Symmetric: \(\sigma(x,y) = \sigma(y,x)\) (very handy)
  • Positive semi-definite: \(\sigma^2(x) = \sigma(x,x) \geq 0\) for every random variable

If \(\sigma(x,x) = 0\) then the random variable is constant.

If a vector of random variables is transformed by a linear transform \(A\), then the covariance is also transformed as \(\Sigma(Ax) = A \Sigma(x) A^T\)
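This propagation rule can be sanity-checked numerically: propagate the covariance analytically with \(A \Sigma A^T\) and compare against the empirical covariance of transformed samples. The state covariance and transform below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2D state covariance and linear transform.
Sigma = np.array([[0.25, 0.05],
                  [0.05, 0.10]])
A = np.array([[1.0, 0.5],
              [0.0, 2.0]])

# Analytic propagation: Sigma' = A Sigma A^T.
Sigma_prop = A @ Sigma @ A.T

# Empirical check: transform samples and measure their covariance.
x = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)
Sigma_emp = np.cov((x @ A.T).T)
print(Sigma_prop)
```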

Entropy

Measures information content

\[H(X) = E[ -\log_{2} p(X)]\] (21)
\[= -\sum_{x} p(x) \log_{2} p(x)\] (22)

This can be very useful for estimating how much information Rami gains by taking a specific action. Usually, we seek actions that minimize the expected entropy of the belief, i.e., maximize the information gain.
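Eq. (22) in code: a sketch comparing the entropy of a uniform belief against a peaked one (the peaked numbers are hypothetical):

```python
import math

def entropy(pmf):
    """Shannon entropy in bits, eq. (22); terms with p = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

uniform = [0.5, 0.5]  # maximally uncertain coin: 1 bit of entropy
peaked = [0.9, 0.1]   # Rami is fairly sure: less entropy, more information

print(entropy(uniform), entropy(peaked))
```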

Bayes Filter

Complete state: \(x_t\) is a complete state if it encompasses all the information about the past that is needed to predict future states. Given \(x_t\), the future is therefore conditionally independent of past measurements \(z_{0:t-1}\), actions \(u_{0:t-1}\), and states \(x_{0:t-1}\).
Belief: \[bel(x_t) = p(x_t \mid z_{1:t}, u_{1:t})\]

The Bayes filter is the foundation for many robotics algorithms including Kalman filters, particle filters, and more. We'll explore these in future tutorials.
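To make the belief concrete, here is a minimal one-step discrete Bayes filter sketch: Rami moves down a hypothetical 3-cell corridor and then senses a door. All motion and sensor probabilities are made up; the predict step applies total probability, and the update step applies Bayes rule.

```python
import numpy as np

bel = np.array([1/3, 1/3, 1/3])  # uniform prior belief over 3 cells

# Motion model p(x_t | x_{t-1}, u="move right"): stay with 0.2, move with 0.8.
# Entry [i, j] is the probability of landing in cell i from cell j.
motion = np.array([[0.2, 0.0, 0.0],
                   [0.8, 0.2, 0.0],
                   [0.0, 0.8, 1.0]])  # cell 2 is a wall: Rami stays put

# Predict: bel_bar(x_t) = sum_{x_{t-1}} p(x_t | x_{t-1}, u) bel(x_{t-1})
bel_bar = motion @ bel

# Measurement model p(z="door" | x): a door is at cell 1.
likelihood = np.array([0.1, 0.8, 0.1])

# Update: bel(x_t) = eta * p(z | x_t) * bel_bar(x_t)
unnorm = likelihood * bel_bar
bel = unnorm / unnorm.sum()
print(bel)
```

After one predict–update cycle, the belief concentrates on the door cell, exactly the behavior the Kalman and particle filters generalize.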
