# Inferring the characteristics of the surface from optical data¶

### J Gómez-Dans (NCEO & UCL)¶

• RT theory allows us to explain the scattering & absorption of photons
• ... by describing the optical properties and structure of the scene
• However, we want to find out about the surface from the data!
• E.g. we want to infer LAI, chlorophyll, ... from reflectance measurements
• The inverse problem....

### The inverse problem¶

• An RT model $\mathcal{H}$ predicts directional reflectance factor, $\vec{\rho}_{m}(\Omega, \Omega')$
• $\dots$ as a function of a set of input parameters: LAI, chlorophyll concentration, equivalent water thickness...
• $\mathcal{H}$ is a combination of a leaf RT model, a canopy RT model and a soil RT model
• PROSPECT (or LIBERTY, ...)
• SAIL (ACRM, Semidiscrete, ...)
• Linear mixture of spectra assuming Lambertian soil (Walthall, Hapke, ...)
• For this lecture, we'll refer to $\mathcal{H}$ as the observation operator
• Stack input parameters into vector $\vec{x}$.
• Other information, e.g. illumination and viewing geometry, is collected in $I$
$$\mathcal{H}(\mathbf{x}, I) = \vec{\rho}_m(\Omega, \Omega')$$
• Our task is to infer $\vec{x}$ given observations $\vec{\rho}(\Omega, \Omega')$
• The model couples the observations and our parameters
• In some cases, we might be able to provide an analytic inversion
• However, we have ignored observational uncertainties
• We have also ignored the model uncertainty (inadequacy): a model is not reality
• These uncertainties will translate into uncertainty in our inference of $\vec{x}$
• Is there a framework for this?
• Least squares: minimise data-model mismatch
• Assuming additive observational noise $\vec{\epsilon}$,
$$\vec{R}(\Omega, \Omega') = \mathcal{H}(\mathbf{x}, I) + \vec{\epsilon}$$
• Any solution that falls within the error bars cannot be discarded
$$\|\mathcal{H}(\vec{x}, I) - \vec{R}\|^2 \le \epsilon_{obs}$$
• Moreover, the models are non-linear
• We haven't even considered model inadequacy!
• E.g. simulate a healthy wheat canopy (LAI=4.2), and then run our RT model with random inputs
• Accept solutions that fall within the observational error ($\Rightarrow$ brute-force Monte Carlo; see the sketch below)
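
• A minimal sketch of this brute-force Monte Carlo acceptance idea; `rt_model` is a toy two-band stand-in for a real observation operator (e.g. PROSPECT+SAIL), and the parameter ranges, noise level and canopy values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def rt_model(lai, cab):
    """Toy two-band stand-in for a real observation operator (e.g. PROSPECT+SAIL):
    a red band that darkens with LAI x chlorophyll, and a saturating NIR band."""
    red = 0.30 * np.exp(-0.4 * lai * (cab / 40.0))
    nir = 0.45 * (1.0 - np.exp(-0.5 * lai))
    return np.array([red, nir])

# Simulate the "healthy wheat canopy" observation (LAI = 4.2) plus noise
sigma_obs = 0.01
R_obs = rt_model(4.2, 45.0) + rng.normal(0.0, sigma_obs, size=2)

# Brute-force Monte Carlo: draw random inputs, keep those within the error bars
accepted = []
for _ in range(100_000):
    lai, cab = rng.uniform([0.0, 10.0], [8.0, 80.0])   # illustrative ranges for [LAI, Cab]
    if np.all(np.abs(rt_model(lai, cab) - R_obs) <= 2 * sigma_obs):
        accepted.append((lai, cab))

accepted = np.asarray(accepted)
print(accepted.shape[0], "acceptable solutions")
print("LAI spread of accepted solutions:", accepted[:, 0].min(), "to", accepted[:, 0].max())
```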

• Large uncertainty in estimate of state.
• $\Rightarrow$ limited information content in the data.
• To do better, we would need to decrease observational uncertainty
• We also need to consider the sensitivity of the model to each parameter (see the finite-difference sketch below)
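
• One simple way to probe that sensitivity is a finite-difference Jacobian of the observation operator; a sketch, reusing the toy `rt_model` from the previous sketch (so purely illustrative):

```python
import numpy as np

def finite_difference_jacobian(model, x, eps=1e-4):
    """Approximate the Jacobian d(model)/dx at x by central differences.
    `model` maps a parameter vector to a reflectance vector."""
    x = np.asarray(x, dtype=float)
    f0 = model(x)
    jac = np.zeros((f0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        jac[:, i] = (model(x + dx) - model(x - dx)) / (2.0 * eps)
    return jac

# Wrap the toy rt_model from the previous sketch to take a single vector [LAI, Cab]
toy = lambda x: rt_model(x[0], x[1])
J = finite_difference_jacobian(toy, [4.2, 45.0])
print(J)   # near-zero entries mean the data carry little information on that parameter
```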

### Reverend Bayes to the rescue¶

• We assume that parameter uncertainty can be encoded if we treat $\vec{x}$ as a probability density function (pdf), $p(\vec{x})$.
• $p(\vec{x})$ encodes our belief in the value of $\vec{x}$
• Natural treatment of uncertainty
• We are interested in learning about $p(\vec{x})$ conditional on the observations $\vec{R}$, $p(\vec{x}|\vec{R})$.
• Bayes' Rule states how we can learn about $p(\vec{x}|\vec{R})$
• In essence, Bayes' rule is a statement on how to update our beliefs on $\vec{x}$ when new evidence crops up
$$p(\vec{x} | \vec{R}, I ) =\frac{ p (\vec{R} | \vec{x}, I)\cdot p(\vec{x}|I)}{p(\vec{R})}\propto p (\vec{R} | \vec{x}, I)\cdot p(\vec{x}|I)$$
• $p(\vec{R}|\vec{x},I)$ is the likelihood function
• encodes the probability of $\vec{R}$ given $\vec{x}$, and any other information ($I$)
• $p(\vec{x})$ is our a priori belief in the pdf of $\vec{x}$
• $p(\vec{R})$ can be thought of as a normalisation constant, and we'll typically ignore it
• A way to picture Bayes' rule:
$$p(\textsf{Hypothesis} | \textsf{Data},I) \propto p(\textsf{Data} | \textsf{Hypothesis},I) \times p(\textsf{Hypothesis} | I)$$

## The prior $p(\vec{x})$¶

• Encodes everything we know about $\vec{x}$ before we even look at the data
• In some cases, we can have uninformative priors...
• ... but the real power is that it allows us to bring understanding, however weak, to the problem!

## The likelihood $p(\vec{R}|\vec{x})$¶

• The likelihood is our data-generative model
• It links the experimental results with the quantity of inference
• It includes our observations, their uncertainties, but also the model and its uncertainties
• Assume that our model is perfect (ahem!), so if $\vec{x}$ is the true state, the model will predict the observation exactly
• Any disagreement has to be due to experimental error. We'll assume it's additive:
$$\vec{R}=\mathcal{H}(\vec{x}) + \epsilon$$
• Assume that $\epsilon \sim \mathcal{N}(\vec{\mu},\mathbf{\Sigma}_{obs})$
• For simplicity, $\vec{\mu}=\vec{0}$
• The likelihood is then given by $$p(\vec{R}|\vec{x})\propto\exp\left[-\frac{1}{2}\left(\vec{R}-\mathcal{H}(\vec{x})\right)^{\top}\mathbf{\Sigma}_{obs}^{-1}\left(\vec{R}-\mathcal{H}(\vec{x})\right)\right]$$

• Have you ever seen this function?

• Maybe its univariate cousin...
• Can you say something about (i) the shape of the function and (ii) interesting points?
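
• A minimal sketch of evaluating this Gaussian log-likelihood numerically, assuming a diagonal observational covariance (the reflectance and uncertainty values are illustrative):

```python
import numpy as np

def gaussian_log_likelihood(R, R_model, sigma_obs):
    """log p(R|x) up to an additive constant, for per-band uncertainties sigma_obs.
    R is the observed reflectance vector, R_model is H(x)."""
    resid = np.asarray(R) - np.asarray(R_model)
    C_obs_inv = np.diag(1.0 / np.asarray(sigma_obs) ** 2)   # Sigma_obs^{-1}
    return -0.5 * resid @ C_obs_inv @ resid

# A perfect match gives 0 (the maximum); any mismatch makes it more negative
print(gaussian_log_likelihood([0.05, 0.40], [0.05, 0.40], [0.01, 0.01]))
print(gaussian_log_likelihood([0.05, 0.40], [0.07, 0.35], [0.01, 0.01]))
```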

### The posterior¶

• We can simply multiply the likelihood and prior to obtain the posterior
• In most practical applications with a non-linear $\mathcal{H}$, there is no closed solution
• However, if $\mathcal{H}$ is linear ($\mathbf{H}$), we can solve directly
• Simple 1D case, assuming Gaussian likelihood & prior:
• What can you say about the information content of the data in this synthetic example?

### Some simple 1D Gaussian maths...¶

• Try to infer $x$ from an observation $y$, using an identity observation operator (i.e., $\mathcal{H}(x)=x$) and Gaussian noise:
$$p(y|x) = \frac{1}{\sqrt{2\pi}\sigma_{obs}}\exp\left[-\frac{1}{2}\frac{(x-y)^2}{\sigma_{obs}^2}\right]. \qquad\text{Likelihood}$$
• Assume that we only know that $x$ is Gaussian distributed with mean $\mu_0$ and standard deviation $\sigma_0$:
$$p(x) = \frac{1}{\sqrt{2\pi}\sigma_0}\exp\left[-\frac{1}{2}\frac{(x-\mu_0)^2}{\sigma_0^2}\right]\qquad\text{Prior}$$
$$p(x|y) \propto \frac{1}{\sigma_{obs}\sigma_0}\exp\left[-\frac{1}{2}\frac{(x-\mu_0)^2}{\sigma_0^2}\right]\cdot \exp\left[-\frac{1}{2}\frac{(x-y)^2}{\sigma_{obs}^2}\right].\qquad\text{Posterior}$$
• The posterior distribution is indeed a Gaussian, and its mean and std dev can be expressed as an update on the prior values
• The update is weighted by the relative magnitudes of the prior and observational uncertainties:
$$\mu_p = \mu_0 + \frac{\sigma_0^2}{\sigma_0^2 + \sigma_{obs}^2}(y - \mu_0)$$
$$\sigma_p^2 = \sigma_0^2\cdot \left( 1- \frac{\sigma_0^2}{\sigma_0^2 + \sigma_{obs}^2} \right)$$
• If we now had a new measurement made available, we could use the posterior as the prior, and it would get updated!
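
• A minimal sketch of this 1D Gaussian update, with illustrative numbers (e.g. a prior belief about LAI and one noisy measurement):

```python
def gaussian_update(mu_0, sigma_0, y, sigma_obs):
    """Posterior mean and std for a Gaussian prior N(mu_0, sigma_0^2), a single
    observation y with noise N(0, sigma_obs^2) and an identity observation operator."""
    gain = sigma_0**2 / (sigma_0**2 + sigma_obs**2)
    mu_p = mu_0 + gain * (y - mu_0)
    sigma_p = (sigma_0**2 * (1.0 - gain)) ** 0.5
    return mu_p, sigma_p

# Prior belief: x ~ N(3, 1.5^2); observation y = 4.2 with sigma_obs = 0.5
mu_p, sigma_p = gaussian_update(3.0, 1.5, 4.2, 0.5)
print(mu_p, sigma_p)   # pulled towards the data, and narrower than the prior

# A new measurement arrives: the posterior becomes the prior and is updated again
print(gaussian_update(mu_p, sigma_p, 4.0, 0.5))
```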

## Ill posed problems¶

• Stepping back from Bayes, consider the logarithm of the likelihood function:
$$\log_{e}\left(p(\vec{R}|\vec{x})\right)\propto-\frac{1}{2}\left(\vec{R}-\mathcal{H}(\vec{x})\right)^{\top}\mathbf{\Sigma}_{obs}^{-1}\left(\vec{R}-\mathcal{H}(\vec{x})\right)$$
• The maximum occurs when the model matches the observations
• So we can just minimise the model-data mismatch and be done, right?
• Remember our generative model, and think what this implies:
$$\|\mathcal{H}(\vec{x}, I) - \vec{R}\|^2 \le \epsilon_{obs}$$
• Formally, any solution that falls within the error bars is a reasonable solution

### Priors¶

• Simplest prior constraints might encode
• range of parameters
• physical limits of parameters (e.g. a mass must be $>0$)
• More sophisticated priors might encode more subtle information such as
• expected values from an expert assessment
• expected values from a climatology
• It is usually hard to encode the prior information as a pdf
• We tend to use Gaussians, as they are easy to parameterise (see the sketch below for one crude way of turning a range into a Gaussian prior)
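
• As a sketch, one crude (and assumption-laden) recipe is to centre the Gaussian on the midpoint of the physical range and choose the standard deviation so the range spans a few sigmas; the ranges below are purely illustrative:

```python
import numpy as np

def range_to_gaussian(lo, hi, coverage=2.0):
    """Crude Gaussian prior from a physical range [lo, hi]: mean at the midpoint,
    std chosen so that the range spans +/- `coverage` standard deviations."""
    mu = 0.5 * (lo + hi)
    sigma = (hi - lo) / (2.0 * coverage)
    return mu, sigma

# Illustrative ranges only
mu_lai, sigma_lai = range_to_gaussian(0.0, 8.0)      # LAI
mu_cab, sigma_cab = range_to_gaussian(10.0, 80.0)    # chlorophyll [ug cm^-2]

mu_prior = np.array([mu_lai, mu_cab])
C_prior = np.diag([sigma_lai**2, sigma_cab**2])      # parameters assumed independent a priori
print(mu_prior, np.sqrt(np.diag(C_prior)))
```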

### Variational solution¶

• If prior(s) & likelihood Gaussian...
• And if all terms are linear
• $\Rightarrow$ can solve for posterior by minimising $J(\vec{x})$:
$$J(\vec{x})= \overbrace{\frac{1}{2}\left[\vec{x}-\vec{\mu_{x}}\right]^{\top}\mathbf{C}_{prior}^{-1}\left[\vec{x}-\vec{\mu_{x}}\right]}^{\textrm{Prior}} + \underbrace{\frac{1}{2}\left[\vec{R}-\mathcal{H}(\vec{x})\right]^{\top}\mathbf{C}_{obs}^{-1}\left[\vec{R}-\mathcal{H}(\vec{x})\right]}^{\textrm{Observation}} + \cdots$$
• You can add more constraints (more observations, more prior constraints...)
• Uncertainty given by the inverse of the Hessian at the minimum of $J(\vec{x})$
• Matrix of second derivatives
• Also works OK for weakly non-linear systems!
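
• A minimal sketch of this variational solution for a toy linear observation operator, minimising $J(\vec{x})$ with `scipy.optimize.minimize`; all matrices and values are illustrative, and this is not the eoldas_ng API:

```python
import numpy as np
from scipy.optimize import minimize

# Toy setup: 2 parameters, 3 "bands", linear observation operator H
H = np.array([[0.5, 0.1],
              [0.2, 0.6],
              [0.3, 0.3]])
R = np.array([0.9, 1.1, 0.8])                          # observations
C_obs_inv = np.linalg.inv(np.diag([0.02**2] * 3))      # inverse observational covariance
mu_x = np.array([1.0, 1.0])                            # prior mean
C_prior_inv = np.linalg.inv(np.diag([0.5**2, 0.5**2])) # inverse prior covariance

def cost(x):
    """J(x) = Gaussian prior term + Gaussian observation term."""
    dx = x - mu_x
    dr = R - H @ x
    return 0.5 * dx @ C_prior_inv @ dx + 0.5 * dr @ C_obs_inv @ dr

res = minimize(cost, x0=mu_x, method="L-BFGS-B")
x_post = res.x

# Posterior covariance = inverse Hessian of J at the minimum (exact for a linear H)
hessian = C_prior_inv + H.T @ C_obs_inv @ H
C_post = np.linalg.inv(hessian)
print(x_post, np.sqrt(np.diag(C_post)))
```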

### More subtle priors¶

• What about the expectation of
• spatial correlation of the state
• temporal correlation of the state?
• Can encode expectation of smoothness in the prior
• What if we have a model of e.g. LAI evolution with e.g. thermal time?
• ... Or a full-blown vegetation/crop model?

### The concept of a model¶

• Smoothness constraint is a simple vegetation model
• LAI today = LAI tomorrow
• This temporal evolution model is WRONG
• A more complex model is likely to also be wrong $\Rightarrow$ uncertainty!
• Encode the "closeness to model" $\mathcal{M}(\vec{x})$ constraint as a Gaussian pdf: $$p(\vec{x}|\mathcal{M},I_m)\propto \exp\left[ -\frac{1}{2}\left(\vec{x} - \mathcal{M}(I_m)\right)^{\top}\mathbf{C}_{model}^{-1}\left(\vec{x} - \mathcal{M}(I_m)\right)\right]$$

### Simplest model¶

• Smoothness: $$x_{k+1} = x_{k} + \epsilon_k, \qquad \epsilon_k\sim\mathcal{N}(0,\sigma_{model}^2)$$
• We can encode this as a matrix, if e.g. we stack $x$ over time in a vector...: $$\begin{pmatrix} 1 &-1& 0 & 0 & \cdots\\ 0 &1 &-1 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \cdots \\ \end{pmatrix}\vec{x}$$

• It's a linear form!
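
• A minimal sketch of building that first-difference matrix and evaluating the resulting quadratic smoothness penalty (the state values and $\sigma_{model}$ are illustrative):

```python
import numpy as np

def first_difference_matrix(n):
    """D such that (D @ x)[k] = x[k] - x[k+1] for a time-stacked state vector x."""
    D = np.zeros((n - 1, n))
    for k in range(n - 1):
        D[k, k] = 1.0
        D[k, k + 1] = -1.0
    return D

x = np.array([0.5, 0.6, 0.8, 2.0, 2.1])   # e.g. LAI stacked over time
D = first_difference_matrix(len(x))
sigma_model = 0.2

# Smoothness penalty: 0.5 * ||D x||^2 / sigma_model^2 -- a quadratic (linear) form
penalty = 0.5 * (D @ x) @ (D @ x) / sigma_model**2
print(D)
print(penalty)   # the jump from 0.8 to 2.0 dominates the penalty
```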

• The model can also be applied in space ($\Rightarrow$ Markov Random Field)

### Spatial example¶

• Agricultural irrigation area near Córdoba
• Landsat TM NDVI map

• Vertical (red) and horizontal (blue) differences between neighbouring pixels that exceed a given threshold (see the sketch below)
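
• A minimal sketch of computing such neighbouring-pixel differences on a synthetic NDVI map (not the Córdoba Landsat data); the field values and threshold are illustrative:

```python
import numpy as np

# Synthetic "NDVI map": two fields with different values and a sharp boundary
rng = np.random.default_rng(0)
ndvi = np.full((50, 50), 0.3)
ndvi[:, 25:] = 0.7
ndvi += rng.normal(0.0, 0.02, ndvi.shape)

# Differences between neighbouring pixels
d_h = np.diff(ndvi, axis=1)     # horizontal neighbours (adjacent columns)
d_v = np.diff(ndvi, axis=0)     # vertical neighbours (adjacent rows)

threshold = 0.1
print("horizontal differences over threshold:", np.sum(np.abs(d_h) > threshold))
print("vertical differences over threshold:  ", np.sum(np.abs(d_v) > threshold))
```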

### eoldas_ng¶

• Python tool that allows you to build variational EO problems
• In development
• Main bottleneck: slow RT models
• $\Rightarrow$ emulation of the models with fast statistical surrogates (see the sketch below)
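
• A minimal sketch of what emulation can look like: fit a fast statistical surrogate (here a scikit-learn Gaussian process) to a few hundred runs of the slow model; `rt_model` is again the toy stand-in from the earlier sketch, and this is not the eoldas_ng emulation code:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

# Training design: run the (slow) model at a few hundred [LAI, Cab] points
X_train = rng.uniform([0.0, 10.0], [8.0, 80.0], size=(300, 2))
y_train = np.array([rt_model(lai, cab)[1] for lai, cab in X_train])   # NIR band only

# Fit the emulator once; afterwards predictions are (nearly) instantaneous
emulator = GaussianProcessRegressor(kernel=RBF(length_scale=[2.0, 20.0]),
                                    normalize_y=True)
emulator.fit(X_train, y_train)

y_pred, y_std = emulator.predict([[4.2, 45.0]], return_std=True)
print(y_pred, y_std)   # emulated reflectance plus an emulation uncertainty estimate
```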

### Final remarks¶

• Inversion of a model to infer its input parameters is an ill-posed problem
• The eoldas_ng tool provides a variational framework to address this