Breaking news: NL-means is local!

Non-Local Means 1 is a very popular image denoising algorithm.

Besides its efficiency at denoising, it's also one of the main causes of the non-locality trend that has arisen in Image Processing conferences ever since. Yet, there is this recent paper on arXiv 2: Non-Local means is a local image denoising algorithm, by Postec, Froment and Vedel.

Some background on NL-means

NL-means is a very simple, yet highly efficient image denoising algorithm that preserves image textures well, even under severe noise. In a nutshell, it computes the denoised value of a pixel as a weighted average of all the other pixels in a learning neighborhood around it. The weights are proportional to the similarity between the Gaussian patch extracted around the pixel to denoise and the one extracted around the current candidate from the learning neighborhood.
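To make this concrete, here is a minimal (and deliberately slow) Python sketch of the idea. The function and parameter names are mine, and it uses a flat patch distance instead of the Gaussian-modulated one of the original paper, for brevity:

```python
import numpy as np

def nl_means_pixel(img, x, y, p=3, search=10, h=10.0):
    """Denoise pixel (x, y) as a weighted average over a learning
    neighborhood of radius `search`, with weights decreasing with the
    patch distance. Assumes (x, y) is at least `p` pixels from the borders."""
    H, W = img.shape
    ref = img[x - p:x + p + 1, y - p:y + p + 1]           # reference patch
    num = den = 0.0
    for i in range(max(x - search, p), min(x + search, H - p - 1) + 1):
        for j in range(max(y - search, p), min(y + search, W - p - 1) + 1):
            cand = img[i - p:i + p + 1, j - p:j + p + 1]  # candidate patch
            d2 = np.mean((ref - cand) ** 2)               # patch dissimilarity
            w = np.exp(-d2 / (h * h))                     # distance -> weight
            num += w * img[i, j]
            den += w
    return num / den
```

The `search` parameter is precisely the radius of the learning neighborhood discussed below.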

You can find an introduction to NL-means in these blog posts: 1, 2, 3.

Since the learning neighborhood can be as big as the whole image, the result of the weighted average is said to be a non-local mean. However, since this exhaustive search has to be applied to each pixel of the image, the size of the learning neighborhood is usually kept much smaller than the image 3.

Most of the literature since has focused on accelerating NL-means via some pre-selection process, so that only the patches with a significant contribution are considered in the learning neighborhood and CPU time is spent on useful computations only. A consequence, however, is that the validity of the non-local approach of NL-means has never been fully studied.

Summary of the paper

The goal of this paper is to check the validity of the following claim:

Small learning neighborhood sizes are just a computational artifact. A non-local algorithm should explore the whole image to get the best results.

This is achieved by measuring the average PSNR of the denoising result over 50 images for different sizes of the learning neighborhood. The patch size is set to 7x7 pixels. The test images are corrupted by Gaussian noise of standard deviation 20 4.
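For reference, the PSNR used in this kind of benchmark is the standard peak signal-to-noise ratio; a minimal version, assuming the [0,255] range of footnote 4:

```python
import numpy as np

def psnr(clean, denoised, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means better denoising."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(denoised, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Reproducing the benchmark's degradation: Gaussian noise of sigma = 20
# noisy = np.clip(clean + np.random.normal(0.0, 20.0, clean.shape), 0, 255)
```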

Tested algorithms

Three variants of NL-means are compared in the experiments:

  • the original one. Let’s call it NL-means;
  • a modified version where only the 80 most similar patches are used to denoise a pixel, instead of all the patches in the learning neighborhood. Let’s call this one VIP-NL-means 5 (see the sketch after this list);
  • an ideal version of NL-means where the weights of the patches are given by the true, noiseless image. This one is called Oracle NL-means in the paper.
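For illustration, here is a hedged sketch of the top-80 truncation used by the second variant (the names and the exact tie-breaking are mine, not the paper's):

```python
import numpy as np

def vip_average(weights, values, k=80):
    """Keep only the k largest weights (the 'Very Important Patches')
    before computing the weighted average."""
    weights = np.asarray(weights, dtype=float)
    values = np.asarray(values, dtype=float)
    keep = np.argsort(weights)[-k:]   # indices of the k best matches
    w, v = weights[keep], values[keep]
    return np.sum(w * v) / np.sum(w)
```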

The last two algorithms are also tested in another variant where the value of the central pixel is not used when computing the weight between two patches. This small modification diminishes the bias introduced by the value of the central pixel in a patch: since the patches are modulated by a 2D Gaussian before computing the weight, the central pixel value has more importance than the others.
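In code, ignoring the central pixel simply amounts to zeroing the center of the Gaussian kernel used in the patch distance; a minimal sketch, with a kernel width chosen arbitrarily:

```python
import numpy as np

def gaussian_kernel(size=7, sigma=1.5, skip_center=False):
    """2D Gaussian weighting for patch distances; optionally
    ignore the (over-weighted) central pixel."""
    r = size // 2
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    if skip_center:
        k[r, r] = 0.0   # drop the central pixel's contribution
    return k / k.sum()

def patch_distance(p1, p2, kernel):
    """Gaussian-weighted squared distance between two patches."""
    return np.sum(kernel * (p1 - p2) ** 2)
```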

Experimental results

The average PSNR curve (Figure 1 of the paper) then refutes the initial claim. It clearly shows that when the radius of the learning neighborhood is bigger than 5 pixels, the average PSNR decreases for both NL-means and VIP-NL-means. This holds whether or not the central pixel of the patches is taken into account. Oracle NL-means, on the other hand, does not exhibit this behavior.

Some observations

Not quite unexpected

This is actually not the first time that the local behavior of NL-means has been observed. It is, however, the first time that a study is dedicated to this particular point.

For example, it was noted before by J. Salmon. On my side, I observed it several years ago (circa 2006) while working with Jean-Michel Morel on a super-fast NL-means that was completely non-local, and it even has a dedicated figure in my PhD thesis 6.

A probable cause

The common explanation for this phenomenon is that many patches within a small distance of the reference patch are wrong matches. For example, patches that include pixels on both sides of an edge can be slightly shifted without an important weight penalty, in particular when there is a lot of noise! 7

By enlarging the learning neighborhood, more of these spurious matches are found, thus yielding this apparent paradox:

The more non-local NL-means is, the worse the results.

Postec et al.’s paper backs this explanation by showing experimentally that the results of Oracle NL-means do actually improve with the size of the learning neighborhood, unlike genuine NL-means. Thus, when the patch weights are perfectly known a priori, there is no penalty in being truly non-local.
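A toy experiment makes this tangible. With a 7x7 patch straddling a moderate edge and the benchmark's noise level, the extra distance caused by a one-pixel shift is smaller than the noise floor itself (the numbers below are illustrative only, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, contrast = 20.0, 50.0   # benchmark noise level, a moderate edge

# 7x7 patches cut from a vertical step edge (columns 0-3 dark, 4-7 bright)
row = np.r_[np.zeros(4), contrast * np.ones(4)]
edge = np.tile(row, (7, 1))
ref, shifted = edge[:, 0:7], edge[:, 1:8]   # exact match vs one-pixel shift

def noisy(p):
    return p + rng.normal(0.0, sigma, p.shape)

# Average squared distances over many noise realizations
d_exact = np.mean([np.mean((noisy(ref) - noisy(ref)) ** 2) for _ in range(1000)])
d_shift = np.mean([np.mean((noisy(ref) - noisy(shifted)) ** 2) for _ in range(1000)])
print(d_exact, d_shift)
# ~2*sigma^2 = 800 vs ~800 + 357: the shift penalty (~357) is smaller than
# the noise floor, so the mismatched patch still gets a non-negligible weight.
```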

How come it’s never been written down?

It is seldom remarked upon by researchers for two major reasons:

  • people tend to stick to small learning neighborhood sizes to keep the computations fast enough, thus avoiding (on purpose or not) bad sizes. For 7x7 patches, Postec et al. measure an optimal learning neighborhood size of 4 pixels;
  • most of the effort put into speeding up NL-means aims at finding a good pre-selection function. The goal of this function is to kick non-relevant patches out of the weighted mean, thus limiting this adverse phenomenon (a simplistic example of such a test follows this list).
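Here is a deliberately simplistic example of such a pre-selection test (my own illustration, not a specific published criterion): cheap patch statistics are compared before paying for the full weighted distance.

```python
import numpy as np

def preselect(ref, cand, mean_tol=10.0, var_ratio=2.0):
    """Cheap test: reject a candidate patch whose mean or variance
    differs too much from the reference patch, before computing the
    full (expensive) weighted patch distance."""
    if abs(ref.mean() - cand.mean()) > mean_tol:
        return False
    r = (cand.var() + 1e-9) / (ref.var() + 1e-9)
    return 1.0 / var_ratio <= r <= var_ratio
```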

What can we do with this information?

The full consequences of Postec et al.’s paper are still to be worked out, but there’s already one positive side effect: it’s probably a good vindication of the two-stage denoising process of BM3D and the like:

  • the first stage is an aggressive non-local denoising procedure that is too strong and has undesirable side effects;
  • the second stage computes a better non-local denoising result by using the intermediate output as the basis for the weights estimation.

Since the first stage applies a harsh denoising function, it estimates an image with very sharp edges and flat regions. This avoids the small variations of patch distances caused by the noise and makes the geometric and textural features more important.
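Schematically, the second stage can be sketched as follows: the weights are computed on a pilot estimate, while the averaged values still come from the noisy image. The pilot used here is a crude stand-in, not BM3D's actual transform-domain first stage:

```python
import numpy as np
from scipy.ndimage import median_filter

def two_stage_pixel(noisy, pilot, x, y, p=3, search=10, h=10.0):
    """Stage 2 of a two-stage scheme: average pixels of the *noisy* image,
    but compute the weights on the cleaner *pilot* estimate.
    Assumes (x, y) lies at least p + search pixels from the borders."""
    ref = pilot[x - p:x + p + 1, y - p:y + p + 1]
    num = den = 0.0
    for i in range(x - search, x + search + 1):
        for j in range(y - search, y + search + 1):
            cand = pilot[i - p:i + p + 1, j - p:j + p + 1]
            w = np.exp(-np.mean((ref - cand) ** 2) / (h * h))
            num += w * noisy[i, j]   # values still come from the noisy image
            den += w
    return num / den

# A crude stand-in for the aggressive first stage:
# pilot = median_filter(noisy, size=5)
```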

Conclusion: is it the end of non-locality?

Clearly, it’s not the end of NL-means. Besides the fact that it’s still hard to beat in pure denoising tasks, NL-means has had the invaluable merit of provoking this huge non-local trend, and has led to the study of new approaches to Image Processing tasks, some of which are clearly promising 8.

It is also remarkable that such a simple algorithmic scheme can still exhibit unexpected behaviors almost 10 years after its initial publication! This clearly shows how limited our understanding of Image Processing and image models still is, and bids us to stay humble.

In a future post, we’ll check the results of Postec et al.’s paper, hopefully using an IPython notebook 9. Stay tuned!

If you don’t want to miss the next post, you can subscribe to the blog’s feed or follow me on Twitter!


  1. Let’s say NL-means for short in the remainder of this post. ^
  2. In French, published first on Nov. 15th, 2013 ^
  3. Hey remember, NL-means was published in 2005 in CVPR’05 / SIAM MMS. That’s two years before the first iPhone, and at that time CPU and RAM were still expensive. ^
  4. The initial dynamic range of the images is the classical [0,255] range. ^
  5. Very Important Patches ^
  6. I’m not telling you where you’ll have to read it :-p ^
  7. This is very similar to the adherence phenomenon in block matching. ^
  8. I think of the non-local dictionary based algorithms for example here. ^
  9. Or on github if I don’t manage to do it. ^
