What is HDR Deghosting?


HDR (or HDRI) is short for "high dynamic range image". Images we see on our computer screens range in brightness from 0 to 255, but the real world contains many more intensities of light. That's why photographs of magnificent scenes, like sunsets, often turn out dull. High dynamic range images capture a wider range of light, and their values can be processed by tone mapping so they can be viewed on regular monitors.
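To make that concrete, here is a minimal sketch of one simple global tone mapping curve (the Reinhard-style L / (1 + L), with a rough gamma encoding; actual tone mappers vary a lot):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Map HDR radiance values (unbounded floats) into [0, 255] with the
// simple global curve L / (1 + L), then gamma-encode for display.
std::vector<unsigned char> toneMap(const std::vector<float>& radiance) {
    std::vector<unsigned char> out(radiance.size());
    for (size_t i = 0; i < radiance.size(); ++i) {
        float L = radiance[i];
        float Ld = L / (1.0f + L);                  // compress to [0, 1)
        float encoded = std::pow(Ld, 1.0f / 2.2f);  // rough gamma encoding
        out[i] = static_cast<unsigned char>(std::min(255.0f, encoded * 255.0f));
    }
    return out;
}

int main() {
    std::vector<float> hdr = {0.01f, 0.5f, 2.0f, 50.0f}; // sample radiances
    for (unsigned char v : toneMap(hdr)) std::printf("%d\n", v);
}
```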

To capture an HDR image with a regular camera, one can take several pictures of the same scene, adjusting the shutter speed with each exposure, producing a series of pictures ranging from dark to light. An HDR image can then be made by combining the well-exposed pixels from the source images. However, there are frequently moving objects (such as people) in the scene, causing each picture to be inconsistent with the rest. This results in ghosting: objects appear semi-transparent. The process of removing this semi-transparency is called deghosting.
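Here is a rough sketch of that merging step for a single pixel, assuming a linear camera response for simplicity (real cameras need response-curve recovery first):

```cpp
#include <cstdio>
#include <vector>

// Merge one pixel from a bracketed stack into a radiance estimate.
// Each exposure's pixel is divided by its shutter time, and well-exposed
// pixels (away from 0 and 255) get the most weight via a "hat" function.
float mergePixel(const std::vector<float>& values,        // 0..255, one per exposure
                 const std::vector<float>& shutterTimes)  // seconds
{
    float num = 0.0f, den = 0.0f;
    for (size_t i = 0; i < values.size(); ++i) {
        float v = values[i];
        float w = (v <= 127.5f) ? v : (255.0f - v); // hat: peaks mid-range
        num += w * (v / shutterTimes[i]);           // per-exposure radiance estimate
        den += w;
    }
    return den > 0.0f ? num / den : 0.0f;
}

int main() {
    // The same scene point captured at 1/60s, 1/15s, and 1/4s.
    float r = mergePixel({12.0f, 60.0f, 210.0f}, {1 / 60.0f, 1 / 15.0f, 0.25f});
    std::printf("merged radiance: %f\n", r);
}
```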

Favor Choosing Pixels from the Same Image

Since Khan's algorithm works at the pixel level instead of the object level, objects often get sliced because parts of them stayed in a consistent location while other parts moved. In other words, if part of an object consistently overlaps a pixel, that part is chosen, but any part that isn't in a consistent location (think: stand still and shake your head) is not.

To encourage staying within the same image, for each pixel I took the previous weights of its neighbors on the same layer and multiplied the newly calculated weight by them. This way, if a pixel's neighbors have high weights, the pixel itself is more likely to end up with a high weight.
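Roughly, the update looks like this (a sketch with flat float buffers; the helper name is made up, and border handling is omitted):

```cpp
#include <cstdio>
#include <vector>

// Scale a pixel's newly computed weight by the average of its 3x3
// neighbors' weights from the previous iteration on the SAME exposure
// layer, so a pixel whose neighbors were trusted in image k is nudged
// toward also being taken from image k. (One flat weight plane per
// exposure; x/y are assumed to not touch the border.)
float favorSameImage(const std::vector<std::vector<float>>& prevWeights, // [layer][y*width+x]
                     int layer, int x, int y, int width, float newWeight)
{
    float sum = 0.0f;
    int count = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            if (dx == 0 && dy == 0) continue;
            sum += prevWeights[layer][(y + dy) * width + (x + dx)];
            ++count;
        }
    return newWeight * (sum / count); // high-weight neighbors boost this pixel
}

int main() {
    // Two exposures, 4x4 weight planes; exposure 0 has a trusted patch.
    std::vector<std::vector<float>> prev(2, std::vector<float>(16, 0.5f));
    prev[0][5] = prev[0][6] = 1.0f;
    std::printf("%f\n", favorSameImage(prev, 0, 1, 1, 4, 0.8f));
}
```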

Below is the basic result from 5 iterations, with multi-scaling, joint bilateral upscaling, and all-or-none weighting (choosing from a single image as opposed to blending). As you can see, the people on the right are being partially eliminated because they don't cover the floor or the wall consistently enough. Even the guy in the middle of the left group has some ghosting because he didn't stay in the same position.


Here is the new result: 5 iterations with all of the above, plus favoring staying in the same image:


It seems that favoring the same image solves some problems and creates others. The lady on the right is now almost complete, and the massive ghosts that appeared before are mostly gone. However, there are a lot more people in front of the church door, and the guy in the red shirt seems to have been duplicated multiple times. Also, there is a very clear "dent" in the man in the middle of the 3-person group. And he seems to have developed 4 legs.

The above result used a 3x3 neighborhood, and it would probably benefit from a larger one, but 2 issues make that harder than it sounds. One is that right now, favoring the same image is embedded into each iteration of the Khan algorithm. If its neighborhood were different from the neighborhood used for Khan, it would have to be processed separately, adding greatly to the run time of the program. The other problem, or rather a lack of incentive, is that the Khan algorithm's results differ very minimally between a 3x3 neighborhood and a 9x9 neighborhood, while the run time increases significantly. Therefore I don't think it would be worth increasing the neighborhood size for the whole algorithm.

Choosing a Pixel with the Largest Weight

Edit: I made a mistake in my code, so that actually wasn't the real result. Below is the real result.


If you zoom in, you can see that the leaves don't look right. They are being cut off before the edges, although this alleviated the ghosty branches a little. That sounds reasonable: the pixels near the center of a leaf are more likely to still be within the leaf even with some displacement, while the edges are probably invading space that's mostly sky.

The mistake I was making was pretty simple, and stupid. Instead of "result = tmp", I wrote "tmp = result". So each pixel in the noisy image you see below is the last well-exposed pixel in the stack, which is mostly the last image in the stack.

/Edit

The simplest way to eliminate the stubborn traces of ghosting is to just ignore the lower-weighted pixels, and this is the result of using only the pixels with the largest weights:



Not only did this eliminate the ghost person, it also completely eliminated the blurriness of the branches (click the image to zoom in). However, the price to pay is a lot of noise. One of the Khan algorithm's side benefits is that it removes noise by blending different images, so choosing a single image's pixel no longer gives that benefit. In anticipation of this, I tilted the initial weights to favor pixels near a brightness of 220/255, but that didn't seem to help. Maybe I didn't tilt the correct weights.
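For reference, here is a sketch of both pieces: the all-or-none pick, and one possible shape for the 220/255 tilt (the exact falloff I used may differ):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// All-or-none output: copy the single input pixel with the largest
// weight instead of blending the stack. Kills ghosts and blur, but
// forfeits the noise averaging that blending gives for free.
float pickLargestWeight(const std::vector<float>& pixels,   // one value per input image
                        const std::vector<float>& weights)
{
    size_t best = 0;
    for (size_t i = 1; i < weights.size(); ++i)
        if (weights[i] > weights[best]) best = i;
    return pixels[best];
}

// One possible shape for the initial tilt: a triangular falloff peaking
// at a brightness of 220/255. (Illustrative, not necessarily the curve
// I actually used.)
float initialWeight(float brightness)   // 0..255
{
    float d = std::fabs(brightness - 220.0f);
    return std::fmax(0.0f, 1.0f - d / 220.0f);
}

int main() {
    std::printf("%f\n", pickLargestWeight({10.f, 200.f, 230.f}, {0.1f, 0.7f, 0.2f})); // 200
    std::printf("%f\n", initialWeight(220.0f)); // 1.0 at the peak
}
```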

Using the Pyramid on the Khan Algorithm

One neat trick that can improve most image processing algorithms is rescaling the source to several power-of-2 fractions of the original size and iteratively going through the images from small to large. For example, I would use 1/16, 1/8, 1/4, and 1/2 of the original size. This is faster because there are fewer pixels to iterate through in the small image, and a small neighborhood in the small image is equivalent to a larger neighborhood in the original. Each later iteration then only needs to search a neighborhood big enough to refine what was lost through the last downsizing.
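In sketch form, the driver might look like this (cv::Mat and cv::resize from OpenCV are stand-ins here, not necessarily what my code uses, and the per-scale routine is a stub):

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

// Stand-in for the real per-scale pass (the actual routine updates one
// weight plane per exposure; a single plane is shown for brevity).
void runKhanIteration(const std::vector<cv::Mat>& exposures, cv::Mat& weights) {
    (void)exposures; (void)weights; // weight update would go here
}

// Coarse-to-fine driver: run at 1/16, 1/8, 1/4, 1/2, then full size,
// upscaling the weight map between levels so each finer level only
// refines what the previous downsizing lost. The caller provides an
// initial weight map (e.g., all ones) at any size.
void runPyramid(const std::vector<cv::Mat>& exposures, cv::Mat& weights) {
    for (int level = 4; level >= 0; --level) {          // 1/16 ... 1/1
        double scale = 1.0 / (1 << level);
        std::vector<cv::Mat> scaled(exposures.size());
        for (size_t i = 0; i < exposures.size(); ++i)
            cv::resize(exposures[i], scaled[i], cv::Size(), scale, scale, cv::INTER_AREA);
        cv::resize(weights, weights, scaled[0].size(), 0, 0, cv::INTER_LINEAR);
        runKhanIteration(scaled, weights);
    }
}
```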

However, when this approach is used on the Khan algorithm, it actually produces worse results. Below is the result of the regular 5-iteration process:


Here is the result of 5 iterations with pyramids. Obviously worse.


But it's even worse when compared to a regular 3-iteration process:


Hm, that's puzzling... My first thought was that downsizing might be mixing the target pixel with its neighbors, making it more similar to them and thus higher weighted than it should be, but upon further investigation, that doesn't seem to be the main cause. Below is the upscaled result of an iteration on 1/16 of the image:



Can you see the edges where each (gigantic 16x16) weight meets another? The problem was that the ghost was so "thin" that downsizing either made it non-existent or made the surrounding "good" areas be considered ghosts. The next iteration would recover the good pixels where they were incorrectly considered ghosts, but for the ghost pixels that were incorrectly considered good, the combined result of those 2 iterations (1 small and 1 regular) is practically equivalent to 1 regular iteration.

I'm currently trying to implement joint bilateral upscaling, which takes into consideration both radiance and spatial proximity. Hopefully this will make the results of the small iterations less erroneous, but I'm still not sure how it would fare if the ghost were pixel-thin. Maybe I could quickly run a pixel through an iteration of Khan if it is too different from its neighbors...
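The idea, in sketch form (again with OpenCV types as stand-ins), is to let the full-resolution radiance steer the upscaling of the low-resolution weight map, so weights don't bleed across edges:

```cpp
#include <algorithm>
#include <cmath>
#include <opencv2/core.hpp>

// Joint bilateral upscaling: each full-res output weight is a blend of
// nearby low-res weights, with each sample's influence falling off both
// with spatial distance and with radiance difference in the full-res
// guide image.
cv::Mat jointBilateralUpscale(const cv::Mat& weightsLo,   // CV_32F, low-res
                              const cv::Mat& radianceHi,  // CV_32F, full-res
                              float sigmaSpatial = 1.0f,
                              float sigmaRange = 0.1f)
{
    cv::Mat out(radianceHi.size(), CV_32F);
    float sx = float(weightsLo.cols) / radianceHi.cols;   // hi -> lo scale
    float sy = float(weightsLo.rows) / radianceHi.rows;
    for (int y = 0; y < out.rows; ++y) {
        for (int x = 0; x < out.cols; ++x) {
            float lx = x * sx, ly = y * sy;               // position in low-res
            float num = 0.0f, den = 0.0f;
            for (int j = -2; j <= 2; ++j) {
                for (int i = -2; i <= 2; ++i) {
                    int qx = int(lx) + i, qy = int(ly) + j;
                    if (qx < 0 || qy < 0 || qx >= weightsLo.cols || qy >= weightsLo.rows)
                        continue;
                    // Spatial term: distance in low-res coordinates.
                    float ds = (qx - lx) * (qx - lx) + (qy - ly) * (qy - ly);
                    // Range term: radiance difference in the full-res guide.
                    int hx = std::min(int(qx / sx), radianceHi.cols - 1);
                    int hy = std::min(int(qy / sy), radianceHi.rows - 1);
                    float dr = radianceHi.at<float>(y, x) - radianceHi.at<float>(hy, hx);
                    float wgt = std::exp(-ds / (2 * sigmaSpatial * sigmaSpatial))
                              * std::exp(-dr * dr / (2 * sigmaRange * sigmaRange));
                    num += wgt * weightsLo.at<float>(qy, qx);
                    den += wgt;
                }
            }
            out.at<float>(y, x) = den > 0.0f ? num / den : 0.0f;
        }
    }
    return out;
}
```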

More Khan Results (the bad one)

There are some cases where Khan's algorithm doesn't work at all. In the previous post (the fountain scene), the tree branches could not be deghosted because the wind caused each image to be different, preventing the algorithm from finding a background. Even with large movements, as long as the images are different enough, the algorithm will have trouble. One example of such a case is taking pictures of crowded places.

Source images:






Even though this scene is far from crowded, there's a lot of overlap in where the movements occur.

After 1 iteration:


After 5 iterations:


After 12 iterations:


The only difference between the first iteration and the 12th iteration is.... the error was more apparent in the latter =P. One way to attempt a solution is to encourage choosing from the same image when the weights are indecisive, but then the order in which the pixels are processed becomes very important, and when will we know it's OK to switch to a different image? We can't simply choose the input with the largest weight, because that could cut into some objects. For example, the body of the man on the right has a relatively high weight, while his head has a lower weight than the wall, and we don't want to decapitate him. This would require calculating the most coherent "edge" for switching images, which is very slow. We could use the calculated weights to help make the decision (similar weights = better seams), but I haven't sorted out the details yet...

Khan Algorithm Results

Most deghosting algorithms use radiance differences to determine the position of objects, then choose the best exposure of each object and use it for that part of the picture. The "Khan" algorithm (Ghost Removal in High Dynamic Range Images) operates on a per-pixel basis instead. It assumes that more images depict the background than the foreground, and that if a pixel's neighbors are in the background, then the pixel itself has a high probability of being background. In each iteration, each pixel is given a weight based on how likely it is to be in the background. At the end, instead of picking the pixel with the most weight, it takes all the inputs and combines them according to their weights. This produces some noise removal in addition to deghosting.

I implemented the algorithm as a C++ program, and the results look similar to those on the algorithm's webpage.
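For reference, here is a stripped-down sketch of one weight update plus the final blend (grayscale values and a scalar Gaussian kernel stand in for the paper's color vectors and full bandwidth matrix):

```cpp
#include <cmath>
#include <vector>

// Pixel stack: n exposures of a w x h grayscale image.
struct Stack {
    int w, h, n;
    std::vector<float> px;                          // [r*w*h + y*w + x]
    float pix(int r, int x, int y) const { return px[r * w * h + y * w + x]; }
};

static float gauss(float d, float sigma) {
    return std::exp(-d * d / (2.0f * sigma * sigma));
}

// One weight update for pixel (x, y) of exposure r: a kernel density
// estimate of "how background-like is this value?", built from the 3x3
// neighborhood across all exposures, with each neighbor's vote scaled
// by its own weight from the previous iteration.
float updateWeight(const Stack& s, const std::vector<float>& prevW,
                   int r, int x, int y, float sigma)
{
    float me = s.pix(r, x, y);
    float num = 0.0f, den = 0.0f;
    for (int q = 0; q < s.n; ++q)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = x + dx, ny = y + dy;
                if (nx < 0 || ny < 0 || nx >= s.w || ny >= s.h) continue;
                if (q == r && dx == 0 && dy == 0) continue;   // skip self
                float wq = prevW[q * s.w * s.h + ny * s.w + nx];
                num += wq * gauss(me - s.pix(q, nx, ny), sigma);
                den += wq;
            }
    return den > 0.0f ? num / den : 0.0f;
}

// Final composite: blend the stack by the converged weights, which is
// where the free noise reduction comes from.
float blend(const Stack& s, const std::vector<float>& w, int x, int y) {
    float num = 0.0f, den = 0.0f;
    for (int r = 0; r < s.n; ++r) {
        float wr = w[r * s.w * s.h + y * s.w + x];
        num += wr * s.pix(r, x, y);
        den += wr;
    }
    return den > 0.0f ? num / den : 0.0f;
}
```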

These were the source images:






After 1 iteration:


After 5 iterations:


After 12 (phew) iterations:









Even though the ghost is pretty much non-existent after 12 iterations, there are still traces of the head (it's darker than the rest of the bridge) if you look really, really hard. Also, the small branch movements did not get deghosted very well. That was expected, because the assumption that most images represent the background and are identical does not hold for the branches.



The branches would need some additional constraints to deal with, but the small leftover of the person could be removed if I just ignored the lowest weight. But if I just ignore an image, the algorithm would lose its noise removal qualities. So I decided to redistribute the final weights nonlinearly (logarithmically, I think) instead of linearly, causing the smaller weights to become even smaller without changing the large weights much. If I do this by a factor of 5, the ghost is completely gone, but the "do not enter" sign starts to show signs of noise (it's hard to see because of the JPEG compression; maybe I should upload another image some time...). If I weigh the final weights by a factor of 2, however, the ghost still remains. Finding the distribution nirvana is on my to-do list, after I get the pyramid implementation done (will blog about that soon).
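I'm being vague about the exact curve here; one mapping with the behavior I described (small weights shrink a lot, large weights barely move) is raising each normalized weight to a power, which is only my sketch of the idea:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Redistribute final weights nonlinearly: normalize to [0, 1], then
// raise to a power. For factor = 5, a weight of 0.3 drops to ~0.002
// while 0.9 only drops to ~0.59, so small (ghostly) weights vanish
// from the blend while large weights keep most of their say.
void sharpenWeights(std::vector<float>& weights, float factor) {
    float maxW = 0.0f;
    for (float w : weights) maxW = std::max(maxW, w);
    if (maxW <= 0.0f) return;
    for (float& w : weights)
        w = std::pow(w / maxW, factor);
}
```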


Factor of 5:




Factor of 2: