Image acquisition

Image and video processing in multi-view scenarios

Today’s available computing power enables the use of computer vision techniques in a wide range of industrial and end-user devices, including embedded systems. A core discipline here is the stereoscopic reconstruction of a three-dimensional model of a scene recorded by multiple cameras. Such models not only allow the creation of immersive interactive applications for virtual reality; the 3D information of an observed scene can also be of great use for subsequent processing tasks such as object recognition, counting, separation, or classification, which are employed, e.g., in industrial process automation and surveillance.

In such scenarios, the main focus of 3D reconstruction shifts from accuracy and detailed display to robustness and scalability of the system, as well as real-time capability, which is particularly challenging in multi-view setups. Moreover, such use cases differ from consumer applications in their use of special camera models, such as fish-eye or catadioptric cameras. A scalable, efficient solution for obtaining a dense point cloud from multiple views is the “plane sweep” approach: the photo-consistency between the camera images is evaluated on their back-projections onto pre-defined plane hypotheses, directly in object space.
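To make the idea concrete, the following toy sketch sweeps fronto-parallel plane hypotheses through a synthetic scene observed by three pinhole cameras and selects, per point, the depth with the best photo-consistency (lowest intensity variance across the views). All camera parameters, names, and the analytic scene are illustrative, not taken from the actual system.

```python
import numpy as np

# Hypothetical setup: pinhole cameras at positions (x_i, 0, 0),
# all looking along +z with focal length F.
F = 1.0
CAM_X = np.array([-0.2, 0.0, 0.2])            # camera baseline positions

def texture(X, Y):                            # synthetic scene texture
    return np.sin(3.0 * X) * np.cos(2.0 * Y)

TRUE_DEPTH = 2.0

def view(i, u, v):
    # analytic image of a textured plane at TRUE_DEPTH, seen by camera i
    return texture(CAM_X[i] + u * TRUE_DEPTH / F, v * TRUE_DEPTH / F)

def plane_sweep(depths, grid):
    """For each plane hypothesis, project the plane points into all views
    and keep, per point, the depth with minimal variance across views."""
    X, Y = grid
    best_cost = np.full(X.shape, np.inf)
    best_depth = np.zeros(X.shape)
    for d in depths:
        samples = []
        for i in range(len(CAM_X)):
            u = F * (X - CAM_X[i]) / d        # project plane point into view i
            v = F * Y / d
            samples.append(view(i, u, v))
        cost = np.var(np.stack(samples), axis=0)   # photo-consistency measure
        better = cost < best_cost
        best_cost[better] = cost[better]
        best_depth[better] = d
    return best_depth

X, Y = np.meshgrid(np.linspace(-1, 1, 21), np.linspace(-1, 1, 21))
depth_map = plane_sweep(np.linspace(1.0, 3.0, 21), (X, Y))
```

In a real multi-view setup, the analytic `view` function is replaced by sampling the captured (and possibly fish-eye) camera images, and the variance can be swapped for more robust photo-consistency measures.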

The following example shows a simulation of the plane sweep approach. In this experiment, the 3D structure has been reconstructed from 49 input fish-eye views on a voxel grid with a resolution of 10 vx/m.

Exemplary images of the recorded scene:

Computed point cloud:


Image reconstruction

Reconstruction of Non-Regularly Sampled Data

The pixels of a low-resolution image sensor are masked to obtain a non-regular sampling pattern. By assuming sparsity and applying Frequency Selective Extrapolation (FSE), which has already proven effective for error concealment, a high-resolution image can be reconstructed.

This way, a high-resolution camera no longer requires a high-resolution sensor, so power consumption, bandwidth, and complexity can be reduced during acquisition. Although only 25% of the pixels are recorded, a visual quality similar to that of a high-resolution sensor can be achieved, as the following example shows.
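A strongly simplified sketch of this pipeline is shown below: a synthetic image is non-regularly sampled so that about 25% of the pixels remain, and the missing pixels are estimated from a sparse frequency-domain model. The greedy matching pursuit with 2-D DFT basis functions only approximates the published FSE algorithm, and all names and parameters are illustrative.

```python
import numpy as np

def fse_block(block, mask, n_iter=50):
    """Greedy sketch of frequency-selective reconstruction: repeatedly pick
    the DFT basis function that best matches the masked residual and add it
    to the model. A strong simplification of the published FSE algorithm."""
    h, w = block.shape
    known = mask.astype(float)
    n_known = known.sum()
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    est = np.zeros((h, w), dtype=complex)
    for _ in range(n_iter):
        resid = known * (block - est.real)
        spec = np.fft.fft2(resid)             # correlation with all DFT atoms
        u, v = np.unravel_index(np.argmax(np.abs(spec)), spec.shape)
        phi = np.exp(2j * np.pi * (u * yy / h + v * xx / w))
        est += (spec[u, v] / n_known) * phi   # approximate masked projection
    return est.real

# simulate a non-regular sampling pattern keeping roughly 25% of the pixels
rng = np.random.default_rng(0)
h = w = 32
yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
image = np.cos(2 * np.pi * 3 * xx / w) + 0.5 * np.cos(2 * np.pi * 2 * yy / h)
mask = rng.random((h, w)) < 0.25
reconstructed = fse_block(image, mask)
```

Each iteration strictly reduces the residual on the known pixels, and because the model consists of few basis functions, the estimate also extrapolates into the unrecorded positions.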


Example for reconstruction of non-regular sampled image data

Low-resolution image | Sampled image | Reconstructed image
Best viewed enlarged, as scaling may introduce additional aliasing!

In order to evaluate the performance of the Selective Reconstruction yourself, we provide the algorithm here as a Gitlab project.



Image Resampling

Digital images can be regarded as regular two-dimensional grids of pixels. When a digital image is transformed in any way, the pixel positions are transformed as well: pixels located on a regular integer grid before the transform will lie on arbitrary non-integer positions afterwards. Pixels at such arbitrary positions can neither be stored efficiently nor displayed on a digital screen. Therefore, image resampling is used to interpolate the scattered data onto a regular grid of pixel positions. For this, we use Frequency-Selective Mesh-to-Grid Resampling (FSMR). Like the above-mentioned Frequency Selective Extrapolation, FSMR exploits the assumption that images have a sparse representation in the frequency domain. By selecting and superimposing suitable basis functions, the scattered pixels can be resampled onto a regular grid, so that the image can be displayed on a digital screen. FSMR can be used for affine and projective transforms (see the rotation below), for super-resolution, frame-rate up-conversion, and more.
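As a minimal stand-in for FSMR (which is considerably more sophisticated), the sketch below resamples scattered pixels onto a regular grid with simple inverse-distance weighting; the function name and all parameters are illustrative.

```python
import numpy as np

def mesh_to_grid(px, py, pv, h, w, radius=1.5, eps=1e-12):
    """Resample scattered samples at positions (px, py) with values pv onto
    a regular h x w grid by inverse-distance weighting. A simple stand-in
    for the far more sophisticated FSMR algorithm."""
    grid = np.zeros((h, w))
    for gy in range(h):
        for gx in range(w):
            d2 = (px - gx) ** 2 + (py - gy) ** 2
            near = d2 <= radius ** 2
            if not near.any():
                continue                      # no sample within the radius
            dn, vn = d2[near], pv[near]
            if dn.min() < eps:                # a sample hits the node exactly
                grid[gy, gx] = vn[np.argmin(dn)]
            else:
                wgt = 1.0 / dn                # inverse squared distances
                grid[gy, gx] = (wgt * vn).sum() / wgt.sum()
    return grid
```

For the rotation example, each pixel coordinate would be transformed with a 2 × 2 rotation matrix and the rotated positions, together with the original intensities, passed to `mesh_to_grid`.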

Example for the rotation of an image by an arbitrary angle.


Image enhancement

Error Concealment of Image Data

If images or video sequences are transmitted over wireless channels or the internet, the risk of transmission errors is ubiquitous. As a result, individual regions cannot be decoded and displayed correctly. However, these lost areas can be estimated from the correctly received regions. To achieve this, we developed Selective Extrapolation, an algorithm that can reconstruct arbitrary image content and can be applied to still images as well as video sequences.

For a detailed description of Selective Extrapolation, please refer to 2008-10 and 2005-20.


Examples for concealment of distorted image data

Original image
Distorted image
Concealed image

In order to evaluate the performance of the Selective Extrapolation yourself, we provide the algorithm here as a Gitlab project.


Error concealment of corrupted video data

If video sequences instead of still images have to be concealed, correctly received previous frames can be used for model generation in addition to the correctly received regions of the current frame. This yields a three-dimensional data volume, for which a model is generated using three-dimensional basis functions. As the model generation exploits correctly received regions from the current frame as well as from previous frames, a very high quality of the concealed sequence can be achieved. If Fourier basis functions are used for model generation, the algorithm is called 3D Frequency Selective Extrapolation (3D-FSE). A detailed description of 3D-FSE can be found in 2007-30.

The concealment quality can be further improved if the motion of the sequence is compensated prior to the model generation. In doing so, the different layers of the volume contain similar image content, which leads to a more precise model generation. This extension is called Motion Compensated Frequency Selective Extrapolation (MC-FSE). For a detailed description of MC-FSE, please refer to 2008-25.
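The motion compensation step can be illustrated with classic exhaustive block matching (a generic sketch, not the specific estimator used in MC-FSE): for each block of the current frame, the best-matching position in the previous frame is found, so that the aligned layers of the data volume contain similar content.

```python
import numpy as np

def block_match(prev, cur, by, bx, bs=8, search=4):
    """Exhaustive block matching: find the motion vector (dy, dx) that best
    aligns a bs x bs block of the current frame with the previous frame."""
    block = cur[by:by + bs, bx:bx + bs]
    best, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bs > prev.shape[0] or x + bs > prev.shape[1]:
                continue                       # candidate outside the frame
            cand = prev[y:y + bs, x:x + bs]
            sad = np.abs(cand - block).sum()   # sum of absolute differences
            if sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv
```

The previous frame is then shifted by the found vectors before it is stacked into the extrapolation volume.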

Below, some example sequences demonstrate the abilities of 3D-FSE and MC-FSE:

Sequence “Discovery City” Sequence “Discovery Orient”
Concealed by 3D-FSE
Concealed by MC-FSE


Image restoration by Selective Extrapolation

Besides the concealment of distortions resulting from transmission errors, Selective Extrapolation can also be used for image restoration, i.e., for removing defects or disturbing objects from images. To achieve this, the regions to be extrapolated are first marked manually by generating a binary mask that is zero at the regions to be replaced. Then the image is divided into blocks, and 2D Selective Extrapolation is applied to every block that contains regions to be extrapolated.
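The described workflow can be sketched as follows (hypothetical names; a simple iterative neighbour-averaging fill stands in for the actual 2D Selective Extrapolation): a binary mask marks the regions to replace, the image is divided into blocks, and only blocks containing masked pixels are processed.

```python
import numpy as np

def restore(image, mask, block=16, n_iter=400):
    """Block-wise restoration driver; mask == 0 marks pixels to replace.
    NOTE: a plain iterative neighbour-averaging (diffusion) fill is used
    here as a stand-in for the actual 2D Selective Extrapolation."""
    out = image.astype(float)
    out[mask == 0] = 0.0                       # initialise the hole pixels
    h, w = out.shape
    for by in range(0, h, block):
        for bx in range(0, w, block):
            m = mask[by:by + block, bx:bx + block]
            if m.all():
                continue                       # nothing to extrapolate here
            sub = out[by:by + block, bx:bx + block]
            hole = (m == 0)
            for _ in range(n_iter):
                avg = (np.roll(sub, 1, 0) + np.roll(sub, -1, 0) +
                       np.roll(sub, 1, 1) + np.roll(sub, -1, 1)) / 4.0
                sub[hole] = avg[hole]          # update only masked pixels
    return out
```

Replacing the diffusion fill inside the loop with a 2D Selective Extrapolation of each block yields the actual restoration method described above.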

Two examples for image restoration by Selective Extrapolation: