Part 1 - Encoding rate control: what it does and why it is important
The goal of video encoding is to reduce the size of a video – to make it as small as possible while achieving the highest quality possible. Alternatively, the goal is to achieve the highest quality possible under bitrate and other constraints, such as compute complexity. One of the most important tools in a video encoder is the rate control module, which balances the bitrate, the compute complexity, and the quality of the compressed video output, among other aspects.
There are three primary ways in which a video encoder is typically able to compress content anywhere from 100 to 1000 times:
- Exploit redundancy in pixels: The encoder looks for similar patterns within the same frame and in neighboring frames, so that it can instruct the corresponding decoder to reconstruct pixels by copying similar pixels it has already decoded. The more the content varies, spatially and temporally, the harder it is to tell the decoder how to reconstruct the video.
- Reduce information: The part of the content that cannot be predicted – brand-new information – is made highly compressible by discarding information. This is the only lossy part of an encoder: information lost here cannot be recovered, so the pixels are modified to make the loss as inconspicuous as possible while meeting the bitrate constraints in place. High-frequency information is typically compressed more than low-frequency information, because the human visual system is relatively less sensitive to its loss. Transforms are used to convert pixel data into frequencies to achieve this objective. Rate control mechanisms are responsible for deciding how much information to discard based on the approach described above.
- Exploit redundancy in bits: The transformed data (coefficients) are converted to bits such that fewer bits are used for frequently repeating coefficients. This process, called entropy coding, is lossless and typically yields an additional 5-10x reduction in data size.
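The intuition behind entropy coding can be sketched with Shannon entropy, which gives the theoretical lower bound on average bits per symbol that an ideal entropy coder can reach. The coefficient values below are purely hypothetical, chosen so that zeros dominate, as they typically do after quantization:

```python
import math
from collections import Counter

def avg_bits_per_symbol(symbols):
    """Shannon entropy: the theoretical lower bound on the average
    number of bits per symbol for an ideal entropy coder."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical quantized coefficients: many zeros (frequent), few large values.
coeffs = [0] * 12 + [1] * 3 + [5]

fixed_bits = math.log2(len(set(coeffs)))      # fixed-length code for 3 symbols
entropy_bits = avg_bits_per_symbol(coeffs)    # variable-length lower bound

print(f"fixed-length: {fixed_bits:.2f} bits/symbol")
print(f"entropy bound: {entropy_bits:.2f} bits/symbol")
```

Because zeros occur far more often than the other values, the entropy bound is well below the fixed-length cost, which is exactly the redundancy an entropy coder exploits.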
A simplified workflow of a typical block-based video encoding process is depicted in figure 1. The quantization stage is the only lossy part of the architecture; during it, the transformed residual coefficients are quantized to a coarser level to make them easier to compress with entropy coding. The level of quantization is controlled by the quantization parameter (QP), which determines the step size used to represent the transformed coefficients (the Discrete Cosine Transform is most commonly used) with a finite set of steps. A higher QP value means a larger step size, which leads to more compression because most of the pixels can be represented by only a few coefficients, and vice versa. More details are in the examples below.
Fig. 1: A simplified video encoding workflow
As most rate control models make use of the QP to estimate and decide bitrate allocation, let's see an example of how QP works. Assume we have a 3x3-pixel frame, A, and QP = 5; the quantized frame is B = round(A/QP), see figure 2.
Fig. 2: Illustration of quantization
It can be observed that there are originally 8 different pixel values in frame A, but after quantization there are only 5 unique values in B, which reduces the total amount of information. Depending on the entropy coding method adopted, this may translate into (8-5)/8 = 37.5% bandwidth savings. Please note that this is for illustrative purposes only – in practical compression, things are usually much more complicated.
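The quantization step B = round(A/QP) takes only a few lines of Python. The pixel values below are hypothetical (figure 2's actual numbers are not reproduced here), chosen so that frame A has 8 unique values, as in the article's example:

```python
# Hypothetical 3x3 frame with 8 unique pixel values.
A = [[10, 12, 15],
     [20, 22, 24],
     [30, 32, 10]]
QP = 5

# Quantize: B = round(A / QP)
B = [[round(p / QP) for p in row] for row in A]

print(len({p for row in A for p in row}))  # unique values in A
print(len({p for row in B for p in row}))  # unique values in B
```

With these values, quantization collapses the 8 unique pixels of A down to 5 unique levels in B, mirroring the reduction described above.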
Of course, the cost of bandwidth savings is a sacrifice in quality. On the decoder side, inverse quantization is applied to reconstruct an approximation of the original signal. Figure 3 shows an illustration of the inverse quantization process.
Fig. 3: Illustration of inverse quantization
Using peak signal-to-noise ratio (PSNR) as the quality metric to quantify the loss in frame quality, and assuming an 8-bit image with a maximum pixel value of 2^8 = 256, we obtain a PSNR of 46.92 dB between A and C, as shown below.
If we use QP = 10 to go through the process again, we will have:
with bandwidth savings of (8-4)/8 = 50% and PSNR = 42.0252 dB.
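The full quantize/dequantize round trip and the PSNR calculation can be sketched as below. The frame values are hypothetical (the figures' actual numbers are not reproduced here) and the PSNR uses a maximum pixel value of 255, so the resulting decibel values will differ from those in the text, but the trend – higher QP, fewer unique values, lower PSNR – is the same:

```python
import math

def quantize(frame, qp):
    """Encoder-side quantization: B = round(A / QP)."""
    return [[round(p / qp) for p in row] for row in frame]

def dequantize(frame, qp):
    """Decoder-side inverse quantization: C = B * QP."""
    return [[p * qp for p in row] for row in frame]

def psnr(a, c, max_val=255):
    """Peak signal-to-noise ratio (dB), assuming 8-bit pixels."""
    errors = [(x - y) ** 2 for ra, rc in zip(a, c) for x, y in zip(ra, rc)]
    mse = sum(errors) / len(errors)
    return 10 * math.log10(max_val ** 2 / mse)

# Hypothetical 3x3 frame.
A = [[10, 12, 15],
     [20, 22, 24],
     [30, 32, 10]]

for qp in (5, 10):
    B = quantize(A, qp)
    C = dequantize(B, qp)
    unique_b = len({p for row in B for p in row})
    print(f"QP={qp}: {unique_b} unique values, PSNR = {psnr(A, C):.2f} dB")
```

Doubling QP from 5 to 10 shrinks the set of unique quantized values further while costing several dB of PSNR, which is the bitrate/quality trade-off the rate control module must manage.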
It can be seen that the value of QP determines both the bandwidth savings and the quality of the compressed video. In practical video encoding, by using Rate-Distortion and Rate-QP models together, a rate control model can estimate how much bitrate a particular block with certain characteristics will need and determine the QP value to use.
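As a rough sketch of how such a model is used, suppose a hypothetical Rate-QP model in which bitrate decays exponentially with QP (the function names, parameters a and b, and their values below are all illustrative assumptions, not any codec's actual model; in a real encoder, a and b would be fitted per block or scene). Given a fitted model, the encoder can invert it to choose a QP for a target bit budget:

```python
import math

def estimated_bits(qp, a=500_000.0, b=0.12):
    """Hypothetical R-QP model: R(QP) = a * exp(-b * QP).
    The parameters a and b are illustrative, not fitted to real data."""
    return a * math.exp(-b * qp)

def qp_for_target(target_bits, a=500_000.0, b=0.12, qp_min=0, qp_max=51):
    """Invert the model to pick a QP for a target bit budget,
    clamped to a typical 0-51 QP range."""
    qp = math.log(a / target_bits) / b
    return min(qp_max, max(qp_min, round(qp)))

# e.g. a budget of 50,000 bits for this block:
print(qp_for_target(50_000))
```

A real rate control module layers considerably more machinery on top of this – per-block model updates, buffer constraints, and distortion modeling – but the core loop of estimating bits from QP and inverting that estimate is the same.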
Depending on the target application, an effective rate control model can be used to:
- Achieve content-adaptive encoding (CAE), in which different bitrate levels can be allocated for scenes with different “complexities” (more on this later).
- Maintain a constant average bitrate every second throughout the whole asset irrespective of the content complexity, a common requirement from the network communication side.
- Accommodate other constraints, such as the maximum average bitrate across the whole video on a second-by-second basis, and the limitation of bitrate fluctuation (the VBV buffer), while reaching any one of the above targets.
- Maintain a constant average quality throughout the whole asset (often hard to achieve due to the use of inaccurate perceptual quality metrics).
- Achieve a target quality of the whole transcoded asset given effective rate control models. This is the holy grail as viewers are served what they expect while making sure content storage and delivery costs are minimized.
On the other hand, international video coding standards such as H.264/AVC, H.265/HEVC, VP9, and AV1 do not specify the rate control module, since it operates on the encoder side only. Given its big impact on encoding performance, a large number of industry and academic researchers have put significant effort into finding the optimal way of allocating bitrate for various applications.
We'll talk about different rate control methodologies and their pros and cons in the follow-up blog in this series. We'll also use examples to quantitatively analyze the differences among them and their application scenarios. This series of blogs will be wrapped up by introducing and analyzing SSIMWAVE's Video Quality Dial from the rate control perspective. Subscribe to our blog to catch part 2.