Video encoders compress raw video data for efficient storage or transmission. They are used in applications such as digital TV transmitters, video conferencing systems, video servers, and DVD recorders. MPEG-2 is the most popular encoding scheme and works well for standard-definition (SD) video. However, the video industry's move toward high-definition (HD) video creates a need for more efficient compression schemes that can handle the much higher data rates of HD. In response, the Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) created a next-generation compression scheme called H.264/MPEG-4 AVC, which handles HD data rates more effectively.
MPEG Encoder
An MPEG encoder receives raw video in digitized red-green-blue (RGB) or serial digital interface (SDI) format and outputs compressed video. Video compression schemes exploit two facts: viewers do not notice small differences in color from frame to frame, and consecutive frames in a video sequence differ very little. Figure 1 shows the components of a typical MPEG encoder.
Figure 1. H.264 MPEG Encoder Block Diagram

The motion estimation/compensation blocks identify the parts of a frame that have changed or moved relative to the preceding frame and code only those parts. Starting this process requires a frame that is coded entirely within itself and does not depend on any other frame. Such a frame is called an intra-coded frame (I-frame). The following frame types are predicted from the I-frame using motion compensation:
- Predicted frame (P-frame)
- Bidirectional predictive frame (B-frame)
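As an illustration of motion estimation, the following is a minimal sketch of exhaustive block matching using the sum of absolute differences (SAD) as the matching cost. The function names, the 4 x 4 block size, the search range, and the synthetic frames are assumptions made for this example; real encoders use larger blocks, faster search strategies, and sub-pixel refinement.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a block of ref and a block of cur."""
    return sum(
        abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
        for j in range(size)
        for i in range(size)
    )


def find_motion_vector(ref, cur, cx, cy, size=4, search=2):
    """Exhaustively search ref for the block that best matches cur at (cx, cy).

    Returns ((dx, dy), cost), where (dx, dy) is the offset into the
    reference frame with the lowest SAD.
    """
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(ref, cur, cx, cy, cx, cy, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(ref, cur, rx, ry, cx, cy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def make_frame(width, height, x0, y0, size=4):
    """Hypothetical test frame: black background with one textured square."""
    frame = [[0] * width for _ in range(height)]
    for j in range(size):
        for i in range(size):
            frame[y0 + j][x0 + i] = 100 + 10 * j + i
    return frame


reference = make_frame(12, 12, 3, 4)
current = make_frame(12, 12, 5, 5)  # the square moved 2 right and 1 down
motion_vector, cost = find_motion_vector(reference, current, 5, 5)
print(motion_vector, cost)
```

In this sketch the encoder would transmit only the motion vector and the (here all-zero) residual for the block instead of its 16 pixel values.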
An I-frame typically requires more bits than a P-frame or a B-frame because it must be encoded and decoded independently. A group of pictures (GOP) is a sequence of pictures from one I-frame to the next. The GOP size strongly affects the overall bit rate required to transmit a video sequence: a longer GOP means fewer I-frames and hence less bandwidth, and vice versa. However, video sequences with longer GOPs can be more vulnerable to transmission errors, because an error in one frame can propagate through the predicted frames until the next I-frame arrives.
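The effect of GOP length on bit rate can be shown with simple arithmetic. The sketch below assumes a GOP of one I-frame followed only by predicted frames of equal size; the frame sizes and the frame rate are hypothetical values chosen for illustration, not measurements.

```python
def average_bits_per_frame(gop_length, i_frame_bits, predicted_frame_bits):
    """Average frame size for a GOP of one I-frame plus predicted frames."""
    total_bits = i_frame_bits + (gop_length - 1) * predicted_frame_bits
    return total_bits / gop_length


def bit_rate_bps(gop_length, i_frame_bits, predicted_frame_bits, frames_per_second):
    """Approximate stream bit rate in bits per second."""
    average = average_bits_per_frame(gop_length, i_frame_bits, predicted_frame_bits)
    return average * frames_per_second


# Hypothetical sizes: 400 kbit per I-frame, 100 kbit per predicted frame, 30 frames/s.
short_gop = bit_rate_bps(6, 400_000, 100_000, 30)
long_gop = bit_rate_bps(15, 400_000, 100_000, 30)
print(short_gop, long_gop)
```

With these assumed numbers, lengthening the GOP from 6 to 15 frames lowers the bit rate from 4.5 Mbps to 3.6 Mbps.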
The discrete cosine transform (DCT) exploits the fact that adjacent pixel values within a frame are similar and hence largely redundant for transmission. The DCT typically takes 8 x 8 blocks of pixels from the macroblocks of a frame and transforms them into the frequency domain. The newer H.264 standard allows different-sized blocks of pixels, ranging from 16 x 16 down to 4 x 4 pixels. For an 8 x 8 input block, the output of the DCT is an 8 x 8 matrix of frequency coefficients. Converting the pixel values to the frequency domain makes the data redundancy easier to observe. The first coefficient, called the DC or zero-frequency coefficient, is proportional to the average pixel value of the block. The remaining (AC) coefficients describe how the pixel values vary around that average. Because adjacent pixel values within a block are similar, most of these coefficients are zero or close to zero, which makes it redundant to transmit or store them.
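To make the energy-compaction idea concrete, here is a minimal sketch of a direct (unoptimized) 8 x 8 two-dimensional DCT-II with orthonormal scaling. The scaling convention and the sample block, a gentle horizontal ramp, are assumptions for this example; production encoders use fast or integer approximations of the transform.

```python
import math

N = 8  # block size assumed for this example


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block, computed directly."""

    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):          # vertical frequency index
        for u in range(N):      # horizontal frequency index
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (
                        block[y][x]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            out[v][u] = scale(u) * scale(v) * total
    return out


# Smooth sample block: brightness rises by one level per pixel from left to right.
ramp_block = [[100 + x for x in range(N)] for _ in range(N)]
coefficients = dct_2d(ramp_block)

dc = coefficients[0][0]  # with this scaling, equals N times the block mean
significant = sum(1 for row in coefficients for c in row if abs(c) > 0.5)
print(round(dc, 3), significant)
```

For this smooth block, only a handful of the 64 coefficients are significant, and all of them lie in the first row because the block does not vary vertically.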
The next step is the quantization of the transformed coefficients in which each sample coefficient is rounded off to the closest digital value from a set of discrete digital values.
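A minimal sketch of this rounding step follows, assuming a single uniform step size for every coefficient. That is a simplification for illustration: practical encoders generally apply different step sizes to different frequencies. The coefficient values and the step size of 8 are hypothetical.

```python
def quantize(coefficients, step):
    """Map each coefficient to the index of the nearest multiple of step."""
    return [[round(c / step) for c in row] for row in coefficients]


def dequantize(levels, step):
    """Reconstruct approximate coefficient values from quantized levels."""
    return [[level * step for level in row] for row in levels]


# Hypothetical corner of a transformed block (large DC value, small AC values).
sample = [
    [830.0, -18.2, 3.1],
    [6.4, -2.0, 0.7],
    [1.9, 0.4, -0.3],
]
levels = quantize(sample, step=8)
reconstructed = dequantize(levels, step=8)
print(levels)
print(reconstructed)
```

The reconstructed values differ slightly from the originals, which is why quantization is the lossy part of the pipeline, and the small coefficients collapse to zero, which is what makes the subsequent coding step effective.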
Encoding follows quantization. Compression schemes typically code the quantized transform coefficients with entropy coding techniques, which are lossless. Variable length coding (VLC) and binary arithmetic coding (BAC) are two entropy coding techniques commonly used in the MPEG-4/H.264 schemes. VLC assigns codes of different lengths to data elements: more frequently occurring elements receive shorter codes and less frequently occurring elements receive longer codes, which reduces the overall number of data bits. BAC codes the whole pattern or message instead of each element or symbol. The H.264 standard uses both context-adaptive VLC (CAVLC) and context-adaptive BAC (CABAC) techniques.
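The variable length coding idea can be sketched with a Huffman code, a classic way to build such a table. This is an illustrative stand-in, not the CAVLC or CABAC algorithms defined by H.264, and the symbol sequence is a hypothetical set of quantized values.

```python
import heapq
from collections import Counter


def build_vlc_table(symbols):
    """Build a prefix-free code table in which frequent symbols get short codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {symbol: "0" for symbol in counts}
    # Heap entries are (count, unique tie-breaker, {symbol: code so far}).
    heap = [
        (count, index, {symbol: ""})
        for index, (symbol, count) in enumerate(sorted(counts.items()))
    ]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count0, _, table0 = heapq.heappop(heap)
        count1, _, table1 = heapq.heappop(heap)
        # Merge the two least frequent groups, prefixing their codes with 0 and 1.
        merged = {s: "0" + code for s, code in table0.items()}
        merged.update({s: "1" + code for s, code in table1.items()})
        heapq.heappush(heap, (count0 + count1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, table):
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    reverse = {code: s for s, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix-free, so the first match is the symbol
            out.append(reverse[current])
            current = ""
    return out


# Hypothetical quantized values: mostly zeros, as expected after quantization.
values = [104, -2, 0, 1, 0, 0, 1, 0, 0, 0, -2, 0, 1, 0, 0, 0]
table = build_vlc_table(values)
bitstream = encode(values, table)
print(table, len(bitstream))
```

Because zero dominates the input, it receives a one-bit code, and the 16 values fit in fewer bits than a fixed two-bit-per-symbol code would need; decoding recovers the values exactly, which reflects the lossless nature of entropy coding.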
The H.264/MPEG-4 compression scheme defines different profiles for different applications. The main profile is best suited to broadcast applications, where quality must be maintained within limited transmission bandwidth. Video conferencing applications, by contrast, require low latency, so the baseline profile is preferred.
Related Altera and Partner IP Solutions
Altera has partnered with intellectual property (IP) developers who are experts in video coder/decoder (CODEC) solutions and offer solutions for different application markets such as video broadcast, video surveillance, and conferencing systems. Altera also offers the video and image processing library, which includes a number of functions required for video coding and processing.
Altera's Solutions for Video Encoders
The feature-rich architecture of Altera's Stratix® series FPGAs provides an excellent solution for developing digital video production and delivery equipment. Stratix series FPGAs offer a highly integrated solution that is ideal for applications that have demanding video processing and I/O functions.
