Friday, August 31, 2012

GOP, I/P/B frame


I noticed some confusion about B frames, so I thought I would write down a quick explanation of an MPEG "GOP", or "Group Of Pictures", as explained to me by a professor.

GOP - Begins with an "I" frame, usually followed by a number of "P" and "B" frames (DivX 5, being MPEG-4 based, uses I and P frames as well, with B frames optional)

- each GOP is independent: all frames needed for predictions are contained within each GOP

- GOPs can be as small as a single I frame, or as large as desired, but usually no more than 15 frames in length.

- the longer the GOP, the more efficient, but the less robust, the coding
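
To make the "each GOP is independent" point concrete, here is a minimal sketch that cuts a string of frame types into GOPs and finds where a decoder would have to start in order to show a given frame. The function names and the sample stream are made up for illustration; this is not taken from any real codec API.

```python
def split_into_gops(frame_types):
    """Split a string of frame types (e.g. "IPBBPBBIPBB") into GOPs.

    Assumes every GOP begins with an 'I' frame, as described above.
    """
    gops = []
    for t in frame_types:
        if t == "I" or not gops:
            gops.append("")      # an I frame opens a new GOP
        gops[-1] += t
    return gops


def gop_start(frame_types, index):
    """Index of the I frame that opens the GOP containing frame `index`."""
    return frame_types.rfind("I", 0, index + 1)


stream = "IPBBPBBIPBBIPBBPBB"    # hypothetical stream
print(split_into_gops(stream))   # ['IPBBPBB', 'IPBB', 'IPBBPBB']
print(gop_start(stream, 9))      # 7
```

Because nothing outside a GOP is needed to decode it, seeking to frame 9 only means decoding from frame 7 onward. A longer GOP means fewer expensive I frames, but also a longer run of frames that depend on a single I frame being intact.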

I frame - "Intra-coded" frames : average 7:1 reduction.

- like JPEG, every video frame is broken into blocks of 8x8 pixels of Y, R-Y, and B-Y (the "quarter pixel" option in DivX 5 is about the precision of motion vectors, not about the block size)

- blocks are grouped into "macroblocks" of 16x16

- macroblocks are grouped horizontally into slices which have similar average block levels.

- multiple slices form a frame, and these frames are the resulting "I" frames.
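
As a rough worked example of the block and macroblock arithmetic above, the sketch below counts macroblocks and 8x8 blocks in one frame. The 720x480 frame size and the 4:2:0 chroma layout (six 8x8 blocks per macroblock) are assumptions chosen for illustration; the text above does not specify either.

```python
def frame_layout(width, height):
    """Count macroblocks and 8x8 blocks in one frame.

    Assumes 4:2:0 chroma subsampling (an assumption, not stated in the post):
    each 16x16 macroblock then holds four 8x8 Y blocks plus one 8x8 block
    for each of the two colour-difference signals (B-Y and R-Y).
    """
    mb_cols = -(-width // 16)    # ceiling division: partial macroblocks are padded
    mb_rows = -(-height // 16)
    macroblocks = mb_cols * mb_rows
    blocks = macroblocks * 6     # 4 luma + 2 chroma blocks per macroblock
    return mb_cols, mb_rows, macroblocks, blocks


print(frame_layout(720, 480))    # (45, 30, 1350, 8100)
```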

P frame - P frames are predicted based on prior I or P frames plus the addition of data for changed macroblocks.

- average about 20:1 reduction, or roughly a third the size of I frames (20:1 versus 7:1)

- MPEG2 uses these, and so does DivX 5 (it is MPEG-4 based).

B frame - Bidirectionally predicted frames based on the appearance and positions of macroblocks in past and future frames.

- B frames require less data than P frames, averaging about 50:1 reduction.

- B frames require more decoder buffer memory because 2 frames are compared during the reconstruction process.

- B frames also require manipulation of the coding order: frames moving from the coder to the decoder are NOT in presentation sequence. 
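
The reordering in the last point can be sketched in a few lines. This is a simplified model, assuming each B frame only needs the nearest I or P frame on either side; the frame labels such as "I0" and "B1" are hypothetical, with the number giving the presentation position.

```python
def display_to_coded(display):
    """Reorder frame labels from presentation order to coded order.

    A B frame needs the next I or P frame (its future reference), so that
    reference is emitted first, followed by the B frames waiting on it.
    """
    coded, pending_b = [], []
    for frame in display:
        if frame[0] == "B":
            pending_b.append(frame)      # hold until the future reference is sent
        else:
            coded.append(frame)
            coded.extend(pending_b)
            pending_b = []
    coded.extend(pending_b)              # trailing B frames with no later reference here
    return coded


def coded_to_display(coded):
    """Decoder side: put frames back in presentation order by their number."""
    return sorted(coded, key=lambda f: int(f[1:]))


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
coded = display_to_coded(display)
print(coded)                     # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
print(coded_to_display(coded))   # ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6']
```

The decoder has to keep both I0 and P3 around while it rebuilds B1 and B2, which is where the extra buffer memory goes.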

Basically, the B frame will say something like "this frame is the same as the GOP's "I" frame except for this one part; I will only contain the data needed to encode this one part, and combine it with the info from the I frame", in layman's terms of course. This gives DivX5 its optimal reduction capability.

This also means, of course, that the P3500 media box in your living room might struggle with decoding a high rate D5 encode (not sure about that; D5 is a more intense encoding/decoding process, but DVDs use I, P, and B frames too, sooooooo...)

Oh, BTW, in MPEG2 at least, a typical GOP order (as coded) is IPBBPBBPBBIPBBPBB etc. (depending on your GOP size): 1 I, 1 P, and 2 B's, and then you can stack more groups of "PBB" in that one GOP if needed (usually up to 15 total frames).
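
Taking the I + "PBB" groups pattern and the average reductions quoted earlier (7:1, 20:1, 50:1) at face value, a back-of-the-envelope estimate of the overall reduction for one GOP might look like this. The function names are invented for the example, and real streams will vary a lot with content and bitrate.

```python
RATIOS = {"I": 7, "P": 20, "B": 50}   # average reductions quoted in the post


def gop_coded_pattern(groups):
    """One I frame followed by `groups` repetitions of "PBB" (coded order)."""
    return "I" + "PBB" * groups


def average_reduction(pattern):
    """Overall reduction for a GOP, treating every raw frame as size 1."""
    compressed = sum(1 / RATIOS[t] for t in pattern)
    return len(pattern) / compressed


pattern = gop_coded_pattern(4)
print(pattern)                                # IPBBPBBPBBPBB
print(round(average_reduction(pattern), 1))   # 25.9
print(round(average_reduction("I"), 1))       # 7.0
```

With these numbers, a 13-frame GOP averages out at roughly 26:1, against 7:1 for a stream made of I frames only, which is the "longer GOP is more efficient" trade-off in figures.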

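I frame: a reference picture; it contains one complete image.
P frame: not a complete image; it records the differences from the preceding I frame or P frame (it references earlier frames only).
B frame: not a complete image; it records the differences from both the preceding and the following I frame or P frame (it references frames in both directions).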



There are three types of pictures (or frames) used in video compression: I‑frames, P‑frames, and B‑frames.
An I‑frame is an 'Intra-coded picture', in effect a fully specified picture, like a conventional static image file. P‑frames and B‑frames hold only part of the image information, so they need less space to store than an I‑frame, and thus improve video compression rates.
A P‑frame ('Predicted picture') holds only the changes in the image from the previous frame. For example, in a scene where a car moves across a stationary background, only the car's movements need to be encoded. The encoder does not need to store the unchanging background pixels in the P‑frame, thus saving space. P‑frames are also known as delta‑frames.
A B‑frame ('Bi-predictive picture') saves even more space by using differences between the current frame and both the preceding and following frames to specify its content.
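
A toy version of the delta idea, working on single rows of pixel values, is sketched below. This is a deliberately simplified model: real encoders predict motion-compensated macroblocks rather than comparing individual pixels, and all function names here are hypothetical.

```python
def encode_p(prev, cur):
    """P-style delta: store only the pixels that differ from the previous frame."""
    return {i: v for i, (p, v) in enumerate(zip(prev, cur)) if p != v}


def decode_p(prev, delta):
    frame = list(prev)
    for i, v in delta.items():
        frame[i] = v
    return frame


def encode_b(prev, nxt, cur):
    """B-style residual against the average of the previous and next frames."""
    return [c - (p + n) // 2 for p, n, c in zip(prev, nxt, cur)]


def decode_b(prev, nxt, residual):
    return [(p + n) // 2 + r for p, n, r in zip(prev, nxt, residual)]


# A "car" (value 9) moving over a static background (value 0).
frame0 = [9, 0, 0, 0, 0, 0]
frame1 = [0, 9, 0, 0, 0, 0]
print(encode_p(frame0, frame1))   # {0: 0, 1: 9}

# A fade: the middle frame sits almost exactly between its two neighbours.
dark, mid, bright = [10, 10, 10, 10], [15, 15, 15, 16], [20, 20, 20, 20]
print(encode_b(dark, bright, mid))   # [0, 0, 0, 1]
```

In both cases most of the frame costs nothing to describe: the P-style delta lists two changed pixels, and the B-style residual is almost all zeros.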

Sources: http://forum.doom9.org/archive/index.php/t-19436.html
http://en.wikipedia.org/wiki/Video_compression_picture_types
