Blog of roxlu, co-founder of Apollo Media. Contact info[shift+2]apollomedia.nl.

Decoding H264 and YUV420P playback

The code below shows a minimal example of how to create a video player using libav and OpenGL. The libav implementation is pretty basic. We first make sure to register all the codecs using the avcodec_register_all() function.

Then we find a suitable decoder using avcodec_find_decoder(AV_CODEC_ID_H264). Once we have found a decoder, we create a codec context, which keeps track of the general state of the decoding process. We open the file and initialize the h264 parser using av_parser_init(AV_CODEC_ID_H264). Note that libav provides other ways to decode a video stream, but I'm using libav to decode a h264 stream that I get directly from a Logitech C920 webcam, so I don't need one of the other approaches. Once we have opened the file and initialized the parser, we use av_parser_parse2() to parse the h264 bitstream. When it finds a complete packet, we decode the frame using avcodec_decode_video2() and call the callback function that was passed to the constructor of the H264_Decoder class.
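
To give an idea of what the parser is looking for: a raw h264 byte stream (Annex B) separates its NAL units with 00 00 01 start codes, and the low 5 bits of the byte after a start code hold the NAL unit type (for example 7 for an SPS, 8 for a PPS and 5 for an IDR slice). The sketch below is not how libav implements its parser; it is just a minimal, self-contained scanner, and the names `NalUnit` and `findNalUnits()` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One NAL unit found in an Annex B byte stream (hypothetical helper type).
struct NalUnit {
  size_t offset;  // index of the NAL header byte, i.e. the first byte after the start code
  int type;       // nal_unit_type: the low 5 bits of the NAL header byte
};

// Scan `data` for 00 00 01 start codes and collect the NAL units that follow them.
// A four byte start code (00 00 00 01) is matched through its last three bytes.
std::vector<NalUnit> findNalUnits(const uint8_t* data, size_t size) {
  assert(data != NULL || size == 0);

  std::vector<NalUnit> units;

  for (size_t i = 0; i + 3 < size; ++i) {
    if (data[i] == 0x00 && data[i + 1] == 0x00 && data[i + 2] == 0x01) {
      NalUnit nal;
      nal.offset = i + 3;
      nal.type = data[i + 3] & 0x1F;
      units.push_back(nal);
      i += 2;  // skip the rest of the start code
    }
  }

  return units;
}
```

Feeding this function the first bytes of a file typically shows an SPS and a PPS followed by the first slice. The real parser does more work: it groups the NAL units that belong to one picture into a single packet, which is what the decoder expects.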

When we're done decoding all the frames (note that this code doesn't decode any postponed packets), we clean up using av_parser_close() and avcodec_close(). Of course we also need to free any allocated memory using av_free().

See below for the H264 decoding process. Note that it's just a quick experiment and it might not be the ideal way of parsing a h264 video stream; it works fine though. Further down you'll find the OpenGL code I used to play back the YUV420P buffers I receive from libav.

H264_Decoder.h

/*
 
  H264_Decoder
  ---------------------------------------
 
  Example that shows how to use the libav parser system. This class forces a 
  H264 parser and codec. You use it by opening a file that is encoded with x264 
  using the `load()` function. You can pass the framerate you want to use for playback.
  If you don't pass the framerate, we will detect it as soon as the parser found 
  the correct information.
 
  After calling load(), you can call readFrame() which will read a new frame when
  necessary. It will also make sure that it will read enough data from the buffer/file
  when there is not enough data in the buffer.
 
  `readFrame()` will trigger calls to the given `h264_decoder_callback` that you pass
  to the constructor. 
 
 */
#ifndef H264_DECODER_H
#define H264_DECODER_H
 
#define H264_INBUF_SIZE 16384                                                           /* number of bytes we read per chunk */
 
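#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>                                                                      /* uint8_t, uint64_t */
#include <algorithm>                                                                     /* std::copy */
#include <iterator>                                                                      /* std::back_inserter */
#include <string>
#include <vector>
#include <tinylib.h>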
 
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/avutil.h>
}
 
typedef void(*h264_decoder_callback)(AVFrame* frame, AVPacket* pkt, void* user);         /* the decoder callback, which will be called when we have decoded a frame */
 
class H264_Decoder {
 
 public:
  H264_Decoder(h264_decoder_callback frameCallback, void* user);                         /* pass in a callback function that is called whenever we decoded a video frame, make sure to call `readFrame()` repeatedly */
  ~H264_Decoder();                                                                       /* d'tor, cleans up the allocated objects and closes the codec context */
  bool load(std::string filepath, float fps = 0.0f);                                     /* load a video file which is encoded with x264 */
  bool readFrame();                                                                      /* read a frame if necessary */
 
 private:
  bool update(bool& needsMoreBytes);                                                     /* internally used to update/parse the data we read from the buffer or file */
  int readBuffer();                                                                      /* read a bit more data from the buffer */
  void decodeFrame(uint8_t* data, int size);                                             /* decode a frame we read from the buffer */
 
 public:
  AVCodec* codec;                                                                        /* the AVCodec* which represents the H264 decoder */
  AVCodecContext* codec_context;                                                         /* the context; keeps generic state */
  AVCodecParserContext* parser;                                                          /* parser that is used to decode the h264 bitstream */
  AVFrame* picture;                                                                      /* will contain a decoded picture */
  uint8_t inbuf[H264_INBUF_SIZE + FF_INPUT_BUFFER_PADDING_SIZE];                         /* used to read chunks from the file */
  FILE* fp;                                                                              /* file pointer to the file from which we read the h264 data */
  int frame;                                                                             /* the number of decoded frames */
  h264_decoder_callback cb_frame;                                                        /* the callback function which will receive the frame/packet data */
  void* cb_user;                                                                         /* the void* with user data that is passed into the set callback */
  uint64_t frame_timeout;                                                                /* timeout when we need to parse a new frame */
  uint64_t frame_delay;                                                                  /* delay between frames (in ns) */
  std::vector<uint8_t> buffer;                                                           /* buffer we use to keep track of read/unused bitstream data */
};
 
#endif

H264_Decoder.cpp

#include "H264_Decoder.h"
 
H264_Decoder::H264_Decoder(h264_decoder_callback frameCallback, void* user) 
  :codec(NULL)
  ,codec_context(NULL)
  ,parser(NULL)
  ,picture(NULL)
  ,fp(NULL)
  ,frame(0)
  ,cb_frame(frameCallback)
  ,cb_user(user)
  ,frame_timeout(0)
  ,frame_delay(0)
{
  avcodec_register_all();
}
 
H264_Decoder::~H264_Decoder() {
 
  if(parser) {
    av_parser_close(parser);
    parser = NULL;
  }
 
  if(codec_context) {
    avcodec_close(codec_context);
    av_free(codec_context);
    codec_context = NULL;
  }
 
  if(picture) {
    // a frame created with av_frame_alloc() is released with av_frame_free(), which also resets the pointer
    av_frame_free(&picture);
  }
 
  if(fp) {
    fclose(fp);
    fp = NULL;
  }
 
  cb_frame = NULL;
  cb_user = NULL;
  frame = 0;
  frame_timeout = 0;
}
 
bool H264_Decoder::load(std::string filepath, float fps) {
 
  codec = avcodec_find_decoder(AV_CODEC_ID_H264);
  if(!codec) {
    printf("Error: cannot find the h264 codec: %s\n", filepath.c_str());
    return false;
  }
 
  codec_context = avcodec_alloc_context3(codec);
  if(!codec_context) {
    printf("Error: cannot allocate the codec context.\n");
    return false;
  }
 
  if(codec->capabilities & CODEC_CAP_TRUNCATED) {
    codec_context->flags |= CODEC_FLAG_TRUNCATED;
  }
 
  if(avcodec_open2(codec_context, codec, NULL) < 0) {
    printf("Error: could not open codec.\n");
    return false;
  }
 
  fp = fopen(filepath.c_str(), "rb");
 
  if(!fp) {
    printf("Error: cannot open: %s\n", filepath.c_str());
    return false;
  }
 
  picture = av_frame_alloc();
  parser = av_parser_init(AV_CODEC_ID_H264);
 
  if(!parser) {
    printf("Error: cannot create H264 parser.\n");
    return false;
  }
 
  if(fps > 0.0001f) {
    frame_delay = (1.0f/fps) * 1000ull * 1000ull * 1000ull;
    frame_timeout = rx_hrtime() + frame_delay;
  }
 
  // kickoff reading...
  readBuffer();
 
  return true;
}
 
bool H264_Decoder::readFrame() {
 
  uint64_t now = rx_hrtime();
  if(now < frame_timeout) {
    return false;
  }
 
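  if(!fp) {
    return false;
  }

  bool needs_more = false;

  while(!update(needs_more)) { 
    // stop at the end of the file, otherwise we would keep spinning here
    if(needs_more && readBuffer() <= 0) {
      return false;
    }
  }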
 
  // it may take some 'reads' before we can set the fps
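  if(frame_timeout == 0 && frame_delay == 0) {
    // time_base is the duration of one tick in seconds, not the fps; libav
    // typically reports two ticks per frame for h264, so we scale by ticks_per_frame
    double frame_duration = av_q2d(codec_context->time_base) * codec_context->ticks_per_frame;
    if(frame_duration > 0.0) {
      frame_delay = frame_duration * 1000ull * 1000ull * 1000ull;
    }
  }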
 
  if(frame_delay > 0) {
    frame_timeout = rx_hrtime() + frame_delay;
  }
 
  return true;
}
 
void H264_Decoder::decodeFrame(uint8_t* data, int size) {
 
  AVPacket pkt;
  int got_picture = 0;
  int len = 0;
 
  av_init_packet(&pkt);
 
  pkt.data = data;
  pkt.size = size;
 
  len = avcodec_decode_video2(codec_context, picture, &got_picture, &pkt);
  if(len < 0) {
    printf("Error while decoding a frame.\n");
  }
 
  if(got_picture == 0) {
    return;
  }
 
  ++frame;
 
  if(cb_frame) {
    cb_frame(picture, &pkt, cb_user);
  }
}
 
int H264_Decoder::readBuffer() {
 
  int bytes_read = (int)fread(inbuf, 1, H264_INBUF_SIZE, fp);
 
  if(bytes_read) {
    std::copy(inbuf, inbuf + bytes_read, std::back_inserter(buffer));
  }
 
  return bytes_read;
}
 
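bool H264_Decoder::update(bool& needsMoreBytes) {
 
  needsMoreBytes = false;
 
  if(!fp) {
    printf("Cannot update .. file not opened...\n");
    return false;
  }
 
  if(buffer.size() == 0) {
    needsMoreBytes = true;
    return false;
  }
 
  // `data` and `size` receive the packet the parser assembled; the last three
  // arguments are pts, dts and pos (in that order).
  uint8_t* data = NULL;
  int size = 0;
  int len = av_parser_parse2(parser, codec_context, &data, &size, 
                             &buffer[0], (int)buffer.size(), 
                             AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
 
  if(len < 0) {
    printf("Error: the parser could not handle the data.\n");
    buffer.clear();
    needsMoreBytes = true;
    return false;
  }
 
  // `data` may point into `buffer`, so decode before we erase anything
  bool got_packet = (size > 0);
  if(got_packet) {
    decodeFrame(data, size);
  }
 
  // the parser keeps its own copy of the `len` bytes it consumed
  buffer.erase(buffer.begin(), buffer.begin() + len);
 
  if(!got_packet) {
    needsMoreBytes = true;
    return false;
  }
 
  return true;
}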
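
The timing in `load()` and `readFrame()` boils down to converting a frame rate, or the time base reported by the codec context, into a delay in nanoseconds. Here is a small standalone sketch of that arithmetic using integers for the rational case. The function names are hypothetical, and the `ticksPerFrame` parameter assumes that the time base counts ticks rather than frames (libav's AVCodecContext has a `ticks_per_frame` field for this, which is typically 2 for h264).

```cpp
#include <cassert>
#include <cstdint>

// Nanoseconds between two frames for a playback rate in frames per second.
uint64_t frameDelayFromFps(double fps) {
  assert(fps > 0.0);
  return (uint64_t)(1000000000.0 / fps + 0.5);  // round to the nearest nanosecond
}

// Nanoseconds between two frames for a rational time base of num/den seconds
// per tick and a given number of ticks per frame. The result is truncated.
uint64_t frameDelayFromTimeBase(int num, int den, int ticksPerFrame) {
  assert(num > 0 && den > 0 && ticksPerFrame > 0);
  return ((uint64_t)num * (uint64_t)ticksPerFrame * 1000000000ull) / (uint64_t)den;
}

// True when the current time has reached the moment the next frame is due.
bool frameIsDue(uint64_t now, uint64_t timeout) {
  return now >= timeout;
}
```

For example, a time base of 1/50 with two ticks per frame gives a delay of 40000000 ns, which is the same as passing 25 fps directly.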

The code below implements a YUV420P player which can play back the buffers we receive from the h264 decoder. Because the conversion to RGB happens in the fragment shader, we don't have to perform a YUV to RGB conversion on the CPU.

YUV420P_Player.h

/*
 
  YUV420P Player
  --------------
 
  This class implements a simple YUV420P renderer. This means that you 
  need to feed planar YUV420 data to the `setYPixels()`, `setUPixels()`
  and `setVPixels()`. 
 
  First make sure to call setup() with the video width and height. We use
  these dimensions to allocate the Y, U and V textures. After calling setup
  you call the `set{Y,U,V}Pixels()` functions every time you have a new frame that
  you want to render. With the `draw()` function you draw the current 
  frame to the screen.
 
  If you resize your viewport, make sure to call `resize()` so we can 
  adjust the projection matrix.
 
 */
#ifndef ROXLU_YUV420P_PLAYER_H
#define ROXLU_YUV420P_PLAYER_H
 
#define ROXLU_USE_MATH
#define ROXLU_USE_PNG
#define ROXLU_USE_OPENGL
#include <tinylib.h>
#include <stdint.h>
 
static const char* YUV420P_VS = "" 
  "#version 330\n"
  ""
  "uniform mat4 u_pm;"
  "uniform vec4 draw_pos;"
  ""
  "const vec2 verts[4] = vec2[] ("
  "  vec2(-0.5,  0.5), "
  "  vec2(-0.5, -0.5), "
  "  vec2( 0.5,  0.5), "
  "  vec2( 0.5, -0.5)  "
  ");"
  ""
  "const vec2 texcoords[4] = vec2[] ("
  "  vec2(0.0, 1.0), "
  "  vec2(0.0, 0.0), "
  "  vec2(1.0, 1.0), "
  "  vec2(1.0, 0.0)  "
  "); "
  ""
  "out vec2 v_coord; "
  ""
  "void main() {"
  "   vec2 vert = verts[gl_VertexID];"
  "   vec4 p = vec4((0.5 * draw_pos.z) + draw_pos.x + (vert.x * draw_pos.z), "
  "                 (0.5 * draw_pos.w) + draw_pos.y + (vert.y * draw_pos.w), "
  "                 0, 1);"
  "   gl_Position = u_pm * p;"
  "   v_coord = texcoords[gl_VertexID];" 
  "}"
  "";
 
static const char* YUV420P_FS = ""
 "#version 330\n"
  "uniform sampler2D y_tex;"
  "uniform sampler2D u_tex;"
  "uniform sampler2D v_tex;"
  "in vec2 v_coord;"
  "layout( location = 0 ) out vec4 fragcolor;"
  ""
  "const vec3 R_cf = vec3(1.164383,  0.000000,  1.596027);"
  "const vec3 G_cf = vec3(1.164383, -0.391762, -0.812968);"
  "const vec3 B_cf = vec3(1.164383,  2.017232,  0.000000);"
  "const vec3 offset = vec3(-0.0625, -0.5, -0.5);"
  ""
  "void main() {"
  "  float y = texture(y_tex, v_coord).r;"
  "  float u = texture(u_tex, v_coord).r;"
  "  float v = texture(v_tex, v_coord).r;"
  "  vec3 yuv = vec3(y,u,v);"
  "  yuv += offset;"
  "  fragcolor = vec4(0.0, 0.0, 0.0, 1.0);"
  "  fragcolor.r = dot(yuv, R_cf);"
  "  fragcolor.g = dot(yuv, G_cf);"
  "  fragcolor.b = dot(yuv, B_cf);"
  "}"
  "";
 
class YUV420P_Player {
 
 public:
  YUV420P_Player();
  bool setup(int w, int h);
  void setYPixels(uint8_t* pixels, int stride);
  void setUPixels(uint8_t* pixels, int stride);
  void setVPixels(uint8_t* pixels, int stride);
  void draw(int x, int y, int w = 0, int h = 0);
  void resize(int winW, int winH);
 
 private:
  bool setupTextures();
  bool setupShader();
 
 public:
  int vid_w;
  int vid_h;
  int win_w;
  int win_h;
  GLuint vao;
  GLuint y_tex;
  GLuint u_tex;
  GLuint v_tex;
  GLuint vert;
  GLuint frag;
  GLuint prog;
  GLint u_pos;
  bool textures_created;
  bool shader_created;
  uint8_t* y_pixels;
  uint8_t* u_pixels;
  uint8_t* v_pixels;
  mat4 pm;
};
#endif
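
If you want to check what the fragment shader above produces, the same math can be written out on the CPU. The sketch below uses the coefficients and offsets from the shader and normalizes the bytes by 255, like sampling a GL_R8 texture does. The names `RGB8`, `yuvToRgb()` and `pixelAt()` are made up for this example, and `pixelAt()` picks the nearest chroma sample while the player filters the textures with GL_LINEAR, so the results can differ slightly around edges.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct RGB8 {
  uint8_t r, g, b;
};

// Clamp a normalized channel to [0, 1] and scale it to 0..255.
static uint8_t toByte(float c) {
  if (c < 0.0f) { c = 0.0f; }
  if (c > 1.0f) { c = 1.0f; }
  return (uint8_t)std::lround(c * 255.0f);
}

// Convert one YUV sample to RGB with the coefficients of the shader.
RGB8 yuvToRgb(uint8_t y8, uint8_t u8, uint8_t v8) {
  float y = y8 / 255.0f - 0.0625f;  // same as `yuv += offset` in the shader
  float u = u8 / 255.0f - 0.5f;
  float v = v8 / 255.0f - 0.5f;

  RGB8 out;
  out.r = toByte(1.164383f * y + 1.596027f * v);                  // dot(yuv, R_cf)
  out.g = toByte(1.164383f * y - 0.391762f * u - 0.812968f * v);  // dot(yuv, G_cf)
  out.b = toByte(1.164383f * y + 2.017232f * u);                  // dot(yuv, B_cf)
  return out;
}

// Look up pixel (x, y) in planar YUV420: the U and V planes have half the
// width and height of the Y plane, so one chroma sample covers 2x2 pixels.
RGB8 pixelAt(const uint8_t* yPlane, int yStride,
             const uint8_t* uPlane, const uint8_t* vPlane, int uvStride,
             int x, int y) {
  assert(x >= 0 && y >= 0);
  int c = (y / 2) * uvStride + (x / 2);
  return yuvToRgb(yPlane[y * yStride + x], uPlane[c], vPlane[c]);
}
```

With these coefficients a sample of Y=16, U=128, V=128 comes out (almost) black and Y=235, U=128, V=128 comes out (almost) white, which is a quick sanity check when the colors on screen look off.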

YUV420P_Player.cpp

#include <assert.h>
#include <stdio.h>
#include "YUV420P_Player.h"
 
YUV420P_Player::YUV420P_Player()
  :vid_w(0)
  ,vid_h(0)
  ,win_w(0)
  ,win_h(0)
  ,vao(0)
  ,y_tex(0)
  ,u_tex(0)
  ,v_tex(0)
  ,vert(0)
  ,frag(0)
  ,prog(0)
  ,u_pos(-1)
  ,textures_created(false)
  ,shader_created(false)
  ,y_pixels(NULL)
  ,u_pixels(NULL)
  ,v_pixels(NULL)
{
}
 
bool YUV420P_Player::setup(int vidW, int vidH) {
 
  vid_w = vidW;
  vid_h = vidH;
 
  if(!vid_w || !vid_h) {
    printf("Invalid texture size.\n");
    return false;
  }
 
  if(y_pixels || u_pixels || v_pixels) {
    printf("Already setup the YUV420P_Player.\n");
    return false;
  }
 
  y_pixels = new uint8_t[vid_w * vid_h];
  u_pixels = new uint8_t[int((vid_w * 0.5) * (vid_h * 0.5))];
  v_pixels = new uint8_t[int((vid_w * 0.5) * (vid_h * 0.5))];
 
  if(!setupTextures()) {
    return false;
  }
 
  if(!setupShader()) {
    return false;
  }
 
  glGenVertexArrays(1, &vao);
 
  return true;
}
 
bool YUV420P_Player::setupShader() {
 
  if(shader_created) {
    printf("Already created the shader.\n");
    return false;
  }
 
  vert = rx_create_shader(GL_VERTEX_SHADER, YUV420P_VS);
  frag = rx_create_shader(GL_FRAGMENT_SHADER, YUV420P_FS);
  prog = rx_create_program(vert, frag);
 
  glLinkProgram(prog);
  rx_print_shader_link_info(prog);
 
  glUseProgram(prog);
  glUniform1i(glGetUniformLocation(prog, "y_tex"), 0);
  glUniform1i(glGetUniformLocation(prog, "u_tex"), 1);
  glUniform1i(glGetUniformLocation(prog, "v_tex"), 2);
 
  u_pos = glGetUniformLocation(prog, "draw_pos");
 
  GLint viewport[4];
  glGetIntegerv(GL_VIEWPORT, viewport);
  resize(viewport[2], viewport[3]);
 
  shader_created = true;
  return true;
}
 
bool YUV420P_Player::setupTextures() {
 
  if(textures_created) {
    printf("Textures already created.\n");
    return false;
  }
 
  glGenTextures(1, &y_tex);
  glBindTexture(GL_TEXTURE_2D, y_tex);
  glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, vid_w, vid_h, 0, GL_RED, GL_UNSIGNED_BYTE, NULL); 
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
 
  glGenTextures(1, &u_tex);
  glBindTexture(GL_TEXTURE_2D, u_tex);
  glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, vid_w/2, vid_h/2, 0, GL_RED, GL_UNSIGNED_BYTE, NULL);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
 
  glGenTextures(1, &v_tex);
  glBindTexture(GL_TEXTURE_2D, v_tex);
  glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, vid_w/2, vid_h/2, 0, GL_RED, GL_UNSIGNED_BYTE, NULL);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
 
  textures_created = true;
  return true;
}
 
void YUV420P_Player::draw(int x, int y, int w, int h) {
  assert(textures_created == true);
 
  if(w == 0) {
    w = vid_w;
  }
 
  if(h == 0) {
    h = vid_h;
  }
 
  glBindVertexArray(vao);
  glUseProgram(prog);
 
  glUniform4f(u_pos, x, y, w, h);
 
  glActiveTexture(GL_TEXTURE0);
  glBindTexture(GL_TEXTURE_2D, y_tex);
 
  glActiveTexture(GL_TEXTURE1);
  glBindTexture(GL_TEXTURE_2D, u_tex);
 
  glActiveTexture(GL_TEXTURE2);
  glBindTexture(GL_TEXTURE_2D, v_tex);
 
  glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}
 
void YUV420P_Player::resize(int winW, int winH) {
  assert(winW > 0 && winH > 0);
 
  win_w = winW;
  win_h = winH;
 
  pm.identity();
  pm.ortho(0, win_w, win_h, 0, 0.0, 100.0f);
 
  glUseProgram(prog);
  glUniformMatrix4fv(glGetUniformLocation(prog, "u_pm"), 1, GL_FALSE, pm.ptr());
}
 
void YUV420P_Player::setYPixels(uint8_t* pixels, int stride) {
  assert(textures_created == true);
 
  glBindTexture(GL_TEXTURE_2D, y_tex);
  glPixelStorei(GL_UNPACK_ROW_LENGTH, stride);
  glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, vid_w, vid_h, GL_RED, GL_UNSIGNED_BYTE, pixels);
}
 
void YUV420P_Player::setUPixels(uint8_t* pixels, int stride) {
  assert(textures_created == true);
 
  glBindTexture(GL_TEXTURE_2D, u_tex);
  glPixelStorei(GL_UNPACK_ROW_LENGTH, stride);
  glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, vid_w/2, vid_h/2, GL_RED, GL_UNSIGNED_BYTE, pixels);
}
 
void YUV420P_Player::setVPixels(uint8_t* pixels, int stride) {
  assert(textures_created == true);
 
  glBindTexture(GL_TEXTURE_2D, v_tex);
  glPixelStorei(GL_UNPACK_ROW_LENGTH, stride);
  glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, vid_w/2, vid_h/2, GL_RED, GL_UNSIGNED_BYTE, pixels);
}
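
A note on the `stride` parameter: the rows of a decoded plane can be further apart in memory than the video is wide, because the decoder may pad each row. Setting GL_UNPACK_ROW_LENGTH tells OpenGL how long a source row really is, so the padding is skipped during the upload. The sketch below shows the CPU equivalent, copying a padded plane into a tightly packed buffer. It is only meant to illustrate what the stride means; the player above doesn't need this copy, and the name `packPlane()` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy a plane whose rows are `stride` bytes apart into a tightly packed
// buffer of width * height bytes, dropping the padding at the end of each row.
std::vector<uint8_t> packPlane(const uint8_t* src, int stride, int width, int height) {
  assert(src != NULL && width > 0 && height > 0 && stride >= width);

  std::vector<uint8_t> dst((size_t)width * (size_t)height);

  for (int row = 0; row < height; ++row) {
    std::memcpy(&dst[(size_t)row * width], src + (size_t)row * stride, (size_t)width);
  }

  return dst;
}
```

For a YUV420P frame you would use the full width and height for the Y plane and half of both for the U and V planes, each with its own stride.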