Blog of roxlu, co-founder of Apollo Media. Contact info[shift+2]apollomedia.nl.

OpenGL Instanced Rendering

When working on visualisations, interactive installations, etc., I often use particles that respond to user interaction. A common way of doing this is to create a particle system, iterate over the particles, and call e.g. glDrawArrays() after updating the model matrix for each particle.

But one of the goals of modern OpenGL is to move as much work as possible to the GPU and to limit the number of calls to OpenGL itself. Imagine having thousands of particles and drawing them with glDrawArrays() in a loop like this:

// one uniform upload and one draw call per particle, every frame
for(std::vector<Particle>::iterator it = particles.begin(); it != particles.end(); ++it) {
  Particle& particle = *it;                                   
  glUniformMatrix4fv(matrix_loc, 1, GL_FALSE, particle.model_matrix.getPtr());
  glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);                             
}

That means you're making two OpenGL calls per particle, thousands per frame, which is far from optimal and can become a real bottleneck in simulations. Luckily there is a better, faster way called instanced rendering. Once the per-particle data lives in a buffer, drawing all the particles takes a single call:

glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, particles.size());

Instanced Rendering

The goal of instanced rendering is to make fewer calls to OpenGL. For a particle system it's a minor change to your existing code that can give you a huge boost. In the system I'm working on at the moment, I simply upload the complete vector that contains the particles. This probably means transferring a bit more memory to OpenGL than strictly necessary, but that is outweighed by being able to draw thousands of particles with just one call.
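To make that trade-off concrete, here is a rough per-frame estimate as a small C++ sketch. The helper names are hypothetical and nothing here calls OpenGL. It assumes the per-particle loop from the top of this post (two calls per particle) and, for the instanced case, one glBufferSubData() plus one glDrawArraysInstanced() per frame. The 36 and 12 byte figures assume the WaterDrop struct from the example below (nine floats) versus only the position and size the shader actually reads (three floats).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for a rough per-frame estimate; nothing here calls OpenGL.

// Per-particle loop: one glUniformMatrix4fv + one glDrawArrays per particle.
std::size_t gl_calls_loop(std::size_t num_particles) {
  return 2 * num_particles;
}

// Instanced: one glBufferSubData for the whole vector + one
// glDrawArraysInstanced, independent of the particle count.
std::size_t gl_calls_instanced() {
  return 2;
}

// Bytes sent to the GPU per frame when the particle array is uploaded.
std::size_t upload_bytes(std::size_t num_particles, std::size_t bytes_per_particle) {
  assert(bytes_per_particle > 0);
  return num_particles * bytes_per_particle;
}
```

For 1000 particles that is 2000 calls versus 2, and uploading the complete struct sends 36000 bytes per frame, where a buffer holding only position and size would send 12000.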

The key element in making this work is glVertexAttribDivisor(). With instanced rendering you basically tell OpenGL: draw N instances, each made of K vertices, using e.g. GL_TRIANGLE_STRIP. So if you have 1000 particles and want to draw each one as a square, you tell it to draw 4 vertices (4 vertices make a square with GL_TRIANGLE_STRIP) and to repeat that 1000 times.
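One way to picture what such a draw call asks for is to list the vertex shader invocations it produces. The sketch below is a hypothetical CPU model, not OpenGL code: for glDrawArraysInstanced(mode, first, count, instancecount) every instance processes the same `count` vertices, gl_VertexID runs from `first` to `first + count - 1` for each instance, and gl_InstanceID tells the instances apart. The GPU runs these invocations in parallel, so only the set of pairs is meaningful, not their order.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU model of the vertex-shader invocations produced by
// glDrawArraysInstanced(mode, first, count, instancecount).
// Each entry is a (gl_VertexID, gl_InstanceID) pair.
std::vector<std::pair<int, int>> instanced_invocations(int first, int count, int instance_count) {
  assert(first >= 0 && count >= 0 && instance_count >= 0);
  std::vector<std::pair<int, int>> invocations;
  invocations.reserve(static_cast<std::size_t>(count) * static_cast<std::size_t>(instance_count));
  for (int instance = 0; instance < instance_count; ++instance) {
    for (int i = 0; i < count; ++i) {
      // gl_VertexID restarts at `first` for every instance.
      invocations.emplace_back(first + i, instance);
    }
  }
  return invocations;
}
```

For the 1000 squares above, `instanced_invocations(0, 4, 1000)` yields 4000 pairs, from (0, 0) through (3, 999).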

You use glVertexAttribDivisor() to control how OpenGL steps through your VBO data, which here contains the vector of particles: it tells OpenGL to advance a vertex attribute only once every X instances instead of once per vertex. So glVertexAttribDivisor(0, 1) means that attribute 0 advances once per instance, i.e. once per particle. The 0 is the vertex attribute location and the 1 is the divisor: the number of consecutive instances that share one element before OpenGL moves on to the next. A divisor of 0 (the default) means the attribute advances per vertex as usual.
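The divisor rule is easy to state as a function. This is a hypothetical helper that mirrors the fetch rule for glDrawArraysInstanced with a base instance of 0: with divisor 0 the element index is the vertex index, with divisor d it is the instance index divided by d (integer division). The second helper turns that element into a byte offset in an interleaved VBO, the way glVertexAttribPointer's stride and offset describe it; it assumes an explicit stride, so GL's "stride 0 means tightly packed" shortcut is not modelled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: which element of a vertex attribute array is read
// for a given vertex/instance (glDrawArraysInstanced, base instance 0).
//   divisor == 0 -> advances per vertex (element = vertex index)
//   divisor == d -> advances once every d instances (element = instance / d)
std::size_t attribute_element(std::size_t vertex_index, std::size_t instance_id,
                              unsigned divisor) {
  if (divisor == 0) {
    return vertex_index;
  }
  return instance_id / divisor;
}

// Byte offset of that element in an interleaved VBO, as described by the
// stride and offset arguments of glVertexAttribPointer.
// Assumes an explicit byte stride (stride 0 = "tightly packed" is not modelled).
std::size_t attribute_byte_offset(std::size_t element, std::size_t stride,
                                  std::size_t offset) {
  assert(stride > 0);
  return offset + element * stride;
}
```

With a divisor of 1, all four vertices of instance 2 read element 2; for a float stored at offset 32 in a 36-byte struct that is byte 32 + 2 * 36 = 104 of the buffer.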

Using glVertexAttribDivisor() together with a fixed triangle strip stored in your vertex shader is an amazing way to draw a lot of particles easily. In the shader you can use the built-in variable gl_VertexID, which holds the index of the vertex being processed. When you tell OpenGL to draw 4 vertices per instance, it takes the values 0, 1, 2, 3 for every instance. By creating an array with position data in your vertex shader you can simply pick the right vertex position for your shape, in this case a square. See the vertex shader below:

#version 150
 
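uniform mat4 u_pm;   // projection matrix
in vec4 a_pos;       // per-instance particle position (divisor 1); the VBO supplies x and y, z and w default to 0 and 1
in float a_size;     // per-instance particle size (divisor 1)

// the four corners of a unit square in GL_TRIANGLE_STRIP order, indexed by gl_VertexID
const vec2 pos[] = vec2[4](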
  vec2(-0.5,  0.5),
  vec2(-0.5, -0.5),
  vec2(0.5,   0.5),
  vec2(0.5,  -0.5)
);
 
void main() {
  vec2 offset = pos[gl_VertexID];
 
  gl_Position = u_pm * vec4(a_pos.x + (offset.x * a_size) ,
                            a_pos.y + (offset.y * a_size) ,
                            0.0,
                            1.0);
}
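To check the corner table without a GL context, the shader's lookup can be mirrored on the CPU. This is a hypothetical sketch: Vec2 stands in for GLSL's vec2, and the result is the position before multiplication by u_pm. In strip order, vertices (0, 1, 2) and (1, 2, 3) form the two triangles of the square.

```cpp
#include <array>
#include <cassert>

struct Vec2 { float x, y; };  // hypothetical stand-in for GLSL vec2

// Same corner table as `pos` in the vertex shader, in GL_TRIANGLE_STRIP order.
static const std::array<Vec2, 4> kCorners = {{
  {-0.5f,  0.5f},
  {-0.5f, -0.5f},
  { 0.5f,  0.5f},
  { 0.5f, -0.5f},
}};

// CPU mirror of the shader's main(): position (before u_pm) of corner
// `vertex_id` for a particle centred at `center` with edge length `size`.
Vec2 quad_corner(int vertex_id, Vec2 center, float size) {
  assert(vertex_id >= 0 && vertex_id < 4);
  const Vec2 offset = kCorners[vertex_id];
  return Vec2{center.x + offset.x * size, center.y + offset.y * size};
}
```

For a particle at (100, 100) with size 10, corner 0 lands at (95, 105) and corner 3 at (105, 95).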

Example

The code below implements a very simple particle system that makes use of this technique:

WaterBall.h

#ifndef WATER_BALL_H
#define WATER_BALL_H
 
#define ROXLU_USE_ALL
#include <tinylib.h>
 
class WaterDrop {
 public:
  WaterDrop();
  ~WaterDrop();
 
 public:
  vec2 position;     // 8 bytes  - offset 0
  vec2 forces;       // 8 bytes  - offset 8
  vec2 velocity;     // 8 bytes  - offset 16
  float mass;        // 4 bytes  - offset 24
  float inv_mass;    // 4 bytes  - offset 28 
  float size;        // 4 bytes  - offset 32
};
 
class WaterBall {
 
 public:
  WaterBall();
  ~WaterBall();
  bool setup(int w, int h);
  void update(float dt = 0.016f);
  void draw();
  void addDrop(vec2 position, float mass);
 public:
  int win_w;
  int win_h;     
  Program prog;                               /* shaders / prog */
  GLuint vbo;                                 /* the vbo that holds the water drop data */
  GLuint vao;                                 /* vertex array object */
  size_t bytes_allocated;                     /* number of bytes we allocated on gpu */
  std::vector<WaterDrop> drops;               /* the water drop particles */
  mat4 pm;                                    /* projection matrix; ortho */
};
 
#endif
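The byte offsets in the comments above are exactly what glVertexAttribPointer() needs later on, so it can help to let the compiler check them. This sketch assumes tinylib's vec2 is two tightly packed floats, which the comments imply but which is an assumption here; Vec2f and WaterDropLayout are hypothetical stand-ins so the check compiles without tinylib.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: tinylib's vec2 is two tightly packed floats.
struct Vec2f { float x, y; };  // hypothetical stand-in for vec2

struct WaterDropLayout {  // hypothetical mirror of WaterDrop's data members
  Vec2f position;  // offset 0
  Vec2f forces;    // offset 8
  Vec2f velocity;  // offset 16
  float mass;      // offset 24
  float inv_mass;  // offset 28
  float size;      // offset 32
};

// The stride passed to glVertexAttribPointer is the size of one drop;
// the offsets point at the two attributes the shader reads.
static_assert(offsetof(WaterDropLayout, position) == 0, "a_pos offset");
static_assert(offsetof(WaterDropLayout, size) == 32, "a_size offset");
static_assert(sizeof(WaterDropLayout) == 36, "stride of one drop");
```

The same static_asserts can be written against WaterDrop itself next to its definition, and offsetof(WaterDrop, size) can replace the literal 32 in the glVertexAttribPointer() call so the pointer follows the struct if a field is added.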

WaterBall.cpp

#include <assert.h>
#include "WaterBall.h"
 
// ------------------------------------
 
WaterDrop::WaterDrop()
  :mass(0.0f)
  ,inv_mass(0.0f)
  ,size(0.0f)
{
}
 
WaterDrop::~WaterDrop() {
}
 
// ------------------------------------
 
WaterBall::WaterBall() 
  :win_w(0)
  ,win_h(0)
  ,vbo(0)
  ,vao(0)
  ,bytes_allocated(0)
{
}
 
WaterBall::~WaterBall() {
}
 
bool WaterBall::setup(int w, int h) {
  assert(w && h);
  win_w = w;
  win_h = h;
 
  pm.ortho(0, w, h, 0, 0.0f, 100.0f);
 
  // create shader 
  const char* atts[] = { "a_pos", "a_size" } ;
  prog.create(GL_VERTEX_SHADER, rx_to_data_path("waterdrop.vert"));
  prog.create(GL_FRAGMENT_SHADER, rx_to_data_path("waterdrop.frag"));
  prog.link(2, atts);
  glUseProgram(prog.id);
  glUniformMatrix4fv(glGetUniformLocation(prog.id, "u_pm"), 1, GL_FALSE, pm.ptr());
 
  int num = 10;
  for(int i = 0; i < num; ++i) {
    addDrop(vec2(rx_random(0, w), rx_random(0, h)), 1.0f);
  }
 
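  // the VAO records the attribute layout below; the buffer's data store is
  // allocated later, in update(), once we know how many drops there are
  glGenVertexArrays(1, &vao);
  glBindVertexArray(vao);
  glGenBuffers(1, &vbo);
  glBindBuffer(GL_ARRAY_BUFFER, vbo);

  // the stride is a whole WaterDrop because we upload the complete struct;
  // a_pos reads 2 floats at offset 0, a_size 1 float at offset 32 (see WaterBall.h).
  // locations 0 and 1 assume prog.link() binds the names in the order of atts.
  glEnableVertexAttribArray(0); // pos
  glEnableVertexAttribArray(1); // size
  glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, sizeof(WaterDrop), (GLvoid*) 0);
  glVertexAttribPointer(1, 1, GL_FLOAT, GL_FALSE, sizeof(WaterDrop), (GLvoid*) 32);

  // advance both attributes once per instance (one drop per quad), not once per vertex
  glVertexAttribDivisor(0, 1);
  glVertexAttribDivisor(1, 1);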
 
  return true;
}
 
void WaterBall::update(float dt) {
 
  if(!drops.size()) {
    return ;
  }
 
  vec2 force(16.0, 0.0);
 
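  for(size_t i = 0; i < drops.size(); ++i) {
    WaterDrop& d = drops[i];
    d.forces += force;
    // semi-implicit Euler: a = F / m, then integrate velocity and position over dt
    vec2 acceleration = d.forces * d.inv_mass;
    d.velocity += acceleration * dt;
    d.position += d.velocity * dt;
    d.velocity *= 0.99;            // simple damping
    d.forces = vec2(0.0f, 0.0f);   // clear accumulated forces for the next frame
  }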
 
  glBindBuffer(GL_ARRAY_BUFFER, vbo);
 
  // upload the complete particle vector; only (re)allocate when it has grown
  size_t bytes_needed = sizeof(WaterDrop) * drops.size();
  if(bytes_needed > bytes_allocated) {
    glBufferData(GL_ARRAY_BUFFER, bytes_needed, &drops[0], GL_STREAM_DRAW);
    bytes_allocated = bytes_needed;
  }
  else {
    glBufferSubData(GL_ARRAY_BUFFER, 0, bytes_needed, &drops[0]);
  }
}
 
void WaterBall::draw() {
  glBindVertexArray(vao);
  glUseProgram(prog.id);
  glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, (GLsizei) drops.size());
}
 
void WaterBall::addDrop(vec2 position, float mass) {
  if(mass < 0.01) {
    mass = 0.01;
  }
 
  WaterDrop drop;
  drop.mass = mass;
  drop.inv_mass = 1.0f / mass;
  drop.position = position;
  drop.size = 10.0f; // quad edge length in pixels (ortho projection)
 
  drops.push_back(drop);
}
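update() integrates each drop with a simple Euler scheme. The sketch below isolates that kind of step on the CPU so the motion can be checked without a GL context. It is a hypothetical one-dimensional stand-in, not the WaterBall code, and it assumes a semi-implicit Euler order (acceleration from force times inverse mass, then velocity, then position, each scaled by dt once) together with the 0.99 damping factor from update().

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 1D stand-in for one drop's motion along x.
struct Drop1D {
  float position = 0.0f;
  float velocity = 0.0f;
  float inv_mass = 1.0f;
};

// Semi-implicit Euler step with 0.99 damping per step:
// a = F * inv_mass, v += a * dt, x += v * dt, then damp v.
void step(Drop1D& d, float force, float dt) {
  const float acceleration = force * d.inv_mass;
  d.velocity += acceleration * dt;
  d.position += d.velocity * dt;
  d.velocity *= 0.99f;
}
```

With the constant force of 16, a mass of 1 and dt = 0.016 used in the example, the damping caps the speed where v = 0.99 * (v + a * dt), i.e. at 99 * a * dt, about 25.3 pixels per second.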