Nov 05 15:38

OpenCL Particles at OKGo's Design Miami 2009 gig

For last years Design Miami (2009) I created realtime visuals for an OKGo performance where they were using guitars modded by Moritz Waldemeyer, shooting out lasers from the headstock. I created software to track the laser beams and project visuals onto the wall where they hit.

This video is an opensource demo - written with openframeworks - of one of the visualizations from that show, using an OpenCL particle system and the macbook multitouch pad to simulate the laser hit points. The demo is audio reactive and is controlled by my fingers (more than one) on the macbook multitouch pad (each 'attractor' is a finger on the multitouch pad). It runs at a solid 60fps on a Macbook Pro, but unfortunately the screen capture killed the fps - and of course half the particles aren't even visible because of the video compression.

The app is written to use the MacbookPro multitouch pad, so will not compile for platforms other than OSX, but by simply removing the multitouch pad sections (and hooking something else in), the rest should compile and run fine (assuming you have an OpenCL compatible card and implementation on your system).

Uses ofxMultiTouchPad by Jens Alexander Ewald with code from Hans-Christoph Steiner and Steike.
ofxMSAfft uses core from Dominic Mazzoni and Don Cross.

Source code (for OF 0062) is included and includes all necessary non-OFcore addons (MSACore, MSAOpenCL, MSAPingPong, ofxMSAFFT, ofxMSAInteractiveObject, ofxSimpleGuiToo, ofxFBOTexture, ofxMultiTouchPad, ofxShader) - but bear in mind some of these addons may not be latest version (ofxFBOTexture, ofxMultiTouchPad, ofxShader), and are included for compatibility with this demo which was written last year.

More information on the project at

Most of the magic is happening in the opencl kernel, so here it is (or download the full zip with xcode project at the bottom of this page)

typedef struct {
    float2 vel;
    float mass;
    float life;
} Particle;
typedef struct {
    float2 pos;
    float spread;
    float attractForce;
    float waveAmp;
    float waveFreq;
} Node;
#define kMaxParticles       512*512
#define kArg_particles          0
#define kArg_posBuffer          1
#define kArg_colBuffer          2
#define kArg_nodes              3
#define kArg_numNodes           4
#define kArg_color              5
#define kArg_colorTaper         6
#define kArg_momentum           7
#define kArg_dieSpeed           8
#define kArg_time               9
#define kArg_wavePosMult        10
#define kArg_waveVelMult        11
#define kArg_massMin            12
float rand(float2 co) {
    float i;
    return fabs(fract(sin(dot(co.xy ,make_float2(12.9898f, 78.233f))) * 43758.5453f, &i));
__kernel void update(__global Particle* particles,      //0
                     __global float2* posBuffer,        //1
                     __global float4 *colBuffer,        //2
                     __global Node *nodes,              //3
                     const int numNodes,                //4
                     const float4 color,                //5
                     const float colorTaper,            //6
                     const float momentum,              //7
                     const float dieSpeed,              //8
                     const float time,                  //9
                     const float wavePosMult,           //10
                     const float waveVelMult,           //11
                     const float massMin                //12
                     ) {                
    int     id                  = get_global_id(0);
    __global Particle   *p      = &particles[id];
    float2  pos                 = posBuffer[id];
    int     birthNodeId         = id % numNodes;
    float2  vecFromBirthNode    = pos - nodes[birthNodeId].pos;                         // vector from birth node to particle
    float   distToBirthNode     = fast_length(vecFromBirthNode);                            // distance from bith node to particle
    int     targetNodeId        = (id % 2 == 0) ? (id+1) % numNodes : (id + numNodes-1) % numNodes;
    float2  vecFromTargetNode   = pos - nodes[targetNodeId].pos;                        // vector from target node to particle
    float   distToTargetNode    = fast_length(vecFromTargetNode);                       // distance from target node to particle
    float2  diffBetweenNodes    = nodes[targetNodeId].pos - nodes[birthNodeId].pos;     // vector between nodes (from birth to target)
    float2  normBetweenNodes    = fast_normalize(diffBetweenNodes);                     // normalized vector between nodes (from birth to target)
    float   distBetweenNodes    = fast_length(diffBetweenNodes);                        // distance betweem nodes (from birth to target)
    float   dotTargetNode       = fmax(0.0f, dot(vecFromTargetNode, -normBetweenNodes));
    float   dotBirthNode        = fmax(0.0f, dot(vecFromBirthNode, normBetweenNodes));
    float   distRatio           = fmin(1.0f, fmin(dotTargetNode, dotBirthNode) / (distBetweenNodes * 0.5f));
    // add attraction to other nodes
    p->vel                      -= vecFromTargetNode * nodes[targetNodeId].attractForce / (distToTargetNode + 1.0f) * p->mass;
    // add wave
    float2 waveVel              = make_float2(-normBetweenNodes.y, normBetweenNodes.x) * sin(time + 10.0f * 3.1416926f * distRatio * nodes[birthNodeId].waveFreq);
    float2 sideways             = nodes[birthNodeId].waveAmp * waveVel * distRatio * p->mass;
    posBuffer[id]               += sideways * wavePosMult;
    p->vel                      += sideways * waveVelMult * dotTargetNode / (distBetweenNodes + 1);
    // set color
    float invLife = 1.0f - p->life;
    colBuffer[id] = color * (1.0f - invLife * invLife * invLife);// * sqrt(p->life);    // fade with life
    // add waviness
    p->life -= dieSpeed;
    if(p->life < 0.0f || distToTargetNode < 1.0f) {
        posBuffer[id] = posBuffer[id + kMaxParticles] = nodes[birthNodeId].pos;
        float a = rand(p->vel) * 3.1415926f * 30.0f;
        float r = rand(pos);
        p->vel = make_float2(cos(a), sin(a)) * (nodes[birthNodeId].spread * r * r * r);
        p->life = 1.0f;
//      p->mass = mix(massMin, 1.0f, r);
    } else {
        posBuffer[id+kMaxParticles] = pos;
        colBuffer[id+kMaxParticles] = colBuffer[id] * (1.0f - colorTaper);  
        posBuffer[id] += p->vel;
        p->vel *= momentum;

Oct 30 00:24

OpenCL in openFrameworks example - 1 milion particles @ 100-200fps

Recently I've been playing a lot with OpenCL, the new API / framework designed to handle cross-platform parallel computing (i.e. a simple way of running code simultaneously on all cores of your CPU, GPU or other processors). Implementations have been cropping up this year in NVidia drivers or ATI drivers, but most famously it's included with Mac OSX 10.6 Snow Leopard.

To cut a long story short I've been working on a simple-to-use C++ wrapper for some of the most common functions, imaginatively called ofxOpenCL and here is a little demo of 1 million particles running at 100-200fps.

NOTE: The Vimeo compression destroys most of the particles, so I suggest downloading the quicktime directly from the vimeo page at

This is 1,000,000 particles being interacted on by mouse, updated on GPU (with springy behaviours ) via an OpenCL kernel, data written straight to a VBO and rendered - without ever coming back to host (i.e. main memory + cpu etc.)

Frame-rate is around 100-200fps running on a macbook pro with GF 9600GT. That's 100-200fps on a laptop! (albeit a pretty decent one), but I'm dying to try this on a GF 285 GTX - which has 7.5x the number of cores, 2.5x the fillrate and 3.5x the memory bandwidth - for only £250!!

The kernel for this is surprisingly simple:

__kernel void updateParticleWithoutCollision(__global Particle* pIn, __global float2* pOut, const float2 mousePos, const float2 dimensions){
	int id = get_global_id(0);
	__global Particle *p = &pIn[id];
	float2 diff = mousePos - pOut[id];
	float invDistSQ = 1.0f / dot(diff, diff);
	diff *= 300.0f * invDistSQ;
	p->vel += (dimensions*0.5 - pOut[id]) * CENTER_FORCE2 - diff* p->mass;
	pOut[id] += p->vel;
	p->vel *= DAMP2;
	float speed2 = dot(p->vel, p->vel);
	if(speed2<MIN_SPEED2) pOut[id] = mousePos + diff * (1 + p->mass);

This example is based on Rui's opencl example at

Discussion on the matter at

source code for ofxOpenCL and the above example at
(the SVN is likely to be more recent).