It’s been a few weeks since I started experimenting with AGAL and Stage3D. The bad news is… I didn’t figure out how to make a kick-ass 3D Engine with post-processing effects – BUT, the good news is… I made a 2D Particle System!
Before I go on, here’s the final result of what we’ll achieve:
How it works…
I’ll warn you right now: the code shown below is greatly trimmed down to show you just the bare minimum. However, the sources are available here, and if you need more clarification in any particular area, please send your questions through the comment form below this article.
Here’s an overview of the project files included in this demo:

You will notice that I’m using Adobe’s AGALMacroAssembler compiler to process my shader in this demo. Instead of requiring you to download it from Adobe, I’ve compiled a SWC file for you (and already added it to the project). If you would prefer to use its source files, you can find them here.
Without further ado, here is the Document Class for this project:
Test_Particles extends BP3D
package {
import bigp.BP3D;
import bigp.gpu.BPIndexBuffer;
import bigp.gpu.BPVertexBuffer;
import bigp.utils.ByteArrayUtils;
import bigp.utils.XColor;
import flash.display3D.Context3DBlendFactor;
import flash.display3D.Context3DTriangleFace;
import flash.display3D.Context3DVertexBufferFormat;
import flash.events.MouseEvent;
import flash.ui.Mouse;
import flash.utils.ByteArray;
/**
* Create a Particle System!
*
* @author Pierre Chamberlain
*/
[SWF(frameRate="60")]
public class Test_Particles extends BP3D {
//...
You’ll notice that this Document class extends BP3D (short for BigP 3D), which in turn extends Main3D (another abstract class I started a while ago in my first AGAL tutorials). I put the crucial code in Main3D, and when I create new helper methods I add them to BP3D, that is, if I see an opportunity to reuse them in future Stage3D projects.
The important methods to remember in these abstract classes are the following…
The Overridden Methods _main(), _calc(), and _draw()
protected override function _main():void { /* ... */ }
The _main() method is triggered when Stage3D has initialized. Therefore, this is the perfect place to begin initializing:
- Background Color;
- Precalculated values / Mathematical constants;
- VertexBuffers, IndexBuffers and their data;
- Context3D Vertex and/or Fragment constants;
- Context3D’s current VertexBuffer (and which vertex attribute it will use);
- Event listeners for user input;
protected override function _calc():void { /* ... */ }
The _calc() method is triggered by a Timer object. Why use a Timer? In my first experiments with animating particles, performance was poor when everything ran inside an ENTER_FRAME loop. Separating the game logic from the rendering phase helps keep things smooth. Game logic might still involve manipulating GPU resources (such as uploading more vertex data, modifying a vertex constant to change the position of a sprite, or offsetting the texture UVs to play an animation sequence), but the actual drawing happens in the ENTER_FRAME loop.
protected override function _draw():void { /* ... */ }
The _draw() method is triggered on an ENTER_FRAME event. Technically, all you should do here is call drawTriangles(…), but nothing stops you from doing crazier things such as:
- Alternating between different IndexBuffers and/or VertexBuffers;
- Changing Program3D for drawing objects in a different style, or;
- Applying different blend modes between multiple drawing phases;
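To make the split between the three methods concrete, here is a rough sketch in TypeScript (used as a stand-in for ActionScript 3, whose syntax is very close). It is not the actual Main3D/BP3D code, which isn’t shown in this article; the class names `LoopSketch` and `CountingDemo` and the `tickLogic()`/`tickFrame()` drivers are hypothetical. Real timers and frame events are simulated by calling those drivers by hand, so only the shape matters: _main() runs once, _calc() runs at the timer’s rate, and _draw() runs once per rendered frame.

```typescript
// Hypothetical sketch of the _main/_calc/_draw lifecycle described above.
// Not the real Main3D/BP3D implementation: timer ticks and frames are
// simulated by calling tickLogic() and tickFrame() manually.
abstract class LoopSketch {
  private started = false;

  // Called once, when the (simulated) GPU context is ready.
  protected abstract _main(): void;
  // Called on every timer tick: game logic, constant updates.
  protected abstract _calc(): void;
  // Called on every rendered frame: draw calls only.
  protected abstract _draw(): void;

  start(): void {
    if (this.started) return;
    this.started = true;
    this._main();
  }

  // In Flash this would be a Timer event; in a browser, setInterval.
  tickLogic(): void {
    if (this.started) this._calc();
  }

  // In Flash this would be ENTER_FRAME; in a browser, requestAnimationFrame.
  tickFrame(): void {
    if (this.started) this._draw();
  }
}

class CountingDemo extends LoopSketch {
  log: string[] = [];
  protected _main(): void { this.log.push("main"); }
  protected _calc(): void { this.log.push("calc"); }
  protected _draw(): void { this.log.push("draw"); }
}

// The logic timer can tick more often than frames are drawn.
const demo = new CountingDemo();
demo.start();
demo.tickLogic();
demo.tickLogic();
demo.tickFrame();
console.log(demo.log.join(","));  // main,calc,calc,draw
```

The point of the design is that logic and rendering are driven independently, so a slow draw does not stall the simulation and vice versa.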
Alright, now that you have an understanding of those 3 key overridable methods, let’s look at the rest of the document class.
[Embed(source="Test_Particles.macro", mimeType="application/octet-stream")] private static const _MACRO:Class;
You’ll notice at the beginning of the file that there is an [Embed] tag. This tells the compiler to embed the file in the resulting SWF. In this case, it’s just a simple raw text file, to which I gave a custom “.macro” extension. It contains the VertexShader and FragmentShader code, combined in one file.
I will refer to this embedded text file as a Macro file (even though “macro” usually refers to a single construct that expands into several commands). We will get back to this Macro file a little later.
Test_Particles.as (Completed Code Example)
package {
import bigp.BP3D;
import bigp.gpu.BPIndexBuffer;
import bigp.gpu.BPVertexBuffer;
import bigp.utils.ByteArrayUtils;
import bigp.utils.XColor;
import flash.display3D.Context3DBlendFactor;
import flash.display3D.Context3DTriangleFace;
import flash.display3D.Context3DVertexBufferFormat;
import flash.events.MouseEvent;
import flash.ui.Mouse;
import flash.utils.ByteArray;
/**
* Create a Particle System!
*
* @author Pierre Chamberlain
*/
[SWF(frameRate="60")]
public class Test_Particles extends BP3D {
[Embed(source="Test_Particles.macro", mimeType="application/octet-stream")]
private static const _MACRO:Class;
private static const _COUNTER_LIMIT:int = 10000;
private var _numOfQuads:int;
private var _totalVertices:int;
private var _particlePointers:BPVertexBuffer;
private var _particleColor:uint = 0xff0000ff;
private var _particleVertices:BPVertexBuffer;
private var _particleIndexes:BPIndexBuffer;
private var _counter:int = 0;
private var _counterSpeed:int = 20;
private var _currentQuarter:int = -1;
private var _currentRandom:Number;
private var _averageRadius:Number;
private var _isColorMode:Boolean = true;
private var _view360Hue:Number;
protected override function _main():void {
super._main();
backgroundColor = 0;
timerRateIdeal = 5;
programMacro( _MACRO );
context3D.setCulling(Context3DTriangleFace.NONE);
context3D.setBlendFactors(Context3DBlendFactor.SOURCE_ALPHA, Context3DBlendFactor.ONE_MINUS_SOURCE_ALPHA);
//Setup Vertex Constants static values:
getVertexConst(0).setAll(0, 1, 1.2, 0);
getVertexConst(1).setAll(1/viewWidth, -1/viewHeight, 0, 0);
particleColor = 0xffffffff;
//Create TONS of particles
_numOfQuads = 10000;
_totalVertices = _numOfQuads * 4;
var particleSize:int = 1;
//Vertex-Format = X, Y, Random Angle, Time Step.
var va0:ByteArray = ByteArrayUtils.times(_numOfQuads, [
-particleSize, -particleSize, 0, 0,
particleSize, -particleSize, 0, 0,
particleSize, particleSize, 0, 0,
-particleSize, particleSize, 0, 0,
]);
//Store start-indices and index-offset variables
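var va0_w:int = 4 * 3; // Byte offset of field #3 (W): 4 bytes * index 3
var va0_z:int = 4 * 2; // Byte offset of field #2 (Z): 4 bytes * index 2
var skipBytes:int = 4 * 3; // After each 4-byte write, skip the other 3 floats (12 bytes) to reach the same field of the next vertex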
ByteArrayUtils.forEvery( va0_w, skipBytes, writeTimeFraction, va0, va0.writeFloat);
ByteArrayUtils.forEvery( va0_z, skipBytes, writeRandomAngle, va0, va0.writeFloat);
//Create VertexBuffer and IndexBuffer:
_particleVertices = BPVertexBuffer.make(context3D, _totalVertices, 4, va0);
_particleIndexes = BPIndexBuffer.fromQuads( context3D, _numOfQuads );
//Create a decent-size radius to pass in the constant #3:
_averageRadius = Math.sqrt(viewWidth * viewWidth + viewHeight * viewHeight);
_view360Hue = 360 / viewWidth;
context3D.setVertexBufferAt(0, _particleVertices.buffer, 0, Context3DVertexBufferFormat.FLOAT_4);
addLabel('Move mouse around. Click to change Mouse *MODE*\n<font size="12">Affects the color & speed.</font>', true);
stage.addEventListener(MouseEvent.MOUSE_DOWN, onStageClick);
Mouse.hide();
}
private function onStageClick(e:MouseEvent):void {
//Toggle the color-mode
_isColorMode = !_isColorMode;
if (!_isColorMode) {
getVertexConst(2).setAll(1, .5, .5, 1);
}
}
protected override function _calc():void {
super._calc();
var nowTime:Number = ((_counter+=_counterSpeed) / _COUNTER_LIMIT) % 1;
if (_isColorMode) {
var nowX:Number = stage.mouseX * 2 - viewWidth;
var nowY:Number = stage.mouseY * 2 - viewHeight;
getVertexConst(3).setAll(nowX, nowY, _averageRadius, nowTime);
} else {
getVertexConst(3).setAll(0, 0, _averageRadius, nowTime);
particleColor = XColor.hsvToRGB(stage.mouseX * _view360Hue) << 8 | 0xFF;
_counterSpeed = int(stage.mouseY - (viewHeight>>1)) >> 2; //A good moderate speed range
}
}
protected override function _draw():void {
super._draw();
context3D.drawTriangles( _particleIndexes.buffer );
}
private function writeTimeFraction( pCounter:int ):Number {
return int(pCounter*.25) / _numOfQuads;
}
private function writeRandomAngle( pCounter:int ):Number {
var quarter:int = int(pCounter * .25);
if (_currentQuarter != quarter) {
_currentRandom = Math.random() * Math.PI * 2;
_currentQuarter = quarter;
}
return _currentRandom;
}
public function get particleColor():uint { return _particleColor; }
public function set particleColor(value:uint):void {
_particleColor = value;
var ff:uint = 0xFF;
getVertexConst(2).setAll(
((value >> 24) & ff) / ff,
((value >> 16) & ff) / ff,
((value >> 8) & ff) / ff,
(value & ff) / ff
);
}
}
}
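The ByteArrayUtils helpers aren’t shown in this article, so here is a rough TypeScript sketch of the vertex data they are meant to produce: 4 floats per vertex (X, Y, random angle, time step), 4 vertices per quad, with every vertex of a quad sharing the same angle and time step. `buildParticleVertices` is a hypothetical name and a Float32Array stands in for the ByteArray; the quad index is derived the same way as in writeTimeFraction() and writeRandomAngle(), by dividing the vertex counter by 4.

```typescript
// Hypothetical sketch: builds the same interleaved layout as the ActionScript
// code above (X, Y, random angle, time step per vertex), using a Float32Array
// in place of a ByteArray.
const FLOATS_PER_VERTEX = 4;
const VERTICES_PER_QUAD = 4;

function buildParticleVertices(
  numQuads: number,
  particleSize: number,
  random: () => number = Math.random,
): Float32Array {
  // Corner offsets of one quad, in the same order as the original array.
  const corners: Array<[number, number]> = [
    [-particleSize, -particleSize],
    [particleSize, -particleSize],
    [particleSize, particleSize],
    [-particleSize, particleSize],
  ];
  const data = new Float32Array(numQuads * VERTICES_PER_QUAD * FLOATS_PER_VERTEX);
  for (let quad = 0; quad < numQuads; quad++) {
    const angle = random() * Math.PI * 2; // one random angle per quad
    const timeStep = quad / numQuads;     // same as writeTimeFraction()
    for (let corner = 0; corner < VERTICES_PER_QUAD; corner++) {
      const base = (quad * VERTICES_PER_QUAD + corner) * FLOATS_PER_VERTEX;
      data[base] = corners[corner][0];     // va0.x
      data[base + 1] = corners[corner][1]; // va0.y
      data[base + 2] = angle;              // va0.z
      data[base + 3] = timeStep;           // va0.w
    }
  }
  return data;
}

// One vertex = 4 floats = 16 bytes, so quad q starts at float index q * 16.
const verts = buildParticleVertices(4, 1, () => 0.5);
console.log(verts.length);      // 64 (4 quads * 4 vertices * 4 floats)
console.log(verts[3 * 16 + 3]); // 0.75 (time step of quad 3)
```

The 16-byte vertex stride is also what the 12-byte skip in the ActionScript presumably accounts for: one 4-byte field is written, then the other three fields are skipped.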
Back to the Macro file…
Let’s now take a look at what AGAL operations we’ve told the GPU to perform with the vertex data.
alias op, vertexOut;
alias vc0, const0;
alias vc1, const1;
alias vc2, const2;
alias vc3, const3;
alias vt0, temp0;
alias vt1, temp1;
alias vt2, temp2;
alias vt3, temp3;
One of the benefits of using the AGALMacroAssembler is the use of aliases. I haven’t used names as meaningful as I should have, but it does help a little to distinguish a constant from a temporary register.
//Set the Position of the Vertex:
temp0.zw = const0.xy;
temp0.xy = va0.xy + const3.xy;
As the comment suggests, this is what sets the position of the vertex. Every vertex is a point in 3D space defined by X, Y, Z and W fields. Since I’m building a 2D world instead of a 3D one, I know that Z and W will remain the same throughout the execution of this demo. If you remember, these two values were assigned 0 and 1 (respectively) in ActionScript, via const0.xy. This means that each vertex will always be at zero on the Z axis (depth), and W is always 1. W is the homogeneous coordinate: after the vertex shader runs, the GPU divides X, Y and Z by W, which is why changing it appears to affect scale. Leaving it at 1 keeps X and Y exactly as we computed them.
Each vertex’s X and Y are set from the XY fields of vertex attribute 0 (va0) plus the offset given by vertex constant 3 (const3), also stored in its XY fields. In layman’s terms, think of it as a playground with a sandbox on wheels full of children. The sandbox (particle emitter) is the container that moves the children together as a whole, and each child (particle) can move around inside the sandbox. So if a child is at coordinates (10,10) within the sandbox and the sandbox is at (-5,2), the child’s final position on the playground would be (5,12).
//Calculate time offset
temp1.z = va0.w + const3.w;
//Only keep fraction portion (always stays between 0.0 to 1.0)
frc temp1.z, temp1.z;
Remember those time-steps we assigned to each Quad’s vertices? All four vertices in one Quad share the same time-step value, but each Quad has a different value from the next. By adding the elapsed time (from const3.w) to the vertex’s individual time-step value, we’re basically preparing it to be animated!
The fraction extraction (frc) keeps only whatever is left after the decimal point. Imagine we’re calculating the 4 vertices in the last Quad (time-step = 0.99) and the current time is near the end of a full cycle (time = 0.90). The sum, 1.89, exceeds 1.0 by a long shot (at least in this context)! By keeping only the fractional part, 0.89, we loop around the clock. This restricts our possible values to the range 0.0 to 1.0, which is what we’ll need to move the positions of our vertices.
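The wrap-around can be checked on the CPU. This TypeScript sketch mirrors AGAL’s frc opcode, which returns the fractional part x - floor(x), and applies it to the example above. The helper names `frc` and `particleTime` are illustrative only.

```typescript
// Mirrors AGAL's frc: x - floor(x), so the result is always in [0, 1),
// even for negative inputs (which can happen here once _counter drifts below zero).
function frc(x: number): number {
  return x - Math.floor(x);
}

// temp1.z = frc(va0.w + const3.w): per-quad time step plus elapsed time.
function particleTime(timeStep: number, elapsed: number): number {
  return frc(timeStep + elapsed);
}

console.log(particleTime(0.99, 0.9).toFixed(2));  // 0.89
console.log(particleTime(0.25, -0.5).toFixed(2)); // 0.75
```

Because frc uses floor rather than truncation, a negative elapsed time (for example when the speed in the other mouse mode is negative) still lands inside 0.0 to 1.0.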
//Multiply by number of pixels to move
temp1.x = const3.z * temp1.z;
Here we store a value representing the distance travelled: the maximum distance (const3.z, the radius calculated in ActionScript) multiplied by our time. This product will be used in the trigonometric calculations below.
//Change the X&Y based on the angle (good ol' trigonometry)
cos temp3.x, va0.z;
sin temp3.y, va0.z;
Next, another temporary register is used to convert the random angle (va0.z) into usable X and Y factors with some trigonometry. You may have already seen something similar in ActionScript, but for those who are new to this, it essentially gives us a value from -1.0 to 1.0 for X (with cos) and Y (with sin) that we can later amplify by multiplying it with a bigger number (ex: a radius of 100px).
//Scale the cos & sin values by the desired distance
temp3.xy *= temp1.x;
temp0.xy += temp3.xy;
The product we found earlier is then multiplied with our X and Y factors. So instead of being restricted to the -1.0 to 1.0 range of cos() and sin(), we get a wider range, defined by the distance we derived from the time-step calculations.
Let’s do a little recap now. Earlier, we assigned temp0.xy the values of va0.xy (the corner offsets that define the size of the quad) plus the emitter position in const3.xy. On its own, this would leave the entire Quad at the origin of the emitter (in its center). By adding our previous calculations to it, we’ve moved it away from the origin.
So far, the values we’ve been working with are way too large. If you’ve ever tried this on your own, you know that in clip space 0.0 is the center of your canvas and 1.0 is the extreme edge (-1.0 being the opposite edge). So a value of 100 for X would be WAY to the right, off-screen. We need to scale these values down quite a bit!
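Put together, the steps from `va0.xy + const3.xy` down to `temp0.xy += temp3.xy` amount to a polar-to-Cartesian offset added to the quad’s corner and the emitter position. Here is a TypeScript sketch of that arithmetic for one vertex, before the viewport scaling; `particlePosition` and its parameter names are hypothetical, and each line is annotated with the AGAL register it stands in for.

```typescript
// CPU version of the position math in the vertex shader, before viewport scaling.
// corner = va0.xy, emitter = const3.xy, angle = va0.z, radius = const3.z,
// t = the frc'd time in temp1.z.
function particlePosition(
  corner: [number, number],
  emitter: [number, number],
  angle: number,
  radius: number,
  t: number,
): [number, number] {
  const distance = radius * t;           // temp1.x = const3.z * temp1.z
  const dx = Math.cos(angle) * distance; // temp3.x = cos(va0.z) * temp1.x
  const dy = Math.sin(angle) * distance; // temp3.y = sin(va0.z) * temp1.x
  return [corner[0] + emitter[0] + dx, corner[1] + emitter[1] + dy];
}

// Sandbox example: child at (10, 10), sandbox at (-5, 2), no travel yet.
console.log(particlePosition([10, 10], [-5, 2], 0, 100, 0)); // [ 5, 12 ]
// Halfway through its life, an angle of 0 moves the quad 50 units along +X.
console.log(particlePosition([0, 0], [0, 0], 0, 100, 0.5));  // [ 50, 0 ]
```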
//Scale to viewport proportions (inversed viewport dimensions)
temp0.xy *= const1.xy;
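The supplied values are the inverse of our view’s width and height. In computation-intensive code, multiplying is generally favored over dividing, which is why I invert those values ONCE in ActionScript before passing them in, and then multiply by them in AGAL.
This makes it convenient to work in screen-based units in your vertex attributes and in your emitter’s distance radius. Strictly speaking, since clip space spans 2.0 units across the view, one unit here equals half a pixel, which is why _calc() doubles the mouse coordinates (stage.mouseX * 2 - viewWidth) before passing them in.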
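To check the units, here is a TypeScript sketch that follows a mouse position through _calc() (`stage.mouseX * 2 - viewWidth`) and then through the `temp0.xy *= const1.xy` step. The function name `mouseToClip` and the 1024 x 512 view size are illustrative choices, not values from the demo.

```typescript
// Mouse pixel -> emitter units (as in _calc) -> clip space (as in the shader).
function mouseToClip(
  mouseX: number,
  mouseY: number,
  viewWidth: number,
  viewHeight: number,
): [number, number] {
  const nowX = mouseX * 2 - viewWidth;  // const3.x
  const nowY = mouseY * 2 - viewHeight; // const3.y
  const invW = 1 / viewWidth;           // const1.x
  const invH = -1 / viewHeight;         // const1.y (negative: screen Y grows downward)
  return [nowX * invW, nowY * invH];
}

console.log(mouseToClip(1024, 0, 1024, 512)); // [ 1, 1 ]   top-right corner
console.log(mouseToClip(0, 512, 1024, 512));  // [ -1, -1 ] bottom-left corner

// A particleSize of 1 unit covers 1/1024 of clip space; clip space spans 2.0
// across 1024 pixels, so one unit is half a pixel.
const pixelsPerUnit = (1 / 1024) * (1024 / 2);
console.log(pixelsPerUnit); // 0.5
```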
//Pass color to the vertex:
v0.rgb = const2.rgb;
//Change the alpha based on the distance travelled
temp1.z *= const0.z;
v0.a = const0.y - temp1.z;
vertexOut = temp0;
Last but not least, the color is set from const2.rgb (which changes with the mouse in the other mode, after you click anywhere on the screen). This isn’t complete without giving our varying register an alpha value. So next, I scale the time-step we calculated previously by whatever const0.z holds (this affects how steep or gradual the alpha fades over time). If we assigned this directly to the varying register’s alpha, we would get the opposite effect (transparent near the emitter, opaque further away from it), so we subtract it from 1.0 (const0.y) before assigning it to the varying register.
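The alpha math is easy to tabulate. This TypeScript sketch reproduces `v0.a = const0.y - temp1.z * const0.z` with the constants set in _main() (const0.y = 1, const0.z = 1.2); the names `particleAlpha`, `ALPHA_START` and `STEEPNESS` are illustrative. With a steepness of 1.2, alpha reaches zero at t = 1/1.2 ≈ 0.833, so a particle has faded out before its time step wraps back to the emitter.

```typescript
// v0.a = const0.y - frc(time) * const0.z, with the values set in _main().
const ALPHA_START = 1; // const0.y
const STEEPNESS = 1.2; // const0.z: higher values fade out sooner

function particleAlpha(t: number): number {
  return ALPHA_START - t * STEEPNESS;
}

for (const t of [0, 0.25, 0.5, 0.75]) {
  console.log(t, particleAlpha(t).toFixed(2));
}
// 0 1.00
// 0.25 0.70
// 0.5 0.40
// 0.75 0.10

console.log((ALPHA_START / STEEPNESS).toFixed(3)); // 0.833 (time at which alpha hits zero)
```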
Finally, we pass our position to our vertex output register!
The FragmentShader is fairly self-explanatory if you’ve understood everything up to this point. All it does is take the color from our varying register (which isn’t really interpolated, since all 4 vertices in a Quad share the same color and alpha) and assign it to the fragment output register.
Test_Particles.macro (Completed Code Example)
//VERTEX SHADER STARTS HERE:
alias op, vertexOut;
alias vc0, const0;
alias vc1, const1;
alias vc2, const2;
alias vc3, const3;
alias vt0, temp0;
alias vt1, temp1;
alias vt2, temp2;
alias vt3, temp3;
//Set the Position of the Vertex:
temp0.zw = const0.xy;
temp0.xy = va0.xy + const3.xy;
//Calculate time offset
temp1.z = va0.w + const3.w;
//Only keep fraction portion (always stays between 0.0 to 1.0)
frc temp1.z, temp1.z;
//Multiply by number of pixels to move
temp1.x = const3.z * temp1.z;
//Change the X&Y based on the angle (good ol' trigonometry)
cos temp3.x, va0.z;
sin temp3.y, va0.z;
//Scale the cos & sin values by the desired distance
temp3.xy *= temp1.x;
temp0.xy += temp3.xy;
//Scale to viewport proportions (inversed viewport dimensions)
temp0.xy *= const1.xy;
//Pass color to the vertex:
v0.rgb = const2.rgb;
//Change the alpha based on the distance travelled
temp1.z *= const0.z;
v0.a = const0.y - temp1.z;
vertexOut = temp0;
###
//FRAGMENT SHADER STARTS HERE:
alias oc, fragmentOut;
fragmentOut = v0;
I’ve separated the VertexShader from the FragmentShader by using “###” as a delimiter. The reasoning behind choosing that delimiter? Hmmm… I really couldn’t tell you! Perhaps I should have chosen “3 or more # signs” as the rule, but for now this is how it works.
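Since programMacro() isn’t shown in this article, here is a hypothetical TypeScript sketch of how such a combined file could be split. It uses the “3 or more # signs” rule rather than an exact “###” match; `splitMacro` is an illustrative name, not part of BP3D or the AGALMacroAssembler.

```typescript
// Hypothetical: split a combined shader file into vertex and fragment sources.
// Any run of 3 or more '#' characters acts as the delimiter.
function splitMacro(source: string): { vertex: string; fragment: string } {
  const parts = source.split(/#{3,}/);
  if (parts.length !== 2) {
    throw new Error(`Expected exactly one ### delimiter, found ${parts.length - 1}`);
  }
  return { vertex: parts[0].trim(), fragment: parts[1].trim() };
}

const macro = [
  "alias op, vertexOut;",
  "vertexOut = va0;",
  "#####",
  "alias oc, fragmentOut;",
  "fragmentOut = v0;",
].join("\n");

const { vertex, fragment } = splitMacro(macro);
console.log(fragment);
// alias oc, fragmentOut;
// fragmentOut = v0;
```

Rejecting files with zero or several delimiters makes a missing or doubled “###” fail loudly instead of silently producing a broken shader.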



Hi, thanks for the post. I tweaked the pixel size to about 5 ~ 8, and the effect looks even more stunning. I also notice that the quads are fixed in orientation, so one question: say I wanted to add a local origin to each of the quads, can I tell AGAL to do that? Or do I need to do all the calculations in AS and pass the results to AGAL to render?
Shiu, although I am no expert at this stuff (yet!), if I had to guess… this is what I’d recommend:
Since the vertices in the huge list (VertexBuffer) don’t really know anything about each other, in order to give all 4 vertices forming a quad one common point of reference, you will need to supply a few more pieces of information from AS that they can relate to. Of course, it will also require a bit of extra work in AGAL to make sense of the new vertex data.
There are multiple ways to go about this and achieve similar results. If I were doing this, I would first think about how to cut down on the amount of data that needs to change.
Good questions to ask yourself:
- Can I reuse some of the existing values?
- Could I supply some information as a new vertex-constant?
- At which part of the execution of AGAL would it be better to inject the rotation manipulation?
- What algorithm will I use to move the vertices around the origin?
- Do I want individual rotation “speeds” for each quad?
As far as GPU resources go, if you find yourself updating a long set of vertex data on each frame, at least try to isolate that data into its own “columns” (another VertexBuffer), or perhaps give it fixed random values and use a similar approach with time interpolation (like I did for the position).
I hope that helps with your problem and that you can achieve your effect! If you need a handy set of trigonometric operations in AGAL, David (creator of EasyAGAL) told me that he has well-written and optimized atan & atan2 methods that may be crucial for your calculations.
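For the rotation question, the usual math is a 2D rotation of each corner offset around the quad’s own center, applied before the quad is moved away from the emitter. Here is a TypeScript sketch, assuming each vertex carries its corner offset (as va0.xy already does) plus a per-quad rotation angle; `rotateCorner` and the extra angle attribute are assumptions for illustration, not something the demo does.

```typescript
// Rotate a corner offset (relative to the quad's center) by `rotation` radians.
// In AGAL this would map to cos/sin followed by a few mul/sub/add ops.
function rotateCorner(
  cornerX: number,
  cornerY: number,
  rotation: number,
): [number, number] {
  const c = Math.cos(rotation);
  const s = Math.sin(rotation);
  return [cornerX * c - cornerY * s, cornerX * s + cornerY * c];
}

// A quarter turn moves the (1, 0) corner to (0, 1).
const [rx, ry] = rotateCorner(1, 0, Math.PI / 2);
console.log(rx.toFixed(3), ry.toFixed(3)); // 0.000 1.000
```

Because all four corners of a quad would share the same angle, the quad rotates rigidly around its center. A per-quad spin speed could, under the same assumptions, be stored as one more attribute and multiplied by the time step to produce that angle.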
However you pull it off, I’d love to hear back from you (and see the results too!)
Good luck!
Thanks for the tip. I’ll try this out!
Hi again! I scanned through the Starling forum and found a question almost identical to ours. Have you already experimented with this using batch calls? I was just thinking: won’t your approach be more optimised? Let’s assume that all sprites’ colors are the same, like in this post. We do the following:
- upload a matrix into a constant at initialization (to orient sprites)
- upload some parameters of the kinematic equation (linear motion) into registers at initialization
- just supply a time factor for each sprite over the frames to differentiate their paths & orientations
I’m guessing I sound a little naive trying to outsmart the experts, but theoretically it’s kinda logical. All calculations are done on the GPU. Perhaps I’ll do a little experiment on it. I’m guessing there’s gonna be a lot of AGAL written for all the math calculations.
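The idea of evaluating a motion equation from a single time factor can be sketched on the CPU. This TypeScript example uses the kinematic equation for motion with constant acceleration, p(t) = p0 + v·t + ½·a·t² (setting a to zero gives plain linear motion). The `Motion` type and `positionAt` name are hypothetical; in a shader, p0, v and a would come from constants or attributes, while t would be the only value updated per frame.

```typescript
interface Motion {
  p0: [number, number]; // start position
  v: [number, number];  // initial velocity (units per time unit)
  a: [number, number];  // constant acceleration
}

// p(t) = p0 + v*t + 0.5*a*t^2, evaluated per axis.
function positionAt(m: Motion, t: number): [number, number] {
  const axis = (i: 0 | 1) => m.p0[i] + m.v[i] * t + 0.5 * m.a[i] * t * t;
  return [axis(0), axis(1)];
}

const thrown: Motion = { p0: [0, 0], v: [10, 20], a: [0, -10] };
console.log(positionAt(thrown, 2)); // [ 20, 20 ]
```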
Hi again Shiu, great to hear back from you.
I’ve never experimented with batch calls (yet), but as far as I understand, that technique exists for cases where you rely heavily on making frame-by-frame adjustments to vertex-constants to give each quad a very unique set of parameters. Since vertex-constants are limited to 128 (even fewer if you already use a few for mathematical constants, such as PI, 1, 0 or other numbers useful for your AGAL purposes…), the work needs to be iterated in “batches” to reach roughly the same output as a set of large vertex-buffers, as I demonstrated.
The more I looked into “batching” my calls, the more I encountered varying levels of… well… disappointment. For instance, to have each vertex in the vertex-buffer reference the vertex-constant it belongs with (essentially, where to get the information common to one quad), you can only pass that index in the vertex-attribute register. For example:
You can use this:
mov vt0, vc[va1.x];
But you can’t use any other type of registers as the key, not even temporary registers! (Say what!?)
mov vt0, vc[vt1.x]; //This doesn’t work! (at least from the few attempts I tried)
Another factor that kept me from going the “batching” way: in the long run, in a real-world application and/or game, if you were required to render sprites in a certain order (such as: background elements first, middle ground next, characters + enemies + collectible items next, foreground next, camera post-process “shader” and finally… HUD last), I think it would require a much more complex system to alternate between various Program3D instances (one for each type of rendering you’d have to do) to draw batches of each of those sprite categories.
On the other hand, using large vertex-buffers with interpolation trickery means you will have to come up with clever ways of using the vertices’ time factors (and possibly some other random seeds unique to each vertex / quad), and use only a small number of vertex-constants to pass in the current time step and other values.
Personally, I’d prefer to keep researching and experimenting with ways of optimizing (as you’ve mentioned ;)) the use of Context3D’s resources, and minimizing the number of trips and the quantity of data the CPU needs to update on the GPU. Even with the resource limitations listed in the AGAL documentation, I think we have more than enough to attempt a rendering system that uses even fewer draw calls than existing “batch”-call libraries.
The approach you proposed is somewhat similar to the one used by the ActionScript Physics Engine (APE), which uses timesteps to update the location, orientation, collisions, etc. of individual sprites (not vertices). Perhaps it could be modeled after it.
Hi, I’m a little confused by the explanation given. Each run of the vertex shader only outputs 1 processed vertex (i.e. 1 & only 1 op). If we batch them together, that means there’ll be multiple outputs in one run. I doubt you’re suggesting we can write multiple ops. How else can we output these processed vertices? Assuming we have processed 5 in the first run (1 + 4 extra for the batch), how do we tell AGAL to skip ahead to the 6th on the second run? As I read in Marco’s explanation, shaders are processed for each vertex sequentially. Do correct me if I’m wrong.
I’ll try to keep this one shorter haha!
True, you can only write to one “op” at a time in your vertex-shader.
Batch calls means that you will issue multiple “drawTriangles(…)” calls one after another during one frame cycle. You may also alternate between Program3D objects (but likely not, if you’re simply drawing the same type of particles / objects).
To offset your rendering passes, you can use another IndexBuffer3D that points to the next “batch” of vertices. OR, you can reuse the same one and only draw a certain quantity at a time: “drawTriangles(…)” accepts a start offset (firstIndex) and a triangle count (numTriangles) after the index buffer.
You just have to keep track of those indices, vertices, and which draw call you’re currently running. Whether you use a while-loop, a for-loop, or even a recursive function… that’s entirely up to you :)
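Context3D.drawTriangles() takes `firstIndex` and `numTriangles` arguments after the index buffer, so one shared quad index buffer can be drawn in slices. Here is a TypeScript sketch of the bookkeeping, assuming 6 indices (2 triangles) per quad, which is presumably what BPIndexBuffer.fromQuads() builds; `batchRanges` is a hypothetical helper. In ActionScript, each range would feed a call like `context3D.drawTriangles(indexBuffer, r.firstIndex, r.numTriangles)`, with the per-batch vertex constants updated before each call.

```typescript
interface DrawRange {
  firstIndex: number;   // offset into the index buffer
  numTriangles: number; // triangles to draw in this call
}

// Assumes 2 triangles / 6 indices per quad (the usual quad index layout).
const INDICES_PER_QUAD = 6;
const TRIANGLES_PER_QUAD = 2;

function batchRanges(totalQuads: number, quadsPerBatch: number): DrawRange[] {
  if (quadsPerBatch <= 0) throw new Error("quadsPerBatch must be positive");
  const ranges: DrawRange[] = [];
  for (let start = 0; start < totalQuads; start += quadsPerBatch) {
    const quads = Math.min(quadsPerBatch, totalQuads - start);
    ranges.push({
      firstIndex: start * INDICES_PER_QUAD,
      numTriangles: quads * TRIANGLES_PER_QUAD,
    });
  }
  return ranges;
}

// 10 quads in batches of 4: two full batches and one partial batch.
console.log(JSON.stringify(batchRanges(10, 4)));
// [{"firstIndex":0,"numTriangles":8},{"firstIndex":24,"numTriangles":8},{"firstIndex":48,"numTriangles":4}]
```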