During my PhD at the EPFL VRLAB, I spent 4 years building a full-featured, real-time crowd engine from scratch. Its name is YaQ.
This work has been achieved in close collaboration with two amazing people:
- Jonathan Maim, YaQ's daddy, and my closest friend, who also obtained his PhD from the VRLAB in 2009, and
- Mireille Clavien, the lab's 3D designer, who used our engine for many different purposes. She always had two lists for us: one of bugs to solve and one of new features to add ;-)
Here are some examples of the domains we have tackled with YaQ:
The idea behind crowd patches is to provide a solution for populating very large-scale scenes with crowds of virtual humans.
Indeed, simulating interactive crowds in such environments is usually
too expensive to be performed in real time. On the other hand, the
solution of completely pre-computing paths for every character is too
memory demanding.
We have developed a solution that pre-computes the simulation for a small patch of the environment. By interconnecting such patches, like dominoes, we can simulate very large crowds at low cost: collision avoidance is pre-computed inside the patches, so much less work remains to be done at runtime.
The simulation can run for as long as desired, because patches are
designed to be periodic: the computed trajectories can be replayed
endlessly and seamlessly over time.
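To illustrate periodic replay, here is a minimal Python sketch. It is not YaQ's actual code: it assumes a single closed-loop trajectory stored as timed 2D keyframes with linear interpolation, and the names `sample_periodic`, `times`, `points`, and `period` are hypothetical.

```python
from bisect import bisect_right


def sample_periodic(times, points, period, t):
    """Position at time t on a trajectory that loops every `period` seconds.

    Assumptions (hypothetical data layout, not taken from YaQ):
    - times: sorted keyframe times in [0, period), with times[0] == 0
    - points: (x, y) position at each keyframe
    - the last keyframe interpolates back to the first one at t == period
    """
    local = t % period                    # fold global time into one period
    i = bisect_right(times, local) - 1    # last keyframe at or before `local`
    j = (i + 1) % len(times)              # next keyframe, wrapping to the first
    t0 = times[i]
    t1 = times[j] if j > 0 else period    # the wrapped keyframe sits at the period end
    u = (local - t0) / (t1 - t0)          # interpolation factor in [0, 1)
    (x0, y0), (x1, y1) = points[i], points[j]
    return (x0 + u * (x1 - x0), y0 + u * (y1 - y0))
```

Because the time is folded with a modulo, `sample_periodic(times, points, 3.0, 0.5)` and `sample_periodic(times, points, 3.0, 3.5)` return the same position, which is what makes the replay seamless.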
Using these patches, we can populate vast environments of potentially
infinite size: patches are only placed where the camera is looking.
Assembling them is very fast and can be done at runtime, depending on the
camera position.
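The following Python sketch shows one way such camera-driven placement could work. It rests on simplifying assumptions that do not come from YaQ: patches are square cells of a regular grid, and "where the camera is looking" is approximated by a disc around the camera. The names `visible_patch_cells`, `view_radius`, and `patch_size` are hypothetical.

```python
import math


def visible_patch_cells(cam_x, cam_y, view_radius, patch_size):
    """Grid cells (i, j) of square patches that overlap a disc around the camera.

    Assumption: cell (i, j) covers [i*s, (i+1)*s] x [j*s, (j+1)*s], s = patch_size.
    """
    lo_i = math.floor((cam_x - view_radius) / patch_size)
    hi_i = math.floor((cam_x + view_radius) / patch_size)
    lo_j = math.floor((cam_y - view_radius) / patch_size)
    hi_j = math.floor((cam_y + view_radius) / patch_size)
    cells = set()
    for i in range(lo_i, hi_i + 1):
        for j in range(lo_j, hi_j + 1):
            # Closest point of the cell to the camera.
            nx = min(max(cam_x, i * patch_size), (i + 1) * patch_size)
            ny = min(max(cam_y, j * patch_size), (j + 1) * patch_size)
            if (nx - cam_x) ** 2 + (ny - cam_y) ** 2 <= view_radius ** 2:
                cells.add((i, j))
    return cells


def update_active_patches(active, wanted):
    """Return (cells to instantiate, cells to release) when the camera moves."""
    return wanted - active, active - wanted
```

Only the difference between two consecutive camera positions has to be processed, so the cost per frame depends on how far the camera moved, not on the size of the environment.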
Finally, since trajectories are pre-computed, local collision avoidance can be solved offline with highly accurate, time-consuming methods, which gives excellent results at runtime.
Check out our publication on crowd patches:
I3D09 (video)

Rendering crowds to make them realistic is a challenging problem that has many different aspects.
The first one is scalability, that is, how to render large crowds while keeping individuals as realistic as possible. To that end, we've developed 3 levels of detail in YaQ: deformable meshes close to the camera, rigid meshes at farther distances, and impostors for the mass of characters in the background.
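As a rough illustration, the choice between the three representations can be sketched as a distance test. This is a minimal Python sketch, not YaQ's selection code: the threshold values and the names `Lod`, `choose_lod`, `deformable_max`, and `rigid_max` are hypothetical.

```python
from enum import Enum


class Lod(Enum):
    DEFORMABLE = "deformable mesh"
    RIGID = "rigid mesh"
    IMPOSTOR = "impostor"


def choose_lod(distance, deformable_max=15.0, rigid_max=50.0):
    """Pick a representation from the distance to the camera.

    The two thresholds are hypothetical values, chosen for illustration only.
    """
    if distance <= deformable_max:
        return Lod.DEFORMABLE
    if distance <= rigid_max:
        return Lod.RIGID
    return Lod.IMPOSTOR
```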
Deformable meshes are composed of a skeleton, skinned with a mesh. At runtime, a skeletal animation is played, and the mesh is deformed accordingly on the GPU. Deformable meshes are costly to animate, because the mesh skinning is performed at runtime, but they are cheap in memory and flexible, allowing for procedural animations and blending.
Rigid meshes are pre-computed postures of a deformable mesh: given a skeletal animation, skinning is performed offline, and the set of deformed vertices is stored for each keyframe, to be reused online. Rigid meshes are fast to render, because the mesh deformation is pre-computed. On the other hand, they are more expensive in terms of memory, since all the mesh vertices and normals have to be stored for each keyframe of animation. They also make online procedural animations or blending impossible.
Last but not least, an impostor, or billboard, is represented by 2 triangles forming a quad, textured with an image of a virtual human mesh performing an animation. To create impostors offline, for each keyframe of an animation, images are taken from different points of view and stored in associated textures. This is the fastest representation and makes it possible to display massive crowds. However, impostors are very expensive to store in memory, so only a minimal set of animations is sampled this way. Also, rendering quality is usually poor, because the textures do not have a very high resolution. For this reason, impostors are used in the background only.
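At runtime, an impostor has to pick which pre-rendered image to show. The Python sketch below assumes, for illustration only, that the views were sampled at evenly spaced angles around the character's vertical axis; the source does not specify the sampling layout, and `impostor_view_index` and its parameters are hypothetical names.

```python
import math


def impostor_view_index(char_pos, char_heading, cam_pos, num_views):
    """Index of the pre-rendered view closest to the current viewing angle.

    Assumptions: positions are (x, y) on the ground plane, char_heading is in
    radians, and view k was captured at angle k * 2*pi / num_views.
    """
    dx = cam_pos[0] - char_pos[0]
    dy = cam_pos[1] - char_pos[1]
    # Angle from the character's facing direction to the camera, in [0, 2*pi).
    angle = (math.atan2(dy, dx) - char_heading) % (2 * math.pi)
    step = 2 * math.pi / num_views
    # Round to the nearest sampled view and wrap around the circle.
    return int(angle / step + 0.5) % num_views
```

The same kind of lookup would be combined with the current animation keyframe to address the right image in the impostor textures.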
Check out publications on YaQ and levels of detail:
VirPop09 CG&A09 (video)
Another problem inherent in crowd rendering is the appearance variety of the individuals.
The actors composing a crowd are often many instances of a small set of templates. Thus, if no further action is taken, many individuals in the crowd look like clones.
In YaQ, we've developed several techniques to make each crowd member look unique.
First of all, we've used segmentation maps, which are additional textures with the same parametrization as the human template's original texture. We use each channel of a segmentation map (R,G,B,A) to differentiate the template's body parts. At initialization, a color is chosen for each body part of each character. At runtime, these colors are retrieved on the GPU and used to render the character. As a result, two instances of the same template look different, because their clothes, skin, eyes, and hair colors are different.
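The per-texel logic can be sketched on the CPU as follows. This Python sketch is only an approximation of what a fragment shader might do: the source does not say how the chosen colors are combined with the original texture, so modulating the base texel by a weighted tint is an assumption, and `pick_part_colors` and `tint_texel` are hypothetical names.

```python
import random


def pick_part_colors(palettes, rng):
    """At initialization, choose one color per body part (one palette per part)."""
    return [rng.choice(palette) for palette in palettes]


def tint_texel(base_rgb, seg_weights, part_colors):
    """Modulate a base texel by the colors of the body parts covering it.

    seg_weights: the (R, G, B, A) values of the segmentation map at this texel,
    in [0, 1], one per body part. Uncovered texels keep their original color.
    Assumption: the tint is a weighted sum of part colors, padded with white.
    """
    covered = min(1.0, sum(seg_weights))
    tint = []
    for k in range(3):
        blended = sum(w * color[k] for w, color in zip(seg_weights, part_colors))
        tint.append(blended + (1.0 - covered))  # white fills the uncovered share
    return tuple(min(1.0, b * t) for b, t in zip(base_rgb, tint))
```

Each character only needs to store its handful of chosen colors, so many visually distinct instances can share the same template texture and segmentation map.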
To further vary instances, we have also implemented the fastening of accessories to skeleton joints. Accessories are small meshes representing various elements that people wear or carry, e.g., hats, wigs, bags, glasses, cellphones. Simple accessories, like wigs or hats, can be attached to a joint of the skeleton without editing the played animations. Other, more complex accessories, like bags, require the played animations to be modified at runtime. Depending on the accessory and the way it is held, we edit the animation at runtime to either freeze a joint (for instance, the elbow, to hold a balloon or a bunch of flowers) or constrain the orientation of a joint, in cases like holding a bag. To do so, we've exploited exponential maps.
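For readers unfamiliar with exponential maps, here is a minimal Python sketch. An exponential map encodes a rotation as a 3D vector whose direction is the rotation axis and whose length is the angle. The sketch shows the standard conversion to a quaternion and one possible way to constrain a joint by clamping the angle. How YaQ actually freezes or constrains joints is not described here, so `clamp_expmap` and `max_angle` are hypothetical illustrations.

```python
import math


def expmap_to_quat(v):
    """Rotation vector (axis * angle, radians) -> quaternion (w, x, y, z)."""
    angle = math.sqrt(sum(c * c for c in v))
    if angle < 1e-8:
        # Small-angle limit: sin(angle/2)/angle tends to 1/2 (approximately unit).
        return (1.0, 0.5 * v[0], 0.5 * v[1], 0.5 * v[2])
    s = math.sin(angle / 2.0) / angle
    return (math.cos(angle / 2.0), s * v[0], s * v[1], s * v[2])


def clamp_expmap(v, max_angle):
    """Limit the rotation angle while keeping the rotation axis unchanged.

    Hypothetical constraint, for illustration: a joint holding a bag could be
    prevented from rotating by more than `max_angle` radians.
    """
    angle = math.sqrt(sum(c * c for c in v))
    if angle <= max_angle:
        return tuple(v)
    k = max_angle / angle
    return tuple(k * c for c in v)
```

Under this representation, freezing a joint amounts to replacing its animated rotation vector with a fixed one at every frame, and constraining it is a simple operation on the vector's length.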
Check out our latest publication on crowd variety:
CG&A09 (video)
Moving characters need to plan the paths they will take, navigate
towards their goal, and avoid collisions with the environment and with
each other. These aspects are important challenges when dealing with
large crowds in real-time: the solutions need to be fast as well as
realistic.
To handle crowd navigation, we have introduced into YaQ a hybrid
architecture that handles real-time motion planning of thousands of
pedestrians with a level-of-detail approach.
In regions of high interest, i.e., close to the camera, or where an
important event is occurring, we use a long-term potential field-based
approach. This approach is particularly efficient, since the potential
field computation is limited to zones of high interest only.
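To give a feel for potential-field navigation, the Python sketch below uses a much simpler stand-in than the approach used in YaQ: a breadth-first distance field on an occupancy grid, which characters descend toward the goal. The grid layout and the names `distance_field` and `next_cell` are hypothetical.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def distance_field(grid, goal):
    """Breadth-first distance to `goal` over free cells (0 = free, 1 = obstacle).

    Simplified stand-in for a potential field; unreachable cells stay None.
    """
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and dist[nr][nc] is None):
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def next_cell(dist, cell):
    """Neighbor with the lowest potential, or `cell` itself if none is lower."""
    r, c = cell
    if dist[r][c] is None:
        return cell  # unreachable cell: stay in place
    best = cell
    for dr, dc in NEIGHBORS:
        nr, nc = r + dr, c + dc
        if (0 <= nr < len(dist) and 0 <= nc < len(dist[0])
                and dist[nr][nc] is not None
                and dist[nr][nc] < dist[best[0]][best[1]]):
            best = (nr, nc)
    return best
```

The field is computed once per goal and shared by every character heading there, which is why restricting it to the zones of high interest keeps the cost under control.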
In zones of lower interest, e.g., farther from the camera, the motion
planning technique we use is based on Navigation Graphs [Pettré et al. 2006] and a
short-term avoidance technique. This approach provides less realistic
results, but is faster than the potential field technique. The results
obtained are sufficiently realistic to be displayed in places of low
interest.
Finally, in regions of no interest, i.e., outside the view frustum,
navigation is ruled by a Navigation Graph. In such zones,
inter-pedestrian collisions do not need to be avoided, since they are
invisible.
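The three tiers described above can be summarized as a small dispatch function. This is a Python sketch under stated assumptions, not YaQ's code: interest is reduced to a frustum test, a distance to the camera, and an event flag, and the names `Planner`, `choose_planner`, and `high_interest_radius` (with its default value) are hypothetical.

```python
from enum import Enum


class Planner(Enum):
    POTENTIAL_FIELD = "potential field"
    GRAPH_WITH_AVOIDANCE = "navigation graph + short-term avoidance"
    GRAPH_ONLY = "navigation graph only"


def choose_planner(in_view_frustum, distance_to_camera,
                   near_event=False, high_interest_radius=20.0):
    """Select a motion planning technique from the interest of a region.

    high_interest_radius is a hypothetical threshold, for illustration only.
    """
    if not in_view_frustum:
        # No interest: collisions between pedestrians are invisible here.
        return Planner.GRAPH_ONLY
    if near_event or distance_to_camera <= high_interest_radius:
        # High interest: close to the camera, or an important event occurs.
        return Planner.POTENTIAL_FIELD
    # Low interest: visible, but far from the camera.
    return Planner.GRAPH_WITH_AVOIDANCE
```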
Tests and comparisons show that our architecture is able to
realistically plan motion for many groups of characters, for a total of
several thousand people in real time, and in varied environments.
Check out publications about this subject:
VC08 (video) CW07 (video)