SIGGRAPH 2019 Conference Content
Course: An Introduction to Physics-Based Animation
Course: My Favorite Samples
Demoscene Worldwide (BOF)
Frontiers: How Computer Graphics Expertise Will Further … Machine Learning
Frontiers: Imagining a Black Hole with the Event Horizon Telescope
Frontiers: Metric Telepresence
Frontiers: Speculative Futures
Frontiers: Visual Strategy
Keynote
Opening Ceremony / Awards
Production Session: “Space Explorers: Life in Orbit”
Real-Time Live!
Technical Papers Fast Forward
Technical Papers: AR and VR
A Deep Dive Into Universal Scene Description and Hydra
Advances in Real-Time Rendering in Games
Are We Done With Ray Tracing?
Capture4VR: From VR Photography to VR Video
CreativeAI: Deep Learning for Computer Graphics
Computational Fabrication
Differentiable Graphics With Tensorflow 2.0
Geometric Algebra for Computer Graphics
Geometric Computing With Python
Introduction to Real-Time Ray Tracing
My Favorite Samples (+ extended abstract)
On Hybrid Lagrangian-Eulerian Simulation Methods…
Open Problems in Real-Time Rendering (to appear)
Path Guiding in Production
Path Tracing in Production
Perception of Virtual Characters
Practical Course on Computing Derivatives in Code
RTX Accelerated Ray Tracing With OptiX (+ code)
See Ke-Sen Huang’s comprehensive papers list.
A Generative Model for Volume Rendering
A Low-Discrepancy Sampler that Distributes Monte Carlo Errors as a Blue Noise in Screen Space
A Scalable Real-Time Many-Shadowed-Light Rendering System (+ video)
A Practical Guide to Thin Film and Drips Simulation
A Vector Field Design Approach to Animated Transitions (TVCG)
Autofocals: Evaluating Gaze-Contingent Eyeglasses for Presbyopes
DeepLight: Learning Illumination for Unconstrained Mobile Mixed Reality (CVPR)
Firefly: Illumination Drones for Interactive Visualization (TVCG)
Foveated Displays: Toward Classification of the Emerging Field
Gaze-Contingent Ocular Parallax Rendering for Virtual Reality
Light Attenuation Display: Subtractive See-Through Near-Eye Display via Spatial Color Filtering (TVCG)
Mesh Wrap Based on Affine-Invariant Coordinates
Mortal Kombat 11: High Fidelity Cached Simulations in Real-Time
Motion parallax for 360° RGBD video (TVCG)
Practical Dynamic Lighting for Large-Scale Game Environments
Shrinking Circles: Adaptation to Increased Curvature Gain in Redirected Walking (TVCG)
Taming the Shadow Terminator
Effectiveness of Facial Animated Avatar and Transformed Voice in eLearning Programming Course
Glove Puppetry Cloud Theater Through a Virtual Reality Network
Layered Reconstruction of Stippling Art.
Learning From Human-Robot Interactions in Modeled Scenes
MagicPAPER: Tabletop Interactive Projection Device Based on Tangible Interaction
Meet in Rain: A Serious Game to Help the Better Appreciation of Chinese Poems
Puppeteered Rain: Interactive Illusion of Levitating Water Drops by Position-Dependent Strobe Projection
VRPropNet: Realtime Interaction with Virtual Props
A Design for Optical Cloaking Display
A Stretch-Sensing Soft Glove for Interactive Hand Pose Estimation
A Transparent Display With Per-Pixel Color and Opacity Control
Active Textile Tailoring
Arque: Artificial Biomimicry-Inspired Tail for Extending Innate Body Functions
Demonstrating Preemptive Action: Accelerating Human Reaction…
EyeHacker: Gaze-Based Automatic Reality Manipulation
HAPTIC PLASTeR: Soft, Thin, Light, and Flexible Haptic Display…
Liquid to Air: Pneumatic Objects
MagniFinger: Fingertip-Mounted Microscope for Augmenting Human Perception
Matching Visual Acuity and Prescription: Towards AR for Humans
Melody Slot Machine
PickHits: Hitting Experience Generation With Throwing Motion via a Handheld Mechanical Device
PinocchioVR (+ video)
Shading Atlas Streaming
ShapeSense: A 2D Shape-Rendering VR Device With Moving Surfaces…
Space Walk: A Combination of Subtle, Redirected Walking Techniques…
TeeVR: Spatial Template-Based Acquisition, Modeling, and Rendering of Large-Scale Indoor Spaces
TeleSight: Enabling asymmetric collaboration in VR between HMD user and Non-HMD users
Academy Software Foundation: Open Source Day (recordings)
Demoscene Worldwide (recording)
Khronos Group (+ recordings: morning, afternoon)
VFX Reference Platform…
Autodesk Vision Series
Intel
Maxon Cinema 4D: day 1, day 2, day 3 (recordings)
NVIDIA
RenderMan Art & Science Fair (recordings)
SideFX: Houdini HIVE (recordings)
SideFX: Solaris (recording)
Unity (recordings)
Unreal Engine User Group (recording)
Last time around, I showed that we could improve upon Imageworks’ multiple-scattering approximation by precalculating a new term, $\Ems$, from the Heitz model itself. This is, of course, a well-trodden path in rendering: if you can’t use something directly then try to tabulate or fit it instead!
To make this more practical, I also outlined a multiple-scattering variant of the split integral trick already in common use in real-time rendering, which allows the specular colour, $\Fr$, to be factored out.
I’ll now go into the details of this precomputation process, and also show how, with further crunching, we can reduce the storage and runtime cost even more.
A practical approach for precomputing the directional albedo of a single-scattering BRDF is via Monte Carlo integration with BRDF importance sampling.^{1} We can calculate $\Ems$ from the Heitz model in a similar way via its random-walk sampling process.
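As a rough illustration of the single-scattering case, the sketch below estimates the directional albedo of a GGX lobe with Fresnel set to 1, by importance sampling the half vector. It is not the code behind the figures in this post: the separable Smith term, the `alpha` width parameter (however it is derived from roughness) and all function names are assumptions made for this example.

```python
import math
import random


def smith_g1_ggx(mu, alpha):
    """Smith masking term G1 for GGX, for a direction with cosine mu."""
    return 2.0 * mu / (mu + math.sqrt(alpha * alpha + (1.0 - alpha * alpha) * mu * mu))


def directional_albedo_ggx(mu_o, alpha, num_samples=20000, seed=1):
    """Monte Carlo estimate of the directional albedo E(mu_o), with F = 1."""
    rng = random.Random(seed)
    # View vector in the local frame, where the surface normal is +Z.
    v = (math.sqrt(max(0.0, 1.0 - mu_o * mu_o)), 0.0, mu_o)
    total = 0.0
    for _ in range(num_samples):
        u1, u2 = rng.random(), rng.random()
        # Sample a half vector h with pdf D(h) * (n.h).
        cos_t = math.sqrt((1.0 - u1) / (1.0 + (alpha * alpha - 1.0) * u1))
        sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
        phi = 2.0 * math.pi * u2
        h = (sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t)
        v_dot_h = v[0] * h[0] + v[1] * h[1] + v[2] * h[2]
        # Reflect v about h; only the z component (the light cosine) is needed.
        mu_i = 2.0 * v_dot_h * h[2] - v[2]
        if mu_i <= 0.0 or v_dot_h <= 0.0:
            continue  # light direction below the horizon: no contribution
        g = smith_g1_ggx(mu_o, alpha) * smith_g1_ggx(mu_i, alpha)
        # BRDF * cosine / pdf simplifies to G * (v.h) / ((n.h) * (n.v)).
        total += g * v_dot_h / (h[2] * mu_o)
    return total / num_samples
```

Evaluating this over a grid of `mu_o` and `alpha` values would give a 2D table of single-scattering albedo; for very small `alpha` the estimate should approach 1, since almost no energy is lost on a near-mirror surface.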
Conceptually, each random walk starts with a ray coming from the camera, which bounces one or more times on the microsurface before finally escaping it:
Figure 1: An illustration of the random-walk sampling process. (Courtesy of Eric Heitz.)
At each bounce, some energy is lost due to absorption and the rest is reflected.^{2} In the case of our copper conductor test subject, the final energy (or energy throughput) is simply the product of Fresnel reflection at each bounce.
If we perform this process many times and average the final energy of every multi-bounce walk – i.e., treating single-scattering walks as having zero energy – then we obtain $\Ems$, as shown in the last post:
Figure 2: $\Ems$ for copper GGX material.
As I previously discussed, we don’t want to precompute $\Ems$ directly, since it bakes in $\Fr$. Instead, we’ll precalculate factors, $\wa \ldots \wnm$, which can be used to reconstruct $\Ems$ at run time for any $\Fr$:
To achieve this, we’re going to make two very simple modifications to the random walk process:
The walk will now keep track of a vector of energy throughputs, $\ea \ldots \enm$, rather than a single value. At the beginning of the walk, $\ea$ is initialised to 1 and $\eb \ldots \enm$ to 0.
At each scattering event, the energy throughput was previously multiplied by a Fresnel factor, $F(\omega_m, \omega_r)$, where $\omega_m$ is a sampled normal and $\omega_r$ is the outgoing ray direction. If we use the Schlick Fresnel approximation for $F$, this is
We will now use $\ei$ to track the fraction of energy scaled by $\Fr^i$.
From Equation $\ref{eq:fres}$, we can see that after the first bounce, a factor $\sia$ of the energy is multiplied by $\Fr$ and the rest, $\ssa$, is left untinted, so we set $\ea = \ssa$ and $\eb = \sia$. On the next bounce, with a new Fresnel factor $\ssb$, we will have $\ea = \ssa \ssb$, $\eb = \sia \ssb + \ssa \sib$ and $\ec = \sia \sib$. More energy has moved from order 0 to order 1, and some from order 1 to order 2.
Working things through, the update to our energy throughput vector at each bounce can be written as
Or in code:
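A minimal sketch of this update, not the original listing: `schlick_split` and `bounce_update` are hypothetical names, and the cosine passed to Schlick's formula is assumed to be the one between the sampled normal and the outgoing ray direction.

```python
def schlick_split(cos_theta):
    """Split Schlick Fresnel into (tinted, untinted) fractions.

    Schlick's F = F0 + (1 - F0) * (1 - cos)^5 can be regrouped as
    F0 * tinted + untinted, with untinted = (1 - cos)^5.
    """
    untinted = (1.0 - cos_theta) ** 5
    return 1.0 - untinted, untinted


def bounce_update(e, tinted, untinted):
    """Apply one bounce to the per-order energy throughputs.

    e[i] is the fraction of energy scaled by F0**i so far. Energy that
    would move past the last tracked order is dropped.
    """
    out = [0.0] * len(e)
    for i in range(len(e)):
        out[i] = e[i] * untinted          # stays at order i
        if i > 0:
            out[i] += e[i - 1] * tinted   # promoted from order i - 1
    return out


# Two bounces, starting with all energy at order 0.
e = [1.0, 0.0, 0.0, 0.0]
e = bounce_update(e, 0.75, 0.25)
e = bounce_update(e, 0.5, 0.5)
```

Summing `e[i] * F0**i` recovers the ordinary product of Fresnel factors for any particular `F0`, which is what makes the per-order bookkeeping equivalent to the original walk.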
The final $\wa \ldots \wnm$ factors that we’re after are simply the averages of $\ea \ldots \enm$ over multiple runs of the random walk process (again, ignoring single scattering):
Figure 3: Multiple-scattering Fresnel factors for GGX.
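The averaging step can be sketched as follows. This is an illustration only: it assumes each walk has already been recorded as a list of per-bounce `(tinted, untinted)` Fresnel fractions, which is a made-up representation rather than anything taken from the demo.

```python
def average_factors(walks, num_orders):
    """Average per-order throughputs over walks, ignoring single scattering.

    `walks` is a list of walks; each walk is a list of (tinted, untinted)
    pairs, one per bounce. Returns one averaged factor per order of F0.
    """
    w = [0.0] * num_orders
    if not walks:
        return w
    for bounces in walks:
        if len(bounces) < 2:
            continue  # single-scattering walks count as zero energy
        e = [1.0] + [0.0] * (num_orders - 1)
        for tinted, untinted in bounces:
            e = [e[i] * untinted + (e[i - 1] * tinted if i > 0 else 0.0)
                 for i in range(num_orders)]
        for i in range(num_orders):
            w[i] += e[i]
    # Divide by the total walk count, so skipped walks contribute zeros.
    return [x / len(walks) for x in w]
```

Dividing by the total number of walks, rather than only the multi-bounce ones, is what keeps the result consistent with treating single-scattering walks as having zero energy.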
You can see the whole process in action in this WebGL demo, which reproduces Figure 2 as well as the decomposition shown in Figure 3.
Note: there’s quite a bit going on in this demo, but I didn’t want to get into the weeds in this post, so I’ll write up some separate notes at a later date.
Although it may be hard to tell from Figure 3, there is some residual energy in the bottom right corner^{3} of $\wc$, and this is true for a few higher orders as well. This implies that we’d need two or three textures to store all of the factors for fully accurate results, as well as a fair amount of maths in the shader. It would be nice to slim this down!
Fortunately, since most of the energy is in the lower orders, the source polynomials for $\Ems$ can be accurately refitted to lower-order cubic curves. For instance, here’s a plot of $\Ems$ for the bottom right corner, together with the refit:
This means that, in practice, we only need a single four-channel texture and three MAD instructions (when Equation $\ref{eq:ms_sum}$ is written in Horner form) to reconstruct $\Ems$ in the shader:
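A sketch of that reconstruction, in Python rather than shader code. It assumes the four texture channels hold the cubic's coefficients in ascending order of $\Fr$; the channel layout and the names below are guesses, not taken from the post.

```python
def reconstruct_ems(coeffs, f0):
    """Evaluate c0 + c1*F0 + c2*F0^2 + c3*F0^3 in Horner form.

    Each step is a single multiply-add, so the whole cubic costs three.
    """
    c0, c1, c2, c3 = coeffs
    return ((c3 * f0 + c2) * f0 + c1) * f0 + c0


def reconstruct_ems_rgb(coeffs, f0_rgb):
    """Reuse one set of fetched coefficients for each colour channel."""
    return tuple(reconstruct_ems(coeffs, f0) for f0 in f0_rgb)
```

In a shader the same expression would typically be evaluated on a three-component `F0` at once, so a coloured metal still needs only one texture fetch.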
I did this refitting in Mathematica using MiniMaxApproximation to minimise relative error, which proved important for accurately reproducing the lower end of the curves.
You can see the end result for yourself in this second WebGL demo, which compares Imageworks along with the improvements discussed so far (left half), against the Heitz reference (right half).
Note: I realise that not everyone has ready access to Mathematica, so I’m planning to rewrite the refitting step in either Python or C++ before publishing the complete end-to-end process on GitHub.
Up until now, I’ve been probing Imageworks’ energy-preservation solution, which approximates microfacet multiple scattering with a diffuse-like lobe. However, there are other potentially valid options here. For instance, concurrently with Imageworks’ investigations, Emmanuel Turquin developed a different solution while at Industrial Light & Magic, which instead approximates the multiple scattering via a rescaled single-scattering lobe. This approach has since been adopted within Unity’s High Definition Render Pipeline^{4} as well as Google’s Filament.
With permission, I have the privilege of hosting a technical report from Manu that goes into all of the details of his approach and various shortcuts, along with some comparisons to Imageworks and Heitz. That said, in conclusion he also states:
“Comparisons in this report have been largely of a qualitative nature. A more thorough and quantitative analysis of the differences between the three main methods is left for future work.”
In the next post, I hope to go some way towards providing this more detailed analysis. In the meantime, you should read Manu’s TR!
For an example of this, see the Image-Based Lighting section of [Karis 2013].↩
Since the focus is still on conductors, I’m ignoring refraction and transmission for now.↩
Corresponding to low view angles and high roughness values.↩
See the Multiple Scattering GGX section of [Lagarde and Golubev 2018].↩
SIGGRAPH 2018 Conference Content
Cesium: 3D Globes on the Web
Color Mavens Advise on Digital Media Creation and Tools
Deep Learning: A Crash Course
Demoscene Worldwide
Fluids 2: Vortex Boogaloo
Introduction to DirectX Raytracing
Keynote Address
Khronos Group BOF Sessions
Ohooo Shiny!
Real-Time Live!
SideFX: Houdini HIVE
The Present and Future of Real-Time Graphics for Film, Games, Production
Tripping the Light VR
VR@50: Celebrating Ivan Sutherland’s 1968 Head-Mounted 3D Display System
Women in CG
3D User Interfaces for Virtual Reality and Games
Advances in Real-Time Rendering in Games course
Applications of Vision Science to Virtual and Augmented Reality
Deep Learning: A Crash Course (recording)
Digital Typography Rendering
Getting Started with WebGL and Three.js (slides^{1})
Introduction to DirectX Raytracing
Introduction to the Vulkan Graphics API
Machine Learning and Rendering
Monte Carlo Methods for Physically Based Volume Rendering
Moving Mobile Graphics
Pathtracing in Production
Realistic Rendering in Architecture and Product Visualization
See Ke-Sen Huang’s comprehensive papers list.
A Compact Representation for Multiple Scattering in Participating Media…
ActiVis: Visual Exploration of Industry-Scale Deep Neural Network Models
Adaptive Environment Sampling on CPU and GPU
Automatic Photo-from-Panorama for Google Maps
Augmented Reality for Virtual Set Extension
Automating the Handmade: Shading Thousands of Garments for ‘Coco’
Bidirectional Path Tracing Using Backward Stochastic Light Culling
ChromaGlasses: Computational Glasses for Compensating Colour Blindness
Classified Texture Resizing for Mobile Devices
Clean Cloth Inputs: Removing Character Self-Intersections with Volume Simulation
Creating the Unreal: Speculative Visions for Future Living Structures
DataInk: Direct and Creative Data-Oriented Drawing
Deep Thoughts on Deep Image Compression
Denoising at Scale for Massive Animated Series
Digital Albert Einstein, a Case Study
Engineering Full-Fidelity Hair for ‘Incredibles 2’
Fast Product Importance Sampling of Environment Maps
Fractal Multiverses in VR
Patch-Based Surface Relaxation
Practical Denoising for VFX Production Using Temporal Blur
Regularization of Voxel Art
Robust Skin Simulation in ‘Incredibles 2’
Synthesising Panoramas for Non-Planar Displays: A Camera Array Workflow
Taming the Swarm: Rippers on ‘Pacific Rim Uprising’
The Making of ‘Welcome to Light Fields VR’
Zero to USD in 80 Days: Transitioning Feature Production to USD at DreamWorks
3D Content Creation Exploiting 2D Character Animation
Beckett in VR: Exploring Narrative Using Free Viewpoint Video
BOLCOF: Base Optimization for Middle Layer Completion of 3D-Printed Objects…
Conservative Z-Prepass for Frustum-Traced Irregular Z-Buffers
CRISPR/Cas9-NHEJ: Action in the Nucleus
Deep Motion Transfer Without Big Data
Depth Assisted Full Resolution Network for Single Image-Based View Synthesis
Design Method of Digitally Fabricated Spring Glass Pen
Efficient Multispectral Facial Capture With Monochrome Cameras
Make Your Own Retinal Projector: Retinal Near-Eye Displays Via Metamaterials
MegaParallax: 360° Panoramas with Motion Parallax
Progressive Real-Time Rendering of Unprocessed Point Clouds
Solar Projector
Stitch: An Interactive Design System for Hand-Sewn Embroidery
Which BSSRDF Model is Better for Heterogeneous Materials?
A Full-Color Single-Chip-DLP Projector…
AerialBiped: A New Physical Expression by the Biped Robot Using a Quadrotor
FairLift: Interaction with Midair Images on Water Surface
Fusion: Full Body Surrogacy for Collaborative Communication
GumGum Shooting
HapCube: A Tactile Actuator…
HeadLight: Egocentric Visual Augmentation by Wearable Wide Projector
Human Support Robot (HSR)
LevioPole: Mid-Air Haptic Interactions Using Multirotor
Make Your Own Retinal Projector: Retinal Near-Eye Displays via Metamaterials
SEER: Simulative Emotional Expression Robot
Steerable Application-Adaptive Near-Eye Displays
Transcalibur: Weight Moving VR Controller…
Transmissive Mirror Device Based Near-Eye Displays with Wide Field of View
VPET - Virtual Production Editing Tools
WindBlaster: A Wearable Propeller-Based Prototype…
ACESNext: Charting the Future of ACES
Demoscene Worldwide: Making an animation 18 bytes at a time
Gaffer
Khronos Group BOFs (glTF, Vulkan, WebGL, …)
MaterialX
OpenColorIO
Open Shading Language: OSL in 3ds Max
USD and OpenSubdiv: Tilt Brush
Unity: Customizing a Production Render Pipeline
Unity: Scriptable Render Pipeline From Scratch
Allegorithmic: Substance Day
Intel
NVIDIA
AR Transformation Mask
Character Adventures - Rigging, Animation and Crowds
Delayed Load Rendering Workflow
Design Through FX
Does your modelling department suck?
H17 Sneak Peek: PolyDraw
Houdini Concepts for the Maya Artist
Houdini Foundations - Dynamics
Hypercubes for VR Noobs
Procedural Workflows with Houdini and Unity…
The Lazy Artist’s Guide to Realtime VFX
What Makes a CG Superstar?
Academy Software Foundation
Boundary First Flattening
Walter
Use arrow keys to navigate.↩
In the last post, I showed how a small tweak to Imageworks’ multiple-scattering Fresnel term, $\color{brown}{F_\mathrm{ms}}$, brought their approximation closer to the reference solution of Heitz et al. However, there was still a niggling difference at high roughness, most noticeable under uniform lighting:
Figure 1: Furnace test, with roughness $\in [\frac{1}{8}, \frac{2}{8}, \ldots 1]$ (left halves: Imageworks, right halves: Heitz).
At the end of the day, $\color{brown}{F_\mathrm{ms}}$ is based on a simple diffuse model, and its limitations become more apparent as multiple scattering increases with roughness.
A more accurate alternative would be to precalculate a multiple-scattering directional albedo LUT that incorporates Fresnel, directly from the Heitz model. This new term, which I’ll call $\color{teal}{E_\mathrm{Fms}}$, leads to the following minor change to the multiple-scattering lobe:
The good news is that this version produces results that match Heitz in a furnace environment:
Figure 2: Furnace test, with roughness $\in [\frac{1}{8}, \frac{2}{8}, \ldots 1]$ (left halves: with $\Ems$, right halves: Heitz).
The bad news is that we need a 3D LUT for $\Ems$, since it depends not only on the view angle ($\mu_o$) and roughness, but also $\Fr$ (or specular reflectance). Worse, for coloured metals – such as our lovely copper example – we’d need to do three separate 3D lookups for R, G and B. The only silver lining is that this cost can potentially be amortised across lights, but it’s still less than ideal for real-time applications.
Note: a secondary concern is that this change also breaks the reciprocity of the multiple-scattering model, since the results will be different if $\mu_o$ and $\mu_i$ are swapped. While this isn’t a practical issue for real-time rendering, it could be a problem in other contexts (e.g. bidirectional path tracing).
If we’re willing to restrict ourselves to Schlick’s Fresnel approximation^{1}, then we can adapt a popular approach that’s been used with environmental lighting in games for a number of years [Drobot 2013; Karis 2013; Lazarov 2013]. In this context, the roughness- and direction-dependent effects of microfacet shadowing and Fresnel reflection are factored out and preintegrated ahead of time for a given BRDF. At run time, this term is combined with separately prefiltered environment maps to produce a cheap but effective approximation of the real integral of the BRDF and the lighting.
This idea goes back further (see the Ambient BRDF of [Gotanda 2010]), but the newer variants are more compact as they exploit the linearity of Schlick Fresnel, $\Fs$, to factor out $\Fr$, which reduces the dimensionality of the preintegrated table (or fit, in the case of [Lazarov 2013]). I’ll quickly recap how this works, as it naturally extends to a solution for $\Ems$.
What these approaches are effectively calculating is a version of the single-scattering directional albedo, $\E$, that incorporates $\Fs$. Let’s call this $\Ess$:
where $\fssp$ is the single-scattering BRDF without Fresnel.
The key observation is that Schlick Fresnel’s additive form
allows $\Ess$ to be split into two parts:
one that will be tinted by $\Fr$ of the material at run time, and another that’s left untinted. This means that we can precompute a 2D LUT containing these two terms, rather than needing a 3D LUT.
A little more formally, we can view this as decomposing $\E$ into two factors, $\wa$ and $\wb$:
which are then multiplied by two orders of $\Fr$ and summed to form $\Ess$:
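Written out in symbols, the split described above can be sketched as follows. This is a reconstruction from the surrounding text rather than the post's own equations: $\theta_d$ (the angle used in Schlick's formula, assumed here to be between the light direction and the half vector) and the shorthand $c$ are introduced purely for this example, and $\wa$ is assumed to be the factor for the zeroth order of $\Fr$.

$$
\begin{aligned}
\Fs &= \Fr + (1 - \Fr)\,c \;=\; \Fr\,(1 - c) + c, \qquad c = (1 - \cos\theta_d)^5 \\
\Ess &= \int \fssp\,\Fs\,\mu_i\,\mathrm{d}\omega_i
      \;=\; \Fr \int \fssp\,(1 - c)\,\mu_i\,\mathrm{d}\omega_i \;+\; \int \fssp\,c\,\mu_i\,\mathrm{d}\omega_i \\
\wa &= \int \fssp\,c\,\mu_i\,\mathrm{d}\omega_i, \qquad
\wb = \int \fssp\,(1 - c)\,\mu_i\,\mathrm{d}\omega_i, \qquad
\wa + \wb = \E \\
\Ess &= \wa + \wb\,\Fr
\end{aligned}
$$

Neither $\wa$ nor $\wb$ depends on $\Fr$, which is why a 2D table over view angle and roughness is enough.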
Apologies if I have laboured the point, but hopefully you can see where this is going: we can do a similar decomposition with the multiple-scattering albedo, $1 - \E$.
This time I’ll present things visually, since I think we’ve seen enough integrals for now. First, here’s $1 - \E$ for GGX, which you may recognise from Imageworks’ slides:
Figure 3: $1 - \E$, for GGX.
and here is the decomposition into factors for the various orders of $\Fr$, over the ($\mu$, roughness) domain:
Figure 4: Multiple-scattering Fresnel factors for GGX.
Naturally we have more factors this time, since with multiple scattering there could be $1 \ldots N$ additional reflections before light leaves the microsurface. Given these factors, we can calculate $\Ems$ thusly:
As with the approach for environmental lighting, these factors can be precomputed and stored in 2D LUTs. At run time, we fetch the appropriate factors (based on view angle and roughness) and calculate $\Ems$ using Eq. $\ref{eq:ms_sum}$.
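Spelled out with the factors from Figure 4, and assuming (as in the single-scattering case) that the first factor multiplies the zeroth power of $\Fr$, the sum takes the form:

$$
\Ems = \wa + \wb\,\Fr + \wc\,\Fr^2 + \ldots
$$

For a coloured metal, the same fetched factors are reused for each of R, G and B, with only $\Fr$ changing per channel.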
While this hopefully all makes sense, it’s a little abstract, so let’s visualise $\Ems$ for our copper material:
Figure 5: $\Ems$ for copper GGX material.
As we can see, it’s just like the multiple-scattering albedo shown in Figure 3, only now it’s been tinted by the different orders of Fresnel reflection. Note how the saturation increases from top left (low roughness, grazing angle) to bottom right (high roughness, incident view angle), as we would expect^{2}.
Finally, here are the spheres again with this LUT-based solution, this time under direct lighting:
Figure 6: Lit spheres, with roughness $\in [\frac{1}{8}, \frac{2}{8}, \ldots 1]$ (left halves: with $\Ems$, right halves: Heitz).
In this example, our revised multiple-scattering approximation (Eq.$~\ref{eq:fms}$) is barely distinguishable from the Heitz model. The only slight difference, at roughness = 1, comes from the multiple-scattering lobe not being a perfect match to the ground truth, as we already saw in the last post:
Figure 7: GGX with multiple scattering, roughness = 1 (left half: Imageworks, right half: Heitz).
I will stop things here as this post has already reached a comfortable reading length, but I hope you’ll agree that we’ve made some progress.
In the next post, I’ll cover how the precomputation of $\Ems$ is achieved in practice, and how it can be further optimised for real-time use.
This is an entirely reasonable choice given its popularity in real-time rendering, and it’s actually what I have been using in all of my examples so far.↩
Of course this is already visualised by the decomposition in Figure 4, but it’s consistent with the behaviour of the Fresnel function and the average number of bounces increasing with roughness.↩
In the last post, we saw that significant energy was being lost due to the common single-scattering limitation of microfacet-based shading models:
An intuitive way to think about this is that these BRDFs are only modelling direct lighting (= single scattering) of the microsurface heightfield. Indirect lighting (= multiple scattering) is not simulated, and that is the cause of the missing energy in the image above.
So, why do we have this limitation? Well, direct lighting and shadowing are relatively straightforward to do in real time compared to global illumination, and the same is true of single versus multiple scattering in microfacet shading models.
Single scattering is solved efficiently by making certain simplifying assumptions about how microfacets of a microsurface are arranged. Through this, it’s possible to come up with analytic expressions for how light is directly reflected by the microsurface, while incorporating self-shadowing.
This second aspect is modelled, naturally enough, by the shadowing term (part of the shadowing-masking term, $G$) of standard microfacet BRDFs. The Smith shadowing term [Smith 1976] is currently the most popular option here, and it has been shown by Heitz [2014] to produce results that are a close match to brute-force simulation. This is impressive considering that Smith’s model is pretty simple in terms of the microsurface assumptions that it makes.
Given the desirable properties of the Smith model (simple yet plausible, and also widely used), Heitz et al. [2016] chose to use it as the foundation for a new multiple-scattering model. It derives directly from Smith’s microsurface assumptions and is evaluated through a random walk process. As a result, all orders of scattering are accounted for, and energy conservation is achieved as a natural consequence.
Let’s return now to our earlier spheres, this time rendered using the Heitz model:
The rougher spheres are certainly a lot brighter than before, and placing them under uniform lighting (which matches the background) confirms that energy is now completely conserved:
Heitz et al. also showed that, just as with Smith and single scattering, their multiple-scattering model has similar behaviour to a brute-force simulation. Given that property, I will proceed to use their model as a ground truth reference to compare Imageworks’ approach against.
In contrast to the Heitz model, Imageworks’ solution is an approximation that attempts to compensate for the missing energy, rather than actually simulate the physical process of multiple scattering. This is achieved by adding an extra multiple-scattering lobe, $\fms$ – based on [Kelemen and Szirmay-Kalos 2001] – to the existing single-scattering BRDF, $\fss$:
Note: $\mu_o$ and $\mu_i$ are simply the view and light cosines, i.e. $n \cdot v$ and $n \cdot l$.
All of the details can be found in Kulla and Conty’s excellent presentation, but I’ll cover the salient points. Briefly, there are two terms that make up this additional Kelemen lobe:
$\Emo$ is equivalent to what we saw previously with the furnace test result of a fully reflective, single-scattering material:
It’s the fraction of incoming light that leaves the microsurface after a single bounce, for the view angle $\mu_o$. The new lobe, $\fms$, is designed to account for the remainder, $1 - \Emo$, so that the directional albedo of $\fss + \fms$ is 1, i.e. perfect energy conservation.
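For reference, the lobe in Kulla and Conty's slides has the form $(1 - \Emo)\,(1 - E(\mu_i)) \,/\, (\pi\,(1 - \Eavg))$, where $E(\mu_i)$ is the same albedo evaluated at the light cosine and $\Eavg = 2\int_0^1 E(\mu)\,\mu\,\mathrm{d}\mu$. The sketch below checks numerically that this lobe integrates to the missing $1 - \Emo$. The albedo function `toy_E` is a made-up stand-in, not real GGX data, and the function names are hypothetical.

```python
import math


def toy_E(mu):
    """Made-up single-scattering albedo, standing in for a real table."""
    return 0.4 + 0.3 * mu


def e_avg(E, n=2000):
    """Cosine-weighted average: 2 * integral_0^1 E(mu) * mu dmu (midpoint rule)."""
    h = 1.0 / n
    return 2.0 * sum(E((k + 0.5) * h) * (k + 0.5) * h for k in range(n)) * h


def f_ms(E, avg, mu_o, mu_i):
    """Compensation lobe for a fully reflective material."""
    return (1.0 - E(mu_o)) * (1.0 - E(mu_i)) / (math.pi * (1.0 - avg))


def albedo_f_ms(E, mu_o, n=2000):
    """Integrate f_ms * mu_i over the hemisphere.

    The lobe has no azimuthal dependence, so the solid-angle integral
    reduces to 2 * pi times an integral over mu_i.
    """
    avg = e_avg(E, n)
    h = 1.0 / n
    integral = sum(f_ms(E, avg, mu_o, (k + 0.5) * h) * (k + 0.5) * h
                   for k in range(n)) * h
    return 2.0 * math.pi * integral
```

For any view cosine, `albedo_f_ms(toy_E, mu_o)` should agree with `1 - toy_E(mu_o)` up to rounding, which is the energy that single scattering leaves behind.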
For materials that aren’t 100% reflective, there’s a further term, $\Fms$, which I’ll discuss later. For now, let’s stick with our “fully reflective” assumption (in which case $\Fms = 1$) and examine the results of adding $\fms$:
At first glance, the render looks pretty similar to the Heitz model, and we can see with a furnace test that energy is again conserved:
With a side-by-side comparison for each sphere (left half: Imageworks, right half: Heitz), we can see that indeed the two methods are very close, with only a minor visual difference at roughness = 1 (far right):
Here’s a zoomed-in view of that particular case:
This is a promising early result, and I think you would be hard-pressed to tell the difference between the two in real production scenarios containing more complex lighting and spatially varying roughness, for instance.
Now let’s see how things fare with more general conductors. To handle this case, Imageworks adapted a multiple-scattering Fresnel term, $\Fms$, from [Jakob et al. 2014] (Expanded Technical Report, Section 5.6), which accounts for absorption/tinting as light bounces multiple times on the microsurface:
where $\Favg$ is the cosine-weighted average of the Fresnel function, $F$, over the hemisphere.
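As an illustration of what that average looks like in practice, the sketch below computes it numerically for Schlick's approximation and compares the result against the closed form $(1 + 20\,F_0)/21$, which follows from integrating $2\mu\,(1-\mu)^5$ over $[0, 1]$. Schlick Fresnel is an assumption here, as are the function names.

```python
def schlick(f0, mu):
    """Schlick's Fresnel approximation for normal-incidence reflectance f0."""
    return f0 + (1.0 - f0) * (1.0 - mu) ** 5


def f_avg_numeric(f0, n=10000):
    """Cosine-weighted average: 2 * integral_0^1 F(mu) * mu dmu (midpoint rule)."""
    h = 1.0 / n
    return 2.0 * sum(schlick(f0, (k + 0.5) * h) * (k + 0.5) * h
                     for k in range(n)) * h


def f_avg_schlick(f0):
    """Closed form of the same average for Schlick Fresnel."""
    return (1.0 + 20.0 * f0) / 21.0
```

The closed form works per colour channel, so a coloured conductor simply gets a three-component average.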
Here is a new side-by-side comparison (left half: Imageworks, right half: Heitz), with a copper material:
There’s now a bit more of a difference between the two approaches, which is easier to see in a furnace:
Evidently the Imageworks result is lighter and less saturated than Heitz at higher roughness.
When I first saw this difference, I was tempted to conclude that $\Fms$ is simply an approximation that fails to accurately model the complexities of real multiple scattering. However, a closer look at the derivation of this term revealed a problem.
The diffuse multiple-scattering model of Jakob et al. assumes that $\Eavg$ is the fraction of light that escapes the microsurface after each scattering event, leaving $1 - \Eavg$ to continue to bounce. Furthermore, each reflection is assumed to attenuate the light energy by $\Favg$. This means that after the first bounce, the fraction of light energy leaving the surface is $\Favg\,\Eavg$, followed by $\Favg\,\Eavg\,\Favg\,(1 - \Eavg)$ for the second bounce, etc. $\Fms$ is the total factor if we sum over all orders of scattering:
Note: this is equivalent to the interreflection model of Stewart and Langer [1996] (Equation 2).^{2}
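Written out, the sum described above is a geometric series. The following is a reconstruction from the text, using the same symbols, with $k$ counting the extra bounces before the light escapes:

$$
\Fms \;=\; \Favg\,\Eavg \sum_{k=0}^{\infty} \bigl(\Favg\,(1 - \Eavg)\bigr)^{k}
\;=\; \frac{\Favg\,\Eavg}{1 - \Favg\,(1 - \Eavg)}
$$

The $k = 0$ term is the single-scattering contribution, $\Favg\,\Eavg$.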
The problem is that this model is including single-scattering events ($\Favg\,\Eavg$), which we’ve already accounted for with $\fss$. This suggests that we should instead use:
However, this now gives $\Fms = 1 - \Eavg$ when $\Favg = 1$, instead of $\Fms = 1$ previously. We want the latter behaviour because the $\fms$ lobe already has a magnitude of $1 - \Emo$ (as discussed earlier), so we should normalise our adjusted $\Fms$ by $1/(1 - \Eavg)$:
This is very close to what we had before (Eq. $\ref{eq:fms}$), except there’s $\Favg^2$ instead of $\Favg$ in the numerator.
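The steps above can be checked numerically. The sketch below sums the per-bounce energies directly and compares them against the two closed forms implied by the text: the original term with $\Favg$ in the numerator, and the adjusted, renormalised term with $\Favg^2$. Function names are hypothetical, and the input values are arbitrary test numbers rather than measured data.

```python
def f_ms_original(f_avg, e_avg):
    """Closed form that still includes the single-scattering term."""
    return f_avg * e_avg / (1.0 - f_avg * (1.0 - e_avg))


def f_ms_adjusted(f_avg, e_avg):
    """Single scattering removed, then normalised by 1 / (1 - e_avg)."""
    return f_avg * f_avg * e_avg / (1.0 - f_avg * (1.0 - e_avg))


def f_ms_series(f_avg, e_avg, first_bounce, terms=200):
    """Sum the energy escaping at each bounce, starting at `first_bounce`.

    Bounce k (1-based) contributes f_avg**k * e_avg * (1 - e_avg)**(k - 1):
    k attenuations, k - 1 failures to escape, then one escape.
    """
    total = 0.0
    for k in range(first_bounce, first_bounce + terms):
        total += f_avg ** k * e_avg * (1.0 - e_avg) ** (k - 1)
    return total
```

Starting the series at bounce 1 should reproduce `f_ms_original`, while starting at bounce 2 and dividing by `1 - e_avg` should reproduce `f_ms_adjusted`, which returns 1 for a fully reflective material.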
With this simple change, the Imageworks solution gets closer to the ground truth:
The furnace test reveals that there is still a small difference: Imageworks is now a bit darker and more saturated compared to Heitz. Still, it’s an overall improvement.
This adjustment has been added to an updated version of Imageworks’ slides, along with numerical fits for $\Em$ and $\Eavg$ from Christopher Kulla. The new slides also contain several important corrections^{3} that are highlighted in the speaker notes.
In the next post, I’ll discuss a further improvement to $\Fms$ that gets us even closer to the reference.
Also known as directional-hemispherical reflectance.↩
Thanks to Naty Hoffman for bringing this paper to my attention a while ago, in the context of ambient occlusion.↩
To my embarrassment, the original equation for Fms was wrong in the slides, due to a mis-edit on my part. Hopefully this blog post is a suitable atonement.↩
As part of the Physically Based Shading Course at SIGGRAPH last year, Christopher Kulla and Alejandro Conty presented the latest iteration of Sony Pictures Imageworks’ core production shading models. One area of focus for them was to improve the energy conservation of their materials; in particular they wanted to compensate for the inherent lack of multiple scattering in common analytic BSDFs, which can be a major source of energy loss. (More on this later.)
A year prior, Heitz et al. [2016] had addressed this very issue with an accurate model^{1} for microfacet multiple scattering. Unfortunately, since it uses a stochastic process, it wasn’t a good fit for Imageworks’ renderer. Instead, Kulla and Conty adapted ideas from earlier work [Kelemen and Szirmay-Kalos 2001; Jakob et al. 2014] in order to develop practical solutions for conductors and dielectrics.
While the multiple-scattering term that Imageworks uses is energy conserving by design, it doesn’t actually simulate how light scatters between the facets of a given microsurface (in their case modelled by GGX) in the way that the Heitz model does. Still, in spite of this theoretical shortcoming, it is undoubtedly an improvement over doing nothing at all.
I remember being quite excited when I first saw Imageworks’ results, particularly because their approach appeared to be suitable for real-time use. At the same time, I was curious to see exactly how well it compared to the Heitz model as the gold standard. And beyond that, I was eager to explore the general topic of real-time-friendly approximations to microfacet multiple scattering. In the next few posts, I will share my findings, but first let’s start with a quick recap of the problem…
The most popular microfacet-based BSDFs in use today have a common limitation: they only model single scattering. This means that any incoming light that doesn’t immediately leave the microsurface through a single reflection or refraction event is ignored by these models. For instance, light that hits one facet and is reflected onto another (and so on) is treated as though it has been completely absorbed.
It’s this restriction to single scattering that made the derivation of compact, analytic models possible in the first place. However, the lack of multiple scattering can lead to significant energy loss with rougher surfaces, which makes sense since there’s a higher probability of light bouncing several times within the microsurface before escaping.
To give a concrete example of this problem, let’s start with a very simple material model: a GGX-based conductor with a constant reflectance of 1. Here is a render of a set of spheres made of this material, with roughness^{2} varying from 0.125 to 1 (left to right):
At first glance this result might seem reasonable, but the problem of energy loss becomes readily apparent when the same spheres are placed in a uniform lighting environment (a.k.a. a furnace test):
As we can see, though our material is supposed to be completely reflective, more and more energy is lost as roughness increases. In fact, at the very right, close to 60% of the light has vanished due to the absence of microfacet multiple scattering. Up until recently, we had largely been sweeping this problem under the rug, but it’s hard to argue with concrete numbers like that.
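To make the furnace test concrete, here is a minimal Monte Carlo sketch of the energy reflected by a single-scattering GGX conductor with reflectance 1. It is not the code behind the renders above: the function name, the separable Smith masking term, and the use of the GGX alpha parameter directly (rather than the post's roughness mapping) are all assumptions made for illustration.

```python
import math
import random

def ggx_single_scatter_albedo(alpha, cos_v=1.0, samples=20000, seed=1):
    """Estimate the fraction of energy reflected under uniform lighting.

    Assumptions: GGX distribution, separable Smith masking-shadowing,
    Fresnel term fixed at 1, view direction at angle acos(cos_v) > 0.
    """
    rng = random.Random(seed)
    sin_v = math.sqrt(max(0.0, 1.0 - cos_v * cos_v))
    v = (sin_v, 0.0, cos_v)

    def g1(cos_t):
        # Smith masking term for GGX
        tan2 = (1.0 - cos_t * cos_t) / (cos_t * cos_t)
        return 2.0 / (1.0 + math.sqrt(1.0 + alpha * alpha * tan2))

    total = 0.0
    for _ in range(samples):
        u1, u2 = rng.random(), rng.random()
        # Sample a half-vector h with pdf D(h) * (n.h)
        cos_h = math.sqrt((1.0 - u1) / (1.0 + (alpha * alpha - 1.0) * u1))
        sin_h = math.sqrt(max(0.0, 1.0 - cos_h * cos_h))
        phi = 2.0 * math.pi * u2
        h = (sin_h * math.cos(phi), sin_h * math.sin(phi), cos_h)
        v_dot_h = sum(a * b for a, b in zip(v, h))
        if v_dot_h <= 0.0:
            continue  # back-facing microfacet: contributes nothing
        # Reflect v about h to get the outgoing direction
        l = tuple(2.0 * v_dot_h * hc - vc for hc, vc in zip(h, v))
        cos_l = l[2]
        if cos_l <= 0.0:
            continue  # heads into the surface: counted as lost energy
        # BRDF * cos / pdf simplifies to G * (v.h) / ((n.v) * (n.h))
        total += g1(cos_v) * g1(cos_l) * v_dot_h / (cos_v * cos_h)
    return total / samples

smooth = ggx_single_scatter_albedo(0.05)
rough = ggx_single_scatter_albedo(1.0)
```

Under these assumptions, `smooth` stays close to 1 while `rough` drops far below it. The exact figures depend on the masking term, the view angle and the roughness mapping, so they should not be expected to match the numbers quoted above.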
A practical consequence of this behaviour is that it makes life harder for artists doing texture painting and look development, and while it might be possible to manually compensate for the darkening effect in simple cases such as above, it soon becomes an impossible task with textured reflectance and roughness.
It’s clear that we are falling a little short on the physically based shading front and the promise of intuitive material parameters. Fortunately, our collective feeling of shame will be momentary, since help is at hand.
Next we’ll take an initial look at Imageworks’ approach and see how it measures up against the Heitz model.
SIGGRAPH 2017 Conference Content (for a limited time)
An Interactive Introduction to WebGL and three.js
An Introduction to Laplacian Spectral Kernels and Distances: Theory, Computation, and Applications
Advances in RealTime Rendering (most content online)
Applications of Visual Perception to Virtual Reality Rendering
Computational Narrative
Computing and Processing Correspondences with Functional Maps (SIGGRAPH Asia 2016 material)
Directional Field Synthesis, Design, and Processing
Multithreading for Visual Effects
Open Problems in RealTime Rendering
OpenVDB (to appear)
Path Tracing in Production (course notes; slides starting to appear)
Physically Based Shading in Theory and Practice (most content online)
Production Volume Rendering
Rethinking Texture Mapping
Video for Virtual Reality
VR Interactions (recording)
See Ke-Sen Huang’s comprehensive papers list.
Double Hierarchies for Efficient Sampling in Monte Carlo Rendering
Headset Removal for Virtual and Mixed Reality
Importance Sampling of Many Lights With Adaptive Tree Splitting
Learning Light Transport the Reinforced Way
Lighting Up The Smurfs’ Enchanted Forest
Modeling Vellus Facial Hair from Asperity Scattering Silhouettes
Novel Algorithm for Sparse and Parallel Fast Sweeping: Efficient Computation of Signed Distance Fields
Precomputed Multiple Scattering for Light Simulation in Participating Medium
Production Ready MPM Simulations
Proxy Clouds for RGBD Stream Processing: An Insight
The Iray Light Transport Simulation and Rendering System
“VarCity - The Video” - semantic and dynamic city modelling from images (video)
A Gradient Mesh Tool for NonRectangular Gradient Meshes
Analyzing Interfaces and Workflows for Light Field Editing
Attributepreserving gamut mapping of measured BRDFs
Fast Back-Projection for Non-Line-of-Sight Reconstruction (code)
Combining Biomechanical and DataDriven BodySurface Models
Digital Fabrication and Manipulation Method for Underwater Display and Entertainment (video)
Directional Occlusion via MultiIrradiance Mapping
Exploiting the Room Structure of Buildings for Scalable Architectural Modeling of Interiors
Improved Chromakey of Hair Strands via OrientationFilter Convolution
LeviFab: Stabilization and Manipulation of Digitally Fabricated Objects for Superconductive Levitation
Morpho Sculptures: Digital Fabrication Methods of Engraving Flat Materials Into ShapeChanging User Interfaces
Optimized Sampling for View Interpolation in Light Fields Using Local Dictionaries
Retexturing Under SelfOcclusion Using Hierarchical Markers
SemiDynamic Light Maps
Sonovortex: Rendering MultiResolution Aerial Haptics by Aerodynamic Vortex and Focused Ultrasound
Submerged Haptics: A 3DOF Fingertip Haptic Display Using Miniature 3D Printed Airbags
Touch3D: Touchscreen Interaction on Multiscopic 3D With Electrovibration Haptics
Unphotogenic Light: HighSpeed Projection Method to Prevent Secret Photography by Small Cameras
Khronos Group BOFs
OpenColorIO
AMD Capsaicin
Autodesk
Intel: Day 1, Day 2
NVIDIA
Pixar Science Fair
SideFX: Houdini HIVE
Unity
SIGGRAPH 2016 Conference Content (for a limited time)
Courses: Physically Based Shading in Theory and Practice
Talks: Brain & Brawn
Live Streaming Sessions (free registration required)
A Practical Introduction to Frequency Analysis of Light Transport
Advances in RealTime Rendering
An Elementary Introduction to Matrix Exponential for CG
An Introduction to Graphics Programming Using WebGL (preliminary)
Augmented Reality  Principles and Practice
Computational Tools for 3D Printing
Fourier Analysis of Numerical Integration in Monte Carlo Rendering: Theory and Practice
Fundamentals Seminar
Geometric and Discrete Path Planning for Interactive Virtual Worlds
Haptic Technologies for Direct Touch in Virtual Reality
HDR Content Creation: Creative and Technical Challenges
Inverse Procedural Modeling of 3D Models for Virtual Worlds
Modeling Plant Life in Computer Graphics
Moving Mobile Graphics
Open Problems in RealTime Rendering (starting to appear)
Physically Based Shading in Theory and Practice (video recording) (starting to appear)
Physically Based Sound for Computer Animation and Virtual Environments
The Material Point Method for Simulating Continuum Materials
The Quest for the Ray Tracing API
VectorField Processing on Triangle Meshes
See Ke-Sen Huang’s comprehensive papers list.
A Practical Stochastic Algorithm for Rendering MirrorLike Flakes
Bluenoise Dithered Sampling
CacheFriendly MicroJittered Sampling (video)
Differential appearance editing for measured BRDFs
Digital Painting Classroom: Learning Oil Painting Using a Tablet
Estimating Local Beckmann Roughness for Complex BSDFs
HFTS: Hybrid FrustumTraced Shadows in ‘The Division’
Luma HDRv: An OpenSource HighDynamicRange Video Codec Optimized by LargeScale Testing
Making a Dinosaur Seem Small: Cloudscapes in “The Good Dinosaur”
Mesh Colors With Hardware Texture Filtering
Practical Analytic 2D Signed Distance Field Generation (direct link)
Quantum Supersampling
ShapeAnalysisDriven Surface Correction
Simulating Rivers in “The Good Dinosaur”
Stochastic Layered Alpha Blending
VolumeModeling Techniques in “The Good Dinosaur”
A Tabletop Stereoscopic 3DCG System With Motion Parallax for Two Users
Coded Skeleton: Programmable Bodies for Shape Changing User Interfaces
CrossField Haptics: PushPull Haptics Combined With Magnetic and Electrostatic Fields
Dynamic Spatial Augmented Reality With a Single IR Camera
ErrorBounded Surface Remeshing With MinimalAngle Elimination
Graphical Manipulation of Human’s Walking Direction With Visual Illusion
Interaction With Virtual Shadow Through Real Shadow Using Two Projectors
Interactive MultiScale Oil Paint Filtering on Mobile Devices
LightField Completion Using FocalStack Propagation
Model Predictive Control for Robust ArtDirectable Fluids
NonHumanoid Creature Performance From Human Acting
OpenEXR/Id (code)
Optimal LED Selection for Multispectral Lighting Reproduction
RayTraced Diffusion Points
RealTime 3D Face SuperResolution From Monocular IntheWild Videos
RealTime 3D Rendering Using DepthBased Geometry
RelationBased Parametrization and Exploration of Shape Collections
Sculpting Fluids: A New and Intuitive Approach to ArtDirectable Fluids
Synesthesia Suit: The FullBody Immersive Experience
glTF
Khronos Chapters (videos)
OpenCL
Vision
WebGL
NVIDIA
The Foundry (videos)
RealTime Cinematography in Unreal Engine 4
Once again, I’m collecting links to SIGGRAPH content: courses, talks, posters, etc. I’ll continue to update the post as new stuff appears; if you’re aware of anything that’s not listed here, please let me know in the comments.
Note: The ACM is also providing open access to all of the content^{1} for a limited time^{2}, as they did last year. (This probably explains why it’s taking a while for some of the material to show up elsewhere.)
Advances in RealTime Rendering
An Overview of NextGeneration Graphics APIs
BulletPhysics Simulation
Computational Tools for 3D Printing
Denoising Your Monte Carlo Renders…
Modeling and Toolpath Generation for ConsumerLevel 3D Printing
Moving Mobile Graphics
MultiThreading for Visual Effects
Open Problems in RealTime Rendering
OpenVDB
Physically Based Shading in Theory and Practice (starting to appear)
RealTime ManyLight Management and Shadows with Clustered Shading
RealTime Rendering of Physically Based Optical Effects in Theory and Practice
The PathTracing Revolution in the Movie Industry (starting to appear)
UserCentric Computational Videography
See Ke-Sen Huang’s comprehensive papers list.
Accumulative AntiAliasing
Accurate Analytic Approximations for RealTime Specular Area Lighting
ArtDirectable Volumetric Multiple Scattering
Crafting Victorian London: The Environment Art and Material Pipelines of The Order: 1886
FeatureBased Texture Stretch Compensation for 3D Meshes
FlashMob: NearInstant Capture of HighResolution Facial Geometry and Reflectance
FrustumTraced Irregular ZBuffers: Fast, Subpixel Accurate Hard Shadows (direct link)
Melton and Moustaches: The Character Art and Shot Lighting Pipelines of The Order: 1886
Multiresolution Geometric Transfer for Jurassic World
RealTime Transformations in The Order: 1886
SemanticPaint: Interactive Segmentation and Learning of 3D Worlds
Authoring of Procedural Environments in “The Blacksmith” (direct link)
Build Your Own Game Controller
Interactive Robogami
FrameShift: Shift Your Attention, Shift the Story
Mirror Mirror: An OnBody Clothing Design System
MOR4R: Microwave Oven Recipes for Resins
PaperPulse: An Integrated Approach for Embedding Electronics in Paper Designs
Scanning and Printing a 3D Portrait of President Barack Obama
Augmented Reality for Cryoablation Procedures
Automatic Synthesis of Eye and Head Animation According to Duration and Point of Gaze
CHILDHOOD: Wearable Suit for Augmented Child Experience
Continuous and Automatic Registration of RGBD Video Streams with Partial Overlapping Views
CrowdPowered Parameter Analysis for Computational Design Exploration
Encore: 3DPrinted Augmentation of Everyday Objects…
Flex AR: Anatomy Education Through Kinetic Tangible Augmented Reality
Fractured 3D Object Restoration and Completion (video)
Inferring Gaze Shifts from Captured Body Motion
MouthGestureBased Emotion Awareness and Interaction in Virtual Reality
MR Coral Sea Evolved: MixedReality Aquarium With Physical MR Displays
Realtime Rendering of Subsurface Scattering according to Translucency Magnitude
Sketch Dance Stage
Touch the Virtual Reality: Using the Leap Motion Controller for Hand Tracking…
VISTouch
Visualizing Valley Wind Flow
Wobble Strings: Spatially Divided Stroboscopic Effect for Augmenting
Abstracts and videos
The Light Field Stereoscope
3D Graphics API State of the Union with Vulkan, OpenGL, and OpenGL ES (video)
3D Web Graphics With WebGL (video)
Accelerating Vision Processing with OpenVX and OpenCL (videos)
Cartographic Visualization
 Cesium: A WebGL Virtual Globe and Map Engine
 Update on X3D Geospatial from the Web 3D Consortium
Educator’s BOF: Preparing Students for Industry Using Open Source and GitHub
Virtual Globes Using WebGL and Cesium
Autodesk
NVIDIA Best of GTC
The Foundry (videos)
– Girls, Guns, and 3D Projection Painting in Mari
As before, I’m collecting links to SIGGRAPH content: courses, talks, posters, etc. I’ll continue to update the post as new stuff appears; if you’ve seen anything that’s not here, please let me know in the comments.
Update: In a welcome change this year, conference content is freely available from the ACM Digital Library (albeit via Author-Izer, so there’s a tedious countdown timer for each link). Here are the most relevant pages:
Courses
Technical Papers
Talks
Posters
Emerging Technologies
Studio
The remaining links are to authorhosted presentations, project pages, videos and so on…
Advances in RealTime Rendering
AttentionAware Rendering, Mobile Graphics, and Games
Building an Empire: Asset Production in Ryse
Character Heads Creation Pipeline and Rendering in Destiny
Computational Cameras and Displays (via Naty Hoffman)
Destiny CharacterAnimation System and Lessons Learned (starting to appear)
Digital Ira and Beyond: Creating Photoreal RealTime Digital Characters
Introduction to WebGL Programming
Mathematical Basics of Motion and Deformation in Computer Graphics (via Naty Hoffman)
Navigation Meshes and RealTime Dynamic Planning for Interactive Virtual Worlds
Physically Based Shading in Theory and Practice (starting to appear)
Recent Advances in Light Transport Simulation: Some Theory and a Lot of Practice
Scattered Data Interpolation (direct link)
Skinning: Realtime Shape Deformation
StructureAware Shape Processing
The Glass Class: Designing Wearable Interfaces
Why Graphics Programmers Need to Know About DRAM
Creating Content to Drive Destiny’s Investment Game: One Solution to Rule Them All
See Ke-Sen Huang’s excellent papers list.
A Continuum Model for Simulating Crowd Turbulence
A Fiber Scattering Model With NonSeparable Lobes
A Zerovariancebased Sampling Scheme for Monte Carlo Subsurface Scattering
Adaptive Rendering based on Weighted Local Regression
ASTC: The Extra Dimension
Dark Matter: A Tale of Virtual Production
Dynamic OnMesh Procedural Generation Control
Efficient Rendering With Tile Local Storage
Hierarchical Digital Differential Analyzer for Efficient RayMarching in OpenVDB
High Level Saliency Prediction for Smart Game Balancing
Implementing Efficient Virtual Shadow Maps for Many Lights
LargeScale Simulation and Surfacing of Water and Ice Effects in “How to Train Your Dragon 2”
Measurement and Modeling of Microfacet Distributions under Deformation
OpenVL: A DeveloperLevel Abstraction of Computer Vision
Perceptually Based Parameter Adjustments for VideoProcessing Operations
PositionBased Elastic Rods
Progressive Streaming of Compressed 3D Graphics in a Web Browser
Rapid Avatar Capture and Simulation Using Commodity Depth Sensors
RealTime Geometry Caches
Silencing the Noise on Elysium (abstract, code)
Temporally Coherent Video DeAnaglyph
Tuning Facial Animation in a Mocap Pipeline
An IcicleGeneration Model Based on SPH Method
3D Dynamic Visualization of Swallowing From MultiSlice Computed Tomography
A Virtual 3D Photocopy System
Augmented Reality Theater Experience
BelliesWave: Color and Shape Changing Pixels Using Bilayer Rubber Membranes
CageBased Deformation Transfer Using Mass Spring System
Coded Lens: Using a Coded Aperture for LowCost and Versatile Imaging
ContextAware MaterialSelective Rendering for Mobile Graphics
Detection of Stereo Window Violation in 3D movies
DirectionalityAware Rectilinear TextureWarped Shadow Maps
How Personal Video Navigation History can be Visualized
Interactive Relighting of Arbitrary Rough Surfaces
Material Parameter Editing System for Volumetric Simulation Models
metamoCrochet: Augmenting Crocheting With BiStable Color Changing Inks
Mossxels: Slow Changing Pixels Using the Shape of Racomitrium Canescens
Music as an Interventional Design Tool for Urban Designers
Ocean Wave Animation Using Boundary Integral Equations and Explicit Mesh Tracking (related SCA paper)
Optimizing Infinite Homography for BulletTime Effect
Parametric Stylized Highlight for Character Animation Based on 3D Scene Data
Shading Approach for Artistic Stroke Thickness Using 2D Light Position
Screen Space Cone Tracing for Glossy Reflections
VisionGL: Towards an API for Integrating Vision and Graphics
Waving Tentacles: A System and Method for Controlling a SMA Actuator
Wearable Haptics and Hand Tracking via an RGBD Camera for Immersive Tactile Experiences
(In)visible Light Communication: Combining Illumination and Communication
A Collaborative Seethrough Display Supporting Ondemand Privacy
A Compressive Light Field Projection System
Birdly
Cyberith Virtualizer
Graffiti Fur: Turning Your Carpet Into a Computer Display
HORN: The HaptOptic Reconstruction
JANUS
LumiConSense: A Transparent, Flexible, Scalable, and Disposable Image Sensor…
MaD: Mapping by Demonstration for Continuous Sonification
Physical Painting With a Digital Airbrush
Pinlight Displays: Wide Field of View Augmented Reality Eyeglasses
Pixie Dust: Graphical Levitation System
Spheree: A 3D PerspectiveCorrected Interactive Spherical Scalable Display (project, abstract, video)
Tangible and Modular Input Device for Character Articulation
Traxion: A Tactile Interaction Device With Virtual Force Sensation
Carto
 3D Geospatial Visualization on the Web with Cesium (via Patrick Cozzi)
OpenGL, OpenCL… (via Cyril Crassin)
Teaching Computer Graphics Online
 Teaching Intro and Advanced Graphics with WebGL, Patrick Cozzi
 Teaching Graphics Online, Andrew Glassner
Virtual Globes Using WebGL and Cesium (via Patrick Cozzi)
WebGL
Best of GTC, NVIDIA (via Christopher Sierigk)
Intel Exhibitor Sessions
The Foundry
2pm, Wednesday 13th August. Located in the west building, rooms 211-214.
We’re back once again with the Physically Based Shading (in Theory and Practice) course at SIGGRAPH! You can find the details on the new course page, but I’ll copy the schedule here, for your convenience:
14:00
Physics and Math of Shading (Naty Hoffman)
14:20
Understanding the Masking-Shadowing Function (Eric Heitz)
14:40
Antialiasing Physically Based Shading with LEADR Mapping (Jonathan Dupuy)
15:00
Designing Reflectance Models for New Consoles (Yoshiharu Gotanda)
15:30
Break
15:45
Moving Frostbite to PBR (Sébastien Lagarde and Charles de Rousiers)
16:15
Physically Based Shader Design in Arnold (Anders Langlands)
16:35
Art Direction within Pixar’s Physically Based Lighting System (Ian Megibben and Farhez Rayani)
As you can see, the composition of this year’s lineup is a little different than in previous years. To start with, we’ve incorporated a bit more theory into the first half of the course, beyond Naty Hoffman’s established and superlative introduction; Eric Heitz will be summarising his excellent JCGT paper on microfacet masking-shadowing functions, and Jonathan Dupuy will be distilling their recent work on LEADR Mapping. Jonathan also discusses a number of practical issues in his accompanying course notes.
Either side of the break, we have two game industry speakers, Yoshiharu Gotanda and Sébastien Lagarde. Yoshiharu will be covering his latest R&D at tri-Ace, targeting next-gen hardware; Sébastien will also be presenting some advances, along with sharing the Frostbite team’s experiences in bringing physically based rendering principles to their engine and a number of titles. Séb and Charles de Rousiers have also compiled a highly detailed and extensive set of course notes, which should be available in the coming days.
Arnold has fast become a (physically based) force to be reckoned with inside the VFX industry, so it’s high time that the renderer receive attention in the course. With that in mind, we have Anders Langlands (Solid Angle) talking about what makes his open-source shader library alShaders tick, the design decisions behind it, and how it plays to the strengths of Arnold.
Rounding off the session, we have Ian Megibben and Farhez Rayani from Pixar recounting the evolution of lighting over previous Toy Story films from an art perspective, as well as the challenges and benefits brought about by the switch to physically based rendering for Toy Story OF TERROR!
I hope to see you there!
We’ve been fortunate to have some really excellent presentations in the course over the past few years. One of the most enduring and influential has been Brent Burley’s Physically Based Shading at Disney, in 2012. Two years on, Brent has taken the time to update his course notes with a few additional details, complementing the shading model implementation that was added to Disney’s BRDF Explorer last November. Brent also revisits his “Smith G” roughness remapping, following the findings of Eric Heitz’ aforementioned paper. You can find Brent’s updated notes on the 2012 course page here.
Once again, I’m collecting links to SIGGRAPH content: courses, talks, posters, etc. I’ll continue to update the post as new stuff appears.
An Introduction to OpenGL Programming
Advances in RealTime Rendering in 3D Graphics and Games
Combining GPU DataParallel Computing With OpenGL
Dynamic 2D/3D Registration for the Kinect
Efficient RealTime Shadows
Lights! Speed! Action! Fundamentals of Physical Computing for Programmers
Geometry Processing With Discrete Exterior Calculus
Multithreading and VFX
Numerical Methods for Linear Complementarity Problems in PhysicsBased Animation
Physically Based Shading in Theory and Practice
Ray Tracing is the Future and Ever Will Be
Recent Advances in LightTransport Simulation: Theory & Practice
Rendering Massive Virtual Worlds (via @_Humus_)
Turbulent Fluids
See Ke-Sen Huang’s excellent papers list.
BlockParty 2: Visual Procedural Rigging for Film, TV, and Games
BSSRDF Importance Sampling
Creating a Nimble New Curriculum for Digital Media Artists
Crafting the Vision Effect: An Interactive, ParticleBased Hologram for “Epic”
Coded Exposure HDR LightField Video Recording
Discrete Texture Design Using a Programmable Approach
Driving HighResolution Facial Blendshapes with Video Performance
Hair Growth by Means of Sparse Modeling and Advection
Imperfect Voxelized Shadow Volumes
Incendiary Reflection: Evoking Emotion Through Deformed Facial Feedback
Interactive Indirect Lighting Computed in the Cloud
Jack’s Frost: Controllable Magic Frost Simulations for “Rise of the Guardians”
Lighting Technology of “The Last of Us”
NearEye Light Field Displays
Oz: The Great and Volumetric
PencilTracing Mirage: Principle and its Evaluation
Rendering Fur for “Life of Pi”
ScreenSpace Curvature for ProductionQuality Rendering and Compositing
Sketchbased Pipeline for Mass Customization
SubPixel Shadow Mapping
Survey and Evaluation of Tone Mapping Operators for HDR Video
Intel
NVIDIA Visual Computing Theater
NVIDIA Tech Talks
Challenges With HighQuality Mobile Graphics
Graphics on the Go
Make Mobile Apps Quickly
Multichannel Acoustic Data Transmission to Adhoc Mobile Phone Arrays
Unity: The Chase - Pushing the Limits of Modern Mobile GPU (video)
Adjusting the Disparity of Stereoscopic 3D Media in Post Production
Advanced Interfaces to Stem the Data Deluge in Mixed Reality…
Affective Music Recommendation System Using Input Images (video)
Celestia: A Vocal Interaction Music Game
Click & Draw Selection
Coded Exposure HDR LightField Video Recording
Conducting and Performing Virtual Orchestra
CostBased Workload Balancing for Ray Tracing on MultiGPU Systems
Digital Ira: Creating a RealTime Photoreal Digital Actor
Enchanted Scissors: A Scissor Interface for Support in Cutting and Interactive Fabrication
Fast and Accurate Distance, Penetration, and Collision Queries…
Fine Water on Coarse Grids
GPU Ray Tracing with Rayforce
Hierarchical Volumetric Object Representations for Digital Fabrication (thesis)
Humanlike Behavior Model With Probabilistic Intention
ime TOUCH: Detecting Human Touch Interaction
Interactive Card Weaving Design and Construction
LearningBased Compression for RealTime Rendering of Surface Light Fields
LightCluster - Clustering Lights to Accelerate Shadow Computation
Musical Flocks
NearField Illumination for Mixed Reality With Delta Radiance Fields
Nutty Tracks - Symbolic Animation Pipeline for Expressive Robotics
Photorealistic Inner Mouth Expression in Speech Animation (video)
Pencil Tracing Mirage: Principle and its Evaluation
PhysPix: Instantaneous RigidBody Simulation of Rasters
Polka Dot – The Garden of Water Spirits
Perch on My Arm!: A Haptic Device That Presents Weight and a Sense of Being Grabbed
Practical 3D+2D TV Displays
RealTime Dust Rendering by Parametric Shell Texture Synthesis (video)
Reflective, Deformable, Colloidal Display: A WaterfallBased Colloidal Membrane…
Specular LobeAware UpSampling Based on Spherical Gaussians
SplineGrip - An Eight-Degrees-of-Freedom Flexible Haptic Sculpting Tool
The Hand as a Shading Probe
Toward Accurate and Efficient OrderIndependent Transparency
Tracking Magnetics Above Portable Displays
“Tworlds”: Twirled Worlds for Multimodal “Padiddle” Spinning
Unsupervised Cell Identification on Multidimensional XRay Fluorescence Datasets
Visualizing the Flow of Users on a Wireless Network
Visualizing Urban Mobility
AGATHE: A Tool for Personalized Rehabilitation of Cognitive Functions
AIREAL: Tactile Gaming Experiences in Free Air
An Autostereoscopic Projector Array Optimized for 3D Facial Display
AquaTop Display: A True ”Immersive” Water Display System
EMY: FullBody Exoskeleton
IllumiRoom: Peripheral Projected Illusions for Interactive Experiences
Incendiary Reflection: Evoking Emotion Through Deformed Facial Feedback
LightinFlight: Transient Imaging Using Photonic Mixer Devices
PAPILLON: Expressive Eyes for Interactive Characters
TransWall (second video)
Camera
COLLADA
OpenCL
OpenGL
REST 3D
WebCL
Marc Davis Lecture Series: Giants’ First Steps (via Eric Haines)
As you likely know, modern GPUs shade triangles in blocks of 2x2 pixels, or quads. Consequently, redundant processing can happen along the edges where there’s partial coverage, since only some of the pixels will end up contributing to the final image. Normally this isn’t a problem, but – depending on the complexity of the pixel shader – it can significantly increase, or even dominate, the cost of rendering meshes with lots of very small or thin triangles.
Figure 1: Quad overshading, the silent performance killer
For more information, see Fabian Giesen’s post, plus his excellent series in general.
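As a rough illustration of the cost described above, the following sketch counts how many pixels are shaded versus how many are actually covered when a triangle is processed in 2x2 quads. The function name and the pixel-set input are hypothetical; real hardware rasterisation has many more details than this.

```python
def quad_shading_cost(covered_pixels):
    """Return (pixels_shaded, pixels_covered) for one triangle.

    covered_pixels: iterable of integer (x, y) pixel coordinates that the
    triangle covers. Every 2x2 quad touched costs four shader invocations.
    """
    covered = set(covered_pixels)
    quads = {(x // 2, y // 2) for x, y in covered}
    return 4 * len(quads), len(covered)

# A thin sliver, one pixel tall and eight pixels long
sliver = [(x, 3) for x in range(8)]
shaded, covered = quad_shading_cost(sliver)

# A solid 8x8 block, aligned to quad boundaries
block = [(x, y) for x in range(8) for y in range(8)]
block_shaded, block_covered = quad_shading_cost(block)
```

The sliver shades 16 pixels to cover 8, so half of the work is wasted, whereas the aligned block shades exactly the 64 pixels it covers.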
It’s hardly surprising, then, that IHVs have been advising for years to avoid triangles smaller than a certain size, but that’s somewhat at odds with game developers – artists in particular – wanting to increase visual fidelity and believability, through greater surface detail, smoother silhouettes, more complex shading, etc. (As a 3D programmer, part of my job involves the thrill of being stuck in the middle of these kinds of arguments!)
Traditionally, mesh LODs have helped to keep triangle density in check. More recently, deferred rendering methods have sidestepped a large chunk of the redundant shading work, by writing out surface attributes and then processing lighting more coherently via volumes or tiles. However, these are by no means definitive solutions, and nascent techniques such as DX11 tessellation and tile-based forward shading not only challenge the status quo, but also bring new relevancy to the problem of quad shading overhead.
Knowing about this issue is one thing, but, as they say: seeing is believing. In a previous article, I showed how to display Hi-Z and quad overshading on Xbox 360, via some platform-specific tricks. That’s all well and good, but it would be great to have the same sort of visualisation on PC, built into the game editor. It would also be helpful to have some overall stats on shading efficiency, without having to link against a library (GPUPerfAPI, PerfKit) or run a separate tool.
There are several ways of reaching these modest goals, which I’ll cover next. What I’ve settled on so far is admittedly a hack: a compromise between efficiency, memory usage, correctness and simplicity. Still, it fulfils my needs so far and I hope you find it useful as well.
First, let’s restate the problem: what we want, essentially, is to count up the number of times we shade a given screen quad. The trick is to only count each shading quad once.
The way I achieved this on Xbox 360 hinged on knowing whether a given pixel was ‘alive’ or not, and then only accumulating overdraw for the first live pixel in each shading quad. As far as I’m aware, there’s no official way of determining this on PC through standard graphics APIs, but some features of DX11 – namely Unordered Access Views (UAVs) and atomic operations – will allow us to arrive at the same result via a different route.
What I was after was an implementation that was as simple as before, involving three steps:
A straightforward, safe option is to gather a list of triangles per screen quad, filtering by ID (a combination of SV_PrimitiveID and object ID). This filtering can be performed during the overdraw pass or as a post-process.
What’s unsatisfying with this approach is that it involves budgeting memory for the worst case, or accepting an upper bound on displayable overdraw. Whilst I can imagine that a multi-pass variation is doable, that just adds unwanted complexity to what ought to be a simple debug rendering mode.
So, in order to overcome these limitations, I started toying around with something a lot simpler:
[Code listing not preserved.]
The intent here is to use a UAV to keep track of the current triangle per screen quad. Through InterlockedExchange, we both update the ID and use the previous state to determine if we’re the first pixel to write this ID (prevID != id). If so, we increment an overdraw counter in a second UAV. This is similar in spirit to the Xbox 360 version, in that we’re selecting one of the live pixels in a shading quad to update the overdraw count. Finally, we can display the results in a full-screen pass:
[Code listing not preserved.]
On paper, this appears to elegantly avoid the storage and complexity of the previous approach. Alas, it relies on one major, dubious assumption: that quads are shaded sequentially! In reality, GPUs process pixels in larger batches of warps/wavefronts and there’s no guarantee that UAV operations are ordered between quads – hence the name: unordered. So, during the shading of pixels in a quad for one triangle, it’s perfectly possible for another unruly triangle to stomp over the quad ID and break the whole process!
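The ordering problem is easy to reproduce on the CPU. The sketch below is a sequential Python stand-in for the exchange-based approach, not the original shader: two dictionaries play the role of the two UAVs, and the order of the input list plays the role of GPU scheduling. All names are hypothetical.

```python
def count_overdraw_naive(pixel_stream):
    """pixel_stream: iterable of (quad, tri_id) in execution order."""
    current_id = {}  # stands in for the per-quad ID UAV
    overdraw = {}    # stands in for the overdraw-count UAV
    for quad, tri_id in pixel_stream:
        # Emulates InterlockedExchange: write our ID, get the old one back
        prev_id = current_id.get(quad)
        current_id[quad] = tri_id
        if prev_id != tri_id:
            overdraw[quad] = overdraw.get(quad, 0) + 1
    return overdraw

quad = (0, 0)
# Two triangles, each covering all four pixels of the same screen quad
ordered = [(quad, 1)] * 4 + [(quad, 2)] * 4
interleaved = [(quad, 1), (quad, 2)] * 4

ordered_count = count_overdraw_naive(ordered)[quad]
interleaved_count = count_overdraw_naive(interleaved)[quad]
```

When each triangle's pixels run back to back, the count is the expected 2. When the two triangles interleave, every pixel sees the other triangle's ID and the count balloons to 8.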
Fortunately, we can get around this issue with a few modifications. The basic idea here is to loop and use InterlockedCompareExchange to attempt to lock the screen quad:
[Code listing not preserved.]
This leads to three outcomes for unprocessed pixels:
1. If prevID == unlockedID, then the pixel holds the lock for its shading quad.
2. If prevID == id, another pixel in the shading quad holds the lock.
3. Otherwise, a pixel with a different ID holds the lock, and the pixel has to try again.
In the first case we mark the pixel as processed and increment a lock counter. After an additional iteration, we release the lock. This ensures that pixels with the same ID see the state of the lock (second case), so that they can be filtered out. Finally, pixels that held the lock update the quad overdraw.
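The lock loop can be sketched as a CPU simulation. This is not the original shader: it assumes that every pixel advances exactly once per loop iteration (in an arbitrary order), which is a much stronger guarantee than a GPU gives across warps, and the names, the sentinel value and the iteration cap are all hypothetical.

```python
import random

UNLOCKED = -1  # hypothetical sentinel: no triangle holds this quad

def count_overdraw_locked(pixels, max_iters=64, seed=0):
    """pixels: list of (quad, tri_id), one entry per live pixel."""
    rng = random.Random(seed)
    lock = {}      # stands in for the lock UAV
    overdraw = {}  # stands in for the overdraw UAV
    processed = [False] * len(pixels)
    lock_count = [0] * len(pixels)  # > 0 once this pixel has taken the lock
    order = list(range(len(pixels)))
    for _ in range(max_iters):
        rng.shuffle(order)  # arbitrary ordering within one iteration
        for i in order:
            quad, tri_id = pixels[i]
            if not processed[i]:
                # Emulates InterlockedCompareExchange on the quad's lock
                prev_id = lock.get(quad, UNLOCKED)
                if prev_id == UNLOCKED:
                    lock[quad] = tri_id   # we now hold the lock
                    processed[i] = True
                    lock_count[i] = 1
                elif prev_id == tri_id:
                    processed[i] = True   # a sibling pixel holds the lock
                # otherwise another triangle holds it: retry next time
            elif lock_count[i] in (1, 2):
                lock_count[i] += 1
                if lock_count[i] == 3:
                    # A full extra iteration has passed, so every sibling
                    # has seen the lock: release it and count the quad once
                    lock[quad] = UNLOCKED
                    overdraw[quad] = overdraw.get(quad, 0) + 1
    return overdraw

pixels = ([((0, 0), 10)] * 4 + [((0, 0), 11)] * 2 +
          [((0, 0), 12)] + [((1, 0), 10)] * 3)
result = count_overdraw_locked(pixels)
```

In this simulation each shading quad is counted exactly once regardless of the shuffle, provided `max_iters` is large enough for every triangle on a quad to take its turn.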
Ideally we’d loop until the pixel has been tagged as processed, but I haven’t had success with current NVIDIA drivers and UAV-dependent flow control, i.e.:
[Code listing not preserved.]
As a workaround, I’ve simply set the iteration count to a number that works well in practice across NVIDIA and AMD GPUs (those that I’ve had a chance to test, anyway).
Now that we have a working system in place, it’s easy to gather other stats. For instance, although we can’t determine directly if a pixel is alive, we can count the number of live pixels in each shading quad, since Interlocked* operations are masked out for dead pixels. With this, we can tally up the number of quads with 1 to 4 live pixels in yet another UAV:
[Code listing not preserved.]
To my surprise, incrementing a 4-wide UAV didn’t lead to a massive slowdown here. That said, one can certainly use a number of buckets for intermediate results (indexed by the lower bits of the screen position, for instance), if this proves to be a problem.
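The live-pixel tally itself is simple once the per-quad counts exist. As a hedged host-side sketch (not the shader, and with hypothetical names), the histogram and an overall shading efficiency figure could be computed like this:

```python
from collections import Counter

def live_pixel_histogram(pixels):
    """pixels: iterable of (quad, tri_id), one entry per live pixel.

    Returns [n1, n2, n3, n4], where nk is the number of shading quads
    with exactly k live pixels (at most four per shading quad is assumed).
    """
    per_shading_quad = Counter(pixels)
    hist = [0, 0, 0, 0]
    for live in per_shading_quad.values():
        hist[live - 1] += 1
    return hist

def shading_efficiency(hist):
    """Fraction of shaded pixels that were actually live."""
    live = sum((k + 1) * n for k, n in enumerate(hist))
    return live / (4 * sum(hist))

pixels = ([((0, 0), 10)] * 4 + [((0, 0), 11)] * 2 +
          [((0, 0), 12)] + [((1, 0), 10)] * 3)
hist = live_pixel_histogram(pixels)
efficiency = shading_efficiency(hist)
```

Here `hist` is `[1, 1, 1, 1]` and `efficiency` is 0.625, since 10 live pixels were produced by four shading quads.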
With these numbers, it’s trivial to add a pie chart to the final pass:
Figure 2: Quad overdraw (dark blue = 1x, to green = 4x), and proportion of live pixels per quad (yellow = 4, to dark red = 1)
For your convenience, I’ve packaged things up into a simple demo. Please let me know if you hit any compatibility issues, or come up with any enhancements.
As with last year, I’m gathering links to SIGGRAPH content, to complement Ke-Sen Huang’s invaluable Technical Papers list. Please let me know if you have anything to add to the list.
COLLADA (video)
OpenCL (video)
OpenGL (video) (via @jbaert)
OpenGL ES (video)
OpenSceneGraph (starting to appear)
Teaching OpenGL in a Post-Deprecation World
WebGL (video)
Advanced (Quasi) Monte Carlo Methods for Image Synthesis (via @sjb3d)
Advances in Real-Time Rendering in Games
Applying Color Theory to Digital Media and Visualization (video)
Beyond Programmable Shading
Cinematic Color: From Your Monitor to the Big Screen (draft notes)
Color Transfer
Computational Displays
Computational Plenoptic Imaging
Data-Driven Simulation Methods in Computer Graphics: Cloth, Tissue, and Faces
Efficient Real-Time Shadows
 Shadows in Games: Practical Considerations (via @mickaelgilabert)
FEM Simulation of 3D Deformable Solids: A Practitioner’s Guide to Theory, Discretization, and Model Reduction
Fundamentals Seminar
GPU Shaders for OpenGL 4.x
Graphics Programming on the Web
Introduction to Modern OpenGL
Optimizing Realistic Rendering With Many-Light Methods
Practical Physically Based Shading in Film and Game Production
Principles of Animation Physics
State of the Art in Photon Density Estimation (via Jens Fursund)
The Hitchhiker’s Guide to the Galaxy of Mathematical Tools for Shape Analysis
Virtual Texturing in Software and Hardware
Augmented Reflection of Reality
Botanicus Interacticus: Interactive Plants Technology
Chilly Chair: Facilitating an Emotional Feeling With Artificial Piloerection
ClaytricSurface: An Interactive Surface With Dynamic Softness Control Capability
Gosen: A Handwritten Notation Interface for Musical Performance and Learning Music
Interactive Light-Field Painting
JUKE Cylinder: A Device to Metamorphose Hands to a Musical Instrument
Mood Meter: Large-Scale and Long-Term Smile Monitoring System
PossessedHand
REVEL: A Tactile Feedback Technology for Augmented Reality
ShaderPrinter
SplashDisplay: Volumetric Projecting Using Projectile Beads
Stuffed Toys Alive! Cuddly Robots From a Fantasy World
TECHTILE Toolkit (video)
TELESAR V: TELExistence Surrogate Anthropomorphic Robot
Tensor Displays: Compressive Light-Field Synthesis Using Multilayer Displays With Directional Backlighting
Intel
NVIDIA (streamable videos)
Advancing Dynamic Lighting on Mobile (via @palgorithm, Eric Haines)
Auto(mobile): Mobile Visual Interfaces for the Road
Mobile Augmented Reality in Advertising: the TineMelk AR App - A Case Study (slides, video) (via Kim Baumann Larsen)
Unity: iOS and Android - Cross-Platform Challenges and Solutions (via @__Rej__)
BRDF Explorer
OpenSubdiv
OpenVDB
8D Display
A Biologically Inspired Latent Space for Gait Parameterization
A Collision-Detection Method for High-Resolution Objects Using Tessellation Unit on GPU (video)
A Colloidal Display: Membrane Screen That Combines Transparency, BRDF, and 3D Volume (videos)
Base Mesh Construction Using Global Parametrization
CurveThis: A Tool to Create Controllable Massive Crawling (video)
Direct Spatial Interactions With See-Through 3D Desktop (project page, video)
Distance Aware Ray Tracing for Curves
Easy-to-Use Authoring System for Noh (Japanese Traditional) Dance Animation
Estimating Diffusion Parameters From Polarized Spherical Gradient Illumination (via Naty Hoffman)
Estimating Specular Normals From Spherical Stokes Reflectance Fields (via Naty Hoffman)
Fast Multi-Image-Based Photon Tracing With Grid-Based Gathering
Focus Tracking for Cinematography (video)
GaussSketch: Add-On Magnetic Sensing for Natural Sketching on Smartphones
GeigerCam: Measuring Radioactivity With Webcams
Graphic Narratives: Generative Book Covers (background)
High-Detail Marker-Based 3D Reconstruction by Enforcing Multiview Constraints
How to Draw Illustrative Figures?
Image-Based Smartphone Interaction With Large High Resolution Displays
Interactive Generation of (Paleontological) Scientific Illustrations From 3D Models (via @numb3r23)
Lifelike Interactive Characters With Behavior Trees for Social Territorial Intelligence
Light-Field Supported Fast Volume Rendering
Magic Pot: Interactive Metamorphosis of the Perceived Shape (video)
Mimicat: Face Input Interface Supporting Animatronics Costume Performer’s Facial Expression (video)
Non-Rigid Shape Correspondence and Description Using Geodesic Field Estimate Distribution (video)
Panorama Light-Field Imaging
Perceptually Optimized Content Remapping for Automultiscopic Displays
Pixelating Vector Line Art
Radiance Filtering for Interactive Path Tracing
Randomized Coherent Sampling for Reducing Perceptual Rendering Error
Real-Time HDR Video Reconstruction for Multi-Sensor Systems
Shadow++: A System for Generating Artificial Shadows Based on Object Movement
Technoculture of Handcraft: Fine Gesture Recognition for Haute Couture Skills Preservation and Transfer in Italy
Towards A Transparent, Flexible, Scalable, and Disposable Image Sensor
Typeface Styling with Ramp Responses
Video Retrieval Based on User-Specified Deformation
DIYLILCNC v2.0 (project page)
Interactive Modeling with Mesh Surfaces (via Ryan Schmidt)
Loosely Fitted Design Synthesizer [LFDS] (project page, media)
RhythmSynthesis (website, thesis)
SketchGraph: Gestural Data Input for Mobile Tablet Devices (video)
Vignette: A Style-Preserving Sketching Tool for Pen-and-Ink Illustration (project page, videos)
MAXScript for Artists
VFX for Games: Particle Effects
VFX for Games: Pre-Baked Destruction
3D Diff: An Interactive Approach to Mesh Differencing and Conflict Resolution
A Single-Shot Light Probe
A World of Voxels: The Volumetric Effects of “Ice Age: Continental Drift”
Amorphous: An OpenGL Sparse Volume Renderer
AudioCloning: Extracting Material Fingerprints from Example Audio Recording
Building Interior Multi-Panorama Experiences at Scale
CageR: From 3D Performance Capture to Cage-Based Representation
Cloud Modeling And Rendering for “Puss In Boots”
CoDAC: Compressive Depth Acquisition Using a Single Time-Resolved Sensor
Creating Vast Game Worlds - Experiences From Avalanche Studios (via @__Humus__)
dRig: An Artist-Friendly, Object-Oriented Approach to Rig Building (via Naty)
Efficient and Seamless Volumetric Fracturing
Estimating Diffusion Parameters From Polarized Spherical Gradient Illumination
Fast Generation of Directional Occlusion Volumes
Hero-Quality Crowds in “Madagascar 3: Europe’s Most Wanted”
Importance Sampling for Hair Scattering
Intelligent Brush Strokes
KinÊtre: Animating the World With the Human Body (project page, videos)
LibEE: A Multithreaded Dependency Graph for Character Animation
Local Image-Based Lighting With Parallax-Corrected Cubemaps (via @SebLagarde)
Measurement-Based Synthesis of Facial Microgeometry
Magic Beanstalk Ride in “Puss In Boots”
Multiresolution Radiosity Caching for Global Illumination in Movies
Panorama Light-Field Imaging
Point-Based Global Illumination Directional Importance Mapping
Progressive Lightcuts for GPU
Relativistic Ultrafast Rendering Using Time-Resolved Imaging
Rich Intrinsic Image Decomposition of Outdoor Scenes From Multiple Views (TVCG paper)
Screen Space Decals in Warhammer 40,000: Space Marine (via Tuan Kuranes, @BlindRenderer)
SGRT: A Scalable Mobile GPU Architecture Based on Ray Tracing
Tiled and Clustered Forward Shading
Volume-Aware Extinction Mapping (alternative link, via Naty)
Vortex of Awesomeness
I’ve added a new article, Blending in Detail, written together with Colin Barré-Brisebois, on the topic of blending normal maps. We go through various techniques that are out there, as well as a neat alternative (“Reoriented Normal Mapping”) from Colin that I helped to optimise.
This is by no means a complete analysis – particularly as we focus on detail mapping – so we might return to the subject at a later date and tie up some loose ends. In the meantime, I hope you find the article useful. Please let us know in the comments!
Existing post URLs remain the same, but if you’re one of the illustrious few who subscribe to the blog via RSS, I’m guessing that you’ll need to change over to the new feed. Update: I’m redirecting the old feed URL now, so everything should be back to normal! Speaking of RSS, as I’m now using MathJax for $\LaTeX$, it appears that I’ll need to implement a fallback there, in addition to tracking down a rendering issue with Chrome. Please let me know if you spot any other oddities.
I first tinkered with SH wrap shading (as described in part 1) for Splinter Cell: Conviction, since we were using a couple of models [1][2] for some character-specific materials. Unfortunately, due to the way that indirect character lighting was performed, it would have required additional memory that we couldn’t really justify at that point in development. Consequently, this work was left on the cutting room floor and I only got as far as testing out Green’s model [1].
Recently, however, I spotted that Irradiance Rigs [3] covers similar ground. At the very end of the short paper, they briefly present a generalisation of Valve’s Half Lambert model [2] and the SH convolution terms for the first three bands:
This tidily combines the tunability of [1] with the tighter falloff of [2], albeit at the cost of a few extra instructions in the case of direct lighting. It’s not energy-conserving though, so for kicks I went through the maths – see appendix – and made the necessary adjustments:
I would suggest this as a good workout if your calculus skills are a little on the rusty side; think of it as a much-needed trip to the maths gym: sure it’s going to hurt at first, but you’ll feel better afterwards!
The same authors have since written a more in-depth paper, Wrap Shading [4], which Derek Nowrouzezahrai has kindly made available here. I recommend checking it out, since there’s some nice analysis and plenty of background information. One notable insight is that their model is perfectly represented by 3rd-order SH when $a = 1$ (i.e., Half Lambert). This becomes clear when you consider that the model is effectively unclamped in that case, so appropriate scaling of the constant, linear and quadratic bands will match the function:
A similar observation can be made with Green’s model: it’s perfectly represented by 2nd-order SH when $a = 1$.
But wait, at the end of part 1, didn’t I promise that there would be a discussion of optimisation in this post? You’re quite right. Well, it just so happens that a snippet of reference shader code from this last paper makes for a neat little case study on improving shader performance.
This is pretty much the reference implementation for generating the normalised convolution terms of their generalised model:
The only thing that I’ve changed – beyond adding calling code – is to pass in the wrap parameter fA from the vertex shader. It was previously a user-supplied constant, which doesn’t make for a particularly credible example, since in that case all of the maths could simply be moved to the CPU and performed just the once!
Note that there’s been some attempt to pull out common terms, particularly for the final component, where instead of fA*fA - 2*fA + 3 (see $\mathbf{f}$) we now have fA*fA - t.x + 5.
Without further ado, let’s see how this stacks up in terms of ps_3_0 instructions:
Ouch! 16 is fairly substantial, but perhaps not all that surprising going by the HLSL. Since this is device-independent assembly, I decided to check the ALU count on Xbox 360 for comparison. In that case it’s a somewhat more reasonable 10 operations, because 5 scalar ops get dual-issued with vector ops. So, in summary, we have:
DX9: 16, X360: 10(+5) ALU ops
Immediately, a simple but very effective change we can make is to cancel through by the normalisation term, which leaves us with $\mathbf{\hat{f}}$ directly:
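As a sanity check on the cancelled-through form, here is a minimal C++ sketch (not the paper's shader code). It assumes the generalised model is $\left(\frac{\max(\cos\theta + a, 0)}{1 + a}\right)^{1 + a}$, which reduces to Half Lambert at $a = 1$; under that assumption the normalised terms work out to $\left(1, \frac{2}{a + 3}, \frac{a^2 - 2a + 3}{(a + 3)(a + 4)}\right)$. The names `WrapTerms`, `NormalisedWrapTerms` and `IntegrateWrapTerms` are made up for illustration. The second function integrates the assumed model numerically against the first three Legendre polynomials, so the closed form can be compared with it.

```cpp
#include <cassert>
#include <cmath>

struct WrapTerms { double b0, b1, b2; };

// Closed-form normalised convolution terms for the assumed model
// ((cos(theta) + a) / (1 + a))^(1 + a), with a in [0, 1].
WrapTerms NormalisedWrapTerms(double a)
{
    assert(a >= 0.0 && a <= 1.0);
    return { 1.0,
             2.0 / (a + 3.0),
             (a * a - 2.0 * a + 3.0) / ((a + 3.0) * (a + 4.0)) };
}

// Brute-force check: midpoint-rule integration of the model against
// P0, P1 and P2 over x = cos(theta) in [-a, 1], normalised by band 0.
WrapTerms IntegrateWrapTerms(double a, int steps = 200000)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0;
    const double lo = -a;
    const double dx = (1.0 - lo) / steps;
    for (int i = 0; i < steps; ++i)
    {
        const double x = lo + (i + 0.5) * dx;
        const double f = std::pow((x + a) / (1.0 + a), 1.0 + a);
        s0 += f;
        s1 += f * x;
        s2 += f * 0.5 * (3.0 * x * x - 1.0);
    }
    return { 1.0, s1 / s0, s2 / s0 };
}
```

At $a = 0$ the closed form gives the familiar cosine kernel values $(1, \frac{2}{3}, \frac{1}{4})$, which is a quick way to spot a slip in the algebra.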
Don’t expect the compiler to do intelligent optimisations like this; constant folding yes, factoring sometimes, sophisticated symbolic manipulation? Good luck!
For instance, even seemingly ‘obvious’ opportunities like (a/b)/(b/a) will go unnoticed by FXC. This isn’t down to the compiler trying to maintain special-case behaviour such as divide by zero either, because it will happily replace a/a with 1 in the absence of any knowledge about the value of a.
Apologies if that was already perfectly clear and all I’ve done is insult your intelligence, but I’ve seen some people blithely leave everything up to the compiler and not scrutinise what it’s generating. Of course, high-level algorithmic optimisations are hugely important as well, but so is this lower-level stuff when a shader is being executed for millions of pixels!
Just look at what this small amount of effort has netted us:
DX9: 10, X360: 5(+3) ALU ops
Next we can factor fA*fA - 2*fA + 3 again – this time as (fA + 1)(fA + 3) - 6*fA – to reduce the numerator of the third term to a single multiply-add:
I’ve also taken the opportunity to manually vectorise the addition of fA, plus a subsequent pair of multiplications between resulting terms. In fact, the compiler does this anyway, as it’s relatively good at vectorising code. Still, one shouldn’t assume that it will always get things right!
Whether there’s a gain or not, manual vectorisation – which is often quick to do – makes it easier to sanity check the output assembly. Just scanning through, you might expect add, mul, mov, rcp, mul, mad, rcp, mul and you’d be pretty much spot on.
So, for DX9 we’ve reduced the op count by 2, but what about Xbox 360? Here, we’ve only succeeded in shaving off one paired scalar op. However, this may turn into a real gain once the function is part of a larger shader.
DX9: 8, X360: 5(+2) ALU ops
This next trick involves rescaling so that the second term becomes 1/t.y, or a single rcp:
You might wonder why I’m using an external constant here. Well, it turns out that FXC will misoptimise when it knows the values. Bad compiler! Again, there’s one less paired scalar op on Xbox 360:
DX9: 7, X360: 5(+1) ALU ops
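Both the factoring and the rescaling can be checked on the CPU. The C++ sketch below is an illustration rather than the shader from the post: it assumes the normalised terms are $\left(1, \frac{2}{a + 3}, \frac{a^2 - 2a + 3}{(a + 3)(a + 4)}\right)$, uses `std::fma` as a stand-in for `mad`, and introduces a hypothetical `t_y = 0.5a + 1.5` so that the second term is a single reciprocal. All names are invented for the example.

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };

// Direct form of the (assumed) normalised terms.
Float3 WrapTermsDirect(float a)
{
    assert(a >= 0.0f && a <= 1.0f);
    return { 1.0f,
             2.0f / (a + 3.0f),
             (a * a - 2.0f * a + 3.0f) / ((a + 3.0f) * (a + 4.0f)) };
}

// Factored and rescaled form: the third numerator is one multiply-add,
// and the second term is one reciprocal.
Float3 WrapTermsFactored(float a)
{
    assert(a >= 0.0f && a <= 1.0f);
    const float num = std::fma(a + 1.0f, a + 3.0f, -6.0f * a); // (a+1)(a+3) - 6a
    const float t_y = std::fma(0.5f, a, 1.5f);                 // (a + 3) / 2
    return { 1.0f,
             1.0f / t_y,                                       // equals 2 / (a + 3)
             num / ((a + 3.0f) * (a + 4.0f)) };
}
```

The two functions should agree to within floating-point rounding across the whole $[0, 1]$ range of the wrap parameter.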
Rather than factoring terms, we could have expanded $\mathbf{\hat{f}}$ instead:
Or in code:
This is a win for ps_3_0 but not for Xbox 360, as it removes the opportunity for pairing. It’s possible that some clever variation could fix this, but it doesn’t matter because we haven’t exhausted our optimisation options…
DX9: 6, X360: 6 ALU ops
There are potentially significant gains to be had from numerical fitting, so it’s worth taking the time to familiarise yourself with the various techniques, maths packages and libraries out there.
In this instance, I’m performing a cubic fit – i.e. $ax^3 + bx^2 + cx + d$ – for the 2nd and 3rd bands. Polynomials are attractive for performance because they can be efficiently evaluated as a series of mad instructions when written in Horner form: $x(x(ax + b) + c) + d$
With careful vectorisation, this collapses to the following:
Xbox 360 does all this in one less operation because placing 1 into r.x can be achieved with a destination register modifier:
DX9: 4, X360: 3 ALU ops
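For readers less familiar with Horner form, a scalar C++ sketch follows. `CubicHorner` is an illustrative name, and `std::fma` stands in for the shader `mad` instruction. No fitted coefficients are included, since the fitted values are not listed in the text.

```cpp
#include <cmath>

// Evaluates a*x^3 + b*x^2 + c*x + d as three chained multiply-adds.
float CubicHorner(float x, float a, float b, float c, float d)
{
    float r = std::fma(a, x, b);   // a*x + b
    r = std::fma(r, x, c);         // (a*x + b)*x + c
    return std::fma(r, x, d);      // ((a*x + b)*x + c)*x + d
}
```

In a shader the same three steps would run on a vector of coefficients, one lane per band, which is where the vectorised version gets its savings.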
I could present graphs showing how the cubic approximations fare, but take it from me that they are extremely close. In fact, we can arguably drop down to a quadratic fit and save a further mad in the process. This is still acceptable:
Figure 1: Comparison between original and quadratic fit for 2nd and 3rd bands (left, right)
In both cases – cubic and quadratic – I’ve actually constrained the fitting process so that the curves go through the endpoints. This reduces the worst case error a little and maintains the nice property of exactness when $a = 1$. Of course, something has to give and so the average error is a little higher.
In practice, this quadratic approximation has little effect on the end result. When lighting with a single directional source – a worst-case scenario – the difference is slight and far less significant than the error that comes from using 3rd-order SH in the first place.
Here’s the code for the quadratic version:
DX9: 3, X360: 2 ALU ops
And yet, we’re still not done! The DX9 figure suggests that we might pay the instruction cost of moving 1 into r.x with some GPUs, and although it could go away when the terms are actually used, it would be cute if we could get rid of it just in case.
Notice that the two curves are monotonically decreasing and within the range [0, 1]. If we negate the intermediate result of the first mad, saturate and then negate again, there will be no overall effect. By doing this, we can take r.x along for the ride and force it to 0 through one of the negative constants, then add 1 via the final mad:
Because saturation and negation are typically free register modifiers, we save an operation:
DX9: 2, X360: 2 ALU ops
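The negate, saturate, negate trick can be checked in isolation. The C++ sketch below assumes, as the description implies, that the intermediate value of the first `mad` lies in [-1, 0] for the two fitted lanes; `Saturate` and `NegSatNeg` are hypothetical helper names, not shader intrinsics.

```cpp
#include <algorithm>

// HLSL-style saturate: clamp to [0, 1].
float Saturate(float v)
{
    return std::min(std::max(v, 0.0f), 1.0f);
}

// Negate, saturate, negate: leaves values in [-1, 0] unchanged
// and maps anything positive to zero.
float NegSatNeg(float v)
{
    return -Saturate(-v);
}
```

A lane whose intermediate value is positive therefore comes out as zero, which is what lets the first component be cleared and then set to 1 by the constant term of the final multiply-add.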
The Wrap Shading paper doesn’t include a normalised version of Green’s model (see part 1), so here’s code for that too:
DX9: 2, X360: 2 ALU ops
Here’s a WebGL sample that encapsulates this mini-series on wrap shading.
In conclusion, shader optimisation is critical for video game rendering, so you shouldn’t defer to the compiler. To quote Michael Abrash: “The best optimizer is between your ears”. Don’t forget it, train it!
[1] Green, S., “Real-Time Approximations to Subsurface Scattering”, GPU Gems, 2004.
[2] Mitchell, J., McTaggart, G., Green, C., “Shading in Valve’s Source Engine”, Advanced Real-Time Rendering in 3D Graphics and Games, SIGGRAPH Course, 2006.
[3] Yuan, H., Nowrouzezahrai, D., Sloan, P.-P., “Irradiance Rigs”, SIGGRAPH Talk, 2010.
[4] Sloan, P.-P., Nowrouzezahrai, D., Yuan, H., “Wrap Shading”, Journal of Graphics, GPU, and Game Tools, 15:4, 252-259, 2011.
Normalisation factor for generalised Half Lambert:
Wrap shading has its uses when more accurate techniques are too expensive, or simply to achieve a certain aesthetic, but common models [1][2] have some deficiencies out of the box. Neither of these is energy conserving and they don’t really play well with shadows either. On top of that, Valve’s Half Lambert model [2] has a fixed amount of wrap, so it can’t be tuned to suit different materials (or, perhaps, to limit shadow oddities). I’ll come back to the point about flexibility in part 2, but first I’d like to discuss another factor that’s easily overlooked: environmental lighting.
If you’re set on using some form of wrap shading, then it’s not just a matter of applying it to your standard direct sources – directional, point and spot lights, for instance – it ought to be carried through to environmental lighting as well! Naturally, the importance of this depends on how strong and directional your secondary lighting is; obviously if you’re only using constant ambient then there’s no problem, but these days it’s fairly common to encode indirect lighting in Spherical Harmonics (SH) [3] and perhaps some additional lights as well. Fortunately, wrap shading in the context of SH lighting is easy, and much like energy conservation it’s a relatively cheap or free addition, so it’s worth considering even if the results prove to be subtle.
So, how do we accomplish this? Well, that’s best explained with a quick recap. If you recall, for diffuse SH lighting, we first project the lighting environment, $f$, into SH:
(Of course, in practice, this is commonly performed offline as a numerical integration over a cube map.)
We then convolve this with the SH-projected cosine lobe, $h$, like so:
Next, we can evaluate the lighting (more specifically, irradiance) for a given surface direction, $s$:
Finally, a division by $\pi$ gives us outgoing (or exit) radiance. Personally, I find it convenient to roll these extra terms into $h$ itself. The nice thing about this is that the convolution kernel then boils down (via analytical integration) to easy to remember values for the first three SH bands:
For further details, you can find a complete and approachable account in [4].
Now, back to wrap: adjusting things for our shading model of choice is simply a matter of replacing $\hat{h}$. Let’s try this for the simple wrap model from Green [1] that Steve already discussed:
From Steve’s post, we know that we need an additional normalisation factor of $1 + w$ for energy conservation, so the full formula for our new convolution, which I’ll call $\hat{g}$, is:
You can go through a similar process of analytical integration as Steve did, only now with the additional SH basis terms $y_{l}^{0}$, or if you’re lazy like me, you can throw the formula at a package like Mathematica. Either way, once you’re done, you’ll arrive at the following (or something equivalent):
We can clearly see that this reduces to the cosine convolution kernel, $\hat{h}$, when $w = 0$. In visual terms, the effect of changing $w$ is evident with a single directional light, as you would expect:
Figure 1: Variable wrap shading (0, 0.5, 1) with a single directional light in SH
On the other hand, the difference is a lot subtler with a more uniform lighting environment:
Figure 2: Variable wrap shading (0, 0.5, 1) with a general SH environment (Grace Cathedral)
Okay, confession time: this post wasn’t just about wrap shading, since it also serves as a foundation for future posts. For instance, part 2 will conveniently segue into shader optimisations, which is a topic I’ve been planning to write – or, perhaps more accurately, rant – about in general, and I’ll also be returning to Spherical Harmonics down the line.
[1] Green, S., “Real-Time Approximations to Subsurface Scattering”, GPU Gems, 2004.
[2] Mitchell, J., McTaggart, G., Green, C., “Shading in Valve’s Source Engine”, Advanced Real-Time Rendering in 3D Graphics and Games, SIGGRAPH Course, 2006.
[3] Green, R., “Spherical Harmonic Lighting: The Gritty Details”, GDC 2003.
[4] Sloan, P.-P., “Efficient Evaluation of Irradiance Environment Maps”, ShaderX^2: Shader Programming Tips and Tricks with DirectX 9.0, 2003.
Figure 1: Major axes for original (left), swizzle (mid) and perpendicular (right) vectors
Two months ago, there was a question (and subsequent discussion) on Twitter as to how to go about generating a perpendicular unit vector, preferably without branching. It seemed about time that I finally post something more complete on the subject, since there are various ways to go about doing this, as well as a few traps awaiting the unwary programmer.
Here are four options with various tradeoffs. If you happen to know of any others, by all means let me know and I’ll update this post.
Note: in all of the following approaches, normalisation is left as an optional post-processing step.
A quick hack involves taking the cross product of the original unit vector – let’s call it $\mathbf{u}(x, y, z)$ – with a fixed ‘up’ axis, e.g. $(0, 1, 0)$, and then normalising. A problem here is that if the two vectors are very close – or equally, pointing directly away from each other – then the result will be a degenerate vector. However, it’s still a reasonable approach in the context of a camera, if the view direction can be restricted to guard against this. A general solution in this situation is to fall back to an alternative axis:
Listing 1: A quick way to generate a perpendicular vector
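A minimal C++ sketch of this quick approach, assuming $(0, 1, 0)$ as the 'up' axis and $(1, 0, 0)$ as the fallback. The 0.99 threshold and the names `Vec3`, `Cross` and `PerpendicularQuick` are illustrative choices, not taken from the original listing, and the result is left unnormalised.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 Cross(Vec3 a, Vec3 b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Cross u with 'up' unless u is nearly parallel (or anti-parallel) to it,
// in which case fall back to a second axis. The threshold is arbitrary.
Vec3 PerpendicularQuick(Vec3 u)
{
    const Vec3 up   = { 0.0f, 1.0f, 0.0f };
    const Vec3 side = { 1.0f, 0.0f, 0.0f };
    return Cross(u, std::fabs(u.y) < 0.99f ? up : side);
}
```

Testing `fabs(u.y)` covers both the "very close" and "pointing directly away" cases with one comparison, since the input is a unit vector.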
In a neat little Journal of Graphics Tools paper [1], Hughes and Möller proposed a more systematic approach to computing a perpendicular vector. Here’s the heart of it:
Or, as the paper also states: “Take the smallest entry (in absolute value) of $\mathbf{u}$ and set it to zero; swap the other two entries and negate the first of them”.
Figure 2: Distribution of v over the sphere
However, there’s a problem with this as written: it doesn’t handle cases where multiple components are the smallest, such as $(0, 0, 1)$! I hit this a few years back when I needed to generate an orthonormal basis for some offline geometry processing, and it’s easily remedied by replacing $<$ with $\leq$. Here’s a corrected version in code form:
Listing 2: Hughes-Möller perpendicular vector generation
More recently, Michael M. Stark suggested some improvements to the Hughes-Möller approach [2]. Firstly, his choice of permuted vectors is almost the same – differing only in signs – but even easier to remember:
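The rule quoted from the paper, with the `<=` correction applied, can be sketched in C++ as follows. This is a reconstruction from the quoted rule rather than a copy of the original listing, and `Vec3` and `PerpendicularHM` are illustrative names. The result is left unnormalised.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Zero the smallest component (by absolute value), swap the other two
// and negate the first of them. Using <= rather than < means ties such
// as (0, 0, 1) still select a branch that yields a non-zero vector.
Vec3 PerpendicularHM(Vec3 u)
{
    const float ax = std::fabs(u.x);
    const float ay = std::fabs(u.y);
    const float az = std::fabs(u.z);

    if (ax <= ay && ax <= az) return { 0.0f, -u.z, u.y };
    if (ay <= ax && ay <= az) return { -u.z, 0.0f, u.x };
    return { -u.y, u.x, 0.0f };
}
```

For the problem input $(0, 0, 1)$ the first branch is taken and the result is $(0, -1, 0)$, whereas strict comparisons would leave no branch selected.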
In plain English: the perpendicular vector $\mathbf{\bar{v}}$ is found by taking the cross product of $\mathbf{u}$ with the axis of its smallest component. (Note: the same care is needed when multiple components are the smallest.)
Figure 3 visualises this intermediate ‘swizzle’ vector over the sphere:
Figure 3: Intermediate swizzle vector
Secondly, Michael also provides a branch-free implementation:
Listing 3: Branch-free perpendicular vector generation
I know what you’re thinking, there’s a problem here too: it should be zm = 1^(xm | ym)! Although still robust, the effect of this error is that the nice property of even, symmetrical distribution over the sphere is lost:
Figure 4: Broken symmetry in the perpendicular vector
(that’s 7 years bad luck!)
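One way to write the mask logic with the corrected `zm` is sketched below in C++. `std::signbit` stands in for a raw sign-bit extraction, and the definitions of `xm` and `ym` are an assumption about the structure of the listing rather than a copy of it. The final step is the cross product of $\mathbf{u}$ with the selected axis, as described above.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 PerpendicularBranchFree(Vec3 u)
{
    const float ax = std::fabs(u.x);
    const float ay = std::fabs(u.y);
    const float az = std::fabs(u.z);

    // 1 where the first operand is strictly smaller than the second.
    const unsigned uyx = std::signbit(ax - ay);
    const unsigned uzx = std::signbit(ax - az);
    const unsigned uzy = std::signbit(ay - az);

    const unsigned xm = uyx & uzx;        // x is the strict minimum
    const unsigned ym = (1u ^ xm) & uzy;  // otherwise y, if it beats z
    const unsigned zm = 1u ^ (xm | ym);   // otherwise z (corrected mask)

    // Cross product of u with the selected axis (xm, ym, zm).
    const float fx = float(xm), fy = float(ym), fz = float(zm);
    return { u.y * fz - u.z * fy,
             u.z * fx - u.x * fz,
             u.x * fy - u.y * fx };
}
```

Exactly one of the three masks is set for any input, so the axis is always a single basis vector. With the masks defined this way, `xm & ym` is always zero, so a `zm` built from it would be set for every input.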
Finally, another branch-free solution is provided in the form of XMVector3Orthogonal, which is part of the XNAMath library. Here’s the actual code taken from the DirectX SDK (June 2010):
Listing 4: XNAMath’s semi-vectorised method
Let me save you the trouble of parsing this fully (or consider it an exercise for later); if you boil it down, what you’re effectively left with is:
I’ve failed, thus far, to pinpoint the origin of or thought process behind this approach. That said, some insight can be gained from visualising the resulting vectors:
Figure 5: Covering one’s axis
Their maximum component is in the x axis, except close to the +/-ve x poles. Essentially, Microsoft’s solution ensures robustness without concern for distribution, much like the initial ‘quick’ approach.
I haven’t benchmarked these implementations, since in cases where I’ve needed to generate perpendicular vectors, absolute speed wasn’t important or the call frequency was vanishingly small. Even in performance-critical situations, it really depends on what properties/restrictions you can live with and your target architecture(s). Still, I can’t help but think that XMVector3Orthogonal is doing a little bit more than it needs to, so maybe there’s cause to revisit this subject at a later date.
I hope you’ve learnt something about generating perpendicular vectors, or that I’ve at least made you aware of some of the minor issues in previous work on the subject. On that note, if you spot any new errors here, please let me know!
[1] Hughes, J. F., Möller, T., “Building an Orthonormal Basis from a Unit Vector”, Journal of Graphics Tools 4:4 (1999), 33-35.
[2] Stark, M. M., “Efficient Construction of Perpendicular Vectors without Branching”, Journal of Graphics Tools 14:1 (2009), 55-61.
The first of these was originally published in GPU Pro 2. Unfortunately, I missed some errors that crept into the typeset version, so I was pleased to finally correct those and I took the opportunity to rework a few sentences for greater clarity as well. Now that it’s online, I’ll also be able to refer directly to certain sections in follow-up blog posts on the subject.
The second took the form of a journal entry for the Microsoft Game Developer Network, which went up in the spring. It may have flown under your radar, as I’ve since spoken to a few developers who hadn’t seen it, yet were keen to have such a tool in their engine. For NDA reasons, I can’t go into all of the implementation details here, so think of it as a ‘graphical appetiser’.
In a way, the two topics are related: the primary goal of a visibility system is to efficiently remove parts of the world that can’t be seen from a given viewpoint, whereas the purpose of a debug overshading mode is to directly visualise pixel shader work, some of which can likewise have zero contribution to the final image.
I think it’s also fair to say that keeping both forms of redundancy in check is a critical part of optimising the rendering performance of most AAA titles. For that reason, I hope you find these articles useful, and as always, please let me know what you think!