<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Self Shadow]]></title>
  <link href="http://blog.selfshadow.com/atom.xml" rel="self"/>
  <link href="http://blog.selfshadow.com/"/>
  <icon>http://blog.selfshadow.com/favicon.png</icon>
  <updated>2013-05-13T09:52:02-04:00</updated>
  <id>http://blog.selfshadow.com/</id>
  <author>
    <name><![CDATA[Stephen Hill]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Counting Quads]]></title>
    <link href="http://blog.selfshadow.com/2012/11/12/counting-quads/"/>
    <updated>2012-11-12T22:59:00-05:00</updated>
    <id>http://blog.selfshadow.com/2012/11/12/counting-quads</id>
    <content type="html"><![CDATA[<p><em>This is a DX11 followup to an <a href="http://blog.selfshadow.com/publications/overdraw-in-overdrive/">earlier article</a> on quad &#8216;overshading&#8217;. If you&#8217;ve already read that, then feel free to skip to the <a href="#meat">meat of this post</a>.</em></p>

<h2>Recount</h2>

<p>As you likely know, modern GPUs shade triangles in blocks of 2x2 pixels, or <em>quads</em>. Consequently, redundant processing can happen along the edges where there&#8217;s partial coverage, since only some of the pixels will end up contributing to the final image. Normally this isn&#8217;t a problem, but – depending on the complexity of the pixel shader – it can significantly increase, or even dominate, the cost of rendering meshes with lots of very small or thin triangles.</p>

<p><img class="center" src="http://blog.selfshadow.com/images/counting-quads/figure_1.png"></p>

<p style="text-align: center;"><em>Figure 1: Quad overshading, the silent performance killer</em></p>


<p><em>For more information, see Fabian Giesen&#8217;s <a href="http://fgiesen.wordpress.com/2011/07/10/a-trip-through-the-graphics-pipeline-2011-part-8/">post</a>, plus his <a href="http://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/">excellent series</a> in general.</em></p>

<p>It&#8217;s hardly surprising, then, that IHVs have for years advised developers to avoid triangles smaller than a certain size, but that&#8217;s somewhat at odds with game developers – artists in particular – wanting to increase visual fidelity and believability, through greater surface detail, smoother silhouettes, more complex shading, etc. (As a 3D programmer, part of my job involves the thrill of being stuck in the middle of these kinds of arguments!)</p>

<p>Traditionally, mesh LODs have helped to keep triangle density in check. More recently, deferred rendering methods have sidestepped a large chunk of the redundant shading work, by writing out surface attributes and then processing lighting more coherently via volumes or tiles. However, these are by no means definitive solutions, and nascent techniques such as DX11 tessellation and <a href="http://aras-p.info/blog/2012/03/27/tiled-forward-shading-links/"><em>tile-based forward shading</em></a> not only challenge the status quo, but also bring new relevancy to the problem of quad shading overhead.</p>

<p>Knowing about this issue is one thing, but, as they say: <em>seeing is believing</em>. In a <a href="http://blog.selfshadow.com/publications/overdraw-in-overdrive/">previous article</a>, I showed how to display hi-z and quad overshading on Xbox 360, via some platform-specific tricks. That&#8217;s all well and good, but it would be great to have the same sort of visualisation on PC, built into the game editor. It would also be helpful to have some overall stats on shading efficiency, without having to link against a library (<a href="http://developer.amd.com/tools/gpu/GPUPerfAPI/Pages/default.aspx">GPUPerfAPI</a>, <a href="https://developer.nvidia.com/nvidia-perfkit">PerfKit</a>) or run a separate tool.</p>

<p>There are several ways of reaching these modest goals, which I&#8217;ll cover next. What I&#8217;ve settled on so far is admittedly a hack: a compromise between efficiency, memory usage, correctness and simplicity. Still, it fulfils my needs so far and I hope you find it useful as well. <!--more--></p>

<h2>Going To Eleven <a id="meat"></a></h2>

<p>First, let&#8217;s restate the problem: what we want, essentially, is to count up the number of times we shade a given <em>screen</em> quad. The trick is to only count each <em>shading</em> quad once.</p>

<p>The way I achieved this on Xbox 360 hinged on knowing whether a given pixel was &#8216;alive&#8217; or not, and then only accumulating overdraw for the first live pixel in each shading quad. As far as I&#8217;m aware, there&#8217;s no official way of determining this on PC through standard graphics APIs, but some features of DX11 – namely <em>Unordered Access Views</em> (UAVs) and atomic operations – will allow us to arrive at the same result via a different route.</p>

<h4>The right way</h4>

<p>What I was after was an implementation that was as simple as before, involving three steps:</p>

<ul>
<li>Render depth pre-pass (optional; do whatever the regular rendering path does for this)</li>
<li>Render scene (material/lighting passes) with special overdraw shader</li>
<li>Display results</li>
</ul>


<p>A straightforward, safe option is to gather a list of triangles per screen quad, filtering by ID (a combination of <code>SV_PrimitiveID</code> and object ID). This filtering can be performed during the overdraw pass or as a post-process.</p>

<p>What&#8217;s unsatisfying with this approach is that it involves budgeting memory for the worst case, or accepting an upper bound on displayable overdraw.  Whilst I can imagine that a multi-pass variation is doable, that just adds unwanted complexity to what ought to be a simple debug rendering mode.</p>
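
<p>As a rough illustration of the list-based option, here is a minimal sketch that appends an ID per live pixel to a fixed-size list for each screen quad, then counts the distinct IDs as a post-process. It isn&#8217;t taken from the demo: <code>MAX_ENTRIES</code>, <code>ListParams</code>, the buffer names and the 16-bit ID packing are all assumptions made for the example.</p>

<figure class='code'> <div class="highlight"><pre><code class='hlsl'>// Assumed per-quad budget
#define MAX_ENTRIES 16

cbuffer ListParams : register(b0)
{
    uint quadsPerRow; // assumed: number of screen quads per row
    uint objectID;    // assumed: set per draw call by the application
};

RWStructuredBuffer&lt;uint&gt; listCountUAV : register(u0); // one counter per screen quad, cleared to 0
RWStructuredBuffer&lt;uint&gt; listIDUAV    : register(u1); // MAX_ENTRIES IDs per screen quad

[earlydepthstencil]
void GatherPS(float4 vpos : SV_Position, uint primID : SV_PrimitiveID)
{
    uint2 quad  = vpos.xy*0.5;
    uint  index = quad.y*quadsPerRow + quad.x;

    // Combine object and primitive IDs (assumes both fit in 16 bits)
    uint id = (objectID &lt;&lt; 16) | (primID &amp; 0xffff);

    // Reserve a slot; every live pixel appends, so a triangle can appear up to 4 times
    uint slot;
    InterlockedAdd(listCountUAV[index], 1, slot);

    // Entries beyond the budget are dropped, which caps the displayable overdraw
    if (slot &lt; MAX_ENTRIES)
        listIDUAV[index*MAX_ENTRIES + slot] = id;
}

// Post-process (a separate pass, with the same buffers bound as SRVs)
StructuredBuffer&lt;uint&gt; listCountSRV : register(t0);
StructuredBuffer&lt;uint&gt; listIDSRV    : register(t1);

uint CountUniqueIDs(uint index)
{
    uint count  = min(listCountSRV[index], (uint)MAX_ENTRIES);
    uint unique = 0;

    for (uint i = 0; i &lt; count; i++)
    {
        uint id   = listIDSRV[index*MAX_ENTRIES + i];
        bool seen = false;

        // An ID only counts the first time it appears in the list
        for (uint j = 0; j &lt; i; j++)
            seen = seen || (listIDSRV[index*MAX_ENTRIES + j] == id);

        if (!seen)
            unique++;
    }
    return unique;
}
</code></pre></div></figure>

<p>With 16 slots, this version is only guaranteed to capture four layers of overdraw when every shading quad is fully covered, which is the budgeting problem in a nutshell.</p>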

<h4>The wrong way</h4>

<p>So, in order to overcome these limitations, I started toying around with something a lot simpler:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="k">RWTexture2D</span><span class="o">&lt;</span><span class="kt">uint</span><span class="o">&gt;</span> <span class="n">primIDUAV</span>   <span class="o">:</span> <span class="k-Declation">register</span><span class="p">(</span><span class="n">u0</span><span class="p">);</span>
</span><span class='line'><span class="k">RWTexture2D</span><span class="o">&lt;</span><span class="kt">uint</span><span class="o">&gt;</span> <span class="n">overdrawUAV</span> <span class="o">:</span> <span class="k-Declation">register</span><span class="p">(</span><span class="n">u1</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'><span class="p">[</span><span class="k-Attributes">earlydepthstencil</span><span class="p">]</span>
</span><span class='line'><span class="kt">void</span> <span class="n">OverdrawPS</span><span class="p">(</span><span class="kt">float4</span> <span class="n">vpos</span> <span class="o">:</span> <span class="k-Semantics">SV_Position</span><span class="p">,</span> <span class="kt">uint</span> <span class="n">id</span> <span class="o">:</span> <span class="k-Semantics">SV_PrimitiveID</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">uint2</span> <span class="n">quad</span> <span class="o">=</span> <span class="n">vpos</span><span class="p">.</span><span class="n">xy</span><span class="o">*</span><span class="mf">0.5</span><span class="p">;</span>
</span><span class='line'>    <span class="kt">uint</span>  <span class="n">prevID</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'>    <span class="nb">InterlockedExchange</span><span class="p">(</span><span class="n">primIDUAV</span><span class="p">[</span><span class="n">quad</span><span class="p">],</span> <span class="n">id</span><span class="p">,</span> <span class="n">prevID</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">if</span> <span class="p">(</span><span class="n">prevID</span> <span class="o">!=</span> <span class="n">id</span><span class="p">)</span>
</span><span class='line'>        <span class="nb">InterlockedAdd</span><span class="p">(</span><span class="n">overdrawUAV</span><span class="p">[</span><span class="n">quad</span><span class="p">],</span> <span class="mi">1</span><span class="p">);</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>The intent here is to use a UAV to keep track of the current triangle per screen quad. Through <code>InterlockedExchange</code>, we both update the ID and use the previous state to determine if we&#8217;re the first pixel to write this ID (<code>prevID != id</code>). If so, we increment an overdraw counter in a second UAV. This is similar in spirit to the Xbox 360 version, in that we&#8217;re selecting one of the live pixels in a shading quad to update the overdraw count. Finally, we can display the results in a fullscreen pass:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="k">Texture2D</span><span class="o">&lt;</span><span class="kt">uint</span><span class="o">&gt;</span> <span class="n">overdrawSRV</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'><span class="kt">float4</span> <span class="n">DisplayPS</span><span class="p">(</span><span class="kt">float4</span> <span class="n">vpos</span> <span class="o">:</span> <span class="k-Semantics">SV_Position</span><span class="p">)</span> <span class="o">:</span> <span class="k-Semantics">SV_Target</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">uint2</span> <span class="n">quad</span> <span class="o">=</span> <span class="n">vpos</span><span class="p">.</span><span class="n">xy</span><span class="o">*</span><span class="mf">0.5</span><span class="p">;</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">ToColour</span><span class="p">(</span><span class="n">overdrawSRV</span><span class="p">[</span><span class="n">quad</span><span class="p">]);</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>
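
<p><code>ToColour</code> isn&#8217;t shown here. One possible version, assuming a simple ramp from dark blue at 1x to green at 4x and above, with black for untouched quads, might look like this (the exact colours are guesses rather than the demo&#8217;s):</p>

<figure class='code'> <div class="highlight"><pre><code class='hlsl'>float4 ToColour(uint overdraw)
{
    // Hypothetical ramp end points
    const float3 darkBlue = float3(0.0, 0.0, 0.5);
    const float3 green    = float3(0.0, 1.0, 0.0);

    // Map 1x..4x to 0..1, clamping anything higher
    float  t      = saturate(((float)overdraw - 1.0)/3.0);
    float3 colour = lerp(darkBlue, green, t);

    // Leave untouched quads black
    if (overdraw == 0)
        colour = 0.0;

    return float4(colour, 1.0);
}
</code></pre></div></figure>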


<p>On paper, this appears to elegantly avoid the storage and complexity of the previous approach. Alas, it relies on one major, dubious assumption: that quads are shaded sequentially! In reality, GPUs process pixels in larger batches of <em>warps/wavefronts</em> and there&#8217;s no guarantee that UAV operations are ordered between quads – hence the name: <em>unordered</em>. So, during the shading of pixels in a quad for one triangle, it&#8217;s perfectly possible for another unruly triangle to stomp over the quad ID and break the whole process!</p>

<h4>The cheat&#8217;s way</h4>

<p>Fortunately, we can get around this issue with a few modifications. The basic idea here is to loop and use <code>InterlockedCompareExchange</code> to attempt to <em>lock</em> the screen quad:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="k">RWTexture2D</span><span class="o">&lt;</span><span class="kt">uint</span><span class="o">&gt;</span> <span class="n">lockUAV</span>     <span class="o">:</span> <span class="k-Declation">register</span><span class="p">(</span><span class="n">u0</span><span class="p">);</span>
</span><span class='line'><span class="k">RWTexture2D</span><span class="o">&lt;</span><span class="kt">uint</span><span class="o">&gt;</span> <span class="n">overdrawUAV</span> <span class="o">:</span> <span class="k-Declation">register</span><span class="p">(</span><span class="n">u1</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'><span class="p">[</span><span class="k-Attributes">earlydepthstencil</span><span class="p">]</span>
</span><span class='line'><span class="kt">void</span> <span class="n">OverdrawPS</span><span class="p">(</span><span class="kt">float4</span> <span class="n">vpos</span> <span class="o">:</span> <span class="k-Semantics">SV_Position</span><span class="p">,</span> <span class="kt">uint</span> <span class="n">id</span> <span class="o">:</span> <span class="k-Semantics">SV_PrimitiveID</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">uint2</span> <span class="n">quad</span> <span class="o">=</span> <span class="n">vpos</span><span class="p">.</span><span class="n">xy</span><span class="o">*</span><span class="mf">0.5</span><span class="p">;</span>
</span><span class='line'>    <span class="kt">uint</span>  <span class="n">prevID</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'>    <span class="kt">uint</span> <span class="n">unlockedID</span> <span class="o">=</span> <span class="mh">0xffffffff</span><span class="p">;</span>
</span><span class='line'>    <span class="kt">bool</span> <span class="n">processed</span>  <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
</span><span class='line'>    <span class="kt">int</span>  <span class="n">lockCount</span>  <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">16</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
</span><span class='line'>    <span class="p">{</span>
</span><span class='line'>        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">processed</span><span class="p">)</span>
</span><span class='line'>            <span class="nb">InterlockedCompareExchange</span><span class="p">(</span><span class="n">lockUAV</span><span class="p">[</span><span class="n">quad</span><span class="p">],</span> <span class="n">unlockedID</span><span class="p">,</span> <span class="n">id</span><span class="p">,</span> <span class="n">prevID</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>        <span class="p">[</span><span class="k-Attributes">branch</span><span class="p">]</span>
</span><span class='line'>        <span class="k">if</span> <span class="p">(</span><span class="n">prevID</span> <span class="o">==</span> <span class="n">unlockedID</span><span class="p">)</span>
</span><span class='line'>        <span class="p">{</span>
</span><span class='line'>            <span class="c1">// Wait a bit, then unlock for other quads</span>
</span><span class='line'>            <span class="k">if</span> <span class="p">(</span><span class="o">++</span><span class="n">lockCount</span> <span class="o">==</span> <span class="mi">2</span><span class="p">)</span>
</span><span class='line'>                <span class="nb">InterlockedExchange</span><span class="p">(</span><span class="n">lockUAV</span><span class="p">[</span><span class="n">quad</span><span class="p">],</span> <span class="n">unlockedID</span><span class="p">,</span> <span class="n">prevID</span><span class="p">);</span>
</span><span class='line'>            <span class="n">processed</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
</span><span class='line'>        <span class="p">}</span>
</span><span class='line'>
</span><span class='line'>        <span class="k">if</span> <span class="p">(</span><span class="n">prevID</span> <span class="o">==</span> <span class="n">id</span><span class="p">)</span>
</span><span class='line'>            <span class="n">processed</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
</span><span class='line'>    <span class="p">}</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">if</span> <span class="p">(</span><span class="n">lockCount</span><span class="p">)</span>
</span><span class='line'>        <span class="nb">InterlockedAdd</span><span class="p">(</span><span class="n">overdrawUAV</span><span class="p">[</span><span class="n">quad</span><span class="p">],</span> <span class="mi">1</span><span class="p">);</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>This leads to three outcomes for unprocessed pixels:</p>

<ul>
<li>If <code>prevID == unlockedID</code>, then the pixel holds the lock for its shading quad</li>
<li>If <code>prevID == id</code>, another pixel in the shading quad holds the lock</li>
<li>Otherwise, no pixel in the shading quad holds the lock</li>
</ul>


<p>In the first case we mark the pixel as processed and increment a lock counter. After an additional iteration, we release the lock. This ensures that pixels with the same ID see the state of the lock (second case), so that they can be filtered out. Finally, pixels that held the lock update the quad overdraw.</p>

<p>Ideally we&#8217;d loop until the pixel has been tagged as processed, but I haven&#8217;t had success with current NVIDIA drivers and UAV-dependent flow control, i.e.:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="p">[</span><span class="k-Attributes">allow_uav_condition</span><span class="p">]</span>
</span><span class='line'><span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="c1">// ...</span>
</span><span class='line'>
</span><span class='line'>        <span class="k">if</span> <span class="p">(</span><span class="o">++</span><span class="n">lockCount</span> <span class="o">==</span> <span class="mi">2</span><span class="p">)</span>
</span><span class='line'>        <span class="p">{</span>
</span><span class='line'>            <span class="nb">InterlockedExchange</span><span class="p">(</span><span class="n">lockUAV</span><span class="p">[</span><span class="n">quad</span><span class="p">],</span> <span class="n">unlockedID</span><span class="p">,</span> <span class="n">prevID</span><span class="p">);</span>
</span><span class='line'>            <span class="k">break</span><span class="p">;</span>
</span><span class='line'>        <span class="p">}</span>
</span><span class='line'>
</span><span class='line'>    <span class="c1">// ...</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">if</span> <span class="p">(</span><span class="n">prevID</span> <span class="o">==</span> <span class="n">id</span><span class="p">)</span>
</span><span class='line'>        <span class="k">break</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>As a workaround, I&#8217;ve simply set the iteration count to a number that works well in practice across NVIDIA and AMD GPUs (those that I&#8217;ve had a chance to test, anyway).</p>
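
<p>One detail the shader relies on is that every texel of <code>lockUAV</code> starts out as <code>unlockedID</code>, and that the overdraw counters start at zero. On the API side this can be done with <code>ClearUnorderedAccessViewUint</code>; alternatively, a small compute pass along the following lines should work. The entry point name and thread group size are arbitrary choices for this sketch.</p>

<figure class='code'> <div class="highlight"><pre><code class='hlsl'>RWTexture2D&lt;uint&gt; lockUAV     : register(u0);
RWTexture2D&lt;uint&gt; overdrawUAV : register(u1);

// Dispatch enough 8x8 groups to cover the quad-resolution textures,
// before the overdraw pass
[numthreads(8, 8, 1)]
void ClearCS(uint3 id : SV_DispatchThreadID)
{
    // Out-of-bounds UAV writes are discarded in D3D11,
    // so no explicit range check is needed
    lockUAV[id.xy]     = 0xffffffff; // unlockedID
    overdrawUAV[id.xy] = 0;
}
</code></pre></div></figure>

<p>Because the loop has a fixed iteration count, a pixel that only acquires the lock on the very last iteration never reaches the unlock, so resetting <code>lockUAV</code> every frame also guards against a stale lock carrying over.</p>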

<h2>Four, Three, Two, One</h2>

<p>Now that we have a working system in place, it&#8217;s easy to gather other stats. For instance, although we can&#8217;t determine directly if a pixel is alive, we can count the number of live pixels in each shading quad, since <code>Interlocked*</code> operations are masked out for dead pixels. With this, we can tally up the number of quads with 1 to 4 live pixels in yet another UAV:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
<span class='line-number'>42</span>
<span class='line-number'>43</span>
<span class='line-number'>44</span>
<span class='line-number'>45</span>
<span class='line-number'>46</span>
<span class='line-number'>47</span>
<span class='line-number'>48</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="k">RWTexture2D</span><span class="o">&lt;</span><span class="kt">uint</span><span class="o">&gt;</span> <span class="n">lockUAV</span>      <span class="o">:</span> <span class="k-Declation">register</span><span class="p">(</span><span class="n">u0</span><span class="p">);</span>
</span><span class='line'><span class="k">RWTexture2D</span><span class="o">&lt;</span><span class="kt">uint</span><span class="o">&gt;</span> <span class="n">overdrawUAV</span>  <span class="o">:</span> <span class="k-Declation">register</span><span class="p">(</span><span class="n">u1</span><span class="p">);</span>
</span><span class='line'><span class="k">RWTexture2D</span><span class="o">&lt;</span><span class="kt">uint</span><span class="o">&gt;</span> <span class="n">liveCountUAV</span> <span class="o">:</span> <span class="k-Declation">register</span><span class="p">(</span><span class="n">u2</span><span class="p">);</span>
</span><span class='line'><span class="k">RWTexture1D</span><span class="o">&lt;</span><span class="kt">uint</span><span class="o">&gt;</span> <span class="n">liveStatsUAV</span> <span class="o">:</span> <span class="k-Declation">register</span><span class="p">(</span><span class="n">u3</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'><span class="p">[</span><span class="k-Attributes">earlydepthstencil</span><span class="p">]</span>
</span><span class='line'><span class="kt">void</span> <span class="n">OverdrawPS</span><span class="p">(</span><span class="kt">float4</span> <span class="n">vpos</span> <span class="o">:</span> <span class="k-Semantics">SV_Position</span><span class="p">,</span> <span class="kt">uint</span> <span class="n">id</span> <span class="o">:</span> <span class="k-Semantics">SV_PrimitiveID</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">uint2</span> <span class="n">quad</span> <span class="o">=</span> <span class="n">vpos</span><span class="p">.</span><span class="n">xy</span><span class="o">*</span><span class="mf">0.5</span><span class="p">;</span>
</span><span class='line'>    <span class="kt">uint</span>  <span class="n">prevID</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'>    <span class="kt">uint</span> <span class="n">unlockedID</span> <span class="o">=</span> <span class="mh">0xffffffff</span><span class="p">;</span>
</span><span class='line'>    <span class="kt">bool</span> <span class="n">processed</span>  <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
</span><span class='line'>    <span class="kt">int</span>  <span class="n">lockCount</span>  <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</span><span class='line'>    <span class="kt">int</span>  <span class="n">pixelCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">64</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
</span><span class='line'>    <span class="p">{</span>
</span><span class='line'>        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">processed</span><span class="p">)</span>
</span><span class='line'>            <span class="nb">InterlockedCompareExchange</span><span class="p">(</span><span class="n">lockUAV</span><span class="p">[</span><span class="n">quad</span><span class="p">],</span> <span class="n">unlockedID</span><span class="p">,</span> <span class="n">id</span><span class="p">,</span> <span class="n">prevID</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>        <span class="p">[</span><span class="k-Attributes">branch</span><span class="p">]</span>
</span><span class='line'>        <span class="k">if</span> <span class="p">(</span><span class="n">prevID</span> <span class="o">==</span> <span class="n">unlockedID</span><span class="p">)</span>
</span><span class='line'>        <span class="p">{</span>
</span><span class='line'>            <span class="k">if</span> <span class="p">(</span><span class="o">++</span><span class="n">lockCount</span> <span class="o">==</span> <span class="mi">4</span><span class="p">)</span>
</span><span class='line'>            <span class="p">{</span>
</span><span class='line'>                <span class="c1">// Retrieve live pixel count (minus 1) in quad</span>
</span><span class='line'>                <span class="nb">InterlockedAnd</span><span class="p">(</span><span class="n">liveCountUAV</span><span class="p">[</span><span class="n">quad</span><span class="p">],</span> <span class="mi">0</span><span class="p">,</span> <span class="n">pixelCount</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>                <span class="c1">// Unlock for other quads</span>
</span><span class='line'>                <span class="nb">InterlockedExchange</span><span class="p">(</span><span class="n">lockUAV</span><span class="p">[</span><span class="n">quad</span><span class="p">],</span> <span class="n">unlockedID</span><span class="p">,</span> <span class="n">prevID</span><span class="p">);</span>
</span><span class='line'>            <span class="p">}</span>
</span><span class='line'>            <span class="n">processed</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
</span><span class='line'>        <span class="p">}</span>
</span><span class='line'>
</span><span class='line'>        <span class="k">if</span> <span class="p">(</span><span class="n">prevID</span> <span class="o">==</span> <span class="n">id</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="n">processed</span><span class="p">)</span>
</span><span class='line'>        <span class="p">{</span>
</span><span class='line'>            <span class="nb">InterlockedAdd</span><span class="p">(</span><span class="n">liveCountUAV</span><span class="p">[</span><span class="n">quad</span><span class="p">],</span> <span class="mi">1</span><span class="p">);</span>
</span><span class='line'>            <span class="n">processed</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
</span><span class='line'>        <span class="p">}</span>
</span><span class='line'>    <span class="p">}</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">if</span> <span class="p">(</span><span class="n">lockCount</span><span class="p">)</span>
</span><span class='line'>    <span class="p">{</span>
</span><span class='line'>        <span class="nb">InterlockedAdd</span><span class="p">(</span><span class="n">overdrawUAV</span><span class="p">[</span><span class="n">quad</span><span class="p">],</span> <span class="mi">1</span><span class="p">);</span>
</span><span class='line'>        <span class="nb">InterlockedAdd</span><span class="p">(</span><span class="n">liveStatsUAV</span><span class="p">[</span><span class="n">pixelCount</span><span class="p">],</span> <span class="mi">1</span><span class="p">);</span>
</span><span class='line'>    <span class="p">}</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>To my surprise, incrementing a 4-wide UAV didn&#8217;t lead to a massive slowdown here. That said, one can certainly use a number of buckets for intermediate results (indexed by the lower bits of the screen position, for instance), if this proves to be a problem.</p>
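<p>For reference, the bucketed variant could look something like the sketch below. It is only an illustration: <code>bucketStatsUAV</code>, <code>BUCKET_DIM</code>, <code>AddLiveStat</code> and <code>SumLiveStats</code> are hypothetical names that aren&#8217;t part of the demo, and it assumes that the counter slot is in the range 0 to 3 and that the quad coordinate is passed in.</p>

<figure class='code'><div class="highlight"><table><tr><td class='code'><pre><code class='hlsl'>// Sketch only: all names below are hypothetical.
#define BUCKET_DIM  4                           // 4x4 screen-space pattern
#define NUM_BUCKETS (BUCKET_DIM*BUCKET_DIM)

// NUM_BUCKETS copies of the four counters, laid out bucket-major.
RWStructuredBuffer&lt;uint&gt; bucketStatsUAV;

void AddLiveStat(uint2 quadPos, uint slot)      // slot assumed to be in [0, 3]
{
    // The lower bits of the position pick the bucket, so that
    // neighbouring quads increment different addresses.
    uint2 b      = quadPos &amp; (BUCKET_DIM - 1);
    uint  bucket = b.y*BUCKET_DIM + b.x;
    InterlockedAdd(bucketStatsUAV[bucket*4 + slot], 1);
}

// Later (e.g. in the final pass), sum the buckets back together.
uint4 SumLiveStats()
{
    uint4 total = 0;
    for (uint i = 0; i &lt; NUM_BUCKETS; i++)
    {
        total += uint4(bucketStatsUAV[i*4 + 0], bucketStatsUAV[i*4 + 1],
                       bucketStatsUAV[i*4 + 2], bucketStatsUAV[i*4 + 3]);
    }
    return total;
}
</code></pre></td></tr></table></div></figure>

<p>The trade-off is a little extra work when reading the totals back, in exchange for spreading the atomic traffic over <code>NUM_BUCKETS</code> times as many addresses.</p>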

<p>With these numbers, it&#8217;s trivial to add a pie chart to the final pass:</p>

<p><img class="center" src="http://blog.selfshadow.com/images/counting-quads/figure_2.png"></p>

<p style="text-align: center;"><em>Figure 2: Quad overdraw (dark blue = 1x, to green = 4x),<br>and proportion of live pixels per quad (yellow = 4, to dark red = 1)</em></p>
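<p>As a rough idea of what this involves, here is a minimal sketch of a slice lookup for one chart. It isn&#8217;t the demo&#8217;s actual code: <code>PieSlice</code> and its parameters are hypothetical, and it assumes that the four totals have already been read back into a <code>float4</code>.</p>

<figure class='code'><div class="highlight"><table><tr><td class='code'><pre><code class='hlsl'>// Sketch only: PieSlice and its parameters are hypothetical names.
static const float TWO_PI = 6.28318530718;

// Returns the slice (0 to 3) covering pixel 'p', or -1 outside the chart.
// 'counts' holds the four totals gathered by the counting pass.
int PieSlice(float2 p, float2 centre, float radius, float4 counts)
{
    float2 d     = p - centre;
    float  total = dot(counts, float4(1, 1, 1, 1));
    if (dot(d, d) &gt; radius*radius || total &lt;= 0)
        return -1;

    // Angle around the centre, remapped from [-pi, pi] to [0, 1].
    float t = atan2(d.y, d.x)/TWO_PI + 0.5;

    // Running fractions at the end of slices 0, 1 and 2.
    float3 cdf = float3(counts.x, counts.x + counts.y,
                        counts.x + counts.y + counts.z)/total;

    // Each boundary that 't' has passed moves us on by one slice.
    return (int)dot(step(cdf, t.xxx), float3(1, 1, 1));
}
</code></pre></td></tr></table></div></figure>

<p>The returned index can then be used to pick a colour from the relevant ramp, leaving the underlying image untouched wherever the result is -1.</p>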


<h2>Demolition</h2>

<p>For your convenience, I&#8217;ve packaged things up into a <a href="http://blog.selfshadow.com/code/QuadShading.zip">simple demo</a>. Please let me know if you hit any compatibility issues, or come up with any enhancements.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[SIGGRAPH 2012 Links]]></title>
    <link href="http://blog.selfshadow.com/2012/08/11/siggraph-2012-links/"/>
    <updated>2012-08-11T02:59:00-04:00</updated>
    <id>http://blog.selfshadow.com/2012/08/11/siggraph-2012-links</id>
    <content type="html"><![CDATA[<p><img class="center" src="http://blog.selfshadow.com/images/siggraph-2012/s12_logo.png"></p>

<p>As with <a href="http://blog.selfshadow.com/2011/08/13/hpg-siggraph-2011/">last year</a>, I&#8217;m gathering links to SIGGRAPH content, to complement Ke-Sen Huang&#8217;s invaluable <a href="http://kesen.realtimerendering.com/sig2012.html">Technical Papers list</a>. Please let me know if you have anything to add to the list. <!--more--></p>

<h4>Birds of a Feather</h4>

<p><a href="http://www.khronos.org/developers/library/2012-siggraph-collada-bof">COLLADA</a> (<a href="http://www.youtube.com/watch?v=TvFqkP7zXcs">video</a>)<br/>
<a href="http://www.khronos.org/developers/library/2012-siggraph-opencl-bof">OpenCL</a> (<a href="http://www.youtube.com/watch?v=MPoS34BTuEE">video</a>)<br/>
<a href="http://www.khronos.org/developers/library/2012-siggraph-opengl-bof">OpenGL</a> (<a href="https://www.youtube.com/watch?v=bTO1D9pg4Ug">video</a>) (via <a href="https://twitter.com/jbaert/status/234648190132056064">@jbaert</a>)<br/>
<a href="http://www.khronos.org/developers/library/2012-siggraph-opengl-es-bof">OpenGL ES</a> (<a href="https://www.youtube.com/watch?v=LwVgEytP8GQ">video</a>)<br/>
<a href="http://www.openscenegraph.com/index.php/community/events">OpenSceneGraph</a> (starting to appear)<br/>
<a href="http://web.engr.oregonstate.edu/~mjb/sig12/">Teaching OpenGL in a Post-Deprecation World</a><br/>
<a href="http://www.khronos.org/webgl/wiki/Presentations#SIGGRAPH_2012_WebGL_BOF">WebGL</a> (<a href="http://www.youtube.com/watch?v=l40d2yEG-VU">video</a>)</p>

<h4>Courses</h4>

<p><a href="http://sites.google.com/site/qmcrendering/">Advanced (Quasi-) Monte Carlo Methods for Image Synthesis</a> (via <a href="https://twitter.com/sjb3d/status/234216726240301056">@sjb3d</a>)<br/>
<a href="http://advances.realtimerendering.com/s2012/index.html">Advances in Real-Time Rendering in Games</a><br/>
<a href="http://www.youtube.com/watch?v=Bsmamxfj_Jk">Applying Color Theory to Digital Media and Visualization</a> (video)<br/>
<a href="http://bps12.idav.ucdavis.edu/">Beyond Programmable Shading</a><br/>
<a href="http://cinematiccolor.com/">Cinematic Color: From Your Monitor to the Big Screen</a> (<a href="https://github.com/jeremyselan/cinematiccolor">draft</a> notes)<br/>
<a href="http://taniapouli.co.uk/research.html">Color Transfer</a><br/>
<a href="http://web.media.mit.edu/~gordonw/courses/ComputationalDisplays/">Computational Displays</a><br/>
<a href="http://web.media.mit.edu/~gordonw/courses/ComputationalPlenopticImaging/">Computational Plenoptic Imaging</a><br/>
<a href="http://www.gmrv.es/~motaduy/SIG12Course/">Data-Driven Simulation Methods in Computer Graphics: Cloth, Tissue, and Faces</a><br/>
<a href="http://www.cse.chalmers.se/~uffe/publications.htm">Efficient Real-Time Shadows</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;- <a href="http://www.dimension3.sk/2012/09/siggraph-2012-talk-shadows-in-games-practical-considerations/">Shadows in Games: Practical Considerations</a> (via <a href="https://twitter.com/mickaelgilabert/status/242696857581662208">@mickaelgilabert</a>)<br/>
<a href="http://www.femdefo.org/">FEM Simulation of 3D Deformable Solids: A Practitioner&#8217;s Guide to Theory, Discretization, and Model Reduction</a><br/>
<a href="http://web.engr.oregonstate.edu/~mjb/sig12/">Fundamentals Seminar</a><br/>
<a href="http://web.engr.oregonstate.edu/~mjb/sig12/">GPU Shaders for OpenGL 4.x</a><br/>
<a href="https://github.com/KhronosGroup/siggraph2012course">Graphics Programming on the Web</a><br/>
<a href="https://www.cs.unm.edu/~angel/">Introduction to Modern OpenGL</a><br/>
<a href="http://cgg.mff.cuni.cz/~jaroslav/papers/mlcourse2012/">Optimizing Realistic Rendering With Many-Light Methods</a><br/>
<a href="http://blog.selfshadow.com/publications/s2012-shading-course/">Practical Physically Based Shading in Film and Game Production</a><br/>
<a href="http://algarcia.org/">Principles of Animation Physics</a><br/>
<a href="http://users-cs.au.dk/toshiya/starpm2012/">State of the Art in Photon Density Estimation</a> (via <a href="http://blog.selfshadow.com/2012/08/11/siggraph-2012-links/#comment-618809199">Jens Fursund</a>)<br/>
<a href="http://www.ge.imati.cnr.it/http%3A//www.ge.imati.cnr.it/training">The Hitchhiker&#8217;s Guide to the Galaxy of Mathematical Tools for Shape Analysis</a><br/>
<a href="http://mrelusive.com/publications/presentations/2012_siggraph/">Virtual Texturing in Software and Hardware</a></p>

<h4>Emerging Technologies</h4>

<p><a href="http://augmented-mirror.onthewings.net/air-drum/">Augmented Reflection of Reality</a><br/>
<a href="http://www.disneyresearch.com/research/projects/hci_botanicus_drp.htm">Botanicus Interacticus: Interactive Plants Technology</a><br/>
<a href="http://www.youtube.com/watch?v=m5GSSbSabGI">Chilly Chair: Facilitating an Emotional Feeling With Artificial Piloerection</a><br/>
<a href="http://www.vogue.is.uec.ac.jp/project/projects-1/claytricsurface">ClaytricSurface: An Interactive Surface With Dynamic Softness Control Capability</a><br/>
<a href="http://www.youtube.com/watch?v=eftU6iebOTo">Gosen: A Handwritten Notation Interface for Musical Performance and Learning Music</a><br/>
<a href="http://www0.cs.ucl.ac.uk/staff/j.kautz/publications/">Interactive Light-Field Painting</a><br/>
<a href="http://www.uetamasamichi.com/creative/jukecylinder">JUKE Cylinder: A Device to Metamorphose Hands to a Musical Instrument</a><br/>
<a href="http://moodmeter.media.mit.edu/">Mood Meter: Large-Scale and Long-Term Smile Monitoring System</a><br/>
<a href="http://lab.rekimoto.org/projects/possessedhand/">PossessedHand</a><br/>
<a href="http://www.disneyresearch.com/research/projects/hci_revel_drp.htm">REVEL: A Tactile Feedback Technology for Augmented Reality</a><br/>
<a href="http://www.designinterface.jp/en/projects/ShaderPrinter/">ShaderPrinter</a><br/>
<a href="http://www.vogue.is.uec.ac.jp/project/projects-1/splash">SplashDisplay: Volumetric Projecting Using Projectile Beads</a><br/>
<a href="http://www.youtube.com/watch?v=YqqpiBAfn0Q">Stuffed Toys Alive! Cuddly Robots From a Fantasy World</a><br/>
<a href="http://www.techtile.org/en/techtiletoolkit/">TECHTILE Toolkit</a> (<a href="http://vimeo.com/39972363">video</a>)<br/>
<a href="http://www.youtube.com/watch?v=KoC1iTOmYTg">TELESAR V: TELExistence Surrogate Anthropomorphic Robot</a><br/>
<a href="http://web.media.mit.edu/~dlanman/research/compressivedisplays/">Tensor Displays: Compressive Light-Field Synthesis Using Multilayer Displays With Directional Backlighting</a></p>

<h4>Exhibitor Tech Talks</h4>

<p><a href="http://software.intel.com/en-us/articles/SIGGraph-2012-event/">Intel</a><br/>
<a href="http://www.nvidia.com/object/siggraph-2012.html">NVIDIA</a> (streamable videos)</p>

<h4>Mobile Talks</h4>

<p><a href="http://www.geomerics.com/media/presentations.html">Advancing Dynamic Lighting on Mobile</a> (via <a href="https://twitter.com/palgorithm/status/235660094291976192">@palgorithm</a>, <a href="http://blog.selfshadow.com/2012/08/11/siggraph-2012-links/#comment-619905685">Eric Haines</a>)<br/>
<a href="http://vidyasetlur.com/research.html">Auto(mobile): Mobile Visual Interfaces for the Road</a><br/>
Mobile Augmented Reality in Advertising: the TineMelk AR App - A Case Study (<a href="http://www.pfx.no/media/TineMelkAR/TineMelkAR_SIGGRAPH2012_1920x1080.pdf">slides</a>, <a href="http://www.pfx.no/media/TineMelkAR/TineMelk_AR_demo_subs.mov">video</a>) (via <a href="http://blog.selfshadow.com/2012/08/11/siggraph-2012-links/#comment-628419811">Kim Baumann Larsen</a>)<br/>
<a href="http://blogs.unity3d.com/2012/08/13/unity-talk-at-siggraph-mobile-2012/">Unity: iOS and Android - Cross-Platform Challenges and Solutions</a> (via <a href="https://twitter.com/__ReJ__/status/235269902716502016">@__Rej__</a>)</p>

<h4>Open Source Releases</h4>

<p><a href="http://www.disneyanimation.com/technology/brdf.html">BRDF Explorer</a><br/>
<a href="http://graphics.pixar.com/opensubdiv">OpenSubdiv</a><br/>
<a href="http://www.openvdb.org/">OpenVDB</a></p>

<h4>Posters</h4>

<p><a href="http://web.media.mit.edu/~mhirsch/8D/">8D Display</a><br/>
<a href="http://www.motionsynthesis.org/hi-its-me.html">A Biologically Inspired Latent Space for Gait Parameterization</a><br/>
A Collision-Detection Method for High-Resolution Objects Using Tessellation Unit on GPU (<a href="http://www.youtube.com/watch?v=cdrkHuIbka8">video</a>)<br/>
<a href="http://96ochiai.ws/colloidaldisplay">A Colloidal Display: Membrane Screen That Combines Transparency, BRDF, and 3D Volume</a> (videos)<br/>
<a href="http://w3.impa.br/~andmax/publications.html">Base Mesh Construction Using Global Parametrization</a><br/>
<a href="http://www3.nccu.edu.tw/~li/">CurveThis: A Tool to Create Controllable Massive Crawling</a> (<a href="http://vimeo.com/45883300">video</a>)<br/>
<a href="http://leejinha.com/See-Through-3D-Desktop">Direct Spatial Interactions With See-Through 3D Desktop</a> (project page, video)<br/>
<a href="http://research.lighttransport.com/distance-aware-ray-tracing-for-curves/index.html">Distance Aware Ray Tracing for Curves</a><br/>
<a href="http://dali.ces.kyutech.ac.jp/paper/index.html">Easy-To-Use Authoring System for Noh (Japanese Traditional) Dance Animation</a><br/>
<a href="http://gl.ict.usc.edu/Research/Diffusion/">Estimating Diffusion Parameters From Polarized Spherical Gradient Illumination</a> (via <a href="http://blog.selfshadow.com/2012/08/11/siggraph-2012-links/#comment-658388094">Naty Hoffman</a>)<br/>
<a href="http://gl.ict.usc.edu/Research/StokesNormals/">Estimating Specular Normals From Spherical Stokes Reflectance Fields</a> (via <a href="http://blog.selfshadow.com/2012/08/11/siggraph-2012-links/#comment-658388094">Naty Hoffman</a>)<br/>
<a href="http://cgcad.thss.tsinghua.edu.cn/feiyun/">Fast Multi-Image-Based Photon Tracing With Grid-Based Gathering</a><br/>
<a href="http://cinescopophilia.com/unique-focus-tracking-for-cinematography/">Focus Tracking for Cinematography</a> (video)<br/>
<a href="http://graphics.im.ntu.edu.tw/~robin/plist.html#pp">GaussSketch: Add-On Magnetic Sensing for Natural Sketching on Smartphones</a><br/>
<a href="http://www.cg.tuwien.ac.at/research/publications/2012/Auzinger_2012_GeigerCam/">GeigerCam: Measuring Radioactivity With Webcams</a><br/>
<a href="http://ligiaduro.com/2012/06/22/graphicnarratives/">Graphic Narratives: Generative Book Covers</a> (background)<br/>
<a href="http://graphics.tu-bs.de/publications/NeumannMarkersSiggraph2012/">High-Detail Marker-Based 3D Reconstruction by Enforcing Multiview Constraints</a><br/>
<a href="http://drl.moo.jp/Publication_en.html">How to Draw Illustrative Figures?</a><br/>
<a href="http://www.calit2.net/~jschulze/publications/">Image-Based Smartphone Interaction With Large High Resolution Displays</a><br/>
<a href="http://www.gdv.informatik.uni-frankfurt.de/index.php?m=3&amp;sm=1&amp;c=%2Fforschung%2Fnibbla%2Findex.php">Interactive Generation of (Paleontological) Scientific Illustrations From 3D Models</a> (via <a href="https://twitter.com/numb3r23/status/236064697475018753">@numb3r23</a>)<br/>
<a href="http://www.youtube.com/watch?v=uKb9Yh9IANc">Lifelike Interactive Characters With Behavior Trees for Social Territorial Intelligence</a><br/>
<a href="http://www.jku.at/cg/content/e60566/e155404">Light-Field Supported Fast Volume Rendering</a><br/>
Magic Pot: Interactive Metamorphosis of the Perceived Shape (<a href="http://www.youtube.com/watch?v=5uEVNtgOcm4">video</a>)<br/>
Mimicat: Face Input Interface Supporting Animatronics Costume Performer’s Facial Expression (<a href="http://www.youtube.com/watch?v=y3mn0OGBijQ">video</a>)<br/>
Non-Rigid Shape Correspondence and Description Using Geodesic Field Estimate Distribution (<a href="http://www.youtube.com/watch?v=WsUNo7EIHsU">video</a>)<br/>
<a href="http://www.jku.at/cg/content/e152197">Panorama Light-Field Imaging</a><br/>
<a href="http://webdiis.unizar.es/~bmasia/pubs/project_page_PCA.html">Perceptually Optimized Content Remapping for Automultiscopic Displays</a><br/>
<a href="http://sites.google.com/site/tiffanycinglis/cv/publications">Pixelating Vector Line Art</a><br/>
<a href="http://www.karsten-schwenk.de/papers/papers_radfilter.html">Radiance Filtering for Interactive Path Tracing</a><br/>
<a href="http://cs.au.dk/~toshiya/">Randomized Coherent Sampling for Reducing Perceptual Rendering Error</a><br/>
<a href="http://webstaff.itn.liu.se/~jonun/web/Home.php">Real-Time HDR Video Reconstruction for Multi-Sensor Systems</a><br/>
<a href="http://amateras.wsd.kutc.kansai-u.ac.jp/shadowpp/">Shadow++: A System for Generating Artificial Shadows Based on Object Movement</a><br/>
<a href="http://www.cs.unibo.it/~marfia/biblio.html">Technoculture of Handcraft: Fine Gesture Recognition for Haute Couture Skills Preservation and Transfer in Italy</a><br/>
<a href="http://www.jku.at/cg/content/e152197">Towards A Transparent, Flexible, Scalable, and Disposable Image Sensor</a><br/>
<a href="http://www.j3l7h.de/publications.html">Typeface Styling with Ramp Responses</a><br/>
<a href="http://www.youtube.com/watch?v=0dSXVNZpYLE">Video Retrieval Based on User-Specified Deformation</a></p>

<h4><a href="http://jasonrmsmith.wordpress.com/2012/08/18/siggraph-real-time-live-2012/">Real-Time Live!</a></h4>

<h4>Studio Talks</h4>

<p><a href="http://diylilcnc.org/">DIYLILCNC v2.0</a> (project page)<br/>
<a href="http://www.dgp.toronto.edu/~rms/#Publications">Interactive Modeling with Mesh Surfaces</a> (via <a href="http://blog.selfshadow.com/2012/08/11/siggraph-2012-links/#comment-619348610">Ryan Schmidt</a>)<br/>
<a href="http://www.rawshaping.com/?p=709">Loosely Fitted Design Synthesizer [LFDS]</a> (project page, media)<br/>
<a href="http://www.rhythmsynthesis.com/">RhythmSynthesis</a> (website, thesis)<br/>
<a href="http://flowywork.wordpress.com/2012/08/14/did-you-miss-our-sketchgraph-talk-at-siggraph/">SketchGraph: Gestural Data Input for Mobile Tablet Devices</a> (<a href="http://www.youtube.com/watch?v=tN7nMKiDDnI">video</a>)<br/>
<a href="http://www.designinterface.jp/en/projects/vignette/">Vignette: A Style-Preserving Sketching Tool for Pen-and-Ink Illustration</a> (project page, videos)</p>

<h4>Studio Workshops</h4>

<p><a href="http://www.crytek.com/cryengine/presentations">MAXScript for Artists</a><br/>
<a href="http://www.crytek.com/cryengine/presentations">VFX for Games: Particle Effects</a><br/>
<a href="http://www.crytek.com/cryengine/presentations">VFX for Games: Pre-Baked Destruction</a></p>

<h4><a href="http://www.renderman.org/RMR/Examples/srt2012/">Stupid RenderMan/RAT Tricks</a></h4>

<h4>Talks</h4>

<p><a href="http://3drepo.org/publications/">3D Diff: An Interactive Approach to Mesh Differencing and Conflict Resolution</a><br/>
<a href="http://gl.ict.usc.edu/Research/SSLP/">A Single-Shot Light Probe</a><br/>
<a href="http://schneiderfx.blogspot.ca/p/publications.html">A World of Voxels: The Volumetric Effects of “Ice Age: Continental Drift”</a><br/>
<a href="http://www.dreamworksanimation.com/insidedwa/tech/papers">Amorphous: An OpenGL Sparse Volume Renderer</a> <img src="http://blog.selfshadow.com/images/new.png"><br/>
<a href="http://gamma.cs.unc.edu/AUDIO_MATERIAL/">AudioCloning: Extracting Material Fingerprints from Example Audio Recording</a><br/>
<a href="http://research.satkin.com/#publication">Building Interior Multi-Panorama Experiences at Scale</a><br/>
<a href="http://perso.telecom-paristech.fr/~boubek/papers/CageR-talk/">CageR: From 3D Performance Capture to Cage-Based Representation</a><br/>
<a href="http://ken.museth.org/Publications.html">Cloud Modeling And Rendering for “Puss In Boots”</a><br/>
<a href="http://www.rle.mit.edu/stir/codac/">CoDAC: Compressive Depth Acquisition Using a Single Time-Resolved Sensor</a><br/>
<a href="http://www.humus.name/index.php?page=Articles">Creating Vast Game Worlds - Experiences From Avalanche Studios</a> (via <a href="https://twitter.com/_Humus_/status/237513268632092672">@_Humus_</a>)<br/>
<a href="http://www.disneyanimation.com/technology/publications#talks">dRig: An Artist-Friendly, Object-Oriented Approach to Rig Building</a> (via <a href="http://blog.selfshadow.com/2012/08/11/siggraph-2012-links/#comment-702127745">Naty</a>)<br/>
<a href="http://ken.museth.org/Publications.html">Efficient and Seamless Volumetric Fracturing</a><br/>
<a href="http://www-scf.usc.edu/~yufengzh/">Estimating Diffusion Parameters From Polarized Spherical Gradient Illumination</a><br/>
<a href="http://www.andrewwillmott.com/talks/fast-generation-of-directional-occlusion-volumes">Fast Generation of Directional Occlusion Volumes</a><br/>
<a href="http://www.dreamworksanimation.com/insidedwa/tech/papers">Hero-Quality Crowds in &#8220;Madagascar 3: Europe&#8217;s Most Wanted&#8221;</a> <img src="http://blog.selfshadow.com/images/new.png"><br/>
<a href="http://www.dreamworksanimation.com/insidedwa/tech/papers">Importance Sampling for Hair Scattering</a> <img src="http://blog.selfshadow.com/images/new.png"><br/>
<a href="http://www.the11ers.com/glaze/">Intelligent Brush Strokes</a><br/>
<a href="http://research.microsoft.com/en-us/projects/animateworld/">KinÊtre: Animating the World With the Human Body</a> (project page, videos)<br/>
<a href="http://www.dreamworksanimation.com/insidedwa/tech/papers">LibEE: A Multithreaded Dependency Graph for Character Animation</a> <img src="http://blog.selfshadow.com/images/new.png"><br/>
<a href="http://seblagarde.wordpress.com/2012/08/11/siggraph-2012-talk/">Local Image-Based Lighting With Parallax-Corrected Cubemaps</a> (via <a href="https://twitter.com/SebLagarde/status/234329267608100864">@SebLagarde</a>)<br/>
<a href="http://gl.ict.usc.edu/Research/Microgeometry/">Measurement-Based Synthesis of Facial Microgeometry</a><br/>
<a href="http://www.dreamworksanimation.com/insidedwa/tech/papers">Magic Beanstalk Ride in &#8220;Puss In Boots&#8221;</a> <img src="http://blog.selfshadow.com/images/new.png"><br/>
<a href="http://graphics.pixar.com/library/RadiosityCaching/">Multiresolution Radiosity Caching for Global Illumination in Movies</a><br/>
<a href="http://www.jku.at/cg/content/e152197/">Panorama Light-Field Imaging</a><br/>
<a href="http://www.dreamworksanimation.com/insidedwa/tech/papers">Point-Based Global Illumination Directional Importance Mapping</a> <img src="http://blog.selfshadow.com/images/new.png"><br/>
<a href="http://www.iliyan.com/publications/ProgressiveLightcuts">Progressive Lightcuts for GPU</a><br/>
<a href="http://webdiis.unizar.es/~bmasia/pubs/project_page_RR.html">Relativistic Ultrafast Rendering Using Time-Resolved Imaging</a><br/>
<a href="http://www-sop.inria.fr/reves/Basilic/2012/LBD12/">Rich Intrinsic Image Decomposition of Outdoor Scenes From Multiple Views</a> (TVCG paper)<br/>
<a href="http://www.popekim.com/2012/10/siggraph-2012-screen-space-decals-in.html">Screen Space Decals in Warhammer 40,000: Space Marine</a> (via <a href="https://plus.google.com/118115999548214977791/posts/P7rP2uHg8uE">Tuan Kuranes</a>, <a href="https://twitter.com/BlindRenderer/status/256752490366791680">@BlindRenderer</a>)<br/>
<a href="http://web.yonsei.ac.kr/wjlee/papers.htm">SGRT: A Scalable Mobile GPU Architecture Based on Ray Tracing</a><br/>
<a href="http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&amp;id=tiled_clustered_forward_talk">Tiled and Clustered Forward Shading</a><br/>
<a href="http://gautron.pascal.free.fr/publications.htm">Volume-Aware Extinction Mapping</a><br/>
<a href="http://www.dreamworksanimation.com/insidedwa/tech/papers">Vortex of Awesomeness</a> <img src="http://blog.selfshadow.com/images/new.png"></p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Blending in Detail]]></title>
    <link href="http://blog.selfshadow.com/2012/07/10/blending-in-detail/"/>
    <updated>2012-07-10T02:40:00-04:00</updated>
    <id>http://blog.selfshadow.com/2012/07/10/blending-in-detail</id>
    <content type="html"><![CDATA[<table style="margin-left:auto;margin-right:auto;">
<tr>
<td><img src="http://blog.selfshadow.com/images/blending-in-detail/base_160.png"></td>
<td><strong>&nbsp;&nbsp;+&nbsp;&nbsp;</strong></td>
<td><img src="http://blog.selfshadow.com/images/blending-in-detail/detail_160.png"></td>
<td><strong>&nbsp;&nbsp;=&nbsp;&nbsp;</strong></td>
<td><img src="http://blog.selfshadow.com/images/blending-in-detail/dashed_box.png"></td>
</tr>
</table>


<p><br/>
I&#8217;ve added a new article, <a href="http://blog.selfshadow.com/publications/blending-in-detail"><em>Blending in Detail</em></a>, written together with <a href="http://colinbarrebrisebois.com/">Colin Barré-Brisebois</a>, on the topic of blending normal maps. We go through various techniques that are out there, as well as a neat alternative (&#8220;Reoriented Normal Mapping&#8221;) from Colin that I helped to optimise.</p>

<p>This is by no means a <em>complete</em> analysis – particularly as we focus on detail mapping – so we might return to the subject at a later date and tie up some loose ends. In the meantime, I hope you find the article useful. Please let us know in the comments!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Travelling Without Moving]]></title>
    <link href="http://blog.selfshadow.com/2012/04/11/travelling-without-moving/"/>
    <updated>2012-04-11T23:44:00-04:00</updated>
    <id>http://blog.selfshadow.com/2012/04/11/travelling-without-moving</id>
    <content type="html"><![CDATA[<p>Lately I’d been getting increasingly frustrated with the limitations of WordPress(.com), so I longed for a change. With the Easter weekend, I finally had a little extra time and energy to make the switch to <a href="http://octopress.org">Octopress</a>, plus a dedicated web host. Hopefully that’ll encourage me to start posting again, or at least remove one major grumble. I’m also looking forward to such liberties as the ability to embed WebGL, though I can’t entirely promise that I’ll wield such power responsibly.</p>

<p><del>Existing post URLs remain the same, but if you’re one of the illustrious few who subscribe to the blog via RSS, I’m guessing that you’ll need to change over to the new feed.</del> <strong>Update:</strong> I&#8217;m redirecting the old feed URL now, so everything should be back to normal! Speaking of RSS, as I’m now using <a href="http://www.mathjax.org/">MathJax</a> for $\LaTeX$, it appears that I’ll need to implement a fallback there, in addition to tracking down a rendering issue with Chrome. Please let me know if you spot any other oddities.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Righting Wrap (Part 2)]]></title>
    <link href="http://blog.selfshadow.com/2012/01/07/righting-wrap-part-2/"/>
    <updated>2012-01-07T21:07:12-05:00</updated>
    <id>http://blog.selfshadow.com/2012/01/07/righting-wrap-part-2</id>
    <content type="html"><![CDATA[<h2>Wrapping Paper</h2>

<p>I first tinkered with SH wrap shading (as described in <a href="http://blog.selfshadow.com/2011/12/31/righting-wrap-part-1">part 1</a>) for <em>Splinter Cell: Conviction</em>, since we were using a couple of models [1][2] for some character-specific materials. Unfortunately, due to the way that indirect character lighting was performed, it would have required additional memory that we couldn&#8217;t really justify at that point in development. Consequently, this work was left on the cutting room floor and I only got as far as testing out Green&#8217;s model [1].</p>

<p>Recently, however, I spotted that <em>Irradiance Rigs</em> [3] covers similar ground. At the very end of the short paper, they briefly present a generalisation of Valve&#8217;s <em>Half Lambert</em> model [2] and the SH convolution terms for the first three bands:<!--more--></p>

<p>$$f(\theta, a) = \left(\frac{\mathrm{max}\left[\mathrm{cos}\,\theta + a,\,0\right]}{1+a}\right)^{1+a}, &#92;&#92; \quad \mathbf{f} = \left[\frac{2(1+a)}{2 + a},\frac{4(1+a)}{(2+a)(3+a)},\frac{2(1+a)(3-2a+a^2)}{(2+a)(3+a)(4+a)} \right]$$</p>

<p>This tidily combines the tunability of [1] with the tighter falloff of [2], albeit at the cost of a few extra instructions in the case of direct lighting. It&#8217;s not energy-conserving though, so for kicks I went through the maths – see appendix – and made the necessary adjustments:</p>

<p>$$\hat{f}(\theta, a) = \frac{2+a}{2(1+a)}\left(\frac{\mathrm{max}\left[\mathrm{cos}\,\theta + a,\,0\right]}{1+a}\right)^{1+a}, &#92;&#92; \quad \mathbf{\hat{f}} = \left[1,\frac{2}{3+a},\frac{3-2a+a^2}{(3+a)(4+a)} \right]$$</p>

<p>I would suggest this as a good workout if your calculus skills are a little on the rusty side; think of it as a much-needed trip to the maths gym: sure it&#8217;s going to hurt at first, but you&#8217;ll feel better afterwards!</p>
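<p>As a warm-up, here is a sketch of how the leading factor falls out, assuming $0 \le a \le 1$ so that the clamp kicks in at $\mathrm{cos}\,\theta = -a$. Substituting $\mu = \mathrm{cos}\,\theta$ and integrating over the sphere, relative to the $\pi$ that standard clamped Lambert integrates to:</p>

<p>$$\frac{1}{\pi}\int_{\Omega} f(\theta, a)\,\mathrm{d}\omega = 2\int_{-a}^{1}\left(\frac{\mu + a}{1+a}\right)^{1+a}\mathrm{d}\mu = \frac{2(1+a)}{2+a}$$</p>

<p>This is the first entry of $\mathbf{f}$, and its reciprocal is the factor in front of $\hat{f}$, which is why the first entry of $\mathbf{\hat{f}}$ is exactly 1.</p>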

<p>The same authors have since written a more in-depth paper, <em>Wrap Shading</em> [4], which Derek Nowrouzezahrai has kindly made available <a href="http://www.iro.umontreal.ca/~derek/publication8.html">here</a>. I recommend checking it out, since there&#8217;s some nice analysis and plenty of background information. One notable insight is that their model is perfectly represented by 3rd-order SH when $a = 1$ (i.e., Half Lambert). This becomes clear when you consider that the model is effectively <em>unclamped</em> in that case, so appropriate scaling of the constant, linear and quadratic bands ($y_{0}^{0}, y_{1}^{0}, y_{2}^{0}$) will match the function:</p>

<p>$$ f(\theta, 1) = \left(\frac{\mathrm{cos}\,\theta + 1}{2}\right)^{2} = 0.25 + 0.5\mathrm{cos}\,\theta + 0.25\mathrm{cos^2}\,\theta $$</p>
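<p>To make the scaling concrete, here is a quick check in terms of the Legendre polynomials that the zonal harmonics are proportional to, $P_0 = 1$, $P_1 = \mathrm{cos}\,\theta$ and $P_2 = \frac{1}{2}(3\,\mathrm{cos^2}\,\theta - 1)$. Replacing $\mathrm{cos^2}\,\theta$ with $\frac{1}{3}(2P_2 + 1)$ gives:</p>

<p>$$ f(\theta, 1) = \tfrac{1}{3}P_0 + \tfrac{1}{2}P_1 + \tfrac{1}{6}P_2 $$</p>

<p>Assuming the convention where each convolution term is $2\int_{-1}^{1} f\,P_l\,\mathrm{d}\mu$ (the one that reproduces $\left[1, \frac{2}{3}, \frac{1}{4}\right]$ for plain Lambert), the coefficients above get scaled by $\frac{4}{2l+1}$, resulting in $\left[\frac{4}{3}, \frac{2}{3}, \frac{2}{15}\right]$. That agrees with $\mathbf{f}$ evaluated at $a = 1$.</p>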

<p>A similar observation <a href="http://twitter.com/#!/nothings/status/119728347449278465">can be made</a> with Green&#8217;s model: it&#8217;s perfectly represented by <em>2nd</em>-order SH when $a = 1$.</p>

<h2>Shrink Wrap</h2>

<p>But wait, at the end of <a href="http://blog.selfshadow.com/2011/12/31/righting-wrap-part-1">part 1</a>, didn&#8217;t I promise that there would be a discussion of <em>optimisation</em> in this post? You&#8217;re quite right. Well, it just so happens that a snippet of reference shader code from this last paper makes for a neat little case study on improving shader performance.</p>

<h4>Reference Version</h4>

<p>This is pretty much the reference implementation for generating the normalised convolution terms of their generalised model:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float3</span> <span class="n">GeneralWrapSH</span><span class="p">(</span><span class="kt">float</span> <span class="n">fA</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="c1">// Normalization factor for our model.</span>
</span><span class='line'>    <span class="kt">float</span> <span class="n">norm</span> <span class="o">=</span> <span class="mf">0.5</span><span class="o">*</span><span class="p">(</span><span class="mi">2</span> <span class="o">+</span> <span class="n">fA</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">fA</span><span class="p">);</span>
</span><span class='line'>    <span class="kt">float4</span> <span class="n">t</span> <span class="o">=</span> <span class="kt">float4</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="p">(</span><span class="n">fA</span> <span class="o">+</span> <span class="mi">1</span><span class="p">),</span> <span class="n">fA</span> <span class="o">+</span> <span class="mi">2</span><span class="p">,</span> <span class="n">fA</span> <span class="o">+</span> <span class="mi">3</span><span class="p">,</span> <span class="n">fA</span> <span class="o">+</span> <span class="mi">4</span><span class="p">);</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">norm</span><span class="o">*</span><span class="kt">float3</span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">x</span><span class="o">/</span><span class="n">t</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="mi">2</span><span class="o">*</span><span class="n">t</span><span class="p">.</span><span class="n">x</span><span class="o">/</span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">y</span><span class="o">*</span><span class="n">t</span><span class="p">.</span><span class="n">z</span><span class="p">),</span>
</span><span class='line'>                       <span class="n">t</span><span class="p">.</span><span class="n">x</span><span class="o">*</span><span class="p">(</span><span class="n">fA</span><span class="o">*</span><span class="n">fA</span> <span class="o">-</span> <span class="n">t</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="mi">5</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">y</span><span class="o">*</span><span class="n">t</span><span class="p">.</span><span class="n">z</span><span class="o">*</span><span class="n">t</span><span class="p">.</span><span class="n">w</span><span class="p">));</span>
</span><span class='line'><span class="p">}</span>
</span><span class='line'>
</span><span class='line'><span class="kt">float4</span> <span class="n">main</span><span class="p">(</span><span class="kt">float</span> <span class="n">fA</span> <span class="o">:</span> <span class="n">TEXCOORD</span><span class="p">)</span> <span class="o">:</span> <span class="n">COLOR</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">GeneralWrapSH</span><span class="p">(</span><span class="n">fA</span><span class="p">).</span><span class="n">xyzz</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>The only thing that I&#8217;ve changed – beyond adding calling code – is to pass in the wrap parameter <code>fA</code> from the vertex shader. It was previously a user-supplied constant, which doesn&#8217;t make for a particularly credible example, since in that case all of the maths could simply be moved to the CPU and performed just the once!</p>

<p>Note that there&#8217;s been some attempt to pull out common terms, particularly for the final component, where instead of <code>fA*fA - 2*fA + 3</code> (see $\mathbf{f}$) we now have <code>fA*fA - t.x + 5</code>.</p>

<p>Without further ado, let&#8217;s see how this stacks up in terms of <code>ps_3_0</code> instructions:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>//
</span><span class='line'>// Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
</span><span class='line'>//
</span><span class='line'>//   fxc /Tps_3_0 /O3 shader.hlsl
</span><span class='line'>//
</span><span class='line'>    ps_3_0
</span><span class='line'>    def c0, 2, 1, 3, 4
</span><span class='line'>    def c1, 0.5, 4, 5, 0
</span><span class='line'>    dcl_texcoord v0.x
</span><span class='line'>    add r0, c0, v0.x
</span><span class='line'>    add r1.x, r0.y, r0.y
</span><span class='line'>    mad r1.y, v0.x, v0.x, -r1.x
</span><span class='line'>    add r1.y, r1.y, c1.z
</span><span class='line'>    mul r1.y, r1.y, r1.x
</span><span class='line'>    mul r0.z, r0.z, r0.x
</span><span class='line'>    mul r0.w, r0.w, r0.z
</span><span class='line'>    rcp r0.z, r0.z
</span><span class='line'>    rcp r0.w, r0.w
</span><span class='line'>    mul r2.zw, r0.w, r1.y
</span><span class='line'>    mul r1.yz, r0.xxyw, c1.xxyw
</span><span class='line'>    rcp r0.y, r0.y
</span><span class='line'>    rcp r0.x, r0.x
</span><span class='line'>    mul r2.xy, r0.xzzw, r1.xzzw
</span><span class='line'>    mul r0.x, r0.y, r1.y
</span><span class='line'>    mul oC0, r2, r0.x
</span><span class='line'>
</span><span class='line'>// approximately 16 instruction slots used</span></code></pre></td></tr></table></div></figure>


<p>Ouch! 16 is fairly substantial, but perhaps not all that surprising going by the HLSL. Since this is device-independent assembly, I decided to check the ALU count on Xbox 360 for comparison. In that case it&#8217;s a somewhat more reasonable 10 operations, because 5 scalar ops get dual-issued with vector ops. So, in summary, we have:</p>

<p><code>DX9: 16, X360: 10(+5) ALU ops</code></p>

<h4>Cancellation</h4>

<p>Immediately, a simple but very effective change we can make is to cancel through by the normalisation term, which leaves us with $\mathbf{\hat{f}}$ directly:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float3</span> <span class="n">GeneralWrapSH2</span><span class="p">(</span><span class="kt">float</span> <span class="n">fA</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">float2</span> <span class="n">t</span> <span class="o">=</span> <span class="kt">float2</span><span class="p">(</span><span class="n">fA</span> <span class="o">+</span> <span class="mi">3</span><span class="p">,</span> <span class="n">fA</span> <span class="o">+</span> <span class="mi">4</span><span class="p">);</span>
</span><span class='line'>    <span class="k">return</span> <span class="kt">float3</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="o">/</span><span class="n">t</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="p">(</span><span class="n">fA</span><span class="o">*</span><span class="n">fA</span> <span class="o">-</span> <span class="mi">2</span><span class="o">*</span><span class="n">fA</span> <span class="o">+</span> <span class="mi">3</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">x</span><span class="o">*</span><span class="n">t</span><span class="p">.</span><span class="n">y</span><span class="p">));</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>Don&#8217;t expect the compiler to do intelligent optimisations like this; constant folding yes, factoring sometimes, sophisticated symbolic manipulation? Good luck!</p>

<p>For instance, even seemingly &#8216;obvious&#8217; opportunities like <code>(a/b)/(b/a)</code> will go unnoticed by FXC. This isn&#8217;t down to the compiler trying to maintain special-case behaviour such as divide by zero either, because it will happily replace <code>a/a</code> with <code>1</code> in the absence of any knowledge about the value of <code>a</code>.</p>

<p>Apologies if that was already perfectly clear and all I&#8217;ve done is insult your intelligence, but I&#8217;ve seen some people blithely leave everything up to the compiler and not scrutinise what it&#8217;s generating. Of course, high-level algorithmic optimisations are hugely important as well, but so is this lower-level stuff when a shader is being executed for millions of pixels!</p>

<p>Just look at what this small amount of effort has netted us:</p>

<p><code>DX9: 10, X360: 5(+3) ALU ops</code></p>
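
<p>As a sanity check on the cancellation, here is a small Python port of both versions. It is only a sketch: the definitions of <code>t</code> and <code>norm</code> in the first function are assumptions inferred from the compiled <code>ps_3_0</code> listing above, and the Python function names are made up for this example.</p>

```python
# Python ports of the original and the cancelled HLSL, for checking only.
# ASSUMPTION: t and norm below are inferred from the ps_3_0 listing,
# not copied from the HLSL source.

def general_wrap_sh(a):
    t = (2.0*a + 2.0, a + 2.0, a + 3.0, a + 4.0)
    norm = t[1] / t[0]
    return (norm * t[0] / t[1],
            norm * 2.0 * t[0] / (t[1] * t[2]),
            norm * t[0] * (a*a - t[0] + 5.0) / (t[1] * t[2] * t[3]))

def general_wrap_sh2(a):
    # Same result with the normalisation term cancelled through.
    t = (a + 3.0, a + 4.0)
    return (1.0, 2.0 / t[0], (a*a - 2.0*a + 3.0) / (t[0] * t[1]))

def max_abs_diff(f, g, steps=100):
    """Largest per-component difference between f and g for a in [0, 1]."""
    worst = 0.0
    for i in range(steps + 1):
        a = i / steps
        for u, v in zip(f(a), g(a)):
            worst = max(worst, abs(u - v))
    return worst

print(max_abs_diff(general_wrap_sh, general_wrap_sh2))
```

<p>Under those assumptions, any difference between the two should be down to floating-point rounding alone.</p>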

<h4>Factorisation</h4>

<p>Next we can factor <code>fA*fA - 2*fA + 3</code> again – this time as <code>(fA + 1)(fA + 3) - 6*fA</code> – to reduce the numerator of the third term to a single multiply-add:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float3</span> <span class="n">GeneralWrapSH3</span><span class="p">(</span><span class="kt">float</span> <span class="n">fA</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">float3</span> <span class="n">t</span> <span class="o">=</span> <span class="n">fA</span> <span class="o">+</span> <span class="kt">float3</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
</span><span class='line'>    <span class="kt">float2</span> <span class="n">s</span> <span class="o">=</span> <span class="n">t</span><span class="p">.</span><span class="n">xy</span><span class="o">*</span><span class="n">t</span><span class="p">.</span><span class="n">yz</span><span class="p">;</span>
</span><span class='line'>    <span class="k">return</span> <span class="kt">float3</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="o">/</span><span class="n">t</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">x</span> <span class="o">-</span> <span class="mi">6</span><span class="o">*</span><span class="n">fA</span><span class="p">)</span><span class="o">/</span><span class="n">s</span><span class="p">.</span><span class="n">y</span><span class="p">);</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>I&#8217;ve also taken the opportunity to manually vectorise the addition of <code>fA</code>, plus a subsequent pair of multiplications between resulting terms. In fact, the compiler does this anyway, as it&#8217;s relatively good at vectorising code. Still, one shouldn&#8217;t assume that it will <em>always</em> get things right!</p>

<p>Whether there&#8217;s a gain or not, manual vectorisation – which is often quick to do – makes it easier to sanity check the output assembly. Just scanning through, you might expect <code>add, mul, mov, rcp, mul, mad, rcp, mul</code> and you&#8217;d be pretty much spot on.</p>

<p>So, for DX9 we&#8217;ve reduced the op count by 2, but what about Xbox 360? Here, we&#8217;ve only succeeded in shaving off one paired scalar op. However, this may turn into a real gain once the function is part of a larger shader.</p>

<p><code>DX9: 8, X360: 5(+2) ALU ops</code></p>
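
<p>The factorisation used for the third term can be confirmed by expanding the product, with $a$ standing for <code>fA</code>:</p>

<p>$$(a + 1)(a + 3) - 6a = a^2 + 4a + 3 - 6a = a^2 - 2a + 3$$</p>

<p>Since <code>s.x</code> already holds $(a + 1)(a + 3)$, the numerator <code>s.x - 6*fA</code> maps onto a single <code>mad</code>.</p>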

<h4>Rescaling</h4>

<p>This next trick involves rescaling so that the second term becomes <code>1/t.y</code>, or a single <code>rcp</code>:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float4</span> <span class="n">c</span><span class="p">;</span> <span class="c1">// {1, 0.5, 1.5, 4}</span>
</span><span class='line'>
</span><span class='line'><span class="kt">float3</span> <span class="n">GeneralWrapSH4</span><span class="p">(</span><span class="kt">float</span> <span class="n">fA</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">float3</span> <span class="n">t</span> <span class="o">=</span> <span class="n">fA</span><span class="o">*</span><span class="n">c</span><span class="p">.</span><span class="n">xyx</span> <span class="o">+</span> <span class="n">c</span><span class="p">.</span><span class="n">xzw</span><span class="p">;</span>
</span><span class='line'>    <span class="kt">float2</span> <span class="n">s</span> <span class="o">=</span> <span class="n">t</span><span class="p">.</span><span class="n">xy</span><span class="o">*</span><span class="n">t</span><span class="p">.</span><span class="n">yz</span><span class="p">;</span>
</span><span class='line'>    <span class="k">return</span> <span class="kt">float3</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="o">/</span><span class="n">t</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">x</span> <span class="o">-</span> <span class="mi">3</span><span class="o">*</span><span class="n">fA</span><span class="p">)</span><span class="o">/</span><span class="n">s</span><span class="p">.</span><span class="n">y</span><span class="p">);</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>You might wonder why I&#8217;m using an external constant here. Well, it turns out that FXC will misoptimise when it knows the values. Bad compiler! Again, there&#8217;s one less paired scalar op on Xbox 360:</p>

<p><code>DX9: 7, X360: 5(+1) ALU ops</code></p>
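
<p>The rescaled version halves <code>t.y</code>, which halves both products in <code>s</code>, so the <code>6*fA</code> term has to become <code>3*fA</code> for the ratio to stay the same. A quick Python port (a sketch with made-up function names, not part of the shader code) can be used to check that versions 2, 3 and 4 agree:</p>

```python
# Python ports of GeneralWrapSH2, 3 and 4, for checking only.
C = (1.0, 0.5, 1.5, 4.0)  # the external constant c from the HLSL

def wrap_sh2(a):
    t = (a + 3.0, a + 4.0)
    return (1.0, 2.0 / t[0], (a*a - 2.0*a + 3.0) / (t[0] * t[1]))

def wrap_sh3(a):
    t = (a + 1.0, a + 3.0, a + 4.0)
    s = (t[0] * t[1], t[1] * t[2])            # t.xy * t.yz
    return (1.0, 2.0 / t[1], (s[0] - 6.0*a) / s[1])

def wrap_sh4(a):
    # t = fA*c.xyx + c.xzw, so t.y is (fA + 3)/2
    t = (a*C[0] + C[0], a*C[1] + C[2], a*C[0] + C[3])
    s = (t[0] * t[1], t[1] * t[2])            # both products are halved
    return (1.0, 1.0 / t[1], (s[0] - 3.0*a) / s[1])

def max_abs_diff(f, g, steps=100):
    """Largest per-component difference between f and g for a in [0, 1]."""
    worst = 0.0
    for i in range(steps + 1):
        a = i / steps
        for u, v in zip(f(a), g(a)):
            worst = max(worst, abs(u - v))
    return worst

print(max_abs_diff(wrap_sh2, wrap_sh3), max_abs_diff(wrap_sh2, wrap_sh4))
```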

<h4>Expansion</h4>

<p>Rather than factoring terms, we could have expanded $\mathbf{\hat{f}}$ instead:</p>

<p>$$\mathbf{\hat{f}} = \left[1,\frac{2}{3+a},1+\frac{18}{3+a}-\frac{27}{4+a} \right ]=\left[1,0,1 \right ] + \frac{1}{3+a}\left[0,2,18 \right ] - \frac{1}{4+a}\left[0,0,27 \right ]$$</p>

<p>Or in code:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float4</span> <span class="n">c</span><span class="p">;</span> <span class="c1">// {1, 0, 2, 18}</span>
</span><span class='line'>
</span><span class='line'><span class="kt">float3</span> <span class="n">GeneralWrapSH5</span><span class="p">(</span><span class="kt">float</span> <span class="n">fA</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">float2</span> <span class="n">t</span> <span class="o">=</span> <span class="n">fA</span> <span class="o">+</span> <span class="kt">float2</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
</span><span class='line'>    <span class="kt">float3</span> <span class="n">r</span><span class="p">;</span>
</span><span class='line'>    <span class="n">r</span><span class="p">.</span><span class="n">xyz</span>  <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">xyx</span><span class="p">;</span>     <span class="c1">// (1, 0,  1)</span>
</span><span class='line'>    <span class="n">r</span><span class="p">.</span><span class="n">xyz</span> <span class="o">+=</span> <span class="n">c</span><span class="p">.</span><span class="n">yzw</span><span class="o">/</span><span class="n">t</span><span class="p">.</span><span class="n">x</span><span class="p">;</span> <span class="c1">// (0, 2, 18)</span>
</span><span class='line'>    <span class="n">r</span><span class="p">.</span><span class="n">z</span>   <span class="o">-=</span> <span class="mi">27</span><span class="o">/</span><span class="n">t</span><span class="p">.</span><span class="n">y</span><span class="p">;</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>This is a win for <code>ps_3_0</code> but not for Xbox 360, as it removes the opportunity for pairing. It&#8217;s possible that some clever variation could fix this, but it doesn&#8217;t matter because we haven&#8217;t exhausted our optimisation options&#8230;</p>

<p><code>DX9: 6, X360: 6 ALU ops</code></p>
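
<p>For reference, the expanded third term follows from polynomial division and then partial fractions:</p>

<p>$$\frac{a^2-2a+3}{(3+a)(4+a)} = 1 - \frac{9(a+1)}{(3+a)(4+a)} = 1 + \frac{18}{3+a} - \frac{27}{4+a}$$</p>

<p>The first step uses $(3+a)(4+a) = a^2 + 7a + 12$. The two residues come from evaluating $-9(a+1)$ over the remaining factor at $a = -3$ and $a = -4$, which gives $18$ and $-27$ respectively.</p>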

<h4>Fitting</h4>

<p>There are potentially significant gains to be had from numerical fitting, so it&#8217;s worth taking the time to familiarise yourself with the various techniques, maths packages and libraries out there.</p>

<p>In this instance, I&#8217;m performing a cubic fit – i.e. $ax^3 + bx^2 + cx + d$ – for the 2nd and 3rd bands. Polynomials are attractive for performance because they can be efficiently evaluated as a series of <code>mad</code> instructions when written in <a href="http://reference.wolfram.com/mathematica/ref/HornerForm.html">Horner form</a>: $x(x(ax + b) + c) + d$</p>

<p>With careful vectorisation, this collapses to the following:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float3</span> <span class="n">GeneralWrapSH6</span><span class="p">(</span><span class="kt">float</span> <span class="n">fA</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">float4</span> <span class="n">t0</span> <span class="o">=</span> <span class="kt">float4</span><span class="p">(</span><span class="o">-</span><span class="mf">0.018974</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.052933</span><span class="p">,</span> <span class="mf">0.076301</span><span class="p">,</span> <span class="mf">0.208592</span><span class="p">);</span>
</span><span class='line'>    <span class="kt">float4</span> <span class="n">t1</span> <span class="o">=</span> <span class="kt">float4</span><span class="p">(</span><span class="o">-</span><span class="mf">0.223994</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.305659</span><span class="p">,</span> <span class="mf">0.666667</span><span class="p">,</span> <span class="mf">0.250000</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="kt">float3</span> <span class="n">r</span><span class="p">;</span>
</span><span class='line'>    <span class="n">r</span><span class="p">.</span><span class="n">x</span>  <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
</span><span class='line'>    <span class="n">r</span><span class="p">.</span><span class="n">yz</span> <span class="o">=</span> <span class="n">t0</span><span class="p">.</span><span class="n">xy</span><span class="o">*</span><span class="n">fA</span> <span class="o">+</span> <span class="n">t0</span><span class="p">.</span><span class="n">zw</span><span class="p">;</span>
</span><span class='line'>    <span class="n">r</span><span class="p">.</span><span class="n">yz</span> <span class="o">=</span>  <span class="n">r</span><span class="p">.</span><span class="n">yz</span><span class="o">*</span><span class="n">fA</span> <span class="o">+</span> <span class="n">t1</span><span class="p">.</span><span class="n">xy</span><span class="p">;</span>
</span><span class='line'>    <span class="n">r</span><span class="p">.</span><span class="n">yz</span> <span class="o">=</span>  <span class="n">r</span><span class="p">.</span><span class="n">yz</span><span class="o">*</span><span class="n">fA</span> <span class="o">+</span> <span class="n">t1</span><span class="p">.</span><span class="n">zw</span><span class="p">;</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>Xbox 360 does all this in one less operation because placing <code>1</code> into <code>r.x</code> can be achieved with a destination register modifier:</p>

<p><code>DX9: 4, X360: 3 ALU ops</code></p>
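
<p>To get a rough numerical feel for the cubic fit, the Horner evaluation can be ported to Python and compared against the exact expressions from <code>GeneralWrapSH2</code>. This is a sketch for experimentation; the helper names are made up, and the coefficients are simply regrouped per band from <code>t0</code> and <code>t1</code> above.</p>

```python
def exact(a):
    """Second and third components of the exact result (GeneralWrapSH2)."""
    return (2.0 / (a + 3.0),
            (a*a - 2.0*a + 3.0) / ((a + 3.0) * (a + 4.0)))

def horner(coeffs, x):
    """Evaluate a polynomial from highest-order-first coefficients."""
    result = coeffs[0]
    for c in coeffs[1:]:
        result = result * x + c   # one mad per remaining coefficient
    return result

# Coefficients from GeneralWrapSH6, regrouped per band:
# band 2 uses t0.x, t0.z, t1.x, t1.z; band 3 uses t0.y, t0.w, t1.y, t1.w.
BAND2 = (-0.018974, 0.076301, -0.223994, 0.666667)
BAND3 = (-0.052933, 0.208592, -0.305659, 0.250000)

def max_error(steps=1000):
    """Worst absolute error of either band for a in [0, 1]."""
    worst = 0.0
    for i in range(steps + 1):
        a = i / steps
        e2, e3 = exact(a)
        worst = max(worst,
                    abs(horner(BAND2, a) - e2),
                    abs(horner(BAND3, a) - e3))
    return worst

print(max_error())
```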

<p>I could present graphs showing how the cubic approximations fare, but take it from me that they are extremely close. In fact, we can arguably drop down to a quadratic fit and save a further <code>mad</code> in the process. This is still acceptable:</p>

<p><img class="center" src="http://blog.selfshadow.com/images/wrap-2/figure_1.png"></p>

<p style="text-align: center;"><em>Figure 1: Comparison between original and quadratic fit for 2nd and 3rd bands (left, right)</em></p>


<p>In both cases – cubic and quadratic – I&#8217;ve actually constrained the fitting process so that the curves go through the endpoints. This reduces the worst case error a little and maintains the nice property of exactness when $a = 1$. Of course, something has to give and so the <em>average</em> error is a little higher.</p>

<p>In practice, this quadratic approximation has little effect on the end result. When lighting with a single directional source – a worst-case scenario – the difference is slight and far less significant than the error that comes from using 3rd-order SH in the first place.</p>

<p>Here&#8217;s the code for the quadratic version:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float3</span> <span class="n">GeneralWrapSH7</span><span class="p">(</span><span class="kt">float</span> <span class="n">fA</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">float4</span> <span class="n">t0</span> <span class="o">=</span> <span class="kt">float4</span><span class="p">(</span><span class="mf">0.047771</span><span class="p">,</span> <span class="mf">0.129310</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.214438</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.279310</span><span class="p">);</span>
</span><span class='line'>    <span class="kt">float4</span> <span class="n">t1</span> <span class="o">=</span> <span class="kt">float4</span><span class="p">(</span><span class="mf">0.666667</span><span class="p">,</span> <span class="mf">0.250000</span><span class="p">,</span>  <span class="mf">0.000000</span><span class="p">,</span>  <span class="mf">0.000000</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="kt">float3</span> <span class="n">r</span><span class="p">;</span>
</span><span class='line'>    <span class="n">r</span><span class="p">.</span><span class="n">x</span>  <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
</span><span class='line'>    <span class="n">r</span><span class="p">.</span><span class="n">yz</span> <span class="o">=</span> <span class="n">t0</span><span class="p">.</span><span class="n">xy</span><span class="o">*</span><span class="n">fA</span> <span class="o">+</span> <span class="n">t0</span><span class="p">.</span><span class="n">zw</span><span class="p">;</span>
</span><span class='line'>    <span class="n">r</span><span class="p">.</span><span class="n">yz</span> <span class="o">=</span>  <span class="n">r</span><span class="p">.</span><span class="n">yz</span><span class="o">*</span><span class="n">fA</span> <span class="o">+</span> <span class="n">t1</span><span class="p">.</span><span class="n">xy</span><span class="p">;</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p><code>DX9: 3, X360: 2 ALU ops</code></p>
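
<p>The post does not say which fitting method or package produced these constants. As one possible approach, here is a least-squares sketch in Python with the endpoint constraint built in. The function names and the uniform sampling are assumptions made for illustration. Writing the quadratic as the straight line through both endpoints plus $k(a^2 - a)$ satisfies the constraints for any $k$, which leaves a single parameter to fit.</p>

```python
def band2(a):
    return 2.0 / (a + 3.0)

def band3(a):
    return (a*a - 2.0*a + 3.0) / ((a + 3.0) * (a + 4.0))

def fit_quadratic_through_endpoints(f, steps=1000):
    """Least-squares quadratic on [0, 1] that matches f at 0 and 1.

    Uses p(a) = f(0) + (f(1) - f(0))*a + k*(a*a - a) and solves for k.
    Returns (c2, c1, c0) such that p(a) = c2*a*a + c1*a + c0.
    """
    f0, f1 = f(0.0), f(1.0)
    num = 0.0
    den = 0.0
    for i in range(steps + 1):
        a = i / steps
        h = a*a - a                       # basis term, zero at both ends
        g = f(a) - (f0 + (f1 - f0)*a)     # what the straight line misses
        num += g * h
        den += h * h
    k = num / den
    return (k, (f1 - f0) - k, f0)

print(fit_quadratic_through_endpoints(band2))
print(fit_quadratic_through_endpoints(band3))
```

<p>With this setup the coefficients should land close to those in <code>GeneralWrapSH7</code>, though a different error metric or sampling would shift the later digits.</p>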

<h4>Modifiers</h4>

<p>And yet, we&#8217;re still not done! The DX9 figure suggests that we might pay the instruction cost of moving <code>1</code> into <code>r.x</code> with some GPUs, and although it could go away when the terms are actually <em>used</em>, it would be cute if we could get rid of it just in case.</p>

<p>Notice that the two curves are monotonically decreasing and within the range [0, 1]. If we negate the intermediate result of the first <code>mad</code>, saturate and then negate again, there will be no overall effect. By doing this, we can take <code>r.x</code> along for the ride and force it to <code>0</code> through one of the negative constants, then add <code>1</code> via the final <code>mad</code>:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float3</span> <span class="n">GeneralWrapSH8</span><span class="p">(</span><span class="kt">float</span> <span class="n">fA</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">float4</span> <span class="n">t0</span> <span class="o">=</span> <span class="kt">float4</span><span class="p">(</span><span class="o">-</span><span class="mf">0.047771</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.129310</span><span class="p">,</span> <span class="mf">0.214438</span><span class="p">,</span> <span class="mf">0.279310</span><span class="p">);</span>
</span><span class='line'>    <span class="kt">float4</span> <span class="n">t1</span> <span class="o">=</span> <span class="kt">float4</span><span class="p">(</span> <span class="mf">1.000000</span><span class="p">,</span>  <span class="mf">0.666667</span><span class="p">,</span> <span class="mf">0.250000</span><span class="p">,</span> <span class="mf">0.000000</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="kt">float3</span> <span class="n">r</span><span class="p">;</span>
</span><span class='line'>    <span class="n">r</span> <span class="o">=</span> <span class="nb">saturate</span><span class="p">(</span><span class="n">t0</span><span class="p">.</span><span class="n">xxy</span><span class="o">*</span><span class="n">fA</span> <span class="o">+</span> <span class="n">t0</span><span class="p">.</span><span class="n">xzw</span><span class="p">);</span>
</span><span class='line'>    <span class="n">r</span> <span class="o">=</span> <span class="o">-</span><span class="n">r</span><span class="o">*</span><span class="n">fA</span> <span class="o">+</span> <span class="n">t1</span><span class="p">;</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>Because saturation and negation are typically free register modifiers, we save an operation:</p>

<p><code>DX9: 2, X360: 2 ALU ops</code></p>
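
<p>The saturate trick can be checked with a Python port of the last two versions (a sketch, with made-up function names). For <code>fA</code> in [0, 1], the negated intermediates of the second and third lanes stay inside [0, 1], so <code>saturate</code> leaves them untouched, while the first lane is negative and clamps to <code>0</code>.</p>

```python
def saturate(x):
    """Clamp to [0, 1], like the HLSL intrinsic."""
    return min(max(x, 0.0), 1.0)

def wrap_sh7(a):
    t0 = (0.047771, 0.129310, -0.214438, -0.279310)
    t1 = (0.666667, 0.250000)
    y = (t0[0]*a + t0[2])*a + t1[0]
    z = (t0[1]*a + t0[3])*a + t1[1]
    return (1.0, y, z)

def wrap_sh8(a):
    t0 = (-0.047771, -0.129310, 0.214438, 0.279310)
    t1 = (1.000000, 0.666667, 0.250000)
    # r = saturate(t0.xxy*fA + t0.xzw); the x lane is negative, so it becomes 0
    r = (saturate(t0[0]*a + t0[0]),
         saturate(t0[0]*a + t0[2]),
         saturate(t0[1]*a + t0[3]))
    # r = -r*fA + t1; the x lane picks up the constant 1 here
    return tuple(-ri*a + ti for ri, ti in zip(r, t1))

def max_abs_diff(f, g, steps=100):
    """Largest per-component difference between f and g for a in [0, 1]."""
    worst = 0.0
    for i in range(steps + 1):
        a = i / steps
        for u, v in zip(f(a), g(a)):
            worst = max(worst, abs(u - v))
    return worst

print(max_abs_diff(wrap_sh7, wrap_sh8))
```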

<h2>Going Green</h2>

<p>The <em>Wrap Shading</em> paper doesn&#8217;t include a normalised version of Green&#8217;s model (see <a href="http://blog.selfshadow.com/2011/12/31/righting-wrap-part-1">part 1</a>), so here&#8217;s code for that too:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float4</span> <span class="n">t0</span><span class="p">;</span> <span class="c1">// {0, 1/4, -1/3, -1/2}</span>
</span><span class='line'><span class="kt">float4</span> <span class="n">t1</span><span class="p">;</span> <span class="c1">// {1, 2/3,  1/4,    0}</span>
</span><span class='line'>
</span><span class='line'><span class="kt">float3</span> <span class="n">GreenWrapSH</span><span class="p">(</span><span class="kt">float</span> <span class="n">fW</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">float3</span> <span class="n">r</span><span class="p">;</span>
</span><span class='line'>    <span class="n">r</span> <span class="o">=</span> <span class="n">t0</span><span class="p">.</span><span class="n">xxy</span><span class="o">*</span><span class="n">fW</span> <span class="o">+</span> <span class="n">t0</span><span class="p">.</span><span class="n">xzw</span><span class="p">;</span>
</span><span class='line'>    <span class="n">r</span> <span class="o">=</span>  <span class="n">r</span><span class="p">.</span><span class="n">xyz</span><span class="o">*</span><span class="n">fW</span> <span class="o">+</span> <span class="n">t1</span><span class="p">.</span><span class="n">xyz</span><span class="p">;</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p><code>DX9: 2, X360: 2 ALU ops</code></p>
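
<p>Multiplying out the two <code>mad</code> steps in <code>GreenWrapSH</code>, with $w$ standing for <code>fW</code>, gives the coefficients in closed form:</p>

<p>$$\left[1,\ \frac{2}{3}-\frac{w}{3},\ \frac{1}{4}-\frac{w}{2}+\frac{w^2}{4}\right] = \left[1,\ \frac{2-w}{3},\ \frac{(1-w)^2}{4}\right]$$</p>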

<h2>Wrapping Up</h2>

<p>Here&#8217;s a <strong><a href="http://www.selfshadow.com/sandbox/wrap.html">WebGL sample</a></strong> that encapsulates this mini-series on wrap shading.</p>

<p>In conclusion, shader optimisation is critical for video game rendering, so you shouldn&#8217;t defer to the compiler. To quote Michael Abrash: &#8220;The best optimizer is between your ears&#8221;. Don&#8217;t forget it, <em>train</em> it!</p>

<h2>References</h2>

<p>[1] Green, S., <a href="http://http.developer.nvidia.com/GPUGems/gpugems_ch16.html">&#8220;Real-Time Approximations to Subsurface Scattering&#8221;</a>, GPU Gems, 2004.<br/>
[2] Mitchell, J., McTaggart, G., Green, C., <a href="http://www.valvesoftware.com/publications.html">&#8220;Shading in Valve&#8217;s Source Engine&#8221;</a>, Advanced Real-Time Rendering in 3D Graphics and Games, SIGGRAPH Course, 2006.<br/>
[3] Yuan, H., Nowrouzezahrai, D., Sloan, P.-P., <a href="http://www.iro.umontreal.ca/~derek/publication9.html">&#8220;Irradiance Rigs&#8221;</a>, SIGGRAPH Talk, 2010.<br/>
[4] Sloan, P.-P., Nowrouzezahrai, D., Yuan, H., <a href="http://www.iro.umontreal.ca/~derek/publication8.html">&#8220;Wrap Shading&#8221;</a>, Journal of Graphics, GPU, and Game Tools, 15:4, 252-259, 2011.</p>

<h2>Appendix</h2>

<p>Normalisation factor for generalised Half Lambert:</p>

<p>$$ \begin{array}{lcl} &amp;&amp; \frac{1}{\pi}\int_{\Omega}\left(\frac{\mathrm{max}\left[\mathrm{cos}\,\theta+w,\,0\right]}{1+w}\right)^{1+w}\mathrm{d}\omega &#92;&#92; &amp;=&amp;\frac{1}{\pi}\int_{0}^{2\pi}\int_{0}^{\pi}\left(\frac{\mathrm{max}\left[\mathrm{cos}\,\theta+w,\,0\right]}{1+w}\right)^{1+w}\mathrm{sin}\,\theta\,\mathrm{d}\theta\,\mathrm{d}\phi &#92;&#92; &amp;=&amp; 2\int_{0}^{\alpha}\left(\frac{\mathrm{cos}\,\theta+w}{1+w}\right)^{1+w}\mathrm{sin}\,\theta\,\mathrm{d}\theta \text{, where }\alpha=\mathrm{cos}^{-1}(-w) &#92;&#92; &amp;&amp;\text {Substitute } x=\mathrm{cos}\,\theta,\mathrm{d}x=-\mathrm{sin}\,\theta\,\mathrm{d}\theta &#92;&#92; &amp;=&amp; -2\int_{1}^{-w}\left(\frac{x+w}{1+w}\right)^{1+w}\mathrm{d}x &#92;&#92; &amp;=&amp; 2\int_{-w}^{1}\left(\frac{x+w}{1+w}\right)^{1+w}\mathrm{d}x &#92;&#92; &amp;&amp; \text {Recall that}\int \left(\frac{x+a}{b}\right)^n \mathrm{d}x = \frac{b}{n+1}\left(\frac{x+a}{b}\right)^{n+1} + c &#92;&#92; &amp;=&amp; 2\left[\frac{1+w}{2+w}\left(\frac{x+w}{1+w}\right)^{2+w}\right]_{-w}^{1} &#92;&#92; &amp;=&amp; 2\frac{1+w}{2+w}\left(\left(\frac{1+w}{1+w}\right)^{2+w}-\left(\frac{-w +w}{1+w}\right)^{2+w}\right) &#92;&#92; &amp;=&amp; \frac{2(1+w)}{2+w}\left(1-0\right) &#92;&#92; \end{array} &#92;&#92; \therefore \text{ normalise } f(\theta, w) \text{ with } \frac{2+w}{2(1+w)} $$</p>
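
<p>The closed form above can be cross-checked numerically. The following Python sketch (function names are made up for this example) integrates the one-dimensional form from the derivation with a simple midpoint rule and compares it with $\frac{2(1+w)}{2+w}$:</p>

```python
def wrap_integral(w, steps=20000):
    """Midpoint-rule estimate of 2 * integral of ((x + w)/(1 + w))**(1 + w)
    for x from -w to 1."""
    lo = -w
    dx = (1.0 - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += ((x + w) / (1.0 + w)) ** (1.0 + w)
    return 2.0 * total * dx

def closed_form(w):
    """Result of the derivation; its reciprocal is the normalisation factor."""
    return 2.0 * (1.0 + w) / (2.0 + w)

for w in (0.0, 0.25, 0.5, 1.0):
    print(w, wrap_integral(w), closed_form(w))
```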
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Righting Wrap (Part 1)]]></title>
    <link href="http://blog.selfshadow.com/2011/12/31/righting-wrap-part-1/"/>
    <updated>2011-12-31T18:39:22-05:00</updated>
    <id>http://blog.selfshadow.com/2011/12/31/righting-wrap-part-1</id>
    <content type="html"><![CDATA[<p>A while back, <a href="http://blog.stevemcauley.com/">Steve McAuley</a> and I were discussing physically-based rendering miscellanea over a quiet pint – hardly a stretch of the imagination, since we&#8217;re English 3D programmers after all! Anyway, it turned out that we both had plans to write up a few thoughts in relation to <em>wrap shading</em>, and, following some gentle arm-twisting, Steve has <a href="http://blog.stevemcauley.com/2011/12/03/energy-conserving-wrapped-diffuse/">posted</a> his. I suggest that you go and read that first if you haven&#8217;t already, then return here for a continuation of the subject.</p>

<h2>Bad Wrap</h2>

<p>Wrap shading has its uses when more accurate techniques are too expensive, or simply to achieve a certain aesthetic, but common models [1][2] have some deficiencies out of the box. Neither of these is energy conserving and they don&#8217;t really play well with shadows either. On top of that, Valve&#8217;s <em>Half Lambert</em> model [2] has a fixed amount of wrap, so it can&#8217;t be tuned to suit different materials (or, perhaps, to limit shadow oddities). I&#8217;ll come back to the point about flexibility in <a href="http://blog.selfshadow.com/2012/01/07/righting-wrap-part-2/">part 2</a>, but first I&#8217;d like to discuss another factor that&#8217;s easily overlooked: <em>environmental lighting</em>.</p>

<!--more-->


<p>If you&#8217;re set on using some form of wrap shading, then it&#8217;s not just a matter of applying it to your standard direct sources – directional, point and spot lights, for instance – it ought to be carried through to environmental lighting as well! Naturally, the importance of this depends on how strong and directional your secondary lighting is; obviously if you&#8217;re only using constant ambient then there&#8217;s no problem, but these days it&#8217;s fairly common to encode indirect lighting in <em>Spherical Harmonics</em> (SH) [3] and perhaps some additional lights as well. Fortunately, wrap shading in the context of SH lighting is easy, and much like energy conservation it&#8217;s a relatively cheap or free addition, so it&#8217;s worth considering even if the results prove to be subtle.</p>

<h2>Looking After the Environment</h2>

<p>So, how do we accomplish this? Well, that&#8217;s best explained with a quick recap. If you recall, for diffuse SH lighting, we first project the lighting environment, $f$, into SH:</p>

<p>$$f_{l}^{m} = \int f(s)\,y_{l}^{m}(s)\,\mathrm{d}s$$</p>

<p>(Of course, in practice, this is commonly performed offline as a numerical integration over a cube map.)</p>

<p>We then convolve this with the SH-projected cosine lobe, $h$, like so:</p>

<p>$$c_{l}^{m} = \sqrt{\frac{4\pi}{2l+1}}h_{l}^{0}f_{l}^{m},\quad \text{where } h_{l}^{0} = \int \mathrm{cos}(s)\,y_{l}^{0}(s)\,\mathrm{d}s$$</p>

<p>Next, we can evaluate the lighting (more specifically, <em>irradiance</em>) for a given surface direction, $s$:</p>

<p>$$E(s) = \sum_{l=0}^n\sum_{m=-l}^{l}c_l^m y_l^m(s)$$</p>

<p>Finally, a division by $\pi$ gives us <em>outgoing</em> (or <em>exit</em>) <em>radiance</em>. Personally, I find it convenient to roll these extra terms into $h$ itself. The nice thing about this is that the convolution kernel then boils down (via analytical integration) to easy-to-remember values for the first three SH bands:</p>

<p>$$\begin{array}{lcl} \hat{h}_{l}^{0} &amp;=&amp; \frac{1}{\pi}\sqrt{\frac{4\pi}{2l+1}}h_{l}^{0} &#92;&#92; \hat{h} &amp;=&amp; \left[1, \frac{2}{3}, \frac{1}{4}, \cdots \right] \end{array}$$</p>

<p>For further details, you can find a complete and approachable account in [4].</p>
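<p>As a rough illustration of the recap, here&#8217;s how the evaluation might look in C++ for the first three bands. This is a sketch under assumptions rather than production code: the <code>Vec3</code> type, the function names, the coefficient ordering and the sign convention of the real SH basis are all my own choices here, and the only requirement is that the lighting was projected with the same basis.</p>

```cpp
#include <array>
#include <cmath>

struct Vec3 { double x, y, z; };

// Real SH basis for bands 0-2, ordered by (l, m). The ordering and signs are an
// assumed convention; any convention works if projection and evaluation agree.
std::array<double, 9> sh_basis(Vec3 s)
{
    std::array<double, 9> y = {{
        0.282095,
        0.488603 * s.y,
        0.488603 * s.z,
        0.488603 * s.x,
        1.092548 * s.x * s.y,
        1.092548 * s.y * s.z,
        0.315392 * (3.0 * s.z * s.z - 1.0),
        1.092548 * s.x * s.z,
        0.546274 * (s.x * s.x - s.y * s.y)
    }};
    return y;
}

// Outgoing radiance for unit direction s, given projected lighting coefficients f.
// The per-band scale is the kernel [1, 2/3, 1/4] from the text.
double sh_radiance(const std::array<double, 9>& f, Vec3 s)
{
    static const double h_hat[3] = { 1.0, 2.0 / 3.0, 1.0 / 4.0 };
    static const int band[9] = { 0, 1, 1, 1, 2, 2, 2, 2, 2 };
    std::array<double, 9> y = sh_basis(s);
    double sum = 0.0;
    for (int i = 0; i < 9; ++i)
        sum += h_hat[band[i]] * f[i] * y[i];
    return sum;
}
```

<p>A handy sanity check is a constant environment of radiance 1, whose only non-zero coefficient is $f_{0}^{0} = 4\pi \cdot 0.282095$: the function should then return a value very close to 1 for every direction.</p>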

<p>Now, back to wrap: adjusting things for our shading model of choice is simply a matter of replacing $\hat{h}$. Let&#8217;s try this for the simple wrap model from Green [1] that Steve already discussed:</p>

<p>$$\frac{1}{\pi}\int_{0}^{2\pi} \int_{0}^{\alpha} \frac{\mathrm{cos}(\theta)+w}{1+w}\mathrm{sin}(\theta)\,\mathrm{d}\theta\mathrm{d}\phi, \quad \text{where }\alpha=\mathrm{cos}^{-1}(-w)$$</p>

<p>From Steve&#8217;s post, we know that we need to divide by an additional normalisation factor of $1 + w$ for energy conservation, so the full formula for our new convolution kernel, which I&#8217;ll call $\hat{g}$, is:</p>

<p>$$ \hat{g}_{l}^{0} = \frac{1}{1 + w}\sqrt{\frac{4\pi}{2l+1}}\frac{1}{\pi}\int_{0}^{2\pi} \int_{0}^{\alpha} \frac{\mathrm{cos}(\theta)+w}{1+w}y_{l}^{0}(s)\,\mathrm{sin}(\theta)\,\mathrm{d}\theta\mathrm{d}\phi $$</p>

<p>You can go through a similar process of analytical integration as Steve did, only now with the additional SH basis terms $y_{l}^{0}$, or if you&#8217;re lazy like me, you can throw the formula at a package like Mathematica. Either way, once you&#8217;re done, you&#8217;ll arrive at the following (or something equivalent):</p>

<p>$$\hat{g} = \left[ 1, \frac{1}{3}(2 - w), \frac{1}{4}(1 - w)^2, \cdots \right]$$</p>

<p>We can clearly see that this reduces to the cosine convolution kernel, $\hat{h}$, when $w = 0$. In visual terms, the effect of changing $w$ is evident with a single directional light, as you would expect:</p>

<p><img class="center" src="http://blog.selfshadow.com/images/wrap-1/figure_1.png"></p>

<p style="text-align: center;"><em>Figure 1: Variable wrap shading (0, 0.5, 1) with a single directional light in SH</em></p>


<p>On the other hand, the difference is a lot subtler with a more uniform lighting environment:</p>

<p><img class="center" src="http://blog.selfshadow.com/images/wrap-1/figure_2.png"></p>

<p style="text-align: center;"><em>Figure 2: Variable wrap shading (0, 0.5, 1) with a general SH environment (Grace Cathedral)</em></p>
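<p>If you don&#8217;t have Mathematica to hand, the closed form for $\hat{g}$ can also be checked by brute force. The C++ sketch below compares it against a midpoint-rule evaluation of the integral over $x=\mathrm{cos}\,\theta$; the function names are hypothetical, and it assumes $0 \leq w \leq 1$ and the usual zonal basis functions for the first three bands.</p>

```cpp
#include <cmath>

// Closed-form band scales for the simple wrap model, bands 0-2.
double wrap_kernel(int l, double w)
{
    switch (l)
    {
    case 0:  return 1.0;
    case 1:  return (2.0 - w) / 3.0;
    default: return (1.0 - w) * (1.0 - w) / 4.0;   // l == 2
    }
}

// Zonal SH basis y_l^0 as a function of x = cos(theta), bands 0-2.
double zonal_sh(int l, double x)
{
    const double pi = 3.14159265358979323846;
    switch (l)
    {
    case 0:  return std::sqrt(1.0 / (4.0 * pi));
    case 1:  return std::sqrt(3.0 / (4.0 * pi)) * x;
    default: return std::sqrt(5.0 / (16.0 * pi)) * (3.0 * x * x - 1.0);
    }
}

// Midpoint-rule evaluation of the g_hat integral, with x running from -w to 1.
double wrap_kernel_numeric(int l, double w, int steps = 200000)
{
    const double pi = 3.14159265358979323846;
    double dx = (1.0 + w) / steps;
    double sum = 0.0;
    for (int i = 0; i < steps; ++i)
    {
        double x = -w + (i + 0.5) * dx;
        sum += (x + w) / (1.0 + w) * zonal_sh(l, x);
    }
    // The phi integral and the 1/pi combine to a factor of 2; then apply the
    // band scale and the 1/(1 + w) normalisation.
    return 2.0 * sum * dx * std::sqrt(4.0 * pi / (2.0 * l + 1.0)) / (1.0 + w);
}
```

<p>The two versions should agree to several decimal places for each band, and <code>wrap_kernel(l, 0.0)</code> gives back the plain cosine kernel values of 1, 2/3 and 1/4.</p>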


<h2>It&#8217;s a Wrap?</h2>

<p>Okay, confession time: this post wasn&#8217;t <em>just</em> about wrap shading, since it also serves as a foundation for future posts. For instance, <a href="http://blog.selfshadow.com/2012/01/07/righting-wrap-part-2/">part 2</a> will conveniently segue into shader optimisations, which is a topic I&#8217;ve been planning to write – or, perhaps more accurately, <em>rant</em> – about in general, and I&#8217;ll also be returning to Spherical Harmonics down the line.</p>

<h2>References</h2>

<p>[1] Green, S., <a href="http://http.developer.nvidia.com/GPUGems/gpugems_ch16.html">&#8220;Real-Time Approximations to Subsurface Scattering&#8221;</a>, GPU Gems, 2004.<br/>
[2] Mitchell, J., McTaggart, G., Green, C., <a href="http://www.valvesoftware.com/publications.html">&#8220;Shading in Valve&#8217;s Source Engine&#8221;</a>, Advanced Real-Time Rendering in 3D Graphics and Games, SIGGRAPH Course, 2006.<br/>
[3] Green, R., <a href="http://www.research.scea.com/gdc2003/spherical-harmonic-lighting.pdf">&#8220;Spherical Harmonic Lighting: The Gritty Details&#8221;</a>, GDC 2003.<br/>
[4] Sloan, P.-P., &#8220;Efficient Evaluation of Irradiance Environment Maps&#8221;, <a href="http://tog.acm.org/resources/shaderx/">ShaderX<sup>2</sup>: Shader Programming Tips and Tricks with DirectX 9.0</a>, 2003.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Perpendicular Possibilities]]></title>
    <link href="http://blog.selfshadow.com/2011/10/17/perp-vectors/"/>
    <updated>2011-10-17T03:25:02-04:00</updated>
    <id>http://blog.selfshadow.com/2011/10/17/perp-vectors</id>
    <content type="html"><![CDATA[<p><img class="center" src="http://blog.selfshadow.com/images/perp-vectors/figure_1.png"></p>

<p style="text-align: center;"><em>Figure 1: Major axes for original (left), swizzle (mid) and perpendicular (right) vectors</em></p>


<h2>Introduction</h2>

<p>Two months ago, there was <a href="http://twitter.com/#!/KeefJudge/status/103531192451743744">a question</a> (and subsequent discussion) on Twitter as to how to go about generating a perpendicular unit vector, preferably without branching. It seemed about time that I finally post something more complete on the subject, since there are various ways to go about doing this, as well as a few traps awaiting the unwary programmer.<!--more--></p>

<h2>Solution Quartet</h2>

<p>Here are four options with various trade-offs. If you happen to know of any others, by all means let me know and I&#8217;ll update this post.</p>

<p><em>Note: in all of the following approaches, normalisation is left as an optional post-processing step.</em></p>

<h4>Quick &#8216;n&#8217; Dirty</h4>

<p>A <a href="http://twitter.com/#!/tom_forsyth/status/103678633406763008">quick hack</a> involves taking the cross product of the original unit vector – let&#8217;s call it $\mathbf{u}(x, y, z)$ – with a fixed &#8216;up&#8217; axis, e.g. $(0, 1, 0)$, and then normalising. A problem here is that if the two vectors are very close – or equally, pointing directly away from each other – then the result will be a degenerate vector. However, it&#8217;s still a reasonable approach in the context of a camera, if the view direction can be restricted to guard against this. A general solution in this situation is to fall back to an alternative axis:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float3</span> <span class="n">perp_quick</span><span class="p">(</span><span class="kt">float3</span> <span class="n">u</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">float3</span> <span class="n">v</span><span class="p">;</span>
</span><span class='line'>    <span class="k">if</span> <span class="p">(</span><span class="nb">abs</span><span class="p">(</span><span class="n">u</span><span class="p">.</span><span class="n">y</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mf">0.99</span><span class="p">)</span>          <span class="c1">// abs(dot(u, UP)), somewhat arbitrary epsilon</span>
</span><span class='line'>        <span class="n">v</span> <span class="o">=</span> <span class="kt">float3</span><span class="p">(</span><span class="o">-</span><span class="n">u</span><span class="p">.</span><span class="n">z</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">u</span><span class="p">.</span><span class="n">x</span><span class="p">);</span> <span class="c1">// cross(u, UP)</span>
</span><span class='line'>    <span class="k">else</span>
</span><span class='line'>        <span class="n">v</span> <span class="o">=</span> <span class="kt">float3</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">u</span><span class="p">.</span><span class="n">z</span><span class="p">,</span> <span class="o">-</span><span class="n">u</span><span class="p">.</span><span class="n">y</span><span class="p">);</span> <span class="c1">// cross(u, RIGHT)</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">v</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p><em>Listing 1: A quick way to generate a perpendicular vector</em></p>

<h4>Hughes-Möller</h4>

<p>In a neat little <em>Journal of Graphics Tools</em> paper <strong>[1]</strong>, Hughes and Möller proposed a more systematic approach to computing a perpendicular vector. Here&#8217;s the heart of it:</p>

<div markdown="0">$$
\mathbf{\bar{v}} = \begin{cases}
(\;\;\:0,-z,\;\;\; y) & \text{if } |x| < |y| \text{ and } |x| < |z| \\
(-z,\;\;\;0,\;\;\, x) & \text{if } |y| < |x| \text{ and } |y| < |z| \\
(-y,\;\;\,x,\;\;\; 0) & \text{if } |z| < |x| \text{ and } |z| < |y| \\
\end{cases}$$</div>


<p>Or, as the paper also states: &#8220;Take the smallest entry (in absolute value) of $\mathbf{u}$ and set it to zero; swap the other two entries and negate the first of them&#8221;.</p>

<p><img class="center" src="http://blog.selfshadow.com/images/perp-vectors/figure_2.png"></p>

<p style="text-align: center;"><em>Figure 2: Distribution of v over the sphere</em></p>


<p>However, there&#8217;s a problem with this as written: it doesn&#8217;t handle cases where <em>multiple</em> components are the smallest, such as $(0, 0, 1)$! I hit this a few years back when I needed to generate an orthonormal basis for some offline geometry processing, and it&#8217;s easily remedied by replacing $&lt;$ with $\leq$. Here&#8217;s a corrected version in code form:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float3</span> <span class="n">perp_hm</span><span class="p">(</span><span class="kt">float3</span> <span class="n">u</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">float3</span> <span class="n">a</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">u</span><span class="p">);</span>
</span><span class='line'>    <span class="kt">float3</span> <span class="n">v</span><span class="p">;</span>
</span><span class='line'>    <span class="k">if</span> <span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">x</span> <span class="o">&lt;=</span> <span class="n">a</span><span class="p">.</span><span class="n">y</span> <span class="o">&amp;&amp;</span> <span class="n">a</span><span class="p">.</span><span class="n">x</span> <span class="o">&lt;=</span> <span class="n">a</span><span class="p">.</span><span class="n">z</span><span class="p">)</span>
</span><span class='line'>        <span class="n">v</span> <span class="o">=</span> <span class="kt">float3</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="n">u</span><span class="p">.</span><span class="n">z</span><span class="p">,</span> <span class="n">u</span><span class="p">.</span><span class="n">y</span><span class="p">);</span>
</span><span class='line'>    <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">y</span> <span class="o">&lt;=</span> <span class="n">a</span><span class="p">.</span><span class="n">x</span> <span class="o">&amp;&amp;</span> <span class="n">a</span><span class="p">.</span><span class="n">y</span> <span class="o">&lt;=</span> <span class="n">a</span><span class="p">.</span><span class="n">z</span><span class="p">)</span>
</span><span class='line'>        <span class="n">v</span> <span class="o">=</span> <span class="kt">float3</span><span class="p">(</span><span class="o">-</span><span class="n">u</span><span class="p">.</span><span class="n">z</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">u</span><span class="p">.</span><span class="n">x</span><span class="p">);</span>
</span><span class='line'>    <span class="k">else</span>
</span><span class='line'>        <span class="n">v</span> <span class="o">=</span> <span class="kt">float3</span><span class="p">(</span><span class="o">-</span><span class="n">u</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="n">u</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">v</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p><em>Listing 2: Hughes-Möller perpendicular vector generation</em></p>
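<p>To make the tie case concrete, here&#8217;s a small C++ sketch that puts the rule with strict comparisons next to the corrected one. The <code>float3</code> struct, the <code>dot</code> helper and the function names are stand-ins of my own, and leaving the result at zero when no case matches is just one assumption about how the rule as written might end up in code.</p>

```cpp
#include <cmath>

struct float3 { float x, y, z; };

float dot(float3 a, float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Selection rule with strict comparisons. When two components tie for
// smallest, none of the cases match and v is left as the zero vector.
float3 perp_hm_strict(float3 u)
{
    float ax = std::fabs(u.x), ay = std::fabs(u.y), az = std::fabs(u.z);
    float3 v = { 0.0f, 0.0f, 0.0f };
    if (ax < ay && ax < az)      v = { 0.0f, -u.z, u.y };
    else if (ay < ax && ay < az) v = { -u.z, 0.0f, u.x };
    else if (az < ax && az < ay) v = { -u.y, u.x, 0.0f };
    return v;
}

// Corrected rule: <= plus a final else means that some case always matches.
float3 perp_hm(float3 u)
{
    float ax = std::fabs(u.x), ay = std::fabs(u.y), az = std::fabs(u.z);
    float3 v;
    if (ax <= ay && ax <= az)      v = { 0.0f, -u.z, u.y };
    else if (ay <= ax && ay <= az) v = { -u.z, 0.0f, u.x };
    else                           v = { -u.y, u.x, 0.0f };
    return v;
}
```

<p>With $\mathbf{u} = (0, 0, 1)$, the strict version returns the zero vector, whereas the corrected version returns $(0, -1, 0)$.</p>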

<h4>Stark</h4>

<p>More recently, Michael M. Stark suggested some improvements to the Hughes-Möller approach <strong>[2]</strong>. Firstly, his choice of permuted vectors is almost the same – differing only in signs – but even easier to remember:</p>

<div markdown="0">$$
\mathbf{\bar{v}} = \begin{pmatrix} x &#92;&#92; y &#92;&#92; z \end{pmatrix} \times
\begin{pmatrix}
\left[x \text{ is smallest}\right] &#92;&#92;
\left[y \text{ is smallest}\right] &#92;&#92;
\left[z \text{ is smallest}\right]
\end{pmatrix}
$$</div>


<p>In plain English: the perpendicular vector $\mathbf{\bar{v}}$ is found by taking the cross product of $\mathbf{u}$ with the axis of its smallest component. (Note: the same care is needed when multiple components are the smallest.)</p>

<p>Figure 3 visualises this intermediate &#8216;swizzle&#8217; vector over the sphere:</p>

<p><img class="center" src="http://blog.selfshadow.com/images/perp-vectors/figure_3.png"></p>

<p style="text-align: center;"><em>Figure 3: Intermediate swizzle vector</em></p>


<p>Secondly, Michael also provides a branch-free implementation:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float3</span> <span class="n">perp_stark</span><span class="p">(</span><span class="kt">float3</span> <span class="n">u</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">float3</span> <span class="n">a</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">u</span><span class="p">);</span>
</span><span class='line'>    <span class="kt">uint</span> <span class="n">uyx</span> <span class="o">=</span> <span class="n">SIGNBIT</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">x</span> <span class="o">-</span> <span class="n">a</span><span class="p">.</span><span class="n">y</span><span class="p">);</span>
</span><span class='line'>    <span class="kt">uint</span> <span class="n">uzx</span> <span class="o">=</span> <span class="n">SIGNBIT</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">x</span> <span class="o">-</span> <span class="n">a</span><span class="p">.</span><span class="n">z</span><span class="p">);</span>
</span><span class='line'>    <span class="kt">uint</span> <span class="n">uzy</span> <span class="o">=</span> <span class="n">SIGNBIT</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">y</span> <span class="o">-</span> <span class="n">a</span><span class="p">.</span><span class="n">z</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="kt">uint</span> <span class="n">xm</span> <span class="o">=</span> <span class="n">uyx</span> <span class="o">&amp;</span> <span class="n">uzx</span><span class="p">;</span>
</span><span class='line'>    <span class="kt">uint</span> <span class="n">ym</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="o">^</span><span class="n">xm</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">uzy</span><span class="p">;</span>
</span><span class='line'>    <span class="kt">uint</span> <span class="n">zm</span> <span class="o">=</span> <span class="mi">1</span><span class="o">^</span><span class="p">(</span><span class="n">xm</span> <span class="o">&amp;</span> <span class="n">ym</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="kt">float3</span> <span class="n">v</span> <span class="o">=</span> <span class="nb">cross</span><span class="p">(</span><span class="n">u</span><span class="p">,</span> <span class="kt">float3</span><span class="p">(</span><span class="n">xm</span><span class="p">,</span> <span class="n">ym</span><span class="p">,</span> <span class="n">zm</span><span class="p">));</span>
</span><span class='line'>    <span class="k">return</span> <span class="n">v</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p><em>Listing 3: Branch-free perpendicular vector generation</em></p>

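<p>I know what you&#8217;re thinking: there&#8217;s a problem here too, since it should be <code>zm = 1^(xm | ym)</code>! As written, <code>xm &amp; ym</code> is always zero (<code>ym</code> already excludes <code>xm</code>), so <code>zm</code> is always set. Although still robust, the effect of this error is that the nice property of even, symmetrical distribution over the sphere is lost:</p>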

<p><img class="center" src="http://blog.selfshadow.com/images/perp-vectors/figure_4.png"></p>

<p style="text-align: center;"><em>Figure 4: Broken symmetry in the perpendicular vector<br/>(that&#8217;s 7 years bad luck!)</em></p>
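<p>For reference, a C++ version with the corrected mask might look something like the sketch below. It assumes that <code>SIGNBIT</code> yields the sign bit as 0 or 1 (the macro isn&#8217;t defined in the listing), with <code>std::signbit</code> standing in for it; the <code>float3</code> struct, the <code>cross</code> helper and the name <code>perp_stark_fixed</code> are likewise placeholders of my own.</p>

```cpp
#include <cmath>

struct float3 { float x, y, z; };

float3 cross(float3 a, float3 b)
{
    return float3{ a.y * b.z - a.z * b.y,
                   a.z * b.x - a.x * b.z,
                   a.x * b.y - a.y * b.x };
}

float3 perp_stark_fixed(float3 u)
{
    float ax = std::fabs(u.x), ay = std::fabs(u.y), az = std::fabs(u.z);

    // 1 if the first operand is strictly smaller than the second, else 0.
    unsigned uyx = static_cast<unsigned>(std::signbit(ax - ay));
    unsigned uzx = static_cast<unsigned>(std::signbit(ax - az));
    unsigned uzy = static_cast<unsigned>(std::signbit(ay - az));

    unsigned xm = uyx & uzx;           // x is strictly the smallest
    unsigned ym = (1u ^ xm) & uzy;     // otherwise, y is smaller than z
    unsigned zm = 1u ^ (xm | ym);      // corrected: z only if neither x nor y

    return cross(u, float3{ float(xm), float(ym), float(zm) });
}
```

<p>Exactly one of the three masks is set for any input, so the swizzle vector is always a single axis. For example, $(0.5, 0.1, 0.86)$ selects the y axis and yields $(-0.86, 0, 0.5)$.</p>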


<h4>XNAMath</h4>

<p>Finally, another branch-free solution is provided in the form of <code>XMVector3Orthogonal</code>, which is part of the XNAMath library. Here&#8217;s the actual code taken from the <a href="http://msdn.microsoft.com/en-gb/directx/">DirectX SDK</a> (June 2010):</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
</pre></td><td class='code'><pre><code class='cpp'><span class='line'><span class="n">XMFINLINE</span> <span class="n">XMVECTOR</span> <span class="n">XMVector3Orthogonal</span><span class="p">(</span><span class="n">FXMVECTOR</span> <span class="n">V</span><span class="p">)</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="n">XMVECTOR</span> <span class="n">NegativeV</span><span class="p">;</span>
</span><span class='line'>    <span class="n">XMVECTOR</span> <span class="n">Z</span><span class="p">,</span> <span class="n">YZYY</span><span class="p">;</span>
</span><span class='line'>    <span class="n">XMVECTOR</span> <span class="n">ZIsNegative</span><span class="p">,</span> <span class="n">YZYYIsNegative</span><span class="p">;</span>
</span><span class='line'>    <span class="n">XMVECTOR</span> <span class="n">S</span><span class="p">,</span> <span class="n">D</span><span class="p">;</span>
</span><span class='line'>    <span class="n">XMVECTOR</span> <span class="n">R0</span><span class="p">,</span> <span class="n">R1</span><span class="p">;</span>
</span><span class='line'>    <span class="n">XMVECTOR</span> <span class="n">Select</span><span class="p">;</span>
</span><span class='line'>    <span class="n">XMVECTOR</span> <span class="n">Zero</span><span class="p">;</span>
</span><span class='line'>    <span class="n">XMVECTOR</span> <span class="n">Result</span><span class="p">;</span>
</span><span class='line'>    <span class="k">static</span> <span class="n">CONST</span> <span class="n">XMVECTORU32</span> <span class="n">Permute1X0X0X0X</span> <span class="o">=</span>
</span><span class='line'>        <span class="p">{</span><span class="n">XM_PERMUTE_1X</span><span class="p">,</span> <span class="n">XM_PERMUTE_0X</span><span class="p">,</span> <span class="n">XM_PERMUTE_0X</span><span class="p">,</span> <span class="n">XM_PERMUTE_0X</span><span class="p">};</span>
</span><span class='line'>    <span class="k">static</span> <span class="n">CONST</span> <span class="n">XMVECTORU32</span> <span class="n">Permute0Y0Z0Y0Y</span> <span class="o">=</span>
</span><span class='line'>        <span class="p">{</span><span class="n">XM_PERMUTE_0Y</span><span class="p">,</span> <span class="n">XM_PERMUTE_0Z</span><span class="p">,</span> <span class="n">XM_PERMUTE_0Y</span><span class="p">,</span> <span class="n">XM_PERMUTE_0Y</span><span class="p">};</span>
</span><span class='line'>
</span><span class='line'>    <span class="n">Zero</span> <span class="o">=</span> <span class="n">XMVectorZero</span><span class="p">();</span>
</span><span class='line'>    <span class="n">Z</span> <span class="o">=</span> <span class="n">XMVectorSplatZ</span><span class="p">(</span><span class="n">V</span><span class="p">);</span>
</span><span class='line'>    <span class="n">YZYY</span> <span class="o">=</span> <span class="n">XMVectorPermute</span><span class="p">(</span><span class="n">V</span><span class="p">,</span> <span class="n">V</span><span class="p">,</span> <span class="n">Permute0Y0Z0Y0Y</span><span class="p">.</span><span class="n">v</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="n">NegativeV</span> <span class="o">=</span> <span class="n">XMVectorSubtract</span><span class="p">(</span><span class="n">Zero</span><span class="p">,</span> <span class="n">V</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="n">ZIsNegative</span> <span class="o">=</span> <span class="n">XMVectorLess</span><span class="p">(</span><span class="n">Z</span><span class="p">,</span> <span class="n">Zero</span><span class="p">);</span>
</span><span class='line'>    <span class="n">YZYYIsNegative</span> <span class="o">=</span> <span class="n">XMVectorLess</span><span class="p">(</span><span class="n">YZYY</span><span class="p">,</span> <span class="n">Zero</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="n">S</span> <span class="o">=</span> <span class="n">XMVectorAdd</span><span class="p">(</span><span class="n">YZYY</span><span class="p">,</span> <span class="n">Z</span><span class="p">);</span>
</span><span class='line'>    <span class="n">D</span> <span class="o">=</span> <span class="n">XMVectorSubtract</span><span class="p">(</span><span class="n">YZYY</span><span class="p">,</span> <span class="n">Z</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="n">Select</span> <span class="o">=</span> <span class="n">XMVectorEqualInt</span><span class="p">(</span><span class="n">ZIsNegative</span><span class="p">,</span> <span class="n">YZYYIsNegative</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="n">R0</span> <span class="o">=</span> <span class="n">XMVectorPermute</span><span class="p">(</span><span class="n">NegativeV</span><span class="p">,</span> <span class="n">S</span><span class="p">,</span> <span class="n">Permute1X0X0X0X</span><span class="p">.</span><span class="n">v</span><span class="p">);</span>
</span><span class='line'>    <span class="n">R1</span> <span class="o">=</span> <span class="n">XMVectorPermute</span><span class="p">(</span><span class="n">V</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">Permute1X0X0X0X</span><span class="p">.</span><span class="n">v</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="n">Result</span> <span class="o">=</span> <span class="n">XMVectorSelect</span><span class="p">(</span><span class="n">R1</span><span class="p">,</span> <span class="n">R0</span><span class="p">,</span> <span class="n">Select</span><span class="p">);</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">return</span> <span class="n">Result</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p><em>Listing 4: XNAMath&#8217;s semi-vectorised method</em></p>

<p>Let me save you the trouble of parsing this fully (or consider it an exercise for later); if you boil it down, what you&#8217;re effectively left with is:</p>

<div markdown="0">$$
\mathbf{\bar{v}} =
\begin{pmatrix}x&#92;&#92;y&#92;&#92;z\end{pmatrix}
\times
\begin{pmatrix}0&#92;&#92;-\mathrm{sign}(yz)&#92;&#92;1\end{pmatrix}
\quad \text{where } \mathrm{sign}(x) = 
\begin{cases}
-1 &\text{if } x < 0 \\
\;\;\:1 &\text{if }x \geq 0
\end{cases}
$$</div>


<p>I&#8217;ve failed, thus far, to pinpoint the origin of or thought process behind this approach. That said, some insight can be gained from visualising the resulting vectors:</p>

<p><img class="center" src="http://blog.selfshadow.com/images/perp-vectors/figure_5.png"></p>

<p style="text-align: center;"><em>Figure 5: Covering one&#8217;s axis</em></p>


<p>Their maximum component is in the x axis, except close to the +/-ve x poles. Essentially, Microsoft&#8217;s solution ensures robustness without concern for distribution, much like the initial &#8216;quick&#8217; approach.</p>

<h2>Performance</h2>

<p>I haven&#8217;t benchmarked these implementations, since in cases where I&#8217;ve needed to generate perpendicular vectors, absolute speed wasn&#8217;t important or the call frequency was vanishingly small. Even in performance-critical situations, it really depends on what properties/restrictions you can live with and your target architecture(s). Still, I can&#8217;t help but think that <code>XMVector3Orthogonal</code> is doing a little bit more than it needs to, so maybe there&#8217;s cause to revisit this subject at a later date.</p>

<h2>Conclusion</h2>

<p>I hope you&#8217;ve learnt something about generating perpendicular vectors, or that I&#8217;ve at least made you aware of some of the minor issues in previous work on the subject. On that note, if you spot any <em>new</em> errors here, please let me know!</p>

<h2>References</h2>

<p>[1] Hughes, J. F., Möller, T., &#8220;Building an Orthonormal Basis from a Unit Vector&#8221;, Journal of Graphics Tools 4:4 (1999), 33-35.<br/>
[2] Stark, M. M., &#8220;Efficient Construction of Perpendicular Vectors without Branching&#8221;, Journal of Graphics Tools 14:1 (2009), 55-61.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Hidden Costs]]></title>
    <link href="http://blog.selfshadow.com/2011/10/01/hidden-costs/"/>
    <updated>2011-10-01T18:56:14-04:00</updated>
    <id>http://blog.selfshadow.com/2011/10/01/hidden-costs</id>
    <content type="html"><![CDATA[<p>I recently added two of my existing publications to the site: one about high-performance <a href="http://blog.selfshadow.com/publications/practical-visibility/">dynamic visibility</a> and the other on how to display <a href="http://blog.selfshadow.com/publications/overdraw-in-overdrive/">pixel quad overshading</a> in real time on Xbox 360.</p>

<p>The first of these was originally published in <a href="http://downloads.akpeters.com/gpupro/">GPU Pro 2</a>. Unfortunately, I missed some errors that crept into the typeset version, so I was pleased to finally correct those and I took the opportunity to rework a few sentences for greater clarity as well. Now that it&#8217;s online, I&#8217;ll also be able to refer directly to certain sections in follow-up blog posts on the subject.</p>

<p>The second took the form of a journal entry for the Microsoft Game Developer Network, which went up in the spring. It may have flown under your radar, as I&#8217;ve since spoken to a few developers who hadn&#8217;t seen it, yet were keen to have such a tool in their engine. For NDA reasons, I can&#8217;t go into all of the implementation details here, so think of it as a &#8216;graphical appetiser&#8217;.</p>

<p>In a way, the two topics are related: the primary goal of a visibility system is to efficiently remove parts of the world that can&#8217;t be seen from a given viewpoint, whereas the purpose of a debug <em>overshading</em> mode is to directly visualise pixel shader work, some of which can likewise have zero contribution to the final image.</p>

<p>I think it&#8217;s also fair to say that keeping both forms of redundancy in check is a critical part of optimising the rendering performance of most AAA titles. For that reason, I hope you find these articles useful, and as always, please let me know what you think!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[HPG/SIGGRAPH 2011]]></title>
    <link href="http://blog.selfshadow.com/2011/08/13/hpg-siggraph-2011/"/>
    <updated>2011-08-13T19:28:41-04:00</updated>
    <id>http://blog.selfshadow.com/2011/08/13/hpg-siggraph-2011</id>
    <content type="html"><![CDATA[<table style="margin-left:auto;margin-right:auto;">
<tr><td><img src="http://blog.selfshadow.com/images/hpg-siggraph-2011/hpg_logo.png"></td>
<td><img src="http://blog.selfshadow.com/images/hpg-siggraph-2011/s11_logo.png"></td></tr>
</table>


<p>Both HPG and SIGGRAPH were a blast and I&#8217;m intending to write up a full report soon, but here are some links to conference content in the meantime. If you have any additional sources, please let me know in the comments section and I&#8217;ll update the post accordingly.<!--more--></p>

<h2>HPG</h2>

<p>Practically all of the slides and posters are available <a href="http://highperformancegraphics.org/program.php">here</a>, including Hot3D presentations. For everything else, see Ke-Sen Huang&#8217;s <a href="http://kesen.realtimerendering.com/hpg2011Papers.htm">list</a>.</p>

<h2>SIGGRAPH</h2>

<p>If you&#8217;re playing catch up and wondering where to start, the <a href="http://www.realtimerendering.com/blog/tag/siggraph-2011/">Real-Time Rendering blog</a> has your back.</p>

<h4>Birds of a Feather</h4>

<p><a href="http://www.blender.org/blenderorg/blender-foundation/press/siggraph-2011">Blender Foundation Community Meeting</a><br/>
<a href="http://code.google.com/p/cortex-vfx/downloads/detail?name=SIGGRAPH_2011_BOF_Slides.pdf">Cortex Open-Source Framework</a><br/>
There are also videos of the BOF <a href="http://www.vimeo.com/cortex/videos">here</a> (via <a href="http://twitter.com/#!/ImageEngine/status/106387320214790145">@ImageEngine</a>)<br/>
<a href="http://www.khronos.org/developers/library/2011-siggraph-opencl-bof">OpenCL</a><br/>
<a href="http://www.khronos.org/developers/library/2011-siggraph-opengl-bof">OpenGL</a><br/>
An annotated version of the <em>Brink Preferred Rendering With OpenGL</em> presentation can be found on the Splash Damage <a href="http://www.splashdamage.com/publications">publications page</a> (via <a href="http://twitter.com/#!/pixelmager/status/106555089736568832">@pixelmager</a>)<br/>
<a href="http://www.khronos.org/webgl/wiki/Presentations">WebGL</a> (slides + demos)</p>

<h4>Courses</h4>

<p><a href="http://advances.realtimerendering.com/s2011/index.html">Advances in Real-Time Rendering in Games</a><br/>
The <em>Battlefield 3</em> and <em>Need For Speed: The Run</em> presentation is also online <a href="http://www.slideshare.net/DICEStudio/five-rendering-ideas-from-battlefield-3-need-for-speed-the-run">here</a> (via <a href="http://twitter.com/#!/repi/status/104302945486639104">@repi</a>)<br/>
<a href="http://www.youtube.com/watch?v=3Qa2aXaUUGg">Applying Color Theory to Digital Media &amp; Animation</a> (video)<br/>
<a href="http://bps11.idav.ucdavis.edu/">Beyond Programmable Shading</a> (via <a href="http://twitter.com/#!/aaronlefohn/status/107447365648121856">@aaronlefohn</a>)<br/>
<a href="http://web.media.mit.edu/~mhirsch/byo3d/">Build Your Own 3D Display</a><br/>
<a href="http://sites.google.com/site/s2011compilers/">Compiler Technology for Rendering</a><br/>
<a href="http://bulletphysics.org/siggraph2011/">Destruction and Dynamics for Film and Game Production</a><br/>
<a href="http://iryoku.com/aacourse/">Filtering Approaches for Real-Time Anti-Aliasing</a><br/>
<a href="http://www.daveshreiner.com/SIGGRAPH/s11/">Introduction to Modern OpenGL Programming</a><br/>
<a href="http://pub.ist.ac.at/group_wojtan/meshyfluidscourse/meshyfluidscourse.html">Liquid Simulation with mesh-based Surface Tracking</a><br/>
Matthias Müller-Fischer&#8217;s slides are also online <a href="http://matthiasmueller.info">here</a><br/>
<a href="http://www.cs.purdue.edu/cgvlab/urban/sg_2011_course/contents.html">Modeling 3D Urban Spaces Using Procedural and Simulation-Based Techniques</a> (via Naty)<br/>
<a href="http://physbam.stanford.edu/~mlentine/courses.html">PhysBAM: Physically Based Simulation</a><br/>
<a href="http://magnuswrenninge.com/productionvolumerendering">Production Volume Rendering</a><br/>
<a href="http://developer.nvidia.com/siggraph-2011-stereoscopy-course">Stereoscopy From XY to Z</a></p>

<h4>Emerging Technologies</h4>

<p><a href="http://www.mpi-inf.mpg.de/resources/brdfDisplay/">A Dynamic BRDF Display</a> (related EG paper)<br/>
<a href="http://affect.media.mit.edu/publications.php">A Medical Mirror for Non-Contact Health Monitoring</a><br/>
<a href="http://www.viddler.com/explore/engadget/videos/3035/">Face-to-Avatar</a> (video)<br/>
<a href="http://www.designinterface.jp/projects/FuwaFuwa/">FuwaFuwa: Detecting Shape Deformations on Soft Objects Using Directional Photo-reflectivity Measurements</a><br/>
<a href="http://www.labri.fr/perso/hachet/publications/Toucheo.html">Immersive Multitouch Workspace</a><br/>
<a href="http://www.kosaka-lab.com/kosaka_laboratory/2011/08/mommytummy-2.php">Mommy Tummy: A Pregnancy Experience System Simulating Fetal Movement</a><br/>
<a href="http://paulusch.nl/2011/08/11/molebot-mole-table/">MoleBot: Mole in a Table</a><br/>
<a href="http://vimeo.com/27454215">Photochromic Sculpture: Volumetric Color-Forming Pixels</a> (video)<br/>
<a href="http://www.youtube.com/watch?v=eTY2hTXT1IQ">PocoPoco: A Tangible Device That Allows Users To Play Dynamic Tactile Interaction</a> (video)<br/>
<a href="http://lakatosdavid.hu/?p=297">Recompose: Direct and Gestural Interaction With an Actuated Surface</a><br/>
<a href="http://www.disneyresearch.com/research/projects/hci_surround_haptics_drp.htm">Surround Haptics: Sending Shivers Down Your Spine</a><br/>
<a href="http://www.irisa.fr/bunraku/GENS/alecuyer/">The Virtual Crepe Factory: 6DoF Haptic Interaction with Fluids</a><br/>
<a href="http://www.alab.t.u-tokyo.ac.jp/~shinolab/projects/touchinterfaceonbackofhand/">Touch Interface on Back of the Hand</a><br/>
<a href="http://kaji-lab.jp/en/index.php?people/hachisu/publications">Vection Field for Pedestrian Traffic Control</a></p>

<h4>Exhibitor Tech Talks</h4>

<p><a href="http://software.intel.com/en-us/articles/siggraph-2011-event/">Intel</a><br/>
<a href="http://www.nvidia.com/object/siggraph-2011.html">NVIDIA</a> (via <a href="http://twitter.com/#!/Icare3D/status/101417196911206400">@Icare3D</a>)<br/>
You can also find streamable videos <a href="http://fullviewmedia.com/fb/nvidia/archive/video.html">here</a> (via Naty)</p>

<h4>Keynote</h4>

<p>Although Cory Doctorow&#8217;s keynote isn&#8217;t about rendering as such, the topic of digital rights is still highly relevant to the videogame industry, so <a href="http://www.youtube.com/watch?v=hfU6e6--izo">here it is</a> (via Naty).</p>

<h4>Posters</h4>

<p><a href="http://av.dfki.de/publications_2011/3d-shape-scanning-with-a-kinect">3D Shape Scanning With a Kinect</a><br/>
<a href="http://hal.inria.fr/inria-00617857/en/">3D Inverse Dynamic Modeling of Strands</a><br/>
<a href="http://tcts.fpms.ac.be/~tilmanne/">Adaptive Training of Hidden Markov Models for Stylistic Walk Synthesis</a><br/>
<a href="http://uwaterloo.academia.edu/KarenCollins/Papers/650300/Framework_for_distributed_audio_smartphone_games">A Framework For Distributed-Audio Smartphone Games</a><br/>
<a href="http://blogs.agi.com/agi/?p=2456">A Screen-Space Approach to Rendering Polylines on Terrain</a><br/>
<a href="http://ligiaduro.com/8666/83310/gallery/abstract-ocean-waves">Abstract Ocean Waves</a> (gallery)<br/>
<a href="http://w3.impa.br/~aschulz/ChoreoGraphics/index.html">ChoreoGraphics: An Authoring Environment for Dance Shows</a><br/>
<a href="http://www.ece.lsu.edu/xinli/Research/Guarding.html">Computing Optimal Guarding and Star Decomposition of 3D Models</a><br/>
<a href="http://iphome.hhi.de/schneider/">Deshaking Endoscopic Video for Kymography</a><br/>
<a href="http://image.inha.ac.kr/index.php/Publications">Design and Optimization of Image-Processing Algorithms on Mobile GPU</a><br/>
Dual Sphere-Unfolding Method for Single Pass Omni-directional Shadow Mapping (<a href="http://www.cimat.mx/~alberto/Paper.pdf">pdf</a>, <a href="http://www.youtube.com/watch?v=WnNOMTDmYTg">video</a>)<br/>
<a href="http://www.cgl.uwaterloo.ca/poster.html">Embroidery Modeling and Rendering in Real Time</a><br/>
<a href="http://wanochoi.com/?p=663">Fluid Simulation Without Pressure</a><br/>
<a href="http://zurich.disneyresearch.com/~owang/">Gradient Domain HDR Compositing</a><br/>
<a href="http://www.cs.umass.edu/~ruiwang/">Hierarchical Upsampling for Fast Image-based Depth Estimation</a><br/>
<a href="http://graphics.tu-bs.de/publications/kinectVVC/">Integrating Multiple Depth Sensors Into the Virtual Video Camera</a><br/>
<a href="http://www.cl.cam.ac.uk/~ls426/projects/">Layered Photo Pop-Up</a><br/>
<a href="http://www.jku.at/cg/content/e48343/">Light-Field Caching</a><br/>
<a href="http://www.jku.at/cg/content/e48343/">Light-Field Retargeting With Focal-Stack Seam Carving</a><br/>
<a href="http://www.xlab.sfc.keio.ac.jp/?page_id=306">Metamorphic Light: A Tabletop Tangible Interface Using Deformation of Plain Paper</a><br/>
<a href="http://www.drunk-boarder.com/works/meta-ryoshka/">Meta-Ryoshka: Haptic Illusion on Perceiving Shape</a><br/>
<a href="http://hal.inria.fr/inria-00611915/en">Multiscale Feature-Preserving Smoothing of Tomographic Data</a><br/>
<a href="http://www.rioleo.org/protoviewer/">Protoviewer: A Visual Design Environment for Protovis</a> (demo)<br/>
<a href="http://onnote.org/publication.html">onNote: A Musical Interface Using Markerless Physical Scores</a><br/>
<a href="http://www.sdm.ssi.ist.hokudai.ac.jp/unofficialpapers/mizo_siggraph11poster.pdf">Parts Identification and Motion Estimation on CT Scanned Assembly Meshes</a> (pdf)<br/>
<a href="http://aun.academia.edu/MohamedYousef/Papers/833944/ParXII_Optimized_Data-Parallel_Exemplar-Based_Image_Inpainting">ParXII: Optimized, Data-Parallel Exemplar-Based Image Inpainting</a><br/>
<a href="http://www.cyber.t.u-tokyo.ac.jp/~take/works/prima">PRIMA (Parallel Reality-based Interactive Motion Area)</a><br/>
<a href="http://w3.impa.br/~andmax/publications.html">Real-time Terrain Modeling using CPU-GPU Coupled Computation</a><br/>
<a href="http://graphics.tu-bs.de/people/berger/">Refractive Index-Dependent Bidirectional Scattering Distribution Functions</a><br/>
<a href="http://sherholz.wordpress.com/publications/">Screen-Space Spherical Harmonics Occlusion (S3HO) Sampling</a><br/>
<a href="http://www.cgl.uwaterloo.ca/poster.html">Self-Organized Criticality as a Method of Procedural Modeling</a><br/>
<a href="http://userver.ftw.at/~pucher/">Simultaneous Speech and Animation Synthesis</a><br/>
<a href="http://vimeo.com/26255322">SonalShooter: A Spatial Augmented Reality System Using Handheld Directional Speaker with Camera</a><br/>
<a href="http://graphics.im.ntu.edu.tw/~dreamway/projects/stereoscopy/">Stereoscopic Media Editing Based on 3D Cinematography Principles</a><br/>
<a href="http://sites.google.com/a/onailab.com/yamamoto/">Synthesis of a Video of a Performer Appearing to Play User-specified Music</a><br/>
<a href="http://www.cs.ucsb.edu/~daniel/publications/conferences/siggraph11/index.html">The Composition Context in Point-and-Shoot Photography</a><br/>
<a href="http://www.rug.nl/cit/hpcv/publications/watershader/index">Tiled Directional Flow</a> (videos, code)<br/>
<a href="http://www.alab.t.u-tokyo.ac.jp/~shinolab/projects/touchinterfaceonbackofhand/">Touch Interface on Back of the Hand</a><br/>
<a href="http://www.freeviewpointvideo.co.uk/Publications/tacvsl.php">Towards a Computer Vision Shader Language</a><br/>
<a href="http://www.j3l7h.de/publications.html">Turning a Graphics Tablet Into a Transparent Blackboard</a><br/>
<a href="http://graphics.cs.yale.edu/patrick/">Using Statistical Topic Models to Organize and Visualize Large-Scale Architectural Image Databases</a><br/>
<a href="http://www.youtube.com/watch?v=KhijiHo9JH0">VITA: Visualization System for Interaction With Transmitted Audio Signals</a> (video)<br/>
<a href="http://www.cs.uoi.gr/~fudos/siggraph2011.html">Z-fighting Aware Depth Peeling</a></p>

<h4>Production Sessions</h4>

<p><a href="http://www.guerrilla-games.com/publications/index.html">Guerrilla: The Creation of Killzone 3</a> (via Naty)</p>

<h4>Talks</h4>

<p><a href="http://www.neulander.org/work/">Adaptive Importance Sampling for Multi-Ray Gathering</a><br/>
<a href="http://students.viz.tamu.edu/qingxing/">Band Decomposition of 2-Manifold Meshes For Physical Construction of Large Structures</a><br/>
<a href="http://magnuswrenninge.com/publications/attachment/wrenninge-capturingthinfeaturesinsmokesimulations-2">Capturing Thin Features In Smoke Simulations</a><br/>
<a href="http://mrl.snu.ac.kr/~ejjoo/">Data-driven Bird Simulation</a><br/>
<a href="http://library.imageworks.com/">Decoupled Ray Marching of Heterogeneous Participating Media</a><br/>
<a href="http://faculty.kaust.edu.sa/sites/MarkusHadwiger/Pages/home.aspx">Demand-Driven Volume Rendering of Terascale EM Data</a> (slides)<br/>
<a href="http://www.jku.at/cg/content/e48343">Display Pixel Caching</a><br/>
<a href="http://gl.ict.usc.edu/Research/FC/">Facial Cartography: Interactive High-Resolution Scan Correspondence</a><br/>
<a href="http://altdevblogaday.com/2011/08/26/pixeljunk-shooter-2-siggraph-talk/">Fluid Dynamics and Lighting Implementation in PixelJunk Shooter 2</a> (video presentation and slides) (via <a href="http://twitter.com/#!/okonomiyonda/status/107115998414520321">@okonomiyonda</a>)<br/>
<a href="http://developer.nvidia.com/siggraph-2011">Generating Displacement From Normal Map for Use in 3D Games</a><br/>
<a href="http://docs.google.com/present/view?id=d4wf4t2_251g4kjtwgs">Google Body: 3D Human Anatomy in the Browser</a> (via Naty)<br/>
<a href="http://baileydan.com/">GPU Fluids in Production: A Compiler Approach to Parallelism</a><br/>
<a href="http://pismosoftware.co.uk/mashhuda/publications.htm">High-Resolution Relightable Buildings from Photographs</a><br/>
<a href="http://www-ljk.imag.fr/Publications/Basilic/com.lmc.publi.PUBLI_Inproceedings@12f67a0f733_f4cc23/index_en.html">Implicit FEM and Fluid Coupling on GPU for Interactive Multiphysics Simulation</a><br/>
<a href="http://library.imageworks.com/">Importance Sampling of Area Lights in Participating Media</a><br/>
<a href="http://cs.unc.edu/~sewall/">Interactive Hybrid Simulation of Large-Scale Traffic</a><br/>
<a href="http://artis.imag.fr/Publications/2011/CNSGE11a/">Interactive Indirect Illumination Using Voxel Cone Tracing: An Insight</a> (full paper <a href="http://artis.imag.fr/Publications/2011/CNSGE11b/">here</a>)<br/>
<a href="http://library.imageworks.com/">Kami Geometry Instancer: Putting the &#8220;Smurfy&#8221; in Smurf Village</a><br/>
<a href="http://www.youtube.com/watch?v=quGhaggn3cQ">KinectFusion: Real-Time Dynamic 3D Surface Reconstruction and Interaction</a> (video)<br/>
<a href="http://cybertron.cg.tu-berlin.de/eitz/">Learning to Classify Human Object Sketches</a><br/>
<a href="http://bpeers.com/blog/index.php?itemid=972">Making Faces - Eve Online&#8217;s New Portrait Rendering</a> (via <a href="http://twitter.com/#!/tuan_kuranes_rs/status/102717112551866369">@tuan_kuranes_rs</a>)<br/>
<a href="http://hdrv.itn.liu.se/papers/2011-SiggraphTalk.pdf">Next Generation Image Based Lighting using HDR Video</a> (pdf)<br/>
<a href="http://home.postech.ac.kr/~sodomau/">Non-uniform Motion Deblurring for Camera Shakes using Image Registration</a><br/>
<a href="http://www.clownfrogfish.com/2011/10/11/occlusion-culling-in-alan-wake/">Occlusion Culling in Alan Wake</a> (new link, via Naty)<br/>
<a href="http://garanzha.com">Out-of-core GPU Ray Tracing of Complex Scenes</a><br/>
<a href="http://perso.telecom-paristech.fr/~boubek/papers/TCoCo-talk/">Parameterizing Animated Lines for Stylized Rendering</a> (links to NPAR paper)<br/>
<a href="http://students.viz.tamu.edu/qingxing/">Pattern Mapping with Quad-Pattern-Coverable Quad-Meshes</a><br/>
<a href="http://developer.nvidia.com/siggraph-2011">Per-Face Texture Mapping for Real-time Rendering</a> (see &#8220;Real-time Ptex&#8221;)<br/>
<a href="Practical%20Occlusion%20Culling%20in%20Killzone%203">Practical Occlusion Culling in Killzone 3</a> (via Naty)<br/>
<a href="http://www.cs.utah.edu/~loos/publications/">Run-Time Implementation of Modular Radiance Transfer</a> (slides, <a href="http://youtu.be/8vMFwXMR3Dg">video</a>)<br/>
<a href="http://perso.telecom-paristech.fr/~boubek/papers/SBL/">SBL Mesh Filter: A Fast Separable Approximation of Bilateral Mesh Filtering</a><br/>
<a href="http://vimeo.com/23469574">SpeedFur: A GPU-Based Procedural Hair and Fur Modeling System</a> (video)<br/>
<a href="http://www.crytek.com/cryengine/presentations">Spherical Skinning with Dual-Quaternions and QTangents</a> (via <a href="http://twitter.com/#!/NIV_Anteru/status/108491502350635009">@NIV_Anteru</a>)<br/>
<a href="http://gautron.pascal.free.fr/publications.htm">Triple Depth Culling</a><br/>
<a href="http://www.youtube.com/watch?v=A_3bQsO4nFA">Who Do You Think You Really Are</a> (trailer)</p>

<h4>Technical Papers</h4>

<p>As always, Ke-Sen Huang has tirelessly gathered links to all of the available technical papers and associated media <a href="http://kesen.realtimerendering.com/sig2011.html">here</a>. There&#8217;s also a <a href="http://www.youtube.com/watch?v=JK9EEE3RsKM">video preview</a> online, as well as a <a href="http://www.siggraph.org/s2011/sites/org.s2011/files/papers-first-pages-siggraph-2011.pdf">PDF</a> (warning: 184MB) containing the first page of every paper. (Via the <a href="http://www.realtimerendering.com/blog/some-info-on-the-siggraph-2011-papers/">Real-Time Rendering blog</a>.)</p>

<h4>The Studio</h4>

<p><a href="http://aras-p.info/blog/2011/08/17/fast-mobile-shaders-or-i-did-a-talk-at-siggraph/">Fast Mobile Shaders</a> (originally <em>How to Write Fast iPhone and Android Shaders in Unity</em>)<br/>
Follow <a href="http://blogs.unity3d.com/2011/08/18/fast-mobile-shaders-talk-at-siggraph/">this link</a> for further Q&amp;A.<br/>
<a href="http://www.chrisevans3d.com/pub_blog/?p=724">Introduction to Python Scripting</a> (via Naty)<br/>
<a href="http://blogs.unity3d.com/2011/09/08/special-effects-with-depth-talk-at-siggraph/">Special Effects With Depth</a> (via <a href="http://twitter.com/#!/kubacupisz/status/111894837858537472">@kubacupisz</a>, Naty)</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Specular Showdown in the Wild West]]></title>
    <link href="http://blog.selfshadow.com/2011/07/22/specular-showdown/"/>
    <updated>2011-07-22T10:45:34-04:00</updated>
    <id>http://blog.selfshadow.com/2011/07/22/specular-showdown</id>
    <content type="html"><![CDATA[<p><img class="center" src="http://blog.selfshadow.com/images/showdown/goodbaduglyset.jpg"></p>

<p style="text-align: center;"><em>&#8220;You see, in this world there&#8217;s two kinds of people, my friend:<br>Those with loaded guns and those who dig. You dig.&#8221; - Blondie</em></p>


<h2>Saddle Up!</h2>

<p>In this post, I&#8217;ll be reviewing some existing methods for attaining well-behaved specular lighting. I&#8217;ll also cover a simple twist on these that fits better with current game lighting approaches and console memory constraints.</p>

<p>What do I mean by <em>well-behaved</em>? I&#8217;m talking about avoiding specular highlight <em>shimmering</em> on bumpy surfaces, as well as achieving the right appearance in the distance: the combined effect of these bumps as individual wrinkles and irregularities become too small to make out. Can we do all of this on a budget? Let&#8217;s hit the trail and find out!<!--more--></p>

<h2>The Good, the Bad and the Ugly</h2>

<p>Some days it feels like those of us involved in videogame rendering are a bunch of cowboys: we play fast and loose with the laws of the land as we rush to get things up on screen in time, wrangling pixels along the way to produce the look that we (and our artists) want.</p>

<p>Lately we&#8217;ve been righting some wrongs by adopting linear lighting [1] and physically-based shading models [2] - something even our more civilised neighbours in film have only recently been transitioning to (see other presentations from the same course). That&#8217;s all well and good, but before we get ahead of ourselves and start believing that the Wild West days are over, there&#8217;s another major area that we need to be tackling better: <em>aliasing</em>.</p>

<p>As Dan Baker rightly argues [3], aliasing is one key differentiator between us real-time folk and the offline guys. Whilst they can afford to throw more samples at the problem, heavy supersampling would be too slow for us. (Admittedly, I haven&#8217;t tried [4] in production, but I&#8217;m not expecting it to perform well on current consoles.) MSAA is also ineffective as it only handles edge aliasing, so under-sampling artefacts within shading will remain - specular shimmering being a prime example. Post-process AA (for which there are now many potential options [5]) isn&#8217;t really helpful either since it does nothing to address sharp highlights popping in and out of existence as the camera or objects move. Finally, you might think that temporal AA could be a solution, but that&#8217;s really just a poor man&#8217;s supersampling across frames, with added reliance on temporal coherency [6].</p>

<p>In summary, the standard AA techniques we use in games are pretty hopeless for combating shimmering - particularly for higher specular powers - and none of them achieve the distance behaviour that we want either. Performance aside, I strongly suspect that even supersampling falls short in that case unless an obscene number of samples are used together with custom texture filtering; otherwise, bump information will be averaged away. (Well, not entirely; more on this later.)</p>

<p>So, what other options do we have? At an earlier conference [7], Dan covered two workarounds commonly employed by developers: scaling down bumpiness or glossiness in the distance. The first is really wrong though, as it gives us the opposite of what we want: rather than a bumpy surface looking duller when further from the camera, flattening the normal map leads to a <em>more</em> glossy appearance! Instead, reducing the specular power is - to a first approximation - the right thing to do. Although it&#8217;s something that needs to be tweaked on a case-by-case basis and doesn&#8217;t work correctly for normal maps with both bumpy and flat areas, it&#8217;s still better than simply living with aliasing or avoiding high powers altogether. I should add that texture-space lighting also gets a mention, but it&#8217;s another heavyweight alternative with its own set of problems, so I won&#8217;t discuss it further here. (The idea of marrying this with a dynamic virtual texture cache boggles my mind though!)</p>

<p>All told, are we resigned to being ransacked by badly behaving specular and ugly shimmering? Maybe not, as there&#8217;s a new sheriff in town&#8230;</p>

<h2>CLEANing up the Streets</h2>

<p><em>LEAN Mapping</em> [8] is a recent approach for robust filtering of specular highlights across all scales (with the possible exception of magnification - more on this in a bit). Not only does it model the macro effect of surface roughness in the distance really well - even generating anisotropic highlights from ridged normal maps - but it also supports combining layers of dynamic bumps at runtime (for a few dollars more, naturally).</p>

<p>There isn&#8217;t really the space to go into the gritty details of how it works - for that, you can check the references - but it&#8217;s definitely at the more practical end of the solution spectrum compared to many previous techniques. It shipped with Civilization V after all, and from my experience so far the results are impressive.</p>

<p>That said, there are some significant roadblocks preventing immediate, widespread adoption:</p>

<ul>
<li>Heavy storage demands</li>
<li>Anisotropic, tangent-space Beckmann formulation</li>
</ul>


<p>Off the bat, memory requirements will be a limiting factor for many developers. Not only does LEAN Mapping need two textures in place of a standard normal map (and maybe more to combine layers), but the overhead is compounded by pesky precision requirements, for similar reasons as Variance Shadow Maps [9]. Firaxis did manage to squeeze things down to 8-bit per-channel storage in some cases, but the paper advises: &#8220;In general, 8-bit textures only make sense if absolutely needed for speed or space&#8221;.</p>

<p>Unsurprisingly, this rules out DXT1 or DXT5, which are two of the most common cross-platform formats for normal maps on current consoles. By comparison, we could be facing at least 8 times the storage cost (possibly less if the normal is recovered from the core LEAN terms and other data is packed in its place). Yowzers!</p>

<p>Things get even rougher for deferred rendering, where G-buffer space is typically a scarce commodity and, to make matters worse, LEAN Mapping&#8217;s tangent-space lighting formulation further complicates things. Even assuming that everything could be moved to a common space (such as world space), we would still need to store an additional tangent vector! There&#8217;s also more maths involved with LEAN Mapping, so that&#8217;s another important consideration regardless of how we choose to light our environments.</p>

<p>So, overall, it&#8217;s certainly not the drop-in replacement that it might first appear. Fortunately the sheriff has a deputy who&#8217;s quicker on the draw: at this year&#8217;s GDC, Dan showed off a cut-down version, <em>CLEAN</em> (Cheap LEAN) <em>Mapping</em> [3], that sacrifices anisotropy for lower storage requirements - roughly half - and slightly higher performance. That&#8217;s a significant improvement and losing anisotropy effectively resolves the second issue as well, but the footprint over DXT is still rather steep for my taste.</p>

<p>What I&#8217;d really like is something that can be applied liberally without having to seriously &#8216;re-evaluate&#8217; art budgets (and by that, I of course mean making cuts elsewhere). Don&#8217;t get me wrong, I think that (C)LEAN Mapping is a really exciting advance, I just don&#8217;t expect to see it being used with wild abandon on current generation consoles as things stand. Still, it&#8217;s a great option to keep in mind for specific situations, with Civ 5&#8217;s water serving as a case in point.</p>

<h2>We Need To Go <em>Cheaper</em></h2>

<p>Is there anything out there that could give us more bang for our buck? Actually, way back in 2004, Michael Toksvig presented a beautifully simple technique [10] that - much like (C)LEAN Mapping - takes advantage of MIP-mapping and hardware texture filtering, but estimates bump variance directly from the lengths of the averaged normal vectors stored in an existing normal map. Since it doesn&#8217;t have a catchy name as such, I&#8217;ll refer to it as <em>Toksvig AA</em>.</p>

<p>From the paper, the original Blinn-Phong formulation is:</p>

<p>$$spec = \frac{1+f_ts}{1 + s}\left(\frac{N_a.H}{|N_a|}\right)^{f_ts} \text{, where } f_t = \frac{|N_a|}{|N_a| + s(1 - |N_a|)}$$</p>

<p>$N_a$ is the averaged normal read from the texture, from which we calculate the so-called <em>Toksvig Factor</em>, $f_t$. This is then used to modulate the specular exponent, $s$, and the overall intensity depending on the variance (roughness).</p>
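
<p>To get a feel for the numbers, here&#8217;s a quick worked example with values picked purely for illustration: a filtered normal of length $|N_a| = 0.9$ and an exponent of $s = 100$.</p>

<p>$$f_t = \frac{0.9}{0.9 + 100(1 - 0.9)} = \frac{0.9}{10.9} \approx 0.083 \text{, so } f_ts \approx 8.3$$</p>

<p>In other words, a 10% shortening of the normal is enough to knock the effective exponent down by an order of magnitude, whereas a unit-length normal gives $f_t = 1$ and leaves $s$ untouched.</p>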

<p>Here&#8217;s the same thing in code, with some minor tweaks:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float</span> <span class="n">len</span> <span class="o">=</span> <span class="nb">length</span><span class="p">(</span><span class="n">Na</span><span class="p">);</span>
</span><span class='line'><span class="kt">float</span> <span class="n">ft</span> <span class="o">=</span> <span class="n">len</span><span class="o">/</span><span class="nb">lerp</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>
</span><span class='line'><span class="kt">float</span> <span class="n">scale</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">ft</span><span class="o">*</span><span class="n">s</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">s</span><span class="p">);</span>
</span><span class='line'><span class="kt">float</span> <span class="n">spec</span> <span class="o">=</span> <span class="n">scale</span><span class="o">*</span><span class="nb">pow</span><span class="p">(</span><span class="nb">saturate</span><span class="p">(</span><span class="nb">dot</span><span class="p">(</span><span class="n">Na</span><span class="p">,</span> <span class="n">H</span><span class="p">))</span><span class="o">/</span><span class="n">len</span><span class="p">,</span> <span class="n">ft</span><span class="o">*</span><span class="n">s</span><span class="p">);</span>
</span></code></pre></td></tr></table></div></figure>


<p>In the original version, all of this is stored in a 2D LUT that&#8217;s indexed by <code>dot(Na, H)</code> and <code>dot(Na, Na)</code>, in which case the <code>CLAMP</code> texture address mode would take care of the <code>saturate()</code> for us. Whereas using a LUT made more sense back in the days of Pixel Shader version 2.0 and lower, we can just evaluate things directly in the shader, which removes the need for a LUT per specular exponent and thus sidesteps potential cache and precision issues.</p>

<p>The first part of the equation, $\frac{1+f_ts}{1 + s}$, also looks a bit bogus since it&#8217;s only handling energy conservation in one direction and I&#8217;m pretty sure that those 1s should really be 2s [2]. Treating Blinn-Phong as a <em>Normal Distribution Function</em> (NDF) [2], what I suspect you really want is:</p>

<p>$$spec = \frac{p + 2}{8}\left(N \cdot H\right)^{p} \text{, where } N = \frac{N_a}{|N_a|} \text{ and } p = f_ts$$</p>

<p>For those of you already using energy conserving Blinn-Phong, this should look familiar and also quite elegant: you&#8217;re just scaling the specular exponent and then everything <em>just works</em>.</p>
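<p>To make the exponent-scaling idea concrete, here is a minimal Python sketch (a CPU-side sanity check rather than shader code) that evaluates the normalised Blinn-Phong term from an unnormalised, filtered normal. The function names <code>toksvig_factor</code> and <code>blinn_phong_ndf</code> are illustrative and not part of the original code; <code>na</code>, <code>h</code> and <code>s</code> mirror <code>Na</code>, <code>H</code> and <code>s</code> above, and <code>na</code> is assumed to be non-zero.</p>

```python
import math

def toksvig_factor(len_na, s):
    """ft = |Na| / lerp(s, 1, |Na|), matching the Toksvig AA snippet."""
    # HLSL lerp(a, b, t) = a + t*(b - a), so lerp(s, 1, len) = s*(1 - len) + len.
    return len_na / (s * (1.0 - len_na) + len_na)

def blinn_phong_ndf(na, h, s):
    """Normalised Blinn-Phong with the exponent scaled by the Toksvig factor."""
    len_na = math.sqrt(sum(c * c for c in na))   # assumed non-zero
    n = [c / len_na for c in na]                 # N = Na / |Na|
    p = toksvig_factor(len_na, s) * s            # p = ft * s
    n_dot_h = max(0.0, min(1.0, sum(a * b for a, b in zip(n, h))))  # saturate
    return (p + 2.0) / 8.0 * n_dot_h ** p
```

<p>With a unit-length normal aligned with <code>h</code> and <code>s = 100</code>, this gives (100 + 2)/8 = 12.75. Shortening the filtered normal to a length of 0.5 drops the effective exponent to just under 1 and the peak to roughly 0.37.</p>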

<p>This may be obvious, but for deferred rendering you&#8217;ll want to do this during your initial scene pass, as then you&#8217;re scaling just the once before packing the exponent into the G-buffer, rather than for every light. It also means that you&#8217;re still free to use clever G-buffer encodings of normals, such as BFN [11] or two components in view space [12].</p>

<p>Unfortunately, this brings us on to a major problem with Toksvig AA: you can&#8217;t use it with two-component <em>input</em> normals, such as 3Dc and (typically) DXT5. The whole basis of the technique is that local roughness (divergent normals) is approximately captured by the length of the filtered normal, so it&#8217;s no good trying to do this with encodings that reconstruct a unit vector! Furthermore, even if we use DXT1 exclusively for normal maps, compression will still mess with vector length. It&#8217;s easy to forget this since we often get acceptable results for lighting after renormalisation, but it can really play havoc with Toksvig AA. Given all that, it&#8217;s not surprising that the technique hasn&#8217;t been picked up by developers.</p>

<p>Was this yet another dead end? Actually, no, as it&#8217;s brought us a bit closer to a solution.</p>

<h2>The Wild Hunch</h2>

<p>Here&#8217;s an embarrassingly simple idea: Toksvig and LEAN Mapping exploit texture filtering, so why don&#8217;t we do this offline? Let&#8217;s see what happens if we take our normal map (Figure 1) and pre-compute the <em>Toksvig Factors</em> from a gaussian-filtered version of each MIP level.</p>

<p><a href="http://blog.selfshadow.com/images/showdown/bump.png"><img class="center" src="http://blog.selfshadow.com/images/showdown/bump_300.png"></a></p>

<p style="text-align: center;"><em>Figure 1: Normal map</em></p>


<p>What we end up with is shown in Figure 2. The results are immediately intuitive: areas in the original normal map that were flat are white (glossy), whereas noisy, bumpy sections are darker.</p>

<p><a href="http://blog.selfshadow.com/images/showdown/toksmap.png"><img class="center" src="http://blog.selfshadow.com/images/showdown/toksmap_300.png"></a></p>

<p style="text-align: center;"><em>Figure 2: Toksvig map</em></p>
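<p>The baking step for a single MIP level might be sketched as follows in plain Python. This is only an illustration under several assumptions that the post does not specify: the normal map is a list of rows of unit-length <code>(x, y, z)</code> tuples, borders are clamped rather than wrapped, and the names <code>gaussian_kernel</code>, <code>bake_toksvig_map</code>, <code>radius</code> and <code>sigma</code> are hypothetical. A real pipeline would repeat this for every MIP level and tune the kernel size.</p>

```python
import math

def gaussian_kernel(radius, sigma):
    """Normalised 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def bake_toksvig_map(normals, s, radius=1, sigma=1.0):
    """Return a 2D list of Toksvig Factors for one MIP level of unit normals."""
    height, width = len(normals), len(normals[0])
    k = gaussian_kernel(radius, sigma)
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = [0.0, 0.0, 0.0]
            for j in range(-radius, radius + 1):
                yy = min(max(y + j, 0), height - 1)      # clamp (assumption)
                for i in range(-radius, radius + 1):
                    xx = min(max(x + i, 0), width - 1)   # clamp (assumption)
                    w = k[j + radius] * k[i + radius]    # separable 2D weight
                    n = normals[yy][xx]
                    for c in range(3):
                        acc[c] += w * n[c]
            # The length of the filtered normal encodes local roughness.
            length = min(math.sqrt(sum(c * c for c in acc)), 1.0)
            out[y][x] = length / (s * (1.0 - length) + length)
    return out
```

<p>Flat regions keep a filtered length of 1 and therefore a factor of 1 (white), while regions whose normals diverge within the kernel footprint produce a shorter filtered normal and a darker value.</p>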


<p>With the right kernel size, this <em>Toksvig Map</em> matches up really well against the run-time Toksvig method, which you can see for yourself with the demo at the end. It&#8217;s even better under magnification (compare Figure 3a and 3b), as Toksvig AA shows blockiness due to the discontinuous nature of bilinear filtering - cubic interpolation would fix this. (C)LEAN mapping suffers from the same kind of artefacts, but we don&#8217;t get this with the baked version because we&#8217;ve pre-filtered with a gaussian.</p>

<p><a href="http://blog.selfshadow.com/images/showdown/mag_toksvig.png"><img class="center" src="http://blog.selfshadow.com/images/showdown/mag_toksvig_300.png"></a></p>

<p style="text-align: center;"><em>Figure 3a: Toksvig AA under magnification</em></p>


<p><a href="http://blog.selfshadow.com/images/showdown/mag_toksmap.png"><img class="center" src="http://blog.selfshadow.com/images/showdown/mag_toksmap_300.png"></a></p>

<p style="text-align: center;"><em>Figure 3b: Toksvig map under magnification</em></p>


<p>What we effectively have here is an <em>auto-generated anti-aliasing gloss map</em>, and unlike the (C)LEAN terms, it compresses really well! Also, if this map is generated as part of the art import pipeline, then we&#8217;re free to use high precision three-component normals, prior to the compression method of our choosing. So, with this straightforward change, we&#8217;ve overcome the two big obstacles of memory consumption and precision.</p>

<p>Using this map is trivial as it&#8217;s just like any other gloss map:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="kt">float</span> <span class="n">ft</span> <span class="o">=</span> <span class="nb">tex2D</span><span class="p">(</span><span class="n">gloss_map</span><span class="p">,</span> <span class="n">uv</span><span class="p">).</span><span class="n">x</span><span class="p">;</span>
</span><span class='line'><span class="kt">float</span> <span class="n">p</span> <span class="o">=</span> <span class="n">ft</span><span class="o">*</span><span class="n">s</span><span class="p">;</span>
</span><span class='line'><span class="kt">float</span> <span class="n">scale</span> <span class="o">=</span> <span class="p">(</span><span class="n">p</span> <span class="o">+</span> <span class="mi">2</span><span class="p">)</span><span class="o">/</span><span class="mi">8</span><span class="p">;</span>
</span><span class='line'><span class="kt">float</span> <span class="n">spec</span> <span class="o">=</span> <span class="n">scale</span><span class="o">*</span><span class="nb">pow</span><span class="p">(</span><span class="nb">saturate</span><span class="p">(</span><span class="nb">dot</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">H</span><span class="p">)),</span> <span class="n">p</span><span class="p">);</span>
</span></code></pre></td></tr></table></div></figure>


<p>However, the observant reader will notice that the Toksvig Map has been generated with a particular specular power in mind. Fortunately, you can bake with a fixed power that gives reasonable contrast - around 100 seems to work well - and then convert later if you need that flexibility (e.g. for texture reuse or material property changes):</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='hlsl'><span class='line'><span class="n">ft</span> <span class="o">/=</span> <span class="nb">lerp</span><span class="p">(</span><span class="n">s</span><span class="o">/</span><span class="n">fixed_s</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">ft</span><span class="p">);</span>
</span></code></pre></td></tr></table></div></figure>
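<p>One way to see why this conversion works: a factor baked with <code>fixed_s</code> is $f_0 = 1/(1 + s_0\sigma^2)$, so $\sigma^2 = (1 - f_0)/(f_0 s_0)$, and substituting that into $1/(1 + s\sigma^2)$ gives $f_0/\big(\tfrac{s}{s_0}(1 - f_0) + f_0\big)$, which is the <code>lerp</code> expression above. The short Python check below confirms this numerically; the function names are illustrative only.</p>

```python
def toksvig_from_variance(variance, s):
    """Toksvig Factor for a given variance and specular power."""
    return 1.0 / (1.0 + s * variance)

def convert_toksvig(ft_baked, s, fixed_s):
    """Re-target a factor baked with fixed_s to another power s.

    Mirrors the shader line: ft /= lerp(s/fixed_s, 1, ft).
    """
    ratio = s / fixed_s
    # lerp(ratio, 1, ft) = ratio*(1 - ft) + ft
    return ft_baked / (ratio * (1.0 - ft_baked) + ft_baked)

# Bake at power 100, then convert to power 20 and compare with a direct bake.
variance = 0.02
baked = toksvig_from_variance(variance, 100.0)
converted = convert_toksvig(baked, 20.0, 100.0)
direct = toksvig_from_variance(variance, 20.0)
```

<p>Here <code>converted</code> and <code>direct</code> agree to within floating-point error, so nothing is lost by baking with one power and converting later, apart from the quantisation of the stored texture.</p>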


<p>Another option is to store the adjusted power, $f_ts$, as the exponent (range [0,1]) of a maximum specular power, as suggested by Naty Hoffman [2]. This log-space glossiness term could be more intuitive for artists to work with and may remove the need for a separate material property for the exponent, plus it&#8217;s also a convenient format for direct G-buffer storage [13]. A third option is to store the variance or the length of the normal instead and do the remaining maths at runtime.</p>
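<p>As a rough illustration of the log-space option, the adjusted power could be encoded as $g = \log(p)/\log(p_{max})$ and decoded as $p = p_{max}^g$. The sketch below assumes a maximum power of 8192 and a minimum of 1; both values, and the names <code>encode_gloss</code> and <code>decode_gloss</code>, are hypothetical choices rather than anything prescribed by [2] or [13].</p>

```python
import math

MAX_POWER = 8192.0  # hypothetical maximum specular power

def encode_gloss(p, max_power=MAX_POWER):
    """Map a specular power in [1, max_power] to a gloss value in [0, 1]."""
    p = min(max(p, 1.0), max_power)
    return math.log(p) / math.log(max_power)

def decode_gloss(g, max_power=MAX_POWER):
    """Recover the specular power from a [0, 1] gloss value."""
    return max_power ** g
```

<p>Because equal steps in <code>g</code> correspond to equal ratios of specular power, an 8-bit channel spends its precision more evenly across rough and glossy materials than a linear encoding of the power would.</p>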

<p>The data itself could be packed alongside a specular mask or used to modulate an existing gloss map. I could even imagine the map being used directly as a starting point for artists to paint on top of, but in that case care will be needed to avoid reintroducing aliasing!</p>

<p>Gloss maps are arguably more important than specular masks in the context of physically-based shading [2], but artists are typically more comfortable with the latter. Additionally, with the high intensity range that&#8217;s possible with energy-conserving specular, anti-aliasing is even more critical. For those reasons, I find the whole idea of an auto-generated texture to be really appealing, even though I haven&#8217;t tested all of this out in production yet. I&#8217;m a big fan of art tools that do 80-90% of the work automatically, but still provide a way to go in and directly, locally tweak the output. The result is less hair pulling and more time dedicated to polishing. Hopefully this is another example of that.</p>

<h2>Gunfight at the O.K. Corral</h2>

<p>How does Toksvig Mapping compare to LEAN Mapping? Well, it&#8217;s certainly not as good on account of the lack of anisotropy, which can mean over-broadening of the specular highlight in some cases and not enough (leaving dampened aliasing) in others. However, it&#8217;s possible to bake CLEAN Mapping instead, which reduces remaining shimmering at the cost of a little more highlight blooming, since it tends to conservatively attenuate.</p>

<p>The reason we can do this is that the main thing separating Toksvig and CLEAN in practice is the measure of variance ($\sigma^2$):</p>

<div markdown="0">$$
\begin{array}{lcl}
\sigma_{toksvig}^2 &=& \frac{1 - |N_a|}{|N_a|} &#92;&#92;
\sigma_{clean}^2 &=& M_z - (M_x^2 + M_y^2) &#92;&#92;
p &=& \frac{1}{1+s\sigma^2}s
\end{array}
$$</div>


<p>Everything I already mentioned earlier with respect to Toksvig Maps - baking, evaluating, converting between powers, etc. - applies here too. I&#8217;ve even had some success baking LEAN, but I&#8217;ll leave talking about that for another time.</p>
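<p>To compare the two variance measures on the same data, here is a small Python sketch. It assumes that $M_x$ and $M_y$ are the averaged slopes $n_x/n_z$ and $n_y/n_z$ and that $M_z$ is the average of their summed squares; that reading follows the usual (C)LEAN formulation and is an assumption here, as are the function names. Input normals are unit length with positive <code>z</code>.</p>

```python
import math

def toksvig_variance(normals):
    """sigma^2 = (1 - |Na|) / |Na| from the averaged normal Na."""
    count = len(normals)
    avg = [sum(n[c] for n in normals) / count for c in range(3)]
    length = math.sqrt(sum(c * c for c in avg))
    return (1.0 - length) / length

def clean_variance(normals):
    """sigma^2 = Mz - (Mx^2 + My^2), using slope-space moments (assumed)."""
    count = len(normals)
    slopes = [(n[0] / n[2], n[1] / n[2]) for n in normals]
    mx = sum(sx for sx, _ in slopes) / count
    my = sum(sy for _, sy in slopes) / count
    mz = sum(sx * sx + sy * sy for sx, sy in slopes) / count
    return mz - (mx * mx + my * my)

def adjusted_power(variance, s):
    """p = s / (1 + s * sigma^2), shared by both measures."""
    return s / (1.0 + s * variance)
```

<p>For two normals tilted by equal and opposite amounts, such as (0.6, 0, 0.8) and (-0.6, 0, 0.8), the Toksvig variance is 0.25 while the CLEAN variance is 0.5625, which is consistent with CLEAN attenuating more conservatively.</p>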

<p>Even though LEAN Mapping is closer to the ground truth, baked Toksvig/CLEAN Mapping is still a hell of a lot better than doing nothing, which is precisely what most of us are doing at the moment.</p>

<h2>Ride &#8216;em, Cowboy!</h2>

<p>Here&#8217;s a simple <a href="http://www.selfshadow.com/sandbox/gloss.html"><strong>WebGL demo</strong></a> that allows you to toggle between standard Blinn-Phong (default), Toksvig AA and Toksvig Map. You can also edit the shader code on the fly!</p>

<p><em>Demo update: I&#8217;ve had reports of visual issues with the Toksvig Map option and NVIDIA GPUs. If it appears as though you&#8217;re missing MIP-maps, then upgrading to the latest drivers (285.62+) should fix the problem.</em></p>

<p>Besides the shimmering with Blinn-Phong, notice how the teapot remains shiny when you zoom out (mouse wheel) as bumps disappear. In contrast, the material maintains its appearance with Toksvig.</p>

<p>I&#8217;ve also created another little example that demonstrates the <a href="http://www.selfshadow.com/sandbox/toksvig.html"><strong>filtering process</strong></a>.</p>

<h2>&#8230;Into the Sunset</h2>

<p>Phew, that was a rather long post! Perhaps I could have just said: &#8220;Store the Toksvig Factor in a texture&#8221;, but I enjoyed the journey of getting there and I hope you found it interesting too!</p>

<p>I&#8217;ll follow up with more thoughts and an expanded demo at a later date once I&#8217;ve had time to investigate further alternatives. In the meantime, I&#8217;m interested to hear from anyone who&#8217;s explored this area. Besides clearly being a subject that keeps Dan Baker up at night, I spotted that Jason Mitchell had experimented with SpecVar maps [14] for Team Fortress 2 [15] and Naty Hoffman also shared some thoughts on earlier work here [16]. Beyond that though, I haven&#8217;t seen much discussion outside of the referenced literature, so I&#8217;m all ears!</p>

<h2>Acknowledgements</h2>

<p>In addition to the great work of all the cited authors, I&#8217;d also like to acknowledge sources of inspiration and code for the demo. The live editing environment is (or will be) heavily inspired by Iñigo Quilez&#8217;s <a href="http://www.iquilezles.org/apps/shadertoy/">Shader Toy</a> and <a href="http://www.subblue.com/">Tom Beddard</a>&#8217;s <a href="http://fractal.io/">Fractal Lab</a>. It makes use of several open source components such as <a href="http://ace.ajax.org/">Ace</a> and <a href="http://jquery.com/">jQuery</a>, plus various UI widgets. Some WebGL utility code was taken from Mike Acton&#8217;s <a href="http://altdevblogaday.com/2011/04/18/mike_acton-pokes-around-webgl-and-jquery/">#AltDevBlogADay post</a>, which builds on/refactors this <a href="https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/demos/google/shiny-teapot/index.html">Chromium demo</a>. I&#8217;ll save more details on the editor for a future post.</p>

<p>Finally, the normal map is borrowed from <a href="http://developer.amd.com/archive/gpu/rendermonkey/pages/default.aspx">RenderMonkey</a>. I trust that this is okay as I couldn&#8217;t find a licence anywhere, but please let me know if that isn&#8217;t the case. There isn&#8217;t a whole lot of freely available data out there and that particular texture, with its rough and smooth areas, makes for a good example.</p>

<h2>References</h2>

<p>[1] d&#8217;Eon, E., Gritz, L., <a href="http://http.developer.nvidia.com/GPUGems3/gpugems3_ch24.html">&#8220;The Importance of Being Linear&#8221;</a>, GPU Gems 3.<br/>
[2] Hoffman, N., &#8220;Crafting Physically Motivated Shading Models for Game Development&#8221;, <a href="http://renderwonk.com/publications/s2010-shading-course/">Physically-Based Shading Models in Film and Game Production</a>, SIGGRAPH Course, 2010.<br/>
[3] Baker, D., <a href="http://www.gdcvault.com/play/1014558/Spectacular-Specular-LEAN-and-CLEAN+lean+and+mean+specular">&#8220;Spectacular Specular - LEAN and CLEAN specular highlights&#8221;</a>, GDC 2011.<br/>
[4] Persson, E., <a href="http://www.humus.name/index.php?page=3D&amp;ID=64">&#8220;Selective supersampling&#8221;</a>, 2006.<br/>
[5] <a href="http://iryoku.com/aacourse/">Filtering Approaches for Real-Time Anti-Aliasing</a>, SIGGRAPH Course, 2011.<br/>
[6] Swoboda, M., <a href="http://directtovideo.wordpress.com/2009/11/13/deferred-rendering-in-frameranger/">&#8220;Deferred Rendering in Frameranger&#8221;</a>, 2009.<br/>
[7] Baker, D., &#8220;Reflectance Rendering with Point Lights&#8221;, <a href="http://www.cs.ucl.ac.uk/staff/j.kautz/GameCourse/">Physically-Based Reflectance for Games</a>, SIGGRAPH Course, 2006.<br/>
[8] Olano, M., Baker, D., <a href="http://www.cs.umbc.edu/%7Eolano/papers/lean/">&#8220;LEAN Mapping&#8221;</a>, I3D 2010.<br/>
[9] Donnelly, W., Lauritzen, A., <a href="http://www.punkuser.net/vsm/">&#8220;Variance Shadow Maps&#8221;</a>, I3D 2006.<br/>
[10] Toksvig, M., <a href="http://developer.nvidia.com/content/mipmapping-normal-maps">&#8220;Mipmapping Normal Maps&#8221;</a>, 2004.<br/>
[11] Kaplanyan, A., &#8220;CryENGINE 3: Reaching the Speed of Light&#8221;, <a href="http://advances.realtimerendering.com/s2010/index.html">Advances In Real-Time Rendering</a>, SIGGRAPH Course, 2010.<br/>
[12] Pranckevičius, A., <a href="http://aras-p.info/texts/CompactNormalStorage.html">&#8220;Compact Normal Storage for small G-Buffers&#8221;</a>, 2009.<br/>
[13] Coffin, C., <a href="http://www.slideshare.net/DICEStudio/spubased-deferred-shading-in-battlefield-3-for-playstation-3">&#8220;SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3&#8221;</a>, GDC 2011.<br/>
[14] Conran, P., &#8220;SpecVar Maps: Baking Bump Maps into Specular Response&#8221;, SIGGRAPH Sketch, 2005.<br/>
[15] Mitchell, J., Francke, M., Eng, D., <a href="http://www.valvesoftware.com/company/publications.html">&#8220;Illustrative rendering in Team Fortress 2&#8221;</a>, NPAR 2007.<br/>
[16] Hoffman, N., <a href="http://renderwonk.com/blog/index.php/archive/lighting-papers/">&#8220;Lighting Papers&#8221;</a>, 2005.<br/>
[17] Akenine-Möller, T., Haines, E., Hoffman, N., <a href="http://www.realtimerendering.com/">Real-Time Rendering 3rd Edition</a>, A. K. Peters, Ltd., 2008.</p>
]]></content>
  </entry>
  
</feed>
