SSJX.CO.UK
Content

Porting the Postcard Sized Path Tracer to the D Programming Language

Introduction

This page details my porting of the Postcard Sized Path Tracer by Andrew Kensler to the D Programming Language. This started as 2037 byte obfuscated C++ program that was on the back of a Pixar recruitment flyer. Fabien Sanglard explains all about the C++ program and the differences between a path tracer and a ray tracer in the link below!

The main reason for doing this is that it would be cool if I could actually get it to work and it would be a good learning experience!

Update 15/02/2024: This program has been recompiled using the LDC D Compiler which is LLVM based and produces even faster programs! All the times haves been updated accordingly and are around 5 times faster than before!


This took 75,929ms to render...
(320 x 200 Sample count 16)

Download

To keep the download size small, this is provided as a 7zip archive. Archive includes the compiled exe and source.

Acknowledgements

As mentioned above, the original Postcard Path Tracer was created by Andrew Kensler. While making this, I made use of the work / websites by the following people:

The following sites were also used as a reference when trying to get the program to work:

Thanks to all the people above!

Porting to the D Programming Language

The following is a list of some the changes made:

Initial Benchmark

To get a rough idea of the time taken, we can use the StopWatch:

import std.datetime.stopwatch;
..
StopWatch sw;
sw.start();

// Rendering happens here...

long msecs = sw.peek.total!"msecs";
writeln("Elapsed time: ",msecs,"ms");

The settings used to test are a resolution of 320x200 and a sample count of 2, the program was compiled with ldc2 -O -i pixar.d

Non-Parallel Version : 18,248ms (LDC)
Non-Parallel Version : 93,405ms (DMD)

Parallelism

As mentioned above, the Rust version made use of parallelism to get a large boost in speed. I was able to use a similar method in D to get the same effect! The idea is to structure the rendering so that it can be run in a parallel foreach loop, this involve pre calculating where the pixels go.

struct mypoint{
	int x;
	int y;
	int pos;
}

mypoint[] points=new mypoint[w*h];

// Pre calc the locations of all the points
auto c=0;
foreach_reverse(y;0..h){
	auto px=y*(w*3);
	foreach_reverse(x;0..w){
		points[c].x=x;
		points[c].y=y;
		points[c].pos=px;
		px+=3;
		c++;
	}
}


// Parellel render the points
foreach(pt;parallel(points)){
	auto x=pt.x;
	auto y=pt.y;
	auto px=pt.pos;
	
	Vec color=0;
	foreach(p;0..samplesCount){
		color = color + Trace(position, inverse(goal + left * (x - w / 2 + randomVal()) + up * (y - h / 2 + randomVal())));
	}
	// Reinhard tone mapping
	color = color * (1.0f / samplesCount) + 14.0f / 241;
	Vec o = color + 1;
	color = Vec(color.x / o.x, color.y / o.y, color.z / o.z) * 255;

	// Write our bitmap
	output[px]=cast(ubyte)color.z;
	output[px+1]=cast(ubyte)color.y;
	output[px+2]=cast(ubyte)color.x;
}

The code within the parallel loop is self contained and not dependant on previous loops so can be done across multiple cores without any problems. The order the work is done in does not matter, it is not a problem if point #2 is finished before point #1 for example.

Final Benchmark

With the above changes and the same settings, the new time was:

Parallel Version : 9,328ms (LDC)
Parallel Version : 47,502ms (DMD)

Given I have two cores, it looks like the load was split fairly evenly between them as it did finish in almost half the time!

Conclusions

The aim was to make something cool and learn along the way, both missions accomplished! I have no doubt that this program can be improved upon and made even faster but I am pleased with the result and impressed by how easy it was to get the program using multiple cores!

Updated 15/02/24
Created 14/02/24