
Benchmarking the Benchmarks

Author: Kyle Bennett

Editor: Brent Justice

Date: Monday, February 11, 2008

It is time to put your money where your mouth is, or maybe where your keyboard is. HardOCP sets out to prove that real world video card testing is where it's at. Beware, we may make you feel dirty every time you run a benchmark from here on out!

Timedemos Tell Me What?

Timedemos tell you how fast a video card can play back a demo file and how many frames per second it rendered while playing through that file.
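
In other words, the score a timedemo spits out is just the total number of frames the card rendered divided by how long the replay took. Here is a minimal sketch of that math, with made-up numbers rather than anything we measured:

    # Timedemo scoring in a nutshell: the engine replays a recorded demo as
    # fast as the card can draw it, counts the frames, and divides by the
    # wall-clock time of the replay. The numbers below are hypothetical.
    frames_rendered = 5400     # frames drawn during the replay (hypothetical)
    elapsed_seconds = 120.0    # how long the replay took (hypothetical)

    average_fps = frames_rendered / elapsed_seconds
    print("Timedemo average: %.1f fps" % average_fps)   # 45.0 fps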

"Canned" timedemos, or ones that come included in many games are notorious for being cheated in. The game developer or hardware vendor may "optimize" their software, hardware, and their drivers to score a higher framerate in this specific timedemo when run in a benchmark mode on a specific GPU. These optimizations that increase timedemo benchmark framerates do NOT always translate to the same increases when actually playing the game. In our recent 3870 X2 conclusion we told you that AMD explained their new driver netted Crysis GPU timdemo scores with 60% improvements, but actually playing the game we only noted increased framerates of 1 or 2 fps. Canned benchmarks are accessible to anyone who buys the game or downloads the demo and therefore they can be widely used for easily comparable framerate metrics. Many times it is as easy as clicking a couple of buttons and you get a score to share with the world. Certainly these timedemos can be great tools to use when you are tweaking your system as they can give you a quick look at whether or not your efforts are moving your system performance in the right direction.

But does running a canned benchmark tell you anything about what your actual framerates will be in a game? And does it represent the real gaming performance difference between video cards? If Card A scores a 50 and Card B scores a 25, what exactly does that tell me? Card A is twice as fast as Card B in the canned timedemo but what about in a game?

Canned Testing with Crysis

Crysis represents what is probably the most graphically challenging game on the market right now that also enjoys a fairly large install base. Luckily enough, it comes with a built-in GPU benchmark that pretty much anyone can run, and they do...a lot. Even large sites like Anandtech relied exclusively on this canned benchmark for testing Crysis in its recent ATI Radeon HD 3870 X2 review.

Crysis ships with a built-in GPU benchmark, unfortunately the game is still too stressful to run at the highest quality settings so we're left running at the "high" defaults with no AA and at only two resolutions.

I am not sure if this is supposed to point the reader to any type of real world expectations at all, and it leaves me a bit confused, but Anandtech does go on to say this as well:

The last driver drop ensured that the 3870 X2 was actually faster than any single NVIDIA card in our lineup. At 1920 x 1200, the X2 is around 18% faster than the 8800 GTS 512. You'd need a pair of these X2s or faster in order to actually run at smooth frame rates at these settings unfortunately. It looks like the perfect card for Crysis still doesn't exist.

Does that mean if I play Crysis at 1920x1200 with the settings Anandtech used, my game on the X2 should run 18% faster than the 8800 GTS 512? Again, I am a bit confused here as to what the value of the information is.

Using real world gameplay, we at HardOCP come to much different conclusions about the settings we can utilize and still play Crysis comfortably.

Canning [H] Benchmarks

All of this has left a lot of people very confused. And rightly so. You have a multitude of sites telling you that a 3870 X2 is "faster" than an 8800 Ultra and HardOCP is telling you that the card does not perform up to 8800 GTX levels. The above is not to pick on Anandtech, but obviously it is the highest profile site to conduct "canned" testing like this and its editors have openly defended their methods.

We decided to run the same canned Crysis benchmark and compare it to our own real world gaming testing of Crysis to see if we could understand all the results a bit better.

Crysis Built in Canned GPU Benchmark

ATI HD 3870 X2

What you are looking at above is the built-in "GPU" Crysis benchmark that you have seen so many people run and report numbers on. The settings used above are EXACTLY the settings that we used for real world in-game testing of the 3870 X2. It is also the same exact hardware and driver setup. That said, this canned demo has to stand on its own since we cannot replicate the exact demo in real world gameplay (we're getting to that, be patient), but we can run the canned demo in REAL TIME and record the framerate with FRAPS.

The "Real Time Timedemo FRAPS" data you see is gleaned from running the canned GPU timedemo in real time, and recording the framerate with FRAPS. The "Traditional Timedemo Benchmark" results are as you might expect from running in timedemo mode where the recorded demo runs as fast as it can till completion then gives you your benchmark scores.

So to put it simply, one is the canned GPU demo run real time and the other is the demo run in timedemo benchmark mode.
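
To put the two kinds of runs on equal footing, both get boiled down to the same metric: frames divided by seconds. The sketch below shows that reduction in Python. The frametimes log layout assumed here (a header row, then one cumulative millisecond timestamp per frame) is our guess at a FRAPS-style log, not a documented format, so adjust the parsing to whatever your logging tool actually writes out.

    # Reduce both runs to an average framerate so they can be compared directly.
    # The log format below is an ASSUMPTION about a FRAPS-style frametimes CSV.

    def average_fps_from_frametimes(path):
        """Average FPS of a real-time run, from per-frame timestamps in milliseconds."""
        timestamps_ms = []
        with open(path) as log:
            next(log)                               # skip the header row
            for line in log:
                _frame, time_ms = line.strip().split(",")
                timestamps_ms.append(float(time_ms))
        seconds = (timestamps_ms[-1] - timestamps_ms[0]) / 1000.0
        return (len(timestamps_ms) - 1) / seconds   # frame intervals over elapsed time

    def average_fps_from_timedemo(frames_rendered, seconds_to_complete):
        """Average FPS a timedemo benchmark reports: frames over replay time."""
        return frames_rendered / seconds_to_complete

    # realtime_avg = average_fps_from_frametimes("frametimes.csv")  # hypothetical file
    # timedemo_avg = average_fps_from_timedemo(5400, 95.0)          # hypothetical numbers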

Now what you will immediately notice is that the two sets of results from the Crysis canned GPU demo are not even close to the same. On the 3870 X2, simply running the demo as a traditional "fast as it can draw it" timedemo benchmark gives us a 38% higher average framerate than running the exact same canned demo at real time speed. Same demo, same settings, same hardware, same driver.
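
For the record, that percentage is nothing exotic; it is just the gap between the two averages expressed relative to the real time result. The values below are placeholders, not our measured numbers:

    # Percent increase going from the real-time run to the timedemo-mode run.
    def percent_increase(realtime_avg_fps, timedemo_avg_fps):
        return (timedemo_avg_fps - realtime_avg_fps) / realtime_avg_fps * 100.0

    print("%.0f%% higher in timedemo mode" % percent_increase(25.0, 34.5))   # ~38%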

GeForce 8800 GTX

This time we run the same canned demo tests, with apples-to-apples graphical game settings, on the 8800 GTX (these settings do NOT represent the visual quality settings we used in our 3870 X2 evaluation).

With the 8800 GTX we see much of the same, but not quite as high of a percentage increase. Going from real time demo to timedemo benchmark mode, we see the average framerate increase 33% with the 8800 GTX.

What Does This Mean?

It would seem to us that running a canned benchmark in "traditional timedemo mode" does not get you anywhere close to the framerate of the exact same demo when you play it back in real time and measure the frame rate. At least this is the way it happens in Crysis.

Keep in mind here that we have not even touched on real world gameplay...yet. We are seeing huge differences simply between running a recorded "canned" demo in real time and running it in timedemo mode. Even we were surprised to see increases of 33% and 38%!


GeForce 8800 GTX - High Shaders

Above we have visual settings that represent what we used in our 3870 X2 evaluation for the 8800 GTX. We have changed a single value, and that is "High Shader Quality." As you might expect, Crysis is a very shader-intensive game, and this setting does give you a better overall image quality when playing the game.

Our framerate difference here is somewhat bottlenecked by the GPU shader limitations, but we are still seeing about a 20% increase in average framerate between the real time demo and the timedemo benchmark.

So it would seem that depending on the settings used, it is quite possible for the 3870 X2 to "benchmark" much better than the 8800 GTX in this example.