Date: Thursday, March 22, 2012
NVIDIA’s Kepler GPU is finally here, and we've been enjoying getting to know the brand new GeForce GTX 680 video card. The GeForce GTX 680 marks the introduction of NVIDIA's next generation of GPUs and will eventually be followed by others in the family; as of right now, it is the only Kepler card being launched. Based on the new Kepler architecture, the GeForce GTX 680 is NVIDIA's current flagship GPU, while the GTX 570 and GTX 560 are going to stay in the mix for a little while longer.
NVIDIA is positioning this video card at the $499 price point. If you recall, the GeForce GTX 580 also launched at $499. NVIDIA clearly sees this silicon fueling its high-end video cards. Our hope is that the GeForce GTX 680 moves Green performance upwards at the $499 price point. We will also see how it compares to the Radeon HD 7970, the GeForce GTX 680's closest competition, though priced higher at $549.
The Kepler "GK104" architecture isn't radically different from its Fermi predecessor. To understand where NVIDIA went with Kepler, we must first look at what an SM cluster looked like in the GeForce GTX 580. The GeForce GTX 580 had 16 SM clusters, each containing 32 CUDA cores plus control logic, for a total of 512 CUDA cores. For Kepler, NVIDIA has crammed 192 CUDA cores into each cluster and dubbed it the SMX architecture, while also reducing the size of the control logic. The GeForce GTX 680 has 8 of these SMX clusters on board for a total of 1536 CUDA cores, quite a bit more than the GeForce GTX 580. The GeForce GTX 680 has a total of 8 geometry units, 128 texture units, 32 ROPs, and a 256-bit GDDR5 memory bus.
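The core-count arithmetic above can be sanity-checked in a few lines (illustrative only; the figures are the ones quoted in this article):

```python
# Sanity check of the CUDA core totals quoted above (figures from this article).
fermi_sm_count = 16         # SM clusters in the GeForce GTX 580
fermi_cores_per_sm = 32     # CUDA cores per Fermi SM
kepler_smx_count = 8        # SMX clusters in the GeForce GTX 680
kepler_cores_per_smx = 192  # CUDA cores per Kepler SMX

print(fermi_sm_count * fermi_cores_per_sm)      # 512
print(kepler_smx_count * kepler_cores_per_smx)  # 1536
```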
Each SMX delivers 2x the performance per watt of a Fermi SM. Inside each SMX cluster are 16 texture units and the new PolyMorph Engine 2.0. One way NVIDIA was able to cram so many CUDA cores into each SMX cluster is by doing away with separate clock domains in Kepler. In the GeForce GTX 580, the core clock ran at one frequency, along with the ROPs, but the individual CUDA cores ran independently at much faster clock speeds. With Kepler this is no longer the case; the 192 CUDA cores per SMX cluster now run at the same core clock as everything else. NVIDIA now runs a unified clock speed throughout the architecture, similar to the competition, so you no longer have to worry about separate core clock and shader clock frequencies. Lowering the clock speed of each CUDA core reduces per-core performance, but the addition of 1024 more CUDA cores should more than make up for the lost clock speed, plus NVIDIA has a new PolyMorph Engine.
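To see why the extra cores more than offset the lower unified clock, consider the raw product of cores times clock. This back-of-the-envelope sketch assumes the GTX 580's reference shader clock of 1544MHz and ignores all architectural differences, so treat it as a rough illustration only:

```python
# Rough shader-throughput sketch: CUDA cores x clock (MHz).
# Assumes the GTX 580's 1544MHz reference shader clock (an assumption
# not stated in this article); ignores per-core architectural differences.
gtx580 = 512 * 1544    # 790,528
gtx680 = 1536 * 1006   # 1,545,216
print(gtx680 / gtx580) # roughly 1.95x the raw cores-x-clock product
```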
The new PolyMorph Engine 2.0 is capable of 2x the primitive and tessellation performance per SMX cluster versus Fermi. NVIDIA claims 4x the tessellation performance of the Radeon HD 7970, though we cannot verify that specifically. NVIDIA is clearly confident in Kepler's tessellation performance over the competition. The Raster Engine now has a true 1:1 raster-to-ROP balance.
One feature we found particularly interesting is the "bindless" texture support in Kepler. In the past, GPUs were restricted to 128 total simultaneous textures (bound by the texture units available) at any given time. With Kepler, NVIDIA has introduced bindless texture support that allows the GPU to make over one million simultaneous unique textures available to the shaders. This means game developers could go all out on textures in games without the previous restraint. It also sounds like something streaming textures might work well with. At any rate, it is an interesting bit of progress in an area that hasn't seen much attention lately, texture support.
These are not revolutionary changes over Fermi; everything we see here is a natural evolution of the Fermi architecture. We find it most interesting that NVIDIA has unified the core and shader clocks and gone with a larger number of shaders rather than fewer shaders at higher clock speeds, as it did with the GeForce GTX 400 and 500 series GPUs. This marks a distinct change in architecture philosophy, and it is the same path AMD has taken with its architecture.
Let's get straight to the specifications so you know what you are dealing with before we look at other unique features. The GeForce GTX 680 is manufactured on TSMC's 28nm process and made up of 3.54 billion transistors. Yes, the GeForce GTX 680 supports DX11.1. Another checklist feature, PCI Express 3.0, is also supported.
As we mentioned, there are 1536 CUDA cores in the GeForce GTX 680. The clock speed, or base clock, of the GPU is 1006MHz. This means the CUDA cores, the ROPs, everything inside runs at a base clock of 1006MHz; this is the first flagship video card with a 1GHz reference clock speed. There is also something called the boost clock, which we will talk about more below. Basically, this video card can dynamically adjust the GPU frequency, and you are almost guaranteed to hit 1058MHz in games thanks to the boost clock.
There is 2GB of GDDR5 on board on a 256-bit bus, with the memory clocked at 6GHz. This provides 192GB/sec of memory bandwidth, exactly the same as a GeForce GTX 580, though the 2GB capacity is a step up from the GTX 580's 1.5GB. You will instantly note how this differs from the Radeon HD 7970, which has 3GB on board with a 384-bit memory bus and 5.5GHz memory providing 264GB/sec of memory bandwidth. Whether this memory difference will actually make a difference in gaming performance we will find out.
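The bandwidth figures quoted above follow directly from bus width and effective memory rate. A quick check, using the numbers as quoted in this article:

```python
# bandwidth (GB/s) = effective memory rate (GT/s) x bus width (bits) / 8 bits-per-byte
def mem_bandwidth_gbps(effective_rate_gtps, bus_width_bits):
    return effective_rate_gtps * bus_width_bits / 8

print(mem_bandwidth_gbps(6.0, 256))  # 192.0 GB/s - GeForce GTX 680
print(mem_bandwidth_gbps(5.5, 384))  # 264.0 GB/s - Radeon HD 7970
```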
In terms of power, the TDP of this video card is 195W, less than the 250W of the Radeon HD 7970, and the GeForce GTX 680 only requires 2x 6-pin power connectors. In terms of output, there are 2x dual-link DVI outputs, one HDMI, and one full size DisplayPort on board. This video card supports NV Surround from a single video card, meaning we can run our triple display setup from this one card. NVIDIA has also designed the fan with acoustic dampening material and reshaped the heatsink for improved airflow.
GPU Boost is probably the most interesting feature of this video card, and one that cannot be disabled. Think of GPU Boost like Intel's Turbo Boost, but more sophisticated; in fact, NVIDIA sees it as different enough that those guys get a bit haughty when it is referenced as "NVIDIA's Turbo Boost." The fact remains that these two technologies are very much the same in what they do to clock speed, albeit NVIDIA's is more elegant. The GTX 680 GPU is able to dynamically change frequency in games to provide the best performance from the power available to the video card, basically overclocking your GPU automatically. NVIDIA has hardware monitoring on the video card that samples different variables (13 of them, according to NVIDIA) to determine the clock speed in the game you are running. Temperature, board power, voltage, load, and many other things are monitored in real-time, on the fly. Not only GPU frequency is changed, but also GPU voltage.
Since the TDP of this video card is 195W, there may be games that don't come close to tapping its full power. In these lower-powered games, the GeForce GTX 680 is able to raise the GPU frequency to give you better performance until it reaches TDP. This means the GPU clock speed could increase from 1006MHz to 1.1GHz or 1.2GHz, or potentially even higher. (Kyle saw a GTX 680 sample card reach over 1300MHz running live demos, but it could not sustain this clock.) The actual limit of the GPU clock is unknown. As each video card is different, and with the addition of custom cooling, your maximum GPU Boost clock speed could be anything, within reason. This is going to make overclocking, and finding the maximum overclock, a bit harder.
GPU Boost is guaranteed to hit 1058MHz in most games, and typically the GPU will go much higher. In demo sessions we experienced clock speeds that would rise to 1.15GHz and even 1.2GHz in games such as Battlefield 3. With a GPU frequency increase that far over base clock, you can rest assured there will be a noticeable performance difference. The greatest thing about GPU Boost is that you don't have to do a thing; it's all automatic. The only way to truly know your GPU frequency in games is to monitor it in real-time and see what happens. It will fluctuate depending on the scene, anywhere from the base clock all the way up to who knows how high, dynamically changing on you while gaming.
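One way to picture the behavior described above is a controller that nudges the clock up while under a power target and backs off when over it. Everything below is our own toy model: the real hardware samples 13 variables, and its step size and limits are not public, so the `step_mhz` granularity and the power-only logic are made up for illustration.

```python
BASE_CLOCK = 1006   # MHz, GTX 680 base clock
TDP = 195           # watts, GTX 680 board power target

def next_clock(current_mhz, board_power_watts, step_mhz=13):
    """Toy GPU Boost step: raise the clock while under TDP, lower it when
    over, never dropping below the base clock. step_mhz is hypothetical."""
    if board_power_watts < TDP:
        return current_mhz + step_mhz
    return max(BASE_CLOCK, current_mhz - step_mhz)

clock = BASE_CLOCK
for power in (150, 160, 170, 210):   # hypothetical per-sample board power
    clock = next_clock(clock, power)
print(clock)   # climbed three steps, then backed off one: 1032
```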
There is so much more to talk about with GPU Boost, but we are going to save it for a future article where we will look at it in-depth and figure out exactly how to overclock with GPU Boost. The thing to remember is that you cannot turn GPU Boost off; it is with you for life, so we have to get used to it.
NVIDIA has released an SDK to add-in-board partners so that those companies can build their own GPU Boost control panels. EVGA’s has already been leaked, and it is a bit clunky to use, but all these GPU Boost controls will share commonalities. NVIDIA has asked that companies not use their own nomenclature for the technology and the switches and buttons associated with it, so using one company’s GPU Boost controls should translate to the next fairly easily. The monitoring software in the SDK uses a very familiar format that is easy to use and customize.
Adaptive VSYNC is a brand new technology from NVIDIA that will only work on the GeForce GTX 680. This is another technology we are excited about, as it improves the gamer's immersion and experience by directly affecting the smoothness of a game. Until now (Editor’s Note: Lucid actually has a similar technology.), gamers have had two choices when it comes to VSYNC: you can either disable VSYNC and experience tearing, or you can enable VSYNC and, if the FPS drops below your refresh rate, the FPS instantly drops to 30 FPS, or lower.
With Adaptive VSYNC turned on, your games will cap the framerate at your monitor's refresh rate, so you won't experience tearing. However, if your framerate drops below the refresh rate, VSYNC is dynamically disabled and you get the real-time FPS rather than an instant drop to 30 FPS. You won't experience tearing at your refresh rate, and you also won't get large drops in framerate below it. It is the best of both worlds: no tearing, and still the best possible performance in games. This is a feature that directly affects your gameplay experience in a positive way, and these are the kinds of things we like to see.
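The choice Adaptive VSYNC makes can be summarized in a few lines. This is our own illustrative model of the behavior described above, not NVIDIA's implementation; classic VSYNC is modeled here as falling to half the refresh rate when the GPU can't keep up:

```python
def presented_fps(render_fps, refresh_hz, adaptive=True):
    """Framerate the player sees with VSYNC enabled (toy model)."""
    if render_fps >= refresh_hz:
        return refresh_hz            # capped at refresh rate: no tearing
    if adaptive:
        return render_fps            # below refresh: VSYNC off, real FPS
    return refresh_hz / 2            # classic VSYNC: drop to half refresh

print(presented_fps(90, 60))         # 60
print(presented_fps(50, 60))         # 50 with Adaptive VSYNC
print(presented_fps(50, 60, False))  # 30.0 with classic VSYNC
```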
(Editor’s Note: NVIDIA’s Adaptive VSYNC could be a very big deal. However, it is simply going to take a lot of quality time gaming to come up with a "conclusion" on its worth. While framerate numbers, like those shown on the slides above, will have some value, the true value in Adaptive VSYNC is how it impacts your gaming experience and level of immersion. Our suggestion to NVIDIA has been to give the feature a bit more exposure in the control panel instead of hiding it deep inside, so that people besides [H]’ers might actually get to see and use it. Rest assured, we will be covering this feature in the future; it is just not something that is easy to get a handle on in a short period of time. Interestingly, GPU Boost has a nifty little slider labeled "Target Framerate" as well. Using GPU Boost to rein in some of your "extra" framerate power while using Adaptive VSYNC may be just what the doctor ordered.)
We have long been supporters of the new shader-based AA methods that have come around these past couple of years. We believe the future of AA lies in shader-based programs that can do a lot more than traditional MSAA. Shader-based AA is faster, and able to remove aliasing in places traditional MSAA and SSAA cannot touch. There are several different methods of shader-based AA, and the one NVIDIA has created is known as FXAA. We've talked about FXAA extensively in the past; what is new is that NVIDIA now has a control panel option for it. You will be able to turn on FXAA in almost any game you want, whether it has native FXAA support or not. This is akin to AMD's driver option that forces MLAA on in every game, and it gives users another choice in AA options. For example, if 2X, 4X, or 8X AA is reducing performance in a game by a great deal, just turn on FXAA instead and get 4X-AA-like image quality at almost no performance hit.
NVIDIA isn't stopping with FXAA; it has another trick up its sleeve called "TXAA." This method differs in that it is not entirely a shader program; it requires hardware acceleration since it utilizes bits of multisampling technology. The first slide above shows how NVIDIA has positioned the performance and image quality of TXAA. With TXAA 1 enabled, NVIDIA claims you'll get the quality of 8X MSAA (or slightly better) at the performance level of 2X MSAA. It therefore takes a greater performance hit than FXAA, but should also give you better image quality, with only the performance impact of 2X AA. There is another mode, TXAA 2, which gives you better than 8X MSAA quality at the performance level of 4X MSAA. TXAA has to be implemented by the game developer in a game.
The GeForce GTX 680 is smaller than the Radeon HD 7970; the GTX 680 measures exactly 10" in length while the Radeon HD 7970 measures 10.5". Two 6-pin power connectors are required, and you will see that the connectors face each other in a stacked format. We found this stacked format rather difficult when removing attached cables: you have to press your finger in between the cables to push the tab and release the power cables, and if you have big fingers it's harder. Personally, I liked the old format with the power connectors lined up in a row with the tabs facing the same direction. We also had to twist our power supply cable in an awkward way to connect it facing the other cable. It seems a bit strange to use this stacked format since it makes attaching and detaching cables more difficult, but NVIDIA did mention that the stacked power plugs gave it more options when designing the PCB and cooling system, so given that, the new plugs get a pass.
As you can see, the GTX 680 has two dual-link DVI connectors, an HDMI connector, and a full size DisplayPort. We were able to set up NV Surround with three displays by using both DVI connectors and the DisplayPort. It should also be noted that a fourth display can be enabled alongside the three displays being used for gaming. We could have hooked up a display to the HDMI connector and used it as a fourth accessory display, for example running IM or chat programs, web browsers, or email on it while gaming across the other three displays. You can also run 3-display NV Surround using the two DVI connectors and the HDMI. But no matter how you use the outputs, only 3 screens are supported in NV Surround, with the fourth acting as a "sidecar" if you wish.