EXTRAS
AMD’s answer on graphics could be… 3D V-Cache on GPU?
AMD's next move could be a double blow to NVIDIA. For some time we have been discussing a rumor, mentioned again today, about the option of two GPUs on the same PCB, that is, a dual graphics card like some models of the past. Since Navi 31 is already the full chip, AMD's options are limited on that front, but there seems to be one more surprise, as the company would have an ace up its sleeve. Could AMD launch graphics cards with 3D V-Cache?
It is something that has been on our minds for a long time, and Tom Wassick offers some answers thanks to an infrared scan of an RX 7900 XT. How did he reach this conclusion? The answer is neither quick nor simple, but we will try to explain it carefully.
AMD could have graphics with 3D V-Cache
What else can you see? A linear array of “spots” that look remarkably like the keep out zones on X3D, and that are on the same 17-18 um pitch. Could they be considering stacked MCD functionality (or maybe they’re something else)?
—Tom Wassick (@wassickt) January 27, 2023
The observation comes from the analysis of an RX 7900 XT rather than an RX 7900 XTX. Why? Because AMD left an inactive MCD on the former, and thanks to this sixth die it has been possible to image the RDL (redistribution layer) output wiring clearly under infrared.
The image itself is not available for reasons we don't know, but we do have Tom's impressions: he says he was able to see a linear array of keep-out zones, exactly what AMD does with the X3D on its Ryzen CPUs. In addition, the wiring zone has the same 17-18 µm pitch.
The theory, then, is that AMD designed RDNA 3 with the idea of adding 3D V-Cache to its graphics cards from the start, perhaps understanding that it was not going to launch larger and more powerful chips as such, which makes all the sense in the world if we look at the architecture.
More dependence on the vector side than on the scalar
The proportion of instructions to be processed obviously depends on the workload, but at AMD the vector side has prevailed over the scalar side for two generations now. This is important to understand because, when accessing VRAM or the cache, the "power" of the ALUs and how they work is key to that access.
In fact, AMD has achieved something really interesting: although it has moved the Infinity Cache off the main die and into the MCDs, thanks to higher frequencies it has managed to reduce latency, something fundamental that obviously has a cost in power consumption.
Average latency has gone from 17.4 ns in RDNA 2 to 15.4 ns in RDNA 3. To this we must add the increase in the size of the L1 and L2 caches (double in the first, 50% more in the second).
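As a quick sanity check on those figures, here is a minimal sketch using only the numbers quoted above (the latencies and cache-growth factors come from the article; nothing here is official AMD data):

```python
# Illustrative arithmetic using the latency and cache-size figures quoted
# in the article (RDNA 2 vs RDNA 3); not official AMD specifications.

rdna2_latency_ns = 17.4  # average Infinity Cache latency, RDNA 2
rdna3_latency_ns = 15.4  # average Infinity Cache latency, RDNA 3

# Relative latency reduction despite the cache moving off-die to the MCDs
reduction_pct = (rdna2_latency_ns - rdna3_latency_ns) / rdna2_latency_ns * 100
print(f"Latency reduction: {reduction_pct:.1f}%")  # ~11.5%

# Per-level cache growth quoted in the article
l1_growth = 2.0   # L1 doubled
l2_growth = 1.5   # L2 grew by 50%
print(f"L1 x{l1_growth}, L2 x{l2_growth}")
```

In other words, a roughly 11.5% latency cut despite moving the cache off-die, which is the article's central point.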
More GPU power demands caches that can keep up, hence the increase in size and the reduction in latency. But what happens when VRAM has to be accessed? In the case of RDNA 3, VRAM latency has increased compared to RDNA 2, which is a real problem. Splitting the cache off as MCDs, as AMD has done, is what led to this step back, so the solution comes from vector access.
32 ways to alleviate the deficit
Although the number of available ways is the same as in RDNA 2 and the ratio between vector and scalar accesses remains 32 vs. 4, the higher core frequency means that latency while working on the die is reduced in RDNA 3, but increases when going out to the Infinity Cache, although not by much.
If we add to this a greater total VRAM bandwidth, what we have is similar cache behavior and latency, but with the ability to move more information per second. How do you take advantage of this when you have more performance on vectors than on scalars? The same way AMD has done against NVIDIA: by increasing the L3.
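The link between "more bandwidth at similar latency" and vector access can be illustrated with Little's law: the amount of data that must be in flight equals bandwidth times latency, so a wider pipe at the same latency requires more outstanding requests, which is exactly what wide vector access supplies. A short sketch, where the bandwidth figures are hypothetical round numbers (not AMD specifications) and the latencies are the Infinity Cache averages quoted above:

```python
# Little's law sketch: bytes in flight = bandwidth * latency. With similar
# latency but more bandwidth, the GPU must keep more requests outstanding,
# which wide vector access provides. Bandwidth values are hypothetical
# round numbers for illustration only; latencies are from the article.

def bytes_in_flight(bandwidth_gb_s: float, latency_ns: float) -> float:
    """Concurrency needed to sustain a given bandwidth at a given latency.
    GB/s * ns = 1e9 B/s * 1e-9 s = bytes, so the units cancel neatly."""
    return bandwidth_gb_s * latency_ns

rdna2 = bytes_in_flight(512.0, 17.4)   # hypothetical RDNA 2-class bandwidth
rdna3 = bytes_in_flight(960.0, 15.4)   # hypothetical RDNA 3-class bandwidth
print(f"RDNA 2-class needs ~{rdna2:.0f} bytes in flight")
print(f"RDNA 3-class needs ~{rdna3:.0f} bytes in flight")
```

Even with slightly lower latency, the wider pipe needs substantially more data in flight to stay busy, hence the reliance on the vector side.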
NVIDIA opted for the same approach, but with the L2, so it wins the game there, although at the cost of a larger chip that is more expensive to manufacture. AMD would make up for this with 3D V-Cache, gaining better margins and, now, less dependence on the card's VRAM. What is the goal? Several things:
- Reduce internal bandwidth.
- Improve the reuse of spatial and temporal data.
- Minimize data movement.
- Reduce the total latency of the system.
- Lower the card's power consumption.
Increasing the cache at this point further alleviates the problems, allowing fewer accesses to VRAM and therefore less dependence on it, all with the same bandwidth. The problem will be the cost of implementing the stacked dies and, as expected, the height difference with the GCD, something AMD should already have solved by using the same process it applies to its Ryzen CPUs. If all of this comes to pass, the performance jump could be quite significant, but we cannot quantify it because there is no precedent.
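Why a bigger cache means fewer VRAM accesses can be shown with the standard average-memory-access-time (AMAT) model. In this sketch the hit rates and the VRAM latency are hypothetical values chosen only for illustration; the 15.4 ns cache latency is the RDNA 3 average quoted earlier:

```python
# Standard AMAT model: a larger stacked L3 raises the hit rate, so fewer
# requests pay the (much higher) VRAM latency. Hit rates and VRAM latency
# below are hypothetical, for illustration only; 15.4 ns is the Infinity
# Cache average quoted in the article.

def amat(cache_ns: float, vram_ns: float, hit_rate: float) -> float:
    """Average memory access time: hits served by cache, misses by VRAM."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * vram_ns

VRAM_NS = 250.0  # hypothetical VRAM round-trip latency

# Same 15.4 ns cache, but a bigger stacked L3 raises the hit rate
base = amat(15.4, VRAM_NS, hit_rate=0.60)     # without extra cache
stacked = amat(15.4, VRAM_NS, hit_rate=0.80)  # with stacked 3D V-Cache

print(f"AMAT without extra cache: {base:.1f} ns")
print(f"AMAT with stacked cache:  {stacked:.1f} ns")
```

Under these assumed numbers, raising the hit rate from 60% to 80% cuts the average access time nearly in half, which captures the "less dependence on VRAM" argument without needing any extra bus bandwidth.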