AMD Unveils “Barcelona” Architecture

AMD is prepared to launch its next-generation Barcelona CPU architecture this Monday. Barcelona is the first K8-based product to feature a substantial number of architectural changes since the original launch of AMD’s Opteron and Athlon 64 processors. Even so, the changes are evolutionary enhancements to the existing K8 rather than a new design.

Barcelona is the company’s first quad-core CPU architecture and features a native quad-core design. Intel’s previously released Clovertown and Kentsfield, and its upcoming Harpertown and Yorkfield quad-core processors, feature two Core-architecture dies on a single package. They are effectively quad-core, but not a native design like Barcelona.

AMD equips Barcelona with plenty of new tweaks and features to boost performance. These include a tweaked cache hierarchy, memory controller, branch predictors, prefetch logic and power management, plus additional AMD-V extensions.

Barcelona’s cache configuration includes L3-cache, a feature AMD has not used since its K6-III+ and K6-2+ processors. All CPU cores on Barcelona-based processors share 2MB of L3-cache. L1 and L2-cache sizes remain unchanged, with 128KB of L1-cache and 512KB of L2-cache per core. Associativity is also unchanged, with 2-way associative L1-cache and 16-way associative L2-cache. The shared L3-cache is 32-way associative.

Barcelona-based processors feature a total of 4.5MB of on-die cache. In comparison, Intel’s Clovertown and Kentsfield quad-core architectures feature 64KB of L1-cache per core and 4MB of shared L2-cache per pair of cores, for a total of 8.25MB of on-die cache.
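The cache totals above can be checked with a little arithmetic. The sketch below is only an illustration: the function name and parameters are made up for this example, and it assumes Intel’s 64KB of L1-cache is a per-core figure, which is what the 8.25MB total implies.

```python
# Illustrative helper (hypothetical name): sums per-core L1 with total L2 and L3.
def total_cache_kb(cores, l1_per_core_kb, l2_total_kb, l3_total_kb=0):
    return cores * l1_per_core_kb + l2_total_kb + l3_total_kb

# Barcelona: 4 cores, 128KB L1 and 512KB L2 per core, 2MB shared L3.
barcelona_kb = total_cache_kb(cores=4, l1_per_core_kb=128,
                              l2_total_kb=4 * 512, l3_total_kb=2 * 1024)

# Kentsfield/Clovertown: 4 cores, 64KB L1 per core (assumed per-core),
# 4MB of L2 shared by each pair of cores, no L3.
kentsfield_kb = total_cache_kb(cores=4, l1_per_core_kb=64,
                               l2_total_kb=2 * 4 * 1024)

print(f"Barcelona:  {barcelona_kb / 1024} MB")   # 4.5 MB
print(f"Kentsfield: {kentsfield_kb / 1024} MB")  # 8.25 MB
```

Both results match the totals quoted in the text.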

AMD tweaked Barcelona’s memory controller for greater bandwidth efficiency and lower latency. Instead of a single 128-bit wide memory controller, AMD split the memory controller into two 64-bit wide memory controllers. Because the two controllers operate independently, they can achieve greater efficiency.

AMD designed the new memory controller with future memory technologies in mind. Barcelona will initially debut with support for DDR2 memory, but its first refresh, in the form of Shanghai, will support DDR3 memory.

New to the memory controller is a DRAM prefetcher, which fetches data it predicts will be needed in the future. The DRAM prefetcher does not store this data in the L1, L2 or L3-caches, as it has its own buffer.

Barcelona features a new 512-entry indirect branch predictor, a feature Intel debuted on its Pentium M processor. The new indirect branch predictor reduces mispredicted branches for greater efficiency, which also translates into lower power consumption.

In addition to the new 512-entry indirect branch predictor, Barcelona has improved prefetcher logic. It retains the same two prefetchers per core as the K8 architecture, but AMD has tweaked them for greater performance. Barcelona brings prefetched data directly into the L1-cache, whereas AMD’s K8 architecture brought prefetched data into the L2-cache.

AMD’s SSE implementation sees substantial upgrades as well. Barcelona increases the SSE execution width to 128 bits. K8 featured an SSE execution width of 64 bits and could execute two 64-bit SSE operations at the same time.

Although K8 could execute 64-bit SSE operations in parallel, 128-bit SSE instructions required extra time, because each one had to be divided into two 64-bit operations. Barcelona’s 128-bit execution width removes that step, which allows it to execute SSE instructions quicker than K8.
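As a rough illustration of the difference in execution width, the toy model below counts how many internal operations a run of 128-bit SSE instructions turns into. It is a sketch, not a description of either pipeline: the function name is invented for this example, and it counts operations only, not cycles, since the text does not give issue rates.

```python
import math

SSE_OPERAND_BITS = 128  # width of a full SSE instruction's operands

# Hypothetical helper: each 128-bit instruction is split into as many
# pieces as the execution unit width requires.
def internal_ops(num_instructions, unit_width_bits):
    pieces = math.ceil(SSE_OPERAND_BITS / unit_width_bits)
    return num_instructions * pieces

k8_ops = internal_ops(1000, unit_width_bits=64)          # two 64-bit halves each
barcelona_ops = internal_ops(1000, unit_width_bits=128)  # one full-width op each

print(k8_ops, barcelona_ops)  # 2000 1000
```

Under this simplified model, K8 has to handle twice as many operations as Barcelona for the same stream of 128-bit SSE instructions.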

Instruction fetch bandwidth is also improved. With Barcelona, it increases to 32 bytes per cycle, up from 16 bytes per cycle on K8. AMD also widened the internal interconnect between the memory controller and the L2-cache to 128 bits per cycle, up from K8’s 64 bits per cycle.
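To put the per-cycle figures in perspective, the sketch below converts them to peak bytes per second. It rests on an assumption not made in the text: that both paths run at a 2.0 GHz clock, chosen here only because it is the top launch frequency, and applied to K8 and Barcelona alike purely for comparison. The helper name is hypothetical.

```python
# Hypothetical helper: peak throughput of a path that moves a fixed
# number of bytes every clock cycle.
def peak_bytes_per_second(bytes_per_cycle, clock_hz):
    return bytes_per_cycle * clock_hz

CLOCK_HZ = 2.0e9  # assumed clock for both designs, for comparison only

fetch_k8 = peak_bytes_per_second(16, CLOCK_HZ)
fetch_barcelona = peak_bytes_per_second(32, CLOCK_HZ)

# The memory controller to L2 path is quoted in bits per cycle; convert to bytes.
link_k8 = peak_bytes_per_second(64 // 8, CLOCK_HZ)
link_barcelona = peak_bytes_per_second(128 // 8, CLOCK_HZ)

for name, value in [("fetch K8", fetch_k8), ("fetch Barcelona", fetch_barcelona),
                    ("link K8", link_k8), ("link Barcelona", link_barcelona)]:
    print(f"{name}: {value / 1e9:.0f} GB/s")
```

Whatever the actual clock, both paths double in width, so the peak figures double as well.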

AMD’s Barcelona has power management changes too. With Barcelona, the power planes are split, allowing the processor and memory controller to operate independently at different speeds and voltages. However, to take advantage of split power planes, a new motherboard is required, as current motherboards lack the required power circuitry.

Each processor core can dynamically adjust its clock speed depending on load too. The new power management features allow Barcelona quad-core processors to operate with the same thermal envelope as current dual-core Opteron processors.

Lastly, AMD has added new AMD-V instructions. The new instructions provide hardware acceleration of shadow paging – allowing guest operating systems to have independent memory management. AMD refers to the new feature as nested paging.

All the architectural improvements and four cores bring the Barcelona transistor count to 463 million. Intel’s Kentsfield features 582 million transistors, though it has nearly twice as much cache. Barcelona-based processors will be manufactured on a 65nm fabrication process.

Power consumption of quad-core Barcelona processors is identical to dual-core counterparts. AMD has three thermal bins for Barcelona, similar to dual-core models. Standard, HE low-power and SE high-performance thermal bins will be available. However, AMD will not launch SE models until Q4’07.

AMD’s ACP (Average CPU Power) rating measures the entire CPU’s power draw, including cores, memory controller and HyperTransport links. The measurements are conducted using “commercially useful high utilization workloads,” according to AMD’s Barcelona presentation.

The workloads used to measure ACP include TPC-C, SPECcpu2006, SPECjbb2005 and STREAM. ACP ratings result in lower power consumption numbers, which the company claims are more reflective of real-world use than the overestimates of the TDP rating system.

AMD Opteron 2300 Series

Model      Frequency   TDP    Price
--         2.0 GHz     95W    $372
2347       1.9 GHz     95W    $312
2347 HE    1.9 GHz     68W    $372
2346 HE    1.8 GHz     68W    $251
2344 HE    1.7 GHz     --     --

AMD Opteron 8300 Series

Model      Frequency   TDP    Price
--         2.0 GHz     95W    $1,004
8347       1.9 GHz     95W    $774
8347 HE    1.9 GHz     68W    $861
8346 HE    1.8 GHz     68W    $688

AMD has nine Barcelona-based Opteron 2300 and 8300 series models set for launch. Launch clock speeds range from 1.7 GHz to 2.0 GHz. The company expects speeds to ramp up to 2.3 GHz and above in Q4’07 with SE-bin models, which typically have 120-Watt TDPs.

Expect AMD to debut Barcelona-based Opteron 2300 and 8300-series on September 10. Socket AM2 users looking for a quad-core processor will have to wait until later this year for Budapest-based single-socket Opteron or Agena-based Phenom X4 and FX processors.