The 14 nm Broadwell-EP Xeons have arrived. They bring a number of optimizations and features that should let Intel keep its advantage in this highly lucrative market.
Broadwell-EP is the new Xeon v4 family that replaces Haswell-EP (Xeon v3). Unsurprisingly, the move to 14 nm allows more cores to be added while keeping the same TDP. The processors are also compatible with the LGA2011-3 socket, provided the motherboard BIOS is updated. That is why we find the same QPI links, PCI-Express lanes and Ethernet controller as on Haswell-EP.
Frequencies less penalized by AVX
The architecture has changed, however. One of the major innovations is improved frequency management when running AVX instructions. A processor handling classic or SSE instructions runs at a higher frequency than when it is sent AVX instructions. On our review sample (a Xeon E5-2697 v4 with 18 cores), the normal frequency was 2.3 GHz. It dropped to 2 GHz when using AVX instructions.
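To put these two frequencies in perspective, here is a small Python sketch that computes the relative frequency loss under AVX and an idealized peak-throughput comparison. The 2.3 GHz and 2.0 GHz figures come from the review sample. The lane counts (4 single-precision floats in a 128-bit SSE register, 8 in a 256-bit AVX register) are standard, but the idea that throughput scales perfectly with width and frequency is a simplifying assumption of ours, not a measurement.

```python
# Frequencies observed on the Xeon E5-2697 v4 review sample (GHz).
BASE_GHZ = 2.3  # classic / SSE code
AVX_GHZ = 2.0   # AVX code

# Single-precision floats per vector register.
SSE_LANES = 4   # 128-bit registers
AVX_LANES = 8   # 256-bit registers


def relative_drop(base_ghz: float, reduced_ghz: float) -> float:
    """Fraction of the base frequency lost when running at reduced_ghz."""
    return (base_ghz - reduced_ghz) / base_ghz


def idealized_speedup(lanes_a: int, ghz_a: float, lanes_b: int, ghz_b: float) -> float:
    """Peak throughput of configuration b relative to a (lanes x frequency).

    Assumption: perfect scaling, which real code rarely achieves.
    """
    return (lanes_b * ghz_b) / (lanes_a * ghz_a)


drop = relative_drop(BASE_GHZ, AVX_GHZ)
speedup = idealized_speedup(SSE_LANES, BASE_GHZ, AVX_LANES, AVX_GHZ)
print(f"Frequency lost under AVX: {drop:.1%}")
print(f"Idealized AVX vs SSE peak throughput: {speedup:.2f}x")
```

Under these assumptions the frequency penalty is roughly 13%, which the wider registers more than offset for code that vectorizes well.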
Optimized floating point calculation
The transition to Broadwell allows 5.5% more instructions to be executed per clock cycle on average. Intel specifically optimized floating point calculations. The latency of vector and floating point multiplications drops from 5 clock cycles to 3. The new chip can also process 10 bits per cycle in its divisions (Radix-1024), thanks in part to a partially pipelined design: new data can be sent into the pipeline before the data already in flight has finished its journey through it.
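The effect of the shorter multiplication latency depends on whether operations depend on each other. The Python sketch below contrasts a chain of dependent multiplications (each one waits for the previous result) with independent multiplications fed into a pipeline. The 5-cycle and 3-cycle latencies and the 2.3 GHz clock come from the text. The assumption that the unit is fully pipelined and accepts one operation per cycle is ours, and the function names are purely illustrative.

```python
FREQ_GHZ = 2.3  # nominal frequency of the review sample


def dependent_chain_cycles(n_ops: int, latency: int) -> int:
    """Each operation waits for the previous result: n_ops * latency cycles."""
    return n_ops * latency


def pipelined_cycles(n_ops: int, latency: int) -> int:
    """Independent operations, one issued per cycle (assumed fully pipelined)."""
    if n_ops <= 0:
        return 0
    return latency + (n_ops - 1)


def cycles_to_ns(cycles: int, freq_ghz: float = FREQ_GHZ) -> float:
    """Convert a cycle count to nanoseconds at the given frequency."""
    return cycles / freq_ghz


n = 1000
for name, latency in (("Haswell-EP", 5), ("Broadwell-EP", 3)):
    dep = dependent_chain_cycles(n, latency)
    pipe = pipelined_cycles(n, latency)
    print(f"{name}: dependent chain {dep} cycles ({cycles_to_ns(dep):.0f} ns), "
          f"independent ops {pipe} cycles ({cycles_to_ns(pipe):.0f} ns)")
```

In this model the latency reduction matters most for dependent chains (5000 versus 3000 cycles), while independent operations are already hidden by pipelining (1004 versus 1002 cycles).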
An updated memory controller
The new memory controller finally supports faster memory, providing a performance gain of about 15%, according to Intel. It supports DDR4 Write CRC, a new form of error detection designed for this type of memory. It also supports 3DS LRDIMMs, memory modules that use stacked dies organized as a master die and slave dies. Only the master die communicates with the memory controller. The other dies (slaves) receive their data from the master die, which improves timings, optimizes management of the memory bus and increases total bandwidth.
|Characteristics||Xeon E5-2600 v3 (Haswell-EP)||Xeon E5-2600 v4 (Broadwell-EP)|
|Process node||22 nm||14 nm|
|Cores / threads per CPU||Up to 18/36||Up to 22/44|
|L3 cache||Up to 45 MB||Up to 55 MB|
|QPI||2 QPI 1.1 links at 6.4, 8 or 9.6 GT/s||2 QPI 1.1 links at 6.4, 8 or 9.6 GT/s|
|PCI-Express lanes||40 PCI-Express 3.0 (2.5, 5, 8 GT/s)||40 PCI-Express 3.0 (2.5, 5, 8 GT/s)|
|Memory controller||4-channel DDR4 (2133 MT/s)||4-channel DDR4 (2400 MT/s)|
|Memory configurations||Up to 8 channels, 24 slots and a capacity of 1536 GB||Up to 8 channels, 24 slots and a capacity of 1536 GB|
|Chipset||Intel C610 PCH (Wellsburg)||Intel C610 PCH (Wellsburg)|
|Ethernet controller||Up to 40 Gbit (Intel XL710 Fortville)||Up to 40 Gbit (Intel XL710 Fortville)|
|TDP||From 160 W (workstations) down to 55 W||From 160 W (workstations) down to 55 W|
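The memory speeds in the table translate into theoretical peak bandwidth as sketched below in Python. The transfer rates and the four channels per CPU come from the table. The 8-byte (64-bit) data bus per DDR4 channel is the standard width, and the helper name is just an illustration. This is a raw upper bound, not a measured figure.

```python
def peak_bandwidth_gbs(transfer_rate_mts: int, channels: int = 4, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s for one CPU.

    transfer_rate_mts: mega-transfers per second (e.g. 2400 for DDR4-2400)
    channels: memory channels per CPU (4 in the table above)
    bus_bytes: bytes moved per transfer on one channel (64-bit bus = 8 bytes)
    """
    return transfer_rate_mts * bus_bytes * channels / 1000


haswell = peak_bandwidth_gbs(2133)
broadwell = peak_bandwidth_gbs(2400)
gain = broadwell / haswell - 1

print(f"Haswell-EP   (DDR4-2133): {haswell:.1f} GB/s")
print(f"Broadwell-EP (DDR4-2400): {broadwell:.1f} GB/s")
print(f"Raw bandwidth gain: {gain:.1%}")
```

The raw gain works out to about 12.5%, so the roughly 15% quoted by Intel presumably includes improvements beyond the higher transfer rate alone.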
Architecture of Intel Broadwell-EP Xeon E5-2600 V4
The Broadwell-EP Xeons are available in three versions. The HCC (High Core Count) dies measure 18.1 mm x 25.1 mm and have 7.2 billion transistors. Then there is the MCC (Medium Core Count), which measures 16.2 mm x 18.9 mm and includes 4.7 billion transistors. Finally, the LCC (Low Core Count) has 3.2 billion transistors in a die of 16.2 mm x 15.2 mm. They all use a so-called ring architecture, meaning that the cores are arranged in two columns that communicate with each other over a ring that also passes through the memory controller and the QPI / PCI-Express module.
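The die dimensions and transistor counts above are enough to estimate area and transistor density. The short Python sketch below does that arithmetic. It treats each die as a plain rectangle, which is our simplification, and the dictionary layout is only for illustration.

```python
# (width mm, height mm, transistor count) as quoted in the text.
DIES = {
    "HCC": (18.1, 25.1, 7.2e9),
    "MCC": (16.2, 18.9, 4.7e9),
    "LCC": (16.2, 15.2, 3.2e9),
}


def area_mm2(width_mm: float, height_mm: float) -> float:
    """Die area, assuming a simple rectangle."""
    return width_mm * height_mm


def density_mtr_per_mm2(width_mm: float, height_mm: float, transistors: float) -> float:
    """Millions of transistors per square millimetre."""
    return transistors / area_mm2(width_mm, height_mm) / 1e6


for name, (w, h, count) in DIES.items():
    print(f"{name}: {area_mm2(w, h):.1f} mm^2, "
          f"{density_mtr_per_mm2(w, h, count):.1f} M transistors/mm^2")
```

This gives roughly 454 mm² for the HCC die, 306 mm² for the MCC and 246 mm² for the LCC.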
New symmetrical rings for more performance
This arrangement of cores is not new, but unlike on Haswell-EP, the columns are now symmetrical. On Haswell-EP, the HCC version had two more cores on one side than on the other. Intel now offers the same number of cores on both sides to optimize interconnections and cache access. Concretely, the HCC provides two rings of 11 cores each, while the MCC includes a ring of 10 cores and a half-ring of 5 cores. Finally, the LCC version offers a single ring with up to 10 cores.
Each core comes with 2.5 MB of L3 cache shared over the ring, and an access takes 12 clock cycles on average. Each ring has its own scheduler that takes care of coordinating accesses. The ring was designed to be bidirectional: a request can travel up or down the ring to reach the data as quickly as possible. Intel uses several rings on its HCC and MCC configurations to reduce cache access time and to parallelize accesses when the data sits on different rings. Traffic between the rings goes through a bridge, and switching from one ring to the other carries a penalty of about 5 clock cycles.
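The bidirectional ring and the per-core L3 slices can be illustrated with a toy model in Python. The 2.5 MB per core and the 5-cycle bridge penalty come from the text. Everything else is a hypothetical simplification: the model places one stop per core on the ring and ignores the other agents (memory controller, QPI / PCI-Express module), and the function names are ours.

```python
L3_PER_CORE_MB = 2.5        # from the text
BRIDGE_PENALTY_CYCLES = 5   # approximate cost of crossing between rings


def ring_hops(src: int, dst: int, stops: int) -> int:
    """Shortest hop count on a bidirectional ring with `stops` positions."""
    clockwise = (dst - src) % stops
    return min(clockwise, stops - clockwise)


def total_l3_mb(cores: int) -> float:
    """Total shared L3 when every core contributes one slice."""
    return cores * L3_PER_CORE_MB


def bridge_penalty(src_ring: int, dst_ring: int) -> int:
    """Extra cycles when the requested data lives on the other ring."""
    return 0 if src_ring == dst_ring else BRIDGE_PENALTY_CYCLES


# On an 11-stop ring, going from stop 0 to stop 9 is shorter the "other" way.
print(ring_hops(0, 9, 11))
print(total_l3_mb(22))
print(bridge_penalty(0, 1))
```

With 22 active cores the model yields 55 MB of L3, which matches the maximum listed for the Xeon E5-2600 v4, and 18 cores yield the 45 MB listed for the v3.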
One Broadwell-EP may be hiding another
The most perceptive readers will have noted that the diagram of the HCC version shows a total of 24 cores, but that the most powerful Xeon E5 v4 has "only" 22. Intel admits to having disabled two of them, but says there are indeed 24 working cores on the die. One can imagine that the chipmaker will release a more powerful version of Broadwell that uses the two cores that are disabled today.
Finally, it is interesting to note that there is one memory controller per ring and that each of them has four channels. This means that there could be eight channels on the HCC and MCC versions, but Intel uses only two channels per controller in these configurations, presumably to ensure compatibility with Haswell-EP motherboards. Using two controllers increases performance, since each of them has its own scheduler. Thus, even if the controllers are identical, the setup can be expected to be more efficient than the LCC configuration, which has only one controller. Finally, only the left ring has access to the QPI and PCI-Express lanes.
Optimisation and Virtualization: Broadwell-EP Xeon E5 review
Intel has introduced several technologies in Broadwell-EP to improve its performance in business environments.
Hardware Controlled Power Management (HWPM)
Normally, it is the server's operating system that tells the processor that certain tasks do not require many resources and advises it to reduce its frequency. Even if the processor complies, this communication often takes too long. With HWPM, the chip does not need to go through the operating system. It is enough to enable the feature in the BIOS and select one of four preset profiles to get power management that is determined entirely by the processor.
Communication within the die is significantly faster, so the frequency can be lowered or raised more quickly when the CPU load changes. The profiles range from "full power" to "minimum consumption". In addition, the processor uses information that is not available to software, which allows exemplary optimization, according to Intel. The presence of profiles shows that the chipmaker is still refining this technology.
According to our benchmarks, at identical core counts and frequencies, Broadwell-EP consumes slightly less power than Haswell-EP. The finer process is probably part of the reason, but HWPM should not be dismissed too quickly. It could become an important system for Intel once the chipmaker has perfected it.
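On a Linux server, one way to see whether hardware-managed P-states are exposed to the operating system is to look for the `hwp` flag in `/proc/cpuinfo`. The Python sketch below parses that file format. Whether a given Broadwell-EP system reports the flag depends on the BIOS mode and kernel, so treat the link between HWPM and this flag as an assumption on our part. The sample text is a made-up excerpt used only to exercise the parser.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def has_hwp(cpuinfo_text: str) -> bool:
    """True if the kernel reports hardware-managed P-states (the 'hwp' flag)."""
    return "hwp" in cpu_flags(cpuinfo_text)


# Hypothetical excerpt in /proc/cpuinfo format, not taken from a real machine.
SAMPLE = "processor\t: 0\nflags\t\t: fpu vme sse sse2 avx avx2 hwp hwp_notify\n"
print(has_hwp(SAMPLE))
```

On a real system the same check would be `has_hwp(open("/proc/cpuinfo").read())`.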
In computing, an interrupt is a signal sent to the processor to tell it that an event demands immediate attention. This traditionally causes the running program to be suspended, its current state to be saved, and a function to be launched to handle the event that caused the interrupt. Some interrupts are completely benign, while others are much more serious and can be caused by critical errors.
The posted interrupt feature of the new Xeons allows the processor to handle interrupts that occur within virtualized environments. Previously, an external interrupt had to pass through the hypervisor, which then forwarded it to the affected virtual machine. This had the major drawback of forcing the system to leave the virtual machine, switch to the hypervisor and then return to the guest operating system. With posted interrupts, the processor itself delivers the interrupt directly to the virtual machine, which handles it when it is active again. This greatly reduces latency and increases performance.
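The difference between the two paths can be sketched with a toy model in Python. It represents the posted-interrupt descriptor as a 256-bit bitmap of pending vectors plus a notification flag, and counts the transitions between guest and hypervisor in each scheme. This is a deliberately simplified illustration under our own assumptions, not Intel's implementation: class and function names are hypothetical, and real hardware handles priorities, notification vectors and scheduling that are ignored here.

```python
class PostedInterruptDescriptor:
    """Toy model: a 256-bit bitmap of pending vectors plus a notification flag."""

    def __init__(self) -> None:
        self.pending = 0          # one bit per interrupt vector (0-255)
        self.outstanding = False  # set while at least one vector is waiting

    def post(self, vector: int) -> None:
        """Record an interrupt without leaving the virtual machine."""
        if not 0 <= vector <= 255:
            raise ValueError("vector must be in 0..255")
        self.pending |= 1 << vector
        self.outstanding = True

    def drain(self) -> list:
        """Return the pending vectors in ascending order and clear the bitmap."""
        vectors = [v for v in range(256) if (self.pending >> v) & 1]
        self.pending = 0
        self.outstanding = False
        return vectors


def transitions_via_hypervisor(n_interrupts: int) -> int:
    """Classic path: one exit to the hypervisor and one re-entry per interrupt."""
    return 2 * n_interrupts


def transitions_posted(n_interrupts: int) -> int:
    """Posted path in this toy model: no hypervisor round trip."""
    return 0


pid = PostedInterruptDescriptor()
pid.post(0x41)
pid.post(0x30)
delivered = pid.drain()  # the guest picks these up when it is active again
print(delivered)
print(transitions_via_hypervisor(1000), transitions_posted(1000))
```

The point of the model is the transition count: every interrupt routed through the hypervisor costs a round trip, while posted interrupts simply accumulate in the descriptor until the guest processes them.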
Benchmarks and Power: Intel Broadwell-EP Xeon E5-2600 V4
Sysbench CPU
Sysbench CPU is a general synthetic test that attempts to measure the performance of the entire system. In this benchmark, the E5-2699 v3 takes a slight lead over the E5-2697 v4, but not by enough to suggest a problem. The two chips are very close to one another.
In UnixBench, the Whetstone test highlights processor performance in floating point calculations. It shows a gain of 20% when moving from Haswell-EP to Broadwell-EP, possibly due to the optimizations in the architecture. On the Dhrystone 2 test, which involves no floating point calculation, the gain is limited to about 4%.
NAMD and NAS Parallel
The NAMD test is designed to measure performance when running software optimized to take advantage of the chip's parallelism. The optimizations made to the ring architecture of Broadwell-EP deliver a gain of about 14% here. However, not all software shows the same gain. Under NAS (NASA Advanced Supercomputing) Parallel, which is supposed to measure the same thing, there is virtually no gain between the Haswell-EP architecture and the new Broadwell-EP.
Redis
Redis (REmote DIctionary Server) is used here as a benchmark that tests memory bandwidth and CPU performance. It shows that the faster memory controller, combined with the architectural changes, provides a performance gain of between 8% and 12%, depending on the operation.
Conclusion and Result: Broadwell-EP Xeon E5 review
The new Xeon E5 v4 server chips (Broadwell-EP) bring exciting new features and fine architectural optimizations. Nevertheless, this generally translates into a rather small performance gain, or none at all in some cases. But Intel is playing a different game.
Intel is not playing the performance game
Broadwell-EP is not there to win the performance crown, since Intel already holds it and no one is challenging its hegemony. ARM and PowerPC servers are a breed apart. AMD's Opterons are so far behind that nobody considers them. In short, the purpose of Broadwell-EP is not to set new records. In the best cases, the chips are 20% faster, especially when the application takes advantage of the floating point optimizations. In most typical situations, one should expect a gain of 10% or less.
Intel speaks the language of business
Intel should, on the other hand, appeal to the many companies that want to upgrade their Ivy Bridge and Sandy Bridge systems, or keep their Haswell-EP systems and simply swap the processor to extend the life of their servers or workstations. Those looking for more cores will find 4 more (and 6 in the more or less near future). Those who want to optimize their virtual machines or the security of their systems will find interesting features. Those who need to replace their Sandy Bridge or Ivy Bridge systems will see enough of a performance gain to justify the purchase. In the end, we see that the chipmaker really understands how to talk to companies and keep them loyal to its ecosystem. Intel's hegemony is not about to erode.