Bringing Light to Dark Silicon
Gray Silicon and FPGAs
by Kevin Morris
For the past few years, we’ve all been hearing the discussions about “Dark Silicon.” Besides being a really cool and ominous-sounding label, dark silicon is an issue that threatens to end multicore scaling on ICs. The reasoning goes like this: “Dennard Scaling” has ended. Dennard Scaling is the concept that power density remains constant as transistors shrink, which gives “Koomey’s Law” its teeth. Koomey’s Law says that performance-per-watt in computation has been improving by approximately a factor of two every 1.57 years.
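As a rough sanity check on that doubling rate, here is a minimal back-of-the-envelope sketch (the 1.57-year doubling period is the figure cited above; the ten-year horizon is just an illustrative choice):

    # Compound improvement implied by Koomey's Law (doubling every ~1.57 years).
    doubling_period_years = 1.57
    horizon_years = 10  # illustrative choice, not from the article
    improvement = 2 ** (horizon_years / doubling_period_years)
    print(f"Performance-per-watt gain over {horizon_years} years: ~{improvement:.0f}x")
    # prints roughly 83x

In other words, staying on that curve means roughly an eighty-fold improvement in computation-per-watt every decade, which is exactly the kind of gain that is at risk when Dennard Scaling ends.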
At the most recent process nodes, leakage current has caused Dennard Scaling to break down, and power density has been increasing rapidly. Increasing power density on top of Moore’s Law means that even though we can put more transistors on a chip, we can’t let them all operate at the same time without thermal runaway. We have to leave some of them dark at all times - hence, “dark silicon.”
If your chip is a big Swiss Army knife, with many functions but only one of them active at a time, dark silicon is your friend. You don’t have to power up the bottle opener when you’re using the screwdriver. But if your chip is (to choose a not-so-random example) a multi-core processor, it really doesn’t make sense to try to boost performance by adding additional processor cores if you already are at the limit of the number that can be powered on at any given time. If eight cores is the most you can have active at once, the ninth and tenth cores would be just so much wasted silicon. Therefore, dark silicon puts an end to computational performance improvement by multi-core scaling (at least homogeneous multi-core scaling, but we’ll discuss that detail later).
As we have discussed with our coverage of Intel’s bid to buy Altera, power has become the dominant factor in computing. In most cases - from mobile phones to laptops to servers to supercomputers - our computing performance is limited by the amount of energy available. This is why FPGAs are positioned to play such a crucial role in compute acceleration. They have the ability to deliver much more computation per watt than conventional processor architectures. FPGAs can pair with conventional processors to create a heterogeneous computing platform that can deliver much higher performance with lower power consumption when compared to conventional processors working alone.
But we have now reached the point where FPGAs themselves hit the threshold at which dark silicon becomes an issue. With the latest generations of FPGAs, you can’t run all of the logic on the chip at maximum frequency at the same time. To stay below the thermal limits of the device, some of your design has to sit idle or operate more slowly. In most types of chips, you can simply power-gate the portions of your design that are not in use, sending them to dark silicon land while the currently working part of your design does its job.
But FPGAs have always been very restricted when it comes to power gating. Since most FPGAs are configured with SRAM-like cells defining the LUT functions and the interconnect logic, you can’t simply shut off power to big sections of your chip without losing the configuration. Furthermore, unlike most types of chips, the chip designers can’t pre-plan what areas go dark under what conditions because the function of the chip is not known when it is built. The FPGA’s function is added later, and the parts that need to be active at any given time are determined by the particular design and even by the particular set of data and inputs on which the design is working. This makes it extremely challenging to build an FPGA with the kind of dark-silicon power-saving strategy that works in other types of devices.
In FPGAs, it’s more like “gray silicon” - not completely light or dark, but somewhere in between.
If we back off to the fundamentals of computation, however, we can look past the dark silicon issue and see what ultimately drives energy use in computation. All of us learned way back in EE school that dynamic power in MOSFETs comes from switching. Each time a transistor switches, energy is consumed, and for a given transistor size and process technology, that is a fairly constant unit of energy. So it really doesn’t matter whether that transistor is part of a conventional processor or part of some FPGA fabric. If we could take the particular computation we want to perform and count the transistor toggles required by each architectural approach, we would know the energy efficiency of each architecture for that computation.
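To make that toggle-counting argument concrete, here is a minimal sketch of it. The per-toggle energy and the toggle counts below are purely hypothetical placeholders, not measured figures; the point is only that total energy scales with the number of toggles, wherever those toggles happen.

    # Minimal sketch of the toggle-counting energy model described above.
    # All numbers are hypothetical placeholders, for illustration only.
    ENERGY_PER_TOGGLE_J = 1e-15  # assumed per-toggle switching energy (joules)

    # Hypothetical toggle counts for the same computation on different architectures.
    toggles_by_architecture = {
        "conventional CPU": 5_000_000,   # includes fetch/decode/housekeeping toggles
        "FPGA fabric": 500_000,          # configuration overhead, but no instruction stream
        "hard-wired logic": 50_000,      # only the toggles the computation itself needs
    }

    for arch, toggles in toggles_by_architecture.items():
        energy_j = toggles * ENERGY_PER_TOGGLE_J
        print(f"{arch}: ~{energy_j:.2e} J for this (hypothetical) computation")

Whatever the real counts turn out to be for a given workload, the architecture that needs the fewest toggles to get the answer wins on energy.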
If we apply this thinking to the spectrum of architectures available, we see that the conventional processor is clearly one of the least efficient. It spends a lot of cycles, toggles a lot of transistors, and fills a lot of wires with charge doing things like fetching instructions from memory, interpreting those instructions, incrementing program counters, and performing other housekeeping activities that are not associated directly with the computation at hand. At the other end of the scale, custom-designed hardware architectures are the most energy efficient. If you want to multiply two numbers together, you won’t get any more efficient than a custom-designed hardware multiplier.
This is a very close match to what QUIK’s Dr. Saxe has said about MCUs: roughly 70% of the energy goes to fetching things from memory, which is why the power-consumption charts put out by MCU vendors that leave this out aren’t very useful, and it is the reason for the FFE (a common task-specific hard-wired logic block). What more could an investor in their business ask for? Their approach is supported by the hard science expressed in this essay.
FPGA fabric provides a point in between those two extremes. The configuration logic in FPGAs is overhead that the hard-wired logic doesn’t require, but that configuration logic applies far less penalty than the overhead in the von Neumann processor. This difference in overhead accounts for the difference in energy efficiency between FPGA fabric and conventional von Neumann processors at any given technology node.
Modern FPGAs gain even more leverage by the inclusion of task-specific hard-wired logic blocks. For common tasks that are compute intensive, dedicated hardware can perform those functions with the lowest possible energy consumption. When we take all of this into account, we can see that FPGAs won’t hit the dark silicon wall as soon or as definitively as architectures like multi-core processors. FPGAs will approach darkness somewhat asymptotically, fading through shades of gray along the way.
Recently, both Xilinx and Altera have added substantial new power optimization capabilities to their FPGAs and to their design software. These capabilities permit much better isolation of logic that isn’t in use and much finer-grained control over power gating. The net result will be even more efficient FPGA-based designs, and a much more effective dimmer switch to keep the dark at bay.
The combination of FPGA fabric with conventional processors is one of the most compelling prospective advances in computation today. It has the potential to extend Koomey’s Law beyond the failure of Dennard Scaling, and even beyond Moore’s Law itself, and therefore to drive exciting new levels of computational performance before we run afoul of dark silicon or some other cunning limitation of physics.
Dr. Saxe has spoken about much of this in his talks over the past few months, and we can see where the S3 fits in.