# Cogency Pushes Asynchronous Logic

Startup Company Develops Asynchronous stDSP for LG Semicon

by Jim Turley

Proving that asynchronous logic really can make it out of the laboratory and into the market, Cogency has created an asynchronous 16-bit DSP that lowers power consumption by nearly half compared with a similar, but synchronous, version while matching it in performance. The small English company's first product, called stDSP, shows off its design talent and hopefully will bring more customers who are interested in its unusual approach to lowering power consumption and reducing electronic noise emissions.

Cogency is building its business as a fabless semiconductor vendor around the concept of asynchronous (or, in the company's parlance, "self-timed") logic, designing and producing low-power chips for its customers. For the company's first (and so far, only) customer, LG Semicon, Cogency designed an asynchronous chip that consumes about half the power of LG's own synchronous version. The stDSP, which is now sampling, is tailored for fax/modem applications and serves as Cogency's calling card for new customers interested in testing the self-timed waters.

# From England to ASIA

Founded in 1995, Cogency is made up of former LSI Logic employees and researchers from the University of Manchester. As with Amulet, which is an asynchronous implementation of an ARM7 microprocessor (see MPR 10/6/97, p. 18), Cogency started by taking an existing synchronous design and modifying it for asynchronous operation as a proof of concept. Unlike Amulet, however, Cogency's stDSP chip is designed to produce revenue.

Figure 1. Cogency's ASIA design framework links several asynchronous execution units to a central decode-and-dispatch unit via asynchronous request/acknowledge handshake signals.

The device is a 16-bit fixed-point DSP chip designed for its customer, LG Semicon. Actually, the chip was designed *by* LG Semicon, which contracted with Cogency to produce a version of the chip using asynchronous logic to reduce power consumption. This Cogency did, with the redesigned part consuming 47% less power, on average, than the original design. In all other respects, the two versions of the chip—LG's original synchronous design and Cogency's asynchronous implementation—are functionally interchangeable and pin-compatible. LG uses its synchronous version internally; the asynchronous part is currently available only from Cogency.

Cogency has two core competencies: asynchronous logic design and its ASIA (application-specific integrated architecture) design platform. ASIA is a hardware framework upon which the company hangs its asynchronous logic blocks. With ASIA, Cogency can offer customers a modular framework for DSPs, microprocessors, and microcontrollers that can implement different instruction sets, peripherals, and execution units for specific applications.

#### Basic Hardware Handshaking Paces Logic

Cogency's self-timed logic relies on a straightforward two-wire, four-phase handshake familiar to designers of VMEbus boards or other asynchronous systems. After presenting the data, the bus master asserts a request signal. Receivers latch data on the asserted edge of the request and signal their completion by asserting their own acknowledge signal. When all acknowledges have been received, the master negates its request and receivers negate their acknowledges.
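The four-phase sequence can be sketched as a sequential Python model. It is a minimal illustration, assuming one master and a few receivers that respond instantly; the class and method names (`Master`, `Receiver`, `sample`) are hypothetical, and real self-timed hardware performs these steps as concurrent signal transitions rather than function calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Receiver:
    name: str
    ack: bool = False
    latched: Optional[int] = None

    def sample(self, req: bool, data: int) -> None:
        if req and not self.ack:
            # Request asserted: latch the data, then assert acknowledge.
            self.latched = data
            self.ack = True
        elif not req and self.ack:
            # Request negated: return acknowledge to zero.
            self.ack = False


@dataclass
class Master:
    receivers: List[Receiver]
    req: bool = False
    trace: List[str] = field(default_factory=list)

    def transfer(self, data: int) -> None:
        # Phase 1: data is already on the bus; assert request.
        self.req = True
        self.trace.append("req+")
        for r in self.receivers:
            r.sample(self.req, data)
        # Phase 2: proceed only when every acknowledge is asserted.
        if not all(r.ack for r in self.receivers):
            raise RuntimeError("missing acknowledge")
        self.trace.append("all ack+")
        # Phase 3: negate request.
        self.req = False
        self.trace.append("req-")
        for r in self.receivers:
            r.sample(self.req, data)
        # Phase 4: every acknowledge is back at zero; the bus is idle again.
        if any(r.ack for r in self.receivers):
            raise RuntimeError("acknowledge stuck high")
        self.trace.append("all ack-")


rx = [Receiver("alu"), Receiver("mult")]
bus = Master(rx)
bus.transfer(0x1234)
print(bus.trace)                 # ['req+', 'all ack+', 'req-', 'all ack-']
print([r.latched for r in rx])   # [4660, 4660]
```

Because the master waits for every acknowledge before moving on, the slowest receiver sets the pace of each transfer; no clock period has to be chosen in advance.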

The stDSP architecture is fairly conventional for a 16-bit DSP. The programmer's model includes a single 32-bit accumulator, eight address registers, four registers used for circular addressing, a memory-page register, and a global control/status register. Most instructions are encoded in 16 bits, with some 32-bit instructions. Most operations execute in a single clock cycle (although "clock cycle" is a misnomer in this case; it is equivalent to a single execute stage), with some operations taking longer to complete.
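The register set listed above can be captured in a small Python data structure. This is a hypothetical sketch: the field names are invented, and the article does not say how the four circular-addressing registers are used. `circular_next` shows generic modulo addressing of the kind DSPs commonly provide, assuming each buffer is described by a base and a length; it is not the stDSP's documented behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProgrammerModel:
    acc: int = 0                                              # one 32-bit accumulator
    addr: List[int] = field(default_factory=lambda: [0] * 8)  # eight address registers
    circ: List[int] = field(default_factory=lambda: [0] * 4)  # four circular-addressing registers
    page: int = 0                                             # memory-page register
    status: int = 0                                           # global control/status register


def circular_next(addr: int, step: int, base: int, length: int) -> int:
    """Advance addr by step, wrapping within the buffer [base, base + length)."""
    return base + (addr - base + step) % length


regs = ProgrammerModel()
print(len(regs.addr), len(regs.circ))            # 8 4
print(hex(circular_next(0x107, 1, 0x100, 8)))    # 0x100
print(hex(circular_next(0x100, -1, 0x100, 8)))   # 0x107
```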

Figure 1 shows a block diagram of the stDSP's internal core design and serves as a good example of ASIA partitioning. A central instruction decoder manages control flow for the half-dozen independent execution units (multiplier, ALU, RAM controller, etc.). Each execution unit runs asynchronously with respect to the other units and is itself asynchronous internally.

With ASIA, each execution unit is separate from the others, with little communication among units. Because the chip design is asynchronous, each function unit can run at its own speed, independent of the others. All execution units connect to a central bus, which Cogency calls the schedule control bus. A single request signal is driven by the instruction decoder and bused to all of the execution units. Each function unit drives its own set of acknowledge signals back to the decoder and to each of the other execution units.

The instruction decoder fetches and decodes each instruction, converts it to an intermediate format, and drives the result onto the schedule control bus. The instruction decoder then drives its request signal active to indicate that a valid instruction is on the bus.

Each execution unit then examines the bus to determine whether it needs to participate in executing the instruction. Part of the decoded instruction format includes a small (2- or 3-bit) field for each execution unit that indicates its participation and any dependencies that may exist. If, for example, the ALU is dependent on the results of the multiplier, that dependence is indicated in the decoded instruction.

Any execution unit that is not participating in the current instruction signals its completion immediately via its acknowledge signal. When the instruction decoder senses that all its acknowledge inputs are active, the instruction is complete and another instruction can be decoded.

If the instruction includes dependencies between execution units, the dependent unit waits for its predecessor (which is identified in the decoded instruction) to assert its acknowledge signal before beginning its own execution. When the dependent execution unit finishes, it signals its completion to the instruction decoder.
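The dispatch rules above imply that an instruction finishes when its slowest dependency chain finishes, while idle units cost nothing. The Python sketch below models that timing under stated assumptions: each unit has a fixed, made-up latency, each unit waits for at most one predecessor, and dependencies form no cycles. `UnitField` and `completion_time` are hypothetical names, not part of Cogency's design.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class UnitField:
    participates: bool
    after: Optional[str] = None   # predecessor whose acknowledge must arrive first


def completion_time(fields: Dict[str, UnitField], latency_ns: Dict[str, float]) -> float:
    """Time at which the decoder sees every acknowledge asserted (acyclic deps assumed)."""
    ack_at: Dict[str, float] = {}

    def ack_time(unit: str) -> float:
        if unit not in ack_at:
            f = fields[unit]
            if not f.participates:
                ack_at[unit] = 0.0            # idle units acknowledge immediately
            else:
                start = ack_time(f.after) if f.after else 0.0
                ack_at[unit] = start + latency_ns[unit]
        return ack_at[unit]

    return max(ack_time(u) for u in fields)


# Hypothetical multiply-accumulate: the ALU waits for the multiplier.
mac = {
    "mult": UnitField(True),
    "alu": UnitField(True, after="mult"),
    "ram": UnitField(False),
}
latency = {"mult": 12.0, "alu": 5.0, "ram": 8.0}   # illustrative values, not measurements
print(completion_time(mac, latency))   # 17.0
```

If the same two units ran without a dependency, the result would be the larger of the two latencies rather than their sum.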

#### Asynchronous Code Is Done When It's Done

One of the peculiarities of an asynchronous processor or DSP is that its performance is difficult to gauge. Unlike synchronous designs, it's not possible to quantify performance with MIPS or MOPS, much less MHz. An asynchronous execution unit is done when it's done, and a processor is either fast enough for a given task or it isn't.

According to Cogency, the stDSP's overall performance is about equal to that of the LG chip running at 30 MHz. That is, tasks that could be completed on the 30-MHz part will generally run successfully on Cogency's stDSP, although portions of code may complete either faster or slower than they do on the synchronous part. Cogency points out that for some functions, the stDSP performs more like a 40-MHz part; conversely, on some functions, it's closer to a 20-MHz DSP.
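One way to express how a self-timed part can behave like a 40-MHz part on some code and a 20-MHz part on other code is to convert per-instruction completion times into an "equivalent clock." The Python sketch assumes a synchronous reference that retires one instruction per cycle, which the article says holds for most but not all operations; the function name and latency values are hypothetical.

```python
from typing import Sequence


def equivalent_mhz(latencies_ns: Sequence[float]) -> float:
    """Clock rate at which a one-instruction-per-cycle synchronous part
    would finish the same instruction sequence in the same total time."""
    total_ns = sum(latencies_ns)
    return len(latencies_ns) * 1e3 / total_ns   # instructions per ns, times 1000, gives MHz


print(equivalent_mhz([25.0] * 4))           # 40.0
print(equivalent_mhz([50.0] * 10))          # 20.0
print(equivalent_mhz([25.0, 50.0, 25.0]))   # 30.0
```

Because the figure depends on which instructions a routine uses, no single number captures the part, which is why Cogency quotes a range.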

The stDSP's external bus interface is one aspect of the chip that didn't change significantly during the transmogrification to asynchronous logic. Signal timings and bus protocols, which were nominally asynchronous anyway, converted straight across. Consequently, the two DSPs are pin-compatible, running from the same voltage and delivering comparable performance with the same external memory.

#### Noise Emissions Lowered

All of Cogency's diligent work would be little more than an interesting academic exercise if the stDSP didn't deliver some tangible advantage to its users. Because it offers no performance advantage, the chip's main benefit lies in doing the same work with less. In this case, the chip consumes about 500 mW, just over half the power, on average, of its synchronous twin. According to the company's simulations, the chip also emits much less radio-frequency (RF) energy, a major boon to makers of sensitive low-power wireless equipment such as digital telephones and pagers.

Assuming that the noise a circuit generates is proportional to its peak switched current and its frequency components, Cogency simulated the spectral content of both the stDSP and its synchronous predecessor from DC to 1.8 GHz; Figure 2 plots the results. Clearly, the asynchronous design emits an order of magnitude less energy at most frequencies.

Both chips show a primary frequency component at 60 MHz because of the two-phase, 30-MHz bus clock. However, the synchronous chip also emits harmonics of this frequency, which the asynchronous device does not. Although the amplitude of the harmonics diminishes with frequency, they are still evident above 1 GHz, well into the frequency range of most digital cellular telephones, pagers, and other wireless devices.
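Why periodic switching produces harmonics can be reproduced with a toy model in Python and NumPy. This is a sketch under stated assumptions, not Cogency's simulation: each switching event is a unit current spike, the synchronous core fires on every edge of a 60-MHz waveform, and the asynchronous core fires at the same average rate but at irregular, randomly drawn intervals. It models core switching only, not the 60-MHz bus clock both chips share, and it samples at 3.6 GHz so the spectrum spans DC to 1.8 GHz as in Figure 2.

```python
import numpy as np

FS_HZ = 3.6e9   # sample rate; Nyquist = 1.8 GHz, the top of Figure 2's range
N = 36_000      # 10 microseconds of samples
PERIOD = 60     # samples per 60-MHz period (3.6 GHz / 60 MHz)


def spectrum(events: np.ndarray) -> np.ndarray:
    """Magnitude spectrum of unit current spikes at the given sample indices."""
    x = np.zeros(N)
    x[events] = 1.0
    return np.abs(np.fft.rfft(x))


freqs = np.fft.rfftfreq(N, d=1 / FS_HZ)
rng = np.random.default_rng(0)

# Synchronous model: a switching spike on every 60-MHz edge.
sync_events = np.arange(0, N, PERIOD)

# Asynchronous model: same average rate, intervals drawn uniformly from 30..90 samples.
gaps = rng.integers(30, 91, size=2 * N // PERIOD)
async_events = np.cumsum(gaps)
async_events = async_events[async_events < N]

sync_mag = spectrum(sync_events)
async_mag = spectrum(async_events)


def at(mhz: float, mag: np.ndarray) -> float:
    """Magnitude at the FFT bin closest to the given frequency."""
    return float(mag[np.argmin(np.abs(freqs - mhz * 1e6))])


for h in (120, 600, 1200):   # 2nd, 10th, and 20th harmonics of 60 MHz
    print(f"{h:5d} MHz  sync={at(h, sync_mag):6.1f}  async={at(h, async_mag):6.1f}")
```

The periodic trace concentrates its energy in sharp lines at every multiple of 60 MHz, while the irregular trace spreads a similar number of switching events across the whole band, the qualitative difference Figure 2 shows.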

Obviously, such noise emissions are not crippling, or else we wouldn't have digital cell phones today. Cogency likes to point out, however, that any noise is bad noise, and less RF interference could translate into more sensitive receivers and longer battery life.



Figure 2. A plot of RF emissions from both the synchronous and asynchronous implementations shows the stDSP doesn't radiate RF energy at harmonics of the clock frequency.

# Price & Availability

Cogency's stDSP chip, which runs at an equivalent speed of 30 MHz, sells for \$38 in 10,000-unit quantities. The part is sampling now, with production scheduled for 1Q98. For more information, contact Cogency (Toronto, Ont.) at 416.487.6314 or Cogency UK (Manchester) at 44.161.428.9444, or visit *www.cogency.com*.

## Both Versions Use Same Fabrication Technology

Cogency's stDSP chip is fabricated by LG on its 0.6-micron two-layer-metal 5-V CMOS process. The part measures about 107 mm<sup>2</sup> and, as the die photo in Figure 3 shows, its size is dominated by the 18K of on-chip memory.

Compared with LG's original synchronous design, Cogency's chip has fewer logic transistors, although it is actually about 5.5 mm<sup>2</sup> larger in area. The asynchronous core has 54,839 transistors, while the more conventional version has 69,474, an extra 14,635 transistors, or nearly 27% more logic. Both chips use the same RAM, ROM, and peripheral blocks, so the core design is the only difference between them.
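The transistor arithmetic can be checked directly. The percentage depends on the base: the synchronous core has about 27% more transistors than the asynchronous one, while the asynchronous core has about 21% fewer than the synchronous one.

```python
async_core = 54_839   # transistors, Cogency's asynchronous core
sync_core = 69_474    # transistors, LG's synchronous core

extra = sync_core - async_core
more_pct = 100 * extra / async_core    # synchronous relative to asynchronous
fewer_pct = 100 * extra / sync_core    # asynchronous relative to synchronous

print(extra)                 # 14635
print(f"{more_pct:.1f}%")    # 26.7%
print(f"{fewer_pct:.1f}%")   # 21.1%
```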

Cogency's core is physically larger than LG's, even though it has significantly fewer transistors, because the layout was not optimized for area. The company expects a future version will shrink the silicon area somewhat.

# Design Flow Is Also Asynchronous

A fabless semiconductor company, Cogency invites customers to bring in their design ideas but understands that most—if not all—potential customers will be unfamiliar with asynchronous design techniques. Cogency provides design assistance and farms out the fabrication to foundries, such as LG.



Figure 3. Cogency's asynchronous DSP includes 18K of SRAM and 2K of ROM. The chip measures about 107 mm<sup>2</sup> in LG's 0.6-micron three-layer-metal CMOS process.

Design input can be as vague as a schematic or as concrete as a Verilog model. In fact, Cogency prefers working from Verilog descriptions even though they can't be used directly in the company's work flow, because they provide an unambiguous definition from which to work.

To get the best results from Cogency's ASIA framework, additional execution units should be asynchronous. Conventional synchronous blocks can be used as well, with the addition of an asynchronous "wrapper" around the logic that provides the request/acknowledge handshake with other execution units and the central instruction decoder.
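The wrapper idea can be sketched in Python: the wrapped block keeps its internal clock, and the wrapper turns "the block has finished" into an acknowledge. This is a hypothetical, sequential model; `SyncBlockWrapper` and the multicycle multiplier are invented for illustration, and the article does not describe how Cogency's wrappers generate the local clock or detect completion.

```python
from typing import Callable, Optional, Tuple

Step = Callable[[Tuple[int, int]], Tuple[bool, Optional[int]]]


class SyncBlockWrapper:
    """Hypothetical request/acknowledge wrapper around a clocked block."""

    def __init__(self, step: Step, max_cycles: int = 1000) -> None:
        self.step = step              # called once per local clock edge
        self.max_cycles = max_cycles
        self.ack = False
        self.result: Optional[int] = None
        self.cycles_used = 0

    def request(self, operand: Tuple[int, int]) -> None:
        # Request asserted: clock the block until it reports completion.
        for cycle in range(1, self.max_cycles + 1):
            done, value = self.step(operand)
            if done:
                self.result, self.cycles_used = value, cycle
                self.ack = True       # completion is reported as an acknowledge
                return
        raise RuntimeError("wrapped block did not finish")

    def release(self) -> None:
        # Request negated: return acknowledge to zero (four-phase handshake).
        self.ack = False


def make_multicycle_multiplier(cycles: int) -> Step:
    """Stand-in synchronous block that needs a fixed number of clock cycles."""
    count = 0

    def step(operand: Tuple[int, int]) -> Tuple[bool, Optional[int]]:
        nonlocal count
        count += 1
        if count == cycles:
            count = 0
            return True, operand[0] * operand[1]
        return False, None

    return step


w = SyncBlockWrapper(make_multicycle_multiplier(4))
w.request((6, 7))
print(w.ack, w.result, w.cycles_used)   # True 42 4
w.release()
print(w.ack)                            # False
```

To the decoder and the other units, the wrapped block looks like any other self-timed unit: it simply takes a while to acknowledge.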

Adding new execution units implies changing the instruction set, and this is exactly what Cogency encourages. Users can implement their own instructions, adding or subtracting operations that map to the hardware resources and suit the application requirements. All instructions are microcoded in the instruction decode block. The microcode provides explicit execution and dependency information to each of the function units; individual units are completely unaware of the other resources available and don't need to be modified if the instruction set changes. Cogency sees this modular design approach as one of ASIA's strengths.
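A hypothetical microcode table shows why adding an instruction need not touch the execution units. In this Python sketch, each entry lists the participating units and, for each, the unit it must wait for; `decode` expands an entry into one field per unit, roughly what the article says travels on the schedule control bus. Opcode and unit names are illustrative, and the real microword format is not described beyond its small per-unit fields.

```python
from typing import Dict, Optional, Tuple

UNITS = ("mult", "alu", "ram")

# opcode -> {participating unit: predecessor unit or None}
MICROCODE: Dict[str, Dict[str, Optional[str]]] = {
    "ADD": {"alu": None},
    "MPY": {"mult": None},
    "MAC": {"mult": None, "alu": "mult"},
}


def decode(opcode: str) -> Dict[str, Tuple[bool, Optional[str]]]:
    """Expand a microcode entry into a (participates, wait_for) field per unit."""
    word = MICROCODE[opcode]
    return {unit: (unit in word, word.get(unit)) for unit in UNITS}


# A customer-specific instruction is just a new table entry; no unit changes.
MICROCODE["MAC_ST"] = {"mult": None, "alu": "mult", "ram": "alu"}

print(decode("ADD"))
# {'mult': (False, None), 'alu': (True, None), 'ram': (False, None)}
print(decode("MAC_ST")["ram"])   # (True, 'alu')
```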

## Following a Different Drummer

Cogency's first product, the stDSP, meets the company's goals nicely: it performs just as well as the original DSP while consuming much less power and generating less RF noise. As an existence proof, it should go far toward convincing potential customers that Cogency can deliver on some of the claims made for asynchronous logic.

Where the company needs to expand is in its tool chain and its logic library. All of the stDSP's logic was designed by hand: Cogency's engineers started from the Verilog design files of the original synchronous DSP and painstakingly converted them manually. For commercial success, this process will have to be streamlined. Also, the stDSP incorporates just about every logic block Cogency has designed to date. Making a simpler, less capable part would be trivial; making a more ambitious one will involve considerable design effort.

Finally, Cogency's current design methods are tweaked for LG's fab lines. Although LG is a perfectly good supplier, Cogency may need to branch out in order to serve a broader range of customers.

The microprocessor and DSP industries are not wanting for engineers and designers with new and innovative ideas. Occasionally, some of those ideas make it beyond the halls of the university or the research lab. Even more occasionally they make it into mainstream, commercial parts. Very rarely do they become accepted practice.

To displace entrenched design methods, new techniques must consistently and clearly demonstrate their superiority in some dimension. Amulet and Cogency's stDSP are two small steps in that direction. In time, asynchronous logic could even become a common and accepted way to beat the clock.
