

# POWERPC 440GP: GREAT COMMUNICATOR

IBM's First Book-E PowerPC Combines Speed and Network Integration

By Tom R. Halfhill {7/31/00-01}

Merge a Corvette and a Cadillac and you'll get a Detroit disaster. Yet IBM has successfully created a similar hybrid by crossing a fast PowerPC 440 core with the luxury features of a highly integrated communications chip. The result is the PowerPC 440GP, which IBM disclosed last month at Embedded Processor Forum.

The 440GP is the first chip to use the PowerPC 440 embedded-processor core, which IBM previously unveiled at last fall's Microprocessor Forum (see *MPR 10/25/99-03*, "IBM PowerPC 440 Hits 1,000 Mips"). The 440GP is also the first implementation of Book E, the embedded PowerPC architecture defined by IBM and Motorola (see *MPR 5/10/99-02*, "PowerPC Architecture Gets Makeover"). In another first, it has a 128-bit version of IBM's on-chip CoreConnect bus (see *MPR 7/12/99-03*, "PowerPC 405GP Has CoreConnect Bus"). Reaching clock speeds of 400–500MHz, the 32-bit 440GP is also one of the fastest embedded processors on the market.

But performance is only half the story. As Figure 1 shows, the 440GP comes loaded with on-chip peripherals and interfaces: dual Ethernet media-access controllers (MACs), a double-data-rate (DDR) SDRAM controller, a PCI/PCI-X controller, 64K of cache, and 8K of 128-bit SRAM, plus a liberal assortment of UARTs, I<sup>2</sup>C interfaces, general-purpose I/O ports, and timers. IBM designed the 440GP for network-oriented embedded applications that need high performance, such as cellular base stations, storage area networks, RAID controllers, and network printers.

#### **Deeper Pipeline Enables Higher Frequencies**

Unlike the desktop PowerPC chips that Apple buys from IBM and Motorola, the 440GP has a seven-stage pipeline, two stages longer than the PowerPC G3 and G4 pipelines. That design provides some extra breathing room for higher clock frequencies. Indeed, in many respects, the 440GP would not be out of place in a desktop computer. It has a dual-issue superscalar core, out-of-order execution, dynamic branch prediction (with a 4K-entry branch history table), a 64-entry TLB, 64K of cache (32K each for instructions and data, 64-way set-associative), and 36-bit addressing. The only significant omission is an FPU. The 440GP has two integer units and one load/store unit.

**Figure 1.** Well appointed with controllers and peripherals, the PowerPC 440GP ties everything together with the CoreConnect bus, which consists of the processor local bus (PLB) and the on-chip peripheral bus (OPB).

IBM Microelectronics will manufacture the 440GP in its CMOS SA-27E process, which is nominally a 0.18-micron geometry. However, the effective gate length ($L_{eff}$) is 0.11 micron. The 440GP uses five metal layers, all copper. Its die size is 59mm<sup>2</sup>. Core voltage is 1.8V; the SDRAM interface requires 2.5V, and all other I/O tolerates 3.3V. Typical power consumption is about 1W for the processor core and 1–2W for the peripherals, or a total of less than 3W for the whole chip, according to IBM's estimates. The chip is packaged in a 552-contact ceramic BGA that measures 25mm square. Due to the large number of integrated peripherals and controllers, 412 of the chip's 552 pins are devoted to signal I/O.



Donald Senzig, senior PowerPC system engineer at IBM Microelectronics, describes the PowerPC 440GP at EPF2000.

At 400MHz, the 440GP delivers about 720 mips (Dhrystone 2.1). At 500MHz, performance increases to about 900 mips. IBM's goal of reaching 1,000 mips must wait until the processor hits 555MHz, which the initial chips will not quite attain. Samples of the 440GP will be available in 4Q00, with full production commencing in mid-2001. IBM has not yet announced pricing or the availability of faster versions of the part. However, we expect the 440GP to cost less than \$100 at 400MHz.
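The clock-speed arithmetic in the paragraph above can be checked with a short sketch. It assumes Dhrystone performance scales linearly with core frequency, which matches the quoted figures but is a simplification; the function names are illustrative only.

```python
# Dhrystone 2.1 figure quoted in the article: 720 mips at 400MHz.
MIPS_AT_400 = 720
MIPS_PER_MHZ = MIPS_AT_400 / 400  # 1.8, assuming linear scaling with clock


def mips_at(mhz):
    """Estimated Dhrystone mips at a given core clock (linear-scaling assumption)."""
    return MIPS_PER_MHZ * mhz


def mhz_for(mips):
    """Core clock needed to reach a target mips figure under the same assumption."""
    return mips / MIPS_PER_MHZ


for clock in (400, 500):
    print(f"{clock}MHz -> {mips_at(clock):.0f} mips")
print(f"1,000 mips needs about {mhz_for(1000):.1f}MHz")
```

At 1.8 mips per MHz, the 1,000-mips target falls just above 555MHz, consistent with the figure cited above.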



**Figure 2.** IBM simulated a 440GP with a CoreConnect-PLB bus of three different widths, settling on 128 bits as the best combination of efficiency and economy. Note that a 128-bit PLB actually has two independent 128-bit datapaths for simultaneous reads and writes.

Several features identify the 440GP as a versatile embedded processor. The interrupt controller supports 14 external and 48 internal interrupts, and with so many on-chip peripherals, it's easy to see why. The caches have three modes of operation: normal, locked, and transient. The first two modes are common in embedded processors, but the transient mode is a little unusual. Under program control, any part of a cache (in increments of 16 lines) can be set aside as a transient region. The cache controller protects the normal region from transient-victim replacements without locking it entirely. This is valuable for packet processing or MPEG decoding, because the 440GP can perform repetitive operations on blocks of data that move sequentially through the region.
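The effect of a transient region can be illustrated with a toy model of a single cache set. This is a hypothetical sketch, not IBM's design: the class name, the round-robin victim selection, and the 48/16 split of a 64-way set are all assumptions chosen for illustration.

```python
class PartitionedSet:
    """Toy model of one cache set split into a normal and a transient region.

    Hypothetical illustration only: the article does not describe the 440GP's
    replacement logic, so round-robin victim selection is an assumption.
    """

    def __init__(self, ways, transient_ways):
        self.lines = [None] * ways              # tag held in each way
        self.split = ways - transient_ways      # ways [0, split) are normal
        self.next_victim = {False: 0, True: self.split}

    def access(self, tag, transient=False):
        """Return True on a hit; on a miss, fill a way in the caller's region only."""
        if tag in self.lines:
            return True
        lo, hi = (self.split, len(self.lines)) if transient else (0, self.split)
        way = self.next_victim[transient]
        self.lines[way] = tag
        self.next_victim[transient] = lo + (way + 1 - lo) % (hi - lo)
        return False


cache = PartitionedSet(ways=64, transient_ways=16)

# Load a working set (for example, hot code or table entries) into the normal region.
working_set = [f"hot{i}" for i in range(48)]
for tag in working_set:
    cache.access(tag)

# Stream 1,000 packet buffers through the transient region.
for i in range(1000):
    cache.access(f"pkt{i}", transient=True)

# The streaming data recycled only the transient ways.
still_cached = all(cache.access(tag) for tag in working_set)
print("working set still cached:", still_cached)
```

In this model the streaming data only ever evicts other streaming data, so the working set survives without being locked.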

In another nod to network processing, the caches are content-addressable memories (CAMRAMs), not ordinary SRAM arrays. CAMRAMs are faster for table lookups, so the 440GP can access cached portions of router tables more efficiently than other processors can.

#### **World's Fastest CoreConnect Bus**

To provide enough internal bandwidth for the 440GP's panoply of peripherals, IBM implemented the first 128-bit version of its CoreConnect bus. As Figure 1 shows, CoreConnect consists of two independent buses joined by a bridge: the processor local bus (PLB) and the on-chip peripheral bus (OPB). (Not shown in the figure is a tributary of the CoreConnect bus that links the peripherals' device-control registers.)

In the 440GP, the PLB can run at 1:2.5, 1:3, 1:3.5, or 1:4 ratios of the core frequency, up to 133MHz. Because the PLB actually has two 128-bit datapaths for simultaneous reads and writes, the peak bandwidth is 4.2GB/s. The 32-bit OPB runs at 66MHz, providing up to 266MB/s of peak bandwidth. Dynamic bus sizing allows the OPB to work with 8-, 16-, or 32-bit devices. As is evident in the figure, the PLB handles high-bandwidth devices like the SDRAM and PCI controllers, while the OPB keeps slower devices from impeding traffic on the PLB. Both buses have 36-bit addressing.
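The clock ratios and peak-bandwidth figures above follow from simple arithmetic, sketched here. The helper names are illustrative, and the nominal clocks are assumed to be 133.3MHz (400/3) and 66.7MHz (200/3); the 0.5MHz tolerance is likewise an assumption so that 133.3MHz counts as the quoted 133MHz.

```python
PLB_MAX_MHZ = 133                      # ceiling quoted in the article
PLB_RATIOS = (2.5, 3.0, 3.5, 4.0)      # core clock divided by PLB clock


def plb_options(core_mhz):
    """PLB clocks (MHz) reachable from a core clock without exceeding the ceiling."""
    # The 0.5MHz tolerance treats the nominal 133.3MHz (400/3) as the quoted 133MHz.
    return {
        f"1:{r:g}": core_mhz / r
        for r in PLB_RATIOS
        if core_mhz / r <= PLB_MAX_MHZ + 0.5
    }


def peak_mb_per_s(width_bits, clock_mhz, datapaths=1):
    """Peak bandwidth in MB/s: bytes per transfer x million transfers/s x datapaths."""
    return width_bits / 8 * clock_mhz * datapaths


for core in (400, 500):
    for ratio, mhz in plb_options(core).items():
        print(f"core {core}MHz at {ratio} -> PLB {mhz:.1f}MHz")

plb = peak_mb_per_s(128, 400 / 3, datapaths=2)   # separate read and write datapaths
opb = peak_mb_per_s(32, 200 / 3)
print(f"PLB peak: {plb / 1000:.2f}GB/s, OPB peak: {opb:.0f}MB/s")
```

Under these assumptions a 400MHz core can use the 1:3 ratio to hit the PLB's top speed, while a 500MHz core must drop to 1:4 (125MHz). The computed peaks come out slightly above the quoted 4.2GB/s and 266MB/s, which appear to be truncated rather than rounded.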

The memory controller works with 32- or 64-bit DDR-SDRAM modules at bus frequencies up to 133MHz. With two data phases per bus cycle, that would be equivalent to single-data-rate SDRAMs at 266MHz, or 2.1GB/s of peak bandwidth to main memory. Eight-bit error correction is optional.

The PCI controller runs asynchronously with the PLB and works with 32- or 64-bit PCI devices at 33MHz or 66MHz, or with 64-bit PCI-X devices up to 133MHz. Compliance with the PCI-X 1.0 specification and a 64-bit PCI bus give the 440GP an edge over other PCI-enabled embedded processors, such as IDT's RC32334, Hitachi's SH7751, Motorola's MPC8240, and QED's RM5720. Those chips don't support the PCI-X standard, and they have 32-bit PCI buses.

For off-chip SRAM, ROM, and peripherals, the 440GP has another bus controller with eight chip selects and separate 32-bit address and data buses. (The data bus also works with 8- and 16-bit devices.) A four-channel DMA controller supports scatter/gather operations and burst/nonburst transfers.

Dual Ethernet MACs complete the picture. Each MAC supports 10- or 100Mb/s transfer rates (regular Ethernet and Fast Ethernet). The only thing missing is a PHY (physical-layer interface), but Ethernet PHY chips can be had for about \$5. An affordable multiport Ethernet switch could be built around the 440GP and a multiport PHY.

The 440GP's unusually rich endowment of bus interfaces explains why IBM expanded the CoreConnect bus to 128 bits. Figure 2 shows the results of IBM's simulations with bus widths of 64 bits, 128 bits, and 256 bits. A 64-bit CoreConnect bus is quickly overwhelmed by the combined demands of the DDR-SDRAM, PCI-X, dual Ethernet, and peripheral interfaces. A 256-bit CoreConnect bus would provide the most bandwidth, of course. But a 128-bit bus is easier to route and less costly to implement, and it's capable of managing the 440GP's heavy I/O loads without significant transaction delays.
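A back-of-envelope comparison suggests why a 64-bit bus falls short. The sketch below is not IBM's simulation: it simply adds up the peak rates of the main interfaces, using figures quoted in this article, and compares the total with the PLB's peak capacity at each width. It ignores OPB peripherals, arbitration overhead, and real traffic patterns, and the function name is illustrative.

```python
def peak_gb_per_s(width_bits, clock_mhz, per_clock=1):
    """Peak bandwidth in GB/s; per_clock is 2 for DDR or for dual read/write datapaths."""
    return width_bits / 8 * clock_mhz * per_clock / 1000


# Peak demand of each interface, from figures quoted in the article.
demand = {
    "DDR SDRAM (64-bit, 133MHz, 2 transfers/clock)": peak_gb_per_s(64, 133, 2),
    "PCI-X (64-bit, 133MHz)": peak_gb_per_s(64, 133),
    "2 x Fast Ethernet (100Mb/s each, one direction)": 2 * 100 / 8 / 1000,
}
total_demand = sum(demand.values())

# PLB capacity at 133MHz, counting both the read and the write datapath.
capacity = {width: peak_gb_per_s(width, 133, 2) for width in (64, 128, 256)}

for name, gbps in demand.items():
    print(f"{name}: {gbps:.3f}GB/s")
for width, cap in capacity.items():
    verdict = "enough" if cap >= total_demand else "short"
    print(f"{width}-bit PLB: {cap:.2f}GB/s peak vs {total_demand:.2f}GB/s demand -> {verdict}")
```

Peak rates are upper bounds that real workloads rarely sustain together, so this is only a rough plausibility check. Even so, the DDR interface alone matches the 64-bit PLB's peak capacity, while the 128-bit bus leaves headroom above the combined total.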

#### **Fast Speeds and Feeds**

Few embedded processors have Ethernet interfaces, and the 440GP's combination of dual Ethernet MACs with DDR-SDRAM and PCI-X controllers, all wrapped around a fast, superscalar CPU core, is unique. Table 1 shows some of the potential competitors, including IBM's own PowerPC 405GP, Hitachi's SH7615 (see *MPR 1/24/00-02*, "Hitachi SH7615 Adds Ethernet"), Infineon's TriCore-based Harrier-XT, and Motorola's PowerQUICC II MPC8260 (see *MPR 9/14/98-02*, "MPC8260 Masters Network Control" and *MPR 7/24/00-05*, "New Motorola PowerQUICC II Costs Less").

All these processors have DSP extensions in the form of multiply-accumulate instructions. All the competing chips from other vendors are slower than the 440GP, but they're also shipping today, whereas the 440GP won't be available in production quantities until mid-2001. And until IBM announces pricing, it's uncertain how many of the competing chips are less expensive than the 440GP. Only the 440GP has PCI-X, but Motorola's PowerQUICC processors and Infineon's Harrier-XT have some networking features not found in the 440GP, such as high-level data link control (HDLC) ports, ATM segmentation-and-reassembly (SAR) units, and more Ethernet ports. The Harrier-XT, however, lacks an SDRAM controller. For low-power applications, Hitachi's SH7615 is the clear winner, because it consumes a miserly 690mW.

## Price & Availability

Samples of IBM's PowerPC 440GP will be available in 4Q00, and full production is scheduled for mid-2001. IBM has not yet announced pricing, but the 400MHz parts will probably cost less than \$100 in 10,000-unit quantities. For more information, see *www.chips.ibm.com/products/powerpc/*.

One puzzling omission from the 440GP is the CodePack decompression engine that's included in the lower-end 405GP. CodePack is IBM's scheme for decompressing program code on the fly, and it's a useful feature for an embedded processor with a 32-bit RISC instruction set (see *MPR 10/26/98-05*, "PowerPC Adopts Code Compression"). When code density matters more than performance, the Hitachi and Infineon processors will have an advantage over the 440GP. Although the SH7615 and Harrier-XT are 32-bit processors, the SuperH architecture uses compact 16-bit instructions, and the TriCore architecture has a mixture of 16- and 32-bit instructions.

The 440GP's closest competition will probably come from Motorola's highly integrated PowerQUICC chips, which are similarly well endowed with network interfaces and peripheral I/O. Although PowerQUICC II processors are based on an older, slower PowerPC core (the 603e), they do have an FPU (useful for cryptography) and a communications processor module that offloads some low-level networking tasks from the main CPU. Some PowerQUICC chips have four Ethernet MACs, twice as many as the 440GP.

| Feature       | IBM PPC 440GP | IBM PPC 405GP | Hitachi SH7615 | Infineon Harrier-XT | Motorola MPC8260         |
|---------------|---------------|---------------|----------------|---------------------|--------------------------|
| Architecture  | PowerPC       | PowerPC       | SuperH         | TriCore             | PowerPC                  |
| CPU Core      | PPC 440       | PPC 405       | SH-DSP         | TriCore V1          | PPC 603e                 |
| Core Freq     | 400–500MHz    | 200–266MHz    | 60MHz          | 50MHz               | 200MHz                   |
| Superscalar?  | 2-way         | No            | No             | No                  | 2-way<sup>‡</sup>        |
| FPU?          | No            | No            | No             | No                  | Yes                      |
| Ethernet MACs | 2 x 10/100Mb  | 1 x 10/100Mb  | 1 x 10/100Mb   | 1 x 10/100Mb        | 3 x 100Mb<br>or 4 x 10Mb |
| SDRAM Ctrl    | DDR-266       | SDR-133       | SDR-60         | SDR-50              | SDR-66                   |
| PCI Bus       | 32/64 bits    | 32 bits       | 32 bits        | —                   | —                        |
| PCI Freq      | 33/66MHz      | 33/66MHz      | 33/66MHz       | —                   | —                        |
| PCI-X?        | Yes           | No            | No             | No                  | No                       |
| Cache (I/D)   | 32K/32K       | 16K/8K        | 4K unified     | —                   | 16K/16K                  |
| On-Chip SRAM  | 8K            | 4K            | 8K             | 48K<sup>†</sup>     | 24K                      |
| Dhrystone 2.1 | 720–900 mips  | 375 mips      | n/a            | n/a                 | 280 mips                 |
| Power (typ)   | 2W–3W*        | 1.1W–1.5W     | 690mW          | 1.9W                | 3.8W                     |
| Price (10K)   | n/a           | \$29–\$51     | \$28           | \$39 (1K)           | \$100                    |
| Availability  | Mid-2001      | 3Q00          | Now            | Now                 | Now                      |

**Table 1.** These chips have widely varying features, but all of them aspire to win designs for network-oriented embedded systems. (\*IBM's estimated power consumption. <sup>†</sup>The Harrier-XT's 48K SRAM works in various configurations as cache and scratchpad memory. <sup>‡</sup>The MPC8260 can issue three instructions in parallel if one is a branch. n/a = data not available.)

IBM appears to be building a family of network-oriented embedded processors that someday could be as broad as Motorola's PowerQUICC line. The 440GP and 405GP are early members of that family. Future members will reach for higher clock speeds and likely integrate even more features. It will be interesting to see how Motorola's first implementation of Book E stacks up against IBM's, and how well the two companies can manage their odd-couple relationship as simultaneous PowerPC partners and rivals. ♦


© MICRODESIGN RESOURCES ◇ JULY 31, 2000 ◇ MICROPROCESSOR REPORT