# THE INSIDERS' GUIDE TO MICROPROCESSOR HARDWARE

# MAP1000 Unfolds at Equator: New Media Processor Covers a Lot of Territory

## by Peter N. Glaskowsky

After many years of research and development, Equator Technologies has announced its first media processor. The MAP1000 is the result of research into very long instruction word (VLIW) processing that began at Yale University in the late 1970s and produced minisupercomputers at Multiflow in the 1980s (see MPR 2/14/94, p. 18).

When Multiflow shut down in 1990, John O'Donnell and other Multiflow veterans established Equator Consulting to continue their research into VLIW chip design and compiler technology. In 1996, Equator Consulting was incorporated as Equator Technologies and, with Hitachi, began to develop the MAP1000. Hitachi will use the new chip in digital television products; both Hitachi and Equator are pursuing applications for media processors in personal computers, consumer electronics, telecommunications systems, and other markets.

### First Product Emphasizes Flexibility

The MAP1000 embodies the many hopes of its creators. Figure 1 shows the chip's complex internal structure, with a pair of 128-bit VLIW processor clusters with shared MMUs and caches, a video/graphics coprocessor, memory and dual PCI-bus controllers, and an array of peripheral controllers fed by a sophisticated data-streaming engine.

Equator does not expect any customer to use all of the MAP1000's on-chip peripherals. The device's MPEG-2 and 3D graphics features will not be needed in a remote-access modem concentrator connected to a T-1 line, for example. The plethora of peripherals imposes a price penalty for most uses, but cost-reduced versions of the chip can easily be created for specific high-volume applications.

The heart of the chip is a 128-bit VLIW engine that consists of two identical clusters, each able to perform two operations per clock. All four operations are encoded in a single 136-bit instruction word. Initial chips will run at 150 or 200 MHz. On 32-bit floating-point tasks, the MAP1000 hits a peak rate of 1.6 GFLOPS. For 16-bit integer multiply-accumulate operations, the MAP1000 peaks at 3.2 billion operations per second. Eight-bit graphics and multimedia operations can be processed at up to 20 billion operations per second. At these rates, the MAP1000 is much faster than Philips's current TriMedia TM-1100 but somewhat slower than the TM-1400, which is due to ship in mid-2000 and is expected to run at 300 MHz (see MPR 10/26/98, p. 33).
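The 32-bit and 16-bit peak rates are consistent with a simple model in which each cluster's IFG unit completes one full-width 128-bit SIMD operation per clock at 200 MHz. That model, including the assumptions that each lane counts as one operation and that the peaks are quoted at 200 MHz rather than 150 MHz, is ours; Equator has not published how the figures were derived. A minimal Python sketch of the arithmetic:

```python
# Peak-rate model (assumption, not Equator's published method): each of the
# two clusters completes one 128-bit partitioned SIMD operation per clock,
# counted as one operation per lane, with the chip running at 200 MHz.

CLUSTERS = 2
SIMD_WIDTH_BITS = 128
CLOCK_HZ = 200e6


def peak_ops_per_second(element_bits: int) -> float:
    """Peak lane-operations per second for a given SIMD element width."""
    lanes_per_cluster = SIMD_WIDTH_BITS // element_bits
    return CLUSTERS * lanes_per_cluster * CLOCK_HZ


print(f"32-bit FP:  {peak_ops_per_second(32) / 1e9:.1f} GFLOPS")  # 1.6
print(f"16-bit MAC: {peak_ops_per_second(16) / 1e9:.1f} G ops/s")  # 3.2
print(f"8-bit ops:  {peak_ops_per_second(8) / 1e9:.1f} G ops/s")   # 6.4
```

The same one-operation-per-lane model yields only 6.4 billion 8-bit operations per second, so the 20-billion figure presumably counts several operations per 8-bit lane per clock; the article does not say how Equator arrives at it.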



Figure 1. The MAP1000 includes bus and memory interfaces, graphics and video units, a VLIW processor, a data streamer engine, and many on-chip peripherals.


Each of the two clusters has one integer ALU (I-ALU) execution unit and one IFG (integer, floating-point, graphics) execution unit. Both units can execute 32-bit integer, logical, and string-search operations. The I-ALU also handles address calculation, load and store, branch, and system-control operations. The IFG unit performs all floating-point and 64-bit integer operations; additional logical operations, including shift, extract, and merge; and SIMD calculations on 8-, 16-, 32-, and 64-bit data.

Each cluster has sixty-four 32-bit general-purpose registers. Even/odd register pairs may be used as 64-bit registers for integer or floating-point data. Clusters also have sixteen 1-bit predicate registers, and almost all instructions can be predicated using a 4-bit predicate identifier. The IFG unit also has a pair of 128-bit registers for SIMD calculations.
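Predication lets the compiler replace short branches with instructions that always issue but commit their results only when a named predicate bit is set; a 4-bit identifier is exactly wide enough to name any of the 16 predicate registers. The sketch below models a branch-free maximum. The two-target compare (writing a result and its complement), the method names, and the register assignments are hypothetical illustrations, not Equator's documented instruction set.

```python
from typing import Optional

PRED_ID_BITS = 4
NUM_PREDICATES = 1 << PRED_ID_BITS  # sixteen 1-bit predicate registers
NUM_GPRS = 64                       # sixty-four 32-bit registers per cluster


class Cluster:
    """Toy model of one cluster's register state (hypothetical ISA)."""

    def __init__(self) -> None:
        self.regs = [0] * NUM_GPRS
        self.preds = [False] * NUM_PREDICATES

    def _enabled(self, pred: Optional[int]) -> bool:
        if pred is None:  # unpredicated instruction: always commits
            return True
        if not 0 <= pred < NUM_PREDICATES:
            raise ValueError("predicate id must fit in 4 bits")
        return self.preds[pred]

    def cmp_gt(self, p_true: int, p_false: int, ra: int, rb: int) -> None:
        """Hypothetical compare: p_true = (ra > rb), p_false = its complement."""
        result = self.regs[ra] > self.regs[rb]
        self.preds[p_true] = result
        self.preds[p_false] = not result

    def mov(self, rd: int, ra: int, pred: Optional[int] = None) -> None:
        """rd = ra, committed only if the guarding predicate is true."""
        if self._enabled(pred):
            self.regs[rd] = self.regs[ra]


def branch_free_max(a: int, b: int) -> int:
    c = Cluster()
    c.regs[1], c.regs[2] = a, b
    c.cmp_gt(1, 2, ra=1, rb=2)  # p1 = r1 > r2, p2 = not p1
    c.mov(3, 1, pred=1)         # both moves issue; only one commits
    c.mov(3, 2, pred=2)
    return c.regs[3]
```

Both moves occupy issue slots whether or not they commit, so if-conversion of this kind trades a little wasted work for the elimination of a branch, a trade that generally suits a wide, statically scheduled VLIW machine.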

The MAP1000 includes a four-way set-associative 16K data cache and a two-way set-associative 16K instruction cache. Instructions are stored in main memory and in the instruction cache in a compressed form to reduce bandwidth demands. The nonblocking data cache can handle any mix of four 64-bit accesses per cycle.
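To make the cache geometry concrete, the sketch below splits a 32-bit address into tag, set index, and byte offset for a 16K set-associative cache. The 32-byte line size is a hypothetical value chosen for illustration; the article does not state the MAP1000's line size.

```python
LINE_BYTES = 32  # hypothetical line size; not given in the article


def split_address(addr: int, cache_bytes: int, ways: int) -> tuple[int, int, int]:
    """Return (tag, set index, byte offset) for a set-associative cache."""
    sets = cache_bytes // (ways * LINE_BYTES)
    offset_bits = LINE_BYTES.bit_length() - 1
    index_bits = sets.bit_length() - 1
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


D_CACHE = dict(cache_bytes=16 * 1024, ways=4)  # 128 sets with 32-byte lines
I_CACHE = dict(cache_bytes=16 * 1024, ways=2)  # 256 sets with 32-byte lines

print(split_address(0x1234, **D_CACHE))  # (1, 17, 20)
```

Under these assumptions, data-cache addresses that are 4K apart (128 sets × 32 bytes) map to the same set, and four such lines can be resident at once.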

The chip also includes separate data and instruction MMUs that support up to 4G of memory with four page sizes, from 16K to 8M. The external SDRAM interface can be 32 or 64 bits wide and runs at up to 135 MHz. The SDRAM interface also supports SGRAM, flash, and ROM memory devices.
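A 4G address space corresponds to 32-bit addresses. The sketch below shows how page size changes the split between virtual page number and page offset, using the smallest (16K) and largest (8M) of the four page sizes; the two intermediate sizes are not given in the article, so they are left out. The function is a generic illustration, not Equator's MMU design.

```python
ADDRESS_BITS = 32              # 4G of address space
SMALL_PAGE = 16 * 1024         # 16K, smallest page size
LARGE_PAGE = 8 * 1024 * 1024   # 8M, largest page size


def split_virtual(addr: int, page_bytes: int) -> tuple[int, int]:
    """Return (virtual page number, offset within the page)."""
    if page_bytes <= 0 or page_bytes & (page_bytes - 1):
        raise ValueError("page size must be a power of two")
    offset_bits = page_bytes.bit_length() - 1
    return addr >> offset_bits, addr & (page_bytes - 1)


print(split_virtual(0x00804001, SMALL_PAGE))  # (513, 1)
print(split_virtual(0x00804001, LARGE_PAGE))  # (1, 16385)
print(LARGE_PAGE // SMALL_PAGE)               # 512
```

One 8M mapping covers as much memory as 512 separate 16K mappings, which is why a choice of page sizes is useful for large, contiguous buffers.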

Also provided are two PCI interfaces that can operate at up to 66 MHz. One of the PCI ports can be configured as a 2×AGP master interface for connection to PC core logic. The other PCI port allows the MAP1000 to act as an intelligent I/O controller for PCI peripheral chips. The same port can be used to connect multiple MAP1000s for more demanding tasks such as HDTV video encoding.

| I/O                      | Item                                         | Speed         |
|--------------------------|----------------------------------------------|---------------|
| System Bus               | AGP master                                   | 133 MHz       |
|                          | (or) Primary PCI                             | 66 MHz        |
|                          | Secondary PCI                                | 66 MHz        |
| Memory                   | SDRAM/SGRAM                                  | 135 MHz       |
|                          | Flash ROM                                    | 5 MHz         |
| CRT                      | Analog DAC                                   | 230 MHz       |
|                          | Horizontal/Vertical sync                     | 80 kHz/100 Hz |
|                          | I<sup>2</sup>C/DDC                           | 400 kHz       |
| Video                    | CCIR 656 out                                 | 54 MHz        |
|                          | CCIR 656 in                                  | 54 MHz        |
| Audio                    | AC97 AC link                                 | 12 MHz        |
|                          | I<sup>2</sup>S                               | 6 MHz         |
|                          | IEC 958                                      | 25 MHz        |
| Comm                     | POTS (AC link)                               | n/a           |
| Cable/satellite modem    | Transport-channel interface (serial mode)    | 150 MHz       |
| Serial                   | UART                                         | 115 kHz       |
|                          | USB                                          | 12 MHz        |

Table 1. The MAP1000 includes a wide array of on-chip peripheral controllers, suiting it to many different applications.

### Data Streamer Offloads CPU Core

The MAP1000's data streamer is essentially a powerful, flexible DMA engine. Transfers are managed by descriptors that define 32-bit source and destination addresses and an optional next-descriptor pointer to allow chaining. The descriptors can define 2D arrays of data using width and pitch values, so the streamer can handle the 2D bit-block transfers (BLTs) common in a windowed graphical user interface. The chaining feature facilitates scatter/gather DMA and batching of multiple transfers.
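A descriptor chain of this kind can be modeled as a linked list of transfer records. In the sketch below, the row count and the separate source and destination pitches are assumptions (the article mentions only addresses, width, pitch, and the next-descriptor pointer), and all field names are hypothetical. Running a chain against a flat memory image shows how one descriptor performs a 2D BLT and how chaining batches several transfers.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    """Hypothetical data-streamer descriptor; field names are illustrative."""
    src: int                           # 32-bit source address
    dst: int                           # 32-bit destination address
    width: int                         # bytes per row
    rows: int                          # number of rows (assumed field)
    src_pitch: int                     # bytes between source row starts
    dst_pitch: int                     # bytes between destination row starts
    next: Optional[Descriptor] = None  # chaining pointer; None ends the chain


def run_chain(mem: bytearray, desc: Optional[Descriptor]) -> int:
    """Execute a descriptor chain against a flat memory image; return bytes moved."""
    moved = 0
    while desc is not None:
        for row in range(desc.rows):
            s = desc.src + row * desc.src_pitch
            d = desc.dst + row * desc.dst_pitch
            mem[d:d + desc.width] = mem[s:s + desc.width]
            moved += desc.width
        desc = desc.next
    return moved


frame = bytearray(range(256))
# Copy a 4-byte x 3-row window from a 16-byte-pitch frame, then chain a 1D copy.
tail = Descriptor(src=200, dst=100, width=2, rows=1, src_pitch=0, dst_pitch=0)
blt = Descriptor(src=0, dst=128, width=4, rows=3, src_pitch=16, dst_pitch=16, next=tail)
print(run_chain(frame, blt))  # 14 bytes moved
```

Because the pitch is separate from the width, one descriptor can lift a rectangular window out of a larger frame buffer without the core issuing a transfer per row.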

When necessary, the data streamer preserves cache coherency, with various coherency modes available for both the source and destination data. The data streamer signals the VLIW core upon completion of a sequence of transfers; the core can also halt, then resume, transfers in progress. A 4K SRAM buffer within the data streamer can be allocated to as many as 64 FIFO queues, which are managed by hardware.
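The following sketch shows one way a 4K buffer could be carved into as many as 64 circular FIFO queues. The bump allocator, the byte-wide entries, and the class names are assumptions for illustration; the article does not describe how the hardware sizes or manages its queues.

```python
from typing import List, Optional

SRAM_BYTES = 4096  # 4K data-streamer buffer
MAX_QUEUES = 64


class FifoQueue:
    """Circular FIFO occupying mem[base:base + size] (byte-wide entries)."""

    def __init__(self, mem: bytearray, base: int, size: int) -> None:
        self.mem, self.base, self.size = mem, base, size
        self.head = self.tail = self.count = 0

    def push(self, value: int) -> bool:
        if self.count == self.size:
            return False  # queue full: the producer would have to wait
        self.mem[self.base + self.tail] = value
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return True

    def pop(self) -> Optional[int]:
        if self.count == 0:
            return None  # queue empty
        value = self.mem[self.base + self.head]
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return value


class StreamerBuffer:
    """Hypothetical bump allocator carving the 4K SRAM into FIFO queues."""

    def __init__(self) -> None:
        self.mem = bytearray(SRAM_BYTES)
        self.next_free = 0
        self.queues: List[FifoQueue] = []

    def allocate(self, size: int) -> FifoQueue:
        if len(self.queues) >= MAX_QUEUES:
            raise RuntimeError("all 64 queues already allocated")
        if size <= 0 or self.next_free + size > SRAM_BYTES:
            raise MemoryError("not enough space left in the 4K buffer")
        queue = FifoQueue(self.mem, self.next_free, size)
        self.next_free += size
        self.queues.append(queue)
        return queue
```

Split evenly, 64 queues get 64 bytes each (4,096 ÷ 64); an uneven split could instead give deeper queues to high-rate streams and shallower ones to slow devices.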

The streamer sits between the MAP1000's pair of internal 64-bit, 200-MHz data buses and the block of on-chip peripherals. With a single controller for data transfer, the peripherals themselves can be simpler—in some cases, little more than serializer/deserializer circuits. The data streamer has 800 Mbytes/s of aggregate bandwidth, greater than the combined peak bandwidth demand of all the peripherals on the MAP1000.

Table 1 shows the MAP1000's many peripheral interfaces. Other peripherals can be connected through the 53-pin Versa Port, a multiplexer that connects a subset of the on-chip peripheral signals to external devices. The Versa Port allows the MAP1000 to be configured for specific applications. The chip's T-1 framing circuit, for example, allows the MAP1000 to be used as the heart of a digital cellular base station, performing voice compression and decompression to interface a digital radio transceiver with a T-1 link to the cellphone company.

### Graphics and Video Handled in Hardware

Unlike the first-generation media-processor chips from Chromatic and Philips, Equator's MAP1000 includes fixed-function logic blocks for most graphics and video functions. The 3D graphics accelerator in the MAP1000 uses a tile-based rendering architecture similar to that of Microsoft's Talisman initiative (see MPR 8/26/96, p. 5). Other elements of Talisman, including texture compression and the dynamic compositing mechanism, were deemed unnecessary for a general-purpose media processor and are not included.

Equator has not characterized the features or performance of its 2D/3D graphics core. We believe it will be significantly slower than current mainstream 3D chips, which themselves are about as complex as the entire MAP1000. This deficiency will prevent the MAP1000 from being used as a graphics accelerator in any but the cheapest personal computers, but this is not a serious problem for Equator. Indeed, it may be a blessing in disguise; Chromatic's pursuit of the PC market was an expensive failure. For most practical applications of media processors, such as digital TVs or VCRs, video support is much more important than 3D graphics. Here, Equator has a more compelling performance story. The MAP1000 can handle simultaneous video encoding and decoding, a capability that previously required expensive fixed-function codecs such as C-Cube's DVx (see MPR 12/8/97, p. 1). The MAP1000 can perform these tasks along with system control functions.

The MAP1000 provides NTSC/PAL digital video input and output ports. Both require external analog converters. The chip also offers an analog RGB output to drive computer monitors or component-input video displays. The 230-MHz on-chip RAMDAC is among the peripherals managed by the data streamer. Up to 400 Mbytes/s of data can be sent to the RAMDAC, enough for a 24-bit true-color display at 1,280 × 1,024-pixel resolution.
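The display claim can be checked with a little arithmetic. The sketch below divides the 400-Mbyte/s RAMDAC bandwidth by the size of one 1,280 × 1,024 frame to get the highest refresh rate that bandwidth could sustain. It uses the average data rate only and ignores the higher instantaneous rate needed during the active part of each scan line; the article also does not say whether 24-bit pixels are stored in three or four bytes, so both cases are shown.

```python
RAMDAC_BYTES_PER_S = 400e6  # peak data rate into the RAMDAC


def max_refresh_hz(width: int, height: int, bytes_per_pixel: int) -> float:
    """Highest frame rate the RAMDAC path could feed, from average bandwidth only."""
    frame_bytes = width * height * bytes_per_pixel
    return RAMDAC_BYTES_PER_S / frame_bytes


print(f"{max_refresh_hz(1280, 1024, 3):.1f} Hz")  # packed 24-bit pixels: 101.7 Hz
print(f"{max_refresh_hz(1280, 1024, 4):.1f} Hz")  # 24-bit color in 32-bit words: 76.3 Hz
```

Both results are comfortably above a 60-Hz refresh rate.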

To display 1,920 × 1,080-pixel, 30-Hz interlaced HDTV video, the MAP1000 must first downsample and then upconvert the video, which effectively applies a low-pass filter to the image. This limitation may keep the chip out of some high-end digital-television decoders, but such products will be rare for the first few years of the DTV transition because of the extremely high cost of true HDTV-resolution displays.
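Why downsampling followed by upconversion acts as a low-pass filter can be seen on a single scan line. The sketch uses a 2:1 box-filter downsample and pixel-replication upconversion; both the ratio and the filters are assumptions, since the article does not say how the MAP1000 resamples HDTV frames.

```python
from typing import List


def downsample_2to1(row: List[float]) -> List[float]:
    """Average adjacent pixel pairs (a simple box filter), halving the width."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]


def upconvert_1to2(row: List[float]) -> List[float]:
    """Replicate each pixel to restore the original width."""
    out: List[float] = []
    for p in row:
        out.extend((p, p))
    return out


# Finest detail a full-resolution line can carry: alternating dark/bright pixels.
stripes = [0, 255] * 8
print(upconvert_1to2(downsample_2to1(stripes)))  # every pixel becomes 127.5
```

The alternating-pixel detail is averaged away entirely, while content that varies slowly across the line passes through nearly unchanged.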

The MAP1000 can also be used to decode and display digital television signals on conventional analog televisions, a function that is sure to be in high demand during the transition from analog to digital broadcasting. Combined with real-time video encoding and an external hard-disk interface, the MAP1000 provides all the processing required for a disk-based digital VCR like the recently announced ReplayTV product (*www.replaytv.com*), which uses a fixed-function MPEG-2 encoder/decoder chip and works only with standard-definition analog video.

### Equator Has New Attitude on Programming

Though the MAP1000's hardware architecture is sophisticated and innovative, perhaps the most interesting aspect of the new chip is Equator's strategy for software development. Other media processors are generally programmed in a mix of high-level C code and low-level assembly language. Repudiating this strategy entirely, O'Donnell describes his team as religiously opposed to assembly-language programming.

Equator has built its strategy around high-level programmability, and it has taken specific steps to both demonstrate and enforce its reliance on C. Outside software developers, such as the group at Hitachi that is developing the all-format decoding (AFD) software for digital television, use only C. Equator has provided these developers with tools that they claim produce highly efficient object code without manual optimization. To protect developers' intellectual property, Equator generally has no access to their source code.

This development model is different from those of other media-processor vendors. Chromatic handled all of the software development for its Mpact program, and Philips's internal developers must work closely with TriMedia customers. Mpact failed in part because Chromatic could not develop each multimedia codec independently from the others that would run on Mpact at the same time. For example, DVD playback requires code for MPEG-2 video, MPEG audio, Dolby Digital, and subtitle decoding. The Mpact tool chain lacked the ability to partition CPU resources in a way that would allow these routines to be written independently. This made it impossible for Chromatic's customers to add differentiating features to Mpact-based products.

Philips provides both C and assembly-language programming tools to TriMedia customers, but critical inner loops often must be written or optimized by Philips's own software engineers. Philips provides libraries of common multimedia codecs for TriMedia, but integrating these codecs with customer code requires substantial cooperation between Philips and its customers.

### Binary Compatibility Not Guaranteed

Equator also hopes to avoid another problem common to previous media processors—the need to preserve binary compatibility between product generations. Just as new x86 CPUs must remain fully backward-compatible with older x86 chips, Chromatic's Mpact 2 retained the ability to run Mpact 1 code. Similarly, Philips's forthcoming TriMedia CPU64 core (see MPR 10/26/98, p. 33) can still run 32-bit code written for the original TriMedia core.

Maintaining backward compatibility is even more difficult for media processors than for x86 CPUs. PC application software is written to work well on contemporary hardware and can usually expect to be the only task running on the system. The inevitable clock-speed increases from one CPU generation to the next ensure that older software will run acceptably, even if legacy modes (virtual 8086 mode, 80286 addressing modes, etc.) are not implemented as efficiently in newer chips.

In contrast, newer media processors are expected to be faster for older code as well as newer code, and they must run the old code in parallel with new tasks. For example, a new media processor may be expected to run existing MPEG-2 video-decoding logic in parallel with code for a new 3D graphical user interface.

Future Equator media processors will be fully compatible with the source code being written for the MAP1000, but Equator has made it clear to its customers that binary compatibility is not guaranteed from one product generation to the next. This will force customers to recompile their source code for each new chip—but by doing so, they will reap the benefits of any improvements in architectural parallelism.

Even with Equator's sophisticated compiler technology, the MAP1000's programming model is much more complex than that of a mainstream microprocessor. Equator provides a real-time operating system for the MAP1000 based on MMOSA, a software reference platform from Microsoft originally meant to support media processors on its Talisman reference design. MMOSA also handles communication between the MAP1000 and the host CPU in PC-based applications, such as digital TV-tuner cards.

### Pricing and Availability

The MAP1000 is available now. Equator has set the initial price at \$750 for single samples or \$200 each in high volume. The PC-based set-top reference boards cost \$6,333 per set, while the company's iMMedia development tools are \$12,000 per seat. More information is available from Equator's Web site, *www.equator.com*.

### Equator Has Designs on Set Tops

In a concrete expression of Equator and Hitachi's interest in consumer applications, Equator is developing reference designs for digital television controllers. The first design, a two-board set available now, is PC based. Equator will also offer a standalone "Super Set Top" platform that dispenses with the PC host processor, leveraging the MAP1000's ability to run sequential code and act as a PCI host. Both reference designs provide interfaces to satellite and cable TV systems, a telephone modem, video and audio I/O, a smart-card controller, and an IDE disk-drive controller for optional DVD and hard-disk drives.

Portions of these reference designs could be extracted to produce anything from a low-end cable-TV tuner with a smart user interface but no local storage to a complete home-theater control system. Customers who want to run a PC operating system will start with the PC-based configuration. The standalone set-top design will be more interesting to consumer-electronics companies.

Figure 2. The VLIW core and caches occupy roughly 55% of the MAP1000's die. MDR estimates the chip's die size at 260 mm<sup>2</sup>.

The MAP1000 will compete with several different architectures and many different chips for set-top designs. With little or no need to maintain compatibility with existing products, set-top designers are free to select among conventional processors with discrete hardwired decoder chips, programmable media processors such as TriMedia, and conventional CPUs equipped with integrated media-processing elements, such as the PowerPC G4 with AltiVec (see MPR 11/16/98, p. 17).

Equator believes it already has significant advantages over alternatives that combine a general-purpose CPU with hardwired decoding logic. The more multimedia algorithms a platform must support, the greater Equator's advantage. The most sophisticated digital TV products, however, will likely need to run a fully featured graphical user interface and handle the full range of multimedia data types available over the World Wide Web as well as from broadcast, cable, and satellite TV providers. While the MAP1000 appears able to handle the processing needs of such a system, we believe it will always lag behind PC-based alternatives in the availability of critical codec and application software.

### High Initial Chip Cost Will Decline Rapidly

As noted earlier, the MAP1000 has on-chip logic to support several different target applications. This logic imposes a cost penalty for any specific use. Figure 2 shows a die photo of the MAP1000, about 55% of which is devoted to the VLIW core and caches; the rest is occupied by peripherals and the pad ring. Most of the chip was designed with a full-custom layout. Some of the peripheral blocks were not performance-critical and were designed using standard-cell methods. Equator used Quickturn simulators to test the device prior to tapeout and has been working with actual silicon for six months.

We estimate the MAP1000 die size to be 260 mm<sup>2</sup>. Manufactured in Hitachi's 0.25-micron process and packaged in a 399-contact BGA, the MAP1000 has a manufacturing cost estimated at about \$100, according to the MDR Cost Model. Equator would not comment on these estimates or disclose the chip's transistor count.
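For context on how die size drives cost, a common first-order approximation for gross dies per wafer is N ≈ π(d/2)²/A − πd/√(2A), where d is the wafer diameter and A is the die area. This is a textbook estimate, not the MDR Cost Model, and the 200-mm wafer diameter is our assumption; the formula ignores yield, test, and packaging, all of which a real cost model includes.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> float:
    """First-order gross die count: wafer area / die area, minus edge losses."""
    radius = wafer_diameter_mm / 2
    area_term = math.pi * radius**2 / die_area_mm2
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return area_term - edge_term


# 260-mm^2 die on an assumed 200-mm wafer
print(round(gross_dies_per_wafer(200, 260)))  # 93
```

Dividing a wafer's processed cost by the number of good dies remaining after yield loss gives the silicon portion of the chip cost; the article does not give the wafer-cost or yield assumptions behind the MDR estimate.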

At \$200, the MAP1000 is priced to compete with high-end digital-TV chip sets from C-Cube (see MPR 11/16/98, p. 4) and other vendors. Equator's offering is more expensive but more complete than most. The MAP1000's integrated CPU and 2D/3D graphics features justify its higher price. Hitachi and Equator plan a DTV-optimized chip for 2H99 with better price/performance and a third chip with better graphics support for 2000.

Equator appears to have solved most of the serious problems that plagued the pioneers in the media-processor market. If Equator can achieve the full potential of its compiler technology and deliver on its roadmap, it could break the media-processor jinx and achieve genuine success.