Source: BYTE, Fourth Annual Special Edition, 1987 Page 83+
Authors: Ciro Cornejo and Raymond Lee
Note: With the limited amount of accessible
technical information on IBM's Micro Channel Architecture, looking at the
period articles on MCA provides context. However, IBM's tech writing was not
infallible, "Technoslovakian" can be stilted, and terminology and concepts were
IBM-centric. Plus, IBM clarified "Reserved" functions after the initial release
of Micro Channel. So when you read articles like this, always remember that it
might have been true AT THAT TIME... YMMV, LFO.
The 32-bit bus has finally arrived for the personal computer in the form of
Apple's Macintosh II and IBM's Personal System/2. Central to each of these
machines is a 32-bit bus capable of high speed operations at the bandwidth
required for today's 16-megahertz processors.
As you might expect, though, Apple and IBM have adopted dissimilar bus
architectures. NuBus, developed by MIT and Texas Instruments, has been adapted
by Apple as the Macintosh II system bus. It supplements the Macintosh II's
private 68020 processor bus, and six slots open the microcomputer for
expansion. IBM, on the other hand, has revamped the older IBM PC and AT bus to
handle higher speeds and 32-bit processing. The new IBM bus, the Micro Channel,
serves as both a CPU bus and a system bus.
It's no accident that these new computers contain new bus architectures.
Today's new microcomputers require more than just increased processor power and
expanded memory. Investing in these products means committing to computing
platforms that must be stable up to and perhaps through the mid-1990s. This
requires a bus-based architecture capable of adapting to expanding processing
rates, to coprocessing or multiprocessing, and to new peripherals such as
advanced graphics terminals. I'll examine the two buses with regard to these
capabilities.
Why a Bus?
Why have a special bus at all? Most processors actually define a bus
structure of address and data paths called a local, or CPU, bus. The reason for
this is straightforward: to build generality into systems.
A local bus is structured to optimize the processor-to-memory bandwidth. It
is therefore highly processor-dependent: It is tightly linked to its processor,
memory, and specific support peripherals. The cost of this performance is a
loss of flexibility. A local bus might be unable to take advantage of newer
technologies if they differ significantly from the local bus's design.
The existing IBM PC or AT buses are examples of an expanded local bus. An
expanded local bus is a local bus with extensions that provide a set of
generalized signals. These additional signals offer a general architecture that
is easy to interface with. Since they use many of the processor's signals,
expanded local buses are still processor-specific. For example, the IBM PC and
AT buses are designed around the Intel 80x8x microprocessor architecture, and
they have problems accommodating large memory expansions. The PC is limited to
1 megabyte of RAM (the 640K-byte limit is imposed by the layout of the PC BIOS),
and the AT to 16 megabytes.
Unlike local buses, system buses are designed to maximize hardware
subsystem-to-subsystem transfers. System buses offer a general protocol, or
transfer method, for system CPUs or peripherals to interchange data. This is
accomplished by treating the bus as a resource. To get control of the resource,
a peripheral or processor must request its use formally, in competition with
others. With this general approach, you can add peripherals, special functions,
and even full computer subsystems easily to a system bus. The bus integrates
hardware, cards, and subsystems into one smoothly running machine, much as an
operating system integrates applications programs. The more general the
integrating mechanism, the easier it is to add functionality and avoid
obsolescence. Moreover, these buses are processor-independent. For example,
NuBus defines a generalized address space that requires no processor-specific
signals for peripheral or I/O accesses.
Overview of the Buses
Table 1: A comparison of the two buses. Not all bus signals are included.
The new IBM Micro Channel has evolved from the earlier PC and AT buses. Like
them, it is a CPU or local bus once removed. As a local bus, it optimizes the
host CPU-to-memory bandwidth, using a special transfer method termed "matched
memory cycles," which I'll discuss later. It is also a system bus, in that it
is treated as a system resource.
Taking an opposite tack, the Apple NuBus is a full system bus. It is
independent of the Macintosh II's host processor; in fact, in the Mac II the
motherboard is treated as a NuBus slot.
Both buses create a memory-mapped system. Each card or hardware entity is
addressed within this bus address space. The 16-bit PS/2 systems, the Models 50
and 60, address a 24-bit space, or 16 megabytes; the PS/2 32-bit Model 80 and
the Macintosh II NuBus address a full 32-bit space, or 4 gigabytes. Like its
predecessors, the Micro Channel also has a 64K-byte I/O space.
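The sizes quoted above follow directly from the number of address bits. As a quick check, here is a minimal sketch; the function and constant names are illustrative, and the 16-bit width of the I/O address is inferred from the 64K-byte figure rather than stated in the text.

```python
# Size of an address space, given the number of address bits.
def address_space_bytes(address_bits: int) -> int:
    return 1 << address_bits

K_BYTE = 1 << 10
MEGABYTE = 1 << 20
GIGABYTE = 1 << 30

# 24-bit bus address (PS/2 Models 50 and 60)
print(address_space_bytes(24) // MEGABYTE, "megabytes")
# 32-bit bus address (PS/2 Model 80 and the Macintosh II NuBus)
print(address_space_bytes(32) // GIGABYTE, "gigabytes")
# 16-bit I/O address (assumed width for the Micro Channel I/O space)
print(address_space_bytes(16) // K_BYTE, "K bytes")
```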
Each bus entity can be defined as a master or a slave. A master entity can
request and get control of the bus. A master must own the bus to send or
receive data from another target entity on the bus, which can be another master
or a slave unit. A bus slave unit cannot own the bus, but it can request
service through an interrupt signal to one of the bus masters.
The masters contend for ownership of the bus resource via an arbitration
protocol, which I'll describe later. Both buses allow multiple masters.
However, only the Apple NuBus provides mechanisms for true multiprocessing: bus
and resource locks. Bus locking allows a processor to lock a bus for exclusive
access. With resource locking, a shared resource, such as RAM on a card with
its own local processor, is locked so that the local processor can't access it.
Both types of locks are necessary to prevent one processor from interfering
with or corrupting memory that another processor is using.
While the IBM Micro Channel does permit multiple masters, there is not much
to be gained in going to multiple host processors. This is because the Micro
Channel is also an extension of the CPU bus. Processor memory operations tie up
the Micro Channel, making the bus a bottleneck for the concurrent operation of
two host processors. It should be noted that IBM, for efficiency, allows the
host processor to access system motherboard memory without passing through the
Micro Channel bus.
Also hindering multiprocessing on the Micro Channel is the absence of any
direct provisions for bus or resource locking, although the 80386 in the IBM
PS/2 Model 80 has hardware for bus locking. While not intended for true
host-level multiprocessing, the Micro Channel does offer a general interface
for drop-in co-processing. The PS/2 host processors can be easily supplemented
by powerful co-processors, such as array and floating point processors, or AI
compute engines.
Timing: The Critical Element
As a logic designer once said, "There are three important aspects of a
digital design that must be carefully monitored: timing, timing, and timing."
This is still true, especially for computer and bus designs. Difficulties
usually start when one block of logic has to talk to another block, especially
if they each rely on different clock signals. This requires that the signals be
synchronized to be passed from one logic block to another. A transmitting
signal from a flip-flop strobed with one clock must be picked up and strobed
into a receiving flip-flop using a second clock. The two clocks, transmitting
and receiving, are asynchronous; no fixed relationship exists between them.
Thus, it can take one receiving clock period to synch up to the transmitting
data.
Buses, like logic, define synchronous or asynchronous interactions. In a
synchronous bus, all interactions are defined in terms of a fixed bus clock or
cycle. The bus clock edges define when data is valid and when to strobe it.
Moreover, all transactions are in multiples of these bus cycles. The Apple
NuBus is a synchronous bus.
Instead of relying on a fixed clock, an asynchronous bus is controlled by
handshaking signals. A command signal is sent to a target adapter or card that
responds with an acknowledge signal upon completion of a data transfer. All bus
timing is dependent on the signals themselves. The IBM Micro Channel is an
asynchronous bus, although it supports certain synchronous transfers.
Both the IBM Micro Channel and the Apple NuBus pass a common clock through
the bus to minimize the synchronization problem among bus entities. However,
there is a clock mismatch between the Macintosh II's local bus and NuBus,
requiring synchronization before a transfer can occur.
The Macintosh II's 68020 runs with a 15.7-MHz clock, while NuBus runs with a
10-MHz clock. Synchronization delays between these bus clocks are minimized by
using high-frequency clock signals. The NuBus 10-MHz clock is divided down from
a 40-MHz crystal; the 68020 15.7-MHz clock is divided down from a 31.4-MHz
crystal. The cost of clock synchronization is thus held to one clock period,
either 25 or 31.5 nanoseconds. Clock synchronization is accomplished through
the application-specific integrated circuit (ASIC), that is, the "GLU" custom
gate array on the Mac II motherboard, and the NuBus timing control logic.
Synching up between the bus processes (i.e., bus reads or writes) also
exacts a time penalty. The requesting bus must wait for the other bus to
complete its current transaction cycle before it can attempt a transfer. All
NuBus operations are defined with respect to its 10-MHz system clock. This
clock has a 25 percent duty cycle: It is false (or high) for 75 ns and true (or
low) for 25 ns.
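The NuBus clock figures fit together arithmetically. The sketch below only restates numbers given in the text; the variable names are illustrative, and "asserted fraction" is used here in the article's sense, with the clock counted as true while it is low.

```python
# Figures taken from the text; names are illustrative only.
CRYSTAL_MHZ = 40      # crystal the NuBus clock is divided down from
BUS_CLOCK_MHZ = 10    # NuBus system clock
HIGH_NS = 75          # clock false (high) time per cycle
LOW_NS = 25           # clock true (low) time per cycle

divide_ratio = CRYSTAL_MHZ // BUS_CLOCK_MHZ        # crystal cycles per bus cycle
period_ns = 1000 / BUS_CLOCK_MHZ                   # length of one bus cycle in ns
asserted_fraction = LOW_NS / (HIGH_NS + LOW_NS)    # share of the cycle spent true

print(divide_ratio, period_ns, asserted_fraction)
```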
Normally, a transfer from NuBus to the local bus takes a full 68020
instruction cycle (about 400 to 500 ns) to synch up. Going the other way, a
Macintosh II request can take a typical NuBus transaction of 2 bus cycles
(about 200 ns) to synch. It must be noted that this type of delay is not out of
the ordinary; it is the time penalty paid by the communications protocol
between the CPU bus and the system bus.
The IBM Micro Channel is an asynchronous bus, and all operations are gauged
by the transmitted and returned signals. A common 14.3-MHz clock, OSC, is
provided on the bus, eliminating the problem of signal synching. Moreover, a
delayed signal will be picked up by the next clock, providing a built-in safety
net for bus operations.
Bus to Bus
To distinguish between the two sets of bus signals, I'll stick to each bus's
naming conventions. NuBus active low signals are labeled as signal_name*, while
IBM uses its own convention for labeling an active low signal:
-signal_name.
The NuBus is a simple and elegant bus that matches Apple's minimalist
approach toward hardware. The NuBus has only 51 signals, including two parity
signals not used by Apple. The IBM Micro Channel has 77 and 111 signals for the
16- and 32-bit versions, respectively. All Micro Channel signals are TTL-logic
compatible. Table 1 compares signals between the NuBus and the Micro Channel,
and you can see a great deal of similarity between the two buses. The
arbitration and utility signals almost match.
But there are differences. NuBus is multiplexed, sharing data and address on
common lines, while the Micro Channel is non-multiplexed, providing lines for
both address and data. The IBM Micro Channel defines a number of discrete
interrupts (-IRQ 3-7, 9-12, and 14-15) that can be shared among the boards. The
Apple implementation, on the other hand, defines an interrupt (NMRQ*) per slot
that is fed separately into the Macintosh II interrupt logic for
processing.
The Micro Channel has a number of signals for coordinating asynchronous
handshakes: The signals -ADL, -CMD, and -MMC CMD provide the basic bus
handshake edges. Hardware signals are also used to delineate bus sizing (-BE0
through -BE3), 32-bit operation (-CD DS 32(n)), and 24-bit addressing (MADE
24). See the text box "Micro Channel Timing" on page 85 for more information on
the bus cycles.
Figure A: Micro Channel Timing
A special set of signals (-MMC, -MMCR, and -MMC CMD) is used in matched
memory cycles to ensure fast CPU-to-memory accesses for the 80386. A matched
memory cycle is started by the target slave returning an -MMCR request signal
after being addressed by the system CPU. The 80386 responds by driving the
faster -MMC CMD handshake signal instead of the -CMD during a bus cycle.
Matched memory cycles provide a bus read transaction in three clocks at 16 MHz,
or 187.5 ns, while standard cycles using the -CMD handshake signal run four or
more system clocks for a minimum of 250 ns. Matched memory cycles can be run
with both 16- and 32-bit channel devices.
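The two cycle times quoted above are simply clock counts multiplied by the 62.5-ns period of a 16-MHz clock. A minimal sketch of that arithmetic, with a helper name of our own choosing:

```python
def cycle_time_ns(clocks: int, clock_mhz: float) -> float:
    """Duration of a bus cycle that lasts `clocks` periods of a clock in MHz."""
    return clocks * 1000.0 / clock_mhz

matched_ns = cycle_time_ns(3, 16)    # matched memory cycle: three clocks
standard_ns = cycle_time_ns(4, 16)   # shortest standard cycle: four clocks

print(matched_ns, standard_ns)
print("time saved per access:", standard_ns - matched_ns, "ns")
```

On these figures a matched memory cycle takes three-quarters of the time of the shortest standard cycle.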
In contrast, the NuBus synchronous operations are relatively simple,
requiring no special signals or exception processing. NuBus timing, however, is
more stringent than the Micro Channel's, fitting sending and strobing of
signals and data within 75 ns in the 100-ns clock cycle. See the text box
"Apple NuBus Timing" below for more details on NuBus bus cycles.
Figure B: Apple NuBus Timing
NuBus defines a byte/word structure that matches the Intel 80x8x addressing
schemes (byte order 0, 1, 2, 3), not the Macintosh's 68020 scheme (byte order
3, 2, 1, 0). The bus transceivers are wired to map the data from NuBus order
into the Macintosh byte order. Bus sizing is handled automatically; the bus
handles byte (8 bits), half-word (16 bits), and word (32 bits) sizes.
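The Mac II performs this mapping in the wiring of the bus transceivers, not in software. Purely as an illustration of what the reordering does to a full 32-bit word, here is a sketch with a hypothetical function name; it ignores how byte and half-word transfers are steered.

```python
def swap_byte_lanes(word: int) -> int:
    """Reverse the four bytes of a 32-bit word (byte order 0,1,2,3 -> 3,2,1,0)."""
    as_bytes = (word & 0xFFFFFFFF).to_bytes(4, "little")
    return int.from_bytes(as_bytes, "big")

print(hex(swap_byte_lanes(0x11223344)))
```

Applying the mapping twice returns the original word, which is why the same wiring serves transfers in both directions.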
The NuBus specification defines a block, or burst mode, that can move up to
sixteen 32-bit words in a transaction, but Apple has not implemented it in the
Mac II NuBus design. IBM, however, has implemented a burst mode in the Micro
Channel in conjunction with direct memory access. This DMA burst capability
allows large blocks of data to be moved while minimizing bus overhead. In fact,
each peripheral on the channel can be viewed as a DMA channel.
When accessed by the DMA controller, a card can assert -BURST, guaranteeing
bus ownership for block transfers. Thereafter, data is transferred using only
the -CMD signal to define data valid for both the read and write stages. The
block transfer ends when the card deasserts the -BURST line for the last cycle.
For predefined transfers, the DMA controller marks the last cycle by asserting
the terminal count line (-TC).
A DMA controller can transfer 64K bytes of data between a peripheral and
memory, the same as in an IBM PC. The PS/2 DMA controller can handle 24-bit
read and write addresses, unlike the PC's 20-bit address limit. Unfortunately,
this DMA capability is limited to transfers of 8- or 16-bit data.
Bus Address Space
Both the NuBus and the Micro Channel map bus addresses into a full bus
address space that includes system memory and ROM, setup ROM, and device buffer
space. Analogous to a CPU bus, these buses provide access to locations in that
space.
The IBM implementation maps into a 16-megabyte or a 4-gigabyte address
space. The bus address space is the same as the CPU address space. In this
respect, the Micro Channel acts as a local CPU bus. The system board RAM,
either 512K bytes or 640K bytes, starts at 00000 hexadecimal. The 128K-byte
video RAM and channel ROM are mapped into the lower address pages. Topping off
the memory space at E0000h through FFFFFh is the 128K bytes of system board ROM
or RAM, depending upon how the computer's resources have been allocated. RAM
mappings above address FFFFFh are managed in 1-megabyte chunks. See
Figure 1 for a memory map of an IBM PS/2 Model 80. Bits in a memory-encoding
register and a split-address register determine how and where memory will be
allocated.
Figure 1: The PS/2 Model 80 Memory Map
The memory arrangement is determined by the contents of the memory-encoding
and split-address registers. SBR is system board RAM; CR is channel RAM. System
board RAM and channel RAM are allocated in 1-megabyte chunks above address
FFFFFh, with the exception of the split-system RAM. The system ROM at addresses
E0000 through FFFFF is a copy of the system ROM at addresses FFFE0000 through
FFFFFFFF.
Bus memory space for the Apple NuBus implementation doesn't match the Macintosh
II's 68020 processor address space. The upper one-sixteenth, or 256 megabytes,
of the NuBus 4-gigabyte address space is called the slot space. This slot space
is divided into 16 sections, one for each NuBus slot, and each slot owns 16
megabytes of the space. The top of each slot address space is reserved for a
slot-declaration ROM that is accessed at that address. The slot a card occupies
on NuBus determines its slot identification, which in turn determines its
arbitration level and its location in the slot address space.
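The slot-space layout lends itself to a short calculation. The sketch below assumes the 16 sections are laid out in ascending slot-ID order, which the text implies but does not state outright; the function and constant names are ours.

```python
from typing import Optional, Tuple

SLOT_SPACE_BASE = 0xF0000000   # start of the upper one-sixteenth of 4 gigabytes
SLOT_SIZE = 16 * 1024 * 1024   # each slot owns 16 megabytes

def slot_space(slot_id: int) -> Tuple[int, int]:
    """First and last NuBus address owned by a slot (assumes ascending order)."""
    if not 0 <= slot_id <= 0xF:
        raise ValueError("NuBus slot IDs run from 0h through Fh")
    base = SLOT_SPACE_BASE + slot_id * SLOT_SIZE
    return base, base + SLOT_SIZE - 1

def slot_of(address: int) -> Optional[int]:
    """Slot ID owning a 32-bit address, or None if it lies below slot space."""
    if not SLOT_SPACE_BASE <= address <= 0xFFFFFFFF:
        return None
    return (address - SLOT_SPACE_BASE) // SLOT_SIZE

first, last = slot_space(0x9)
print(hex(first), hex(last))
```

Under this assumption, the slot ID is simply the second hexadecimal digit of any address in slot space.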
NuBus defines 16 slots, but the Macintosh II provides six. The six slots
have IDs of 9h through Eh. Slot 0 is the Mac II motherboard, and slot F (which
does not have a physical slot) is reserved. One slot becomes the video buffer
for the machine, depending upon which slot the video card is placed in. Slots 1
through 8 are unused, because no room exists in the 24-bit address space for
them. For this reason, the existing slots are limited to 1 megabyte of slot
space instead of 16 megabytes.
Apple's implementation of NuBus allows a slot to own a "superslot" space of
256 megabytes, as well as its 16-megabyte slot space at the top of NuBus
memory. We won't discuss superslots further, since they aren't accessible by
the Mac II, although you should note that other cards on NuBus could use these
areas. See Figure 2 for a detailed look at the Macintosh II memory map and its
arrangement in the NuBus address space.
Figure 2: The Macintosh II Memory Map
The 24-bit address space for the Macintosh II starts at 0h with 8 megabytes
of RAM, followed by 1 megabyte of ROM, then 6 megabytes of slot space, and
topped by a 1-megabyte region of memory-mapped I/O devices. The Mac II's
24-bit address space is mapped into the 32-bit NuBus address space by placing
the RAM, ROM, and I/O areas at the bottom of the NuBus address space. However,
from the NuBus side, the Mac II's ROM appears at addresses F0800000h to
F0FFFFFFh, and the I/O area maps to F0000000h through F07FFFFFh.
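The 24-bit map described above can be written down as a small table. The sketch assumes the four regions are contiguous in the order listed, and that the 6 megabytes of slot space are split into 1-megabyte pieces in ascending slot order; the names are illustrative.

```python
MB = 1 << 20

# (name, start, size) for each region of the 24-bit Mac II address space
REGIONS = [
    ("RAM", 0 * MB, 8 * MB),
    ("ROM", 8 * MB, 1 * MB),
    ("slot space", 9 * MB, 6 * MB),
    ("I/O", 15 * MB, 1 * MB),
]

def region_of(address: int) -> str:
    """Name of the region containing a 24-bit address."""
    if not 0 <= address < 16 * MB:
        raise ValueError("address is outside the 24-bit space")
    for name, start, size in REGIONS:
        if start <= address < start + size:
            return name
    raise AssertionError("regions should cover the whole space")

def slot_id_of(address: int):
    """Slot ID for a slot-space address (assumes 1 megabyte per slot)."""
    if region_of(address) != "slot space":
        return None
    return address >> 20

print(region_of(0x900000), hex(slot_id_of(0x900000)))
```

With that layout the six 1-megabyte pieces begin at 900000h through E00000h, which lines up with the slot IDs of 9h through Eh.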
Under this scheme, the maximum RAM that can be accessed on the local bus is
8 megabytes, using 1-megabyte single in-line memory modules (SIMMs). The Mac
II's motherboard RAM can be expanded to 128 megabytes if and when
higher-density SIMMs are available. However, you can add more RAM to the system
through the NuBus slots, and vendors are now supplying NuBus memory cards.
The Macintosh II is currently restricted to 24-bit addressing or 16
megabytes when running with the current operating system. An Apple Unix
implementation (A/UX) is in the works that will handle 32-bit addressing and
requires a memory-management unit for virtual-memory processing.
Bus Ownership
Both buses use arbitration to allocate ownership of the bus to a single
master when several masters request use of the bus. Arbitration typically takes
place concurrently with bus transactions on both buses, but the Micro Channel
allows a system configuration that restricts arbitration to non-concurrent
operation.
NuBus arbitrations take two full bus cycles, or 200 ns, to select the next
bus owner. On the Micro Channel, arbitrations typically take 300 ns.
Each bus uses distributed arbitration to select the next bus owner; that is,
logic on each card outputs the arbitration level on four arbitration lines
(either ARB0* through ARB3*, or -ARB0 through -ARB3) and determines the winner
of each arbitration contest based on the signals on these lines. The
arbitration level is determined in NuBus by the card's slot ID, with 0 being
the lowest priority and Fh being the highest. For the Micro Channel, the
arbitration level is stored on the card when it is configured into the system.
The highest priority a card can have is level 0, and the lowest is Fh. See
Table 2 for a comparison of the arbitration levels. The Micro Channel also has
a Central Arbitration Control Point, which is some logic on the PS/2
motherboard, that controls the start and winner of an arbitration contest.
Table 2: Priority Levels and Device Assignments
The priority levels are programmed into Micro Channel cards when they
are configured into the system; NuBus priorities depend upon the slot the
card is in.
To compete for ownership, the master asserts its request line (RQST* for
NuBus, -PREEMPT for Micro Channel). For the Micro Channel, the Central
Arbitration Control Point drives the ARB/-GNT line to the arbitrate state,
allowing the arbitration contest to begin. Each master then places its
arbitration level onto the 4-bit arbitration bus. If a competing master has
output a higher-priority level, the master will cease to compete for ownership
for the
next bus transaction. It will, however, hold its asserted request line to
compete for the following bus transaction. On NuBus, at this point, the winner
of the contest owns the bus. On the Micro Channel, the Central Arbitration
Control Point lowers the ARB/-GNT line to the -GNT state, allowing the winner
to own the bus.
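The outcome of a contest can be modeled in a few lines. This sketch captures only the result, not the signal-level behavior of the arbitration lines or the Central Arbitration Control Point, and the function name and bus labels are our own.

```python
def arbitrate(levels, bus):
    """Return the winning 4-bit arbitration level among competing masters.

    NuBus: Fh is the highest priority. Micro Channel: 0 is the highest.
    """
    if not levels:
        raise ValueError("no master is requesting the bus")
    if any(not 0 <= level <= 0xF for level in levels):
        raise ValueError("arbitration levels are 4-bit values")
    if bus == "nubus":
        return max(levels)
    if bus == "micro channel":
        return min(levels)
    raise ValueError("unknown bus: " + repr(bus))

competing = [0x9, 0xC, 0xE]
print(hex(arbitrate(competing, "nubus")))
print(hex(arbitrate(competing, "micro channel")))
```

The same three levels produce opposite winners on the two buses, because the priority ordering runs in opposite directions.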
Both buses ensure fairness by preventing a higher-priority-level card or
channel from continuously withholding ownership of the bus from entities at a
lower priority level. Card or channel logic prevents the card just serviced from
requesting bus ownership until all pending requests are honored. In a sense,
there are no arbitration priority levels for NuBus cards, since the NuBus
strictly enforces fair bus access. However, for special cases, a channel can be
configured on the Micro Channel without fairness to ensure continued ownership
of the bus.
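To see what fairness buys, consider a toy model in which every master requests the bus continuously. The sketch below is an assumption-laden simplification: each master has a distinct level, a serviced master sits out until all others have been served, and the names are hypothetical.

```python
def grant_sequence(levels, grants, fair=True, highest_wins=True):
    """Order in which continuously requesting masters are granted the bus."""
    pick = max if highest_wins else min
    inhibited = set()          # masters already serviced in the current round
    order = []
    for _ in range(grants):
        eligible = [lv for lv in levels if lv not in inhibited]
        if not eligible:       # all pending requests honored: start a new round
            inhibited.clear()
            eligible = list(levels)
        winner = pick(eligible)
        order.append(winner)
        if fair:
            inhibited.add(winner)
    return order

print(grant_sequence([0x9, 0xA, 0xE], 6, fair=True))
print(grant_sequence([0x9, 0xA, 0xE], 6, fair=False))
```

With fairness the three masters take turns; without it the highest-priority master wins every contest, which is the special case the Micro Channel permits.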
The NuBus has explicit mechanisms for continued bus and resource ownership.
Using an attention cycle (START* and ACK* both asserted), a master can request
continuing bus ownership. It can also request a resource lock. A resource such
as a memory card can be locked, denying access to any other master.
Both locks are extremely useful for multiprocessing; they allow a processor
to do an uninterrupted test and set, as well as control access to a critical
resource. For example, the Macintosh II motherboard uses bus locking to lock
out the NuBus for critical local processing, including disk transfers and
interrupt processing.
Card Configuration
Both the IBM Micro Channel and the Apple NuBus define high-level mechanisms
to integrate cards or devices into the bus system. This eliminates the need for
jumpers or switches to set either a card's interrupt level or its address
space, which is the cause of a lot of bus problems on typical microcomputer
systems.
The Micro Channel's Programmable Option Select (POS) eliminates switches
from the system board and adapters by replacing them with programmable
registers. Automatic configuration routines store the POS data into a
battery-powered CMOS memory for system configuration and operations. The
configuration utilities rely on adapter description files that contain the
configuration data for a card. Configuration files define system operation
including system memory maps, video-processing options, and the individual
adapter configurations.
At boot-up, the PS/2 Model 80 first validates the contents of the POS memory
by examining a check character stored there. If the memory passes this test,
the system then selects a card using the -CD SETUP lines. The card responds
with its ID number. The system then loads the appropriate configuration data
from CMOS memory into the card, as determined by the card's ID. This data sets
the card's arbitration level and fairness, the address range of the card's I/O
ROM, and the I/O address range. Cards that fail to configure properly are
disabled by the system.
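The boot-time sequence just described can be outlined as follows. This is a sketch under stated assumptions, not IBM's routine: the data structures, the card IDs, and the choice to treat an ID mismatch as a configuration failure are all hypothetical, and the check-character test is reduced to a flag because the text does not say how it is computed.

```python
def configure_cards(cmos_valid, cards_in_slots, cmos_config):
    """Return {slot: settings, or None if the card is disabled}.

    cards_in_slots: {slot: card ID read back after selecting the slot}
    cmos_config:    {slot: {"card_id": ..., "settings": {...}}}
    """
    if not cmos_valid:
        # Assumption: the text does not say what happens on a failed check.
        raise RuntimeError("stored configuration failed its check")
    result = {}
    for slot, card_id in sorted(cards_in_slots.items()):
        entry = cmos_config.get(slot)
        if entry is None or entry["card_id"] != card_id:
            result[slot] = None            # card is disabled
        else:
            result[slot] = entry["settings"]
    return result

# Hypothetical card IDs and settings, for illustration only.
stored = {
    1: {"card_id": 0x1111, "settings": {"arbitration_level": 0x8, "fairness": True}},
    2: {"card_id": 0x2222, "settings": {"arbitration_level": 0x9, "fairness": True}},
}
found = {1: 0x1111, 2: 0x3333}   # slot 2 holds a card the stored data doesn't expect
print(configure_cards(True, found, stored))
```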
The Macintosh II relies on a slot manager to configure and maintain NuBus
cards. Each card is required to have a special declaration ROM that holds the
card-specific configuration information. Information in the declaration ROM
includes byte lanes (which bytes of the NuBus data path are used), a test
pattern, a revision level, a ROM cyclic redundancy check for validating the
contents of the declaration ROM, and a resource directory.
The resource directory points to various resource lists, such as the device
icon, the device boot record, and the driver directory, which in turn points to
blocks of code for the driver. The slot manager reads the declaration code at
boot-up to configure the card into the system and installs any drivers or
interrupt routines into system memory. The slot manager can also recognize a
card as a bootable device and transfer control to the card when the system
starts up. A card that fails to configure properly is ignored, or a system
error is posted.
A Future with a Past
As you can see, both buses break new ground to optimize bus performance and
minimize the user's effort to add a new card to the system. However, these
buses must also deal with their past: providing compatibility with the existing
market of software and hardware.
IBM faced the dilemma of maintaining compatibility with existing AT bus
cards and limiting bus throughput to about 8 MHz, or redesigning the bus to
optimize throughput at the expense of hardware compatibility. Looking toward a
future of higher-speed processors and computing needs that require the handling
of vast amounts of data, IBM chose to redesign the bus. However, the Micro
Channel is, in a sense, still a CPU bus; throughput is optimized, since few bus
clocks are lost synchronizing dissimilar components in the system. Its
asynchronous nature allows future cards, operating at those higher speeds, to
be installed with little to no change to the PS/2 system, while bus operations
on NuBus are bound to its 10-MHz clock.
However, since the Micro Channel is a CPU bus, it's difficult to allow for
multiple processors on the bus without interfering with the 80386's operation.
NuBus, being a system bus, readily allows other processors to operate on it.
Cards on NuBus can communicate and share data with one another without
interfering with operations on the Mac II's local bus. In fact, AST Research
offers a NuBus card that is essentially an IBM PC AT that runs independently in
the Macintosh II but can share data with the 68020 CPU when necessary. Finally,
the slot manager in the Mac II allows a NuBus card to be a boot device. You
could drop a NuBus card with the next-generation CPU into a Mac II and let it
take control of the machine: the ultimate in hardware expandability.
Both machines still have some of their past built into them. A look at the
memory maps shows that both systems were designed to be compatible with their
current operating systems, while providing a gateway to the next generation of
software. The Macintosh II is the first machine in the Macintosh line to have
slots, so Apple at least did not have to confront the problem of bus
compatibility. But there's a certain irony in the fact that Apple must migrate
from a 24-bit to a 32-bit operating system, similar to what IBM faces in the
move to OS/2.
Ciro Cornejo is an engineer with AST Research (2121 Alton Ave., Irvine,
CA 92714). He was born in Chile, and his interests are nature, computers, math,
and physics. Raymond Lee, a technical advisor at AST Research, is interested in
computer architecture.