# Polynomial Testing of Packet Switching Networks 

JYH-CHARN LIU, student member, ieee, and KANG G. SHIN, senior member, ieee


#### Abstract

A functional testing method called polynomial testing is proposed to test packet switching networks (PSN's) used in multiprocessor systems. For the purpose of concreteness, we focus on applying the method to packet switching multistage interconnection networks (PMIN's). A multiple stuck-at (MSA) fault model is developed first, and then faults are diagnosed at two different levels: network level and switch level. The former uses each processor as a tester and can test part of the network concurrently with the normal operations on the remaining part of the network. On the other hand, the latter uses switches in the network as testers and is inherently an autonomous testing method. To facilitate the network level testing, the routing dynamic in a PMIN is eliminated by synchronizing switch operations. The network is then decomposed into routes, each of which is tested after transforming it into a polynomial calculator. For switch level testing, a built-in tester (BIT) is embedded into each switch's structure to provide self-testing capabilities. Network level testing is distributed and suitable for concurrent testing, whereas switch level testing is off-line with a small testing time.


Index Terms-Built-in tester, concurrent testing, linear feedback shift register, multistage interconnection network, packet switching, polynomial generator, polynomial testing, stuck-at routing fault, switch self-testing.

## I. Introduction

DESPITE the continuing improvement in semiconductor device speed, use of multiple processors and memories is an attractive alternative to meet ever-increasing needs of computing speed and reliability. Interconnection networks are one of the most important components of such multiprocessor systems and are made feasible by the advancement in VLSI technology. Since VLSI technology greatly degrades testability, an interconnection network must have a structure that is easily testable.

There are two well-known switching methods for interconnection networks: circuit switching and packet switching. To distinguish these two methods, a path and a route for a source-destination pair are defined as follows. A path is a physically-established communication medium between the source and destination to transfer a request/data. A route is a logical path which can transfer a request from a source to its

[^0]destination without total dedication to it; resources on a route are time-shared among several packets. In a circuit switching network (CSN), the path from a source to its destination is physically set up a priori and dedicated to a request until the request is completely serviced. By contrast, no complete physical path is established a priori for a request in a packet switching network (PSN). A packet switching multistage interconnection network (PMIN) is composed of a large number of links and switches with buffers. Each PMIN switch is essentially an $r \times r$ crossbar, in which a queue is placed at each input port to store packets. A request/message is decomposed into several packets, each of which is independently transferred through an available route.

Many PSN's have an undesirable effect called the routing dynamic: the order of arrival of packets at the destination may be different from the order of their transmission from the source. Although PMIN's can be designed not to have the routing dynamic, the routing dynamic will be considered in our testing method to provide better versatility. Clearly, a PMIN with the routing dynamic is an asynchronous sequential machine. Although a sequential machine can be fully tested with a checking sequence derived from its state-transition table [1], no feasible checking sequence seems to be derivable for large scale asynchronous sequential machines like PMIN's. Functional testing is an alternative to prove the correctness of some of the machine's functions within a finite time period.

Several researchers have proposed functional testing procedures for specific networks. Error control codes are popular for on-line fault detection [2], [3]. A comprehensive method for diagnosing the baseline CSN's with $2 \times 2$ switches was introduced by Feng and Wu [4]. A simplified version of the fault model in [4] and the corresponding testing strategy can be found in [5]. Davis et al. proposed some fault location techniques for distributed routing control networks [6]. Lee and Shen modeled a CSN using $2 \times 2$ switches as an ILA [7]. Low-order switches, e.g., $2 \times 2$ switches, can be completely tested with a constant number of patterns. Agrawal and Leu used the dynamic full accessibility of MIN's to test their connectivity [8]. Several high-level testing strategies for general PMIN's have also been studied [9]-[15], most of which are adaptive procedures requiring human assistance.

Most existing methods are centralized and off-line, i.e., the whole network is tested off-line by one tester. Since there are $N / r \log _{r} N$ switches in a PMIN, the complexity of the network testing problem is $O\left(N / r \log _{r} N\right)$. Centralized testing methods are usually very inefficient for large networks, because the problem to be handled by the tester grows exponentially with the size of the network. To improve testing efficiency, we
propose a two-level testing strategy: network level and switch level testing. ${ }^{1}$ In the network level, every processor can serve as a tester to test part of the network; thus, there are $N$ testers for the network. Assuming that testers are homogeneous, the complexity of the testing problem in each tester is reduced to $O\left(1 / r \log _{r} N\right)$. In the switch level testing, switches are used as testers and designed to have autonomous testing capability [17]. In other words, the complexity of the testing problem in each tester is independent of the network size and is fixed. The characteristics of the network level testing are that it can test the network concurrently, but may have a lower fault coverage than the switch level testing. On the other hand, the switch level testing is an off-line method with a small testing time but has high fault coverage.

The network level testing is based on the topology and functions of the network, because processors must cooperate to test the network. To eliminate the routing dynamic, network operations are first synchronized. Then, an $N \times N$ blocking network is decomposed into $N^{2}$ routes, $N T=\left\{\mathrm{RT}_{i j} \mid 1 \leq i, j\right.$ $\leq N\}$, where $\mathrm{RT}_{i j}$ is the route from source $i$ to destination $j$. $\mathrm{RUT}_{i j}$ is the route $\mathrm{RT}_{i j}$ under test, and the testing processors are the processors connected to the route under test (RUT). In the network level testing, faults in $\mathrm{RUT}_{i j}$ are tested without stopping the normal operations on $N T-\left\{\mathrm{RUT}_{i j}\right\}$, where $\operatorname{RUT}_{i j} \in N T$ and $\left\{\operatorname{RUT}_{i j}\right\} \neq N T$. To test a route without interrupting, or being interrupted by, normal operations, the testing processors should be able to lock/unlock the RUT. A RUT can be locked by activating the busy signals of the switches on the RUT. Locking a route prevents unexpected packets from entering the route. As shown in Fig. 1, a route can be viewed as a cascaded shift register array. The register array can then be easily modified into divisors, multipliers, or other similar structures for polynomial testing.

In the switch level testing, each switch is a tester and switches are assumed to be homogeneous. Thus, the logic structure, instead of its topology, of the network is the main concern. To obtain high fault coverage with a small testing time, each switch is designed to have self-testing capabilities. A switch is composed of buffers, a routing control unit (RCU), and output ports consisting of multiplexers-demultiplexers (MUDEX's). Since thorough testing of the RCU may require an intractable testing length, an on-line checker is proposed to detect malfunctions in the RCU. For the rest of the network, queues are first self-tested by polynomial generation and comparison. If the queues are fault-free, they are then used to generate test patterns for links and MUDEX's. The testing responses of switches at one stage are verified at the next stage.

The rest of this paper consists of four sections. Section II gives a brief review of the polynomial operations necessary for our testing method. Section III introduces the network fault models which are to be tested with operations on polynomials. Testable designs and the corresponding network and switch level testing are presented in Section IV. The paper concludes with Section V.

[^1]

Fig. 1. A baseline PMIN with switch permutation $E_{0}$ and the corresponding cascaded shift register arrays. (a) A baseline PMIN with switch permutation $E_{0}$. (b) The corresponding cascaded shift register array of the PMIN.

## II. Principles of Polynomial Testing

Basic polynomial operations and their implementations are briefly discussed below. Use of the polynomial ring $G F(2)[x]$ is well-known for error control codes [18]. Only those properties useful for testing PSN's will be introduced below for completeness.
Definition 1: A polynomial $P_{b}(x)=\sum_{i=0}^{n} b_{i} x^{i}$ in $G F(2)[x]$ is said to be a bit polynomial if each of its coefficients is a bit, i.e., $b_{i} \in\{0,1\}, \forall 0 \leq i \leq n$. A word polynomial is the one whose coefficients are words instead of bits, i.e., $P_{w}(x)=\sum_{i=0}^{n} w_{i} x^{i}$, where for every $i \in I_{n}=\{0$, $1, \cdots, n\}, w_{i}=$ ONE or ZERO, and ONE is a $b$-bit vector of arbitrary pattern and ZERO $=\overline{\mathrm{ONE}}$, i.e., ZERO is bitwise complemented to ONE. Thus, any two words with maximum Hamming distance can be used as ONE and ZERO, respectively.

For notational convenience, let $W_{n}(x)$ denote a polynomial $\sum_{i=0}^{n} c_{i} x^{i}, c_{i}=1$ or ONE, $\forall i \in I_{n} . \bar{P}(x)=\sum_{i=0}^{n} \bar{c}_{i} x^{i}=P(x)$ $\oplus W_{n}(x)$ is the complement of $P(x)$, and the symbol " $\oplus$ " represents the addition in $G F(2)$. Unless otherwise specified, we will use the term "polynomial" to represent both bit and word polynomials. The mechanisms to manipulate polynomials are called their calculators. The contents of a calculator before operating on its input are called the initial state, which
will always be assumed, for clarity of presentation, to be all zeros. A calculator with the zero initial state is called an inert linear machine [19]. When a word polynomial operation is applied to a faulty circuit, the closure property of $G F(2)[x]$ may not hold. However, when a word polynomial is applied to a nonfaulty circuit, the resulting polynomial belongs to $G F(2)[x]$. Calculators are more hardware efficient if ONE and ZERO are composed of all 1 's and 0 's, respectively, because for each operation every bit will require an identical circuit.

## A. Operations on Polynomials

A periodic polynomial with period $p$ is the series $\sum_{i=1}^{\infty} c_{i} x^{i}$ where $c_{i}=c_{i+p}, \forall i \in I$, and $I$ is the set of integers. It can be generated by a linear (or nonlinear) feedback shift register (LFSR) called a polynomial generator (PG). Registers in a PG can be implemented by different types of flip-flops, and apparently different test patterns are needed for different implementations. However, as shown in Appendix A, at most two inputs are needed to detect faults in a master-slave SR flip-flop. Since the network level testing deals with the network topology, we will consider only the input and output stuck-at faults of registers, i.e., not the stuck-at faults inside registers. However, the same test patterns can test all the faults in those registers implemented with the master-slave flip-flops shown in Appendix A.

Two polynomials $P_{1}(x)=\sum_{i=0}^{n} c_{1, i} x^{i}$ and $P_{2}(x)=\sum_{i=0}^{n}$ $c_{2, i} x^{i}$, are equal iff $c_{1, i}=c_{2, i}, \forall i \in I_{n}$. Two polynomials can be compared for equality by xor gates. The following operations are useful for our discussion.

Addition and Boolean: Let $\left\{P_{j}(x)=\sum_{i=0}^{n} c_{j, i} x^{i}\right\}_{j=1}^{k}$ be $k$ polynomials in $G F(2)[x] . P_{3}(x)$ is the addition of $P_{1}(x)$ and $P_{2}(x)$, denoted by $P_{3}(x)=\sum_{i=0}^{n} c_{3, i} x^{i}=P_{1}(x) \oplus P_{2}(x)$, if for each $i \in I_{n} c_{3, i}=c_{1, i} \oplus c_{2, i}$. Addition can be implemented with xOR gates. If a Boolean operation $\Delta$ is applied to $P_{1}(x)$, $P_{2}(x), \cdots P_{k}(x)$, the resulting polynomial $P(x)=\sum_{i=0}^{n} c_{i} x^{i}$ is calculated by $c_{i}=c_{i}^{1} \Delta c_{i}^{2} \Delta \cdots \Delta c_{i}^{k}, \forall i \in I_{n}$, and $\Delta$ is a bitwise operation when $c_{i}$ is a word. Only and and or, the two most important operations, will be considered in this paper.

Division and Multiplication: Given $P(x)=\sum_{i=0}^{n} p_{i} x^{i}$ and $M(x)=\sum_{i=0}^{n} m_{i} x^{i}$ in $G F(2)[x]$, the multiplication of $M(x)$ (multiplier) to $P(x)$ (multiplicand) is $P_{3}(x)=P(x) M(x)=$ $\sum_{i=0}^{n} p_{3, i} x^{i}$, where $\forall i \in I_{n}, p_{3, i}=p_{i} m_{0} \oplus p_{i-1} m_{1} \oplus \cdots \oplus$ $p_{1} m_{i-1} \oplus p_{0} m_{i}$. On the other hand, given two polynomials $P(x)$ (dividend) and $D(x)=\sum_{i=0}^{r} d_{i} x^{i} \neq 0$ (divisor) in $G F(2)[x]$ there exist two polynomials $Q(x)$ and $R(x)$ in $G F(2)[x]$ such that $P(x)=D(x) Q(x)+R(x)$ where $R(x)$ $=0$ or $\operatorname{deg} R(x)<\operatorname{deg} D(x)$. In this process, $P(x)$ is said to be divided by $D(x)$, yielding a quotient $Q(x)$ and a remainder $R(x)$.

A bit divisor (multiplier) divides (multiplies) an input stream by a fixed bit polynomial. Similarly, a word divisor ( multiplier) performs divisions (multiplications) between two word polynomials. In a word polynomial divisor/multiplier (PDM), operands are ONE or ZERO instead of 1 or 0 . It has logic operations similar to those of a bit PDM, but special mechanisms are necessary to preserve the properties of the polynomial ring.


Fig. 2. The structure of faulty and nonfaulty multipliers and divisors. (a) Normal multiplier and divisor. (b) Two faulty multipliers.

The final contents of a PDM will henceforth be represented by $R(x)$, the input stream will be represented by $P(x)$, and the output stream by $Q(x)$. The general structures of a bit divisor and a bit multiplier are shown in Fig. 2(a). $M(x)$ and $D(x)$ in Fig. 2(a) are $1+x^{2}+x^{5}+x^{7}+x^{8}$ and $x^{8}+x^{6}+x^{3}+x$ +1 , respectively. The lowest order position is located in the input (output) port of the divisor (multiplier). There is an xor gate, denoted by $\oplus$, at the $D$-type flip-flop's (DFF's) output of stage $i$ only when $m_{i}$ or $d_{i}$ is 1 . A block $B_{i}$ is the collection of DFF's between the ( $i-1$ )th and $i$ th xor gates, counting from the lowest order position, in a PDM. Thus, a PDM is composed of a set of blocks $\left\{B_{i}\right\}$. Let the order (the number of stages) of $B_{i}$ be $r_{i}$. In the multiplier of Fig. 2(a), $r_{1}=2, r_{2}=$ $3, r_{3}=2$, and $r_{4}=1$.

Since a RUT is to be transformed into a polynomial calculator for testing, the effects of DFF's multiple stuck-at (MSA) faults on a PDM are discussed as follows. An MSA fault $f_{M}$ in a block $B_{i}$ is composed of multiple single stuck-at (SSA) faults, i.e., $f_{M}=\left\{f_{s}^{i}\right\}$, where $f_{s}^{i}$ is an SSA fault in $B_{i}$. Let $k$ be the faulty position nearest to the output port of $B_{i}$. Then, $f_{s}^{k}$ $\in f_{M}$ will block the effects of all the other SSA faults in $f_{M}$. Such an $f_{s}^{k}$ is called the leading SSA fault in $B_{i}$. There are $2 r_{i}$ possible leading faults in $B_{i}$, and, thus, there are $2 r_{i}$ distinguishable stuck-at faults in the block, where $r_{i}$ is the number of stages in $B_{i}$.

An s-a-0 FL changes the attached xor gates into null operators. Thus, for a multiplier, $M^{\prime}(x)=\sum_{i=1}^{k} m_{i}^{\prime} x^{i}$, where $m_{i}^{\prime}=0$ if the FL attached to the xor gate at $x^{i}$ is stuck at 0 , and $m_{i}^{\prime}=m_{i}$ otherwise. It is shown in Lemma 1 that multiple $\mathrm{s}-\mathrm{a}-0$ at the FL-inputs of xor gates can be tested by the impulse
polynomial. On the other hand, when the FL is $\mathrm{s}-\mathrm{a}-1, M^{\prime}(x)$ becomes $\Sigma_{i=1, i \neq m_{f}}^{\dot{k}} m_{i} x^{i}+\Sigma_{\left\{m_{f}\right\}} x^{m_{f}} W(x)$, where $m_{f}$ is the location of an xor gate whose input from the FL is fixed at 1 due to an $\mathrm{s}-\mathrm{a}-1$ fault. Thus, when an all zero input stream, i.e., $\sum_{i=1}^{k} 0 x^{i}$, is applied to the PDM, the locations of those xor gates affected by the FL $\mathrm{s}-\mathrm{a}-1$ faults can be uniquely determined by the corresponding output stream.
Lemma 1 [19]: The impulse response of a multiplier is $m_{0} m_{1} \cdots m_{n} 0 \cdots 0$, where the impulse polynomial is $P_{I}(x)=$ $10 \cdots 0$.
Clearly, an unknown multiplier can be uniquely identified by its impulse response, and the multiplication of $P_{I}(x)$ to $M(x)$ can be viewed as a discrete convolution between them.

Lemma 2: When a PDM is an inert machine and an $\mathrm{s}-\mathrm{a}-0$ fault occurs in $B_{k}$, the multiplier $M(x)=\sum_{i=0}^{n} m_{i} x^{i}$ is changed to a new multiplier $M^{\prime}(x)=\sum_{i=0}^{k-1} m_{i} x^{i}$.
Lemma 3: Let $l$ be the number of fault-free DFF's between $B_{i}$ 's output and the leading faulty DFF. Then, when the leading faulty $\mathrm{DFF}^{l}$ in $B_{k}$ is s-a-1, $Q(x)=M^{\prime}(x) P(x) \oplus$ $x^{r} W_{n}(x)$, where $r^{\prime}=l+\Sigma_{j<k} r_{j}$ and $M^{\prime}(x)$ is the new multiplier whose highest order position is located at the leading faulty DFF in $B_{k}$.

Proof: Let the input (or output) of $B_{i}$ be $I_{i}$ (or $O_{i}$ ). Then we have $I_{i-1}=O_{i} \oplus P(x)$ and $O_{i-1}=x^{r_{i-1}} I_{i-1}$. When $\mathrm{DFF}^{\prime}$ is $\mathrm{s}-\mathrm{a}-1, O_{k}=W_{n}(x)$. Thus, $I_{k-1}=P(x) \oplus x^{l} W_{n}(x)$ and $O_{k-1}=x^{{ }^{\prime} k-1}\left(P(x)+x^{l} W_{n}(x)\right)$. By induction, we can show that $Q(x)=M^{\prime}(x) P(x) \oplus x^{r^{\prime}} W_{n}(x)$.
When $x^{k}$ of a divisor is s-a-1, the output $Q(x)=P^{\prime}(x) /$ $D^{\prime}(x)$, where $P^{\prime}(x)=\left\{x^{l}\left(\sum_{i=k+1}^{r} n_{i} x^{i-k}\right) \oplus W_{l}(x)\right\}$, $\sum_{i=k+1}^{r} n_{i} x^{i-k}$ is the initial state of the divisor and $l$ is the polynomial length that is sufficient for testing, and $D^{\prime}(x)$ is the new divisor with its lowest order position at the output of $B_{k}$. Similarly, an s-a-0 fault at $x^{k}$ makes $Q(x)$ periodic, i.e., $Q(x)=x^{l}\left(\Sigma_{i=k+1}^{r} n_{i} x^{i-k}\right) / D^{\prime}(x)$, where the degree of the faulty divisor is $r_{d}=\Sigma_{j>k} r_{j}$. Note that the output $Q(x)$ is independent of the input stream. The structures of $\mathrm{s}-\mathrm{a}-0$ and $\mathrm{s}-$ a-1 multipliers are shown in Fig. 2(b). The resulting $M_{0}^{\prime}(x)$ (for $\mathrm{s}-\mathrm{a}-0$ ) and $M_{1}^{\prime}\left(\right.$ for $\mathrm{s}-\mathrm{a}-1$ ) are $1+x^{2}+x^{5}$ and $M_{0}^{\prime} \oplus x^{r}$ $W(x)$, respectively.

For testing purposes, it is assumed that every DFF on a route can be simultaneously set to ZERO by an external signal. Signature analysis examines $R(x)$ after the testing polynomial $P(x)$ is applied to a circuit under test. The final contents of each DFF must be directly read out for signature analysis. Unfortunately, this will greatly increase the number of I/O terminals of a network. Thus, signature analysis or other similar methods requiring direct access to DFF's are not followed here and interested readers are referred to other articles, such as [20].
The proposed network level testing is to diagnose the network by appropriate operations on the output stream. After the testing polynomial $P(x)$ is applied to a RUT, a fault $f_{i}$ changes $Q(x)$ into $Q_{i}(x)$, where $Q(x)$ [or $\left.Q_{i}(x)\right]$ is the correct (or faulty) output polynomial of the RUT. The procedure is then to find a testing polynomial $P(x)$ and an operation $\Theta_{f_{i}}$ such that $\Theta_{f_{i}}(P(x), Q(x))=Q_{i}(x)$. The combination of $P(x)$, its output $Q(x)$, and the operation $\Theta_{f_{i}}$ is called a testing routine for the fault $f_{i}$.

## III. Fault Models

A PSN is composed of links and switches. There are $r$ ! possible interconnection patterns within an $r \times r$ switch. There are then $(r!)^{(N / r) \log _{r} N}$ different conflict-free interconnection patterns in an $N \times N$ PMIN. Links' stuck-at faults are equivalent to stuck-at faults of the switches to which they are attached. Thus, only switch faults are considered for the network level testing. That is, link stuck-at faults are implicitly included in the switch fault models.
Permanent multiple stuck-at, delay, partial setting, blocking, merging, broadcasting, and misrouting faults are all considered in this paper. An MSA fault occurs when one or more signal lines are fixed at 0 or 1 . A delay fault occurs when the operation speed of some component(s) is slower than the specified and, thus, erroneous operations result. A partial setting fault occurs when some of the identical components in a unit do not provide the same operation as the others. A blocking fault occurs when an appropriate route within a switch cannot be established for a request. A handshake signal deadlock is an example of blocking fault. A switch has a merging (broadcasting) fault when two or more input (output) ports are connected to one output (input) port. A misrouting fault represents the case when packets are misdirected to incorrect output ports. Stuck-line faults at gate level are tested at the switch level testing.

## IV. PMIN Diagnosis

As mentioned earlier, our testing strategy is divided into two levels: network and switch levels. At each of these two levels, we present testable designs and testing methods on the basis of the polynomial operations and the fault models introduced in Sections II and III, respectively. The network is designed such that all signal lines have only two states, i.e., 1 or 0 , whether or not they are used to transfer data. The output port of a switch is a combination of multiplexers and demultiplexers (MUDEX's). A MUDEX is basically composed of and and or gates. When multiple requests are assigned to an output port, a combination of or/AND functions among the requests will take place.

## A. Network Level Diagnosis

Assume that the PMIN under test connects $N$ sources and $N$ destinations and is built with $r \times r$ switches. The number of stages in the PMIN is $k \equiv \log _{r} N$. To describe the PMIN's topology and permutation, the input (output) ports of all switches in each stage are vertically indexed. The number assigned to an input (output) port is called its global index. For each $r \times r$ switch, there is a one-to-one correspondence between the global index and the input/output port number: $f_{i}(j)=m$, where $j$ is the port number of the $i$ th switch at a stage, and $m$ is the port's global index. A link permutation $T_{i}, 1 \leq i \leq k$, is a one-to-one mapping from the output ports at stage $i-1$ to the input ports at stage $i$. On the other hand, a switch permutation $E_{m}^{i}: f_{i}(j) \rightarrow f_{i}((j+m)$ MOD $r)$ is a one-to-one mapping from input ports of a switch to its output ports, $0 \leq m \leq r-1$. For simplicity, all the switches on the RUT are assumed to have an identical permutation, i.e., $i_{1}=$
$i_{2}$ for all $E_{m}^{i_{1}}, E_{m}^{i_{2}} \in$ RUT, and $E_{m}$ will henceforth be used to denote $E_{m}^{i}$. More general cases than this can be easily derived by using the actual permutation at each stage. To allow for simultaneous diagnosis and normal operation at the network level, the testing processors should be equipped with complete information of link and switch permutations.

1) Testable Design: Links are passive components and can be treated as data paths of switches, whereas switches make all switching decisions and also contain memory elements. To make the network easily testable, switches are designed to have two operational modes: normal and testing modes.

As mentioned in the Introduction, a RUT can be viewed as a cascaded shift register array. FL and xor gates must be added to transform a 1-bit wide RUT into a bit PDM. Since links are the predominating cost factor of a PMIN, the link overhead in improving testability must be kept as small as possible. A tracer in each switch is thus proposed to minimize the width of FL. A tracer is composed of a testing pattern masker and mapper, a feedback/feedforward selector ( $F$-selector) and a modulo TWO adder, where TWO $=\{$ ONE, ZERO $\}$. The masker examines if bits of the testing pattern are identical and maps the testing pattern from ONE (ZERO) to $1(0)$ for FL. The mapper transforms $1(0)$ to ONE (ZERO) to use the adder. The $F$-selector determines the transmission direction of FL. ${ }^{2}$ An adder is necessary for each switch to form a block on a route for data path diagnosis.

Four possible operational states, $S, A, X$, and $N$, are assigned to a switch when the network is being tested. Once a switch in a RUT is in state $S$, the switch will not allow any packets, except those from the same RUT, to enter the RUT, and the operations of switches on the route are synchronized. State $S$ can be taken as a suboperation of the other states, because the tracer in the other states is activated and switch operations are synchronized. When the switch is in state $N$, only FL and the $F$-selector are activated. When a switch at stage $i$ is in state $A$, the $F$-selector blocks the FL signals from stage $i+1$, and the current switch's output is led to FL. When the switch is in state $X$, the data on FL are mapped, by the mapper, from $1(0)$ to ONE (ZERO), and the logic operation $\mathrm{BU}_{0} \leftarrow P_{\text {in }} \oplus \mathrm{FL}$ is performed at the input of the queue, where $\mathrm{BU}_{0}$ is the input of the queue and $P_{\text {in }}$ is the input packet. Fig. 3 shows these switch operations in different states. The logic diagram in Fig. 4 shows a switch design example of the network level testing.

A switch can enter/exit the testing mode by command packets. Two formats, data packets and command packets, are used to control the switch operations. A command packet is composed of routing tags and a command array $\{\mathbf{C A}(1)$, $\cdots, \mathbf{C A}(k)\}$, where $k$ is the number of stages of the network and $\mathbf{C A}(i)$ is a 2-bit command word associated with stage $i$. A switch at stage $i$ will enter states $S, A, N$, and $X$, when CA (i) $=00,11,10$, and 01 , respectively. The type of packets can be identified by a one-bit flag in each packet. As shown below, this testing method can also identify a misinterpreted command array (by a faulty switch).

[^2]

Fig. 3. Switches on a RUT and the corresponding word divisor.

Theorem 1: All misinterpreted command packets can be tested in one testing routine.

Proof: Once a RUT is transformed into a multiplier, the test pattern for misinterpreted command packets becomes an impulse polynomial. From Lemma $1, M(x)$ of the RUT can be uniquely identified.
2) Data Path Stuck-at Faults: All switches are in state $X$ when data path stuck-at faults are being tested. An SSA fault at the network level represents a stuck-at fault(s) in a single switch. But an MSA fault at the network level implies stuck-at faults in more than one switch. In a conventional approach, upon detection of a fault on some route, test patterns must be submitted from processors on different routes to locate the fault. It is shown below that the fault location with the polynomial testing is much easier than that with the conventional approach.
SSA Faults: Every switch is set to an identical permutation. When $r \times r$ switches are used, $r$ different switch permutations $\left\{E_{i} \mid 0 \leq i \leq r-1\right\}$ are necessary to test every data path within a switch. For any input port of a switch, its data paths to all the output ports are included in $\left\{E_{i} \mid 0 \leq i \leq r-1\right\}$. Thus, in these $r$ permutations every data path from each input port to every output port is tested. ${ }^{3}$ The procedure can be generalized as follows: in testing routine $m$, the switch permutation $E_{m}, 0 \leq m \leq r-1$, is performed first. Then, the connection of source $i$ to destination $j$ is specified by $j=$ $T_{k} E_{m} T_{k-1} E_{m} \cdots T_{2} E_{m} T_{1}(i)$. The special case of $r=2$ allows data path stuck-at faults to be detected in two permutations, each of which is composed of two steps [4].

Theorem 2: When a locked RUT $_{i j}$ is configured as a multiplier, an SSA fault on the data path can be located by processor $j$ in one testing routine.

Proof: The testing polynomial for the data path SSA fault is $W_{n}(x)$, where $n$ is the total length of buffers on $\operatorname{RUT}_{i j}$. As discussed earlier, $\mathrm{RUT}_{i j}$ can be expressed as $M_{i j}(x)=\sum_{i=0}^{n}$ $m_{i} x^{i}$. The output at the destination $j$ becomes $Q(x)=\sum_{l=0}^{n}$ $m_{l} x^{l} W_{n}(x)$. $Q(x)$ should then have the format of $1 \cdots 10 \cdots 01 \cdots$, where a $0(1) \rightarrow 1(0)$ transition takes place

[^3]

Fig. 4. A testable design of switches for concurrent testing.
at each position of an xOR gate on $\mathrm{RUT}_{i j}$ and the number of consecutive 1's ( 0 's) in the $i$ th block is the size of $B_{i}$. For example, the output stream of the multiplier in Fig. 2(a) is 10011100. When $M_{i j}$ changes to $M_{i j}^{\prime} \neq M_{i j}$ due to an SSA fault, there must be at least one $i$ such that $m_{i} \neq m_{i}^{\prime}, 1 \leq i \leq$ $k$, by Lemmas 2 and 3 . When the number of $0 \rightarrow 1$ transitions is $m_{f}$, the faulty switch can be located by $s_{f}=\left(\prod_{i=0}^{m_{f}} E^{-1}\right.$ $\left.T_{m_{f}-i}^{-1}\right)(j)$, where $T_{i}^{-1}$ is the inverse of permutation $T_{i}$.

MSA Faults: An MSA fault on a data path cannot be determined in one testing routine. However, the polynomial testing can be applied to a sequential repairing procedure which locates and then replaces leading faulty switches/links in each testing routine.
Theorem 3: An MSA fault on a data path can be repaired in $k$ testing routines, where $k$ is the number of stages of the network.

Proof: An MSA fault is the collection of multiple SSA faults. When the testing polynomial $W_{n}(x)$ is applied to a PDM, $Q(x)$ is uniquely determined by the type ( $\mathrm{s}-\mathrm{a}-0$ or $\mathrm{s}-\mathrm{a}-1$ ) and the location of the leading stuck-at fault. In other words, the lowest order faulty switch can be located in each testing routine, regardless of the cardinality of the multiple fault. Since there are $k$ switches on a route, at most $k$ steps are required to repair the network.
Delay Faults: A delay fault on a data path is detectable when its operational speed is at least one clock cycle slower than specified.
Theorem 4: A single delay fault of longer than one clock cycle can be located in one testing routine.

Proof: The polynomial $P_{E}^{1}(x)=\Sigma_{i=0}^{k} x^{2 i}$ can detect all delay faults. However, a polynomial $P_{E}^{m}(x)=\sum_{i=0}^{k} x^{(m+1) i}$ can be used to distinguish a $1 \rightarrow 0$ transition delay fault of $m$ clock cycles from delay faults of less than $m$ cycles. When an $m$ unit delay fault occurs and $P_{E}^{m}(x)$ is applied, the faulty switch's output becomes $W(x)$. By forming a PDM on RUT $_{i j}$, a delay fault can be located in one testing routine. A testing polynomial for $0 \rightarrow 1$ delay transitions is complemented to become $P_{E}^{m}(x)$ and the output is $\bar{W}(x)$.

Like MSA faults, a multiple delay fault composed of different delay lengths can be repaired in $k$ testing routines.
3) Routing Faults: Methods for locating routing faults are studied in this subsection. Switches are set to state $S$ when routing functions are tested.
Merging and Broadcasting Faults: Depending on the implementation details, a merging fault can be located in one testing routine when appropriate polynomials are applied. A $\Delta$-merging fault occurs when a $\Delta$ (i.e., AND or OR) operation results from the merging of two or more switch input/output ports.
Consider the effect of the or merging first. For two routes RUT $_{i_{1}}$ and RUT $_{i_{2}}$, they will topologically intersect in at most one switch when the network is not redundant.
Theorem 5: For a given permutation, a multiple ormerging fault can be located in one testing routine for both distributed and centralized routing control PMIN's.

Proof: The testing polynomial at processor $j$ is $P_{j}^{N}(x)=$ $\Sigma_{i=1}^{N} c_{i} x^{i}$, where $c_{j}=$ ONE and $c_{i}=$ ZERO, $\forall i \neq j$. First, consider the case when two RUT's are merged. The two routes from $i_{1}$ and $i_{2}$ under the given permutation intersect at most once. When the intersecting switch has an or-merging fault, and the testing polynomials $P_{i_{1}}^{N}(x)$ and $P_{i_{2}}^{N}(x)$ are applied, there will be an or operation between these two polynomials. Without loss of generality, $P_{i_{1}}^{N}(x)$ can be assumed to be merged into $P_{i_{1}}^{N}(x)$, i.e., $P_{i_{2}}^{\prime}(x)=P_{i_{1}}^{N}(x)$ or $P_{i_{2}}^{N}(x)$. Since there is no overlap of the positions containing 1 's in both $P_{i_{1}}^{N}(x)$ and $P_{i_{2}}^{N}(x)$, new information on the merging fault is added to $P_{i_{2}}^{\prime}(x)$. Applying the xor operation between $P_{i_{2}}^{\prime}(x)$ and $P_{i_{2}}^{N}(x)$ at the destination of $P_{i_{2}}^{\prime}(x)$, we get $P_{i_{2}}(x)=$ $P_{i_{2}}^{\prime}(x) \oplus P_{i_{2}}^{N}(x)$. A nonzero resulting polynomial implies that some polynomial is merged into $P_{i_{2}}^{N}(x)$. The switch with the merging fault is determined by the topology. That is, $P_{i_{1}}^{N}(x)$ merges with $P_{i_{2}}^{N}(x)$ at $S\left(i_{f}, j_{f}\right)$, where $S\left(i_{f}, j_{f}\right)$ is the $j_{f}$ th switch located at stage $i_{f}$, when $\left(j_{f}-1\right) r=\prod_{i=1}^{i f} E_{m} T_{i}\left(i_{1}\right)$ $-\Pi_{i_{i=1}}^{i^{i}} E_{m} T_{i}\left(i_{1}\right) \operatorname{MOD} r$ and $\left(j_{f}-1\right) r=\Pi_{i=1}^{i_{i}=1} E_{m} T_{i}\left(i_{2}\right)$ $-\Pi_{i=1}^{i_{j}^{i=1}} E_{m} T_{i}\left(i_{2}\right)$ MOD $r$. It is easy to see that no information will be lost when multiple mergings occur. Thus, all multiple merging faults can be determined in one testing routine.

If merging faults are assumed to be independent of the interconnection pattern, they can be located in one testing routine. Otherwise, we need $r$ ! tests to set each switch to every interconnection pattern for fault location. The and-merging fault can be diagnosed by the same method with the testing polynomial $\bar{P}_{j}^{N}(x)$.

A broadcasting fault at one input port of a switch implies a merging fault at the output port of the broadcast data path. Thus, broadcasting faults can be located by the same procedure used for testing merging faults.

Misrouting Faults: There are $r$ ! possible permutations in an $r \times r$ switch. To locate a misrouing fault, the testing polynomial $P_{i}(x)$ for source $i$ must be unique.

Theorem 6: One testing routine is sufficient to locate a multiple misrouting fault for both distributed and centralized routing control PMIN's.

Proof: The testing polynomial for merging faults can also be used for testing misrouting faults. kr! permutation calculations are required in each testing routine. Given a permutation $j=T_{k} E T_{k-1} E \cdots E T_{1}(i)$, a misrouting fault results when $E$ becomes $E^{\prime}$, where $E^{\prime} \neq E$ is a faulty permutation. The fault locating procedure is to find $E^{\prime}$ of a faulty switch. For a given processor $j$ which receives an incorrect polynomial, all possible permutations have to be calculated to find $E^{\prime}$ of the faulty switch. Since each switch has $r$ ! permutations, we need $k r$ ! inverse permutations to locate the faulty switch.
A misrouting fault may be caused by either the misdecoding of a routing tag in the RCU of a faulty switch or a stuck-at link/switch which transmits the routing tag before the routing tag is actually decoded.

Blocking Faults: As mentioned earlier, the network is designed such that there are only two logic values, i.e., 0 and 1 , in all signal lines. When a blocking fault occurs, a data path cannot be utilized, even though it is available.

Theorem 7: A blocked data path in a centralized routing control PMIN can be located in one testing routine.

The proof of this theorem is straightforward. In a centralized routing control network, a locked route can be established even when its data path is blocked. Since the output of a blocked switch is fixed at 1 or 0 , it has the same output as a stuck-at data path. It is much more difficult to locate a blocking fault in a distributed routing control network, because routing tags and data are blocked at the same time. It can be located by a binary search which requires $\log _{2} k$ testing routines.

Partial Setting Faults: When a data path is partially stuck, the testing procedures with multipliers can still be applied. Test patterns, however, must be determined by the design details of the masker and the mapper. In case of a partial fault, unaffected data bits have correct outputs but the stuck-at bit needs the same testing procedures as described above. In such a case, we have to examine a faulty bit(s) instead of a faulty word(s).
4) Pattern Generation: Test patterns are generated by pattern generators $\left\{G_{i}\right\}$ which are processors or dedicated hardware mechanisms. The cost of pattern generators is one of the most important factors for evaluating the performance of a
testing method. Only two testing patterns $W_{n}(x)$ and $\left\{P_{i}^{N}(x)\right\}$ need to be generated for the network level testing. Both patterns can be easily generated when $G_{i}$ 's are ringed through a single bit control line. Denote the input and output of the ring in $G_{i}$ by $D_{i(\text { in })}$ and $D_{i(\text { out })}$, respectively. $D_{N(\text { out })}$ is connected to $D_{1(\text { in })}$, and $D_{i(\text { out })}$ is connected to $D_{i+1(\text { (in) }}, \forall 1 \leq i \leq N-1$. To generate $\left\{P_{i}^{N}(x)\right\}$, the ring is initialized as $D_{1(\mathrm{in})}=1$, $D_{i(\mathrm{in})}=0, \forall i, i \neq 1$. Operations of $G_{i}$ at the $k$ th clock cycle are given as

$$
\begin{aligned}
& \text { OP1. } P_{i}(k)=\left\{\begin{array}{l}
\text { ONE when } D_{i(\mathrm{in})}=1 \\
\text { ZERO when } D_{i(\mathrm{in})}=0
\end{array}\right. \\
& \text { OP2. } D_{i(\mathrm{out})}(k) \leftarrow D_{i(\mathrm{in})}(k)
\end{aligned}
$$

where $P_{i}(k)$ is that the pattern generated by $G_{i}$ at the $k$ th clock cycle. The other test pattern $W_{n}(x)$ can be easily generated by the initialization $D_{i(\text { out })}=$ ONE, $\forall i \leq N$, and applying OP1 and OP2 in each pattern generator. For a given permutation, there are only $r k$ possible mergings on a route and the above testing polynomial is thus not optimal for testing or-merging faults. For testing or-merging faults, the length of the testing polynomial can be reduced to $r k$, when $P_{k}(x) \neq P_{j}(x)$ for any pair of polynomials $P_{j}(x)$ and $P_{k}(x)$ intersecting in a switch under a given permutation. However, the testing polynomial allows merging and misrouting faults to be tested simultaneously, and, thus, simplifies testing procedures. Moreover, $G_{i}$ has a very simple structure and can be easily applied to various interconnection networks.
5) Testing Complexity: It is important to consider the testing complexity of the network level testing. The length of test patterns for data path stuck-at faults and misinterpreted command packets is $k m$, where $m$ is the queue length in each switch. ${ }^{4}$ The calculation of a misinterpreted command packet is straightforward, because the coefficients of the multiplier can be identified directly from the output stream. The stuck-at1 faults at the inputs of xor gates, to which the FL are connected to, can be tested by an all zero polynomial, and its testing length is km . To test single data path stuck-at (delay) faults, we need one testing routine which is composed of at most $k$ steps of inverse permutations. At most $k$ testing routines are thus necessary to repair all multiple data path stuck-at faults, and each testing routine needs $k$ inverse permutations. Thus, a total of $k^{2}+k+2$ inverse permutations is needed for data path diagnosis.

For routing faults, the test pattern length is $N$. One testing routine is sufficient to identify all merging and broadcasting faults. To locate a merging (broadcasting) fault, two RUT's are needed at a time. Since there are $k$ switches on a RUT and each switch needs $r$ ! inverse permutations, $k^{2} r$ ! inverse permutations are required to locate a merging (broadcasting) fault. Finally, $k r$ ! inverse permutations are required to locate the misrouting faults.
The network level testing is quite general to handle various circuit implementations and locate faults without completely stopping the normal operations of the network. The testing time varies with the size of the network. Note, however, that

[^4]

Fig. 5. The structure of a $2 \times 2$ switch and a $C$-connected queue.
the network level testing may not detect all possible faults for different circuit implementations. When the network level testing fails to locate some faults, a fast off-line testing method with high fault coverage needs to be called for. The switch level testing described below meets this very need.

## B. Switch Level Testing

A switch is composed of data paths and a RCU. Data paths consist of links, queues, and MUDEX's. A pool of buffers, $\mathrm{BU}_{i}^{j}, 1 \leq i \leq m$, in a switch constitutes the $j$ th queue of the switch, where $m$ is the number of buffers within the queue. A buffer can store one $w$-bit packet. There are then at least $N w m$ $\log _{r} N$ memory bits in an $N \times N$ PMIN built with $r \times r$ switches, and a CSN is the special case of $m=0$. Let $\mathrm{BU}_{0}^{j}$ and $\mathrm{BU}_{m+1}^{j}$ denote, respectively, the input and output ports of a switch. It is shown in Fig. 5 that these buffers are cascaded, or $C$-connected, and formally described by $C N: \mathrm{BU}_{i} \rightarrow \mathrm{BU}_{i+1}$, where " $\rightarrow$ "' denotes an interconnection within a queue, called an interlink.

Different implementations of registers need different test patterns. We use random testing to test the queues. However, when specific test patterns like the one in Appendix A is needed, they are also easy to generate. In each switch, queues are tested by generation and comparison of polynomials. For the generation of a polynomial we can use the natural structure of a queue. The basic idea is to convert the queue into two PG's. A queue can be taken as a $w \times m$ matrix $M$ in which each column is a buffer of $w$ DFF's. Note that DFF's in each row $j$ (collection of the $j$ th DFF's of $m$ buffers), $1 \leq j \leq w$, of the matrix are cascaded by its natural structure. Assuming $w$ to be even, two PG 's, $\mathrm{PG}_{1}$ and $\mathrm{PG}_{2}$, are formed by properly cascading the rows of $M$.

Two symmetric PG's can be obtained by 1) horizontally halving the buffers in the queue, 2 ) connecting $M(i+1,1)$ to $M(i, m) \forall i \leq w / 2$ for $\mathrm{PG}_{1}$, and $M(i+1, m)$ to $M(i, 1), \forall i$ $\geq(w / 2)+1$ for $\left.\mathrm{PG}_{2}, 3\right)$ identically connecting registers' outputs to the feedback xor gates in $\mathrm{PG}_{1}$ and $\mathrm{PG}_{2}$, and 4) connecting the output of the xor gate outputs of $\mathrm{PG}_{1}$ and $\mathrm{PG}_{2}$ to $M(1,1)$ and $M((w / 2)+1,1)$, respectively. It is well-known
that the maximum period of the output stream of a PG can be obtained when $2^{l}-1$ is a prime number, where $l$ is the PG's length, and the PG's characteristic function is irreducible [18]. A fault is detectable when it yields different output sequences in the two PG's.

The PGs' outputs form a 1 -out-of- 2 codeword when an inverter is added to one xor gate's output. An xor gate with $n$ inputs needs $n+1$ test patterns when $n$ is odd; on the other hand, three test patterns are sufficient for an xor gate with an even number of inputs. The test patterns for the xor gate with an odd and an even number of inputs are $\{0 \cdots 0,10 \cdots 0$, $010 \cdots 0, \cdots, 0 \cdots 01\}$, and $\left\{0 \cdots 0,1 \cdots 1, I_{s}\right\}$, respectively, where $I_{s}$ is any input with an odd number of 1 's. The test patterns for the xor gate of a PG can be easily generated by setting the PG's initial state. Since every component in the PG's is tested, there is no hardcore in this design.

When two symmetric PG's are used, unidirectional stuck-at faults in a buffer cannot be detected. To solve this problem, $\mathrm{PG}_{1}$ can be modified such that the outputs $M(i, m / 2), \forall i \leq$ $w / 2$, are connected to the xor gate whose output is then connected to $M(1,(m / 2)+1)$. Although the physical interconnection of $M(1,1)$ to $M(w / 2, m)$ is different from that of $M((w / 2)+1,1)$ to $M(w, m)$, both $\mathrm{PG}_{1}$ and $\mathrm{PG}_{2}$ still have an identical structure. Such a modification can now detect the unidirectional faults mentioned above. Symmetric and asymmetric PG configurations are illustrated in Figs. 6(a) and (b), respectively.
The optimal testing length of a PG and its fault coverage are important performance parameters. Any DFF in the MSA fault model can be $\mathrm{s}-\mathrm{a}-1$, $\mathrm{s}-\mathrm{a}-0$, or fault-free. To evaluate the MSA fault coverage of the proposed method, we only need to consider the type and position of leading faulty DFF's in a block. Consider a pair of leading faulty DFF's, $s_{1}$ and $s_{2}$, which are in $x^{i}$ and $x^{j}$ positions of $\mathrm{PG}_{1}$ and $\mathrm{PG}_{2}$, respectively. The effects of faults in $s_{1}$ and $s_{2}$ can be distinguished only when they yield different outputs for at least one clock cycle. We begin with the simplest special case of the MSA fault model, i.e., the SSA fault model.

Theorem 8: All SSA faults are detectable, and the maximum testing length is $r+1$, where $r$ is the order of the PG.

Proof: An SSA fault in a PG is detectable when it generates an output different from that of the other PG. Let the initial state of the PG be $\Sigma_{i=1}^{r} n_{i} x^{i}$, where $n_{1}=1$ and $n_{i}=0$, $\forall i \neq 1$. When an $\mathrm{s}-\mathrm{a}-0$ is located at output of $x^{i}, n_{1}$ is falsely inverted at the $i$ th shift. The fault cannot be revealed during the first $i-1$ clock cycles, because the $\mathrm{s}-\mathrm{a}-0$ is the same as the preset value of a fault-free circuit. The worst case occurs when the $\mathrm{s}-\mathrm{a}-0$ is located at the output of $x^{r}$, and, thus, $r$ is the maximum testing length.

When an $\mathrm{s}-\mathrm{a}-1$ is located at the output of $x^{i}$, it will change the parity of the output immediately when it propagates to a feedback line. The worst case occurs when feedback lines emanate from $x^{1}$ and $x^{r}$, and the $\mathrm{s}-\mathrm{a}-1$ is present at the input of $x^{1}$. The output of the faulty PG is the same as the nonfaulty one until the $r+1$ th clock cycle. Thus, the maximum testing length for SSA faults is $r+1$.

To calculate the MSA fault coverage, the position and type

(a)


Fig. 6. A queue transformed into two symmetric and asymmetric PG's. (a) Symmetric PG's. (b) Asymmetric PG's.

TABLE I
MSA FAULT COVERAGES OF PG's. (a) MSA FAULT COVERAGE ( $C$ ) OF DIFFERENT NUMBERS AND LOCATIONS OF FEEDBACK LINES. (b) $k=8$ WITH THREE FEEDBACK LINES. (c) PG's TESTED TWICE BY Two FEEDBACK CONFIGURATIONS. (d) FEEDBACK LINES AT $k$, 0 , and THE PG's are Tested Twice with Two Different Initial States.

(a)

(b)

(c)

(d)
"- -"': not computed because of excessive simulation time requirements. "**": not applicable.
of leading faulty DFF's must be considered. Each DFF can be $\mathrm{s}-\mathrm{a}-1, \mathrm{~s}-\mathrm{a}-0$, or fault-free, and the number of MSA faults in a queue is $3^{m w}-1$. Due to the fault masking effect, the actual computing time is $K \Pi_{i=1}^{k}\left(2 r_{i}+1\right)^{2}$, where $r_{i}$ is the order of block $B_{i}$ and $K$ is the computing time required for each iteration. As shown in Table I, various testing strategies are simulated to examine their MSA fault coverages. The initial state of each simulation is $n_{1}=1, n_{i}=0, \forall i \neq 1$. From the simulation results, the following three conjectures are made.

Conjecture 1: The fault coverage is dominated by the number of feedback lines. It monotonically increases with the number of feedback lines. The length of a PG has little effect on the fault coverage.
Conjecture 2: For a given PG of length $r$, and $l_{f}$ feedback lines, the MSA fault coverage attains a maximum when feedback lines are located at $x^{1}, x^{2}, \cdots x^{l f^{-1}}$, and $x^{r}$.

Conjecture 3: The MSA fault coverage increases with the number of testing routines, each of which uses a different initial state. The optimal testing length for MSA faults is $r$ for a given initial state and a feedback configuration.


Fig. 7. Detected-faults/detectable-faults versus number of shifts when $r=$ 8.

For a given PG configuration and the initial state, theoretically, $2^{r}$ shifts are required to exercise all the states of the PG. However, our simulation results show that testing lengths are rarely required to be longer than the length of the PG. Although the choice of an initial state affects the fault coverage, the number and location of feedback lines are the dominating factors in the fault coverage. It is shown in Fig. 7 that about 65 percent of detectable MSA faults are immediately detected for most cases. From Theorem 8 and the above conjectures, each testing length is found to be $r+1$.

Unlike the off-line testing of data paths, a faulty RCU can be detected on-line. An $R C U$ checker is proposed to detect faults in the RCU using its output signals. An RCU has an $r$ $\log _{2} r$-bit input and an $r^{2}$-bit output. The RCU output signals are denoted by $E_{i j}, 1 \leq i, j \leq r$, where $E_{i j}=1$ if queue $j$ is connected to output port $i$, and $E_{i j}=0$ otherwise. For any fixed $k,\left\{E_{i k}\right\}$ or $\left\{E_{k i}\right\}, 1 \leq i \leq r$, forms a 1-out-of- $r$ codeword. Thus, $2 r 1$-out-of- $r$ self-checking checkers, one for each $\left\{E_{i k}\right\}$ or $\left\{E_{k i}\right\}$, are needed to detect all noncodeword outputs.

The outputs of the RCU and queues are the inputs of the MUDEX to which they are connected. The RCU and queues can be tested first using the above procedures. If they are faultfree, then the MUDEX and the links connected to the MUDEX are tested by using the RCU and queues to generate test patterns for the the MUDEX and its links. For output verification, the streams from the MUDEX's of stage $i$ are transmitted through the links and then verified at stage $i+1$ with special mechanisms.

Before we develop the test method for MUDEX's and links, it is necessary to find the test patterns of the $r \times 1$ multiplexer shown in Fig. 8, where $E_{i}$ and $D_{i}$ are the enable and data of the $i$ th input, respectively. The $r \times 1$ multiplexer is implemented by $r$ two-input AND gates and an or gate.

Lemma 4: All SSA faults in the multiplexer of Fig. 8 can be detected in $r+2$ steps.

Proof: After fault collapsing, the faults that need to be tested are 1) $\mathrm{s}-\mathrm{a}-0$ and $\mathrm{s}-\mathrm{a}-1$ primary output, i.e., output of the OR gate in Fig. 8, 2) s-a-0 $A_{i}, \forall i \leq r$ in Fig. 8, and 3) s-a-1 $D_{i}\left(E_{i}\right), \forall i \leq r$. Test patterns can be derived as follows.

$$
\begin{aligned}
& \mathrm{PT}(1): E_{i} D_{i}=10, \forall i \leq r, \\
& \mathrm{PT}(2): E_{i} D_{i}=01, \forall i \leq r,
\end{aligned}
$$



Fig. 8. The logic and functional diagrams of a multiplexer with $r$ data inputs and $r$ enable signals. (a) Logic diagram. (b) Functional diagram.
$\mathrm{PT}(3): E_{1} D_{1}=11$, and $E_{i} D_{i}=e_{i} d_{i}$ for $i \neq 1$
$\mathrm{PT}(4): E_{2} D_{2}=11$, and $E_{i} D_{i}=e_{i} d_{i}$ for $i \neq 2$

$$
\begin{array}{r}
\mathrm{PT}(r+2): E_{r} D_{r}=11, \text { and } E_{i} D_{i}=e_{i} d_{i} \text { for } i \neq r \\
\text { where } d_{i}=0 \text { or } e_{i}=0, \forall 1 \leq i \leq r
\end{array}
$$

An $r \times r$ MUDEX connects $r$ queues' outputs to $r$ links. The MUDEX can be implemented by two-level AND and OR gates, where each MUDEX's output port is basically a multiplexer. An example design of MUDEX is shown in Fig. 9 , where $E_{i j}$ is the enable signal from the RCU to route the packet at queue $j$ to output port $i$. $E_{i j}$ fans out to $w$ branches to simultaneously enable the $w$ bits of queue $j$.

Theorem 9: Any SSA fault on links or MUDEX's can be tested in $r+2$ clock cycles.

Proof: Since operations to be applied to each of the $w$ bits of a packet are identical, it is sufficient to discuss only one bit of the packet. Each output port of the 1-bit MUDEX is an $r$ $\times 1$ multiplexer, and there are a total of $r$ multiplexers in a MUDEX. Test patterns derived in Lemma 4 can be directly applied to test the MUDEX. However, it is important to minimize the test length when one selects test patterns. The proposed testing procedures are as follows. At clock cycle 1, all the RCU's outputs are set to 1 and the queue outputs to 0 . Queue outputs are fixed at 1 for the rest of the procedures. At cycle 2, all the RCU's outputs are set to 0 . During the remaining $r$ cycles, the RCU performs permutation $i \rightarrow(i+j$ - 1) MOD $r$ at cycle $j, 3 \leq j \leq r+2$. When the network uses distributed routing control, the queues for storing routing tags can be used to generate the desired routing requests to the RCU. By this permutation and the data queue setting, the $r$ multiplexers in a MUDEX are tested simultaneously. The testing procedures are shown in Fig. 10.

The MUDEX's output stream is two 0 's followed by $r 1$ 's. Since both 0 and 1 appear at each switch's output, and thus, at each link, the links can be tested without introducing any


Fig. 10. The testing procedures for a $3 \times 3$ MUDEX.
additional cost. For test verification, it should be noted that all links in the network have identical outputs. Thus, the comparison method to verify the test results of queues can be applied similarly. Without loss of generality, the number of links from each switch is assumed to be even. Half of the links are connected to the primary inputs of a fan-out-free xor tree, and the rest are connected to the primary inputs of the other
fan-out-free xNOR tree. The outputs of the two fan-out-free networks form a 1 -out-of- 2 codeword. A design example for this method is given in Fig. 11(a).

It has been shown in [21] that a linear function implemented by two-input Xor gates needs at most four test patterns. The test patterns can be recursively derived from the primary output of the xOR (xNOR) tree to the primary inputs. Assume


Fig. 11. Verification of testing response by comparison and signature analysis.
that a linear function $P_{n}$ of $n$ variables is implemented by an xor tree as in Fig. 11(a). Then $P_{n}$ can be recursively expressed by $P_{n}=x_{n} \oplus P_{n-1}$, where $x_{n}$ is the $n$th primary input (variable), and $P_{n-1}$ is the linear function implemented by the subnetwork excluding the primary output xor gate and the primary input $x_{n}$. To test the primary output gate, it is sufficient to have $P_{n-1} x_{n}=00,01,10,11$. The input stream in $x_{n}$ is then 0101, and $P_{n-1}$ should be 0011 . We want to derive a test pattern which can be easily generated, e.g., all inputs are identical, or, only one or two inputs are different from others. Thus, for $P_{n-1}=0011$ and $P_{n-1}=P_{n-2} \oplus$
$x_{n-1}$, we set $x_{n-1}=0101\left(\right.$ as $\left.x_{n}\right)$, and thus, $P_{n-2}=0110$. It can be shown by induction that $x_{i}=0101, \forall i \neq 1$, and $x_{1}=$ 0110 (0011) when the number of gates is even (odd).

It is now clear that we can eliminate the hardcore in the xor and xnor trees when their test patterns are applied. Assume that $w / 2$ links are connected to inputs $x_{1} \cdots x_{w / 2}$ of the xor tree. To test the xor (xNOR) tree, we need to add one more input $x_{0}$ to the tree, and $x_{0}$ is controlled by the BIT. Since the output stream of MUDEX testing is composed of two 0 's and $r$ 1's, the XOR (XNOR) tree can be tested simultaneously with the MUDEX's and links when $0011 \cdots(0110 \cdots)$ are simultane-
ously applied to $x_{0}$ by the BIT. It requires $w r$ xOR and XNOR gates in each switch to verify the test response. When the number of XOR (XNOR) gates is too high, the testing method can be decomposed into two phases as follows. In phase one (two), all the queues in even (odd) stage switches serve as pattern generators and those in odd (even)-stage switches serve as multiple input linear feedback shift registers (MILFSR's). To test MUDEX's and links, the outputs of the MILFSR's are compared in a way similar to the case of testing queues. Thus, the network can be tested in two phases, each phase requiring $r$ +2 clock cycles. An example design showing such a strategy is given in Fig. 11(b).

## V. Conclusion

A two-level testing strategy is developed in this paper. The network level testing uses processors as testers, and the switch level testing uses switches to test the network. The network level testing is concurrent testing and the switch level testing is off-line testing.

The first step to ease the network level testing is to synchronize network operations. Then, the network is tested with different polynomial operations. Although only a small number of functional fault models and the corresponding testing procedures are discussed, this method can be easily extended to detect other faults at the cost of additional testing times. For example, it can be used for conflict resolving testing. When two or more requests in a switch requesting the same output port to use, the RCU should grant only one. Using the testing polynomials, $P_{i}^{N}(x)$ and conflicting routing requests, testing an $r \times r$ switch can be done in $r^{r}-r!$ testing routines. Obviously, conflict resolving testing is very expensive, when $r$ is large.
Since testing polynomials can be easily generated by processors or a dedicated hardware, the proposed method can be used for both SIMD and MIMD machines. Most of network faults can be diagnosed in a decentralized manner by processors with the polynomial testing. A processor need not communicate with others except for the simultaneous submission of polynomials. Thus, when the network becomes faulty, the testing method can be used to identify, with different levels of accuracy, fault-free components, i.e., a faulty system can be reconfigured following the testing.

The costs of testable designs are measured by the extra hardware required for links and logic components in switches. When the packet width is more than one bit, one of the data links can be used as a feedforward line (but not as a feedback line). However, when a data line is to be transformed into a feedforward line, a multiplexer should be used to bypass the queue that the link is connected to. One masker, a mapper, and an adder of $w$ bits are required for each data queue. For an $r \times$ $r$ switch, at least $r^{2}$ logic gates are needed to implement the

MUDEX, and the overhead, relative to the combinational circuit of the switch to implement the tracer, is $2 / r$. It should be noted that the overhead is an upper bound, because hardware for the data buffers is not considered.

The goal of the switch level testing is to obtain high fault coverage with a small testing time. Queues are self-tested first by converting them into polynomial generators. Furthermore, test patterns for the MUDEX can also be generated by the PG's. For a $C$-connected queue, it needs $w$ extra interlinks to convert a queue into PG's. Testing the RCU by the conventional method is very inefficient. It is proposed to use $r 1$-out-of- $r$ codeword checkers to monitor the outputs of the RCU. Finally, the MUDEX's and links are tested simultaneously. Testing responses are verified by using xor and xnor trees. It is shown that the network testing time is independent of the network size and the design method can be applied to various types of switch. For the switch level testing, comparators in the switches are the predominating overhead, which is $w r / r^{2}$ $=1 / r$ for the combinational circuits of switches, and no extra links are needed.

In this paper, we have focused only on the development of testing strategies. To determine an optimal (in some sense) testing period, the tradeoff between the performance penalty and fault-detection time must be studied. This and modifications of the proposed testing method for CSN's are the subject of further inquiry.

## Appendix A

## Fault Coverage of Polynomial Testing

Fault coverage of polynomial testing may vary with different circuit implementations of the network. To obtain a concrete figure on its fault coverage at the logic level, consider an example LFSR design in Fig. 12. The basic structure of a shift register is essentially a master-slave $S / R$ flip-flop. The $i$ th gate in a flip-flop is denoted as $G_{i}$, and its output and $j$ th input (indexed from top to bottom) by $G O_{i}$ and $G I_{i}^{j}$, respectively. It should be noted that latches usually have two outputs $Q$ and $\bar{Q}$. However, since the number of links is a major concern in the network design, only the $Q$ output of the slave latch will be used, and $\bar{Q}$ is ignored. In other words, a fault is detectable only when an erroneous response can be observed at the $Q$ output of the slave latch. During normal operations, the $S$ and $R$ inputs are in the form of $d \bar{d}, 11, d \bar{d}$, $11, \cdots$, where $d \in\{0,1\}$. It should be noted that the input 11 is inserted automatically when CLK $=0(1)$ at the master (slave) latch, and the outputs of a latch are read out when $S R$ $=11.01,10$, and 11 at the $S R$ inputs are referred to as $r, s$, and $b$, respectively.

To derive its test set, the slave latch composed of $G_{7}, G_{8}$ is considered first. Using the $D$-algorithm, test patterns for faults in the latch are summarized as follows.

| faults/nodes | $G I_{7}^{1}$ | $G I_{7}^{2}$ | $G O_{7}$ | $G I_{8}^{1}$ | $G I_{8}^{2}$ | $G O_{8}$ |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| $s-\mathrm{a}-0$ | $s b$ | $s b$ | $r b$ | $r b$ | $r b$ | $s b$ |
| $\mathrm{~s}-\mathrm{a}-1$ | $s b r b$ | $s b r b$ | $s b$ | $s b$ | $r b s b$ | $r b$ |



Fig. 12. An example LFSR implemented with master-slave $S R$ latches.

Testing the master latch is more complicated, because the slave latch must be in an appropriate initial state, i.e., output, to propagate the erroneous response of the master latch to the $Q$ output of the slave latch. For example, when $(Q, \bar{Q})=(0$, $\bar{D})$ at the master latch, ${ }^{5}$ the initial state of the slave latch must be $(1,0)$ to obtain $D$ at the output of the slave latch. Otherwise, the slave latch's output is 01 , meaning that the fault is not tested. In our case, the master and slave latches have identical test patterns.

Test patterns for $G_{5}$ and $G_{6}$ are summarized below:
register. Thus, the shift register fails to perform the delay function. Occurrence of such faults in a queue implies that the length of queue is reduced. Since the polynomial testing can identify the configuration of an LFSR, all such faults can be detected.
Although the polynomial testing is developed as a functional level testing, it can clearly detect all the detectable stuck-at faults at the logic level of registers. Since only two of the 56 faults are undetectable in a register, and all other faults, i.e., stuck-at faults on the xor gates and feedforward lines, are

| faults/nodes | $G I I_{5}^{1}$ | $G I_{5}^{2}$ | $G O_{5}$ | $G I_{6}^{1}$ | $G I_{6}^{2}$ | $G O_{6}$ |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| $\mathrm{s-a}-0$ | $s b r b$ | $s b r b$ | $s b$ | $r b s b$ | $r b s b$ | $r b$ |
| $\mathrm{~s}-\mathrm{a}-1$ | $r b$ | $r b s b$ | $s b r b$ | $s b r b$ | $s b$ |  |

It can be shown that the above pattern can test $G_{1}, G_{2}$ simultaneously. $G I_{10} \mathrm{~s}-\mathrm{a}-1$ and $G O_{10} \mathrm{~s}-\mathrm{a}-0$ can be tested by sbrb. The only undetectable faults are $G I_{10} \mathrm{~s}-\mathrm{a}-0$ and $G O_{10} \mathrm{~s}$-a1 , because the erroneous responses can only propagate to the $\bar{Q}$ output of the slave latch. $G I_{9} \mathrm{~s}-\mathrm{a}-1$ and $G O_{9} \mathrm{~s}-\mathrm{a}-0$ will block the transmission of data, and thus, sbrb is sufficient to test them. $G I_{9} \mathrm{~s}-\mathrm{a}-0$ and $G O_{9} \mathrm{~s}-\mathrm{a}-1$ faults do not cause logic faults, and thus, are not detectable by the $D$-algorithm. However, the memory of the register is lost when the above two faults occur, i.e., the latch fails to hold the data for a specific period. Assume that the high (low) period $t_{T}$ of testing clock is three times slower than the latch's transition time $t_{d}$. Then, such faults can be tested by the polynomial testing method. For example, when $G O_{9}$ is stuck-at-1, the data at the input of the register will be shifted to the slave latch's output after $2 t_{d}$. At the third $t_{d}$, the data are erroneously shifted into the next

[^5]detectable, the lower bound of the polynomial testing method is 96 percent.

## Appendix B

List of Symbols
$A, N, S, X \quad$ Four states of a switch on the route under

## $B_{i} \quad$ The $i$ th block in a PDM.

$\mathrm{BU}_{i}$
$D_{i(\text { in })}, D_{i(\text { out })}$
DFF
$E_{m}$

The $i$ th buffer in a queue.
The ring link input (output) of the $i$ th pattern generator $G_{i}$.
A $D$-type flip-flop with a single input and a single output.
Switch permutation, $E_{m}^{i+1}: f(i r+j) \rightarrow f(i r$ $+j+m$ MOD $r$ ), where $f(i r+j)$ and $f(i r+j+m$ MOD $r$ ) are the global indexes of the $j$ th input port and the $(j+m$ MOD $r$ )th output port of the $(i+1)$ th
switch, respectively, where $r$ is the switch order, and $\rightarrow$ is the interconection. $E_{m}^{i}$ is denoted as $E_{m}$ when all switches have the same permutation.

| $E_{m}^{-1}$ | The inverse of the switch permutation $E_{m}$. |
| :--- | :--- |
| $E_{i j}$ | An RCU output which enables queue $j$ to be |
|  | connected to output port $i$. |
| $G F(2)[x]$ | The polynomial ring. |
| $I, I_{n}$ | $I$ is the set of integers and $I_{n}=\{0,1,2$, |
|  | $\cdots, n\}$. |

$M \quad$ An $m \times w$ matrix representing $m$ buffers in a queue of $w$ bits each. The $i$ th row of $M$ is the $i$ th bits of all buffers and the $j$ th column of $M$ is the $j$ th buffer in the queue.
$M_{i j}(x) \quad$ The multiplier formed by the route connecting source $i$ to destination $j$.
$\begin{array}{ll}\text { MUDEX } & \text { An } r \times r \text { MUDEX is the combination of } r \\ \text { multiplexers and } r \text { demultiplexers. It is used }\end{array}$ to direct packets in a PMIN switch.
ONE, ZERO Coefficients of a word polynomial, ONE = ZERO.
$P(x), \bar{P}(x) \quad$ A polynomial and its complement.
$P_{j}^{N}(x)$
$P^{m}(x)$
$P_{E}^{m}(x) \quad P_{E}^{m}(x)=\Sigma_{i=0}^{k} x^{(m+1) i}$, the test polynomial
PDM Polynomial multiplier or divisor.
PG Polynomial generator.
PMIN Packet switching multistage interconnection network.
$Q(x) \quad$ The product of $P(x)$ and $M(x)$, or the quotient of $P(x) / D(x)$, or the output of a PDM.
$R(x) \quad$ The remainder of the operation that $D(x)$ divides $P(x)$ or the final contents of a PDM.
RCU Routing control unit.
RUT
$r_{i}$
SSA
$T_{i}$
$W_{k}(x), W(x) \quad W_{k}(x)=\sum_{i=0}^{k} x^{i} . W(x)=W_{\infty}(x)$.
$\Sigma_{i=0}^{r} n_{i} x^{i}$
$P_{j}^{N}(x)=\Sigma_{i=0}^{N} c_{i} x^{i}$, where $c_{j}=1, c_{i}=0, i$ $\neq j$, is a test pattern submitted by processor $j$. for $m$ unit delay faults.

Route under test.
The length or order of $B_{i}$.
Single stuck at fault.
The link permutation at stage $i$.
The initial state of a PDM or a PG.

## Acknowledgment

The authors are grateful to N. K. Jha for the valuable comments on the material in Section IV-C. They also want to thank M. H. Woodbury and referee A for numerous suggestions on the initial draft of this paper.

## References

[1] M. A. Breuer and A. D. Friedman, Diagnosis and Reliable Design of Digital Systems. Rockville, MD: Computer Science, 1976.
[2] W. K. Fuchs, J. A. Abraham, and K. H. Huang, "Concurrent error detection in VLSI interconnection networks," in Dig. Papers, FTCS13, 1983, pp. 309-315.
[3] J. E. Lilienkamp, D. H. Lawrie, and P. C. Yew, "A fault tolerant interconnection network using error correcting codes," in Dig. Papers, FTCS-12, 1982, pp. 123-125.
[4] T. Y. Feng and C. L. Wu, "Fault-diagnosis of a class of multistage
interconnection networks," IEEE Trans. Comput., vol. C-30, pp. 743-758, Oct. 1981.
[5] D. P. Agrawal, "Testing and fault tolerance of multistage interconnection networks," Computer, pp. 41-53, Apr. 1982.
[6] N. J. Davis IV, W. T.-Y. Hsu, and H. J. Siegel, "Fault location techniques for distributed control interconnection networks," IEEE Trans. Comput., Vol. C-34, pp. 902-910, Oct. 1985.
[7] D. C. H. Lee and J. P. Shen, "Easily-testable (N, K) shuffle/exchange networks,'" in Proc. Int. Conf. Parallel Processing, 1983, pp. 65-70.
[8] D. P. Agrawal and J.-S. Leu, "Dynamic accessibility testing and path length optimization of multistage interconnection networks," IEEE Trans. Comput., vol. C-34, pp. 255-266, Mar. 1985.
[9] V. Cherkassky, E. Opper, and M. Malek, "Reliability and fault diagnosis analysis of fault-tolerant multistage interconnection networks," in Dig. Papers, FTCS-14, 1984, pp. 246-251.
[10] M. Malek and E. Opper, 'Multiple fault diagnosis of SW-banyan networks," in Proc. FTCS-13, 1983, pp. 446-449.
[11] E. Opper and M. Malek, "Real-time diagnosis of banyan networks," in Proc. Real Time Syst. Symp., 1982, pp. 27-36.
[12] W. Y.-P. Lim, "A test strategy for packet switching networks," in Proc. Int. Conf. Parallel Processing, 1982, pp. 96-98.
[13] V. Cherkassky and E. Opper, "Fault diagnosis and permuting properties of CC-banyan networks," in Proc. Real-Time Syst. Symp., 1984, pp. 175-183.
[14] J. Y. Maeng, "Self-diagnosis of multistage network-based computer systems," in Dig. Papers, FTCS-13, 1983, pp. 324-331.
[15] S. Thanawastien and V. P. Nelson, "Diagnosis of multiple faults in shuffle/exchange networks,' in Proc. Real Time Syst. Symp., 1984, pp. 184-192.
[16] R. E. Bryant, "A switch-level model and simulator for MOS digital systems,' IEEE Trans. Comput., vol. C-33, pp. 160-177, Feb. 1984.
[17] E. J. McCluskey and S. Bozorgui-Nesbat, "Design for autonomous test," IEEE Trans. Comput., vol. C-30, pp. 866-875, Nov. 1981.
[18] S. W. Golomb, Shift Register Sequences. San Francisco, CA: Holden-Day, 1967.
[19] Z. Kohavi, Switching and Finite Automata Theory. New York: McGraw-Hill, 1978.
[20] J. E. Smith, "Measures of effectiveness of fault signature analysis," IEEE Trans. Comput., vol. C-29, pp. 510-514, June 1980.
[21] J. P. Hayes, "On realizations of boolean functions requiring a minimal or near-minimal numbers of tests," IEEE Trans. Comput., Vol. C20, pp. 1506-1513, Dec. 1971.


Jyh-Charn Liu (S'84) was born in Kaohsiung, Taiwan, on December 6, 1956. He received the B.S. and M.S. degrees in electrical engineering from the National Cheng Kung University, Tainan, Taiwan, in 1979 and 1981, respectively.

He was a system engineer of Siantek Co. Taiwan in 1983. Since 1984 he has been a Research Assistant at the University of Michigan, where he is currently pursuing the Ph.D. degree in electrical and computer engineering. His research interests include fault-tolerant computing and easily testable architectures for real-time applications.
Mr. Liu is a student member of the IEEE Computer Society.


Kang G. Shin (S'74-M'78-SM'83) received the B.S. degree in electronics engineering from Seoul National University, Seoul, Korea in 1970, and the M.S. and Ph.D. degrees in electrical engineering from Cornell University, Ithaca, NY, in 1976 and 1978, respectively.
From 1970 to 1972 he served in the Korean Army as an ROTC officer and from 1972 to 1974 he was on the research staff of the Korea Institute of Science and Technology, Seoul, working on the design of VHF/UHF communication systems. From 1978 to 1982 he was an Assistant Professor at Rensselaer Polytechnic

Institute, Troy, NY. He was also a Visiting Scientist at the U.S. Airforce Flight Dynamics Laboratory in Summer 1979 and at Bell Laboratories, Holmdel, NJ in Summer 1980. Since September 1982, he has been with the Department of Electrical Engineering and Computer Science at The University of Michigan, Ann Arbor, MI, where he is currently a Professor. He has been very active and authored/coauthored over 140 technical papers in the areas of distributed fault-tolerant real-time computing, computer architecture, and robotics and automation. As an initial phase of validation of architectures and analytic results, he and his students are currently building a 19 -node
hexagonal mesh multiprocessor at the Real-Time Computing Laboratory (RTCL), The University of Michigan.
Dr. Shin is a member of the Association for Computing Machinery, Sigma Xi, and Phi Kappa Phi. He was the Program Chairman of the 1986 IEEE RealTime Systems Symposium and is the Guest Editor of the special issue of IEEE TRANSACTIONS ON COMPUTERS on Real-Time Systems which appeared in August 1987. In 1987, he also received an Outstanding Paper Award for a paper on robot trajectory planning published in IEEE TRANSACTIONS ON AUTOMATIC CONTROL.


[^0]:    Manuscript received August 31, 1986; revised February 28, 1987. This work was supported in part by the Office of Naval Research under Contract N00014-85-K-0122 and NASA under Grants NAG-1-296 and NAG-1-492. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agencies.
    The authors are with the Real Time Computing Laboratory, Department of Electrical Engineering and Computer Science, the University of Michigan, Ann Arbor, MI 48109.
    IEEE Log Number 8824535.

[^1]:    ${ }^{1}$ This term should not be confused with the switch-level fault model for MOS circuits [16].

[^2]:    ${ }^{2}$ The $F$-selector can be eliminated if the RUT is to be transformed into either a multiplier or divisor, but not both.

[^3]:    ${ }^{3}$ Only $r$ permutations are needed to test a data path, although $r$ ! permutations are required to test the routing functions.

[^4]:    ${ }^{4}$ The queue lengths need not be identical.

[^5]:    ${ }^{5}$ They are inverted to $(1, D)$ before they enter the slave latch.

