.EQ delim $$ define ddd % ,~"..."^,~ % define pprime % sup prime % define ab % (a,^b) % define as % (a,^s) % define bs % (b,^s) % define lib % {scrL bar} sub i % define lip % {scrL sub i sup prime }% define 2md % 2 sup {d^-^2} % define o2 % left ( 1 over 2 right ) % define cn2 % {left ceiling n over 2 right ceiling} % define fn2 % {left floor n over 2 right floor} % .EN .am DE .sp .5 .ls 2 .. .ds HF 3 3 3 3 3 3 2 .ds HP 10 10 10 10 10 10 10 .nr Hu 7 .nr HF 3 3 3 3 3 3 2 .nr HP 10 10 10 10 10 10 10 .PH "" .DS 2 .B MINIMAL-DISTANCE ROUTING FOR KYKLOS II .R .DE .sp .DS .TS center, tab(:); c c c. .I D.\ Z. Du:F.\ K. Hwang & A.\ M. Odlyzko:Y.\ J. Zhang .R MIT:AT&T Bell Laboratories:University of California Cambridge, Mass.:Murray Hill, NJ:Berkeley, CA .TE .DE .sp .DS 2 .I ABSTRACT .DE .sp .nr Pt 1 .ls 2 .P We propose a new routing strategy for the KYKLOS\ II multiprocessor interconnection network which achieves minimum distance for the path between any two processors. For KYKLOS\ II with $2 sup n$ processors, the average distance is shorter than those of previous routing strategies by approximately $2^log sub 2^n$. The traffic density, a measure of traffic concentration, is comparable or better than previous strategies for up to two thousand processors, but is soundly beaten by a strategy proposed by Jenevein and Menezes for larger\ $n$. .ls 1 .bp .DS 2 .B MINIMAL-DISTANCE ROUTING FOR KYKLOS II .R .DE .sp .DS .TS center, tab(:); c c c. .I D.\ Z. Du:F.\ K. Hwang & A.\ M. Odlyzko:Y.\ J. Zhang .R MIT:AT&T Bell Laboratories:University of California Cambridge, Mass.:Murray Hill, NJ:Berkeley, CA .TE .DE .sp .H 1 INTRODUCTION .ls 2 .P A \f2double binary tree\f1 (DBT) of order $n$ is a graph consisting of two copies, $T sub 1$ and $T sub 2$, of an \%$n$-level complete binary tree $T$ with the $N~=~2 sup n$ terminal nodes of $T sub 1$ one-to-one mapped and identified to the $2 sup n$ terminal nodes of $T sub 2$. Different mappings result in different DBTs. The DBTs have been proposed for multiprocessor interconnection structures where processors are located at those nodes which are obtained by merging the terminal nodes of $T sub 1$ and $T sub 2$. We will label the $2 sup n$ processors by the $2 sup n$ binary sequences of length $n$. The identity mapping results in the DBT as studied by Bentley and Kung\ [1], and also by Imai, Tateizumi, Yoshida and Fukumura\ [3] (called KYKLOS\ I by Jenevein and Menezes). .pn 2 .PH "''- % -''" The inverse mapping, i.e., mapping the node $( a sub 1 ddd a sub n )$ of $T sub 1$ to the node $( a sub n ddd a sub 1 )$ of $T sub 2$, results in KYKLOS\ II, a multiprocessor interconnection structure which Jenevein and Menezes have shown to have many nice properties in several articles\ \%[4-7]. In\ [5] they proposed two routing strategies, the \%$M2$-routing and the \%$H2$-routing, and suggested that $H2 $ has shorter average distance between processors, though it can have larger distances for certain pairs of processors. More importantly, they showed that the traffic density for $H2$ grows on the order of $O(N sup 1.5 )$ versus $O(N sup 2 )$ for $M2$ and most other trees. .P In this paper we propose a new routing strategy $D2$ for KYKLOS\ II and prove that it has the shortest distance for every pair of processors. We show that its average distance is shorter than that of $M2$ and $H2$ by approximately $2^log sub 2^n$ and its traffic density is comparable to that of $H sub 2$ for up to one thousand processors but worse off for larger\ $N$. .H 1 "THE \f2D2\f3 ROUTING STRATEGY\f1" Let $a~=~(a sub 1 ddd a sub n )$ and $b~=~(b sub 1 ddd b sub n )$ be two processors in KYKLOS\ II. We will let $d sub S ab$ denote the distance between $a$ and $b$ in $T$ under the routing strategy $S$ and let $d sub S sup 1 ab (d sub S sup 2 ab )$ denote the part of $d sub S ab$ in $T sub 1 ( T sub 2 )$. Let $d sup 1 ab$ $(d sup 2 ab )$ denote the distance in $T sub 1 (T sub 2 )$ alone. By a \f2run\f1 we mean a sequence $I$ of consecutive indices such that $a sub i~=~b sub i$ if $i~member~I$ but $a sub i~!=~b sub i$ if $i$ is the index (if any) immediately preceding or succeeding the sequence. Let $scrL sub i$ denote the length of the run containing index $i$; if $a sub i~!=~b sub i$, then we define $scrL sub i$ to be zero. The following lemma is well known\ [5] and easily verifiable. .HU "Lemma\ 1." $d sup 1 ab~=~2(n~-~scrL sub 1 )$, $d sup 2 ab~=~2(n~-~scrL sub n )$. .P We now describe $D2$. Let $L ab~=~(i~+~1 ddd i~+~k )$ be a longest run with length $scrL ab~=~k$. Then $D2$ chooses one of the two following paths randomly. The first path routes $a$ to the processor $s sub 1~=~(a sub 1 ddd a sub i+k ,^b sub i+k+1 ddd b sub n )$ through $T sub 1$ and then routes $s sub 1$ to $b$ through $T sub 2$. The second path routes $a$ to the processor $s sub 2~=~( b sub 1 ddd b sub i+k ,^a sub i+k+1 ddd a sub n )$ through $T sub 2$ and then routes $s sub 2$ to $b$ through $T sub 1$. It is easily verified that these two paths are node-disjoint, except when $i~=~0$ or $n~-~k$, then the two paths overlap. By Lemma\ 1 both paths have length $2(n~-~k)$. We now prove that there is no shorter path. Define $d ab~=~min from s~d sub S ab$. .HU "Theorem\ 1." $d ab~=~d sub D2 ab~=~2(n~-~scrL ab )$. .HU Proof: An equivalent statement of Theorem\ 1 is that if $d ab~=~2d$, then $scrL ab~=~n~-~d$. We prove this equivalent statement by induction on $d$. It is trivally true for $d~=~0$. For general $d$ let $p$ denote a shortest path between $a$ and $b$. If $p$ contains no other processor, then $p$ uses only one of the two trees $T sub 1$ and $T sub 2$. Without loss of generality assume that $p$ lies completely in $T sub 1$. Then the length of $p$, which by assumption is $2d$, equals $d sup 1 ab$ by Lemma\ 1. Therefore $scrL ab$ is at least $n~-~d$, the length of the first run. On the other hand, $scrL ab$ is at most $n~-~d$ for otherwise the $D2$ routing will find a path of length shorter than $2(n~-~d)$, contrary to our assumption. Hence $scrL ab~=~n~-~d$. .P Next we consider the case that $p$ contains a processor $s~=~(c sub 1 ddd c sub n )$. Note that .DS 3 .EQ d as~+~d bs~=~d ab~=~2d~. .EN .DE Therefore .DS 3 .EQ max^ "{" d as ,~d bs "}"~<~2d~. .EN .DE By the inductive assumption .DS 3 .EQ scrL as~=~n~-~d as^, .EN .sp .EQ scrL bs~=~n~-~d bs^. .EN .DE Now .DS 3 .EQ scrL ab~mark >=~| L as~cap~L bs | .EN .sp .EQ lineup >=~scrL as~+~scrL bs~-~n .EN .sp .EQ lineup =~n~-~d as~+~n~-~d bs~-~n~=~n~-~d ab~. .EN .DE As before, we know $scrL ab~<=~n~-~d ab$; hence equality holds. .HU Corollary. For each given processor there exists only one other processor which is away from it by the maximum distance $2n$. .H 1 "A COMPARISON OF ROUTING STRATEGIES" The $M2$ routing connects $a$ to $b$ by choosing the shorter of the two paths, one lies completely in $T sub 1$ and the other lies completely in $T sub 2$. By Lemma\ 1 .DS 3 .EQ d sub {M2} ab~mark =~min^"{" d sup 1 ab ,^d sup 2 ab "}" .EN .sp .EQ lineup =~2(n~-~max^"{" scrL sub 1 ,^ scrL sub n "}" )~. .EN .DE .P Since KYKLOS is processor-symmetric, we need only to study the distances of all processors to processor\ 0. Let $P sub r (x)$ denote the number of processors with distance $x$ to processor\ 0 under the routing strategy $r$. Menezes and Jenevein\ [4] showed that .DS 3 .EQ P sub M2 (2d)~mark =~2 sup d~~~~~~~~~~~~~~~~~~~~~~~~~~~\ka if~1~<=~d~<=~fn2^, .EN .sp .ta \nau .EQ lineup =~2 sup d~-~left floor 3~cdot~2 sup {2d^-^n^-^2} right floor~ \t if~2~<=~d~<=~n~. .EN .DE In particular, $2 sup n-2$ processors have the maximum distance $2n$ to processor\ 0. The average distance for even $n$ is .DS 3 .EQ 1 over {2 sup n}~ left [ mark sum from d=1 to n~2d~cdot~2 sup d~-~2 sup d~-~sum from {n/2^+^1} to n~ 2d^(3~cdot~2 sup {2d^-^n^-^2} ) right ] .EN .sp .EQ lineup =~1 over {2 sup n}~ left [ 4(n2 sup n~-~2 sup n~+~1)~-~left ( 2n2 sup n~-~10 over 3^2 sup n~+~n~+~13 over 3 right ) right ] .EN .sp .EQ lineup =~{2 sup n}~-~10 over 3~+~{n~+~13/3} over {2 sup n}~. .EN .DE For odd $n$ the last term is replaced by $(2n~+~5/6)/2 sup n$. .P The $H2$ routing connects $a$ to $b$ using only the subgraph of $T$ induced by the nodes from level $i$ of $T sub 1$ to level $n~-~i$ of $T sub 2$ (the processors are at level\ 0 of both trees). Furthermore, the path can contain at most one other processor. Therefore .DS 3 .EQ d sub H2 ab~=~2(n~-~lib ) ~~roman where~lib~=~scrL sub i~~roman if~~ scrL sub i~>=~1~~roman and~~lib~=~scrL sub i+1~~ roman if~~scrL sub i~=~0~. .EN .DE .P Since $P sub H2 (2d)$ and the average distance of processors have not been given explicitly, we derive them in the following. Without loss of generality, assume that $i~<=~n~-~i$. Note that for $lib~<~i$ there are $lib~+~1$ ways of obtaining $lib$, for $i~<=~lib~<~n~-~i$ there are $i~+~1$ way and for $n~-~i~<=~lib$ there are $n~-~lib~+~1$ ways. Furthermore, if the run contains neither index 1\ nor index $n$, then $lib~+~2$ bits are fixed; if the run contains either one, then $scrL sub i~+~1$ bits are fixed. Thus we have .DS 3 .EQ P sub H2 (2d)~mark =~(d~+~3)~2 sup d-2~~~~~~~~~~\ka if~ 1~<=~d~<=~i .EN .sp .ta \nau .EQ lineup =~(i~+~2)^2 sup d-2 \t if~i~<~d~<=~n~-~i .EN .sp .EQ lineup =~(n~-~d~+~1)^2 sup d-2 \t if~n~-~i~<~d~<=~n~. .EN .DE Again, $2 sup n-2$ processors have the maximum distance $2n$ from processor\ 0. The average distance can be more easily computed by computing $d sub H2 sup 1$ and $d sub H2 sup 2$. Let $scrL sub i sup prime$ denote the length of $L sub i$ truncated at index $i$. Then $d sub H2 sup 1~=~i~-~scrL sub i sup prime$. Since there are $2 sup {i^-^scrL sub i-1 sup prime}$ paths of length exactly $lip$ for $lip~<~i$ and one path of length $i$, the average length is .DS 3 .EQ Ed sub H2 sup 2~==~E sub i~mark =~2 sup -i~sum from {lip ^=^p} to i-1~ 2(i~-~lip )~2 sup {i^-^lip^-^1}~=~ i~sum from {lip^=^0} to i-1~o2~sup { lip }~-~ 1 over 2~sum from {lip^=^1} to i-1~lip~ o2~lip~-~1 .EN .sp .EQ lineup =~i~cdot~{1~-~o2 sup i} over {1~-~1 over 2}~-~ 1 over 2~cdot~ {1~-~i o2 sup i-2~+~(i~-~1) o2 sup i} over { left ( 1~-~1 over 2 right ) sup 2} .EN .sp .EQ lineup =~2i~-~i^o2 sup i-1~-~2~+~i o2 sup i-2~-~(i~-~1) o2 sup i-1 .EN .sp .EQ lineup =~2i~-~2~+~o2 sup i-1~~~~~~ roman for~~1~<=~i~<=~n~-~i~. .EN .DE Therefore, the average distance is .DS 3 .EQ Ed sub H2 sup 1~+~Ed sub H2 sup 2~=~E sub i~+~E sub n-i~=~ 2n~-~4~+~o2 sup i-1~+~o2 sup n-i-1~~~roman for~~ 1~<=~i~<=~n~-~i~. .EN .DE The minimum average distance occurs when $i~=~fn2$ and equals .DS 3 .EQ 2n~-~4~+~o2 sup {left floor n over 2 right floor^ - 1} ~+~o2 sup {left ceiling n over 2 right ceiling^-^1}~. .EN .DE Note that this distance is shorter than the average distance of $M2$ only by a constant $2 over 3$. .P Since the length of a longest run is certainly no less than that of the run $L sub i$, we have .DS 3 .EQ d sub D2 ab~<=~min^"{" d sub M2 ab ,^d sub H2 ab "}"~. .EN .DE Note that $scrL (0,^b)$ is simply the length of the longest zero-run in the $n$-sequence $b$. If we average $scrL (0,^b)$ over all $b$ including $0$, then the average equals the expected length of the longest zero-run which was given by Guibas and Odlyzko\ [2] to be $log sub 2^n~-~3/2~+~gamma / log^2~+~v( log sub 2 ^n)~+~O(( log^n) sup 3 /n )$ where $gamma$ is the Euler constant ($=~.577 ... )$ and $v$ is a nonconstant continuous periodic function with period\ 1 and mean\ 0. By eliminating $scrL (0,^0)$ and using Theorem\ 1 we obtain .HU "Theorem\ 2." The average distance of two processors in KYKLOS\ II under $D2$ routing is .DS 3 .EQ {2 sup n} over {2 sup n~-~1}~cdot~ 2 left ( n~-~log sub 2 ^n~+~3 over 2~-~ gamma over {log^2}~+~ v( log sub 2^n)~+~O left ( {( log ^n) sup 3} over n right ) right )~. .EN .DE .P Hence the average distance of $D2$ is shorter than that of $M2$ and $H2$ by approximately $2^log sub 2 n$. .H 1 "TRAFFIC DENSITY" Let $F sub D2 (k)$ denote the frequency that the level-$k$ links (links connecting level-$k$ nodes and level $(k~+~1)$ nodes) of $T sub 1$ are traveled considering all $left ( cpile {n above 2} right )$ paths under the $D2$ routing. By symmetry, the corresponding frequency for any given level-$k$ link $T sub D2 (k)$ is $F sub D2 (k) / 2 sup n-k$. Define the traffic density of $D2$ as .DS 3 .EQ max from {1^<=^k^<=^n}~T sub D2 (k)~. .EN .DE Then the traffic density is a measure of the concentration of traffic under a routing strategy. Jenevein and Menezes\ [5] showed that the traffic density for $M2$ is $N sup 2 /64$ and occurs at level $n~-~1$, while the traffic density for $H2$ is $N sup 1.5 /2$ for even $n$ and $N sup 1.5 / sqrt 2$ for odd $n$ and occurs at level $cn2$. Thus a major attraction of the $H2$ routing is the reduction of traffic density from order $N sup 2$ to order $N sup 1.5$. .P Though we are unable to solve for the $D2$ traffic density since we do not have the probability distribution $P sub D2 (2d)$, we can still compute the $D2$ traffic density directly for small $n$ by the following method. By symmetry we need only study the path from processor zero to all other processors and also the traffic density in $T sub 1$. For a fixed $n$ we enumerate the set $S sub n$ of all the binary sequences of length $n$ except the one containing all zeros. For each sequence we find its longest run of zeros (if there are more than one, select one randomly) and counts the number of remaining bits after the run. For the sequence of all ones we define the count to be zero or $n$ with equal probability. A count $k$ signifies that the highest node in $T sub 1$ that the path reaches is at level $k$. Let $g sub D2 (k)$ denote the expected number of sequences in $S sub n$ with count $k$, ``expected'', since $k$ is not fixed when there are more than one longest run. Since any path with count $k$ must go through two links at each level $i$ below $k$, we have .DS 3 .EQ F sub D2 (k)~=~2^sum from i=k+1 to n~g sub D2^(i)~~~~ for~k~=~0 ddd n~-~1^, .EN .DE from which $T sub D2 (k)$ and the traffic density can be easily computed. .P In the following table we show the traffic density and the level it occurs of $D2$ for $3~<=~n~<=~10$ and compare it with $H2$. .DS .TS center, tab(:); c c c c c c c c c. $n$:3:4:5:6:7:8:9:10 .sp $D2$-density:8:24:71:213:648:2106:6879:23402 .sp level:1 or 2:2:3:4:5:6:6:7 .sp $H2$-density:32:32:256:256:2048:2048:16384:16384 .sp level:2:2:3:3:4:4:5:5 .TE .DE .P Thus we see that for $n~<=~10$, the traffic density of $D2$ is comparable to $H2$ for even $n$ and quite a bit better for odd $n$. For $n~>=~12$ discrepancy of $D2$ and $H2$ widens as is clear from the following table. .DS .TS center, tab(:); c c c c c c. $n$:12:14:16:18:20 .sp $D2$-density:$2.8~times~10 sup 5$:$3.4~times~10 sup 6$:$4.3~times~10 sup 7$:$5.4~times~10 sup 8$:$7.1~times~10 sup 9$ .sp level:9:11:13:15:17 .sp $H2$-density:$1.3~times~10 sup 5$:$1.0~times~10 sup 6$:$8.0~times~10 sup 6$:$6.5~times~10 sup 7$:$5.3~times~10 sup 8$ .sp level:6:7:8:9:10 .TE .DE .P For $n$ large the probability of $scrL^(0,^b)~=~i$ can be approximated by $(e sup {-n/2 sup i+1}~-~e sup {-n/2 sup i} )$ by using an asymptotic estimate of the probability that a random binary $n$-sequence contains no run of length $i$ or more\ [2]. For given $i$ each position in the sequence from $n~-~i$ to $n~-~1$ then has approximately equal probability of ending a longest zero-run while position $n$ has about twice as much probability. Ignoring this difference, then we can approximate $g sub D2 (i)$, $1~<=~i~<=~n$, by .DS 3 .EQ sum from j=0 to n-i^(e sup {-n/2 sup j+1}~-~e sup {-n/2 sup j} )~ 1 over {n~-~j}~. .EN .DE However, it is not easy to solve for max\ $T sub D2 (k)$. .H 1 CONCLUSION We propose a routing strategy $D2$ for KYKLOS\ II which assures a shortest path between any two processors and is shorter by approximately 2\^ln\ $n$ than $H2$. The traffic density for $D2$ is comparable to $H2$ for up to two thousand processors, in fact, significantly better when $n~<=~9$ is odd. However, $H2$ has much better traffic density characteristic for large\ $n$. .bp .PH "" .DS 2 .B REFERENCES .DE .sp .RL .LI J.\ L. Bentley and H.\ T. Kung, A tree machine for searching problems, Proc.\ 1979 Inter. Conf. Paral. Proc., pp.\ \%257-266. .LI L.\ J. Guibas and A.\ M. Odlyzko, Long repetitive patterns in random sequences, Z.\ Wahrscheinlichkeitstheorie verw. Gebiete 53 (1980), 241-262. .LI M.\ Imai, Y.\ Tateizumi, Y.\ Yoshida and T.\ Fukumura, The architecture and efficiency of a DON: a combinatorial problem oriented multicomputer system, $4 sup roman th$ Inter. Conf. Dist. Comput. Systems, pp. \%174-182. .LI R.\ M. Jenevein and B.\ L. Menezes, Mathematical foundations for KYKLOS, Kyklos Tech. Report\ #1, Mar. 1985. .LI R.\ M. Jenevein and B.\ L. Menezes, KYKLOS: low tide high flow, Proc. $6 sup roman th$ Inter. Conf. Dist. Comput. Systems, pp.\ 8-15. .LI B.\ L. Menezes and R.\ M. Jenevein, KYKLOS: a linear growth fault-tolerant interconnection network, Proc.\ 1985 Inter. Conf. Paral. Proc., pp.\ \%498-502. .LI B.\ L. Menezes, R.\ M. Jenevein and M.\ Malek, Reliability analysis of the KYKLOS interconnection networks, Proc. $6 sup roman th$ Intern. Conf. Dist. Comput. Systems, pp.\ \%46-53. .LE