.fp 4 M2 .fp 5 M6 .nr Cl 7 .nr Pt 1 .EQ delim $$ gsize 10 define dd % "..." % define sE % font 4 E % .EN .S 10 .am DE .ls 2 .sp .. .nr Hb 7 .nr Hs 7 .nr Hu 7 .ds HP 11 11 11 11 11 11 11 .ds HF 3 3 3 3 3 3 3 .TL The 10\u20\d-th Zero of the Riemann Zeta Function and 70 Million of its Neighbors .AU "A. M. Odlyzko" AMO MH 11218 7286 2C-355 .AS 1 .ls 2 .P This paper presents the results of a computation of almost 79 million consecutive zeros of the Riemann zeta function near zero number $10 sup 20$, as well as of several other large sets of high zeros. These zeros lie about $10 sup 8$ times higher than previously calculated large sets of zeros, and their computation was made possible by a fast new algorithm invented by A. Scho\*:nhage and the author. Although the implementation of this algorithm used here is not entirely rigorous, because roundoff errors are not completely controlled, it appears to be highly accurate as well as fast, and the results indicate that all the computed zeros satisfy the Riemann Hypothesis. Various statistical studies of these zeros are presented. Some of these studies provide numerical evidence about conjectures that go even beyond the Riemann Hypothesis, and relate the distribution of zeros of the zeta function to that of eigenvalues of random matrices studied extensively in physics. Other studies compare the actual behavior of the zeta function to known asymptotic results. The computations described in this paper were carried out on a Cray \%X-MP supercomputer. .ls 1 .AE .MT 4 .ls 2 .H 1 "Introduction" The $10 sup 20$-th zero of the Riemann zeta function equals .DS 2 .EQ 1/2 ~+~ i^15202440115920747268.6290299 "..." ~. .EN .DE It and a few of its nearest neighbors are shown in Table\ 1.1. All told, almost 79 million zeros near the $10 sup 20$-th zero were computed. These zeros lie almost $10 sup 8$ times higher than any large set of zeros computed previously. 
This paper reports the statistics of these and some other high zeros and describes the algorithms that made these calculations possible. .P The Riemann Hypothesis (RH) has been subjected to a series of numerical investigations, starting with unpublished ones by Riemann. (See [Ed,\|Od3] for a history of these computations.) The latest result is that the RH is true for the first $1.5 times 10 sup 9$ zeros (i.e., all zeros up to height ...) ... (the Lindelo\*:f conjecture). It would be nice to produce convincing numerical evidence about the actual size of $ zeta ( 1/2 + it ) $. However, this is hard to do. One difficulty is that near the $10 sup 20$-th zero, one has $t^approx^1.5 times 10 sup 19$, so that $t sup 1/6 ^approx^1570$, while $( log ^t) sup 2 ^approx^1950$, so it is hard even to distinguish between these two functions, which have entirely different rates of growth. (Throughout the paper, $log^x$ denotes the natural logarithm of $x$.) .P Some of the data from the present computations might also be useful in other number theoretic investigations. For example, the Stark method [St] for obtaining lower bounds for imaginary quadratic number fields with small class numbers depends on knowledge of pairs of zeros of the zeta function that are very close together. (Another method for bounding class numbers, that of Montgomery and Weinberger [MW], depends on zeros of $L$-functions.) .P The final reason for the computations of this paper was to demonstrate that the new algorithm of [OS] is of practical use, and not just a theoretical curiosity. Since this algorithm is fairly complicated, this was not obvious at the outset, and a large section of this paper is devoted to a description of the implementation, including various modifications that were made to the basic algorithm described in [OS]. As it turns out, the algorithm is very fast, over $ 10 sup 5 $ times faster than the older algorithms would have been near the $ 10 sup 20 $-th zero. 
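.P
The growth comparison made above for $t^approx^1.5 times 10 sup 19$ is easy to reproduce. The following minimal Python sketch is an illustration added here, not part of the original computations; it uses only the rounded height quoted in the text and evaluates both functions:

```python
import math

# Height near zero number 10^20, rounded as in the text.
t = 1.5e19

sixth_root = t ** (1 / 6)        # t^{1/6}
log_squared = math.log(t) ** 2   # (log t)^2, natural logarithm

print(f"t^(1/6)   ~ {sixth_root:.0f}")
print(f"(log t)^2 ~ {log_squared:.0f}")
print(f"ratio     ~ {log_squared / sixth_root:.2f}")
```

At this height $( log ^t) sup 2$ is still the larger of the two, by roughly 25 percent, even though $t sup 1/6$ grows much faster as $t^->^inf$.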
Moreover, work on this implementation has suggested many additional modifications, described in Section\ 4.6, which can probably speed up the algorithm by another order of magnitude. .P The main sets of zeros that were computed are listed in Table\ 1.2. The entry for $N=10 sup 20$, for example, means that 78,\|893,\|234 zeros were computed, starting with zero number $10 sup 20^-^30,^769,^710$, and ending with zero number $10 sup 20^+^ 48,^123,^523$, and that all these zeros are of the form $1/2^+^i gamma$ with $gamma ^approx^1.5 times 10 sup 19$. Throughout the paper, references to the $N= 10 sup 20$ data set will denote these 78,\|893,\|234 zeros or some subset of them, and similarly for the $N=10 sup 19 ,^dd$ data sets. .P The starting points for the large data sets listed in Table\ 1.2 were chosen to be near zeros whose numbers are round (such as $10 sup 20$), so as to be easy to refer to. It was expected that, as far as the distribution of zeros is concerned, these intervals would behave like random ones. Another approach is to concentrate on investigating the behavior of $zeta (1/2 + i t)$ near those $t$ where the zeta function might be expected to behave in an unusual fashion (e.g., where it is very large). Some such special values of $t$ were found, and the computations that were carried out there are listed in tables\ 3.1.1 and 3.1.2. (A full explanation of the entries in these tables is given in Section\ 3.) These computations produced many values of the zeta function and of gaps between zeros that are current records. .P While the computations that are described in this paper did obtain values of zeta function zeros at much greater heights than would be feasible with older methods, they do have one fairly serious defect, namely that they are not rigorous. 
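.P
The count of 78,\|893,\|234 zeros in the $N=10 sup 20$ entry of Table\ 1.2 follows from the first and last zero numbers quoted above, since both endpoints are included. A minimal Python check using exact integer arithmetic:

```python
N = 10 ** 20
first_zero = N - 30_769_710   # number of the first zero computed
last_zero = N + 48_123_523    # number of the last zero computed

# Both endpoints belong to the set, hence the "+ 1".
count = last_zero - first_zero + 1
print(f"{count:,}")  # 78,893,234
```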
The validity of the values for the zeros that have been computed (and also of the assertion that all these zeros satisfy the RH) depends on the assumption that substantial cancellation among the errors due to roundoff takes place. This is due largely to the extremely large sizes of the numbers being handled, and not so much to the new algorithm, and is explained in detail in Section\ 4. At this point we only mention that the values of zeros that have been obtained are believed to be very accurate (to within $+-^10 sup -6$ or even better for $N= 10 sup 20$). This belief is based partially on the expected cancellation of errors in the computation. The strongest argument for the validity of the computations, however, comes from the fact that several large sets of zeros were computed twice, in entirely different ways. The fact that the numbers being computed were the same follows only from deep mathematical analysis, and is not obvious from the numbers being processed. That the resulting sets of values for the zeros agreed to the expected degree is a very strong argument in favor of validity of the results. These issues are discussed in greater detail in Section\ 4.5. .P The remainder of this paper is organized into three sections. Section\ 2 recalls the basic definitions and conjectures, and then presents the statistics of the large sets of zeros given in Table\ 1.2. Section\ 2 is organized into subsections on a variety of topics, such as large values of the zeta function, large and small gaps between consecutive zeros, and many others. .P Section\ 3 is devoted to the zeros listed in Table\ 3.1.1. First the statistics of these zeros and of various properties of the zeta function in those ranges are presented. 
Then some simultaneous Diophantine approximation algorithms (based on the Lova\*'sz lattice basis reduction algorithm [LLL]) are described, as well as the ways in which they have been used to produce the points of Table\ 3.1.1 at which the zeta function was expected to behave pathologically, and where it does exhibit unusual behavior. .P Section\ 4 describes the algorithms and computations on which this paper is based. First the basic algorithm of [OS] is briefly surveyed, and then the various modifications that have been made to it are described. (Some are very minor, while others, such as the use of band-limited function interpolation, are much more substantial.) A discussion of various possible modifications that might be utilized in the future is included (such as the replacement of the crucial rational function evaluation algorithm of [OS] by somewhat similar algorithms that have been proposed in the context of astrophysical and fluid dynamics simulations [GR1], or ways to obtain more rigorous results). There is also a large subsection on the accuracy and validity of the computations of this paper. .H 1 "Large sets of zeros: conjectures and statistics" .HU "2.0\0 Notation and definitions" .P The trivial zeros of the zeta function are $-2,^-4,^-6,^dd$. We will consider only the \f2nontrivial zeros\f1, which lie in the critical strip $0^<^roman Re (s) ^<^1$, and are customarily denoted by $rho$. Since for every nontrivial zero $rho$, $rho bar$ is also a zero, we will consider only zeros $rho$ with $roman Im ( rho ) ^>^0$. (There are no nontrivial zeros $rho$ with $roman Im ( rho ) ^=^0$.) We number these zeros $rho sub 1 ,^rho sub 2 ,^dd$ (counting each according to its multiplicity) so that $0^<^roman Im ( rho sub 1 ) ^<=^roman Im ( rho sub 2 ) ^<=^dd$. 
All the zeros that have been computed so far are simple and lie on the critical line, and so can be written as $rho sub n ^=^1 over 2 ^+^i gamma sub n$, $gamma sub n ^member^R sup +$, with $gamma sub 1 ^=^14.134725 dd$, $gamma sub 2 ^=^21.022039 dd$, $gamma sub 3 ^=^25.010857 dd$, etc. In many definitions throughout the paper we will be tacitly assuming that the RH holds, as otherwise those definitions might not make sense. .P Let $N(t)$ denote the number of zeros $rho$ with $0^<^roman Im ( rho ) ^<=^t$ (counted according to their multiplicity). Then it is known unconditionally [Tit2; Chapter\ 9.3] that .DS 2 .EQ (2.0.1) N(t) ~=~ t over {2 pi} ~ log ~ t over {2 pi e} ~+~ O ( log ^t) ~~~roman as ~~~t^->^inf ^. .EN .DE Since the zeros become denser as the height increases, and the average vertical spacing between zeros at height $t$ is asymptotic to $2 pi / ( log ( t/(2 pi )))$, we define the normalized spacing between consecutive zeros $1/2 + i gamma sub n$ and $1/2 + i gamma sub n+1$ to be .DS 2 .EQ (2.0.2) delta sub n ~=~ ( gamma sub n+1 ^-^ gamma sub n ) ~ {log ( gamma sub n / ( 2 pi ))} over {2 pi} ^. .EN .DE (Here we are assuming that both zeros satisfy the RH.) It then follows from (2.0.1) that the $delta sub n$ have mean value 1 in the sense that for positive integers $N$ and $M$, .DS 2 .EQ (2.0.3) sum from n=N+1 to N+M ~ delta sub n ~=~ M ~+~ O ( log (NM)) ^. .EN .DE .P For $t$ real and positive (as will be the case throughout the paper) we define .DS 2 .EQ (2.0.4) theta (t) ~=~ roman arg [ pi sup -it/2 ^GAMMA ( 1/4 ^+^ it/2 ) ] ^, .EN .DE where the argument is defined by continuous variation of $s$ in $pi sup -s/2 ^GAMMA ( s/2 )$, starting at $s=1/2$ and going up vertically. We also let .DS 2 .EQ (2.0.5) Z(t) ~=~ exp (i theta (t)) ^zeta ( 1/2 + it ) ^. .EN .DE Then it follows from the functional equation of the zeta function that $Z(t)$ is real, and sign changes of $Z(t)$ correspond to zeros of $zeta (s)$ on the critical line. 
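.P
As an illustration of (2.0.2), the Python sketch below computes $delta sub 1 ,^dd ,^delta sub 4$ from the first five zeros. Beyond the three ordinates quoted above, it assumes the standard published values of $gamma sub 4$ and $gamma sub 5$, all rounded to six decimals. At such small heights the average of the $delta sub n$ need not be close to 1, as the $O ( log (NM))$ term in (2.0.3) allows.

```python
import math

# Ordinates gamma_1, ..., gamma_5 of the first five zeros, rounded to 6 decimals.
GAMMAS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062]


def normalized_spacings(gammas):
    """delta_n = (gamma_{n+1} - gamma_n) * log(gamma_n / (2 pi)) / (2 pi), as in (2.0.2)."""
    return [
        (g_next - g) * math.log(g / (2 * math.pi)) / (2 * math.pi)
        for g, g_next in zip(gammas, gammas[1:])
    ]


deltas = normalized_spacings(GAMMAS)
for n, d in enumerate(deltas, start=1):
    print(f"delta_{n} = {d:.4f}")
print(f"mean = {sum(deltas) / len(deltas):.4f}")
```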
Almost all computations of the zeta function on the critical line actually calculate $Z(t)$ (cf.\ Section\ 4). .P The function $theta (t)$ is monotonically increasing for $t^>=^7$. For $n^>=^-1$, we define the \%$n$-th \f2Gram point\f1 $g sub n$ to be the unique solution $>^7$ to .DS 2 .EQ (2.0.6) theta ( g sub n ) ~=~ n pi ^. .EN .DE We have $g sub -1 ^=^ 9.666 dd$, $g sub 0 ^=^ 17.845 dd$, etc. Gram points are about as dense as the zeros of $zeta (s)$ (see Section\ 2.12 for a detailed discussion), but are much more regularly distributed. In graphs, a \f2Gram point scale\f1 is one on which Gram point $g sub n$ is labeled $n$ (or $n-M$ for some fixed $M$). For example, Fig.\ 2.0.1 shows $Z(t)$ near zero number $10 sup 20$. Figure\ 2.0.3 shows $Z(t)$ over a somewhat wider range. .P We let .DS 2 .EQ (2.0.7) S(t)~=~ pi sup -1 ^roman arg ^zeta (1/2 + it ) ^, .EN .DE where the argument is defined by continuous variation of $s$ in $zeta (s)$, starting at $s=2$, going up vertically to $s=2+it$, and then horizontally to $s=1/2 + it$. (This definition assumes that there are no zeros $rho$ with $roman Im ( rho ) ^=^t$.) The function $S(t)$ has jump discontinuities at the heights of the zeros. We have .DS 2 .EQ (2.0.8) N(t) ~=~ 1 ~+~ pi sup -1 theta (t) ~+~ S(t) ^, .EN .DE so that (2.0.1) is actually a consequence of the asymptotic expansion of $theta (t)$ (which follows from Stirling's formula [HMF]) .DS 2 .EQ (2.0.9) theta (t) ~=~ 1 over 2 ^t ^log (t/( 2 pi e )) ~-~ pi /8 ~+~ O ( t sup -1 ) ~~~roman as ~~~ t^->^ inf .EN .DE and the bound [Tit2; Theorem\ 9.4] .DS 2 .EQ (2.0.10) | S(t) | ~=~ O ( log ^t) ~~~roman as ~~~ t^->^ inf ^. .EN .DE Since $N(t)$ is integer-valued and $theta (t)$ is very smooth, (2.0.8) shows that $S(t)$ jumps at zeros and decreases at a very steady rate (about $(2 pi ) sup -1 ^log (t/(2 pi ))$ per unit of $t$) between zeros. Figure\ 2.0.2 shows $S(t)$ over the same range of values of $t$ as in Fig.\ 2.0.1, in the vicinity of zero number $10 sup 20$. 
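.P
The definitions (2.0.6) and (2.0.8) can be explored numerically at small heights. The Python sketch below is an illustration only, not the method used for the computations of this paper. It approximates $theta (t)$ by the leading terms of (2.0.9) plus the standard next two Stirling correction terms $1/(48t)$ and $7/(5760 t sup 3 )$, locates Gram points by bisection (which is safe because $theta (t)$ is increasing for $t^>=^7$), and evaluates $S(t)$ from (2.0.8) using only the first five zeros, so its $S(t)$ is meaningful only for $t^<^37.5$.

```python
import math

# Ordinates of the first five zeros, rounded to 6 decimals (standard values).
GAMMAS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062]


def theta(t):
    """Approximate theta(t): leading terms of (2.0.9) plus the standard
    corrections 1/(48 t) + 7/(5760 t^3); accurate to about 1e-7 for t >= 7."""
    return ((t / 2) * math.log(t / (2 * math.pi)) - t / 2 - math.pi / 8
            + 1 / (48 * t) + 7 / (5760 * t ** 3))


def gram_point(n, rel_tol=1e-13):
    """Solve theta(g_n) = n*pi for g_n > 7, as in (2.0.6), by bisection."""
    target = n * math.pi
    lo, hi = 7.0, 8.0
    while theta(hi) < target:      # widen the bracket until it holds the root
        lo, hi = hi, 2 * hi
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if theta(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def S(t):
    """S(t) = N(t) - 1 - theta(t)/pi, from (2.0.8). N(t) is counted from
    GAMMAS only, so the result is meaningful just for 7 <= t < 37.5."""
    n_of_t = sum(1 for g in GAMMAS if g <= t)
    return n_of_t - 1 - theta(t) / math.pi


for n in (-1, 0, 1):
    print(f"g_{n} = {gram_point(n):.6f}")
for t in (16.0, 20.0, 24.0):
    print(f"S({t}) = {S(t):+.4f}")
```

With these approximations one recovers $g sub -1 ^=^9.666 dd$ and $g sub 0 ^=^17.845 dd$, and $S(t)$ decreases between consecutive zeros, for example from $t=16$ to $t=20$, as (2.0.8) predicts.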
This range represents fairly typical behavior of $S(t)$ at that height. (For very unusual behavior of $S(t)$, see Fig.\ 3.1.3.) The function $S(t)$ is of crucial importance in understanding the distribution of zeros, and sections\ 2.4,\|2.12, and 2.13 are devoted largely to its properties. .P In comparing empirical distributions of various functions, such as $S(t)$ and $delta sub n$, to their conjectured distributions, we will rely fairly extensively on comparing the moments of their distributions. The method of moments has fallen into some disrepute in statistics because of its many faults, such as lack of robustness. (For example, a single outlier in the data can have a large effect, something we will see in our data.) However, there are some good reasons for using it. One is that it is easy to apply. A more substantial one is that for many of the statistics of the zeta function, such as those of $S(t)$, or of $Z(t)$, computation of moments is currently essentially the only known tool that can be used to obtain rigorous results. In such cases moments provide the most direct way of comparing empirical distributions to theoretical results. .P If a sequence of probability measures with distribution functions $F sub n (x)$ is such that for every $k^>=^0$, the \%$k$-th moment .DS 2 .EQ mu sub n (k) ~=~ int ~ x sup k ^dF sub n (x) .EN .DE converges to $mu (k)$ as $n^->^inf$, then there is a limiting measure with distribution $F(x)$ whose \%$k$-th moment is $mu (k)$. Furthermore, if the $mu (k)$ determine their measure uniquely, and this measure has distribution function $F(x)$, then the $F sub n (x)$ converge to $F(x)$ (in the weak star sense) [Bil; pp.\ 342-353]. The $mu (k)$ determine $F(x)$ uniquely if they do not grow too fast [Bil],\|[Fel; pp.\ 227-228], so that the normal distribution, for example, is characterized by its moments. 
On the other hand, the log-normal distribution (distribution of $exp ( eta )$, where $eta$ is normal) is not determined uniquely by its moments [Bil],\|[Fel]. .P The standard normal distribution has the density function .DS 2 .EQ (2.0.11) f(x) ~=~ ( 2 pi ) sup {- 1/2} ~e sup {-x sup 2 /2} ^, .EN .DE and so has mean 0 and variance 1. In many cases we will be dealing with quantities (such as $S(t)$) whose known asymptotic distributions are normal, but which have variances on the order of $log^log^N$ (for zeros near zero number $N$). Since $log^log^N$ grows very slowly, it is to be expected that the observed data will have somewhat different variances, as second order terms are likely to be substantial. (For $N=10 sup 20$, $log^log^N^=^3.82976 dd$, so even an additive constant of 1 in the estimate of the variance makes a huge difference.) On the other hand, it is reasonable to hope that the shape of the distribution will be close to the expected one. To carry out such a comparison, we will often use \f2scaled and translated empirical distributions\f1. If $x sub 1 ,^dd ,^x sub n$ are samples (of $delta sub m$, say, or other quantities) with mean $a$ and variance $v= sigma sup 2$ (so that $sigma$ is the \f2standard deviation\f1, that is, the rms value of $x sub j - a$), .DS 3 .EQ (2.0.12) a mark ~=~ 1 over n ~ sum from j=1 to n ~ x sub j ^, .EN .sp .EQ (2.0.13) v lineup ~=~ 1 over n ~ sum from j=1 to n ~ (x sub j -a ) sup 2 ^, .EN .DE then the scaled and translated values will be .DS 2 .EQ (2.0.14) x sub j sup star ~=~ (x sub j -a ) / sigma ^. .EN .DE The $x sub j sup star$ have mean 0 and variance 1. The tables will usually list the \%$k$-th moment of $x sub j sup star$ in the \%$k$-th entry, but there will be entries giving the ordinary mean $a$ and ordinary variance $v$ that will be marked $k=1 sup star$ and $k=2 sup star$, respectively. In a few cases where the mean $a$ is extremely small, we will use $x sub j sup star ^=^x sub j / sigma$. 
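.P
The scaling (2.0.12)-(2.0.14) and the moment entries described above can be sketched in Python as follows. The function name, the optional \f2center\f1 flag (which selects the variant $x sub j sup star ^=^x sub j / sigma$ when set to False), and the use of the keys "1*" and "2*" for the ordinary mean and variance are choices made for this illustration only. The last line also checks the value of $log^log^N$ quoted above for $N=10 sup 20$.

```python
import math


def scaled_moments(xs, max_k=6, center=True):
    """k-th moments (k = 1..max_k) of the scaled and translated values
    x_j* = (x_j - a)/sigma of (2.0.14); with center=False, x_j* = x_j/sigma.
    The ordinary mean a and variance v of (2.0.12)-(2.0.13) are returned
    under the keys "1*" and "2*"."""
    n = len(xs)
    a = sum(xs) / n                            # (2.0.12)
    v = sum((x - a) ** 2 for x in xs) / n      # (2.0.13): divisor n, not n - 1
    sigma = math.sqrt(v)
    shift = a if center else 0.0
    stars = [(x - shift) / sigma for x in xs]  # (2.0.14)
    moments = {k: sum(s ** k for s in stars) / n for k in range(1, max_k + 1)}
    moments["1*"], moments["2*"] = a, v
    return moments


example = scaled_moments([1.0, 2.0, 3.0, 4.0])
print(example[1], example[2], example["1*"], example["2*"])

# log log N for N = 10^20, the variance scale discussed in the text.
print(math.log(math.log(10 ** 20)))
```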
(These cases will be easy to distinguish because the scaled \%1-st moment will not be 0.) .P Throughout this paper, numbers which have ``$dd$'' at the end are truncated to the form that is shown, while those without ``$dd$'' are rounded, but the rounding is sometimes up and sometimes down. Thus, for example, $pi$ could be represented as 3.14159..., as 3.14159, or as 3.14160. The log function will always refer to the natural logarithm. References to maximal values of a function $f(x)$ will usually mean the values of $f(x)$ for which $|f(x)|$ is maximal. .P Constants such as $n sub 0 ,^n sub 1 ,^dd$ will generally be different in different sections, but will be the same within a section. .H 2 "Validity of the RH and correctness of the computational results" The main question about the validity of the computations described in this paper has to do with the size and cancellation of roundoff errors. This issue is discussed in detail in Section\ 4. Even if we assume that roundoff errors are small (as they seem to be), there remains some further lack of rigor. The set of zeros corresponding to $N=10 sup 12$, for example, is claimed to consist of exactly the zeros numbered $10 sup 12 - 6,^032$ to $10 sup 12 + 1,^586,^163$. Those 1,\|592,\|196 values are indeed zeros of the zeta function between Gram points of orders $10 sup 12 - 6,^034$ and $10 sup 12 + 1,^586,^162$, provided all the computational steps were correct. Given the degree of regularity in the locations of those zeros, a theorem of Turing (see [Br5; Theorem\ 3.2] for a modified and corrected version) allows us to conclude, for example, that the \%21-st through the 1,\|592,\|176-th zeros in our set are indeed zeros numbered $10 sup 12 - 6,^012$ through $10 sup 12 + 1,^586,^143$. However, this theorem does not exclude the possibility that, for example, the interval between Gram points $10 sup 12 - 6,^034$ and $10 sup 12 - 6,^014$ might contain some additional zeros. 
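.P
The index bookkeeping in the $N=10 sup 12$ example above can be checked mechanically. The Python sketch below assumes, as the text does, that the set consists of consecutive zeros starting with zero number $10 sup 12 - 6,^032$; the helper name \f2zero_number\f1 is introduced only for this illustration.

```python
N = 10 ** 12
first = N - 6_032        # number of the first zero in the N = 10^12 set
last = N + 1_586_163     # number of the last zero in the set


def zero_number(k):
    """Zero number of the k-th element (k = 1, 2, ...) of the set, assuming
    the set consists of consecutive zeros starting at `first`."""
    return first + (k - 1)


size = last - first + 1
print(f"{size:,}")                                        # 1,592,196
print(zero_number(21) - N, zero_number(1_592_176) - N)    # -6012 1586143
```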
Since such additional zeros would violate either Rosser's rule (see Section\ 2.13) or even the RH, they seem unlikely to exist, and in any event they would not affect most of the statistics very much, so they were assumed not to exist. .P There are some further cases of nonrigorous computations in this paper. For example, the conjectured distribution of the $delta sub n$ (see Section\ 2.2) is complicated, and (as was done in [Od2]) was computed using Van\ Buren's program [VB], with some modifications by S.\ P. Lloyd and this author. This program uses an involved combination of variational procedures and special function expansions, and no rigorous error analysis for it is known, although it appears to be very accurate (cf.\ [Od2]). .P Other examples of nonrigorous computation are the various piecewise linear approximations and other interpolation schemes used in the following sections. They are all thought to produce accurate results, but no proofs are available. .H 2 "Eigenvalues of random matrices and zeros" Over the last few decades, an extensive collection of results about eigenvalues of certain types of random matrices has been obtained by mathematical physicists. The aim of these investigations was to obtain insight into the distribution of energy levels in heavy nuclei, and recently their results have been applied to studies of energy levels in other kinds of many-particle systems. Some of the references for this field are [Be1, Be2, Be3, BG, BGS1, BGS2, BFFMPW, Meh, Por]. Not only are there many beautiful and mathematically rigorous results in this area, but there is also experimental evidence that these results do describe the behavior of physical systems [HPB]. (It should be mentioned that due to the difficulty of the experiments, the physical data, which was obtained with a lot of effort over the span of several decades, is very sparse and of poor quality compared to the data that can be obtained for the zeta function.) 
.P The eigenvalue results that will be of greatest interest to us are those of the Gaussian unitary ensemble (GUE), which together with the Gaussian orthogonal ensemble (GOE) and the Gaussian symplectic ensemble (GSE) has been studied very extensively. The GUE consists of $n times n$ complex Hermitian matrices of the form $A^=^ (a sub jk )$, where $a sub jj ^=^2 sup 1/2 ^sigma sub jj$, $a sub jk = sigma sub jk + i eta sub jk$ for $j^<^k$, and $a sub jk = a bar sub kj = sigma sub kj - i eta sub kj$ for $j^>^k$, where the $sigma sub jk$ and $eta sub jk$ are independent standard normal variables. (The GOE consists of real symmetric matrices defined similarly.) The eigenvalues of these matrices are real, and it is their asymptotic distribution, as $n^->^inf$, that is of interest. If we denote the eigenvalues by $lambda sub 1 ^<=^lambda sub 2 ^<=^...^<=^lambda sub n$, then one has the \f2Wigner semi-circle law\f1: if $M(x)$ denotes the expected number of eigenvalues $<=^x$, then for all fixed real $x$, .DS 2 .EQ (2.2.1) lim from {n^->^inf} ~ n sup -1 ^M( x sqrt n ) ~=~ left { matrix { ccol { 1 over {2 pi} ^int from -2 to x ^(4-u sup 2 ) sup 1/2 du ^,~~~ above 0 from "" to "" ^, ~~~ above 1 from "" to "" ^,~~~} lcol {|x|^<^ 2 ^, above x^<=^-2 ^, above x^>=^2 ^.} } .EN .DE This distribution law applies to much more general classes of matrices than those of the GUE and related ensembles. In the case of the GUE (and also of the GOE and GSE) a further step is possible in that one can obtain very precise information about the distribution of spacings between consecutive eigenvalues. The complete joint distribution of the eigenvalues is known, and one can derive many limit laws. To do that one normalizes the eigenvalues (basically by stretching the distance between consecutive eigenvalues $lambda ^<^ lambda sup prime$ by a factor of $(4n - lambda sup 2 ) sup 1/2 / ( 2 pi )$) so as to make the average nearest neighbor spacing equal to 1. 
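.P
The limit in (2.2.1) has a closed form for $|x|^<^2$: since $(u/2)(4-u sup 2 ) sup 1/2 ^+^ 2 ^roman arcsin (u/2)$ is an antiderivative of $(4-u sup 2 ) sup 1/2$, one obtains $1/2 ^+^ x(4-x sup 2 ) sup 1/2 / (4 pi ) ^+^ pi sup -1 ^roman arcsin (x/2)$. The Python sketch below compares this expression with a direct midpoint-rule evaluation of the integral. It only checks the formula and does not simulate GUE matrices.

```python
import math


def semicircle_cdf(x):
    """Limit in (2.2.1): (1/(2 pi)) * integral_{-2}^{x} sqrt(4 - u^2) du,
    in closed form via the antiderivative (u/2) sqrt(4 - u^2) + 2 arcsin(u/2)."""
    if x <= -2:
        return 0.0
    if x >= 2:
        return 1.0
    return 0.5 + x * math.sqrt(4 - x * x) / (4 * math.pi) + math.asin(x / 2) / math.pi


def semicircle_cdf_numeric(x, steps=20_000):
    """Same quantity by the composite midpoint rule (a cross-check)."""
    if x <= -2:
        return 0.0
    x = min(x, 2.0)
    h = (x + 2) / steps
    total = sum(math.sqrt(4 - (-2 + (i + 0.5) * h) ** 2) for i in range(steps))
    return total * h / (2 * math.pi)


for x in (-1.0, 0.0, 1.0, 1.9):
    print(x, semicircle_cdf(x), semicircle_cdf_numeric(x))
```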
With this normalization, the distribution of eigenvalues looks the same everywhere (in the limit as $n^->^inf$) and one can in principle determine any desired statistic of the eigenvalues. (Doing so in practice means evaluating a definite multidimensional integral, which is often hard, and gives rise to interesting problems.) For example, if we use $w$ to denote a normalized eigenvalue in the GUE, then one finds that for any fixed $0^<=^alpha ^<^beta ^<^ inf$, .DS 2 .EQ (2.2.2) sE ( | "{" ( w, w sup prime ) ^:~ w^<^w sup prime ^, ~~ w sup prime ^-^ w^member^[ alpha ,^beta ] "}" | ) ~wig~ int from alpha to beta ~ left ( 1^-^ left ( {sin ^pi u} over {pi u} right ) sup 2 right ) ^du .EN .DE as $n^->^inf$, where $sE (z)$ is the expectation of $z$. We say that $1^-^ ( ( sin ^pi u ) / ( pi u ) ) sup 2$ is the .I pair correlation function .R of the GUE. (The pair correlation functions of the GOE and the GSE are different.) Equation\ (2.2.2) shows, for example, that it is rare for GUE eigenvalues to be close together, since the integrand behaves like $pi sup 2 u sup 2 /3$ as $u^->^0$. If the $w$'s were obtained by choosing $n$ points independently and uniformly from the interval $[0,^n]$ and letting $n^->^inf$, the pair correlation function would be identically 1. The GUE pair correlation function in the range $0^<=^u^<=^3$ is drawn as the solid curve in Fig.\ 2.3.1, and is far from a constant. .P If $w$ is a normalized eigenvalue of the GUE, we let $w sup (k)$ denote the \%$k$-th smallest normalized eigenvalue of those that are $>^w$. Then it is known that the \%$k$-th nearest spacings $w sup (k) - w$ satisfy a limiting distribution law: for all $0^<=^alpha ^<^beta ^<^ inf$, .DS 2 .EQ (2.2.3) roman Prob ( w sup (k) ^-^w ^member^ [ alpha ,^beta ] ) ~ wig~ int from alpha to beta ~ p( k-1,^u) du .EN .DE as $n^->^inf$. The probability densities $p(k,^u)$ (referred to as $p sub 2 (k;^u)$ in many publications, such as [CM2,\|Meh,\|MdC], where the subscript 2 denotes the GUE) are complicated functions defined in terms of linear prolate spheroidal functions. 
For methods of computing them, see [MdC,\|Od2]. Graphs of $p(0,^u)$ and $p(1,^u)$ are given by the solid lines in figures\ 2.3.4 and 2.3.6, respectively. Those graphs show the ``rigidity'' of the GUE: the eigenvalues repel each other and most of the time are close to the expected distance from their neighbors. For all $u^>=^0$, .DS 2 .EQ (2.2.4) 1 ~-~ left ( {sin ^pi u} over {pi u} right ) sup 2 ~=~ sum from k=0 to inf ~ p(k,^u ) ^. .EN .DE We note for future reference that the $p(k,^u)$ have the following Taylor series expansions around 0 [Meh,\|MdC]: .DS 3 .EQ (2.2.5) p(0,^u) mark ~=~ {pi sup 2} over 3 ^u sup 2 ~-~ {2 pi sup 4} over 45 ^u sup 4 ~+~ {pi sup 6} over 315 ^u sup 6 ^+^ ... ^, .EN .sp .EQ (2.2.6) p(1,^u) lineup ~=~ {pi sup 6} over 4050 ^u sup 7 ~+~ ... ^, .EN .sp .EQ (2.2.7) p(2,^u) lineup ~=~ {pi sup 12} over 5358150000 ^u sup 14 ~+~... ^. .EN .DE .P The normalized eigenvalues in the GUE have (in the limit as $n^->^inf$) a stationary distribution. This means that clusters of eigenvalues have the same distribution no matter where in the spectrum they are located. However, this distribution is not Markovian: the distribution of an eigenvalue depends not just on the preceding eigenvalue, but on all previous ones as well. .P The basic results about the distribution of GUE eigenvalues are completely rigorous. However, the theory still has gaps. One is that the results are obtained by averaging over the full ensemble of GUE matrices. It is conjectured that if one considers a large random GUE matrix, the distribution of its eigenvalues will, with high probability, be close to that of the entire ensemble. Although numerical calculations support this conjecture, there is no proof of it. Also, it is thought that the entries of the matrix do not have to be of exactly the form specified above for the GUE results to hold. 
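.P
The expansions (2.2.4)-(2.2.6) fit together: assuming, as (2.2.6) and (2.2.7) indicate, that $p(k,^u)$ vanishes to order at least 7 at $u=0$ for $k^>=^1$, the series (2.2.5) must agree with the Taylor series of $1^-^ ( ( sin ^pi u ) / ( pi u ) ) sup 2$ through the $u sup 6$ term, so their difference is of order $u sup 8$. The Python sketch below checks this numerically. It also evaluates the right-hand side of (2.2.2) on $[0,^1/2]$, to be compared with the value $1/2$ that an identically-1 pair correlation function (independent uniform points) would give.

```python
import math


def pair_correlation(u):
    """GUE pair correlation function 1 - (sin(pi u)/(pi u))^2, cf. (2.2.2)."""
    if u == 0:
        return 0.0
    s = math.sin(math.pi * u) / (math.pi * u)
    return 1.0 - s * s


def p0_series(u):
    """Series (2.2.5) for p(0, u), truncated after the u^6 term."""
    pi = math.pi
    return pi ** 2 / 3 * u ** 2 - 2 * pi ** 4 / 45 * u ** 4 + pi ** 6 / 315 * u ** 6


def gue_pair_integral(alpha, beta, steps=10_000):
    """Right-hand side of (2.2.2), by the composite midpoint rule."""
    h = (beta - alpha) / steps
    return h * sum(pair_correlation(alpha + (i + 0.5) * h) for i in range(steps))


for u in (0.2, 0.1, 0.05):
    # The difference should shrink roughly like u^8 as u decreases.
    print(u, pair_correlation(u) - p0_series(u))
print(gue_pair_integral(0.0, 0.5))   # compare with 0.5 for uncorrelated points
```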
.P The main goal of this paper (and of the preceding paper [Od2]) is to test the conjecture, which will be referred to as the \f2GUE hypothesis\f1, the \f2GUE theory\f1, or simply the \f2GUE\f1, that the zeros of the zeta function behave like eigenvalues of the GUE. More precisely, it is conjectured that the $delta sub n$ behave asymptotically like $w sup (1) ^-^w$ in the GUE, so that for any $0^<=^alpha ^<^ beta ^<^ inf$, .DS 2 .EQ (2.2.8) M sup -1 | "{" n^:~ N+1^<=^n^<=^N+M ,~~delta sub n ^member^ [ alpha ,^beta ] "}" | ~wig~ int from alpha to beta ^p(0,^u) du .EN .DE as $M,^N^->^inf$ with $M$ not too small compared with $N$. Similarly, it is conjectured that .DS 2 .EQ (2.2.9) M sup -1 | "{"n^:~ N+1^<=^n^<=^N+M ,~~delta sub n + delta sub n+1 ^member^ [ alpha ,^beta ] "}" | ~wig~ int from alpha to beta ~ p(1,^u) du ^. .EN .DE More generally, the same reasoning leads one to expect that for any $k$, the empirical distribution function of $delta sub n ,^delta sub n+1 ,^dd ,^delta sub n+k$ for $N+1^<=^n^<=^N+M$ approaches the corresponding distribution for the stationary process of normalized GUE eigenvalues. .P The GUE hypothesis is of interest because if it is true, it might be interpreted as providing some support for the Hilbert-Po\*'lya conjecture [Be2,\|Be3,\|Mon1,\|Od3], which predicts that the RH is true because the zeros of the zeta function correspond to eigenvalues of a positive linear operator. The argument is that if such an operator exists, its eigenvalues might be similar to those of a random operator (especially if, as is conjectured for the GUE, most random operators have very similar eigenvalue distributions), and a random linear operator ought to be the limit of a sequence of random matrices. .P If the GUE hypothesis were true, that would also be of interest in physics, as the zeta function could then be used as a model of quantum chaos [Be2,\|Be3]. .P The main theoretical support and inspiration for the GUE hypothesis comes from H. 
Montgomery's work on the pair correlation function of the zeros of the zeta function. Under the assumption of the RH, Montgomery showed [Mon1,\|Mon2] that if we define .DS 2 .EQ (2.2.10) F( alpha ,^T) ~=~ 2 pi (T^log^T) sup -1 ~sum from {pile {0^<^gamma^<=^T above 0^<^gamma sup prime ^<=^T}} ~ T sup {i alpha ( gamma - gamma sup prime )} ~ 4 over {4+( gamma - gamma sup prime ) sup 2} .EN .DE for $alpha$ and $T$ real, $T^>=^2$, then .DS 2 .EQ (2.2.11) F( alpha ,^T) ~~=~ (1+ o(1)) T sup {- 2 alpha} ^log ^T ^+^ alpha ^+^o(1) ~~~roman as~~~ T^->^inf ^, .EN .DE uniformly for $0^<=^alpha ^<=^1$. Montgomery also observed that if the primes are distributed sufficiently uniformly in arithmetic progressions, then .DS 2 .EQ (2.2.12) F( alpha ,^T ) ~=~1^+^ o(1) ~~~~roman as ~~~~T^->^inf .EN .DE uniformly for $alpha ^member^[a,^b]$, where $1^<=^a^<^b^<^inf$ are any constants. If the conjecture (2.2.12) were true, then one would find that for any $0^<^alpha ^<^beta ^<^inf$, .DS 3 .EQ N sup -1 ^| "{" (n,^k) ^:~ mark 1^<=^n^<=^N ,~ k^>=^ 0 ,~~ delta sub n ^+^ delta sub n+1 ~+~...~+~ delta sub n+k ^member^[ alpha ,^beta ] "}" | .EN .sp .5 .EQ (2.2.13) ~~ .EN .sp .5 .EQ lineup ~wig~ int from alpha to beta ~ left ( 1 ^-^ left ( {sin ^pi u} over {pi u} right ) sup 2 right ) ^du .EN .DE as $N^->^inf$. The relation (2.2.13) is known as the Montgomery pair correlation conjecture. It says that the pair correlation of the zeros of the zeta function is the same as that of the GUE. Since the pair correlation functions of the GOE and GSE are different (indeed, they are even inconsistent with (2.2.11)), one is led to expect that the zeros might behave like eigenvalues of the GUE rather than of the GOE or GSE; this is the reason that only the GUE distributions were presented above. (One possible implication of this observation is that the hypothetical Hilbert-Po\*'lya operator is likely to be complex.) 
.P Montgomery's conditional result (2.2.11) and the conjectures (2.2.12) and (2.2.13) are the main theoretical evidence we have in favor of the GUE hypothesis, and the two conjectures depend on far-reaching assumptions about pseudorandom behavior of primes. Some further evidence in favor of the GUE hypothesis was provided by Ozluk [Oz1], who showed that if one considers a function similar to $F( alpha ,^T)$, but where one sums over zeros of many Dirichlet $L$-functions, then under the assumption of the Generalized Riemann Hypothesis for these $L$-functions, the analog of Montgomery's conjecture (2.2.12) is true for $1^<=^alpha ^<=^2$. Some additional, slight support for the GUE hypothesis is provided by new results of Ozluk [Oz2] on zeros of Dirichlet $L$-functions close to the real axis. .P Extensive numerical evidence in favor of the GUE hypothesis was presented in [Od2]. It was based largely on computed values of $gamma sub n$, with $1^<=^n^<=^10 sup 5$ and $10 sup 12 +1 ^<=^n^<=^10 sup 12 + 10 sup 5$. With some minor exceptions (such as the slight excess of very small $delta sub n$ that was mentioned in the Introduction) this evidence was in excellent agreement with the GUE hypothesis, and the degree of agreement improved dramatically as one went from the first $10 sup 5$ zeros to those near zero number $10 sup 12$. Some numerical evidence for the pair correlation conjecture for Dirichlet $L$-functions has been obtained since then by Hejhal [Hej5]. .P Various theoretical results and conjectures related to the GUE theory and the pair correlation conjecture have been obtained in recent years. Some of the references are [Be2, Be3, Be4, Fu8, Gal2, Gal3, Gal4, GM, Go1, Go2, Go3, GG, GHB, HB1, Mue2]. .H 2 "General distribution of gaps between zeros" Figure\ 2.3.1 shows how well the pair correlation conjecture is satisfied. 
The scatterplot is based on approximately $8 times 10 sup 6$ zeros near zero number $10 sup 20$. Let .DS 3 .EQ n sub 1 mark ~=~ 10 sup 20 ~-~ 15,^409,^240, .EN .EQ n sub 2 lineup ~=~ 10 sup 20 ~-~ 13,^366,^460, .EN .EQ n sub 3 lineup ~=~ 10 sup 20 ~-~ 10,^302,^282, .EN .EQ n sub 4 lineup ~=~ 10 sup 20 ~-~ 6,^216,^711, .EN .EQ n sub 5 lineup ~=~ 10 sup 20 ~-~ 42,^778, .EN .EQ n sub 6 lineup ~=~ 10 sup 20 ~+~ 15,^316,^087, .EN .EQ n sub 7 lineup ~=~ 10 sup 20 ~+~ 46,^073,^204, .EN .EQ n sub 8 lineup ~=~ 10 sup 20 ~+~ 47,^098,^588, .EN .DE and .DS 2 .EQ V ~=~ "{" n^:~ n sub i ^<=^n^<^n sub i ^+^10 sup 6 ~~~roman {"for some"} ~~i, ~~~ 1^<=^i^<=^8 "}" ^. .EN .DE Then for each interval $I^=^[ alpha ,^beta )$ with $alpha = k/20$, $beta = alpha + 1/20$, $0^<=^k^<^60$, a star is placed at the point $x = ( alpha + beta ) /2$, $y = a sub {alpha ,^beta}$, where .DS 2 .EQ (2.3.1) a sub {alpha ,^beta} ~=~ 20 over {8 times 10 sup 6} ^left | ^"{"(n,^k) ^:~ n^member^V ,~~k^>=^0 ,^~~ delta sub n ^+^...^+^delta sub n+k ^member^ [ alpha ,^beta ) "}" right | ^. .EN .DE The solid line is the GUE prediction $y=1-(( sin ^pi x ) / ( pi x )) sup 2$. As can be seen, the agreement between the conjectured and observed values is excellent. .P Figure\ 2.3.2 presents similar data, but this time based on just $10 sup 6$ values of $n$; $n sub 9 ^<=^n^<^n sub 9 ^+^10 sup 6$, $n sub 9 = 10 sup 12 - 6,^032$. A comparison of these two graphs with figures\ 1 and 2 of [Od2] is instructive. Those figures show similar graphs, but based in each case on $10 sup 5$ zeros starting with zeros number 1 and $10 sup 12 +1$. The scatterplot of Fig.\ 2.3.2 is much smoother than that of Fig.\ 2 of [Od2], because the former is based on $10 sup 6$ instead of $10 sup 5$ samples, and so the sampling error is smaller. That same reason explains why the scatterplot of Fig.\ 2.3.1 looks smoother than that of Fig.\ 2.3.2. 
Even if we make allowances for the different sample sizes, though, it is clear that the agreement between empirical and predicted values improves dramatically from $N=1$ to $N= 10 sup 12$, and improves some more between $N=10 sup 12$ and $N=10 sup 20$. In all cases, the empirical data has more pronounced peaks and troughs than expected, but this effect decreases as the height increases. .P Some of the pair correlation function oscillations can be seen even for normalized spacings that exceed 3. Figure\ 2.3.3 shows a graph based, just like Fig.\ 2.3.1, on $8 times 10 sup 6$ zeros near zero number $10 sup 20$. In this case, though, the scatterplot was smoothed slightly by applying the lowess function of [BC] (an implementation of Cleveland's robust locally weighted regression [Cle]). The reason for this smoothing is that even with $8 times 10 sup 6$ zeros, each of the $a sub {alpha ,^beta}$ defined in (2.3.1) corresponds to about $4 times 10 sup 5$ counts $(n,^k)$. Therefore we can expect random sampling errors on the order of $(4 times 10 sup 5 ) sup 1/2 ^approx^ 630$ in each count, which, after multiplication by the factor $20 / (8 times 10 sup 6 )$ of (2.3.1), gives a variation of about $1.6 times 10 sup -3$ in the value of $a sub {alpha ,^beta}$. Given the small variation in the GUE prediction $y=1- (( sin ^pi x ) /( pi x )) sup 2$ over the range $3^<=^x^<=^5$, this random sampling error produces a rather confusing picture if the data is not smoothed. (Another, slightly less effective, way to produce a better picture is to use sampling intervals longer than 1/20. The resulting picture is very similar to that of Fig.\ 2.3.3.) .P Figure\ 2.3.3 shows that the empirical pair correlation function, even for $N=10 sup 20$, has peaks and troughs that are more pronounced than those of the conjectured distribution, at least in the range $3^<^x^<^5$. This is also true in the range $5^<^x^<^10$.
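.P The binned quantities $a sub {alpha ,^beta}$ of (2.3.1) and the sampling-error estimate above can be sketched in a few lines of Python. This is a minimal illustration that assumes the normalized spacings $delta sub n$ are already available as a list. The function names are hypothetical. Partial sums that would run past the end of the list are simply dropped, which slightly depletes the bins for the last few starting points of a short list.

```python
import math
from typing import Sequence


def pair_correlation_histogram(deltas: Sequence[float], width: float = 0.05,
                               xmax: float = 3.0) -> list[float]:
    """Normalized counts a_{alpha,beta} as in (2.3.1), for bins [k*width, (k+1)*width).

    Every index n of deltas is used as a starting point; the partial sums
    delta_n + ... + delta_{n+k} are accumulated until they reach xmax.
    """
    nbins = round(xmax / width)
    counts = [0] * nbins
    for n in range(len(deltas)):
        total = 0.0
        for m in range(n, len(deltas)):
            total += deltas[m]
            if total >= xmax:
                break
            counts[min(int(total / width), nbins - 1)] += 1
    # Dividing by width * (number of starting points) turns counts into a density;
    # this is the role of the factor 20 / (8 * 10**6) in (2.3.1).
    return [c / (width * len(deltas)) for c in counts]


def sampling_error(starts: int, width: float, density: float = 1.0) -> float:
    """Standard deviation of one bin value, treating its count as roughly Poisson."""
    expected_count = density * starts * width
    return math.sqrt(expected_count) / (starts * width)
```

With the parameters of Fig.\ 2.3.1, sampling_error($8 times 10 sup 6$, 1/20) returns about $1.58 times 10 sup -3$. This agrees with the estimate given above.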
.P Figures\ 2.3.4 and 2.3.5 show the distribution of the normalized spacings $delta sub n$ for $N=10 sup 12$ and $N=10 sup 20$, based on the 1,\|592,\|196 and 78,\|893,\|234 zeros, respectively, that have been computed. Thus, for example, in Fig.\ 2.3.4 a star is plotted at $x= ( alpha + beta ) /2$, $y= b sub {alpha ,^beta }$ for $alpha = k/20$, $beta = alpha + 1/20$, $0^<=^k^<=^59$, where .DS 2 .EQ (2.3.2) b sub {alpha ,^beta} ~=~ 20 over {1592195} ^left | ^ "{"n^:~ 10 sup 12 ^-^ 6032 ^<=^n^<=^10 sup 12 ^+^1586162,~~ delta sub n ^member^[ alpha ,^beta ) "}" right | ^. .EN .DE The solid lines are the GUE predictions, $y= p(0,^x)$. Similarly, figures\ 2.3.6 and 2.3.7 show the distribution of $delta sub n + delta sub n+1$. (Similar graphs based on the first $10 sup 5$ zeros are contained in [Od2].) .P The graphs show very good agreement between conjecture and numerical data, and, as was to be expected, the degree of agreement increases dramatically as one goes from $N=1$ to $N= 10 sup 12$, and then improves a bit more as one goes to $N=10 sup 20$. Moreover, the fact that the disagreement is greater for $delta sub n + delta sub n+1$ than for $delta sub n$ is to be expected, given that $S(t)$ is very small. (See Section\ 2.4 for a discussion of this.) .P A quantitative measure of the agreement between observed and conjectured distributions is shown in tables\ 2.3.1 through 2.3.3, which display moments of distributions. For each set of $M$ zeros, $K^<^n^<=^K+M$ ($M^=^1,^592,^196$ for $N=10 sup 12$, $M^=^78,^893,^234$ for $N=10 sup 20$, etc.), Table\ 2.3.1 displays .DS 2 .EQ (2.3.3) (M-1) sup -1 ~ sum from {n=K+1} to {K+M-1} ~( delta sub n -1 ) sup k ^, .EN .DE while Table\ 2.3.2 shows .DS 2 .EQ (2.3.4) (M-2) sup -1 ~ sum from {n=K+1} to {K+M-2} ~ ( delta sub n + delta sub n+1 -2) sup k ^, .EN .DE in each case for $2^<=^k^<=^10$. (The values for $N=1$ are taken from [Od2].) Table\ 2.3.3 shows moments of $log ^delta sub n$, $delta sub n sup -1$, and $delta sub n sup -2$.
In all cases the values predicted by the GUE are also shown. .P Tables\ 2.3.1 to 2.3.3 do show quite satisfactory agreement between observed values and conjectured ones, with the degree of agreement increasing as the height of the zeros increases. (The slightly anomalous value for the moment of $delta sub n sup -2$ for $N=10 sup 18$ is due to one very small $delta sub n$ that is very unusual and will be discussed in sections\ 2.5 and 2.7.) .P The Kolmogorov test [KS; Section\ 30.49] yields a method for measuring the agreement between the observed distribution of the $delta sub n$ and the GUE predictions. If samples $x sub 1 ,^dd ,^x sub n$ are drawn from a distribution with a continuous cumulative distribution function $F(z)$, let $F sub e (z)$ denote the sample distribution function: .DS 2 .EQ F sub e (z) ~=~ n sup -1 | "{"k^:~ 1^<=^k^<=^n ,~~ x sub k ^<=^z "}" | ^. .EN .DE The Kolmogorov statistic is then .DS 2 .EQ (2.3.5) D ~=~ roman {"sup"} from z ~ |F sub e (z) ^-^F(z) | ^. .EN .DE If the $x sub i$ are drawn from the distribution corresponding to $F(z)$, then [KS; Eq.\ 30.132] .DS 2 .EQ (2.3.6) lim from {n^->^inf} ~ roman Prob (D^>^ un sup {- 1/2} ) ~=~ g(u) ^, .EN .DE where .DS 2 .EQ (2.3.7) g(u) ~=~ 2 ~ sum from r=1 to inf ~ (-1) sup r-1 ~exp (-2r sup 2 u sup 2 ) ^. .EN .DE Table\ 2.3.4 gives the Kolmogorov statistic $D$ for $delta sub n$ and $delta sub n + delta sub n+1$ for several blocks of $10 sup 6$ consecutive values of $n$. The set denoted by $N=10 sup 12$ corresponds to $n sub 9^<=^n^<^n sub 9 ^+^10 sup 6$; the ones denoted by $N=10 sup 20 (a)$, $N=10 sup 20 (b)$, and $N= 10 sup 20 (c)$ start at $n=n sub 6$, $n=n sub 8$ and $n=n sub 5$, respectively, where the $n sub i$ were defined at the beginning of this section. The ``$N=10 sup 12$ vs. GUE'' entry, for example, gives the Kolmogorov statistic of the $N=10 sup 12$ set when it is compared to the GUE distribution. 
For each value of $D$, the ``prob.'' column gives an estimate of the probability that a value of $D$ at least this large would arise if the $delta sub n$ ($delta sub n + delta sub n+1$, respectively) were drawn independently for each $n$ from the GUE distribution. This estimate is obtained by evaluating $g( D^times ^1000)$, since each block contains $10 sup 6$ values and $(10 sup 6 ) sup 1/2 ^=^1000$. The ``$N=10 sup 20 (a)$ vs. $N=10 sup 20 (b)$'' row of the table was obtained by constructing a continuous distribution from the $N=10 sup 20 (b)$ data and computing the Kolmogorov statistic for the discrete $N=10 sup 20 (a)$ data against this continuous distribution. .P What is apparent from Table\ 2.3.4 is that as the height increases, the empirical distributions of $delta sub n$ and $delta sub n + delta sub n+1$ do approach that of the GUE. In fact, when one computes the $D$ statistic for the $delta sub n$ in the 10 blocks of $10 sup 5$ consecutive zeros that are contained in the $N=10 sup 20 (b)$ set, one obtains values ranging between 0.002 and 0.0031, which correspond to probabilities of between 0.83 and 0.3 of occurring if the $delta sub n$ were drawn from the GUE distribution. Thus for sets of $10 sup 5$ zeros around zero number $10 sup 20$, it is essentially impossible to distinguish the empirical distribution of the $delta sub n$ from the expected one. (For $delta sub n + delta sub n+1$, the corresponding $D$ values range between 0.0035 and 0.00555, which give probabilities of 0.17 and 0.004, respectively, so the fit here is slightly worse.) .P The comparison of the three different sets of $10 sup 6$ zeros near zero number $10 sup 20$ to each other is quite revealing. The Kolmogorov statistics $D$ are quite small (especially for $delta sub n$), and indicate that all three sets come from essentially the same distribution.
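.P The statistic (2.3.5) and the limiting tail probability (2.3.7) are simple to compute. The Python sketch below is illustrative. The function names and the truncation of (2.3.7) after 100 terms are assumptions, not details of the original computation. Because $F$ is continuous and $F sub e$ is a step function, the supremum in (2.3.5) is approached at the sample points. It therefore suffices to compare $F$ with the values of $F sub e$ just before and just after each of them.

```python
import math
from typing import Callable, Sequence


def kolmogorov_D(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """D = sup_z |F_e(z) - F(z)| as in (2.3.5), for a continuous cdf F."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare F(x) with the step function just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def kolmogorov_g(u: float, terms: int = 100) -> float:
    """Limiting tail probability g(u) of (2.3.7), truncated after `terms` terms."""
    return 2.0 * sum((-1) ** (r - 1) * math.exp(-2.0 * r * r * u * u)
                     for r in range(1, terms + 1))
```

For a block of $10 sup 5$ zeros, evaluating kolmogorov_g at $0.0035 times (10 sup 5 ) sup 1/2$ gives about 0.17. This matches the probability quoted above for $delta sub n + delta sub n+1$.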
Thus what seems to be happening is that at each height, when we examine large sets of zeros, the $delta sub n$ and $delta sub n + delta sub n+1$ behave as if they were drawn independently from some distributions that depend on $t$, change relatively slowly as $t$ changes, and tend to the GUE distributions as $t^->^inf$. .H 2 "Values of $bold S(t)$" The upper bound (2.0.10) for $S(t)$ is the best that is known unconditionally. The Lindelo\*:f Hypothesis (see Section\ 2.8) implies that $|S(t)| ^=^o( log ^t)$ as $t^->^inf$, while the RH implies [Tit2] that .DS 2 .EQ (2.4.1) |S(t)| ~=~ O left ( {log^t} over {log^log^t} right ) ~~~roman as ~~~ t^->^inf ^. .EN .DE The true rate of growth is thought to be much smaller. The best lower bound that has been proved under the RH is due to Montgomery [Mon3], and gives .DS 2 .EQ (2.4.2) S(t) ~=~ OMEGA sub +- left ( left ( {log^t} over {log^log^t} right ) sup 1/2 right ) ~~~roman as ~~~ t^->^inf ^. .EN .DE (The best unconditional lower bound, due to Tsang [Ts1,\|Ts2], replaces the square root in (2.4.2) by a cube root.) Montgomery [Mon3] has conjectured that the quantity on the right side of (2.4.2) represents the correct rate of growth of $S(t)$, and Joyner [Joy2] has presented a heuristic argument supporting this conjecture. As we will see in Section\ 2.5, the GUE suggests that $|S(t)|$ might occasionally get as large as $( log ^t) sup 1/2$, which would contradict the Montgomery conjecture. In any case, it is thought likely that .DS 2 .EQ (2.4.3) |S(t)| ~<=~ ( log ^t) sup {1/2 ^+^o(1)} ~~~roman as ~~~ t^->^inf ^. .EN .DE Some lower bounds for $S(t+h) ^-^S(t)$ are also known; see [Ts1,\|Ts2], for example. .P Not only is $S(t)$ small, but its oscillations tend to cancel out. If we define .DS 2 .EQ (2.4.4) S sub 1 (t) ~=~ int from {t sub 0} to t ~ S(u) du ^, .EN .DE then $|S sub 1 (t) | ^=^ O( log ^t)$ unconditionally, and $|S sub 1 (t) |^=^ O( ( log ^t) ( log^log^t) sup -2 )$ on the RH [Tit2].
The true maximal order of magnitude of $|S sub 1 (t) |$ is probably again around $( log^t) sup 1/2$. (See [Ts1,\|Ts2] for lower bounds. The estimate $|S sub 1 (t) |^=^ o( log ^t)$ is equivalent to the Lindelo\*:f Hypothesis; see the Notes to Chapter\ 13 of [Tit2].) Furthermore, if one chooses $t sub 0$ appropriately, then one obtains .DS 2 .EQ (2.4.5) int from 0 to t ~ S sub 1 (u) du ~=~ O( log ^t) ~~~roman as ~~~ t^->^inf ^. .EN .DE (The same property applies to further iterations of this process.) In addition, the limit .DS 2 .EQ lim from {T^->^inf} ~ T sup -1 ~ int from 0 to T ~ S sub 1 (t) sup 2 dt ~=~ c .EN .DE exists and is a positive constant $c$ (Theorem\ 14.19 of [Tit2]). .P Selberg [Sel2] proved, under the assumption of the RH, that for every fixed positive integer $k$, .DS 2 .EQ (2.4.6) int from 0 to T ~ S(t) sup 2k dt ~=~ {(2k)!} over {k! (2 pi ) sup 2k} ~T ( log^log^T) sup k ( 1^+^O( ( log ^log^T) sup -1 )) .EN .DE as $T^->^inf$. Later [Sel3] he proved similar estimates unconditionally, with $( log ^log ^T) sup -1$ in the remainder term replaced by $( log ^log ^T) sup -1/2$. Although it was apparently not noticed right away, these results imply (unconditionally) that $S(t)$ is asymptotically normally distributed with mean 0 and variance $(2 pi sup 2 ) sup -1 ^log ^log ^t$ (the case $k=1$ of (2.4.6)), so that for $alpha ^<^ beta$, .DS 2 .EQ (2.4.7) lim from {T^->^inf} ~ T sup -1 ^left | ^ left { t^:~ 0^<=^t^<=^T ,~ {S(t)} over {( (2 pi sup 2 ) sup -1 ^log ^log ^T) sup 1/2} ^member^( alpha ,^beta ) right } right | ~=~ (2 pi ) sup -1/2 ^int from alpha to beta ^e sup {-x sup 2 /2} dx ^.~~~"\0\0\0" .EN .DE For further results on moments and distributions of $S sub 1 (t)$, $S(t+h)^-^S(t)$, and related functions, see [Fu1, Fu2, Fu3, Fu4, Fu8, GM, Gh1, Gh2, Go2, Joy1, Ts1, Ts2].
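.P The normalization in (2.4.7) can be read off from (2.4.6). A normal random variable with mean 0 and variance $sigma sup 2$ has $2k$-th moment $(2k)! ^sigma sup 2k / (2 sup k ^k!)$. This agrees with the coefficient $(2k)! / (k! ^(2 pi ) sup 2k )$ of $T ( log ^log ^T) sup k$ in (2.4.6) for every $k$ precisely when $sigma sup 2 ^=^ (2 pi sup 2 ) sup -1 ^log ^log ^T$. The short Python check below verifies this identity for the first few $k$. It is a sanity check of the algebra only, with illustrative function names.

```python
import math


def selberg_coefficient(k: int) -> float:
    """Coefficient (2k)! / (k! (2 pi)^(2k)) of T (log log T)^k in (2.4.6)."""
    return math.factorial(2 * k) / (math.factorial(k) * (2 * math.pi) ** (2 * k))


def normal_moment(k: int, variance: float) -> float:
    """E[X^(2k)] = (2k)! / (2^k k!) * variance^k for a mean-zero normal X."""
    return math.factorial(2 * k) / (2**k * math.factorial(k)) * variance**k


# Per unit of log log T, the variance that reproduces (2.4.6) is 1 / (2 pi^2).
VARIANCE_PER_LOGLOG = 1 / (2 * math.pi**2)
for k in range(1, 8):
    assert math.isclose(selberg_coefficient(k), normal_moment(k, VARIANCE_PER_LOGLOG))
```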
Goldston [Go2] has improved the estimate (2.4.6) for $k=1$ by showing, under the assumption of the RH, that .DS 2 .EQ (2.4.8) int from 0 to T ~S(t) sup 2 dt ~=~ T over {2 pi sup 2} ~ log ^log ^T ~+~ T over {2 pi sup 2} left ( c sub 1 ^+^ int from 1 to inf ^F( alpha ,^T) alpha sup -2 d alpha right ) ~+~ o(T) .EN .DE as $T^->^inf$, where $F( alpha ,^T)$ is defined by (2.2.10), and $c sub 1$ is a constant, .DS 2 .EQ (2.4.9) c sub 1 ~=~ c sub 0 ~+~ sum from m=2 to inf ~ sum from p ~ left ( -^ 1 over m ~+~ 1 over {m sup 2} right ) ^1 over {p sup m} ^, .EN .DE where $c sub 0 ^=^ 0.577^"..."$ is Euler's constant. (The sign of the $m sup -1$ term is wrong in [Go2].) If Montgomery's pair correlation conjecture (2.2.12) holds, then $int from 1 to inf ^F( alpha ,^T) alpha sup -2 ^d alpha$ tends to 1 as $T^->^inf$ (since $int from 1 to inf ^alpha sup -2 ^d alpha ^=^1$), but if this conjecture were to fail, it is conceivable that the second order term in the asymptotic expansion of $int from 0 to T ^S(t) sup 2 dt$ might oscillate. .P Table\ 2.4.1 presents data on the moments of $S(t)$. Statistics were collected on two intervals of the form $( gamma sub n ,^gamma sub {n+10 sup 6} )$, where $n^=^n sub 9 ^=^ 10 sup 12 - 6,^032$ for the $N=10 sup 12$ data, and $n^=^10 sup 20 - 48,^778$ for the $N=10 sup 20$ data. The average values of $S(t)$ and $S(t) sup 2$ for these sets are given in the $k=1 sup star$ and $k=2 sup star$ rows. To obtain a good comparison with the asymptotic normal distribution, the other moments were scaled, so that if we let $sigma sup 2$ be the mean value of $S(t) sup 2$, then the $k=1,^2,^dd ,^8$ entries denote the average values of $( sigma sup -1 S(t)) sup k$, and the $k=|1| ,^|3|$, and $|5|$ entries the average values of $| sigma sup -1 S(t) | sup k$. Finally, the last column gives the corresponding values for the standard normal distribution. As we can see, the agreement between empirical values and asymptotic ones is reasonably good, and is somewhat better for $N=10 sup 20$ than for $N=10 sup 12$.
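.P The reference values in the last column of Table\ 2.4.1 are moments of the standard normal distribution. For a standard normal $X$, $E |X| sup k ^=^ 2 sup k/2 GAMMA ((k+1)/2) / pi sup 1/2$, while $E X sup k$ is 0 for odd $k$ and equals $E |X| sup k$ for even $k$. The Python sketch below tabulates these values under the same labels as the table, for comparison with empirical moments of $sigma sup -1 S(t)$. The function name is illustrative.

```python
import math


def normal_reference_moments(kmax: int = 8) -> dict[str, float]:
    """Moments of a standard normal X: key 'k' -> E[X**k], key '|k|' -> E[|X|**k]."""
    def abs_moment(k: int) -> float:
        return 2 ** (k / 2) * math.gamma((k + 1) / 2) / math.sqrt(math.pi)

    table = {str(k): (abs_moment(k) if k % 2 == 0 else 0.0) for k in range(1, kmax + 1)}
    for k in (1, 3, 5):
        table[f"|{k}|"] = abs_moment(k)
    return table
```

The even entries come out as 1, 3, 15 and 105, and the absolute moments $|1|$ and $|3|$ as $(2/ pi ) sup 1/2$ and $2 (2/ pi ) sup 1/2$.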
.P Since $S(t)$ has jump discontinuities of size 1 at zeros and decreases monotonically between zeros with derivative essentially $-1$ (on the Gram point scale), and there is asymptotically one zero per Gram interval, the smallest mean values of $S(t) sup 2k$ (for any $k^member^Z sup +$) that are at all conceivable would be obtained by having a zero exactly halfway between every two neighboring Gram points. In that configuration $S(t)$ is uniformly distributed on $[-1/2,^1/2]$, which yields a mean value of $S(t) sup 2$ of 1/12. The values that are observed, 0.23 for $N=10 sup 12$ and 0.26 for $N=10 sup 20$, are only about three times as large. .P That the distribution of $S(t)$ is close to the normal one can be seen visually in Fig.\ 2.4.1. This figure is based on determining for what fraction of values of $t^member^( gamma sub n ,^gamma sub {n + 10 sup 6} )$, with $n^=^10 sup 20 - 48,^778$ as in Table\ 2.4.1, we have $S(t) ^member^ [k/100 ,^(k+1)/100)$, and then scaling the resulting histogram by $sigma$ to produce a graph that can be compared to that of the standard normal distribution. It is curious that the observed distribution of $S(t)$ is less peaked than the normal one, whereas in most of the other comparisons the empirical distributions have sharper peaks than expected. It is especially interesting to compare Fig.\ 2.4.1 to Fig.\ 2.10.1, which compares the distribution of $log ^| Z(t) |$ (essentially the harmonic conjugate of $S(t)$) to the normal distribution. In both cases the limiting distributions are known to be normal (even without assuming the RH), but the observed deviations from normal behavior are different for $S(t)$ and $log ^|Z(t)|$, and are much more pronounced in the latter case. .P The area between the two curves in Fig.\ 2.4.1 is 0.023. For the corresponding figure using the $N=10 sup 12$ data, the area is 0.029.
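.P The idealized configuration used above to bound the mean value of $S(t) sup 2$ from below is easy to check numerically. The Python sketch below is an illustration only, with illustrative function names. One Gram interval is rescaled to $[0,^1)$. $S$ starts at 0, decreases with slope $-1$, jumps by 1 at a single zero located at $u = a$, and returns to 0 at the next Gram point. The mean of $S sup 2$ over the interval is then $(a sup 3 + (1-a) sup 3 )/3$, which is smallest, namely $1/12$, when $a = 1/2$.

```python
def mean_square_S(a: float, steps: int = 100_000) -> float:
    """Midpoint-rule average of S(u)**2 over one Gram interval rescaled to [0, 1).

    S(u) = -u before the zero at u = a and S(u) = 1 - u after it,
    i.e. slope -1 with a jump of +1 at the zero.
    """
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps
        s = -u if u < a else 1.0 - u
        total += s * s
    return total / steps


def mean_square_S_exact(a: float) -> float:
    """Closed form (a**3 + (1 - a)**3) / 3 of the same average."""
    return (a**3 + (1.0 - a) ** 3) / 3.0
```

Any displacement of the zero from the midpoint increases the average. This is why the midpoint configuration is the extreme case.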
.P Since both $S(t)$ and its integral $S sub 1 (t)$ are very small, we can expect $S(t)$ to have many sign changes. Several results in this direction have been proved, the strongest ones being due to Ghosh [GL1] and Mueller [Mue1], but they are all quite weak. For example, Mueller proves that gaps between consecutive zeros of $S(t)$ are $O( log^log^log^t)$. Given that $S(t)$ has a limiting normal distribution with mean close to 0 and standard deviation on the order of $( log^log^t) sup 1/2$, and that it cannot vary too widely (in particular, essentially all of the time it is monotone decreasing with derivative of the order of $-^log^t$), we might expect the ratio of the number of zero crossings of $S(t)$ for $t^member^( gamma sub N ,^gamma sub N+M )$ to $M$ to be roughly the fraction of $t$ in $( gamma sub N,^gamma sub N+M )$ for which $|S(t) |^<=^1$. This suggests that there ought to be on the order of $( log ^log ^t) sup -1/2$ zeros of $S(t)$ per Gram interval. .P The number of sign changes of $S(t)$ in the intervals that have been investigated can be determined quite easily from the statistics of Gram blocks and exceptions to Rosser's rule that have been collected. When $g sub n$ is a good Gram point that is not close to an exception to Rosser's rule, and is not a zero of the zeta function, then $S(g sub n )^=^0$, and $S(t)$ changes sign at $g sub n$. We will count this sign change as occurring in the Gram interval $[g sub n ,^g sub n+1 )$. If $B(n,^k)$ is a Gram block that has exactly $k$ zeros, then an easy accounting shows that $S(t)$ has exactly 2 sign changes in $B(n,^k)$. On the other hand, when $B(n,^k)$ is an exception to Rosser's rule, and $[g sub m, ^g sub m+r )$ is the smallest union of Gram blocks that contains both the exception and the excess zeros, then a similar accounting shows that $[g sub m ,^g sub m+r )$ contains exactly 2 sign changes.
Thus if Gram's law (see Section\ 2.12) held universally, we would have an average of 2 sign changes of $S(t)$ for every zero of $zeta (s)$. Departures from Gram's law lower this average. Table\ 2.4.2 shows the actual averages for the different data sets. There is a steady decrease in the average, but it is very slow. Since the argument in the preceding paragraph suggests a rate of decrease of $( log ^log ^t) sup -1/2$, this is not surprising. .P For every exception $B(n,^k)$ to Rosser's rule (see Section\ 2.13 for definitions) there is a $t$ nearby with $|S(t)|^>=^2$ (and even $|S(t)|^>^2$, if zeros do not coincide with Gram points, as seems likely). Statistics about these large values of $S(t)$ were collected during investigation of exceptions to Rosser's rule. Large values of $S(t)$ are of special interest because it is only when $S(t)$ is large that unusual behavior of the zeta function can take place. Locally extreme values of $S(t)$ occur at zeros. (Each zero has associated to it two values of $S(t)$, the limits of $S(t)$ as $t$ approaches the zero from the right or the left.) Table\ 2.4.3 shows the values of $S(t)$ that were largest in absolute value, as well as the number of zeros at which $|S(t)| ^>^2.3$ divided by the number of exceptions to Rosser's rule. The largest value of $|S(t)|$ that was found here is 2.7379, while among the first $1.5 times 10 sup 9$ zeros the largest such value is 2.3137 [LRW2]. (A point $t$ at which $|S(t)|^=^2.8747$ was found later in the computations described in Section\ 3.) Earlier computations established that $|S(t)|^<^1$ for $7^<^t^<=^280$, and $|S(t)|^<^2$ for $7^<^t^<=^6.8 times 10 sup 6$. .P The values of $S sub 1 (t)$ were investigated in the two intervals $( gamma sub n ,^gamma sub {n+10 sup 6} )$, where $n^=^10 sup 12 - 6,^032$ (for the $N=10 sup 12$ set) and $n^=^10 sup 20 - 48,^778$ (for $N=10 sup 20$).
The values of $S sub 1 ( gamma sub n )$ were assigned so as to make .DS 2 .EQ (2.4.10) int from {gamma sub n} to {gamma sub m} ~ S sub 1 (t) dt ~=~ 0 ^ .EN .DE for $m=n+ 10 sup 6$. The data that was obtained is summarized in Table\ 2.4.4; the mean of $S sub 1 (t) sup 4$, for example, refers to .DS 2 .EQ 1 over {gamma sub m - gamma sub n} ~ int from {gamma sub n} to {gamma sub m} ~ S sub 1 (t) sup 4 dt ^. .EN .DE In addition to the uncertain choice of $S sub 1 ( gamma sub n )$, these computations were affected by errors accumulating from uncertainties in the values of the zeros and of $S(t)$. Values computed over shorter intervals suggest that the mean values in Table\ 2.4.4 are accurate. The entry for sign changes of $S sub 1 (t)$ refers to the number of sign changes per Gram interval. This figure also appears to be quite accurate: changing the initial value of $S sub 1 ( gamma sub n )$ by $+- ^10 sup -4$ varied the number of computed sign changes of $S sub 1 (t)$ for the $N=10 sup 20$ interval only between 73799 and 74089. .H 2 "Extreme values of gaps between zeros" In its weakest form, the GUE hypothesis predicts only that (2.2.8) holds for all $0^<=^alpha ^<^beta^<^inf$, and so it says essentially nothing about the existence of a small number $(o(M))$ of very large or very small $delta sub n$. A double zero of the zeta function, giving $delta sub n =0$, would not by itself contradict this weak hypothesis. On the other hand, it is known (cf.\ (2.2.3) and (2.2.5)) that in the GUE, .DS 2 .EQ (2.5.1) roman Prob ^( delta sub n ^<=^ x) ~=~ {pi sup 2} over 9 ~x sup 3 ~-~ {2 pi sup 4} over 225 ~x sup 5 ~+~ {pi sup 6} over 2205 ~x sup 7 ~+~...^, .EN .DE so very small $delta sub n$ (roughly $o(M sup -1/3 )$ among $M$ samples) are very unlikely in the GUE, and a similar result holds for large $delta sub n$.
A strong form of the GUE hypothesis would predict that even extreme values of $delta sub n$ (and $delta sub n + delta sub n+1$) for the zeta function would behave roughly as in the GUE model. .P Given the constraints on $S(t)$ described in Section\ 2.4, one can expect that even if the strong form of the GUE hypothesis holds, it would only apply to the zeta function at large heights, and that the lower the region under investigation, the fewer extreme values of $delta sub n$ or $delta sub n + delta sub n+1$ there would be. This is clear for large values of $delta sub n$ and $delta sub n + delta sub n+1$, as these correspond to large values of $| S(t) |$. It is also true for small values of $delta sub n$ and $delta sub n + delta sub n+1$, though, since several zeros clustered close together again force $|S(t)|$ to be relatively large. .P What was observed in [Od2] in a comparison of the first $10 sup 5$ zeros to $10 sup 5$ zeros starting with zero number $10 sup 12$ is that the above predictions were largely satisfied by the data. In general, there was a deficiency of extreme values of $delta sub n$ and $delta sub n + delta sub n+1$ (compared to the GUE prediction), but this deficiency declined as one considered the higher zeros. There was, however, one observation that went counter to expectations. The number of small $delta sub n$ observed at large heights was larger than predicted by the GUE theory. This excess was not large, but it was also observed in the data for $10 sup 5$ zeros starting with zero number $2 times 10 sup 11$, as well as in some data based on the first $1.5 times 10 sup 9$ zeros. This excess of small spacings was very counterintuitive, and so gave rise to some suspicions about the validity of the GUE hypothesis. .P Table\ 2.5.1 shows the extremal values of $delta sub n$ and $delta sub n + delta sub n+1$ that were found in each data set. (The number of zeros in each data set is given in Table\ 1.2.)
The last column in Table\ 2.5.1 gives the probability that the minimal $delta sub n$ would not exceed the values in the second column if all the $delta sub n$ in the data set were drawn independently from the GUE distribution. From (2.5.1), we see that the probability that the smallest $delta sub n$ out of $M$ that are drawn from the GUE satisfies $delta sub n ^<=^x$ is approximately .DS 2 .EQ (2.5.2) 1 ~-~ left ( 1 ^-^ {pi sup 2} over 9 ~x sup 3 right ) sup M ~wig~ 1 ~-~ exp ( - pi sup 2 ^x sup 3 M/9 ) ^. .EN .DE This approximation was used to compute the last column of Table\ 2.5.1. We can see that most of the entries in that column are fairly high (although not too high, which would indicate a severe deficiency of small spacings), while those for $N=10 sup 18$ (where $delta sub n ^=^0.001124$ for $n^=^10 sup 18 ^+^12,^376,^780$, a case that will be discussed in sections\ 2.7 and 4.5) and for $N^=^10 sup 19$ (where $delta sub n ^=^0.000897$ for $n^=^10 sup 19 ^+^ 15,^987,^196$ is the smallest $delta sub n$ that was found) are extremely low. Furthermore, the smallest value of $delta sub n$ that is known is $delta sub n ^=^0.000310$ for $n^=^1,^048,^449,^114$ (found by van\ de\ Lune et\ al. [LRW2]), and the probability of such a small spacing occurring among $1.5 times 10 sup 9$ samples drawn from the GUE is only 0.048. Thus the extremely small values of the $delta sub n$ do appear to be somewhat too frequent. (Some more evidence pointing to this conclusion is presented in Section\ 3.) .P When we consider still very small, but slightly larger spacings, we find essentially no evidence of an excess of small spacings. Table\ 2.5.2 shows the number of $delta sub n ^<=^1/20$ and $<=^1/10$ observed in each set (given in number of cases per million zeros to make comparisons easier). 
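.P The truncated series (2.5.1) and the approximation (2.5.2) can be evaluated directly. The Python sketch below keeps only the three terms of (2.5.1) that are displayed, which is adequate only for small $x$. The function names are illustrative. It reproduces the probability 0.048 quoted above for $delta sub n ^=^0.000310$ among $1.5 times 10 sup 9$ samples. It also gives the GUE expectation of about 136.8 spacings $delta sub n ^<=^1/20$ per million zeros.

```python
import math


def gue_small_spacing_cdf(x: float) -> float:
    """Prob(delta_n <= x) in the GUE, truncated after the three terms of (2.5.1)."""
    return (math.pi**2 / 9 * x**3
            - 2 * math.pi**4 / 225 * x**5
            + math.pi**6 / 2205 * x**7)


def prob_min_spacing_at_most(x: float, m: int) -> float:
    """Right side of (2.5.2): chance that the smallest of m independent GUE
    spacings is <= x, using only the leading x**3 term."""
    return 1.0 - math.exp(-math.pi**2 * x**3 * m / 9)


print(round(prob_min_spacing_at_most(0.000310, 1_500_000_000), 3))  # prints 0.048
print(round(10**6 * gue_small_spacing_cdf(1 / 20), 1))  # prints 136.8
```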
If we consider the $N=10 sup 19$ entry for $delta sub n ^<=^1/20$, for example, we see that we are dealing with 2353 cases altogether, so a normal sampling error might be around 50, which is about 2%. Thus the 140.5 figure in the table is quite consistent with the 136.8 expected for the GUE. .P Still another way to judge whether there is any anomaly in the distribution of the $delta sub n$ or the $delta sub n + delta sub n+1$ is through the use of the quantile-quantile $(q-q)$ plots to compare the observed distributions to those of the GUE. Given a sample $x sub 1 ,^dd ,^x sub n$, and a continuous cumulative distribution function $F(z)$ for some distribution, the $q-q$ plot is obtained by plotting $x sub (j)$ against $q sub j$, where $x sub (1) ^<=^x sub (2) ^<=^...^<=^x sub (n)$ are the $x sub i$ sorted in increasing order, and the $q sub j$ are the theoretical quantiles defined by $F(q sub j ) ^=^ (j-1/2) /n$ [CCKT]. The $q-q$ plot is a sensitive method of detecting differences among distributions. In particular, while it does show the outliers that are far away from the expected position, it makes it possible to disregard them and concentrate on the main part of the distribution curve. If the $x sub j$ are drawn from the distribution corresponding to $F(z)$, and the sample size $n$ is large, the $q-q$ plot will be close to the straight line $y=x$. In all of our $q-q$ plots, straight lines $y=x$ are drawn to facilitate comparisons. (By the standards of typical statistical investigations, the sample sizes we deal with are very large, and the degree of agreement between conjecture and numerical evidence is very good, so one has to look at minute deviations.) .P The $q-q$ plots of [Od2] that showed the distribution of small $delta sub n$ indicated a deficiency of small $delta sub n$ for $N=1$, and a slight excess for $N=2 times 10 sup 11$ and $N= 10 sup 12$. These plots were each based on $10 sup 5$ values of $delta sub n$. 
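.P The $q-q$ construction just described needs only the sorted sample and the inverse of $F$. The GUE spacing distribution has no elementary closed form, so the Python sketch below takes an arbitrary continuous, increasing $F$ as a parameter and inverts it by bisection. The bracketing interval, the iteration count, and the function names are assumptions made for illustration. In an actual comparison, $F$ would be a numerical approximation to the GUE distribution function.

```python
from typing import Callable, Sequence


def quantile(cdf: Callable[[float], float], p: float,
             lo: float = 0.0, hi: float = 10.0, iterations: int = 60) -> float:
    """Solve cdf(q) = p for q in [lo, hi] by bisection (cdf assumed increasing)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def qq_points(sample: Sequence[float],
              cdf: Callable[[float], float]) -> list[tuple[float, float]]:
    """Pairs (q_j, x_(j)) with F(q_j) = (j - 1/2) / n, for plotting x_(j) against q_j."""
    xs = sorted(sample)
    n = len(xs)
    return [(quantile(cdf, (j - 0.5) / n), x) for j, x in enumerate(xs, start=1)]
```

For a sample that really is drawn from $F$, these points lie close to the line $y=x$.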
When the new, more extensive data for $N=10 sup 12$ was obtained, the resulting $q-q$ plot was very similar to that of Fig.\ 2.5.1, and did not behave like the plot in Fig.\ 8 of [Od2] (which was based on only $10 sup 5$ zeros). Figures\ 2.5.1 and 2.5.2 show $q-q$ plots of $delta sub n$ drawn from two disjoint sets of $10 sup 6$ zeros near zero number $10 sup 20$. While the plot of Fig.\ 2.5.1 might indicate a very slight excess of small spacings (those in (0.02,\|0.04), roughly), and a slight deficiency of slightly larger spacings (where the scatterplot lies above the straight line), Fig.\ 2.5.2 indicates almost perfect agreement between theory and experiment. Figure\ 2.5.2 is not completely representative of zeros in the $N=10 sup 20$ sets since, of several $q-q$ plots based on disjoint sets of $10 sup 6$ zeros, it was the one that gave the best agreement. Figure\ 2.5.1 is more typical in this respect. .P Figures\ 2.5.1 and 2.5.2 provide little, if any, support for the theory that there is an excess of small spacings among the zeros. Some further support can be found, however, if we aggregate all the data from the $N=10 sup 18$, $10 sup 19$, and $10 sup 20$ data sets, which contain $112,^314,^006$ zeros and yield 112,\|314,\|003 values of $delta sub n$ (each set of $M$ consecutive zeros gives $M-1$ spacings). The resulting $q-q$ plot, shown in Fig.\ 2.5.3, does indicate a slight excess of small $delta sub n$ (the two outliers close to the bottom of the graph are the unusually small $delta sub n$ that are minimal in the $N=10 sup 18$ and $10 sup 19$ data sets), but the evidence is not conclusive. .P When we consider the other extremal values of $delta sub n$ and $delta sub n + delta sub n+1$, the evidence is in much better agreement with expectation. The counts in Table\ 2.5.2 show that the numbers of small $delta sub n + delta sub n+1$, large $delta sub n$, and large $delta sub n + delta sub n+1$ are all smaller than predicted by the GUE theory, but approach that prediction as the height increases.
The $q-q$ plots of figures\ 2.5.4 through 2.5.7 also support this impression; there are too few extreme values in general, but the deficiency is smaller for $N=10 sup 20$ than for $N=10 sup 12$. .P In view of (2.2.6), one can expect that among $M$ values of $delta sub n + delta sub n+1$ drawn from the GUE, the probability of the minimal value being $<=^x$ is about .DS 2 .EQ 1~-~ exp ( -^pi sup 6 ^x sup 8 M/32400 ) ^. .EN .DE A minimal value of $delta sub n + delta sub n+1$ as small as the 0.1124 found in the $N=10 sup 20$ data set would then occur with probability 0.06 in the GUE, while the corresponding probabilities for the $N=10 sup 12$, $10 sup 14$, $10 sup 16$, $10 sup 18$, and $10 sup 19$ data sets are 0.93, 0.78, 0.25, 0.27, and 0.60, respectively. Thus the only one of these figures that might seem unusually small is that for the minimal $delta sub n + delta sub n+1$ for $N=10 sup 20$. .P The maximal values of $delta sub n$ and $delta sub n + delta sub n+1$ recorded in Table\ 2.5.1 are all somewhat smaller than what the GUE predicts, which is not too surprising given the bounds known to hold for $S(t)$ and $S sub 1 (t)$. For very large spacings in the GUE, des\ Cloizeaux and Mehta [CM2] have proved that .DS 2 .EQ (2.5.3) log ~ p(0,^t) ~wig~ -~ pi sup 2 t sup 2 /8 ~~~roman as ~~~ t^->^ inf ^, .EN .DE which suggests (by solving $pi sup 2 t sup 2 /8 ^=^ log ^M$ for $t$) that .DS 2 .EQ (2.5.4) max from {N+1^<=^n^<=^N+M} ~ delta sub n ~wig~ pi sup -1 ( 8 ^log ^M) sup 1/2 .EN .DE as $N,^M^->^inf$ with $M$ reasonably large compared to $N$. This is larger by about a $( log ^log^M) sup 1/2$ factor than the conjecture (2.4.2) of Montgomery allows. Our data is too limited to shed any light on the question of whether that conjecture is right.
In particular, the largest known values of $delta sub n$ and of $delta sub n + delta sub n+1$ are 5.1454 and 6.0165, respectively. .P Even on the assumption of the RH, it is only known that $delta sub n ^<=^0.5172$ and $delta sub n ^>=^2.337$ each occur infinitely often [CGG1], and that $delta sub n^>=^2.68$ occurs infinitely often on the assumption of the Generalized Riemann Hypothesis for Dirichlet $L$-functions (or at least of a Generalized Lindelo\*:f Hypothesis) [CGG2]. On the assumption of the RH, it is also known that $delta sub n ^<^ 0.77$ and $delta sub n ^>^ 1.33$ each hold for a positive proportion of $n$ [CGGGGH]. The GUE predicts that $delta sub n ^<^ epsilon$ and $delta sub n ^>^ epsilon sup -1$ should each hold for a positive proportion of $n$ for every fixed $epsilon ^>^0$. If one could prove that $delta sub n ^<^ 1/4$ holds for infinitely many $n$, one could obtain effective bounds for class numbers of imaginary quadratic number fields [MW]. The GUE hypothesis predicts that $delta sub n ^<^ 1/4$ for about 1.6% of all $n$, and this is very close to what is observed in numerical data. (For $delta sub n ^<^ 1/2$ the corresponding figure is 11.3%.) .H 2 "Long and short range correlations between zeros and Berry's formula" The distribution of the eigenvalues of the GUE is stationary (but not Markovian). In the limit, that also should be true for the zeros of the zeta function. However, given the slow growth rate of $S(t)$, one cannot expect GUE behavior from joint distributions of $delta sub n ,^delta sub n+1 ,^dd ,^delta sub n+k$ if $k$ is large. Already the data of sections\ 2.3 and 2.5 show that the behavior of $delta sub n$ is much closer to the GUE prediction than that of $delta sub n + delta sub n+1$. That was the main reason for not investigating $delta sub n + delta sub n+1 + delta sub n+2$ and even higher order spacings intensively.
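.P The 1.6% and 11.3% figures refer to the exact GUE spacing distribution, which has no elementary closed form. As a rough cross-check only, the sketch below substitutes the Wigner surmise $p(s) = (32/ pi sup 2 ) s sup 2 e sup {-4 s sup 2 / pi}$, a standard approximation to the GUE spacing law that is not the distribution used in the paper. The function name is hypothetical.

```python
import math

def wigner_surmise_cdf(x):
    # CDF of the beta = 2 Wigner surmise p(s) = (32/pi^2) s^2 exp(-4 s^2 / pi),
    # an APPROXIMATION to the GUE spacing law, integrated from 0 to x in closed form.
    a = 2.0 * x / math.sqrt(math.pi)
    return math.erf(a) - (4.0 * x / math.pi) * math.exp(-4.0 * x * x / math.pi)

for x in (0.25, 0.5):
    # Compare with the leading-order small-x GUE law pi^2 x^3 / 9
    print(x, round(wigner_surmise_cdf(x), 4), round(math.pi ** 2 * x ** 3 / 9, 4))
```

The surmise gives about 0.016 and 0.112, close to the figures quoted above, while the leading-order term $pi sup 2 x sup 3 /9$ already overestimates noticeably at $x = 1/2$, which is why it is used only for small $x$.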
.P When we investigate long range correlations among the zeros of the zeta function, we find phenomena connected not to the GUE, but rather to the distribution of primes. For example, if we let the autocovariances of a set of $delta sub n$ be defined by .DS 2 .EQ (2.6.1) c sub k ~=~ c sub k (H,^M) ~=~ 1 over M ~ sum from {n=H+1} to H+M~ ( delta sub n -1) ( delta sub n+k -1 ) ^, .EN .DE then it has been conjectured by F.\ J. Dyson (unpublished) that in the GUE, .DS 2 .EQ (2.6.2) c sub k ~ approx ~ -1 over {2 pi sup 2 k sup 2} .EN .DE for $k^>^0$, with the $approx$ indicating some degree of approximation, not asymptotic equality as $N,^M^->^inf$. This relation has not been proved for the GUE, but it is intuitively appealing for both the GUE and the zeros of the zeta function, since it says in effect that a large spacing would lead to smaller spacings nearby (and vice versa), and that this effect would diminish as one considered spacings further and further away. .P What was observed in [Od2] for the $delta sub n$ was quite different from the conjecture (2.6.2). Additional data based on the new computations is presented in Table\ 2.6.1. The $N=1$ entries come from the [Od2] computations, and have $H=0$, $M=10 sup 5$. The $N=10 sup 12$ and $10 sup 20$ entries come from the new computations, and both have $M=10 sup 6$, with $H^=^10 sup 12 - 6,^032$ for the $N=10 sup 12$ column and $H=10 sup 20 ^-^ 48,^776$ for the $N=10 sup 20$ column. (A comparison of the $N=10 sup 12$ entries here with those in Table\ 6 of [Od2], which are based on 1/10 as many zeros, indicates the size of the sampling errors.) For small $k$, the data in this table supports Dyson's conjecture (2.6.2), and for higher sets of zeros, the agreement with (2.6.2) extends to slightly higher values of $k$. However, for very high $k$, we see totally different behavior.
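.P A direct implementation of (2.6.1), with Dyson's conjectured value (2.6.2) for comparison, might look as follows. This is a minimal sketch assuming the normalized spacings are held in a Python list whose entry at index n-1 is $delta sub n$; the function names are hypothetical.

```python
import math

def autocovariance(delta, k, H, M):
    # c_k(H, M) = (1/M) * sum_{n=H+1}^{H+M} (delta_n - 1)(delta_{n+k} - 1),
    # with the 1-based delta_n stored at list index n - 1.
    if len(delta) < H + M + k:
        raise ValueError("need at least H + M + k spacings")
    total = 0.0
    for n in range(H + 1, H + M + 1):
        total += (delta[n - 1] - 1.0) * (delta[n + k - 1] - 1.0)
    return total / M

def dyson_prediction(k):
    # Dyson's conjectured GUE value (2.6.2), for k > 0.
    return -1.0 / (2.0 * math.pi ** 2 * k ** 2)

# Toy input: spacings alternating 0.5, 1.5 (mean 1)
toy = [0.5, 1.5] * 50
print(autocovariance(toy, 1, 0, 10), autocovariance(toy, 2, 0, 10))  # -0.25 0.25
print(round(dyson_prediction(1), 4))  # -0.0507
```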
If $delta sub n$ and $delta sub n+k$ were independent, then, since their mean value is 1 and their variance is about 1/6, we would expect a sum of $10 sup 6$ terms of the form $( delta sub n -1) ( delta sub n+k -1 )$ (for $k^>^0$) to be of size about $(10 sup 6 ) sup 1/2 / 6 ^approx^170$ (each term has variance about 1/36), and this would correspond to a value of $c sub k$ of about $1.7 times 10 sup -4$. The values in Table\ 2.6.1 for $9,980 ^<=^k^<=^10,000$ are usually much larger than that, which indicates that there are fairly strong long range correlations between the $delta sub n$. The pattern of signs of the $c sub k$ also indicates the nonrandom character of the values of the $c sub k$. The $c sub k$ are sometimes positive and sometimes negative, indicating that for some $k$, a large $delta sub n$ tends to be associated with a large $delta sub n+k$, while for other $k$ it tends to be associated with a small $delta sub n+k$. .P An explanation for the long range dependencies among the $delta sub n$ was proposed in [Od2]. It implies that the observed correlations come from primes through formulas such as that of Landau [Lan1], which says that for any fixed $y^>^0$, as $N^->^inf$ we have .DS 2 .EQ (2.6.3) sum from n=1 to N ~ e sup {i gamma sub n y} ~=~ left { matrix { lcol {- ^{gamma sub N} over {2 pi} ^e sup {-y/2}^log^p ^+^O(e sup -y/2^log^N) ~~above O(e sup -y/2^log^N)~~} lcol {roman if ~~ y^=^log^p sup m ^, above roman if ~~ y^!=^log^p sup m ^,} } .EN .DE where $p$ denotes a prime and $m^member^Z sup +$. The above statement assumes the RH, but Landau actually proved a similar unconditional result. Improvements on Landau's result (in terms of better error terms and more explicit dependencies of the error terms on $y$) have been obtained by Fujii [Fu5,\|Fu7] and Gonek [Gon2]. (There are many formulas relating primes and zeros, and the ``explicit formulas'' of Guinand [Gu1,\|Gu2] and Weil [We1] are among the most general.)
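.P The size estimate for independent spacings is simple arithmetic, sketched below under the assumption that all the $delta sub n$ are independent with variance about 1/6. Then each product $( delta sub n -1) ( delta sub n+k -1 )$ has mean 0 and variance about 1/36, and products for different $n$ are uncorrelated, so the variances add. The function name is hypothetical.

```python
import math

def independent_ck_scale(M, variance=1.0 / 6.0):
    # ASSUMING independent spacings: each (delta_n - 1)(delta_{n+k} - 1) has
    # mean 0 and variance variance**2, so a sum of M of them has standard
    # deviation sqrt(M) * variance; dividing by M gives the scale of c_k.
    sum_scale = math.sqrt(M) * variance
    return sum_scale, sum_scale / M

s, c = independent_ck_scale(10 ** 6)
print(round(s), f"{c:.1e}")  # 167 1.7e-04
```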
.P The paper [Od2] presents a detailed explanation of how Landau's formula (2.6.3) forces the spectrum of the $delta sub n$ to consist largely of point masses at frequencies corresponding to prime powers, which then forces the initially unexpected behavior of the $c sub k$ that is seen in the tables. This explanation will not be repeated here. We will mention only that while it is not rigorous, it is supported by heuristics and numerical evidence. What we will do now is check how well Landau's formula (2.6.3) fits the numerical data. The main interest here is to see just how many zeros $gamma sub n$ are needed at various heights to observe the phenomenon of large values occurring at logarithms of prime powers. Some proposals have even been made to use sums like that in (2.6.3) for primality testing and factoring integers. While it seems unlikely that efficient methods could be developed by this approach, it is of some interest to see what happens when one considers a relatively short sum over high zeros. .P Let .DS 2 .EQ (2.6.4) h(y) ~=~ sum from {n=10 sup 20 +1} to {10 sup 20 +4 times 10 sup 4} ~ e sup {i gamma sub n y} ^. .EN .DE Figure\ 2.6.1 shows a graph of $2^log ^| h(y) |$ for $0^<=^y^<=^3$. It is instructive to compare this graph with that of Fig.\ 15 of [Od2], which is drawn on the same scale, but is based on an exponential sum of $4 times 10 sup 4$ zeros starting at zero number $10 sup 12 +1$. Both graphs show sharp peaks precisely at logarithms of prime powers, and the peaks are visibly higher at primes than at proper prime powers, as predicted by Landau's formula. (The heights of the peaks are not represented too accurately on the graph due to limited sampling.) All the prime powers $<~e sup 3 ^=^20.09$ are visible. The main difference between the two graphs is that in Fig.\ 2.6.1 the peaks are slightly lower, and the ``noise'' region between the peaks has somewhat higher values.
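.P A sketch of evaluating (2.6.4), and of the size of the main term in Landau's formula (2.6.3) for a window of zeros, is given below. It assumes the ordinates are supplied as offsets $gamma sub n - T sub 0$ from a base point $T sub 0$ computed in higher precision: near $1.5 times 10 sup 19$ adjacent double-precision numbers are 2048 apart, while the shift only multiplies $h(y)$ by $e sup {i T sub 0 y}$ and so leaves $|h(y)|$ unchanged. For a window, the main term is the difference of two Landau sums, so at $y = log^p sup m$ its size is approximately $( gamma sub last - gamma sub first ) (2 pi ) sup -1 e sup {-y/2} log^p$. The names and the offset convention are assumptions made for illustration.

```python
import cmath
import math

def h(y, offsets):
    # h(y) up to a unimodular factor: sum of exp(i * (gamma_n - T0) * y) over the
    # supplied offsets gamma_n - T0 (ASSUMED input format), so |h(y)| matches (2.6.4).
    return sum(cmath.exp(1j * g * y) for g in offsets)

def two_log_abs_h(y, offsets):
    # The plotted quantity 2 log|h(y)| (natural logarithm).
    a = abs(h(y, offsets))
    return 2.0 * math.log(a) if a > 0.0 else float("-inf")

def landau_main_term_size(p, m, gamma_first, gamma_last):
    # Approximate |main term| of (2.6.3) for a window of zeros at y = log(p**m):
    # (gamma_last - gamma_first) / (2 pi) * exp(-y/2) * log p.
    y = m * math.log(p)
    return (gamma_last - gamma_first) / (2.0 * math.pi) * math.exp(-y / 2.0) * math.log(p)
```

Evaluating two_log_abs_h on a grid of $y$ in $[0,^3]$ reproduces the kind of graph shown in Fig.\ 2.6.1, given a list of offsets for the $4 times 10 sup 4$ zeros.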
Furthermore, the nice regular patterns seen in the ``main'' regions of Fig.\ 15 of [Od2] (which come from sampling at regular intervals a very rapidly oscillating function whose frequency and amplitude are changing slowly) are not visible in Fig.\ 2.6.1. These differences are probably due partly to the errors in the computed values of the $gamma sub n$ near the $10 sup 20$-th zero and partly to the fact that we are taking a very short sum. Since $4 times 10 sup 4$ zeros out of the first $10 sup 20$ is a very small proportion, it is remarkable that the pattern of Fig.\ 2.6.1 is as clear as it is; it is much better than the proved results of [Fu5,\|Fu7,\|Gon2,\|Lan1] might lead one to expect. .P Figure\ 2.6.2 shows a graph of $2^log^| h(y) |$, where $h(y)$ is again defined by (2.6.4), but this time over the region $8^<=^y^<=^8.05$. (This graph is based on $10 sup 4$ equally spaced values of $y$.) The interval from $e sup 8 = 2980.96$ to $e sup 8.05 ^=^3133.79$ contains the primes 2999, 3001, 3011, 3019, 3023, 3037, 3041, 3049, 3061, 3067, 3079, 3083, 3089, 3109, 3119, and 3121, and the prime power $5 sup 5 =3125$. Figure\ 2.6.2 fails to distinguish between several close pairs of primes. When one graphs a similar sum, but with 10 times as many zeros, as is done in Fig.\ 2.6.3, all the primes can be distinguished, and even 3125 can be easily discerned. .P One of the most elegant long range correlations between zeros was found by Berry [Be4]. If we consider an interval of length $2 pi L ( log ^(T/(2 pi ))) sup -1$ at height $T$, the expected number of zeros in it equals $L$. We define the number variance of the zeros by .DS 2 .EQ (2.6.5) V sub T (L) ~=~ V sub T,H (L) ~=~ H sup -1~int from T to T+H~ left { N left ( t^+^ {2 pi L} over {log^(t/(2 pi ))} right ) ^-^ N(t) ^-^L right } sup 2 dt ^.
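.P The list of primes and prime powers between $e sup 8$ and $e sup 8.05$ can be checked with a short sieve. The sketch below (hypothetical function name) returns each prime power $p sup m$ in the open interval together with $p$ and $m$; the corresponding peak locations in Fig.\ 2.6.2 are $y = log^(p sup m )$.

```python
import math

def prime_powers_in(lo, hi):
    # All prime powers n = p**m with lo < n < hi, as (n, p, m), sorted by n,
    # using a sieve of Eratosthenes up to hi.
    limit = int(hi)
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    found = []
    for p in range(2, limit + 1):
        if is_prime[p]:
            n, m = p, 1
            while n < hi:
                if n > lo:
                    found.append((n, p, m))
                n *= p
                m += 1
    return sorted(found)

pp = prime_powers_in(math.exp(8.0), math.exp(8.05))
print(len(pp), pp[-1])  # 17 (3125, 5, 5)
```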
.EN .DE In the GUE, one has $V sub T (L)^=^G(L)$, with .DS 3 .EQ G(L) ~=~ pi sup -2 "{" mark log (2 pi L) ^-^ Ci( 2 pi L ) ^-^ 2 pi L ^Si (2 pi L ) .EN .sp .5 .EQ (2.6.6) ~~ .EN .sp .5 .EQ lineup ~~~~+~ pi sup 2 L ~-~ cos ( 2 pi L ) ^+^ 1^+^c sub 0 "}" ^, .EN .DE where $Ci$ and $Si$ are the cosine and sine integrals [HMF] and $c sub 0 ^=^ 0.577^dd$ is Euler's constant. Asymptotically, .DS 2 .EQ (2.6.7) G(L) ~wig~ pi sup -2 ^log (2 pi L ) ~~~~roman as ~~~~ L^->^inf ^, .EN .DE while .DS 2 .EQ (2.6.8) G(L) ~wig~ L ~~~~roman as ~~~~ L^->^0 ^. .EN .DE Gallagher and Mueller [GM] showed that Montgomery's pair correlation conjecture implies $V sub T (L) ^=^ L-L sup 2^+^ o( L sup 2 )$ as $L^->^ 0$, which is consistent with (2.6.8). (See also [Fu8].) On the other hand, the numerical evidence of [Od2] showed that $V sub T (L)$ was small even for moderately large $L$, and so a relation like (2.6.7) appeared impossible. Motivated by this discovery, by the relations between primes and long range correlation between zeros discussed above, and by his earlier work on eigenvalues of Hamiltonians of chaotic dynamical systems [Be1,\|Be2,\|Be3], Berry [Be4] found heuristic arguments which suggested that for any $tau^member^(0,^1)$, and any $L^>^0$, .DS 2 .EQ (2.6.9) V sub T (L) ~approx~ G(L) ~+~ B sub T (L) ^, .EN .DE where for $U^=^T (2 pi ) sup -1$, .DS 3 .EQ B sub T (L) ~=~ pi sup -2 left { 2~{sum from p ~ sum from r=1 to inf} from {p sup r ^<^ U sup tau}~mark {sin sup 2 ( pi L r ( log ^p) /( log ^U))} over {r sup 2 p sup r} .EN .sp .5 .EQ (2.6.10) ~~ .EN .sp .5 .EQ lineup ~+~ Ci( 2 pi L tau ) ~-~ log ( 2 pi L tau ) ~-~ c sub 0 "\b'\(rt\(bv\(rk\(bv\(rb'"^, .EN .DE and $p$ denotes primes. Computations using $10 sup 5$ zeros near zero number $10 sup 12$, using values of $L$ up to 1000, showed excellent agreement between Berry's conjecture (2.6.9) and empirical data, and those results are shown in the graphs in [Be4]. 
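.P The GUE number variance (2.6.6) and Berry's correction (2.6.10) can be evaluated with the Python standard library alone if Si and Ci are computed from their power series, which is what the sketch below does. That choice, made here instead of using a special-function library, limits it to moderate arguments, roughly $2 pi L^<=^20$, so $L$ up to about 3 for $G(L)$. The function names are hypothetical, and $U^=^T/(2 pi )$ as in (2.6.10).

```python
import math

EULER_GAMMA = 0.5772156649015329  # c_0, Euler's constant

def si_ci(x, terms=80):
    # Si(x) and Ci(x) for x > 0 from their power series; adequate for moderate
    # x (say x <= 20), but rounding error grows quickly for larger x.
    si, term = 0.0, x  # term = (-1)^k x^(2k+1) / (2k+1)!
    for k in range(terms):
        si += term / (2 * k + 1)
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    ci, term = 0.0, -x * x / 2.0  # term = (-1)^k x^(2k) / (2k)!, from k = 1
    for k in range(1, terms):
        ci += term / (2 * k)
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return si, EULER_GAMMA + math.log(x) + ci

def gue_number_variance(L):
    # G(L) from (2.6.6).
    x = 2.0 * math.pi * L
    si, ci = si_ci(x)
    return (math.log(x) - ci - x * si + math.pi ** 2 * L
            - math.cos(x) + 1.0 + EULER_GAMMA) / math.pi ** 2

def primes_below(n):
    # Primes p < n by a simple sieve.
    if n < 3:
        return []
    is_p = [True] * n
    is_p[0] = is_p[1] = False
    for i in range(2, math.isqrt(n - 1) + 1):
        if is_p[i]:
            for j in range(i * i, n, i):
                is_p[j] = False
    return [i for i in range(n) if is_p[i]]

def berry_correction(L, T, tau=0.25):
    # B_T(L) from (2.6.10), with U = T / (2 pi) and prime powers p^r < U^tau.
    log_u = math.log(T / (2.0 * math.pi))
    bound = math.exp(tau * log_u)
    total = 0.0
    for p in primes_below(math.ceil(bound)):
        log_p = math.log(p)
        r, pr = 1, p
        while pr < bound:
            total += math.sin(math.pi * L * r * log_p / log_u) ** 2 / (r * r * pr)
            r += 1
            pr *= p
    x = 2.0 * math.pi * L * tau
    _, ci = si_ci(x)
    return (2.0 * total + ci - math.log(x) - EULER_GAMMA) / math.pi ** 2
```

For example, gue_number_variance(1.0) is about 0.34, and for small $L$ the function behaves like $L - L sup 2$, in line with the Gallagher-Mueller expansion quoted below; Berry's prediction (2.6.9) is then gue_number_variance(L) + berry_correction(L, T).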
Note that the $log (L)$ terms in $G(L)$ and $B sub T (L)$ cancel out, and so for every fixed $L$, one can show that there is a positive function $g(L)$ such that .DS 2 .EQ (2.6.11) G(L) ~+~ B sub T (L) ~=~ g(L) ~+~ o(1) ~~~~roman as ~~~~ T ^->^ inf ^. .EN .DE Moreover, if $tau$ is held fixed, then it is easy to see that .DS 2 .EQ (2.6.12) G(L) ~+~ B sub T (L) ~wig~ pi sup -2 ^log^log^T ~~~~roman as ~~~~ T,^L~->~ inf ^, .EN .DE (with $L$ growing much more slowly than $T$), since the arguments of the sine in the definition (2.6.10) of $B sub T (L)$ will be asymptotically equidistributed modulo $2 pi$. .P The new values of zeros were used to obtain further data. For $N=10 sup 12$, the number variance $V sub T (L) ^=^ V sub T,H (L)$ defined by (2.6.5) was computed with .DS .TS center; r1 l1 l5 r1 l. $T$ $=$ $gamma sub n sub 0$, $n sub 0$ $=~ 10 sup 12 ^-^ 6,^032$, .sp $T+H$ $=$ $gamma sub m sub 0$, $m sub 0$ $=~ n sub 0 ^+^5 times 10 sup 5$. .TE .DE For $N=10 sup 20$, the values that were chosen were .DS .TS center; r1 l1 l5 r1 l. $T$ $=$ $gamma sub n sub 1$, $n sub 1$ $=~ 10 sup 20 ~-~ 48,^778$, .sp $T+H$ $=$ $gamma sub m sub 1$, $m sub 1$ $=~ n sub 1 ^+^ 5 times 10 sup 5$. .TE .DE Berry's function (2.6.10) was computed in each case with $tau ^=^ 1/4$. (Varying $tau$ between 0.2 and 0.3 did not appreciably change the results, as was to be expected.) The results of some of these computations for $N=10 sup 20$ are presented in figures\ 2.6.4 through 2.6.6. In Fig.\ 2.6.4, the dashed line is the graph of the GUE prediction $G(L)$, the solid line is the graph of Berry's prediction $G(L) ^+^ B sub T (L)$, and the scatterplot is that of computed values of $V sub T (L)$. In figures\ 2.6.5 and 2.6.6 the graphs of the computed values of $V sub T (L)$ and of Berry's prediction $G(L) ^+^ B sub T (L)$ were both drawn as solid lines, one superposed on the other. The very slight differences between the two curves show up as slight blotches on the graph. 
(The empirical data is slightly more wiggly than $G(L) ^+^ B sub T (L)$.) We see that even for $L^=^ 5 times 10 sup 5$, the agreement between computed and predicted values is almost perfect. .P A comparison of the graphs of [Be4] (and of similar graphs drawn with the more extensive data that has been obtained for $N=10 sup 12$ in the present computation) with figures\ 2.6.4 through 2.6.6 shows that for $N=10 sup 20$, the number variance oscillates less than for $N=10 sup 12$. The agreement of data with Berry's prediction is better for $N=10 sup 20$. .P While Berry's prediction (2.6.9) for $V sub T (L)$ was based on heuristic arguments, one can prove that a version of the conjecture follows from the RH and the pair correlation conjecture (2.2.12). This will be shown in a separate manuscript [Od4]. .H 2 "Lehmer phenomenon" For the RH to be true, $|Z(t)|$ cannot have any relative minima between two distinct consecutive zeros. Cases where .DS 2 .EQ (2.7.1) v sub n ~=~ max from {gamma sub n ^<^t^<^gamma sub n+1} ~ |Z(t)| .EN .DE is very small (so that in a sense the RH is ``almost violated'') are referred to as Lehmer's phenomenon [Lr2], and provide some of the more interesting heuristics both for and against the RH (cf.\ [Od3]). In this section we present statistics on the frequency of this phenomenon (which does not have a precise definition). .P The zero-locating program printed the largest value of $|Z(t)|$ that had been computed in each stretch of $10 sup 4$ zeros. To provide further information, the program was modified for the $N=10 sup 19$ and $N=10 sup 20$ data sets so as to obtain statistical information about the behavior of $v sub n$. Since getting a very good approximation to $v sub n$ would have required substantial computing time, what the program computed was the midpoint value .DS 2 .EQ (2.7.2) w sub n ~=~ |Z(( gamma sub n + gamma sub n+1 ) / 2) | ^.
.EN .DE When a value of $w sub n ^>^ 250$ or $w sub n ^<^ 5 times 10 sup -4$ was encountered, it was printed together with $n$, $gamma sub n$, and $delta sub n$. (However, $w sub n$ was not computed for a total of roughly 100 zeros at the ends of data sets.) .P To see how good an approximation $w sub n$ was to $v sub n$, the values of .DS 2 .EQ (2.7.3) v sub n sup star ~=~ max from {1^<=^k^<=^39} ~ left |^ Z left ( gamma sub n ^+^ k over 40 ^( gamma sub n+1 - gamma sub n ) right ) ^right | .EN .DE as well as of $w sub n$ were computed for $n sub 0 ^<=^n^<=^n sub 0 ^+^8 times 10 sup 5 -1$, where $n sub 0 ^=^10 sup 20 ^+^15,^316,^087$. Let .DS 2 .EQ (2.7.4) r sub n ~=~ v sub n sup star / w sub n ^. .EN .DE Then the maximal value of $r sub n$ that was found was 1.43. Only 755 out of the $8 times 10 sup 5$ values of $r sub n$ were $>^1.2$, while the rms value of $r sub n -1$ was 0.029. Among the 873 values of $n$ for which $delta sub n ^<^0.1$, the maximal value of $r sub n$ was 1.008, and the rms value of $r sub n -1$ was $5.1 times 10 sup -4$. For the 898 values of $n$ for which $delta sub n ^>^2.5$, the corresponding numbers were 1.29 and 0.036. For the 244 values of $n$ for which $w sub n ^>^100$, these numbers were 1.137 and 0.032, while for the 1426 values of $n$ for which $w sub n ^<^ 0.01$, they were 1.072 and 0.0066. Thus in general the values of $w sub n$ do provide good approximations to $v sub n sup star$, and therefore surely also to $v sub n$. This was to be expected on the basis of the GUE predictions (in particular, that the approximation would be exceptionally good when $delta sub n$ is small). In fact, the size of $v sub n$ is determined largely by the few zeros nearest to $gamma sub n$ and $gamma sub n+1$ (cf.\ [Hej5, Hej6]), and so under the assumption of the GUE one can make quantitative predictions about the behavior of $r sub n$.
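.P The quantities (2.7.2) through (2.7.4) for a single gap can be computed as below. This sketch assumes some callable Z returning $Z(t)$ (for example mpmath.siegelz at moderate heights; the computations described in this paper used the algorithm of [OS] instead) and takes two consecutive zeros as inputs. The function name is hypothetical, and the example uses a toy polynomial in place of $Z(t)$.

```python
import math

def lehmer_midpoint_stats(Z, g0, g1, pieces=40):
    # w  = |Z((g0 + g1) / 2)|                                  (2.7.2)
    # v* = max over k = 1..pieces-1 of |Z(g0 + k/pieces (g1 - g0))|  (2.7.3)
    # r  = v* / w                                              (2.7.4)
    w = abs(Z(0.5 * (g0 + g1)))
    v_star = max(abs(Z(g0 + k / pieces * (g1 - g0))) for k in range(1, pieces))
    return w, v_star, v_star / w

# Toy stand-in for Z(t): t (1 - t)^2 vanishes at 0 and 1 and peaks at t = 1/3,
# so the midpoint underestimates the maximum and r > 1.
w, v_star, r = lehmer_midpoint_stats(lambda t: t * (1.0 - t) ** 2, 0.0, 1.0)
print(w, round(v_star, 6), round(r, 4))  # 0.125 0.148078 1.1846
```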
.P Table\ 2.7.1 shows the frequency of occurrence of values of $w sub n ^<^ 5 times 10 sup -4$ among the approximately $9.5 times 10 sup 7$ values of $n$ that were checked in the $N=10 sup 19$ and $N=10 sup 20$ data sets. The smallest value of $w sub n$ that was found there was $4.02 times 10 sup -6$, for $n=10 sup 19 ^+^15,^987,^197$ (with $delta sub n ^=^ 0.000897$), while the second smallest was $5.03 times 10 sup -6$. .P One might expect, and one does observe empirically, that the Lehmer phenomenon is associated with small values of $delta sub n$. If $delta sub n$ is small, then one might expect that $w sub n$ is almost proportional to $delta sub n sup 2$, since zeros other than $gamma sub n$ and $gamma sub n+1$ ought to contribute multiplicative factors that behave like a power of $log^gamma sub n$ on the average, and are at most $gamma sub n sup o(1)$ as $n^->^inf$ (assuming the Lindelo\*:f conjecture). Since the probability that $delta sub n ^<=^x$ is about $pi sup 2 x sup 3 /9$ for $x$ small (see sections\ 2.2 and 2.5), one might conjecture that the probability of $w sub n ^<^y$ is proportional to $y sup 3/2$. This would suggest that among the first $n$ zeros, the smallest $w sub n$ might be on the order of $n sup -2/3+o(1)$ as $n^->^inf$. If true, this relation would settle an old question [Ed] about the number of terms in the asymptotic part of the Riemann-Siegel formula that have to be used to separate the zeros; even the old estimate of Titchmarsh [Tit1] with an error term of $O(t sup -3/4 )$ would suffice at large heights. .P The above heuristic about the behavior of small $w sub n$ is supported very well by empirical data. Among the 1976 values of $n$ in the $N=10 sup 19$ and $N=10 sup 20$ data sets that had $w sub n ^<^ 5 times 10 sup -4$, the ratio $w sub n / delta sub n sup 2$ varies between 0.0136 and 8.56, with a mean of 0.608 and a variance of 0.427. Thus the correlation between $delta sub n sup 2$ and $w sub n$ is only fairly good.
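.P The scaling heuristic can be made concrete: solving $n pi sup 2 x sup 3 /9 = 1$ gives the spacing size at which about one of $n$ spacings is expected to fall, and squaring it gives the corresponding scale for the smallest $w sub n$. The sketch below (hypothetical names; leading-order heuristics only, with the slowly varying factor in $w sub n ^approx^ delta sub n sup 2$ ignored) confirms the $n sup -2/3$ scaling.

```python
import math

def smallest_spacing_scale(n):
    # x with n * Pr(delta <= x) = 1 under the small-x law Pr(delta <= x) ~ pi^2 x^3 / 9.
    return (9.0 / (math.pi ** 2 * n)) ** (1.0 / 3.0)

def smallest_w_scale(n):
    # Heuristic w_n ~ delta_n^2, ignoring the slowly varying factor from other zeros.
    return smallest_spacing_scale(n) ** 2

print(f"{smallest_spacing_scale(10 ** 6):.4f}")  # 0.0097
print(smallest_w_scale(8 * 10 ** 6) / smallest_w_scale(10 ** 6))  # about 0.25
```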
On the other hand, these $w sub n$ follow almost perfectly the rule conjectured above that the fraction of them that are $<^y$ ought to be proportional to $y sup 3/2$. This can be seen from the counts in Table\ 2.7.1, as well as by looking at the ratio of the \%$k$-th smallest $w sub n$ to $5 times 10 sup -4 times (k/1976) sup 2/3$, which varies between 0.715 and 1.267, with a mean of 1.01 and a variance of $9 times 10 sup -4$, and from looking at a $q-q$ plot of the sorted $w sub n$ against $5 times 10 sup -4 times (k/1976) sup 2/3$. Thus on average the influence of the neighboring $gamma sub k$ cancels out. .P The most extreme example of the Lehmer phenomenon that was found during the computations described in this paper occurs for $n= 10 sup 18 ^+^12,^376,^780$, where $v sub n ^approx^ w sub n ^=^5.28 times 10 sup -7$ and $delta sub n ^=^ 0.001124$. A graph of $Z(t)$ in the vicinity of this point is given in Fig.\ 2.7.1. (Figure\ 2.7.1 also shows what looks like another case of the Lehmer phenomenon near Gram point $n-5$, but in that case the minimum of $Z(t)$ reaches $-0.0094$, and so it does not qualify under our definition.) A much more detailed view of $Z(t)$ in a very small neighborhood of this Lehmer phenomenon is given in Fig.\ 4.5.1. (That picture plays an important role in the discussion of the validity of the present computations that is presented in Section\ 4.5.) .P The most extreme example of the Lehmer phenomenon that is known was found by van\ de\ Lune et\ al. [LRW2]. For $n=1,^048,^449,^114$, they discovered that $delta sub n ^=^ 0.000310$, while $v sub n ^=^ 2.2 times 10 sup -7$ $(>=^w sub n )$. Since the height of this example is only about the square root of that for $n=10 sup 18 ^+^12,^376,^780$, it could be argued that the higher example of this paper is even more extreme. However, the $delta sub n$ found by van\ de\ Lune et\ al. is by far the smallest of any that are known.
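.P The ratio test described above can be sketched as follows. It assumes the small values of $w sub n$ are available as a list, keeps those below the threshold $5 times 10 sup -4$, and divides the $k$-th smallest by $5 times 10 sup -4 times (k/K) sup 2/3$, where $K$ is the number kept (1976 for the data discussed here). Under the $y sup 3/2$ law the ratios should cluster near 1. The function name is hypothetical, and the example uses synthetic data, not the actual $w sub n$.

```python
def qq_ratios(w_values, threshold=5e-4):
    # Keep the w_n below the threshold, sort them, and divide the k-th smallest
    # by the y^(3/2)-law quantile threshold * (k / K)^(2/3).
    small = sorted(w for w in w_values if w < threshold)
    K = len(small)
    return [w / (threshold * (k / K) ** (2.0 / 3.0))
            for k, w in enumerate(small, start=1)]

# Synthetic input following the law exactly, scaled by 0.99 to stay below the threshold
K = 1976
synthetic = [0.99 * 5e-4 * (k / K) ** (2.0 / 3.0) for k in range(1, K + 1)]
ratios = qq_ratios(synthetic)
print(len(ratios), round(min(ratios), 6), round(max(ratios), 6))  # 1976 0.99 0.99
```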
.H 2 "Large values of $fat zeta bold {(1/2 +it)}$" The largest value of $|Z(t) |^=^ | zeta (1/2 + it )|$ that was encountered by van\ de\ Lune et\ al. [LRW2] in their investigation of the first $1.5 times 10 sup 9$ zeros was 117. Table\ 2.8.1 lists the largest values of $|Z(t)|$ that were encountered in each of the data sets computed in this paper. The main zero locating program kept track of the largest value of $|Z(t)|$ that had been computed, but did not attempt to do a careful search for large values. However, since large values are usually associated with large $delta sub n$, the standard zero locating procedure seemed to be quite good at finding the high peaks in $|Z(t)|$. For the $N=10 sup 19$ and $N=10 sup 20$ data sets, the more careful procedure described in Section\ 2.7 was employed, which provided even more reliable statistics. The number of values of $n$ in those two data sets for which $w sub n$ (defined in (2.7.2)) exceeded various thresholds is given in Table\ 2.8.2. (Section\ 3 lists some values of $t$ for which $|Z(t)|$ is much larger and which were found by a different procedure.) .P The rate of growth of $|Z(t)|$ is one of the most intensively studied problems in the theory of the zeta function, since bounds on it provide estimates for the distribution of zeros away from the critical line. It is very easy to show that .DS 2 .EQ (2.8.1) |Z(t)| ~<=~ t sup {alpha + o(1)} ~~~roman as ~~~ t^->^inf .EN .DE with $alpha ^=^ 1/4$. Exponential sum methods were used in the first few decades of this century to show that (2.8.1) holds with $alpha ^=^1/6$, and then to successively lower this value of $alpha$. (See the Notes to Chapter\ 5 of [Tit2] for a list of the improvements.) Until recently, the smallest value of $alpha$ for which (2.8.1) was known to hold was $alpha ^=^ 139/858^=^0.162004 dd$, due to Kolesnik [Ko], and there were indications that this result was close to the limit of what the ``exponent pair'' method that was being used could yield [GK].
However, Bombieri and Iwaniec [BI] found a new method that gave $alpha ^=^ 9/56 ^=^ 0.16071 dd$. This method was then developed further by Huxley and Watt [HW] and was used very recently by Watt [Wa] to show that (2.8.1) holds with $alpha ^=^89/560 ^=^0.15892 dd$. .P The Lindelo\*:f hypothesis is the statement that (2.8.1) holds with $alpha =0$. The RH yields a slightly stronger bound [Tit2] .DS 2 .EQ (2.8.2) |Z(t)| ~<=~ exp (c( log ^t) ( log ^log ^t) sup -1 ) .EN .DE for some $c>0$. On the other side, Balasubramanian and Ramachandra [Bala,\|BR] have shown that .DS 2 .EQ (2.8.3) max from {0^<=^t^<=^T} ~ |Z(t) |~>=~ exp left ( {3( log ^T) sup 1/2} over {4( log ^log ^T) sup 1/2} right ) .EN .DE if $T$ is large enough, and, more generally, that if $eta ^>^0$, then for $T^>=^T( eta )$ and $( log ^T) sup eta ^<=^H^<=^T$, we have .DS 2 .EQ (2.8.4) max from {T^<=^t^<=^T+H} ~|Z(t)| ~>=~ exp left ( {3( log ^H) sup 1/2} over {4( log^ log^H) sup 1/2} right ) ^. .EN .DE Montgomery [Mon3] has conjectured that (2.8.3) is close to the real rate of growth of $|Z(t)|$. .P While the data that was collected about large values of $|Z(t)|$ probably does reflect accurately the behavior of the zeta function in these ranges, it does not help in assessing what the true rate of growth of $|Z(t)|$ is. There are two main problems. One is the relatively small number of zeros that were investigated. Since large values of $|Z(t)|$ are very rare, we probably do not even have a good representation of the large values of $|Z(t)|$ for $t^<^ gamma sub n$, $n=10 sup 20$. (This is supported by the results of Section\ 3, where much higher values were found by special methods.) Another problem in using our data to assess the true growth rate of $|Z(t)|$ arises from the slow approach to true asymptotic behavior. As is noted in Section\ 2.10 (see especially Fig.\ 2.10.1), even $log^|Z(t)|$ in the ranges that have been investigated can be rather far from its eventual distribution.
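.P To see how weak (2.8.3) is at the heights considered here, one can simply evaluate its right-hand side, along with the two growth functions compared in the Introduction. The sketch below takes $T = 1.52 times 10 sup 19$ as a stand-in for the height of the $10 sup 20$-th zero; the function name is hypothetical.

```python
import math

def balasubramanian_ramachandra_bound(T):
    # Right-hand side of (2.8.3): exp(3 (log T)^(1/2) / (4 (log log T)^(1/2))).
    log_t = math.log(T)
    return math.exp(3.0 * math.sqrt(log_t) / (4.0 * math.sqrt(math.log(log_t))))

T = 1.52e19  # ASSUMED stand-in for gamma_n with n = 10^20
print(round(balasubramanian_ramachandra_bound(T), 1))  # 12.9
print(round(T ** (1.0 / 6.0)), round(math.log(T) ** 2))  # roughly 1570 and 1950
```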
Furthermore, as was noted in the Introduction, even when one investigates at heights $t^approx^1.5 times 10 sup 19$, it is hard to tell the differences in growth rates between various functions. (The situation is not quite as bleak as might seem from the argument used in the Introduction, since one can use relatively sensitive tools such as ratios of values of a function at different points to estimate its growth rate, but that only helps to a very limited extent.) Note that for $t= gamma sub n$, $n=10 sup 20$, the bound (2.8.3) is only 12.9. .P Before concluding this section, we present some more statistics on the large values of $|Z(t)|$ that were found in the $N=10 sup 19$ and $10 sup 20$ data sets. Altogether 565 values of $n$ were recorded for which $w sub n ^>^250$. The largest is $w sub n ^=^631.7$ for $n^=^10 sup 20 + 13,^704,^916$, for which $delta sub n = 3.1428$. (The maximum of $|Z(t)|$ between $gamma sub n$ and $gamma sub n+1$ is at least 641, and there is no violation of Rosser's rule in the neighborhood of $gamma sub n$.) Of the 565 values, 94 are associated with violations of Rosser's rule. (Of the 28 values of $n$ for which $w sub n ^>^400$, 7 are associated to violations of Rosser's rule.) The smallest value of $delta sub n$ that was found for these 565 values of $n$ was 2.07, and the largest was 4.03. .P There was a fairly substantial correlation between $delta sub n sup 2$ and $w sub n$ among these 565 samples. The ratio $w sub n / delta sub n sup 2$ was in the range (19.47,\|64.74), with a mean of 35.47 and variance 61.32. However, at very large heights one would expect this correlation to diminish, in contrast to the situation for Lehmer's phenomenon (Section\ 2.7). 
In the latter case the GUE theories predict that $delta sub n sup 2$ will occasionally get as small as $n sup -2/3$, so that the influence of the other zeros (likely to be $n sup o(1)$ because of the Lindelo\*:f hypothesis and the separation of the other zeros that is predicted by the GUE) will not affect the size of $Z(t)$ very much. On the other hand, the GUE theories predict that $delta sub n sup 2 ^=^O ( log ^n )$, and since $Z(t)$ is known to get much larger (cf.\ (2.8.3)), this must be due to some longer range imbalances in the locations of the zeros. One model for the distribution of $Z(t)$ (first proposed informally by Montgomery, and worked out in detail by Bombieri and Hejhal [BH, Hej5, Hej6]) predicts that at large heights, the size of $Z(t)$ is determined primarily by long ``amplitude'' waves, which are then slightly modulated by local distributions of zeros. This model predicts that there should be clusters of large values of $|Z(t)|$, and that over very wide ranges, $w sub n$ ought to depend mostly on the ``amplitude'' waves, and not on $delta sub n$. The fact that there is a very strong correlation between the large $w sub n$ and $delta sub n sup 2$ in our data might therefore indicate that we are not seeing the true asymptotic behavior. .H 2 "Moments of $fat zeta bold {(1/2 + it)}$" It is conjectured that for every $lambda^>=^0$, the limit .DS 2 .EQ (2.9.1) lim from {T^->^inf} ~ T sup -1 ( log^T) sup {- lambda sup 2} ~int from 0 to T ~ | Z(t) | sup {2 lambda} ^dt ~=~ c( lambda ) .EN .DE exists, with $c( lambda ) ^>^0$ for all $lambda$. A proof of this conjecture, or even of some much weaker bound, would be very important, since it would prove the Lindelo\*:f conjecture. However, this conjecture is only known to be true for $lambda =0$ with $c(0)=1$ (trivial), $lambda =1$ with $c(1)=1$, and $lambda =2$ with $c(2) = (2 pi sup 2 ) sup -1$ (see the Notes for Chapter\ 7 in [Tit2] for detailed information and references).
No specific values have been conjectured for $c( lambda )$ in general, but under the assumption of the RH, Conrey and Ghosh [CG1] have shown that $c( lambda ) ^>=^c sub 1 ( lambda )$, where .DS 2 .EQ (2.9.2) c sub 1 ( lambda ) ~=~ GAMMA ( 1+ lambda sup 2 ) sup -1 ~ prod from p ^left { left ( 1 - 1 over p right ) sup {lambda sup 2} ~sum from m=0 to inf ~ left ( {GAMMA (m+ lambda )} over {m!^GAMMA ( lambda )} right ) sup 2 ^p sup -m right } ^, .EN .DE and since $c( lambda ) = c sub 1 ( lambda )$ for $lambda =0$ and 1, they suggested that perhaps $c( lambda ) = c sub 1 ( lambda )$ for all $lambda^member^[0,^1]$. Since $c sub 1 (2) = ( 4 pi sup 2 ) sup -1 = c(2)/2$, equality of $c( lambda )$ and $c sub 1 ( lambda )$ is unlikely outside the range $0^<=^lambda ^<=^1$. (There is a mistake on this point in the Notes to Chapter\ 7 of [Tit2].) Conrey and Ghosh [CG3] have shown that the derivatives of $c sub 1 ( lambda )$ and $c( lambda )$ with respect to $lambda$ agree at $lambda =0$ and 1. Also, for $0^<=^lambda ^<^1$, Heath-Brown [HB2] has shown under the assumption of the RH that if $c ( lambda )$ exists, it is not much larger than predicted by the Conrey-Ghosh conjectures. .P One of the purposes of this section is to provide some numerical evidence about possible values of $c( lambda )$. One might expect that if .DS 2 .EQ (2.9.3) r( lambda ,^T,^H) ~=~ H sup -1 ( log ^T) sup {- lambda sup 2}~ int from T to T+H ~ |Z(t)| sup {2 lambda} ^dt ^, .EN .DE then $r( lambda ,^T,^H)^wig^c( lambda )$ as $T^->^inf$, if $H$ grows sufficiently fast with $T$ while $lambda$ is held fixed. Table\ 2.9.1 presents some values of $r( lambda ,^T,^H)$ computed for $T= gamma sub n sub 0$ with $n sub 0 = 10 sup 20 ^+^47,^098,^588$ and $T+H^=^ gamma sub n sub 1$, $n sub 1 = n sub 0 + 10 sup 6$. Each of the $10 sup 6$ gaps between consecutive zeros was divided into 40 subintervals, $Z(t)$ was evaluated at the endpoints of these subintervals, and Simpson's rule was applied to estimate the integral.
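.P The Euler product (2.9.2) converges, since each factor is $1 + O(p sup -2 )$, so $c sub 1 ( lambda )$ can be approximated by truncating it. The sketch below (hypothetical names; the truncation at primes up to $10 sup 4$ and at $m^<=^200$ is an arbitrary choice) computes the coefficients $GAMMA (m+ lambda ) / (m! GAMMA ( lambda ))$ by the recurrence $a sub m+1 = a sub m (m+ lambda )/(m+1)$, $a sub 0 = 1$, which also handles $lambda = 0$.

```python
import math

def primes_up_to(n):
    # Sieve of Eratosthenes.
    is_p = [True] * (n + 1)
    is_p[0] = is_p[1] = False
    for i in range(2, math.isqrt(n) + 1):
        if is_p[i]:
            for j in range(i * i, n + 1, i):
                is_p[j] = False
    return [i for i in range(2, n + 1) if is_p[i]]

def conrey_ghosh_c1(lam, prime_limit=10 ** 4, m_max=200):
    # Truncated Euler product for c_1(lambda) in (2.9.2).
    product = 1.0
    for p in primes_up_to(prime_limit):
        # a = Gamma(m + lam) / (m! Gamma(lam)), p_pow = p^(-m), starting at m = 0
        a, p_pow, inner = 1.0, 1.0, 0.0
        for m in range(m_max + 1):
            inner += a * a * p_pow
            a *= (m + lam) / (m + 1)
            p_pow /= p
        product *= (1.0 - 1.0 / p) ** (lam * lam) * inner
    return product / math.gamma(1.0 + lam * lam)

print(conrey_ghosh_c1(1.0), conrey_ghosh_c1(2.0) * 4 * math.pi ** 2)  # both close to 1
```

These two checks reproduce $c sub 1 (1) = 1$ and $c sub 1 (2) = (4 pi sup 2 ) sup -1$ as stated above.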
Variations on this procedure showed that it produced estimates that were accurate to at least three decimal places (and more for high moments, as Simpson's rule is least accurate for small $lambda$). However, the values in the table, especially for large $lambda$, have to be used with caution because even an interval of $10 sup 6$ zeros around zero number $10 sup 20$ is too small to be truly representative. For example, similar data was obtained for $T= gamma sub n sub 2$ with $n sub 2 = 10 sup 20 + 15,^316,^087$ and $T+H^=^gamma sub n sub 3$, $n sub 3 = n sub 2 + 8 times 10 sup 5$, and also for $T= gamma sub n sub 4$, $n sub 4 = 10 sup 20^-^ 15,^409,^244$, $T+H = gamma sub n sub 5$, $n sub 5 = n sub 4 + 10 sup 6$. For $lambda =1$, the values found there differed by less than 0.5% from those in Table\ 2.9.1, but for $lambda = 2.5$ these values were 1.20 and 0.752 times those in Table\ 2.9.1, respectively. The problem is that high moments are determined largely by the few exceptionally large values of $Z(t)$, and those are very rare. (See the next section for some further evidence of this.) To get a good sample for large $lambda$, one would need to integrate $|Z(t)| sup {2 lambda}$ over much longer intervals. .P The data in Table\ 2.9.1 is reasonably consistent with the Conrey-Ghosh conjectures that $c( lambda ) = c sub 1 ( lambda )$ for $0^<=^lambda ^<=^1$ and that $c( lambda ) = c sub 2 ( lambda )$ for $1^<=^lambda ^<=^2$. Even for $lambda ^=^5/2$, where $c sub 3 ( lambda ) ^=^ 11.802 ^dd ^times^c sub 1 ( lambda )$, the agreement of data with conjecture is very good, as $r( lambda ,^T,^H) / c sub 1 ( lambda ) ^=^11.38 ^dd$. However, given the differences between the empirical data for $lambda =1$ and 2 and the known asymptotic values, it is hard to draw any definitive conclusions. For $lambda =1$, estimates of the second moment of $Z(t)$ that are more precise than (2.9.1) are known.
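.P The Simpson's-rule procedure used for Table\ 2.9.1 (40 subintervals per gap between consecutive zeros) can be sketched as below. It assumes a callable Z and a list of consecutive zeros, with $T$ the first zero and $T+H$ the last, as in (2.9.3); the names are hypothetical, and the example uses a toy function with known zeros in place of $Z(t)$.

```python
import math

def simpson(f, a, b, pieces=40):
    # Composite Simpson's rule on [a, b] with an even number of subintervals.
    if pieces % 2:
        raise ValueError("pieces must be even")
    h = (b - a) / pieces
    s = f(a) + f(b)
    for j in range(1, pieces):
        s += (4.0 if j % 2 else 2.0) * f(a + j * h)
    return s * h / 3.0

def moment_ratio(Z, zeros, lam, pieces=40):
    # r(lambda, T, H) of (2.9.3) with T = zeros[0] and T + H = zeros[-1];
    # the integral is split at the zeros, with one Simpson sum per gap.
    T, H = zeros[0], zeros[-1] - zeros[0]
    f = lambda t: abs(Z(t)) ** (2.0 * lam)
    integral = sum(simpson(f, a, b, pieces) for a, b in zip(zeros, zeros[1:]))
    return integral / (H * math.log(T) ** (lam * lam))

# Toy check: sin(pi (t - e)) has zeros at e, e + 1, ..., and log T = 1 here,
# so r(1, T, H) is just the mean of sin^2, namely 1/2.
zeros = [math.e + j for j in range(11)]
print(round(moment_ratio(lambda t: math.sin(math.pi * (t - math.e)), zeros, 1.0), 9))  # 0.5
```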
They are of the form .DS 2 .EQ (2.9.4) int from 0 to T ~Z(t) sup 2 dt ~=~ T( log ^T -1- log (2 pi ) + 2 c sub 0 ) ~+~ E(T) ^, .EN .DE where $c sub 0$ denotes Euler's constant $(=^0.577215^"...")$, and $|E(T)|^=^O(T sup alpha )$ for various $alpha ^<^1/3$. (The best current value of $alpha$ is $139/429 + o(1)$ as $T^->^inf$, due to Kolesnik [Ko] and in a slightly sharper form to Hafner and Ivi\o'c\(hc' [HI]. Note that $139/429 ^=^ 0.3240 "..."$.) If we let $r sup star ( lambda ,^T,^H)$ be defined similarly to $r( lambda ,^T,^H)$, but with $log^T$ in (2.9.3) replaced by $log^T - log ( 2 pi ) + 2 c sub 0$, we find that for the values of $T$ and $H$ that were used to compute Table\ 2.9.1, $r sup star (1,^T,^H)^=^1.004$, which is closer to the asymptotic value $c(1) =1$ than the value $r(1,^T,^H)^=^0.989$. (The other two sets of values that were considered give $r sup star (1,^T,^H)^=^1.0003$ and 0.9995, respectively.) Thus one of the main problems in using the empirical data is that we do not have good conjectures about asymptotics of moments of $Z(t)$, and that second order terms in those asymptotics are likely to be only slightly smaller than the main terms. (See also Section\ 2.10 on deviations between observed and expected behavior of $Z(t)$.) .P Some data were also obtained on the negative moments of $|Z(t)|$. Table\ 2.9.2 shows some values of .DS 2 .EQ 1 over H ~ int from T to T+H ~|Z(t)| sup {- 2 lambda} ^dt .EN .DE for $T$ and $H$ as in Table\ 2.9.1. (The values for $T= gamma sub n sub 2$, $T+H ^=^gamma sub n sub 3$, were essentially identical.) They were obtained by applying Simpson's rule to the inner 38 subintervals in every gap between consecutive zeros, and approximating $|Z(t)|$ by a linear function on the two outer subintervals.
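.P Since the two normalizations differ only in the logarithmic factor, $r sup star (1,^T,^H)$ is obtained from $r(1,^T,^H)$ by multiplying by $log^T / ( log^T - log (2 pi ) + 2 c sub 0 )$. The Python lines below (function name hypothetical) evaluate this factor at $T ^approx^ 1.52 times 10 sup 19$, the height of the $10 sup 20$-th zero given in the Introduction, which is assumed to be close enough to $gamma sub n sub 0$ for this purpose. Starting from $r(1,^T,^H) = 0.989$ they give about 1.0045, consistent with the value 1.004 quoted above once the rounding of 0.989 is taken into account.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant c_0


def r_star_from_r(r: float, T: float) -> float:
    """Rescale r(1, T, H), normalized by log T, to r*(1, T, H), normalized by
    log T - log(2 pi) + 2 c_0, the mean value of Z(t)^2 implied by (2.9.4)."""
    log_t = math.log(T)
    return r * log_t / (log_t - math.log(2 * math.pi) + 2 * EULER_GAMMA)


T = 1.5202440115920747e19  # assumed height, about that of the 10^20-th zero
print(f"{r_star_from_r(0.989, T):.4f}")
```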
.P Conrey and Ghosh [CG2] have shown (assuming the RH) that .DS 2 .EQ (2.9.5) 1 over M ~ sum from m=1 to M ~ max from {gamma sub m ^<^t^<^gamma sub m+1} ~Z(t) sup 2 ~wig~ 1 over 2 (e sup 2 - 5) ^log ^( gamma sub M /( 2 pi )) .EN .DE as $M^->^inf$. Since $c(1) =1$, this means that on average $Z(t) sup 2$ at its maxima is $1+ 1 over 2 (e sup 2 -7) = 1.1945^"..."$ times the average of $Z(t) sup 2$ over the entire range $0^<^t^<=^gamma sub M$. (This surprisingly small factor of 1.1945... is due to the fact that the values of $Z(t) sup 2$ at the critical points where they achieve their maxima are not weighted by the lengths of the intervals on which the maxima are computed. Large values of $Z(t)$ are usually associated with large gaps between consecutive zeros.) Actual computation over the range from $T= gamma sub n sub 2$ to $T+H = gamma sub n sub 3$ yielded a value of 1.224... instead of the asymptotic value of $1.1945 dd ^$. (The value 1.224... is probably a slight underestimate of the actual ratio, since the actual maxima were not determined; instead, the largest of the values at the 40 evenly spaced points was used.) .P Gonek [Gon1] has shown, again assuming the RH, that .DS 2 .EQ (2.9.6) 1 over M ~ sum from m=1 to M ~ Z( gamma sub m + alpha DELTA ) sup 2 ~ wig~ left ( 1 ^-^ left ( {sin ^pi alpha} over {pi alpha} right ) sup 2 right ) ^log^( gamma sub M / ( 2 pi )) .EN .DE as $M^->^inf$, when $DELTA = 2 pi ( log ( gamma sub M /( 2 pi ))) sup -1$. Computations for $alpha =0.1,^0.2,^"...",^0.9$ and over the zeros numbered $n sub 4$, $n sub 4 +1 ,^"...",^n sub 5 -1$ showed reasonably good agreement, but with the ratio of empirical data to Gonek's asymptotic estimate declining by 4% as $alpha$ goes from 0.1 to 0.9.
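.P The constants in (2.9.5) and (2.9.6) are easy to check numerically. The short Python sketch below (function name hypothetical) confirms that $(e sup 2 - 5)/2$ and $1 + (e sup 2 - 7)/2$ are the same number, 1.1945..., and tabulates the factor $1 - ( sin ^pi alpha / ( pi alpha )) sup 2$ of (2.9.6) for $alpha = 0.1,^"...",^0.9$.

```python
import math

# (2.9.5): mean of the maxima of Z(t)^2 relative to log(gamma_M/(2 pi))
cg_ratio = (math.e ** 2 - 5) / 2
assert math.isclose(cg_ratio, 1 + (math.e ** 2 - 7) / 2)


def gonek_factor(alpha: float) -> float:
    """Factor multiplying log(gamma_M/(2 pi)) on the right of (2.9.6), alpha > 0."""
    x = math.pi * alpha
    return 1 - (math.sin(x) / x) ** 2


print(f"Conrey-Ghosh ratio: {cg_ratio:.4f}")
for k in range(1, 10):
    print(f"alpha = {k / 10:.1f}: {gonek_factor(k / 10):.4f}")
```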
.H 2 "Distribution of values of $fat zeta bold {( 1/2 + it)}$" .P Since .DS 2 .EQ log ^zeta (1/2 + it) ~=~ log ^ | Z (t) | ~+~pi i S(t) ^, .EN .DE it is not surprising that methods that yield the distribution of $S(t)$ should give corresponding results for $log ^| Z(t) |$. In fact, Selberg in unpublished manuscripts studied mean values of $( log ^zeta ( 1/2 + it)) sup h ( log ^zeta (1/2 - it )) sup k$ for nonnegative integers $h$ and $k$, and his results imply, for example, that for rectangles $E$ in $R sup 2$ .DS 2 .EQ (2.10.1) lim from {T^->^inf} ~ 1 over T ^left | ^ left { t^:~ T^<=^t^<=^2T ,~ {log ^zeta (1/2 + it )} over {( 2 sup -1 ^log^log^T) sup 1/2} ^member^E right } ^right | ^=^ ( 2 pi ) sup -1~ {int int} from E ~ e sup {- ( x sup 2 + y sup 2 )/2} dx dy ^,~~~"\0\0\0" .EN .DE so that in particular, for any $alpha ^<^ beta$, .DS 2 .EQ (2.10.2) lim from {T^->^inf} ~ 1 over T ^left | ^ left { t^:~ T^<=^t^<=^2T, ~alpha ^<^ {log^|Z(t)|} over {( 2 sup -1 ^log^log^T ) sup 1/2} ^<^ beta right }^ right | ~=~ (2 pi ) sup -1/2~int from alpha to beta ^e sup {- x sup 2 /2} dx ^.~~~"\0\0\0" .EN .DE Thus the real and imaginary parts of $log ^zeta (1/2 + it )$ behave like independent normal variables with means 0 and variances $( log ^log ^t) /2$. While Selberg's results have not been published, they were known to some mathematicians (see [Hej6,\|Joy1,\|Jut,\|Mon6]), and some extensions of Selberg's results have been obtained by Joyner [Joy1] and Tsang [Ts2]. The weaker result (2.10.2) has been reproved by Laurinchikas [Lau1,\|Lau2,\|Lau3,\|Lau4,\|Lau5]. .P The critical issue is whether the approximation (2.10.2) is accurate even for $T$ fixed and $alpha$ and $beta$ varying over fairly wide ranges. If that is the case, then we are led to expect that something like (2.9.1) holds.
Furthermore, if the approximation is very good even for $alpha$ and $beta$ relatively large (compared to $T$), one would expect that the maximal size of $|Z(t)|$, for $0^<=^t^<=^T$, would be of the order of $exp (( log ^T) sup {1/2 + o(1)} )$, which is conjectured by some to be the true rate of growth of $Z(t)$ (cf. Section\ 2.8). Thus it is of substantial interest to find out more about the tails of the distribution of $log ^|Z(t)|$. .P For $n sub 0 ^<=^n^<=^n sub 1 -1$, $n sub 0 ^=^ 10 sup 12 - 6032$, $n sub 1^=^n sub 0 + 10 sup 6$, each interval $( gamma sub n ,^gamma sub n+1 )$ was partitioned into 40 equal subintervals, $Z(t)$ was evaluated at the endpoints of these subintervals, and a linear approximation to $Z(t)$ between consecutive evaluation points was used to estimate .DS 2 .EQ (2.10.3) b sub {alpha ,^beta} ~=~ 1 over {gamma sub n sub 1 - gamma sub n sub 0} ~ left | ^"{"t^:~ gamma sub n sub 0 ^<=^t^<=^gamma sub n sub 1 , ~~ alpha ^<=^log ^| Z(t) | ^<=^beta "}" ^right | .EN .DE for $beta ^=^ alpha + 1/100$, $alpha^=^ k/100$, $- 1000^<=^k^<^1000$. The mean of this distribution (referred to below as the $N=10 sup 12$ distribution), as derived from the $b sub {alpha ,^beta}$ data, was $5.29 times 10 sup -4$ and the variance was 2.2930. Similar data was obtained for $n sub 2 ^<=^n^<=^n sub 3 -1$, $n sub 2 ^=^ 10 sup 20 + 15,^316,^087$, $n sub 3 ^=^ n sub 2 + 10 sup 6$, and there the mean was $5.20 times 10 sup -4$ and the variance was 2.5657. (This is the $N=10 sup 20$ distribution.) Based on (2.10.2), one would expect mean values of 0, which is quite close to the calculated values, given the errors in the computation and sampling errors. The values for the variances would be expected to be $( log ^log ^T) /2$, where $T$ is the height of the data set, which equals 1.635 and 1.894 for the two data sets, respectively.
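.P The expected variances 1.635 and 1.894 can be reproduced from the indices of the two data sets. In the Python sketch below (function name hypothetical, and not part of the programs used here), $gamma sub n$ is approximated by solving $(T/ 2 pi ) log (T/ 2 pi e) + 7/8 = n$ for $T$ with Newton's method; the left side is the main term of the Riemann-von Mangoldt formula for the number of zeros up to height $T$, and the neglected terms shift $T$ by far too little to affect $log^log^T$ at the precision shown.

```python
import math


def approx_height(n: float) -> float:
    """Approximate gamma_n by solving (T/2pi) log(T/(2 pi e)) + 7/8 = n for T."""
    t = 2 * math.pi * n / math.log(n)  # rough starting value
    for _ in range(100):
        f = t / (2 * math.pi) * math.log(t / (2 * math.pi * math.e)) + 7 / 8 - n
        step = f / (math.log(t / (2 * math.pi)) / (2 * math.pi))  # f / f'
        t -= step
        if abs(step) < 1e-12 * t:
            break
    return t


for n in (10 ** 12 - 6032, 10 ** 20 + 15_316_087):
    T = approx_height(n)
    print(f"n = {n}: T = {T:.6e}, (log log T)/2 = {math.log(math.log(T)) / 2:.3f}")
```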
Since $( log ^log ^T) /2$ is only the asymptotic value and increases very slowly, lower order terms can be expected to be significant, and so the agreement between observed data and theory is reasonably good on this point as well. However, the shapes of the observed distributions of $log ^|Z(t)|$ appear to be quite different from the asymptotic normal distribution. To obtain a good comparison, the two distributions for $N=10 sup 12$ and $10 sup 20$ were each scaled so as to have variance\ 1, and were plotted in Fig.\ 2.10.1 together with the standard normal distribution. We see that while the fit of the $N=10 sup 20$ data is slightly better than that for $N=10 sup 12$, it is not much better. This is in sharp contrast to the fit of the data for $S (t)$, which, as we saw in Section\ 2.4 and Fig.\ 2.4.1, is much better. (Apart from a factor of $1/ pi$, $S(t)$ is the imaginary part of $log ^zeta (1/2 +it)$, while $log ^| Z(t)|$ is its real part.) It might be of some interest to compute second order terms in the expansion of moments of $log ^| Z(t) |$ to see what is responsible for the deviations from the asymptotic behavior that are visible in the data. In view of Goldston's results [Go2] (mentioned in Sections\ 2.4 and 2.6), it seems likely that such higher order terms depend on the pair correlation of zeros, and even on higher order correlations. .P The area between the empirical distribution curve for $N=10 sup 12$ in Fig.\ 2.10.1 and the normal curve is 0.132, while for $N=10 sup 20$ the corresponding area is 0.114. In both cases these areas are much larger than those for the distribution curves for $S(t)$ discussed in Section\ 2.4, which confirms the impression one obtains by comparing Fig.\ 2.4.1 to Fig.\ 2.10.1. .P Table\ 2.10.1 presents fairly extensive data on the moments of $log^|Z(t)|$. The six sets of data summarized in this table were all obtained by choosing $10 sup 6$ random points in an interval of length $1.5 times 10 sup 5$.
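.P The comparisons with the standard normal law made here (the areas quoted above and the moments in Table\ 2.10.1) can be sketched in Python as follows. The functions are hypothetical illustrations, not the programs used for the paper: the first gives the moments of a standard normal variable, the second standardizes a sample to mean 0 and variance 1 and returns its first moments, and the third approximates the area between a histogram density of standardized data and the normal density by the midpoint rule, ignoring any mass outside the histogram range.

```python
import math
from typing import List, Sequence


def normal_moment(k: int) -> int:
    """k-th moment of N(0, 1): 0 for odd k, (k-1)(k-3)...3*1 for even k."""
    return 0 if k % 2 else math.prod(range(k - 1, 0, -2))


def standardized_moments(xs: Sequence[float], kmax: int = 10) -> List[float]:
    """Moments 1..kmax of xs after shifting to mean 0 and scaling to variance 1."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    zs = [(x - mean) / sd for x in xs]
    return [sum(z ** k for z in zs) / n for k in range(1, kmax + 1)]


def area_to_normal(edges: Sequence[float], probs: Sequence[float]) -> float:
    """Midpoint-rule estimate of the integral of |histogram density - normal density|.

    probs[i] is the fraction of the standardized data in [edges[i], edges[i+1]).
    """
    area = 0.0
    for lo, hi, p in zip(edges, edges[1:], probs):
        width, mid = hi - lo, (lo + hi) / 2
        phi = math.exp(-mid * mid / 2) / math.sqrt(2 * math.pi)
        area += abs(p / width - phi) * width
    return area


print([normal_moment(k) for k in range(1, 11)])  # [0, 1, 0, 3, 0, 15, 0, 105, 0, 945]
```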
For $N=10 sup 12$, the sampling interval started near zero number $10 sup 12^-^6,^032$. For $N=10 sup 18 (a)$ and $N=10 sup 18 (b)$, the intervals were the same, starting near zero number $10 sup 18 ^-^ 8,^839$, but the random sequences were different, since different seeds were chosen for the pseudorandom number generator. This was done to show the size of the sampling error. For $N=10 sup 20 (c)$, the starting point was near zero number $10 sup 20 ^-^ 48,^776$, while for $N=10 sup 20 (d)$, it was near zero number $10 sup 20 ^+^15,^316,^087$. The mean and second moment for each data set are shown in the $k=1 sup star$ and $k=2 sup star$ entries, respectively. These were then used to translate and scale the data sets so as to obtain mean equal to 0 and variance equal to 1, for ease of comparison with the standard normal distribution. The $k$-th entry in the table, $1^<=^k^<=^10$, gives the \%$k$-th moment of each scaled data set, and the last column gives the corresponding value for the normal distribution (0 for $k$ odd, $(k-1) ^cdot^ (k-3)^cdot^"..."^cdot^3^cdot^1$ for $k$ even). .P Given that the distribution of $log^|Z(t)|$ differs so much from the expected normal one, one has to treat the data about moments of $|Z(t)|$, for example, with extreme caution, as they may not be very representative of true asymptotic behavior. Furthermore, the general distribution of $Z(t)$ may be even less representative of what happens higher up. .P Figure\ 2.10.2 presents some empirical data on values of $Z(t)$. This figure is based on the values of $Z(t)$ in the three intervals covering $2.8 times 10 sup 6$ zeros that were described in the preceding section. For each interval between consecutive zeros, the function $|Z(t)|$ was approximated on 40 equal-sized subintervals by a linear function, and the total length of the set on which this linear approximation lay in each range $[k-1,^k)$, $k^>=^1$, was computed.
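.P The bookkeeping just described can be sketched in Python as follows (function names hypothetical). Each subinterval contributes, to every range $[k-1,^k)$, the time during which the straight line joining its endpoint values of $|Z(t)|$ lies in that range; the sketch takes the evaluation points and the values of $|Z(t)|$ there as plain lists, and also computes $log^q sub k$ as defined next.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def band_times(ts: Sequence[float], absz: Sequence[float]) -> Dict[int, float]:
    """A_k: time during which the piecewise-linear interpolant of |Z| lies in [k-1, k).

    ts are increasing evaluation points and absz the values of |Z(t)| there.
    """
    A: Dict[int, float] = defaultdict(float)
    for t0, t1, u, v in zip(ts, ts[1:], absz, absz[1:]):
        dt, lo, hi = t1 - t0, min(u, v), max(u, v)
        if hi == lo:  # constant segment: all of its time lies in one range
            A[math.floor(lo) + 1] += dt
            continue
        for k in range(math.floor(lo) + 1, math.floor(hi) + 2):
            overlap = min(hi, k) - max(lo, k - 1)  # part of [lo, hi] inside [k-1, k)
            if overlap > 0:
                A[k] += dt * overlap / (hi - lo)
    return dict(A)


def log_q(A: Dict[int, float]) -> Dict[int, float]:
    """log q_k, where q_k = A_k / sum_j A_j is the fraction of time in [k-1, k)."""
    total = sum(A.values())
    return {k: math.log(a / total) for k, a in sorted(A.items()) if a > 0}


print(band_times([0.0, 1.0], [0.0, 2.0]))  # {1: 0.5, 2: 0.5}
```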
If $A sub k$ denotes the total length of all the subintervals on which the linear approximations were in $[k-1,^k)$, and .DS 2 .EQ q sub k ~=~ {A sub k} over {sum from j=1 to inf ~A sub j} .EN .DE the fraction of time spent there, then the plot in Fig.\ 2.10.2 shows $log^q sub k$. From this graph and other graphs based on the data from each of the three main intervals separately, it appears that for $k$ greater than about 250, the behavior of $log^q sub k$ is dominated by a few large peaks of $|Z(t)|$ (which also account for a large part of the values of high moments of $Z(t)$ dealt with in the previous section). In particular, the segments of the graph in Fig.\ 2.10.2 that shoot up are due to high peaks, with the final region $(k^>=^353)$ due to two peaks where $|Z(t)|$ reaches the neighborhood of 460, and the preceding region of increase in $log^q sub k$ being due to a point where $|Z(t)|$ is around 351. .H 2 "Values of $fat {zeta sup prime} bold {( 1/2 + i fat gamma )}$" Under the assumption of the RH and of a weak consequence of the pair correlation conjecture, namely that for some $tau ^>^0$, there is a constant $B$ such that .DS 2 .EQ (2.11.1) roman {"lim sup"} from {N^->^inf} ~ 1 over N ^left |^"{" n^:~ N^<=^n^<=^2N ,~~delta sub n ^<^c "}" right | ~<=~ B c sup tau .EN .DE holds uniformly for all $c^member^(0,^1)$, Hejhal [Hej6] has shown that for all $alpha ^<^ beta$, .DS 3 .EQ (2.11.2) lim from {N^->^inf} ~1 over N ^left | ^left { n^:~ N^<=^n^<=^2N ,~~ {log left | ^{2 pi Z sup prime ( gamma sub n )} over {log ( gamma sub n ( 2 pi ) sup -1 )} right |} over {( 2 sup -1 ^log^log^N) sup 1/2}~member~( alpha ,^beta ) right }^ right | ~=~ ( 2 pi ) sup -1/2 int from alpha to beta ^e sup {- x sup 2 / 2} dx ^. .EN .DE (Note that under the RH, which we assume throughout this section, $| zeta sup prime ( rho ) |^=^ |Z sup prime ( gamma )|$ for $rho = 1/2 + i gamma$.)
.P As is the case with the values of $Z(t)$, we would like to obtain more information about the tails of the distribution of $Z sup prime ( gamma sub n )$, and in particular about the moments. Let us define .DS 2 .EQ (2.11.3) J sub lambda (T) ~=~ sum from {pile {n above gamma sub n ^<=^T}} ~| Z sup prime ( gamma sub n ) | sup {2 lambda} ^. .EN .DE Then $J sub lambda (T)$ exists for all $lambda ^>=^0$, and if the zeros of the zeta function are all simple (as they are conjectured to be, and as is the case with all of the zeros that have been computed) then $J sub lambda (T)$ also exists for $lambda ^<^0$. The only nontrivial proved asymptotic result is due to Gonek [Gon1], under the assumption of the RH: .DS 2 .EQ J sub 1 (T) ~wig~ T over {24 pi} ~ ( log ^T) sup 4 ~~~roman as ~~~ T^->^inf ^. .EN .DE It is trivial that $J sub 0 (T) ^wig^T over {2 pi}^log^T$ as $T^->^inf$, and it is known (cf. [Tit2; Section\ 14.27]) that $J sub {- 1/2} (T) /T^->^inf$ as $T^->^inf$. Gonek [Gon3] has also shown that (under the RH) $J sub -1 (T)^>=^cT$ for some $c^>^0$. If the limit law (2.11.2) holds fairly well even for small $N$, and the tails of the distribution of $Z sup prime ( gamma sub n )$ are not too large, then we might expect (as was suggested by Hejhal [Hej6] and stated explicitly by Gonek [Gon3]) that $J sub lambda (T)$ is on the order of .DS 2 .EQ (2.11.4) T( log^T ) sup {( lambda +1) sup 2} ~~~roman as ~~~ T^->^inf ^. .EN .DE Furthermore, Gonek [Gon3] has conjectured that .DS 2 .EQ (2.11.5) J sub -1 (T) ~wig~ 3 pi sup -3 T ~~~roman as ~~~T^->^inf ^. .EN .DE .P Approximate values were obtained for $|Z sup prime ( gamma sub n )|$, $n sub 0 ^<=^n^<=^n sub 0 ^+^10 sup 6 -1$, where $n sub 0 ^=^ 10 sup 20 ^+^15,^316,^107$.
Since the behavior of $Z(t)$ is determined primarily by zeros close to $t$ (cf.\ [BH,\|Hej6]), it was assumed that for $t$ near $gamma sub n$, $Z(t)$ is approximated well by .DS 2 .EQ (2.11.6) a ~ prod from j=-20 to 20 ~ (t- gamma sub n+j ) ^, .EN .DE where $a$, representing the influence of zeros far away from $gamma sub n$, is almost a constant, and this led to approximating $Z sup prime ( gamma sub n )$ by .DS 2 .EQ (2.11.7) epsilon sup -1 ^Z( gamma sub n + epsilon ) ~ prod from {pile {j=-20 above j^!=^0}} to 20 ~ {gamma sub n - gamma sub n+j} over {gamma sub n + epsilon - gamma sub n+j} ^, .EN .DE where $epsilon ^=^ ( gamma sub n+1 - gamma sub n ) /40$. Varying the number of terms in the heuristic approximation (2.11.6) as well as varying $epsilon$ suggested that (2.11.7) does produce a good approximation to $Z sup prime ( gamma sub n )$. .P The smallest value of $| Z sup prime ( gamma sub n )|$ that was found was 0.13, while the largest was $2.47 times 10 sup 3$. The values of $log ^|Z sup prime ( gamma sub n )|$ had a mean of 3.35 and a variance of 1.14, in contrast to 1.91 and 1.9, respectively, which are predicted by Hejhal's result (2.11.2). Given the slow rate of growth of these quantities, second order terms in the asymptotic results are likely to be comparatively large, so this difference between expected and observed values is probably not significant. If we let .DS 2 .EQ (2.11.8) v sub n ~=~ ( log ^| Z sup prime ( gamma sub n )| ^-^ m ) / sigma ^, .EN .DE where $m$ is the mean and $sigma$ the standard deviation of our set of $log ^|Z sup prime ( gamma sub n )|$, then Fig.\ 2.11.1 shows a comparison of the distribution of $v sub n$ with the standard normal distribution.
The line is the standard normal density, while the scatterplot represents a histogram of $v sub n$; for each interval $[ alpha ,^beta )$, $alpha ^=^ k/50$, $beta ^=^ alpha + 1/50$, a star is placed at $(x,^y)$, $x^=^(( alpha + beta ) /2-m)/ sigma$, $y ^=^sigma ^ b sub {alpha ,^beta}$, where .DS 2 .EQ (2.11.9) b sub {alpha ,^beta} ~=~ 50 over {10 sup 6} ^left | ^"{" n^:~ log ^| Z sup prime ( gamma sub n )| ^member^[ alpha ,^beta ) "}" right | ^. .EN .DE It is worth noting that the distributions of $log ^| Z (t)|$ and $log ^|Z sup prime ( gamma ) |$ are both supposed to be asymptotically normal, but the convergence appears much faster for $log ^| Z sup prime ( gamma )|$, as is revealed by a comparison of Fig.\ 2.11.1 to Fig.\ 2.10.1. This is true even though the asymptotic normality of $log ^|Z(t)|$ is an unconditional theorem, while that of $log^|Z sup prime ( gamma ) |$ depends on unproved assumptions. .P Table\ 2.11.1 gives the moments of the $v sub n$ and of the asymptotic normal distribution. The entry for $k=1 ,^dd ,^10$ denotes the \%$k$-th moment of $v sub n$ and the normal distribution, while the $k=1 sup star$ and $k=2 sup star$ entries give the first two moments of $log ^|Z sup prime ( gamma sub n ) |$, respectively. Comparison with Table\ 2.10.1 again shows much better agreement between empirical and expected values for $log^|Z sup prime ( gamma sub n ) |$ than for $log^|Z(t)|$. .P Moments of $Z sup prime ( gamma sub n )$ for the $10 sup 6$ values that were computed are shown in Table\ 2.11.2.
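.P The approximation (2.11.7) behind these values can be sketched in Python as follows. Under the heuristic (2.11.6) one has $Z sup prime ( gamma sub n ) ~=~ a prod from {j^!=^0} ( gamma sub n - gamma sub n+j )$ and $Z( gamma sub n + epsilon ) ~=~ a epsilon prod from {j^!=^0} ( gamma sub n + epsilon - gamma sub n+j )$, so the unknown factor $a$ cancels in their ratio. The function name and arguments are hypothetical; the zeros are assumed to be in a list in which indices $n-20$ through $n+20$ exist, and $Z$ is passed in as a Python function. The check at the end uses a toy $Z$ that is exactly such a product, for which the estimate is exact up to rounding.

```python
import math
from typing import Callable, Sequence


def zprime_at_zero(Z: Callable[[float], float], gammas: Sequence[float], n: int,
                   J: int = 20, divisions: int = 40) -> float:
    """Estimate Z'(gammas[n]) from one value Z(gammas[n] + eps) and the zeros
    gammas[n-J], ..., gammas[n+J], assuming Z(t) ~ a * prod_j (t - gammas[n+j])
    near gammas[n] with a nearly constant (the heuristic (2.11.6))."""
    g = gammas[n]
    eps = (gammas[n + 1] - g) / divisions
    ratio = 1.0
    for j in range(-J, J + 1):
        if j != 0:
            h = gammas[n + j]
            ratio *= (g - h) / (g + eps - h)
    return Z(g + eps) * ratio / eps


# Toy check with a Z that is exactly a product over 41 zeros.
toy_zeros = [k + 0.3 * math.sin(k) for k in range(-20, 21)]


def toy_Z(t: float) -> float:
    return 2.0 * math.prod(t - z for z in toy_zeros)


exact = 2.0 * math.prod(toy_zeros[20] - z for i, z in enumerate(toy_zeros) if i != 20)
print(zprime_at_zero(toy_Z, toy_zeros, 20) / exact)
```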
By (2.11.4), one expects that for $M$ relatively large, .DS 2 .EQ (2.11.10) J sub lambda sup star (M,^N) ~=~ 1 over M ~ sum from n=N+1 to N+M ~ | Z sup prime ( gamma sub n ) | sup {2 lambda} .EN .DE ought to be of the order of magnitude of .DS 2 .EQ ( log ^gamma sub N ) sup {( lambda +1) sup 2 -1} ^, .EN .DE while for $lambda =1$ and $-1$ we ought to have the more precise relations .DS 3 .EQ (2.11.11) J sub 1 sup star (M,^N) mark ~wig~ 1 over 12 ~ ( log ^gamma sub N ) sup 3 ^, .EN .sp .EQ (2.11.12) J sub -1 sup star (M,^N) lineup ~wig~ 6 pi sup -2 ~ ( log ^gamma sub N ) sup -1 .EN .DE (the asymptotic relations holding as $M,^N^->^inf$ with $M$ relatively large). Table\ 2.11.2 shows the ratio of empirical to expected values, namely .DS 2 .EQ (2.11.13) r sub lambda ~=~ J sub lambda sup star ( 10 sup 6 ,^n sub 0 -1 ) ( log ^gamma sub n sub 0 ) sup {1- ( lambda +1) sup 2} ^. .EN .DE .P The value for $lambda =1$ is in excellent agreement with (2.11.11) (which is a theorem under the assumption of the RH), while the value for $lambda =-1$ is reasonably consistent with that of (2.11.12). Since (2.11.12) is derived from Gonek's conjecture (2.11.5), this supports the conjecture. .P A theorem announced by Fujii [Fu4] (which assumes the RH) states that .DS 3 .EQ sum from {0^<^gamma ^<=^T} ~ zeta sup prime ( 1/2 + i gamma ) mark ~=~ T over {4 pi} ~ log sup 2 ~ T over {2 pi} ~+~ c sub 0 ~ T over {2 pi} ~ log~ T over {2 pi} .EN .sp .5 .EQ (2.11.14) ~~ .EN .sp .5 .EQ lineup ~+~ c sub 1 ~ T over {2 pi} ~+~ O(T sup {9/10^+^o(1)} ) .EN .DE as $T^->^inf$, where $c sub 0$ and $c sub 1$ are explicit constants. This turns out to be in excellent agreement with the empirical result .DS 2 .EQ (2.11.15) sum from {n=n sub 0} to {n sub 0 +10 sup 6 -1} ~ zeta sup prime (1/2 + i gamma sub n ) ~=~ 2.181 times 10 sup 7 ~+~ i ^8.7 times 10 sup 3 ^. .EN .DE .P The approximate procedure that was used to evaluate $Z sup prime ( gamma sub n )$ can be replaced by a much more rigorous and accurate method.
The algorithm of [OS] that was used to compute $Z(t)$ precomputes a set of values from which $Z(t)$ is obtained by interpolation. However, the main interpolation formula (4.3.15) can be differentiated with respect to $t$, which enables one to compute $Z sup prime (t)$ (and therefore also $zeta sup prime (1/2 + it )$) from the basic data. If such a program were written, it could be used to check the speed of convergence of the distribution of $log ^| zeta sup prime (1/2+ it)|$ to the Gaussian limit that has been shown to hold under the assumption of the RH by Hejhal [Hej6]. .H 2 "Gram points and blocks" \f2Gram's law\f1 is the empirical observation that $Z(t)$ usually changes sign in each \f2Gram interval\f1 $G sub n ^=^[g sub n ,^g sub n+1 )$, $n^>=^-1$. (The Gram points $g sub n$ are defined in Section\ 2.0.) Gram [Gram] observed that it held in the range of values he investigated, but he conjectured that it would fail eventually. The first counterexample occurs for $G sub 125$, and was discovered by Hutchison [Hu]. If Gram's law held universally, the RH would be true. However, it is known that this ``law'' fails infinitely often. On the other hand, it does hold for a large fraction of cases. For $n^<=^1.5 times 10 sup 9$, Gram's law holds 72.79% of the time [LRW2]; among $10 sup 6$ Gram intervals near zero number $10 sup 12$, it holds 70.82% of the time; and among $10 sup 6$ Gram intervals near zero number $10 sup 20$, it holds 68.9% of the time. (Under the GUE and some further assumptions to be discussed later, one might expect that asymptotically, Gram's law would hold 66.3% of the time.) .P One barely plausible reason why Gram's law might hold (and why the RH might hold) is that in the Riemann-Siegel formula for $Z(t)$ (see Eq.\ (4.1.2)) the leading term equals $2(-1) sup n$ at $t=g sub n$. If this term, which is the largest, were truly dominant, then Gram's law and the RH would follow.
We now know this to be false, but there is still a lot of interest in the behavior of $Z(t)$ at Gram points, since sign changes of $Z(t)$ correspond to zeros of the zeta function on the critical line. .P A Gram point $g sub n$ is called \f2good\f1 if $(-1) sup n ^Z(g sub n )^>^0$, and \f2bad\f1 otherwise. A \f2Gram block\f1 is an interval $B sub n ^=^[g sub n ,^g sub n+k )$ such that $g sub n$ and $g sub n+k$ are good Gram points, while $g sub n+1 ,^dd ,^g sub n+k-1$ are bad Gram points. The \f2length\f1 of a Gram block $B sub n ^=^ [g sub n ,^g sub n+k )$ is $k$. The \f2pattern of zeros\f1 in a Gram block $B sub n ^=^[g sub n ,^g sub n+k )$ is the string $a sub 1 ^...^a sub k$, where $a sub i$ denotes the number of zeros of $Z(t)$ in $[g sub n+i-1 ,^g sub n+i )$. Since no Gram interval with more than 4 zeros has yet been found, writing $a sub 1 ^...^a sub k$ without comma separators is unambiguous. (Gram intervals with arbitrarily many zeros almost surely exist, but given the GUE predictions about zeros repelling each other, they are likely to be very rare.) .P The statistics that have been collected on Gram intervals and blocks (as well as on exceptions to Rosser's rule, which are discussed in Section\ 2.13) are subject to errors, not only due to the roundoff problems that have been mentioned before and are discussed extensively in Section\ 4, but also to the fact that even if the computations of $Z(t)$ were exact, Gram points were determined only approximately, so that the determinations of the signs of $Z(g sub n )$ were not certain. No special precautions were taken to deal with this problem (such as checking on the size of the computed value of $Z(g sub n )$) as it was felt that this was unlikely to affect general statistics. .P The computations of [LRW2] of the first $1.5 times 10 sup 9$ zeros found only 6 Gram blocks of length 9, and none of lengths $>=^10$. 
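.P These definitions can be made concrete with a short Python sketch (all names hypothetical). It locates Gram points by solving $theta (g sub n ) = n pi$ with Newton's method, assuming the usual definition of $g sub n$ through the Riemann-Siegel theta function and using its standard asymptotic expansion $theta (t) = (t/2) log (t / 2 pi ) - t/2 - pi /8 + 1/(48t) + 7/(5760 t sup 3 ) + "..."$, and it splits a run of Gram intervals into Gram blocks, given which Gram points are good and how many zeros each interval contains. Double precision suffices only for modest $n$: near $n = 10 sup 20$, $theta (t)$ is about $3 times 10 sup 20$, and its rounding error alone moves the computed $g sub n$ by far more than the spacing of about 0.15 between consecutive Gram points.

```python
import math
from typing import List, Sequence, Tuple


def theta(t: float) -> float:
    """Riemann-Siegel theta function, asymptotic expansion (t well above 2 pi)."""
    return (t / 2 * math.log(t / (2 * math.pi)) - t / 2 - math.pi / 8
            + 1 / (48 * t) + 7 / (5760 * t ** 3))


def theta_prime(t: float) -> float:
    return 0.5 * math.log(t / (2 * math.pi)) - 1 / (48 * t ** 2) - 7 / (1920 * t ** 4)


def gram_point(n: int) -> float:
    """g_n with theta(g_n) = n*pi, n >= -1. Newton's method started to the right
    of the root converges monotonically because theta is convex there."""
    t = 2 * math.pi * (n + 2) + 20.0
    for _ in range(200):
        step = (theta(t) - n * math.pi) / theta_prime(t)
        t -= step
        if abs(step) < 1e-13 * t:
            break
    return t


def gram_blocks(good: Sequence[bool],
                zero_counts: Sequence[int]) -> List[Tuple[int, int, str]]:
    """(start, length k, zero pattern) of each Gram block in a run of Gram intervals.

    good[i] tells whether g_{m+i} is good, i.e. (-1)^(m+i) Z(g_{m+i}) > 0, and
    zero_counts[i] is the number of zeros in [g_{m+i}, g_{m+i+1}).
    """
    if len(zero_counts) != len(good) - 1:
        raise ValueError("need one zero count per Gram interval")
    goods = [i for i, ok in enumerate(good) if ok]
    return [(a, b - a, "".join(map(str, zero_counts[a:b])))
            for a, b in zip(goods, goods[1:])]


print(round(gram_point(0), 4), round(gram_point(1), 4))
print(gram_blocks([True, False, True, True], [2, 0, 1]))  # [(0, 2, '20'), (2, 1, '1')]
```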
In contrast to the [LRW2] findings for the first $1.5 times 10 sup 9$ zeros, the maximal lengths of Gram blocks found during the present computations were 9 for $N=10 sup 12$, 9 for $N=10 sup 14$, 11 for $N=10 sup 16$, 13 for $N=10 sup 18$ (1 case), 12 for $N=10 sup 19$, and 14 for $N=10 sup 20$ (1 case, with zero pattern $0111...13110$). .P Table\ 2.12.1 gives, for several data sets, the fraction of Gram blocks of each length. The $N=1$ and $N=1.4 times 10 sup 9$ data is derived from Table\ 1 of [LRW2], and comes from two sets of $10 sup 8$ Gram intervals each, the first one starting at $g sub 0$, the second at $g sub n$ for $n=1.4 times 10 sup 9$. The $N=10 sup 12$ data is based on only 1,\|590,\|000 Gram intervals. .P The main program did not keep track of Gram blocks according to their pattern of zeros. However, a special study was made of two runs of $10 sup 6$ Gram intervals each, one starting at $g sub n sub 1$, $n sub 1 ^=^ 10 sup 12 - 6,^034$, the other at $g sub n sub 2$, $n sub 2 ^=^10 sup 20 - 42,^780$, which for the remainder of this section will be referred to as the $N=10 sup 12$ and $N=10 sup 20$ data sets, respectively. .P If a Gram block $B sub n ^=^[g sub n ,^g sub n+k )$ contains exactly $k$ zeros (so that it is not associated with a violation of Rosser's rule; see Section\ 2.13), then its zero pattern must be either $211...110$, or $011...112$, or $011...131...110$ (where any number of 1's in the indicated pattern might be missing). Van\ de Lune et\ al. [LRW2] noted in their computations that for a fixed $k$, the first two zero patterns seemed to be much more frequent than the third one, and that the frequencies seemed stable as the height of zeros increased. The new computations, however, show a steady decrease in the frequency with which the third pattern appears. Table\ 2.12.2 shows the actual numbers. The $N=1$ entry is drawn from Table\ 2 of [LRW2], which is based on statistics of 3 sets of $10 sup 8$ Gram intervals each, starting at $g sub 0$, $g sub {7 times 10 sup 8}$, and $g sub {1.4 times 10 sup 9}$.
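.P The three admissible shapes can be recognized mechanically. In the Python sketch below (names hypothetical), each shape is a regular expression over the digit string of the zero pattern, and a second function computes, among blocks of length at least 2 with exactly as many zeros as Gram intervals (optionally restricted to a single length $k$), the fraction whose pattern contains a 3, the kind of proportion reported in Table\ 2.12.2.

```python
import re
from typing import Iterable, Optional

# Zero patterns of a Gram block of length k containing exactly k zeros.
SHAPES = {
    "211...110": re.compile(r"21*0"),
    "011...112": re.compile(r"01*2"),
    "011...131...110": re.compile(r"01*31*0"),
}


def block_shape(pattern: str) -> Optional[str]:
    """Name of the shape matched by a zero pattern, or None if none matches."""
    for name, rx in SHAPES.items():
        if rx.fullmatch(pattern):
            return name
    return None


def fraction_with_three(patterns: Iterable[str], k: Optional[int] = None) -> float:
    """Fraction of blocks of length >= 2 with as many zeros as Gram intervals
    (restricted to length k if given) whose pattern contains a 3."""
    eligible = [p for p in patterns
                if len(p) >= 2 and sum(map(int, p)) == len(p)
                and (k is None or len(p) == k)]
    return sum("3" in p for p in eligible) / len(eligible)


print(block_shape("2110"), block_shape("0112"), block_shape("01310"))
```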
In every entry of Table\ 2.12.2, only Gram blocks of length $k$ with exactly $k$ zeros are considered, and the entry gives the fraction of all such Gram blocks whose zero pattern contains a 3. The decrease in the frequency of the third zero pattern is rather puzzling. The GUE theories suggest that this pattern ought to occur a positive proportion of the time. .P Table\ 2.12.3 presents data on the fraction of Gram intervals that contain a given number of zeros. The $N=1$ and $N= 1.4 times 10 sup 9$ data sets are the same as in Table\ 2.12.1, and these entries come from Table\ 5 of [LRW2]. Note that there were no Gram intervals with $>=^4$ zeros in the $N=10 sup 12$ and $N=10 sup 20$ sets (although such intervals did turn up in other data sets around the $10 sup 20$-th zero, for example). .P The GUE entry in Table\ 2.12.3 comes from assuming that a Gram interval does not differ from any other interval of that length, and so the entry in the table for a given $m$ in the GUE row is the probability that an interval of length 1 (in units of the average spacing between zeros) contains exactly $m$ zeros. Since the averages of $S(t)$ do increase as $t$ increases, it seems reasonable to expect that at large heights the