%  FILE: book.toc

\contentsline {chapter}{\numberline {1}Introduction}{1}
\contentsline {chapter}{\numberline {2}Large sets of zeros: conjectures and statistics}{9}
\contentsline {section}{\numberline {2.1}Notation and definitions}{9}
\contentsline {section}{\numberline {2.2}Validity of the RH and correctness of the computational results}{13}
\contentsline {section}{\numberline {2.3}Eigenvalues of random matrices and zeros}{14}
\contentsline {section}{\numberline {2.4}General distribution of gaps between zeros}{20}
\contentsline {section}{\numberline {2.5}Values of $S(t)$}{24}
\contentsline {section}{\numberline {2.6}Extreme gaps between zeros}{30}
\contentsline {section}{\numberline {2.7}Long and short range correlations between zeros}{34}
\contentsline {section}{\numberline {2.8}Lehmer phenomenon}{40}
\contentsline {section}{\numberline {2.9}Large values of $\zeta (1/2+it)$}{43}
\contentsline {section}{\numberline {2.10}Moments of $\zeta (1/2 + it)$}{46}
\contentsline {section}{\numberline {2.11}Distribution of values of $\zeta ( 1/2 + it)$}{49}
\contentsline {section}{\numberline {2.12}Values of $\zeta ' (1/2 + i \gamma )$}{53}
\contentsline {section}{\numberline {2.13}Gram points and blocks}{57}
\contentsline {section}{\numberline {2.14}Violations of Rosser's rule}{61}
\contentsline {chapter}{\numberline {3}Special points for the zeta function}{65}
\contentsline {section}{\numberline {3.1}Introduction}{65}
\contentsline {section}{\numberline {3.2}Computational results}{66}
\contentsline {section}{\numberline {3.3}Diophantine approximation algorithms and special points}{69}
\contentsline {section}{\numberline {3.4}Possible extensions}{71}
\contentsline {chapter}{\numberline {4}Algorithms and their implementation}{75}
\contentsline {section}{\numberline {4.1}Introduction}{75}
\contentsline {section}{\numberline {4.2}Zero-locating program}{77}
\contentsline {section}{\numberline {4.3}Odlyzko-Sch\"{o}nhage algorithm}{80}
\contentsline {section}{\numberline {4.4}Band-limited function interpolation}{88}
\contentsline {section}{\numberline {4.5}Space and time requirements}{93}
\contentsline {section}{\numberline {4.6}Correctness of computational results}{95}
\contentsline {section}{\numberline {4.7}Possible improvements}{106}
\contentsline {subsection}{\numberline {4.7.1}Faster and more accurate computations}{107}
\contentsline {subsection}{\numberline {4.7.2}Greengard-Rokhlin algorithm}{112}
\contentsline {subsection}{\numberline {4.7.3}Computations of low zeros}{116}

%  FILE: preface.tex

\thispagestyle{empty}
\noindent{\Huge {\bf Preface}}

\vspace{.4in}

This monograph presents the results of a computation of over 175 million
consecutive zeros of the Riemann zeta function near zero number $10^{20}$, as well
as of several other large sets of high zeros,
including some near zero number $2 \times 10^{20}$.
These zeros lie about $10^8$ times higher than previously
calculated large sets of zeros, and 
their computation was made possible by 
a fast
algorithm invented by A. Sch\"{o}nhage and the author.
Although the present implementation of this algorithm
is
not
entirely rigorous
due to incomplete control of roundoff errors,
it appears to be highly accurate as well as fast,
and the results indicate that
all the computed zeros satisfy the Riemann Hypothesis.
Various statistical studies of these zeros are presented.
Some of
them
provide numerical evidence about conjectures that go even beyond the Riemann Hypothesis,
and relate the distribution of zeros of the zeta function
to that of eigenvalues of random matrices studied
extensively in physics.
Other studies compare the observed behavior of the zeta function
to known asymptotic estimates.
The computations described in this book were carried
out
on a Cray X-MP supercomputer.

%  FILE: ch1.tex

\chapter{Introduction}\label{sc1}
\hspace*{\parindent}
The $10^{20}$-th zero of the Riemann zeta function equals
$$1/2 + i~ 1
5202440115920747268.6290299 \dd~ .$$
It and
a few of its nearest neighbors are shown in Table~1.1.
All told, almost 176 million zeros near the $10^{20}$-th zero were computed,
as wee as over 101 million zeros near zero number $2 \times 10^{20}$.
These zeros lie almost $10^8$ times higher than any other large
sets of zeros that had been computed before.
This monograph
reports statistics of these and some other high zeros
and describes the algorithms that made these calculations possible

The Riemann Hypothesis (RH) has been subjected to a series of numerical
investigations, starting with unpublished ones by Riemann.
(See \cite{Ed,Od3}
for a history of these computations.)
The latest result is that the RH is true for the first $1.5 \times 10^9$ zeros
(i.e., all zeros up to height $\preceq  5 \times 10^8$)
\cite{LRW2}.
This computation required about 1500 hours on modern
supercomputers (primarily the Cyber~205).
It only separated the zeros, and did not produce accurate
values for them.
The reason for not obtaining values of zeros was that the investigations in this case were very concerned with establishing the validity of the RH, and for that purpose,
as was explained earlier, it is only necessary to separate the zeros of $Z(t)$.
Several other
large sets of zeros (four sets of roughly $10^5$ zeros in each case,
starting with zeros number $10^{10}$, $10^{11}$, $2 \times 10^{11}$,
and $10^{12}$)
have also been computed accurately \cite{Od2}.
Those computations took several tens of hours on Cray-1 and Cray~X-MP supercomputers, and produced values of the zeros that are accurate
to within about $10^{-8}$.
(The $10^{12}$-th zero equals $1/2 + i \gamma$, with $\gamma \approx 2.7 \times 10^{11}$.)
The purpose of those calculations of zeros was partially to check
the validity of the RH,
but the primary goal (and the reason for obtaining
accurate values of the zeros) was to
obtain data about the
distribution of spacings between zeros of the zeta function
so as to compare them to some recent conjectures.
These conjectures, which are described briefly in Section~\ref{sc2} and which go substantially beyond the RH,
originate in the Montgomery pair correlation conjecture, and relate the
behavior of the zeta function zeros to eigenvalues of random
hermitian
matrices that are used to model energy levels in 
many-particle systems in physics, and to the quantum chaos theories.
Agreement between these conjectures and computed values of zeros might
be taken as providing some support for the Hilbert and P\'{o}lya
conjecture that the RH is true because
the zeros of the zeta function correspond to eigenvalues of some positive
operator.
While all these conjectures are highly
speculative, it seemed worthwhile to test them numerically.
As it turned out, the agreement between conjectures and empirical data was excellent in most cases.
A few features of the data that were initially unexpected were in the end
explained by relating the behavior of the zeros to that of the primes.
However, there were some features of the data that were slightly
counterintuitive (such as a slight excess of
small spacings between consecutive zeros over that predicted by the random matrix theories), and so it seemed desirable to obtain data about even higher
zeros of the zeta function.

All large-scale computations of zeros of the zeta function in the last fifty-odd years (as well as Riemann's own unpublished computations \cite{Ed,Od3,Sie1}) relied on the Riemann-Siegel formula \cite{Ed,Sie1,Gab},
which requires
roughly $t^{1/2}$
operations to compute
$\zeta ( 1/2 + it )$ for $t$ a large positive real number.
Recently a much more efficient algorithm for computing the zeta function
was invented by
A. Sch\"{o}nhage and the author
\cite{Od1,OS}.
It enables one to compute all the approximately $T^{1/2}$
zeros of $\zeta ( 1/2 + it)$ in an interval
$T \le t \le T + T^{1/2}$ in about $T^{1/2}$ steps.
(This algorithm is described in detail in Section~\ref{sc4}.
At this point we only note that the above description assumes that
the RH is satisfied by the zeros between heights $T$ and $T + T^{1/2}$, and
that in addition these
zeros are simple and well separated.
All of these conditions are satisfied in all the ranges that have been investigated.)
The new algorithm has now been implemented, and used to compute the zeros described in this paper.
It turns out to be fast in practice as well as in theory,
and for computing large sets of zeros around the $10^{20}$-th zero was at
least $10^5$ times faster than the straightforward application of the Riemann-Siegel formula.
The computations described in this book took about 2000 hours of (otherwise idle) time on a
Cray~X-MP supercomputer, so they were substantial.
However, without the new algorithm they would have been totally infeasible.

The computations that have been carried out with the
new algorithm of \cite{OS}, and that form the basis for this paper, had several goals.
The first was to test the RH numerically.
If the RH is false, then
counterexamples
are probably more likely to be found
at large heights than closer to the origin, since the behavior of the zeta function
is very constrained at low heights.
As it turned out, no counterexample was found.
Another, more important goal, was to extend the numerical studies of \cite{Od2} by
computing accurate values of large sets of high zeros to provide
additional numerical checks on various conjectures about the zeros,
especially about the frequency of occurrence of small gaps.
If the slightly excessive frequency of small gaps that was observed in \cite{Od2}
were to occur again at greater heights, that would cast doubt on many of the
conjectures that have been made.
The latest computations show excellent agreement with these
conjectures
in almost all the measured statistics.
The
excess of small spacings found in \cite{Od2} is still somewhat
ambiguous, though, as will be described in Section~\ref{sc26}.

Another reason for the computations
of this monograph was to produce
various statistics about the zeta function at large heights.
One advantage of the algorithm of \cite{OS} is that once the main step of the computation
is done, it is easy to compute individual values of the zeta function in the covered ranges, and collect many statistics.
Such statistics can be used to test various conjectures (about mean values of the zeta function, for example) and to judge how fast the zeta function approaches its asymptotic behavior.
Some of the statistics presented in Section~\ref{sc2} show amazingly fast convergence to the
asymptotic behavior, while others are far from it.
It is remarkable that
the most noticeable difference between observed and
expected behavior occurs in the study of the distribution of
values of $\log | \zeta (1/2 + it) | $, which in the ranges
that have been investigated is rather far from the normal
distribution that has been rigorously proved to hold
asymptotically (see Section~\ref{sc211} and Figure~2.11.1).
On the other hand, many of the unproved conjectures 
are supported by the numerical data to a surprising degree
(see, for example, Figures~2.4.5 and 2.12.1).

The main conclusion that can be drawn from the data in this paper
is that in many respects the zeta function reaches its asymptotic behavior
slowly, so that even the neighborhood of the $10^{20}$-th zero does not
represent
what happens much higher.
That this slow convergence is observed is not too surprising.
For example, one important question (see Section~\ref{sc29}) concerns the maximal size of the zeta function on the critical line.
It is known that
$$| \zeta (1/2 + it ) | = O(t^\alpha ) $$
for constants $\alpha$ that are a bit less than 1/6, while on the RH one would have
$$| \zeta ( 1/2 + it ) | \le t^{o(1)} ~~\mbox{as}~~  t \to \infty ~.$$
(This is the Lindel\"{o}f conjecture.)
It would be desirable to produce convincing numerical evidence about 
the precise maximal size of $ \zeta ( 1/2 + it ) $.
However, this is hard to do.
The main difficulty is that
near the $10^{20}$-th zero, one has $t \approx 1.5 \times 10^{19}$,
so that $t^{1/6} \approx 1570$, while $( \log t)^2 \approx 1950$,
so it is even hard to distinguish between these two functions that have
entirely different rates of growth.
(Throughout the paper, $\log x$ denotes the natural logarithm of $x$.)

Some of the data from the present computations might also be useful
in other number theoretic investigations.
For example, the
Stark method \cite{St}
for obtaining
lower
bounds for imaginary quadratic number fields with small class numbers depends on
knowledge of pairs of zeros of the zeta function that are very close together.
(Another method for bounding class numbers, that of Montgomery and Weinberger \cite{MW}, depends on zeros of Dirichlet $L$-functions.)

The final
reason for the computations of this paper was to prove
that the new algorithm of \cite{OS} is of practical use, and not just a
theoretical curiosity.
Since this algorithm is complicated, this was not obvious
to start with, and a large section of this paper is devoted to a description
of its implementation, including various modifications that were made to the basic algorithm described in \cite{OS}.
As it turns out, the algorithm is fast, over $ 10^5 $ times
faster than the older algorithms would have been near the
$10^{20}$-th zero.
Moreover,
work on this implementation has suggested many additional modifications, described
in Section~\ref{sc47}, which can probably speed up the algorithm by
another order of magnitude.

The main sets of zeros
that were computed
are listed in Table~1.2.
The entry for $N=10^{20}$, for example, means that 175,587,726 zeros were
computed, starting with zero number $10^{20} - 30, 769, 710$, and ending with zero number
$10^{20} +  144, 818, 015$, and that all these zeros are of the form
$1/2 + i \gamma$ with $\gamma  \approx 1.5 \times 10^{19}$.
Throughout the paper, references to the $N= 10^{20}$ data set will denote
these 175,587,726 zeros
or some subset of them
and similarly for the $N=10^{19} , \dd ,$ data sets.

The starting points for the large data sets listed in Table~1.2 were chosen to be near zeros of round order (such as $10^{20}$), to be easy to refer to.
It was thought that as far as the distribution of zeros is concerned, these
intervals would behave like random ones.
One can also
concentrate on investigating the behavior of $\zeta (1/2 + i t)$ near those $t$ where the zeta function might be expected to behave in an unusual
fashion (e.g., where it is large).
Some such special values of $t$ were found, and the computations
that were carried out there are listed in Tables~3.1.1 and 3.1.2.
(A full explanation of the entries in these tables is given in Section~\ref{sc3}.)
These computations produced many values of the zeta function and of gaps between
zeros that are current
the largest ones known.

While the computations that are described in this book did yield values of zeta
function zeros at much greater heights than would be feasible with older methods, they do have one
serious defect, namely that they are not rigorous.
The validity of the values for the zeros that have been computed (and also of the assertion that all these zeros satisfy the RH)
depends on
the assumption that substantial
cancellation among the
roundoff errors takes place.
This is due largely to the extremely large sizes of the numbers being handled,
and not so much to the new algorithm,
and is explained in detail in Section~\ref{sc4}.
At this point we only mention that the values of zeros that have been obtained are believed
to be accurate to within $\pm 10^{-6}$ or even better
for $N= 10^{20}$.
This belief is based partially on the expected cancellation of errors in the computation.
The strongest argument for the validity of the computations, however,
comes from
several large sets of zeros
which
were computed twice, in
entirely different ways.
That
the numbers being computed were the same follows only
from deep mathematical analysis, and is not obvious from the numbers being
processed.
The resulting duplicate
values for the zeros agreed to the expected degree,
and this is a
strong argument in favor of the validity of the
computations.
These issues are discussed in greater detail in Section~\ref{sc46}.

The remainder of this monograph is organized into three sections.
Section~\ref{sc2} recalls the basic definitions and conjectures, and then presents
the statistics of the large sets of zeros given in Table~1.2.
Section~\ref{sc2} is organized into subsections on a variety of topics, such as large values of the zeta function,
large and small gaps between consecutive zeros, and many others.

Section~\ref{sc3} is devoted to the zeros listed in Table~3.1.1.
First the statistics of these zeros and of various
properties of the zeta
function in those ranges are presented.
Then some
simultaneous Diophantine approximation algorithms (based on the
Lov\'{a}sz
lattice basis reduction algorithm \cite{LLL})
are described, as well as the
ways in which they have been used to produce the points of Table~3.1.1 where the zeta function was expected
to behave pathologically,
and where it does indeed exhibit unusual behavior.

Section~\ref{sc4} describes
the algorithms and computations on which the results
of this monograph are based.
First the basic algorithm of
\cite{OS} is briefly surveyed, and then various
modifications
to it are described.
(Some are minor, while others, such as the use of band-limited function
interpolation, are much more substantial.)
A discussion of various additional modifications that can be utilized in the future
is included
(such as the replacement of the crucial rational function evaluation algorithm of \cite{OS} by somewhat similar algorithms that have been proposed in the context of
astrophysical and fluid dynamics simulations \cite{GR1}, or ways to 
obtain more rigorous results).
There is also a large subsection on the accuracy and validity of the computations of this paper.

%  FILE: ch2.tex

\chapter{Large sets of zeros: conjectures and statistics}\label{sc2}
\section{Notation and definitions}\label{sc20}
\hspace*{\parindent}
The trivial zeros of the zeta function are $-2, -4, -6, \dd$.
We will consider only the
{\em nontrivial zeros},
which lie in the critical strip $0 < \, \mbox{Re}\, (s) < 1$, and are
customarily denoted by $\rho$.
Since for every nontrivial zero $\rho$, $\overline{\rho}$ is also a zero, we will consider
only zeros $\rho$ with $\mbox{Im} \, ( \rho ) > 0$.
(There are no nontrivial zeros $\rho$ with $\mbox{Im} \, ( \rho ) = 0$.)
We number these zeros $\rho_1 , \rho_2 , \dd$ (counting each according to its
multiplicity) so that $0 < \mbox{Im} \, ( \rho_1 ) \le \mbox{Im} \, ( \rho_2 ) \le \dd$.
All the zeros
that have been computed so far are simple and lie on the critical line,
and so can be written as $\rho_n = \frac{1}{2} + i \gamma_n$, $\gamma_n \in R^+$, with
$\gamma_1 = 14.134725 \dd$, $\gamma_2 = 21.022039 \dd$,
$\gamma_3 = 25.010857 \dd$, etc.
In many definitions throughout the paper we will be tacitly assuming that the RH
holds, as otherwise those definitions might not make sense.

Let $N(t)$ denote the number of zeros $\rho$ with $0 < \mbox{Im} \, ( \rho ) \le t$
(counted according to their multiplicity).
Then it is known unconditionally \cite[Chapter~9.3]{Tit2} that
\beql{eq201}
N(t) = \frac{t}{2 \pi}  \log  \frac{t}{2 \pi e} +
O ( \log  t) ~~~\mbox{as} ~~~t \to \In ~.
\eeq
Therefore $\gamma_n \sim 2 \pi n / ( \log n)$ as $n \to \In$.
Since the zeros become denser as the height increases,
and the average vertical spacing between zeros at height $t$ is asymptotic to
$2 \pi / ( \log ( t/(2 \pi )))$, we define the normalized spacing between
consecutive
zeros
$1/2 + i \gamma_n$ and $1/2 + i \gamma_{n+1}$ to be
\beql{eq202}
\delta_n = ( \gamma_{n+1} - \gamma_n ) 
\frac{\log ( \gamma_n / ( 2 \pi ))}{2 \pi} ~.
\eeq
(Here we are assuming that both zeros satisfy the RH.)
It then follows from \eqn{eq201} that the $\delta_n$ have mean value 1 in the sense that for any
positive integers $N$ and $M$,
\beql{eq203}
\sum_{n=N+1}^{N+M} ~ \delta_n = M + O ( \log (NM)) ~.
\eeq

For $t$ real and positive (as will be the case throughout the paper) we define
\beql{eq204}
\theta (t) = \arg [ \pi^{-it/2} \Gamma ( 1/4 + it/2 ) ] ~,
\eeq
where the argument is defined by continuous variation of $s$ in $\pi^{-s/2} \Gamma ( s/2 )$, starting at $s=1/2$ and going up vertically.
We also let
\beql{eq205}
Z(t) = \exp (i \theta (t)) \zeta ( 1/2 + it ) ~,
\eeq
so that $| Z(t) | = | \zeta ( 1/2 + it ) | $.
Then it follows from the functional equation of the zeta function that $Z(t)$ is real, and sign changes of $Z(t)$ correspond to zeros of
$\zeta (s)$ on the critical line.
Almost all calculations of the zeta function on the critical line compute
calculate $Z(t)$ and not $\zeta (1/2 + it )$ (cf.~Section~\ref{sc4}).
However, it is easy to derive one from the other.

The function $\theta (t)$ is monotonic increasing for $t \ge 7$.
For $n \ge -1$, we define the
$n$-th
{\em Gram point}
$g_n$ to be the unique solution $> 7$ to
\beql{eq206}
\theta ( g_n ) = n \pi ~.
\eeq
We have $g_{-1} = 9.666 \dd$, $g_0 = 17.845 \dd$, etc.
Gram points are about as dense as the zeros of $\zeta (s)$ (see Section~\ref{sc212} for a detailed discussion),
but are much more regularly distributed.
In graphs, by a
{\em Gram point scale} we will refer to labeling Gram point
$g_n$ by $n$ (or $n-M$ for some fixed $M$
as $n$ varies).
For example, Fig.~2.1.1 shows $Z(t)$ near zero number $10^{20}$.
Figure~2.1.3 shows $Z(t)$ over a somewhat wider range.

We let
\beql{eq207}
S(t)= \pi^{-1} \arg \zeta (1/2 + it ) ~,
\eeq
where the argument is defined by continuous variation of $s$ in $\zeta (s)$, starting at $s=2$, going up vertically to $s=2+it$, and then
horizontally to $s=1/2 + it$.
(This definition assumes that there are no zeros $\rho$ with $\mbox{Im} \, ( \rho ) = t$.)
The function $S(t)$ has jump discontinuities at heights equal to zeros.
We have
\beql{eq208}
N(t) = 1 + \pi^{-1} \theta (t) + S(t) ~,
\eeq
so that \eqn{eq201} is a consequence of the asymptotic expansion of $\theta (t)$ (which follows from Stirling's formula \cite{HMF})
\beql{eq209}
\theta (t) = \frac{1}{2} t \log (t/( 2 \pi e )) - \pi /8 + O ( t^{-1} ) ~~~\mbox{as} ~~~ t \to \In
\eeq
and the bound \cite[Theorem~9.4]{Tit2}
\beql{eq2010}
| S(t) | = O ( \log  t) ~~~\mbox{as} ~~~ t \to \In ~.
\eeq
Since $N(t)$ is an integer, and $\theta (t)$ is smooth, \eqn{eq208} shows that $S(t)$ jumps at zeros and decreases at a very steady rate
between zeros.
Figure~2.1.2 shows $S(t)$ over the same range of values of $t$ as
in Fig.~2.1.1,
near zero number $10^{20}$.
This range represents typical behavior of $S(t)$ at that height.
(For rare behavior of $S(t)$, see Fig.~3.2.3.)
The function $S(t)$ is of crucial importance in understanding the distribution
of zeros, and Sections~\ref{sc24}, \ref{sc212}, and \ref{sc213} are devoted largely to its properties.

In comparing empirical distributions of various functions, such as $S(t)$ and $\delta_n$, to their conjectured distributions, we will rely extensively
on comparing the moments of their distributions.
The method of moments has fallen into some disrepute in statistics because of its many faults, such as lack of robustness.
(For example, a single outlier in the data can have a large effect,
something we will see in our data.)
However, there are some good reasons for using it.
One is that it is easy to apply.
A more substantial one is that for many of the statistics of the zeta function, such as those of $S(t)$, or of $Z(t)$, computation of moments is currently essentially the only
known tool
that can be used to obtain
rigorous results.
In such cases moments provide the most direct way of comparing empirical distributions to theoretical results.

If a sequence of probability measures with distribution functions $F_n (x)$ is such that for every $k \ge 0$, the $k$-th moment
$$
\mu_n (k) = \int ~ x^k dF_n (x)
$$
converges to $\mu (k)$ as $n \to \In$, then there is a limiting measure with distribution
$F(x)$ whose $k$-th moment is $\mu (k)$.
Furthermore, if the $\mu (k)$ determine their measure uniquely,
and this measure has distribution function $F(x)$,
then the $F_n (x)$ converge to $F(x)$ (in the weak star sense)
\cite[pp.~342--353]{Bil}.
The $\mu (k)$ determine $F(x)$ uniquely if they do not grow too fast \cite{Bil},
\cite[pp.~227--228]{Fel},
so that the normal
distribution, for example, is characterized by its moments.
On the other hand, the log-normal distribution (distribution of $\exp ( \eta )$, where $\eta$ is normal) is not determined uniquely by its moments \cite{Bil},\cite{Fel}.

The standard normal distribution has the density function
\beql{eq2011}
f(x) = ( 2 \pi )^{- 1/2} ~e^{-x^2 /2} ~,
\eeq
with
mean 0 and variance 1.
Often
we will be dealing with quantities (such as $S(t)$) whose
known asymptotic distributions are normal, but which have variances on the order of
$\log \log N$
(for zeros near zero number $N$).
Since $\log \log N$ grows very slowly, it is to be expected that the observed data will
have somewhat different variances, as second order terms are likely to be substantial.
(For $N=10^{20}$,
$\log \log N = 3.82976 \dd$,
so even an additive constant of 1 in the estimate of the variance makes
a considerable difference.)
On the other hand, it is not too unreasonable to hope that the shape of the distribution should be close to the expected one.
To carry out such a comparison, we will often use
a {\em scaled and translated empirical distributions}.
If $x_1 , \dd , x_n$ are samples (of $\delta_m$, say, or other quantities) with mean $a$ and variance $v= \sigma^2$ (so that $\sigma$ is the
{\em standard deviation},
or
{\em rms value}),
\begin{eqnarray}
\label{eq2012}
a & = & \frac{1}{n} ~ \sum_{j=1}^n ~ x_j ~, \\
\label{eq2013}
v & = & \frac{1}{n} ~ \sum_{j=1}^n ~ (x_j -a )^2 ~,
\end{eqnarray}
then the scaled and translated values will be
\beql{eq2014}
x_j^{\ast} = (x_j -a ) / \sigma ~.
\eeq
The $x_j^{\ast}$ have mean 0 and variance 1.
The tables will usually list the $k$-th moment of $x_j^{\ast}$ in the $k$-th entry, but there will be entries giving the ordinary mean $a$
and ordinary variance $v$ that will be marked $k=1^{\ast}$ and $k=2^{\ast}$,
respectively.
In a few cases where the mean $a$ is extremely small, we will use $x_j^{\ast} = x_j / \sigma$.
(These cases will be easy to distinguish because the scaled 1-st moment will not be 0.)

Throughout
this paper, numbers that have ``$\dd$'' at the end are truncated to the form that is shown, while those without ``$\dd$'' are rounded, but the rounding is sometimes up and sometimes down.
Thus, for example, $\pi$ could be represented as 3.14159..., as 3.14159, or as 3.14160.
The log function will always refer to the natural logarithm.
References to maximal values of a function $f(x)$ will usually
mean the values of $f(x)$ for which $|f(x)| $ is maximal.

Constants such as $n_0 , n_1 , \dd$, will generally be different in
different sections, but will be the same within a section.
\section{Validity of the RH and correctness of the computational results}\label{sc21}
\hspace*{\parindent}
The main question about the validity of the computations described in this paper
has to do with size and cancellation of roundoff errors.
This issue is discussed in detail in Section~\ref{sc4}.
Even if we assume that roundoff errors are small
(as they seem to be), there remains
some further lack of rigor.
The set of zeros corresponding to $N=10^{12}$, for example,
is claimed to consist of exactly the zeros numbered $10^{12} - 6, 032$ to
$10^{12} + 1, 586, 163$.
Those 1,592,196 values are indeed zeros of the zeta function strictly between Gram points
of orders $10^{12} - 6, 034$ and $10^{12} + 1, 586, 162$,
provided all the computational steps were correct.
Given the degree of regularity in the locations of those zeros, a
theorem of Turing (see \cite[Theorem~3.2]{Br5}
for a modified and corrected version)
allows us to conclude, for example,
that the 21-st through the 1,592,176-th zeros in our set are indeed
zeros numbered $10^{12} - 6, 012$ through $10^{12} + 1, 586, 143$.
However, this theorem does not exclude the possibility that, for example, the interval
between Gram points $10^{12} - 6, 034$ and $10^{12} - 6, 014$ might contain
some additional zeros.
Since such additional zeros would violate either Rosser's rule (see Section~\ref{sc213})
or even the RH, they seem unlikely to exist,
and in any event would not affect most of the statistics to a noticeable extent,
and so
were assumed not to exist.

There are some further cases of nonrigorous computations in this paper.
For example, the conjectured distribution of the $\delta_n$ (see Section~\ref{sc22}) is complicated, and (as was done in \cite{Od2})
was computed using Van~Buren's program \cite{VB}, with some
modifications by S.~P. Lloyd and this author.
This program
uses an involved
combination of variational procedures and special
function expansions, and no rigorous error analysis for it is known,
although
it appears to be very accurate
(cf.~\cite{Od2}).

Other examples of nonrigorous computation are presented by various
piecewise linear approximations and other interpolation schemes used in the
following sections.
They are all thought to produce accurate results, but no proofs are available.
\section{Eigenvalues of random matrices and zeros}\label{sc22}
\hspace*{\parindent}
Over the last few decades, an extensive collection of results about eigenvalues of certain types of random
matrices has been obtained by mathematical physicists.
The aim of these investigations was to obtain insight into the distribution of energy levels in heavy nuclei, and recently their results have been applied to studies of
energy levels in other kinds of many-particle systems.
Some of the references for this field are
\cite{Be1,Be2,Be3,BG,BGS1,BGS2,BFFMPW,Meh,Por}.
Not only are there many beautiful and mathematically rigorous results
in this area, but there is also experimental evidence that these results do describe
the behavior of physical systems \cite{HPB}.
(Because of the difficulty of the experiments, the physical data, which was obtained through a major effort over the span of several decades, is
sparse and of poor quality compared to the data that can be obtained for the zeta function.)

The eigenvalue results that will be of greatest interest to us are those of the Gaussian unitary ensemble (GUE), which together with the Gaussian orthogonal ensemble (GOE) and the Gaussian symplectic ensemble (GSE) has been studied extensively.
The GUE consists of $n \times n$ complex Hermitian matrices of the form
$A = (a_{jk} )$, where $a_{jj} = 2^{1/2} \sigma_{jj}$,
$a_{jk} = \sigma_{jk} + i \eta_{jk}$ for $j < k$, and $a_{jk} = \overline{a}_{kj} = \sigma_{kj} - i \eta_{kj}$ for $j > k$,
where the $\sigma_{jk}$ and $\eta_{jk}$ are
independent standard normal variables.
(The GOE consists of real symmetric matrices defined similarly.)
The eigenvalues of these matrices are real, and it is their asymptotic distribution, as $n \to \In$, that is of interest.
If we denote the eigenvalues by $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$, then
we have
the
{\em Wigner semi-circle law}:
if $M(x)$ denotes the expected number of eigenvalues $\le x$, then for all fixed real $x$,
\beql{eq221}
\lim_{n \to \In} n^{-1} M( x \sqrt{n} ) =
\left\{
\begin{array}{cl}
\df{1}{2 \pi} \dis\int_{-2}^x (4-u^2)^{1/2} du ~, & |x| < 2 ~, \\
~~ \\ [-.09in]
0~, & x \le -2 ~, \\
~~ \\ [-.09in]
1~, & x \ge 2 ~.
\end{array}
\right.
\eeq
This distribution law applies to much more general classes of matrices than those of the GUE and
related ensembles.
For the GUE
(and also
for the
GOE and GSE) a further step is
possible in that one can obtain precise information about the distribution
of spacings between consecutive eigenvalues.
The complete distribution of eigenvalues is known, and one can derive many limit laws.
To do that one normalizes the eigenvalues (basically by stretching the distance between consecutive eigenvalues $\lambda < \lambda'$ by a factor of
$(4n - \lambda^2 )^{1/2} / ( 2 \pi ))$
to make the average nearest neighbor spacing equal to 1.
With this normalization, the distribution of eigenvalues looks the same everywhere
(in the limit as $n \to \In$) and one can in principle
determine any desired statistic of the zeros.
(Doing so in practice means evaluating a definite multidimensional integral,
which is often hard, and gives rise to interesting problems.)
For example, if we use $w$ to denote a normalized eigenvalue in the GUE,
then one finds that for any fixed
$0 \le \alpha < \beta < \In$,
\beql{eq222}
\sE ( | \{ w' :~
w < w' , ~~
w' - w \in [ \alpha , \beta ] \} | ) \sim
\int_\alpha^\beta ~ \lt 1 - \lt
\frac{\sin \pi u}{\pi u} \rt^2 \rt du
\eeq
as $n \to \In$, where $\sE (z)$ is the expectation of $z$.
We say that $1 - ( ( \sin \pi u ) / ( \pi u ) )^2$ is the
{\em pair correlation function}
of the GUE.
(The pair correlation functions of the GOE and the GSE are different.)
Equation~\eqn{eq222} shows, for example, that it is rare for GUE eigenvalues to be close together.
If the $w$'s were obtained by choosing $n$ points independently
and uniformly from the interval $[0, n]$ and letting $n \to \In$,
the pair correlation function would be identically 1.
The GUE pair correlation function in the range $0 \le u \le 3$ is drawn as the solid curve in Fig.~2.4.1,
and is far from being a constant.

If $w$ is a normalized eigenvalue of the GUE, we let $w^{(k)}$ denote the $k$-th
smallest normalized eigenvalue of those that are $> w$.
Then it is known that the $k$-th nearest spacings $w^{(k)} - w$ satisfy
a distribution law;
for all $0 \le \alpha < \beta < \In$,
\beql{eq223}
Prob ( w^{(k)} - w \in [ \alpha , \beta ] ) \sim
\int_\alpha^\beta  p( k-1, u) du
\eeq
as $n \to \In$.
The probability densities $p(k, u)$ (referred to as $p_2 (k; u)$
in many publications, such as \cite{CM2,Mch,MdC}, where the subscript 2 denotes the GUE)
are complicated functions defined in terms of linear prolate
spheroidal functions.
For methods of computing them, see \cite{MdC,Od2}.
Graphs of $p(0, u)$ and $p(1, u)$ are given by the solid lines
in Figs.~2.4.4 and 2.4.6, respectively.
Those graphs show the ``rigidity'' of the GUE;
the eigenvalues repel each other and most of the time are close to the expected
distance from their neighbors.
For all $u \ge 0$,
\beql{eq224}
1 - \lt \frac{\sin \pi u}{\pi u} \rt^2 = \sum_{k=0}^\In ~
p(k, u ) ~.
\eeq
We note for future reference that the $p(k, u)$ have the following
Taylor series expansions around 0
\cite{Meh,MdC}:
\begin{eqnarray}
\label{eq225}
p(0, u) & = & \frac{\pi^2}{3} u^2 -
\frac{2 \pi^4}{45} u^4 + \frac{\pi^6}{315} u^6 + \cdots ~, \\
\label{eq226}
p(1~, u) & = & \frac{\pi^6}{4050} u^7 + \cdots , \\
p(2, u) & = & \frac{\pi^{12}}{5358150000} u^{14} +... ~.
\nonumber
\end{eqnarray}

The normalized eigenvalues in the GUE have (in the limit as $n \to \In$) a stationary
distribution.
This means that clusters of eigenvalues have the same distribution no matter where in the
spectrum they are located.
However, this distribution is not Markovian, so that the distribution
of an eigenvalue depends not just on the preceding eigenvalue, but on all previous ones as well.

The basic results
about distribution of GUE eigenvalues are completely rigorous.
However, they do have many gaps.
One of them is that the results are obtained by averaging over the full
ensemble of GUE matrices.
It is conjectured that if one considers a large random GUE matrix,
the distribution of its eigenvalues will be close to that of the
entire ensemble
with high probability.
Although numerical calculations confirm this conjecture, there is no proof of it.
Also, it is thought that entries of the matrix do not have to be of exactly
the form specified above for the GUE result to hold.

The main goal of this monograph (and of the preceding paper \cite{Od2}) is to test the conjecture,
which will be referred to as the
{\em GUE hypothesis},
the {\em GUE theory}, or simply the {\em GUE}, that the zeros of the zeta function
behave like eigenvalues of the GUE.
More precisely, it is conjectured that the $\delta_n$ behave asymptotically like
$w^{(1)} - w$ in the GUE, so that for any $0 \le \alpha < \beta < \In$,
\beql{eq227}
M^{-1} | \{ n :~
N+1 \le n \le N+M ,~~\delta_n \in [ \alpha , \beta ] \} | \sim
\int_\alpha^\beta p(0, u) du
\eeq
as $M, N \to \In$ with $M$ not too small compared with $N$.
Similarly, it is conjectured that
\beql{eq228}
M^{-1} | \{ n :~ N+1 \le n \le N+M ,~~\delta_n + \delta_{n+1} \in 
[ \alpha , \beta ] \} | \sim \int_\alpha^\beta  p(1, u) du ~.
\eeq
More generally, the same reasoning leads one to expect that for any fixed $k$,
the empirical distribution function of $\delta_n , \delta_{n+1} , \dd , \delta_{n+k}$ for $N+1 \le n \le N+M$ approaches the stationary
process that holds for the GUE.

If the GUE
hypothesis
is true, it might be
interpreted as providing some support for the Hilbert and
P\'{o}lya
conjectures \cite{Be2,Be3,Mon1,Od3} which predict that the RH is true because
the zeros of the zeta function correspond to eigenvalues of a positive linear operator.
The argument is that if such an operator exists, its eigenvalues might be similar
to those of a random operator (especially if, as is conjectured for the GUE, most random
operators have similar eigenvalue distributions), and a random linear
operator ought to be the limit of a sequence of random matrices.

If the GUE hypothesis were true, that would also be of
interest in physics, as the zeta function could then be used as a
model of quantum chaos \cite{Be2,Be3}.

The main theoretical support and inspiration for the GUE hypothesis comes from
H. Montgomery's work on the pair correlation function of the zeros of the zeta function.
Under the assumption of the RH, Montgomery showed \cite{Mon1,Mon2} that if we define
\beql{eq229}
F( \alpha , T) = 2 \pi (T \log T)^{-1} \sum_{0 < \gamma \le T \atop 0 < \gamma' \le T}
T^{i \alpha ( \gamma - \gamma' )} 
\frac{4}{4+( \gamma - \gamma' )^2}
\eeq
for $\alpha$ and $T$ real, $T \ge 2$, then
\beql{eq2210}
F( \alpha , T) = (1+ o(1)) T^{- 2 \alpha} \log  T + \alpha + o(1) ~~~\mbox{as}~~~
T \to \In ~,
\eeq
uniformly for $0 \le \alpha \le 1$.
Montgomery also observed that if the primes are distributed sufficiently uniformly
in arithmetic progressions, then
\beql{eq2211}
F( \alpha , T ) =1 + o(1) ~~~~\mbox{as} ~~~~T \to \In
\eeq
uniformly for $\alpha \in [a, b]$, where $1 \le a < b < \In$
are any constants.
If the conjecture \eqn{eq2211}
were true, then one would find that
for any $0 < \alpha < \beta < \In$,
\beql{eq2212}
\begin{array}{c}
N^{-1} | \{ (n, k) :~
1 \le n \le N ,~ k \ge 0 ,~~
\delta_n + \delta_{n+1} + \cdots +
\delta_{n+k} \in [ \alpha , \beta ] \} | \\
~~ \\
\sim \dis\int_\alpha^\beta \lt 1 - \lt
\df{\sin \pi u}{\pi u} \rt^2 \rt du
\end{array}
\eeq
as $N \to \In$.
The relation \eqn{eq2212}
is known as the Montgomery pair correlation conjecture.
It says that the pair correlation of the zeros of the zeta function is the same as that of the GUE.
Since the pair correlations of the GOE and GSE are different,
(indeed, they are even inconsistent with \eqn{eq2210}),
this leads one to expect that the zeros might behave
like eigenvalues of the GUE rather than GOE or GSE.
Therefore the discussion above was concentrated on
the GUE distribution.
(One possible implication of this observation is that the hypothetical Hilbert-P\'{o}lya operator
is likely to be complex.)

Montgomery's hypothetical result \eqn{eq2210} and the conjectures \eqn{eq2211}
and
\linebreak
\eqn{eq2212} are the main theoretical evidence we have in favor of the GUE hypothesis,
and the two conjectures depend on far-reaching assumptions about pseudorandom
behavior of primes.
Some further evidence in favor of the GUE
hypothesis was provided by Ozluk \cite{Oz1}, who showed that if one considers a function
similar to $F( \alpha , T)$, but where one sums over zeros of many
Dirichlet $L$-functions, then under the assumption of the Generalized Riemann
Hypothesis for these $L$-functions, the analog of Montgomery's conjecture \eqn{eq2211}
is true for $1 \le \alpha \le 2$.
Some further slight support for the GUE hypothesis is provided by new results of
Ozluk \cite{Oz2} on zeros of Dirichlet $L$-functions close to the real axis.

Extensive numerical evidence in favor of the GUE hypothesis was presented in \cite{Od2}.
It was based largely on computed values of $\gamma_n$,
with
$1 \le n \le 10^5$ and
$10^{12} +1 \le n \le 10^{12} + 10^5$.
With some slight exceptions (such as the slight excess of small $\delta_n$
that was mentioned in the Introduction)
this evidence was in excellent agreement with the GUE hypothesis, and the degree
of agreement improved dramatically as one went from the first $10^5$ zeros
to those near zero number $10^{12}$.
Some numerical evidence for the pair correlation conjecture for Dirichlet $L$-functions has been obtained
since then by Hejhal \cite{Hej5}.
It involved computations of a few quadratic character $L$-functions for several
moduli at large heights.
Much more extensive data have been obtained by Rumely \cite{Rum}, who computed all
zeros of all $L$-functions to small moduli up to height 2500.
His evidence also supports the GUE hypothesis.

Various theoretical results and conjectures related to the GUE theories and the pair correlation conjecture have been obtained in recent years.
Some of the references are \cite{Be2,Be3,Be4,Fu8,Gal2,Gal3,Gal4,GM,Go1,Go2,Go3,GG,GHB,GM,HB1,Mue2}.
\section{General distribution of gaps between zeros}\label{sc23}
\hspace*{\parindent}
Figure~2.4.1 shows how well the pair correlation conjecture is satisfied.
The solid line is the GUE prediction $y = 1- (( \sin \pi x ) / ( \pi x ))^2$.
The scatterplot is based on about $8 \times 10^6$ zeros
near zero number $10^{20}$.
Let
$$
\begin{array}{l@{~}l@{~}l}
n_1 & = & 10^{20} - 15, 409, 240, \\
n_2 & = & 10^{20} - 13, 366, 460, \\
n_3 & = & 10^{20} - 10, 302, 282, \\
n_4 & = & 10^{20} - 6, 216, 711, \\
n_5 & = & 10^{20} - 42, 778, \\
n_6 & = & 10^{20} + 15, 316, 087, \\
n_7 & = & 10^{20} + 46, 073, 204, \\
n_8 & = & 10^{20} + 47, 098, 588,
\end{array}
$$
and
$$V = \{ n :~ n_i \le n < n_i + 10^6 ~~~\mbox{for some} ~~i, ~~~
1 \le i \le 8 \} ~.
$$
Then for each interval $I = [ \alpha , \beta )$ with $\alpha = k/20$,
$\beta = \alpha + 1/20$, $0 \le k < 60$, a star is placed at the point
$x = ( \alpha + \beta ) /2$, $y = a_{\alpha , \beta}$, where
\beql{eq231}
a_{\alpha , \beta} =
\frac{20}{8 \times 10^6} \left |
 \{ (n, k) :~ n \in V ,~~k \ge 0 , ~~ \delta_n + \cdots + \delta_{n+k} \in 
[ \alpha , \beta ) \} \right | ~.
\eeq
The solid line is the GUE prediction
$y=1-(( \sin \pi x ) / ( \pi x ))^2$.
As can be seen, the agreement between the conjectured and observed values is excellent.

Figure~2.4.2 presents similar data, but this time based on just $10^6$ values of
$n$;
$n_9 \le n < n_9 + 10^6$, $n_9 = 10^{12} - 6, 032$.
A comparison of these two graphs with Figures~1 and 2 of \cite{Od2} is instructive.
Those figures show similar graphs, but based in each case on $10^5$ zeros starting with zeros number 1 and $10^{12} +1$.
The scatterplot of Fig.~2.4.2 is much smoother than that of Fig.~2 of \cite{Od2},
because the former is based on $10^6$ instead of $10^5$ samples, and so the sampling error is smaller.
That same reason explains why the scatterplot of Fig.~2.4.1 looks smoother than that of Fig.~2.4.2.
Even if we make allowances for the different sample sizes, though, it is clear that the agreement between empirical and predicted values improves dramatically from $N=1$ to $N= 10^{12}$, and improves
even more between $N=10^{12}$ and
$N=10^{20}$.
In all cases, the empirical data has more pronounced peaks and troughs than expected, but this effect decreases as the height increases.

Some of the pair correlation function oscillations can be seen even for normalized
spacings that exceed 3.
Figure~2.4.3 shows a graph based, just like Fig.~2.4.1, on $8 \times 10^6$ zeros near zero
number $10^{20}$.
Here, though, the scatterplot was smoothed slightly by applying the lowess
function of \cite{BC}
(an implementation of Cleveland's robust locally weighted regression \cite{Clev}).
The reason for this smoothing is that even with $8 \times 10^6$ zeros,
each of the $a_{\alpha , \beta}$ defined in \eqn{eq231} corresponds to about
$4 \times 10^5$ counts $(n, k)$.
Therefore we can expect random sampling errors about $(4 \times 10^5 )^{1/2}$, which gives a variation of about $1.6 \times 10^{-3}$ in the value of $a_{\alpha , \beta}$.
Given the small variation in the GUE prediction $y=1- (( \sin \pi x ) /( \pi x ))^2$ over the range $3 \le x \le 5$, this random sampling error
produces a confusing picture if the data is not smoothed.
(Another, but slightly less effective way to produce a better picture is to use sampling
intervals larger than 1/20.
The resulting picture is similar to that of Fig.~2.4.3.)

Figure~2.4.3 shows that the empirical pair correlation function, even for $N=10^{20}$, has peaks and troughs that are more pronounced than those of the conjectured distribution,
at least in the range $3 < x < 5$.
This is also true in the range $5 < x < 10$.

Figures~2.4.4 and 2.4.5 show the distribution of the normalized spacings $\delta_n$ for $N=10^{12}$ and $N=10^{20}$,
based on the 1,592,196 and 78,893,234 zeros,
respectively, that have been computed.
Thus, for example, in Fig.~2.4.4 a star is plotted at $x= ( \alpha + \beta ) /2$,
$y= b_{\alpha , \beta }$ for $\alpha = k/20$,
$\beta = \alpha + 1/20$, $0 \le k \le 59$, where
\beql{eq232}
b_{\alpha , \beta} =
\frac{20}{1592195} \left | 
 \{ n :~ 10^{12} - 6032 \le n \le 10^{12} + 1586162,~~ \delta_n \in [ \alpha , \beta ) \} \right | ~.
\eeq
The solid lines are the GUE predictions, $y= p(0, x)$.
Similarly, Figures~2.4.6 and 2.4.7 show the distribution of $\delta_n + \delta_{n+1}$.
(Similar graphs based on the first $10^5$ zeros are contained in \cite{Od2}.)

The graphs show excellent agreement between conjecture and numerical data,
and, as was to be expected, the degree of agreement increases substantially as one goes from $N=1$ to $N= 10^{12}$, and then improves a bit more as one goes to $N=10^{20}$.
That the disagreement is greater for $\delta_n + \delta_{n+1}$ than for $\delta_n$ is to be expected,
given that $S(t)$ is small.
(See Section~\ref{sc24} for a discussion of this.)

A quantitative measure of the agreement between observed and conjectured
distributions is shown in Tables~2.4.1 through 2.4.3, which display moments of
distributions.
For each set of $M$ zeros, $K < n \le K+M$
$(M=1, 592, 196$ for $N=10^{12}$, 78,893,234 for $N=10^{20}$, etc., where $K$ is close to $N$.)
Table~2.4.1 displays
\beql{eq233}
(M-1)^{-1} \sum_{n=K+1}^{K+M-1} ( \delta_n -1 )^k ~,
\eeq
while Table~2.4.2 shows
\beql{eq234}
(M-2)^{-1} \sum_{n=K+1}^{K+M-2}  ( \delta_n + \delta_{n+1} -2)^k ~,
\eeq
in each case for $2 \le k \le 10$.
(The values for $N=1$ are taken from \cite{Od2}.)
Table~2.4.3 shows moments of $\log  \delta_n$, $\delta_n^{-1}$,
and $\delta_n^{-2}$.
The values predicted by the GUE are also shown.

Tables~2.4.1 to 2.4.3 show satisfactory agreement between observed values
and conjectured ones, with the degree of agreement increasing as the height of the
zeros increases.
(The slightly anomalous value for the moment of $\delta_n^{-2}$ for $N=10^{18}$ is due to one extremely small $\delta_n$ that is very unusual and will be
discussed in Sections~\ref{sc25} and \ref{sc27}.)

The Kolmogorov test \cite[Section~30.49]{KS}
yields a method for measuring the agreement between the observed distribution of the $\delta_n$ and the GUE predictions.
If samples $x_1 , \dd , x_n$ are drawn from a distribution with a continuous
cumulative distribution function $F(z)$, let $F_e (z)$ denote the sample
distribution function:
$$F_e (z) = n^{-1} | \{ k :~ 1 \le k \le n ,~~ x_k \le z \} | ~.$$
The
Kolmogorov
statistic is then
\beql{eq235}
D = \dis\sup_z |F_e (z) - F(z) |~.
\eeq
If the $x_i$ are drawn independently from the distribution
characterized by
$F(z)$, then
\cite[Eq.~30.132]{KS}
\beql{eq236}
\lim_{n \to \In} Prob (D > un^{- 1/2} ) = g(u) ~,
\eeq
where
\beql{eq237}
g(u) = 2 \sum_{r=1}^\In (-1)^{r-1} \exp (-2r^2 u^2 ) ~.
\eeq
Table~2.4.4 gives the
Kolmogorov
statistic $D$ for $\delta_n$ and $\delta_n + \delta_{n+1}$ for several
blocks of $10^6$ consecutive values of $n$.
The set denoted by $N=10^{12}$ corresponds to
$n_9 \le n < n_9 + 10^6$;
the ones denoted by $N=10^{20} (a)$,
$N=10^{20} (b)$, and $N= 10^{20} (c)$ start at
$n=n_6$, $n=n_8$ and $n=n_5$,
respectively,
where the $n_i$ were defined at the beginning of this section.
The ``$N=10^{12}$ vs. GUE'' entry, for example, gives the
Kolmogorov
statistic
of the $N=10^{12}$ set when it is compared to the GUE distribution.
For each value of $D$, the ``prob.'' column gives an estimate that this statistic
would arise if the $\delta_n$ ($\delta_n + \delta_{n+1}$, respectively) were drawn independently for each $n$ from the GUE
distribution.
This estimate is obtained by evaluating $g( D \times 1000)$.
The ``$N=10^{20} (a)$ vs. $N=10^{20} (b)$'' row of the table was
obtained by constructing a continuous distribution from the $N=10^{20} (b)$ data and computing
the
Kolmogorov
statistic
for the discrete $N=10^{20} (a)$ data against this continuous distribution.

What is apparent from Table~2.4.4 is that as the height increases,
the empirical distributions of $\delta_n$ and $\delta_n + \delta_{n+1}$ do approach that of the GUE.
When one computes the $D$ statistic for the $\delta_n$ in the 10 blocks of $10^5$ consecutive zeros that are contained in the $N=10^{20} (b)$ set, one obtains
values ranging between 0.002 and 0.0031, which correspond to probabilities between
0.83 to 0.3 of occurring if the $\delta_n$ were drawn from the GUE distribution.
Thus for sets of $10^5$ zeros around zero number $10^{20}$,
it is essentially impossible to distinguish the empirical distribution of the $\delta_n$ from the
expected one.
(For $\delta_n + \delta_{n+1}$, the corresponding $D$ values are 0.0035 and 0.00555, which gives probabilities of 0.17 and 0.004, so the fit here is slightly
worse.)

The comparison of the three different sets of $10^6$ zeros near zero number $10^{20}$ to each other is revealing.
The Kolmogorov
statistics
$D$ are small (especially for $\delta_n$), and indicate that all three sets come from
essentially the same distribution.
What seems to be happening is that at each height, when we examine large sets of zeros, the $\delta_n$ and $\delta_n + \delta_{n+1}$ behave as if they were drawn
independently from some distributions that depend on $t$,
change slowly as $t$ changes, and tend to the GUE distributions as
$t \to \In$.
\section{Values of $S(t)$}\label{sc24}
The upper bound \eqn{eq2010} for $S(t)$ is the best that is known unconditionally.
The Lindel\"{o}f Hypothesis (see Section~\ref{sc28}) implies that
$|S(t)| = o( \log  t)$ as $t \to \In$,
while the RH implies \cite{Tit2} that
\beql{eq241}
|S(t)| = O \lt \frac{\log t}{\log \log t} \rt ~~~\mbox{as} ~~~ t \to \In ~.
\eeq
The true rate of growth is thought to be much smaller.
The best lower bound that has been proved under the RH is that of
Montgomery \cite{Mon3}, who showed
\beql{eq242}
S(t) = \Omega_\pm \lt \lt \frac{\log t}{\log \log t} \rt^{1/2} \rt ~~~\mbox{as} ~~~ t \to \In ~.
\eeq
(The best unconditional bound, due to Tsang \cite{Ts1,Ts2},
replaces the square root in \eqn{eq242} by a cube root.)
Montgomery [Mon3] has conjectured that the quantity on the right side of \eqn{eq242}
represents the correct rate of growth of $S(t)$, and Joyner \cite{Joy2} has presented a
heuristic argument supporting this conjecture.
As we will see in Section~\ref{sc25}, the GUE suggests that $|S(t)| $ might occasionally
get as large as $( \log  t)^{1/2}$, which would contradict the Montgomery conjecture.
In any case, it is thought likely that
\beql{eq243}
|S(t)| \le ( \log  t)^{1/2 + o(1)} ~~~\mbox{as} ~~~ t \to \In ~.
\eeq
Some lower bounds for $S(t+h) - S(t)$ are also known, see \cite{Ts1,Ts2}, for example.

Not only is $S(t)$ small, but its oscillations tend to cancel out.
If we define
\beql{eq244}
S_1 (t) = \int_{t_0}^t  S(u) du ~,
\eeq
then $|S_1 (t) | = O( \log  t)$ unconditionally, and
$|S_1 (t) | = O( ( \log  t) ( \log \log t)^{-2} )$ on the RH \cite{Tit2}.
The true maximal magnitude of $|S_1 (t) | $ is probably again around
$( \log t)^{1/2}$.
(See \cite{Ts1,Ts2} for lower bounds.
The estimate $|S_1 (t) | = o( \log  t)$ is equivalent to the
Lindel\"{o}f Hypothesis,
see Notes to Chapter~13 of \cite{Tit2}.)
Furthermore, if one chooses $t_0$ appropriately,
then one obtains
\beql{eq245}
\int_0^t  S_1 (u) du = O( \log  t) ~~~\mbox{as} ~~~ t \to \In ~.
\eeq
(The same property applies to further iterations of this process.)
In addition,
$$\lim_{T \to \In}  T^{-1} \int_0^T  S_1 (t)^2 dt = c
$$
exists for a constant $c > 0$ (Theorem~14.19 of \cite{Tit2}).

Selberg \cite{Sel2} proved, under the assumption of the RH,
that for every fixed positive integer $k$,
\beql{eq246}
\int_0^T  S(t)^{2k} dt =
\frac{(2k)!}{k! (2 \pi )^2k} T ( \log \log T)^k ( 1 + O( ( \log  \log T)^{-1} ))
\eeq
as $T \to \In$.
Later \cite{Sel3} he proved similar estimates unconditionally, with $( \log  \log  T)^{-1}$ in the remainder
term replaced by $( \log  \log  T)^{-1/2}$.
Although it was apparently not noticed right away, these results imply (unconditionally) that
$S(t)$ is asymptotically normally distributed with mean 0 and variance $2 \pi^2 \log \log t$, so that for
$\alpha < \beta$,
\beql{eq247}
\lim_{T \to \In} T^{-1} \left | 
\left \{ t : 0 \le t \le T , \frac{S(t)}{( 2 \pi^2 \log \log T)^{1/2}}
\in ( \alpha , \beta ) \right \} \right | = (2 \pi )^{-1/2}
\int_\alpha^\beta e^{-x^2 /2} dx ~.
\eeq
For further results on moments and distributions of $S_1 (t)$, $S(t+h) - S(t)$, and related
functions, see
\cite{Fu1,Fu2,Fu3,Fu4,Fu8,GM,Gh1,Gh2,Go2,Joy1,Ts1,Ts2}.
Goldston \cite{Go2} has improved the estimate \eqn{eq246} for $k=1$ by showing, under the assumption of the RH, that
\beql{eq248}
\int_0^T S(t)^2 dt = \frac{T}{2 \pi^2} \log \log T +
\frac{T}{2 \pi^2} \lt c_1 + \int_1^\In
F( \alpha ,T) \alpha^{-2} d \alpha \rt + o(T)
\eeq
as $T \to \In$, where $F( \alpha , T)$ is defined by \eqn{eq229}, and $c_1$ is a constant,
\beql{eq249}
c_1 = c_0 + \sum_{m=2}^\In \sum_{p} \lt - \frac{1}{m} +
\frac{1}{m^2} \rt \frac{1}{p^m} ~,
\eeq
where $c_0 = 0.577 \dd$ is Euler's constant.
(The sign of the $m^{-1}$ term is wrong in \cite{Go2}.)
If Montgomery's pair correlation conjecture \eqn{eq2211} holds, then
\linebreak
$\int_1^\In F( \alpha , T) \alpha^{-2} d \alpha$ is asymptotic to the constant 1, but if his conjecture were to fail,
it is conceivable that the second order term in the asymptotic expansion of $\int_0^T S(t)^2 dt$ might oscillate.

Table~2.5.1 presents data on the moments of $S(t)$.
Statistics were collected on two intervals of the form $( \gamma_n , \gamma_{n+10^6} )$, where
$n=n_1 = 10^{12} - 6, 032$ for the $N=10^{12}$ data,
and $n=n_2 = 10^{20} - 48, 778$ for the $N=10^{20}$ data.
The average values of $S(t)$ and $S(t)^2$ for these sets are given
in the $k=1^{\ast}$ and $k=2^{\ast}$ zeros.
To obtain a good comparison with the asymptotic normal distribution, the other moments were
scaled, so that if we let $\sigma^2$ be the mean value of $S(t)^2$,
then the $k=1, 2, \dd , 8$ entries denote the average values of $( \sigma^{-1} S(t))^k$,
and the $k=|1| , |3| $, and $|5| $ entries the average values of $| \sigma^{-1} S(t) |^k$.
Finally, the last column gives the corresponding values for the standard normal
distribution.
As we can see, the agreement between empirical values and asymptotic ones is reasonably good,
and is somewhat better for $N=10^{20}$ than for $N=10^{12}$.

$S(t)$ has jumps discontinuities by 1 at zeros and decreases monotonically between zeros
with derivative very close to $-1$ (on Gram point scale). Since there is asymptotically
one zero per Gram point, the smallest mean values of $S(t)^{2k}$ for any $k \in Z^+$ that is at all conceivable would be obtained by having a zero exactly halfway
between every two neighboring Gram points.
This would yield a mean value of $S(t)^2$ of 1/6.
The values that are observed, 0.23 for $N=10^{12}$ and 0.26 for $N=10^{20}$,
are not much larger than that.

That the distribution of $S(t)$ is close to the normal one can be seen visually
in Fig.~2.3.1.
This figure is based on determining for what fraction of values of $t \in ( \gamma_{n_2 , \gamma_{n_2} + 10^8} )$ we have $S(t) \in [k/100 , (k+1)/100)$, and then scaling
the resulting histogram by $\sigma$ to produce a graph that can be compared to that of the standard normal distribution.
It is curious that the observed distribution of $S(t)$ is less peaked than the normal one,
whereas in most of the other comparisons the empirical distributions have sharper peaks than expected.
It is especially interesting to compare Fig.~2.5.1 to Fig.~2.11.1,
which compares the distribution of $\log  | Z(t) | $ (up to a constant the harmonic conjugate of $S(t)$) to the normal distribution.
In both cases the limiting distributions are known to be normal (even without assuming the RH), but the observed deviations from normal behavior are different for $S(t)$
and $\log  |Z(t)| $, and are much more pronounced in the latter case.

The area between the two curves in Fig.~2.5.1 is 0.023.
For the corresponding figure using the $N=10^{12}$ data, the area is 0.029.

Since both $S(t)$ and its integral $S_1 (t)$ are small, 
we can expect that $S(t)$ will have many sign changes, 
and several results in this direction have been proved,
the strongest ones being due to Ghosh \cite{GL1} and Mueller \cite{Mue1},
but they are all weak.
For example, Mueller proves that gaps between consecutive zeros of $S(t)$ are $O( \log \log \log t)$.
On the other hand, we know that
$S(t)$ has a limiting normal distribution with variance on the order of $( \log \log t)^{1/2}$ and mean close to 0, and that it cannot vary too widely
(in particular,
except for jump discontinuities at zeros it
is monotone decreasing with derivative
close to
$- \log t$).
Therefore
we might expect that the ratio of the number
of zero crossings of $S(t)$ for $t \in ( \gamma_N , \gamma_{N+M} )$ to $M$ might be
roughly the fraction of $t$ in $ ( \gamma_N, \gamma_{N+M} ) $
for which $|S(t) | \le 1$.
This suggests that on average there ought to be on the order of
$( \log  \log  t)^{-1/2}$ zeros of $S(t)$ per Gram interval.

The number of sign changes of $S(t)$ in the intervals that have been investigated can be
determined easily from the statistics of Gram blocks and of
exceptions to Rosser's rule that have been collected.
When $g_n$ is a good Gram point (see Section~\ref{sc212} for a definition) that is not close to an exception to
Rosser's rule,
and is not a zero of the zeta function, then $S(g_n ) = 0$,
and $S(t)$ changes sign at $g_n$.
We will count this sign change as occurring in the Gram interval $[g_n , g_{n+1} )$.
If $B(n, k)$ is a Gram block that has exactly $k$ zeros, then
an easy accounting shows that $S(t)$ has exactly 2 sign changes in $B(n, k)$.
On the other hand, when $B(n, k)$ is an exception to Rosser's rule
(see Section~\ref{sc213} for definitions),
and $[g_m, g_{m+r} )$ is the smallest union of Gram blocks that contains
both the exception and the excess zeros, then a similar
accounting shows that $[g_m , g_{m+r} )$ contains exactly 2 sign changes.
Thus if Gram's law (see Section~\ref{sc212}) held universally,
we would have an average of 2 sign changes of $S(t)$ for every zero of $zeta (s)$.
Departures from Gram's law lower this average.
Table~2.5.2 shows the computed averages for the different data sets.
There is a steady decrease in the average, but it is slow.
Since the argument in the preceding paragraph suggests a rate of decrease of
$( \log  \log  t)^{-1/2}$, this is not surprising.

For every exception $B(n, k)$ to Rosser's rule
there is a $t$ nearby with
$|S(t)| \ge 2$ (and even $|S(t)| > 2$, if zeros do not coincide with Gram points,
as seems likely).
Statistics about these large values of $S(t)$ were collected during investigation of exceptions to Rosser's rule.
Large values of $S(t)$ are of special interest because it is only when $S(t)$
is large that unusual behavior of the zeta function
can take place.
Locally extreme values of $S(t)$ occur at zeros.
(Each zero has
associated to it two values of $S(t)$,
the limits of $S(t)$ as $t$ approaches the zero from the right or the left.)
Table~2.5.3 shows the values of $S(t)$ for which $|S(t)| $ is largest in absolute
value, as well as the number of zeros at which $|S(t)| > 2.3$ divided by the number of exceptions to Rosser's rule.
The largest value of $|S(t)| $ that was found in these computations is 2.7916, while among the
first $1.5 \times 10^9$ zeros the largest such value is
2.3137 \cite{LRW2}.
(A point $t$ at which $|S(t)| = 2.8747$ was found later in the computations described in Section~\ref{sc3}.)
Earlier computations established that $|S(t)| < 1$ for $7 < t \le 280$, and $|S(t)| < 2$ for $7 < t \le 6.8 \times 10^6$.

The values of $S_1 (t)$ were investigated in the two intervals $( \gamma_n , \gamma_{n+10^6} )$, where
$n = 10^{12} - 6, 032$ (for the $N=10^{12}$ set) and $n = 10^{20} - 48, 778$
(for $N=10^{20}$).
The values of $S_1 ( \gamma_n )$ were
chosen to make

\beql{eq2410}
\int_{\gamma_n}^{\gamma_m}  S_1 (t) dt = 0 
\eeq
for $m=n+ 10^6$.
The data that were obtained are summarized in Table~2.5.4;
the mean of $S_1 (t)^4$, for example, refers to
$$\frac{1}{\gamma_m - \gamma_n}  \int_{\gamma_n}^{\gamma_m}  S_1 (t)^4 dt ~.
$$
In addition to the uncertain choice of $S_1 ( \gamma_n )$, there were additional
problems in these computations due to the accumulating errors from the
uncertainties in the values of zeros and $S(t)$.
Values computed over shorter intervals suggest that the mean values in Table~2.5.4 are accurate.
The entry for sign changes of $S_1 (t)$ refers to the number of sign changes
per Gram interval.
This figure appears to be moderately accurate.
Changing the initial value of $S_1 ( \gamma_n )$ by
$\pm 10^{-4}$ varied the number of computed sign changes of $S_1 (t)$ for the
$N=10^{20}$ interval only between 73799 and 74089.
\section{Extreme gaps between zeros}\label{sc25}
\hspace*{\parindent}
In its weakest form, the GUE hypothesis predicts only that \eqn{eq227} holds for all
$0 \le \alpha < \beta < \In$, and so it says nothing about the
existence of a small number $(o(M)$, say) of large or small $\delta_n$.
A double zero of the zeta function, giving $\delta_n =0$, would not by itself
contradict this weak hypothesis.
On the other hand, it is known (cf.~\eqn{eq223} and \eqn{eq225}) that in the GUE,
\beql{eq251}
Prob ( \delta_n \le x) = \frac{\pi^2}{9} x^3 - \frac{2 \pi^4}{225} x^5 +
\frac{\pi^6}{2205} x^7 + \cdots~ ,
\eeq
so very small $\delta_n$ (roughly $o(M^{-1}/3 )$ among $M$ samples) are
rare in the GUE, and a similar result holds for large $\delta_n$.
A strong form of the GUE hypothesis would predict that even extreme values of $\delta_n$ (and $\delta_n + \delta_{n+1}$) for the zeta function would behave
as in the GUE model.

Given the constraints on $S(t)$ described in Section~\ref{sc24}, we can expect that even if the strong
form of the GUE hypothesis holds, it only applies to the zeta function at large heights, and that the lower the region under investigation, the fewer extreme values of $\delta_n$ or
$\delta_n + \delta_{n+1}$ we will find.
This is clear for large values of $\delta_n$ and $\delta_n + \delta_{n+1}$, as these clearly
correspond to large values of $| S(t) | $.
It is also true for small values of $\delta_n$ and $\delta_n + \delta_{n+1}$, though, since several zeros clustered close together again force $|S(t)| $ to be large.

What was observed in \cite{Od2} in a comparison of the first $10^5$ zeros to $10^5$ zeros starting with zero number $10^{12}$ is that the above predictions
were largely satisfied by the data.
In general, there was a deficiency of extreme values of $\delta_n$ and
$\delta_n + \delta_{n+1}$
(compared to the GUE prediction),
but this deficiency declined as one considered the higher zeros.
There was, however, one observation that went counter to expectations.
The number of small $\delta_n$ that were observed at large heights
was larger than predicted by the GUE theory.
This excess was not large, but it was also observed in the data for
$10^5$ zeros starting with zero number $2 \times 10^{11}$, as well as by some data
based on the first $1.5 \times 10^9$ zeros.
This excess of small spacings was very counterintuitive, and so gave rise
to some suspicions about the validity of the GUE hypothesis.

Table~2.6.1 shows the extremal values of $\delta_n$ and $\delta_n + \delta_{n+1}$ that were found in each data set.
(The number of zeros in each data set is given in Table~1.2.)
The last column in Table~2.6.1 gives the probability that the minimal
$\delta_n$ would not exceed the values in the second column if all
the $\delta_n$ in the data set were drawn independently from the GUE
distribution.
From \eqn{eq251}, we see that the probability that the smallest $\delta_n$ out of $M$ that are drawn from the GUE satisfies $\delta_n \le x$ is
about
\beql{eq252}
1 - \lt 1 - \frac{\pi^2}{9} x^3 \rt^M \sim
1 - \exp ( - \pi^2 x^3 M/9 ) ~.
\eeq
This approximation was used to compute the last column of Table~2.6.1.
We can see that most of the entries in that column are high (although
not too high, which would indicate a severe deficiency of small spacings),
while those for $N=10^{18}$ (where $\delta_n = 0.001124$ for $n = 10^{18} + 12, 376, 780$, a case that will be discussed in sections~2.7 and 4.5) and for $N = 10^{19}$ (where $\delta_n = 0.000897$ for $n = 10^{19} + 15, 987, 196$ is the smallest $\delta_n$
that was found in our computations)
are extremely low.
The smallest value of $\delta_n$ that is known
is $\delta_n = 0.000310$ for $n = 1, 048, 449, 114$ (found by
van~de~Lune et~al. \cite{LRW2}),
and the probability of such a small spacing occurring among $1.5 \times 10^9$ samples drawn from the GUE is only 0.048.
Thus the extremely small values of the $\delta_n$ do appear to be somewhat
more frequent than expected.
(Some more evidence pointing to this conclusion is presented in Section~\ref{sc3}.)

When we next consider small, but slightly larger spacings, we find no evidence of an excess of small spacings.
Table~2.6.2 shows the number of $\delta_n \le 1/20$ and $\le 1/10$ observed in each set
(given in terms of cases per million zeros to make comparisons easier).
If we consider the $N=10^{19}$ entry for $\delta_n \le 1/20$, for example,
we see that we are dealing with 2353 cases altogether, so a normal
sampling error might be around 50, which is about 2\%.
Thus the 140.5 figure in the table is consistent with the 136.8 expected for the GUE.

Still another way to judge whether there is any anomaly in the distribution of the
$\delta_n$ or the $\delta_n + \delta_{n+1}$ is to
use
the quantile-quantile $(q-q)$ plots to compare
the observed distributions to those of the GUE.
Given a sample $x_1 , \dd , x_n$, and a continuous cumulative distribution
function $F(z)$ for some distribution, the $q-q$ plot is obtained by plotting
$x_{(j)}$ against $q_j$, where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ are the $x_i$ sorted in increasing order, and the $q_j$ are the theoretical
quantiles defined by $F(q_j ) = (j-1/2) /n$ \cite{CCKT}.
The $q-q$ plot is a sensitive method of
detecting differences among distributions.
In particular, while it does show the outliers that are far away from the expected position, it makes it possible to disregard them
and concentrate on the main part of the distribution curve.
If the $x_j$ are drawn from the distribution corresponding to $F(z)$, and the sample size $n$ is large, the $q-q$ plot will be close to the straight line $y=x$.
In all our $q-q$ plots, 
straight lines $y=x$ are drawn to facilitate comparisons.
(By the standards of typical statistical investigations, the sample sizes we deal with are very large,
and the degree of agreement between conjecture and 
numerical evidence is very good, so
one has to look at minute deviations.)

The $q-q$ plots of \cite{Od2} that showed the distribution of small $\delta_n$
indicated a deficiency of small $\delta_n$ for $N=1$, and a slight excess for
$N=2 \times 10^{11}$ and $N= 10^{12}$.
Those plots were each based on $10^5$ values of $\delta_n$.
When the new, more extensive data for $N=10^{12}$ were obtained, the resulting
$q-q$ plot was similar to that of Fig.~2.6.1, and did not behave like the plot in Fig.~8
of \cite{Od2} (which was based on only $10^5$ zeros).
Figures~2.6.1 and 2.6.2 show $q-q$ plots of $\delta_n$ drawn from two
disjoint sets of $10^6$ zeros near zero number $10^{20}$.
While the plot of Fig.~2.6.1 might indicate a slight excess of small
spacings (those in (0.02,0.04), roughly), and a slight deficiency of slightly larger spacings (where the scatterplot lies above the straight line),
Fig.~2.6.2 indicates almost perfect agreement between theory and experiment.
Figure~2.6.2 is not completely representative of zeros in the $N=10^{20}$ sets, since it was the one of several $q-q$ plots based on disjoint
sets of $10^6$ zeros that gave the best agreement.
Figure~2.6.1 is more typical in this respect.

Figures~2.6.1 and 2.6.2 provide only a little, if any,
support to the theory that there is an
excess of small spacings among the zeros.
Some further support can be found, however, if we combine all the data from the $N=10^8$, $10^{19}$, and $10^{20}$ data sets, which contain $112, 314, 006$ zeros, and yield 112,314,003 values of $\delta_n$.
The resulting $q-q$ plot, shown in Fig.~2.6.3, does indicate a slight excess of small
$\delta_n$ (the two outliers close to the bottom of the graph are the unusually small $\delta_n$ that
are minimal in the $N=10^{18}$ and $10^{19}$ data sets), but the evidence is not conclusive.

When we consider the other extremal values of $\delta_n$ and $\delta_n + \delta_{n+1}$, the evidence is in much better agreement with expectation.
The counts in Table~2.6.2 show that the numbers of small $\delta_n + \delta_{n+1}$, large $\delta_n$, and large $\delta_n + \delta_{n+1}$ are all smaller than predicted by the GUE theory, but increasing towards that prediction.
The $q-q$ plots of Figures~2.6.4 through 2.6.7 also support this impression;
there are too few extreme values in general, but the deficiency is smaller for $N=10^{20}$ than for $N=10^{12}$.

Because of \eqn{eq226}, one can expect that among the values of $\delta_n + \delta_{n+1}$ drawn from the GUE, the probability of the minimal value being $\le x$ is about
$$1- \exp ( - \pi^6 x^8 M/32400 ) ~.$$
The minimal value of $\delta_n + \delta_{n+1}$ of 0.1124 in the $N=10^{20}$ data set would then occur with a probability of 0.06 in the GUE, while the corresponding
probabilities for the $N=10^{12}$, $10^{14}$, $10^{16}$, $10^{18}$, and $10^{19}$ data sets are
0.93, 0.78, 0.25, 0.27, and 0.60, respectively.
Thus the only one of these figures that might seem unusually small is that
for the minimal $\delta + \delta_{n+1}$ for $N=10^{20}$.

The maximal values of $\delta_n$ and $\delta_n + \delta_{n+1}$ recorded
in Table~2.6.1 are all somewhat smaller than what the GUE predicts,
which is not too surprising given the bounds known to hold for $S(t)$ and
$S_1 (t)$.
For very large spacings in the GUE, des~Cloizeaux and Mehta \cite{CM2} have proved
that
\beql{eq253}
\log  p(0, t) \sim - \pi^2 t^2 /8 ~~~\mbox{as} ~~~ t \to \In ~,
\eeq
which suggests that
\beql{eq254}
\max_{N+1 \le n \le N+M}  \delta_n \sim \pi^{-1} ( 8 \log  M)^{1/2}
\eeq
as $N, M \to \In$ with $M$ reasonably large compared to $N$.
This is larger by about a $( \log  \log M)^{1/2}$ factor than the conjecture
\eqn{eq242}
of Montgomery
allows.
Our data are too limited to
decide
whether that conjecture is right.

Values of $\delta_n$ and $\delta_n + \delta_{n+1}$ larger than those of Table~2.6.1 have been found in other computations, and are described in Section~\ref{sc31}.
In particular, the largest known values of $\delta_n$ and of $\delta_n + \delta_{n+1}$ are 5.1454 and 6.0165, respectively.

Even on the assumption of the RH, it is only known that $\delta_n \le 0.5172$ and $\delta_n \ge 2.337$ each occurs infinitely often [CGG1], and $\delta_n \ge 2.68$ occurs infinitely often on the assumption of the Generalized Riemann Hypothesis for Dirichlet
$L$-functions (or at least of a Generalized
Lindel\"{o}f
Hypothesis) \cite{CGG2}.
On the assumption of the RH, it is also known that $\delta_n < 0.77$ and $\delta_n > 1.33$ each holds for a positive proportion of $n$ \cite{CGGGH}.
The GUE predicts that $\delta_n < \epsilon$ and $\delta_n > \epsilon^{-1}$ should each hold
for a positive proportion of $n$ for every fixed
$\epsilon > 0$.
If we could prove that $\delta_n < 1/4$ holds for
infinitely many $n$, we could obtain effective bounds for class numbers of
imaginary quadratic number fields \cite{MW}.
The GUE hypothesis predicts that $\delta_n < 1/4$ for 1.6\% of $n$'s, and this is very close to what is observed in numerical data.
(For $\delta_n < 1/2$ the corresponding figure is 11.3\%.)
\section{Long and short range correlations between zeros}\label{sc26}
\hspace*{\parindent}
The distribution of the eigenvalue
spacings in the GUE
is stationary.
What this means is that for any $k$, the frequency with which $( \delta_n$,
$\delta_{n+1} , \dd , \delta_{n+k} ) \in Q$ for any measurable subset $Q \subseteq \RR^{k+1}$ does not depend on the range of $n$, so that the distribution eigenvalue spacings looks every place the same.
On the other hand, this distribution is not Markovian, so that the distribution
of $\delta_{n+1}$ does not depend just on that of $\delta_n$.
Instead, $\delta_{n+1}$ is correlated to all the neighboring $\delta_n$,
$\delta_{n-1} , \dd$, as well as $\delta_{n+2}$, $\delta_{n+3} , \dd$.
In the limit, that also should be true for the zeros of the zeta function.
However, given the slow growth rate of $S(t)$, one cannot expect GUE
behavior from joint distributions of
$\delta_n , \delta_{n+1} , \dd , \delta_{n+k}$ if $k$ is
large.
Already the data of sections~2.3 and 2.5 show that the behavior of $\delta_n$ is much closer to the GUE prediction than that of $\delta_n + \delta_{n+1}$.
That was the main reason for not investigating $\delta_n + \delta_{n+1} + \delta_{n+2}$ and higher order spacings.

When we investigate long range correlations among the zeros of the zeta function,
we find phenomena connected not to the GUE, but to the
distribution of primes.
For example, if we let the autocovariances of a set of $\delta_n$ be defined by
\beql{eq261}
c_k = c_k (H, M) = \frac{1}{M}  \sum_{n=H+1}^{H+M}
( \delta_n -1) ( \delta_{n+k} -1 ) ~,
\eeq
then it has been conjectured by F.~J. Dyson (unpublished) that in the GUE,
\beql{eq262}
c_k \approx  - \frac{1}{2 \pi^2 k^2}
\eeq
for $k > 0$, with the $approx$ indicating some degree of approximation,
not necessarily asymptotic equality as $N, M \to \In$.
This result has not been proved for the GUE, but it is intuitively appealing for both the GUE
and the zeros of the zeta function, since it says in effect that a large spacing would lead to
smaller spacings nearby (and vice versa), and that this effect would diminish as one
considered spacings further and further apart.

What was observed in \cite{Od2} for the $\delta_n$ was quite different from the conjecture \eqn{eq262}.
Additional data based on the new computations are presented in
Table~2.7.1.
The $N=1$ entries come from the \cite{Od2} computations, and have $H=0$, $M=10^5$.
The $N=10^{12}$ and $10^{20}$ entries come from the new computations,
and both have $M=10^6$,
with $H = 10^{12} - 6, 032$ for the $N=10^{12}$ column and $H=10^{20} - 48, 776$ for the $N=10^{20}$ column.
(A comparison of the $N=10^{12}$ entries here with those in Table~6 of \cite{Od2}, which are based on 1/10 as many zeros indicates
the size of the sampling errors.)
For small $k$, the data in this table support Dyson's conjecture
\eqn{eq262}.
For higher sets of zeros,
the agreement with \eqn{eq262} extends to slightly higher values of $k$.
However, for large $k$, we see totally different behavior.
If $\delta_n$ and $\delta_{n+k}$ were independent, then, since their mean value
is 1 and variance is about 1/6, we would expect a sum of $10^6$ terms of the form
$( \delta_n -1) ( \delta_{n+k} -1 )$ (for $k > 0$) to be about
$10^{6/2} / 6 \approx 170$, and this would correspond to a value of $c_k$ of
$1.7 \times 10^{-4}$.
The values in Table~2.7.1 for $9,980 \le k \le 10,000$ are usually much larger than that, which shows that there are strong long range
correlation between the $\delta_n$.
The pattern of signs of the $c_k$ also shows the nonrandom
nature of the $c_k$.
The $c_k$ are occasionally positive, and occasionally negative, indicating that for some $k$, a large $\delta_n$ tends to be associated with large $\delta_{n+k}$, while for other $k$ it tends to be associated with small $\delta_{n+k}$.

An explanation for the long range dependencies among the $\delta_n$ was proposed in \cite{Od2}.
It implies that the observed correlations come from primes through
formulas such as that of Landau \cite{Lan1}, which says that for any fixed $y > 0$, as
$N \to \In$ we have
\beql{eq263}
\sum_{n=1}^N  e^{i \gamma_n y} =
\left \{
\begin{array}{ll}
- \df{\gamma_N}{2 \pi} e^{-y/2} \log p + O(e^{y/2} \log N) &
\mbox{if}~~y = \log p^m ~, \\
~~ \\ [-.08in]
O(e^{-y/2} \log N) &
\mbox{if}~~ y \neq \log p^m~,
\end{array}
\right.
\eeq
where $p$ denotes a prime and $m \in Z^+$.
The above statement assumes the RH, but Landau proved a similar unconditional result.
Improvements on Landau's result (with better error terms and more explicit dependencies of the error terms on $y$) have been obtained
by Fujii \cite{Fu5,Fu7} and
Gonek \cite{Gon2}.
(There are many formulas relating primes and zeros, and the ``explicit formulas'' of Guinand \cite{Gu1,Gu2} and Weil \cite{We1} are among the most general.)

The paper \cite{Od2} presents the detailed explanation of how Landau's formula \eqn{eq263} forces the spectrum of the $\delta_n$ to consist largely of point masses at frequencies corresponding to prime powers, which then
forces the initially unexpected behavior of the $c_k$ that is seen in the tables.
This explanation will not be repeated here.
We will mention only that while it is not rigorous, it is supported
by heuristics and numerical evidence.
What we will do now is to check how well Landau's formula \eqn{eq263}
fits with the numerical data.
The main interest here is to see just how many zeros $\gamma_n$ are needed at
various heights to observe the phenomenon of large values
occurring at logarithms of prime powers.
Some proposals have even been made to use sums
like that in \eqn{eq263} for
primality testing and factoring integers.
While it seems unlikely that efficient methods could be developed by this approach,
it is of some interest to see what happens when one considers a relatively
short sum over high zeros.

Let
\beql{eq264}
h(y) = \sum_{n=10^{20} +1}^{10^{20} +4 \times 10^4}  e^{i \gamma_n y} ~.
\eeq
Figure~2.7.1 shows a graph of $2 \log  | h(y) | $ for $0 \le y \le 3$.
It is instructive to compare this graph with that of Fig.~15 of \cite{Od2},
which is drawn on the same scale, but is based on an exponential sum of
$4 \times 10^4$ zeros starting at zero number $10^{12} +1$.
Both graphs show sharp peaks precisely at logarithms of prime powers,
and the peaks are visibly higher at primes than at proper prime powers, as predicted by Landau's formula.
(The heights of the peaks are not represented too accurately on the graph because of
limited sampling.)
All the prime powers $< e^3 = 20.09$ are visible.
The main difference between the two graphs is that in Fig.~2.7.1 the peaks are
slightly lower, and the ``noise'' region between the peaks has somewhat
higher values.
Furthermore, the regular patterns seen in the ``main'' regions of
Fig.~15 of \cite{Od2} (which come from sampling at regular intervals
a rapidly oscillating function whose frequency and amplitude are
changing slowly) is not visible in Fig.~2.7.1.
These differences are probably due partly to the errors in the computed
values of the $\gamma_n$ near the $10^{20}$-th zero and partly
to
taking a very short sum.
$4 \times 10^4$ zeros out of the first $10^{20}$ is a very small
proportion, so it is remarkable that the pattern of Fig.~2.7.1 is as clear
as it is, since this is much better than the proved results of
\cite{Fu5,Fu7,Gon2,Lan1} might lead one to expect.

Figure~2.7.2 shows a graph of $2 \log | h(y) | $, where $h(y)$ is again defined
by \eqn{eq264}, but this time over the region $8 \le y \le 8.05$.
(This graph is based on $10^4$ equally spaced values of $y$.)
The interval from $e^8 = 2980.96$ to $e^{8.05} = 3133.79$ contains the primes
2999, 3001, 3011, 3019, 3023, 3037, 3041, 3049, 3061, 3067, 3079, 3083, 3089, 3109, 3119, and 3121,
and the prime power $5^5 =3125$.
Figure~2.7.2 fails to distinguish between several close pairs
of primes.
When one graphs a similar sum, but with 10 times as many zeros,
as is done in Fig.~2.7.3, all the primes can be distinguished,
and even 3125 can be easily discerned.

An elegant measure of
long range correlations between zeros was found by
Berry \cite{Be4}.
If we consider an interval of length $2 \pi L ( \log  (T/(2 \pi ))^{-1}$ at height $T$,
the expected number of zeros in it equals $L$.
We define the number variance of the zeros by
\beql{eq265}
V_T (L) = V_{T,H} (L) =
H^{-1} \int_T^{T+H} \left\{ N \lt t + \frac{2 \pi L}{\log (t/(2 \pi ))} \rt - N(t) - L \right\}^2 dt ~.
\eeq
In the GUE, one has $V_T (L) = G(L)$, with
\begin{eqnarray}
\label{eq266}
\lefteqn{G(L) = \pi^{-2} \{ \log (2 \pi L) - Ci( 2 \pi L ) - 2 \pi L Si (2 \pi L )} \nonumber \\
&&~~~~~~~~~~~~~~~~+~\pi^2 L - \cos ( 2 \pi L ) + 1 + c_0 \} ~,
\end{eqnarray}
where $Ci$ and $Si$ are the cosine and sine integrals \cite{HMF} and $c_0 = 0.577 \dd$
is Euler's constant.
Asymptotically,
\beql{eq267}
G(L) \sim \pi^{-2} \log (2 \pi L ) ~~~~\mbox{as} ~~~~ L \to \In ~,
\eeq
while
\beql{eq268}
G(L) \sim L ~~~~\mbox{as} ~~~~ L -> 0 ~.
\eeq
Gallagher and Mueller \cite{GM} showed that Montgomery's pair correlation conjecture
implies $V_T (L) = L-L^2 + o( L^2 )$ as $L \to 0$, which is
consistent with \eqn{eq268}.
(See also \cite{Fu8}.)
On the other hand,
the numerical evidence of \cite{Od2} showed that $V_T (L)$ was small even for
moderately large $L$, and so a relation like \eqn{eq267} appeared impossible.
Motivated by this discovery,
by the relations between primes and long range correlation between zeros
discussed above, and by his earlier work on eigenvalues of Hamiltonians of
chaotic dynamical systems \cite{Be1,Be2,Be3}, Berry \cite{Be4} found heuristic arguments
which suggested that for any $\tau \in (0, 1)$, and any $L > 0$,
\beql{eq269}
V_T (L) \approx G(L) + B_T (L) ~,
\eeq
where for $U = T (2 \pi )^{-1}$,
\beql{eq2610}
B_T(L) = \pi^{-2}
\left\{ 2
\begin{array}{c}
_{~~} \\ \dis\sum_{p} \dis\sum_{r=1}^\In \\ ^{p^r < U^\tau}
\end{array}
\df{\sin^2 ( \pi Lr( \log p ) / ( \log U))}{r^2 p^r}
+ Ci(2 \pi L \tau ) - \log (2 \pi L \tau ) - c_0 \right\}\,,~~~~~~~~~~
\eeq
and $p$ denotes primes.
Computations using $10^5$ zeros near zero number $10^{12}$, using values of $L$
up to 1000, showed excellent agreement between Berry's conjecture \eqn{eq269} and empirical data, and those results are shown in the graphs in \cite{Be4}.
Note that the $\log (L)$ terms in $G(L)$ and $B_T (L)$ cancel out, and so for every fixed $L$,
one can show that there is a positive function $g(L)$ such that
\beql{eq2611}
G(L) + B_T (L) = g(L) + o(1) ~~~~\mbox{as} ~~~~ T \to \In ~.
\eeq
Moreover, if $\tau$ is held fixed, then it is easy to see that
\beql{eq2612}
G(L) + B_T (L) \sim \pi^{-2} \log \log T ~~~~\mbox{as} ~~~~
T~, L \to  \In ,
\eeq
(with $L$ growing much more slowly than $T$),
since the arguments of the sine in the definition \eqn{eq2610} of $B_T (L)$ will
be asymptotically equidistributed modulo $2 \pi$.

The new zeros were used to obtain further data.
For $N=10^{12}$, the number variance $V_T (L) = V_{T,H} (L)$ defined
by \eqn{eq265} was computed with
$$
\begin{array}{r@{~}l@{~}l@{~~~~~~~~~~~~~~~~~~}r@{~}l@{~}l}
T & = & \gamma_{n_0} , & n_0 & = & 10^{12} - 6, 032~, \\
~~ \\ [-.09in]
T+H & = & \gamma_{m_0} , & m_0 & = & n_0 + 5 \times 10^5 ~.
\end{array}
$$
For $N=10^{20}$, the values that were chosen were
$$
\begin{array}{r@{~}l@{~}l@{~~~~~~~~~~~~~~~~~~}r@{~}l@{~}l}
T & = & \gamma_{n_1}, & n_1 & = & 10^{20} - 48 , 778 ~, \\
~~ \\ [-.09in]
T+H & = & \gamma_{m_1} , & m_1 & = & n_1 + 5 \times 10^5 ~.
\end{array}
$$
Berry's function \eqn{eq2610} was computed in each case with $\tau = 1/4$.
(Varying $\tau$ between 0.2 and 0.3 did not appreciably change the results,
as was to be expected.)
The results of some of these computations for $N=10^{20}$ are presented
in Figs.~2.7.4 through 2.7.6.
In Fig.~2.7.4, the dashed line is the graph of the GUE prediction $G(L)$,
the solid line is the graph of Berry's prediction $G(L) + B_T (L)$, and
the scatterplot is that of computed values of $V_T (L)$.
In Figs.~2.7.5 and 2.7.6 the graphs of the computed values of
$V_T (L)$ and of Berry's prediction $G(L) + B_T (L)$ were both drawn as solid lines, one superposed on the other.
The slight differences between the two curves show up as slight blotches on the graph.
(The empirical data is slightly more wiggly than $G(L) + B_T (L)$.)
We see that even for $L = 5 \times 10^5$, the agreement between
computed and predicted values is almost perfect.

A comparison of the graphs of \cite{Be4} (and of similar graphs drawn with the more extensive data that has been obtained
for $N=10^{12}$ in the present computation) with Figs.~2.7.4 through 2.7.6 shows that for $N=10^{20}$, the number variance oscillates less than for $N=10^{12}$.
The agreement of data with Berry's prediction is better for $N=10^{20}$.

While Berry's prediction \eqn{eq269} for $ V_T (L) $was based
on heuristic arguments, one can prove that
a version of the conjecture follows from
the RH and the pair correlation
conjecture \eqn{eq2211}.
This will be shown in a
separate paper \cite{Od4}.
\section{Lehmer phenomenon}\label{sc27}
\hspace*{\parindent}
For the RH to be true,
$|Z(t)| $ cannot have any relative minima between two
consecutive zeros
of $Z(t)$.
Cases where
\beql{eq271}
v_n = \max_{\gamma_n < t < \gamma_{n+1}}  |Z(t)|
\eeq
is very small (so that in a sense the RH is ``almost violated'') are referred to as
Lehmer's phenomenon \cite{Lr2}, and provide some of the more interesting heuristics both for and
against the RH (cf.~\cite{Od3}).
In this section we present statistics on the frequency of this phenomenon (which
does not have a precise definition).

The zero-locating program printed the largest value of $|Z(t)| $ that had been computed in each stretch of $10^4$ zeros.
To provide further information, the program was modified for the $N=10^{19}$, $N=10^{20}$, and $N = 2 \times 10^{20}$ data sets to obtain
statistical information about the behavior of
$v_n$.
Since getting a very good approximation to $v_n$ would have required
substantial computing time, what the program computed was the midpoint value
\beql{eq272}
w_n = |Z(( \gamma_n + \gamma_{n+1} ) / 2) | ~.
\eeq
When a value of $w_n > 250$ or $w_n < 5 \times 10^-4$ was encountered, it was printed together with $n$,
$\gamma_n$, and $\delta_n$.
(However, $w_n$ was not computed for a total of roughly 100 zeros at the ends of data sets.)

To see how good an approximation $w_n$ was to $v_n$, the values
\beql{eq273}
v_n^{\ast} = \max_{1 \le k \le 39} \left | Z \lt \gamma_n + \frac{k}{40}
( \gamma_{n+1} - \gamma_n ) \rt \right |
\eeq
as well as of $w_n$ were computed for
$n_0 \le n \le n_0 + 8 \times 10^5 -1$, where $n_0 = 10^{20} + 15, 316, 087$.
Let
\beql{eq274}
r_n = v_n^{\ast} / w_n ~.
\eeq
Then the maximal value of $r_n$ that was found was 1.43.
Only 755 out of the $8 \times 10^5$ values of $r_n$ were $> 1.2$, while the rms
value of $r_n -1$ was 0.029.
Among the 873 values of $n$ for which $\delta_n < 0.1$, the maximal
value of $r_n$ was 1.008, and the rms value of $r_n -1$ was
$5.1 \times 10^-4$.
For the 898 values of $n$ for which $\delta_n > 2.5$, the corresponding
numbers were 1.29 and 0.036.
For the 244 values of $n$ for which $w_n > 100$, these numbers
were 1.137 and 0.032, while for the 1426 values of $n$ for which
$w_n < 0.01$, they were 1.072 and 0.0066.
Thus in general the values of $w_n$ do provide good approximations to $v_n^{\ast}$, and therefore surely also to $v_n$.
This was to be expected on the basis of the GUE predictions (in particular that
the approximation would be exceptionally good when $\delta_n$ is small).
The size of $v_n$ is determined largely by the few zeros
nearest to $\delta_n$ (cf.~\cite{Hej5,Hej6}), and so under the assumption
of the GUE one can make quantitative predictions about the behavior of $r_n$.

Table~2.8.1 shows the frequency of occurrence of values of $w_n < 5 \times 10^-4$ among the approximately
$3 \times 10^8$
values of $n$ that were checked in the
$N=10^{19}$, $N=10^{20}$, and $N= 2 \times 10^{20}$ data sets.
The smallest value of $w_n$ that was found there was $1.82 \times 10^{-6}$,
for $n=10^{20} + 52, 127, 155$ and
$\delta_n = 0.00263$,
while the second smallest was $1.84 \times 10^{-6}$.

One might expect, and one does observe empirically, that the Lehmer phenomenon
is associated to small values of $\delta_n$.
If $\delta_n$ is small, then one might expect that $w_n$ is almost
proportional to $\delta_n^2$, since zeros other than $\gamma_n$ and
$\gamma_{n+1}$ ought to contribute multiplicative factors that
behave like a power of $\log \gamma_n$ on the average, and are at most
$\gamma_n^{o(1)}$ as $n \to \In$ (assuming the
Lindel\"{o}f conjecture).
Since the probability that $\delta_n \le x$ is about $\pi^2 x^3 /9$ for $x$ small (see Sections~\ref{sc22} and \ref{sc25}), one might conjecture that the probability
of $w_n < y$ might be proportional to $y^{3/2}$.
This would suggest that among the first $n$ zeros, the smallest $w_n$
might be
$n^{-2/3}+o(1)$ as $n \to \In$.
If true, this relation would settle an old question \cite{Ed} about the number of terms in the asymptotic part of the Riemann-Siegel formula that have to be used to
separate the zeros;
even the old estimate of Titchmarsh \cite{Tit1} with an error term of $O(t^{-3/4} )$ would suffice at large heights.

The above heuristic about the behavior of small $w_n$ is supported well by
empirical data.
Let $W$ denote the set formed by the $N= 10^{19}$ data set and the first 78,893,234 zeros in set $N= 10^{20}$.
(In the notation of Table~4.7.1, it is the union of sets $i$, $k$,
$l$, $m$, and $n$.)
In set $W$, 1976 values of $n$ have
$w_n < 5 \times 10^-4$.
Among these 1976 values,
the ratio $w_n / \delta_n^2$ varies between 0.0136 and 8.56, with a mean of 0.608 and a variance of 0.427.
Thus the correlation between $\delta_n^2$ and $w_n$ is only modest.
On the other hand, these $w_n$ follow almost perfectly the rule
conjectured above that the fraction of them that are $< y$ ought to be proportional
to $y^{3/2}$.
This can be seen
by looking at the ratio of the
$k$-th smallest $w_n$ to $5 \times 10^{-4} \times (k/1976)^{2/3}$, which varies between 0.715 and 1.267, with a mean of 1.01 and a variance of $9 \times 10^{-4}$, and from looking at a $q-q$ plot of the sorted $w_n$ against
$5 \times 10^{-4} \times (k/1976)^{2/3}$.
We also find good agreement between this prediction of the behavior of the small $w_n$ and the counts of Table~2.8.1, which are based on all of the zeros in the $N= 10^{19}$,
$10^{20}$, and $2 \times 10^{20}$ data sets.
Thus on average the influence of the neighboring $\gamma_k$ cancels out.

The most extreme example of the Lehmer phenomenon that was found during
the computations described in this paper occurs for $n= 10^{18} + 12, 376, 780$, where
$w_n = 5.28 \times 10^{-7}$ and $\delta_n = 0.001124$.
A graph of $Z(t)$
near
this point is given in Fig.~2.8.1.
(Figure~2.8.1 shows also what looks like another case of the Lehmer phenomenon near Gram point $n-5$, but in that case the minimum of $Z(t)$ reaches $-0.0094$, and so it does not
qualify under our definition.)
A much more detailed view of $Z(t)$ in a small neighborhood of this Lehmer
phenomenon is given in Fig.~4.7.1.
(That picture plays an important role in the discussion of the validity of the present
computations that is presented in Section~\ref{sc45}.)

The most extreme example of the
Lehmer
phenomenon
that is known was found by van~de~Lune et~al. \cite{LRW2}.
For $n=1, 048,449, 114$, they discovered that $\delta_n = 0.000310$, while
$v_n = 2.2 \times 10^{-7}$ $(\ge w_n )$.
Since the height of this example is only about the square root of that
for $n=10^{18} + 12, 376, 780$, it could be
argued that the higher example of this paper is even more extreme.
However, the $\delta_n$ found by van~de~Lune et~al. is by far the smallest of any that are known.
%\section{Large values of $\mbox{\protect\boldmath $\zeta$}$ {\bf (1/2+it)}}
\section{Large values of $\zeta (1/2+it)$}\label{sc28}
\hspace*{\parindent}
The largest value of $|Z(t) | = | \zeta (1/2 + it )| $ that was encountered by van~de~Lune et~al. \cite{LRW2} in their investigation of the first $1.5 \times 10^9$
zeros was 117.
Table~2.9.1 lists the largest values of $|Z(t)| $ that were encountered in each of the
data sets computed in this paper.
The main zero locating program kept track of the largest value of $|Z(t)| $ that had been
computed, but did not attempt to do a systematic search for large values.
However, since large values are usually
associated with large $\delta_n$, the standard zero locating procedure seemed to be
quite good at finding the high peaks in $|Z(t)| $.
For the $N=10^{19}$, $N=10^{20}$, and $N= 2 \times 10^{20}$ data sets, the more careful procedure
described in Section~\ref{sc27} was employed,
which provided even more reliable statistics.
The number of values of $n$ in those two data sets for which
$w_n$ (defined by Eq.~\eqn{eq272}) exceeded various
thresholds is given in Table~2.9.2.
(Section~\ref{sc3} lists some
values of $t$ for which $|Z(t)| $ is much larger
and which were found by a different procedure.)

The rate of growth of $|Z(t)| $ is one of the most intensively studied problems in the theory
of the zeta function, since bounds on it provide estimates on the
density of possible
zeros away from the critical line.
It is easy to show that
\beql{eq281}
|Z(t)| \le t^{\alpha + o(1)} ~~~\mbox{as} ~~~ t \to \In
\eeq
with $\alpha = 1/4$.
Exponential sum methods were used in the first few decades of this
century to show that \eqn{eq281} holds with $\alpha = 1/6$,
and then to successively lower this value of $\alpha$.
(See the Notes to Chapter~5 of \cite{Tit2} for a list of
the improvements.)
Until recently, the smallest value of $\alpha$ for which \eqn{eq281} was known to hold was $\alpha = 139/858 = 0.162004 \dd$,
proved by Kolesnik \cite{Ko}, and there were indications that this result
was close to the limit of what the two-dimensional ``exponent pair'' method that was being used could yield \cite{GK}.
However, Bombieri and
Iwaniec \cite{BI} have obtained a new method that gave $\alpha = 9/56 = 0.16071 \dd$.
This method was then developed by Huxley,
Kolesnik, and Watt in a series of papers,
and the latest result, proved by Huxley \cite{Hux} is that \eqn{eq281}
holds with
$\alpha = 89/570 = 0.15614 \dd$.

The Lindel\"{o}f hypothesis is the
statement that \eqn{eq281} holds with $\alpha =0$.
The RH yields a slightly stronger bound \cite{Tit2}
\beql{eq282}
|Z(t)| ~\le~ \exp (c( \log  t) ( \log  \log  t)^{-1} )
\eeq
for some $c>0$.
On the other side,
Balasubramanian and Ramachandra \cite{Bala,BR} have
shown that
\beql{eq283}
\max_{0 \le t \le T}  |Z(t) | \ge \exp \lt
\frac{3( \log  T)^{1/2}}{4( \log  \log  T)^{1/2}} \rt
\eeq
if $T$ is large enough and more generally, that if $\eta > 0$,
then for $T \ge T( \eta )$ and
$( \log  T)^\eta \le H \le T$, we have
\beql{eq284}
\max_{T \le t \le T+H} |Z(t)| \ge
\exp \lt
\frac{3( \log  H)^{1/2}}{4( \log  \log H)^{1/2}} \rt ~.
\eeq
Montgomery \cite{Mon3} has conjectured that \eqn{eq283} is close to the real
rate of growth of $|Z(t)| $.

While the data that was collected about large values of $|Z(t)| $ probably
does reflect accurately the behavior of the zeta function in these ranges,
it does not help in assessing what the true rate of growth of $|Z(t)| $ is.
There are two main problems.
One is the relatively small number of zeros that were investigated.
Since large values of $|Z(t)| $ are rare, we probably do not even have a
good representation of the large values of $|Z(t)| $ for
$t < \gamma_n$, $n=10^{20}$.
(This is supported by the results of Section~\ref{sc3}, where much higher values were
found by special methods.)
Another problem in using our data to assess the true growth rate of $|Z(t)| $
arises from the slow approach to its true asymptotic behavior.
As is noted in Section~\ref{sc211} (see especially Fig.~2.10.1), even $\log |Z(t)| $ in the ranges
that have been investigated can be far from its eventual distribution.
Furthermore, as was noted in the Introduction, even when one investigates at heights
$t \approx 1.5 \times 10^{19}$, it is hard to tell the differences
in growth rates between
various functions.
(The situation is not as bleak as might seem from the argument used in the Introduction,
since one can use sensitive tools such as ratios of values of a function
at different points to estimate its growth rate, but that only helps to a limited extent.)
Note that for $t= \gamma_n$, $n=10^{20}$,
the bound
\eqn{eq283} is only 12.9.

Before concluding this section, we present some more statistics on the large values of
$|Z(t)| $ that were found in the
$W$ data set, which was defined in Section~\ref{sc27}, and which is a subset of the
$N= 10^{19}$ and $10^{20}$ data sets.
In set $W$,
565 values of $n$ were recorded for which $w_n > 250$.
The largest is $w_n = 631.7$ for $n = 10^{20} + 13, 704, 916$, for which
$\delta_n = 3.1428$.
(The maximum of $|Z(t)| $ between $\gamma_n$ and $\gamma_{n+1}$ is at least 641,
and there is no violation of Rosser's rule
near
$\gamma_n$.)
Of the 565 values, 94 are associated with violations of Rosser's rule.
(Of the 28 values of $n$ for which $w_n > 400$, 7 are associated with violations of
Rosser's rule.)
The smallest value of $\delta_n$ that was found for these 565 values of $n$ was
2.07, and the largest was 4.03.

There was a substantial correlation between $\delta_n^2$ and $w_n$ among these 565 samples in set $W$.
The ratio $w_n / \delta_n^2$ was in the range
(19.47, 64.74), with a mean of 35.47 and variance 61.32.
However, at large heights one would expect this correlation to diminish, in contrast to the situation
for Lehmer's phenomenon (Section~\ref{sc27}).
In the latter case the GUE theories predict that $\delta_n^2$ will
occasionally get as small as $n^{-2}/3$, so that the influence of the other zeros (likely to be $n^{o(1)}$ because of the
Lindel\"{o}f hypothesis and the separation of the other zeros
that is
predicted by the GUE) will not affect the size of $Z(t)$ very much.
On the other hand, the GUE theories predict that $\delta_n^2 = O ( \log  n )$, and since $Z(t)$ is known to get much larger (cf.~\eqn{eq283}), this must be due to some
long-range imbalances in the locations of the zeros.
One model for the distribution of $Z(t)$ (first proposed informally by Montgomery,
and worked out in detail by
Bombieri and Hejhal
\cite{BH,Hej5,Hej6}) predicts that at large heights, the size of $Z(t)$ is determined primarily by long ``amplitude'' waves, which are then modulated by local
distributions of zeros.
This model predicts that there should be clusters of large values of $|Z(t)| $, and
that over wide ranges, $w_n$ ought to depend mostly on the ``amplitude'' waves, and not on $ \delta_n$.
That there is a very strong correlation between the large $w_n$ and $\delta_n^2$ in our data might therefore indicate that we are not yet seeing the true asymptotic behavior.
%\section{Moments of $\mbox{\protect\boldmath $\zeta$}$ {\bf (1/2 + it)}}
\section{Moments of $\zeta (1/2 + it)$}\label{sc29}
\hspace*{\parindent}
It is conjectured that for every $\lambda \ge 0$,
\beql{eq291}
\lim_{T \to \In}  T^{-1} ( \log T)^{- \lambda^2}
\int_0^T  | Z(t) |^{2 \lambda} dt = c( \lambda )
\eeq
exists, with $c( \lambda ) > 0$ for all $\lambda$.
A proof of this conjecture, or even of some much weaker bound, would be very important, since it would prove the Lindel\"{o}f
conjecture.
However, this conjecture is only known to be true for $\lambda =0$ with
$c(0)=1$ (trivial), $\lambda =1$ with $c(1)=1$, and $\lambda =2$ with $c(2) = (2 \pi^2 )^{-1}$ (see the Notes for Chapter~7 in \cite{Tit2} for detailed information and references).
No specific values have been conjectured for $c( \lambda )$ in general,
but under the assumption of the RH, Conrey and Ghosh \cite{CG1} have shown that
$c( \lambda ) \ge c_1 ( \lambda )$, where
\beql{eq292}
c_1 ( \lambda ) = \Gamma ( 1+ \lambda^2 )^{-1} 
\prod_p \left \{ \lt 1 - \frac{1}{p} \rt^{\lambda^2}
\sum_{m=0}^\In ~ \lt \frac{\Gamma (m+ \lambda )}{m! \Gamma ( \lambda )} \rt^2 p^-m \right \} ~,
\eeq
and since $c( \lambda ) = c_1 ( \lambda )$ for $\lambda =0$ and 1, they suggested that perhaps $c( \lambda ) = c_1 ( \lambda )$ for all $\lambda \in [0, 1]$.
Since $c_1 (2) = ( 4 \pi^2 )^{-1} = c(2)/2$, equality of $c( \lambda )$ and
$c_1 ( \lambda )$ is unlikely outside the range
$0 \le \lambda \le 1$.
(There is a mistake on this point in the Notes to Chapter~7 of \cite{Tit2}.)
Conrey and Ghosh \cite{CG3} have shown that the derivatives of $c_1 ( \lambda )$ and $c( \lambda )$ with respect to $\lambda$ agree at $\lambda =0$ and 1.
Also, for $0 \le \lambda < 2$, Heath-Brown [HB2] has shown under the assumption of the RH that if $c ( \lambda )$ exists, it is not much larger than predicted by the Conrey-Ghosh conjectures.

One purpose of this section is to provide some numerical evidence about possible values of $c( \lambda )$.
One might expect that if
\beql{eq293}
r( \lambda , T, H) = H^{-1} ( \log  T)^{- \lambda^2}
\int_T^{T+H}  |Z(t)|^{2 \lambda} dt ~,
\eeq
then $r( \lambda , T, H) \sim c( \lambda )$ as $T \to \In$,
if $H$ grows sufficiently fast with $T$ while $\lambda$ is held fixed.
Table~2.10.1 presents some values of $r( \lambda , T, H)$
computed for $T= \gamma_{n_0}$ with $n_0 = 10^{20} + 47, 098, 588$ and
$T+H = \gamma_{n_1} , n_1 = n_0 + 10^6$.
Each of the $10^6$ gaps between consecutive zeros was divided into 40
intervals, $Z(t)$ was evaluated at the endpoints of these subintervals,
and Simpson's rule was applied to estimate the integral.
Variations on this procedure showed that it produced estimates that were
accurate to at least three decimal places (and more for high moments, as
Simpson's rule is least accurate for small $\lambda$, where the function has
singularities at zeros
that are hard to deal with).
However, the values in the tables, especially for large $\lambda$, have to be used with
caution because even an interval of $10^6$ zeros around zero number $10^{20}$ is too small
to be truly representative.
For example,
similar data was obtained for
$T= \gamma_{n_2}$ with $n_2 = 10^{20} + 15, 316, 087$ and
$T+H = \gamma_{n_3}$, $n_3 = n_2 + 8 \times 10^5$, and also for
$T= \gamma_{n_4}$, $n_4 = 10^{20} - 15, 409, 244$,
$T+H = \gamma_{n_5}$, $n_5 = n_4 + 10^6$.
For $\lambda =1$, the values found there differed by less than
0.5\% from those in Table~2.10.1, but for $\lambda = 2.5$ these values were 1.20
and 0.752 times those in Table~2.10.1, respectively.
The problem is that high
moments are determined largely by the few exceptionally large values of $Z(t)$, and those are
rare.
(See the next section for some further evidence of this.)
To get a good sample, for large $\lambda$, one would
need to integrate $|Z(t)|^{2 \lambda}$ over much longer intervals.

The data in Table~2.10.1 are consistent with the Conrey-Ghosh
conjectures that $c( \lambda ) = c_1 ( \lambda )$ for $0 \le \lambda \le 1$.
However, given the differences between the empirical data for $\lambda =1$ and $\lambda =2$ and the known asymptotic values, it is hard to draw any
definitive conclusions.
For $\lambda =1$, estimates of the second moment of $Z(t)$ are known
that are better than \eqn{eq291}.
They are of the form
\beql{eq294}
\int_0^T ~Z(t)^2 dt = T( \log  T -1- \log (2 \pi ) + 2 c_0 ) + E(T) ~,
\eeq
where $c_0$ denotes Euler's constant
$(= 0.577215 \dd)$, and $|E(T)| = O(T^\alpha )$ for various $\alpha < 1/3$.
(The best current value of $\alpha$ is $139/429 + o(1)$ as $T \to \In$,
due to Kolesnik \cite{Ko} and in a slightly sharper form to Hafner and
Ivi\'{c} \cite{HI}.
Note that $139/492 = 0.3240 \dd$)
If we let $r^{\ast} ( \lambda , T, H)$ be defined similarly
to $r( \lambda , T, H)$, but with $\log T$ in \eqn{eq293} replaced by $\log T - \log ( 2 \pi ) + 2 c_0$, we find that for the values of $T$ and $H$ that were used to
compute Table~2.10.1, $r^{\ast} (1, T, H) = 1.004$, which is closer to the asymptotic value $c(1) =1$ than the value of
$r(1, T, H) = 0.989$.
(The other two sets of values that were considered give $r^{\ast} (1, T, H) = 1.0003$ and 0.9995, respectively.)
Thus a major
problem in using empirical data is that we do not have good conjectures about asymptotics of moments of $Z(t)$, and that second order terms in those asymptotics are likely to be only slightly smaller
than the main terms.
(See also Section~\ref{sc210} on deviations between observed and expected behavior of $Z(t)$.)

Some data were obtained also about the negative moments of $|Z(t)| $.
Table~2.10.2 shows some values of
$$\frac{1}{H} \int_T^{T+H}
|Z(t)|^{- 2 \lambda} dt$$
for $T$ and $H$ as in Table~2.10.1.
(The values for $T= \gamma_{n_2}$, $T+H = \gamma_{n_3}$,
were essentially identical.)
They were obtained by applying Simpson's rule to the
inner 38 subintervals in every gap between consecutive zeros, and approximating $|Z(t)| $ by a linear function on the two outer subintervals.

Conrey and Ghosh \cite{CG2} have shown (assuming the RH) that
\beql{eq295}
\frac{1}{M}
\sum_{m=1}^M \max_{\gamma_m < t < \gamma_m+1} Z(t)^2 \sim
\frac{1}{2} (e^2 - 5) \log  ( \gamma_M /( 2 \pi ))
\eeq
as $M \to \In$.
Since $c(1) =1$, this means that on average $Z(t)^2$ at its maxima is
$1+ \frac{1}{2} (e^2 -7) = 1.1945 \dd$ times the average of $Z(t)^2$ over the entire range $0 < t \le \gamma_M$.
(This surprisingly small factor of 1.1945...
occurs because
the values of $Z(t)^2$ at the critical points where they achieve
their maxima are not weighted by the lengths of the intervals on which the maxima are computed.
Large values of $Z(t)$ are usually associated to large gaps between consecutive
zeros.)
Computation over the range from $T= \gamma_{n_2}$ to $T+H = \gamma_{n_3}$ yielded a value of 1.224... instead of the asymptotic value of
$1.1945 \dd $.
(The value 1.224... is probably a slight underestimate of the correct ratio,
since the actual maxima were not
determined, but the largest of the values at the 40 evenly spaced points was used.)

Gonek \cite{Gon1} has shown, again assuming the RH, that
\beql{eq296}
\frac{1}{M} \sum_{m=1}^M Z( \gamma_m + i \alpha \Delta )^2  \sim
\lt 1 - \lt \frac{\sin \pi \alpha}{\pi \alpha} \rt^2 \rt \log ( \gamma_M / ( 2 \pi ))
\eeq
as $M \to \In$,
where $\Delta = 2 \pi ( \log ( \gamma_M /( 2 \pi )))^{-1}$.
Computations for $\alpha =0.1$, $0.2$, $\dd , 0.9$ and over the zeros
numbered $n_4$, $n_4 +1 , \dd, n_5 -1$ showed reasonably
good argument, but
with the ratio of empirical data to Gonek's asymptotic estimate
declining by 4\% as $\alpha$ goes from 0.1 to 0.9.
%\section{Distribution of values of $\mbox{\protect\boldmath $\zeta$}$ {\bf ( 1/2 + it)}}
\section{Distribution of values of $\zeta ( 1/2 + it)$}\label{sc210}
\hspace*{\parindent}
Since
$$
\log \zeta (1/2 + it) = \log  | Z (t) | + \pi i S(t) ~,$$
it is not surprising that methods that yield the distribution of $S(t)$
should give corresponding results for $\log  | Z(t) | $.
Selberg in unpublished manuscripts studied
mean values of $( \log  \zeta ( 1/2 + it))^h ( \log \zeta (1/2 - it ))^k$ for
nonnegative integers $h$ and $k$, and his results imply, for example, that
for rectangles $E$ in $R^2$
\beql{eq2101}
\lim_{T \to \In}  \frac{1}{T} \left | 
\left \{ t :~ T \le t \le 2T ,
\frac{\log  \zeta (1/2 + it )}{( 2^{-1} \log \log T)^{1/2}} \in E \right \} \right |
= ( 2 \pi )^{-1} \iint_E
e^{- ( x^2 + y^2 )/2} dx dy~,
\eeq
so that in particular, for any $\alpha < \beta$,
\beql{eq2102}
\lim_{T \to \In}  \frac{1}{T} \left | 
\left \{
t :~ T \le t \le 2T, ~\alpha < \frac{\log |Z(t)|}{( 2^{-1} \log \log T )^{1/2}} < \beta \right \} \right | =
(2 \pi )^{-1/2}~\int_\alpha^\beta e^{- x^2 /2} dx ~.
\eeq
Thus the real and imaginary parts of $\log \zeta (1/2 + it )$ behave like independent
normal variables with means 0 and variances $( \log  \log  t) /2$.
While Selberg's results have
not been published,
they were known to some mathematicians (see \cite{Hej6,Joy1,Jut,Mon6},
and some extensions of Selberg's results have been obtained
by Joyner \cite{Joy1} and Tsang \cite{Ts2}.
The weaker result \eqn{eq2102} has been reproved by
Laurinchikas \cite{Lau1,Lau2,Lau3,Lau4,Lau5}.

The critical issue here is whether the approximation \eqn{eq2102} is accurate even for $T$ fixed and
$\alpha$ and $\beta$ varying over wide ranges.
If that is the case, then we are led to expect that something like \eqn{eq291}
holds.
Furthermore, if the approximation is accurate even for $\alpha$ and $\beta$ relatively
large (compared to $T$), one would expect that the maximal size of $|Z(t)| $, for $0 \le t \le T$, would be
$\exp (( \log  T)^{1/2 + o(1)} )$,
which is conjectured by some to be the true rate of growth of $Z(t)$
(cf.~Section~\ref{sc28}).
Thus it is of substantial interest to find out more about the tails of the distribution
of $\log |Z(t)| $.

For $n_0 \le n \le n_1 -1$,
$n_0 = 10^{12} - 6032$, $n_1 = n_0 + 10^6$,
each interval $( \gamma_n , \gamma_{n+1} )$ was partitioned into 40 equal
subintervals, $Z(t)$ was evaluated at the endpoints of these subintervals, and
a linear approximation to $Z(t)$ between consecutive evaluation points
was used to estimate
\beql{eq2103}
b_{\alpha , \beta} =
\frac{1}{\gamma_{n_1} - \gamma_{n_0}}  \left |\{ t :
\gamma_{n_0} \le t \le \gamma_{n_1} , 
\alpha \le \log | Z(t) | \le \beta \} \right |
\eeq
for $\beta = \alpha + 1/100$,
$\alpha = k/100$,
$- 1000 \le k < 1000$.
The mean of this distribution
(as derived from the
$b_{\alpha , \beta}$ data) was $5.29 \times 10^-4$ and the variance
was 2.2930.
(In Fig.~2.11.1 it is labelled as the $10 = 10^{12}$ distribution.)
Similar data was obtained for $n_2 \le n \le n_3 -1$, $n_2 = 10^{20} + 15, 316, 087$,
$n_3 = n_2 + 10^6$, and there the mean
was $5.20 \times 10^-4$ and the variance was 2.5657.
(This is the $N=10^{20}$ distribution.)
Based on \eqn{eq2102}, one would expect mean values of 0, which is very close to the
calculated values, given the errors in the computation
and sampling errors.
The values for the variances would be expected to be $( \log  \log  T) /2$,
where $T$ is the height of the data set,
which equals 1.635 and 1.894 for the two data sets,
respectively.
Since $( \log  \log  T) /2$ is only the asymptotic value and
increases very slowly, lower order terms can be expected to be significant, and so the agreement between observed data and theory is reasonably good on this point
as well.
However, the shapes of the observed distributions of $\log  |Z(t)| $ appear to be different
from the asymptotic normal distribution.
To obtain a good comparison, the two distributions for $N=11^{12}$ and $10^{20}$ were each scaled to have variance~1, and were plotted in Fig.~2.10.1 together with the standard normal distribution.
We see that while the fit of the $N=10^{20}$ data is slightly better than
that for $N=10^{12}$, it is not much better.
This is in great contrast to the fit of the data for $S (t)$ (which, apart from a factor of $1/ pi$, is the imaginary
part of $\log \zeta (1/2 +it)$, while
$\log  | Z(t)| $ is the real part of it) which, as we see in Section~\ref{sc24} and Fig.~2.5.1, is much better.
It might be of some interest to compute second order terms in the expansion of moments of $\log  | Z(t) | $ to see what is responsible for the deviations from the asymptotic behavior that are visible in the data.
In view of Goldston's results \cite{Go2} (mentioned in Sections~\ref{sc24} and \ref{sc26}), it seems likely that such higher order terms depend on the pair correlation of zeros, and even
on higher order correlations.

The area between the empirical distribution curve for $N=10^{12}$ in Fig.~2.11.1
and the normal curve is 0.132, while for $N=10^{20}$ the corresponding area is 0.114.
In both cases these areas are much larger than those for the distribution
curves for $S(t)$ discussed in Section~\ref{sc24}, which confirms the
impression one obtains by comparing Fig.~2.5.1 to Fig.~2.11.1.

Table~2.11.1 presents extensive data on the moments of $\log |Z(t)| $.
The six sets of data summarized in this table were all obtained by choosing
$10^6$ random points in an interval of length $1.5 \times 10^5$.
For $N=10^{12}$, this interval started near zero number $10^{12} - 6, 032$.
For $N=10^{18} (a)$ and $N=10^{18} (b)$, the intervals were the same,
starting near zero number $10^{18} - 8, 839$ but the random sequences were different, since different seeds were chosen for the
pseudorandom number generator.
This was done to estimate the size of the sampling error.
For $N=10^{20} (c)$, the starting point was near zero number
$10^{20} - 48, 776$, while for $N=10^{20} (d)$, it was near zero number
$10^{20} + 15, 316, 087$.
The mean and second moment for each data set are shown in the $k=1^{\ast}$ and $k=2^{\ast}$ entries,
respectively.
These were then applied to translate and scale the data sets to obtain mean equal to 0 and variance equal to 1,
for ease of comparison with the standard normal distribution.
The $k$-th entry in the table, $1 \le k \le 10$, given the $k$-th moment of
each scaled data set, and the last column gives the corresponding value for the normal
distribution (0 for $k$ odd, $(k-1) \cdot (k-3) \cdot \dd \cdot 3 \cdot 1$ for $k$ even).

Given that the distribution of $\log |Z(t)| $ differs so much from the
expected normal one, we have to treat the data about moments of $|Z(t)| $, for example,
with extreme caution, as they may not be representative of the true asymptotic behavior.
Furthermore, the general distribution of $Z(t)$ may be even
further from
what happens higher up.

Figure~2.11.2 presents some empirical data
on values of $Z(t)$.
This figure is based on the values of $Z(t)$ in the three intervals covering $2.8 \times 10^6$ zeros that were described in the preceding
section.
For each interval between consecutive zeros, the function $|Z(t)| $ was approximated
on 40 equal-sized subintervals by a linear function,
and the length of the interval on which this linear approximation was in each range
$[k-1, k)$ for $k \ge 0$ was computed.
If $A_k$ denotes the length of all the intervals on which the linear
approximations were in $[k-1, k)$, and
$$q_k = \frac{A_k}{\sum_{j=1}^\In A_j}
$$
the fraction of time spent there, then the plot in Fig.~2.11.2 shows $\log q_k$.
From this graph and other graphs based on the data from each of the
three main intervals separately, it appears that for $k <\sim 150$,
the empirical data obtained so far represents well the long-run statistics
of $|Z(t)| $
at the heights that were investigated.
In particular, the curves of $\log q_k$ for the three main intervals
are almost identical in that range.
On the other hand, for $k >\sim 250$, the behavior of $\log q_k$ is dominated
by a few large peaks of $|Z(t)| $ (which also account for a large part of the values
of high moments of $Z(t)$ dealt with in the previous section).
In particular, the segments of the graph in Fig.~2.11.2 that shoot up are caused by
high peaks.
The final region $(k \ge 353)$ is due to two peaks
where $|Z(t)| $ reaches
about
460, and the preceding
region of increase in $\log q_k$ is due to a point where $|Z(t)| $ is around 351.
%\section{Values of $\mbox{\protect\boldmath $\zeta '$} {\bf (1/2 + i} \mbox{\protect\boldmath $\gamma$} {\bf )}$}
\section{Values of $\zeta' (1/2 + i \gamma )$}\label{sc211}
\hspace*{\parindent}
Under the assumption of the RH and of a weak consequence of the pair correlation conjecture, namely that for some $\tau > 0$, there is a constant $B$ such that
\beql{eq2111}
\limsup_{N \to \In} \frac{1}{N} \Biggl|
\{ n : N \le n 2 N ,~ \delta_n < c \} \Biggr| \le Bc^\tau
\eeq
holds uniformly for all $c \in (0, 1)$, Hejhal \cite{Hej6}
has shown that for all $\alpha < \beta$,
\beql{eq2112}
\lim_{N \to \In} \frac{1}{N} \left | \left \{
n : N \le n \le 2N ,~ \frac{\log \left | \frac{2 \pi Z' ( \gamma_n )}{\log ( \gamma_n ( 2 \pi )^{-1} )} \right |}
{( 2^{-1} \log \log N)^{1/2}} \in ( \alpha , \beta ) \right \} \right| =
( 2 \pi )^{-1/2} \int_{\alpha}^\beta
e^{- x^2 / 2} dx ~.
\eeq
(Note that under the RH, which we assume throughout this section,
$| \zeta' ( \rho ) | = |Z' ( \gamma )| $ if $\rho = 1/2 + i \gamma$.)

As was the case with the values
of $Z(t)$, we would like to obtain more information about the tails of the distribution of
$Z' ( \gamma_n )$, and in particular about the moments.
Let us define
\beql{eq2113}
J_\lambda (T) = \sum_{\gamma_n \le T} ~| Z' ( \gamma_n ) |^{2 \lambda} ~.
\eeq
Then $J_\lambda (T)$ exists for all $\lambda \ge 0$, and if the zeros of the zeta function are all simple (as they are conjectured to be, and as
is the case with all of the zeros that have been dealt with numerically) then $J_\lambda (T)$ also exists for $\lambda < 0$.
The only nontrivial
asymptotic result
was proved by
Gonek \cite{Gon1} under the
assumption of the RH;
$$J_1 (T) \sim \frac{T}{24 \pi}  ( \log  T)^4 ~~~\mbox{as} ~~~
T \to \In ~.
$$
It is clear that $J_0 (T) = N(T) \sim \frac{T}{2 \pi} \log T$ as
$T \to \In$, and it is known
(cf.~\cite[Section~14.27]{Tit2}) that
$J_{- 1/2} (T) /T \to \In$ as $T \to \In$.
Gonek \cite{Gon3} has also shown that (under the RH)
$J_{-1} (T) \ge cT$ for some $c > 0$.
If the limit law \eqn{eq2112} holds even for small $N$,
and the tails of the distribution of $Z' ( \gamma_n )$ are not too large,
then we might expect (as was suggested by Hejhal \cite{Hej6} and stated explicitly
by Gonek \cite{Gon3}) that
$J_\lambda (T)$ is on the order of
\beql{eq2114}
T( \log T )^{( \lambda +1)^2} ~~~\mbox{as} ~~~ T \to \In ~.
\eeq
Furthermore, Gonek \cite{Gon3} has conjectured that
\beql{eq2115}
J_{-1} (T) \sim 3 \pi^{-2} T ~~~\mbox{as} ~~~T \to \In ~.
\eeq

Approximate values were obtained for $|Z' ( \gamma_n )| $,
$n_0 \le n \le n_0 + 10^6 -1$, where
$n_0 = 10^{20} + 15, 316, 107$.
Since the behavior of $Z(t)$ is determined primarily by zeros close to $t$
(cf.~\cite{BH,Hej6}),
it was assumed that for $t$ near $\gamma_n$, $Z(t)$ is approximated well by
\beql{eq2116}
a \prod_{j=-20}^{20}  (t- \gamma_n+j ) ~,
\eeq
where $a$, representing the influence of zeros further from $\gamma_n$, is almost a constant, and this led to approximating $Z' ( \gamma_n )$ by
\beql{eq2117}
\epsilon^{-1} Z( \gamma_n + \epsilon )
\prod_{j=-20 \atop j \neq 0}^{20}
\frac{\gamma_n+j}{\gamma_n + \epsilon - \gamma_n+j} ~,
\eeq
where $\epsilon = ( \gamma_{n+1} - \gamma_n ) /40$.
Varying the number of terms in the heuristic approximation
\eqn{eq2116} as well as varying $\epsilon$ showed that \eqn{eq2117} does produce
good approximation to $Z' ( \gamma_n )$.

The smallest value of $| Z' ( \gamma_n )| $ that was found was 0.13,
while the largest was $2.47 \times 10^3$.
The values of $\log  |Z' ( \gamma_n )| $ had a mean of 3.35 and a variance of 1.14,
in contrast
to 1.91 and 1.9, respectively, which are
predicted by Hejhal's result \eqn{eq2112}.
Given the slow rate of growth of these quantities, second order terms
in the asymptotic results are likely to be large, so this difference between expected and observed values is probably not significant.
If we let
\beql{eq2118}
v_n = ( \log  | Z' ( \gamma_n )| - m ) / \sigma ~,
\eeq
where $m$ is the mean and $\sigma$ the standard deviation of our set of $\log  |Z' ( \gamma_n )| $, then Fig.~2.12.1 shows a
comparison of the distribution of $v_n$ with the standard normal distribution.
The line is the standard normal density,
while the scatterplot represents a histogram of $v_n$;
for each interval $[ \alpha , \beta )$, $\alpha = k/50$, $\beta = \alpha + 1/50$,
a star is placed at $(x, y)$, $x = (( \alpha + \beta ) /2-m)/ \sigma$,
$y = \sigma  b_{\alpha , \beta}$, where
\beql{eq2119}
b_{\alpha , \beta} = {50}{10^6} \left |  \{ n :~ v_n \in [ \alpha , \beta ) \} \right | .
\eeq
It is worth noting that the distributions of $\log  | Z (t)| $ and $\log  |Z' ( \gamma_n ) | $ are both supposed to be asymptotically normal,
but the convergence appears much faster for $\log  | Z' ( \gamma_n )| $,
as is revealed by a comparison of Fig.~2.12.1 to Fig.~2.11.1.
This is true even though the asymptotic normality of $\log  |Z(t)| $ is an unconditional
theorem, while that of $\log |Z' ( \gamma_n ) | $ depends on unproved assumptions.

Table~2.12.1 lists the moments of the $v_n$ and of the asymptotic normal distribution.
The entry for $k=1 , \dd , 10$ denotes the $k$-th
moment of $v_n$ and the normal distribution, while the $k=1^{\ast}$ and $k=2^{\ast}$ entries give the first two moments of $\log  |Z' ( \gamma_n ) | $,
respectively.
Comparison with Table~2.11.1 again shows much better agreement between empirical and expected values
for $\log |Z' ( \gamma_n ) | $ than for $\log |Z(t)| $.

Moments of $Z' ( \gamma_n )$ for the $10^6$ values that
were computed are shown in Table~2.12.2.
Since \eqn{eq2114} suggests that for $M$ relatively large,
\beql{eq21110}
J_\lambda^{\ast} (M, N) = \frac{1}{M} \sum_{n=N+1}^{N+M} 
| Z' ( \gamma_n ) |^{2 \lambda}
\eeq
ought to be of the order of magnitude of
$$
( \log  \gamma_N )^{( \lambda +1)^2 -1} ~,
$$
while for $\lambda =1$ and $\lambda = -1$ we ought to have the more precise relations
\begin{eqnarray}
\label{eq21111}
J_1^{\ast} (M, N) & \sim & \frac{1}{12} ( \log  \gamma_N )^3~, \\
\label{eq21112}
J_1^{\ast} (M, N) & \sim & 6 \pi^{-2} ( \log \gamma_N)^{-1}
\end{eqnarray}
(the asymptotic relations holding as $M, N \to \In$ with $M$ relatively large).
Table~2.12.2 shows the ratio of empirical to
expected values, namely
\beql{eq21113}
r_\lambda = J_\lambda^{\ast} ( 10^6 , n_0 -1 ) ( \log  \gamma_{n_0} )^{1- ( \lambda +1)^2} ~.
\eeq

The value for $\lambda =1$ is in excellent agreement with \eqn{eq21111} (which is a theorem under the assumption of the RH),
while the value for $\lambda =-1$ is reasonably consistent with that of
\eqn{eq21112}.
Since \eqn{eq21112}
is derived from Gonek's conjecture \eqn{eq2115}, this
supports the conjecture.

A theorem announced by Fujii \cite{Fu4} (which assumes the RH) states that
\beql{eq21114}
\begin{array}{r@{~}l@{~}l}
\ds_{0 < \gamma_n \le T} \zeta' ( 1/2 + i \gamma_n ) & = &
\df{T}{4 \pi}  \log^2  \df{T}{2 \pi} +
c_0  \df{T}{2 \pi}  \log \df{T}{2 \pi} \\
~~ \\
&& + c_1 ~ \df{T}{2 \pi} + O(T^{9/10 + o(1)} )
\end{array}
\eeq
as $T \to \In$, where $c_0$ and $c_1$ are explicit constants.
This turns out to be in excellent agreement with the empirical result
\beql{eq21115}
\sum_{n=n_0}^{n_0 +10^6 -1} \zeta' (1/2 + i \gamma_n ) =
2~.181 \times 10^6 + i 8.7 \times 10^3 .
\eeq

The approximate procedure \eqn{eq2117} that was used to estimate $Z' ( \gamma_n )$ can be replaced by a much more
rigorous and accurate method.
The algorithm of
cite{OS} that was used to compute $Z(t)$ precomputes a set of values from which $Z(t)$ is obtained
by interpolation.
However, the main interpolation formula \eqn{eq4315} can be differentiated
with respect to $t$, which enables one to compute
$Z' (t)$ (and therefore also $\zeta' (1/2 + it )$) from the
basic data.
If such a program were written, it could be used to check the speed of
convergence of the distribution of
$\log  | \zeta' (1/2+ it)| $ to the Gaussian
limit that has been shown to hold under the assumption of the RH by Hejhal \cite{Hej6}.
\section{Gram points and blocks}\label{sc212}
\hspace*{\parindent}
{\em Gram's law} is the empirical observation that
$Z(t)$ usually changes sign in each
{\em Gram interval}
$G_n = [g_n , g_{n+1} )$, $n \ge -1$.
(The Gram points $g_n$ are defined in Section~\ref{20}.)
Gram \cite{Gram} observed that it held in the range of values he
investigated, but he conjectured that it would fail eventually.
The first counterexample occurs for $G_125$, and was discovered by Hutchinson \cite{Hu}.
If Gram's law held universally, the RH would obviously be true.
However, it is known that this ``law'' fails infinitely often.
On the other hand, it does hold for a large fraction of cases.
For $n \le 1.5 \times 10^9$, Gram's law holds 72.79\% of the time \cite{LRW2}, among $10^6$ Gram intervals near zero number $10^{12}$, it holds
70.82\% of the time, and among $10^6$ Gram intervals near zero number $10^{20}$, it holds 68.9\% of the time.
(Under the GUE and some further assumptions to be discussed later, one might expect
that asymptotically, Gram's law should hold about 66.3\% of the time.)

One barely plausible reason Gram's law might hold (and why the RH might hold) is that in the Riemann Siegel formula for $Z(t)$ (see Eq.~\eqn{eq412}) the
leading term equals $2(-1)^n$ at $t=g_n$.
If this term, which is the largest, were always dominant, then Gram's law and the RH would follow.
We now know this to be false, but there is still interest in the behavior
of $Z(t)$ at Gram points, since sign changes of $Z(t)$ correspond to zeros of the zeta function on the critical
line.

A Gram point $g_n$ is called
{\em good} if
$(-1)^n Z(g_n ) > 0$, and
{\em bad}
otherwise.
A {\em Gram block} is an interval
$B_n = [g_n , g_{n+k} )$ such that $g_n$ and $g_{n+k}$
are good Gram points, while $g_{n+1} , \dd , g_{n+k-1}$ are bad Gram points.
The {\em length} of a Gram block
$B_n = [g_n , g_{n+k} )$ is $k$.
The {\em pattern of zeros} in a Gram block $B_n = [g_n , g_{n+k} )$ is the string $a_1 \cdots a_k$, where $a_i$ denotes the number of zeros of
$Z(t)$ in $[g_{n+i-1} , g_{n+i} )$.
Since no Gram interval with more than 4 zeros has even been found,
writing $a_1 \cdots a_k$ without comma separators is unambiguous.
(Gram intervals with arbitrarily many zeros almost surely exist, but given the GUE predictions about zeros repelling each other, they
are likely to be rare.)

The statistics that have been collected on Gram intervals and blocks (as well as on exceptions to Rosser's rule, which are discussed in Section~\ref{213}) are subject to
errors, not only because of the roundoff problems that have been 
mentioned before and are discussed extensively in Section~\ref{sc4}, but also
because
even if the
computations of $Z(t)$ were exact,
Gram points were determined only approximately,
so that the determinations of the signs of $Z(g_n )$ were not always certain.
No special precautions were taken to deal with this problem
(such as checking on the size of the computed value of $Z(g_n )$) as it was felt that this was unlikely to affect general statistics.

The computations of \cite{LRW2} of the first $1.5 \times 10^9$ zeros found only 6 Gram
blocks of length 9, and none of lengths $\ge 10$.
In contrast, the maximal lengths of Gram blocks found during the present
computations were 9 for $N=10^{12}$,
9 for $N=10^{14}$, 11 for $N=10^{16}$, 13 for $N=10^{18}$ (1 case), 12 for $N=10^{19}$, 14 for $N=10^{20}$ (2 cases, with zero patterns $01111111113110$ and $01111111111130$),
and 13 for $N = 2 \times 10^{20}$.

Table~2.13.1 gives the fraction of Gram blocks in given data sets with given
lengths.
The $N=1$ and $N=1.4 \times 10^9$ data is derived from
Table~1 of \cite{LRW2},
and comes from two sets of $10^8$ Gram intervals each, the first one starting
at $g_0$, the second at $g_n$ for $n=1.4 \times 10^8$.
The $N=10^{12}$ data is based on only 1,590,000 Gram intervals.

The main program did not keep track of Gram blocks according to their pattern of zeros.
However, a special study was made of two blocks of $10^6$ Gram intervals each,
one starting at $g_{n_1}$,
$n_1 = 10^{12} - 6, 034$,
the other at $g_{n_2}$,
$n_2 = 10^{20} - 42, 780$,
which for the remainder of this section will be
referred to as the $N=10^{12}$ and $N=10^{20}$ data sets, respectively.

If a Gram block $B(n, k)$ contains exactly $k$ zeros (so is not associated with a violation of Rosser's rule, see Section~\ref{sc213}) then its zero pattern must be either $211...110$,
or $011...112$, or $011...131...110$ (where
the data~... refer to any strings of consecutive 1's, but these strings might be
even shorter than indicated).
pattern might be missing).
Van~de Lune
et~al. \cite{LRW2}
noted
in their computations that for a fixed
$k$, the first two zero patterns seemed to be much more frequent than the third one,
and that the frequencies seemed stable as the height of zeros increased.
The new computations, however, show a steady decrease in the frequency with which the third pattern
appears
as the height increases.
Table~2.13.2 shows the
observed frequencies.
The $N=1$ entry is drawn from Table~2 of \cite{LRW2}, which is based on statistics of 3 sets of $10^8$
Gram intervals each, starting at
$g_m$ with $m=0$, $7 \times 10^8$, and $1.4 \times 10^9$,
respectively.
Only
Gram blocks of length $k$ with exactly
$k$ zeros are considered, and the entry in the table gives the fraction
of all such Gram blocks that have a zero pattern
containing a 3.
The decrease in the frequency of the third zero pattern is puzzling.
The GUE theories suggest that this pattern ought to occur a positive proportion of the time.

Table~2.13.3 presents data on the fraction of Gram intervals that contain a given number
of zeros.
The $N=1$ and $N= 1.4 \times 10^9$ data sets are the same as in Table~2.13.1,
and these entries come from Table~5 of \cite{LRW2}.
Note that there were no Gram intervals with $\ge 4$ zeros in the $N=10^{12}$ and $N=10^{20}$ sets (although such intervals did turn up in other data sets around the $10^{20}$-th zero, for example).

The GUE entry in Table~2.13.3 was derived by assuming that a Gram interval does not differ
from any other interval of that length, and so the entry in the table for a given $m$ in the GUE row is the probability that an interval of length 1 contains exactly $m$ zeros.
Since the averages of $S(t)$ do increase as $t$ increases, i