Re
«
((.111Y)4(1-a)
Now the best choice is Z = H
Re For M
+ z3(1-a) + Hz4-6")£A. giving
((D.llf2)2(1-a)
+ H3;~~~~ )£A.
= D'/ 4 this establishes Theorem 10.4 in the range
$\tfrac{3}{4} \le \alpha \le 1$.

EXERCISE 3. Show that for $\tfrac{1}{2} \le \alpha \le 1$ and $\varepsilon > 0$,

$$N(\alpha, k, Q, T) \ll \bigl(D^{(3+\varepsilon)(1-\alpha)} + H^{b(\alpha)(1-\alpha)}\bigr)\mathcal{L}^{A}, \tag{10.91}$$

where

$$b(\alpha) = \min\Bigl(\frac{8}{5-2\alpha}, \frac{4}{5\alpha-2}\Bigr). \tag{10.92}$$

[Hint: Raise the polynomial $D_\ell(s,\chi)$ to a power $2k$ with $k \ge 3$ so that $Z \le P \le (MY)^3 + Z^{4/3}$.]

EXERCISE 4. Show that for $\tfrac{1}{2} \le \alpha \le 1$ and $\varepsilon > 0$,

$$N(\alpha, k, Q, T) \ll \bigl(D^{(2+\varepsilon)(1-\alpha)} + H^{a(\alpha)(1-\alpha)}\bigr)\mathcal{L}^{A}, \tag{10.93}$$

where

$$a(\alpha) = \min\Bigl(\frac{5}{3-\alpha}, \frac{5}{7\alpha-3}\Bigr). \tag{10.94}$$

[Hint: Raise the polynomial $D_\ell(s,\chi)$ to a power $2k$ with $k \ge 4$ so that $Z \le P \le (MY)^4 + Z^{5/4}$.]
REMARKS. For $\alpha$ near 1, the estimates (10.91) and (10.93) are stronger than (10.72) in terms of $Q^2$, but not in terms of $kT$. In particular, if $kT \le Q^{\varepsilon}$, then (10.93) yields

$$N(\alpha, k, Q, T) \ll Q^{4(1-\alpha)+\varepsilon}$$

in the range $\tfrac{11}{14} \le \alpha \le 1$, which is essentially the density conjecture. By using his Theorem 9.10, M. Jutila [Jut] established the density conjecture for zeros of $\zeta(s)$ in the same range. Every exponent $a(\alpha)$, $b(\alpha)$, $c(\alpha)$ takes its maximum at $\alpha = \tfrac{3}{4}$, being $a(\tfrac34) = \tfrac{20}{9}$, $b(\tfrac34) = \tfrac{16}{7}$, $c(\tfrac34) = \tfrac{12}{5}$. For $\alpha = 1$ we get $a(1) = \tfrac54$, $b(1) = \tfrac43$, $c(1) = \tfrac32$, and for $\alpha = \tfrac12$ all three exponents take value 2.
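The numerical values quoted in these remarks can be checked with exact rational arithmetic. The following is a minimal sketch, assuming the exponents are $a(\alpha)$ and $b(\alpha)$ as in (10.94) and (10.92), and assuming $c(\alpha) = \min\bigl(3/(2-\alpha), 3/(3\alpha-1)\bigr)$ for the exponent of (10.72), whose statement lies outside this passage. The function names are illustrative only.

```python
from fractions import Fraction

# All three functions are meant for 1/2 <= alpha <= 1, where the denominators are positive.

def a_exp(alpha):
    """a(alpha) = min(5/(3 - alpha), 5/(7*alpha - 3)), as in (10.94)."""
    alpha = Fraction(alpha)
    return min(5 / (3 - alpha), 5 / (7 * alpha - 3))

def b_exp(alpha):
    """b(alpha) = min(8/(5 - 2*alpha), 4/(5*alpha - 2)), as in (10.92)."""
    alpha = Fraction(alpha)
    return min(8 / (5 - 2 * alpha), 4 / (5 * alpha - 2))

def c_exp(alpha):
    """Assumed form of the exponent in (10.72): min(3/(2 - alpha), 3/(3*alpha - 1))."""
    alpha = Fraction(alpha)
    return min(3 / (2 - alpha), 3 / (3 * alpha - 1))

if __name__ == "__main__":
    for alpha in (Fraction(1, 2), Fraction(3, 4), Fraction(11, 14), Fraction(1)):
        print(alpha, a_exp(alpha), b_exp(alpha), c_exp(alpha))
```

In this sketch $a(\alpha) \le 2$ exactly when $\alpha \ge \tfrac{11}{14}$, which is where the density-conjecture range in the remark comes from.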
10.5. The gaps between primes. This section is written for the purpose of giving an example of how the zero-density theorems can be applied. We have chosen a few questions about primes in short intervals, because they were inspirational for the development of density theorems in the first place, and for other statistical studies of zeros of $\zeta(s)$. By no means do we attempt to show the best results from the vast literature. From the Prime Number Theorem in the form

$$\psi(x) = \sum_{n \le x} \Lambda(n) = x + E(x),$$

where $E(x)$ is a suitable error term, it follows directly that

$$\psi(x+y) - \psi(x) \sim y \tag{10.95}$$
as $x \to \infty$, provided $y = y(x)$ is somewhat larger than $E(x)$. Hence there is a prime number in the short interval $(x, x+y]$ for all sufficiently large $x$. Without improving the existing error term in the Prime Number Theorem, G. Hoheisel [Ho] nevertheless succeeded in showing in 1930 that (10.95) holds for $y = x^{\theta}$ with some absolute constant $\theta < 1$. This is an impressive achievement, given that an error term as good as $E(x) = O(x^{\theta})$ seems to be far beyond the current technology. Recall that the latter bound translates into the non-vanishing of $\zeta(s)$ in $\operatorname{Re} s > \theta$. However, Hoheisel required only two results which were available in his time. The first is a zero-free region for $\zeta(\sigma + it)$ of the type

$$|t| \le T, \qquad \sigma \ge 1 - B\,\frac{\log\log T}{\log T} \tag{10.96}$$

for some constant $B > 0$ and all $T$ sufficiently large. The second ingredient is a zero-density estimate of the type

$$N(\alpha, T) \ll T^{c(1-\alpha)} (\log T)^{A} \tag{10.97}$$

for all $\tfrac12 \le \alpha \le 1$ with some constants $c \ge 2$ and $A \ge 1$. From these results one derives
THEOREM 10.5. Put $\theta = 1 - (c + (A+1)/B)^{-1}$. Then

$$\psi(x+y) - \psi(x) = y + O\Bigl(\frac{y}{\log x}\Bigr) \tag{10.98}$$

for all $y$ with $x^{\theta}(\log x)^3 \le y \le x$.
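To see how the exponent of Theorem 10.5 depends on its inputs, here is a small sketch that evaluates $\theta = 1 - (c + (A+1)/B)^{-1}$ exactly, together with its limit $1 - 1/c$ as $B \to \infty$. The function names are illustrative, not taken from the text.

```python
from fractions import Fraction

def hoheisel_theta(c, A, B):
    """theta = 1 - 1/(c + (A + 1)/B), the exponent of Theorem 10.5."""
    c, A, B = Fraction(c), Fraction(A), Fraction(B)
    return 1 - 1 / (c + (A + 1) / B)

def limiting_theta(c):
    """Limit of theta as B grows without bound, namely 1 - 1/c."""
    return 1 - 1 / Fraction(c)

if __name__ == "__main__":
    print(limiting_theta(Fraction(12, 5)))  # 7/12
    print(limiting_theta(2))                # 1/2
    print(hoheisel_theta(4, 1, 1))          # 5/6
```

For every finite $B$ the value lies strictly above the limit $1 - 1/c$, and it decreases to that limit as $B$ grows.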
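PROOF. One uses the approximate "explicit formula" (see Section 5.9)

$$\psi(x) = x - \sum_{|\gamma| \le T} \frac{x^{\rho}}{\rho} + O\Bigl(\frac{x}{T}(\log x)^2\Bigr)$$

with $T = x^{1-\theta}$. Hence

$$\frac{\psi(x+y) - \psi(x)}{y} - 1 = -\sum_{|\gamma| \le T} \frac{(x+y)^{\rho} - x^{\rho}}{\rho y} + O\Bigl(\frac{1}{\log x}\Bigr).$$

Here the sum over the zeros is bounded by

$$\sum_{|\gamma| \le T} x^{\beta - 1} \le -2\int_{1/2}^{1} x^{\alpha-1}\, dN(\alpha, T) \le 2x^{-1/2} N(\tfrac12, T) + 2(\log x)\int_{1/2}^{1} x^{\alpha-1} N(\alpha, T)\, d\alpha$$

$$\ll x^{-1/2} T\log T + (\log x)(\log T)^{A} \int_{1/2}^{1-\kappa} (T^{c}/x)^{1-\alpha}\, d\alpha \ll x^{-1/2} T\log T + (T^{c}/x)^{\kappa}(\log T)^{A}$$

where $\kappa = B(\log\log T)/\log T$. Since

$$(T^{c}/x)^{\kappa} = (\log T)^{(c - \frac{1}{1-\theta})B} = (\log T)^{-A-1},$$

the sum over zeros is $O(1/\log x)$, completing the proof of Theorem 10.5. □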
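For readers who want to experiment with (10.95) numerically, the following minimal sketch tabulates $\psi(x) = \sum_{n \le x} \Lambda(n)$ with a sieve and compares $\psi(x+y) - \psi(x)$ with $y$. It is only an illustration on a small range and proves nothing about the asymptotic. The helper names are assumptions of this sketch.

```python
import math

def von_mangoldt_table(n):
    """Return lam with lam[m] = Lambda(m): log p if m is a power of the prime p, else 0."""
    lam = [0.0] * (n + 1)
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(2 * p, n + 1, p):
                is_prime[m] = False
            pk = p
            while pk <= n:          # mark p, p^2, p^3, ...
                lam[pk] = math.log(p)
                pk *= p
    return lam

def psi_prefix(n):
    """Return psi with psi[m] = sum of Lambda(k) over 1 <= k <= m."""
    lam = von_mangoldt_table(n)
    psi = [0.0] * (n + 1)
    for m in range(1, n + 1):
        psi[m] = psi[m - 1] + lam[m]
    return psi

if __name__ == "__main__":
    x, y = 100_000, 5_000
    psi = psi_prefix(x + y)
    print((psi[x + y] - psi[x]) / y)   # expected to be close to 1
```

Since $\psi(10) = 3\log 2 + 2\log 3 + \log 5 + \log 7 = \log 2520$, the table can be sanity-checked against $\log \operatorname{lcm}(1,\dots,10)$.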
Hoheisel used the zero-free region (10.96) with a positive but very small constant $B$ due to J. E. Littlewood [Lit], and the zero-density estimate (10.97) with exponent $c = 4$ due to F. Carlson [Car] (see (10.26)). After Vinogradov widened the region (10.96) (see Corollary 8.28) one can take $B$ arbitrarily large, so $\theta = 1 - c^{-1} + \varepsilon$ satisfies the conditions of Theorem 10.5. From now on the Hoheisel exponent $\theta$ depends only on $c$ in the zero-density estimate. Note that one needs (10.97) to hold throughout the segment $\tfrac12 \le \alpha \le 1$ with the same $c$, so an improvement in a subrange does not help. In other words, all one gets from (10.6) is $c = \max c(\alpha)$. By the Grand Density Theorem 10.4 for $\zeta(s)$ one gets (10.97) with $c = 12/5$ (the maximum of $c(\alpha)$ is attained at $\alpha = \tfrac34$), which yields (10.98) for $x^{\theta} \le y \le x$ with $\theta = \tfrac{7}{12} + \varepsilon$. This is the best result of its kind obtained so far (in 1972 by M. N. Huxley [Hu3]). The density conjecture $c = 2$ yields $\theta = \tfrac12 + \varepsilon$. Various combinations of the above analytic arguments with sieve methods produced estimates

$$y \ll \psi(x+y) - \psi(x) \ll y \tag{10.99}$$
in place of the asymptotic formula (10.95), however, for shorter intervals. For example, R. Baker and G. Harman [BH] got (10.99) for $y = x^{\theta}$ with $\theta = 0.534$. From (10.99) with $y = x^{\theta}$ it follows that the difference between consecutive primes satisfies

$$d_n = p_{n+1} - p_n \ll p_n^{\theta}. \tag{10.100}$$

Therefore we have (10.100) with $\theta = 0.534$ as a consequence of the Baker-Harman work. Recall that the Riemann hypothesis yields $d_n \ll p_n^{1/2}\log p_n$, but even on the Pair Correlation Conjecture (see Chapter 25), the best that has been achieved is (due to D. Goldston and D. R. Heath-Brown [GHB])

$$d_n \ll (p_n \log p_n)^{1/2}. \tag{10.101}$$
Creating a probabilistic model for primes, in 1937 H. Cramér [Cra] was led to the conjecture that

$$d_n \ll (\log p_n)^2. \tag{10.102}$$

In the other direction R. Rankin [Ra2] showed by constructing special composite numbers that

$$d_n \ge (e^{\gamma} - \varepsilon)(\log p_n)(\log\log p_n)(\log\log\log\log p_n)(\log\log\log p_n)^{-2} \tag{10.103}$$

infinitely often, where $\gamma$ is the Euler constant. Paul Erdős offered a prize of \$10,000 to anyone who can replace $e^{\gamma}$ in (10.103) by a function increasing to infinity (the largest prize ever offered by Erdős for a solution to a mathematical problem).
It is conjectured that the normalized gaps between consecutive primes $d_n^{*} = (p_{n+1} - p_n)(\log p_n)^{-1}$ have a Poisson distribution, that is, for any $t > 0$,

$$\lim_{x \to \infty} \frac{1}{x}\,\bigl|\{n \le x;\ d_n^{*} \le t\}\bigr| = 1 - e^{-t}. \tag{10.104}$$
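The gap statistics discussed above are easy to tabulate for small primes. The sketch below computes the largest gap $d_n$ and the empirical proportion of normalized gaps $d_n^{*} \le t$, next to the conjectured limit $1 - e^{-t}$ from (10.104). The function names and the cut-offs are choices made for this illustration, and agreement at such small heights is only rough.

```python
import math

def primes_upto(n):
    """Sieve of Eratosthenes; returns the list of primes <= n (for n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # strike out p*p, p*p + p, ... (same number of entries on both sides)
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]

def gap_statistics(n, t):
    """Return (largest gap, empirical fraction of d_n* <= t, 1 - exp(-t)) for primes <= n."""
    ps = primes_upto(n)
    pairs = list(zip(ps, ps[1:]))
    largest = max(q - p for p, q in pairs)
    normalized = [(q - p) / math.log(p) for p, q in pairs]
    fraction = sum(1 for g in normalized if g <= t) / len(normalized)
    return largest, fraction, 1 - math.exp(-t)

if __name__ == "__main__":
    print(gap_statistics(1_000_000, 1.0))
```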
Many estimates for $\psi(x+y) - \psi(x)$ were established on average with respect to $x$.
EXERCISE 5. Assuming (10.97) with $c \ge 2$ prove that

(10.105)

for $X^{\theta} \le y \le X$ with $\theta = 1 - 2c^{-1} + \varepsilon$, for any $\varepsilon > 0$ and any $A > 0$, the implied constant depending only on $\varepsilon$ and $A$. In particular, by taking $c = 12/5$ deduce that (10.95) holds true with $y = x^{\theta}$ for almost all $x$, where $\theta > \tfrac16$ is a fixed number.

We close this section by pointing out a formal similarity between the problems of primes in short intervals and primes in arithmetic progressions to large moduli.
EXERCISE 6. Assume the following estimates for the zeros of Dirichlet $L$-functions with characters $\chi \pmod q$:

(1) $L(s, \chi) \ne 0$ in the region $s = \sigma + it$ with $|t| \le T$ and

$$\sigma \ge 1 - B\,\frac{\log\log qT}{\log qT} \tag{10.106}$$

where $B$ is a positive constant, provided $qT$ is sufficiently large.

(2) The number $N(\alpha, q, T)$ of zeros of all $L(s, \chi)$ with $\chi \pmod q$ in the rectangle $|t| \le T$, $\sigma \ge \alpha$ satisfies

$$N(\alpha, q, T) \ll (qT)^{c(1-\alpha)} (\log qT)^{A} \tag{10.107}$$

for $\tfrac12 \le \alpha \le 1$, $T \ge 3$, where $c \ge 2$ and $A \ge 1$ are suitable constants.

Prove that

(10.108)

uniformly for $x \ge q^{\theta}(\log q)^3$ with $\theta = c + (A+1)/B$. See Chapter 17 for unconditional results.
CHAPTER 11
SUMS OVER FINITE FIELDS

11.1. Introduction. In this chapter we consider a special type of exponential and character sums, sometimes called "complete sums", which can be seen as sums over the elements of a finite field. Although the methods of Chapter 8 can still be applied to the study of such sums, disregarding this special feature, the deepest understanding and the strongest results are obtained when the finite field aspect is taken into account and the powerful techniques of algebraic geometry are brought to bear. We have already encountered in the previous chapters some examples of exponential sums which can be interpreted as sums over finite fields, for example, the quadratic Gauss sums

$$G_a(p) = \sum_{x \bmod p} \Bigl(\frac{x}{p}\Bigr) e\Bigl(\frac{ax}{p}\Bigr)$$

or the Kloosterman sums (1.56)

$$S(a, b; p) = \mathop{{\sum}^{*}}_{x \bmod p} e\Bigl(\frac{ax + b\bar{x}}{p}\Bigr).$$
In this chapter we will study these sums in particular. The culminating point of our presentation is the elementary method of Stepanov which we apply for proving Weil's bound for Kloosterman sums

$$|S(a, b; p)| \le 2\sqrt{p}$$

and Hasse's bound for the number of points of an elliptic curve over a finite field. Then we survey briefly, without proofs, the powerful formalism of $\ell$-adic cohomology developed by Grothendieck, Deligne, Katz, Laumon and others, hoping to convey a flavor of the tools involved and to give the reader enough knowledge to make at least a preliminary analysis of any exponential sum he or she may encounter in analytic number theory.
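Both sums from the introduction are small enough to evaluate directly for modest primes. The following sketch computes $G_a(p)$ and $S(a,b;p)$ in floating point and can be used to observe $|G_a(p)| = \sqrt{p}$ for $p \nmid a$ and Weil's bound $|S(a,b;p)| \le 2\sqrt{p}$. It assumes Python 3.8 or later, where `pow(x, -1, p)` returns the inverse of $x$ modulo $p$. The Legendre symbol is computed by Euler's criterion, a standard fact not stated in this passage.

```python
import cmath
import math

def e(z):
    """e(z) = exp(2*pi*i*z)."""
    return cmath.exp(2j * math.pi * z)

def legendre(x, p):
    """Legendre symbol (x/p) for an odd prime p, via Euler's criterion."""
    x %= p
    if x == 0:
        return 0
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1

def gauss_sum(a, p):
    """Quadratic Gauss sum G_a(p): sum of (x/p) e(ax/p) over x mod p."""
    return sum(legendre(x, p) * e(a * x / p) for x in range(p))

def kloosterman(a, b, p):
    """Kloosterman sum S(a, b; p): sum of e((a x + b x^{-1})/p) over invertible x mod p."""
    return sum(e((a * x + b * pow(x, -1, p)) / p) for x in range(1, p))

if __name__ == "__main__":
    p = 101
    print(abs(gauss_sum(1, p)), math.sqrt(p))
    print(abs(kloosterman(1, 1, p)), 2 * math.sqrt(p))
```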
11.2. Finite fields. We first recall briefly some facts about finite fields, and establish the notation used in this chapter. For every prime $p$, the finite ring $\mathbb{Z}/p\mathbb{Z}$ of residue classes modulo $p$ is a field, which we denote $\mathbb{F}_p$. The Galois theory of $\mathbb{F}_p$ is very easy to describe: for any $n \ge 1$, there exists a unique (up to isomorphism) field extension of $\mathbb{F}_p$ of degree $n$, written $\mathbb{F}_{p^n}$. Conversely, any finite field $\mathbb{F}$ with $q$ elements is isomorphic (but not canonically) to a unique field $\mathbb{F}_{p^d}$, so $q = p^d$, and $\mathbb{F}$ admits also a unique finite extension of degree $n$ for any $n \ge 1$, namely $\mathbb{F}_{p^{dn}}$.

Let now $\mathbb{F} = \mathbb{F}_q$ be a finite field with $q = p^d$ elements. In most of the chapter, $p$ is fixed and we change notation slightly, denoting by $\bar{\mathbb{F}}$ an algebraic closure of $\mathbb{F}$ and by $\mathbb{F}_n \subset \bar{\mathbb{F}}$ the unique extension of degree $n$ of $\mathbb{F}$ for $n \ge 1$. The context will always indicate clearly that the cardinality of $\mathbb{F}_n$ is $q^n$ and not $n$. The extension $\mathbb{F}_n/\mathbb{F}$ is a Galois extension, with Galois group $G_n$ canonically isomorphic to $\mathbb{Z}/n\mathbb{Z}$, the isomorphism being the map $\mathbb{Z}/n\mathbb{Z} \to G_n$ defined by $1 \mapsto \sigma$, where $\sigma$ is the Frobenius automorphism of $\mathbb{F}_n$ given by $\sigma(x) = x^q$. Let $\bar{\mathbb{F}}$ be a given algebraic closure of $\mathbb{F}$, so by the above,

$$\bar{\mathbb{F}} = \bigcup_{n \ge 1} \mathbb{F}_n.$$

By Galois theory, for any $x \in \bar{\mathbb{F}}$, we have $x \in \mathbb{F}$ if and only if $\sigma(x) = x$, that is, if and only if $x^q = x$, and more generally

$$x \in \mathbb{F}_n \iff \sigma^n(x) = x \iff x^{q^n} = x. \tag{11.1}$$

From this we can deduce that $\mathbb{F}_n$ is the splitting field of the polynomial $X^{q^n} - X \in \mathbb{F}[X]$. More precisely, one can state the following result of Gauss:

LEMMA 11.1. For any integer $n \ge 1$, we have

$$\prod_{d \mid n}\ \prod_{\deg(P) = d} P = X^{q^n} - X \tag{11.2}$$

where the product ranges over all irreducible monic polynomials $P$ of degree $d$ dividing $n$.

PROOF. This is an immediate consequence of the description of finite fields: the roots of the polynomial on the right side (in an algebraic closure) are exactly the elements $x \in \mathbb{F}_n$ with multiplicity one and, conversely, every such $x$ has a minimal polynomial which must occur, exactly once, among the polynomials $P$ on the left side. □
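Lemma 11.1 can be verified by brute force for very small fields. The sketch below represents polynomials over $\mathbb{F}_p$ as coefficient lists (constant term first), finds the monic irreducible polynomials of each degree $d \mid n$ by trial division, multiplies them together, and compares the result with $X^{p^n} - X$. It covers only prime base fields ($q = p$), and all helper names are choices of this illustration.

```python
from itertools import product

def poly_mul(a, b, p):
    """Product of two coefficient lists over F_p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def poly_mod(a, m, p):
    """Remainder of a modulo the monic polynomial m over F_p."""
    a = list(a)
    dm = len(m) - 1
    while len(a) - 1 >= dm:
        lead = a[-1]
        if lead:
            shift = len(a) - 1 - dm
            for i, c in enumerate(m):
                a[shift + i] = (a[shift + i] - lead * c) % p
        a.pop()                      # the leading coefficient is now zero
    return a

def monic_polys(d, p):
    """All monic polynomials of degree d over F_p."""
    for low in product(range(p), repeat=d):
        yield list(low) + [1]

def is_irreducible(f, p):
    """Trial division by all monic polynomials of degree at most deg(f)/2."""
    d = len(f) - 1
    for k in range(1, d // 2 + 1):
        for g in monic_polys(k, p):
            if not any(poly_mod(f, g, p)):
                return False
    return True

def gauss_product(n, p):
    """Left side of (11.2) for q = p."""
    result = [1]
    for d in range(1, n + 1):
        if n % d == 0:
            for f in monic_polys(d, p):
                if is_irreducible(f, p):
                    result = poly_mul(result, f, p)
    return result

def x_qn_minus_x(n, p):
    """Right side of (11.2) for q = p, as a coefficient list."""
    target = [0] * (p ** n + 1)
    target[1] = (-1) % p
    target[-1] = 1
    return target

if __name__ == "__main__":
    print(gauss_product(4, 2) == x_qn_minus_x(4, 2))
```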
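Associated to the extension $\mathbb{F}_n/\mathbb{F}$ are the trace map and the norm map. Because of the above description of the Galois group of $\mathbb{F}_n/\mathbb{F}$, the trace map $\operatorname{Tr} = \operatorname{Tr}_{\mathbb{F}_n/\mathbb{F}} : \mathbb{F}_n \to \mathbb{F}$ is given by

$$\operatorname{Tr}(x) = \sum_{0 \le i \le n-1} \sigma^{i}(x) = \sum_{0 \le i \le n-1} x^{q^{i}} \tag{11.3}$$

while the norm map $N = N_{\mathbb{F}_n/\mathbb{F}} : \mathbb{F}_n^{*} \to \mathbb{F}^{*}$ is similarly

$$N(x) = \prod_{0 \le i \le n-1} \sigma^{i}(x) = \prod_{0 \le i \le n-1} x^{q^{i}}. \tag{11.4}$$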
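As a concrete instance of (11.3) and (11.4), the sketch below models $\mathbb{F}_9$ as $\mathbb{F}_3[i]/(i^2+1)$ (a choice made for this example; $-1$ is a non-residue modulo 3, so $i^2 + 1$ is irreducible) and computes trace and norm down to $\mathbb{F}_3$ through the Frobenius map $x \mapsto x^3$. It then counts solutions of $\operatorname{Tr}(x) = y$ and $N(x) = y$.

```python
P = 3  # base field F_3; elements of F_9 are pairs (a, b) standing for a + b*i with i^2 = -1

def mul(x, y):
    """Multiplication in F_9 = F_3[i]/(i^2 + 1)."""
    a, b = x
    c, d = y
    return ((a * c - b * d) % P, (a * d + b * c) % P)

def power(x, k):
    """x^k by repeated multiplication (k >= 0)."""
    result = (1, 0)
    for _ in range(k):
        result = mul(result, x)
    return result

def frobenius(x):
    """sigma(x) = x^q with q = 3."""
    return power(x, P)

def trace(x):
    """Tr(x) = x + sigma(x), as in (11.3) with n = 2."""
    s = frobenius(x)
    return ((x[0] + s[0]) % P, (x[1] + s[1]) % P)

def norm(x):
    """N(x) = x * sigma(x), as in (11.4) with n = 2."""
    return mul(x, frobenius(x))

F9 = [(a, b) for a in range(P) for b in range(P)]

if __name__ == "__main__":
    for y in range(P):
        print(y, sum(1 for x in F9 if trace(x) == (y, 0)),
              sum(1 for x in F9 if x != (0, 0) and norm(x) == (y, 0)))
```

In this model every trace value is attained $q^{n-1} = 3$ times and every non-zero norm value $(q^n - 1)/(q - 1) = 4$ times.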
The equations $\operatorname{Tr}(x) = y$ and $N(x) = y$, for a fixed $y \in \mathbb{F}$, are very important. Because the extension $\mathbb{F}_n/\mathbb{F}$ is separable, the equation $\operatorname{Tr}(x) = y$ always has a solution. If $x_0$ is a given solution, then all solutions are in one-to-one correspondence with solutions of $\operatorname{Tr}(a) = 0$, by $x = x_0 + a$. Moreover, any solution of $\operatorname{Tr}(a) = 0$ is of the form $a = \sigma(b) - b = b^q - b$ for some $b \in \mathbb{F}_n$, unique up to addition of an element in $\mathbb{F}$. Similarly, for any $y \in \mathbb{F}^{*}$, the equation $N(x) = y$ has a solution, and if $x_0$ is a given solution, the set of solutions is in one-to-one correspondence with solutions of $N(a) = 1$, which by Hilbert's Theorem 90 (or by direct proof) are all given by $a = \sigma(b)b^{-1} = b^{q-1}$ for some $b \in \mathbb{F}_n^{*}$, unique up to multiplication by an element in $\mathbb{F}^{*}$.

As the additive group of $\mathbb{F}$ is finite, the general theory of characters of a finite abelian group (see Chapter 3) can be applied. Characters of $\mathbb{F}$ are called additive characters, and they are all of the form $x \mapsto \psi(ax)$ for some $a \in \mathbb{F}$, where $\psi$ is some fixed non-trivial additive character. For instance, let $\operatorname{Tr} : \mathbb{F} \to \mathbb{Z}/p\mathbb{Z}$ be the trace map to the base field, then

$$\psi(x) = e(\operatorname{Tr}(x)/p) \tag{11.5}$$
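is a non-trivial additive character of $\mathbb{F}$. For a given additive character $\psi$ and $a \in \mathbb{F}$, we denote by $\psi_a$ the character $x \mapsto \psi(ax)$. Applying the general theory of characters of finite abelian groups, we get the orthogonality relations

$$\sum_{\psi} \psi(x) = \begin{cases} q & \text{if } x = 0, \\ 0 & \text{otherwise} \end{cases}$$

(which is used to "solve" the equation $x = 0$ in $\mathbb{F}$) and

$$\sum_{x \in \mathbb{F}} \psi(x) = \begin{cases} q & \text{if } \psi = 1 \text{ is the trivial character}, \\ 0 & \text{otherwise.} \end{cases}$$

The description of characters of the multiplicative group $\mathbb{F}^{*}$ (also called multiplicative characters of $\mathbb{F}$) is not so explicit. The group structure of $\mathbb{F}^{*}$ is well-known (dating back to Gauss): it is a cyclic group of order $q - 1$. Generators of $\mathbb{F}^{*}$ are called primitive roots, and there are $\varphi(q-1)$ of them. Fixing one, say $z$, gives an isomorphism $\log : \mathbb{F}^{*} \to \mathbb{Z}/(q-1)\mathbb{Z}$, $x \mapsto n$ such that $z^n = x$, and all multiplicative characters of $\mathbb{F}$ are expressed as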
$$\chi(x) = e\Bigl(\frac{a\log(x)}{q-1}\Bigr)$$

for some $a \in \mathbb{Z}/(q-1)\mathbb{Z}$, but such a description is usually of no use in analytic number theory. As examples of multiplicative characters, suppose $\mathbb{F} = \mathbb{Z}/p\mathbb{Z}$ and $p \ne 2$. Then the Legendre symbol

$$x \mapsto \Bigl(\frac{x}{p}\Bigr)$$

is a non-trivial quadratic character. In general, if $\delta \mid (q-1)$, there is a cyclic group of order $\delta$ consisting of the characters $\chi$ of $\mathbb{F}^{*}$ whose order divides $\delta$. The orthogonality relations become

$$\sum_{\chi} \chi(x) = \begin{cases} q-1 & \text{if } x = 1, \\ 0 & \text{otherwise,} \end{cases}$$

(the sum over all multiplicative characters), and

$$\sum_{x \in \mathbb{F}^{*}} \chi(x) = \begin{cases} q-1 & \text{if } \chi = 1 \text{ is the trivial character}, \\ 0 & \text{otherwise.} \end{cases}$$
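For a prime field the discrete-logarithm description of multiplicative characters is easy to make explicit, even if it is of little use analytically. The following sketch, with illustrative helper names, finds a primitive root of $\mathbb{F}_p$ by brute force, builds $\chi_a(x) = e(a\log(x)/(p-1))$, and lets one check both orthogonality relations numerically.

```python
import cmath
import math

def primitive_root(p):
    """Smallest generator of (Z/pZ)^* for a prime p, found by brute force."""
    for z in range(1, p):
        if len({pow(z, k, p) for k in range(1, p)}) == p - 1:
            return z
    raise ValueError("no primitive root found; is p prime?")

def discrete_log_table(p):
    """Map x -> n with z^n = x, where z is the primitive root chosen above."""
    z = primitive_root(p)
    return {pow(z, n, p): n for n in range(p - 1)}

def character(a, p, log):
    """chi_a(x) = e(a * log(x) / (p - 1)) for x in F_p^*."""
    return lambda x: cmath.exp(2j * math.pi * a * log[x % p] / (p - 1))

if __name__ == "__main__":
    p = 7
    log = discrete_log_table(p)
    chars = [character(a, p, log) for a in range(p - 1)]
    # sum over all characters at a fixed x: p - 1 at x = 1, and 0 otherwise
    print([round(abs(sum(chi(x) for chi in chars)), 9) for x in range(1, p)])
```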
It is usual to extend multiplicative characters to $\mathbb{F}$ by defining $\chi(0) = 0$ if $\chi \ne 1$, and $\chi(0) = 1$ if $\chi = 1$. Notice then that for any $\delta \mid q - 1$ the formula

$$\sum_{\chi^{\delta} = 1} \chi(x) = \bigl|\{y \in \mathbb{F} \mid y^{\delta} = x\}\bigr| \tag{11.6}$$

(also a particular case of the orthogonality relations for the group $\mathbb{F}^{*}/(\mathbb{F}^{*})^{\delta}$, as described in Chapter 3) is true for all $x \in \mathbb{F}$.

11.3. Exponential sums. Let $\mathbb{F} = \mathbb{F}_q$ be a finite field with $q = p^m$ elements, $p$ a prime. Exponential sums over $\mathbb{F}$ can be of various kinds. For the simplest case, consider a polynomial $P \in \mathbb{F}[X]$ and an additive character $\psi$, and define the sum

$$S(P) = \sum_{x \in \mathbb{F}} \psi(P(x)).$$
Slightly more generally, take a non-zero rational function $f = P/Q \in \mathbb{F}(X)$ and consider

$$S(f) = \sum_{\substack{x \in \mathbb{F} \\ Q(x) \ne 0}} \psi(f(x));$$

for instance, taking $q = p$ and $f(x) = ax + bx^{-1}$, we have $S(f) = S(a, b; p)$. Multiplicative characters can also be used, getting sums of the type

$$S_{\chi}(f) = \mathop{{\sum}^{*}}_{x \in \mathbb{F}} \chi(f(x))$$

(where the star in $\sum^{*}$ means here and henceforth that the summation extends to all $x$ which are not poles of $f$). For $q = p$, $\chi = \bigl(\frac{\cdot}{p}\bigr)$ (the Legendre symbol) and $f(x) \in \mathbb{Z}[X]$ a cubic polynomial without multiple roots modulo $p$, we see that $-S_{\chi}(f)$ is the $p$-th coefficient $a_p$ of the Hasse-Weil zeta function of the elliptic curve with equation $y^2 = f(x)$ (see Section 14.4). Still more generally, one can mix additive and multiplicative characters, and define sums such as

$$S_{\chi}(f, g) = \mathop{{\sum}^{*}}_{x \in \mathbb{F}} \chi(f(x))\psi(g(x)), \tag{11.7}$$

an example of which is the Salié sum $T(a, b; p)$ defined by

$$T(a, b; p) = \mathop{{\sum}^{*}}_{x \bmod p} \Bigl(\frac{x}{p}\Bigr) e\Bigl(\frac{ax + b\bar{x}}{p}\Bigr)$$

which occurs in the Fourier expansion of half-integral weight modular forms; see [16] for instance. In contrast with the seemingly simpler Kloosterman sums $S(a, b; p)$, the Salié sums $T(a, b; p)$ can be explicitly computed (see Lemma 12.4, and Corollary 21.9 for the uniform distribution of the "angles" of the Salié sums).
To end this list, we mention that all these definitions can again be generalized to sums in more than one variable, and that the summation variables can be restricted to the rational points of an algebraic variety defined over $\mathbb{F}$: some examples will appear in the survey sections of this chapter.

The exponential sums which directly arise in analytic number theory are sums over the prime field $\mathbb{Z}/p\mathbb{Z}$. However, the deeper understanding naturally requires considering sums over the extension fields $\mathbb{F}_{p^n}$. Indeed, the very reason for the success of algebraic methods lies in the fact that an exponential sum over $\mathbb{F}_p$ does not really come alone, but has natural "companions" over all the extension fields $\mathbb{F}_{p^n}$, and it is really the whole family which is investigated and which is the natural object of study. Those companion sums are easily defined: take the most general sum $S = S_{\chi}(f, g)$ we have introduced, then for $n \ge 1$ let

$$S_n = \mathop{{\sum}^{*}}_{x \in \mathbb{F}_n} \chi\bigl(N_{\mathbb{F}_n/\mathbb{F}}(f(x))\bigr)\,\psi\bigl(\operatorname{Tr}_{\mathbb{F}_n/\mathbb{F}}(g(x))\bigr) \tag{11.8}$$

where we use the multiplicative character $\chi \circ N$ and the additive character $\psi \circ \operatorname{Tr}$ of $\mathbb{F}_n$. All the sums $S_n$ are incorporated into a single object, the zeta function of the exponential sum, which is the formal power series $Z = Z_{\chi}(f, g) \in \mathbb{C}[[T]]$ defined by the formula

$$Z = \exp\Bigl(\sum_{n \ge 1} \frac{S_n}{n} T^n\Bigr).$$
Justification for the introduction of the zeta function comes from the following rationality theorem, conjectured by Weil, and proved by Dwork.

THEOREM 11.2 (DWORK). The zeta function $Z$ is the power series expansion of a rational function; more precisely, there exist coprime polynomials $P$ and $Q$ in $\mathbb{C}[T]$, with $P(0) = Q(0) = 1$, such that $Z = P/Q$.

As a corollary, denote by $(\alpha_i)$ (resp. $(\beta_j)$) the inverses of the roots (with multiplicity) of $P$ (resp. $Q$), so

$$P = \prod_{i} (1 - \alpha_i T), \qquad Q = \prod_{j} (1 - \beta_j T).$$

Then using the power-series expansion

$$\log\frac{1}{1-T} = \sum_{n \ge 1} \frac{T^n}{n}$$

we find that the formula $Z = P/Q$ is equivalent to the formula

$$S_n = \sum_{j} \beta_j^{n} - \sum_{i} \alpha_i^{n}$$

for any $n \ge 1$, which shows how the various sums $S_n$ are related. In particular, note that they satisfy a linear recurrence relation of order $d$ equal to the number $\deg P + \deg Q$ of roots $\alpha_i$, $\beta_j$.
COROLLARY 11.3. We have for any $n \ge 1$ the upper bound
In particular, (11.9)
A common abuse of language is to speak of the O
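$$S = \mathop{{\sum}^{*}}_{x \in \mathbb{F}} \chi(f(x))\psi(g(x)),$$

we also introduce for $a \in \mathbb{F}$,

$$S_a = \mathop{{\sum}^{*}}_{x \in \mathbb{F}} \chi(f(x))\psi_a(g(x)) = \mathop{{\sum}^{*}}_{x \in \mathbb{F}} \chi(f(x))\psi(a g(x)).$$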
Estimates on average over a for the first few power moments of Sa are often easily derived by elementary means, and they can be of great use in estimating S, even in addition to the methods of algebraic geometry. See the proof of Weil's bound for Kloosterman sums in Section 11.7 and the examples in Section 11.11. More general types of families have been (and still are) extensively studied by Katz; see for instance [Kl].
11.4. The Hasse-Davenport relation. We consider general Gauss sums over a finite field. Let $\mathbb{F} = \mathbb{F}_q$ be a finite field with $q = p^m$ elements, and let $\psi$ be an additive character and $\chi$ a multiplicative character of $\mathbb{F}$. The Gauss sum $G(\chi, \psi)$ is

$$G(\chi, \psi) = \sum_{x \in \mathbb{F}} \chi(x)\psi(x) \tag{11.10}$$

(recall that $\chi$ is extended to $\mathbb{F}$ by $\chi(0) = 1$ or $0$ according to whether $\chi$ is trivial or not). When $\chi$ is the Legendre symbol, one recovers quadratic Gauss sums. The associated sums over the extension fields are

$$G_n(\chi, \psi) = \sum_{x \in \mathbb{F}_n} \chi\bigl(N_{\mathbb{F}_n/\mathbb{F}}(x)\bigr)\,\psi\bigl(\operatorname{Tr}_{\mathbb{F}_n/\mathbb{F}}(x)\bigr)$$

and the zeta function is

$$Z(\chi, \psi) = \exp\Bigl(\sum_{n \ge 1} \frac{G_n(\chi, \psi)}{n} T^n\Bigr). \tag{11.11}$$
In this case, Dwork's Theorem was proved by Hasse and Davenport and is known as the Hasse-Davenport Relation.

THEOREM 11.4 (HASSE-DAVENPORT). Assume $\chi$ and $\psi$ are non-trivial. Then we have for any $n \ge 1$,

$$-G_n(\chi, \psi) = (-G(\chi, \psi))^n$$

or equivalently the zeta function is a linear polynomial

$$Z(\chi, \psi) = 1 + G(\chi, \psi)T.$$

Hence the only "root" for the Gauss sum is $G(\chi, \psi)$ itself. This can be estimated elementarily, as was done for the Gauss sums considered in Chapter 3.

PROPOSITION 11.5. We have

$$|G(\chi, \psi)| = \sqrt{q}$$

if neither $\chi$ nor $\psi$ is trivial, while

$$|G(1, \psi)| = \begin{cases} 0 & \text{if } \psi \text{ non-trivial}, \\ q & \text{if } \psi = 1, \end{cases} \qquad |G(\chi, 1)| = \begin{cases} 0 & \text{if } \chi \text{ non-trivial}, \\ q & \text{if } \chi = 1. \end{cases}$$
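The Hasse-Davenport relation and the size of the Gauss sum can be checked by hand in the smallest interesting case. The sketch below takes $\mathbb{F} = \mathbb{F}_3$, $\chi$ the Legendre symbol and $\psi(x) = e(x/3)$, and models $\mathbb{F}_9$ as $\mathbb{F}_3[i]/(i^2+1)$, which is an assumption of this example. In that model the Frobenius sends $a + bi$ to $a - bi$, so $\operatorname{Tr}(a + bi) = 2a$ and $N(a + bi) = a^2 + b^2$.

```python
import cmath
import math

P = 3  # base field F_3; F_9 is modelled as F_3[i]/(i^2 + 1)

def e(z):
    return cmath.exp(2j * math.pi * z)

def chi(t):
    """Legendre symbol mod 3, extended by chi(0) = 0."""
    return (0, 1, -1)[t % P]

def psi(t):
    """Additive character psi(t) = e(t/3) of F_3."""
    return e((t % P) / P)

def gauss_sum_base():
    """G(chi, psi) over F_3, as in (11.10)."""
    return sum(chi(x) * psi(x) for x in range(P))

def gauss_sum_ext():
    """G_2(chi, psi) over F_9: sum of chi(N(x)) psi(Tr(x))."""
    total = 0
    for a in range(P):
        for b in range(P):
            tr = 2 * a            # Tr(a + b i) = (a + b i) + (a - b i)
            nm = a * a + b * b    # N(a + b i) = (a + b i)(a - b i)
            total += chi(nm) * psi(tr)
    return total

if __name__ == "__main__":
    G1, G2 = gauss_sum_base(), gauss_sum_ext()
    print(abs(G1), math.sqrt(3))      # |G| = sqrt(q)
    print(-G2, (-G1) ** 2)            # Hasse-Davenport for n = 2
```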
PROOF. The last two statements are immediate, so assume neither $\chi$ nor $\psi$ is trivial. We have

$$|G(\chi, \psi)|^2 = \sum_{x, y \in \mathbb{F}^{*}} \chi(x)\bar{\chi}(y)\psi(x)\psi(-y) = \sum_{z \in \mathbb{F}^{*}} \chi(z) \sum_{y \in \mathbb{F}^{*}} \psi((z-1)y) = q$$

(on writing $z = xy^{-1}$, and by orthogonality, applied twice). □

We now turn to the proof of the Hasse-Davenport Relation. We consider the field $F = \mathbb{F}(X)$ of rational functions over $\mathbb{F}$ and the ring $R = \mathbb{F}[X]$ of polynomials. Recall that $R$ is a principal ideal domain. For $h \in R$ of degree $d \ge 0$, we define the norm $N(h) = q^d$. The zeta function of $F$ is the Dirichlet series (analogous to the Riemann zeta function)

$$\zeta_F(s) = \sum_{\substack{h \in R \\ h \text{ monic}}} N(h)^{-s}.$$
REMARK. This could also be written as a sum over the non-zero ideals $\mathfrak{a}$ in $R$,

$$\zeta_F(s) = \sum_{\mathfrak{a}} N(\mathfrak{a})^{-s},$$

where $N(\mathfrak{a}) = |R/\mathfrak{a}| = N(h)$ for any polynomial $h$ such that $\mathfrak{a} = (h)$. But we will work with polynomials to emphasize the elementary spirit here.
The Dirichlet series $\zeta_F(s)$ converges absolutely for $\operatorname{Re}(s) > 1$. Indeed, putting

$$n(d) = \bigl|\{h \in R \mid \deg(h) = d,\ h \text{ is monic}\}\bigr| = q^d$$

we obtain immediately

$$\zeta_F(s) = \sum_{d \ge 0} n(d) q^{-ds} = \sum_{d \ge 0} q^{d(1-s)} = (1 - q^{1-s})^{-1}.$$

On the other hand, unique factorization into irreducible polynomials yields an expression of $\zeta_F$ as an Euler product

$$\zeta_F(s) = \prod_{\substack{P \in R \\ P \text{ monic irreducible}}} (1 - N(P)^{-s})^{-1}$$

which is convergent for $\operatorname{Re}(s) > 1$. The first step in the proof of the Hasse-Davenport Relation consists of writing the zeta function of Gauss sums as an $L$-function for the field $F$. Let $H \subset F^{*}$ be the subgroup of rational functions which are quotients of monic polynomials, and $G \subset H$ a subgroup with the property
Then if $a : G \to \mathbb{C}^{*}$ is a character of the group $G$, it can be extended to a totally multiplicative function on the set of monic polynomials $h \in R$ by putting $a(h) = 0$ if $h \notin G$. The corresponding $L$-function is defined analogously to the classical $L$-functions by the Dirichlet series

$$L(s, a) = \sum_{\substack{h \in R \\ h \text{ monic}}} a(h)N(h)^{-s} = \prod_{P} (1 - a(P)N(P)^{-s})^{-1}$$

for $\operatorname{Re}(s) > 1$. For dealing with Gauss sums we consider the subgroup $G \subset H$ of rational functions $f$ defined and non-vanishing at 0. Define a character $\lambda$ on $G$ by

$$\lambda(h) = \chi(a_d)\psi(a_1)$$

for $h = X^d - a_1 X^{d-1} + \cdots + (-1)^d a_d \in R$. Clearly $\lambda$ is multiplicative on monic polynomials, and extends to a character of $G$. In this case we get the following:

LEMMA 11.6. We have $L(s, \lambda) = 1 + G(\chi, \psi)q^{-s}$.

PROOF.
We arrange the Dirichlet series for $L(s, \lambda)$ according to the degree of $h$:

$$L(s, \lambda) = \sum_{d \ge 0} \Bigl(\sum_{\deg(h) = d} \lambda(h)\Bigr) q^{-ds}$$

and evaluate each term in turn. For $d = 0$, the only monic polynomial occurring is $h = 1$, and $\lambda(1) = 1$. For $d = 1$, we have $h = X - a$ so that

$$\sum_{\deg(h) = 1} \lambda(h) = \sum_{a \in \mathbb{F}} \lambda(X - a) = \sum_{a \in \mathbb{F}} \chi(a)\psi(a) = G(\chi, \psi).$$

For any $d \ge 2$ we have

$$\sum_{\deg(h) = d} \lambda(h) = q^{d-2} \sum_{a_1, a_d \in \mathbb{F}} \chi(a_d)\psi(a_1) = 0$$

by orthogonality, because at least one of the characters $\chi$, $\psi$ is non-trivial. □
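On the other hand, appealing to the Euler product we will prove:

LEMMA 11.7. We have $L(s, \lambda) = Z(q^{-s})$, where $Z = Z(\chi, \psi)$ is the zeta function (11.11) associated with Gauss sums.

Theorem 11.4 follows from Lemmas 11.6 and 11.7.

PROOF OF LEMMA 11.7. Taking the logarithmic derivative of the Euler product, we get

$$-\frac{1}{\log q}\frac{L'(s, \lambda)}{L(s, \lambda)} = \sum_{P} \deg(P) \sum_{r \ge 1} \lambda(P)^{r} q^{-r\deg(P)s} = \sum_{n \ge 1} \Bigl(\sum_{rd = n}\ \sum_{\deg(P) = d} d\,\lambda(P)^{r}\Bigr) q^{-ns}$$

while, on the other hand,

$$-\frac{1}{\log q}\frac{d}{ds}\log Z(q^{-s}) = \sum_{n \ge 1} G_n(\chi, \psi) q^{-ns}.$$

It therefore suffices to prove the formula

$$\sum_{\substack{P \\ d = \deg(P) \mid n}} d\,\lambda(P)^{n/d} = G_n(\chi, \psi) \tag{11.12}$$

for $n \ge 1$, the equality of the logarithmic derivatives being sufficient to imply Lemma 11.7 since both sides are Dirichlet series with leading coefficient 1. To prove (11.12), let $P$ be one of the irreducible polynomials appearing on the left side, of degree $d \mid n$. Its roots, say $x_1, \ldots, x_d$, are in $\mathbb{F}_n$. Fix one root $x = x_i$ and write

$$P = X^d - a_1 X^{d-1} + \cdots + (-1)^d a_d.$$

We get

$$\operatorname{Tr}(x) = \frac{n}{d}\operatorname{Tr}_{\mathbb{F}_d/\mathbb{F}}(x) = \frac{n}{d}a_1, \qquad N(x) = \bigl(N_{\mathbb{F}_d/\mathbb{F}}(x)\bigr)^{n/d} = a_d^{n/d},$$

hence

$$\lambda(P)^{n/d} = \chi(a_d)^{n/d}\psi(a_1)^{n/d} = \chi(N(x))\psi(\operatorname{Tr}(x)).$$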
Summing over all roots of $P$ we derive

$$d\,\lambda(P)^{n/d} = \sum_{i=1}^{d} \chi(N(x_i))\psi(\operatorname{Tr}(x_i)),$$

and summing over all $P$ with $\deg(P) \mid n$, we get (11.12) by Lemma 11.1 since every element in $\mathbb{F}_n$ will appear exactly once as one of the roots $x_i$ for some $P$. □

11.5. The zeta function for Kloosterman sums.
Next we consider Kloosterman sums. Let $\mathbb{F}$ be a finite field with $q = p^m$ elements and this time consider additive characters $\psi$ and $\varphi$. We define the Kloosterman sum associated to $\psi$ and $\varphi$ by

$$S(\psi, \varphi) = -\sum_{x \in \mathbb{F}^{*}} \psi(x)\varphi(x^{-1}) \tag{11.13}$$

(the minus factor is only for cosmetic reasons). When $q = p$ is prime, and $\psi(x) = e(ax/p)$, $\varphi(x) = e(bx/p)$, we have therefore $S(\psi, \varphi) = -S(a, b; p)$.

The companion sums over the extension fields $\mathbb{F}_n$ are

$$S_n(\psi, \varphi) = -\sum_{x \in \mathbb{F}_n^{*}} \psi(\operatorname{Tr}(x))\varphi(\operatorname{Tr}(x^{-1}))$$

and the Kloosterman zeta function is

$$Z = Z(\psi, \varphi) = \exp\Bigl(\sum_{n \ge 1} \frac{S_n(\psi, \varphi)}{n} T^n\Bigr).$$

We will prove Dwork's Theorem in this case, which is due to Carlitz.

THEOREM 11.8. Assume that $\psi$ and $\varphi$ are non-trivial. Then

$$Z(\psi, \varphi) = \frac{1}{1 - S(\psi, \varphi)T + qT^2}.$$

The proof is very similar to that of Theorem 11.4. We put $R = \mathbb{F}[X]$, $F = \mathbb{F}(X)$ as before, and consider the same group $G \subset F^{*}$ of quotients of monic polynomials defined and non-vanishing at 0. We define a character $\eta : G \to \mathbb{C}^{*}$ by putting
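$$\eta(h) = \psi(a_1)\,\varphi(a_{d-1}/a_d)$$

for a monic polynomial $h \in G$, where we write (compare the previous section)

$$h = X^d + a_1 X^{d-1} + \cdots + a_{d-1}X + a_d$$

(with $a_d \ne 0$ since $h \in G$). The following computation verifies that $\eta$ is indeed a character of $G$: let $h' = X^e + b_1 X^{e-1} + \cdots + b_{e-1}X + b_e$ with $b_e \ne 0$, then

$$hh' = X^{d+e} + (a_1 + b_1)X^{d+e-1} + \cdots + (a_{d-1}b_e + a_d b_{e-1})X + a_d b_e$$

and

$$\eta(hh') = \psi(a_1 + b_1)\,\varphi\Bigl(\frac{a_{d-1}b_e + a_d b_{e-1}}{a_d b_e}\Bigr) = \psi(a_1)\psi(b_1)\,\varphi\Bigl(\frac{a_{d-1}}{a_d}\Bigr)\varphi\Bigl(\frac{b_{e-1}}{b_e}\Bigr) = \eta(h)\eta(h').$$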
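Recall that we extend $\eta$ to all $h \in R$ by putting $\eta(h) = 0$ for $h \notin G$.

LEMMA 11.9. For $\psi$ and $\varphi$ non-trivial, the $L$-function associated to $\eta$ is given by

$$L(s, \eta) = \sum_{h} \eta(h)N(h)^{-s} = 1 - S(\psi, \varphi)q^{-s} + q^{1-2s}.$$

PROOF. By arranging terms according to the degree of $h$, we write

$$L(s, \eta) = \sum_{d \ge 0} \Bigl(\sum_{\deg(h) = d} \eta(h)\Bigr) q^{-ds}$$

and evaluate the inner sums. For $d = 0$, we have only $h = 1$ and $\eta(1) = 1$. For $d = 1$, we have $h = X + a$ with $a \ne 0$, hence

$$\sum_{\deg(h) = 1} \eta(h) = \sum_{a \in \mathbb{F}^{*}} \eta(X + a) = \sum_{a \in \mathbb{F}^{*}} \psi(a)\varphi(a^{-1}) = -S(\psi, \varphi).$$

For $d = 2$, we get

$$\sum_{\deg(h) = 2} \eta(h) = \sum_{a_1 \in \mathbb{F}}\ \sum_{a_2 \in \mathbb{F}^{*}} \psi(a_1)\varphi(a_1/a_2) = q - 1 + \Bigl(\sum_{a \in \mathbb{F}^{*}} \psi(a)\Bigr)\Bigl(\sum_{b \in \mathbb{F}^{*}} \varphi(b)\Bigr) = q$$

by applying twice the orthogonality of characters, since neither $\psi$ nor $\varphi$ is trivial. Finally, for $d \ge 3$, we get

$$\sum_{\deg(h) = d} \eta(h) = q^{d-3} \sum_{a_1, a_{d-1} \in \mathbb{F}}\ \sum_{a_d \in \mathbb{F}^{*}} \psi(a_1)\varphi(a_{d-1}/a_d) = 0$$

since there is free summation over $a_1 \in \mathbb{F}$. □

LEMMA 11.10. For $\psi$ and $\varphi$ non-trivial, we have

$$Z(q^{-s})^{-1} = L(s, \eta) = 1 - S(\psi, \varphi)q^{-s} + q^{1-2s}.$$

This lemma completes the proof of Theorem 11.8.

PROOF. The $L$-function has an Euler product

$$L(s, \eta) = \prod_{P} (1 - \eta(P)N(P)^{-s})^{-1}.$$

Taking the logarithmic derivative we get

$$-\frac{1}{\log q}\frac{L'(s, \eta)}{L(s, \eta)} = \sum_{P} \deg(P) \sum_{r \ge 1} \eta(P)^{r} q^{-r\deg(P)s} = \sum_{n \ge 1} \Bigl(\sum_{rd = n} d \sum_{\deg(P) = d} \eta(P)^{r}\Bigr) q^{-ns}$$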
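and as before it suffices to prove the formula

$$\sum_{d = \deg(P) \mid n} d\,\eta(P)^{n/d} = -S_n(\psi, \varphi) \tag{11.14}$$

for $n \ge 1$. Let

$$P = X^d + a_1 X^{d-1} + \cdots + a_{d-1}X + a_d$$

be one of the irreducible polynomials on the left side of (11.14), of degree $d \mid n$, and $x_1, \ldots, x_d$ its roots, which lie in $\mathbb{F}_d$. We have for each $i$,

$$\operatorname{Tr}(x_i) = \frac{n}{d}\operatorname{Tr}_{\mathbb{F}_d/\mathbb{F}}(x_i) = -\frac{n}{d}a_1 \quad\text{and}\quad \operatorname{Tr}(x_i^{-1}) = \frac{n}{d}\operatorname{Tr}_{\mathbb{F}_d/\mathbb{F}}(x_i^{-1}) = -\frac{n}{d}\frac{a_{d-1}}{a_d}$$

(since $a_d^{-1}X^d P(X^{-1}) = X^d + \frac{a_{d-1}}{a_d}X^{d-1} + \cdots + a_d^{-1}$). Hence

$$\eta(P)^{n/d} = \psi\Bigl(\frac{n}{d}a_1\Bigr)\varphi\Bigl(\frac{n}{d}\frac{a_{d-1}}{a_d}\Bigr) = \psi(-\operatorname{Tr}(x_i))\,\varphi(-\operatorname{Tr}(x_i^{-1}))$$

and summing over the roots $x_i$, then over the polynomials $P$ of degree $d \mid n$, we obtain (11.14) by Gauss's Lemma again (after the change of variable $x \mapsto -x$). □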
Theorem 11.8 allows us to factor the Kloosterman zeta function

$$Z(\psi, \varphi) = \frac{1}{(1 - \alpha T)(1 - \beta T)}$$

where $\alpha$ and $\beta$ are complex numbers, and of course $\alpha + \beta = S(\psi, \varphi)$, $\alpha\beta = q$. In sharp contrast to the case of Gauss sums, however, the roots $\alpha$ and $\beta$ cannot be explicitly computed.

THEOREM 11.11 (WEIL). Assume that $\psi$ and $\varphi$ are non-trivial and $p \ne 2$. Then the roots $\alpha$ and $\beta$ for the Kloosterman sum $S(\psi, \varphi)$ satisfy $|\alpha| = |\beta| = \sqrt{q}$, and therefore we have

$$|S(\psi, \varphi)| \le 2\sqrt{q}. \tag{11.15}$$

We will prove Theorem 11.11 in the next two sections.

COROLLARY 11.12. Let $a$, $b$, $c$ be integers, $c$ positive. We have

$$|S(a, b; c)| \le \tau(c)\,(a, b, c)^{1/2}\,c^{1/2}. \tag{11.16}$$

PROOF. By the twisted multiplicativity (1.59) for Kloosterman sums, it suffices to consider $c = p^{\nu}$ with $p$ prime and $\nu \ge 1$. If $p \mid ab$, we have Ramanujan sums for which the result is easy (see (3.2), (3.3)). Otherwise, the case $\nu = 1$ follows from Theorem 11.11 for $p \ge 3$, and for $p = 2$ one checks immediately that the Kloosterman sums modulo 2 satisfy Theorem 11.11: we have $S(1, 1; 2) = 1$, and the associated zeta function is therefore $Z(T) = (1 + T + 2T^2)^{-1}$, where $1 + T + 2T^2$ has roots $(-1 \pm i\sqrt{7})/4$ of modulus $1/\sqrt{2}$. The case $p \nmid ab$ and $\nu \ge 2$ can be dealt with elementarily; see Exercise 1 of Chapter 12. □
EXERCISE 1. Consider a general Kloosterman-Salié sum
S(x.; 'lj!,
L x(x)'!j!(x)
and its associated companions Sn and zeta function Z, where 'ljJ and
'!j!(x) =
e(Tr~ax)),
e(Tr~x)).
11.6. Stepanov's method for hyperelliptic curves. We will prove Theorem 11.11 by deducing it from the Riemann Hypothesis for certain algebraic curves over finite fields. However, we use Stepanov's elementary method (see [Ste], [Sch], [Bo3]) instead of Weil's arguments. Let $\mathbb{F}$ be a finite field with $q$ elements, of characteristic $p$. We will only consider algebraic curves $C_f$ over $\mathbb{F}$ given by equations of the type

$$C_f : y^2 = f(x) \tag{11.17}$$

for some polynomial $f \in \mathbb{F}[X]$ of degree $m \ge 3$. We assume moreover the following condition:

(11.18) The polynomial $Y^2 - f(X) \in \mathbb{F}[X, Y]$ is absolutely irreducible

(i.e. it is irreducible over the algebraic closure of $\mathbb{F}$). This is a minimal regularity assumption on the curve $C_f$. It is easily seen to be equivalent to the condition that $f$ is not a square in $\bar{\mathbb{F}}[X]$, and we will use it in this form.

REMARK. Stepanov's method has been refined by Schmidt [Sch] and Bombieri [Bo3] and is capable of handling the general case of the Riemann Hypothesis for curves; the case of curves with equation of the type $y^d = f(x)$ is not much harder than the one treated here. We limit ourselves to the curves $C_f$ for simplicity, and because it suffices for the application to Kloosterman sums and elliptic curves. Note that curves of the type $y^2 = f(x)$ are instances of so-called hyperelliptic curves, which are quite naturally distinguished among algebraic curves (but not all hyperelliptic curves are of this form; see for instance elliptic curves in characteristics 2 and 3).

The problem we consider is that of estimating the number $|C_f(\mathbb{F})|$ of $\mathbb{F}$-rational points of $C_f$, i.e., the number $N$ of solutions $(x, y) \in \mathbb{F}^2$ to the equation $y^2 = f(x)$. We are especially interested in this question when $q$ is large (typically, as with exponential sums, the polynomial $f \in \mathbb{F}[X]$ is fixed, and we consider the $\mathbb{F}_n$-rational points for all $n \ge 1$), although we will obtain completely explicit inequalities.

THEOREM 11.13. Assume that $f \in \mathbb{F}[X]$ satisfies (11.18), and $m = \deg(f) \ge 3$. If $q > 4m^2$, then $N = |C_f(\mathbb{F})|$ satisfies

$$|N - q| < 8m\sqrt{q}.$$
Clearly we can assume that $p > 2$, as otherwise the map $y \mapsto y^2$ is an automorphism of $\mathbb{F}$ and $N = q$. Stepanov's idea, which was inspired by results of Thue [Thu] in diophantine approximation, is to construct an auxiliary polynomial of degree $r$, say, having zeros of high multiplicity (at least $\ell$, say) at the $x$-coordinates of points of $C_f(\mathbb{F})$. Hence one gets easily the inequality $N \le 2r\ell^{-1}$, the factor two being the highest possible multiplicity of a given $x$-coordinate among points in $C_f(\mathbb{F})$. This inequality turns out to be so strong that it gives the upper bound of the theorem (certainly a surprising fact!). A trick then deduces the lower bound from this.

We first distinguish among the points $(x, y)$ those with $y = 0$. Let $N_0$ be the number of distinct zeros of $f$ in $\mathbb{F}$, which is also the number of points $(x, 0) \in C_f(\mathbb{F})$. If $(x, y)$ is a point of $C_f$ with $y \ne 0$, it follows that $f(x)$ is a non-zero square in $\mathbb{F}$, which is true if and only if $g(x) = 1$ where

$$g = f^{c} \quad\text{with}\quad c = \tfrac12(q - 1).$$

Conversely, given $x \in \mathbb{F}$ with $g(x) = 1$, there are exactly two elements $y \in \mathbb{F}^{*}$ with $y^2 = f(x)$. Hence, writing

$$N_1 = \bigl|\{x \in \mathbb{F} \mid g(x) = 1\}\bigr| \tag{11.19}$$

it follows that

$$N = N_0 + 2N_1. \tag{11.20}$$
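The decomposition (11.20) is straightforward to test over a prime field. The sketch below, restricted to $q = p$ an odd prime and using illustrative names, counts the affine points of $y^2 = f(x)$ by brute force and independently computes $N_0$ and $N_1$ via $g = f^{(p-1)/2}$. The returned values can then be compared with Theorem 11.13.

```python
import math

def count_points(f_coeffs, p):
    """Return (N, N0, N1) for y^2 = f(x) over F_p, p an odd prime.

    f_coeffs lists the coefficients of f from the constant term up.
    """
    def f(x):
        return sum(c * pow(x, i, p) for i, c in enumerate(f_coeffs)) % p

    # number of y in F_p with y^2 equal to a given value
    square_roots = {}
    for y in range(p):
        square_roots[y * y % p] = square_roots.get(y * y % p, 0) + 1

    N = sum(square_roots.get(f(x), 0) for x in range(p))
    c = (p - 1) // 2
    N0 = sum(1 for x in range(p) if f(x) == 0)            # zeros of f
    N1 = sum(1 for x in range(p) if pow(f(x), c, p) == 1)  # g(x) = 1
    return N, N0, N1

if __name__ == "__main__":
    p, f_coeffs = 101, [1, 1, 0, 1]        # f(x) = x^3 + x + 1
    N, N0, N1 = count_points(f_coeffs, p)
    m = len(f_coeffs) - 1
    print(N, N0 + 2 * N1, abs(N - p), 8 * m * math.sqrt(p))
```

For $f(x) = x^3 + x + 1$ over $\mathbb{F}_5$ the values of $f$ are $1, 3, 1, 1, 4$, so $N_0 = 0$, $N_1 = 4$ and $N = 8$.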
We will estimate $N_1$ by following the strategy sketched above, but in order to handle the lower bound later, we generalize slightly and consider for any $a \in \mathbb{F}$ the set

$$S_a = \{x \in \mathbb{F} \mid f(x) = 0 \text{ or } g(x) = a\}. \tag{11.21}$$
To produce polynomials vanishing to a large order, we wish to use derivatives to characterize when this occurs. In characteristic 0, a polynomial $P$ has a zero of order $\ell$ at $x_0$ if and only if all the derivatives $P^{(i)}$ with $0 \le i < \ell$ vanish at $x_0$. In characteristic $p > 0$, however, this is no longer true if $\ell > p$, as the example of the polynomial $P = X^p$ shows, since $P^{(k)} = 0$ for all $k \ge 1$, in particular, $P^{(p)}(0) = 0$. A satisfactory solution follows by considering other differential operators.

DEFINITION. Let $K$ be any field. For any $k \ge 0$, the $k$-th Hasse derivative is the linear operator $E^k : K[X] \to K[X]$ defined by

$$E^k X^n = \binom{n}{k} X^{n-k}$$

for all $n \ge 0$, and extended to $K[X]$ by linearity. We also write $E = E^1$ (but beware that $E^k \ne E \circ E \circ \cdots \circ E$).

REMARK. From the binomial expansion

$$X^n = (X - a + a)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k}(X - a)^k,$$

and by linearity, we see that the value of $E^k P$ at a point $a \in K$, for $P \in K[X]$, is simply the coefficient of $(X - a)^k$ in the Taylor expansion of $P$ around $a$. This
11.
SUMS OVER FINITE FIELDS
283
explains the properties of the Hasse derivatives, but we cannot take this as a definition, because the values of a polynomial over a finite field do not characterize the polynomial. Note that for $K$ of characteristic $p > 0$, we get $E X^p = E^2 X^p = \cdots = E^{p-1} X^p = 0$, but $E^p X^p = 1 \ne 0$ and we see that the Hasse derivatives detect the zero of $X^p$ of order exactly $p$ at $0$. This is a general fact, as Lemma 11.16 will show.

LEMMA 11.14. The Hasse derivatives satisfy
\[ E^k(fg) = \sum_{j=0}^{k} (E^j f)(E^{k-j} g) \]
for all $f, g \in K[X]$, and more generally,
\[ E^k(f_1 \cdots f_r) = \sum_{j_1 + \cdots + j_r = k} (E^{j_1} f_1) \cdots (E^{j_r} f_r) \tag{11.22} \]
for $f_1, \ldots, f_r \in K[X]$.
PROOF. It suffices to consider $f = X^m$, $g = X^n$, and the first formula follows from the identity
\[ \binom{m+n}{k} = \sum_{j=0}^{k} \binom{m}{j} \binom{n}{k-j} \]
which is obvious from the combinatorial interpretation of the binomial coefficients. Then the second formula follows by induction. □

COROLLARY 11.15. (1) For all $k, r \ge 0$, and all $a \in K$, we have
\[ E^k (X - a)^r = \binom{r}{k} (X - a)^{r-k}. \]
(2) For all $k, r \ge 0$ with $k \le r$, and all $f, g \in K[X]$, we have
\[ E^k(f g^r) = h g^{r-k} \]
for some polynomial $h$ such that $\deg(h) \le \deg(f) + k \deg(g) - k$.

PROOF. For (1), we apply (11.22) to $f_1 = \cdots = f_r = X - a$, getting
\[ E^k (X - a)^r = \sum_{j_1 + \cdots + j_r = k} E^{j_1}(X - a) \cdots E^{j_r}(X - a) \]
and only terms with all $j_i \in \{0, 1\}$, $1 \le i \le r$, give non-zero contributions since $E^j(X - a) = 0$ for $j \ge 2$, from the definition. Hence (1) follows. For (2), we observe that if $k \le r$, we have $j_i = 0$ for at least $r - k$ of the indices attached to the factors $g$ in (11.22), which gives (2). □

LEMMA 11.16. Let $f \in K[X]$ and $a \in K$. Suppose that $(E^k f)(a) = 0$ for all $k < \ell$. Then $f$ is divisible by $(X - a)^\ell$.

PROOF. Let
\[ f = \sum_{0 \le i \le d} a_i (X - a)^i \]
be the Taylor expansion of $f$ around $a$. By (1) of Corollary 11.15, we obtain
\[ E^k f = \sum_{k \le i \le d} a_i \binom{i}{k} (X - a)^{i-k} \]
and evaluating at $a$ we get $a_k = 0$ for all $k < \ell$, hence $f$ is divisible by $(X - a)^\ell$ as claimed. □
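The contrast between ordinary and Hasse derivatives can be made concrete with a small Python sketch over a prime field $\mathbb{F}_p$. Polynomials are stored as coefficient lists (index $n$ holds the coefficient of $X^n$); the helper names `hasse`, `ordinary`, `mul`, `add` and `trim` are illustrative and not part of the text. The example reproduces the observation about $X^p$ and can be used to test the product rule of Lemma 11.14 on sample polynomials.

```python
from math import comb, perm

def trim(poly):
    """Drop trailing zero coefficients (poly[n] is the coefficient of X^n)."""
    while len(poly) > 1 and poly[-1] == 0:
        poly = poly[:-1]
    return poly

def hasse(poly, k, p):
    """k-th Hasse derivative over F_p: X^n -> binom(n, k) X^(n-k)."""
    return trim([comb(n, k) * poly[n] % p for n in range(k, len(poly))] or [0])

def ordinary(poly, k, p):
    """k-th ordinary derivative over F_p: X^n -> n(n-1)...(n-k+1) X^(n-k)."""
    return trim([perm(n, k) * poly[n] % p for n in range(k, len(poly))] or [0])

def mul(f, g, p):
    """Product of two polynomials over F_p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return trim(out)

def add(f, g, p):
    """Sum of two polynomials over F_p."""
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return trim([(a + b) % p for a, b in zip(f, g)])

p = 5
x_to_p = [0] * p + [1]                                      # the polynomial X^p
print([ordinary(x_to_p, k, p) for k in range(1, p + 1)])    # every ordinary derivative vanishes
print([hasse(x_to_p, k, p) for k in range(1, p + 1)])       # only E^p X^p = 1 is non-zero
```

Summing `mul(hasse(f, j, p), hasse(g, k - j, p), p)` over $0 \le j \le k$ with `add` and comparing with `hasse(mul(f, g, p), k, p)` gives a direct check of the first formula of Lemma 11.14.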
We need another technical lemma.

LEMMA 11.17. Let $K = \mathbb{F}$ be a finite field of characteristic $p$ with $q$ elements, and let $r = h(X, X^q) \in \mathbb{F}[X]$, where $h \in \mathbb{F}[X, Y]$. Then
\[ E^k r = (E_X^k h)(X, X^q) \]
for all $k < q$, where on the right side $E_X^k h$ denotes the Hasse derivative of $h$ performed with respect to $X$.

PROOF. It suffices to consider $h = X^n Y^m$, so we must prove that $E^k X^{n+mq} = (E^k X^n) X^{mq}$. From Lemma 11.14, we get
\[ E^k X^{n+mq} = \sum_{j=0}^{k} (E^{k-j} X^n)(E^j X^{mq}) \]
so it suffices to show that $E^j X^{mq} = 0$ for $0 < j < q$ to prove the lemma. But
\[ \binom{mq}{j} = \frac{mq}{j} \binom{mq-1}{j-1} = 0 \]
in characteristic $p$ (since $0 < j < q$, the power of $p$ dividing $j$ is smaller than the one dividing $q$), and the result follows. □
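On monomials, Lemma 11.17 amounts to the congruence $\binom{n+mq}{k} \equiv \binom{n}{k} \pmod p$ for $k < q$. A minimal Python sketch that tests this congruence on small ranges is given below; the function name and the ranges `n_max`, `m_max` are arbitrary illustrative choices.

```python
from math import comb

def lemma_11_17_holds(p, e, n_max=30, m_max=6):
    """Check binom(n + m*q, k) = binom(n, k) mod p for all k < q = p**e (small ranges)."""
    q = p ** e
    return all(
        comb(n + m * q, k) % p == comb(n, k) % p
        for n in range(n_max) for m in range(m_max) for k in range(q)
    )

print(lemma_11_17_holds(3, 2), lemma_11_17_holds(5, 1), lemma_11_17_holds(2, 3))
```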
We come to the heart of Stepanov's method, the construction of the auxiliary polynomial.

PROPOSITION 11.18. Assume that $q > 8m$, and let $\ell$ be an integer satisfying $m < \ell \le q/8$. Then there exists a non-zero polynomial $r \in \mathbb{F}[X]$ of degree
\[ \deg(r) < c\ell + 2m\ell(\ell - 1) + mq \]
which has a zero of order at least $\ell$ at all points $x \in S_a$ (recall $c = \tfrac{1}{2}(q - 1)$).

We will look, by the method of indeterminate coefficients, for a polynomial $r$ of the special form
\[ r = f^\ell \sum_{0 \le j < J} (r_j + s_j g) X^{jq} \tag{11.23} \]
for some polynomials $r_j, s_j \in \mathbb{F}[X]$, to be constructed, each of which has degree bounded by $c - m$. Hence such a polynomial $r$ has degree bounded by
\[ \deg(r) \le \ell m + c - m + cm + Jq \le (J + m)q. \tag{11.24} \]
The next lemma is crucial to ensure that $r \ne 0$. This is where we need the assumption (11.18).
LEMMA 11.19. We have $r = 0 \in \mathbb{F}[X]$ if and only if $r_j = s_j = 0 \in \mathbb{F}[X]$ for all $j$.

PROOF. We can assume (by a shift $X \mapsto X + a$ if necessary) that $f(0) \ne 0$. Suppose that $r = 0$ but not all $r_j$, $s_j$ are zero; let $k$ be the smallest index for which one of $r_k$, $s_k$ is non-zero. Dividing by $f^\ell X^{kq}$, we get from (11.23) the identity
\[ \sum_{k \le j < J} (r_j + s_j g) X^{(j-k)q} = 0, \]
that is, $h_0 + h_1 g = 0$, where
\[ h_0 = \sum_{k \le j < J} r_j X^{(j-k)q}, \qquad h_1 = \sum_{k \le j < J} s_j X^{(j-k)q}. \]
We square this equation (written as $h_0 = -h_1 g$, and using $g^2 = f^{q-1}$), then multiply both sides by $f$, getting
\[ h_0^2 f = h_1^2 f^q. \]
Since $f \in \mathbb{F}[X]$, we have
\[ f(X)^q = f(X^q) \equiv f(0) \pmod{X^q} \]
hence
\[ r_k^2 f \equiv s_k^2 f(0) \pmod{X^q}. \]
However, the degrees of the polynomials in this congruence are bounded by $2\deg(r_k) + m \le 2(c - m) + m < q$, and $2\deg(s_k) \le 2(c - m) < q$, respectively. So there must be equality $r_k^2 f = s_k^2 f(0)$, which contradicts the assumption (11.18) that $f$ is not a square in $\bar{\mathbb{F}}[X]$. □

We now evaluate the Hasse derivatives of $r$.

LEMMA 11.20. Let $k \le \ell$. Then there exist polynomials $r_j^{(k)}$, $s_j^{(k)}$, each one of degree $\le c - m + k(m - 1)$, such that
\[ E^k r = f^{\ell-k} \sum_{0 \le j < J} (r_j^{(k)} + s_j^{(k)} g) X^{jq}. \]

PROOF. We can write $r = h(X, X^q)$ where $h \in \mathbb{F}[X, Y]$ is the polynomial
\[ h = f^\ell \sum_{0 \le j < J} (r_j + s_j f^c) Y^j. \]
Hence by Lemma 11.17, we have
\[ E^k r = (E_X^k h)(X, X^q) = \sum_{0 \le j < J} \bigl( E^k(f^\ell r_j) + E^k(f^{\ell+c} s_j) \bigr) X^{jq}. \]
By (2) of Corollary 11.15, we have $E^k(f^\ell r_j) = f^{\ell-k} r_j^{(k)}$ and $E^k(f^{\ell+c} s_j) = f^{\ell-k+c} s_j^{(k)}$ with
\[ \deg(r_j^{(k)}) \le \deg(r_j) + k\deg(f) - k \le c - m + k(m - 1) \]
and $\deg(s_j^{(k)}) \le c - m + k(m - 1)$. This is the desired result. □
Recall that we wish $r$ to have zeros of order $\ge \ell$ at points in $S_a$ (see (11.21)). If $f(x) = 0$, clearly this is the case. So let $x \in S_a$, with $f(x) \ne 0$. Applying Lemma 11.20, we evaluate $E^k r$ at a point $x \in S_a$, using $g(x) = a$, and most importantly $x^q = x$:
\[ (E^k r)(x) = f(x)^{\ell-k} \sigma^{(k)}(x) \]
where $\sigma^{(k)} \in \mathbb{F}[X]$ is the polynomial
\[ \sigma^{(k)} = \sum_{0 \le j < J} (r_j^{(k)} + a s_j^{(k)}) X^j. \]
We can now prove Proposition 11.18: if $\sigma^{(k)} = 0$ for all $k < \ell$, Lemma 11.16 shows that $r$ has a zero of order $\ge \ell$ at all points in $S_a$. The system of equations
\[ \sigma^{(k)} = 0, \quad \text{for all } k < \ell \tag{11.25} \]
is a homogeneous system of linear equations, the unknowns being the coefficients of the polynomials $r_j$, $s_j$, the equations corresponding to the coefficients of the $\sigma^{(k)}$. We observe that $\deg(\sigma^{(k)}) < c - m + k(m - 1) + J$, so the number of equations does not exceed
\[ B = \ell(c - m + J) + \tfrac{1}{2}\ell(\ell - 1)(m - 1) \]
while, on the other hand, the number of coefficients of the $r_j$ and $s_j$ is at least $A = 2(c - m)J$. By choosing $J$ large enough, we can make $A > B$. Then the system (11.25) has a non-trivial solution, and by Lemma 11.19 this produces $r \ne 0$ such that $r$ has zeros of order $\ge \ell$ at all points $x \in S_a$. Taking
\[ J = \frac{\ell}{q}\bigl(c + 2m(\ell - 1)\bigr) \]
one can check that $A > B$ (recall that $2c = q - 1$ and $8\ell \le q$). The degree of $r$ is bounded by (11.24), which gives Proposition 11.18.
We now prove Stepanov's Theorem 11.13. First, let $a$ be arbitrary, and apply Proposition 11.18. Since the auxiliary polynomial $r$ is non-zero and vanishes to order $\ge \ell$ at points in $S_a$, we have
\[ \ell |S_a| \le \deg(r) \le c\ell + 2m\ell(\ell - 1) + mq \]
so $|S_a| \le c + 2m(\ell - 1) + mq\ell^{-1}$. We choose $\ell = 1 + [\sqrt{q}/2]$, which gives the bound
\[ |S_a| < c + 4m\sqrt{q}. \tag{11.26} \]
To prove Theorem 11.13, take first $a = 1$, getting
\[ N_0 + N_1 = |S_1| < \frac{q}{2} + 4m\sqrt{q} \]
hence the upper bound
\[ N = N_0 + 2N_1 \le 2(N_0 + N_1) < q + 8m\sqrt{q}. \tag{11.27} \]
To get a lower bound, by the factorization $X^q - X = X(X^c - 1)(X^c + 1)$ we have
\[ f(x)(g(x) - 1)(g(x) + 1) = 0 \]
for all $x \in \mathbb{F}$, hence $N_0 + N_1 + N_2 = q$ where $N_2 = |\{x \in \mathbb{F} \mid g(x) = -1\}|$. By (11.26) applied to $S_{-1}$, we have
\[ N_0 + N_2 = |S_{-1}| < \frac{q}{2} + 4m\sqrt{q} \]
hence
\[ N_1 = q - N_0 - N_2 > \frac{q}{2} - 4m\sqrt{q}, \]
and finally,
\[ N = N_0 + 2N_1 \ge 2N_1 > q - 8m\sqrt{q}. \tag{11.28} \]
Clearly (11.27) and (11.28) prove Theorem 11.13.

11.7. Proof of Weil's bound for Kloosterman sums.

Let $\mathbb{F}$ be a finite field with $q$ elements, of characteristic $p \ne 2$. Let $\psi$ be any fixed non-trivial additive character of $\mathbb{F}$. For any additive character $\varphi$ there exists a unique $a \in \mathbb{F}$ such that $\varphi = \psi_a$, hence any Kloosterman sum $S(\psi, \varphi)$ is of the form
\[ S(\psi_a, \psi_b) = \sum_{x \in \mathbb{F}^*} \psi(ax + bx^{-1}) \]
for some $a, b \in \mathbb{F}$. We consider $a$ and $b$ as fixed and write $g = aX + bX^{-1}$. We will prove Weil's bound (11.15) by relating the average of the Kloosterman sums $S(\psi_a, \psi_b)$ over $\psi$ to the number of points on a hyperelliptic curve, where the contribution of the trivial character $\psi_0 = 1$ will be the main term.

LEMMA 11.21. For any $n \ge 1$ and any $x \in \mathbb{F}_n$, we have
\[ \sum_{\psi} \psi(\operatorname{Tr}(x)) = |\{y \in \mathbb{F}_n \mid y^q - y = x\}| \tag{11.29} \]
where the sum ranges over all additive characters of $\mathbb{F}$ and $\operatorname{Tr}$ is the trace $\mathbb{F}_n \to \mathbb{F}$.
PROOF. If $\operatorname{Tr}(x) = 0$, then the equation $y^q - y = x$ has exactly $q$ solutions, as recalled in Section 11.2, and in this case we have $\psi(\operatorname{Tr}(x)) = 1$ for all $\psi$, hence the character sum in (11.29) is also equal to $q$. On the other hand, if $\operatorname{Tr}(x) \ne 0$, the equation $y^q - y = x$ has no solution, and the character sum is zero by orthogonality. □
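From this lemma we deduce that
\[ \sum_{\psi} \sum_{x \in \mathbb{F}_n^*} \psi(\operatorname{Tr}(g(x))) = |\{(x, y) \in \mathbb{F}_n^* \times \mathbb{F}_n \mid y^q - y = g(x)\}| = N_n, \text{ say,} \tag{11.30} \]
for $n \ge 1$. If $\psi = \psi_0$, the trivial character, we have
\[ \sum_{x \in \mathbb{F}_n^*} \psi_0(\operatorname{Tr}(g(x))) = q^n - 1. \]
For $\psi \ne \psi_0$, let $\alpha_\psi$, $\beta_\psi$ be the "roots" of the Kloosterman sum $S(\psi_a, \psi_b)$, so by Theorem 11.8 we have $\alpha_\psi \beta_\psi = q$ and
\[ \sum_{x \in \mathbb{F}_n^*} \psi(\operatorname{Tr}(g(x))) = -(\alpha_\psi^n + \beta_\psi^n) \]
for all $n \ge 1$. We can therefore write
\[ N_n = q^n - 1 - \sum_{\psi \ne \psi_0} (\alpha_\psi^n + \beta_\psi^n). \]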
The equation $y^q - y = g(x)$ does not obviously describe a curve, since $g$ is not a polynomial, but multiplying by $x$ it is equivalent with
\[ C_{a,b} : ax^2 - (y^q - y)x + b = 0 \]
(note that $x = 0$ is not possible since $b \ne 0$). Because $p \ne 2$, the number of solutions is equal to the number of solutions of the discriminant equation of this quadratic equation
\[ D_{a,b} : (y^q - y)^2 - 4ab = v^2, \]
i.e., $N_n = |D_{a,b}(\mathbb{F}_n)|$. This is of the form (11.17) with $\deg(f) = 2q$, and because $4ab \ne 0$ it satisfies (11.18). Hence by Theorem 11.13 we have
\[ |N_n - q^n| < 16 q^{1+n/2} \]
if $n$ is large enough, so that $q^n > 16q^2$. By (11.30) we get a sharp estimate for the roots $\alpha_\psi$, $\beta_\psi$, on average
\[ \Bigl| \frac{1}{q} \sum_{\psi \ne \psi_0} (\alpha_\psi^n + \beta_\psi^n) \Bigr| \le 16 q^{n/2} \tag{11.31} \]
for $n$ large enough. The following simple lemma shows that the individual roots must be of modulus $\le \sqrt{q}$:

LEMMA 11.22. Let $\omega_1, \ldots, \omega_r$ be complex numbers, $A$, $B$ positive real numbers and assume that
\[ \Bigl| \sum_{j=1}^{r} \omega_j^n \Bigr| \le A B^n \]
holds for all integers $n$ large enough. Then $|\omega_j| \le B$ for all $j$.

PROOF. One can do this by hand (using Dirichlet's box principle), but a nice trick gives the result immediately: consider the complex power series
\[ f(z) = \sum_{n \ge 1} \Bigl( \sum_{1 \le j \le r} \omega_j^n \Bigr) z^n = \sum_{1 \le j \le r} \frac{\omega_j z}{1 - \omega_j z}. \]
The hypothesis implies that $f$ converges absolutely in the disc $|z| < B^{-1}$, hence $f$ is analytic in this region. In particular, it has no poles there, which means that we must have $|\omega_j|^{-1} \ge B^{-1}$ for all $j$. □

From this lemma applied with $A = 16q$, $B = \sqrt{q}$, we deduce the upper bounds
\[ |\alpha_\psi| \le \sqrt{q}, \quad |\beta_\psi| \le \sqrt{q} \]
for all $\psi \ne \psi_0$. Since $\alpha_\psi \beta_\psi = q$, we have in fact $|\alpha_\psi| = |\beta_\psi| = \sqrt{q}$, and so Theorem 11.11 is proved.
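For a prime field the objects of this section are easy to compute, and the following Python sketch may help to fix ideas. It assumes $q = p$ prime and the standard characters $\psi_t(x) = e^{2\pi i t x/p}$, so that $S(\psi_{ta}, \psi_{tb})$ becomes the classical Kloosterman sum; the function names `kloosterman` and `n1` are illustrative. It checks Weil's bound $2\sqrt{p}$ for the non-trivial characters and the case $n = 1$ of (11.30), where $y^p - y = 0$ for every $y \in \mathbb{F}_p$.

```python
import cmath
from math import sqrt

def kloosterman(a, b, p):
    """Sum over x in F_p^* of exp(2 pi i (a x + b x^{-1}) / p), p prime."""
    total = 0
    for x in range(1, p):
        t = (a * x + b * pow(x, -1, p)) % p      # modular inverse needs Python 3.8+
        total += cmath.exp(2j * cmath.pi * t / p)
    return total

def n1(a, b, p):
    """N_1 = #{(x, y) in F_p^* x F_p : y^p - y = a x + b / x}; y^p - y = 0 for all y."""
    zeros = sum(1 for x in range(1, p) if (a * x + b * pow(x, -1, p)) % p == 0)
    return p * zeros

p, a, b = 13, 1, 1
sums = [kloosterman(t * a, t * b, p) for t in range(p)]   # t = 0 is the trivial character
print(max(abs(s) for s in sums[1:]) <= 2 * sqrt(p))        # Weil's bound for t != 0
print(round(sum(sums).real), n1(a, b, p))                  # both sides of (11.30) for n = 1
```

Over $\mathbb{F}_p$ itself the identity (11.30) is only orthogonality of characters; the strength of the argument in the text comes from using it over all extensions $\mathbb{F}_n$.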
REMARKS. (1) We see here twice how crucial the introduction of the companion sums $K_n$ is: first because the curve $D_{a,b}$ has very high degree, so Stepanov's bound $|N - q| < 8m\sqrt{q}$ is trivial when applied to $\mathbb{F}$ itself, and secondly because only by the consideration of all extension fields can we determine the exact order of magnitude of the roots, and obtain Weil's bound $|S(\psi, \varphi)| \le 2\sqrt{q}$.

(2) The constant 2 is optimal in Weil's bound for fixed $a$, $b$ and $q$. Indeed we have

This is a non-zero rational function with poles on the circle $|z| = 1/\sqrt{q}$, hence this is its radius of convergence. Therefore

This means that for any $\varepsilon > 0$ there exist infinitely many $n$ such that

It is conjectured (this follows from the Sato-Tate Conjecture for the angles of Kloosterman sums described in Chapter 21) that the Weil bound is also optimal when $a$, $b$ are fixed, $n = 1$, and $q = p \to +\infty$. However, this remains very much open. See the remark at the end of the introduction to Section 11.8 for the case of elliptic curves.
(3) Using the extension of Stepanov's method to curves of the type $y^d = f(x)$ and an analysis of the corresponding zeta function, one can prove the following estimate for complete character sums:

THEOREM 11.23. Let $\mathbb{F}$ be a finite field with $q$ elements and let $\chi$ be a non-trivial multiplicative character of $\mathbb{F}^*$ of order $d > 1$. Suppose $f \in \mathbb{F}[X]$ has $m$ distinct roots and $f$ is not a $d$-th power. Then for $n \ge 1$ we have
\[ \Bigl| \sum_{x \in \mathbb{F}_n} \chi(N(f(x))) \Bigr| \le (m - 1) q^{n/2}. \]
This is Theorem 2C', p. 43, of [Sch]. In particular, we get the following corollary which will be used in proving the Burgess bound for short character sums (Theorem 12.6).

COROLLARY 11.24. Let $\chi \pmod p$ be a non-principal multiplicative character. If one of the classes $b_\nu \pmod p$, $\nu = 1, \ldots, 2r$, is different from the remaining ones then
\[ \Bigl| \sum_{x \,(\mathrm{mod}\ p)} \chi((x + b_1) \cdots (x + b_r))\, \bar{\chi}((x + b_{r+1}) \cdots (x + b_{2r})) \Bigr| \le 2r p^{1/2}. \]

PROOF. Observe that
\[ \chi((x + b_1) \cdots (x + b_r))\, \bar{\chi}((x + b_{r+1}) \cdots (x + b_{2r})) = \chi(f(x)) \]
with
\[ f(x) = \prod_{1 \le j \le r} (x + b_j) \prod_{r+1 \le j \le 2r} (x + b_j)^{p-2}. \]
From the assumption, one of the $-b_\nu$ is a root of $f$ of order either $1$ or $p - 2$, which is coprime with the order $d \mid (p - 1)$ of $\chi$, so we can apply Theorem 11.23. □
11.8. The Riemann Hypothesis for elliptic curves over finite fields.

A particularly important case of the Riemann Hypothesis is that of elliptic curves. Historically this was first established by Hasse using global methods. In the notation of Section 11.6, this means that we consider curves $C_f$ with $\deg f = 3$, so the equation is of the form
\[ y^2 = x^3 + a_2 x^2 + a_4 x + a_6 \tag{11.32} \]
(the numbering reflects the traditional notation for elliptic curves, see Section 14.4). In contrast with that section, we emphasize that we are considering the affine curve, without the point at infinity. The cubic polynomial $f(x) = x^3 + a_2 x^2 + a_4 x + a_6$ cannot be a square, so this curve satisfies the assumption (11.18). Moreover, we assume that $f$ does not have a double root; this means that the curve $C$ is smooth (see Section 11.9), and it is a necessary condition for what follows. In this case, Theorem 11.13 implies that for $q > 36$ the number $N = |C(\mathbb{F})|$ satisfies $|N - q| < 24\sqrt{q}$. In Section 11.10 we will prove, as before for Kloosterman sums, the rationality and the functional equation of the corresponding zeta function, from which we will deduce:

THEOREM 11.25. Let $C$ be an elliptic curve over $\mathbb{F}$ given by
\[ C : y^2 = x^3 + a_2 x^2 + a_4 x + a_6 \]
with $a_i \in \mathbb{F}$. Then for all $n \ge 1$ we have
\[ \bigl| |C(\mathbb{F}_n)| - q^n \bigr| \le 2 q^{n/2}. \tag{11.33} \]

REMARKS. Theorem 11.25 is optimal. Indeed letting $n \to +\infty$, this follows as for Kloosterman sums from Lemma 11.22. However, it is also true in the horizontal sense as the following example shows: let $E/\mathbb{Q}$ be the elliptic curve with equation
\[ E : y^2 = x^3 - x \]
which has complex multiplication by $\mathbb{Z}[i]$. As before we consider the affine points, not the projective ones. The discriminant of $E$ is $64$ so $E$ can be reduced modulo $p$ to an elliptic curve over $\mathbb{Z}/p\mathbb{Z}$ for any odd prime $p$. One shows (for instance by relating $E$ to the curve $y^2 = x^4 + 4$ by changing $(x, y) \mapsto (yx^{-1}, 2x - y^2 x^{-2})$ for $(x, y) \ne (0, 0)$, see e.g. [14] or [IR]) that $|E(\mathbb{Z}/p\mathbb{Z})| = p$ if $p \equiv 3 \pmod 4$ and $|E(\mathbb{Z}/p\mathbb{Z})| = p - 2a_p$ if $p \equiv 1 \pmod 4$, where
\[ p = a_p^2 + b_p^2 \]
with $\pi = a_p + i b_p \equiv 1 \pmod{2(1 + i)}$ (this congruence determines $\pi$ up to conjugation). For any $\varepsilon > 0$, Theorem 5.36 (generalized slightly to add the congruence condition) shows that there exist infinitely many Gaussian primes $\pi \equiv 1 \pmod{2(1 + i)}$ such that $|\arg \pi| < \varepsilon$, hence
\[ \bigl| |E(\mathbb{Z}/p\mathbb{Z})| - p \bigr| = 2|a_p| \ge 2(1 - \varepsilon^2)\sqrt{p} \]
for infinitely many $p$.
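The point counts for $E : y^2 = x^3 - x$ quoted in this remark can be checked by brute force for small primes. In the Python sketch below, the function names are illustrative, and the normalization of $a_p$ is an elementary restatement of the congruence $a_p + i b_p \equiv 1 \pmod{2(1+i)}$ worked out for this example (dividing $a - 1 + ib$ by $2 + 2i$ shows it is equivalent to: $b$ even and $a + b \equiv 1 \pmod 4$); it is not a formula stated in the text.

```python
from math import isqrt, sqrt

def is_prime(n):
    """Trial division, sufficient for the small primes used here."""
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))

def affine_count(p):
    """Number of (x, y) in F_p^2 with y^2 = x^3 - x."""
    roots_of = [0] * p
    for y in range(p):
        roots_of[y * y % p] += 1
    return sum(roots_of[(x ** 3 - x) % p] for x in range(p))

def a_p(p):
    """For p = 1 mod 4: the a in p = a^2 + b^2 with b even and a + b = 1 mod 4."""
    for b in range(0, isqrt(p) + 1, 2):
        a = isqrt(p - b * b)
        if a * a + b * b == p:
            return a if (a + b) % 4 == 1 else -a
    raise ValueError("p is not a sum of two squares")

for p in (5, 13, 17, 29, 7, 11):
    n = affine_count(p)
    predicted = p - 2 * a_p(p) if p % 4 == 1 else p
    print(p, n, predicted, abs(n - p) <= 2 * sqrt(p))
```

The sign of $b$ plays no role in the normalization, which reflects the fact that the congruence determines $\pi$ only up to conjugation.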
Theorem 11.25 will be proved in Section 11.10 after some geometric and algebraic preliminaries. This goes a bit further away from the heart of analytic number theory, yet we include full details because the Hasse bound is also important as being the simplest case of the very important Deligne bound for Fourier coefficients of modular forms. The reader will also certainly appreciate the elegance and beauty of the geometry involved.

11.9. Geometry of elliptic curves.

In explaining the special geometric features of elliptic curves, we may as well consider a more general case. So let $k$ be an arbitrary field, $\bar{k}$ an algebraic closure, and let $C$ be the curve given by the equation
\[ C : y^2 = f(x) = x^3 + a_2 x^2 + a_4 x + a_6, \quad a_i \in k, \]
identified with the set of solutions $(x, y) \in \bar{k}^2$. We assume as before that $f$ does not have a double root. If $k'/k$ is any extension, we let $C(k')$ be the set of solutions in $(k')^2$.

The geometry of the elliptic curve becomes much clearer if we work with the projective version of the curve $C$, namely the curve $E$ in the projective plane given, in homogeneous coordinates $(x : y : z)$, by the equation
\[ y^2 z = x^3 + a_2 x^2 z + a_4 x z^2 + a_6 z^3. \]
Putting $z = 1$ gives back $C$; on the other hand, "at infinity", we are only adding one point: taking $z = 0$ yields $x = 0$, and all elements $(0 : y : 0)$ (with $y \ne 0$) correspond to a single point $\infty = (0 : 1 : 0)$ in the projective plane. Notice that this point $\infty$ is rational over the base field $k$, so that for all extensions $k'/k$ we have
\[ E(k') = C(k') \cup \{\infty\}. \]

The main property of the curve $E$ that we will use is the beautiful fact that its points form an abelian group, with identity element $\infty$. Throughout, $p$ denotes points on $E$, not the characteristic of the field $k$. The group law (denoted by $+$) is described by the geometric condition that for any three (distinct) points $p_1$, $p_2$ and $p_3$ in $E$, we have $p_1 + p_2 + p_3 = 0$ if and only if the three points are collinear (in the projective plane), and the opposite of a point $(x : y : z)$ is the point $(x : -y : z)$ (symmetry with respect to the $x$-axis). This way one can construct the sum of any two distinct points, by computing the equation of the line joining them, and taking the opposite (in the sense above) of the third intersection point with the curve. That there are exactly three intersection points follows immediately from the fact that the polynomial $f$ is of degree $3$. In addition, to compute the double $p + p$ of a point $p$, the same construction is done with the tangent line at $p$; the condition that $f$ has no double root ensures that this tangent line always exists. Also, because $f \in k[X]$, it follows easily that the $k'$-rational points $E(k')$, for any extension $k'/k$, form a subgroup of $E(\bar{k})$. We do not prove those facts here; completely elementary proofs, by computing explicitly the coordinates of the sum $p_1 + p_2$ of two points according to the recipe above and checking the abelian group axioms (associativity is the only difficulty), are fairly straightforward (see for instance [IR], ch. 18, 19).
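The chord and tangent recipe just described can be sketched in a few lines of Python over a prime field $\mathbb{F}_p$ ($p$ odd). The slope and coordinate formulas below are the usual ones obtained by intersecting the line $y = \lambda(x - x_1) + y_1$ with the cubic; they are an explicit version written for this illustration rather than formulas quoted from the text, and the names `make_curve`, `add`, `neg` are hypothetical. The point at infinity is represented by `None`.

```python
def make_curve(a2, a4, a6, p):
    """Group law on y^2 = x^3 + a2 x^2 + a4 x + a6 over F_p (p an odd prime, f squarefree).
    Points are pairs (x, y); None stands for the point at infinity."""
    def neg(P):
        return None if P is None else (P[0], -P[1] % p)

    def add(P, Q):
        if P is None:
            return Q
        if Q is None:
            return P
        (x1, y1), (x2, y2) = P, Q
        if x1 == x2 and (y1 + y2) % p == 0:
            return None                                    # vertical line: Q = -P
        if P == Q:
            # slope of the tangent line at P (here y1 != 0)
            lam = (3 * x1 * x1 + 2 * a2 * x1 + a4) * pow(2 * y1 % p, -1, p)
        else:
            # slope of the chord through P and Q
            lam = (y2 - y1) * pow((x2 - x1) % p, -1, p)
        x3 = (lam * lam - a2 - x1 - x2) % p                # third root of the cubic on the line
        y3 = (lam * (x1 - x3) - y1) % p                    # opposite of the third point
        return (x3, y3)

    points = [None] + [(x, y) for x in range(p) for y in range(p)
                       if (y * y - (x ** 3 + a2 * x * x + a4 * x + a6)) % p == 0]
    return points, add, neg

points, add, neg = make_curve(0, -1, 0, 11)    # y^2 = x^3 - x over F_11
print(len(points), add(points[1], points[2]))
```

Because the point set is finite, commutativity and associativity can be confirmed by looping over all pairs and triples of `points`, which is a cheap sanity check although of course not a proof.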
We now introduce some further geometric objects related to $E$ or, more generally, to any smooth, projective, algebraic curve. Thus consider again a more general case: let $k$ be an algebraically closed field, and let $E$ be a plane algebraic curve over $k$, i.e. given by an equation
\[ f(x, y, z) = 0 \]
for some homogeneous $f \in k[X, Y, Z]$. We identify $E$ with the set of points in the projective plane. We assume that $E$ is smooth, which means here that for any point $p = (x : y : z)$ of $E$, not all partial derivatives $\partial f/\partial X(p)$, $\partial f/\partial Y(p)$, $\partial f/\partial Z(p)$ are zero. In this case the line with equation
\[ \frac{\partial f}{\partial X}(p)(X - x) + \frac{\partial f}{\partial Y}(p)(Y - y) + \frac{\partial f}{\partial Z}(p)(Z - z) = 0 \]
is well-defined and is the tangent line to $E$ at $p$. For elliptic curves $y^2 = f(x)$, this smoothness condition is equivalent to the fact that the polynomial $f$ has no double roots, by a simple calculation. Let $C$ be the affine curve corresponding to $E$ given by
\[ C : f(x, y, 1) = 0 \]
in $k^2$. Let $g(X, Y) = f(X, Y, 1) \in k[X, Y]$. We define $k[C] = k[X, Y]/(g)$. Elements of $k[C]$ can be interpreted as functions on $C$. We assume that $(g)$ is a prime ideal (this is easily checked in the case of elliptic curves) so that $k[C]$ is an integral domain, and we let $k(C)$ or $k(E)$ be its quotient field, called the function field of $C$ or of $E$. It is a finite extension of the field $k(X)$ of rational functions over $k$ (for elliptic curves $y^2 = f(x)$, it is a quadratic extension $k(X)(\sqrt{f})$). We interpret elements of the function field as rational functions on $E$, so given a point $p \in E$ and an element
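\[ \operatorname{ord}_p : k(E)^\times \to \mathbb{Z}, \]
which gives the order of the zero (if $\ge 0$) or pole (if $< 0$) of a rational function at $p$. As a discrete valuation, it satisfies
\[ \operatorname{ord}_p(c) = 0 \text{ for } c \in k^*, \quad \operatorname{ord}_p(\varphi\psi) = \operatorname{ord}_p(\varphi) + \operatorname{ord}_p(\psi), \quad \operatorname{ord}_p(\varphi + \psi) \ge \min(\operatorname{ord}_p(\varphi), \operatorname{ord}_p(\psi)). \]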
ideal. Let $\pi$ be a generator; then $\mathfrak{m}_p^d$ is generated by $\pi^d$ for any $d \ge 1$. The order $\operatorname{ord}_p$ can be defined for

and extended to a homomorphism $k(E)^* \to \mathbb{Z}$. The properties above are then quite easy to check. For elliptic curves $y^2 = f(x)$, one can easily see that if $p \ne \infty$, and $p = (x, y)$ with $y \ne 0$, it is possible to take $\pi = X - x$. For $p = \infty$, one can take $\pi = X/Y$, and one finds that $\operatorname{ord}_\infty(X) = -2$, $\operatorname{ord}_\infty(Y) = -3$. Every non-zero element
and the second gives the degree of a divisor: Div(E)--. Z, deg { [p] >->1. As suggested by the notation, divisors of the type (-> rp defined by
\[ L(D) = \{0\} \cup \{\varphi \in k(E)^\times \mid (\varphi) + D \ge 0\}. \]
If $D = n_1[p_1] + \cdots + n_k[p_k] - m_1[q_1] - \cdots - m_j[q_j]$ with $n_i, m_i \ge 0$, then $L(D)$ consists of $0$ and of the functions $\varphi \ne 0$, regular outside the $p_i$, with:

(1) Poles of order at most $n_i$ at $p_i$, $1 \le i \le k$;

(2) Zeros of order at least $m_i$ at $q_i$, $1 \le i \le j$.

It follows immediately that if $D_2 \le D_1$, then $L(D_2) \subset L(D_1)$. Also, if $D_1 \sim D_2$, then writing $D_1 = D_2 + (\psi)$, the map $\varphi \mapsto \psi\varphi$ induces an isomorphism $L(D_1) \to L(D_2)$, in particular, $\ell(D_1) = \ell(D_2)$ only depends on the linear equivalence class of the divisor. The following interpretation is also clear: there is a bijection
\[ \mathbf{P}(L(D)) \to \{\text{Effective divisors linearly equivalent to } D\}, \quad \varphi \mapsto (\varphi) + D \tag{11.34} \]
between the projective space $\mathbf{P}(L(D))$ of $L(D)$ and the effective divisors linearly equivalent to $D$ (by definition of $L(D)$, the map has image in the set of effective divisors). This also requires the important fact that
\[ L(0) = k, \quad \ell(0) = 1. \tag{11.35} \]
In other words, an everywhere defined rational function on $E$ is constant: this is obvious for elliptic curves, since regularity on $C$ forces such a

LEMMA 11.26. Let $D$ be a divisor on $E$ with $\ell(D) > 0$. Then either $\deg(D) > 0$ or $D \sim 0$.

PROOF. If $\ell(D) > 0$, there is a non-zero element

THEOREM 11.27. Let $E$ be an elliptic curve over an algebraically closed field $k$. For any divisor $D$ on $E$, $\ell(D)$ is finite and we have the formula
\[ \ell(D) - \ell(-D) = \deg D. \tag{11.36} \]
Equivalently, by Lemma 11.26, we can compute $\ell(D)$ for any $D$ by:

1. If $\deg(D) \ge 0$, and $D \not\sim 0$, then $\ell(D) = \deg(D)$.
2. If $D \sim 0$, then $\ell(D) = 1$.
3. If $\deg(D) < 0$, then $\ell(D) = 0$.

This is the Riemann-Roch theorem specialized for elliptic curves. See the remark below for the general case. The simple proof for the Riemann-Roch theorem hinges on the remarkable interaction of the group structure on $E$ with the divisor group. Indeed, consider the map $\sigma : \mathrm{Div}(E) \to E$ defined by
\[ \sigma(n_1[p_1] + \cdots + n_k[p_k]) = n_1 p_1 + \cdots + n_k p_k, \]
the $+$ on the right side corresponding to the group law on $E$.
PROPOSITION 11.28. Let $D$ be a divisor on $E$. Then we have
\[ D \sim [\sigma(D)] + (\deg(D) - 1)[\infty]. \]
PROOF. In essence this "is" the group law itself: by induction, we need only consider $D = [p] + [q]$ and $D = [p] - [q]$ for some points $p$ and $q$. If $p$ or $q$ is the origin $\infty$, the result is obvious. So consider first the case of $D = [p] + [q]$ with $p, q \ne \infty$ and $p \ne q$. Then the equation $aX + bY + c = 0$ of the line joining $p$ and $q$ defines an element
(
~
0 we get
D = [p] + [q] ~ (
-[p] + [oo] ~ [-p]- [oo]
and hence $D \sim [\sigma(D)] + [\infty]$, as desired. If $p = q$, the equation of the tangent line gives $2[p] + [-2p] - 3[\infty] \sim 0$, and the result again follows. Finally if $D = [p] - [q]$, use (11.37) to reduce to the previous case:
\[ [p] - [q] = [p] + (-[q] + [\infty]) - [\infty] \sim [p] + [-q] - 2[\infty] \sim [p - q] - [\infty]. \]
□

PROOF OF THE RIEMANN-ROCH THEOREM. Notice that (11.36) for $D$ or $-D$ are equivalent. By Proposition 11.28, we have
\[ \ell(D) = \ell([\sigma(D)] + (\deg D - 1)[\infty]), \quad \ell(-D) = \ell([-\sigma(D)] + (1 - \deg D)[\infty]). \]
One of the two divisors on the right is effective, so we can assume that $D$ is effective and of the form $D = [p] + n[\infty]$ with $p \in E$ and $n \ge 0$. If $p = \infty$, then $D = m[\infty]$ with $m \ge 1$. We must prove $\ell(D) = m$. Any element
I n :s; m}l
= m.
Now we consider $D = [p] + n[\infty]$ with $n \ge 0$ and $p \ne \infty$. If $n = 0$, we can use the automorphism $q \mapsto q - p$ which sends $p$ to $\infty$ to get an isomorphism $L([p]) \simeq L([\infty])$ which implies $\ell([p]) = 1$.
If $n \ge 1$, we have $D \ge n[\infty]$ hence $L(n[\infty]) \subset L(D)$. Thus $n \le \ell(D)$. Moreover, because only a simple pole is allowed at $p$, we have $\ell(D) \le n + 1$: if $\pi_p$ is a function with a simple zero at $p$, we have a $k$-linear map
L(D)/ L(n[oo]) ___, k { 'P >--> (7rp
(X- x) = [p] + [-p] - 2[oo], (Y + y) = [-p] + [p'] + [p"]- 3[oo] for some p' and p". Let
I (cp)+D>O}
and we let $\ell_k(D) = \dim_k L_k(D)$. It is clear that $\ell_k(D) \le \ell(D)$.

THEOREM 11.29. For any $k$-rational divisor $D$, we have
\[ \ell_k(D) = \ell(D). \]
In particular the Riemann-Roch formula holds with $\ell_k(D)$ instead of $\ell(D)$.

PROOF. This is a special case of the following theorem, which is a formulation of Hilbert's Theorem 90 for $GL(n)$: let $k$ be a field, $\bar{k}$ an algebraic closure, $V$ a $\bar{k}$-vector space with an action of $G_k$. Then there is a basis of $V$ made of elements which are $G_k$-fixed (equivalently, let $V_k = V^{G_k}$, then $V = V_k \otimes \bar{k}$, or $\dim_k V_k = \dim_{\bar{k}} V$). For a proof, see for instance [Sil], Lemma II.5.8.1. □

REMARK. This theory adapts in the following way to more general algebraic curves (see for instance [Ha], IV): if $E$ is smooth and projective over $k$, one can define a certain divisor class $K$ (called the canonical class, and related to differentials on $E$). It has degree $\deg(K) = 2g - 2$ for some integer $g \ge 0$, called the genus of $E$, and the Riemann-Roch Theorem takes the form
\[ \ell(D) - \ell(K - D) = \deg(D) + 1 - g. \tag{11.38} \]
Elliptic curves correspond to $g = 1$; in this case the canonical class is trivial, and this reduces to Theorem 11.27. The case $g = 0$ corresponds to the projective line, and is also very easy. The proof of (11.38) is much more involved than the one for elliptic curves, since there is no group law on the curve which would help.

11.10. The local zeta function of elliptic curves.

Let $C$ be an elliptic curve over $\mathbb{F}$ given by (11.32). It is more convenient to use here the corresponding projective curve $E$, as described in Section 11.9. The zeta function of $E$ is defined as the formal power series
\[ Z(E) = \exp\Bigl( \sum_{n \ge 1} \frac{|E(\mathbb{F}_n)|}{n} T^n \Bigr). \tag{11.39} \]
We first relate $Z(E)$ to points on the curve, by giving its Euler product (compare Lemma 11.7). To do this we introduce some terminology, which comes from the language of schemes.

DEFINITION. Let $E$ be an elliptic curve over a finite field $\mathbb{F}$ with $q$ elements. A closed point of $E$ is the Galois orbit of a point $x_0 \in E(\bar{\mathbb{F}})$. The degree $\deg(x)$ of a closed point $x$ is the cardinality (necessarily finite) of the orbit and its norm is $Nx = q^{\deg(x)}$. The set of closed points of $E$ is denoted $|E|$.

This notion is analogous to that of an irreducible polynomial in $\mathbb{F}[X]$ used for the zeta functions of Gauss sums and Kloosterman sums. To every closed point $x \in |E|$ is associated an $\mathbb{F}$-rational divisor which is simply the formal sum of all the elements in the orbit. The degree of this divisor is the degree of $x$. Moreover, it is easy to see that the group of $\mathbb{F}$-rational divisors is the free abelian group generated by the divisors associated to closed points.

LEMMA 11.30. We have the Euler product expansion
\[ Z(E) = \prod_{x \in |E|} (1 - T^{\deg(x)})^{-1}, \tag{11.40} \]
where the product is over all closed points of $E$.

PROOF. This is very close to Lemmas 11.7 and 11.9. First by decomposing the points in $E(\mathbb{F}_n)$ in Galois orbits we obtain
\[ |E(\mathbb{F}_n)| = \sum_{d \mid n} d \sum_{\substack{x \in |E| \\ \deg(x) = d}} 1, \]
which is the analogue of Lemma 11.1. Then we have
\[ T \frac{Z'(E)}{Z(E)} = \sum_{n \ge 1} |E(\mathbb{F}_n)| T^n \]
and, on the other hand, this operator applied to the right side of (11.40) yields
\[ \sum_{x \in |E|} \deg(x) \sum_{n \ge 1} T^{n \deg(x)} = \sum_{n \ge 1} T^n \Bigl( \sum_{d \mid n} d \sum_{\substack{x \in |E| \\ \deg(x) = d}} 1 \Bigr) \]
hence the result. □
Using the Riemann-Roch Theorem, we now prove the rationality and functional equation of the zeta function.

THEOREM 11.31. The zeta function $Z(E)$ of an elliptic curve is a rational function. More precisely, it is of the form
\[ Z(E) = \frac{1 - aT + qT^2}{(1 - T)(1 - qT)} \tag{11.41} \]
where $a \in \mathbb{Z}$ is defined by the relations $|E(\mathbb{F})| = q + 1 - a$ or, in terms of $C$, $|C(\mathbb{F})| = q - a$. The zeta function satisfies the functional equation
\[ Z(E, (qT)^{-1}) = Z(E, T). \]

LEMMA 11.32. Let $d \ge 0$ and let $h_d(C)$ be the set of linear equivalence classes of $\mathbb{F}$-rational divisors of degree $d$. Then $h_d(C)$ is finite and $|h_d(C)| = |h_0(C)| \le |E(\mathbb{F})|$.

PROOF. For any rational divisor $D$, we have by Proposition 11.28 the equivalence $D \sim [\sigma(D)] + (\deg D - 1)[\infty]$, thus the linear equivalence class of $D$ (among divisors of a given degree) only depends on $\sigma(D)$. If $D$ is $\mathbb{F}$-rational, it follows that $\sigma(D) \in E(\mathbb{F})$, and therefore the inequality $|h_d(C)| \le |E(\mathbb{F})|$ holds. Moreover, it is clear that the map $D \mapsto D + d[\infty]$ with inverse $D \mapsto D - d[\infty]$ induces a bijection between $h_d(C)$ and $h_0(C)$. □
PROOF OF THEOREM 11.31. Since $\mathbb{F}$-rational divisors on $E$ are simply combinations with integer coefficients of divisors associated to closed points, the Euler product (11.40) gives the formal power series expression
\[ Z(E) = \sum_{D \ge 0} T^{\deg(D)} \]
where the sum is over all effective $\mathbb{F}$-rational divisors on $E$. Split the sum according to the degree $d$ of $D$; for $d = 0$, the only effective divisor is $D = 0$ so
\[ Z(E) = 1 + \sum_{d \ge 1} T^d \sum_{\substack{D \ge 0 \\ \deg(D) = d}} 1. \]
For each $d$, split further the sum over divisors of degree $d$ in linear equivalence classes. By Lemma 11.32, there are $h_0(C)$ equivalence classes for each $d$. For a given class (that of $D$ say), the contribution is the number of effective ($\mathbb{F}$-rational) divisors linearly equivalent to $D$. By (11.34) and Theorem 11.29, this is equal to
\[ |\mathbf{P}(L(D))| = \frac{q^{\ell_{\mathbb{F}}(D)} - 1}{q - 1} = \frac{q^{\ell(D)} - 1}{q - 1}. \]
Since $d \ge 1$, the Riemann-Roch theorem implies $\ell(D) = \deg(D) = d$, so the computation of $Z(E)$ is now straightforward
\[
\begin{aligned}
Z(E) &= 1 + \sum_{d \ge 1} T^d \sum_{\substack{D \ge 0 \\ \deg(D) = d}} 1 = 1 + h_0(C) \sum_{d \ge 1} \frac{q^d - 1}{q - 1} T^d \\
&= 1 + \frac{h_0(C)}{q - 1} \Bigl( \frac{qT}{1 - qT} - \frac{T}{1 - T} \Bigr) = 1 + \frac{h_0(C)\, T}{(1 - T)(1 - qT)} \\
&= \frac{1 - bT + qT^2}{(1 - T)(1 - qT)}
\end{aligned}
\tag{11.42}
\]
where $b$ is defined by $h_0(C) = q + 1 - b$. This proves the rationality, and gives the precise form, except that we need to prove that $a = b$, where $|E(\mathbb{F})| = q + 1 - a$, or equivalently $h_0(C) = |E(\mathbb{F})|$ (actually in Lemma 11.32, we have already shown $|h_0(C)| \le |E(\mathbb{F})|$, but we do not need it any more). To obtain this equality, start from the original definition (11.39) of $Z(E)$, and compare the coefficients of $T$ with (11.42): the latter is seen to imply that $|E(\mathbb{F})| = q + 1 - b = h_0(C)$. Finally the functional equation of $Z(E)$ is a formal consequence of (11.42). □

It is worth recording separately one of the last steps of the proof.

PROPOSITION 11.33. Let $E$ be an elliptic curve over a finite field $\mathbb{F}$, let $D$ be a divisor on $E$. Then $D$ is principal if and only if $\deg(D) = 0$ and $\sigma(D) = 0 \in E$. More precisely, the map $j : D \mapsto \sigma(D)$ is an isomorphism between the group of divisor classes of degree $0$ and $E(\bar{\mathbb{F}})$.

PROOF. A divisor $D$ is $\mathbb{F}_n$-rational for some $n \ge 1$; looking at $E$ over $\mathbb{F}_n$, it suffices to prove the isomorphism between classes of $\mathbb{F}$-rational divisors of degree $0$ and $\mathbb{F}$-rational points. But $j$ is a surjective ($j([x] - [\infty]) = x$) map between finite sets with the same cardinality ($|h_0(C)| = |E(\mathbb{F})|$). □
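The rational form (11.41) has a concrete consequence that is easy to test: expanding the logarithm gives $|E(\mathbb{F}_n)| = q^n + 1 - (\alpha^n + \beta^n)$ with $\alpha + \beta = a$ and $\alpha\beta = q$, so the single number $|E(\mathbb{F})|$ predicts all the others. The Python sketch below checks the case $n = 2$ by brute force for a curve $y^2 = x^3 + a_4 x + a_6$ over $\mathbb{F}_p$; the model $\mathbb{F}_{p^2} = \mathbb{F}_p[i]$ with $i^2 = -1$ (valid for $p \equiv 3 \pmod 4$), the function names and the sample curve are assumptions made for this illustration.

```python
def count_fp(a4, a6, p):
    """|E(F_p)| for E: y^2 = x^3 + a4 x + a6, including the point at infinity."""
    roots_of = [0] * p
    for y in range(p):
        roots_of[y * y % p] += 1
    return 1 + sum(roots_of[(x ** 3 + a4 * x + a6) % p] for x in range(p))

def count_fp2(a4, a6, p):
    """|E(F_{p^2})| by brute force, with F_{p^2} = F_p[i], i^2 = -1 (needs p = 3 mod 4)."""
    def mul(u, v):
        return ((u[0] * v[0] - u[1] * v[1]) % p, (u[0] * v[1] + u[1] * v[0]) % p)
    elements = [(s, t) for s in range(p) for t in range(p)]
    roots_of = {}                                  # roots_of[w] = number of y with y^2 = w
    for y in elements:
        sq = mul(y, y)
        roots_of[sq] = roots_of.get(sq, 0) + 1
    total = 1                                      # the point at infinity
    for x in elements:
        x3 = mul(mul(x, x), x)
        rhs = ((x3[0] + a4 * x[0] + a6) % p, (x3[1] + a4 * x[1]) % p)
        total += roots_of.get(rhs, 0)
    return total

p, a4, a6 = 7, 1, 1
n1 = count_fp(a4, a6, p)
a = p + 1 - n1                                     # |E(F)| = q + 1 - a
predicted_n2 = p * p + 1 - (a * a - 2 * p)         # alpha^2 + beta^2 = a^2 - 2q
print(n1, a, predicted_n2, count_fp2(a4, a6, p))
```

The same comparison can be repeated for other coefficients as long as the cubic has no double root modulo $p$, which is the smoothness assumption made throughout this section.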
This is the special case of the so-called Abel-Jacobi Theorem, for an elliptic curve over a finite field. It actually holds over any field, and a generalization to all (smooth projective) curves is the content of the theory of jacobian varieties associated to curves.

To conclude the proof of Theorem 11.25, we proceed as in the case of Kloosterman sums: from (11.41), we derive
\[ |E(\mathbb{F}_n)| - (q^n + 1) = -(\alpha^n + \beta^n) \]
where $1 - aT + qT^2 = (1 - \alpha T)(1 - \beta T)$. Then Lemma 11.22, applied with the input from Stepanov's Theorem 11.13, shows that $|\alpha| \le \sqrt{q}$, $|\beta| \le \sqrt{q}$, and since $\alpha\beta = q$, this concludes the proof.

EXERCISE 2. Assuming the general Riemann-Roch formula (11.38), prove that for a smooth projective algebraic curve $E$ of genus $g$ over a finite field $\mathbb{F}$ with $q$ elements, the zeta function
\[ Z(E) = \exp\Bigl( \sum_{n \ge 1} \frac{|E(\mathbb{F}_n)|}{n} T^n \Bigr) \]
is a rational function of the form
\[ Z(E) = \frac{P(T)}{(1 - T)(1 - qT)} \]
for some polynomial $P$ with integral coefficients and degree $2g$. [Hint: The question will arise whether there exist $\mathbb{F}$-rational divisor classes on $E$ of degree $1$ (which is obvious for elliptic curves since the point $\infty$ is $\mathbb{F}$-rational). The image of the degree map is $\delta\mathbb{Z}$ for some $\delta \mid (2g - 2)$ (the degree of the canonical class). Using this fact, find a preliminary form of the zeta function and analyze the poles to show that actually $\delta = 1$ (see [Mor2], 3.3).]
11.11. Survey of further results: a cohomological primer. The methods of Stepanov are very useful and, in certain circumstances they provide the best tools available today, especially when the genus of the curve is large compared to the cardinality of the finite field (see for instance the proof by Heath-Brown of non-trivial estimates for Hcilbronn sums [HB2]). However, the deepest understanding of exponential sums over finite fields and the greatest impact on classical problems of analytic number theory comes from the sophisticated concepts of algebraic number theory, especially the f-adic cohomology theory as developed by Grothendieck and his collaborators, which give a very powerful and flexible framework for working with very general exponential sums. The proof of the Riemann Hypothesis for varieties by Deligne [Del], and even more his far-reaching generalization [De2], are the basis for the extensive work of Katz, Lauman and others. It is beyond the scope of this book to discuss this theory in great detail. Let us direct the interested reader to the survey articles [Lau], [K2]. Study of the foundational basis of the f-adic theory can be started in [De3] and continued together with applications in the books of Katz, for instance [K3], [K4]. We will limit this section to a short introduction of the basic vocabulary and we will state a few of the most fundamental results in this language. We then include examples to show that such knowledge can already be very useful even when one is not familiar with the details and background of algebraic geometry. In Sections 11.4 and 11.5, we have shown that Gauss sums and Kloosterman sums can be related to analogues of Dirichlet characters over finite fields. The f-adic cohomological formalism which we now discuss can be thought as relating exponential sums, dually, to objects which are Galois-theoretic in nature. 
The exponential sums $S_n$ defined by (11.8) can be interpreted as sums over the algebraic curve $U_{f,g}$ consisting of the affine line $\mathbb{A}^1$ minus the poles of the rational functions $f$ and $g$. More generally, one wishes to consider exponential sums not only over curves but over more general varieties. We will use some basic vocabulary of algebraic geometry to describe such situations, but will illustrate them in the simpler case of curves. Already the case of (11.8) and $U_{f,g}$ is quite interesting. Let $\mathbb{F}$ be a finite field and $U/\mathbb{F}$ be a smooth algebraic variety of dimension $d \ge 0$ (technically, we assume as part of the smoothness assumption that $U$ is geometrically connected, and as part of being a variety that $U$ is quasi-projective). The simplest examples in dimension 1 are $U_{f,g}$, or smooth projective curves. In dimension $d > 1$, the most important examples are the affine $d$-space $\mathbb{A}^d$, with set of points $\mathbb{A}^d(\mathbb{F}) = \mathbb{F}^d$, and the projective $d$-space. The exponential sums over $U$ will be of the type

$$S_n = \sum_{x \in U(\mathbb{F}_n)} \chi(N(f(x)))\,\psi(\mathrm{Tr}(g(x))) \tag{11.43}$$

where $f$ and $g$ are $\mathbb{F}$-rational functions defined on $U$.
To $U/\mathbb{F}$ is associated the so-called arithmetic étale fundamental group $\pi_1(U)$, which "classifies" étale coverings $V \to U$ of $U$, and is the analogue both of the Galois group of a field and of the "ordinary" topological fundamental group. A morphism of algebraic varieties is étale if it is flat and unramified; if $U$ is a curve, this means that $V$ is a curve and that the morphism is non-constant and unramified. For the simpler purposes of exponential sums, the fundamental group can be considered somewhat as a black
box in what follows, but one should keep in mind that the elements $\gamma$ in $\pi_1(U)$ act as automorphisms of any étale covering $\pi : V \to U$ (i.e. $\pi(\gamma x) = \pi(x)$ for any $\gamma \in \pi_1(U)$ and $x \in V$), and that it is a functor: any map $U \to V$ between varieties induces a continuous group homomorphism $\pi_1(U) \to \pi_1(V)$. (One should fix a base-point in defining $\pi_1(U)$, but a more or less canonical choice exists, the so-called "generic point" of the scheme $U$.)

EXAMPLES. (1) Let $U$ be a single point $\{x\}$ defined over $\mathbb{F}$. Then $\pi_1(U)$ is the Galois group $\mathrm{Gal}(\bar{\mathbb{F}}/\mathbb{F})$.

(2) Let $U/\mathbb{F}$ be a smooth curve, not necessarily projective. There is an associated smooth projective curve $C/\mathbb{F}$ such that $U \subset C$ with complement a finite set $T$ of points. If $U = \mathbb{G}_m$ for instance, then $C = \mathbb{P}^1$ is the projective line, and $T = \{0, \infty\}$. The fundamental group can be described concretely as follows: let $K = \mathbb{F}(U) = \mathbb{F}(C)$ be the function field of $U$, i.e. the field of rational functions on $U$ or $C$ (if $C = \mathbb{P}^1$, then $K = \mathbb{F}(t)$ is the usual field of rational fractions). We have the Galois group $G_K = \mathrm{Gal}(\bar K/K)$ of $K$. For every closed point $x$ of $C$, there is the corresponding discrete valuation $\mathrm{ord}_x$ of $K$. This extends to the separable closure $\bar K$ of $K$, and gives rise to a decomposition group $D_x < G_K$ and an inertia group $I_x < D_x$ as in classical algebraic number theory, with the property that $D_x/I_x \simeq \mathrm{Gal}(\bar{\mathbb{F}}_q/\mathbb{F}_q)$, where $\mathbb{F}_q$ is the residue field of $x$, a finite field with $q = Nx$ elements. Then $\pi_1(U)$ "is" the quotient of $G_K$ by the smallest closed normal subgroup containing all inertia groups $I_x$ for $x$ a closed point of $U$.

Fix a prime number $\ell \ne p$. The objects used to interpret exponential sums over $U$ are the so-called $\ell$-adic sheaves on $U$. In the simpler cases, those will be "lisse", in which case there is a simpler alternate Galois-theoretic description which we take as definition.

DEFINITION. Let $U/\mathbb{F}$ be a smooth variety over a finite field. A lisse $\ell$-adic sheaf on $U$ is a continuous representation $\rho : \pi_1(U) \to GL(V)$ where $V$ is a finite dimensional $\bar{\mathbb{Q}}_\ell$-vector space. Continuity refers to the profinite topology on $\pi_1(U)$ and the $\ell$-adic topology on $V$.
Note the similarity with the definition of Galois representations of number fields (see Section 5.13). Because of the original definition of a sheaf, one usually denotes $\ell$-adic sheaves by curly letters $\mathcal{F}$, $\mathcal{G}$, etc. Notice that one can obviously speak of direct sums, tensor products, symmetric powers, etc., of lisse $\ell$-adic sheaves by performing the corresponding operations on the representations. Also one can speak of irreducible sheaves, etc. An important $\ell$-adic sheaf, denoted $\bar{\mathbb{Q}}_\ell(1)$, is obtained by considering the natural action of $\pi_1(U)$ on $\ell$-power roots of unity, which arises from the étale coverings where one simply extends the base field from $\mathbb{F}$ to its extension by roots of unity. This action is given by a certain character $\chi_\ell : \pi_1(U) \to \bar{\mathbb{Q}}_\ell^{\times}$. Using this sheaf, one defines Tate twists: if $\mathcal{F}$ is a lisse $\ell$-adic sheaf and $i \in \mathbb{Z}$, then one denotes by $\mathcal{F}(i)$ ($\mathcal{F}$ twisted $i$ times) the sheaf which corresponds to the action $\rho'$ of $\pi_1(U)$ on the same vector space but with $\rho'(\gamma) = \chi_\ell^{i}(\gamma)\rho(\gamma)$; in other words, $\mathcal{F}(1) = \mathcal{F} \otimes \bar{\mathbb{Q}}_\ell(1)$ for instance.
Exponential sums arise by looking at the action of the Frobenius elements at points of $U$. Let $x$ be a closed point of $U$, which can be seen as a Galois orbit of points in $U(\bar{\mathbb{F}})$. The fundamental group of the "point" $x$ is the Galois group $D_x$ of the residue field of $U$ at $x$, a field isomorphic to $\mathbb{F}_n$ where $n$ is the degree of $x$. By functoriality there is a map $D_x \to \pi_1(U)$. We have $D_x \simeq \mathrm{Gal}(\bar{\mathbb{F}}_n/\mathbb{F}_n)$ and the latter is generated (topologically) by the Frobenius morphism $\sigma$, so taking the image we get in $\pi_1(U)$ a well-defined conjugacy class $\sigma_x$, called the arithmetic Frobenius conjugacy class at $x$. In particular, for any $\ell$-adic sheaf one can speak of the trace $\mathrm{Tr}\,\rho(\sigma_x)$ without ambiguity. However, it turns out that it is the inverse $F$ of $\sigma$ (the so-called geometric Frobenius) which appears naturally in the cohomological description of exponential sums. We denote by $F_x$ the corresponding conjugacy class; it is called simply the Frobenius conjugacy class at $x$ (omitting the adjective geometric).

THEOREM 11.34. Let $U/\mathbb{F}$ be a smooth variety, let $f \ne 0$ and $g$ be $\mathbb{F}$-rational functions on $U$, let $\psi$ be an additive character and $\chi$ a multiplicative character of $\mathbb{F}$. Let $S_n = S_n(U, f, g, \chi, \psi)$ be the associated exponential sums over $U(\mathbb{F}_n)$ as in (11.43). Then there exists a lisse $\ell$-adic sheaf $\mathcal{F}$ on $U$ of degree 1 with the property that for all $n \ge 1$ we have

$$S_n = \sum_{x \in U(\mathbb{F}_n)} \mathrm{Tr}(F_x \mid \mathcal{F}) \tag{11.44}$$

where we denote $\mathrm{Tr}(g \mid \mathcal{F}) = \mathrm{Tr}(\rho(g) \mid V)$, $\mathcal{F}$ corresponding to the representation $\rho : \pi_1(U) \to GL(V)$.
To compare with the characters used to describe Gauss sums and Kloosterman sums, one should think of the latter as analogues of Dirichlet characters or Hecke characters, whereas the $\ell$-adic sheaves given by this theorem are analogues of Galois characters. The correspondence between the two concepts is an instance of reciprocity or class-field theory. We sketch the construction of $\mathcal{F}$ in the case where $\chi = 1$ and $g$ is a non-zero rational function on $U = \mathbb{A}^1 - \{\text{poles of } g\}$, over $\mathbb{F}$, which makes it clear that this is very closely related to the argument in Section 11.7. Consider the curve

$$C : y^q - y = g(x) \tag{11.45}$$

and notice that there is a surjective map $\pi : (x, y) \mapsto x$ from $C$ to $U$. For any $a \in \bar{\mathbb{F}}$, the equation $y^q - y - a = 0$ is separable, hence it has $q$ distinct roots in $\bar{\mathbb{F}}$. In fact the additive group of $\mathbb{F}$ acts on the roots by translation: if $y$ is a root and $z \in \mathbb{F}$, then $(y + z)^q - (y + z) = y^q - y = a$. Moreover, $\pi : C \to U$ is an étale covering (we have just seen it is everywhere unramified and surjective). In other words, $\pi$ is an étale Galois covering with Galois group isomorphic to the additive group $\mathbb{F}$ (coverings given by such equations are called Artin-Schreier coverings). The fundamental group $\pi_1(U)$ acts on $C$ by automorphisms of the covering, which means as translations by elements of $\mathbb{F}$ as above. This defines a surjective map
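$\varphi : \pi_1(U) \to \mathbb{F}$. Consider the representation $\rho$ of $\pi_1(U)$ induced from the trivial representation of $\pi_1(C)$ to $\pi_1(U)$, which can be described as the space

$$V = \{ f : \pi_1(U) \to \bar{\mathbb{Q}}_\ell \mid f(\tau\gamma) = f(\gamma) \text{ for any } \tau \in \pi_1(C) \}$$

(where $\tau \in \pi_1(C)$ is seen through the map $\pi_1(C) \to \pi_1(U)$ coming from $\pi$), on which $\pi_1(U)$ acts by translation on the right:

$$\rho(\gamma) f(\tau) = f(\tau\gamma).$$

The elements $f \in V$ depend only on $\pi_1(C)\backslash\pi_1(U) \simeq \mathbb{F}$ (i.e. on the automorphisms of the covering $C \to U$), which implies that $V \simeq \bar{\mathbb{Q}}_\ell^{\,q}$ is an $\ell$-adic sheaf on $U$ of degree $q$. The representation space $V$ can be decomposed over the additive characters $\psi$ of $\mathbb{F}$,

$$V = \bigoplus_{\psi} \mathcal{L}_\psi,$$

where $\mathcal{L}_\psi$ is the $\psi$-eigencomponent of $V$, namely

$$\mathcal{L}_\psi = \{ f \in V \mid \rho(\gamma) f = \psi(\varphi(\gamma)) f \text{ for all } \gamma \in \pi_1(U) \}.$$

It is easy to see that each $\mathcal{L}_\psi$ is an $\ell$-adic sheaf on $U$, and because $\rho$ is induced from the trivial representation, each $\mathcal{L}_\psi$ is of degree 1.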
Then for every additive character $\psi$, the $\ell$-adic sheaf on $U$ corresponding to $\mathcal{L}_{\bar\psi}$ is the sheaf satisfying (11.44) for the exponential sums $S_n(U, g, \psi)$. Indeed, if $x \in U(\mathbb{F}_n)$, and $y$ satisfies $y^q - y = g(x)$, then the Frobenius of $x$ acts on $y$ by $y^{q^n} = y + \mathrm{Tr}_{\mathbb{F}_n/\mathbb{F}}(g(x))$ since

$$y^{q^n} - y = (y^{q^n} - y^{q^{n-1}}) + (y^{q^{n-1}} - y^{q^{n-2}}) + \cdots + (y^q - y) = (y^q - y)^{q^{n-1}} + \cdots + (y^q - y) = \mathrm{Tr}(y^q - y) = \mathrm{Tr}\, g(x).$$

Hence $\varphi(\sigma_x) = \mathrm{Tr}\, g(x)$ and by definition of $\mathcal{L}_\psi$ it follows that $\sigma_x$ acts on $\mathcal{L}_\psi$ by multiplication by $\psi(\mathrm{Tr}\, g(x))$, hence $F_x = \sigma_x^{-1}$ acts by $\bar\psi(\mathrm{Tr}\, g(x))$, which gives (11.44). In particular, note that taking the trace for $\bar{\mathbb{Q}}_\ell$ on $C$ we derive

$$|C(\mathbb{F}_n)| = \sum_{\psi} S_n(U, g, \psi),$$

as in (11.30).
EXERCISE 3. (1) Let $S_n$ be the character sum (11.8) with $g = 0$ for some multiplicative character $\chi$ of $\mathbb{F}^*$ and some non-zero rational function $f \in \mathbb{F}(x)$, on the variety $U = \mathbb{A}^1 - \{\text{zeros and poles of } f\}$. Describe as above the construction of the sheaf $\mathcal{L}$ satisfying (11.44) in this case. [Hint: Use the cover $y^d = f(x)$, where $d$ is the order of the multiplicative character $\chi$.] (2) Let $S_n$ be as in (11.8), $U \subset \mathbb{A}^1$ the complement of the zeros and poles of $f$ and the poles of $g$. If $\mathcal{L}_\psi$ is the sheaf satisfying (11.44) for $f = 1$ and $\mathcal{L}_\chi$ is the sheaf satisfying (11.44) for $g = 0$, show that $\mathcal{L} = \mathcal{L}_\psi \otimes \mathcal{L}_\chi$ satisfies (11.44) for $S_n$.

EXAMPLES. (1) Even the case $\rho = 1$ is interesting when dealing with a general variety $U$. This "trivial" $\ell$-adic sheaf is denoted $\bar{\mathbb{Q}}_\ell$, and one has $S_n = |U(\mathbb{F}_n)|$. (2) For the sheaf $\bar{\mathbb{Q}}_\ell(1)$, notice that $\sigma_x$ acts by $\xi \mapsto \xi^q$ for any root of unity $\xi$ if $Nx = q$. Therefore $F_x$ acts by $\xi \mapsto \xi^{1/q}$ and in particular the only eigenvalue of $F_x$ is $q^{-1}$.
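The point count attached to the Artin-Schreier covering sketched before Exercise 3 can be checked numerically in the simplest situation. The following is a minimal sketch under stated assumptions, not the construction itself: it takes $n = 1$, $\mathbb{F} = \mathbb{F}_p$, a polynomial $g$ (so that $U = \mathbb{A}^1$), counts affine points only, and uses the additive characters $\psi_h(t) = e^{2\pi i h t/p}$. The function names and the sample polynomial are hypothetical.

```python
import cmath
import math

def count_curve_points(g, p):
    """Number of pairs (x, y) in F_p x F_p with y^p - y = g(x)."""
    return sum(1 for x in range(p) for y in range(p)
               if (pow(y, p, p) - y - g(x)) % p == 0)

def additive_sum(g, p, h):
    """Sum over x in F_p of psi_h(g(x)), with psi_h(t) = exp(2*pi*i*h*t/p)."""
    return sum(cmath.exp(2j * math.pi * h * (g(x) % p) / p) for x in range(p))

p = 7
g = lambda x: x**3 - x   # hypothetical sample polynomial; roots 0, 1, -1 modulo 7

points = count_curve_points(g, p)
# Sum of the exponential sums over all p additive characters (h = 0 is the trivial one).
character_total = sum(additive_sum(g, p, h) for h in range(p))
print(points, round(character_total.real))
```

Over $\mathbb{F}_p$ itself every $y$ satisfies $y^p - y = 0$, so both sides simply count $p$ times the number of roots of $g$; the content of the cohomological statement lies in the extensions $\mathbb{F}_n$, which this sketch does not cover.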
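Now in addition to $U/\mathbb{F}$ we consider its "extension of scalars" $\bar U/\bar{\mathbb{F}}$ over the algebraic closure of $\mathbb{F}$. There is a corresponding geometric fundamental group $\pi_1(\bar U)$, which sits in an exact sequence

$$1 \to \pi_1(\bar U) \to \pi_1(U) \to \mathrm{Gal}(\bar{\mathbb{F}}/\mathbb{F}) \to 1. \tag{11.46}$$

To every $\ell$-adic sheaf $\mathcal{F}$ on $U$ are associated the $\ell$-adic cohomology groups with compact support of $\bar U$ with coefficients in $\mathcal{F}$. Those are finite-dimensional $\bar{\mathbb{Q}}_\ell$-vector spaces, denoted $H_c^i(\bar U, \mathcal{F})$ for $i \ge 0$. The key point is that the Galois group of $\mathbb{F}$ acts naturally on $H_c^i(\bar U, \mathcal{F})$, and in particular, so do the Frobenius $\sigma$ and its inverse $F$, the geometric Frobenius. The key to the cohomological interpretation of exponential sums is the following.

GROTHENDIECK-LEFSCHETZ TRACE FORMULA. Let $U/\mathbb{F}$ be a smooth variety of dimension $d \ge 0$, $\mathcal{F}$ an $\ell$-adic sheaf on $U$. We have $H_c^i(\bar U, \mathcal{F}) = 0$ if $i > 2d$ and for any $n \ge 1$

$$\sum_{x \in U(\mathbb{F}_n)} \mathrm{Tr}(F_x \mid \mathcal{F}) = \mathrm{Tr}(F^n \mid H_c^0(\bar U, \mathcal{F})) - \mathrm{Tr}(F^n \mid H_c^1(\bar U, \mathcal{F})) + \cdots + \mathrm{Tr}(F^n \mid H_c^{2d}(\bar U, \mathcal{F})). \tag{11.47}$$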
Therefore to evaluate the exponential sums (11.43) using the associated sheaf, we need to know the traces, or equivalently the eigenvalues, of $F$ (equivalently, of $\sigma = F^{-1}$) acting on $H_c^i$ for $0 \le i \le 2d$. It turns out that in most cases $H_c^0$ and $H_c^{2d}$ are easy to compute:

PROPOSITION 11.35. Let $\mathcal{F}$ be a lisse $\ell$-adic sheaf on a smooth variety $U/\mathbb{F}$, corresponding to the representation $\rho$ of $\pi_1(U)$ on the $\bar{\mathbb{Q}}_\ell$-vector space $V$. We have

$$H_c^0(\bar U, \mathcal{F}) = \begin{cases} V^{\pi_1(\bar U)} & \text{if } U \text{ is projective,} \\ 0 & \text{if } U \text{ is not projective,} \end{cases} \tag{11.48}$$

and

$$H_c^{2d}(\bar U, \mathcal{F}) = V_{\pi_1(\bar U)}(-d), \tag{11.49}$$

where $V^G$ denotes the space of vectors invariant under the action of a group $G$ on an abelian group $V$, and $V_G$ denotes the space of co-invariants, the largest quotient of $V$ on which $G$ acts trivially. In both cases, the isomorphisms are canonical isomorphisms of vector spaces with an action of the Galois group of $\mathbb{F}$. Since $V$ is a representation of $\pi_1(U)$, the exact sequence (11.46) shows that $V^{\pi_1(\bar U)}$ and $V_{\pi_1(\bar U)}$ are acted on by $\mathrm{Gal}(\bar{\mathbb{F}}/\mathbb{F})$, "through" the given representation $\rho$.

This proposition shows that for a curve $U/\mathbb{F}$, the only "difficult" cohomology group is $H_c^1(\bar U, \mathcal{F})$.

EXAMPLE. Let $U = E/\mathbb{F}_p$ be an elliptic curve, $\mathcal{F} = \bar{\mathbb{Q}}_\ell$ the trivial sheaf. By the proposition one has (1) $H_c^0(\bar E, \bar{\mathbb{Q}}_\ell) = \bar{\mathbb{Q}}_\ell$, with trivial action of $F$ (since $\bar{\mathbb{Q}}_\ell$ is the trivial sheaf). (2) $H_c^2(\bar E, \bar{\mathbb{Q}}_\ell) = \bar{\mathbb{Q}}_\ell(-1)$, so by definition of the twist, $F$ acts by multiplication by $p$ (on roots of unity, i.e. on $\bar{\mathbb{Q}}_\ell(1)$, $\sigma$ acts by $\xi \mapsto \xi^p$, hence $F$ by multiplication by $p^{-1}$).
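The Lefschetz trace formula (11.47) gives

$$|E(\mathbb{F}_n)| = p^n + 1 - \mathrm{Tr}(F^n \mid H_c^1(\bar E, \bar{\mathbb{Q}}_\ell))$$

(compare Theorem 11.31). More generally, one derives the rationality of the zeta function directly from the Trace Formula.

COROLLARY 11.36. Let $U$, $S_n$ and $\mathcal{F}$ be as in Theorem 11.34. For $0 \le i \le 2d$, let $b_i = \dim H_c^i(\bar U, \mathcal{F})$ and

$$P_i(T) = \det(1 - FT \mid H_c^i(\bar U, \mathcal{F})) = \prod_{j=1}^{b_i} (1 - \alpha_{i,j} T).$$

We have

$$\exp\Big( \sum_{n \ge 1} \frac{S_n}{n} T^n \Big) = \prod_{0 \le i \le 2d} P_i(T)^{(-1)^{i+1}}$$

and for $n \ge 1$,

$$S_n = \sum_{0 \le i \le 2d} (-1)^i \sum_{j=1}^{b_i} \alpha_{i,j}^n. \tag{11.50}$$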
Theorems 11.4, 11.8 and the result of Exercise 1 are all special cases of this corollary, together with suitable computations of cohomology groups. The numbers $b_i$ are called the $\ell$-adic Betti numbers for $\mathcal{F}$. Of much greater importance, however, is Deligne's vast generalization of the Riemann Hypothesis [De2]. One starts with the following "local" definition:

DEFINITION. Let $w \in \mathbb{Z}$ be an integer. A lisse $\ell$-adic sheaf $\mathcal{F}$ on $U/\mathbb{F}$ is said to be pure of weight $w$ if for any closed point $x$ of $U$, all eigenvalues of $F_x$ acting on the $\bar{\mathbb{Q}}_\ell$-vector space $V$ associated to $\mathcal{F}$ are algebraic integers all conjugates of which have the same absolute value equal to $q^{w/2}$ where $q = Nx$ is the cardinality of the residue field.

For instance, the trivial sheaf $\bar{\mathbb{Q}}_\ell$ is pure of weight 0 (all eigenvalues 1). For any $i \in \mathbb{Z}$, $\bar{\mathbb{Q}}_\ell(i)$ is pure of weight $-2i$, and if $\mathcal{F}$ is pure of weight $w$, then $\mathcal{F}(i)$ is pure of weight $w - 2i$. For any exponential sum (11.43), the associated sheaf $\mathcal{F}$ is pure of weight 0 because the only eigenvalue at $x$ is the root of unity $\chi(N f(x))\psi(\mathrm{Tr}\, g(x))$.

THEOREM 11.37 (DELIGNE). Let $U/\mathbb{F}$ be a smooth variety and $\mathcal{F}$ a lisse $\ell$-adic sheaf on $U$, pure of weight $w$. Let $i \ge 0$ and let $\xi$ be any eigenvalue of the geometric Frobenius $F$ acting on $H_c^i(\bar U, \mathcal{F})$. Then $\xi$ is an algebraic integer, and if $\alpha \in \mathbb{C}$ is a conjugate of $\xi$, we have

$$|\alpha| \le q^{(w+i)/2}. \tag{11.51}$$

The conclusion is also phrased as saying that $H_c^i(\bar U, \mathcal{F})$ is mixed of weights $\le i + w$. If there is equality in (11.51), then $H_c^i(\bar U, \mathcal{F})$ is said to be pure (of weight $w + i$). In certain cases, one can apply duality theorems (for instance Poincaré duality) to deduce further that (11.51) is an equality.
REMARK. Although Deligne's proof is a monumental achievement of very deep algebraic geometry, it is an interesting fact that a crucial use is made of a generalization of the method of Hadamard and de la Vallée Poussin for proving non-vanishing of $L$-functions on the line $\mathrm{Re}(s) = 1$ (see Section 5.4). Similarly, in Deligne's first proof [De1], the ideas of the classical Rankin-Selberg method for modular forms are essential (specifically, Deligne acknowledges the influence of [Ra3]).

EXAMPLE. Let $C/\mathbb{F}$ be a smooth connected projective curve (for instance, an elliptic curve). By Proposition 11.35 as in the previous example, we have easily: (1) H2(
It follows that if $\alpha$ is one of (the complex conjugates of) the eigenvalues of $F$ on $H_c^1(\bar C, \bar{\mathbb{Q}}_\ell)$, then $p/\alpha$ is one also. Hence from Theorem 11.37, since $\bar{\mathbb{Q}}_\ell$ is pure of weight 0, both $|\alpha| \le \sqrt{p}$ and $|p/\alpha| \le \sqrt{p}$ hold, and one deduces that $|\alpha| = \sqrt{p}$. Thus

$$|C(\mathbb{F}_n)| = p^n + 1 - \sum_{i=1}^{2g} \alpha_i^n$$

where the $\alpha_i \in \bar{\mathbb{Q}}$ are the eigenvalues of $F$ on $H_c^1$. Estimating trivially now, we get

$$\big|\, |C(\mathbb{F}_n)| - (p^n + 1) \,\big| \le 2g\, p^{n/2},$$

recovering the Riemann Hypothesis, and in particular, Theorem 11.25 for the case $g = 1$.

In the case of exponential sums (11.43), the sheaf $\mathcal{F}$ is pure of weight 0, hence denoting $d(\mathcal{F}) = \max\{ i \mid H_c^i(\bar U, \mathcal{F}) \ne 0 \}$, we derive directly from (11.50) and (11.51) the bound

$$|S_n| \le \sum_{0 \le i \le d(\mathcal{F})} b_i\, q^{ni/2}, \tag{11.52}$$

for $n \ge 1$ and, in particular,

$$S_n \ll q^{n d(\mathcal{F})/2}. \tag{11.53}$$

As in the case of Kloosterman sums, the exponent $d(\mathcal{F})/2$ is best possible in this inequality. The bound $d(\mathcal{F}) \le 2d$ gives a trivial estimate (because $U$ is smooth of dimension $d$, it has about $q^{nd}$ points, as proved by the Riemann Hypothesis for the trivial sheaf $\bar{\mathbb{Q}}_\ell$). Any improvement of this trivial bound is equivalent to $H_c^{2d}(\bar U, \mathcal{F}) = 0$, and the square root cancellation often expected from heuristic reasoning is equivalent to $H_c^i(\bar U, \mathcal{F}) = 0$ for $i > d$. Although not always true, this turns out to hold "generically", as the analytic intuition suggests (see for instance Theorem 11.43 below).
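The point-count formula for curves can be tested by brute force in the case $g = 1$. The sketch below rests on assumptions that are not in the text: a sample curve $y^2 = x^3 + x + 1$ over $\mathbb{F}_5$, and a model of $\mathbb{F}_{25}$ as $\mathbb{F}_5[s]/(s^2 - 2)$ (2 is a non-residue modulo 5). It computes $\alpha_1 + \alpha_2$ from the count over $\mathbb{F}_p$, predicts the count over $\mathbb{F}_{p^2}$ from $\alpha_1^2 + \alpha_2^2 = (\alpha_1 + \alpha_2)^2 - 2p$, and compares with a direct count. All function names are hypothetical.

```python
import math

def count_points_fp(A, B, p):
    """Points of y^2 = x^3 + A*x + B over F_p, including the point at infinity."""
    affine = sum(1 for x in range(p) for y in range(p)
                 if (y * y - (x**3 + A * x + B)) % p == 0)
    return affine + 1

def count_points_fp2(A, B, p, d):
    """Same count over F_{p^2} = F_p[s]/(s^2 - d); the pair (a, b) stands for a + b*s."""
    def mul(u, v):
        return ((u[0] * v[0] + d * u[1] * v[1]) % p,
                (u[0] * v[1] + u[1] * v[0]) % p)

    def add(u, v):
        return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)

    elements = [(a, b) for a in range(p) for b in range(p)]
    # For each field element, record how many y satisfy y^2 = that element.
    square_counts = {}
    for y in elements:
        s = mul(y, y)
        square_counts[s] = square_counts.get(s, 0) + 1
    affine = 0
    for x in elements:
        rhs = add(add(mul(mul(x, x), x), mul((A % p, 0), x)), (B % p, 0))
        affine += square_counts.get(rhs, 0)
    return affine + 1

p, A, B, d = 5, 1, 1, 2                          # sample data (assumptions)
n1 = count_points_fp(A, B, p)
trace = p + 1 - n1                               # alpha_1 + alpha_2
predicted_n2 = p**2 + 1 - (trace**2 - 2 * p)     # uses alpha_1 * alpha_2 = p
n2 = count_points_fp2(A, B, p, d)
print(n1, trace, predicted_n2, n2, abs(trace) <= 2 * math.sqrt(p))
```

The final comparison `abs(trace) <= 2 * math.sqrt(p)` is the trivial estimate with $g = 1$ and $n = 1$.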
For the exponential sums (11.43) we have $d(\mathcal{F}) < 2d$, unless $\mathcal{F}$ is the trivial sheaf $\bar{\mathbb{Q}}_\ell$, so there is always a non-trivial bound. This follows from (11.49), since $\mathcal{F}$ is of degree 1 so the space of co-invariants is either the whole space (meaning the representation is trivial) or 0. However, this small gain is usually insufficient in applications. Another surprising consequence of Deligne's result and the discreteness of integers is the following "self-improving" statement:

COROLLARY 11.38. Let $S_n$ be an exponential sum as in (11.43) and $\mathcal{F}$ the associated sheaf. Suppose $w \ge 0$ is an integer such that

$$|S_n| \ll q^{w/2 + \theta}$$

for some $\theta \in [0, \tfrac12[$ and $n \ge 1$. Then we have $d(\mathcal{F}) \le w$, hence

$$|S_n| \ll q^{w/2}.$$
A second issue in applying the estimates (11.52) or (11.53) in the context of applications to analytic number theory is that we usually have $\mathbb{F} = \mathbb{Z}/p\mathbb{Z}$, with the prime number $p$ varying. In this case, whereas the variety $U$ can be defined over $\mathbb{Q}$ (or $\mathbb{Z}$) so that the sum is, for all $p$, over the $\mathbb{F}_p$-points of the reduction $U_p$ of $U$ modulo $p$, the sheaves $\mathcal{F}_p$ genuinely depend on $p$ (see the equation (11.45)), i.e. there is no theory of sheaves over $U/\mathbb{Z}$ giving each $\mathcal{F}_p$ by "reduction modulo $p$". (Katz has asked a number of times for such a theory of "exponential sums over $\mathbb{Z}$"; see e.g. [K2], but it remains elusive.) Thus, the Betti numbers

$$b_i(p) = \dim H_c^i(\bar U_p, \mathcal{F}_p)$$

of the cohomology groups can depend on $p$, and the applicability of the results above would be ruined, even with the Riemann Hypothesis, if these dimensions were not bounded in a reasonable way in terms of $p$. Fortunately, they are so bounded. The first general result in this direction is due to Bombieri [Bo4] for additive character sums (11.43) where $f = 1$, and was generalized by Adolphson and Sperber [AS1], [AS2] for general sums (their methods are $p$-adic, based on Dwork's original ideas). In general, those results bound the Euler characteristic

$$\chi_c(\mathcal{F}) = \sum_{i=0}^{2d} (-1)^i \dim H_c^i(\bar U, \mathcal{F}) = \sum_{0 \le i \le 2d} (-1)^i b_i$$
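of a sheaf $\mathcal{F}$ on $U/\mathbb{F}$, but further arguments of Katz [K5] show how to deduce bounds for

$$\sigma_c(\mathcal{F}) = \sum_{i=0}^{2d} \dim H_c^i(\bar U, \mathcal{F}) = \sum_{0 \le i \le 2d} b_i$$

(hence for $b_i \le \sigma_c(\mathcal{F})$) from those for $\chi_c(\mathcal{F})$.

THEOREM 11.39. Let $U/\mathbb{Q}$ be a smooth variety over $\mathbb{Q}$, $f$ and $g$ functions on $U$ with $f$ invertible. Let $\ell$ be a prime number and for all $p \ne \ell$ such that the reduction $U_p$ of $U$ modulo $p$ is smooth, let $\chi$ and $\psi$ be any multiplicative and additive characters of $\mathbb{F}_p$. Let $\mathcal{F}_p$ be an $\ell$-adic sheaf on $U_p$ such that

$$\sum_{x \in U_p(\mathbb{F}_{p^n})} \chi(N f(x))\,\psi(\mathrm{Tr}\, g(x)) = \sum_{x \in U_p(\mathbb{F}_{p^n})} \mathrm{Tr}(F_x \mid \mathcal{F}_p)$$

for $n \ge 1$. We have $\sigma_c(\mathcal{F}_p) \le C$ where $C$ is a constant depending only on $U$, $f$ and $g$.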
A simple explicit bound is given in [AS3] if $f(x) = 1$, so only the additive characters occur, and $g$ is a Laurent polynomial on $U = (\mathbb{A}^1 - \{0\})^d$. The sums over $\mathbb{Z}/p\mathbb{Z}$ in question are therefore sums in $d$ variables of the type (11.54), denoted $S_{f,p}$, where $f \in \mathbb{Q}[x_1, x_1^{-1}, \ldots, x_d, x_d^{-1}]$ is a non-zero Laurent polynomial. Writing

$$f = \sum_{j \in J} a_j x^j \qquad (a_j \ne 0,\ x^j = x_1^{j_1} \cdots x_d^{j_d})$$

for some (finite) set $J \subset \mathbb{Z}^d$, the Newton polyhedron $W(f)$ of $f$ is defined to be the convex hull in $\mathbb{R}^d$ of $J \cup \{0\}$.

PROPOSITION 11.40. With the above assumptions, denoting by $\mathcal{F}_{f,p}$ the associated sheaf for the sums $S_{f,p}$, we have

$$|\chi_c(\mathcal{F}_{f,p})| \le d!\,\mathrm{Vol}(W(f)), \qquad \sigma_c(\mathcal{F}_{f,p}) \le 10^d\, d!\,\mathrm{Vol}(W(f))$$

for any $p$ not dividing the denominator of any coefficient of $f$, where $\mathrm{Vol}(W(f))$ is the volume of the Newton polyhedron in the subspace spanned by $W(f)$ in $\mathbb{R}^d$, with respect to Lebesgue measure.
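When the Newton polyhedron is a simplex, the quantity $d!\,\mathrm{Vol}(W(f))$ in Proposition 11.40 is just the absolute value of a determinant of edge vectors. The following is a minimal sketch for that special case only, assuming the simplex already contains the origin (so that adding $0$ to the exponent set does not enlarge the hull); the helper names and the two sample exponent sets are illustrative choices, not taken from [AS3].

```python
from fractions import Fraction

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result          # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def normalized_simplex_volume(vertices):
    """d! * Vol of the simplex spanned by d + 1 vertices in R^d."""
    base = vertices[0]
    edges = [[a - b for a, b in zip(v, base)] for v in vertices[1:]]
    return abs(determinant(edges))

# Exponents of a*x + b/x in one variable: the interval [-1, 1].
one_variable = normalized_simplex_volume([(-1,), (1,)])
# Exponents of x + y + 1/(x*y) in two variables: a triangle containing the origin.
two_variables = normalized_simplex_volume([(1, 0), (0, 1), (-1, -1)])
print(one_variable, two_variables)
```

For a general Laurent polynomial the polyhedron is not a simplex and one would need a convex hull computation, which this sketch deliberately avoids.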
Note that by using exclusion-inclusion and detecting polynomial equations by means of multiplicative characters, one can use combinations of sums of the type (11.54) to describe much more general ones. Also, in many cases, one can show that all the odd (or all the even) cohomology groups vanish, in which case $|\chi_c(\mathcal{F})| = \sigma_c(\mathcal{F})$. See also Theorems 11 and 12 of [K5] for explicit estimates in quite general cases. We now give examples of computations using these fundamental results. For exponential sums arising in analytic number theory, one often needs nothing more, if one skillfully uses some other simple tricks such as averaging over extra parameters to analyze the weight of the roots.

EXAMPLE 1. The Kloosterman sums $S(a, b; p)$ for $ab \ne 0$ can be treated using Proposition 11.40 with $d = 1$ and $f(x) = ax + bx^{-1}$. Then $W(f)$ is the interval $[-1, 1]$. By Proposition 11.35, we have $H_c^0 = H_c^2 = 0$ in this case since $U = \mathbb{P}^1 - \{0, \infty\}$ is not projective, so $\sigma_c = -\chi_c$. By Theorem 11.37, $H_c^1$ is mixed of weight $\le 1$. Hence we recover the Weil bound:

$$|S(a, b; p)| \le b_1 \sqrt{p} \le 2\sqrt{p}.$$

(Of course, in fact we have $b_1 = 2$ and the last inequality is an equality.)

EXAMPLE 2. The previous example generalizes to the multiple Kloosterman sums $K_r(a, p)$ defined by (11.55) for $r \ge 2$ and $a \ne 0$, so $K_2(a, p) = S(a, 1; p)$ (see [Bo4], [De1]). Without appealing to the $L$-function, one can nevertheless get some information by averaging over $a$. We get

$$\sum_{a \ne 0} |K_r(a, q)|^2 = q^r - q^{r-1} - \cdots - q - 1. \tag{11.56}$$
Hence $|K_r(a, q)| \le q^{r/2}$. To improve this elementary bound we appeal to the following lemma.

LEMMA 11.41. Given a finite set of distinct angles $\theta_i$ modulo $2\pi$ and complex numbers $\alpha_i$ we have

$$\sum_{n \le N} \Big| \sum_i \alpha_i e(n\theta_i) \Big|^2 = N \|\alpha\|^2 + O(1)$$

where the implied constant does not depend on $N$. Hence

$$\limsup_{n \to +\infty} \Big| \sum_i \alpha_i e(n\theta_i) \Big| \ge \|\alpha\|. \tag{11.57}$$
L IL:a,e(nBif =NL:Iaif+ LLaiaJ L e(n(Bi-8
1 )).
n(N
i
i=;l:j
i
n(N
The inner sum is bounded by a constant independent of $N$, so the first result follows and (11.57) is an obvious consequence. □

From (11.56) and (11.57) it follows that among the $K_r(a, p)$, there is at most one root of weight $r$, say for $K_r(a_0, p)$, and all other roots are of weight $\le r - 1$. Notice that $K_r(a_0, p) \in \mathbb{Q}(\mu_p)$, the cyclotomic field of $p$-th roots of unity. Using the Galois action on $\mathbb{Q}(\mu_p)$, the conjugates of $K_r(a_0, p)$ are $K_r(a_0\nu^r, p)$ for $\nu \in \mathbb{F}_p^{\times}$. By the Riemann Hypothesis, this means that the conjugate of the root $\xi$ of weight $r$ is still a root of weight $r$ for $K_r(a_0\nu^r, p)$. Hence $\nu^r = 1$ for all $\nu \in \mathbb{F}_p^{\times}$, which is only possible if $p - 1 \mid r$. In particular all roots are of weight $\le r - 1$ if $p > r + 1$. One therefore gets by Proposition 11.40

$$K_r(a, q) \ll q^{(r-1)/2} \tag{11.58}$$

where the implied constant depends only on $r$. In the case of Kloosterman sums the Newton polyhedron is the simplex in $\mathbb{R}^{r-1}$ with vertices $(1, 0, \ldots, 0), \ldots, (0, \ldots, 0, 1), (-1, \ldots, -1)$, whose volume is $r/(r-1)!$. Moreover, it is known that the zeta function is a polynomial so $\chi_c = -\sigma_c$ and we get the precise estimate

$$|K_r(a, q)| \le r\, q^{(r-1)/2}.$$

This was first proved by Deligne [De3], without any assumption on $p$ and $r$.
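Both the averaging identity (11.56) and the final estimate can be checked by brute force for small primes. The display (11.55) is not reproduced here, so the sketch assumes the usual definition $K_r(a, p) = \sum_{x_1 \cdots x_r = a} e((x_1 + \cdots + x_r)/p)$ with $x_i \in \mathbb{F}_p^{\times}$, which is consistent with $K_2(a, p) = S(a, 1; p)$; it works with $q = p$ only, and the function names are hypothetical.

```python
import cmath
import math
from itertools import product

def hyper_kloosterman(r, a, p):
    """K_r(a, p) under the assumed definition: sum of e((x_1+...+x_r)/p) over x_1*...*x_r = a."""
    total = 0j
    for xs in product(range(1, p), repeat=r - 1):
        partial = 1
        for x in xs:
            partial = partial * x % p
        # The last variable is forced by the constraint x_1 * ... * x_r = a (mod p).
        last = a * pow(partial, p - 2, p) % p
        total += cmath.exp(2j * math.pi * (sum(xs) + last) / p)
    return total

def second_moment(r, p):
    """Left-hand side of (11.56) with q = p."""
    return sum(abs(hyper_kloosterman(r, a, p)) ** 2 for a in range(1, p))

def predicted_second_moment(r, p):
    """Right-hand side of (11.56): p^r - p^(r-1) - ... - p - 1."""
    return p**r - sum(p**k for k in range(r))

for r, p in [(2, 7), (3, 7), (3, 5)]:
    largest = max(abs(hyper_kloosterman(r, a, p)) for a in range(1, p))
    print(r, p, round(second_moment(r, p)), predicted_second_moment(r, p),
          largest <= r * p ** ((r - 1) / 2))
```

The enumeration costs $(p-1)^{r-1}$ terms per sum, so it is only meant for very small $p$ and $r$.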
EXAMPLE 3. Here is another higher-dimensional example. In [CI1], the following exponential sum over finite fields appears:

$$W(\chi, \psi; p) = \sum_{x, y \bmod p} \chi(xy(x + 1)(y + 1))\,\psi(xy - 1) \tag{11.59}$$

where $p$ is prime, $\chi$ is the non-trivial quadratic character modulo $p$ and $\psi$ is any multiplicative character modulo $p$.

THEOREM 11.42 (CONREY-IWANIEC). There exists an absolute positive constant $C$ such that

$$|W(\chi, \psi; p)| \le Cp \tag{11.60}$$

for all $p$ and all $\psi$ as above.

The first step of the proof is to apply Theorem 11.34 to say there exists an $\ell$-adic sheaf $\mathcal{F}$ on the algebraic surface

$$U = \{ (x, y) \mid xy(x + 1)(y + 1) \ne 0 \},$$

pure of weight 0 for which we have, for $q = p^n$, $n \ge 1$, the formula

$$W(\chi, \psi; q) = \sum_{(x, y) \in U(\mathbb{F}_q)} \chi(N(xy(x + 1)(y + 1)))\,\psi(N(xy - 1)) = \sum_{(x, y) \in U(\mathbb{F}_q)} \mathrm{Tr}(F_{(x,y)} \mid \mathcal{F}).$$

The Lefschetz trace formula (11.47) takes the form

$$W(\chi, \psi; q) = \sum_{i=0}^{4} (-1)^i\, \mathrm{Tr}(F^n \mid H_c^i(\bar U, \mathcal{F})).$$
By Theorem 11.37, each $H_c^i(\bar U, \mathcal{F})$ is mixed of weights $\le i$. Let $(\alpha_\nu, i_\nu, w_\nu)$ be the family of eigenvalues of $F$ acting on the whole cohomology (with multiplicities), together with their index and weight. We have $|\alpha_\nu| = p^{w_\nu/2}$ and

$$W(\chi, \psi; q) = \sum_\nu (-1)^{i_\nu} \alpha_\nu^n.$$

By Theorem 11.39 in this case, the total number of roots $\alpha_\nu$ is bounded by a constant independent of $p$. Thus, we gain on the trivial bound $W \ll p^2$ if $w_\nu \le 3$ (instead of 4), and the statement of the theorem is that $w_\nu \le 2$. The second step is to show that there is at most one root of weight $\ge 3$ (actually, it must be $= 3$) and, if it exists, then $\psi = \chi$ is the non-trivial quadratic character. This will be derived from the following average formula:

$$A = \frac{1}{q - 1} \sum_{\psi} |W(\chi, \psi; q)|^2 = q^2 - 2q - 2. \tag{11.61}$$
To prove this formula, open the square and sum over $\psi$ first, getting by the orthogonality of characters a sum over the remaining variables $u_1, v_1, u_2$ (where we shorten the notation from $\chi \circ N$ to $\chi$). Next the summation over $u_1$ is performed, producing an inner sum $B(v_1, u_2)$. Then the sum over $u_2$ is performed giving

$$\sum_{u_2 \ne 0} \chi(u_2 + 1)\chi(u_2) B(v_1, u_2) = q\chi(v_1 + 1) + \chi(v_1) + 1,$$

and finally the sum over $v_1$ gives

$$A = \sum_{v_1 \ne 0} \chi(v_1 + 1)\big(q\chi(v_1 + 1) + \chi(v_1) + 1\big) = q(q - 2) - 2.$$

From (11.61), using Lemma 11.41, it is clear that for all $\psi$ and $\nu$ we have $w_\nu \le 3$ and moreover, $w_\nu \le 2$ except for at most one root, for one character $\psi$. If this case occurs for $\psi$, it happens for $\bar\psi$ too, so the only possibility is $\psi$ being a real character. Since

$$W(\chi, 1; q) = \Big( \sum_u \chi(u(u + 1)) \Big)^2 = 1,$$

we must have $\psi = \chi$. The last step is to treat the case $\psi = \chi$ separately. Precisely, one can show that

$$|W(\chi, \chi; q)| \le 4q$$

for any $p$ and $q = p^n$. This is done in a purely elementary manner without appealing to the Riemann Hypothesis, and we refer to [CI1] for the details. In fact, W. Duke showed that $W(\chi, \chi; p) = 2\,\mathrm{Re}(J^2(\chi, \xi))$, where $J(\chi, \xi)$ is the Jacobi sum and $\xi$ is a quartic character modulo $p \equiv 1 \pmod 4$. Also $W(\chi, \chi; p)$ is the $p$-th Fourier coefficient of the modular form $\eta(4z)^6$ of weight 3 and level 16.
When $U$ is not a curve, numerous geometric subtleties can be involved in dealing with the non-trivial cohomology groups $H_c^i$ with $i \ne 0, 2d$. Here are two general bounds, among many: the first one is due to Deligne [De1], and the second is the recent version in [FK] of a general "stratification" theorem of Katz and Laumon.

THEOREM 11.43. (1) Let $f \in \mathbb{Z}[X_1, \ldots, X_m]$ be a non-zero polynomial of degree $d$ such that the hypersurface $H_f$ in $\mathbb{P}^{m-1}$ defined by the equation

$$f_d(x_1, \ldots, x_m) = 0,$$

where $f_d$ is the homogeneous component of degree $d$ of $f$, is non-singular. For any $p \nmid d$ such that the reduction of $H_f$ modulo $p$ is smooth, any non-trivial additive character $\psi$ modulo $p$ and any $n \ge 1$ we have

$$\Big| \sum \cdots \sum_{x_1, \ldots, x_m \in \mathbb{F}_{p^n}} \psi(\mathrm{Tr}(f(x_1, \ldots, x_m))) \Big| \le (d - 1)^m\, p^{nm/2}. \tag{11.62}$$
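(2) Let $d \ge 1$, $n \ge 1$ be integers, $V$ a locally closed subscheme of $\mathbb{A}^n_{\mathbb{Z}}$ of dimension $\dim V(\mathbb{C}) \le d$ and $f \in \mathbb{Z}[X_1, \ldots, X_n]$ a polynomial. Then there exist $C = C(n, d, V, f)$ and closed subschemes $X_j \subset \mathbb{A}^n_{\mathbb{Z}}$ of relative dimension $\le n - j$ such that

$$X_n \subset \cdots \subset X_2 \subset X_1 \subset \mathbb{A}^n_{\mathbb{Z}}$$

and for any rational function $g$ non-zero on $V$, any $h \in (\mathbb{Z}/p\mathbb{Z})^n - X_j(\mathbb{Z}/p\mathbb{Z})$, any prime $p$, any non-trivial additive character $\psi$ and multiplicative character $\chi$ modulo $p$, we have

$$\Big| \sum_{x \in V(\mathbb{Z}/p\mathbb{Z})} \chi(g(x))\,\psi(f(x) + h_1 x_1 + \cdots + h_n x_n) \Big| \le C p^{\frac{d}{2} + \frac{j-1}{2}}.$$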
Note that in (1), subject to a geometric condition on $H_f$, we obtain square root cancellation in the exponential sum. In (2) the assumptions are much less stringent, but the conclusion is weaker: we have a family of exponential sums parameterized by $h \in \mathbb{A}^n$, and roughly speaking we have square root cancellation for "generic" sums (for $h$ outside an exceptional subvariety $X_1$ of codimension $\ge 1$ in $\mathbb{A}^n$), while worse and worse bounds can occur only on smaller and smaller subvarieties. We will give an application of (2) in Chapter 21.

The $\ell$-adic theory and formalism are much more developed than what we have surveyed here. It can also deal with sums over singular varieties, but the necessary algebraic notions become rather formidable, and we concede being unable to discuss the perverse sheaves that arise in more advanced situations. We wish to emphasize, however, that this theory is also particularly well-suited to the study of families of exponential sums. The parameters defining those, say $S_x$, are most naturally themselves points on some algebraic variety $X/\mathbb{F}$. In favorable circumstances there exists a lisse $\ell$-adic sheaf $\mathcal{F}$ on $X$ (corresponding to an action of $\pi_1(X)$ on $V$) such that

$$S_{x,n} = \mathrm{Tr}(F_x^n \mid V) \tag{11.63}$$

for every value of the parameter $x \in X$ and any $n \ge 1$. For example a non-zero polynomial $f$ of degree $\le d$ over $\mathbb{F}$ can be described as an $\mathbb{F}$-rational point of the affine parameter space $X = \mathbb{A}^{d+1} - \{0\}$ by

$$f = \sum_{i=0}^{d} a_i X^i \mapsto (a_0, \ldots, a_d).$$

There is an $\ell$-adic sheaf $\mathcal{F}$ on $X$ such that

$$\sum_{x \in \mathbb{F}_n} \psi(f(x)) = \mathrm{Tr}(F_f^n \mid V).$$
······J··· .
11. SUMS OVER FINITE FIELDS
313
Purity of a sheaf satisfying (11.63) depends on a first application of the Riemann Hypothesis. If it holds, the application of the $\ell$-adic theory typically results in equidistribution statements (following from [De2]) for the arguments of the exponential sums. This equidistribution is in some space of conjugacy classes of the "monodromy group" of the situation. We refer for instance to [KS] for a very lucid introduction to these profound aspects.

11.12. Comments.
In this closing section we give some impressions about how the exponential and character sums over finite fields interact with analytic number theory. There are more subtle issues between the two subjects than just applications of results concerning the first one for solving problems of the latter. We could be quite specific by covering completely a few representative examples, but it would be long and not transparent enough. Rather we decided to discuss principles, ideas and tricks in general terms and guide the reader to particular publications.
First of all some exponential sums appear when one uses Fourier analysis to get a hold on the sequence under investigation. There are no finite fields in the background, so the resulting sums are not immediately related to objects of algebraic geometry. However, one can complete these sums (by another use of Fourier analysis) and then factor them into sums with prime power moduli. Usually one can evaluate these local sums explicitly, or give strong estimates by elementary or ad hoc methods, except when the modulus is prime. But in the prime case one may naturally consider the sum as being over a finite field. This scheme allows us to appeal to the powerful results from algebraic geometry. However, the drawback of finishing by estimates for every complete sum individually is that one cannot exploit a possible cancellation from extra variables offered by the varying moduli (finite fields of different characteristics do not interact in algebraic geometry). Sometimes a kind of reciprocity formula can help turn the modulus into a variable (see for example [Ill] or [M3]). Another scenario is that the sums over the modulus appear in the spectral resolution of a differential operator, in which case the spectral theory produces estimates far stronger than those derived by algebraic geometry. For example, this is the case of sums of Kloosterman sums; see Chapter 16. One can also think this way about the real character sums with cubic polynomials; they are coefficients of a cusp form associated with an elliptic curve, so the modularity gives extra cancellation in summation over the modulus. As a rule the exponential/character sum of a given modulus which comes out of analytic number theory is incomplete. This itself is not a problem because various completing techniques are available as mentioned above. Completing is a natural step to take, but is it useful or wasteful? At this point one should realize that a bound for a complete sum holds essentially the same for the original incomplete sum. This means that the result is relatively weaker for a shorter sum. Still it is non-trivial when the length of the original sum is larger than the square root of the modulus. Very short sums cannot be treated this way. We do not have an absolute recommendation when to complete or not a given incomplete sum. Our experience suggests executing the Fourier method as long as the resulting summation over the frequencies is shorter than the range of the original sum: at least one can feel that one is progressing. But sometimes it is worth acting otherwise, accepting a step backwards in this respect while opening a position for a stronger second move.
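The completion step described above is an exact identity, which can be seen in a small experiment. The sketch below is only an illustration under assumptions: it takes an incomplete Kloosterman-type sum over $1 \le x \le X$ modulo a prime $p$, expands the indicator of the interval in additive characters modulo $p$, and rewrites the incomplete sum as a weighted combination of the complete sums $S(a + h, b; p)$. The parameters and function names are hypothetical.

```python
import cmath
import math

def e_p(t, p):
    """Additive character e(t/p) = exp(2*pi*i*t/p)."""
    return cmath.exp(2j * math.pi * t / p)

def kloosterman_term(x, a, b, p):
    """Summand e((a*x + b/x)/p) for x invertible modulo the prime p."""
    inverse = pow(x, p - 2, p)
    return e_p(a * x + b * inverse, p)

def incomplete_sum(a, b, p, X):
    """Sum of the summand over the short range 1 <= x <= X < p."""
    return sum(kloosterman_term(x, a, b, p) for x in range(1, X + 1))

def completed_sum(a, b, p, X):
    """The same quantity after completing: sum over frequencies h of c_h * S(a + h, b; p)."""
    total = 0j
    for h in range(p):
        # Complete sum over all invertible x, with the frequency h absorbed into a.
        complete = sum(kloosterman_term(x, a + h, b, p) for x in range(1, p))
        # Fourier coefficient of the indicator function of [1, X] modulo p.
        coefficient = sum(e_p(-h * y, p) for y in range(1, X + 1)) / p
        total += coefficient * complete
    return total

p, a, b, X = 101, 3, 5, 30            # sample parameters (assumptions)
direct = incomplete_sum(a, b, p, X)
via_completion = completed_sum(a, b, p, X)
print(abs(direct - via_completion) < 1e-9, abs(direct), 2 * math.sqrt(p))
```

Bounding each complete sum by $2\sqrt{p}$ and summing the absolute values of the Fourier coefficients is what produces a bound of the same quality, up to a logarithm, for the incomplete sum, which is why very short ranges gain nothing.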
For example, imagine that the amplitude in the exponential sum is not a rational function, but nevertheless becomes one after an application of the stationary phase method to the relevant Fourier transform. In this case the losses from the range of summation can be recovered with extra savings by applications of algebraic arguments (see Section 8 of [CI1], where this game is played in several variables simultaneously).

Whatever the arguments which lead to complete sums may be, in the final step one cannot beat the square root saving factor. Therefore to receive a non-trivial result one must first produce somehow a sum with a number of terms larger than the square root of the modulus. There are several ways to get started, depending on the shape of the exponential/character sum. First, one can try to apply a Weyl shift with the effect of squaring the number of terms. Similarly one may just square the whole sum, or raise it to a higher power to produce even more points. Note that shifting the variable and squaring the sum are not the same things; the first requires some additive features of the variable while the latter requires nothing at all. These operations seem to be quite superficial at first glance (we are taking essentially replicas of the original sum), yet with ingenuity one can rearrange the points so the summation goes in a skewed direction, the consecutive terms repel violently and randomly producing a considerable cancellation. This is easy to say, much harder to execute. In fact one needs many other devices, such as gluing several variables with small multiplicity in order to arrive at a single variable over a range larger than the square root of the modulus. One also must smooth out this composite variable before applying algebraic arguments. Usually an application of Cauchy's inequality and enlarging the outer summation (due to positivity) does the job. A powerful example of how this works is given by Burgess [Bur1]. In this paper a short character sum is estimated by an appeal to the Riemann Hypothesis for algebraic curves. An interesting point is that after all the tricks one comes to a complete character sum for a curve of a large genus, although the original sum is over a line segment. Different arguments for building one extra large variable are applied in [FI4]; consequently the final complete sum comes in three variables, or equivalently in four variables over a hypersurface. Here the Deligne theory applies (see the Appendix by Birch and Bombieri), although the related variety is singular. One should not be surprised and afraid of that singularity, because, after all, the process of creating more points of summation at the start is quite superficial. In this game one must be experienced when mixing the points to be sure that it is quite random. Another interesting case of creating and estimating exponential sums in three variables is given by N. Pitt [PJ].

Applications of the Riemann Hypothesis for curves over finite fields are by now customary. Much less successful is the use of genuinely higher-dimensional varieties. There are reasons for the difficulties involved. First of all when more variables appear, stronger restrictions are imposed on them which are harder to resolve (a kind of uncertainty principle). Just imagine having an abundance of points to work with, but which are not free because of some side conditions. For example how would you cope with a requirement that the determinants of a family of elliptic curves match the conductors?
Of course, there are also direct applications of the Riemann Hypothesis for varieties to traditional problems of solvability of diophantine equations by means of the circle method (see examples in Chapter 20). If the number of variables is sufficiently large, one needs nothing to manipulate, except for completing the sum by the standard Fourier method. Some ingenuity, however, is required to apply the circle method (a variant of Kloosterman) to treat diophantine equations with a relatively small number of variables, an excellent example being the work of Heath-Brown on cubic forms [HB6].

It is possible in some circumstances to beat the bound for exponential sums which is derived by the Riemann Hypothesis. This is because the angles of the roots of the $L$-function themselves vary so that additional cancellation may occur. Deligne and Katz have established such occurrences for families parameterized by points on curves or varieties. In other words, in their cases one is actually considering exponential sums in more variables. However, the cancellation of roots can also occur for families parameterized by points over small irregular sets. More important for analytic number theory is that these sets can be quite general, no structure of a subvariety is needed, but instead a kind of a bilinear form structure would suffice. In practice it is not clear how to work with the roots, so one returns to the corresponding exponential sums where manipulations with the parameters (grouping, gluing, etc.) can be performed properly according to the shape of the involved rational function which is seen with the naked eye. In this process one must not destroy the complete variables since in our mind the corresponding summations are already executed in terms of the roots. Therefore when applying Cauchy's inequality to smooth the one variable composed out of the parameters, we put all the complete variables to the inner summation, say $n$ of them, together with some remaining parameters which were not used in the composition. These inner parameters are critical for enlarging the diagonal. After squaring out we get a complete exponential sum in $2n + 1$ variables which depends on the inner parameters. Except for a few configurations of those, the complete exponential sum satisfies the best possible bound derived from the Riemann Hypothesis, thus saving the factor of square root of the modulus for each variable. Since the number of variables is larger than twice the original, we win the game. The above recipe is somewhat oversimplified, yet it reveals the source of extra saving. One can see how it works in the particular case of [CI1]. Speaking of [CI1] we would like to add that here the exponential/character sums in several variables emerge after applications of harmonic analysis with respect to the hyperbolic Laplace operator rather than by the traditional Fourier analysis.

The profound theory of Deligne and other geometers is being used in analytic number theory with spectacular effects, yet more ideas need to be invented to fully exploit its potential. Perhaps one should go beyond borrowing estimates and penetrate deeply inside the theory. This is a great subject for future research. P. Michel [M3] made the first significant steps (see also [FK] and [KS1]).
CHAPTER 12
CHARACTER SUMS

12.1. Introduction. In analytic number theory one often encounters sums of type

$$S = \sum_{x \in V} F(x) \tag{12.1}$$

where $V \subset \mathbb{Z}^n$ is a finite set and $F : V \to \mathbb{C}$ is a periodic function of period $q$. Because $V$ does not match the periodicity, $S$ is called an incomplete sum. Suppose

$$F(x) = \chi(f(x))\,\psi(g(x)) \tag{12.2}$$

where $\chi$, $\psi$ are multiplicative and additive characters to modulus $q$, and $f$, $g$ are rational functions with integer coefficients. Precisely, we assume that

$$f = f_1/f_0, \qquad g = g_1/g_0 \tag{12.3}$$

where $f_0, f_1, g_0, g_1 \in \mathbb{Z}[x]$ and

$$(f_0(x)g_0(x), q) = 1 \quad \text{if } x \in V. \tag{12.4}$$

Having fixed these polynomials (which are not unique) we define

$$\chi(f(x)) = \chi(f_1(x)\bar f_0(x)), \tag{12.5}$$

$$\psi(g(x)) = \psi(g_1(x)\bar g_0(x)) \tag{12.6}$$

where, as usual, $\bar a$ denotes the multiplicative inverse of $a$ modulo $q$. Then the resulting sum

$$S = \sum^{*}_{x \in V} \chi(f(x))\,\psi(g(x)) \tag{12.7}$$

is called an incomplete character sum. Here, and hereafter, the star restricts the summation to the points of $V$ satisfying (12.4). The residue classes $x \pmod q$ which satisfy (12.4) will be called admissible. An important case, but by no means the only one, is $V$ being a box. When the box has size exactly $q$ we get

$$S(\chi, \psi) = \sum^{*}_{x\,(\mathrm{mod}\ q)} \chi(f(x))\,\psi(g(x)) \tag{12.8}$$

which is called a complete character sum. In this chapter we give basic techniques of estimating incomplete character sums in one variable over an interval. As an exercise for the reader, we suggest generalizing some of the forthcoming results to several variables.
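The definitions (12.7) and (12.8) can be made concrete by a small numerical sketch. The choices below are illustrative assumptions, not taken from the text: $\chi$ is the Legendre symbol modulo the prime $p = 101$, $\psi(x) = e(x/p)$, and $f(x) = g(x) = x$, so that the complete sum is the classical quadratic Gauss sum. All function names are hypothetical.

```python
import cmath
import math

def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)

def legendre(x, p):
    """Legendre symbol (x/p) for an odd prime p, by Euler's criterion."""
    x %= p
    if x == 0:
        return 0
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1

def F(x, p):
    """F(x) = chi(f(x)) * psi(g(x)) with the assumed choice f(x) = g(x) = x."""
    return legendre(x, p) * e((x % p) / p)

def incomplete_sum(M, N, p):
    """Sum of F over the interval M < n <= M + N, an instance of (12.7)."""
    return sum(F(n, p) for n in range(M + 1, M + N + 1))

def complete_sum(p):
    """Sum of F over all residue classes modulo p, an instance of (12.8)."""
    return sum(F(x, p) for x in range(p))

p = 101
S_complete = complete_sum(p)
S_short = incomplete_sum(10, 30, p)
print(abs(S_complete), math.sqrt(p), abs(S_short))
```

In this example the complete sum has absolute value exactly $p^{1/2}$, and an interval of length exactly $p$ reproduces the complete sum wherever it starts; the sum over 30 terms is just one sample of an incomplete sum.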
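12.2. Completing methods. The most common treatment of an incomplete sum

$$S(M;N) = \sum^{*}_{M < n \le M+N} F(n) \tag{12.9}$$

goes by expanding it into complete sums (this is called the completing technique),

$$S\Big(\frac{a}{q}\Big) = \sum^{*}_{x\,(\mathrm{mod}\ q)} F(x)\, e\Big(-\frac{ax}{q}\Big) \tag{12.10}$$

and treating the latter by various arithmetic means. To do so we split the summation into the $*$-admissible residue classes $n \equiv x \pmod q$, and detect these classes by the orthogonality of additive characters getting

$$S(M;N) = \frac{1}{q} \sum_{a\,(\mathrm{mod}\ q)} \lambda\Big(\frac{a}{q}\Big) S\Big(\frac{a}{q}\Big) \tag{12.11}$$

where

$$\lambda\Big(\frac{a}{q}\Big) = \sum_{M < n \le M+N} e\Big(\frac{an}{q}\Big). \tag{12.12}$$

Usually the main contribution to $S(M;N)$ comes from $a = 0$ in (12.11) for which $\lambda(0) = N$ (we assume that $M$ and $N \ge 1$ are integers). For $0 < |a| \le q/2$, we have $|\lambda(a/q)| \le q|a|^{-1}$. Hence

LEMMA 12.1. Let $F(x)$ be a complex-valued function defined on the residue classes $x \pmod q$ which are $*$-admissible. Then the corresponding sums satisfy

$$\Big| S(M;N) - \frac{N}{q} S(0) \Big| \le \sum_{0 < |a| \le q/2} |a|^{-1} \Big| S\Big(\frac{a}{q}\Big) \Big|. \tag{12.13}$$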
Suppose that the complete sums satisfy

$$\Big| S\Big(\frac{a}{q}\Big) \Big| \le c(\theta)\,(a+b, q)^{1/2}\, q^{\theta} \tag{12.14}$$

for some $b \in \mathbb{Z}$ and $\theta > \tfrac12$. This bound is often true with $\theta = \tfrac12 + \varepsilon$ for any $\varepsilon > 0$. Then (12.13) becomes

$$\Big| S(M;N) - \frac{N}{q} S(0) \Big| \le c(\theta)\,\ell(b,q)\, q^{\theta} \tag{12.15}$$

where

$$\ell(b,q) = \sum_{0 < |a| \le q/2} |a|^{-1} (a+b, q)^{1/2}. \tag{12.16}$$

Since $\ell(b,q)$ is usually small (for example, we have $\ell(0,q) \le 2\tau(q)\log q$), the inequality (12.15) shows that the incomplete sum $S(M;N)$ satisfies essentially the same bound as do the corresponding complete sums $S(a/q)$, up to the main term $\frac{N}{q}S(0)$.
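The completing technique can be checked numerically. The sketch below is only an illustration under assumed choices that are not in the text: $F$ is the Legendre symbol modulo the prime $q = 101$ (every residue class is then admissible), and the interval is $17 < n \le 57$. It verifies that the expansion of $S(M;N)$ into the complete sums $S(a/q)$ with weights $\lambda(a/q) = \sum_{M<n\le M+N} e(an/q)$ is exact, and that the difference $S(M;N) - \frac{N}{q}S(0)$ is at most $\sum_{0<|a|\le q/2} |a|^{-1}|S(a/q)|$. The variable names are hypothetical.

```python
import cmath
import math

def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)

def legendre(x, p):
    """Legendre symbol (x/p) for an odd prime p, by Euler's criterion."""
    x %= p
    if x == 0:
        return 0
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1

q = 101          # prime modulus (assumed for the illustration)
M, N = 17, 40    # the interval M < n <= M + N

def F(x):
    return legendre(x, q)

# Complete sums S(a/q) = sum over x mod q of F(x) e(-a x / q), indexed by a mod q.
S_hat = [sum(F(x) * e(-a * x / q) for x in range(q)) for a in range(q)]

def lam(a):
    """lambda(a/q) = sum over M < n <= M + N of e(a n / q)."""
    return sum(e(a * n / q) for n in range(M + 1, M + N + 1))

# The incomplete sum, computed directly.
S_MN = sum(F(n) for n in range(M + 1, M + N + 1))

# The same sum, recovered from the complete sums by orthogonality.
completed = sum(lam(a) * S_hat[a] for a in range(q)) / q

# Both sides of the inequality, with a running over 0 < |a| <= q/2.
lhs = abs(S_MN - N / q * S_hat[0])
rhs = sum(abs(S_hat[a % q]) / abs(a)
          for a in range(-(q // 2), q // 2 + 1) if a != 0)
print(S_MN, completed, lhs, rhs)
```

For this particular $F$ the main term vanishes because $S(0) = 0$, and every $S(a/q)$ with $a \not\equiv 0 \pmod q$ has absolute value $q^{1/2}$, so the inequality gives a bound of size $q^{1/2}\log q$ for the incomplete sum, uniformly in the interval.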
Clearly the above method of completing the sum of a periodic function $F(x)$ works for more general sums of type

$$S = \sum_{n} F(n)G(n) \tag{12.17}$$

where $G(x)$ is a nice function of analytic character which decays to zero rapidly as $|x| \to \infty$. Now we present another method. Suppose $G(x)$ is of Schwartz class on $\mathbb{R}$. Then one can apply the Poisson summation formula, giving

$$S = \frac{1}{q} \sum_{a \in \mathbb{Z}} S\Big(\frac{a}{q}\Big)\, \hat G\Big(\frac{a}{q}\Big) \tag{12.18}$$

where $\hat G(y)$ is the Fourier transform of $G(x)$. In principle both methods are equivalent, however the latter may be preferable in case $G(x)$ propagates some vibrations. If the vibrations are regular, $\hat G(a/q)$ can be evaluated asymptotically by the stationary phase method (see for example Corollary 8.15) giving an asymptotic expansion for $S$ in terms of $S(a/q)$. On the other hand, if $G(x)$ is wild, one should have some reservation for completing $S$ in terms of additive characters. One should try to use "harmonics" which are more adequate for the particular case. The Fourier coefficients of cusp forms, holomorphic or non-holomorphic, may serve well as the relevant harmonics in many cases.

12.3. Complete character sums. Let $\chi_q$ and $\psi_q$ be multiplicative and additive characters respectively to the modulus $q$. The complete character sum

$$S(\chi_q, \psi_q) = \sum^{*}_{x\,(\mathrm{mod}\ q)} \chi_q(f(x))\,\psi_q(g(x))$$

is multiplicative with respect to $q$. Indeed, let $q = rs$ with $(r, s) = 1$. Then $\chi_q$ factors uniquely into $\chi_r\chi_s$ where $\chi_r$, $\chi_s$ are multiplicative characters to moduli $r$, $s$ respectively. The additive character $\psi_q$ is given by

$$\psi_q(x) = e\Big(\frac{ax}{q}\Big) \tag{12.19}$$

for some $a \in \mathbb{Z}$. By the "reciprocity" formula

$$\frac{\bar s}{r} + \frac{\bar r}{s} \equiv \frac{1}{q} \pmod 1 \tag{12.20}$$

where $s\bar s \equiv 1 \pmod r$ and $r\bar r \equiv 1 \pmod s$, the additive character factors into $\psi_q = \psi_r\psi_s$. Hence it follows that

$$S(\chi_q, \psi_q) = S(\chi_r, \psi_r)\, S(\chi_s, \psi_s). \tag{12.21}$$

Therefore the problem of evaluating a complete character sum of modulus $q$ reduces to that of prime power moduli. The complete sum $S(\chi, \psi)$ of modulus $q = p^{\beta}$ shouldn't be confused with the character sum over the finite field $\mathbb{F}_q$ which was considered in Chapter 11, except for $q = p$ in which case $S(\chi, \psi)$ is indeed one of such sums. The case of prime modulus belongs to the theory of $L$-functions for curves over finite fields, as described in Chapter 11. The rationality of the relevant $L$-function together with the Riemann
Hypothesis (both proved by A. Weil [We1] in this case) yield algebraic numbers $g_1, \dots, g_r$ with $|g_\nu| = p,\ p^{1/2},\ 1$ such that

$$S(\chi, \psi) = g_1 + \cdots + g_r. \tag{12.22}$$

The number $r$ is bounded independently of the characteristic $p$. Moreover, assuming some non-singularity conditions for the rational functions $f$, $g$ with respect to the characters $\chi$, $\psi$, there are no roots $g_\nu$ with $|g_\nu| = p$, so (12.22) yields

$$|S(\chi, \psi)| \le r p^{1/2}. \tag{12.23}$$

See Chapter 11 and in particular Section 11.11 for a more complete discussion of this situation.

There is no need for algebraic geometry for the complete character sums $S(\chi, \psi)$ to modulus $q = p^{\beta}$ with $\beta \ge 2$, because in this case elementary arguments are available.

LEMMA 12.2. Let $q = p^{2\alpha}$ with $\alpha \ge 1$. Then we have

$$S(\chi, \psi) = p^{\alpha} \sum^{*}_{\substack{y\,(\mathrm{mod}\ p^{\alpha}) \\ h(y) \equiv 0\,(\mathrm{mod}\ p^{\alpha})}} \chi(f(y))\,\psi(g(y)) \tag{12.24}$$

where $h(y)$ is the rational function given by

$$h(y) = a g'(y) + b\,\frac{f'}{f}(y) \tag{12.25}$$

with the integers $a$, $b$ depending on the characters $\psi$, $\chi$ which are determined by (12.19) and (12.27) below.

REMARK. Since the characters $\chi$, $\psi$ have modulus $p^{2\alpha}$, it is required for correctness of the summation (12.24) to know that $\chi(f(y))\psi(g(y))$ does not depend on the choice of the representative of $y \pmod{p^{\alpha}}$ on the curve $h(y) \equiv 0 \pmod{p^{\alpha}}$. This property follows in the course of the proof.
PROOF. Write $x = y + zp^{\alpha}$, where $y$ and $z$ run independently over any fixed systems of residue classes modulo $p^{\alpha}$, and $y$ is restricted by the condition $p \nmid f_0(y)g_0(y)$. We have

$$f(x) \equiv f(y) + f'(y)zp^{\alpha} \pmod{p^{2\alpha}}. \tag{12.26}$$

Indeed, the congruence (12.26) is easily seen for monomials $x^n$ by the binomial formula

$$(y + zp^{\alpha})^n = y^n + ny^{n-1}zp^{\alpha} + \cdots.$$

Then it extends to arbitrary polynomials $f_1(x)$, $f_0(x)$ with integral coefficients by linearity. Moreover, if $f_0(y) \not\equiv 0 \pmod p$, then (12.26) is verified for $f_1(x)/f_0(x)$ as follows:

$$\begin{aligned} f_1(x)/f_0(x) &\equiv (f_1(y) + f_1'(y)zp^{\alpha})\,\bar f_0(y)\,(1 - \bar f_0(y)f_0'(y)zp^{\alpha}) \\ &\equiv f_1(y)\bar f_0(y) + (f_1'(y)\bar f_0(y) - \bar f_0^{\,2}(y)f_0'(y)f_1(y))\,zp^{\alpha}. \end{aligned}$$

By (12.26) we get

$$\chi(f(x)) = \chi(f(y))\,\chi\Big(1 + \frac{f'}{f}(y)\,zp^{\alpha}\Big).$$
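Clearly $\chi(1 + zp^{\alpha})$ is an additive character to modulus $p^{\alpha}$, so there exists an integer $b$ (uniquely determined modulo $p^{\alpha}$) such that

$$\chi(1 + zp^{\alpha}) = e\Big(\frac{bz}{p^{\alpha}}\Big). \tag{12.27}$$

Hence

$$\chi(f(x)) = \chi(f(y))\, e\Big(b\,\frac{f'}{f}(y)\,zp^{-\alpha}\Big). \tag{12.28}$$

Similarly we get (12.26) for the rational function $g(x)$, whence

$$\psi(g(x)) = \psi(g(y))\, e\big(a g'(y)\,zp^{-\alpha}\big). \tag{12.29}$$

Multiplying (12.28), (12.29) and summing over the residue classes $y$, $z$ modulo $p^{\alpha}$ we get

$$S(\chi, \psi) = \sum^{*}_{y\,(\mathrm{mod}\ p^{\alpha})} \chi(f(y))\,\psi(g(y)) \sum_{z\,(\mathrm{mod}\ p^{\alpha})} e\big(h(y)\,zp^{-\alpha}\big).$$

The inner sum vanishes unless $h(y) \equiv 0 \pmod{p^{\alpha}}$ in which case it equals $p^{\alpha}$. This completes the proof of (12.24). $\square$

LEMMA 12.3. Let $q = p^{2\alpha+1}$ with $\alpha \ge 1$. Then we have

$$S(\chi, \psi) = p^{\alpha} \sum^{*}_{\substack{y\,(\mathrm{mod}\ p^{\alpha}) \\ h(y) \equiv 0\,(\mathrm{mod}\ p^{\alpha})}} \chi(f(y))\,\psi(g(y))\,G_p(y) \tag{12.30}$$

where $G_p(y)$ is the Gauss sum

$$G_p(y) = \sum_{z\,(\mathrm{mod}\ p)} e_p\big(d(y)z^2 + h(y)p^{-\alpha}z\big). \tag{12.31}$$

Here $h(y)$ is the rational function (12.25) but with $b$ given by (12.35) below, and

$$d(y) = \frac{a}{2}\,g''(y) + \frac{b}{2}\,\frac{f''}{f}(y) + (p-1)\,\frac{b}{2}\Big(\frac{f'}{f}(y)\Big)^{2}. \tag{12.32}$$

REMARK. As $z$ runs mod $p$, so does $z^2/2p$, therefore the Gauss sum (12.31) is correctly defined.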
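PROOF. We write $x = y + zp^{\alpha}$ with $y$ running modulo $p^{\alpha}$ subject to $p \nmid f_0(y)g_0(y)$, and $z$ running modulo $p^{\alpha+1}$. As before we argue that

$$f(x) \equiv f(y) + f'(y)zp^{\alpha} + \tfrac12 f''(y)z^2p^{2\alpha} \pmod{p^{2\alpha+1}}. \tag{12.33}$$

Note that the rational function $\tfrac12 f''(y)$ has integral coefficients. This is clear for the monomial $y^n$ because $n(n-1) \equiv 0 \pmod 2$, then for any integral polynomial by linearity, and finally for a rational function $f(y) = f_1(y)/f_0(y)$ by the identity

$$\tfrac12 f'' = \tfrac12 f_1''\bar f_0 - f_1'f_0'\bar f_0^{\,2} - \tfrac12 f_1f_0''\bar f_0^{\,2} + f_1(f_0')^2\bar f_0^{\,3}.$$

By (12.33) we get

$$\chi(f(x)) = \chi(f(y))\,\chi\Big(1 + \Big(\frac{f'}{f}(y)z + \frac12\,\frac{f''}{f}(y)z^2p^{\alpha}\Big)p^{\alpha}\Big). \tag{12.34}$$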
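Consider the function

$$\xi(1 + zp^{\alpha}) = e\Big(\frac{z}{p^{\alpha+1}} + (p-1)\frac{z^2}{2p}\Big).$$

This is a character of the subgroup of residue classes $x \pmod{p^{2\alpha+1}}$ with $x \equiv 1 \pmod{p^{\alpha}}$. Indeed

$$\begin{aligned} \xi\big((1 + zp^{\alpha})(1 + wp^{\alpha})\big) &= \xi\big(1 + (z + w + zwp^{\alpha})p^{\alpha}\big) \\ &= e\Big(\frac{z+w}{p^{\alpha+1}} + (p-1)\frac{z^2 + w^2}{2p}\Big) = \xi(1 + zp^{\alpha})\,\xi(1 + wp^{\alpha}). \end{aligned}$$

Since the subgroup has order $p^{\alpha+1}$ and $\xi^b$ are all different for distinct $b$ modulo $p^{\alpha+1}$, it follows that there exists an integer $b$ (uniquely determined modulo $p^{\alpha+1}$) such that

$$\chi(1 + zp^{\alpha}) = e\Big(\frac{bz}{p^{\alpha+1}} + (p-1)\frac{bz^2}{2p}\Big). \tag{12.35}$$

Using (12.34), (12.35) and

$$\psi(g(x)) = \psi(g(y))\, e\Big(\frac{ag'(y)}{p^{\alpha+1}}\,z + \frac{ag''(y)}{2p}\,z^2\Big) \tag{12.36}$$

we derive that

$$S(\chi, \psi) = \sum^{*}_{y\,(\mathrm{mod}\ p^{\alpha})} \chi(f(y))\,\psi(g(y)) \sum_{z\,(\mathrm{mod}\ p^{\alpha+1})} e\Big(\frac{d(y)z^2}{p} + \frac{h(y)z}{p^{\alpha+1}}\Big).$$

Here the innermost sum vanishes unless $h(y) \equiv 0 \pmod{p^{\alpha}}$ in which case it equals $p^{\alpha}G_p(y)$. This completes the proof of (12.30). $\square$

The Gauss sums $G_p(y)$ were computed in Chapter 3. If $p \nmid 2d(y)$, we have

$$G_p(y) = \varepsilon_p\, p^{1/2} \Big(\frac{d(y)}{p}\Big)\, e_p\Big(-\overline{4d(y)}\,\big(h(y)p^{-\alpha}\big)^2\Big). \tag{12.37}$$

The formulas (12.24) and (12.30) represent truly the final stage of computation. The only terms which are not determined on the right side are the roots of the congruence $h(y) \equiv 0 \pmod{p^{\alpha}}$. However, in practice there are not many roots (essentially a bounded number) so one gets the estimate $|S(\chi, \psi)| \le cq^{1/2}$ with $c$ depending mildly on the coefficients of the rational functions $f$, $g$.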
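The reduction of Lemma 12.2 and Lemma 12.3 to the roots of $h(y) \equiv 0 \pmod{p^{\alpha}}$ can be tested numerically. The following sketch makes simplifying assumptions that are ours, not the text's: $\chi$ is trivial (so $b = 0$), $\psi(x) = e(x/q)$ (so $a = 1$), and $g(x) = mx + n\bar x$. Then $h(y) = g'(y) = m - n\bar y^2$ and $d(y) = \tfrac12 g''(y) = n\bar y^3$. The representatives $y$ are taken in $[0, p^{\alpha})$ and all inverses are computed modulo $q$; the function names are hypothetical.

```python
import cmath
import math

def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)

def g(x, m, n, q):
    """g(x) = m*x + n*xbar modulo q, where x*xbar = 1 (mod q)."""
    return (m * x + n * pow(x, -1, q)) % q

def direct_sum(m, n, p, beta):
    """Complete sum of e(g(x)/q) over admissible x mod q = p^beta, term by term."""
    q = p ** beta
    return sum(e(g(x, m, n, q) / q) for x in range(q) if x % p != 0)

def reduced_sum(m, n, p, beta):
    """Sum over the roots of h(y) = 0 (mod p^alpha): the even case for
    beta = 2*alpha, the odd case with the Gauss sum for beta = 2*alpha + 1."""
    alpha, odd = divmod(beta, 2)
    q, pa = p ** beta, p ** alpha
    total = 0
    for y in range(1, pa):
        if y % p == 0:
            continue                          # y is not admissible
        ybar = pow(y, -1, q)
        h = (m - n * ybar * ybar) % q         # h(y) = g'(y), since a = 1, b = 0
        if h % pa != 0:
            continue                          # y is not a root of h modulo p^alpha
        term = e(g(y, m, n, q) / q)
        if odd:
            d = (n * ybar ** 3) % p           # d(y) = g''(y)/2 reduced mod p
            k = (h // pa) % p                 # h(y) p^(-alpha) reduced mod p
            term *= sum(e((d * z * z + k * z) / p) for z in range(p))
        total += term
    return pa * total

for (m, n, p, beta) in [(1, 1, 3, 2), (1, 4, 3, 3), (2, 5, 3, 4), (3, 2, 5, 3)]:
    print(p ** beta, direct_sum(m, n, p, beta), reduced_sum(m, n, p, beta))
```

In these examples $p \nmid 2mn$, so the congruence $h(y) \equiv 0 \pmod{p^{\alpha}}$ has at most two roots and the sum is bounded by $2q^{1/2}$, in line with the estimate $|S(\chi,\psi)| \le cq^{1/2}$.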
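EXERCISE 1. Using Lemma 12.2 and Lemma 12.3 together with (12.37) evaluate the Kloosterman sum

$$S(m, n; q) = \sum^{*}_{x\,(\mathrm{mod}\ q)} e\Big(\frac{mx + n\bar x}{q}\Big) \tag{12.38}$$

for $q = p^{\beta}$ with $\beta \ge 2$. Suppose $p \nmid 2mn$. Show that $S(m, n; q)$ vanishes, unless $\big(\frac{m}{p}\big) = \big(\frac{n}{p}\big)$ in which case

$$S(m, n; q) = 2\Big(\frac{\ell}{q}\Big)\, q^{1/2}\, \mathrm{Re}\ \varepsilon_q\, e\Big(\frac{2\ell}{q}\Big) \tag{12.39}$$

where $\ell^2 \equiv mn \pmod q$.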
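The outcome of Exercise 1 can be compared with a brute-force computation. The closed form tested below is the classical evaluation due to Salié as we recall it, stated here as an assumption to be checked rather than as a quotation: for $q = p^{\beta}$, $\beta \ge 2$, $p \nmid 2mn$, the sum $S(m,n;q)$ equals $2\big(\frac{\ell}{q}\big)q^{1/2}\,\mathrm{Re}\,\varepsilon_q e(2\ell/q)$ if $\ell^2 \equiv mn \pmod q$ is solvable and vanishes otherwise, with $\varepsilon_q = 1$ or $i$ according as $q \equiv 1$ or $3 \pmod 4$. The function names are hypothetical.

```python
import cmath
import math

def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def kloosterman(m, n, q):
    """S(m, n; q), summed directly over the reduced residue classes mod q."""
    return sum(e((m * x + n * pow(x, -1, q)) / q)
               for x in range(1, q) if math.gcd(x, q) == 1)

def salie_closed_form(m, n, q):
    """Assumed closed form for q = p^beta, beta >= 2, p not dividing 2mn."""
    roots = [l for l in range(q) if (l * l - m * n) % q == 0]
    if not roots:
        return 0.0                      # mn is not a square modulo q
    l = roots[0]                        # either root gives the same value
    eps = 1 if q % 4 == 1 else 1j
    return 2 * jacobi(l, q) * math.sqrt(q) * (eps * e(2 * l / q)).real

for (m, n, q) in [(1, 1, 9), (1, 4, 25), (1, 2, 25), (1, 4, 27)]:
    print(q, kloosterman(m, n, q).real, salie_closed_form(m, n, q))
```

The root $\ell$ is found by exhaustive search, which is adequate only for small $q$; replacing $\ell$ by $-\ell$ does not change the value of the closed form.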
The Kloosterman sums to modulus $q = p^{\beta}$ with $\beta \ge 2$ were computed first by H. Salié [Sal]. He also computed the so-called Salié sums

$$T(m, n; q) = \sum_{x\,(\mathrm{mod}\ q)} \Big(\frac{x}{q}\Big)\, e\Big(\frac{mx + n\bar x}{q}\Big) \tag{12.40}$$

where $\big(\frac{x}{q}\big)$ is the Jacobi-Legendre symbol, including the case of prime modulus. One can do it for composite moduli as well.

LEMMA 12.4. Suppose $(q, 2n) = 1$. Then

$$T(m, n; q) = \varepsilon_q\, q^{1/2} \Big(\frac{n}{q}\Big) \sum_{v^2 \equiv mn\,(\mathrm{mod}\ q)} e\Big(\frac{2v}{q}\Big). \tag{12.41}$$
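PROOF. Consider the function $F(u) = T(m, nu^2; q)$ defined for $u \pmod q$. The Fourier transform of $F(u)$ is

$$\hat F(v) = \sum_{u\,(\mathrm{mod}\ q)} F(u)\, e\Big(-\frac{uv}{q}\Big) = \varepsilon_q\, q^{1/2} \Big(\frac{n}{q}\Big) \sum^{*}_{x\,(\mathrm{mod}\ q)} e\Big(\frac{mx - \overline{4n}\,xv^2}{q}\Big)$$

by the formula (3.21) for quadratic Gauss sums. Notice that the Jacobi-Legendre symbol canceled out, therefore the last sum is the Ramanujan sum

$$\sum_{d \mid (4mn - v^2,\, q)} d\,\mu(q/d).$$

Hence by Fourier inversion

$$F(u) = \frac{1}{q} \sum_{v\,(\mathrm{mod}\ q)} \hat F(v)\, e\Big(\frac{uv}{q}\Big) = \varepsilon_q\, q^{-1/2} \Big(\frac{n}{q}\Big) \sum_{d \mid q} d\,\mu\Big(\frac{q}{d}\Big) \sum_{\substack{v\,(\mathrm{mod}\ q) \\ v^2 \equiv 4mn\,(\mathrm{mod}\ d)}} e\Big(\frac{uv}{q}\Big).$$

If $(u, q) = 1$, this simplifies to

$$F(u) = \varepsilon_q\, q^{1/2} \Big(\frac{n}{q}\Big) \sum_{v^2 \equiv mn\,(\mathrm{mod}\ q)} e\Big(\frac{2uv}{q}\Big).$$

In particular, for $u = 1$ we get (12.41). $\square$

Suppose $(q, 2mn) = 1$. Then $T(m, n; q)$ vanishes unless there exists $\ell$ with

$$\ell^2 \equiv mn \pmod q. \tag{12.42}$$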
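The evaluation of the Salié sum in Lemma 12.4, namely $T(m,n;q) = \varepsilon_q q^{1/2}\big(\frac{n}{q}\big)\sum_{v^2\equiv mn\,(\mathrm{mod}\ q)} e(2v/q)$ for $(q, 2n) = 1$, lends itself to a direct numerical check, including composite moduli. The sketch below assumes the usual normalization $\varepsilon_q = 1$ or $i$ according as $q \equiv 1$ or $3 \pmod 4$; the function names and the sample triples $(m, n, q)$ are our own choices.

```python
import cmath
import math

def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def salie_sum(m, n, q):
    """T(m, n; q), summed directly over the reduced residue classes mod q."""
    return sum(jacobi(x, q) * e((m * x + n * pow(x, -1, q)) / q)
               for x in range(q) if math.gcd(x, q) == 1)

def salie_evaluation(m, n, q):
    """eps_q * sqrt(q) * (n/q) * sum of e(2v/q) over v with v^2 = mn (mod q)."""
    eps = 1 if q % 4 == 1 else 1j
    roots_sum = sum(e(2 * v / q) for v in range(q) if (v * v - m * n) % q == 0)
    return eps * math.sqrt(q) * jacobi(n, q) * roots_sum

for (m, n, q) in [(1, 1, 7), (0, 1, 7), (1, 1, 15), (1, 4, 35)]:
    print(q, salie_sum(m, n, q), salie_evaluation(m, n, q))
```

For $m = 0$ and $q = 7$ the only root is $v = 0$ and both sides reduce to the quadratic Gauss sum $i\sqrt 7$.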
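Given $\ell$, all the solutions to $v^2 \equiv mn \pmod q$ can be written explicitly as $v \equiv (r\bar r - s\bar s)\ell$, where $r$, $s$ run over the factorizations $rs = q$ with $(r, s) = 1$. Hence the formula (12.41) can be written more explicitly as follows:

$$T(m, n; q) = \varepsilon_q\, q^{1/2} \Big(\frac{n}{q}\Big) \sum_{\substack{rs = q \\ (r,s) = 1}} e\Big(\frac{2(r\bar r - s\bar s)\ell}{q}\Big). \tag{12.43}$$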
The Kloosterman sums to prime modulus cannot be computed in elementary terms. This shouldn't be surprising in view of the results (and conjecture) concerning the distribution of angles of Kloosterman sums (see Section 21.2, in particular Theorem 21.7 and the Sato-Tate Conjecture).

12.4. Short character sums. In this section we are dealing with character sums over a short interval

$$S_\chi(N) = \sum_{M < n \le M+N} \chi(n), \tag{12.44}$$

where $\chi$ is a non-principal Dirichlet character of modulus $q$. By Lemma 12.1 we obtain (12.45)
/Sx(N)/ ( 2
L
a- 1 /gx(a)/
O