Introduction to Proof
Steven Spallone
Preface
What is mathematics? Two friends and I pondered this question and came up with the statement “Mathematics is pattern recognition and deduction applied to numbers and geometry.” We believe we observe phenomena of various sorts, and wish to convince ourselves and others that these phenomena are real. Perhaps we notice that multiplying two odd numbers tends to produce another odd number, or that two triangles with proportional sides tend to have the same angles as well. A curious student should want to know, not only whether these phenomena are true, but also why they are so. Part of our civilization’s great heritage is the observation and proof of such things. Over the years a standard language has arisen, which is quite satisfactory to all but the most extreme skeptics. It is the purpose of this book to introduce students to this language and methods of mathematical proof.

What is a proof? According to Steven Krantz [6], “A proof in mathematics is a psychological device for convincing some person, or some audience, that a certain mathematical assertion is true.” Thus, it varies according to whom you’re telling it. For instance, if I wanted to convince a mathematics professor that “The nth derivative of x^n is n!.”, I would simply say that it is true “by induction”. If I wanted to prove this fact to a typical high school student, I would have to put in considerably more work. Now there is an interesting issue here. It is plausible that I could “convince” this student by demonstrating that it is true for n = 1, 2, 3, 4 and 5, and perhaps subtly intimidating him into acquiescence.
But that would be a bad deed; of course recognizing a pattern is a different act than proving the pattern persists. In this book we strive to only give “good” proofs.

The first chapter treats mathematical grammar, elementary logic, and basic proof techniques such as deduction and contradiction. This is followed by a review of (naive) set theory, and then mathematical induction. At this point students have gained some skill with proofs and are ready to learn “theory building”. In Chapter 2, we no longer assume an easy familiarity with numbers, as we plan to develop elementary number theory from very simple beginnings. We present the Peano theory of the natural numbers N, based on only two simple axioms and the principle of induction. Addition and multiplication are defined recursively and we prove everything straight through to the Fundamental Theorem of Arithmetic. Chapter 3 is a study of functions and relations. Particularly important is an introduction to the theory of equivalence classes. The chapter ends with the beginnings of cardinality theory.

The problems throughout the text are a compilation from old homework, exam, and bonus problems, although I have stripped away hints and demands for rigor. I feel that it is the instructor’s place to adapt the problems to the class. The self-studying student should be warned that many of the problems are difficult, and should not get hung up on the toughies, which I place at the end of the chapters.

Much thanks are due to Ben Walter for teaching out of an earlier version of the text, and for several of the problems. At present, these notes are being used by the author for a course at the Indian Institute of Science Education and Research in Pune. If you find errata please e-mail them to me and I will thank you and try to update the notes appropriately.

Steven Spallone
Contents

Preface

Chapter 1. Naive Logic
1. Introduction
2. Mathematical statements
3. Implication
4. Propositional Calculus of a Single Variable
5. Exploiting Symmetry in Proofs
6. Some Game Theory
7. Sets
8. Induction
9. Chapter 1 Wrap-up

Chapter 2. Arithmetic
1. Introduction
2. The Natural Numbers N
3. The Division Algorithm
4. The Division Algorithm
5. Superlatives
6. Euclidean Algorithm
7. Strong Induction
8. Place-Value Systems
9. The Fundamental Theorem of Arithmetic
10. Chapter 2 Wrap-up

Chapter 3. Functions and Relations
1. Relations
2. Composition of Relations
3. Functions
4. Functions as Relations
5. Partially Ordered Sets
6. Chapter 3 Wrap-up

Chapter 4. Cardinality
1. Finite and Infinite Sets
2. Countable Sets
3. Uncountable Sets
4. Interlude on Paradoxes
5. Some History
6. Chapter 4 Wrap-up

Chapter 5. Equivalence
1. Equivalence Relations
2. The Positive Rationals Q+

Chapter 6. Rings
1. Abstract Algebra
2. Rings
3. Abstract Linear Algebra
4. Chapter Wrap-Up

Chapter 7. Polynomials
1. Polynomials
2. Polynomials over a Field
3. Irreducibility in C[x]
4. Irreducibility in R[x]
5. Irreducibility in Q[x]
6. Z[x]
7. Rational Functions
8. Composition of Polynomials
9. Chapter 7 Wrap-Up

Chapter 8. Real Numbers
1. Constructing R
2. Ordered Fields
3. Decimal Expansions
4. Dedekind Cuts

Chapter 9. Miscellaneous
1. An ODE Proof
2. Pythagorean Triples

Bibliography
CHAPTER 1
Naive Logic
1. Introduction

2. Mathematical statements
One of the great features of mathematics is that every problem in the subject has a correct answer. Every statement is either true or false. Here, for instance, are some mathematical statements. Do you think they are true or false?
(1) 100^101 > 101^100.
(2) 1 + 1/2 + 1/3 + · · · + 1/10 = 3.
(3) A regular icosahedron has 30 edges.
(4) π^2 = 10.
You may or may not know whether these statements are true or false, but you should believe that there is a correct answer. Mathematicians presume that every mathematical statement is either true or false. This philosophy goes back to Aristotle, and is called the “Law of Excluded Middle”. Compare this, for instance, to the statements “I have two hands.” and “My dog Diogi is friendly.” I consider that these are true statements. Any normal person would agree with the first statement. But many people, and certainly many other dogs, would consider the second statement to be false. Moreover I’m afraid I would be unlikely to convince them that it is a true statement. Even with the first statement, one could imagine, for instance, a paranoid conspiracy theorist who believes that professors of mathematics have another hand hidden somewhere. Most real-world statements are subjective, or debatable on some level. One thinks of the philosopher Descartes, who strives to prove that he exists! Nonetheless, the logic we develop in this text applies so well to common real-world situations that we will often spice up this chapter with nonmathematical statements. Indeed, there are many applications of logic, such as law, which benefit from this theory. The magnanimous reader will not be offended by the subjectivity of my real-world examples.

What exactly is meant by the term “mathematical statement”? For the purposes of this text, it means a grammatically correct English sentence which only concerns mathematical objects. It should end with a period/full stop. There should be a subject and a verb. (In the mathematical statements above, the verbs are “is”, “equals”, “has”, and “equals”.) The audience should, in principle, understand precisely what is meant when reading it. We say that putative (mathematical) statements are not “well-formed” if they fail to impart precise meaning to their audience. So for instance,
• “3 + 7.”, • “The real number 3 + 7 is awesome.” are not well-formed statements. The first fails because it is not even grammatically a sentence (there is no verb), and the second fails because the audience presumably does not know what makes a number “awesome”. This second sentence can be
remedied by preceding it with a sentence that explains the unfamiliar word. Consider the two statements: “We say a real number is awesome, provided that it is greater than 2. The real number 3 + 7 is awesome.” The first sentence defines the term “awesome”, and now the second statement is well-formed.

Note: I will often omit the adjective “mathematical” from the word “statement”, when it is understood from context. The word “proposition” is a synonym for “statement”, although later on it takes on the connotation of “a statement that should be proved or disproved”.

There is an important distinction I want to make: If a statement is well-formed, it is true or false. So it is common to have a false well-formed statement. For instance, “The real number 3 − 7 is positive.” is a well-formed statement, even though it is false. When discussing logic, about half of our statements ought to be false. Please pay attention to context.
Note: In a more serious course in logic, meticulous care would be taken in explicating the precise rules for making “well-formed” statements. There is good reason for this, which we discuss in the Section “Interlude on Paradoxes”. However, this is an enormous endeavor that we will not take on; we will be content with “naive” logic.
2.1. Equality versus Equivalence
Let P and Q be statements. (I am using here, much as in algebra, a variable to denote an entire statement. It is not meant to be a number, or any other mathematical object; it is an entire statement.) Let us be very strict by saying that P equals Q, written P = Q, provided that they are precisely the same statement, word for word and symbol for symbol. So if P is “I like cats and dogs.”, and Q is “I like dogs and cats.”, then P ≠ Q, although they mean the same thing. It is very easy to tell whether two statements are equal.
We will rarely worry about whether two statements are literally equal, since it is just too strict. A more useful notion is that of equivalence. We say that two statements are equivalent provided that they mean the same thing. Let us write P ≡ Q if P and Q are equivalent statements, as in the “cats and dogs” example above. Certainly P and Q are not equivalent to the statement “I hate cats.” The statements “5! > 10^2.” and “10^2 < 5!.” are equivalent. The statements “π^2 ≠ 10.” and “π^2 < 10 or π^2 > 10.” are equivalent.

It is not always easy to tell whether two statements are equivalent. Equivalence is something that needs to be proved. For instance, from the statement P: “5! > 10^2.”, one can deduce the statement Q: “4! > 20.” by dividing both sides of the inequality by 5, and one can deduce P from Q by multiplying both sides by 5. Since we can deduce one from the other, we may conclude that P ≡ Q.

To define “equivalence” a bit more mathematically, if we are given statements P and Q, we may consider P ≡ Q, or “P is equivalent to Q.”, as a third statement, whose truth value is given by the following table:
P  Q  P ≡ Q
T  T    T
T  F    F
F  T    F
F  F    T
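As an editorial aside (not part of the text): truth tables like this one can be tabulated mechanically. The sketch below is my own illustration; the helper name `equiv` is an assumption, not the book's notation.

```python
from itertools import product

def equiv(p: bool, q: bool) -> bool:
    # P ≡ Q is true exactly when P and Q have the same truth value.
    return p == q

# Print one row per truth assignment, matching the table for P ≡ Q.
for p, q in product([True, False], repeat=2):
    print(p, q, equiv(p, q))
```

Running it reproduces the four rows of the table above.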
This is an example of a truth table. Given truth values (true or false) of statements, you read the table to find the truth value of the new statement. Caution: Truth tables rely fundamentally on the Law of Excluded Middle, and so applying this definition of equivalence to “real world examples” can lead to nonsense. Note that any two true (mathematical) statements are equivalent; for instance the statement “1 + 1 = 2.” is equivalent to the statement “A triangle has three sides.” Similarly, any two false statements are equivalent. Traditionally, we take “0 = 0.” as the simplest true statement and “0 = 1.” as the simplest false statement. So, any mathematical statement is equivalent to one of these.

2.2. Negation
Suppose that P is a given statement. Write Q for the statement “P is true.”, and R for the statement “P is false.” The truth table for P and Q is:

P  Q = “P is true.”
T  T
F  F

We can see from this table that Q is equivalent to P, since Q agrees with P whether P is true or false. On the other hand, R is certainly not equivalent to P. R is called the negation of P; we use the notation R = ¬P. Another truth table gives the truth values of ¬P in terms of those of P:

P  ¬P
T  F
F  T

You can read this table as, “If P is true then ¬P is false. If P is false then ¬P is true.”
For instance suppose P is the statement “100^101 > 101^100.” Then ¬P is the statement “ “100^101 > 101^100.” is false.” This is equivalent to the statement “100^101 ≤ 101^100.” If you are asked to “negate a statement”, you should find a statement equivalent to its negation which is as simple as possible. Of course this is a matter of taste as to what is simplest.

Here are two statements obviously equivalent to the negation of the statement “π^2 = 10.”:
(1) π^2 ≠ 10.
(2) Either π^2 < 10 or π^2 > 10.
Remark: The statement is false, so strictly speaking any true statement is a negation here. But the given negations are good answers, assuming that the audience doesn’t know the statement is false. It is always easy to give a reasonable negation of a mathematical statement, as we will see, whether or not we know it to be true.

If P is a statement, then what is a good negation of ¬P? Call the negation Q. If P is true, then ¬P is false, and so Q is true. Similarly, if P is false, then Q is also false. Looking over what we just said, we see that Q is equivalent to P. We have proven our first theorem:

Theorem 1.1. Let P be a statement. Then ¬(¬P) ≡ P.
We can also prove this theorem more mechanically by constructing a truth table.

P  ¬P  ¬(¬P)
T  F     T
F  T     F
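An aside of mine, not the author's: the same exhaustive check can be done in a couple of lines of Python, treating ¬ as Python's `not`; the function name is my own.

```python
# Verify Theorem 1.1, ¬(¬P) ≡ P, by checking both truth values of P.
def double_negation_holds() -> bool:
    return all((not (not p)) == p for p in (True, False))

print(double_negation_holds())  # → True
```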
Note that this theorem relies heavily on the Law of Excluded Middle. In everyday speech, we might say something like, “I’m not hungry but I’m not not hungry.” to indicate that the statement “I am hungry.” is neither true nor false.

2.3. Conjunction and Disjunction
Mathematicians share a very precise language. Subtle ambiguities can creep into the English language, for example with the word “or”. If you say, “Every day, Steven eats dahl or Steven drinks lassi.”, does this assertion include the possibility that on Saturday I might consume both dahl and lassi? In mathematics, we do include the possibility of both, with the word “or”. If P and Q are statements, then P ∨ Q is the statement “P or Q is true.” Here is the truth table:

P  Q  P ∨ Q
T  T    T
T  F    T
F  T    T
F  F    F

From the table, if P is true and Q is false, then the statement “P or Q” is true. If P is false and Q is true, then the statement “P or Q” is true. If P is false and Q is false, then the statement “P or Q” is false.
If P and Q are statements, then P ∨ Q is called the disjunction of P and Q. It is a way of forming a new statement from two other statements.

Next is the mathematical “and”, which is represented with the symbol ∧. This combination works the way you’d expect; here is the truth table:

P  Q  P ∧ Q
T  T    T
T  F    F
F  T    F
F  F    F

Thus, P ∧ Q is true only when both P and Q are true. The statement P ∧ Q is called the conjunction of P and Q.
2.4. Truth Table Proofs

Proposition 1.2. Let P, Q be statements. Then ¬(P ∧ Q) ≡ (¬P) ∨ (¬Q).

[Negation of and/or statements, tautology, absurdum]
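As an illustrative aside (my own addition, not the author's), Proposition 1.2 — one of De Morgan's laws — can be verified by enumerating all four truth assignments; the function name here is an assumption for illustration.

```python
from itertools import product

# Check De Morgan's law ¬(P ∧ Q) ≡ (¬P) ∨ (¬Q) for every truth assignment.
def de_morgan_holds() -> bool:
    return all(
        (not (p and q)) == ((not p) or (not q))
        for p, q in product([True, False], repeat=2)
    )

print(de_morgan_holds())  # → True
```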
2.5. Exercises

Use truth tables to prove the following identities, for any statements P, Q, R.
(1) P ≡ P.
(2) (P ≡ Q) ≡ (Q ≡ P).
(3) Prove that P ∨ Q is equivalent to ¬((¬P) ∧ (¬Q)).
(4) Prove that (P ∨ Q) ∨ R is equivalent to P ∨ (Q ∨ R). Then the same for ∧ instead of ∨. Is it true if we replace ∨ with ⇒? [move]
(5) Find combinations of P, Q, ∧, ∨, ¬ which give each of the 16 possible truth values, given the four possible truth values of P and Q. (This is meant as a group exercise.)
(6) Of the sixteen different combinations of P and Q from the previous problem, how many are both commutative and associative?
(7) Let P, Q, and R be statements. Prove that P ∧ (Q ∨ R) is equivalent to (P ∧ Q) ∨ (P ∧ R), and that P ∨ (Q ∧ R) is equivalent to (P ∨ Q) ∧ (P ∨ R). The Vellerman exercise
3. Implication

The phrase “P implies Q”, written P ⇒ Q, is typically confusing for students, who may confuse it with its English usage. I recommend you substitute the phrase “The truth of P implies the truth of Q.” whenever you are perplexed. As with the connectives ∧ and ∨, we may define it with a truth table:

P  Q  P ⇒ Q
T  T    T
T  F    F
F  T    T
F  F    T
Please note that whenever P is false, then the statement P ⇒ Q is true, contrary to what you might think. So under this convention any false statement implies any other statement. The only time that P ⇒ Q is false is when P is true and Q is false. Under this convention, the following statements are true:

• “If this is the year 1986, then the Earth has two moons.”
• “If this is the year 1986, then the Earth has one moon.”
• “If the earth has one moon, then there are 24 hours in a day.”

Here is a false statement: “If there are 24 hours in a day, then the Earth has two moons.”

[converse, contrapositive. Square of Opposition]

P  Q  P ⇒ Q  Q ⇒ P  ¬P ⇒ ¬Q  ¬Q ⇒ ¬P
T  T    T      T       T         T
T  F    F      T       T         F
F  T    T      F       F         T
F  F    T      T       T         T
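A brief aside (mine, not the book's): the truth table for implication shows that P ⇒ Q agrees everywhere with its contrapositive ¬Q ⇒ ¬P, but not with its converse Q ⇒ P. A Python sketch can confirm this exhaustively; the helper name `implies` is my own.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    # Material implication: P ⇒ Q is false only when P is true and Q is false.
    return (not p) or q

assignments = list(product([True, False], repeat=2))

# P ⇒ Q matches its contrapositive ¬Q ⇒ ¬P at every assignment...
print(all(implies(p, q) == implies(not q, not p) for p, q in assignments))  # → True
# ...but differs from its converse Q ⇒ P at some assignment.
print(any(implies(p, q) != implies(q, p) for p, q in assignments))  # → True
```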
3.1. A Syllogism
Now we will “prove” a form of deduction using truth tables. We will prove that the statement

[(P ⇒ Q) ∧ (Q ⇒ R)] ⇒ (P ⇒ R)

is always true. Before doing the proof, let us apply it to the following statements.
P : “I eat an entire chocolate cake.”
Q: “I get sick.”
R: “I will not win the wrestling tournament.”

Let’s say that you believe that P implies Q, and also that Q implies R. (Do you?) Then you should also believe that if I eat an entire chocolate cake, then I will not win the wrestling tournament. Combining two implications in this way is one of the logical exercises going back to Aristotle, called “syllogisms”. Presumably you have already mastered this in some intuitive form. Here we prove it using truth tables.
P  Q  R  P ⇒ Q  Q ⇒ R  P ⇒ R  (P ⇒ Q) ∧ (Q ⇒ R)  [(P ⇒ Q) ∧ (Q ⇒ R)] ⇒ (P ⇒ R)
T  T  T    T      T      T            T                        T
T  T  F    T      F      F            F                        T
T  F  T    F      T      T            F                        T
T  F  F    F      T      F            F                        T
F  T  T    T      T      T            T                        T
F  T  F    T      F      T            F                        T
F  F  T    T      T      T            T                        T
F  F  F    T      T      T            T                        T
As you can see, no matter what the values of P, Q, and R, the value of the final column is always true. Reflect on this and see if you agree that this is a proof.

A tautology is a formula of propositional logic which is always true regardless of the truth values given to the “propositional variables”. The statement [(P ⇒ Q) ∧ (Q ⇒ R)] ⇒ (P ⇒ R) is a tautology. You will see more examples in the exercises below.
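An editorial aside, not from the text: the truth-table method can be automated, since a formula is a tautology exactly when it is true under every assignment. This sketch is my own; the names `is_tautology`, `implies`, and `syllogism` are assumptions for illustration.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    # Material implication: false only when p is true and q is false.
    return (not p) or q

def is_tautology(formula, num_vars: int) -> bool:
    # A formula is a tautology when it holds under every truth assignment.
    return all(formula(*vals) for vals in product([True, False], repeat=num_vars))

# The syllogism from the text: [(P ⇒ Q) ∧ (Q ⇒ R)] ⇒ (P ⇒ R).
syllogism = lambda p, q, r: implies(implies(p, q) and implies(q, r), implies(p, r))
print(is_tautology(syllogism, 3))  # → True
```

By contrast, `is_tautology(lambda p: p, 1)` returns `False`, since the bare variable P fails when P is false.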
3.2. Exercises

(1) Use truth tables to check that the following are tautologies. Also explain why they are true with “common sense”.
• P ∨ ¬P
• P ⇒ P ∨ Q
• (P ∨ Q) ∧ (¬P ∨ R) ⇒ Q ∨ R
• [(P ⇒ Q) ∧ P] ⇒ Q
Remark: This last one is another famous example of a syllogism.
4. Propositional Calculus of a Single Variable 4.1. Quantifying
The first thing a math student should learn is how to write a mathematical statement. This is different from writing a mathematical expression or equation. It is important that you introduce variables with care. For example, merely writing (x + y)^2 = x^2 + y^2 is very bad, but not for the reason you might think. The main reason it is bad is because we have not introduced the variables “x” and “y”. Are they real numbers? Complex numbers? Matrices? A better statement, grammatically, would be

For all positive numbers x and y, we have (x + y)^2 = x^2 + y^2.

We will call such a statement “well-formed”. It is false. For the moment, our priority is to make grammatically correct statements, which may be true or false. If a statement is not well-formed, then we do not ask whether it is true or false; we send it back to the author asking for a revision. I hope when you read this book, and other books, that you appreciate the care that is taken to make well-formed statements.
Here we make a true well-formed statement:

Let x = 0 and y = 2; then (x + y)^2 = x^2 + y^2.

Please remember to always “initialize” your variables in such a fashion. This is extremely important to do, both to communicate with your audience and also to clarify your own thinking. Here are some good ways to introduce, or “quantify”, a variable:

• Give it a specific value. (“Let x = 4.”)
• Allow it to be some element of some set. (“Let x be a real number.”)
• Allow the variable to represent all values in a given set, i.e. (“For all real numbers x, the number x^2 is negative.”)
• State that what follows is true for some value in a given set, i.e., (“There exist positive numbers x and y so that log(x + y) = log(x) + log(y).”)
The expressions “For all” and “There exist” are ubiquitous in mathematics. They are called “quantifiers”, and get the special symbols ∀ and ∃, respectively. You should look for them, or their implication, throughout mathematics.

As another example, note that the constant C in Proposition 9.1 is introduced with the ∃ quantifier.

Definition. We say f is even provided that ∀ real numbers x, we have f(−x) = f(x). We say f is odd provided that ∀ real numbers x, we have f(−x) = −f(x).
Here are some examples (which we’ll study later more thoroughly) from the theory of factorization:

Definition. Let d and n be integers. Then d divides n provided that ∃ an integer e so that de = n. Let n > 1 be an integer. Then n is prime provided that ∀ divisors d > 0 of n, either d = 1 or d = n.
4.2. The First Principle of Analysis
It is important to understand how to combine these quantifiers, particularly in the area of mathematics called “analysis”. Let’s start with a benign concept: When is f a “constant function”?

Definition. A function f is constant provided that ∃ C a real number so that ∀ real numbers x, we have f(x) = C.

The order in which the quantifiers ∃, ∀ are used is crucial; a different order can completely change the meaning of the statement. For example, which functions f : R → R satisfy:
“For all real numbers x, there exists C a real number so that f(x) = C.”? All functions f : R → R do. For instance, consider f(x) = x^2. Then given a real number x, there exists C (= x^2) so that f(x) = C.
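To make the contrast concrete, here is a small Python sketch of my own (not the author's), testing both quantifier orders for f(x) = x² over a finite sample standing in for the real numbers — an assumption, since we cannot enumerate all reals.

```python
def f(x):
    return x ** 2

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]  # finite stand-in for "all real numbers x"

# ∀x ∃C: f(x) = C — true for any function: just take C = f(x).
forall_exists = all(any(f(x) == C for C in [f(x)]) for x in xs)

# ∃C ∀x: f(x) = C — demands one C that works for every x; fails for x².
exists_forall = any(all(f(x) == C for x in xs) for C in [f(x) for x in xs])

print(forall_exists, exists_forall)  # → True False
```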
This may be confusing to you, so let’s look more carefully at these examples. Keep in mind that once you quantify a variable in a (normal) sentence, it remains quantified that way for the rest of the sentence. Let me rewrite these two sentences with parentheses to clarify this.
(1) ∃ C a real number so that (∀ real numbers x, (we have f(x) = C)).
(2) ∀ real numbers x, (∃ C a real number so that (f(x) = C)).

In (1), the choice of C must be made before x is introduced; it must therefore hold for all x at once. In (2), within the first set of parentheses, the choice of x has been fixed, and we only need to choose C to work with this choice.

Let’s do another example. Are the following statements true or false?
(1) ∀ positive real numbers x, ∃ a positive real number y so that y < x.
(2) ∃ a positive real number y so that ∀ positive real numbers x, we have y ≤ x.

Certainly (1) is true; given x > 0 we may take y = (1/2)x. What about (2)? Is there a positive number so small that no other positive number is less? Certainly not, and this fact is so important that we will give a formal proof. The proof will be our first example of a “Proof by Contradiction”.

Proof Strategy: Proof by Contradiction (“reductio ad absurdum”)
In mathematics, either a statement is true or it is false. There is no middle ground. Here is how a proof by contradiction works. We have a statement, and we want to prove that it is true. But instead of directly deducing the statement, we take another approach. We add as a hypothesis that the statement is false, and then logically deduce an absurd (false) statement; this is the “contradiction”. Thus we then see that the statement must be true. This is best understood through examples.

Proposition 1.3 (First Principle of Analysis). Let x ≥ 0 be a real number. Suppose that for all real numbers y > 0 we have x ≤ y. Then x = 0.

Proof. Suppose x > 0. Then (1/2)x > 0. By hypothesis, x ≤ (1/2)x (since x isn’t bigger than any positive number). Since x > 0 we may divide by x to obtain 1 ≤ 1/2, which is absurd. Therefore it is impossible that x > 0, and we conclude that x = 0.

Read this short proof a few times to make sure you understand it.
Let us give an application of the First Principle of Analysis. Many people in the world do not believe that 0.9999 . . . = 1. (Do you?) Suppose you find yourself locked in a debate with a nonbeliever; here is one argument to explain why it is so. Let x = 1 − 0.9999 . . .; we can call it the “niggling number”. Get your opponent to agree that x ≥ 0. Then get him to think about what the decimal expansion of x must be. With some reflection, he should agree that it will begin x = 0.000 . . ., and that it will begin with as many 0s as you like. (He will start to give up, but may still feel like something is happening “at infinite places”...) Now, if y is any positive number, it can’t be smaller than x, because it will have some nonzero decimal digit somewhere. So your gracious opponent will agree that x ≤ y for all positive y. Now you have him. You say that therefore x ≤ (1/2)x, and so if he still doesn’t believe that x = 0 you divide by x and show that his way leads to the madness of 1 ≤ 1/2. Done.
4.3. Exercises

(1) Negate the statement, “There is a real number x so that for all real numbers y, we have x ≥ y.” Then prove your negated statement.
(2) Negate the statement “∀ even integers n > 2, there exist prime numbers p1, p2 so that n = p1 + p2.”
(3) Consider the statement, “For every real-valued function f : R → R there is a constant k > 0 so that for all x ∈ R, we have f(x) ≤ k|x|.” Is this statement true or false?
(4) Suppose that f, g : R → R are functions satisfying f(x) = g(y) for all numbers x, y. Prove that f and g are both constant functions.
(5) Suppose that f, g : R → R are twice differentiable functions which are nowhere 0. Let u(x, y) = f(x)g(y). Suppose that

∂²u/∂x² + ∂²u/∂y² = 0.

Prove that there exists a constant C so that for all x, we have f″(x) = Cf(x) and g″(y) = −Cg(y). (You should use the previous exercise.)
5. Exploiting Symmetry in Proofs

Proof Strategy: Without loss of generality

Often one is faced with two or more possibilities in a proof. If all possibilities are symmetric, then we may say “Without loss of generality we may assume” (WLOG WMA) to assume one of these possibilities. The (implied) proofs for the other cases must be exactly the same, except for the symmetry. Here is an example of a proof with three usages of WLOG WMA. See if you understand it; after the proof we will examine the symmetry behind these usages.

Proposition 1.4. Suppose the complete graph on 6 vertices has each edge colored red or blue. Then there must be either a red triangle or a blue triangle.
Proof. Label the vertices A, B, C, D, E, F as in the figure. [Sorry...please make your own figure] There are five edges from A, so there are either at least 3 red edges from A, or there are at least 3 blue edges from A. (For otherwise there would be no more than 4 edges from A, a contradiction.) WLOG WMA there are at least three red edges from A. WLOG WMA that the edges AB, AC, and AD are red. Now consider the edges BC, CD, and BD. Suppose one of these is red. WLOG WMA that BC is red. Then ABC is a red triangle. If none of these edges are red, then they are all blue, and then BCD is a blue triangle. Therefore in all cases, there is a red or a blue triangle.
Let’s now examine the WLOG WMA symmetries:

• If there are three blue edges from A, we may switch the roles of red/blue for the rest of the proof. This justifies the first WLOG WMA.
• If three red edges from A are connected to P1, P2, P3 instead of B, C, D, then we may switch the roles P1 ↔ B, P2 ↔ C, and P3 ↔ D for the rest of the proof. This justifies the second WLOG WMA.
• If CD is red, we may apply the permutation B → C → D → B, and if BD is red, we may switch the roles via C ↔ D. This justifies the third WLOG WMA.
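As a computational aside (my own addition, not the author's), Proposition 1.4 can also be checked by brute force: color each of the 15 edges of the complete graph on 6 vertices red or blue in all 2^15 ways, and confirm that a monochromatic triangle always appears. The helper names below are my own.

```python
from itertools import combinations, product

vertices = range(6)
edges = list(combinations(vertices, 2))      # the 15 edges of K6
triangles = list(combinations(vertices, 3))  # the 20 vertex triples

def has_mono_triangle(coloring):
    # coloring: dict mapping each edge (a, b) with a < b to "red" or "blue"
    return any(coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
               for a, b, c in triangles)

# Try every one of the 2^15 = 32768 red/blue colorings.
always = all(has_mono_triangle(dict(zip(edges, colors)))
             for colors in product(["red", "blue"], repeat=len(edges)))
print(always)  # → True
```

A brute-force check is no substitute for the WLOG proof, but it is reassuring that the two agree.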
5.1. Exercises
For the first 5 problems, suppose the complete graph on n vertices has each edge colored red or blue.
(1) For which n is there necessarily a monochromatic triangle?
(2) If n = 5 and some vertex has 4 edges of the same color, then there is a monochromatic quadrilateral.
(3) If n = 6 then there is a monochromatic quadrilateral. (Harder; use the previous problem.)
(4) If n = 5 and every vertex has 2 red edges and 2 blue edges, then there is a monochromatic pentagon.
(5) If n = 6 there are actually at least two monochromatic triangles.
(6) A magic square is an n × n grid in which each of the numbers {1, 2, 3, . . . , n²} is used once and the sum of each row, column, and diagonal is the same. Find all possible 3 × 3 magic squares. Prove that you have done so. (What is the common sum? What must the middle square be?)
(7) Suppose that 17 friends are standing in a circle holding strings so that every pair of students is sharing a string. Each string is colored either red, yellow, or blue. Prove that there exists either a red triangle, a yellow triangle, or a blue triangle in this arrangement.
6. Some Game Theory
The theory of the previous section suggests some pleasant graph games.

Activity: The Game of SIM

SIM requires a piece of paper, two players, and two writing utensils with two different colors, say red and blue. First mark 6 dots, evenly distributed around a circle. Players take turns, with Player #1 going first. Player #1 uses red and Player #2 uses blue. On a given turn, two dots are connected with a straight red or blue line. You may not connect the same two dots twice. If a red triangle is created, then Player #2 wins, and if a blue triangle is created, then Player #1 wins. By Proposition 1.4, eventually one of the players will win. (Compare this to Tic-Tac-Toe, where a game between experienced players should end in a tie.)

There are many variations of SIM: one may vary the number of dots, or one may pick other shapes besides triangles, or one may play MIS, in which the first person to get a triangle (or other shape) wins instead of loses. Note that it is possible for 5-dot SIM to end in a tie, since the outcome could simply be a red pentagon and a blue pentagon.

Proposition 1.5. There is a strategy for Player #1 to always win 5-dot MIS.
Proof. Write the vertices as A, B, C, D, E. Player #1 may start with edge AB. WLOG WMA that Player #2 plays either BC or CD.

Case I: Player #2 plays BC. In this case, Player #1 plays AD, forcing Player #2 to play BD. Player #1 plays CD, then Player #2 must play AC. Player #1 now plays DE for the win, since on his/her next move, either AE or CE makes a red triangle.

Case II: Player #2 plays CD. Player #1 plays AE, forcing Player #2 to play BE. Player #1 now plays AD for the win, since on his/her next move, either DE or BD makes a red triangle.

Corollary 1.6. If n > 5, there is a strategy for Player #1 to always win n-dot MIS.
Proof. Red simply picks 5 of the n dots and follows the previous strategy.

Lemma 1.7. For any n there is either a strategy for Player #1 to always win, or a strategy for Player #2 to always win n-dot SIM.

[Proof]

Proposition 1.8. For any n there is a strategy for Player #2 to always win n-dot SIM.

[Proof: Strategy stealing]

[Explanation for how this proof would fail for chess.]

Proposition 1.9. For any n and any shape X, there is a strategy for Player #1 to always win n-dot X-MIS.
6.1. Exercises
(1) Find an example of a game of SIM in which the game doesn’t end until the 15th edge is drawn.
(2) Consider the strategy in chess (or checkers, or go) to always mirror your opponent’s move. Must the game end in a tie? Experiment with a friend.
(3) Sometimes a master of chess (in which white moves first) will play two games with two opponents at the same time, alternating moves between the boards. Suppose you are the second of these opponents, and that the master is playing as black against you, and as white in the other game. Describe a way to play so that the master does not win both of the games.
(4) Learn the games Dots and Boxes, Connect 4, Gomoku, and Hex. Like SIM, these games have variations. Find some variations where ties are impossible, and find some examples to which strategy-stealing arguments apply.
7. Sets
In 1874, Mathematics received its "Theory of Everything": a theory on which all other theories of the time (algebra, analysis, differential equations, . . . ) could rest. This was the theory of sets, introduced by Cantor, and now taught to schoolchildren around the world. We begin this section by explaining why we will not define the term "set" explicitly, and then introduce the basic set theory operations in terms of mathematical logic.
7.1. Definition of a Set
"A set is a collection of elements." This is a very well-known sentence. How do you feel about it? Do you think it makes a good definition? In mathematics, a good definition precisely introduces a word or phrase in terms of earlier established concepts. A mathematician reading this will ask: what is a 'collection'? What is an 'element'? Where are the definitions of those words? Without knowing precisely what collections or elements are, this is simply not a definition in the mathematical sense. However, I am never going to give you a definition of 'set'. There is an insurmountable problem with trying to define the most basic objects in mathematics. To illustrate this problem I will use the Oxford Dictionary [13] to attempt to define the word "cat". Suppose that I don't know what any words in the English language mean.

Definition. cat: A small domesticated carnivorous mammal with soft fur, a short snout, and retractile claws.
Hold on. Maybe I don't know what the word 'a' means. Let's look that up!

Definition. a: Used when referring to someone or something for the first time in a text or conversation.
Before I notice the fact that the word 'a' is used in the definition of 'a', I am again lost because I don't know 'used'.

Definition. used: Having already been used.

Don't groan. We look up 'having', et cetera.

Definition. having: possess, own, or hold.

Definition. possess: have as belonging to one; own.

Definition. have: possess, own, or hold.
Uh-oh. We are now totally stuck. The first word in the definition of 'possess' is 'have', and the first word in the definition of 'have' is 'possess'. We are unable to understand the definitions of 'possess' or 'have' purely by using the dictionary, and
in some sense we can never understand the other words, including 'cat', without any of our own intuition. It's an interesting question, to what extent a picture dictionary would help in this regard.

And this is how it goes. Certain concepts we treat as "irreducible", in the sense that we don't have a way to define them in terms of simpler notions. The notion of a 'set' is an irreducible concept. Here are some other definitions of 'set', just to be thorough.

Definition. "By a set we mean a grouping into one entity of distinct objects of our intuition or our thought." –Cantor

Definition. "A set consists of elements which are capable of possessing certain properties and of having certain relations between themselves or with elements of other sets." –Bourbaki

Definition. "A set is a collection of distinct objects, considered as an object in its own right." –Wikipedia

It is interesting to look at mathematical books and see what they treat as irreducible concepts, and how they introduce these. Here are some "definitions" from Euclid's Elements of notions that seem like they were really irreducible back then:

Definition. A point is that which has no part. A line is breadthless length. A surface is that which has length and breadth only.
Anyway, we're not going to precisely define a set. We will say many things about sets, however, and we will otherwise try to define things as well as we can.
7.2. Basic Set Theory Notions
Sets have members; we write "x ∈ S" if S is a set and x is an element of S. A synonym for 'element' is 'member'. An example of a set is A = {2, −1, 7}. This notation means that the numbers 2, −1, and 7 are elements of A, and nothing else is an element of A. We write 3 ∉ A to indicate that 3 is not a member of A.
Definition. Let A, B be sets. Then B is a subset of A, written B ⊆ A, provided that "(x ∈ B) ⇒ (x ∈ A)".

The notation A ⊇ B means the same thing. It is like with inequalities. For example let B = {2, −1} and A = {2, −1, 7} again. Then B ⊆ A.
The following is actually an axiom of set theory:

Definition. Let A, B be sets. Then A = B provided that (A ⊆ B) ∧ (B ⊆ A).

Equivalently, A = B means that x ∈ A ⇔ x ∈ B.
For instance, the sets {1, 2} and {1, 2, 1} are equal, even though 1 is presented twice in the expression for the second set. Of course A ≠ B means ¬(A = B).
Definition. Let A, B be sets. Then B ⊂ A provided that (B ⊆ A) ∧ ¬(B = A). In this case, B is called a proper subset of A.

I should warn you that many eminent authors use the notation B ⊂ A to mean B ⊆ A, and many use the notation as written here. I like the analogy with inequality for numbers, which is why we will use the above notation in this book.
Definition. The empty set ∅ is the set with no elements.

A set is determined by the answers to the question, "Is x ∈ A?" for various x. For the empty set, the answer to the question, "Is x ∈ ∅?" is always "No!".

Convince yourself by using the definitions that if A is any set, then ∅ ⊆ A. You will need to remember that if a statement P is false, then "P ⇒ Q" is true no matter what the statement Q is.
Definition. Let A be a set. The power set ℘(A) is the set of all subsets of A.

For example, let A = {a, b}, with a ≠ b. Then ℘(A) = {∅, {a}, {b}, A}.
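To experiment with power sets concretely, here is a small computational sketch; the helper name `power_set` and the use of `frozenset` (so that subsets can themselves be members of a set) are our choices, not the book's notation:

```python
from itertools import combinations

def power_set(s):
    """Return the set of all subsets of s, each as a frozenset."""
    elems = list(s)
    return {frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

A = {"a", "b"}
print(len(power_set(A)))  # a 2-element set has 2**2 = 4 subsets
```

As in the book's example, ℘(A) here has four elements: ∅, {a}, {b}, and A itself.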
Let A be a set, and P(x) a statement involving members x ∈ A. Then we may form a new set {x ∈ A | P(x)}, which is the set of all x ∈ A so that P(x) is true. For example, the set {(x, y) ∈ R^2 | x^2 + y^2 = 1} is the unit circle, a subset of the plane R^2.
Definition. Let X be a set, and A, B ⊆ X. Then

A ∩ B = {x ∈ X | (x ∈ A) ∧ (x ∈ B)},
A ∪ B = {x ∈ X | (x ∈ A) ∨ (x ∈ B)}.

We call A ∩ B the intersection of A and B, and A ∪ B the union of A and B.

What is A ∩ ∅? What is A ∪ ∅? What is A ∪ A? What is A ∩ A? What is A ∩ X? What is A ∪ X?
Before the next definition, please note quietly that if X is a set, and A is a subset of X, then {x ∈ X | x ∈ A} = A.

Definition. Let X be a set, and A ⊆ X. Then A^c = {x ∈ X | x ∉ A}. We call A^c the complement of A in X, also written X − A.

What is X^c? What is (A^c)^c? What is A ∩ A^c? What is A ∪ A^c?
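The questions above can be explored concretely with finite sets before answering them in general; a minimal sketch, where the ambient set X and the sample subset A are arbitrary choices of ours:

```python
X = set(range(10))   # ambient set X = {0, 1, ..., 9}
A = {1, 2, 3}        # a subset A of X

A_c = X - A          # the complement of A in X

print(A & A_c)       # A ∩ A^c: always empty
print(A | A_c == X)  # A ∪ A^c: always all of X
print(X - A_c == A)  # (A^c)^c = A
```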
7.3. Set Theory Applications of Propositional Logic
[To be written.]

7.4. Exercises
(1) Give an example of a subset A of R so that the statement "(A ⊆ Q) ∨ (Q ⊆ A)" is false.
(2) Let B = {x ∈ R | −1 < x}. Give an example of a subset A of R so that the statement (A ⊆ B) ∨ (B ⊆ A) is false, and use "set builder notation" to describe A.
(3) Let B = {x ∈ R | −1 < x}. Give an example of a subset A of R so that the statement (A ⊆ B) ∨ (B ⊆ A) is false, and use "set builder notation" to describe A.
(4) Can you find a set whose power set has exactly three elements?
(5) Given objects a, b, and y, and given that {a, b} = {a, y}, prove that b = y.
(6) List the elements of the set
{A ∈ ℘(N) | A ∪ {11, 6} = {6, 11}}.
(Here ℘(N) denotes the power set of the natural numbers.)
(7) Describe three different elements of the set
{A ∈ ℘(N) | A ∩ {11, 6} = {6, 11}}.
(8) Consider the two sets
{(x, z) ∈ R^2 | ∃y ∈ R so that (x^2 + y^2 = 1) ∧ (y^2 + z^2 = 1)}
and
{(x, z) ∈ R^2 | (−1 ≤ x ≤ 1) ∧ (−1 ≤ z ≤ 1) ∧ (x = ±z)}.
Prove carefully that these two sets are equal.
(9) Let A, B, C be sets. Consider the statement
A ∩ (B ∪ C) = (A ∩ B) ∪ C.
(a) Use a Venn diagram to show the statement is true or untrue.
(b) Reduce the statement to propositional logic.
(10) Let A, B, C be subsets of some universal set X. Consider the statement
A ⊆ B ∪ C^c ⇔ C ∩ A ⊆ B.
(a) Use a Venn diagram to show the statement is true or untrue.
(b) Reduce the statement to propositional logic.
(11) Let A, B be subsets of some set X. Consider the statement
A ∪ B = B ∪ (B ∪ A^c)^c.
Recall that A^c denotes the complement of A in X.
(a) Use Venn diagrams to illustrate that the statement is generally true, or sometimes false.
(b) Reduce the statement to propositional logic.
(12) Let A, B be subsets of some set X. Consider the statement
A ∩ B = B ∩ (B^c ∪ A).
Recall that B^c denotes the complement of B in X.
(a) Use Venn diagrams to illustrate that the statement is generally true, or sometimes false.
(b) Reduce the statement to propositional logic, and use a truth table to verify your answer above.
8. Induction

8.1. Standard Induction
The method of induction is suggested by problems of the following type. You want to prove a proposition P(n) which involves a parameter n which is a natural number. As n varies, you get infinitely many different propositions P(1), P(2), P(3), . . .. Imagine that P(n) is easy when n is small but gets progressively more complex as n grows. Then a reasonable idea is to try to prove the smaller ones first, and work your way up to the bigger ones.

Suppose we have three propositions P, Q, and R. Recall that if P ⇒ Q and Q ⇒ R, then P ⇒ R.
More generally, suppose we have a sequence of propositions P(1), . . . , P(n), and for every k from 1 to n − 1, we can show that P(k) implies P(k + 1). Then by iterating the above idea we get that P(1) implies P(n):

P(1) ⇒ P(2) ⇒ P(3) ⇒ · · · ⇒ P(n − 1) ⇒ P(n).
This is the basic form of induction. (The word "induction" suggests an electrical analogy. Think of each P(k) as being connected to P(k + 1) by a wire. Then if you "charge up" P(1) with veracity, the charge will eventually get to P(n).) Thus in practice, if you want to prove P(n) for all integers n ≥ 1 by induction, you must prove:

(1) P(1) is true.
(2) For all k ∈ N, if P(k) is true, then P(k + 1) is true.
Step 2 has some logical complexity to it, and is often misinterpreted. You do not prove that P(k) is true. You show that if it were true, then P(k + 1) would also be true. Step 2 is usually the hardest. It's not going to work unless you see a relationship between the various P(k). You need to see a way to make the step from each one to the next. Warning: it is not always manageable to prove a proposition P(n) with induction, as there may not be any tractable relationship present. Moreover, one can often prove a proposition directly and more simply without induction. So don't get too carried away with this. Let's do some examples.

Proposition 1.10. For all n ∈ N, 1 + · · · + n = n(n + 1)/2.
Let us call the proposition P(n). It is healthy to always try writing out explicitly a few of the smaller P(n)'s. For instance
P(1) : 1 = 1(2)/2,
P(2) : 1 + 2 = 2(3)/2,
P(3) : 1 + 2 + 3 = 3(4)/2.
All these are easily verified; this suggests that we have correctly interpreted the problem. Warning: P(n) is not a number! Do not say, for example, that P(2) = 3. The P(n) are always mathematical statements, never numbers. In this case they are equations.
Now there is an obvious relationship between the P(k)'s as k grows. The left hand side of P(k + 1) is obtained from the left hand side of P(k) by adding k + 1. So step 2 goes like this: Suppose P(k) is true. Thus

1 + 2 + · · · + k = k(k + 1)/2.

Add k + 1 to both sides. Then

1 + 2 + · · · + k + (k + 1) = k(k + 1)/2 + (k + 1)

is true. We do some algebra to the right hand side and deduce that

1 + 2 + · · · + k + (k + 1) = (k + 1)(k + 2)/2

is true.
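An induction argument like this can be accompanied by a mechanical spot-check of P(n) for many values of n; this is reassurance, not a substitute for the proof. A sketch (function names ours):

```python
def lhs(n):
    # 1 + 2 + ... + n by brute force
    return sum(range(1, n + 1))

def rhs(n):
    # the closed formula n(n + 1)/2
    return n * (n + 1) // 2

for n in range(1, 101):
    assert lhs(n) == rhs(n)
print("P(n) verified for n = 1, ..., 100")
```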
But this equation is exactly P(k + 1). So that's it. We have checked that P(1) is true, and proven Step 2. Finish by writing something like "Thus by standard induction P(n) is true."

I'd like to remark that this proof is a little unsatisfying, in that it never really explains the formula. (Although it serves as a good example of induction.) There are many proofs of this important result; here is an easy one: Write S for the sum of the first n numbers. Then

S = 1 + 2 + · · · + n,
S = n + (n − 1) + · · · + 1.

Adding these equations yields

2S = (n + 1) + (n + 1) + · · · + (n + 1) = n(n + 1),

which yields the desired formula.

Our next example of induction I find much more satisfying. Let us prove the power rule of calculus, that is

Proposition 1.11.
If n ∈ N then d/dx (x^n) = n x^(n−1).

We will assume only the product rule for derivatives and the rule dx/dx = 1.
Proof. As before we write P(n) : "d/dx (x^n) = n x^(n−1)". It is good to focus first on a few small cases. P(1) is the rule dx/dx = 1, which we have already assumed. P(2) is the rule d/dx (x^2) = 2x. Why is this true? Typically one writes out lim_{h→0} ((x + h)^2 − x^2)/h, does some algebra and limit-logic to get P(2). But we want to connect P(2) to P(1) and so will instead use the product rule:

d/dx (x^2) = d/dx (x · x) = x d/dx (x) + x d/dx (x) = 2x d/dx (x) = 2x.

Note that the last equality uses P(1). Can we make this connection more generally? You bet; using the product rule:

d/dx (x^(k+1)) = d/dx (x^k · x) = x^k d/dx (x) + x d/dx (x^k).

We finish this off by applying P(1) and P(k):

= x^k + x(k x^(k−1)) = (k + 1) x^k.

Combining all the equalities yields P(k + 1) : d/dx (x^(k+1)) = (k + 1) x^k. Thus P(n) is true by induction.
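The power rule can also be sanity-checked numerically with a symmetric difference quotient; the sample point, step size, and tolerance below are arbitrary choices of ours, and this is illustration rather than proof:

```python
def numeric_deriv(f, x, h=1e-6):
    # symmetric difference quotient approximating f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.7  # an arbitrary sample point
for n in range(1, 6):
    approx = numeric_deriv(lambda t: t ** n, x)
    exact = n * x ** (n - 1)
    assert abs(approx - exact) < 1e-4
print("d/dx x^n = n x^(n-1) holds numerically for n = 1, ..., 5")
```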
8.2. Recursive Definitions
A related idea to proof by induction is that of "recursive definition". For example n! may be familiar to you as "the product of all numbers from one to n," or n(n − 1) · · · 2 · 1. The recursive definition is:

Definition. n! = 1 if n = 1, and n! = n · (n − 1)! if n > 1.

For example if we want to know what 3! is, the definition says it is 3 · 2!. This forces us to use the definition again to determine that 2! = 2 · 1!, and we need to look once more at the definition to find that 1! = 1. We put this all together to get 3! = 3 · 2 · 1 = 6. The reader should believe that given any positive integer n, one can in principle use this definition to compute n!.
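The recursive definition transcribes almost verbatim into code; a sketch (the function name is ours):

```python
def factorial(n):
    # mirrors the recursive definition: 1! = 1, and n! = n * (n - 1)! for n > 1
    if n == 1:
        return 1
    return n * factorial(n - 1)

print(factorial(3))  # unwinds as 3 * 2! = 3 * 2 * 1! = 3 * 2 * 1 = 6
```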
As another example, consider the nth derivative of a function f, written d^n/dx^n (f), which may be familiar as "what you get when you differentiate f n times". The recursive definition is

Definition. d^n f/dx^n = df/dx if n = 1, and d^n f/dx^n = d^(n−1)/dx^(n−1) (df/dx) if n > 1.

The advantage of using recursive definitions is that they do not require readers to use their imagination about doing something n times. There are no "· · ·"s, for example; all the logic is laid out for you. This is particularly nice when these concepts gang up on you. Here is a small example.
Proposition 1.12. d^n/dx^n (x^n) = n!.
Proof. Induction on n. The statement for n = 1 is dx/dx = 1, a familiar fact. Suppose the proposition is true for k. Then

d^(k+1)/dx^(k+1) (x^(k+1)) = d^k/dx^k (d/dx (x^(k+1))) = d^k/dx^k ((k + 1) x^k),

using the recursive definition of d^n/dx^n and Proposition 3.7. One factors out the k + 1 and uses the inductive hypothesis:

= (k + 1) d^k/dx^k (x^k) = (k + 1) · k!.

Finally, using the recursive definition of n!, this is equal to (k + 1)!. We are done by induction.

I hope you can see in the above example that recursive definitions mesh well with proofs by induction. The resulting proof is clean, and does not ask the reader to visualize, for example, "a sequence of exponents coming down and being multiplied, exactly as many times as the power of x, until we simultaneously have x^0 multiplied by the product of integers from 1 to n." The latter, with some examples, is fine if you're talking to someone and can't write things down. The inductive proof is clearer and easier to check.

Here are a couple more recursive definitions. Let a_1, a_2, . . . , a_n, . . . be a sequence of numbers. Then
∑_{i=1}^n a_i = a_1 if n = 1, and ∑_{i=1}^n a_i = (∑_{i=1}^{n−1} a_i) + a_n if n > 1

is a recursive way to define "a_1 + a_2 + · · · + a_n". Also commonly used is the notation

∏_{i=1}^n a_i = a_1 if n = 1, and ∏_{i=1}^n a_i = (∏_{i=1}^{n−1} a_i) · a_n if n > 1

for the product of n numbers "a_1 · a_2 · · · a_n". For example,

∑_{i=1}^n i = n(n + 1)/2 and ∏_{i=1}^n i = n!.
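These recursive Σ and Π definitions also transcribe directly, with the sequence a_1, . . . , a_n stored as a Python list (function names ours):

```python
def rec_sum(a):
    # recursive definition of a_1 + ... + a_n
    if len(a) == 1:
        return a[0]
    return rec_sum(a[:-1]) + a[-1]

def rec_prod(a):
    # recursive definition of a_1 * ... * a_n
    if len(a) == 1:
        return a[0]
    return rec_prod(a[:-1]) * a[-1]

n = 6
print(rec_sum(list(range(1, n + 1))))   # n(n + 1)/2 = 21
print(rec_prod(list(range(1, n + 1))))  # n! = 720
```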
8.3. Induction Schemes
I'd like to codify the logic of induction from the previous section as

(P(1) ∧ ∀k (P(k) ⇒ P(k + 1))) ⇒ ∀n P(n).

The first ⇒ (inside the parentheses) is something that you need to prove in an induction proof, and the second ⇒ is the statement of induction. Thus if you can manage to prove the items in parentheses, you have obtained the item on the right of the second ⇒.
Sometimes you want to tweak the rules of induction. Consider the following problem. We have an intuitive understanding that factorials grow much faster than polynomials, and want to prove that n! > n^2. Unfortunately this isn't true for some small values of n. In fact for n = 2 and 3, the quantity n^2 is bigger than n!. For n = 4, we finally have 4! = 24 > 16 = 4^2. If we therefore try to literally apply standard induction, as presented in the previous section, to the proposition P(n) : "n! > n^2", we will fail because it is not true for P(1). So the scheme of our proof cannot be

(P(1), P(k) ⇒ P(k + 1) ∀k ≥ 1) ⇒ P(n) ∀n ≥ 1.

We will instead settle for

(P(4), P(k) ⇒ P(k + 1) ∀k ≥ 4) ⇒ P(n) ∀n ≥ 4.

Thus we will use the proposition for n = 4, and also prove that if P(k) is true ∀k ≥ 4, then P(k + 1) is true.
So we will get P(4) ⇒ P(5) ⇒ P(6) ⇒ · · · ⇒ P(n − 1) ⇒ P(n).

Proposition 1.13. If n is an integer greater than or equal to 4, then n! > n^2.

We will proceed by the induction scheme

(P(4) ∧ ∀k ≥ 4 (P(k) ⇒ P(k + 1))) ⇒ ∀n ≥ 4 P(n).

The statement P(4) is true since 4! = 24 > 16 = 4^2. Let's get to work on the induction step. We need to find a relationship between P(k) and P(k + 1) which will allow us to derive one from the other. The left hand sides seem to be easiest to relate, since the LHS (left hand side) of P(k + 1) is k + 1 times the LHS of P(k). If P(k) is true, then by multiplying both sides by k + 1 we see that (k + 1)! > k^2 (k + 1). This is not P(k + 1), since the RHS is not exactly (k + 1)^2. However if we can prove that k^2 (k + 1) is greater than (k + 1)^2, then we can combine the inequalities à la (k + 1)! > k^2 (k + 1) > (k + 1)^2 to obtain P(k + 1). The inequality k^2 (k + 1) > (k + 1)^2 reduces to k^2 > k + 1. Bear in mind that we only need to prove this for k ≥ 4. This proof can be done in any number of ways; I prefer k^2 > 2k > k + 1. The first inequality is true since k > 2 and the second since k > 1. We are done by our induction scheme.
The previous paragraph included some brainstorming. It is good to present a final proof which does not include this, and is independent of the last paragraph. Logically, it can be read immediately after the statement of the proposition.

Proof. We prove the proposition by induction. It is clear for n = 4. Assuming the statement for k ≥ 4, then k! > k^2, so that (k + 1)! > k^2 (k + 1). Now as k > 2, k^2 > 2k > k + 1, and therefore (k + 1)! > (k + 1)^2. We are done by induction.
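A short loop confirms both the small exceptions and the proposition over a modest range (the cutoff 20 is arbitrary; this illustrates, it does not prove):

```python
from math import factorial

for n in range(1, 21):
    holds = factorial(n) > n ** 2
    if n in (2, 3):
        assert not holds  # the small exceptions noted above
    if n >= 4:
        assert holds      # the content of Proposition 1.13
print("n! > n^2 for 4 <= n <= 20, and fails at n = 2, 3")
```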
Once you know what to do, the proof need not be very long. The above proof requires a sophisticated and active reader who understands inequalities well.

Here is another kind of problem. Suppose you want to convince yourself that you can integrate any power of sin(x). We'll make P(n) the somewhat imprecise "I have a formula for an antiderivative of sin^n(x)." This will be true for, say, n ≥ 0. (Is it
true for negative n?) Let's do a couple. An antiderivative of sin^0(x) = 1 is given by x, and an antiderivative of sin(x) is given by −cos(x). What about sin^2(x)? Here's one approach:

∫ sin^2(x) dx = ∫ (1 − cos^2(x)) dx = x − ∫ cos^2(x) dx.
Now integrate by parts, with u = cos(x) and dv = cos(x) dx. The latter integral becomes

∫ cos^2(x) dx = sin(x) cos(x) + ∫ sin^2(x) dx.

The devout calculus student will recall that we put this all together to get:

∫ sin^2(x) dx = x − sin(x) cos(x) − ∫ sin^2(x) dx,

and we can solve:

∫ sin^2(x) dx = (x − sin(x) cos(x))/2.
This same basic method works to write higher powers of sin in terms of lower powers:

∫ sin^k(x) dx = ∫ sin^(k−2)(x)(1 − cos^2(x)) dx = ∫ sin^(k−2)(x) dx − ∫ sin^(k−2)(x) cos^2(x) dx.

Let u = cos(x) and dv = sin^(k−2)(x) cos(x) dx. The latter integral becomes

∫ sin^(k−2)(x) cos^2(x) dx = (1/(k − 1)) sin^(k−1)(x) cos(x) + (1/(k − 1)) ∫ sin^k(x) dx.

Putting this all together we get:

∫ sin^k(x) dx = ∫ sin^(k−2)(x) dx − (1/(k − 1)) sin^(k−1)(x) cos(x) − (1/(k − 1)) ∫ sin^k(x) dx,

and we can solve for ∫ sin^k(x) dx:

∫ sin^k(x) dx = ((k − 1)/k) ∫ sin^(k−2)(x) dx − (1/k) sin^(k−1)(x) cos(x).
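The reduction formula can be spot-checked numerically by comparing a crude midpoint Riemann sum for the definite integral from 0 to t against the right-hand side; since sin(0) = 0, the boundary term at 0 vanishes for k ≥ 2. The step count, test exponent, and endpoint below are arbitrary choices of ours:

```python
from math import sin, cos

def integral_sin_power(k, t, steps=200_000):
    # midpoint Riemann sum for the integral of sin^k(x) dx over [0, t]
    h = t / steps
    return sum(sin(h * (i + 0.5)) ** k for i in range(steps)) * h

k, t = 5, 1.3
left = integral_sin_power(k, t)
right = ((k - 1) / k) * integral_sin_power(k - 2, t) \
        - (1 / k) * sin(t) ** (k - 1) * cos(t)
assert abs(left - right) < 1e-6
print("reduction formula verified numerically for k = 5")
```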
Okay. So I hope that was a pleasant review of integration by parts. Where are we? We have shown that if P(k − 2) is true, then so is P(k), since we can write a formula for ∫ sin^k(x) dx in terms of ∫ sin^(k−2)(x) dx. If we want the integral for sin^6(x), for example, we can use the above to reduce to sin^4(x), which reduces to sin^2(x), and then to 1, which we know. If we wanted the integral for sin^7(x), we can use the above to eventually reduce to sin(x), whose antiderivative has also been noted. This suggests a new induction scheme. If we've proven P(0) and P(1), and also proven that P(k) implies P(k + 2), then P(n) is true for all integers n ≥ 0. I will codify this as:

(P(0) ∧ P(1) ∧ ∀k (P(k) ⇒ P(k + 2))) ⇒ ∀n P(n).
Here are some other useful induction schemes:

(∀a odd P(a) ∧ (∀k P(k) ⇒ P(2k))) ⇒ ∀n P(n)

(∀p prime P(p) ∧ (∀k, ℓ ≥ 2 P(k) ∧ P(ℓ) ⇒ P(kℓ))) ⇒ ∀n ≥ 2 P(n)
Of course, not everything is an induction scheme. For example, the scheme

(P(1) ∧ (∀k P(k) ⇒ P(k + 2))) ⇒ ∀n P(n)

is certainly not valid, because at no point do we obtain P(2), or P(n) for any even number n. Which schemes are valid? For now, use your common sense. Later we will give proofs for the validity of other induction schemes based on the original one. The mother of them all, though, is Strong Induction. This is the scheme

(P(a) ∧ ((∀ a ≤ k < n P(k)) ⇒ P(n))) ⇒ ∀n ≥ a P(n).

You are probably a little tired of induction now so we will postpone the discussion of Strong Induction until later.

8.4. Exercises
Standard Induction
(1) Prove that if x ≥ −1 and n ∈ N, then (1 + x)^n ≥ 1 + nx.
(2) We have seen that the nth triangular number t_n = ∑_{i=1}^n i is given by t_n = n(n + 1)/2. The nth tetrahedral number T_n is defined by T_n = ∑_{i=1}^n t_i. For example, T_3 = 1 + 3 + 6 = 10. Prove that the nth tetrahedral number is (1/6) n(n + 1)(n + 2).
(3) Prove that n! = 1 + ∑_{i=0}^{n−1} i(i!) for n ≥ 1. (The convention is that 0! = 1.)
(4) The nth Fermat number F_n is given by the formula 2^(2^n) + 1. For example, F_0 = 3 and F_1 = 5. Prove the following:
F_0 F_1 · · · F_n = F_(n+1) − 2.
(5) Suppose that A is a convex subset of the plane. This means that, whenever two points P, Q are in A, and λ is a real number between 0 and 1, then the point
λ · P + (1 − λ) · Q
is also in A. (These points fill out the line segment joining P and Q.) Prove that if P_1, P_2, . . . , P_n are n points in A, then the "centroid"
(1/n)(P_1 + P_2 + · · · + P_n)
is also in A.
(6) Use the recursive definition of summation to prove that if x ≠ 1, then
∑_{i=0}^n x^i = (x^(n+1) − 1)/(x − 1).
(7) Prove that you can solve the n towers of Hanoi problem in 2^n − 1 moves. Also prove that this is the minimum number of moves required.
Induction Schemes
(8) Write h_k for the k-th Hemachandra number, starting with h_0 = 0 and h_1 = h_2 = 1. Thus, h_(k+2) = h_k + h_(k+1) for k ≥ 1. Let φ = (1 + √5)/2 and φ̄ = (1 − √5)/2. Use the induction scheme
(P(1) ∧ P(2) ∧ (∀k P(k) ∧ P(k + 1) ⇒ P(k + 2))) ⇒ ∀n P(n)
to prove that
h_n = (φ^n − φ̄^n)/√5.
In other words, check the formula is correct for n = 1 and 2, and prove that if the formula is correct for n = k and n = k + 1, then it is true for n = k + 2. (Algebra tip: Show that φ^2 = φ + 1 and φ̄^2 = φ̄ + 1.)
(9) Prove that ∀a, b ∈ N, we have h_(a+b) = h_(a+1) h_b + h_a h_(b−1). (Suggestion: Use the same induction scheme; you're not meant to use the φ-formula from the previous problem.)
(10) Use a scheme mentioned in the text to prove that any positive fraction a/b can be reduced to a fraction in which the numerator and denominator are not both even.
(11) Xuande has a pile of 4- and 5-cent postage stamps. What are all the postages he can pay? Give a proof. (Suggestion: After you figure out the answer, come up with an appropriate induction scheme.)
(12) Which of the following are valid induction schemes? Explain.
(a) (P(1) ∧ (∀k ≥ 2 (P(k) ⇒ P(k − 1)))) ⇒ ∀n P(n).
(b) (P(1) ∧ (∀k (P(k) ⇒ (P(2k) ∧ P(2k + 1))))) ⇒ ∀n P(n).
(c) (P(0) ∧ P(1) ∧ (∀k, ℓ ∈ Z (P(k) ∧ P(ℓ) ⇒ P(k − ℓ)))) ⇒ ∀n ∈ Z P(n).
(d) (P(1) ∧ (∀k P(k) ⇒ P(k + 1) ∧ P(k + 2))) ⇒ ∀n P(n).
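Exercise (8)'s closed formula is easy to test numerically before proving it; the floating-point tolerance and range below are our arbitrary choices:

```python
from math import sqrt

def hemachandra(n):
    # h_0 = 0, h_1 = h_2 = 1, and h_{k+2} = h_k + h_{k+1}
    h = [0, 1, 1]
    while len(h) <= n:
        h.append(h[-1] + h[-2])
    return h[n]

phi = (1 + sqrt(5)) / 2
phibar = (1 - sqrt(5)) / 2

for n in range(1, 20):
    closed = (phi ** n - phibar ** n) / sqrt(5)
    assert abs(closed - hemachandra(n)) < 1e-9
print("h_n = (phi^n - phibar^n)/sqrt(5) checks for n = 1, ..., 19")
```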
9. Chapter 1 Wrap-up

9.1. Rubric for Chapter 1

In this chapter you should have learned
• to quantify your variables
• proof by contradiction
• how to use truth tables to prove syllogisms in propositional calculus
• how to use propositional calculus to prove facts in set theory
• standard induction, and other induction schemes

9.2. Toughies for Chapter 1

(1) Suppose that f : R → R is a differentiable function so that f''(x) = f(x) for all real numbers x. Prove that there are constants C, D ∈ R so that f(x) = Ce^x + De^(−x) for all real numbers x.
(2) Let P and Q be statements. Prove that you can not form the statement "P exclusive or Q", or P + Q, using only the symbols P, Q, ∧, ∨.
(3) In this exercise we look at some of the Zermelo-Fraenkel Axioms of Set Theory, written in the raw form of Propositional Logic. In principle they do not need any explanation but in reality it is a challenge to understand what they mean. Can you explain? x, y, z, u can be regarded as both sets and elements.
(a) ∀x ∀y ∃z (x ∈ z ∧ y ∈ z).
(b) ∀x ∀y (∀z (z ∈ x ⇔ z ∈ y) ⇒ x = y).
(c) ∀x ∃y ∀z (∀u (u ∈ z ⇒ u ∈ x) ⇒ z ∈ y).
(d) ∃x (x = x).
(e) ∀x ∃y ∀z ∀u ((u ∈ x ∧ z ∈ u) ⇒ z ∈ y).
(f) ∃x (∅ ∈ x ∧ ∀y (y ∈ x ⇒ y ∪ {y} ∈ x)).
Here is another statement apropos of Set Theory:
∃z ((x ∉ x) ⇒ (x ∈ z)).
Can you interpret it? (Hint: It is related to a famous paradox.)
(4) Show how to inductively find antiderivatives of sin^m(x) cos^n(x) with m, n any integers. They may be positive, negative, or zero. What happens for fractions?
(5) Say you have some statements P, Q, R, . . . and you form a formula F from these using ∧, ∨, and ¬. Now make a new formula F^D by turning all the ∧ into ∨ and all the ∨ into ∧. We call F^D the "dual formula" to F. For instance, if F is the formula P ∧ (Q ∨ R), then F^D is the formula P ∨ (Q ∧ R). Now you can also deal with formulas with ⇒ by turning something like P ⇒ Q into ¬P ∨ Q. Show that the dual of P ⇒ Q is ¬(Q ⇒ P). Now find the duals of the following formulas:
• P ∨ ¬P.
• P ⇒ (P ∨ Q).
• (P ∨ Q) ∧ (¬P ∨ R) ⇒ (Q ∨ R).
• ((P ⇒ Q) ∧ P) ⇒ Q.
These last four formulas were tautologies. If you did this correctly, then the duals you found should be "absurda", meaning that they are false, no matter what truth values you put in for P, Q, R, . . .. Here is the challenge. Prove that if a formula is a tautology, then its dual is an absurdum. Remark: A formula is an absurdum iff its negation is a tautology. So, if you start with a tautology T, form the dual formula T^D, and then take the negation ¬T^D, you again get a tautology. Must ¬T^D be the same formula as T?
(6) The following is quoted from [1]:
• Let S be a set. Suppose that a subset A of S is obtained from other subsets X, Y, Z, . . . of S by applying only the operations ∪, ∩, c (in any order). Then the complement A^c can be obtained by replacing the subsets X, Y, Z, . . . by their respective complements, and the operations ∪, ∩ by ∩, ∪, respectively, while preserving the order of the operations. This is the duality rule.
• Let A = B be an equality of subsets of the above form, and consider the equality A^c = B^c. If we replace A^c and B^c by the expressions obtained by applying the duality rule, and if we then replace X^c, Y^c, Z^c, . . . by X, Y, Z, respectively, and vice-versa, we obtain an equality called the dual of A = B. We can do the same for the inclusion relation A ⊆ B, but then we must take care to replace ⊆ by ⊇.
For example, it is generally true that for any subsets X, Y, Z of a given set S, we have X ∩ Z ⊆ (X ∪ Y) ∩ Z. The dual of this relation is (check!) that, generally, X ∪ Z ⊇ (X ∩ Y) ∪ Z.
What is the dual of the identity X ∩ (Y ∪ Z)^c = (X ∩ Y^c) ∩ (X ∩ Z^c)? Can you explain the above two points?
CHAPTER 2
Arithmetic
1. Introduction
In this chapter we will develop the basic properties of arithmetic, using as few assumptions as possible. In Section 2 we lay down the three “Peano Axioms”, and prove from them the rules of addition and multiplication. Arithmetic starts getting really interesting when we get to the idea of division with remainder. In Section 4 we develop this concept and the related idea of a place-value system. In Section 3 we work out the theory of greatest common divisors. In particular we deal with the idea of the “greatest” and “least” element of a set. An important tool in understanding gcds is the Euclidean Algorithm, and along the way we upgrade our induction toolkit by learning Strong Induction. By Section 4 we are ready to treat the theory of prime numbers, and the Fundamental Theorem of Arithmetic. The FTA says that every number can be given unique “coordinates”, with one component for each prime number. These coordinates completely determine the multiplicative role of a number. 2. The Natural Numbers N
Admittedly we will not actually be able to construct the natural numbers N, since we need a spark of life to get going. This spark takes the form of the existence of an infinite set, which we assume has been organized into a certain “linear” shape. Assuming the presence of this shape, we will be able to define the basic operations of arithmetic and derive their basic properties. Moreover we will be able to construct the other sets out of N. 2.1. Peano’s Axioms
Children believe that there is a counting process to get to all numbers, starting with 1, in which every number is succeeded by another number. The rules are designed so that different numbers have different successors, and you never get back to 1. Let us codify this into mathematics.

Definition. The natural numbers N is a set with a "successor function" N → N, written n ↦ n′, and an "initial element" 1 ∈ N, satisfying the following three properties:
(INJ) For m, n ∈ N, (m′ = n′) ⇒ (m = n).
(INF) ∀n ∈ N, n′ ≠ 1.
(IND) For S ⊆ N, ((1 ∈ S) ∧ ((n ∈ S) ⇒ (n′ ∈ S))) ⇒ (S = N).
Let us unravel some of these properties. The property (INJ) says that if two numbers have the same successor, then they must be the same number. It may be easier to understand its contrapositive, which for m, n ∈ N is:

(m ≠ n) ⇒ (m′ ≠ n′),

or that different numbers have different successors. (Later we will say that a function f is injective if (f(x) = f(y)) ⇒ (x = y).)

Property (INF) is simple enough; it just means that 1 is not the successor of any number. (In particular 0 ∉ N!)
To unravel the third axiom, let us make a quick definition:

Definition. Let S be a subset of N. Call S inductive if (n ∈ S) ⇒ (n′ ∈ S).

For example, the set of odd numbers is not inductive, since 1 is odd but 1′ is not. The set of numbers greater than 100 is inductive. Then we can rewrite (IND) as:

(IND) If S is an inductive subset of N, and if 1 ∈ S, then S = N.

The reason this last axiom is called (IND) is that it actually allows us to use induction when proving things about N. Here's why. Let P(n) for n ∈ N be a sequence of propositions as in the previous section. Write S = {n ∈ N | P(n) is true}. Suppose we know that P(1) is true. Then 1 ∈ S. Suppose that we know that, for all k, P(k) ⇒ P(k′). Then S is inductive. By (IND), we deduce that all natural numbers are in S. This means that P(n) is true for all n, as desired.
Later we will deduce other induction schemes from (IND). Let us deduce our first lemma.

Lemma 2.1. Let n ∈ N. If n ≠ 1, then there is a unique number m ∈ N so that m′ = n.

Proof. Let S = {1} ∪ {m′ | m ∈ N}. Then 1 ∈ S, and certainly if s ∈ S then s′ ∈ S, so that S is inductive. By (IND), S = N. Since n ∈ N it is therefore in S, and since n ≠ 1, we conclude that n must be of the form m′ for some m ∈ N. To see uniqueness, suppose that n = m_1′ and n = m_2′. Then m_1′ = m_2′, so by (INJ) we see that m_1 = m_2.
Until we have enough arithmetic to develop a place-value system, we will use Roman numerals (except often for 1 itself) for elements of N. They are well-suited for Peano arithmetic anyway. Thus the first few natural numbers 1, 1′, 1′′, 1′′′, 1′′′′, . . . will be denoted as I, II, III, IV, V, . . . . In other words,

Definition. I = 1, II = I′, III = II′, IV = III′, V = IV′, VI = V′, VII = VI′, VIII = VII′, IX = VIII′, X = IX′.
2. ARITHMETIC

We will also have occasion to use larger Roman numerals without comment, but they will be no larger than M = X^III.

Remark: There is not a consensus in the mathematical community about whether 0 should be considered a natural number, and so other books may have the convention that 0 ∈ N. However it is an important issue, and you will need to know that we are excluding 0 in this course.
Definition. The operation of addition in N is defined recursively via

a + n = a′ if n = 1,
a + n = (a + m)′ if n = m′.

Example: II + III = (II + II)′ = ((II + I)′)′ = ((II′)′)′ = V. Please note that in particular, (a + m)′ = a + m′ for all a, m ∈ N.
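In modern notation the recursion can be sketched in code. The following Python sketch is ours, not the text's; it stands in for the successor operation n ↦ n′ with machine integers:

```python
# A sketch of Peano-style addition (our illustration, not the author's):
# a + 1 = a′ and a + m′ = (a + m)′, with succ playing the role of ′.

def succ(n):
    """The successor operation n ↦ n′."""
    return n + 1

def add(a, n):
    """Addition defined only in terms of succ and the recursion above."""
    if n == 1:
        return succ(a)
    return succ(add(a, n - 1))   # n = m′ with m = n - 1, so a + n = (a + m)′

# II + III = V, mirroring the worked example above.
print(add(2, 3))  # → 5
```

Note that the recursion is on the second argument, exactly as in the definition; commutativity is a theorem to be proved, not something built in.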
2.2. Properties of Addition

Theorem 2.2. For all numbers a, b, c, we have (a + b) + c = a + (b + c).

Proof. We fix a and b and use induction on c. If c = 1, the theorem says (a + b) + 1 = a + (b + 1). By definition of adding 1, this is the same as (a + b)′ = a + b′, which is the very definition of a + b′.

Suppose the theorem is true for some c. Taking successors of both sides yields [(a + b) + c]′ = [a + (b + c)]′. The definition of addition lets us move the prime within the sums on both sides of the equation: (a + b) + c′ = a + (b + c)′ = a + (b + c′). Thus the theorem is then true for c′.
Lemma 2.3. For all numbers n, we have n + 1 = 1 + n.

Proof. Exercise.

Theorem 2.4. For all numbers a, b, we have a + b = b + a.

Proof. We fix a and use induction on b. If b = 1 this is the lemma. Suppose a + k = k + a. Taking successors gives (a + k)′ = (k + a)′. Then we have

a + k′ = (a + k)′ = (k + a)′ = (k + a) + 1 = k + (a + 1) = k + (1 + a) = (k + 1) + a = k′ + a.

The fourth and sixth equality used associativity, and the fifth used the lemma. The rest follows from the definition of addition and the inductive hypothesis. Thus we are done by induction.
Theorem 2.5. (Cancellation Law of Addition) Let a, b, n ∈ N. If a + n = b + n, then a = b.

This proof is a little bit complex logically. Before writing the formal proof, let us brainstorm for a few minutes first. It will be an induction proof on n. The case P(1) is the axiom (INJ). What about P(2)? Suppose a + II = b + II. Then I can rewrite both sides as (a + 1)′ = (b + 1)′. To this we apply (INJ) to get the equality a + 1 = b + 1, which by P(1) implies a = b. This logical deduction gives P(2). Similarly, for P(3) we start with a + III = b + III, rewrite it as (a + II)′ = (b + II)′, and apply (INJ) to get a + II = b + II. This shows that P(2) ⇒ P(3). Now we are ready for the proof.
Proof. Induction on n. (INJ) is the case n = 1. Suppose the theorem is true for some k. Then, suppose a + k′ = b + k′. This can be rewritten as (a + k)′ = (b + k)′. Through (INJ) this implies that a + k = b + k, which by the inductive hypothesis implies that a = b. Thus the theorem is true for all n.
Definition. Let a and b be numbers. Then a < b provided that there is a number x ∈ N so that a + x = b.

Note that a < a′ always; in this case x = 1.

Proposition 2.6. If a < b and b < c, then a < c.
Proof. The hypotheses imply that there are numbers x and y so that a + x = b and b + y = c. Then we compute that a + (x + y) = (a + x) + y = b + y = c, thus a < c.

Proposition 2.7. If a < b then a + x < b + x.

Proof. Exercise.
Definition. Let a, b ∈ N. We write "a > b" provided that b < a. We also write "a ≤ b" provided that (a < b) ∨ (a = b).

Lemma 2.8. For all n ∈ N, we have 1 ≤ n.

Proof. If n = 1 we are done. Otherwise n ≠ 1. Then by Lemma 2.1, there is an m ∈ N so that m′ = n. Thus m + 1 = n, which shows that 1 < n.
Note that by this lemma, any number n is either 1 or m′ for some m.

Lemma 2.9. (Creeping Lemma) If a < b, then a′ ≤ b.
Proof: Exercise. Use the previous lemma.

Theorem 2.10. (Weak Trichotomy) Let a, b ∈ N. Then either a < b, a > b, or a = b.

Proof. Fix b, and write P(a) for the statement of the lemma. We will induct on a, holding b constant. Lemma 2.8 gives us P(1). Now suppose P(k) is true, giving three possible cases. We will show that each of these cases leads to a case of P(k′), which will prove the theorem. If k < b then by the Creeping Lemma k′ ≤ b. Thus k′ = b or k′ < b. If k > b then since k′ > k we conclude k′ > b. If k = b then k′ > b.
To prove that no more than one of these possibilities can hold, we need the following proposition.

Proposition 2.11. ∀ a, n ∈ N, we have a + n ≠ n.

Proof. Induction on n. (INF) says the proposition is true for n = 1. Suppose the proposition were not true for the successor k′ of some k. Then a + k′ = k′. But then by (INJ) we would have a + k = k, which means the proposition would not be true for k. This is the contrapositive of P(k) ⇒ P(k′). Thus P(k) ⇒ P(k′), and we are done by induction.

Corollary 2.12. ∀ n ∈ N, ¬(n < n).
The corollary says that a natural number cannot be less than itself. Do you see why the corollary follows immediately from the Proposition? If not, go back and reread the definition of "a < b".

Theorem 2.13. (Strong Trichotomy) Let a, b ∈ N. Then exactly one of a < b, a > b, or a = b holds.

Proof. Suppose a < b and a = b. Then a < a, contradicting the above corollary. Suppose a < b and b < a. Then by transitivity we have again a < a. The case a = b, a > b is similar.
Definition. If a < b, then b − a is the number x so that a + x = b.

Thus a + (b − a) = b = (b − a) + a by definition.
Note that this number is uniquely determined, by the Cancellation Law of Addition. Here is one of the "Associative Laws of Subtraction":

Lemma 2.14. Let x, y, z ∈ N. If x > y, then (z + x) − y = z + (x − y).

Proof. The calculation

[z + (x − y)] + y = z + [(x − y) + y] = z + x

shows that z + (x − y) is (z + x) − y.
Can you guess what the two other "Associative Laws of Subtraction" are? (They're given as exercises.)

2.3. Properties of Multiplication

That is as far as we will go with just addition. At this point we will freely use the associative and commutative rules of addition without comment. We turn to multiplication, which is of course "repeated addition".

Definition. The operation of multiplication in N is defined recursively via

a · n = a if n = 1,
a · n = a · m + a if n = m′.
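As with addition, the recursion translates directly into code; this sketch is ours (not the text's), reusing a Peano-style addition:

```python
# Multiplication as "repeated addition" (our illustration):
# a·1 = a and a·m′ = a·m + a.

def add(a, n):
    """Peano addition: a + 1 = a′, a + m′ = (a + m)′."""
    if n == 1:
        return a + 1
    return add(a, n - 1) + 1

def mul(a, n):
    """Peano multiplication, recursing on the second argument."""
    if n == 1:
        return a
    return add(mul(a, n - 1), a)   # n = m′, so a·n = a·m + a

# II · III = VI, as in the worked example that follows.
print(mul(2, 3))  # → 6
```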
Example: II · III = II · II + II = (II · I + II) + II = (II + II) + II = IV + II = VI.

Theorem 2.15. If a, b, n ∈ N, then we have (a + b) · n = a · n + b · n.
Proof. Induction on n. Since a · 1 = a by the definition of multiplication, the case n = 1 reduces to a + b = a + b on both sides. Now suppose the theorem is true for k, so that (a + b) · k = a · k + b · k.

To get P(k′) we have

(a + b) · k′ = (a + b) · k + (a + b)
            = a · k + b · k + a + b
            = (a · k + a) + (b · k + b)
            = a · k′ + b · k′,

as desired. We are done by induction.

Theorem 2.16. If a, b ∈ N, then a · b = b · a.
Proof. Exercise.

Corollary 2.17. If a, b, y ∈ N, then y · (a + b) = y · a + y · b.

Proof. Combining the previous two results, we have

y · (a + b) = (a + b) · y = a · y + b · y = y · a + y · b.

Theorem 2.18. If a, b, n ∈ N, then we have (a · b) · n = a · (b · n).
Proof. Induction on n. If n = 1 then both sides are a · b. Suppose the theorem is true for n = k. Thus (a · b) · k = a · (b · k).

To get P(k′) we have

(a · b) · k′ = (a · b) · k + a · b
            = a · (b · k) + a · b
            = a · (b · k + b)
            = a · (b · k′),

as desired. (Can you justify each step?) We are done by induction.

Proposition 2.19. If a < b, then na < nb. Moreover n(b − a) = nb − na.
Proof. There is a number x = b − a so that a + x = b. Multiplying this by n yields na + nx = nb; thus nb > na and nx = nb − na as claimed.

Proposition 2.20. (Cancellation Law of Multiplication) If na = nb, then a = b.

Proof. By contradiction and Strong Trichotomy. If a < b, the previous proposition and S.T. shows that na ≠ nb. Similarly if a > b.
These are the elementary properties of addition and multiplication.

2.4. Exercises

(1) Lemma 2.3, Proposition 2.7, the Creeping Lemma, and Theorem 2.16.
(2) Prove that IV − II = II using the definitions.
(3) If a > b prove that a^2 > b^2.
(4) If a > b prove that a^2 − b^2 = (a + b)(a − b).
(5) If a > b + c prove that a − (b + c) = (a − b) − c.
(6) If b > c and a + c > b prove that a − (b − c) = (a + c) − b.
(7) State a recursive definition of a^b for a, b ∈ N, agreeing with the usual sense. Use your definition to prove that for a, b, c ∈ N, a^b · a^c = a^(b+c).
(8) Prove that a^(bc) = (a^b)^c.
(9) Prove that if b > 1, and r < s, then b^r < b^s.
(10) Prove that if b, r, s ∈ N, with b > 1 and b^r = b^s, then r = s.
(11) Prove that if b^e = c^e, then b = c.
(12) (Associativity for Exponentiation) For which a, b, c ∈ N is it true that a^(b^c) = (a^b)^c?
(13) Define an operation ∧∧ recursively on natural numbers via a ∧∧ b = a if b = 1, and a ∧∧ b = a^(a ∧∧ c) if b = c′. Put the following numbers in order from least to greatest: I ∧∧ V, II ∧∧ IV, III ∧∧ III, IV ∧∧ II, V ∧∧ I, and II ∧∧ V.
3. The Division Algorithm

3.1. Divisibility and Quotients

Divisibility is the multiplicative analogue of inequality.

Definition. Let a, b ∈ N. We say "a divides b", "b is a multiple of a", and "a | b" provided that there is a number x ∈ N so that ax = b.
Here are some basic properties of divisibility, which the reader should verify:

Proposition 2.21. Let a, b, c ∈ N. We have:

• 1 | a and a | a.
• If a | b and b | c, then a | c.
• If a | b, then ac | bc, and conversely.
• If a | b, then a | bc.
• If a | b and a | c, then a | b + c.
• If a | b, a | c, and b > c, then a | (b − c).

Proposition 2.22. Let a, b ∈ N. If a | b, then a ≤ b.

Proof. The hypothesis implies that b = ac for some c ∈ N. Since 1 ≤ c by Lemma 2.8, the proposition follows from Proposition 2.19.
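The definition of a | b asks for a witness x with ax = b, which suggests a direct search; the sketch below is our own illustration, not the text's:

```python
# "a divides b" means some x ∈ N satisfies a·x = b. We search for the
# witness directly, staying close to the definition (no % operator).

def divides(a, b):
    return any(a * x == b for x in range(1, b + 1))

# Spot-checks of Proposition 2.21: 1 | a, and a | b with b | c gives a | c.
print(divides(1, 7), divides(3, 12), divides(12, 36), divides(3, 36))
# → True True True True
print(divides(2, 3))  # → False, as Proposition 2.23 shows for II and III
```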
Let's prove something concrete.

Proposition 2.23. II does not divide III.

Proof. Suppose that II | III. We also have II | II, so by the last item in Proposition 2.21, we may conclude that II | I. But this contradicts the previous proposition.

Definition. Let a, b ∈ N with a | b. We define b ÷ a to be the number q ∈ N so that qa = b.

Note that this number is uniquely determined, by the Cancellation Law of Multiplication. For example n ÷ 1 = n for all n ∈ N. Here is a typical proposition and proof involving this definition:
Proposition 2.24.
|
∈
|
|
If a b and c d then ac bd and bd
÷ ac = (b ÷ a) · (d ÷ c).
By popular demand, we give two proofs, one straightforward and another conceptual. First, the straightforward and unimaginative one. Proof. Let’s just write out all the divisibility definitions. Say aq = b and cp = d. Then aqcp = bd, so by definition, (b a) (d c) = qp = bd ac.
÷ · ÷
Next, for the student with a crisp sense of the definitions.
÷
Proof. It is enough to show that (b ÷ a) · (d ÷ c) satisfies the defining property of bd ÷ ac:

(ac)((b ÷ a) · (d ÷ c)) = (a(b ÷ a)) · (c(d ÷ c)) = b · d.

Definition. Let Div(a) = {d ∈ N | d | a}.

Thus, Div(a) is the set of divisors of a. For example, Div(XV) = {I, III, V, XV}. Here are translations of parts of Proposition 2.21:

• {1, a} ⊆ Div(a).
• If a ∈ Div(b) and b ∈ Div(c), then a ∈ Div(c).
• If a ∈ Div(b), then a ∈ Div(bx).

Can you translate the rest?
3.2. Including Zero

This is a good time to append the number '0' to our set of numbers.

Definition. The set of whole numbers N̄ is defined as the union of the natural numbers N with a new element "0".

Thus if n ∈ N̄, then (n = 0) ∨ (n ∈ N). Now that we have two sets of numbers, we must be careful to specify whether a variable 'n' is in N or N̄. We now describe how to extend succession, addition, inequality, multiplication, and exponentiation to N̄. We put 0′ = 1. Note that now N̄ fails (INF); this is okay because N̄ ≠ N. (Although if we just rewrote Peano's Axioms with 0 in place of 1 we would have defined N̄!)

Definition. The operation of addition in N̄ is defined via

m + n = the same m + n if m, n ∈ N,
m + n = m if n = 0,
m + n = n if m = 0.

Note that this is consistent if m = n = 0.

Inequality: As before, for a, b ∈ N̄, we define a < b provided that there is a number x ∈ N so that a + x = b. For example 0 < 1.
The following basic properties of addition and inequality which we proved for N also hold for N̄: Commutativity and Associativity of Addition, Trichotomy, the Creeping Lemma. It is a bit tedious to verify that so many propositions we proved for N still hold for whole numbers, so let us just give a sample, leaving the rest to the reader.

Proposition 2.25. For a, b ∈ N̄ we have a + b = b + a.

Proof. If a, b ∈ N, then this has already been proved. If a = 0, then by the definition above both sides of the equation are equal to b. Similarly if b = 0. Therefore a + b = b + a in all cases.

Obviously Lemma 2.8 fails and should be replaced with 0 ≤ n for all n ∈ N̄. Again, the proof is just by two cases: either n ∈ N or n = 0.

Subtraction should now be extended to m − n for m, n ∈ N̄ satisfying m ≥ n. As before m − n is defined as the number x so that n + x = m. Since we now have n + 0 = n, this gives n − n = 0. Also note that x − 0 = x by the same token.
Multiplication should give no surprise:

Definition. The operation of multiplication in N̄ is defined via

m · n = the same m · n if m, n ∈ N,
m · n = 0 if (m = 0) ∨ (n = 0).

Commutativity, Associativity, and Distributivity, and Proposition 2.22 [fix] are still true in N̄ and quite easy to check. Unfortunately we must give up on the Cancellation Law for Multiplication for whole numbers, since 0 · 0 = 0 · 1, for example. A subtle ramification of this is that although the equation 0 · x = 0 has solutions, it does not have a unique solution, so we do not define the expression 0 ÷ 0. We do have 0 ÷ n = 0 for n ∈ N. Before we forget let us treat exponentiation, which might give a mild surprise:

Definition. The operation of exponentiation in N̄ is defined via

m^n = the same m^n if m, n ∈ N,
m^n = 1 if n = 0,
m^n = 0 if (m = 0) ∧ (n ≠ 0).
Note that this means 0^x = 0 except at x = 0, a kind of "discontinuity". One can check that this definition satisfies the usual exponentiation rules.

Remark: One reason for the definition 0^0 = 1 is for the facility of power series. For example, e^x = Σ_{i=0}^∞ x^i/i!; evaluated at x = 0 this gives 1 = 0^0/0!, which suggests that we define both 0^0 and 0! to be 1. A more philosophical reason that 0^0 = 1 is as follows. An expression like 0^0 or 1^0, or even 0!, is what's called an "empty product", in which "nothing" is being multiplied. Now an "empty sum" should be the neutral number for addition, which is 0, but the "empty product" should be the neutral number for multiplication, which is 1.
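Incidentally, Python's built-in arithmetic agrees with both conventions, which makes for a quick sanity check (ours, not the author's):

```python
# Python follows the same conventions: 0**0 = 1, 0! = 1, and 0**n = 0
# for n ≥ 1 — the "discontinuity" at x = 0 noted above.
import math

print(0 ** 0)             # → 1, the empty product
print(math.factorial(0))  # → 1, likewise
print(0 ** 5, 5 ** 0)     # → 0 1
```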
4. The Division Algorithm

Here is a very important theorem about integers which lies at the heart of arithmetic.

Theorem 2.26. (Division Algorithm, Existence) Let a ∈ N and b ∈ N̄. Then there are numbers q, r ∈ N̄ so that b = qa + r and r < a.
Proof. Fix a; we want to induct on b. Write P(b) for the statement of the theorem. If b = 0 put q = r = 0. Suppose P(k) is true. Then k = qa + r, with r < a. Then we have k′ = qa + r′. By the Creeping Lemma, r′ ≤ a. If r′ < a, then we are done. If r′ = a, then by the definition of multiplication, k′ = q′a = q′a + 0. Thus P(k′) is true, so we are done.
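The induction in the proof is really an algorithm: each successor step either bumps the remainder or rolls it over into the quotient. Equivalently, one subtracts a repeatedly. A sketch of ours:

```python
# Division algorithm by repeated subtraction: peel copies of a off b
# until the remainder drops below a. Returns (q, r) with b = q*a + r.

def divmod_peano(b, a):
    q = 0
    while b >= a:
        b -= a
        q += 1
    return q, b          # now b < a, and the original b equals q*a + b

print(divmod_peano(51, 36))  # → (1, 15)
print(divmod_peano(7, 2))    # → (3, 1)
print(divmod_peano(3, 5))    # → (0, 3): q = 0, which is why q and r live in N̄
```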
Remark: We have used the induction scheme

(P(0) ∧ ∀k(P(k) ⇒ P(k′))) ⇒ (P(n) ∀n ∈ N̄).

This relates to Standard Induction as follows. If P(0) is true, then by the hypothesis, so is P(1). Then together with P(k) ⇒ P(k′) we recover the hypotheses of Standard Induction, thus get P(n) for all n ∈ N. We also have P(0), so we finally get P(n) for all n ∈ N̄.
Let us pause for a tiny application. For sanity's sake, let 2 = II.

Definition. Let n ∈ N. Then n is even provided that 2 | n. On the other hand, n is odd provided that 2 ∤ n.

Lemma 2.27. Let n ∈ N. Then n is odd if and only if ∃k ∈ N̄ with n = 2k + 1.
Proof. (⇒) Suppose that n is odd. Apply the division algorithm to n and 2, so that n = 2k + r with k, r ∈ N̄ and r < 2. If r = 0, then 2 | n, a contradiction. The only possibility is then r = 1, which gives n = 2k + 1 as claimed.

(⇐) Suppose n = 2k + 1 is even. Then 2 | 2k + 1. Since plainly 2 | 2k, we also have 2 | (2k + 1) − 2k, thus 2 | 1. But this is impossible by Proposition 2.22 [fix] since 1 < 2. This contradiction shows that n must be odd.

Lemma 2.28. Let n ∈ N. If n^2 is even, then n is even.

Proof. We prove this by contraposition. If n is not even, then by the previous lemma, we may write n = 2k + 1 for some k ∈ N̄. Then n^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1, which by the previous lemma implies that n^2 is odd.
I want to break the rules and talk about rational numbers for just a moment, to show one of the greatest mathematical proofs of all time, the irrationality of √2, as an application of the work we've just done.

Theorem 2.29. There does not exist a rational number whose square is 2.
Proof. We prove this by contradiction. Suppose that there exists q ∈ Q with q^2 = 2. We may assume q > 0. (Why?) By Exercise 10 [fix] in Section 7 of the previous chapter, we may write q = a/b, with a, b ∈ N not both even. Then a^2/b^2 = 2, so that a^2 = 2b^2. In particular, a^2 is even. By the previous lemma, this implies that a is itself even. We may write a = 2k for some k ∈ N. Thus (2k)^2 = 2b^2, and canceling 2's gives 2k^2 = b^2. So now b^2 is even, and again this means that b is even. This contradicts the fact that at least one of a, b is odd, and we conclude that such a q must not exist.
Now we return to Peano Theory. For the "uniqueness" part of the Division Algorithm, we generalize the "(⇐)" part of the proof of Lemma 2.25. [fix]

Theorem 2.30. (Division Algorithm, Uniqueness) Let a ∈ N. Suppose there are q₁, q₂, r₁, r₂ ∈ N̄ with

q₁a + r₁ = q₂a + r₂, and r₁, r₂ < a.

Then q₁ = q₂ and r₁ = r₂.

Proof. The idea is simple enough, but working it out carefully is a good benchmark for our Peano Theory. Consider Trichotomy for r₁ and r₂. If they are unequal then one is greater than the other. Say r₁ < r₂. By definition of subtraction, this means that q₁a = (q₂a + r₂) − r₁. By Lemma 2.14 [fix], q₁a = q₂a + (r₂ − r₁). This shows that q₁a > q₂a, and indeed that q₁a − q₂a = r₂ − r₁. By Proposition 2.19 [fix], we have a(q₁ − q₂) = r₂ − r₁. The left hand side is plainly a nonzero multiple of a, thus by Proposition 2.22 [fix],

a ≤ r₂ − r₁ ≤ r₂ < a,

which contradicts Strong Trichotomy. Thus r₁ = r₂, which implies that q₁a = q₂a by the Cancellation Law of Addition. By the Cancellation Law of Multiplication we conclude that q₁ = q₂.
4.1. Exercises

(1) Prove the two cancellation laws for division:
(a) If a, b, n ∈ N with n | a and n | b and a ÷ n = b ÷ n, then a = b.
(b) If a, b, n ∈ N with a | n and b | n and n ÷ a = n ÷ b, then a = b.
(2) Suppose b | a and d | c. Prove (a ÷ b) + (c ÷ d) = (ad + bc) ÷ (bd).
(3) Let a, m, n ∈ N with m ≤ n. Prove that a^(n−m) = a^n ÷ a^m.
(4) Let (h_n) denote the Hemachandra sequence. Prove that if d | n then h_d | h_n. (Use Exercise 9 in Section 7.4. [fix])
(5) Suppose a, b₁, b₂ ∈ N satisfy a ≤ b₁ + b₂. Prove that there are a₁, a₂ ∈ N̄ so that a = a₁ + a₂, a₁ ≤ b₁, and a₂ ≤ b₂.
(6) Consider the set N∞ = N ∪ {∞}, with addition and multiplication defined via

m + n = the same m + n if m, n ∈ N, and m + n = ∞ if (m = ∞) ∨ (n = ∞);
m · n = the same m · n if m, n ∈ N, and m · n = ∞ if (m = ∞) ∨ (n = ∞).

Which of the following properties does addition/multiplication in N∞ not satisfy? Commutativity, Associativity, Distributivity, Cancellation.
Just give counterexamples for any failing properties. Suppose we wanted to extend N∞ further to include 0, extending the rules of N̄. Clearly 0 + ∞ should be ∞. Are there any values we can give 0 · ∞ so that no further properties (other than ones you've discarded in the previous paragraph) fail?

(7) Let a ≥ 2 and b ∈ N. Prove that there are numbers q, r, e ∈ N̄ so that b = qa^e + r and 0 < q < a and 0 ≤ r < a^e. (Follow the proof of the Division Algorithm.)
(8) Let x, y ∈ N. Say that x ⊥ y provided that there is an n ∈ N so that n^x = y. Determine whether ⊥ is transitive. In other words, if a ⊥ b and b ⊥ c, is it necessarily true that a ⊥ c?
(9) Let x, y ∈ N. Say that x ⊥ y provided that there is an n ∈ N so that x^n = y. Determine whether ⊥ is transitive. In other words, if a ⊥ b and b ⊥ c, is it necessarily true that a ⊥ c?
(10) Prove that there is no rational number whose square is 1/2.
(11) Prove that if n is a whole number and n^2 is divisible by 3, then n is divisible by 3.
(12) Prove that there is no rational number whose square is 2, 3, 6, or 3/2.
(13) Prove that there is no rational number whose cube is 2.
5. Superlatives

Let us discuss the notion of the "minimum" and "maximum" member of a set of numbers.

Definition. Let S ⊆ N. A number ℓ ∈ N is a lower bound of S provided that (s ∈ S) ⇒ (ℓ ≤ s). A number u ∈ N is an upper bound of S provided that (s ∈ S) ⇒ (u ≥ s).
For example, if S is the set of positive even numbers, then 1 and 2 are the only natural numbers which are lower bounds of S, and S has no upper bounds. If S is any subset of N, then certainly 1 is a lower bound of S. We will soon prove that every nonempty set S of natural numbers has a lower bound m which is an element of S. We call this the "minimum" of S. This is something special about natural numbers, as compared to subsets of the real numbers. Denote by R>0 the set of positive real numbers; then R>0 does not have a minimum. It has a "greatest lower bound", namely 0, but 0 ∉ R>0.

If S = ∅, then pure logic dictates that every n ∈ N is both an upper and lower bound of S. (Can you see that?) However since n ∉ S for all n, it obviously cannot have a minimum element.

Theorem 2.31. (Well-Ordering, Min Form) Let S ⊆ N be nonempty. Then there is an element m ∈ S which is a lower bound of S.

Proof. This will be a proof by contradiction. Suppose that the theorem is false. This means that no lower bounds of S are themselves elements of S. Let T be the set of lower bounds of S. That is, T = {n ∈ N | n is a lower bound of S}. Since S ≠ ∅, we know that 1 ∈ T (using Lemma 2.8). We will prove that T is inductive. Let n ∈ T. This means that n ≤ s for every s ∈ S. Since we are supposing the theorem is false, we cannot have n ∈ S. Therefore n ≠ s, so n < s. By the Creeping Lemma, we know that for all s ∈ S, n′ ≤ s. Therefore n′ ∈ T. This reasoning shows that T is inductive.

By (IND) we conclude that T = N, and therefore S = ∅. This is the contradiction, which finishes the proof.
Definition. The number m in the above theorem is called the minimum of S, or min S.

Theorem 2.32. (Well-Ordering, Max Form) Let S ⊂ N be a nonempty subset of N which is bounded above. Then there is an element M ∈ S which is an upper bound of S.

Proof. Exercise; apply the Min Form to the set of upper bounds for S.

Definition. The number M in the above theorem is called the maximum of S, or max S.

Example: Let a, b ∈ N and put

S = {n ∈ N | bn ≤ a}.

Then a is an upper bound of S, since if n ∈ S, then n ≤ bn ≤ a. Therefore S has a maximum. (How can you compute it?)
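One way to answer the parenthetical question, sketched in code by us: scan n = 1, . . . , a and keep the last n with bn ≤ a (assuming b ≤ a, so that S is nonempty). The result is the floor-division quotient.

```python
# max S for S = {n ∈ N : b*n ≤ a}, found by direct scan. We assume b ≤ a
# so that 1 ∈ S; the answer coincides with the quotient a // b.

def max_S(a, b):
    best = None
    for n in range(1, a + 1):   # a is an upper bound of S
        if b * n <= a:
            best = n
    return best

print(max_S(51, 36))          # → 1
print(max_S(24, 6), 24 // 6)  # → 4 4
```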
Example: Write Div(a, b) for the set of common divisors of a and b. In other words, Div(a, b) = Div(a) ∩ Div(b), the intersection of the two sets. This set is bounded above by a, and 1 ∈ Div(a, b) so it is not empty. Therefore it has a maximum element, called the greatest common divisor of a and b.

Definition. Let gcd(a, b) = max Div(a, b); it is called the greatest common divisor of a and b.

Example: Div(42) = {1, 2, 3, 6, 7, 14, 21, 42} and Div(24) = {1, 2, 3, 4, 6, 8, 12, 24}, so Div(42, 24) = {1, 2, 3, 6}. It is easy to see now that gcd(42, 24) = 6.
Similarly write Mult(a, b) for the set of common multiples of a and b. Since ab ∈ Mult(a, b) this is nonempty; thus it has a minimum element.

Definition. Let lcm(a, b) = min Mult(a, b); it is called the least common multiple of a and b.

Example: Mult(42) = {42, 84, 126, 168, 210, . . .} and Mult(24) = {24, 48, 72, 96, 120, 144, 168, 192, . . .}, so Mult(42, 24) = {168, . . .}. Therefore lcm(42, 24) = 168.
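Both definitions can be transcribed into deliberately naive code, the max of the common divisors and the min of the common multiples. The sketch is ours, and it reproduces the values above:

```python
# gcd and lcm straight from the definitions: gcd(a, b) = max Div(a, b),
# lcm(a, b) = min Mult(a, b). Searching up to a*b suffices for the lcm,
# since a*b is itself a common multiple.

def Div(a):
    return {d for d in range(1, a + 1) if a % d == 0}

def gcd_def(a, b):
    return max(Div(a) & Div(b))      # Div(a, b) = Div(a) ∩ Div(b)

def lcm_def(a, b):
    return min(m for m in range(1, a * b + 1) if m % a == 0 and m % b == 0)

print(sorted(Div(42)))   # → [1, 2, 3, 6, 7, 14, 21, 42]
print(gcd_def(42, 24))   # → 6
print(lcm_def(42, 24))   # → 168
```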
More generally, let a₁, . . . , a_n be any finite set of numbers. Write Div(a₁, . . . , a_n) for the set of numbers dividing each of a₁, . . . , a_n, and Mult(a₁, . . . , a_n) for the set of numbers divisible by each of a₁, . . . , a_n. As above we may let gcd(a₁, . . . , a_n) = max Div(a₁, . . . , a_n) and lcm(a₁, . . . , a_n) = min Mult(a₁, . . . , a_n).

6. Euclidean Algorithm
If you are like most students, you have an old habit of thinking about the gcd of two numbers as follows. You take your two numbers, factor them, and then for each prime note the smaller exponent that occurs in the factorizations of both numbers. The exponents of primes appearing in the factorization of the gcd will be these smaller exponents. While we will eventually derive this characterization of the gcd, you should forget about it for a while, for two reasons. One, it is usually inefficient to factor large numbers. Two, at this point in the course we are trying to train you to understand the logic of the definition of mins and maxes, as well as digest the theory of divisibility. Try to work through the following two lemmas, to break yourself from the aforementioned habits.

Lemma 2.33. If b = qa + r, then gcd(a, b) = gcd(a, r).

Proof. This follows from the fact that Div(a, b) = Div(a, r), which the reader should prove.

Lemma 2.34. If a | b, then gcd(a, b) = a.

Proof. This is a good exercise for you to do right now. Use the definition of gcd!
These two lemmas allow us to compute the gcd of any two natural numbers. Consider, for example, a = 51 and b = 36. (Allow me to use normal notation for numbers for this example.) Applying the division algorithm yields

51 = 1 · 36 + 15.

By the first lemma, we conclude that gcd(51, 36) = gcd(36, 15). So we have simplified the problem. Next,

36 = 2 · 15 + 6.

Thus gcd(36, 15) = gcd(15, 6). Next,

15 = 2 · 6 + 3.

Thus gcd(15, 6) = gcd(6, 3). But since 3 | 6, we know gcd(3, 6) = 3 by the second lemma. Thus gcd(51, 36) = 3.

This is a great algorithm for computing gcd's, and originates in the first proposition of Book VII of Euclid's Elements. It is described therein as "antenaresis", or "repeated subtraction".
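The computation just carried out for 51 and 36 mechanizes immediately; the loop below is our sketch, with the remainder operator doing each division step:

```python
# Euclid's algorithm: repeatedly replace the pair (a, b) by (b mod a, a),
# exactly as in the chain gcd(51, 36) = gcd(36, 15) = gcd(15, 6) = gcd(6, 3).

def gcd_euclid(a, b):
    while a != 0:
        a, b = b % a, a
    return b

print(gcd_euclid(36, 51))  # → 3
print(gcd_euclid(24, 42))  # → 6
```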
There is a second phase of this algorithm, which allows us to express gcd(a, b) as the difference of a multiple of a and a multiple of b. We iteratively use the idea that if b = qa + r, then r = b − qa. Thus we retrace the steps of the first algorithm, each time writing the remainder as the dividend minus the quotient times the divisor. In our present example we start with

3 = 15 − 2 · 6.

The next step is to start with the smaller of the underlined numbers on the right, find the equation in which it is the remainder, and use that equation to substitute in a difference of larger numbers:

3 = 15 − 2 · (36 − 2 · 15).

Then combine terms:

3 = 5 · 15 − 2 · 36.

Now the 15 is the smaller of the underlined numbers, so again substitute and combine:

3 = 5 · (51 − 1 · 36) − 2 · 36,

3 = 5 · 51 − 7 · 36.
This expresses 3 as the difference of a multiple of 51 and a multiple of 36. Very soon (after we learn Strong Induction), we will formally prove:

Theorem 2.35. (Euclidean Algorithm) If a, b ∈ N and d = gcd(a, b), then there are m, n ∈ N so that ma − nb = d.
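The back-substitution phase can be mechanized as well. The recursive sketch below is ours; it returns d = gcd(a, b) together with integer coefficients x, y satisfying ax + by = d, from which the ma − nb form of the theorem can be read off:

```python
# Extended Euclidean algorithm: if d = gcd(b, a mod b) = b*x + (a mod b)*y,
# then since a mod b = a - (a//b)*b, we get d = a*y + b*(x - (a//b)*y).

def ext_gcd(a, b):
    if b == 0:
        return a, 1, 0
    d, x, y = ext_gcd(b, a % b)
    return d, y, x - (a // b) * y

d, x, y = ext_gcd(51, 36)
print(d, x, y)  # → 3 5 -7, i.e. 3 = 5·51 − 7·36, as computed by hand above
```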
Meanwhile just practice with the algorithm; the proof won't be much more than that.

7. Strong Induction
In Exercise 8 in Section 8.4 you were asked to prove a formula for the Hemachandra numbers, using the induction scheme

(P(1) ∧ P(2) ∧ (∀k(P(k) ∧ P(k + 1) ⇒ P(k + 2)))) ⇒ ∀n P(n).

This meant that you were to verify the formula at n = 1 and 2, and then prove that if the formula is correct for two consecutive numbers, then it is true for the next number. We are now in a position to prove that this is a valid induction scheme. It comes down to the following proposition:

Proposition 2.36. Let S ⊆ N be a subset of N satisfying the following properties.

(1) 1, 2 ∈ S.
(2) ∀n ∈ N, we have (n, n + 1 ∈ S) ⇒ (n + 2 ∈ S).

Then S = N.
Before proving this, let me spell out the relationship with the above induction scheme. Suppose you are given propositions P(n) as in the Hemachandra situation; then let S = {n ∈ N | P(n) is true}. Then knowing P(1), P(2) and knowing that P(k) and P(k + 1) together imply P(k + 2) tells you that S satisfies properties (1) and (2). Therefore S = N and so P(n) is true for all n ∈ N.
Proof. Let T = S^c be the complement of S in N. Suppose T is nonempty. Then T has a minimum, say m = min T. Since 1, 2 ∈ S we know they are not in T. Therefore m ≠ 1 and m ≠ 2. Consider the numbers m − 1 and m − 2. Since they are both less than m, they are not in T. Therefore they are in S. By property (2), m ∈ S, so it is not in T. This is a contradiction. Therefore we know T must be empty and therefore S = N.
I hope you get the feeling from the above proof that this "min" technique is very powerful. It seems unfit to use it for such a random-looking induction scheme. In fact, this same technique will take us all the way to Strong Induction. The idea of Strong Induction is as follows: Again you have a sequence P(n) of propositions, and know that P(1), say, is true. Suppose you can always reduce P(k) to either (1) some P(ℓ) for ℓ < k, or (2) some combination of P(ℓ)s with various ℓ < k. Then you know P(n) is true for all n. This should make sense to you, because you are always decreasing n until it finally gets down to 1. You shouldn't have to worry exactly how it decreases, just that it does. We codify the above as

P(1) ∧ (∀k > 1 (P(1) ∧ . . . ∧ P(k − 1) ⇒ P(k))) ⇒ ∀n P(n).

Although you may not necessarily need all k between 1 and n, it's simpler to suppose that you do.
7.1. The Chocolate Bar Problem

In this paragraph we step outside Peano Theory to give an example of a problem resolved with Strong Induction. Suppose I have a bar of chocolate with vertical and horizontal lines dividing it into an m × n grid of segments, for some m, n ∈ N. I want to break the bar into mn pieces, by breaking the bar along the lines. Let's say that a split is the act of breaking a bar along a line to get two smaller bars. How many splits will it take?

Proposition 2.37. It always takes mn − 1 splits.
It's not hard to come up with an algorithm that breaks up the bar in some organized fashion, and check that your algorithm takes mn − 1 splits. But the stronger fact is that no matter how you break it up, it always takes the same number of splits.

[Someday, a diagram...]

Proof. We prove the proposition by strong induction on the number of segments, which is mn. In other words our statement P(N) is "If a bar has N segments, then it always takes N − 1 splits to separate them into N pieces."

This is clear for P(1), because this is the case of only one piece, so no splits are required. Suppose k > 1. Then some split is possible. Suppose the splitting is as in the diagram, say into bars of m₁ × n and m₂ × n segments with m₁ + m₂ = m. Then the total number of splits required is

1 + (m₁n − 1) + (m₂n − 1)

by P(m₁n) ∧ P(m₂n). Since this is equal to mn − 1 = N − 1, we have proved the inductive step. We are done by strong induction.
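The "no matter how you break it up" claim invites an empirical test. The simulation below is our own sketch: it splits an m × n bar at randomly chosen lines, in random order, and counts the splits.

```python
# Split an m×n chocolate bar into unit pieces in a random order, counting
# splits. Every split raises the number of bars by one, so the count is
# always m*n - 1, whatever choices are made.
import random

def split_count(m, n):
    bars, splits = [(m, n)], 0
    while bars:
        a, b = bars.pop(random.randrange(len(bars)))
        if a == b == 1:
            continue                       # a single segment needs no split
        if a > 1 and (b == 1 or random.random() < 0.5):
            cut = random.randint(1, a - 1)           # split between rows
            bars += [(cut, b), (a - cut, b)]
        else:
            cut = random.randint(1, b - 1)           # split between columns
            bars += [(a, cut), (a, b - cut)]
        splits += 1
    return splits

print(all(split_count(4, 7) == 4 * 7 - 1 for _ in range(100)))  # → True
```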
7.2. Proof of Strong Induction
We now prove the validity of the strong induction scheme. It's just like the proof of the validity of the Hemachandra induction scheme. As before, our proof will use the Min Form of Well-ordering, which was proved using (IND). So really, standard induction implies strong induction. The validity of the scheme amounts, as usual, to a statement about subsets of N. Remember, the set S below should be thought of as the set of n ∈ N so that P(n) is true.

Theorem 2.38. (Strong Induction) Let S ⊆ N be a subset of N satisfying the following properties.
(1) 1 ∈ S.
(2) ∀k > 1, we have ({1, ..., k − 1} ⊆ S) ⇒ (k ∈ S). (In other words, if all numbers less than k are in S, then k ∈ S.)
Then S = N.

Proof. Let T = S^c, the set of natural numbers not in S. If the theorem is not true, then S is not N and therefore T is nonempty. Let m = min T. Note m > 1 since 1 ∈ S. Also note that if k < m, then k ∉ T, so k ∈ S. By hypothesis, m ∈ S, which is a contradiction.
7.3. Exercises
(1) Theorem 2.32, Lemma 2.33, and Lemma 2.34.
(2) Let a ∈ N. To compute Div(a), we need to check whether each number n ≤ a divides a. Naturally we would do this in order, starting with n = 1. Write Div(a) = {d_1, d_2, d_3, ...} with d_i < d_{i+1} for all i.
    (a) Prove that a is either a perfect square, or the product of two consecutive divisors d_m, d_{m+1}.
    (b) Suppose that a = d_m · d_{m+1} is the product of two consecutive divisors d_m, d_{m+1}. Prove that
        Div(a) = {d_1, d_2, ..., d_m, a ÷ d_m, a ÷ d_{m−1}, ..., a ÷ d_1}.
    (c) What happens if instead d_m^2 = a? (This helps compute Div(a).)
(3) For each of the following pairs of numbers a, b, find d = gcd(a, b) and numbers m, n ∈ N so that ma − nb = d, and numbers p, q ∈ N so that pb − qa = d.
    (a) a = 9409, b = 7081.
    (b) a = 165, b = 224.
(4) If a = 2 in Theorem 2.35, and b is odd, what are m and n? What if a is odd and b = 2?
(5) Find natural numbers x, y, z so that 35x + 15y − 21z = 1.
(6) Fix a number a ∈ N. Suppose S ⊆ N is a set of numbers satisfying the following two properties.
    (a) a ∈ S.
    (b) Whenever n ∈ S, then n + 1 ∈ S.
    Prove that S contains all numbers greater than or equal to a.
(7) Prove gcd(ac, bc) = c · gcd(a, b) for a, b, c ∈ N.
(8) If d = gcd(a, b), prove that gcd(2^a − 1, 2^b − 1) = 2^d − 1. (Use Problem 5 from Section ??.)
(9) Prove gcd(a + b, b) = gcd(a, b).
(10) Prove that (∀a odd, P(a)) ∧ (∀k, P(k) ⇒ P(2k)) ⇒ ∀n P(n) is a valid induction scheme.
(11) Prove that every natural number can be expressed as a sum of distinct Hemachandra numbers.
(12) Consider the following two-player game, played using two piles of pennies: Players take turns. In each turn a player picks one pile and removes some (natural) number of pennies from that pile. The player removing the last penny wins. Prove that, as long as the two piles begin with an equal number of pennies, the second player can always win.
(13) Write h_k for the k-th Hemachandra number, starting with h_1 = h_2 = 1. Prove gcd(h_k, h_{k+1}) = 1 for all k.
(14) Prove that if h_d | h_n then d | n. (Suggestion: Use the previous problem, problem 6, and problem ?? in Section ??.)
8. Place-Value Systems
There are very effective ways to represent numbers, using what is called the Place-Value system. This is the way of expressing whole numbers by assembling a (finite) string of digits. Let us begin with base X.
Our digits will be the familiar Hindu-Arabic numerals {1, 2, 3, 4, 5, 6, 7, 8, 9}, in place of {I, II, III, IV, V, VI, VII, VIII, IX}. Thus the definition of 2 is 1′, the definition of 3 is 2′, etc. We will also use the common words "one, two, three, ..., nine" to have the usual meaning.

These numerals comprise a fixed set of "digits" {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, which is used in that order. The successor of each digit is another digit, unless the digit is equal to 9. Here is a number expressed in the most common place-value system:

2608 = 2·X^3 + 6·X^2 + 0·X^1 + 8·X^0.
How do you take the successor of a number in place-value language? If you have a number N represented by a string of digits d_m d_{m−1} ··· d_1 d_0, then N′ is usually given by d_m d_{m−1} ··· d_1 d_0′, where d_0′ is the digit coming after d_0. That rule doesn't work if d_0 = 9; in that case, the successor of N is usually given by d_m d_{m−1} ··· d_1′ 0. And so forth: either one reaches a digit unequal to 9, or all of the digits must be equal to 9, in which case the successor of N is of course 100···0, with m + 1 zeroes.
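The successor rule just described can be sketched as a short routine on decimal digit strings (a Python sketch; the name `successor` is my own):

```python
def successor(digits: str) -> str:
    """Successor of a number written as a decimal digit string:
    bump the last digit, carrying past any trailing 9s."""
    ds = list(digits)
    i = len(ds) - 1
    while i >= 0 and ds[i] == '9':
        ds[i] = '0'                 # a trailing 9 rolls over to 0
        i -= 1
    if i < 0:
        return '1' + ''.join(ds)    # all digits were 9
    ds[i] = str(int(ds[i]) + 1)     # the digit coming after ds[i]
    return ''.join(ds)

assert successor('2608') == '2609'
assert successor('2609') == '2610'
assert successor('999') == '1000'
```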
(One could be more rigorous here. Any instance of "and so forth" in mathematics can be replaced with an inductive argument.)

The place-value system is an invention of Indian astronomers who worked with cosmic units of time, like the kalpa, which is precisely 4,320,000,000 years. Certainly this also necessitated the use of the digit 0, or shūnya to them, meaning "void". According to the tremendous book [4], the place-value system was in place by 458 CE, and certainly originated during the Gupta Dynasty.

After you learn to take successors, you learn to add and multiply. First you memorize addition/multiplication tables, which tell you how to add/multiply digits. Then you learn inductive algorithms for adding/multiplying entire strings. There are of course algorithms for subtracting and long division as well. All of these algorithms of arithmetic can be proved using the definition of the strings, and the distributive property. We will not do this here.

A student of mathematics should learn to distinguish between the notion of a number and the strings of digits we use to represent it. A good way is to learn arithmetic in another base. One can use any b > 1 to replace the role of X in the above. The digits used are instead {0, 1, ..., b − 1}. Base b = III is called ternary, and base b = X is called decimal. The ternary digits are {0, 1, 2}, and the string

2011 = 2·b^10 + 0·b^2 + 1·b + 1 = LVIII,

in our "neutral" Roman numerals. (Note that the exponent 10 here is the usual "three".) The first several ternary numbers are: 1, 2, 10, 11, 12, 20, 21, 22, 100, 101, 102, 110, 111, 112, 120, 121, 122, 200, 201, 202, 210, ... You should check, as above, using the exponential sums that they convert into the right numbers. Note that this counting follows the same rules as for base X, except that 2 is the last digit instead of 9. Also notice analogues of facts in the decimal
system: The last digit of a ternary number is 0 iff the number is divisible by 3. What does it mean if the last two digits are zeros?

As another example, say the base is b = 7. With this convention, we have for instance 234 = 2·b^2 + 3·b + 4, which is CXXIII.

Remark: Sometimes one writes "234_7" to distinguish this from the base ten expression, which would be 234_10 = CCXXXIV. But too much notation can be a headache, so we will rely on context.

Ten is a conveniently sized number, and it is related to our anatomy, so it is easy to learn as a child. But in many situations, for example in computer science, base II and powers of II are common. It is also true that certain math problems (see the exercises) are more easily solved in another base.
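As a quick machine check of conversions like these, Python's built-in `int` accepts a base argument (a sketch; the values are easy to verify by hand as well):

```python
# int(s, b) interprets the string s as a numeral in base b.
assert int('234', 7) == 123          # CXXIII in our neutral Roman numerals
assert int('2333', 5) == 343
assert int('101010111', 2) == 343
```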
8.1. Practice with binary arithmetic
In this section we will compute the sum

∑_{n=1}^{1000} n!

with base b = II, thus in binary. Binary arithmetic is simpler than decimal in the sense that rather than needing two nine-by-nine tables of addition and multiplication, we only need to know that 1 + 1 = 10 and 1·1 = 1. (The rules for 0 being universal.) The sum is equal to

1! + 10! + 11! + 100! + 101! + 110! + 111! + 1000!.

The exercise here is to try not to convert these into decimal, do the operation, and convert back, but to do the entire computation in binary. We have of course 1! = 1 and 10! = 10. The next term is 11! = 11·10·1 = 11·10. We use "long multiplication":

     11
  ×  10
     00
  + 110
    110
It should be clear from this that the rule for multiplying by 10 is to simply append a zero to the end of your digits. Similarly the rule for multiplying by 100 is to append two zeros to the end. Thus 100! = 100·110 = 11000. Next we use "long multiplication" again to compute 101! = 101·11000:

      11000
  ×     101
      11000
  + 1100000
    1111000
The next calculation, 110! = 110·1111000, is a little more interesting, since we have some carrying of addition:

      1111000
  ×       110
     11110000
  + 111100000
   1011010000

Did you catch that? We used 1 + 1 = 10 in the sixth place, and 1 + 1 + 1 = 11 in places seven through nine, carrying the '1' each time. Similarly with 111! = 111·1011010000:

       1011010000
  ×           111
       1011010000
      10110100000
  +  101101000000
    1001110110000
Finally, 1000! = 1001110110000000. Thus the sum ∑_{n=1}^{1000} n! is equal to:

                   1
                  10
                 110
               11000
             1111000
          1011010000
       1001110110000
  + 1001110110000000
    1011010010011001

as the reader should verify. Strictly speaking, this was just an illustration of basic binary arithmetic. But I'd like to point something out. Suppose we were to continue this computation, adding on successively higher factorials. All higher factorials N! with N ≥ 1000 end in at least seven zeros, since they are divisible by 1000!. Therefore the last seven digits of the sum ∑_{n=1}^{N} n! for N ≥ 1000 are 0011001. Moreover, as n increases, n! ends in more and more zeros. Thus more and more ending digits of the sum will stabilize. For example, 10000! ends in fifteen zeros, so we conclude that the last fifteen digits of ∑_{n=1}^{N} n! for N ≥ 10000 will be the same. (In fact, they are 111101000011001.)

This process gives us an infinite binary expansion going to the left. Does it eventually repeat? No one knows.
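For readers who want to double-check the computation above, here is a short verification in Python (a sketch; `format(n, 'b')` writes n in binary, and 1000 in binary is eight):

```python
from math import factorial

# Sum of n! for n = 1..8, in binary.
total = sum(factorial(n) for n in range(1, 9))
assert format(total, 'b') == '1011010010011001'

# Each factorial from 8! on ends in at least seven binary zeros,
# so the last seven digits of the partial sums stabilize at 0011001.
assert all(factorial(n) % 2**7 == 0 for n in range(8, 20))
assert format(total % 2**7, '07b') == '0011001'
```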
8.2. Subtraction and Long Division
You should be able to figure out how subtraction is done in other bases. For instance, in ternary we have

    201210
  − 122212
      1221

Check! Of course there was some "borrowing" as you subtract from right to left. As usual, you can double-check by adding 1221 + 122212 and seeing whether you get 201210.

Now you're ready for long division. Base ten long division, as taught in elementary school, is a very mysterious-looking algorithm. It is a good question to ask why it gives the correct quotient and remainder of the Division Algorithm. We will not answer that question in this book, but instead we will demonstrate how to perform long division in some other bases. We proceed by analogy. Let us start with binary again, and in binary, divide 111011 by 101. Here's how the final long division looks:

        1011 R 100
  101 ) 111011
       -101
         1001
        -101
          1001
         -101
           100
Can you figure out what happened? In one way, this is easier than decimal long division, because there are only two digits involved. In the first step, we ask: is 101 ≤ 1? ≤ 11? ≤ 111? And since 111 is the first part of the number 111011 which is at least as big as 101, we put the digit 1 above the third 1, where we are forming the quotient q. Then we subtract 101 from 111 to get 10. Now we bring down the digit 0. The new number 100 is still less than 101, so we put the next digit 0 in the quotient, and bring down another digit. And so on. When we are "out of digits" the remainder is 100, which we record next to the big "R".

From our belief in long division, we conclude that 111011 = 1011·101 + 100. [Add a ternary example.]
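The long division above can be checked against Python's `divmod` (the `0b` prefix writes a binary literal):

```python
# Verify the binary long division: 111011 ÷ 101 = 1011 remainder 100.
q, r = divmod(0b111011, 0b101)
assert format(q, 'b') == '1011'
assert format(r, 'b') == '100'
```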
8.3. Converting numbers from one base to another
I've presented N in this book as the set {1, 1′, 1′′, ...}. Later I agreed to use Roman numerals {I, II, III, IV, ...} for numbers less than four thousand. That's our "neutral" way to express numbers, even though it is not far from decimal.

It is also fine to think of N as the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ...}, when we are tacitly using decimal notation. But if binary is the convention, then it is fine to think of N as the set {1, 10, 11, 100, 101, ...}. And so on for the different bases. As
long as it's clear what 1 is, and what the successor function is, and Peano's axioms are satisfied, it's just N with different notation.

To understand how this works, let us work through some examples of how to convert from one base to another. The easiest is converting to decimal. The number written as 25037 in base b = 8 can be computed in decimal as:

2·8^4 + 5·8^3 + 0·8^2 + 3·8^1 + 7·8^0 = 10783.
How do we convert a decimal number like 343 into binary or base 5? There are two methods.

Method I: Top-down. This method finds the digits from left to right.

In binary, you need to express your number as a sum of distinct powers of 2, including 2^0. So, find the highest power of 2 less than your number. Here the highest power is 256 = 2^8. Subtract it off and you get 87. The highest power of 2 less than 87 is 64 = 2^6. Subtract to get 23. Then subtract 16 = 2^4 to get 7, which is 4 + 2 + 1. Therefore

343 = 2^8 + 2^6 + 2^4 + 2^2 + 2^1 + 2^0.

Fill in the zeros and ones to prepare to write the binary expansion:

343 = 1·2^8 + 0·2^7 + 1·2^6 + 0·2^5 + 1·2^4 + 0·2^3 + 1·2^2 + 1·2^1 + 1·2^0.

The binary representation is then 101010111.

The top-down method is less elegant in other bases. Let's try to convert the decimal number 343 into base b = V. The highest power of 5 less than 343 is 125 = 5^3. So we know the result will have four digits. But to figure out the first digit, we need to determine the highest multiple of 125 which is less than or equal to 343. We have 250 = 2·125 < 343 < 375 = 3·125, and so the first digit is 2. The next step is to subtract off the 2·125 to get 93. Iterating, the highest power of 5 less than 93 is 25, and the highest multiple of 25 less than 93 is 75 = 3·25. The number 18 remains. It is easy to see what to do from here; 18 = 3·5 + 3. Finally we have computed that

343 = 2·5^3 + 3·5^2 + 3·5^1 + 3·5^0.

Our conclusion is that 343 in decimal converts to 2333 in base 5.

Method II: Bottom-up. This method finds the digits from right to left.
The rightmost digit of a number N in base b is the remainder when you divide N by b. For instance, when you divide 343 by 2 you get q = 171 and r = 1. So the units digit is 1. You then iterate this; divide 171 by 2 to get q = 85 and r = 1. So far we have learned that the binary expansion of the decimal number 343 is:

(the binary expansion of 171)1 = (the binary expansion of 85)11
Continuing this way, we get

(the binary expansion of 42)111 = (21)0111 = (10)10111 = (5)010111 = (2)1010111 = 101010111.

Note that we simply wrote (21) for "the binary expansion of the decimal number 21", et cetera. Similarly, to convert the decimal number 343 into base 5, with similar notation we compute:

(343) = (68)3 = (13)33 = (2)333 = 2333.

With some practice you'll do fine.

Hexadecimal Notation. Bases with b > X have occasional use. Of course one needs some symbols for digits beyond the Hindu-Arabic numerals. Let us discuss the hexadecimal system, which is base b = XVI. Here the convention is to take the union of the symbols {0, 1, ..., 9} and the symbols {A, B, C, D, E, F}. We have A = 9′, B = A′, ..., F = E′ = XV. Thus, for instance, if we wanted to convert the hexadecimal 2FACE into decimal, we would compute, in decimal,

2·16^4 + (15)·16^3 + (10)·16^2 + (12)·16^1 + 14 = 195278.

Let's convert 343 into hexadecimal. Following the notation above, we have (343) = (21)7 = (1)57 = 157. If remainders larger than 9 had occurred, then we would have used the letters. For instance, 26 = 1A.

Converting between Non-decimal Bases.
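The bottom-up method is easy to mechanize. Here is a sketch in Python (the function name `to_base` is mine): it peels off the rightmost digit with `divmod`, exactly as in Method II, and uses letters for digits beyond 9.

```python
def to_base(n: int, b: int) -> str:
    """Bottom-up conversion of n to a base-b digit string (2 <= b <= 16)."""
    symbols = '0123456789ABCDEF'
    digits = []
    while n > 0:
        n, r = divmod(n, b)
        digits.append(symbols[r])   # r is the next digit, right to left
    return ''.join(reversed(digits))

assert to_base(343, 2) == '101010111'
assert to_base(343, 5) == '2333'
assert to_base(343, 16) == '157'
assert to_base(26, 16) == '1A'
```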
Finally, how do you convert between two bases like binary and ternary, neither of which is decimal? There are multiple ways. One way is to just use the division algorithm as above, but performing it in the given base; you need to know how to do that. Another way is cheap: convert the binary number into decimal, and then the decimal number into ternary. It's cheap, but you're less likely to make a mistake. As an example, let's convert the binary number 1011 into ternary. Long division of 1011 by 11 in binary gives a quotient of 11 and a remainder of 10, which is 2 in
ternary. We write this symbolically as (1011) = (11)2. Then long division of 11 by 11 of course gives remainder 0 and quotient 1. So (1011) = (11)2 = 102 in ternary. The cheap way is to note that 1011 in binary is 11 in decimal, and 11 = 1·9 + 2·1, so we get 102 again.

Just learn one way that works for now, and eventually learn how to perform long division in other bases.
8.4. Existence and Uniqueness of Place-value notation
Let us be more formal now, and prove that any whole number can be written uniquely in "place-value notation" with any integer b > 1 as a base. The existence proof will use the idea of the "bottom-up" approach above, since anyway what we're doing is converting numbers into a given base place-value system. For the uniqueness proofs, both "bottom-up" and "top-down" proofs are given.

Definition. Let b ∈ N with b > 1. A base-b-digit is an integer d with 0 ≤ d < b.

Proposition 2.39. (Existence of Place-value Representation) Fix a number b > 1, and let N ∈ N. Then there is a number m ∈ N and base-b-digits d_0, ..., d_m so that

N = ∑_{i=0}^{m} d_i b^i.

Proof. Strong Induction on N. For N = 0, 1 let m = 0 and d_0 = 0, 1, respectively. Assuming the proposition for numbers less than N, apply the division algorithm to N and b, to yield N = qb + r with r a base-b-digit. In fact, r will be the units digit of N. Since b > 1, we know q < N. So using the inductive hypothesis, q = ∑_{i=0}^{m} d̃_i b^i for base-b-digits d̃_i. Since N = qb + r we have

N = ∑_{i=0}^{m} d̃_i b^{i+1} + r.

If we now define d_0 = r and d_i = d̃_{i−1} for i ≥ 1, we have

N = ∑_{i=0}^{m+1} d_i b^i,

as required.
Proposition 2.40. (Uniqueness of Place-value Representation) Let b be a natural number greater than 1. Suppose that there is a number m ∈ N and base-b-digits d_0, d_1, ..., d_m and d_0′, d_1′, ..., d_m′ so that

∑_{i=0}^{m} d_i b^i = ∑_{i=0}^{m} d_i′ b^i.

Then d_i = d_i′ for all 0 ≤ i ≤ m.

I give two proofs of this: one "bottom up" and the other "top down".
Proof. (Bottom up) Standard Induction on m. If m = 0 it is clear, since there is only d_0 = d_0′. Now suppose m > 0. The remainder of the LHS (resp. RHS) upon division by b is d_0 (resp. d_0′). By the Uniqueness part of the division algorithm, we must have d_0 = d_0′. Now subtract this remainder from both sides and divide by b. The result is:

∑_{i=0}^{m−1} d_{i+1} b^i = ∑_{i=0}^{m−1} d_{i+1}′ b^i.

Note there is one fewer digit. By the inductive hypothesis, d_i = d_i′ for 0 < i ≤ m. Therefore all the digits are equal, as claimed. We are done by induction.
For the "top-down" proof, the idea is to successively show that the highest digits are equal, and then cancel them one by one. If the highest digit on one side is bigger than the other, then surely there is some contradiction. The key is the following lemma:

Lemma 2.41. Let d_0, ..., d_{m−1} be base-b-digits, and c ∈ N. Then

∑_{i=0}^{m−1} d_i b^i < c·b^m.

Proof. Write b⁻ for the digit b⁻ = b − 1. In fact this is seen by the inequalities

∑_{i=0}^{m−1} d_i b^i ≤ ∑_{i=0}^{m−1} b⁻·b^i < (∑_{i=0}^{m−1} b⁻·b^i) + 1 = b^m ≤ c·b^m.

We are using the "successor" rule that (b⁻ b⁻ ··· b⁻)′ = 1 0 ··· 0, where the number of 0's is equal to the number of b⁻'s.
Now we give our second proof:

Proof. (Top-down proof of Proposition 2.40) Standard Induction on m. If m = 0 it is clear, since there is only d_0 = d_0′. Now suppose m > 0, and suppose that d_m ≠ d_m′. We may assume that d_m < d_m′. But then, subtracting d_m b^m from both sides gives

∑_{i=0}^{m−1} d_i b^i = (d_m′ − d_m)b^m + ∑_{i=0}^{m−1} d_i′ b^i ≥ (d_m′ − d_m)b^m.

Putting c = d_m′ − d_m, we see this contradicts the lemma. Therefore d_m = d_m′. We now subtract d_m b^m = d_m′ b^m from both sides to get ∑_{i=0}^{m−1} d_i b^i = ∑_{i=0}^{m−1} d_i′ b^i.
Note there is one fewer digit. By the inductive hypothesis, d_i = d_i′ for 0 ≤ i < m. Therefore all the digits are equal, as claimed. We are done by induction.
8.5. Exercises
(1) In base b = III, perform long division to divide 21110210 by 21.
(2) For n ∈ N write S_n for the sum of the digits (base X) of n. Prove that if a, b ∈ N then 9 | (S_a + S_b − S_{a+b}). Is the analogous statement true in any base b?
(3) Prove that, if n > 2, there is no solution to n^x + n^y = n^z with x, y, z ∈ N. (Suggestion: First think about n = X. Then think about the problem in a general base n.)
(4) Let a, b ∈ N. Prove that 2^a − 1 | 2^b − 1 if and only if a | b. (Hint: think in binary.)
(5) Let a, b ∈ N and suppose r is the remainder when you divide b by a. Show that 2^r − 1 is the remainder when you divide 2^b − 1 by 2^a − 1.
(6) Compute the sum

∑_{n=1}^{100} n!

in the base b = III. (Note that 100 = IX.) Show all of your work; you should start by writing out the little addition/multiplication tables base III.
(7) Write each of the numbers CDLIII, DCLXXVII, and CMXI in the three bases b = II, IV, and VIII. What is an easy way to convert a binary expansion into a base IV expansion and a base VIII expansion? (If you don't see the pattern, make more examples.)
(8) Base ten long division, as taught in elementary school, is a very mysterious-looking algorithm. Explain why it gives the correct quotient and remainder of the Division Algorithm. Please note: I'm not just asking you to articulate the algorithm. The problem is to explain "why", not "how".
(9) Let N ∈ N. Prove that there is a number m ∈ N and digits d_0, ..., d_m, with d_n ≤ n, so that

N = ∑_{i=0}^{m} d_i i!.

Are these digits unique? (Hint: Recall Exercise 3 in Section 8.4.)
9. The Fundamental Theorem of Arithmetic

9.1. Euclidean Algorithm II
Now we turn to some unfinished business. In a previous section we showed that if a = 51 and b = 36, then d = gcd(a, b) = 3, and moreover that 3 = 5·51 − 7·36. Are the factors 5 and 7 unique here? In fact, by adding and subtracting 51·36, we also get the solution 3 = (5 + 36)·51 − (7 + 51)·36. We can iterate that idea to get the solutions 3 = (5 + 36k)·51 − (7 + 51k)·36 for any k ∈ N. But it doesn't stop there; one can go the other direction as well. You and I know that there are such things as negative integers, and if we let, for instance, k = −1, then we are led to the solution 3 = (5 − 36)·51 − (7 − 51)·36, which we rewrite quickly as

3 = 44·36 − 31·51,

before anyone notices we used negative numbers. Let us prove the following proposition:

Proposition 2.42. Let a, b ∈ N and gcd(a, b) = d. Then there are m, n ∈ N so that (ma − nb = d) ∨ (nb − ma = d).
Because of the trick from the previous paragraph, it is actually true that you can express d in both ways. However, for the main part of the proof, one first realizes one way or the other. How will we prove this important theorem? It will be an induction proof. One needs to locate an integer quantity that will strictly decrease as we iterate the algorithm. We use the remainders that occur in each iteration of the division algorithm.

Proof. We proceed by strong induction on a. Write P(a) for the statement of the proposition. If a = 1, then d = 1 and 1·a − 0·b = 1.

Now we assume a > 1, and assume P(<a). By the division algorithm, b = qa + r for some q, r with 0 ≤ r < a. If r = 0 then a | b, so gcd(a, b) = a and we may use 1·a − 0·b = a.

If r > 0, then we have gcd(a, r) = gcd(a, b) = d. We may apply P(r). So ∃ m_0, n_0 ∈ N so that (m_0 a − n_0 r = d) ∨ (n_0 r − m_0 a = d). Eliminating the r gives

(m_0 a − n_0(b − qa) = d) ∨ (n_0(b − qa) − m_0 a = d);

thus

((m_0 + n_0 q)a − n_0 b = d) ∨ (n_0 b − (n_0 q + m_0)a = d).

This proves P(a), and so we are done by induction.

We now show that the "or" is really not required:
Proof. (of Theorem 2.35) Suppose we have m, n ∈ N so that nb − ma = d. Pick k large enough so that bk ≥ m and ak ≥ n (for instance k = max(m, n)). Then

(bk − m)a − (ak − n)b = nb − ma = d.
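The proofs above are effective: tracking the remainders of the division algorithm produces the coefficients. Here is a recursive sketch in Python (the name `extended_gcd` is mine; it works with possibly negative integers, which can then be shifted into natural numbers as in the proof of Theorem 2.35):

```python
def extended_gcd(a, b):
    """Return (d, m, n) with d = gcd(a, b) and m*a + n*b = d,
    mirroring the recursive use of the division algorithm above."""
    if b == 0:
        return a, 1, 0
    q, r = divmod(a, b)
    d, m0, n0 = extended_gcd(b, r)
    # d = m0*b + n0*r = m0*b + n0*(a - q*b) = n0*a + (m0 - q*n0)*b
    return d, n0, m0 - q * n0

d, m, n = extended_gcd(51, 36)
assert d == 3 and m * 51 + n * 36 == 3   # indeed 5*51 - 7*36 = 3
```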
9.2. Euclidean Applications
Euclid's Algorithm has vast application.

Definition. Numbers a and b are called relatively prime provided that gcd(a, b) = 1.

The word "coprime" is a synonym of "relatively prime". If two numbers a and b are relatively prime, then by the Euclidean algorithm ma − nb = 1 for some choice of m, n. If x is any number, we may multiply this equation by x to obtain (xm)a − (xn)b = x; therefore any number may be written as an "integral combination" of a and b.
Proposition 2.43. Let a, b, c ∈ N and suppose that a | bc, and gcd(a, b) = 1. Then a | c.

Proof. By the Euclidean Algorithm, there exist m, n ∈ N so that ma − nb = 1. Then mac − nbc = c. Since a | mac and a | nbc, we see that a divides the LHS. Thus a divides the RHS, which is c.
Proposition 2.44. Let a, b ∈ N be relatively prime. Then Mult(a, b) = Mult(ab).

Proof. It should be clear that Mult(ab) ⊆ Mult(a, b). Let µ ∈ Mult(a, b). By the Euclidean algorithm, there are numbers m, n so that ma − nb = 1. Therefore maµ − nbµ = µ. Because µ is a common multiple of a and b, it is easy to see that aµ and bµ are divisible by ab. Therefore the LHS, and hence µ, is a multiple of ab.
Proposition 2.45. Suppose a and b are relatively prime, and both divide some number c. Then ab | c.

Proof. By the Euclidean Algorithm, there are numbers m and n so that ma − nb = 1. Multiply this by c to get mac − nbc = c. Using the hypothesis, we see that both terms of the left hand side are divisible by ab; thus the right hand side is as well.
Proposition 2.46. Let a, b ∈ N. Then Div(a, b) = Div(gcd(a, b)).

Proof. Let d = gcd(a, b). We have ma − nb = d for some numbers m, n. If c is a common divisor of a and b, then c divides the left hand side, and therefore c | d. Conversely, since d ∈ Div(a, b), any divisor of d is also in Div(a, b).
Here is a sort of converse to the Euclidean Algorithm.

Proposition 2.47. If a, b, m, n ∈ N and ma − nb = 1, then a and b are relatively prime.
Proof. Let d be a common divisor of a and b. Then d divides the LHS, and therefore d | 1. It follows that Div(a, b) = {1}.
The reader should contrast the following definitions, which will be referred to in the exercises:

Definition. Let a_1, ..., a_n ∈ N. We say they are pairwise coprime provided that for all i ≠ j, gcd(a_i, a_j) = 1. We say they are relatively prime provided that gcd(a_1, ..., a_n) = 1.

For example, the three numbers 10, 21, 121 are pairwise coprime and relatively prime, and the three numbers 6, 15, 35 are relatively prime but not pairwise coprime. Here are some propositions which use these notions. We will not need them in the sequel, so we just present them as exercises.

Proposition 2.48. If a_1, ..., a_n ∈ N are relatively prime, then there are numbers c_1, ..., c_n ∈ N so that

∑_{i=1}^{n−1} c_i a_i − c_n a_n = 1.

Proposition 2.49. If a_1, ..., a_n ∈ N are pairwise coprime, then Mult(a_1, ..., a_n) = Mult(a_1 ··· a_n).
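The examples above are easy to check with Python's `math.gcd` (a quick sketch):

```python
from math import gcd
from itertools import combinations

# 10, 21, 121 are pairwise coprime (hence also relatively prime).
assert all(gcd(x, y) == 1 for x, y in combinations([10, 21, 121], 2))

# 6, 15, 35 are relatively prime but not pairwise coprime.
assert gcd(gcd(6, 15), 35) == 1
assert gcd(6, 15) == 3   # a pair with a common factor
```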
9.3. Exercises
(1) Let a, b_1, ..., b_n ∈ N and suppose that gcd(a, b_i) = 1 for all i. Prove that gcd(a, b_1 ··· b_n) = 1.
(2) Let a, b ∈ N. Proposition 2.42 gives numbers m, n ∈ N so that either ma − nb = d or nb − ma = d. Prove that if one follows the Euclidean Algorithm, then actually m ≤ b and n ≤ a. (Strong Induction; follow the proof of Proposition 2.42.)
(3) Let a, b ∈ N be relatively prime. Let N ≥ (a − 1)(b − 1). By the Euclidean Algorithm we know there are m, n ∈ N so that ma − nb = N. The goal of this exercise is to prove that there exist c, d ∈ N so that ca + db = N. (The class size problem.) Let k_0 = max{k ∈ N | m ≥ bk}. Prove that ak_0 ≥ n. (Hint: Keep the Creeping Lemma handy.) Modify m, n using k_0 to get c, d as desired.
(4) Let a, b ∈ N be relatively prime. Prove that there do not exist m, n ∈ N with ma + nb = ab − a − b. (Thus, using the previous exercise, ab − a − b is the largest such number.)
(5) Given a natural number n, write φ(n) ∈ N for the number of integers from 1 to n which are relatively prime to n. For example φ(12) = 4, since there are four such numbers: {1, 5, 7, 11}. Compute φ(n) for all the numbers n from 1 to 25. Is it true that φ(mn) = φ(m)φ(n) for all m, n ∈ N?
(6) Prove that if a and b are relatively prime, and a | bc, then a | c.
(7) Let a, b ∈ N. Prove that lcm(a, b) divides every element of Mult(a, b).
(8) For which pairs of numbers d, µ do there exist a, b ∈ N so that d = gcd(a, b) and µ = lcm(a, b)?
(9) Let d = gcd(a, b). Prove that a ÷ d and b ÷ d are relatively prime.
(10) Let d = gcd(a, b) and µ = lcm(a, b). Prove that ab = dµ. [Suggestion: First show ab ÷ d ∈ Mult(a, b). Then show that if ν ∈ Mult(a, b) then ab ÷ d divides ν. You may use the previous exercise.]
(11) Let a, b ∈ N be relatively prime. Suppose that there are m_1, m_2, n_1, n_2 ∈ N so that m_1 a + n_1 b = m_2 a + n_2 b. Suppose m_2 ≥ m_1. Prove that there is an integer k so that m_2 = m_1 + bk.
(12) Prove that gcd(a, b, c) = gcd(a, gcd(b, c)). Use this to compute gcd(290177, 241133, 190747).
(13) Prove that lcm(a, b, c) = lcm(a, lcm(b, c)).
(14) Proposition 2.48 and Proposition 2.49.
(15) Prove gcd(a + b, b) = gcd(a, b).
9.4. Ords
If m > 1 and n are natural numbers, we want to define ord_m(n) to be the maximum number of times m divides n. For example, ord_3(18) should be 2. Suggestively,

18 = 2^{ord_2 18} · 3^{ord_3 18}.

In fact, we will later prove that these ords are the exponents that occur in prime factorizations of numbers. Let us quickly check that these maximums exist. Remember, a set is guaranteed a maximum as long as it has some upper bound, and is nonempty.

Proposition 2.50. Let m > 1 and n ∈ N. Then n is an upper bound for the set {i ∈ N; m^i | n}.

Proof. We will prove by induction on n that m^n > n. Once we have done this, if i is in this set, then m^i ≤ n < m^n, which by Exercise 9 in Section 2.4 implies that i < n. Therefore n will be an upper bound of the set.

The case n = 1 is obvious. Suppose we know that m^k > k. Then multiplying both sides by m, we see that m^{k+1} > mk. Since m > 1 we know that mk > k, and therefore mk ≥ k + 1. Appending this to the previous inequality, we conclude that m^{k+1} > k + 1. Thus we are done by induction.

Certainly 0 is in this set, so it is nonempty, and since it is bounded above, it has a maximum. We can therefore make the following definition:

Definition. Let m > 1 and n ∈ N. Then ord_m(n) = max{i ∈ N; m^i | n}.

For example, ord_6(12) = 1, ord_6(100) = 0, and ord_2(48) = 4.

Remarks:
(1) This terminology comes from the world of analysis, where "ord" means "order of vanishing". For example, the order of vanishing of f(x) = x^2(x + 1) at x = 0 is 2, and at x = −1 is 1, so one would say ord_0(f) = 2, ord_{−1}(f) = 1, and ord_1(f) = 0.
(2) We will avoid defining ord_m(0), but the obvious choice is ord_m(0) = ∞.
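The definition of ord_m can be sketched as a short loop (a Python sketch; the name `ord_m` is mine). Rather than searching for the maximal exponent directly, it divides out factors of m:

```python
def ord_m(m, n):
    """ord_m(n): the largest i with m**i dividing n (m > 1, n >= 1)."""
    i = 0
    while n % m == 0:
        n //= m
        i += 1
    return i

assert ord_m(3, 18) == 2
assert ord_m(6, 12) == 1
assert ord_m(6, 100) == 0
assert ord_m(2, 48) == 4
```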
The following is a convenient reformulation of the definition of ord_m:

Proposition 2.51. Let m > 1 and n ∈ N. Then ord_m(n) = i if and only if there is a number u ∈ N so that n = m^i u and m ∤ u.

Proof. (⇒) Suppose ord_m(n) = i. Then m^i | n, say n = m^i u. If m | u, then m^{i+1} | n, contradicting the maximality of i. Thus m ∤ u.

(⇐) If n = m^i u, then ord_m(n) ≥ i. If m^{i+1} | n, we would have m | u. So since m ∤ u, ord_m(n) < i + 1, and therefore ord_m(n) = i.
The following is comparable to the triangle inequality in analysis:

Proposition 2.52. Let m > 1 and a, b ∈ N. Then

ord_m(a + b) ≥ min(ord_m(a), ord_m(b)).

Moreover, if ord_m(a) < ord_m(b), then ord_m(a + b) = ord_m(a).

Before working through this proof, the reader should try a few examples.

Proof. Let i = ord_m(a) and j = ord_m(b). By the previous proposition we may write a = m^i u and b = m^j v, with m ∤ u, v. If i ≤ j then

a + b = m^i(u + v m^{j−i}).

If i < j, it is easy to see that m ∤ (u + v m^{j−i}), so that ord_m(a + b) = ord_m(a) by the previous proposition. If i = j, then the same equation shows that m^i | (a + b). Since ord_m(a + b) is the maximum of such exponents, we conclude that ord_m(a + b) ≥ i. Of course the case i ≥ j is similar.
9.5. Prime Numbers
For every number n ∈ N, it is easy to see that 1, n ∈ Div(n). For some numbers, these are the only elements of Div(n).

Definition. Let n > 1 be a natural number. We say that n is prime provided that Div(n) = {1, n}. We say that n is composite provided that it is not prime.

For example 8675309 and 314159 are prime. Thus a number p > 1 is prime if whenever d | p, then d = 1 or p. Another way to say this is as follows. One might call a divisor d of p a "proper divisor" if 1 < d < p. A number is composite iff it has a proper divisor. Then, p is prime if and only if it does not have a proper divisor.

Remark: The number 1 is considered neither a prime nor a composite. We do have a name for such things; it is called a "unit".
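The proper-divisor characterization translates directly into a naive primality test; a sketch (trial division by every d with 1 < d < n, chosen for clarity rather than speed; the name `is_prime` is ours):

```python
def is_prime(n):
    """A number n > 1 is prime iff it has no proper divisor d with 1 < d < n."""
    if n <= 1:
        return False  # 1 is a "unit": neither prime nor composite
    return all(n % d != 0 for d in range(2, n))

# 314159 is prime, while 91 = 7 * 13 has the proper divisor 7.
print(is_prime(314159), is_prime(91))
```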
2. ARITHMETIC
Activity: Prime Number Bee
Have all the students stand up, and pick some order to go around the class. Successive students must recite the prime numbers 2, 3, 5, . . .. Any student who gives a composite number, skips a prime, or takes more than ten seconds must sit, and the last one standing receives a prize.

Proposition 2.53. The number 2 is prime.

Proof. If d ∈ N divides 2, then 1 ≤ d ≤ 2. Therefore d = 1 or d = 2. Thus Div(2) = {1, 2}, and so 2 is prime.

Proposition 2.54. If N ∈ N with N > 1, then N has a prime factor.

Proof. Strong Induction on N ≥ 2. By the previous proposition we have the base case N = 2. Indeed, if N is prime then we are done. Otherwise, N is composite, so it factors in some nontrivial way, say N = de, with d, e < N. By the inductive hypothesis, d has a prime factor, which is therefore also a prime factor of N.
The next theorem has one of the most famous and treasured proofs in mathematics. It goes back to Euclid's Elements.

Theorem 2.55. There are infinitely many prime numbers.

Proof. By Contradiction. Suppose there are only finitely many primes, say p_1, . . . , p_m. Consider the number N = p_1 · · · p_m + 1. By the previous proposition N must have a prime factor. This prime factor must be one of the p_i, and therefore divides the LHS of N − p_1 · · · p_m = 1. But then it divides 1, which is of course a contradiction.

Remark: Note that the number N = 2 · 3 · 5 · 7 · 11 · 13 + 1 = 30031 is not prime, since 30031 = 59 · 509. The argument of the proof does not imply that N is prime, only that it is divisible by a prime not in the list.
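One can experiment with these numbers N = p_1 · · · p_k + 1 directly; the following sketch (helper names ours) recovers the remark's example:

```python
from math import prod

def least_prime_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime

primes = [2, 3, 5, 7, 11, 13]
N = prod(primes) + 1          # 30031, which is not prime
p = least_prime_factor(N)     # 59, a prime missing from our list
print(N, p, p in primes)      # 30031 59 False
```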
Proposition 2.56. Let N ∈ N with N > 1. Then there is an r ∈ N and prime numbers p_1, . . . , p_r (not necessarily distinct) so that N = p_1 p_2 . . . p_r.

Proof. Strong Induction on N > 1. For N = 2 use Proposition 2.53 again. By Proposition 2.54, there is a prime divisor p of N. If N = p we are done. Otherwise 1 < N ÷ p < N and so by our inductive hypothesis, N ÷ p is a product of primes:

N ÷ p = p_1 p_2 . . . p_r.

Thus N = p_1 p_2 . . . p_r p, as required.
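The proof of Proposition 2.56 is effectively an algorithm: repeatedly split off a prime divisor. A sketch (iterative rather than recursive, using trial division; the name is ours):

```python
def prime_factors(N):
    """Return primes p_1 <= p_2 <= ... (with repetition) whose product is N > 1."""
    factors = []
    p = 2
    while N > 1:
        while N % p == 0:  # p is now the least prime divisor of N
            factors.append(p)
            N //= p
        p += 1
    return factors

print(prime_factors(108))    # [2, 2, 3, 3, 3]
print(prime_factors(30031))  # [59, 509]
```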
Corollary 2.57. (Fundamental Theorem of Arithmetic, Existence) Let N ∈ N with N > 1. Then there is a number m ∈ N, distinct primes p_1, . . . , p_m, and e_1, . . . , e_m ∈ N so that N = p_1^{e_1} p_2^{e_2} . . . p_m^{e_m}.
9. THE FUNDAMENTAL THEOREM OF ARITHMETIC
Proof. This follows from the previous proposition by gathering together identical prime factors.
Now we start to deal with the issue of the uniqueness of prime factorization. For example, 1001 = 11 × 91 = 143 × 7. Does this violate unique factorization into primes? (Thanks to [2] for this cute example.)

Proposition 2.58. Let p, a, b ∈ N with p prime. Then (p | ab) ⇒ ((p | a) ∨ (p | b)).
Proof. Suppose that p ∤ a. Then Div(a, p) = {1}, thus by the Euclidean Algorithm there are numbers m, n so that ma − np = 1. Multiplying this by b yields mab − npb = b. By hypothesis, p divides both parts of the left hand side, and therefore it divides b.

Let us analyze the preceding proof a bit more. Let P, Q, R be the statements:

P: "(p is a prime) ∧ (p | ab)"
Q: "p | a"
R: "p | b".

The proof actually took the form (P ∧ ¬Q) ⇒ R, which by propositional logic is equivalent to P ⇒ (Q ∨ R). Does that help you understand the proof better?
Note that we can use this proposition to analyze the putative prime factorizations of 1001 above. For instance 7 is prime (check!) and divides the right-hand side. It therefore divides 11 or 91, which should lead the reader to suspect that 91 is not a prime number.

Proposition 2.59. Let p, a_1, . . . , a_n ∈ N with p prime. If p | (a_1 · · · a_n), then ∃ 1 ≤ i ≤ n so that p | a_i.

Proof. Induction on n. This is clear if n = 1; suppose it is true for n = k. Then if p | (a_1 · · · a_{k+1}) = (a_1 · · · a_k) · a_{k+1}, we have p | (a_1 · · · a_k) or p | a_{k+1} by Proposition 2.58. In the first case, p divides some a_i by the case n = k. In the second case we are also done.
The next proposition shows that if p is prime, then the function ord_p : N → N behaves much like a logarithm.

Proposition 2.60. If p is prime and a, b ∈ N then ord_p(ab) = ord_p(a) + ord_p(b).

Proof. Let i = ord_p(a). Thus there is a number u so that a = p^i u, and p ∤ u. Similarly if j = ord_p(b) there is a v so that b = p^j v and p ∤ v. So ab = p^i u p^j v = p^{i+j} uv. The contrapositive of Proposition 2.58 tells us that p ∤ uv. Therefore ord_p(ab) = i + j.

Corollary 2.61. If p is prime, a ∈ N and e ∈ N, then ord_p(a^e) = e ord_p(a).
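The logarithm-like behavior is easy to spot-check numerically; a hypothetical snippet, with ord computed by dividing and counting:

```python
def ord_p(p, n):
    i = 0
    while n % p == 0:
        n //= p
        i += 1
    return i

a, b = 24, 36          # 24 = 2^3 * 3 and 36 = 2^2 * 3^2
assert ord_p(2, a * b) == ord_p(2, a) + ord_p(2, b)   # 5 = 3 + 2
assert ord_p(3, a * b) == ord_p(3, a) + ord_p(3, b)   # 3 = 1 + 2
# Corollary 2.61: ord_2(24^5) = 5 * ord_2(24) = 15.
assert ord_p(2, a ** 5) == 5 * ord_p(2, a)
```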
This next proposition is to impress upon you the power of ord_2:

Proposition 2.62. Let a, b ∈ N. Then a² ≠ 2b².

Proof. Suppose that a² = 2b². Taking ord_2 of both sides gives 2 ord_2(a) = 2 ord_2(b) + 1. The RHS is odd and the LHS is even, which is a contradiction.

This is essentially the proof that √2 is irrational, i.e., Theorem ??. This argument should strike you as much more powerful and direct than the classic proof we gave earlier. See Proposition 2.71 below for the final word in such problems.

Lemma 2.63. If p and q are primes, and p | q, then p = q.

Proof. This is easy enough to do in your head.
Proposition 2.64. Let p be prime and n ∈ N. Then Div(p^n) = { p^e | 0 ≤ e ≤ n }.

Proof. Suppose d ∈ Div(p^n), and let e = ord_p(d). Then d = p^e u, with p ∤ u. If u ≠ 1 then by Proposition 2.54 it has a prime factor q ≠ p. This implies that q | p^n, thus Proposition 2.59 implies q | p, contradicting the above lemma. We conclude that u = 1 and thus d = p^e. It is easy to see that e ≤ n.

Corollary 2.65. Let p, q be distinct primes, and m, n ∈ N. Then gcd(p^m, q^n) = 1.

Proof. The above proposition shows that if x ∈ Div(p^m, q^n) then x = p^e = q^f for some e, f ∈ N. If e = 0 then x = 1. Otherwise f > 0 as well and we have p | q^f, which by Proposition 2.59 implies p | q, so p = q, a contradiction.
Theorem 2.66. (Fundamental Theorem of Arithmetic, Uniqueness) Let N > 1, and suppose N factors in some way as

N = p_1^{f_1} p_2^{f_2} · · · p_r^{f_r},

with the p_i distinct prime numbers and f_i ∈ N. Then the p_i are all the prime divisors of N, and f_i = ord_{p_i}(N).

Proof. Obviously the p_i form a subset of the prime divisors of N. Conversely, if q is a prime divisor of N, then q divides the product p_1^{f_1} · · · p_r^{f_r}, so by Proposition 2.59 we have q | p_i for some i, and then q = p_i by Lemma 2.63. Thus the p_i are exactly the prime divisors of N. Finally, applying ord_{p_i} to the factorization, Proposition 2.60 and Corollary 2.61 give

ord_{p_i}(N) = f_1 ord_{p_i}(p_1) + · · · + f_r ord_{p_i}(p_r) = f_i,

since ord_{p_i}(p_j) equals 1 when j = i, and 0 when j ≠ i (Lemma 2.63 again).

For example, 108 = 2 · 2 · 3 · 3 · 3. If we group together like factors we obtain 108 = 2² · 3³.

Corollary 2.67. Let N > 1, and suppose p_1, . . . , p_r are the (distinct) prime factors of N. Let e_i = ord_{p_i}(N). Then N = p_1^{e_1} p_2^{e_2} · · · p_r^{e_r}.

Proof. This Corollary is obtained by combining the Existence and Uniqueness forms of the Fundamental Theorem of Arithmetic.
9.6. More about ords
The Fundamental Theorem of Arithmetic says that knowing a number is equivalent to knowing its ords. And given whole numbers {e_p} for every prime p, with all but finitely many of them 0, there is an n with ord_p(n) = e_p for all p. Namely, put n = ∏_p p^{e_p}, a finite product. Moreover, knowing the ords of a number tells us how it behaves multiplicatively. To understand this, start with the following:
Proposition 2.68. Let a, b, c ∈ N. Then ab = c if and only if for all primes p, ord_p(c) = ord_p(a) + ord_p(b).

Proof. The direction (⇒) is Proposition 2.60. We prove the other direction here. Let p_1, . . . , p_ℓ be the list of all the primes dividing a, b, or c. Then

ab = (p_1^{e_1} p_2^{e_2} · · · p_ℓ^{e_ℓ}) · (p_1^{f_1} p_2^{f_2} · · · p_ℓ^{f_ℓ}),

where e_i = ord_{p_i}(a) and f_i = ord_{p_i}(b). Let g_i = ord_{p_i}(c). Since e_i + f_i = g_i this product becomes

p_1^{g_1} p_2^{g_2} · · · p_ℓ^{g_ℓ} = c,

as desired.

Proposition 2.69. Let a, c ∈ N. Then a | c ⇔ ord_p(a) ≤ ord_p(c) for all primes p.

Proof. If a | c then the result follows from the "only if" part of the previous proposition. Conversely, if ord_p(a) ≤ ord_p(c) for all primes p, then let p_1, . . . , p_ℓ be all the primes dividing a or c, and put

b = ∏_i p_i^{ord_{p_i}(c) − ord_{p_i}(a)}.

Then the "if" part of the previous proposition proves that ab = c.
Thus we can characterize Div(n) as the set of numbers a so that for all primes p, ord_p(a) ≤ ord_p(n). For every p, there are (ord_p(n) + 1) choices for ord_p(a), and it follows that there are exactly ∏_p (ord_p(n) + 1) divisors of n.
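The divisor-count formula ∏_p (ord_p(n) + 1) can be checked against brute force; a sketch with our own helper names:

```python
def num_divisors_brute(n):
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def num_divisors_via_ords(n):
    """Multiply (ord_p(n) + 1) over the primes p dividing n."""
    count, p = 1, 2
    while n > 1:
        e = 0
        while n % p == 0:  # e becomes ord_p(n)
            n //= p
            e += 1
        count *= e + 1
        p += 1
    return count

# 108 = 2^2 * 3^3 should have (2 + 1)(3 + 1) = 12 divisors.
print(num_divisors_brute(108), num_divisors_via_ords(108))  # 12 12
```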
Proposition 2.70. Let m, n ∈ N. Then there exists an r ∈ N with r^m = n iff for all prime numbers p, we have m | ord_p(n).

Proof. This is left to the reader.

Proposition 2.71. Let m, n ∈ N, and suppose there does not exist an integer r ∈ N with r^m = n. Then there do not exist integers a, b ∈ N with a^m = n b^m.

Proof. In view of Proposition 2.70, the hypothesis implies that there is a prime number p so that m ∤ ord_p(n). Applying ord_p to both sides of a^m = n b^m gives

m ord_p(a) = m ord_p(b) + ord_p(n).

Thus m does divide ord_p(n), a contradiction.
Remark: This is really a proof that n^{1/m} ∈ Q ⇒ n^{1/m} ∈ Z. It's related to the application of the rational root test to the polynomial x^m − n.
Proposition 2.72. Let a, b ∈ N. If d = gcd(a, b), then ord_p(d) = min(ord_p(a), ord_p(b)) for all primes p. If µ = lcm(a, b), then ord_p(µ) = max(ord_p(a), ord_p(b)) for all primes p.

Proof. Let p be a prime, and suppose i = ord_p(a) ≤ ord_p(b). Then p^i | a and p^i | b, so p^i ∈ Div(a, b), which is equal to Div(d) by Proposition 2.46. Therefore p^i | d. However, p^{i+1} ∤ a, so certainly p^{i+1} ∤ d. It follows that ord_p(d) = i = min(ord_p(a), ord_p(b)) in this case. Obviously if ord_p(b) ≤ ord_p(a) a similar argument holds.

The statement about the least common multiple is an exercise.

We now have a straightforward way to compute the least common multiple of two numbers. For instance let a = 75 and b = 21. We factor to get a = 3 · 5² and b = 3 · 7. The only nonzero ords are for p = 3, 5, 7. If µ = lcm(a, b) then we must have ord_3(µ) = 1, ord_5(µ) = 2, and ord_7(µ) = 1. This determines µ = 3 · 5² · 7.
Here is another nice application of ords:

Proposition 2.73. Let a, b ∈ N. Then gcd(a, b) · lcm(a, b) = ab.

Proof. Given the above discussion, this reduces to proving that min(x, y) + max(x, y) = x + y for all x, y ∈ N. This is obvious from the proper point of view, or the skeptical reader may consider the trichotomy of x and y.

This gives a way to compute lcm(a, b) without having to factor a and b. For example, if a = 2000002 and b = 2000004 then the Euclidean Algorithm gives that gcd(a, b) = 2 and therefore lcm(a, b) = ab ÷ 2 = 2000006000004.
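This is exactly how lcm is usually computed in code; a sketch pairing the Euclidean Algorithm with the identity gcd(a, b) · lcm(a, b) = ab:

```python
def gcd(a, b):
    while b:                 # Euclidean Algorithm: gcd(a, b) = gcd(b, a mod b)
        a, b = b, a % b
    return a

def lcm(a, b):
    return a * b // gcd(a, b)

# The example from the text:
print(gcd(2000002, 2000004))  # 2
print(lcm(2000002, 2000004))  # 2000006000004
```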
9.7. Exercises

(1) Prove the second half of Proposition 2.72, and Proposition 2.70.
(2) Let m > 1 and a, b ∈ N. Prove that ord_m(ab) ≥ ord_m(a) + ord_m(b).
(3) There is a formula relating ord_2(n!) and S_n, where S_n is the sum of the binary digits of n. Can you find it? Can you prove it?
(4) Recall Euler's totient function φ from Exercise 5 from Section 9.3. Explain why if p is prime, then φ(p) = p − 1. Find a formula for φ(p²). Find a formula for φ(p^k), with k ∈ N. If q ≠ p is another prime, prove that φ(pq) = φ(p)φ(q).
(5) Consider the sequence 41, 43, 47, 53, . . . obtained by beginning with the number 41 and successively adding all positive even integers 2, 4, 6, . . .. Are all the numbers in this list prime? Give a proof or a counterexample. Also answer the same question starting with 11 or 17.
(6) Use Problem 4 in Section 8.4 to prove that the Fermat numbers are pairwise coprime. Why does this imply that there are infinitely many prime numbers?
(7) Prove that 2^n − 1 is composite when n is composite. If n is prime, is 2^n − 1 necessarily prime?
(8) For n ∈ N, write d(n) for the number of divisors of n, that is, d(n) = |Div(n)|. For m, n ∈ N, prove that d(mn) ≤ d(m) · d(n). When is this an equality?
(9) Prove that gcd(a², b²) = gcd(a, b)² for a, b ∈ N. Is this true for other powers?
(10) Prove that gcd(a, bc) ≤ gcd(a, b) · gcd(a, c).
(11) For which a, b, c is gcd(a, b, c) · lcm(a, b, c) = abc?
(12) If a | x and b | y prove that gcd(a, b) | gcd(x, y).
(13) Suppose that a | bc. Prove that there are numbers a_1, a_2 ∈ N so that a = a_1 a_2, a_1 | b, and a_2 | c. (Use factorization and Exercise ?? in Section ??.)
(14) Let m, n be relatively prime with m ≤ n. If mn is even, then

gcd(n² − m², n² + m²) = 1 = gcd(2mn, n² + m²).

If mn is odd, then

gcd((n² − m²)/2, (n² + m²)/2) = 1 = gcd(mn, (n² + m²)/2).

(15) Prove that log_2 3 is irrational. (If it were rational and positive, then 2 raised to some power would be equal to 3 raised to some nonzero power.)
(16) Prove that log_18 12 is irrational. Let p ≠ q be primes and m, n, x, y integers. Under what conditions is log_{p^m q^n}(p^x q^y) irrational?

10. Chapter 2 Wrap-up

10.1. Rubric for Chapter 2
In this chapter you should have learned

• the Peano theory; how properties of arithmetic derive from just a few axioms
• how to work with different kinds of definitions, for example, inductive definitions, the definition of a − b, and the definition of gcd(a, b)
• strong induction
• the role of the division algorithm and Euclidean Algorithm
• arithmetic in other bases
• the ord "coordinates" of numbers

10.2. Toughies for Chapter 2
(1) (Uniqueness of the Natural Numbers) Suppose M is a set with an element µ, and a "successor" function m → m# for m ∈ M satisfying the analogue of the Peano Axioms. That is to say,
(INF) m# ≠ µ for all m ∈ M,
(INJ) If m_1# = m_2#, then m_1 = m_2, and
(IND) If S ⊆ M is a subset satisfying µ ∈ S and m# ∈ S whenever m ∈ S, then S = M.
Define a bijection f : N → M so that for all n ∈ N, f(n′) = f(n)#. [Suggestion: Define your function "inductively".] Be sure to prove that your function is bijective. At what points do you use the axioms (INF), (INJ), (IND) for N and M? (You need all six.)
(2) For which a, b is it true that a^b = b^a? Let's see a proof!
(3) Let a, b ∈ N. Recall the definition of the statement "a ⊥ b"; this means there is a natural number n ∈ N with a^n = b. If a > 1 write log_a b = n in this situation. Is log_a b uniquely defined? Prove that if a, b, c ∈ N with a > 1, a ⊥ c and c ⊥ b, then

(log_a b) ÷ (log_a c) = log_c b.

(4) Prove that the numbers q, r, e in Problem ?? in Section ?? are uniquely determined.
(5) Let n ∈ N. Prove that in base X arithmetic there is a multiple of n which is written as a string of '1's followed by a string of '0's. For example 11100 is a multiple of VI.
(6) Recall that h_n denotes the nth Hemachandra number. Let a, b ∈ N. Prove that gcd(h_a, h_b) = h_{gcd(a,b)}.
(7) Prove that given a number N one can find N consecutive numbers, each having a prime factor other than 2 or 3. Generalize this to any finite set of primes.
(8) There is a formula relating ord_p(n!) and S_n, where S_n is the sum of the base p digits of n. Can you find it? Can you prove it?
(9) Prove that ord_p of the binomial coefficient (n choose k) is the number of carries that occur in the base p addition of k and n − k.
(10) Let m, n ∈ N. Put urd_m(n) = min{ i ∈ N ; n | m^i }, if this set is nonempty, and let urd_m(n) = ∞ otherwise. For example, urd_6(4) = 2 since 4 | 6² but 4 ∤ 6¹, and urd_4(6) = ∞ since 6 ∤ 4^i for any i. Find and prove some interesting properties of urd.
(11) Find two sequences of base-ten digits a_1, a_2, . . . and b_1, b_2, . . ., with a_1 = 2 and b_1 = 5, so that for any natural number n, the product of the two numbers (written in decimal)

a_n a_{n−1} · · · a_2 a_1 × b_n b_{n−1} · · · b_2 b_1

ends with at least n zeros. For example, if your sequences started with a_1 = 2, a_2 = 1, b_1 = 5, and b_2 = 2, then they would check out for n = 1 and n = 2, because 2 × 5 = 10 ends with a zero, and 12 × 25 = 300 ends with two zeros. Also, are there analogues in any base, not just 10?
CHAPTER 3
Functions and Relations
1. Relations

1.1. Introduction: Multivariable Propositional Logic

So far we have seen propositions P which do not depend on a variable, and propositions P(x), where x ranges over some set X. We also might like to make propositions P(x, y), where x ranges over some given set X, and y ranges over some other set Y. For example, consider the statement, "She has enough money to buy it." Here she and it are variables. We may write the statement as P(she, it), where she ranges over the set of females, and it ranges over the set of commodities (things you can buy). Its truth value depends on the pair (she, it). Generally, P(x, y) will be true for certain pairs (x, y), and false for the rest.
Definition. If X and Y are sets, write X × Y for the set of (ordered) pairs (x, y), with x ∈ X and y ∈ Y. The pairs (x_1, y_1), (x_2, y_2) ∈ X × Y are equal provided that (x_1 = x_2) ∧ (y_1 = y_2).

For example, if X = {1, 2, 3} and Y = {A, B}, then

(1)  X × Y = { (1, A), (1, B), (2, A), (2, B), (3, A), (3, B) }.

If X and Y are finite, our convention is to display the product set as an array of pairs, with the rows corresponding to elements of X and the columns corresponding to elements of Y. If X = Y = R, then X × Y = R² is the familiar Cartesian plane consisting of pairs (x, y) of real numbers. If X = R² and Y = R, then X × Y is, strictly speaking, the set of pairs (v, y), where v = (x_1, x_2) is a pair of real numbers, and y is a real number. Naturally we view (v, y) as the triple (x_1, x_2, y) and thus identify X × Y with three-dimensional space R³.
Let A, A′ ⊆ X and B, B′ ⊆ Y. If A × B = A′ × B′ (as subsets of X × Y), is it necessarily true that A = A′ and B = B′? Not quite, because if, say, A = ∅, then A × B = ∅ (right?), and so the information of B gets lost. But this is essentially the only counterexample, by the following lemma. I hope you appreciate our rigor.

Lemma 3.1. Let A, A′ ⊆ X and B, B′ ⊆ Y with A, B nonempty. If A × B = A′ × B′ as subsets of X × Y, then A = A′ and B = B′.

Proof. Let a ∈ A and b ∈ B. Then (a, b) ∈ A × B = A′ × B′, so a ∈ A′ and b ∈ B′. Since this is true for all such a, b, we have A ⊆ A′ and B ⊆ B′. Reversing the argument gives the opposite inclusion.
For our bivariable statement P(x, y), its truth set is given by

{ (x, y) ∈ X × Y | P(x, y) is true }.

Any such statement is equivalent, then, to a relation in the following sense:

Definition. Let X, Y be sets. A subset R ⊆ X × Y is called a relation from X to Y. Write Rel(X, Y) for the set of relations from X to Y. If X = Y, then a relation from X to X is simply called a relation on X. Write Rel(X) in this case for Rel(X, X).

Note that Rel(X, Y) = ℘(X × Y).

This subset could be anything. For instance, if again X = {1, 2, 3} and Y = {A, B}, then a relation could be given by

(2)  R = { (1, B), (2, A), (3, A), (3, B) }.

A more efficient notation is to simply denote this by the "asterisk-matrix"

A_R =
  0 ∗
  ∗ 0
  ∗ ∗ .

The asterisk-matrix has the same number of rows and columns as the table corresponding to X × Y, and we put an asterisk in the (x, y)-entry if (x, y) ∈ R and put a 0 there if (x, y) ∉ R.
Of course this depends on the way we order the sets X and Y. For instance, if we had ordered the columns as B, A instead of A, B, then we would need to accordingly switch the columns of A_R to fit this convention.

Among other things, asterisk-matrix notation gives us a nice way to catalogue finite relations on finite sets. There are 16 relations on the set X = {a, b}. Here they are represented as asterisk-matrices, with the first row and column corresponding to a and the second row and column corresponding to b; they are the sixteen ways of filling the four entries with 0's and ∗'s:

  0 0   ∗ 0   0 ∗   0 0   0 0   ∗ ∗   ∗ 0   ∗ 0
  0 0 , 0 0 , 0 0 , ∗ 0 , 0 ∗ , 0 0 , ∗ 0 , 0 ∗ ,

  0 ∗   0 ∗   0 0   ∗ ∗   ∗ ∗   ∗ 0   0 ∗   ∗ ∗
  ∗ 0 , 0 ∗ , ∗ ∗ , ∗ 0 , 0 ∗ , ∗ ∗ , ∗ ∗ , ∗ ∗ .

We have names for some of these matrices. For instance,

  A_∅ =         I_X =         A_{X×X} =       E_{a,b} =
    0 0           ∗ 0           ∗ ∗             0 ∗
    0 0 ,         0 ∗ ,         ∗ ∗ ,           0 0 .

If X = Y = R, then any figure in the plane is a relation on R. Let us give names to some pleasant relations on R, so we can use them later.
(1) Write S¹ for the circle { (x, y) ∈ R² | x² + y² = 1 }.
(2) Write D² for the disc { (x, y) ∈ R² | x² + y² ≤ 1 }.
(3) Given m ∈ R, write ℓ_m for the line { (x, mx) | x ∈ R } through the origin, with slope m.
(4) Write ℓ_∞ for the y-axis.

Definition. Let R be a relation from X to Y. For x ∈ X and y ∈ Y we write "xRy" for the statement "(x, y) ∈ R".

One always has the "empty relation" R_∅ = ∅, and the "total relation" R = X × Y. In the former case xRy is always false, in the latter case xRy is always true. If x_0 ∈ X and y_0 ∈ Y, write δ_{x_0, y_0} for the relation from X to Y consisting of the single pair (x_0, y_0). These are called "Dirac-delta" relations. In this case xRy ⇔ (x = x_0) ∧ (y = y_0).
1.2. Reflexivity

The diagonal is an important relation.

Definition. The relation ∆_X = { (x, y) ∈ X × X | x = y } is called the diagonal of X.

We have a ∆_X b ⇔ a = b. This can be thought of as the graph of the function "y = x". Of course this is equal to ℓ_1 when X = R.

If X is a finite set with say three elements, then we would naturally represent ∆_X with the diagonal asterisk-matrix, which we write as I_X:

I_X =
  ∗ 0 0
  0 ∗ 0
  0 0 ∗ .

Definition. A relation R on X is called reflexive provided that ∆_X ⊆ R.

This simply means that xRx is true for all x.

Example: The relation x | y on N is reflexive, but the relation x < y on N is not.

Example: Let L be the set of lines in the plane. Then parallelism is a reflexive relation on L, but orthogonality is not.
1.3. Transposes of Relations

Not all relationships are symmetric. The relation R: "Person x is taller than person y." is certainly not. The transpose of a relation is what you get when you switch the roles of x and y. So, the transpose of R is the relation R^T: "Person y is taller than person x."

Definition. Let X, Y be sets. If R is a relation from X to Y, then its transpose R^T is the relation from Y to X defined via

R^T = { (y, x) ∈ Y × X | (x, y) ∈ R }.

Say that a relation R on X is symmetric provided that R^T = R.

Example: Let X = N. The transpose of "x < y" is "x > y". The transpose of x = y is x = y.

Example: The transpose of the relation (2) is

R^T = { (A, 2), (A, 3), (B, 1), (B, 3) },

whose asterisk-matrix is simply

A_{R^T} =
  0 ∗ ∗
  ∗ 0 ∗ .

It is generally true that the asterisk-matrix corresponding to the transpose relation is the transpose of the asterisk-matrix corresponding to the original relation. In other words, A_{R^T} = (A_R)^T.

Example: Take X = R. To find the transpose, one reflects through the line x = y. This has the effect of switching the coordinate (x, y) to (y, x). Note that S¹ and D² are symmetric. The transpose of the x-axis ℓ_0 is the y-axis ℓ_∞. What is the transpose of ℓ_m with m ≠ 0? (Pssst: If R is the graph of a function f, and f is invertible, then R^T is the graph of the inverse of f!) The transpose of the second quadrant is the fourth quadrant. The first and third quadrants are symmetric.

Proposition 3.2. The following hold for relations R from X to Y:
• ∅^T = ∅.
• (X × Y)^T = Y × X.
• (R^T)^T = R.
• For x ∈ X and y ∈ Y, we have δ_{x,y}^T = δ_{y,x}.

Can you prove each of these statements?
1.4. Exercises

(1) Let X be a set with n elements. How many relations R on X are symmetric? How many are reflexive? How many satisfy R^T ∩ R = ∅?
2. Composition of Relations

Definition. Let X, Y, and Z be three sets, R ∈ Rel(X, Y) and S ∈ Rel(Y, Z). The composition S ∘ R ∈ Rel(X, Z) is the relation defined as

S ∘ R = { (x, z) ∈ X × Z | ∃y ∈ Y so that ((x, y) ∈ R) ∧ ((y, z) ∈ S) }.

Thus x (S ∘ R) z ⇔ ∃y so that xRy ∧ ySz.

We say that a relation R on a set X is transitive provided that R ∘ R ⊆ R. To say that a relation R on a set X is transitive is the same as to say that whenever xRy and yRz are true for x, y, z ∈ X, then we also have that xRz is true.

Example: Let X = {1, 2, 3}, Y = {A, B}, and Z = {γ, δ, ε}. Let

R = { (1, B), (2, A), (3, A), (3, B) }

and

S = { (A, γ), (A, ε), (B, δ) }.

Please check that

S ∘ R = { (1, δ), (2, γ), (2, ε), (3, γ), (3, δ), (3, ε) }.

Example: Let X = Y = Z = N, and let R be the relation "<". When is (x, z) ∈ R ∘ R? Exactly when there is a y ∈ N so that x < y < z. Clearly, this is possible, for integers, exactly when z − x ≥ 2. Thus x (R ∘ R) z ⇔ x + 2 ≤ z.

Example: ("Squaring the Circle") Let R = S¹, the graph of the unit circle in R². Then

R ∘ R = { (x, z) ∈ R² | ∃y ∈ R so that x² + y² = 1 = y² + z² }.

Let us try to understand this relation well enough to sketch it. Certainly it implies that x² = z², or that x = ±z. But if you have such a pair (x, z), there still may not exist such a y, because you also need to solve x² + y² = 1. This is possible iff −1 ≤ x ≤ 1. So our relation is:

R ∘ R = { (x, z) ∈ R² | (x² = z²) ∧ −1 ≤ x ≤ 1 }

which is the graph of the two diagonals of the square [−1, 1] × [−1, 1].
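Finite relations, stored as sets of pairs, compose in one line of code. This sketch (our own naming, with the Greek letters written out as strings) recomputes the first example above:

```python
def compose(S, R):
    """S o R = {(x, z) : there is a y with (x, y) in R and (y, z) in S}."""
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

R = {(1, 'B'), (2, 'A'), (3, 'A'), (3, 'B')}
S = {('A', 'gamma'), ('A', 'epsilon'), ('B', 'delta')}

print(sorted(compose(S, R)))
# [(1, 'delta'), (2, 'epsilon'), (2, 'gamma'), (3, 'delta'), (3, 'epsilon'), (3, 'gamma')]
```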
2.1. Asterisk-Matrix Composition

The calculation of S ∘ R in the previous section can be reduced to the "asterisk-matrix multiplication" of

A_R =
  0 ∗
  ∗ 0
  ∗ ∗

and

A_S =
  ∗ 0 ∗
  0 ∗ 0 .

For this I ought to tell you how to add and multiply 0's and ∗'s. The idea is to treat ∗ as "unknown". The sum of an unknown and anything is an unknown, but the product of an unknown and 0 is still 0. This leads to the following addition/multiplication tables:

  + | 0 ∗        · | 0 ∗
  0 | 0 ∗        0 | 0 0
  ∗ | ∗ ∗        ∗ | 0 ∗

Using these rules you can define matrix multiplication using the usual dot products. For instance, in the above example, the product of A_R and A_S is given by

  0 ∗                     0 ∗ 0
  ∗ 0   ×   ∗ 0 ∗    =    ∗ 0 ∗
  ∗ ∗       0 ∗ 0         ∗ ∗ ∗ .

Note that this is precisely equal to A_{S∘R}, and therefore in this case,

A_{S∘R} = A_R · A_S.

Here the multiplication on the right hand side is asterisk-matrix multiplication. In fact, this is true in general. We omit the proof, but isn't it curious that the order of multiplication changed?
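Encoding 0 as False and ∗ as True turns the addition table into `or` and the multiplication table into `and`, so asterisk-matrix multiplication is ordinary matrix multiplication over this two-element arithmetic. A sketch:

```python
def star_multiply(A, B):
    """Multiply asterisk-matrices encoded with False for 0 and True for *."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[any(A[i][k] and B[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

A_R = [[False, True],        # rows 1, 2, 3; columns A, B
       [True, False],
       [True, True]]
A_S = [[True, False, True],  # rows A, B; columns gamma, delta, epsilon
       [False, True, False]]

for row in star_multiply(A_R, A_S):
    print(''.join('*' if entry else '0' for entry in row))
# 0*0
# *0*
# ***
```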
∈
Rel(X, Y ), H
of Composition) Let X, Y,Z,W be sets. Let K Rel(Y, Z ), and G Rel(Z, W ). Then
Proof. The
∈
∈
(G H ) K = G (H K ).
◦
◦
◦ ◦
left hand side is equal to
{(x, w) ∈ X × W | ∃y ∈ Y so that ((y, w) ∈ G ◦ H ) ∧ ((x, y) ∈ K )}. Using the definition of (y, w) ∈ G ◦ H , this breaks further into {(x, w) ∈ X ×W | ∃y ∈ Y, ∃z ∈ Z so that ((y, z) ∈ H )∧((z, w) ∈ G)∧((x, y) ∈ K )}. But this is the same as
{(x, w) ∈ X × W | ∃z ∈ Z so that ((z, w) ∈ G) ∧ ((x, z) ∈ H ◦ K )}, which is now equal to G ◦ (H ◦ K ). Thus these two sets are equal. Remark: The above proof used the “associativity” of ; that is that (P Q) R
P
∧
∧ (Q ∧ R) for statements P,Q,R. Do you see where?
∧ ∧ ≡
Proposition 3.4. (Composition and Transposes) Let X, Y, Z be sets. Let H ∈ Rel(X, Y) and G ∈ Rel(Y, Z). Then

(G ∘ H)^T = H^T ∘ G^T.

Proof. Note that G ∘ H ⊆ X × Z and so (G ∘ H)^T ⊆ Z × X. The left hand side of the equation in the proposition is equal to

{ (z, x) ∈ Z × X | (x, z) ∈ G ∘ H }.

By the definition of G ∘ H, this is equal to

{ (z, x) ∈ Z × X | ∃y ∈ Y so that ((x, y) ∈ H) ∧ ((y, z) ∈ G) },

which is equal to

{ (z, x) ∈ Z × X | ∃y ∈ Y so that ((y, x) ∈ H^T) ∧ ((z, y) ∈ G^T) }.

Let us rewrite this as

{ (z, x) ∈ Z × X | ∃y ∈ Y so that ((z, y) ∈ G^T) ∧ ((y, x) ∈ H^T) }.

This set is now equal to H^T ∘ G^T.

Remark: The above proof used the "commutativity" of ∧; that is, that P ∧ Q ≡ Q ∧ P for statements P, Q. Did you see where?
2.3. Graphical Views of a Relation

Suppose you have a graph G consisting of vertices V and edges E between them. Every edge e starts at some "initial" vertex v_1 and ends at some "terminal" vertex v_2. In graph theory, it doesn't really matter if the edges are straight lines or not, and it is fine to move the vertices and edges around, as long as the same vertices are connected with the same edges. When we use computers to study graphs, the essential information we upload is our vertex set, maybe V = {a, b, c}, and our edge set. To describe an edge to a computer, we only need to say where it begins and ends. So e above would be uploaded as e = (v_1, v_2) ∈ V × V. Thus, the edge set E ⊆ V × V is simply a relation on V.

For example, consider the following graph on the vertex set V = {a, b, c}:

[There should be a nice picture here. Can you reconstruct it? The edges must all have arrows.]

This corresponds to the relation

E = { (a, c), (b, a), (b, b), (c, a) }.

Let us note some phenomena above. The edge from b to b is called a "loop"; generally a loop is an edge that begins and ends at the same vertex. With our earlier terminology, the set of loops is exactly E ∩ ∆_V. Thus E will be reflexive if there is a loop at every vertex.
Vertices a and c have two edges between them, but they are not considered the same edge because they are going in different directions. When we have two vertices joined in both directions by two edges, we should replace the two edges with a "simple" edge with no arrows:

[Picture where we replaced the two edges with arrows from a to c with a single edge without an arrow.]

The relation E is symmetric iff all the edges are now "simple".

[I'd like to describe what I call the "bipartite view" of a relation between two sets. This is where you draw two horizontal ovals with some dots in between them, and draw arrows from the dots on the left to the dots on the right. For instance when I spoke in class about functions I drew several of these.]
2.4. Exercises
(1) There are three relations on X = a, b which are not transitive. Find them. (2) Find a relation R on X = a, b which is transitive, but for which R R = R. (3) Let X be a set, and R a relation on X . Prove that if R is reflexive and transitive, then R R = R. (4) There are five equivalence relations on the set X = a,b,c of three distinct elements. Find them, and write them as “asterisk-matrices”. (5) Let P1 be the set of lines in R2 passing through the origin. On P1 , consider the relation 1 R2 provided that 1 and 2 are orthogonal. Compute R R. (6) Let P2 be the set of lines in R3 passing through the origin. On P2 , consider the relation 1 R2 provided that 1 and 2 are orthogonal. Compute R R. (7) Compose the relations m , ∞ , S 1 , D2 from the first section with each other. Can you find examples of relations which do not commute here? Also compose these relations on both sides with the “total relation” R 2 . (8) Consider the relation R R2 given by the square pictured below. The square has vertices (0, 0), (1, 0), (0, 1), and (1, 1), and it is the union of four closed intervals in the obvious way. Describe the relation R R. y
{ }
{ }
◦
◦
{
}
◦ ◦
⊆
◦
(1, 1)
x (9) If R ⊆ R and S ⊆ S , then R ◦ S ⊆ R ◦ S .
⊆ × Z and R ⊆ X × Y be relations. Suppose that S = S 1 ∪ S 2.
(10) Let S Y Prove that
◦
◦ ∪ (S 2 ◦ R). (11) Let R ⊆ X × Y be a relation. Prove that ∆ Y ◦ R = R and R ◦ ∆X = R. S R = (S 1 R)
88
3. FUNCTIONS AND RELATIONS
(12) What happens when you compose the elementary relations δx,y with other relations? (Think about both δx,y ◦ R and R ◦ δx,y.) Suggestion: Experiment with the relations on R².
(13) Let X, Y be sets, x, x′ ∈ X, and y, y′ ∈ Y. Give a formula for δx,y ◦ δx′,y′.
(14) Square all 16 "asterisk matrices" in the previous section. Make a list of the 2 × 2 matrices which are "squares". I think it is an interesting project to determine which n × n "asterisk matrices" are squares. This, of course, is equivalent to determining which relations on a set with n elements are squares.
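These exercises are pleasant to experiment with on a computer. Here is a small sketch (my own illustration, not part of the text) that stores a relation on a finite set as a boolean "asterisk-matrix" — entry [i][j] is True iff the pair (xi, xj) is in the relation — and composes relations by boolean matrix multiplication. The names compose and is_transitive are mine.

```python
# Relations on a finite set as boolean matrices; composition is boolean
# matrix multiplication.

def compose(S, R):
    """(S o R)[i][j] is True iff some k has (x_i, x_k) in R and (x_k, x_j) in S."""
    n = len(R)
    return [[any(R[i][k] and S[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_transitive(R):
    """R is transitive iff R o R is contained in R."""
    RR = compose(R, R)
    n = len(R)
    return all(not RR[i][j] or R[i][j] for i in range(n) for j in range(n))

# On X = {a, b}: the relation {(a, b), (b, a)} is not transitive, since
# composing it with itself produces the pairs (a, a) and (b, b).
R = [[False, True],
     [True, False]]
```

With this encoding, exercise (1) amounts to testing is_transitive on all 16 matrices, and exercise (14) to squaring them.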
3. Functions
The modern definition of a "function" was not enunciated until the middle of the 19th century, a considerable time after the advent of calculus. Mathematicians before this dealt mainly with expressions, usually power series or rational functions (quotients of polynomials). Words like "singularity" were used for a point not in the domain of a function. If you were an expert mathematician, you knew what you were doing. But without the modern notion, it is difficult to understand things like inversion. For instance, the inverse trig functions are difficult to grasp without the proper vocabulary of domain and codomain.
Definition. Let X, Y be sets. A function "f : X → Y" is a rule which assigns to every element x ∈ X a unique element y ∈ Y. We write "f(x) = y" to indicate this rule. The set X is called the domain of f, and the set Y is called the codomain of f.

The words "function", "transformation", and "map" are all synonyms. Some use the word "target" as a synonym for "codomain". The word "range" is used inconsistently, sometimes meaning "codomain" and sometimes meaning "image" (see below). It is best avoided.

For example, the function f : R → R given by f(x) = x² has domain R and also codomain R.
You may rightfully argue that in the (standard) definition above, I have used an undefined concept, "rule". This is not insurmountable; I will later indicate how one can alternately define a function in terms of a certain kind of relation. (One that "passes the vertical line test".) It is essential, however, that the function is "well-defined", or "well-formed".

Example: The expression

f(x) = { x² + x if x < 1;  5 if x > −1 }

does not define a function f : R → R. The problem is that it defines, for example, f(0) in two different ways: both as 0 and as 5! If a putative function (or rule) gives more than one answer it is called "not well-defined". Actually, to say that a function is well-defined is redundant, but we say so anyway to emphasize this point.
Example: Writing f(x) = 1/(2 − x) does not define a function f : R → R; it is obviously not defined at 2. You can't later say f(2) = ∞, unless you change the codomain to include ∞. (Or ±∞. Let us not talk about that.)

Example: Expecting f(z) = √z to define a function f : C → C is (very) bad. What should √−1 be?

3.1. Injectivity
The following is one of the most important definitions in mathematics:

Definition. Let f : X → Y be a function. We say that f is injective provided that:

• ∀ x1, x2 ∈ X, we have (f(x1) = f(x2)) ⇒ (x1 = x2).

Using the contrapositive, we can equivalently say that f is injective iff

• ∀ x1, x2 ∈ X, we have (x1 ≠ x2) ⇒ (f(x1) ≠ f(x2)).

The word "one-to-one" is often used as a synonym for "injective"; however, it is easily confused with the phrase "one-to-one correspondence" (see below), and so I shan't use it. The noun form of this concept is "injection". A function is an injection provided that it is injective.

Suppose you are grading a student's work, in which he is trying to demonstrate that 2 − √2 = √2. It reads:
To show: 2 − √2 = √2.

    2 − √2 = √2
    1 − √2 = √2 − 1
    (1 − √2)² = (√2 − 1)²
    1 − 2√2 + 2 = 2 − 2√2 + 1
    3 − 2√2 = 3 − 2√2

Hence, proved.
Where did the student go wrong? First of all, he is not really explaining his logic with sentences. "Hence, proved." is not useful. So we have to guess his thought process. The first step is evidently subtracting 1 from both sides of the equation, and the second step is squaring both sides of the equation. The implied logic seems to be: "I wanted to prove two things are equal; I applied some operations to both of them and they became equal. Therefore the original two things must be equal." This only works if the operations are injective, and that is exactly where the putative proof breaks down: squaring is not injective on R. So, injectivity is an important notion to bear in mind throughout mathematics.

Is f(x) = x² injective? Hold on; the question of injectivity really depends on the domain of the function. If we mean f : R → R it is certainly not injective, since f(1) = f(−1) = 1, but 1 ≠ −1. On the other hand, if we mean f : [0, ∞) → R then it is injective. The domain is important.
If you look at the graph of a real-valued function, with domain some subset of R, the function is injective iff it satisfies the "horizontal line test": every horizontal line intersects the graph at most once. Can you explain why?
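For a finite domain one can test injectivity mechanically, straight from the definition. The following sketch is my own stand-in for the text's infinite-domain examples; the function is_injective is an assumed name.

```python
# f is injective on a (finite) domain iff no two distinct points share a value.

def is_injective(f, domain):
    seen = {}
    for x in domain:
        y = f(x)
        if y in seen and seen[y] != x:
            return False  # two different inputs hit the same output
        seen[y] = x
    return True

square = lambda x: x * x
# On a domain containing both 1 and -1, squaring is not injective...
on_symmetric = is_injective(square, range(-5, 6))
# ...but restricted to nonnegative inputs it is.
on_nonneg = is_injective(square, range(0, 6))
```

The two calls mirror the point just made: the same formula x² is or is not injective depending on the domain.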
Suppose that f : I → R, where I is an interval. We know from calculus that if f′(x) > 0 for all x ∈ I, then f is a strictly increasing function. If you don't know what that means:

Definition. Let f : I → R. Then f is strictly increasing provided that ∀ x1, x2 ∈ I, we have (x1 < x2) ⇒ (f(x1) < f(x2)).

Lemma 3.5. If f : I → R is a strictly increasing function, then f is injective.

Proof. Let x ≠ y be in I. We may assume that x < y. Then f(x) < f(y), which implies that f(x) ≠ f(y). Therefore f is injective. □

Similarly, if f′(x) < 0 for all x ∈ I, then f is strictly decreasing, which also implies that it is injective.
Corollary 3.6. If f : I → R has f′(x) > 0 for all x ∈ I, then f is injective. If f′(x) < 0 for all x ∈ I, then f is also injective.

Caution: The domain needs to be an interval (i.e. connected). The function f(x) = tan x satisfies f′(x) > 0 for all x in its natural domain, but an application of the horizontal line test to the graph (which you should know by heart) shows that it is certainly not injective.

3.2. Surjectivity
A function need not take on all the values in its codomain. Sometimes this set of values is hard to calculate, as with f : R → R given by f(x) = x⁴ − 7x + 1.

Definition. Let f : X → Y. The image of f is the set

im f = {y ∈ Y | ∃ x ∈ X so that f(x) = y}.

For example, the function f : R → R given by f(x) = x² has im f = [0, ∞).
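On a finite domain the image can be computed exactly as the definition reads: collect the value f(x) for every x. This is my own finite stand-in for the example f(x) = x²; the name image is an assumption.

```python
# im f = { f(x) : x in the domain }

def image(f, domain):
    return {f(x) for x in domain}

# Squaring on {-3, ..., 3}: every value is >= 0, a finite echo of im f = [0, oo).
im = image(lambda x: x * x, range(-3, 4))
```

Note that the image is usually much smaller than the codomain, which is the whole point of the definition of surjectivity below.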
Let me state a powerful theorem from calculus, which combines the Extreme Value Theorem and the Intermediate Value Theorem:

Theorem 3.7. Let a < b in R and f : [a, b] → R a continuous function. Put m = min{f(x) | x ∈ [a, b]} and M = max{f(x) | x ∈ [a, b]}. Then the image of f is equal to [m, M].

(The existence of the min and max is from the Extreme Value Theorem.)
Next we have an example from multivariable calculus.

Example: Let f : R → R² be the function f(t) = (cos(t), sin(t)). The image of f is the unit circle in R². If we use the same formula for f, but view it as a function f : [0, π/2) → R², then the image of f is the part of the unit circle in the first quadrant, with one endpoint closed and the other open.
Definition. Let f : X → Y. We say that f is surjective provided that Y = im f.

A synonym for "surjective" is "onto".

Example: If f : [a, b] → R is continuous then it is never surjective, by Theorem 3.7.
3.3. Bijectivity
Bijective functions are particularly important.

Definition. A function f : X → Y is bijective provided that it is both injective and surjective.

Example: The function f : [0, ∞) → [0, ∞) given by f(x) = x² is a bijection.

Example: Let us see that the function f : R² → R² defined by f(x, y) = (x, x + y) is a bijection. For injectivity, suppose that f(x1, y1) = f(x2, y2). Thus (x1, x1 + y1) = (x2, x2 + y2), which is the statement that (x1 = x2) ∧ (x1 + y1 = x2 + y2). It follows easily that (x1, y1) = (x2, y2), and so f is injective.

For surjectivity, let (a, b) ∈ R². We must determine whether there is an (x, y) so that f(x, y) = (a, b). Thus we must solve (x, x + y) = (a, b), which is easily done: put x = a and y = b − a.
Definition. Let X be a set. The function f : X → X given by f(x) = x for all x ∈ X is called the identity function on X, written f = idX.

It is obvious but important that idX is a bijection. Here is what calculus/analysis says about continuous bijections of intervals:
Theorem 3.8. Let a, b ∈ R, and f : [a, b] → R a continuous function. Then f is a bijection onto its image iff either f is strictly increasing on [a, b] or f is strictly decreasing on [a, b].
3.4. Composition of functions
A basic notion in mathematics is the composition of functions.
Definition. Let X, Y, Z be sets, and f : X → Y, g : Y → Z functions. The composition g ◦ f : X → Z is the function defined by

(g ◦ f)(x) = g(f(x)).

Example: Continuing notation from the following section, if B is a matrix with n rows and p columns, then LB ◦ LA = LBA, where BA is defined by matrix multiplication.
Composition is associative, since if h : Z → T is a third function, we have

((h ◦ g) ◦ f)(x) = h(g(f(x))) = (h ◦ (g ◦ f))(x).

Note that we also have f ◦ idX = f, and idY ◦ f = f.

Proposition 3.9. With notation as above,
(1) If f, g are injective, then so is g ◦ f.
(2) If f, g are surjective, then so is g ◦ f.
(3) If f, g are bijective, then so is g ◦ f.
(4) If g ◦ f is injective, then so is f.
(5) If g ◦ f is surjective, then so is g.
To appreciate some of this, consider the following example: Let X = Z = {a, b} and Y = {P, Q, R}. Let f be the map f(a) = P, f(b) = Q. Let g be the map g(P) = a, g(Q) = b, g(R) = b. Then g ◦ f = idX and is therefore a bijection, but f is not a surjection and g is not an injection.
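The example above can be written out explicitly, with the two maps stored as dictionaries (my own encoding; the variable names are assumptions):

```python
# The maps from the example: g o f = id on {a, b}, yet f is not surjective
# and g is not injective.

f = {'a': 'P', 'b': 'Q'}            # f : {a, b} -> {P, Q, R}
g = {'P': 'a', 'Q': 'b', 'R': 'b'}  # g : {P, Q, R} -> {a, b}

g_of_f = {x: g[f[x]] for x in f}                 # the composition g o f
is_identity = all(g_of_f[x] == x for x in f)

f_not_surjective = set(f.values()) != {'P', 'Q', 'R'}   # 'R' is never hit
g_not_injective = len(set(g.values())) < len(g)         # Q and R both go to b
```

This is why parts (4) and (5) of Proposition 3.9 only conclude injectivity of f and surjectivity of g, nothing more.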
Proof. For the first part, let x1, x2 ∈ X. If g(f(x1)) = g(f(x2)), then since g is injective we have f(x1) = f(x2), and since f is injective we deduce that x1 = x2. This shows that g ◦ f is injective.

For the second part, let z ∈ Z. Since g is surjective, there is a y ∈ Y so that g(y) = z. Since f is surjective, there is an x ∈ X so that f(x) = y. Then g(f(x)) = g(y) = z, which shows that g ◦ f is surjective.

The third part follows from the first and second parts, and the rest you should do yourself. □

3.5. Inverses of functions
Definition. Let f : X → Y. A function g : Y → X is inverse to f provided that (g ◦ f = idX) ∧ (f ◦ g = idY).

Merely one of the conditions, i.e. g ◦ f = idX, is not enough, by the example above.

Example: The function f : R² → R², f(x, y) = (x, x + y) has inverse g(x, y) = (x, y − x), since:

(g ◦ f)(x, y) = g(x, x + y) = (x, y),
(f ◦ g)(x, y) = f(x, y − x) = (x, y).
Proposition 3.10 (Uniqueness of Inverses). If g1, g2 : Y → X are both inverse to f : X → Y, then g1 = g2.

Proof. We have

g1 = idX ◦ g1 = (g2 ◦ f) ◦ g1 = g2 ◦ (f ◦ g1) = g2 ◦ idY = g2. □
Proposition 3.11. A function f : X → Y is bijective iff it has an inverse.

Proof. If f has an inverse g, then f is bijective by the last two parts of Proposition 3.9. Conversely, suppose that f is injective and surjective. Define g : Y → X by the rule:

g(y) = the unique x ∈ X so that f(x) = y.

Note that such an x exists because f is surjective, and this x is uniquely determined because f is injective. Now for all y ∈ Y,

f(g(y)) = f(x) = y,

and for all x ∈ X, with y = f(x), we have

g(f(x)) = g(y) = x.

(Think these equations through!) Therefore g is the inverse of f. □
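For a finite bijection stored as a dictionary, the proof's construction of g is literally "swap keys and values"; if two keys shared a value (f not injective), the swap would lose an entry. A sketch, with names of my choosing:

```python
# The rule g(y) = "the unique x with f(x) = y", for a finite f given as a dict.

def inverse(f):
    g = {y: x for x, y in f.items()}
    if len(g) != len(f):
        # two keys collapsed onto one value, so f was not injective
        raise ValueError("f is not injective, so it has no inverse")
    return g

f = {0: 'a', 1: 'b', 2: 'c'}   # a bijection from {0, 1, 2} to {a, b, c}
g = inverse(f)
round_trip = all(g[f[x]] == x for x in f) and all(f[g[y]] == y for y in g)
```

The round_trip check is exactly the pair of conditions g ◦ f = idX and f ◦ g = idY from the definition.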
Inverses depend very much on the domain and codomain of the function. For instance f1 : [0, ∞) → [0, ∞) given by f1(x) = x² has inverse g1 : [0, ∞) → [0, ∞) given by g1(x) = √x, but f2 : (−∞, 0] → [0, ∞) given by f2(x) = x² has inverse g2 : [0, ∞) → (−∞, 0] given by g2(x) = −√x.
This is especially prominent for the "inverse trig functions". For instance, the sine function naturally has domain R and codomain R, but it is not injective or surjective as such. One typically restricts the domain to [−π/2, π/2] and the codomain to [−1, 1] to obtain a bijection. Thus, one has an inverse arcsin : [−1, 1] → [−π/2, π/2]. Note that with this convention arcsin(y) doesn't take the value π, even though sin(π) = 0.

One could easily find other domains on which sin is injective, such as [π/2, 3π/2]. But the original one is more commonly taken, and numbers in this range are called the "principal value" of the arcsine function. A worse situation is trying to invert the cotangent function.

[picture]

Which domain should we take for cotangent? Different authorities make different choices.
Wikipedia, for instance, says that we should invert cotangent on (0, π) → R, but Mathematica says we should restrict cotangent to a function cot : (−π/2, 0) ∪ (0, π/2] → R. Both of these are bijections. What is arccot(−1), for instance? Wikipedia says it should be 3π/4, and Mathematica says it should be −π/4.

This can be confusing if you don't have a handle on the domain/codomain concept. Rather than trying to memorize conventions, try to understand what the logical issue is.

3.6. Sections and Retractions of functions
Definition. Let f : X → Y be a function. A function r : Y → X is a retraction of f provided that r ◦ f = idX. A function s : Y → X is a section of f provided that f ◦ s = idY.

Proposition 3.12. Let f : X → Y be a function. Then
(1) f is an injection iff there is a retraction of f.
(2) f is a surjection iff there is a section of f.

Theorem 3.13. Let X and Y be nonempty sets. Then there is an injection from X to Y iff there is a surjection from Y to X.
Proof. Suppose there is an injection f : X → Y. Let r : Y → X be a retraction of f, so that r ◦ f = idX. Then r is the required surjection.

Suppose there is a surjection f : Y → X. Let s : X → Y be a section of f, so that f ◦ s = idX. Then s is the required injection. □
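Here is a concrete sketch of the two notions, with maps of my own choosing (not the text's examples): an embedding of R into R² with a retraction, and a projection of R² onto R with a section.

```python
# A retraction "undoes" an injection; a section "chooses preimages" for a
# surjection.

def f(x):
    return (x, x * x)      # injective f : R -> R^2 (graph embedding)

def r(v):
    return v[0]            # a retraction of f: r(f(x)) = x

def p(v):
    return v[0]            # surjective p : R^2 -> R (first coordinate)

def s(a):
    return (a, 0.0)        # a section of p: p(s(a)) = a

retraction_ok = all(r(f(x)) == x for x in [-2.0, 0.0, 3.5])
section_ok = all(p(s(a)) == a for a in [-1.0, 0.0, 7.25])
```

Note that neither composition in the other order is the identity: f(r(v)) flattens v onto the parabola, and s(p(v)) flattens v onto the x-axis.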
3.7. Exercises
(1) The last two parts of Proposition 3.9.
(2) Write down all the 3 × 2 *-matrices corresponding to injective functions (from a set with 2 elements to a set with 3 elements).
(3) Give an example of a function f : [0, 1] → (0, 1) which is injective but not surjective.
(4) Let a, b, c, d ∈ R. Consider the function L : R² → R² defined by L(x, y) = (ax + by, cx + dy), and the function L* : R² → R² defined by L*(x, y) = (dx − by, −cx + ay). Compute L ◦ L* and L* ◦ L. Let D = ad − bc. If D = 0 prove that L is not injective. If D ≠ 0 prove that L is a bijection.
(5) Let X be the set of nonzero vectors in R³. Consider the relation: (x, y, z) ∼ (x′, y′, z′) provided that there exists λ ≠ 0 so that (x′, y′, z′) = (λx, λy, λz). Check that this is an equivalence relation. Describe an equivalence class under this relation. Can you describe the quotient set?
(6) Let X = N². Say that two pairs (a, b) and (a′, b′) in X are proportional provided that ab′ = a′b. Check that proportionality is an equivalence relation. Describe an equivalence class under this relation. Can you describe the quotient set?
(7) Let f : X → Y and g : Y → Z be functions. Suppose that g ◦ f : X → Z is surjective. Prove that g is surjective.
(8) Let f : R² → R be given by f(x, y) = x + y. Find two different sections of f.
(9) Let f : R → R² be the map defined by f(t) = (2t + 1, t − 7). Find two different retractions of f.
4. Functions as Relations
In this section we embed the concept of a function into the concept of a relation.

Definition. If R ∈ Rel(X, Y), and x ∈ X, put

R(x) = {y ∈ Y | (x, y) ∈ R} ⊆ Y.

[Examples]

Definition. Let X and Y be sets. A relation R ∈ Rel(X, Y) is called a function from X to Y provided that for all x ∈ X, the set R(x) is a singleton.
Let us apply this definition when X = ∅. In this case

Rel(X, Y) = ℘(X × Y) = ℘(∅ × Y) = ℘(∅) = {∅}.

Thus, Rel(X, Y) consists of only the "empty relation" R∅ = ∅. Let us determine whether this is a function from X = ∅ to Y. The question is whether it is true that for all x ∈ X = ∅, the set R(x) is a singleton. Since it is never true that x ∈ ∅, the statement

∀ x ∈ X, R(x) is a singleton

is vacuously true. Therefore R∅ is a function. It is called the empty function.

We now extend the notions of injective, surjective, and bijective to the context of relations.

Definition. A relation R is injective provided that: for x, x′ ∈ X and y ∈ Y,

(x, y), (x′, y) ∈ R ⇒ x = x′.

A relation R is surjective provided that ∀ y ∈ Y ∃ x ∈ X so that (x, y) ∈ R. A relation is bijective provided that it is injective and surjective.

Proposition 3.14. Let X be a set.
(1) There is a unique function ∅ → X; it is an injection.
(2) There is no function X → ∅, unless X = ∅.
(3) There is a unique bijection ∅ → ∅.
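The "singleton" criterion can be checked directly when a relation is stored as a set of pairs. A small sketch of my own (the helper name is_function is an assumption), including the vacuous case of the empty function:

```python
# A relation R in Rel(X, Y), stored as a set of pairs, is a function iff
# R(x) = { y : (x, y) in R } is a singleton for every x in X.

def is_function(R, X):
    return all(len({y for (x, y) in R if x == x0}) == 1 for x0 in X)

X = {0, 1, 2}
F = {(0, 'a'), (1, 'a'), (2, 'b')}   # a function: each x has exactly one y
G = {(0, 'a'), (0, 'b'), (1, 'a')}   # not: G(0) has two elements, G(2) none

# With X = empty set, the "for all x" quantifier is vacuously true, so the
# empty relation is a function -- the empty function.
empty_is_function = is_function(set(), set())
```

The all(...) over an empty domain returning True is precisely the "vacuously true" step in the text.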
4.1. Exercises
(1) Write down all the 3 × 2 *-matrices corresponding to injective functions (from a set with 2 elements to a set with 3 elements).
(2) Write down all the 2 × 3 *-matrices corresponding to surjective functions (from a set with 3 elements to a set with 2 elements).
(3) Let X and Y be sets, and R ∈ Rel(X, Y). We say that R is injective provided that, for x, x′ ∈ X and y ∈ Y, we have (x, y), (x′, y) ∈ R ⇒ x = x′. Explain why the "empty function" is injective.
(4) Let X and Y be sets, and R ∈ Rel(X, Y). We say that R is surjective provided that, for all y ∈ Y there exists x ∈ X so that (x, y) ∈ R. Give an example of some R ∈ Rel(R, R) which is injective and surjective, but is not a function.
5. Partially Ordered Sets
Definition. A relation R ∈ Rel(X) is a partial ordering relation on X provided that:
(1) ∀ x ∈ X, we have (x, x) ∈ R. (Reflexivity)
(2) ∀ x, y, z ∈ X, we have (x, y), (y, z) ∈ R ⇒ (x, z) ∈ R. (Transitivity)
(3) ∀ x, y ∈ X, we have (x, y), (y, x) ∈ R ⇒ x = y. (Antisymmetry: R ∩ R^T = ∆X)

A set X considered with a partial ordering is called a partially ordered set, or simply a poset.

We usually write "x ≤R y" or simply "x ≤ y" to mean (x, y) ∈ R in this situation.

Examples:

• Let X = N. The usual order on N corresponds to the relation R≤ = {(a, b) | a ≤ b} on N. This is a total ordering. On the other hand, the relation R| = {(a, b) | a|b} on N is just a partial ordering.
• Let S be a set, and let ℘(S) be the power set of S. Write R for the relation {(A, B) ∈ ℘(S) × ℘(S) | A ⊆ B} on ℘(S). This is the inclusion relation, which is usually just a partial ordering.
• Here are two natural partial orderings on N². The product order on N² is the prescription that (a1, b1) ≤ (a2, b2) provided that a1 ≤ a2 and b1 ≤ b2. For instance (4, 6) ≤ (7, 6). The lexicographic (or dictionary) order is the rule that (a1, b1) ≤ (a2, b2) provided that (a1 < a2) ∨ (a1 = a2 ∧ b1 ≤ b2). For instance (4, 6) ≤ (5, 1). In similar fashion one can take the product of any two posets, or indeed any finite number of posets.
• If X is any set, simple equality is a partial ordering.
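The three axioms are finite checks on a finite set, so the divisibility example can be verified mechanically on a small piece of N. This is only a finite experiment, of course, not a proof:

```python
# Divisibility on {1, ..., 12}: reflexive, transitive, antisymmetric,
# but not total (2 and 3 are incomparable).

X = range(1, 13)

def divides(a, b):
    return b % a == 0

reflexive = all(divides(x, x) for x in X)
transitive = all(divides(x, z)
                 for x in X for y in X for z in X
                 if divides(x, y) and divides(y, z))
antisymmetric = all(x == y
                    for x in X for y in X
                    if divides(x, y) and divides(y, x))
total = all(divides(x, y) or divides(y, x) for x in X for y in X)
```

The failure of total is what the text means by "just a partial ordering"; compare the definition of total ordering below.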
Definition. Let X be a set, and ≤1, ≤2 partial orderings on X. We say ≤1 is weaker than ≤2 provided that ∀ x, y ∈ X, x ≤1 y ⇒ x ≤2 y. We also say here that ≤2 is stronger than ≤1.

The equality relation is weaker than any other partial ordering. The product order on N² is weaker than the lexicographic order.

Note that the relation "≤1 is weaker than ≤2" is itself a partial order on the set of partial orders on a set.
Definition. A partial ordering ≤ on X is a total ordering provided that, for all x, y ∈ X, we have (x ≤ y) ∨ (y ≤ x). A poset where the ordering is a total ordering is called a toset.

If R is the corresponding relation, then this condition is equivalent to X × X = R ∪ R^T.

The usual orderings on N, Q, R are total orderings.

5.1. Exercises
(1) List all partial orders on J3. How many are total orders? How many are well-orders?
(2) How many total orders are there on Jn?
(3) Show that on a set X, no partial orders are strictly stronger than total orders.
(4) Let X be a toset. Let X′ = ⋃_{x ∈ X} X_{<x}, where X_{<x} = {y ∈ X | y < x}. Prove that exactly one of the following is true: (a) X′ = X. (b) X − X′ is a singleton {M}, where M = max X.
(5) Let X be a poset. Show that there is a subposet Y ⊆ ℘(X) so that X is order-isomorphic to Y.
6. Chapter 3 Wrap-up

6.1. Rubric for Chapter 3

In this chapter you should have learned...

• What a relation is, and how to compose two of them.
• The definition of an equivalence relation, what an equivalence class is, and an idea of what the quotient set is.
• What a function is, and when functions are injective, surjective, or bijective. What an inverse is.
• What equivalence relations have to do with partitions.
• A little about what ∼-invariant functions are, especially for defining functions on angles.
6.2. Toughies for Chapter 3
(1) Let X be a set. Which relations on X commute with all other relations? (Relations R and S on X commute provided that R ◦ S = S ◦ R.)
(2) Let X be a set. We say that a relation S on X is a square root of a relation R provided that S ◦ S = R. Does every relation have a square root? Is the square root unique if it exists? Does the unit circle relation on R have a square root?
CHAPTER 4
Cardinality
1. Finite and Infinite Sets
We now define when two sets have the same "size":

Definition. Let X, Y be sets. We say that X and Y are equipotent provided that there is a bijection from X to Y. We write X ∼ Y if they are equipotent.

Note that:
• X ∼ X, by using idX.
• If X ∼ Y, then Y ∼ X. This is because the inverse of a bijection is another bijection.
• If (X ∼ Y) ∧ (Y ∼ Z), then X ∼ Z. This is because the composition of two bijections is a bijection.
1.1. The sets Jn
Definition. Given n ∈ N let Jn = {0, 1, . . . , n − 1}, a subset of N. Let J0 = ∅.

Lemma 4.1. Let n ≥ 1 and 0 ≤ j < n. Then there is a bijection from Jn − {j} to Jn−1.

Proof. Define f : Jn − {j} → Jn−1 by

f(i) = i if i < j, and f(i) = i − 1 if i > j;

it is certainly a bijection. □

Proposition 4.2. Let m, n ∈ N.
(1) There is an injection from Jm to Jn iff m ≤ n.
(2) There is a surjection from Jm to Jn iff m ≥ n > 0, or m = n = 0.
(3) There is a bijection from Jm to Jn iff m = n.
(4) A function f : Jn → Jn is an injection iff it is a surjection.
Proof. If m ≤ n, then the inclusion Jm ⊆ Jn gives an injection. We will prove the converse, "If there is an injection from Jm to Jn, then m ≤ n," by induction on m. If m = 0, then m ≤ n, so that is settled. So suppose that m > 0, and ϕ : Jm → Jn is an injection. Let j = ϕ(m − 1) ∈ Jn. (This implies n > 0.) Then

ϕ|Jm−1 : Jm−1 → Jn − {j}

is an injection. Let f be the bijection from the lemma. The composition

Jm−1 → Jn − {j} → Jn−1

is an injection, being the composition of two injections. By induction, we conclude that m − 1 ≤ n − 1, which implies that m ≤ n.
For the second part, use Theorem 3.13 together with the first part. The third part follows from the first two parts.

For the fourth part, let f : Jn → Jn be a surjection. Suppose that f were not injective. Then there would be unequal j, k ∈ Jn so that f(j) = f(k). (This implies n ≥ 2.) Note that

f|Jn−{j} : Jn − {j} → Jn

would again be a surjection. (Why?) Composing this with a bijection from Jn−1 to Jn − {j} (via Lemma 4.1) we would obtain a surjection from Jn−1 to Jn. This would contradict part (2), so the original f must be an injection.

We leave it to the reader to prove the converse, i.e. that an injection f : Jn → Jn must be a surjection. □

Proposition 4.3. There is a bijection from Jm × Jn to Jmn. There is a bijection from the power set of Jn to J_{2^n}.

Proof. If m or n is 0, the statements are clear. So assume m, n ≠ 0. For the first statement, check that the function f(a, b) = na + b is a bijection. For the second statement, use the function

f(A) = Σ_{a ∈ A} 2^a. □
Example: Let n = 2. Then the bijection f : ℘(J2) → J4 is explicitly given by

f(∅) = 0, f({0}) = 1, f({1}) = 2, and f(J2) = 3.

1.2. Finite Sets
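Both bijections in Proposition 4.3 are easy to check exhaustively for small m and n. A sketch of my own (the values chosen for m and n are arbitrary):

```python
# f(a, b) = n*a + b hits each element of J_mn exactly once, and
# A -> sum of 2^a hits each element of J_{2^n} exactly once.

from itertools import chain, combinations

m, n = 3, 4
pair_codes = {n * a + b for a in range(m) for b in range(n)}
pairs_bijective = pair_codes == set(range(m * n))

subsets = chain.from_iterable(combinations(range(n), k) for k in range(n + 1))
subset_codes = {sum(2 ** a for a in A) for A in subsets}
powerset_bijective = subset_codes == set(range(2 ** n))
```

The second check is just the observation that f(A) reads off the binary expansion whose digit a is 1 exactly when a ∈ A.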
Definition. A set X is finite provided that ∃ n ∈ N so that X is equipotent to Jn. A set is infinite provided that it is not finite.

We have just proved that Jm × Jn and the power set of Jn are finite sets.

Definition. If X is equipotent to Jn we write |X| = n and say that "X has cardinality n". In particular, |∅| = 0.
Theorem 4.4. Let X, Y be finite sets. Say |X| = m and |Y| = n. Let f : X → Y be a function. Then
(1) If f is injective, then m ≤ n.
(2) If f is surjective, then m ≥ n.
(3) If f is bijective, then m = n.
(4) If m = n, then f is injective iff f is surjective.

Proposition 4.5. Let X and Y be sets, with |X| = m and |Y| = n.
(1) The product set X × Y has cardinality mn.
(2) The power set of X has cardinality 2^m.

Proof. (First Part) By hypothesis, there are bijections ξ : X → Jm and η : Y → Jn, and so ξ × η : X × Y → Jm × Jn defined by

(ξ × η)(x, y) = (ξ(x), η(y))

is a bijection. (It is easy to check: "The product of bijections is a bijection.") Thus the composition

X × Y → Jm × Jn → Jmn

is a bijection. Thus |X × Y| = mn.

The second part is similar; if f : X → Y is a bijection, consider ℘(f) : ℘(X) → ℘(Y) defined by ℘(f) : A ↦ f(A) for A ⊆ X. Check that ℘(f) is a bijection, and then compose as in the first part. □
Proposition 4.6. Let X be a set, and suppose there is an injection f : X → Jn. Then X is finite, and |X| ≤ n.

Proof. We proceed by induction on n. If n = 0, then f : X → ∅, which implies that X = ∅, so |X| = 0.

Supposing veracity of the proposition for n, consider an injection f : X → Jn+1. If f is surjective, then it is a bijection and so |X| = n + 1. Otherwise there exists j ∉ f(X). Thus we have an injection followed by a bijection:

X → Jn+1 − {j} → Jn,

and so |X| ≤ n by induction. □
Corollary 4.7. Any subset of Jn is finite. Any subset of a finite set is finite. If X ⊂ Y with Y finite and X ≠ Y, then |X| < |Y|. In particular, a finite set is not equipotent to a proper subset of itself. If f : X → Y is a surjection, and X is finite, then Y is also finite.
The reader familiar with linear algebra may appreciate the following theorem, which is analogous to Theorem 4.4:

Theorem 4.8. Let V, W be finite-dimensional vector spaces, with dim V = m and dim W = n. Let f : V → W be linear. Then
(1) If f is injective, then m ≤ n.
(2) If f is surjective, then m ≥ n.
(3) If f is bijective, then m = n.
(4) If m = n, then f is injective iff f is surjective.

All linear maps on finite-dimensional vector spaces are the following, up to notation: Let A be a matrix with n rows and m columns, and let LA(v) = Av (matrix-vector multiplication). Then LA : R^m → R^n is a linear map, and thus the previous theorem applies. There are numbers called rank and nullity associated to matrices. The rank of A is n iff LA is surjective, and the nullity of A is 0 iff LA is injective. In fact, the theorem above is deduced from the "rank-nullity theorem" (which says that rank(A) + nullity(A) = m).
1.3. Some Combinatorics
Proposition 4.9. Let X and Y be finite sets, with |X| = n and |Y| = m. Suppose there is a map f : X → Y whose fibres all have the same cardinality d. Then n = dm.

Proof. For each y ∈ Y, there is a bijection ϕy : f⁻¹(y) → Jd. Define a map F : X → Y × Jd by

F(x) = (f(x), ϕ_{f(x)}(x)).

Thus, if f(x) = y, then F(x) = (y, ϕy(x)). We claim that F is a bijection.

Suppose that F(x) = F(x′). Then f(x) = f(x′) = y, so x and x′ are both in f⁻¹(y). The fact that F(x) = F(x′) also gives ϕy(x) = ϕy(x′). Since ϕy is injective, we have x = x′. Therefore F is injective.

To show that F is surjective, let y ∈ Y and j ∈ Jd. Let x be the element in the fibre of y which maps to j under ϕy; it is easy to see that F(x) = (y, j).

Thus |X| = |Y × Jd| = md. □
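A miniature instance of the proposition, with a map and sets of my own choosing: reduction mod 4 maps J12 onto J4, every fibre has size 3, and indeed 12 = 3 · 4.

```python
# f(x) = x mod 4 from J_12 to J_4: all fibres have the same cardinality d = 3.

X = range(12)

def f(x):
    return x % 4

fibres = {y: [x for x in X if f(x) == y] for y in range(4)}
all_same_size = set(len(xs) for xs in fibres.values()) == {3}
count_matches = len(X) == 3 * 4      # n = d * m
```

The dictionary fibres makes the proof's map F(x) = (f(x), position of x in its fibre) easy to visualize.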
[Application to binomial coefficients.]

Proposition 4.10. Let X and Y be finite sets. Suppose there is a map f : X → Y whose fibres all have cardinality less than or equal to d. Then |X| ≤ d|Y|.

Proof. Exercise. □
The contrapositive of this is:

Proposition 4.11 (Pigeonhole Principle). Let X and Y be finite sets. If |X| > d|Y| for some integer d ≥ 0, and f : X → Y is a map, then for some y ∈ Y we have |f⁻¹(y)| > d.

Let X be a set of pigeons, and Y be a set of holes. We can think of "putting pigeons in holes" in terms of a function from X to Y. The "fibre over a hole" is the set of pigeons put into that hole. If |X| > d|Y|, then there must be at least one hole with more than d pigeons.
1.4. Unions, Intersections, and Coproducts

Definition. Let A, B be subsets of a set X. The coproduct of A and B, written A ⊔ B, is the subset of X × J2 given by

A ⊔ B = (A × {0}) ∪ (B × {1}).

Note in particular that X ⊔ X = X × J2. There is a natural injection iA : A → A ⊔ B given by a ↦ (a, 0) and similarly an injection iB : B → A ⊔ B. There is a natural surjection p : A ⊔ B → A ∪ B defined by p(a, 0) = a and p(b, 1) = b. Note that p is a bijection iff A ∩ B = ∅, and that p ◦ iA = idA and p ◦ iB = idB.

Proposition 4.12. Let m, n ≥ 0. The coproduct Jm ⊔ Jn is equipotent to Jm+n.

Proof. A bijection is given by f(j, 0) = j and f(j, 1) = m + j. □

Corollary 4.13. The coproduct of two finite sets is finite.

Corollary 4.14. Let A, B be finite subsets of a set X. Then A ∪ B is finite.

Proof. As above, there is a surjection from the finite set A ⊔ B to A ∪ B. □
Proposition 4.15.
• A ⊔ B ∼ B ⊔ A.
• (A ⊔ B) ⊔ C ∼ A ⊔ (B ⊔ C).

Definition. Let X and I be sets, and for each i ∈ I, let a subset Xi of X be given. The union of the sets Xi is the subset ⋃_{i∈I} Xi defined by

⋃_{i∈I} Xi = {x ∈ X | ∃ i ∈ I s.t. x ∈ Xi}.

[Example]

Definition. Let X and I be sets, and for each i ∈ I, let a subset Xi of X be given. The intersection of the sets Xi is the subset ⋂_{i∈I} Xi defined by

⋂_{i∈I} Xi = {x ∈ X | ∀ i ∈ I, x ∈ Xi}.

[Example]

Definition. Let X and I be sets, and for each i ∈ I, let a subset Xi of X be given. The coproduct of the sets Xi, written ⊔_{i∈I} Xi, is the subset of X × I given by

⊔_{i∈I} Xi = ⋃_{i∈I} (Xi × {i}).
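The tagging trick in the definition of the coproduct is easy to carry out concretely. A sketch of my own, for two overlapping finite sets:

```python
# The coproduct (A x {0}) u (B x {1}): tagging keeps overlapping elements
# distinct, unlike the plain union.

def coproduct(A, B):
    return {(a, 0) for a in A} | {(b, 1) for b in B}

A, B = {1, 2, 3}, {3, 4}
AB = coproduct(A, B)

# |A u_coprod B| = |A| + |B| even though A and B overlap at 3...
sizes_add = len(AB) == len(A) + len(B)
# ...while the surjection p(x, i) = x collapses it onto the ordinary union.
p_image = {x for (x, i) in AB}
p_onto_union = p_image == A | B
```

This is exactly why Corollary 4.14 proceeds via a surjection: p is a bijection only when A ∩ B = ∅.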
1.5. Infinite Sets

Proposition 4.16. Let X be a set. The following are equivalent:
(1) X is infinite.
(2) There is an injection from N to X.
(3) There is a surjection from X to N.
(4) X is equipotent to a proper subset of itself. (Dedekind's Criterion)

Proof. For (2) ⇒ (1), suppose that X is finite, with |X| = n, and there is an injection from N to X. Then there is an injection Jn+1 → N → X → Jn, contradicting [earlier].

For (2) ⇒ (3), let f : N → X be an injection, and r : X → N a retraction of f, meaning that r ◦ f is the identity on N. Then r is necessarily a surjection from X to N. The proof of (3) ⇒ (2) is similar.

For (1) ⇒ (2), suppose that X is infinite. We will recursively define an injection f : N → X, as follows. Since X is nonempty, some x0 ∈ X. Put f(0) = x0. Now suppose that an injection f : Jn → X is given. If f is a surjection, then it is a bijection, contradicting the infinitude of X. Otherwise, there is some xn ∈ X not in the image, and we put f(n) = xn. Thus we recursively have distinct elements f(n) for all n ∈ N; the resulting f is an injection.

By the previous corollary, (4) implies (1). For (2) ⇒ (4), let f : N → X be an injection. Let xn = f(n) for all n and Z = X − {x0, x1, . . .}. We define a bijection g : X → X − {x0} so that g is the identity on Z and g(xn) = xn+1. □
1.6. Exercises
≥
→ × →
(1) Let n 0. Suppose that a function f : J n J n is an injection. Prove that it must also be a surjection. (2) Let m, n 0. Prove that the map f : J m J n J mn given by f (a, b) = na + b is a bijection. Do this both by proving injectivity, surjectivity, and also by giving an explicit inverse. (3) Let n 0. Prove that the map f : ℘(J n ) J 2n given by f (A) = a∈A 2a is a bijection. (4) Let A, B,C,D be nonempty sets, and let f : A B and g : C D be two maps. Consider the product map f g : A C B D given by (f g)(a, c) = (f (a), g(c)). Prove that f and g are injections iff f g is an injection. Similarly for surjection and bijection. (5) Let f : A B be a map. Define in a natural way a map ℘(f ) : ℘(A) ℘(B). Prove that f is an injection iff ℘(f ) is an injection, and similarly for surjection and bijection. (6) Prove that the power set of a finite set is necessarily finite.
≥
≥
→
×
×
→
→ → × → × ×
→
106
4. CARDINALITY
(7) Prove that if X is a finite set and f : X → Y is a surjection, then Y is also a finite set, and |Y| ≤ |X|. This was sketched in class; please fill in the details.
(8) Prove that if X is a finite set, and Y ⊂ X but Y ≠ X, then |Y| < |X|.
(9) Prove Proposition 4.10.
(10) Let A, B be sets. Then there is a set A × B and two maps p_A : A × B → A and p_B : A × B → B with the following ("universal") property: If C is a set, and f_A : C → A, f_B : C → B are maps, then there is a unique map f : C → A × B so that p_A ∘ f = f_A and p_B ∘ f = f_B.
(11) Let A, B be sets. Then there is a set A ⊔ B and two maps i_A : A → A ⊔ B and i_B : B → A ⊔ B with the following ("universal") property: If C is a set, and f_A : A → C, f_B : B → C are maps, then there is a unique map f : A ⊔ B → C so that f ∘ i_A = f_A and f ∘ i_B = f_B.
(12) Given a natural number n, write nZ for the set of integer multiples of n. Describe the intersection ∩_{n∈N} nZ.
(13) Given a natural number n, write (1/n)Z ⊂ Q for the set of integer multiples of the fraction 1/n. Describe the union ∪_{e∈N} (1/2^e)Z. What do these numbers look like in binary?
(14) Describe the sets ∩_{n∈N} ∪_{r∈Q} (r − 1/n, r + 1/n) and ∪_{r∈Q} ∩_{n∈N} (r − 1/n, r + 1/n). Here (a, b) denotes the set of real numbers x with a < x < b.
(15) Show carefully, using the definitions in class, that the coproduct X ⊔ X is equal to X × J_2.
(16) Give an explicit bijection from the coproduct J_m ⊔ J_n to J_{m+n}.
(17) Prove carefully that the coproduct of two finite sets is finite, assuming the previous exercise.
(18) Draw a picture of the coproduct ⊔_{n∈N} nZ as a subset of Z × N.
(19) Prove that if X and Y are sets, then the coproducts X ⊔ Y and Y ⊔ X are equipotent.

Let B be the set of sequences, where each term is either 0 or 1. For instance, (0, 1, 1, 0, 0, 0, 1, 1, 0, 1, . . .) ∈ B.
(a) Prove that B is equipotent to B × B.
(b) Find a surjection from B to [0, 1] ⊂ R.
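Before proving exercise (2), you can let a computer spot-check it on small cases. The following Python sketch is mine, not the book's, and it assumes the convention J_k = {0, 1, . . . , k − 1}:

```python
# Spot check for exercise (2), assuming J_k = {0, 1, ..., k-1}:
# f(a, b) = n*a + b should map J_m x J_n onto J_mn with no repeats.
def f(a, b, n):
    return n * a + b

def check_bijection(m, n):
    images = {f(a, b, n) for a in range(m) for b in range(n)}
    # A bijection onto J_mn means m*n distinct values filling {0, ..., mn - 1}.
    return images == set(range(m * n))

assert all(check_bijection(m, n) for m in range(1, 6) for n in range(1, 6))
print("f(a, b) = na + b is a bijection on all tested cases")
```

The explicit inverse asked for in the exercise is M ↦ (M ÷ n, r), where r is the remainder of M upon division by n; in Python that is divmod(M, n).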
2. Countable Sets
Definition. We say that a set X is denumerable provided that it is equipotent to N. We say that X is countable provided that it is either finite or denumerable. We say that X is uncountable otherwise.

Definition. If X is denumerable we write |X| = ℵ0 and say that "X has cardinality aleph-naught".

The following sets are denumerable:
• {2, 3, 4, . . .}
• {2, 4, 6, 8, . . .}
• Z.
Can you find bijections to N? Note that if a set X is denumerable, this means that the elements of X can be expressed as a sequence x_1, x_2, . . . of distinct elements.

Proposition 4.17. Let n ≥ 1. Then the product N × J_n is denumerable.
Proof. A bijection f : N × J_n → N is given by f(m, r) = mn + r. Its inverse g : N → N × J_n is given by
g(M) = ((M − r) ÷ n, r),
where r is the remainder of M upon division by n.

Corollary 4.18. The product of a denumerable set and a nonempty finite set is denumerable.

Proposition 4.19. Z is denumerable.

Proof. A bijection f : N × J_2 → Z is defined by the rule
f(n, 0) = n − 1, f(n, 1) = −n.
What is the inverse?

Proposition 4.20. N × N is denumerable.

Proof. A bijection f : N × N → N is given by f(a, b) = 2^a(2b + 1). Its inverse g : N → N × N is given by
g(n) = (ord_2(n), (n ÷ 2^{ord_2(n)} − 1) ÷ 2).

Corollary 4.21. The product of two denumerable sets is denumerable. The product of two countable sets is countable.

Proposition 4.22. An infinite subset X of N is denumerable.

Proof. We define a function f : N → X recursively by putting f(1) = min X, and given f(1), f(2), . . . , f(n), put
f(n + 1) = min(X − {f(1), f(2), . . . , f(n)}).
It is straightforward to see that f is a bijection. Its inverse g : X → N could be given by the prescription
g(x_0) = |{x ∈ X | x ≤ x_0}|.

Corollary 4.23.
Let Y be a countable set. Then any subset of Y is countable. If there is an injection from a set X to Y , then X is countable. If there is a surjection from Y to X , then X is countable. The union of two countable sets is countable.
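The bijection of Proposition 4.19 is concrete enough to test on an initial segment. Here is a Python sketch (the function name is mine); it assumes N = {1, 2, 3, . . .}, which is what the formula f(n, 0) = n − 1, f(n, 1) = −n suggests:

```python
# Proposition 4.19 in miniature: f(n, 0) = n - 1, f(n, 1) = -n
# should list every integer exactly once as n runs through 1, 2, 3, ...
def f(n, r):
    return n - 1 if r == 0 else -n

values = [f(n, r) for n in range(1, 101) for r in (0, 1)]
assert len(values) == len(set(values))       # no integer is hit twice
assert set(values) == set(range(-100, 100))  # covers -100, ..., 99
print("f enumerates the integers with no repeats")
```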
Corollary 4.24. The set Q of rational numbers is denumerable.

Proof. Since Z is denumerable, the product Z × N is also denumerable. The function f : Z × N → Q given by f(z, n) = z/n is clearly surjective, and so Q is countable. Since N is a subset of Q, it must be that Q is infinite. Therefore Q is denumerable.
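The surjection in this proof can be sampled on a finite grid. A Python sketch using the standard library's Fraction (the grid bounds are arbitrary choices of mine): many pairs (z, n) land on the same rational, which is exactly why the proof only claims surjectivity, not bijectivity.

```python
from fractions import Fraction

# Sample the map f(z, n) = z/n from Z x N on a finite grid.
grid = {Fraction(z, n) for z in range(-20, 21) for n in range(1, 21)}

assert Fraction(1, 2) in grid and Fraction(-7, 3) in grid
# Distinct pairs can give the same rational: (1, 2) and (2, 4) both hit 1/2.
assert Fraction(1, 2) == Fraction(2, 4)
print(len(grid), "distinct rationals from", 41 * 20, "pairs")
```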
Proposition 4.25. Let X and Y be sets, with Y countable. Suppose there is a map f : X → Y whose fibres are all countable. Then X is countable.

Proposition 4.26. If I is countable, and each X_i is countable, then ⊔_{i∈I} X_i is countable.

Proof. Since each X_i is countable, there is an injection f_i : X_i → N. We may therefore define a map F : ⊔_{i∈I} X_i → N × I via the prescription F(x, i) = (f_i(x), i). Let us check that F is an injection. Suppose that (x, i) ∈ X_i × {i} and (y, j) ∈ X_j × {j}, and F(x, i) = F(y, j). Then (f_i(x), i) = (f_j(y), j), so i = j. We get f_i(x) = f_i(y), and since f_i is an injection we conclude that x = y.

Since there is an injection of ⊔_{i∈I} X_i into the countable set N × I, we conclude that ⊔_{i∈I} X_i is countable.
Corollary 4.27. If I is countable, and each X_i is countable, then ∪_{i∈I} X_i is countable.

Proof. The map p : ⊔_{i∈I} X_i → ∪_{i∈I} X_i defined by p(x_i, i) = x_i is a surjection. Since the domain is countable, the codomain must be as well.

Example: The set Z[x] is denumerable: Let Z[x]_d be the set of polynomials of degree ≤ d. It is clearly equipotent to Z^{d+1} and hence denumerable. Now
Z[x] = ∪_{d=0}^∞ Z[x]_d,
and is therefore denumerable.
To appeal to the corollary, one often says "a countable union of countable sets is countable".

3. Uncountable Sets

3.1. Uncountability of R

Theorem 4.28. R is uncountable.
Proof. (Cantor's Original Proof) Suppose it were countable. Then we could express R as a sequence x_1, x_2, . . .. Consider the decimal representations of these numbers. Form a new real number x by making its integer part 0, and for its nth decimal place, look at the nth decimal place of x_n and change it to the next bigger digit. We claim that this number x we have just formed is not in that sequence. It is not equal to x_n because their nth decimal places differ. This is a contradiction.

Cantor mailed his proof to another mathematician, Dedekind. Dedekind pointed out a subtle flaw in his argument: some numbers have more than one decimal representation, like 0.99999 . . . and 1.000 . . .. So it is possible that the number x formed in the proof really is equal to some x_n, even though their digits differ. This flaw is easily remedied. It suffices to ensure that none of the digits of x are 0 or 9.

Proof. (Cantor's Fixed Proof) Suppose it were countable. Then we could express R as a sequence x_1, x_2, . . .. Consider the decimal representations of these numbers. Form a new real number x by making its integer part 0, and for its nth decimal place, look at the nth decimal place of x_n and change it to the next bigger digit. Except, if the nth decimal place of x_n is 8 or 9, we change it to 1 instead. We claim that this number x we have just formed is not in that sequence. Its digits avoid 0 and 9, so it has a unique decimal representation, and it is not equal to x_n because their nth decimal places differ. This is a contradiction.
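The fixed diagonal construction can be carried out mechanically on any finite list of digit strings. A Python sketch (the function name is mine):

```python
# Cantor's fixed diagonal argument in miniature: the nth digit of the new
# number is the nth digit of the nth number plus one, except that the
# digits 8 and 9 become 1.  The result avoids 0 and 9 entirely, so it has
# a unique decimal expansion and differs from every number on the list.
def diagonal(digit_rows):
    new_digits = []
    for n, row in enumerate(digit_rows):
        d = int(row[n])
        new_digits.append("1" if d in (8, 9) else str(d + 1))
    return "".join(new_digits)

rows = ["14159", "71828", "41421", "33333", "09999"]
x = diagonal(rows)
assert all(x[n] != rows[n][n] for n in range(len(rows)))  # differs everywhere
assert not set(x) & set("09")                             # no 0s or 9s
print("0." + x, "is not on the list")
```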
3.2. Uncountability of ℘(N)

Theorem 4.29. Let X be a set. There is no surjection f : X → ℘(X).

Proof. Suppose there does exist such a surjection. Let
R = {x ∈ X | x ∉ f(x)}.
Since f is surjective, ∃r ∈ X so that f(r) = R. Here is the Question: Is r ∈ R? If yes, then r ∉ f(r) = R, a contradiction. If no, then r ∈ f(r) = R, a contradiction. Thus either way we have a contradiction, and we conclude that there does not exist such a surjection.

Corollary 4.30. The set of subsets of N is uncountable.
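For a small finite set X, Theorem 4.29 can even be checked by brute force: try every function f : X → ℘(X) and watch the diagonal set escape. A Python sketch (names mine):

```python
from itertools import combinations, product

# Exhaustive check of Theorem 4.29 for |X| = 3: for every one of the
# 8**3 = 512 functions f : X -> P(X), the diagonal set
# R = {x in X : x not in f(x)} is never in the image of f.
X = [0, 1, 2]
power_set = [frozenset(c) for r in range(len(X) + 1)
             for c in combinations(X, r)]      # the 8 subsets of X

for choice in product(power_set, repeat=len(X)):
    f = dict(zip(X, choice))
    R = frozenset(x for x in X if x not in f[x])
    assert R not in f.values()                 # f misses its diagonal set
print("checked all 512 functions; none is surjective")
```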
Theorem 4.29 implies that there is no bijection from any set to its power set. In some sense it means that, even if X is infinite, ℘(X ) is still “bigger”. Consider the following statement:
• If X is an uncountable subset of R, then X is equipotent to R.
This statement is called "The Continuum Hypothesis". It basically asks if there is a set bigger than N but smaller than R. It is a very interesting question whether the Continuum Hypothesis is true or false. A reasonable person might say it was proved by the logician Kurt Gödel, but logicians would say that he and later Paul Cohen merely "resolved" the situation. More specifically, they proved that it is "independent of the axioms of set theory". We will not say more here, but refer the reader to [3].

3.3. Existence of Transcendental Numbers
I would like now to give an application to the theory of algebraic numbers.

Definition. Let α ∈ C. We say that α is an algebraic number, provided that there is a nonzero polynomial p(x) ∈ Z[x] so that p(α) = 0. We say that α is transcendental otherwise. Write Q̄ for the set of algebraic numbers.

Lemma 4.31. Let p ∈ Z[x] be a nonzero polynomial. Then the set
Z(p) = {α ∈ C | p(α) = 0}
is finite.

This follows from Theorem 7.34; please accept it for now to get to our application.

Proposition 4.32. The set of algebraic numbers is countable.

Proof. Let I = Z[x] − {0}. Then
Q̄ = ∪_{p∈I} Z(p)
is a denumerable union of finite sets, and is therefore countable.

Corollary 4.33. There exist transcendental numbers.

Proof. Since R is uncountable, it cannot be that R ⊆ Q̄.
3.4. Exercises
(1) Express the set [0, 1] in R as a countable intersection of open intervals in R.
(2) Which of the following sets are countable? Explain.
(a) Z^3, the set of triplets of integers.
(b) (0, 1), the set of real numbers between 0 and 1.
(c) The set of real numbers with terminating decimal expansions.
(d) The set of square roots of positive rational numbers.
(e) The set P of polynomials whose coefficients are all 0s and 1s. For example x^67 + x^4 + x, x^3 + 1 ∈ P.
(f) The set of all sequences, where every term is either 1 or −1. For example, (1, −1, −1, 1, . . .). (This set arises in probability theory, modeling coinflips.)
(g) The set of isosceles right triangles in R^2.
(h) The set of sequences (a_0, a_1, a_2, . . .) of real numbers which satisfy a_{n+2} = a_n + a_{n+1} for all n ≥ 0.
(i) The set of solutions to the differential equation f′ = f.
(j) The set of square roots of natural numbers (positive and negative).
(k) S^1 = {(x, y) ∈ R^2 | x^2 + y^2 = 1}.
(l) The set of integers which are congruent to 2 modulo 3.
(m) The set of real numbers with nonrepeating decimal expansion.
(n) The set of sequences of natural numbers.
(o) The set of all sequences of natural numbers which are eventually constant.
(p) The set of all bounded sequences of natural numbers.
(q) The set of finite subsets of N.
(r) The set of rational functions (with integer coefficients), i.e. those of the form p(x)/q(x) with p, q ∈ Z[x] and q ≠ 0. For instance (x^2 + 7x − 4)/(x^11 − 2) is a rational function.
(3) Give an example of a function f : (0, 1) → [0, 1] which is bijective.
(4) Let X and Y be sets, with Y countable. Suppose there is a map f : X → Y whose fibres are all countable. Prove that X is countable.
(5) Make a sketch of the Cantor set. This is the set of numbers in [0, 1] whose base-3 decimal expansion doesn't have any 1s. (Start with the interval [0, 1]. The middle third is the set of numbers whose first digit base 3 is equal to 1. So erase the middle third; you are left with two intervals. The middle thirds of these intervals are numbers whose second digit base 3 is 1. So erase those as well. Continue...) Is this a countable set?
(6) Let us say a "book" is a finite list of ASCII symbols, including letters, spaces, fullstops, etc. Is the number of possible books countable?
(7) Say a real number is definable if one can specify it with a finite number of words. For instance, "the positive square root of two" is a satisfactory definition of √2. Is every rational number definable? Is every real number definable?
(8) In the spirit of the previous problem, is every subset of N definable?
(9) What is "the smallest positive integer not definable in fewer than twelve words"?
(10) Let X be an uncountable set, and Y ⊂ X a countable subset. Prove that X − Y is equipotent to X.
(11) Prove that the closed interval [0, 1] is equipotent to R.
(12) Find a surjective map from ℘(N) to [0, 1] ⊆ R. What are the fibres of your map? Prove that ℘(N) and R are equipotent. (Suggestion: use the two problems above.)
4. Interlude on Paradoxes
We are soon going to run into some dangerous logical territory, with proofs by contradiction on the verge of creating paradoxes in mathematics. This section is likely to delight many of you but horrify the rest. Let’s begin.
4.1. The Liar’s Paradox
A great interview question, if you don't like the candidate, is "Is the answer to this question, 'No'?". If they say 'Yes', then they haven't answered the question properly. If they say 'No', then the answer to the question was not 'No', so they have not given the right answer. This paradox is a variation of the Liar's Paradox, which is, "I am lying." Is that true or false? Again, both answers lead to a contradiction. For us, the appropriate version is:

This statement is false.

Let us call this statement 'P'. Thus P ⇒ ¬P and ¬P ⇒ P. Do you remember earlier when I said: In mathematics, every statement is true or false? It seems that there must be a statement P so that P ∧ ¬P is true.

This is very bad. Suppose that one has a statement P so that P ∧ ¬P is true. That is, a contradiction in mathematics. Let Q be any other statement. Then by the tautology ¬P ⇒ (P ⇒ Q), Q must be true. Since Q was arbitrary, every possible statement is true. This sounds like a disaster for mathematics.
How does one resolve this quandary? Well, first of all, what I actually said earlier was: In mathematics, every well-formed statement is true or false. This and subsequent paradoxes point to the need to more rigorously define the notion of "well-formed". When one studies the subject of Logic, one takes great pains to say exactly what is meant by this. For instance, Srivastava's book [12] starts by defining what the different kinds of symbols are, what terms are made up of these symbols, what formulas are made up of terms, etc. The method is recursive, and one does not see a way to form a self-referential statement. We will not give these details in these notes, but the resolution is essentially that self-referential statements are not well-formed. Here are some similar paradoxes of this nature to enjoy:
• "The next statement is true. The previous statement is false."
• "'Yields falsehood when preceded by its quotation' yields falsehood when preceded by its quotation." (Quine's paradox)
• "If this statement is false, then you are a zombie." (Curry's paradox)

4.2. The Grelling-Nelson Paradox
Next is a linguistic paradox. I include it because I don't want you to go thinking that paradoxes are entirely the fault of mathematicians...

Definition. An adjective is called autological if it describes itself. It is called heterological if it does not describe itself.

For example, the word "noun" is a noun. So the word "noun" is autological. The word "verb" is not itself a verb, so the word "verb" is heterological.
Here are some autological words: pentasyllabic, english, awkwardnessful, cutesy, erudite.
Here are some heterological words: bisyllabic, incomplete, tree, red, long.
(Don't take this too seriously. Of course with many adjectives it is a grey area whether they are one or the other.)

So here is the question: Is 'heterological' a heterological word? Think about it...

Well, if it is, then 'heterological' doesn't describe itself. Which means that it must be autological. Which means that it must describe itself. Contradiction! But if it isn't, then it does describe itself. Which means it is heterological after all. Contradiction either way! Therefore there is no 'correct' yes or no answer to the question. Too bad for linguistics.

Berry's paradox is both mathematical and linguistic:

Let n be "the smallest natural number not definable in fewer than twelve words". But we just defined it with eleven words!
4.3. Russell’s Paradox
Let S denote the set of all sets, sometimes called the Universal Set. Remember that the empty set ∅ is the set so that ∀x, x ∉ ∅. Well, S is the set so that ∀x, x ∈ S. Pretty simple to understand, right? One curious thing you'll notice about S is that it is an element of itself, meaning S ∈ S. Can you think of other sets like that? We have the set of infinite sets, say, I = {A ∈ S | A is infinite}; certainly I is infinite, so I ∈ I. Does the set of abstract thoughts qualify?

Let R = {A ∈ S | A ∉ A}. This is the set of sets which are not members of themselves. For instance N ∈ R, since N ∉ N. (Of course N ⊆ N, but that is a different thing.)

Here is the question: Is R ∈ R? Well, let's look at both cases. If R ∈ R, then R ∉ R. On the other hand, if R ∉ R, then R ∈ R... Both ways lead to a contradiction! Here's a way to think about it: S is not well-formed, because if it is itself to be a set, then its definition is again self-referential. So not everything you can name qualifies as a set.

This paradox is very important historically; it called for a profound reexamination of what qualifies as a "set". Since sets are the foundation for all of modern mathematics, many logicians worked very hard to articulate exactly what should or what shouldn't be a set. In fact, we now no longer allow any sets to be members of themselves. So there is no "set of all sets", or "set of all infinite sets". We don't even have a "set of all finite sets", or indeed a "set of all singletons". I'll show you soon how that last one leads to a paradox.
Most mainstream mathematicians accept the Zermelo-Fraenkel Axioms to describe what should and shouldn't be a set, just like the Peano Axioms told us what N should be. If one follows this theory, then one starts with the empty set, and then allows taking of power sets, product sets, and subsets defined by well-formed conditions. At no point does one reach anything like the "set of all sets". We don't believe that these axioms can lead to a contradiction, but indeed no one has proved that they don't!
4.4. The Singleton Paradox
Ready for more? Here's a hair-raising one, that destroyed a monumental work of Frege, one of the fathers of logic. Frege had a theory of number that came before Peano. He had a wild idea for defining numbers. He said that the number "n" should be the set of all sets X with |X| = n. For instance "1" should be the set of all singletons. (By "singleton" I mean a set with only one element.) When the set-theory paradoxes appeared, one of them undermined his opus "Grundgesetze der Arithmetik". We will describe this now.
Let S be the set of all singletons. Observe that every set X injects into S, because one can define f : X → S by f(x) = {x}. In particular, if we put X = ℘(S) we obtain an injection f : ℘(S) → S.

Exercise: Let A, B be nonempty sets, and f : A → B an injection. Prove that f has a "left inverse" g : B → A so that g ∘ f = id_A. In particular, g is a surjection.

By the exercise, there is a surjection g : S → ℘(S), which contradicts the theorem above.

The resolution of this paradox is that there is no set of all singletons! This kind of paradox is quite alarming, because there's nothing obviously "self-referential" in the formation of S.

Exercise: Let n ∈ N be any number. Show that the notion of "the set of all sets with n elements" leads to a paradox.
4.5. Unadoxes: Just for Fun
Most paradoxes of the sort mentioned above have a corresponding "unadox". Recall that a paradox is a statement which, when assigned either truth value, gives a contradiction. An unadox is a statement which, when assigned either truth value, does not give a contradiction. Each unadox could be true or false; there is no way to tell, but neither true nor false would give any contradiction. It is associated to the paradox by changing a key "false" somewhere (maybe implicit) to a "true".

Example: The Liar's Paradox is: "This statement is false." The corresponding unadox is: "This statement is true." If you declare that it is true, that is consistent, because the statement is then true. If you declare that it is false, that is consistent, because the statement is then false.
Do you get the idea? Can you find what the corresponding unadox for each of the following paradoxes should be?
• "Is the answer to this question 'No.'?"
• "The next statement is true. The previous statement is false."
• "'Heterological' is a heterological word." (Grelling-Nelson paradox)
• "Is the set of all sets which are not members of themselves a member of itself?" (Russell's paradox)
• "'Yields falsehood when preceded by its quotation' yields falsehood when preceded by its quotation." (Quine's paradox)
• "If this statement is false, then you are a zombie." (Curry's paradox)

5. Some History
1800 BCE Babylonian tablet with "Pythagorean triples"
600 BCE A Cretan says, "The Cretans are always liars"
529 BCE Pythagoreans prove that there are irrational quantities
285 BCE Publication of Euclid's Elements
400-500 Chinese Remainder Theorem
415 Death of Hypatia & of the Classical Period
458 Place Value System in India
820 Al-Khwarizmi develops algebra
1350-1425 Madhava: infinite series
1673 Leibniz coins the word "function"
1687 Newton's Principia Mathematica
1834-7 Dirichlet and Lobachevsky clarify the definition of "function"
1873 Cantor's Theory of Sets and Cardinality
1889 Peano's Axioms
1902 Russell's Paradox
1908-1922 Zermelo-Fraenkel Axioms of Set Theory
1931 Gödel's Incompleteness Theorem

(See page 26 of Krantz's book [6] for much more.) Sometimes we have precise dates, sometimes that information is lost...

6. Chapter 4 Wrap-up

6.1. Rubric for Chapter 4

6.2. Toughies for Chapter 4
(1) Let m, n ≥ 0 be whole numbers. Prove the following:
(a) There is a linear injection from R^m to R^n iff m ≤ n.
(b) There is a linear surjection from R^m to R^n iff m ≥ n.
(c) There is a linear isomorphism from R^m to R^n iff m = n.
(d) A linear map f : R^n → R^n is an injection iff it is a surjection.
(Note: R^0 = {0}.)
(2) Is there a countable group, with uncountably many subgroups?
(3) For the following sets, which are denumerable? Which are equipotent to R? Which are equipotent to ℘(R)?
(a) ℘(N)
(b) Seq(N) [Hint: Learn what a "continued fraction" of a real number is.]
(c) R^2
(d) The set of bijections from N to N.
(e) The set of open subsets of the real line.
CHAPTER 5
Equivalence
1. Equivalence Relations
There are some basic "axioms" of equality which we use all the time, usually without thinking. No matter what elements a, b, c we have of some set, we always have:
• a = a,
• a = b ⇔ b = a,
• ((a = b) ∧ (b = c)) ⇒ (a = c).
In other words, the equality relation (which we called Δ_X earlier) is reflexive, symmetric, and transitive.

Often in life and in mathematics, we want to formulate some notion of "equivalence" for various things, meaning that we want to treat things as the same if they satisfy some criterion. The world is a big place, and simplification makes our lives easier. For instance, we might consider triangles equivalent if they have the same angles. This relation is called similarity. Or we might consider triangles equivalent if they have the same side lengths, even though their position in space may differ. That relation is called congruence, and it is a stronger condition than similarity. When should a relation count as an equivalence?

Definition. Let R be a relation on a set X. Then R is an equivalence relation provided that it is reflexive, symmetric, and transitive.
Similarity and Congruence of triangles are clearly equivalence relations. Let’s do an example where we have to work to prove it.
Definition. Let n ∈ Z. We say that two integers a, b ∈ Z are congruent mod n provided that n | (a − b).

Proposition 5.1. Congruence mod n is an equivalence relation.

Proof. For reflexivity, we note that n | (a − a) for any a ∈ Z. For symmetry, let a, b ∈ Z with n | (a − b). Then since b − a = (−1)(a − b), we also have n | (b − a). This proves symmetry. For transitivity, let a, b, c ∈ Z with (n | (a − b)) ∧ (n | (b − c)). Then certainly n | (a − b + b − c); that is, n | (a − c). This proves transitivity.
Note that if n = 0, then equivalence mod n is the same as equality. If n = 1, then any a, b are equivalent mod n.
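Proposition 5.1 can also be spot-checked on a finite sample of integers. In the Python sketch below (the name cong is mine), the test (a − b) % n == 0 is the same as n | (a − b) for n > 0:

```python
# Check reflexivity, symmetry, and transitivity of congruence mod n
# on the sample -10, ..., 10 for a few positive moduli.
def cong(a, b, n):
    return (a - b) % n == 0

sample = range(-10, 11)
for n in (2, 3, 7):
    assert all(cong(a, a, n) for a in sample)
    assert all(cong(b, a, n)
               for a in sample for b in sample if cong(a, b, n))
    assert all(cong(a, c, n)
               for a in sample for b in sample for c in sample
               if cong(a, b, n) and cong(b, c, n))
print("reflexive, symmetric, and transitive on the sample")
```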
1.1. Partitions
How would a graph of vertices and edges look if it gives an equivalence relation on the vertices? Since it is reflexive, there will be a loop at every vertex. Since it is symmetric, every edge will be simple. Transitivity is the interesting property. If there is a (simple) edge connecting v_1 to v_2 and another connecting v_2 to v_3, then there must be a third edge connecting v_1 to v_3. [Picture]
Try drawing a graph with this transitivity property. You'll quickly find that your graphs are all disjoint unions of complete graphs. That is, no two of them intersect. Here is an example of such:

[Picture]

This graph on the vertices {a, b, c, d, e, f, g, h, i, j, k} is a disjoint union of three complete graphs.

Definition. Let X_1, X_2 be subsets of a set X. We say that X_1 and X_2 are disjoint provided that X_1 ∩ X_2 = ∅.
These complete graphs illustrate what are called equivalence classes.

Definition. Let X be a set with an equivalence relation ∼, and x ∈ X. The equivalence class of x is the set of elements equivalent to x, i.e., {y ∈ X | y ∼ x}. It is written [x].

The triangle in the graph above can be expressed as [a] or [b] or [c]. And so [a] = [b] = [c]. The entire graph is the union X = [a] ∪ [d] ∪ [f] of three equivalence classes. This union can also be expressed in other ways, for instance: X = [b] ∪ [e] ∪ [f] of course.
These three equivalence classes form what is called a "partition" of X.

Definition. Let X be a set. A partition of X is a family of pairwise disjoint subsets of X whose union is X. These sets are called parts.

We will prove that if X is a set with an equivalence relation, then the set of equivalence classes forms a partition of X. As another example take X = Z and the "mod 2" equivalence relation above. The corresponding partition of Z is
Z = {odd numbers} ∪ {even numbers}.
The set of odd numbers is the equivalence class containing 1. It is also the equivalence class containing 13. In modular arithmetic we usually write x̄ for [x]. So in mod 2 equivalence, 0̄ denotes the set of even numbers, and 1̄ the set of odd numbers. But it is also true that −4̄ is the set of even numbers, since any number equivalent to −4 is even. Thus 0̄ = −4̄. Similarly, 13̄ = 1̄.

Remark: When using this symbolism, it is important that the mod 2 is understood. Context lets you know that this is not mod 3, for instance.

Proposition 5.2. Let X be a set with an equivalence relation ∼. Then the equivalence classes [x] form a partition of X.
Proof. Reflexivity says that x ∈ [x], which shows in particular that X = ∪_{x∈X} [x]. Symmetry says that if x ∈ [y], then y ∈ [x]. Suppose this is so. Then if z ∈ X is something else in [x], by transitivity, z ∈ [y]. Since everything in [x] is in [y], we write [x] ⊆ [y]. The same argument starting with z ∈ [y] concludes that z ∈ [x], thus [y] ⊆ [x]. This shows that [x] = [y]. Conclusion: if x ∈ [y], then [x] = [y].

This implies that equivalence classes don't overlap. If any element z were in the overlap, i.e., z ∈ [x] and z ∈ [y], then [z] = [x] = [y]. Thus they are in fact the same class.
Conversely, if you have a partition of X, you can define an equivalence relation ∼ on it by saying x ∼ y if x and y are in the same part. Then the equivalence classes for ∼ will be the parts of the original partition of X.
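The passage from an equivalence relation to its partition can be computed directly on a finite set. A Python sketch (names mine; it assumes the relation passed in really is an equivalence relation, since it files each element with the first compatible representative):

```python
# Group a finite set into equivalence classes and check they partition it.
def classes(X, related):
    out = []
    for x in X:
        for cls in out:
            if related(x, next(iter(cls))):  # compare with a representative
                cls.add(x)
                break
        else:
            out.append({x})
    return out

X = set(range(12))
parts = classes(X, lambda a, b: a % 3 == b % 3)    # congruence mod 3
assert set().union(*parts) == X                    # the classes cover X
assert sum(len(p) for p in parts) == len(X)        # and are pairwise disjoint
print(sorted(sorted(p) for p in parts))
```

On this example the classes are {0, 3, 6, 9}, {1, 4, 7, 10}, and {2, 5, 8, 11}: exactly the partition of {0, . . . , 11} induced by congruence mod 3.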
1.2. Quotient Set
Let's say you have a set with an equivalence relation. If you really think of equivalent things as being "the same", in the sense that you replace x ∼ y with x = y, then you are shrinking the set X down to what is called the quotient set.
Definition. Let X be a set and R an equivalence relation on X. We write X/R for the set of equivalence classes for R.

The set X/R is "the quotient of X by R". In other words, X/R = {[x] | x ∈ X}.

Example: Consider Z with the mod 2 equivalence relation. Then Z/R = {[0], [1]}. Traditionally, we write Z/2Z for Z/R, and so the previous sentence could also be written as Z/2Z = {0̄, 1̄}.

Example: More generally, let n ∈ N and consider congruence mod n. Then Z/R = Z/nZ = {0̄, 1̄, . . . , n − 1̄}. (These are the possible remainders upon division by n.)
Example: Let I = [0, 1], the closed interval. Let me describe a partition of I with infinitely many parts. If 0 < x < 1, then the singleton {x} will be a part. (A singleton is a set with exactly one element.) Other than these singletons, I declare {0, 1} to also be a part. This describes a partition. The equivalence relation corresponding to this partition is "x ∼ y provided that (x = y) ∨ (x = 0 ∧ y = 1) ∨ (x = 1 ∧ y = 0)". The quotient set I/∼ naturally forms a circle. The idea is that you can start at [0], move to the right through all the x ∈ (0, 1), and then end up at [1]. But [1] = [0], so you have wound up where you started from. Just like on a circle.
Example: Let X be the square [0, 1] × [0, 1]. Consider the partition of X as follows: If 0 < x < 1 and y ∈ [0, 1], then the singleton {(x, y)} will be a part. Other than these, declare {(0, y), (1, y)} to be a part for every y ∈ [0, 1]. The equivalence relation corresponding to this partition really just says that we call the two vertical sides of the square equivalent. The quotient set X/∼ naturally forms a "cylinder". Can you see how?
Example: Let X be the square [0, 1] × [0, 1]. Consider the partition of X as follows: If 0 < x < 1 and 0 < y < 1, then the singleton {(x, y)} will be a part. Other than these, declare {(0, y), (1, y)} to be a part for every y ∈ (0, 1), declare {(x, 0), (x, 1)} to be a part for every x ∈ (0, 1), and declare {(0, 0), (1, 0), (0, 1), (1, 1)} to be a part. The equivalence relation corresponding to this partition really just says that we call the two vertical sides of the square equivalent, and also the two horizontal sides of the square equivalent. The quotient set X/∼ naturally forms a "torus". Can you see how?
Example: Let X be the square [0, 1] × [0, 1]. Consider the partition of X as follows: If 0 < x < 1 and 0 < y < 1, then the singleton {(x, y)} will be a part. Other than these, declare the union of all four sides of the square as one part. Thus, we "shrink" the entire boundary down to one point in the quotient set. In fact, the quotient set X/∼ naturally forms a "sphere". Can you see how?

How can you make a Möbius strip?
Example: Let S be the set of sequences of decimal digits s = (d_1, d_2, d_3, . . .), with each d_i a digit in base ten. Consider the following partition of S: If s neither ends in repeating nines nor repeating zeros, then the singleton {s} will be a part. Other than these, declare a pair of sequences of the form
{(d_1, d_2, . . . , d_n, 9, 9, 9, . . .), (d_1, d_2, . . . , d_n + 1, 0, 0, 0, . . .)}
to be a part if d_n ≠ 9. (Actually we will need one more singleton {(9, 9, 9, . . .)}.) Thus we are really considering these sequences to be equal. The quotient set of S by this equivalence relation naturally forms the interval [0, 1]. This can be used to construct the set of real numbers.

Example: Let R[x] be the set of polynomials with real coefficients. Say that polynomials f, g are equivalent provided that x^2 + 1 divides f(x) − g(x). Believe it or not, the quotient set is the definition of the complex numbers C. Why don't you ponder that for a while?
1.3. Exercises
(1) Let n ∈ N. Prove that two numbers a, b ∈ N are equivalent mod n iff they have the same remainder upon division by n.
(2) Let X be the set of functions from R to itself. Let g ∈ X, and write O(g) for the set of functions f ∈ X so that lim_{x→∞} f(x)/g(x) exists and is finite. Consider the relation f ∼ g provided that f ∈ O(g). Is this an equivalence relation? Explain. Now fix a function h ∈ X (for example, x^2 or e^x). Consider the relation f ≡ g if f − g ∈ O(h). Is this an equivalence relation? Explain.
2. The Positive Rationals Q+
We are about to make our first great leap in mathematical thought: the construction of the (positive) rational numbers. One defect of the natural numbers is that one can solve some division problems but not others. For example, the theory does not include any meaning for dividing 1 by 2. When we buy apples in a grocery store this doesn't cause any problem, because we only need to add and occasionally multiply them. But when we want to share them with a friend or make muffins, we may need to speak of fractional parts of apples. Now if all our recipes were written in terms of eighths of apples, for instance, we could do the following. We could write "1" for an eighth of an apple, "8" for a full apple, and multiply all our previous tallies by 8. This is unpleasant for several reasons, aesthetic and practical. And if I wanted to distribute an apple amongst a set of quintuplets, I would be at a loss. In other words, we would like a logical system in which we can add, multiply and divide natural numbers.
2.1. Ratios
The solution to our problem is roughly like this: we form the set of division problems, decide which of them should be equivalent, and then “mod out” by this equivalence relation. We would like to do arithmetic, so we will need to spend some care defining addition and multiplication on these problems. We will denote by R+ the set of “positive ratios” (a : b), where a and b are natural numbers. Strictly speaking, a ratio is just an ordered pair of numbers, but you should think of them as division problems. So (a : b) is the division problem of a by b. Consider the ratios (2 : 1) and (4 : 2). Strictly speaking they ask two different questions, although they both should have the same answer “2”. We call two ratios proportional if they should have the same answer.
Definition. Ratios (x : y) and (a : b) are proportional if x · b = a · y. Write (x : y) ∼ (a : b) if they are proportional.

Although we have not defined fractions yet, you may intuitively think of the ratio (x : y) as getting at the fraction x/y; this will help explain some of the formulas. For example, (x : y) ∼ (a : b) exactly when x/y = a/b.

Example: (2 : 6) ∼ (3 : 9); these ratios are proportional but not equal. Just to be clear, two ratios (a : b) and (c : d) are equal only if a = c and b = d. You should check that proportionality is an equivalence relation on R+.

Proposition 5.3. Proportionality is an equivalence relation on R+.
Proof. Reflexivity: Since x · y = y · x, we have (x : y) ∼ (x : y).

Symmetry: Suppose that (x : y) ∼ (a : b). Then x · b = a · y, thus b · x = y · a, and therefore (a : b) ∼ (x : y). We will henceforth not be so careful about the order of multiplication.

Transitivity: Suppose that (x : y) ∼ (a : b) and (a : b) ∼ (c : d). Then we have the two equations xb = ay and ad = bc. Multiplying the first equation by c gives xbc = ayc. Using the second equation gives xad = ayc. Using the cancellation law for multiplication gives xd = yc. This implies that (x : y) ∼ (c : d). □
Now that we have an equivalence relation we can start thinking about equivalence classes, such as [(1 : 1)]. In fact (a : b) ∈ [(1 : 1)] exactly when a = b.

[Explain what’s bad about a/b + c/d = (a + c)/(b + d).]
We now present the proportion-invariant operations of addition and multiplication on R+/∼.

Definition. Addition is defined via (a : b) + (c : d) = (ad + bc : bd) and multiplication via (a : b) · (c : d) = (ac : bd). Check that these operations are “∼-invariant”. Thus they give operations on R+/∼.

Definition. The set R+/∼, with the above addition and multiplication laws, is called Q+. We write a/b for the equivalence class [(a : b)].

Thus we have the familiar rules

    a/b + c/d = (ad + bc)/(bd)   and   (a/b) · (c/d) = (ac)/(bd).

One should check that Associativity, Commutativity, and Distributivity of Addition and Multiplication are satisfied. They follow from the corresponding properties of N. [Include proof of associativity.]
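These definitions are easy to experiment with. Here is a small Python sketch (the helper names are ours, not the text’s) modeling ratios as ordered pairs and spot-checking ∼-invariance of the operations in one instance:

```python
# A positive ratio (a : b) is modeled as an ordered pair of positive
# integers.  The helper names below are ours, not the text's.

def proportional(r, s):
    """(x : y) ~ (a : b) exactly when x*b == a*y."""
    (x, y), (a, b) = r, s
    return x * b == a * y

def add(r, s):
    """(a : b) + (c : d) = (ad + bc : bd)."""
    (a, b), (c, d) = r, s
    return (a * d + b * c, b * d)

def mul(r, s):
    """(a : b) . (c : d) = (ac : bd)."""
    (a, b), (c, d) = r, s
    return (a * c, b * d)

# Spot-check ~-invariance in one instance: (1 : 2) ~ (2 : 4), so the sums
# and products with (1 : 3) should land in the same equivalence class.
assert proportional((1, 2), (2, 4))
assert proportional(add((1, 2), (1, 3)), add((2, 4), (1, 3)))
assert proportional(mul((1, 2), (1, 3)), mul((2, 4), (1, 3)))
```

Of course a single instance is only evidence; Exercise (1) below asks for the general proof.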
2.2. Relationship with N
We would like to identify certain fractions as being integers. Here is how one formally makes this identification. Define a function ι : N → Q+ via ι(n) = n/1. (We want to identify n with n/1.) First note that if ι(m) = ι(n), then m = n, so we don’t lose any information by applying ι. Next, check that ι(m + n) = ι(m) + ι(n) and ι(mn) = ι(m) · ι(n). Thus the identification “preserves addition and multiplication”. Another way of viewing this is to say that the addition and multiplication on Q+ “extend” the addition and multiplication on N. In summary, via ι, we may view Q+ as an extension of N.
Note for instance, that a/b ∈ N exactly when b | a.
[Division.]

Definition. Let p, q ∈ N. A ratio (p : q) is called reduced if p and q are relatively prime.

Proposition 5.4. Every ratio is proportional to a reduced ratio.

Proof. Let a, b ∈ N. Let d = gcd(a, b). It follows from Exercise 9 in Section 9.3 that gcd(a ÷ d, b ÷ d) = 1. It is easy to see that (a : b) ∼ (a ÷ d : b ÷ d). □
2.3. Exercises
For the first four exercises, use the equivalence class definitions.

(1) Check that the operations of addition and multiplication above are indeed ∼-invariant.
(2) In this exercise we will use the first quadrant of the Cartesian plane to “plot” sets of ratios. To a ratio (a : b) associate the point (a, b).
    (a) Plot the set of ratios equivalent to the ratio (1 : 2). It should be an infinite sequence of collinear points, and determines a line. What is its slope?
    (b) Plot the set of ratios which are of the form (1 : 2) · (a : b), with a, b ∈ N.
    (c) Plot the set of ratios which are of the form (2 : 4) · (a : b), with a, b ∈ N.
    (d) Plot the set of ratios which are of the form (1 : 2) + (a : b), with a, b ∈ N.
(3) Prove that if two ratios in R+ are proportional and reduced, then they must be equal. (Note we are only using positive numbers.)
(4) Does the distributive law hold in Q+? Does it hold in the set R+ of positive ratios? Give proofs or counterexamples.

The rest of the exercises do not focus on the equivalence class definitions.

(5) (Improper Fractions) Let m, n ∈ N. Prove that there are numbers a, b ∈ N with b < n so that
    m/n = a + b/n.
(6) Let p be a prime number, and a, n ∈ N. Prove that there are whole numbers b, a_1, . . . , a_n with
    a/p^n = b + a_1/p + a_2/p^2 + · · · + a_n/p^n,
and each a_i < p.
(7) Suppose a, b_1, b_2 ∈ Q+ satisfy a < b_1 + b_2. Prove that there are a_1, a_2 ∈ Q+ so that a = a_1 + a_2, a_1 < b_1, and a_2 < b_2.
(8) Suppose a, b_1, b_2 ∈ Q+ satisfy a < b_1 b_2. Prove that there are a_1, a_2 ∈ Q+ so that a = a_1 a_2, a_1 < b_1, and a_2 < b_2. (See Exercise 13 in Section 9.7.)
CHAPTER 6
Rings
1. Abstract Algebra
You probably learned a lot of algebra in school. You learned how to solve equations like ax + b = c, and quadratic equations; you learned how to combine terms, that you should add exponents when multiplying powers; you learned to solve systems of equations, et cetera. If you learned about imaginary and complex numbers, you didn’t have to relearn those rules or develop much more algebraic intuition. You had to learn the rule i² = −1 and how to rationalize complex denominators, but you could still use all the skill from before.
If you’ve had some linear algebra, you know that square matrices of the same size can be treated much like numbers. They can be added, multiplied, raised to powers, and you can often solve an equation of matrices AX = B by multiplying both sides by A⁻¹. Again, much of what is true about the algebra of numbers is also true for matrices. Of course, much of the training of linear algebra is to be cautious with your intuition, since most of the time it is not true that AB = BA. You have to distill out which of your intuition comes from commutativity and which is independent of it. But you still want that overall algebraic intuition.

In working with modular arithmetic, you can use a great deal of the intuition from those high school algebra days. As we saw, you can still add −b to both sides of an equation to cancel a b. You can often divide both sides of an equation by a. If the modulus is odd, the quadratic equation still works basically the same way.
There are tons of these different algebra systems in mathematics, and we’re going to focus for this chapter on one type of algebra system called a ring. Examples of rings we have seen so far include Z, Q, R, and all of the Z/nZs for n > 1. Philosophically, knowing something is a ring means you can transfer a certain amount of algebraic intuition to studying it. More practically, you can prove lots of results in the context of abstract ring theory, and they will automatically be true for not only every ring you’ve ever met, but every ring you’ll meet in the future.

In the next section we introduce the concept of a ring, and study the abstract idea of divisibility. In particular, when can the product of two things be 0 or 1? A field is a special kind of ring, when everything is divisible by everything else (except 0). We will discover ways to produce new rings from old. For instance, given a ring R, you can talk about the ring of matrices M_n(R) whose entries are in R, and the ring of polynomials R[x] whose coefficients are in R.

One of the greatest analogies in mathematics is that between the integers Z and the ring of polynomials F[x] where F is a field. The most important themes of Chapter 1 carry over, in particular the Fundamental Theorem of Arithmetic, in the sense that any nonzero polynomial factors into a product of irreducibles in essentially one way. I hope you will find many other similarities as well.

Finally we demonstrate how to “mod out” in the general context of a ring. This is an interesting way to create new rings satisfying (almost) whatever relations you like. For instance we can force a ring to have an element x satisfying x² = −1; this leads to the complex numbers.
2. Rings

2.1. Definition
Intuitively a ring is a place where you can add, subtract, and multiply. You can’t necessarily divide. That’s what you tell your friends. Of course, you need to define what you mean by “add, subtract, and multiply”, though the words are suggestive.

Definition. A ring is a set R with two operations + and · satisfying the following axioms.
(1) (Associativity of Addition) For a, b, c ∈ R, (a + b) + c = a + (b + c).
(2) (Commutativity of Addition) For a, b ∈ R, a + b = b + a.
(3) (Additive Identity) There is an element 0R ∈ R so that for all a ∈ R, a + 0R = a.
(4) (Additive Inverses) For a ∈ R there is an element −a ∈ R so that a + (−a) = 0R.
(5) (Associativity of Multiplication) For a, b, c ∈ R, (a · b) · c = a · (b · c).
(6) (Multiplicative Identity) There is an element 1R ∈ R so that for all a ∈ R, 1R · a = a · 1R = a.
(7) (Distributivity) For a, b, c ∈ R, a · (b + c) = a · b + a · c and (a + b) · c = a · c + b · c.
(8) (Nontriviality) 0R ≠ 1R.

So for instance, 0Z/nZ = 0 and 0Q = 0/1.

Definition. A ring R is said to be commutative provided that it also satisfies
• (Commutativity of Multiplication) For a, b ∈ R, a · b = b · a.

Most of our rings will be commutative; rings of matrices will be the main examples of noncommutative rings.

The first four axioms give the fundamentals of addition and subtraction. They imply you can always solve the x + a = b problem for x: If a, x, b ∈ R and x + a = b, then adding −a to the right of both sides yields

    (x + a) + (−a) = b + (−a)
    x + (a + (−a)) = b + (−a)
    x + 0R = b + (−a)
    x = b + (−a).

Above we have used associativity, the property of additive inverses, and the property of the additive identity. That proof reminds me to make a definition:
Definition. Let a, b ∈ R. Then a − b is defined as a + (−b).

So we have our first result in pure ring theory:

Proposition 6.1. If a, b, x ∈ R and a + x = b, then x = b − a.
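Proposition 6.1 can be watched in action in a concrete ring. Here is a throwaway Python model of Z/12Z (function names ours, not the text’s), checking that x = b − a, i.e. b + (−a), solves a + x = b:

```python
# A throwaway model of the ring Z/12Z; function names are ours.
N = 12

def add(a, b):
    return (a + b) % N

def neg(a):
    return (-a) % N

# Proposition 6.1: the solution of a + x = b is x = b - a, i.e. b + (-a).
a, b = 7, 3
x = add(b, neg(a))
assert add(a, x) == b and add(x, a) == b
```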
I remark that we won’t generally be so pedantic about the use of parentheses or even mention that we’re using associativity of addition and multiplication; we did enough of that with the Peano Arithmetic. Also when convenient we will drop the dot in a · b and write ab instead.
Note we have not included an axiom for a multiplicative inverse. This is intentional, and part of what makes ring theory interesting. More on that later. Here is another example of a proposition in pure ring theory.

Proposition 6.2. Let x ∈ R. Then 0R · x = 0R.
Proof. Since 0R is the additive identity, x + 0R = x. Multiplying this equation by x yields x · x + 0R · x = x · x. By the previous proposition, 0R · x = x · x − x · x = 0R, as desired. □
I am now reminded to make another definition.

Definition. Let R be a ring, a ∈ R and n ∈ N. Then

    a^n = a          if n = 1,
    a^n = a · a^m    if n = m + 1.
The axiom of nontriviality really only excludes the trivial “ring”. This is because if 0R = 1R, then for x ∈ R, we have x = 1R · x = 0R · x = 0R by the above Proposition, so every element is 0R. By the way, this is what we would get if we considered Z/nZ for n = 1.
Here are some more ring facts which the eager reader may enjoy proving.
(1) For a ∈ R, −(−a) = a.
(2) For a ∈ R, −a = (−1R)a.
(3) For a, b ∈ R, ab = (−a)(−b).
(4) a^m a^n = a^{m+n} for m, n ∈ N.
(5) If R is commutative, then (ab)^n = a^n b^n.
(6) (a^m)^n = a^{mn}.
Remark: There are a couple different conventions about what axioms a ring should have. First, some authors do admit the trivial “ring”. Secondly it is sometimes interesting to study “rings” which don’t have a multiplicative identity, like the even integers. We will not pursue this.

2.2. Divisibility

In this section and the next, R is a commutative ring.
Definition. Let a, b ∈ R. We say a divides b, or a | b, provided that there is a c ∈ R so that ac = b.
Basic properties of divisibility from Z carry over to any ring, for the same reasons.

Proposition 6.3. Let a, b, c, x, y ∈ R. Then:
• If a | b and b | c, then a | c.
• If a | b, then ac | bc.
• If a | b, then a | bx.
• If a | b and a | c, then a | bx + cy.
• a | 0R.

Proof. Left to the reader. □
Definition. Let a ∈ R. Then a is a unit provided that a | 1R. In other words, a is a unit if there is a b ∈ R so that ab = 1R.

Definition. The element b in the above situation is called the inverse of a, and we write b = a⁻¹.
Proposition 6.4. Let x, y, u ∈ R with u a unit. Then x | y ⇔ x | uy ⇔ xu | y.

Proof. We prove one direction, saving the others for the reader. Suppose that x | uy. Then there is an element c ∈ R so that xc = uy. Multiply both sides by u⁻¹. This gives x(cu⁻¹) = y, and therefore x | y. □
Units are invaluable in solving the ax = b problem, as you may recall from our study of this problem for modular arithmetic. If a is a unit, then the solution to the problem is x = a⁻¹ b.

The only units of Z are {1, −1}. Every nonzero element of Q and R is a unit. The units of Z/nZ are the congruence classes a where a and n are relatively prime. Note that 1R and −1R are units in any ring. Since 0R · x = 0R for all x, 0R is never a unit. Sometimes it is the only nonunit of a ring, and such rings have a special name.

Definition. A ring R is a field if every nonzero element of R is a unit.
Thus Q and R are fields. It is a good exercise to think through the following:

Proposition 6.5. The ring Z/nZ is a field if and only if n is prime.
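Proposition 6.5 invites a computational spot-check. The sketch below (function names ours) uses the characterization of units of Z/nZ via gcd and confirms the proposition for small n; of course a finite check is evidence, not a proof:

```python
from math import gcd

def units(n):
    """The units of Z/nZ are the classes a with gcd(a, n) = 1."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def is_field(n):
    """Z/nZ is a field exactly when every nonzero class is a unit."""
    return len(units(n)) == n - 1

# A finite spot-check of Proposition 6.5 (evidence, not a proof):
for n in range(2, 30):
    n_is_prime = all(n % d != 0 for d in range(2, n))
    assert is_field(n) == n_is_prime
```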
The ring Z is not a field. The number 2 does not have an inverse in Z. It doesn’t matter that in the bigger ring Q it has an inverse.

Definition. Let R be a ring. Say that an element a ∈ R is associate to b provided that a | b and b | a.
− ∈
For example in Z, 3 and 3 are associate. Note this is an equivalence relation on elements of R. If a R and u is a unit in R, then au is associate to a. Thus 2 and 2 = 4 Z/6Z are associate.
−
∈
2.3. Zero Divisors
Let’s recall how to solve an equation like x² − 5x + 6 = 0 in algebra. The normal thing to do is to factor it into (x − 2)(x − 3) = 0 and then argue that either x − 2 = 0 or x − 3 = 0. One then concludes that the only solutions are x = 2 or x = 3.

The most interesting step here is the one that comes after the factoring. Why, exactly, is it that if two numbers multiply to 0 then one of them must be 0? In the ring Z/10Z, for instance, the elements 5 and 6 multiply to 0, though neither of them is itself 0 = 0Z/10Z. And note that in Z/10Z, the congruence class 8 is another solution to x² − 5x + 6 = 0, in addition to 2 and 3. Are there any others?...
We need to keep track of when nonzero elements can multiply to be 0R.

Definition. A nonzero element x of a ring R is a zero divisor if there is a nonzero element y ∈ R so that xy = 0R.

Thus an element is a zero divisor if it is part of a “nontrivial” factorization of 0R. Note that 0R itself is not considered a zero divisor. Check that in Z/10Z the zero divisors are 2, 4, 5, 6, and 8. How do we know the other elements aren’t zero divisors?

Proposition 6.6. Let R be a ring, and u a unit of R. Then u is not a zero divisor of R.

Proof. Suppose uy = 0R. Multiplying both sides by u⁻¹ yields y = 0R. □
For the general ring Z/nZ, we have an easy characterization:

Proposition 6.7. Let n > 1 and 0 < a < n. Then a is a zero divisor of Z/nZ if and only if a is not relatively prime to n.

Proof. If a is relatively prime to n then a is a unit, thus not a zero divisor by the above. On the other hand, say d = gcd(a, n) > 1. Let e = n ÷ d < n. Then since a is a multiple of d, ae is a multiple of n, so ae = 0. But neither a nor e is 0. Thus a is a zero divisor. □
Here is a practical way to think about zero divisors. Suppose a is nonzero and not a zero divisor, and ab = ac. Then a(b − c) = 0. This would be a nontrivial factorization of 0 unless b − c = 0, so b = c. So even though we didn’t use an inverse of a, since it was not a zero divisor, we could cancel it from both sides of the equation.

Rings without zero divisors are important.
Definition. A ring is called an integral domain if it does not have any zero divisors.

Since units and 0R are never zero divisors, any field is an integral domain. The ring Z is an integral domain, but is not a field. So the converse does not hold.

Proposition 6.8. Let n > 1. The ring Z/nZ is an integral domain if and only if n is prime.

The reader should think this through. Remark: A finite ring is an integral domain if and only if it is a field. Can you prove it?
2.4. Products of Rings
Here is a way to combine two rings.

Definition. Let R and S be rings. Write R × S for the set of pairs {(r, s) | r ∈ R, s ∈ S}, with addition and multiplication on R × S defined “componentwise” via (r1, s1) + (r2, s2) = (r1 + r2, s1 + s2) and (r1, s1) · (r2, s2) = (r1 · r2, s1 · s2).

For example, (Z/2Z) × (Z/2Z) has four elements, which we will denote by 0 = (0, 0), e1 = (1, 0), e2 = (0, 1), and 1 = (1, 1). Its addition/multiplication tables are: [Make tables.]

This is a different ring than Z/4Z! You can tell, because every element of (Z/2Z) × (Z/2Z) added to itself is 0, whereas this is not the case for the ring Z/4Z. Later we will study more systematically what it means for two rings to be “different”.

What are the units and zero divisors of (Z/2Z) × (Z/2Z)?

In fact in general R × S is a ring. [Check distributivity, as an example.] Its additive identity is 0_{R×S} = (0R, 0S) and its multiplicative identity is 1_{R×S} = (1R, 1S). The negative of (r, s) is (−r, −s).

In general, is R × S ever a field? An integral domain?
2.5. Rings of Functions
Here is another interesting way to derive new rings from old.

Definition. Let R be a ring, and X a set. Write F(X, R) for the set of functions from X to R, with addition and multiplication defined “pointwise” as follows. If f and g are two functions with domain X and values in R, then f + g and f · g are defined via (f + g)(x) = f(x) + g(x) and (f · g)(x) = f(x)g(x) for every point x of X.

F(X, R) is in fact a ring. [Check a ring property.] Its additive identity 0_{F(X,R)} is the constant function f_0 whose value at every point of X is 0R. In other words f_0(x) = 0 for all x. Its multiplicative identity 1_{F(X,R)} is the constant function f_1 whose value at every point of X is 1R. In other words f_1(x) = 1 for all x. The negative of a function f is the function defined by (−f)(x) = −(f(x)).
Let’s take X and R to both be the real numbers R. Then F(R, R) is just the set of real-valued functions with domain R.

The function f(x) = x² + 1 is a unit because it has an inverse g(x) = 1/(x² + 1) so that for all x, f(x)g(x) = 1. Thus fg = f_1. The function f(x) = x is not a unit because if g ∈ F(R, R) is any function, f(0)g(0) = 0, so fg cannot be equal to f_1.

What are the units of F(R, R)?

The functions

    f(x) = 0 if x ≤ 0,   f(x) = 1 if x > 0,

and

    g(x) = 1 if x ≤ 0,   g(x) = 0 if x > 0,

satisfy fg = f_0, though neither f nor g is f_0. So f and g are zero divisors.

What are the zero divisors of F(R, R)?

You should compare the rings R × R and F(X, R) if R is a ring and X is a set with two elements.
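The zero-divisor example above is easy to model. A Python sketch (ours), with Python functions standing in for elements of F(R, R):

```python
# The two step functions above, modeled in Python (names ours).
def f(x):
    return 0 if x <= 0 else 1

def g(x):
    return 1 if x <= 0 else 0

def pointwise_mul(u, v):
    """(u . v)(x) = u(x) v(x), the multiplication of F(X, R)."""
    return lambda x: u(x) * v(x)

h = pointwise_mul(f, g)
# f and g are each nonzero, yet their product is the zero function f_0.
assert all(h(x) == 0 for x in (-2.0, -1, 0, 0.5, 3))
assert f(1) != 0 and g(-1) != 0
```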
2.6. Subrings
Checking all the ring axioms is a little tiresome. Luckily there is a way to generate rings as subsets of other rings. For example we will say that Z is a subring of Q. But you can’t take any subset. For instance N is a subset of Z but doesn’t have a zero, or any negatives inside N. As another example, the subset 1, 0, 1 isn’t a subring for the basic reason that the operation of addition from Z takes you outside 1, 0, 1 so it isn’t a ring in its own right. Here is the definition of a subring.
{−
{−
}
}
⊆ ∈
Let R be a ring, and S R a subset of R. Then S is called a subring of R provided that the following conditions are satisfied: (1) If s 1 , s2 S , then s1 + s2 S . (2) If s 1 , s2 S , then s1 s2 and s 2 s1 S . (3) If s S , then s S . (4) 1R S . Definition.
∈ ∈
∈ ∈
− ∈
·
· ∈
In other words S is closed under addition, multiplication, subtraction, and contains 1. Note that if S is a subring then 1R
∈ S, so is −1R and thus 1R + (−1R) = 0R ∈ S .
If S is a subring then it becomes a ring itself under the operations already defined in R. The first two axioms just say those operations on S don’t go outside of S. You don’t need to check associativity, commutativity, or distributivity because they’re true in R and in particular for elements of S.

Examples: Z is a subring of Q, which is itself a subring of R. The set C(R, R) of continuous real-valued functions on R is a subring of F(R, R). The subset of differentiable real-valued functions is a subring of C(R, R), and the subset of polynomial functions is a subring of that. The following proposition is pretty easy.
Proposition 6.9. If S is a subring of R and T is a subring of S, then T is a subring of R.

Thus Z is a subring of R, etc.

Proposition 6.10. Let R be a ring. Then the “diagonal” ∆R = {(r, r) | r ∈ R} is a subring of R × R.

Proof. Let (a, a) and (b, b) be in ∆R. Then (a, a) + (b, b) = (a + b, a + b) ∈ ∆R, (a, a) · (b, b) = (ab, ab) ∈ ∆R, and −(a, a) = (−a, −a) ∈ ∆R, which shows that ∆R is closed under addition, multiplication, and negation. Moreover 1_{R×R} = (1R, 1R) ∈ ∆R. □
A subring of an integral domain is an integral domain. You should think through why that is so. But the converse is not true, as you can see from Proposition 6.10 with R an integral domain. A subring of a field is not necessarily a field, as the example of Z ⊂ Q shows.
2.7. Exercises
(1) Prove Propositions 6.3 and 6.4.
(2) (a) Let R be a commutative ring. Prove that “is associate to” is an equivalence relation on R. (Go back and read the boxed definition of “a is associate to b”.)
    (b) Let R = Z/12Z. Write out the equivalence classes for the above relation.
(3) Let R be “3-space” R³, with addition defined as usual: (x, y, z) + (x′, y′, z′) = (x + x′, y + y′, z + z′). Define multiplication via the cross-product: (x, y, z) · (x′, y′, z′) = (yz′ − y′z, x′z − xz′, xy′ − x′y). Which ring axioms does R satisfy? Be sure to specify what the additive and multiplicative identities are, if they exist.
(4) The following addition and multiplication tables describe a ring R with four elements. Which elements of R are units? Which are zero divisors?

     +  | ♠  ♥  ♦  ♣          ×  | ♠  ♥  ♦  ♣
     ♠  | ♦  ♣  ♠  ♥          ♠  | ♦  ♠  ♦  ♠
     ♥  | ♣  ♦  ♥  ♠          ♥  | ♠  ♥  ♦  ♣
     ♦  | ♠  ♥  ♦  ♣          ♦  | ♦  ♦  ♦  ♦
     ♣  | ♥  ♠  ♣  ♦          ♣  | ♠  ♣  ♦  ♥

(5) Show that if u, v are units in R then uv is a unit in R.
(6) Let R be an integral domain. Prove that if a | b and b | a, then there is a unit u ∈ R so that b = au.
(7) Show the previous exercise may be false if R is not an integral domain.
(8) Let R be a commutative ring. Say an element x ∈ R is nilpotent if there is an n ∈ N so that x^n = 0. Prove that the sum of two nilpotent elements is nilpotent.
(9) Let X be a set. Let R be the set of subsets of X, with the following addition and multiplication laws. For A, B ∈ R, define multiplication via A · B = A ∩ B. Define addition via
    A + B = (A − B) ∪ (B − A).
Here A − B = {a ∈ A | a ∉ B} denotes the set of elements in A which are not in B. Check that R is a ring under these operations. Be sure to specify what the additive and multiplicative identities are. Write out the multiplication and addition tables in the case where X has two elements.
(10) Let R = Z × Z, with addition and multiplication defined “componentwise”, i.e. (a, b) + (c, d) = (a + c, b + d) and (a, b) · (c, d) = (ac, bd). Determine the units and zero divisors of R, and show your reasoning.
(11) Explain why the subset of real numbers with terminating decimal expansions is a subring of R. What are the units?
(12) Let p be a prime. Let Z_(p) = {x ∈ Q | ord_p(x) ≥ 0}. Thus Z_(p) is the set of fractions with no “p’s in the denominator”. Check that it is in fact a subring of Q, using properties of ord_p. What are the units? Is it a field?
(13) Let p be a prime. Let Z[1/p] = {x ∈ Q | ord_q(x) ≥ 0 for all primes q ≠ p}. Thus Z[1/p] is the set of fractions with “only p’s in the denominator”. For instance 3/25 is in Z[1/5] but not in Z[1/3]. Check that it is also a subring of Q. What are the units? Is it a field?
(14) Suppose 1R + 1R = 2R is a unit in R. Prove that the equation x² + bx + c = 0 has a solution in R if and only if b² − 4c is the square of an element in R.

For the next three problems, let R = F(R, R) be the ring of all functions from R to R as above and S the subring of continuous functions.

(15) Prove that every function in R is either zero, a unit, or a zero divisor in R.
(16) Find a function in S that is neither zero, a unit, nor a zero divisor in S.
(17) Find three different solutions to the equation f² + f = 6 in R. How many are there in S?
3. Abstract Linear Algebra

3.1. Definition of M_n(R)

Let R be a commutative ring, and n ∈ N. In this section we will define a new ring M_n(R) of n × n matrices with entries in R, and develop its properties. We hope the reader is familiar with basic linear algebra.
An element X ∈ M_n(R) is an n × n array of elements of R, i.e.,

    X = ( a11 a12 · · · a1n )
        ( a21 a22 · · · a2n )
        ( ...               )
        ( an1 an2 · · · ann )

It is called a matrix, and the element in the ith row and jth column is called the (i, j)th entry of X.

The addition rule for n × n matrices X and Y is defined as follows. If the (i, j)th entry of X is a_ij and the (i, j)th entry of Y is b_ij, then the (i, j)th entry of X + Y is a_ij + b_ij. Since addition in R is commutative and associative, so is addition in M_n(R).

The zero element of M_n(R) is the n × n matrix all of whose entries are 0R; thus

    0_{M_n(R)} = ( 0R 0R · · · 0R )
                 ( 0R 0R · · · 0R )
                 ( ...            )
                 ( 0R 0R · · · 0R )

The negative of a matrix X ∈ M_n(R) is the n × n matrix whose entries are the negatives of the entries of X; thus

    −X = ( −a11 −a12 · · · −a1n )
         ( −a21 −a22 · · · −a2n )
         ( ...                  )
         ( −an1 −an2 · · · −ann )

Certainly X + (−X) = 0_{M_n(R)}.
Write R^n for n-tuples of elements of R. They are called vectors, and the elements are called components. Thus a typical vector v ∈ R^n can be written v = (a1, . . . , an).

A given row or column of a matrix forms a vector as usual; for instance the second column of X above is the vector (a12, a22, . . . , an2).

If i ∈ N is not larger than n, write e_i for the vector whose ith component is 1R and whose other components are 0R. It is called the ith standard basis vector. Addition of vectors is performed componentwise in the usual way. We require the notion of scalar multiplication. If a ∈ R and v = (a1, . . . , an) then we will write av = (aa1, . . . , aan).

Note that if v = (a1, . . . , an) then v = a1 e1 + · · · + an en.
Definition. If v = (a1, . . . , an) and w = (b1, . . . , bn) are two vectors in R^n, then the dot product of v and w is given by v · w = a1 b1 + · · · + an bn. This is an element of R.

Let u, v, w ∈ R^n, and a ∈ R. The following properties are easy to check:
• u · (v + w) = u · v + u · w.
• (u + v) · w = u · w + v · w.
• v · e_i = e_i · v is the ith component of v.
• v · (aw) = a(v · w).
• u · v = v · u.

Definition (MM1). If X ∈ M_n(R) and v ∈ R^n then the product Xv is defined to be the vector in R^n whose ith component is the dot product of the ith row of X with v.

Let X, Y ∈ M_n(R), a ∈ R, and v, w ∈ R^n. The following properties follow from the above properties of the dot product:
• (X + Y)v = Xv + Yv.
• (∗) Xe_i is the ith column of X.
• X(v + w) = Xv + Xw.
• X(av) = a(Xv).

The property (∗) belongs to any linear algebraist’s toolkit, and is worth meditating on. It implies that X is determined by its multiplications against the standard basis. In particular,

    X = ( Xe1 | Xe2 | · · · | Xen ).
The last two properties can be iterated to show that if v = (a1, . . . , an) = a1 e1 + · · · + an en, then Xv = a1 (Xe1) + · · · + an (Xen). In other words, if

    X = ( v1 | v2 | · · · | vn ),

then Xv = a1 v1 + a2 v2 + · · · + an vn.

Definition (MM2). Now we define matrix multiplication. If X, Y ∈ M_n(R) then the product XY is defined to be the n × n matrix whose jth column is the product of X with the jth column of Y.
Thus, if w1, . . . , wn are the column vectors of Y, then

    X · ( w1 | w2 | · · · | wn ) = ( Xw1 | Xw2 | · · · | Xwn ).

Let X, Y, Z ∈ M_n(R). Write I = 1_{M_n(R)} for the n × n “identity” matrix; this is the square matrix whose diagonal elements a_ii are all 1R and whose other elements are 0R. Thus,

    1_{M_n(R)} = ( 1R 0R · · · 0R )
                 ( 0R 1R · · · 0R )
                 ( ...            )
                 ( 0R 0R · · · 1R )

Note that the ith row is e_i, as is the ith column. Since R is nontrivial, I ≠ 0_{M_n(R)}. The following properties follow from the above properties of matrix-vector multiplication:
• (X + Y)Z = XZ + YZ.
• X(Y + Z) = XY + XZ.
• XI = IX = X.

Note that since we have defined matrix multiplication in terms of columns, we can translate (∗) into X(Ye_i) = (XY)e_i. The RHS tells you what the ith column of XY should be, given the ith column, Ye_i, of Y. Associativity of multiplication sprouts out of this.

Proposition 6.11. If X, Y ∈ M_n(R) and v ∈ R^n then X(Yv) = (XY)v.

Proof. Let v = (a1, . . . , an) = a1 e1 + · · · + an en. The properties we have developed thus far show that

    X(Yv) = X(Y(a1 e1 + · · · + an en))
          = X(a1 Y(e1) + · · · + an Y(en))
          = a1 X(Ye1) + · · · + an X(Yen)
          = a1 (XY)e1 + · · · + an (XY)en
          = (XY)(a1 e1 + · · · + an en)
          = (XY)v. □
Theorem 6.12. If X, Y, Z ∈ M_n(R) then (XY)Z = X(YZ).

Proof. It is enough to show that the columns of the LHS and RHS are the same. The jth column of the LHS is (XY)z_j, where z_j is the jth column of Z. The jth column of the RHS is X(Yz_j). By the previous proposition we are done. □
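The column-by-column definitions (MM1) and (MM2) translate directly into code. Here is a Python sketch (helper names ours) over Z/10Z, with a randomized spot-check of Theorem 6.12:

```python
import random

M = 10  # work over R = Z/10Z

def mat_vec(X, v):
    """ith entry of Xv = dot product of the ith row of X with v (MM1)."""
    return [sum(x * a for x, a in zip(row, v)) % M for row in X]

def col(Y, j):
    return [row[j] for row in Y]

def mat_mul(X, Y):
    """jth column of XY = X times the jth column of Y (MM2)."""
    n = len(X)
    cols = [mat_vec(X, col(Y, j)) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def rand_mat(n):
    return [[random.randrange(M) for _ in range(n)] for _ in range(n)]

random.seed(1)
X, Y, Z = rand_mat(3), rand_mat(3), rand_mat(3)
# Theorem 6.12: (XY)Z = X(YZ).
assert mat_mul(mat_mul(X, Y), Z) == mat_mul(X, mat_mul(Y, Z))
```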
3.2. Noncommutative Rings

By the previous section we have our first example M_n(R) of a noncommutative ring. We know M_2(R), for example, is never commutative, because

    ( 1 0 ) ( 0 1 )   ( 0 1 )     ( 0 0 )   ( 0 1 ) ( 1 0 )
    ( 0 0 ) ( 0 0 ) = ( 0 0 )  ≠  ( 0 0 ) = ( 0 0 ) ( 0 0 ).

Note that we are finally relaxing the subscripts of 1R and 0R.

I would like to make some general remarks now about the theory of divisibility, units, and zero divisors in noncommutative rings. What should we mean by a | b in a noncommutative ring R? It makes a difference whether you say there exists a c ∈ R so that ac = b or so that ca = b! Consider the following example, with R = M_2(R). Let

    A = ( 1 0 )   and   B = ( 1 2 )
        ( 0 0 )             ( 0 0 ).

Does A | B? In fact, if we put C = B then you’ll see that AC = B but CA ≠ B. In fact, a moment’s computation will show you that there is no C ∈ M_2(R) with CA = B! This motivates the following definition.

Definition. Let R be a ring, and a, b ∈ R. Then a is a left divisor of b provided that there is a c ∈ R so that ac = b. We say a is a right divisor of b provided that there is a c ∈ R so that ca = b.

Thus in the above example, A is a left divisor of B but not a right divisor of B. I would propose the notations a |_ℓ b and a |_r b, but we shall not have much occasion to use this.
What about units?

Definition. Let R be a ring, and a ∈ R. Then a is a unit of R provided that it is both a right divisor and a left divisor of 1R.

This left/right nuance for units doesn’t get noticed in linear algebra.

Fact 6.13. Let F be a field. Then a matrix A ∈ M_n(F) is a left divisor of I if and only if it is a right divisor of I.

As an exercise, see how your linear algebra textbook covers this. (Anyone know about the case of M_n(R) for general R? Drop me a line.)

Units in matrix rings play an important role in linear algebra, but they usually are called something else.

Definition. Let F be a field, and n ∈ N. Then a matrix A ∈ M_n(F) is called invertible or nonsingular provided that it is a unit in M_n(F).

Zero divisors split up into left ones and right ones.
∈
3. ABSTRACT LINEAR ALGEBRA
∈
141
Let R be a ring, and a R. Then a is a left zero divisor provided that there is a nonzero element b R so that ab = 0R . We say a is a right zero divisor provided that there is a nonzero element b R so that ba = 0R . Definition.
∈
∈
0 1 0 0 2 right zero divisor since A = 0M 2 (R) . For example, the matrix A =
∈
M 2 (R) for any ring R is both a left and
Again this left/right nuance doesn’t happen in a linear algebra class. Fact 6.14.
Let F be a field. Then a matrix A only if it is a right zero divisor.
∈ M n(F ) is a left zero divisor if and
Can you prove it? (Anyone know about the case of M_n(R) for general R? Drop me a line.) Facts 6.13 and 6.14 are not true for general noncommutative rings, but the examples are a little heavy. Here is a sketch of an example. Consider a real vector space V with an infinite basis of the form {e_1, e_2, ...}. The set R of linear transformations L : V → V forms a ring, under pointwise addition and composition. Consider the linear transformations λ, ρ, and ζ defined by

λ(e_1) = 0, λ(e_2) = e_1, λ(e_3) = e_2, ...
ρ(e_1) = e_2, ρ(e_2) = e_3, ρ(e_3) = e_4, ...
ζ(e_1) = e_1, ζ(e_2) = 0, ζ(e_3) = 0, ....

In other words, λ moves the basis vectors to the "left", ρ moves them to the "right", and ζ sends all the basis vectors but e_1 to 0. Then the reader may check that λρ = 1_R, the identity transformation, but since λ(e_1) = 0, there is no linear transformation τ with τλ = 1_R. Thus λ is a "left unit" but not a "right unit". Similarly, ρ is a "right unit" but not a "left unit". Finally note that λζ = 0_R, so λ is a left zero divisor. If there were a nonzero σ ∈ R with σλ = 0_R then, composing both sides of λρ = 1_R with σ, we would have σ = σ(λρ) = (σλ)ρ = 0_R, a contradiction; so λ is not a right zero divisor.
3.3. The 2 × 2 Case

We can save ourselves some headache if we focus on the 2 × 2 case, which is plenty big enough for our purposes. For every matrix X ∈ M_2(R) there is an associated element det(X) ∈ R, given by

det( a b ; c d ) = ad − bc.

Here are some properties of the determinant, which can be checked directly.

• If X, Y ∈ M_2(R), then det(XY) = det(X) det(Y).
• det(I) = 1.
• If a column of X is 0, then det(X) = 0.
The following formula is key to understanding the ring theory of M_2(R):

( a b ; c d ) ( d −b ; −c a ) = ( d −b ; −c a ) ( a b ; c d ) = ( ad − bc 0 ; 0 ad − bc ) = (ad − bc) I.

Proposition 6.15. Let X ∈ M_2(R).

(1) X is a unit in M_2(R) if and only if det(X) is a unit in R.
(2) X is zero or a zero divisor in M_2(R) if and only if det(X) is zero or a zero divisor in R.

Proof. Let X = ( a b ; c d ), and Y = ( d −b ; −c a ). The main formula is XY = YX = det(X) I.

(1) If X is a unit in M_2(R), then there is an element A ∈ M_2(R) so that XA = I. Taking determinants of both sides shows that det(X) det(A) = 1, thus det(X) is a unit in R.

Conversely, if det(X) is a unit, then there is an element r ∈ R so that det(X)r = 1. Then the main formula shows that rY = ( dr −br ; −cr ar ) is inverse to X.

(2) If X = 0 then det(X) = 0.

If X is a zero divisor in M_2(R), then there is a nonzero matrix A ∈ M_2(R) so that XA = 0. Let v = (x, y) be a nonzero column of A. By (MM2) we know that Xv = 0. Suppose x ≠ 0 (the case y ≠ 0 is similar). Let Z be the matrix ( x 0 ; y 1 ); its determinant is x. By (MM2) the first column of XZ is 0, so the determinant of XZ is 0. Thus det(X)x = 0, and it follows that det(X) is either 0 or a zero divisor.

If det(X) is zero, the main formula shows that XY = 0. Therefore X is zero or a zero divisor. If det(X) is a zero divisor, there is a nonzero element r ∈ R so that det(X)r = 0. Then X(rY) = r det(X) I = 0. If rY ≠ 0 then we see X is a zero divisor. If rY = ( dr −br ; −cr ar ) = 0, then ar = br = cr = dr = 0 as well, so that X ( r 0 ; 0 r ) = 0; since ( r 0 ; 0 r ) ≠ 0, this shows that X is a zero divisor in this case as well.
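The main formula and part (2) can be sanity-checked numerically. A sketch over R = Z/6Z, with a 2 × 2 matrix stored as a tuple (a, b, c, d); the helper names are ours:

```python
# 2x2 matrices over Z/6Z, stored as tuples (a, b, c, d).
N = 6

def mult(X, Y):
    a, b, c, d = X
    e, f, g, h = Y
    return ((a*e + b*g) % N, (a*f + b*h) % N,
            (c*e + d*g) % N, (c*f + d*h) % N)

def det(X):
    a, b, c, d = X
    return (a*d - b*c) % N

def adj(X):                         # Y = ( d -b ; -c a )
    a, b, c, d = X
    return (d % N, -b % N, -c % N, a % N)

X = (1, 2, 3, 4)
Y = adj(X)
print(det(X))                       # 4
print(mult(X, Y), mult(Y, X))       # both (4, 0, 0, 4) = det(X) * I

# det(X) = 4 is a zero divisor in Z/6Z (4 * 3 = 0), and indeed r = 3
# produces a nonzero matrix rY with X * (rY) = 0:
rY = tuple(3 * t % N for t in Y)
print(rY, mult(X, rY))              # (0, 0, 3, 3) (0, 0, 0, 0)
```

Here det(X) = 4 is a zero divisor in Z/6Z, so by the proposition X must be a zero divisor, and r = 3 exhibits this.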
4. Chapter Wrap-Up

4.1. Rubric for Chapter

In this chapter you should have learned

• The definition of a ring, units, and zero divisors.
• Ways of forming rings, including products, polynomial rings, and rings of functions.
• Simple "ring-theoretic" proofs.
• Methods for determining units and zero divisors for a given ring.

4.2. Toughies

(1) There are four rings which contain exactly 4 elements. Find them, and write out their addition/multiplication tables.
(2) Find all the subrings of Q.
CHAPTER 7
Polynomials
1. Polynomials
Polynomials are ubiquitous in mathematics, appearing for example as Taylor polynomials in calculus, characteristic polynomials in linear algebra, knot polynomials in topology, generating functions in combinatorics, and characteristic equations in differential equation theory. In calculus they are usually the first functions studied as they are the most amenable to differentiation and integration. Indeed, they are “closed” under these operations. It is often of interest in the above examples to know the roots of polynomials, and how they factor. This leads to the study of the arithmetic of polynomials, which we pursue in this section.
1.1. Basics of Polynomials over Rings
Let R be a commutative ring. We will define polynomials with coefficients in R. [Author's Note: I need to replace "arbitrary commutative ring" with something conceptually easier.]

Definition. A polynomial is an expression of the form

f(x) = a_d x^d + a_{d-1} x^{d-1} + ··· + a_0 x^0,

where a_i ∈ R. The zero polynomial is the polynomial in which all a_i = 0.

Remark: For convenience, we will often write x^0 = 1.

Definition. Let f be a nonzero polynomial. The degree of f, or deg(f), is the highest power of x with a nonzero coefficient.

Thus in the previous definition, if a_d ≠ 0, then deg(f) = d.

Remark: If f = 0 is the zero polynomial, we do not define a degree. Some call the degree −∞ and assume a calculus for such a symbol. We, however, feel this gives an undue mysticism to −∞ and choose to deal with the zero polynomial separately.
If deg(f) = 0, f is a "constant" polynomial, of the form f(x) = a, with a ≠ 0. If deg(f) = 1, f is a "linear" polynomial, of the form f(x) = ax + b, with a ≠ 0. If n > deg(f), the convention will be that the coefficient a_n = 0.

Definition. If f is a nonzero polynomial, write LT(f) for the highest degree ("leading") term of f.

Thus f = LT(f) + f_<, where f_< is a polynomial (possibly zero) with degree less than that of f. Note that LT(f) ≠ 0.
Definition. Addition of polynomials is performed in the obvious way: If f(x) = a_d x^d + ··· + a_0, and g(x) = b_e x^e + ··· + b_0, and n ≥ d, e, then

(f + g)(x) = (a_n + b_n)x^n + ··· + (a_0 + b_0).

It is easy to see that the additive ring axioms for R impose the same axioms for this addition.

Lemma 7.1. (Degree Estimate for Addition) If f and g are nonzero polynomials, with g ≠ −f, then deg(f + g) ≤ max{deg(f), deg(g)}. If deg(f) > deg(g) then this is exactly deg(f), and LT(f + g) = LT(f).
Now we turn to multiplication.
1.2. Multiplication of Polynomials
In this section, we discuss the delicacies of polynomial multiplication. The idea is very simple: use induction to whittle it down to monomial multiplication.

Definition. A monomial is an expression of the form ax^n, with a ∈ R and n a nonnegative integer.

Definition. Monomials multiply via ax^m · bx^n = ab x^{m+n}.

Proposition 7.2. Monomial multiplication is commutative and associative.
Proof: This follows from commutativity and associativity of multiplication in the ring R, and commutativity and associativity of addition of integers.

Definition. Monomials and polynomials multiply via

ax^m · f(x) = a a_d x^{d+m} + a a_{d-1} x^{d+m-1} + ··· + a a_0 x^m.

Proposition 7.3. If f is a monomial and g and h are polynomials, then f · (g + h) = f · g + f · h.
Proof. This is perhaps best done with "summation notation": Write f = ax^l, g = Σ_{i=0}^m b_i x^i, h = Σ_{i=0}^n c_i x^i, and let N ≥ m, n. Then

f · (g + h) = ax^l · ( Σ_{i=0}^N b_i x^i + Σ_{i=0}^N c_i x^i ) = ax^l · Σ_{i=0}^N (b_i + c_i) x^i = Σ_{i=0}^N a(b_i + c_i) x^{i+l}.

By distributivity of R, a(b_i + c_i) = ab_i + ac_i, so this is

ax^l · Σ_{i=0}^N b_i x^i + ax^l · Σ_{i=0}^N c_i x^i = f · g + f · h.
Definition. Since every polynomial is a sum of its leading term and its lower degree terms, we can define polynomial multiplication "recursively" via

f · g = 0 if f = 0, and f · g = LT(f) · g + f_< · g if f ≠ 0.

The induction will stop when f is a monomial, since then f_< = 0. Note that if deg(f) = 0, then f is a monomial, so the induction process will certainly end. Note that, by induction, f · 0 = 0.

Remark: Throughout this section when we say "induction" we are referring to strong induction, since deg(f_<) may be less than deg(f) − 1.
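The recursive definition translates directly into code. A sketch in Python, storing a polynomial as its coefficient list [a_0, ..., a_d]; the representation and helper names are our own:

```python
def trim(f):
    # drop trailing zero coefficients; the zero polynomial becomes []
    while f and f[-1] == 0:
        f = f[:-1]
    return f

def add(f, g):
    n = max(len(f), len(g))
    return trim([(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
                 for i in range(n)])

def mono_times(a, m, g):
    # the monomial rule: (a x^m) * g
    return trim([0] * m + [a * c for c in g])

def mult(f, g):
    f = trim(list(f))
    if not f:                                  # f = 0
        return []
    lead = mono_times(f[-1], len(f) - 1, g)    # LT(f) * g
    return add(lead, mult(f[:-1], g))          # ... + f_< * g

print(mult([1, 1], [1, 1]))       # (1+x)^2 = 1 + 2x + x^2 -> [1, 2, 1]
print(mult([1, 0, 1], [1, 1]))    # (1+x^2)(1+x)           -> [1, 1, 1, 1]
```

Note that dropping the last coefficient is exactly passing from f to f_<, so the recursion mirrors the definition line by line.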
Proposition 7.4. If f is a monomial and g is a polynomial, then f · g = g · f.

Proof. Clear if g = 0. Otherwise, use induction on deg(g). If deg(g) = 0, then g is a (constant) monomial, and the result follows from commutativity of monomials. By the previous proposition, f · g = f · LT(g) + f · g_<. We must compare this to LT(g) · f + g_< · f. But these are equal by commutativity of monomials and the induction hypothesis.
Corollary 7.5. If f is a monomial and g and h are polynomials, then (g + h) · f = g · f + h · f.

Proof. Using previous results, (g + h)f = f(g + h) = fg + fh = gf + hf.
Finally, we get some theorems.

Theorem 7.6. (Distribution) Let f, g, and h be polynomials. Then f · (g + h) = f · g + f · h.

Proof. Left as exercise.

Theorem 7.7. (Commutativity) Let f and g be polynomials. Then f · g = g · f.

Proof. If f = 0 both sides are zero. Otherwise use induction on deg(f). If deg(f) = 0 this is monomial-polynomial commutativity. By definition of f · g, we have f · g = LT(f) · g + f_< · g = g · LT(f) + g · f_<, using monomial-polynomial commutativity and the induction hypothesis. By distribution of g, this is equal to g · f.
We finish with associativity. The idea is simply three inductions, one for each term. The reader should fill in the details.

Proposition 7.8. If f and g are monomials, and h is a polynomial, then (f · g) · h = f · (g · h).

Proof. Induction on deg(h). Use distributivity, and associativity of mono-mono-mono multiplication.

Proposition 7.9. If f is a monomial and g and h are polynomials, then (f · g) · h = f · (g · h).

Proof. Induction on deg(g). Use mono-mono-poly associativity and distributivity.

Theorem 7.10. (Associativity) If f, g, h are polynomials, then (f · g) · h = f · (g · h).

Proof. Induction on deg(f). Use mono-poly-poly associativity and distributivity.
1.3. The Polynomial Ring R[x]
By the previous section, the addition and multiplication laws on the set of polynomials make it into a ring.

Definition. The set of polynomials with coefficients in a ring R thus forms a ring itself. We call this new ring R[x].

(Obviously the constant polynomial f_1(x) = 1 is the multiplicative identity.)

Remark: R may be viewed as a subset of R[x] via the constant polynomials. It is then a subring.

In this section we will try to find the units and zero divisors of R[x], and succeed when R is an integral domain. We start with a lemma.

Lemma 7.11. Let R be an integral domain, and f ∈ R[x] a nonzero monomial. If g ≠ 0, then f · g ≠ 0 and LT(f · g) = LT(f) · LT(g).

Proof. This follows from a simple computation.
Corollary 7.12. In the above situation, deg(f · g) = deg(f) + deg(g).

Theorem 7.13. Let R be an integral domain. Let f and g be nonzero polynomials. Then f · g ≠ 0, deg(f · g) = deg(f) + deg(g), and LT(f · g) = LT(f) · LT(g).

Proof. Induction on deg(f). If f is a monomial the theorem follows from the previous lemma. So we are done if deg(f) = 0 or if f_< = 0. We have LT(fg) = LT(LT(f) · g + f_< · g) by the definition of multiplication. By our inductive hypothesis,

deg(f_< · g) = deg(f_<) + deg(g) < deg(f) + deg(g) = deg(LT(f) · g).

Therefore we may apply Lemma 7.1 to conclude that

LT(fg) = LT(LT(f) · g + f_< · g) = LT(LT(f) · g).

By the previous lemma again this is LT(LT(f)) · LT(g). Since LT(f) is a monomial, this yields LT(f) · LT(g), as desired. It follows that deg(fg) = deg(LT(fg)) = deg(f) + deg(g).
Corollary 7.14. If R is an integral domain, R[x] is an integral domain.

Corollary 7.15. If R is an integral domain, the units of R[x] are the constant polynomials equal to units in R.

Proof. If fg = 1, then deg(f) + deg(g) = deg(1) = 0. Thus f and g must both be constants, and the result follows.

Example: This fails if R is not an integral domain. For example, if R = Z/4Z, then (1 + 2x)^2 = 1, so 1 + 2x is a unit in R[x].

Corollary 7.16. In the integral domain situation, if f | g, and g ≠ 0, then deg(f) ≤ deg(g).
1.4. Roots of Polynomials
It is often of importance to determine the roots of polynomials. In linear algebra, the eigenvalues of a square matrix are the roots of its characteristic polynomial.

Definition. If f(x) = a_n x^n + ··· + a_1 x + a_0, and c ∈ R, we define f(c) = a_n c^n + ··· + a_1 c + a_0.

We will need the following:

Proposition 7.17. Let R be a ring, and f, g ∈ R[x]. If f + g = h, then f(c) + g(c) = h(c). If fg = h, then f(c)g(c) = h(c).

Proof. We leave this as an exercise, with a hint. The first fact is easily seen with a direct calculation. For the second, use the "LT-method". Thus make a direct calculation when f is a monomial. When f is a polynomial, use induction.

Definition. An element c ∈ R is a root of f if f(c) = 0.

Note that if (x − c) | f(x), then c is a root of f. The converse is true; we will later prove it in the case where R is a field.
1.5. Exercises

(1) Prove Theorem 7.6, Propositions 7.8 and 7.9, Theorem 7.10, Lemma 7.11, and Proposition 7.17.
(2) Let R be an integral domain. Let f, g ∈ R[x] with f nonconstant and g ≠ 0. Prove that the set {i ∈ N : f^i | g} is bounded above.
(3) In the above situation, define ord_f(g) = max{i ∈ N : f^i | g}. Prove that ord_f(g) = i if and only if there is an h ∈ R[x] so that g = f^i h and f ∤ h.
(4) How many functions are there from Z/3Z to Z/3Z? Find two different polynomials in (Z/3Z)[x] which give the same function on Z/3Z. Also try this exercise for other Z/nZ's.
(5) R may not be an integral domain. Prove that in this case we still have deg(f · g) ≤ deg(f) + deg(g) if f · g is nonzero.
(6) Find a quadratic polynomial in (Z/8Z)[x] with 4 roots. Are there any with 5 roots?

For the next three problems fix a ring R, and consider polynomials in R[x]. Let f ∈ R[x] be a nonzero polynomial. Define the order of f, written ω(f), to be the lowest power of x with a nonzero coefficient. For example, ω(x^5 + 2x^3) = 3. If f = 0 we define ω(f) = ∞.

(7) Show that ω(f) = ord_x(f) when f is nonconstant.
(8) Prove that if f, g ∈ R[x] then ω(f + g) ≥ min(ω(f), ω(g)).
(9) Let R be an integral domain. Prove that ω(f · g) = ω(f) + ω(g).
(10) Let L = Q[x, x^{-1}] be the set of rational "Laurent polynomials", which are of the form

f(x) = a_{-m} x^{-m} + ··· + a_{-1} x^{-1} + a_0 + a_1 x + ··· + a_n x^n,

with a_i ∈ Q. Then L is a ring under the usual addition and multiplication rules. Write ω(f) for the least integer i so that a_i ≠ 0, and deg(f) for the greatest integer i so that a_i ≠ 0. For example, if f = x^{-4} + 2x^{-3} − 4x^{-1}, then ω(f) = −4 and deg(f) = −1. Explain why if f, g ∈ L are nonzero, then ω(fg) = ω(f) + ω(g) and deg(fg) = deg(f) + deg(g).
(11) Use the previous problem to show that the units of L are all of the form α · x^i, where α is a nonzero rational number and i ∈ Z.
2. Polynomials over a Field

2.1. Some More Ring Theory

Much of the material from the first chapter can be generalized to an arbitrary commutative ring R.

Definition. Write Div(a, b) for the set of common divisors of a and b, and Mult(a, b) for the set of common multiples of a and b.

Definition. We say a divisor d of x is a proper divisor of x provided that it is neither a unit nor the product of x and a unit.

Definition. We say an element x ∈ R is irreducible provided that it is nonzero, not a unit, and has no proper divisors.

Example: When R = Z, the irreducible elements are the primes of N and their negatives. Every irreducible element of Z_(p) is associate to p.

2.2. Another Division Algorithm
The arithmetic of R[x] is much nicer when R is a field, and is very similar to that of the ring Z. In this section we will show that, up to constants, any nonzero polynomial factors uniquely into irreducible polynomials. Let F be a field. By the work in the previous section, we know that F[x] is an integral domain, and the units are exactly the polynomials of degree 0. Up to these units, we will have unique factorization. We will follow the same basic path as with N. We have a Division Algorithm for Polynomials.

Theorem 7.18. (Division Algorithm, Weak Form) Let f, g ∈ F[x] be polynomials with g ≠ 0. Then there are polynomials p, r with r = 0 or deg(r) < deg(g), so that f = pg + r.

Proof: If f = 0 we put p = r = 0. Otherwise we use induction on deg(f). First note that if deg(f) < deg(g) we may put p = 0, r = f, so we are done in that case. So we may assume below that deg(f) ≥ deg(g). Secondly, note that if deg(g) = 0, then g is a unit, so we may put p = f · g^{-1} and r = 0. In particular, these remarks settle the case of deg(f) = 0.

As usual consider the decompositions f = LT(f) + f_< and g = LT(g) + g_<. Let LT(f) = ax^m and LT(g) = bx^n. By the assumption above, m ≥ n. Thus we may take p_0 = (a/b) x^{m−n}. (This will be LT(p).) What is left? Since LT(f) = LT(p_0 g), either deg(f − p_0 g) < deg(f) or this difference is 0. In the latter case, we may take p = p_0 and r = 0. In the former we apply the induction hypothesis to f − p_0 g and g. We find p_1, r as in the statement of the theorem, satisfying f − p_0 g = p_1 g + r. In this case, we may take p = p_0 + p_1 and the same r.
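The proof is effective: subtract p_0 · g = (a/b)x^{m−n} · g and recurse on the smaller-degree difference. Here is a sketch over F = Q using exact Fraction arithmetic (the coefficient-list representation and names are ours):

```python
from fractions import Fraction

def divmod_poly(f, g):
    # f, g are coefficient lists [a_0, ..., a_d]; g must be nonzero
    if len(f) < len(g):
        return [], f                    # p = 0, r = f
    m, n = len(f) - 1, len(g) - 1
    coef = Fraction(f[-1]) / g[-1]      # a/b, the coefficient of p0
    diff = [Fraction(c) for c in f]     # compute f - p0 * g
    for i, c in enumerate(g):
        diff[i + m - n] -= coef * c
    while diff and diff[-1] == 0:       # the leading term cancels
        diff.pop()
    p1, r = divmod_poly(diff, g)        # induction on the degree
    p = p1 + [Fraction(0)] * (m - n + 1 - len(p1))
    p[m - n] += coef                    # p = p0 + p1
    return p, r

# x^4 + 5x^3 + x + 3 divided by x + 2:
p, r = divmod_poly([3, 1, 0, 5, 1], [2, 1])
print(p, r)   # quotient x^3 + 3x^2 - 6x + 13, remainder -23
```

The zero polynomial is represented by the empty list, so an exact division returns r = [].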
Corollary 7.19. (Strong Form) In the above situation, the p and r are uniquely determined.
Proof: Suppose pg + r = p′g + r′, with deg(r), deg(r′) < deg(g). Then (p − p′)g = r′ − r. If the right hand side is not 0, its degree is less than the degree of g. But it is divisible by g, a contradiction. So r = r′ and the RHS is 0. Since g is not a zero divisor we conclude that p = p′ and are done.

Corollary 7.20. (Root Test) Let c ∈ F. The linear polynomial x − c divides f iff c is a root of f.

Proof: We have already mentioned one direction. Apply the division algorithm to f and x − c. If x − c does not divide f, then f(x) = p(x)(x − c) + r(x), with deg(r) = 0. Then f(c) = r ≠ 0.
2.3. Synthetic Division

In this section, F denotes a field. Long division of polynomials is somewhat cumbersome because one has to do a great deal of redundant writing, with x's and bars. With some practice you can omit the variables and only work with the coefficients. If you are merely dividing a polynomial f(x) by the polynomial x − c with c ∈ F, you can further simplify your work to a table with three rows of numbers, which will quickly give you all the information of the division algorithm. The first row is simply the coefficients of f, and the third row will be the coefficients of p and r.

Let f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_0 ∈ F[x], and c ∈ F. Set up the table:

c | a_n   a_{n-1}   ···   a_0
  |
  | a_n

Here are the rules:

• Every number in the third row should be multiplied by c and the result should be put in the upper right entry of the second row.
• Every number in the second row should be added to the entry above it and the result should be put in the row below it.

So you wind up getting:

c | a_n   a_{n-1}           a_{n-2}                       ···   a_0
  |       a_n c             (a_n c + a_{n-1})c            ···   a_n c^n + a_{n-1} c^{n-1} + ··· + a_1 c
  | a_n   a_n c + a_{n-1}   a_n c^2 + a_{n-1} c + a_{n-2} ···   a_n c^n + a_{n-1} c^{n-1} + ··· + a_1 c + a_0

Note that the last entry is f(c), which is the remainder. The other entries in the third row are the coefficients of the quotient p, which will be one degree less than f. Here is an example, with f(x) = x^4 + 5x^3 + x + 3 and c = −2:

−2 | 1   5    0    1    3
   |    −2   −6   12  −26
   | 1   3   −6   13  −23

Thus f(x) = (x^3 + 3x^2 − 6x + 13)(x + 2) − 23.

Here's why this works. Following the rules sets up (the coefficients of) a polynomial p and a number r so that the second row is (the coefficients of) cp, and the third row is xp + r. (The factor of x accounts for the shift to the left.) Moreover, the third row is the sum of the first two. Thus, f + cp = xp + r. Regroup this to get f = (x − c)p + r.
Synthetic division is a fine way to determine whether c ∈ F is a root of f. It is a root if and only if the last entry is 0. Of course, another way is to plug c directly into f. One advantage of synthetic division is that if c is a root, then p is the quotient of f by x − c. Any further roots of f will also be roots of p, which has smaller degree.
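The two rules collapse into a one-line loop in code. A sketch (the helper is ours), taking the coefficients of f highest degree first, exactly as in the first row of the table:

```python
def synthetic_division(coeffs, c):
    # coeffs = [a_n, ..., a_0], highest degree first
    row = [coeffs[0]]
    for a in coeffs[1:]:
        row.append(row[-1] * c + a)   # multiply by c, add the entry above
    return row[:-1], row[-1]          # (coefficients of p, remainder f(c))

# The example from the text: f(x) = x^4 + 5x^3 + x + 3, c = -2.
q, r = synthetic_division([1, 5, 0, 1, 3], -2)
print(q, r)   # [1, 3, -6, 13] -23, i.e. f = (x^3+3x^2-6x+13)(x+2) - 23
```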
Here is another benefit of synthetic division when F is the real numbers R (or Q). You can often use the p to zoom in on possible roots of f. For instance consider f(x) = x^3 − 5x + 6 and c = 3:

3 | 1   0   −5    6
  |     3    9   12
  | 1   3    4   18

So 3 is not a root, and f(x) = (x^2 + 3x + 4)(x − 3) + 18. Can f have any roots d greater than 3? If so, then f(d) = (d^2 + 3d + 4)(d − 3) + 18. But since d > 3 > 0, all these terms are positive, and so the result cannot be 0!
A general rule is:

Proposition 7.21. Let f ∈ R[x] and c ∈ R. If the third row of the synthetic division table consists of positive numbers, then there are no positive roots of f greater than c.

Proof. Under the given hypotheses, f(x) = p(x)(x − c) + r, with r > 0 and the coefficients of p positive. Let d > c be positive. Then f(d) = p(d)(d − c) + r. Since d is positive, so is p(d). Since d > c, d − c > 0. Thus f(d) > 0.

The condition that d be positive is important; watch what happens with f = x^2 + 4x + 4, c = −3, and d = −2.
Here is the rule in the other direction, which the reader should enjoy proving:

Proposition 7.22. Let f ∈ R[x] and c ∈ R. If the entries of the third row of the synthetic division table are nonzero and alternate sign, then there are no negative roots of f less than c.

Remark: In this and the previous proposition, the c in the corner is not considered part of the third row.

For example this happens with f(x) = x^3 − 5x + 6 and c = −3:

−3 | 1    0   −5    6
   |     −3    9  −12
   | 1   −3    4   −6

Thus we know all real roots of f lie between −3 and 3. In fact there is exactly one real root, about −2.68. The third row of the synthetic division table for c = 1 has a negative entry; these simple tests give only "one-way" information.
2.4. Rational Root Test

Let's specialize to the rational numbers Q. Say you're given a polynomial f ∈ Q[x] and need to find its rational roots. For instance, f(x) = 3x^5 − (17/2)x^4 + (9/2)x^3 + 4x^2 − (11/2)x − 1. At first it seems there are infinitely many possibilities, and most of them will not be roots. In this section we will use a little modular arithmetic to show that there are actually only finitely many possibilities; in this case you need only check ±1, ±2, ±1/2, ±1/3, ±2/3, ±1/6. Some fluency with synthetic division makes this even easier.

Before we begin please note that any polynomial f ∈ Q[x] can be multiplied by a constant N ∈ Z so that N·f ∈ Z[x]. For example, N could be the product of the denominators of the coefficients of f. In the above example, 2f = 6x^5 − 17x^4 + 9x^3 + 8x^2 − 11x − 2 ∈ Z[x]. The roots of f are of course the same as the roots of N·f. So we may reduce to the case of integer polynomials.

Here's the theorem:

Theorem 7.23. (Rational Root Test) Let f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0 ∈ Z[x] be a nonzero polynomial. Let p/q be a reduced fraction in Q, thus gcd(p, q) = 1. If p/q is a root of f, then p | a_0 and q | a_n.

Thus there are only finitely many possibilities for p and q, as long as a_0 and a_n are nonzero. (What if they aren't?) In the above example, a_5 = 6 and a_0 = −2; this is how the list of possible roots was made. In general you list every number which may be written as a divisor of a_0 divided by a divisor of a_n.

Proof. If p/q is a root of f, then

0 = f(p/q) = a_n (p/q)^n + a_{n-1} (p/q)^{n-1} + ··· + a_1 (p/q) + a_0.

Multiplying by q^n we obtain

a_n p^n + a_{n-1} p^{n-1} q + ··· + a_1 p q^{n-1} + a_0 q^n = 0.

So a_0 q^n ≡ 0 mod p. Since gcd(p, q) = 1, we know q is a unit mod p. So we can invert it to get a_0 ≡ 0 mod p. This exactly says that p | a_0. Similarly, a_n p^n ≡ 0 mod q says that q | a_n.
Let's use this to find the rational roots of our example. First try c = −1:

−1 | 6  −17    9    8  −11   −2
   |     −6   23  −32   24  −13
   | 6  −23   32  −24   13  −15

Thus −1 is not a root; too bad. Since the entries of the third row have alternating sign, by Proposition 7.22 we know that any negative roots must be greater than −1; this rules out the −2. Let's check c = 1:

1 | 6  −17    9    8  −11   −2
  |      6  −11   −2    6   −5
  | 6  −11   −2    6   −5   −7

Thus 1 is not a root. Since the third row is neither all positive nor alternating, we can't use either Proposition 7.21 or Proposition 7.22 to rule out any more roots. Too bad. Next, c = −1/6:

−1/6 | 6  −17    9    8  −11   −2
     |     −1    3   −2   −1    2
     | 6  −18   12    6  −12    0

A root! We conclude that 2f = (x + 1/6)(6x^4 − 18x^3 + 12x^2 + 6x − 12) = (6x + 1)(x^4 − 3x^3 + 2x^2 + x − 2). Any more roots of f are thus also roots of x^4 − 3x^3 + 2x^2 + x − 2. The rational root test now tells us that the possible roots are ±1, ±2. We already know that 1, −1, and −2 are not roots, so the only possibility is 2:

2 | 1  −3    2    1   −2
  |      2   −2    0    2
  | 1  −1    0    1    0

Thus 2 is a root, and in fact 2f = (6x + 1)(x − 2)(x^3 − x^2 + 1). The only possible roots of the cubic are ±1, and we have already checked that these are not roots. So we are done; the only roots of the original f are 2 and −1/6.
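Generating and testing the candidate fractions is mechanical. A sketch of the test in Python (helper names are ours), run on 2f from the example:

```python
from fractions import Fraction

def divisors(n):
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    # coeffs = [a_0, a_1, ..., a_n] with a_0 and a_n nonzero
    candidates = {Fraction(s * p, q)
                  for p in divisors(coeffs[0])
                  for q in divisors(coeffs[-1])
                  for s in (1, -1)}
    def value(c):
        return sum(a * c**i for i, a in enumerate(coeffs))
    return sorted(c for c in candidates if value(c) == 0)

# 2f = 6x^5 - 17x^4 + 9x^3 + 8x^2 - 11x - 2
print(rational_roots([-2, -11, 8, 9, -17, 6]))   # roots -1/6 and 2
```

Since Fraction reduces automatically, the candidate set is exactly the list of reduced fractions ±p/q with p | a_0 and q | a_n.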
Here are some corollaries of the theory, starting with an important definition:

Definition. Let f be a nonzero polynomial in F[x]. We say f is monic if the coefficient of its highest degree term is 1.

Corollary 7.24. If f ∈ Z[x] is monic, then any rational root is an integer.

Indeed, if a_n = 1, then q | 1 so p/q = ±p.

Corollary 7.25. There is no rational square root of 2.

Indeed, the only possible rational roots of x^2 − 2 are ±1, ±2, and the squares of these are not 2. Of course there is a real root of x^2 − 2.
2.5. Euclidean Algorithm

As in the case of integers, the Division Algorithm leads directly to a Euclidean Algorithm. We must first discuss greatest common divisors. We will normalize them to be monic, as defined above. Here are some facts about monic polynomials we will need.

Proposition 7.26. If f is a nonzero polynomial, then f factors uniquely into a nonzero constant and a monic polynomial.

Proof. Indeed, if f = a_n x^n + ···, with a_n ≠ 0, then f = a_n · (f / a_n).

Proposition 7.27. If f and g are monic polynomials of the same degree, and f | g, then f = g.

Proof. Exercise.
We want to define a "greatest common divisor" of two nonzero polynomials f and g. We don't have such a nice well-ordering as in N, but we do have subtraction. So we use the following approach. Let f, g ∈ F[x] be polynomials, not both 0. Consider the set

I = { af + bg | a, b ∈ F[x] }.

By the Min Form of Well-Ordering, the set of degrees of nonzero polynomials in I has a minimum. We would like a polynomial in I of minimum degree to be called gcd(f, g). However if you multiply such a polynomial by a nonzero c ∈ F, it will have the same degree and will still be in I. If we normalize by asking for a monic polynomial of this degree, then there is only one such in I, by the following proposition.

Proposition 7.28. There is a unique monic polynomial d in I of smallest degree. The polynomial d divides both f and g. Moreover, if there is a polynomial e so that e | f and e | g, then e | d. Therefore d is the unique monic polynomial of greatest degree in Div(f, g).

Proof. Observe that I is closed under addition, and under multiplication by elements of F[x]. Since f, g are not both 0, there are nonzero polynomials in I. Dividing one of these by its leading coefficient shows that there are monic polynomials in I. Suppose that d and d′ are monic polynomials in I of smallest degree. By the division algorithm there are polynomials p and r so that d′ = pd + r, with r = 0 or deg(r) < deg(d). But by the above observation, r = d′ − pd ∈ I. Thus r must be 0, for otherwise its degree would be too small. Thus d | d′. By Proposition 7.27, d = d′.

Note that f and g are themselves members of I. By the division algorithm, f = pd + r for some p, r. As in the previous proof, r ∈ I, and is therefore 0. So d | f, and similarly d | g. Suppose e | f and e | g. Since d ∈ I, we know that d = af + bg for some a, b ∈ F[x]. Therefore e | d.

Definition. Let f, g be polynomials, not both 0. The greatest common divisor of f and g, written gcd(f, g), is the unique monic polynomial of greatest degree in Div(f, g).

Corollary 7.29. (Bezout Identity) Let f, g be polynomials, not both 0. Then there are polynomials a, b ∈ F[x] so that af + bg = gcd(f, g).

Proof. Indeed, by Proposition 7.28, gcd(f, g) ∈ I, so it can be written in this form by definition.
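The gcd here is produced by an abstract minimality argument, but it can also be computed exactly as in Z, by repeated division with remainder followed by a monic normalization. A sketch over Q (the coefficient-list representation and helpers are ours):

```python
from fractions import Fraction

def poly_divmod(f, g):
    # long division in Q[x]; f, g are coefficient lists [a_0, ..., a_d]
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    while f and len(f) >= len(g):
        k = len(f) - len(g)
        c = f[-1] / g[-1]
        q[k] = c
        f = [f[i] - (c * g[i - k] if i >= k else 0) for i in range(len(f))]
        while f and f[-1] == 0:
            f.pop()
    return q, f

def poly_gcd(f, g):
    # Euclidean algorithm: replace (f, g) by (g, remainder) until g = 0
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while g:
        f, g = g, poly_divmod(f, g)[1]
    return [c / f[-1] for c in f]        # normalize to be monic

# gcd(x^2 - 1, x^2 - 2x + 1) = x - 1
print(poly_gcd([-1, 0, 1], [1, -2, 1]))
```

The text remarks below that this algorithm is cumbersome by hand; for a machine it is entirely routine.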
Proposition 7.30. Let f, g be nonconstant polynomials. Then there are polynomials a, b ∈ F[x] with deg(a) < deg(g) or a = 0, and deg(b) < deg(f) or b = 0, so that af + bg = gcd(f, g).

Proof. Let d = gcd(f, g). The cases a = 0 or b = 0 occur when g | d or f | d; so we may rule out these cases.

By the theorem, we know that there are a_0, b_0 ∈ F[x] so that a_0 f + b_0 g = d. Note that if p(x) ∈ F[x], then also (a_0 − pg)f + (b_0 + pf)g = d. By the division algorithm, there is a p so that if a = a_0 − pg, then deg(a) < deg(g). (Note that if a = 0 then g | a_0, which implies that g | d, which we have ruled out.)

Let b = b_0 + pf. (If b = 0 then f | b_0, which implies f | d.) We can rewrite the above as bg = d − af. Since d | f but f ∤ d, deg(af) > deg(d) and so deg(d − af) = deg(af). Thus

deg(b) + deg(g) = deg(bg) = deg(af) = deg(a) + deg(f) < deg(g) + deg(f),

and therefore deg(b) < deg(f).
This leads to the method of Undetermined Coefficients. Example: Let us apply this to the polynomials f(x) = x² + 1 and g(x) = 3x − 1 in R[x]. Note that g(x) is irreducible and does not divide f, so Div(f, g) = {α ∈ R | α ≠ 0}. We set up the equation af + bg = 1, with deg(a) = 0 and deg(b) = 1. We obtain:

α·(x² + 1) + (βx + γ)·(3x − 1) = 1,

(α + 3β)x² + (3γ − β)x + (α − γ) = 0x² + 0x + 1.

This is meant to be an identity of polynomials, and so each of the three coefficients must agree:

α + 3β = 0,  3γ − β = 0,  α − γ = 1.

This is easily solved to give α = 9/10, β = −3/10, and γ = −1/10. And the reader may check that

(9/10)·(x² + 1) + (−(3/10)x − 1/10)·(3x − 1) = 1.
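The reader's check at the end of the example can also be done mechanically with exact rational arithmetic. The following sketch (coefficient lists indexed by degree; the helper names are mine, not the book's notation) verifies that af + bg really is the constant polynomial 1:

```python
from fractions import Fraction as Fr

def poly_mul(p, q):
    # Convolution of coefficient lists (list index = degree of the term).
    r = [Fr(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [Fr(0)] * (n - len(p))
    q = q + [Fr(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

# f = x^2 + 1, g = 3x - 1, and the solved coefficients
# a = 9/10, b = -(3/10)x - 1/10 from the example.
f = [Fr(1), Fr(0), Fr(1)]
g = [Fr(-1), Fr(3)]
a = [Fr(9, 10)]
b = [Fr(-1, 10), Fr(-3, 10)]
result = poly_add(poly_mul(a, f), poly_mul(b, g))
print(result)  # the constant polynomial 1 (with trailing zero coefficients)
```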
Remark: There is a Euclidean Algorithm for polynomials analogous to the one for N, but it is computationally cumbersome in practice.

Definition. Let f₁, . . . , fₙ be polynomials in F[x], not all 0. The greatest common divisor of f₁, . . . , fₙ, written gcd(f₁, . . . , fₙ), is the unique monic polynomial of greatest degree in Div(f₁, . . . , fₙ).

It was slightly bad of me to write down this definition before proving the following:

Proposition 7.31. There is a monic polynomial d so that Div(f₁, . . . , fₙ) = Div(d). Therefore d is the unique monic polynomial of greatest degree in Div(f₁, . . . , fₙ).

A monic polynomial of greatest possible degree certainly exists; this is the Max form of Well-Ordering on the set of degrees. The interesting algebraic fact is that there is only one such polynomial.
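For readers who want to experiment, here is a minimal sketch of the Euclidean Algorithm for polynomials mentioned in the remark, over Q with exact arithmetic. The function and variable names are my own; `poly_gcd` normalizes the answer to be monic, matching the definition above.

```python
from fractions import Fraction as Fr

def trim(p):
    # Drop trailing zero coefficients (list index = degree of the term).
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def poly_divmod(f, g):
    # Long division in Q[x]: returns (q, r) with f = q*g + r and deg r < deg g.
    f, g = list(f), trim(list(g))
    q = [Fr(0)] * max(1, len(f) - len(g) + 1)
    while len(f) >= len(g):
        c = Fr(f[-1]) / g[-1]
        shift = len(f) - len(g)
        q[shift] += c
        for i in range(len(g)):
            f[shift + i] -= c * g[i]
        f.pop()  # the leading coefficient is now zero
    return trim(q), (trim(f) if f else [Fr(0)])

def poly_gcd(f, g):
    # Repeated division, then divide by the leading coefficient to get a monic gcd.
    f, g = trim(list(f)), trim(list(g))
    while g != [0]:
        _, r = poly_divmod(f, g)
        f, g = g, r
    return [c / f[-1] for c in f]

# gcd(x^2 - 1, x^2 - 2x + 1) = x - 1
print(poly_gcd([Fr(-1), Fr(0), Fr(1)], [Fr(1), Fr(-2), Fr(1)]))
```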
2. POLYNOMIALS OVER A FIELD
15 9
Proof. Induction on n; the case n = 1 is obvious: just divide f₁ by its leading coefficient. Suppose the Proposition is true for k. Let f₁, . . . , f_{k+1} be given, and dₖ = gcd(f₁, . . . , fₖ). Let d = gcd(dₖ, f_{k+1}); we will show that Div(f₁, . . . , f_{k+1}) = Div(d). We know that d | dₖ and dₖ | fᵢ for i = 1, . . . , k, so d ∈ Div(f₁, . . . , f_{k+1}), as is any divisor of d. So the RHS ⊆ LHS. Next, let g ∈ Div(f₁, . . . , f_{k+1}) ⊆ Div(f₁, . . . , fₖ) = Div(dₖ). So g ∈ Div(dₖ, f_{k+1}) = Div(d). □
The following plays a vital role in the theory of canonical forms in linear algebra, so we include a proof.
Proposition 7.32. Let f₁, . . . , fₙ ∈ F[x], not all 0, with gcd(f₁, . . . , fₙ) = d. Then there are polynomials a₁, . . . , aₙ so that a₁f₁ + ··· + aₙfₙ = d.

Proof. Induction on n; the case n = 1 is again obvious. Suppose the proposition is true for k. Let f₁, . . . , f_{k+1} be given, and suppose that b₁f₁ + ··· + bₖfₖ = dₖ = gcd(f₁, . . . , fₖ). As in the previous proof we have d = gcd(dₖ, f_{k+1}). Let p, q ∈ F[x] be polynomials so that pdₖ + qf_{k+1} = d. Then we may set aᵢ = pbᵢ for i = 1, . . . , k and a_{k+1} = q. □
Definition. Let f₁, . . . , fₙ ∈ F[x] be nonzero polynomials. We say they are pairwise coprime if for all i ≠ j, gcd(fᵢ, fⱼ) = 1. We say they are relatively prime if gcd(f₁, . . . , fₙ) = 1.

Proposition 7.33. Let f₁, . . . , fₙ ∈ F[x] be pairwise coprime. Then Mult(f₁, . . . , fₙ) = Mult(f₁ ··· fₙ).
We omit the proof because it is just like the proof of Proposition 2.49 in Section 9.2.

Theorem 7.34. If f ∈ F[x] has degree d, then f has at most d distinct roots.

Proof. If c₁, c₂, . . . , cₙ are distinct roots of f, then by Proposition 7.20, f ∈ Mult(x − c₁, . . . , x − cₙ). Therefore by the previous proposition the product (x − c₁) ··· (x − cₙ) divides f(x). This implies that n ≤ d. □
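Theorem 7.34 is easy to test by brute force over a finite field. The sketch below (my own helper, not the book's) finds all roots of a polynomial in Z/pZ; for x³ − x over F₃ the bound of deg = 3 roots is attained exactly:

```python
def roots_mod_p(coeffs, p):
    # coeffs indexed by degree; return all roots of f in Z/pZ by trying each element.
    return [c for c in range(p)
            if sum(a * c**i for i, a in enumerate(coeffs)) % p == 0]

# f = x^3 - x has degree 3 and exactly 3 roots in F_3 (the maximum allowed)
print(roots_mod_p([0, -1, 0, 1], 3))  # [0, 1, 2]
```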
2.6. The Fundamental Theorem of Arithmetic for Polynomials

This section is closely related to the corresponding section for numbers. We leave out some proofs which are identical to the proofs for numbers.

Definition. A divisor g of f is called a proper divisor provided that its degree is neither 0 nor deg(f).

Note that this agrees with the definition in a general ring, because polynomials of degree 0 are exactly the units.

Definition. A nonzero polynomial f is irreducible provided that its degree is greater than 0 and it has no proper divisors. It is called reducible provided that its degree is greater than 0 and it has proper divisors.
Thus f is irreducible when Div(f) consists only of polynomials of the form c or cf, where c is a nonzero constant.

Lemma 7.35. If f is irreducible and f ∤ g, then f and g are relatively prime.

Proof. If c ≠ 0, then cf ∤ g either. Therefore Div(f, g) consists of only nonzero constants, and we conclude that gcd(f, g) = 1. □
Proposition 7.36. If deg(f) = 1, then f is irreducible.

Proof. Suppose f factored in some way as f = gh. Then 1 = deg(f) = deg(g) + deg(h). Since the degree of a polynomial is a whole number in N, either deg(g) or deg(h) must be 0, and therefore one of them is not a proper divisor. □

Lemma 7.37. A nonzero polynomial f has a root in F if and only if it has a divisor of degree 1.

Proof. If f(c) = 0 for c ∈ F, then (x − c) | f(x) by Corollary 7.20. On the other hand, suppose that some polynomial ax + b divides f, with a ≠ 0. Then since a is a unit in F[x], the polynomial x + b/a divides f(x), and therefore −b/a is a root of f by Corollary 7.20 again. □
Proposition 7.38. If deg(f) = 2 or 3, then f is irreducible if and only if it does not have any roots in F.

Proof. We give the proof for deg(f) = 3; the other case is similar. By the Lemma, if f is irreducible, it does not have any roots in F. Now suppose f doesn't have any roots in F, and suppose f factored in some way as f = gh. Then 3 = deg(f) = deg(g) + deg(h). Since these are all whole numbers, the only possibilities for the degree of g are 0, 1, 2, 3. If the degree is 0 or 3, then g is not a proper divisor. If the degree is 1, then f has a root by the Lemma, a contradiction. If deg(g) = 2, then deg(h) = 1, and so f has a root by the Lemma again. Therefore f is irreducible. □

Example: f = (x² + 1)² ∈ R[x] is reducible although it doesn't have any roots.

Lemma 7.39. If f is irreducible and f | gh, then f | g or f | h.

Proof. Suppose f ∤ g. By Lemma 7.35, gcd(f, g) = 1. By the Euclidean Algorithm, there are polynomials a, b so that af + bg = 1. Multiplying this by h we obtain afh + bgh = h. Since f divides each part of the left hand side, it divides h as well. □

Lemma 7.40. If f and g are irreducible monic polynomials and f | g, then f = g.

Proof. Because g is irreducible, f = cg for some c ≠ 0. By comparing leading terms we see that c = 1 and therefore f = g. □
Proposition 7.41. If G(x) ∈ F[x] is nonconstant, then G has a monic irreducible factor.

Proof. Exercise.

Proposition 7.42. Let f be irreducible and n ∈ N. Then Div(fⁿ) = {cf(x)ᵉ ; e ≤ n, c ≠ 0}.

Proof. Exercise.

Corollary 7.43. Let f, g be distinct monic irreducible polynomials, and m, n ∈ N. Then gcd(fᵐ, gⁿ) = 1.

Proof. Exercise.
Recall from Exercise 3 in Section 1.5:

Definition. Let f, g ∈ F[x] with f nonconstant and g ≠ 0. Then ord_f(g) = max{i ∈ N ; fⁱ | g}.

Proposition 7.44. If f is irreducible and g, h are nonzero polynomials, then ord_f(gh) = ord_f(g) + ord_f(h).
There is a small issue in stating the Existence part of the Fundamental Theorem of Arithmetic. In the natural number case, it was obvious that there were only a finite number of primes dividing a given N, simply because such primes must be less than N. In the polynomial case, all we know a priori is that a polynomial dividing G must have a smaller degree. If F is infinite, there are infinitely many monic irreducible polynomials, even of degree one! So we pause for a lemma.

Lemma 7.45. Let G be a nonconstant polynomial. There are only finitely many monic irreducible divisors of G.

Proof. Let n be the degree of G, and suppose f₁, . . . , f_{n+1} are distinct monic irreducible divisors of G. By Lemmas 7.40 and 7.35, these are pairwise coprime. Therefore by Proposition 7.33, we see that the product f₁f₂ ··· f_{n+1} divides G. But the degree of this divisor is clearly greater than the degree of G, so this is impossible. □
Theorem 7.46. (Existence) Let G be a nonconstant polynomial, and let f₁, . . . , fₘ be the monic irreducible divisors of G. Let eᵢ = ord_{fᵢ}(G) for all such i. Let c be the leading coefficient of G. Then

G(x) = c·f₁(x)^{e₁}·f₂(x)^{e₂} ··· fₘ(x)^{eₘ}.

Proof. Write M(x) = f₁(x)^{e₁}·f₂(x)^{e₂} ··· fₘ(x)^{eₘ}. It is easy to see that ord_f(M) ≥ ord_f(G) for all irreducibles f. We know that fᵢ^{eᵢ} divides G for all i, and by Corollary 7.43 we know these factors are pairwise coprime. By Proposition 7.33 we deduce that M | G. Thus G = Mu for some u ∈ F[x]. If u is nonconstant, then by Proposition 7.41, u is divisible by some monic irreducible f. But then ord_f(u) > 0 and ord_f(G) = ord_f(M) + ord_f(u), contradicting that ord_f(M) ≥ ord_f(G). Therefore u is a nonzero constant. By comparing the leading coefficients we conclude that u = c. □
Theorem 7.47. (Uniqueness) Let G be a nonconstant polynomial, and suppose G factors in some way as

G = c′·f₁(x)^{e₁′}·f₂(x)^{e₂′} ··· f_r(x)^{e_r′},

with c′ ∈ F, all the fᵢ monic irreducibles, and eᵢ′ ∈ N. Then c′ = c, the fᵢ are all the irreducible monic divisors of G, and eᵢ′ = ord_{fᵢ}(G).

Proof. Obviously the fᵢ at least form a subset of the irreducible monic divisors of G, and the definition of ord implies that eᵢ′ ≤ ord_{fᵢ}(G). It follows that the degree of the RHS of the equation in this theorem is no bigger than the degree of the RHS of the equation in the Existence theorem; since both equal deg(G), equality of degrees can only hold if we have equality of the eᵢ and eᵢ′. By comparing the leading coefficients we conclude that c′ = c. □
2.7. Exercises

(1) Prove Propositions 7.22, 7.27, 7.41, 7.42 and Corollary 7.43.
(2) A standard exercise in an introductory course in analysis is the following. You are given a polynomial f(x) ∈ R[x] and a number a ∈ R, and must prove directly that lim_{x→a} f(x) = f(a). This relies on relating the quantities |f(x) − f(a)| and |x − a|. Prove that x − a always divides f(x) − f(a).
(3) Let F be a field, and f(x), g(x) ∈ F[x]. Suppose that every root of f is also a root of g. Does f necessarily divide g? Give a proof or a counterexample.
(4) Find all the rational roots of the following polynomials.
    (a) 8x³ − 36x² + 54x − 27
    (b) 30x³ − 31x² + 10x − 1
    (c) (2/3)x⁶ + x⁵ − (21/2)x⁴ + (21/2)x³ + x − 2
    (d) (2/3)x⁴ + 10x³ + 7x² + (11/3)x − 2
(5) Use the mod 2 Polynomial Sieve worksheet from the 453 website to find all irreducible polynomials in (Z/2Z)[x] of degree six or less. Note the shorthand of writing the coefficients without the variables. For instance, 11001 denotes the polynomial x⁴ + x³ + 1. See any patterns?
(6) Are there infinitely many irreducible polynomials in (Z/2Z)[x]? Prove or disprove.
(7) Suppose F is a field and f ∈ F[x] is a polynomial of degree n. Must f have n distinct roots? Can f have more than n roots?
(8) Let F₃ = Z/3Z be the field with three elements. Here is a list of all the monic quadratic polynomials in F₃[x]:
    x², x² + 1, x² + 2, x² + x, x² + x + 1, x² + x + 2, x² + 2x, x² + 2x + 1, x² + 2x + 2.
    Circle the irreducible polynomials and cross out the reducible ones.
(9) Here are some quartic polynomials in F₃[x]:
    x⁴, x⁴ + 1, x⁴ + 2, x⁴ + x, x⁴ + x + 1, x⁴ + x + 2, x⁴ + 2x, x⁴ + 2x + 1, x⁴ + 2x + 2.
    Circle the irreducible polynomials and cross out the reducible ones.
(10) Prove the product of two monic polynomials is monic.
(11) Let f(x) = aₙxⁿ + ··· + a₁x + a₀ ∈ R[x], and suppose f has at least n + 1 distinct roots. Use linear algebra, notably the theory of the Vandermonde determinant, to prove that f = 0.
(12) Prove Corollary 7.20 for a general ring R.
(13) If q(x) = aₙxⁿ + ··· + a₁x + a₀ ∈ R[x], and f is a sufficiently differentiable real-valued function on R, let
    q(D)f = aₙf⁽ⁿ⁾(x) + ··· + a₁f′(x) + a₀f(x).
    Here f⁽ⁿ⁾ denotes the nth derivative of f. Prove that if q = q₁ + q₂, then q(D)f = q₁(D)f + q₂(D)f, and if q = q₁·q₂, then q(D)f = q₁(D)(q₂(D)f).
(14) Suppose that p ∈ R[x] is the product of two relatively prime polynomials p = p₁p₂. Prove that any solution to the differential equation p(D)f = 0 is the sum of two solutions f = f₁ + f₂, where p₁(D)f₁ = 0 and p₂(D)f₂ = 0. [Hint: Apply the Bezout Identity to p₁ and p₂.]
3. Irreducibility in C[x]

We already know that if F is a field, any polynomial of degree one is irreducible. In this section we will argue that the converse is true in C[x]. We borrow the following result from complex analysis:

Theorem 7.48. (Fundamental Theorem of Algebra, Weak Form) Every nonconstant polynomial in C[x] has a complex root.

Proof. (Sketch) Suppose p(x) ∈ C[x] is a nonconstant polynomial without any roots. Then f(z) = 1/p(z) is a complex-differentiable function defined on all of C. Simple estimates with the triangle inequality show that |p(z)| diverges as |z| approaches infinity, and thus f is bounded. Liouville's theorem (take a course in Complex Analysis) says that any bounded complex-differentiable function defined on all of C is constant. Thus f, and therefore p, is constant. This is a contradiction. □

Thus if f ∈ C[x] and deg(f) ≥ 1, there is some number α ∈ C so that f(α) = 0. We know this implies that (x − α) | f. So if deg(f) > 1, then f cannot be irreducible. We conclude that the only irreducible polynomials are linear. The only monic irreducible polynomials are of the form x − α, with α ∈ C.
Together with Unique Factorization, this gives us:

Theorem 7.49. (Fundamental Theorem of Algebra, Strong Form) Every nonzero polynomial f ∈ C[x] of degree n factors uniquely as

f(x) = c·(x − α₁) ··· (x − αₙ),

where c ∈ C is nonzero and the αᵢ are roots of f.

The αᵢ are of course not necessarily distinct. There are, however, no other roots of f, as one can check by evaluating the right hand side, using that C is an integral domain. [partial fractions for C]
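One can watch the Strong Form in action numerically: expanding c(x − α₁)···(x − αₙ) by repeated multiplication recovers the coefficients of f. The sketch below (helper name is mine) does this with Python complex numbers for f(x) = x² + 1 = (x − i)(x + i):

```python
def expand_roots(c, roots):
    # Expand c*(x - r1)*...*(x - rn); coefficient lists are indexed by degree.
    coeffs = [c]
    for r in roots:
        coeffs = [0] + coeffs                 # multiply the current polynomial by x
        for i in range(len(coeffs) - 1):
            coeffs[i] -= r * coeffs[i + 1]    # ...then subtract r times it
    return coeffs

# x^2 + 1 factors over C as (x - i)(x + i); expanding recovers [1, 0, 1]
print(expand_roots(1, [1j, -1j]))
```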
4. Irreducibility in R[x]

In this section we will find the irreducible polynomials in R[x]. Let f ∈ R[x] be a nonconstant polynomial. Since R[x] is a subring of C[x], we know from the previous section that f has a complex root α ∈ C. If α ∈ R, then (x − α) | f and we are done.

By a previous proposition, we know that ᾱ is also a root of f. Suppose α ∉ R, and consider the polynomial

g(x) = (x − α)(x − ᾱ).

Note that the coefficients of g are −(α + ᾱ) and α·ᾱ, both real numbers. (One can compute this directly or note that they are fixed by complex conjugation.) Therefore g ∈ R[x].

Apply the division algorithm in R[x] to f and g. We conclude that there are p, r ∈ R[x] so that f = pg + r, with r a constant or linear polynomial. So r = f − pg. Now the right hand side of this has two complex roots, α and ᾱ. This means r can't be linear or a nonzero constant, and we conclude that r = 0, so g | f.

Let us summarize. If f ∈ R[x] is a nonconstant polynomial, then it has a complex root α. If the root is real, then (x − α) | f. If the root is not real, then the quadratic polynomial g | f. This means that every polynomial of degree 3 and higher is reducible!

So, any irreducible polynomials in R[x] must be linear or quadratic. All the linear ones are irreducible, and the quadratic formula tells us that if a ≠ 0, then ax² + bx + c is irreducible if and only if b² − 4ac is negative.
Theorem 7.50. The only irreducible polynomials in R[x] are:

(1) Linear polynomials
(2) Quadratic polynomials ax² + bx + c, where b² − 4ac < 0.

Corollary 7.51. Every nonconstant polynomial f ∈ R[x] factors into linear and quadratic polynomials.

Can you find a factorization of x⁴ + 1 in R[x]? [partial fractions for R]
5. Irreducibility in Q[x]

[This section and the next are somewhat out of order.] The field Q is algebraically much more complicated and interesting than the fields C and R. There are irreducible polynomials of every degree; for instance, xⁿ − 2 is irreducible for all n. There is no good algorithm for determining whether a rational polynomial is irreducible, but we sketch one method in this section.

First note that if f ∈ Q[x], one can multiply by a divisible enough integer N to "clear the denominators" of f to get g = Nf ∈ Z[x]. Then f is associate to g, so f is irreducible if and only if g is irreducible. So we may assume that the f we started with has integer coefficients. The nice thing about this is that one can look at it mod p for various primes p. We must proceed cautiously, however, because a polynomial may be reducible in Z[x] but not in Q[x]; for instance, f = 2x + 2 = 2(x + 1) is reducible in Z[x] but not in Q[x]. Since we want to focus on irreducibility in Q[x], here is a new definition.

Definition. A polynomial f ∈ Z[x] is quasiirreducible if it is irreducible when viewed as an element of Q[x].
Thus 2x + 2 is a quasiirreducible polynomial in Z[x]. Here is a key fact called Gauss's Lemma, whose proof we defer until the next section:

Lemma 7.52. Let f ∈ Z[x]. If f = gh with g, h ∈ Q[x], then there is a nonzero rational number c so that cg, (1/c)h ∈ Z[x].

For example, x² − 4 = (2x − 4)(½x + 1) is a factorization in Q[x] which c = ½ turns into a factorization x² − 4 = (x − 2)(x + 2) in Z[x].

Corollary 7.53. If f ∈ Z[x] is irreducible, then it is quasiirreducible.

The previous lemma implies that if f factors in Q[x] it also factors in Z[x]. The next step is to study irreducibility in Z[x]. This is a very messy ring, but it has lots of nice quotient rings. Consider the "modding out by p" homomorphism ϕ : Z[x] → F_p[x] for various primes p. Write f̄ for f mod p. The following is a nice way to find irreducible polynomials.
Proposition 7.54. Suppose f ∈ Z[x] is nonzero and p does not divide the leading coefficient of f. Suppose further that f mod p is irreducible. Then f is quasiirreducible.

Proof. We give a proof by contradiction, since we do not wish to untangle the three hypotheses. Suppose f is not quasiirreducible. This means there are nonconstant polynomials g₀, h₀ ∈ Q[x] so that f = g₀h₀. Gauss's Lemma implies that there are nonconstant polynomials g, h ∈ Z[x] so that f = gh. Recall that LT(f) = LT(g)·LT(h). Since p ∤ LT(f), we know p ∤ LT(g) and p ∤ LT(h). It follows from this that ḡ and h̄ are nonconstant, thus not units. Since f̄ = ḡh̄, we conclude that f is reducible mod p. This is a contradiction, so f is quasiirreducible. □

For example, we know that p(x) = x⁴ + x³ + x² + x + 1 is irreducible in F₂[x]. The above proposition implies that any quartic integer polynomial congruent to p(x) mod 2, thus with all odd coefficients, is irreducible in Q[x]. For example, 1017x⁴ + x³ − 7x² + x − 33 ∈ Q[x] is irreducible. So if you look at your mod 2 polynomial sieve, you can now prove that many integral polynomials are irreducible in Q[x].

However, it only goes one way. The polynomial x² + 1, for example, factors as (x + 1)² mod 2, but is irreducible in Q[x].

Remark: The polynomial 2x² − x is divisible by x but irreducible mod 2. This is why it is important that p not divide the leading coefficient.
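The claim that x⁴ + x³ + x² + x + 1 is irreducible in F₂[x] can be checked by brute force, since a reducible polynomial of degree n has a monic factor of degree at most n/2. A small sketch (coefficient tuples indexed by degree; the helper names are mine):

```python
from itertools import product

def poly_mod2_mul(p, q):
    # Multiply coefficient tuples (index = degree) over F_2.
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= a & b
    return tuple(r)

def is_irreducible_mod2(f):
    # f: coefficient tuple over F_2 with f[-1] == 1; try all monic factorizations.
    n = len(f) - 1  # degree of f
    for dg in range(1, n // 2 + 1):
        for g_low in product([0, 1], repeat=dg):
            g = g_low + (1,)              # candidate monic factor of degree dg
            for h_low in product([0, 1], repeat=n - dg):
                h = h_low + (1,)
                if poly_mod2_mul(g, h) == tuple(f):
                    return False
    return n >= 1

# x^4 + x^3 + x^2 + x + 1 is irreducible in F_2[x]
print(is_irreducible_mod2((1, 1, 1, 1, 1)))  # True
# x^2 + 1 = (x + 1)^2 mod 2 is reducible
print(is_irreducible_mod2((1, 0, 1)))        # False
```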
6. Z[x]
The arithmetic of the ring Z[x] is a rich interplay between numbers and polynomials. Since Z is not a field, there are polynomials of degree 0 which are not units in Z[x], for example f (x) = 11.
We no longer have a division algorithm.

Proposition 7.55. If there are polynomials p, r ∈ Z[x] with x² = 2p(x) + r(x), then deg(r) ≥ 2.

Proof. Modulo 2, we have x² = r̄(x). □
There is no longer a nice theory of greatest common divisors. For example, the following proposition should give you pause.

Proposition 7.56. There do not exist polynomials f(x), g(x) ∈ Z[x] with 2f(x) + xg(x) = 1.

Proof. Exercise.

Therefore we cannot expect to use a Euclidean Algorithm in the same way as in Z or, say, Q[x]. On the other hand, we still have a perfectly good function deg, which satisfies deg(fg) = deg(f) + deg(g) since Z is a domain. We will now develop a theory of ord_p for Q[x].

Definition. Let p be a prime, and f(x) = a₀ + a₁x + ··· + aₙxⁿ ∈ Q[x]. Then ord_p(f) = minᵢ{ord_p(aᵢ)}.
For example, if f(x) = 21/4 + 6x + (3/7)x², then ord₂(f) = −2, ord₃(f) = 1, ord₇(f) = −1, and all other ords of f are 0. (Recall that ord_p(0) = ∞.)
Let us deduce some properties of the function ord_p on Q[x].

Theorem 7.57. Let p be a prime, and f, g ∈ Q[x]. Then ord_p(fg) = ord_p(f) + ord_p(g).

Proof. This is clear if either f or g is the zero polynomial, so assume this is not the case. Let us first prove the theorem when ord_p(f) = ord_p(g) = 0. Note that this happens if and only if f, g ∈ Z[x] − pZ[x]. Thus neither f nor g is congruent to 0 mod p. Since the ring (Z/pZ)[x] is an integral domain, the product fg is not congruent to 0 mod p. Whence fg ∈ Z[x] − pZ[x], and therefore ord_p(fg) = 0.

More generally, let f₀ = p^{−ord_p(f)}·f and g₀ = p^{−ord_p(g)}·g. Then ord_p(f₀) = 0 and ord_p(g₀) = 0, so ord_p(p^{−ord_p(f)}·f · p^{−ord_p(g)}·g) = 0. It is easy to see that generally ord_p(pᵏh) = k + ord_p(h) for h ∈ Q[x], so the previous equation becomes 0 = ord_p(fg) − ord_p(f) − ord_p(g), giving the desired result. □
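Theorem 7.57 is easy to test numerically. The sketch below (helper names are mine) computes ord_p of a polynomial as the minimum of the p-adic valuations of its nonzero coefficients, and checks multiplicativity on an example:

```python
from fractions import Fraction as Fr

def ord_p_rat(p, x):
    # p-adic valuation of a nonzero rational x.
    x = Fr(x)
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def ord_p_poly(p, coeffs):
    # ord_p of a nonzero polynomial: the minimum over its nonzero coefficients.
    return min(ord_p_rat(p, c) for c in coeffs if c != 0)

def poly_mul(f, g):
    r = [Fr(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            r[i + j] += Fr(a) * Fr(b)
    return r

f = [Fr(3, 2), Fr(6)]        # 3/2 + 6x has ord_2 = -1
g = [Fr(4), Fr(0), Fr(10)]   # 4 + 10x^2 has ord_2 = 1
# ord_2(fg) should be -1 + 1 = 0
print(ord_p_poly(2, f), ord_p_poly(2, g), ord_p_poly(2, poly_mul(f, g)))
```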
Theorem 7.58. Let p be a prime. If f, g ∈ Q[x], then ord_p(f + g) ≥ min{ord_p(f), ord_p(g)}. Moreover if ord_p(f) < ord_p(g), then ord_p(f + g) = ord_p(f).

Proof. Let f(x) = Σᵢ aᵢxⁱ and g(x) = Σᵢ bᵢxⁱ. By Proposition ?? we know that for all i, ord_p(aᵢ + bᵢ) ≥ min{ord_p(aᵢ), ord_p(bᵢ)}. By the definition of ord_p(f) and ord_p(g), it follows that for all i, ord_p(aᵢ) ≥ ord_p(f) and ord_p(bᵢ) ≥ ord_p(g). Thus for all i,

ord_p(aᵢ + bᵢ) ≥ min{ord_p(aᵢ), ord_p(bᵢ)} ≥ min{ord_p(f), ord_p(g)}.

Since ord_p(f + g) = minᵢ{ord_p(aᵢ + bᵢ)} is equal to the LHS of these inequalities for some i, the first part of the proposition holds.

Now suppose that ord_p(f) < ord_p(g). Say that ord_p(f) = ord_p(aₘ) for some m. Then ord_p(f + g) = minᵢ{ord_p(aᵢ + bᵢ)} ≤ ord_p(aₘ + bₘ), which is ord_p(aₘ) by Proposition ??. This shows that ord_p(f + g) ≤ ord_p(f) = min{ord_p(f), ord_p(g)}. But the first part of the proposition shows the other inequality, and therefore they must be equal. □
6.1. Exercises

(1) Factor the polynomial x⁴ + x² + 9 into irreducible polynomials in each of the rings Q[x], R[x], C[x], F₂[x], F₃[x], and F₅[x]. For the finite fields F_p, the coefficients are considered mod p.
(2) Let n ∈ N and consider the polynomial f(x) = xⁿ − 2 ∈ Z[x]. Suppose f factors into f = gh in Z[x], with deg(g), deg(h) ≥ 1. By modding out the coefficients by 2 we get f̄ = ḡh̄ ∈ F₂[x].
    (a) What can you say about ḡ and h̄?
    (b) What does that say about g(0) and h(0)?
    (c) Why does this give a contradiction?
    (d) Prove that f ∈ Z[x] is irreducible; the above does most of the work.
(3) Find an antiderivative of 1/(x⁴ + 1).
(4) Let p ∈ R[x] be a polynomial with p(α) > 0 for all α ∈ R. Prove that there are polynomials q₁, q₂ ∈ R[x] so that p(x) = q₁(x)² + q₂(x)².
7. Rational Functions

Let F be a field. In this section we construct the field of "rational functions" with coefficients in F from the polynomial ring F[x]. We leave many routine details to the reader.

Definition. Write X for the set of pairs of polynomials (f : g) with f, g ∈ F[x] and g ≠ 0. Consider the following addition and multiplication laws on X:

(a : b) + (f : g) = (ag + bf : bg)
(a : b) · (f : g) = (af : bg).

Proposition 7.59. The following is an equivalence relation on X: (f : g) ∼ (h : k) if fk = gh. The addition and multiplication laws on X are ∼-invariant.

Definition. Write F(x) for the set of equivalence classes of X under the above relation. By the previous proposition, the addition and multiplication laws on X give addition and multiplication laws on F(x).

Proposition 7.60. F(x) is a field.

Definition. F(x) is called the field of rational functions on F.

Definition. Write X − [0] for the pairs (f : g) ∈ X with f ≠ 0.

Proposition 7.61. The function deg(f : g) = deg(f) − deg(g) from X − [0] to Z is ∼-invariant.

As with rational numbers, we usually write f/g for [(f : g)] and f for f/1.

Proposition 7.62. Let f/g ∈ F(x). Then there are polynomials q, r with r = 0 or deg r < deg g so that

f/g = q + r/g.

Definition. A rational function f/g ∈ F(x) is called topheavy provided that deg f ≥ deg g. It is called bottomheavy provided that deg f < deg g.

Note that this notion does not depend on the representative f/g of the equivalence class, by Proposition 7.61.

Thus, every rational function may be expressed as a sum of a polynomial and a bottomheavy rational function.
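The construction can be prototyped directly: represent a pair (f : g) by two coefficient lists and test equivalence by cross-multiplication, exactly as in Proposition 7.59. A sketch with hypothetical helper names (not the book's notation):

```python
def poly_mul(p, q):
    # Coefficient lists indexed by degree.
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def equiv(fg, hk):
    # (f : g) ~ (h : k) iff f*k == g*h as polynomials.
    (f, g), (h, k) = fg, hk
    return trim(poly_mul(f, k)) == trim(poly_mul(g, h))

def add(ab, fg):
    # (a : b) + (f : g) = (ag + bf : bg)
    (a, b), (f, g) = ab, fg
    ag, bf = poly_mul(a, g), poly_mul(b, f)
    n = max(len(ag), len(bf))
    num = [(ag[i] if i < len(ag) else 0) + (bf[i] if i < len(bf) else 0)
           for i in range(n)]
    return (num, poly_mul(b, g))

# 1/x + 1/x = 2x/x^2, which is equivalent to 2/x; and x/x^2 ~ 1/x
one_over_x = ([1], [0, 1])
s = add(one_over_x, one_over_x)
print(equiv(s, ([2], [0, 1])))                 # True
print(equiv(([0, 1], [0, 0, 1]), one_over_x)) # True
```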
7.1. Exercises

(1) Prove the unproven statements above.
(2) X does not satisfy two of the ring axioms; which ones?
(3) Describe the equivalence class of X which gives the multiplicative identity 1_{F(x)}.
(4) Prove that there is no rational function in F(x) whose square is x.
(5) Prove that the function ω(f : g) = ω(f) − ω(g) from X − [0] to Z is ∼-invariant. Thus we may consistently define ω(f/g) = ω(f) − ω(g).
8. Composition of Polynomials

We define composition of polynomials analogously to how we defined multiplication of polynomials. Let R be a ring, and f, g ∈ R[x]. We define the composition f ∘ g, first when f is a monomial.

Definition. If f(x) = axⁿ, then (f ∘ g)(x) = a·g(x)ⁿ.

Here is the recursive definition.

Definition. f ∘ g = 0 provided that f = 0, and f ∘ g = (LT(f) ∘ g) + (f_< ∘ g) provided that f ≠ 0 (where f_< = f − LT(f)).

Here are some basic properties of composition, which the reader may verify:

Proposition 7.63. We have:
• (f + g) ∘ h = (f ∘ h) + (g ∘ h).
• (f · g) ∘ h = (f ∘ h) · (g ∘ h).
• (f ∘ g) ∘ h = f ∘ (g ∘ h).

Here is an important point.

Proposition 7.64. Let R be an integral domain, and f, g ∈ R[x]. Suppose f ≠ 0 and deg(g) ≥ 1. Then f ∘ g is nonzero and its degree is deg(f)·deg(g).
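Composition can be computed by a Horner-style recursion that mirrors the definition above: peel coefficients of f off from the top, repeatedly multiplying by g. A sketch (integer coefficients, list index = degree; helper names are mine):

```python
def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def compose(f, g):
    # Horner evaluation of f at the polynomial g.
    result = [0]
    for c in reversed(f):
        result = poly_add(poly_mul(result, g), [c])
    return trim(result)

# f = x^2 + 1, g = 2x + 3: deg(f o g) = deg(f) * deg(g) = 2
f = [1, 0, 1]
g = [3, 2]
print(compose(f, g))  # (2x+3)^2 + 1 = 4x^2 + 12x + 10, i.e. [10, 12, 4]
```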
For the rest of this section we assume that R is a field F.

Proposition 7.65. Suppose that u ∈ F[x] has degree 1. Then there is a polynomial v ∈ F[x] of degree 1 so that (u ∘ v)(x) = (v ∘ u)(x) = x.

Write v = u⁻¹ in the above situation. Polynomials of degree 1 play the role of units here. We now define an analogue to primality or irreducibility, but relative to composition.
Definition. Let f ∈ F[x] with deg(f) > 1. We say that f is decomposable provided that we may write f = g ∘ h, with deg g, deg h < deg f. We say that f is indecomposable otherwise.

We are ruling out compositional factors of degree 1 because for any degree 1 polynomial u, one always has the trivial decomposition f = (f ∘ u) ∘ u⁻¹.

Proposition 7.66. Let f(x) = x⁴ + x ∈ Q[x]. Then f is indecomposable.
Here are some basic properties of indecomposability, which the reader may verify.

Proposition 7.67. We have:
• If f is indecomposable and deg u = 1, then f ∘ u and u ∘ f are indecomposable.
• If deg(f) is prime then f is indecomposable.

Here is food for thought.

Proposition 7.68. Let f ∈ F[x] with deg f > 1. There are indecomposable polynomials g₁, . . . , gₙ with f = g₁ ∘ ··· ∘ gₙ.

It is an interesting question to ask how unique this decomposition is. Certainly, one should distill out the interference of the unit polynomials u. For instance, (g₁ ∘ u) ∘ (u⁻¹ ∘ g₂) = g₁ ∘ g₂. Next one must deal with examples such as x² ∘ x³ = x³ ∘ x². We do not pursue this interesting question further here.
8.1. Exercises

(1) Let R be a ring, and f, g, h ∈ R[x]. Let a = ω(f), b = ω(g), and c = ω(h). Suppose that a ≥ 1 and b < c. Let d = b(a − 1) + c. Prove that f ∘ (g + h) ≡ f ∘ g mod x^d. What happens if a = 0?
(2) Let t(x) = x + 1 ∈ Q[x]. Prove that if f(x) ∈ Q[x] with f ∘ t = t ∘ f, then f(x) = x + c for some c ∈ Q.
(3) Prove that if f, g ∈ Q[x] are two quadratic polynomials with f ∘ g = g ∘ f, then f = g.
(4) Let t(x) = x + 1 ∈ F₂[x]. How many polynomials f ∈ F₂[x] can you find with f ∘ t = t ∘ f?
(5) Let f(x) = x⁴ ∈ Q[x]. Find two different ways to express f as the composition of two quadratic polynomials.
(6) List all the indecomposable quartic (degree 4) polynomials in F₂[x].
(7) Let f be a quadratic polynomial in Q[x]. Prove that there are degree 1 polynomials u₁, u₂ ∈ Q[x] so that (u₁ ∘ f ∘ u₂)(x) = x². Is the analogous fact still true in F₂[x]?
(8) Prove that ω(f ∘ g) = ω(f)·ω(g) for nonzero polynomials f, g ∈ R[x], when R is a domain.
(9) Let R be a ring, and g ∈ R[x]. Prove that the subset {f ∘ g | f ∈ R[x]} is a subring of R[x].
(10) Let F be a field, and f, g, h ∈ F[x], with f, h nonconstant. Suppose that the division algorithm for f ÷ g gives a quotient of q and a remainder of r. Show that the division algorithm for (f ∘ h) ÷ (g ∘ h) gives a quotient of q ∘ h and a remainder of r ∘ h.
(11) Let F be a field, f ∈ F[x] and r ∈ F(x), with both f and r nonconstant. We may define f ∘ r exactly as above. Prove that f ∘ r ∈ F(x) is not constant.
(12) Let F be a field, and r, s ∈ F(x), with s nonconstant. Define the composition r ∘ s. (Define it first for two members of X, then show your definition is ∼-invariant.)
(13) (Continuing) Give counterexamples to show that deg(r ∘ s) ≠ deg(r)·deg(s), and ω(r ∘ s) ≠ ω(r)·ω(s) in general.
9. Chapter 6 Wrap-Up

9.1. Rubric for Chapter 6

In this chapter you should have learned
• The basics of roots and factorization of polynomials.
• How to construct the field of rational functions.
• Basic properties of composition.

9.2. Toughies
(1) Find the nilpotent elements in the polynomial ring R = (Z/4Z)[x].
(2) Find all the units in R = (Z/4Z)[x].
(3) Find all the zero divisors in R = (Z/4Z)[x].
(4) Generalize the above three exercises to R = (Z/nZ)[x], where n > 1. What about a more general ring than Z/nZ?
(5) The polynomial f(x) = ½x² + ½x ∈ Q[x] has the property that f(z) ∈ Z for all z ∈ Z. Find all polynomials in Q[x] with this property.
(6) You may have noticed when doing the mod 2 Polynomial Sieve that if a polynomial a₀ + a₁x + ··· + aₙxⁿ is irreducible, with a₀, aₙ ≠ 0 and n > 1, then so is its "reverse" aₙ + aₙ₋₁x + ··· + a₀xⁿ. Prove that this is true for polynomials over any field.
(7) Prove that there is an indecomposable polynomial of every degree (greater than one) in Q[x].
(8) Let F be a field. Any nonzero r ∈ F(x) may be written as r = f/g with f and g relatively prime. In this case, we call max(deg(f), deg(g)) the height of r, or height(r). Suppose that r, s ∈ F(x) are nonconstant. Prove that height(r ∘ s) = height(r)·height(s).
(9) State and prove an analogue of Proposition 7.68 for rational function composition.
(10) In the example of the synthetic division of 6x⁵ − 17x⁴ + 9x³ + 8x² − 11x − 2 by x + 1/6, we saw that every coefficient of the quotient 6x⁴ − 18x³ + 12x² + 6x − 12 is an integer and divisible by 6. Prove the following general fact: Let f ∈ Z[x] and p/q ∈ Q a reduced fraction. If f(p/q) = 0, then every coefficient of the polynomial f(x) ÷ (x − p/q) is an integer divisible by q.
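The synthetic-division example in the last toughie can be replayed in a few lines; the quotient coefficients come out integral and divisible by 6 as claimed. A quick sketch (coefficients listed from highest degree down; helper name is mine):

```python
from fractions import Fraction as Fr

def synthetic_division(coeffs, root):
    # Synthetic division of f by (x - root); coeffs from highest degree down.
    quotient = [coeffs[0]]
    for c in coeffs[1:]:
        quotient.append(c + root * quotient[-1])
    remainder = quotient.pop()
    return quotient, remainder

# 6x^5 - 17x^4 + 9x^3 + 8x^2 - 11x - 2 divided by x + 1/6 (root -1/6)
q, r = synthetic_division([6, -17, 9, 8, -11, -2], Fr(-1, 6))
print(q, r)  # quotient 6, -18, 12, 6, -12 and remainder 0
```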
CHAPTER 8
Real Numbers
1. Constructing R
In an analysis class one needs axioms for the real numbers R. There are various formulations of such axioms, but they all mean that R is a "complete ordered field", which we will define in the next section. But even after stating the properties you want R to satisfy, there are still two logical quandaries. First, is there such a field? Maybe setting up so many axioms leads eventually to a logical paradox. This fear can only be assuaged if we construct an example of such a field. Second, is there more than one such field? This question is subtle and requires the notion of "isomorphism", which we defer until later. [As of now unwritten.]

In this chapter we construct the real numbers in three different ways: via decimals, Dedekind cuts, and equivalence classes of Cauchy sequences. Each of these has its merits. Decimals are practical for computation and learned at a young age. However, operations with decimals are difficult to work into an axiomatic framework. Dedekind cuts give a good framework for proofs, but are a little abstract. The Cauchy sequence approach is the most abstract, but here's something interesting: if you change what "convergence" means, you may get an entirely new field, the p-adic numbers! All three of these approaches involve having some kind of analytic point of view.

2. Ordered Fields
In this section we define the phrase "complete ordered field", together with other important notions. An ordered field is not simply an ordered set which happens to be a field; the ordering must interact with the ring structure.

Definition. Let F be a totally ordered set which is also a field. Then F is an ordered field provided that the following properties hold:
(1) If a, b, c ∈ F, then a < b implies a + c < b + c.
(2) If a, b, c ∈ F, then a < b and c > 0 implies that ac < bc.

Our only example of an ordered field so far is Q.

Definition. Let F be an ordered field. Then F is a complete ordered field if every nonempty subset of F which is bounded above has a least upper bound in F.

Theorem 8.1. Q is not a complete ordered field.

Proof. Consider the set C = {a ∈ Q | a^2 < 2}. C is nonempty since 1 ∈ C. It is bounded above by 2. Suppose some rational number q = sup C, and suppose first that q^2 < 2. By the first part of the lemma below there is a number q′ > q with q′ ∈ C, a contradiction to q being an upper bound. Now suppose that 2 < q^2. By the second part of the lemma below there is a rational number q′ < q with q′ an upper bound of C, another contradiction. The only remaining possibility is that q^2 = 2, but we know very well this is impossible. □

Lemma 8.2. If a ∈ Q^+ satisfies a^2 < 2, then there is a δ ∈ Q^+ so that (a + δ)^2 < 2. If b ∈ Q^+ satisfies 2 < b^2, then there is a δ ∈ Q^+ so that δ < b and 2 < (b − δ)^2.

Proof. For the first part, we may assume a ≥ 1, since if a < 1 then (a + δ)^2 < (1 + δ)^2 and we may replace a by 1. Put ε = 2 − a^2 ∈ Q^+ and δ = min{1, ε/(4a)} ∈ Q^+. Since δ ≤ 1, we have δ^2 ≤ δ. Since a ≥ 1 we have ε/(4a) ≤ ε/4, and therefore δ ≤ ε/4. Meanwhile, since δ ≤ ε/(4a), we have 2aδ ≤ ε/2. It follows that (a + δ)^2 = a^2 + 2aδ + δ^2 ≤ a^2 + ε/2 + ε/4 < a^2 + ε = 2, as desired.

We leave the proof of the second part to the reader. □
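To get a concrete feel for the first part, here is a quick numerical check of the choice δ = min{1, ε/(4a)} from the proof, a sketch in Python with exact rationals (the helper name is mine):

```python
from fractions import Fraction

def delta_for(a):
    """The delta from the proof of Lemma 8.2: with eps = 2 - a^2,
    take delta = min(1, eps/(4a)).  (Assumes 1 <= a and a^2 < 2.)"""
    eps = 2 - a * a
    return min(Fraction(1), eps / (4 * a))

for a in (Fraction(1), Fraction(7, 5), Fraction(141, 100)):
    d = delta_for(a)
    assert d > 0 and (a + d) ** 2 < 2   # (a + delta)^2 stays below 2
```

Even for a = 1.41, the prescribed δ keeps (a + δ)^2 safely below 2.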
3. Decimal Expansions

In this section we will discuss the "decimal construction" of the real numbers R. The theory of place values and decimals does not lend itself to pleasant proofs, so we will not give many. We assume the reader is acquainted with basic decimal arithmetic.

Definition. Let D be the set of decimal expansions, i.e., expressions of the form α = d_n d_{n−1} ··· d_0 . d_{−1} d_{−2} d_{−3} ···, with d_i ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. The "leading digit" d_n should not be 0 if n ∈ N. We call d_i the "ith digit" of α. We say α is terminating if there is an ℓ ∈ Z so that if i < ℓ, then d_i = 0. In this case d_ℓ is called the last digit of α.

For example, we have the decimal expansion 1.3333333··· with repeating 3s. Often one writes this with a bar over the 3, for brevity. This example corresponds to the rational number 4/3.

Remark: These will give nonnegative real numbers, which are complicated enough...

Definition. For k ∈ Z, write e_k for the expansion whose digits are all zero except for a 1 in the kth place. For example e_{−3} = 0.00100···. (In fact e_k = 10^k.)

Definition. Let k ∈ N. If α = d_n d_{n−1} ··· d_0 . d_{−1} d_{−2} d_{−3} ··· is a decimal expansion, then its kth truncation α_k is the expansion d_n d_{n−1} ··· d_{−k}, followed by all zeros.

For example, π_0 = 3 and π_2 = 3.14.

Certainly if k < ℓ, and you know α_ℓ, then you know α_k as well, by truncating sooner. In fact, (α_ℓ)_k = α_k.

Consider an expansion α ∈ D. Since you can always truncate a truncation as above, knowing the digits of α is the same as knowing the terminating decimals α_k for sufficiently large k. We will soon define addition in D by describing what the kth truncations of a sum should be for k arbitrarily large.
Defining the addition of two decimal expansions is a little tricky. Suppose you have two expansions α = d_n d_{n−1} ··· d_0 . d_{−1} d_{−2} ··· and β = d′_n d′_{n−1} ··· d′_0 . d′_{−1} d′_{−2} ···, and want an expansion for α + β. The basic idea is to add the corresponding digits, but if they add up to more than 9, then carrying is involved.

If α and β terminate, then add as usual, starting with the smallest place where one of them is nonzero and adding vertically, possibly with carrying. We do not give more detail here, but this addition is commutative and associative.

If neither α nor β terminates, then we have to think a little. For example, consider the addition problem

58.793··· + 41.206···

The digits of the sum depend on the omitted digits to the right! If the omitted digits are all 0s, for instance, then the sum is the terminating expansion 99.999. If the digits in the next place add to 10 or more, then the expansion starts as 100.000.... In reality, these two potential sums correspond to rather close numbers, so the difference is mild. But to write down a general addition rule with digits takes some willpower. Here goes.

Definition. Fix two expansions α and β as above. Let i ∈ Z be a place. Let s_i be the sum of the digits in the ith place of α and β. If s_i ≤ 8, then i is called simple. If s_i ≥ 10, then i is called enhanced. If s_i = 9, then i is called precarious.

[Example]

Let α, β ∈ D, and k ∈ N. Suppose we want to specify (α + β)_k. If we specify (α + β)_ℓ for some ℓ > k, that's even better.

Case I: The place −ℓ is simple for some ℓ > k. In this case, put (α + β)_{ℓ−1} = α_{ℓ−1} + β_{ℓ−1}.

Case II: The place −ℓ is enhanced for some ℓ > k. In this case, put (α + β)_{ℓ−1} = (α_{ℓ−1} + β_{ℓ−1}) + e_{−(ℓ−1)}.

Case III: Every place −ℓ with ℓ > k is precarious. In this case, put (α + β)_k = α_k + β_k.

Note that we can do all the additions on the right because the expansions involved terminate.

The above gives an addition law on D. It is easy to see that it is commutative, since addition of terminating decimals is commutative, and since the sum s_i does not depend on the order. If someone has a nice argument for why the law is associative, drop me a line!
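The three-case rule can be turned into a small program. The sketch below is mine, not from the text: digits are stored in a dict mapping a place i to its digit (place −1 is the tenths place; missing places read as 0), and a truncation of the sum is computed as an exact fraction by scanning for a simple or enhanced place.

```python
from fractions import Fraction

def trunc(digits, k):
    """k-th truncation of an expansion as an exact Fraction."""
    return sum(Fraction(d) * Fraction(10) ** i
               for i, d in digits.items() if i >= -k)

def add_trunc(a, b, k, depth=60):
    """A truncation of a + b, following Cases I-III of the addition rule.
    Cases I and II return the finer (l-1)-truncation."""
    for l in range(k + 1, depth):
        s = a.get(-l, 0) + b.get(-l, 0)
        if s <= 8:    # Case I: place -l is simple
            return trunc(a, l - 1) + trunc(b, l - 1)
        if s >= 10:   # Case II: place -l is enhanced; add the carry
            return trunc(a, l - 1) + trunc(b, l - 1) + Fraction(10) ** (-(l - 1))
    # Case III: every place inspected was precarious (digit sum 9)
    return trunc(a, k) + trunc(b, k)

# 58.793000... + 41.206000...: place -4 is simple, giving 99.999
x = {1: 5, 0: 8, -1: 7, -2: 9, -3: 3}
y = {1: 4, 0: 1, -1: 2, -2: 0, -3: 6}
assert add_trunc(x, y, 2) == Fraction(99999, 1000)
```

The repeating example 0.444··· + 0.555··· falls into Case III at every place, and the rule returns 0.99 for k = 2, as expected.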
What about multiplication? If we wish to multiply a decimal expansion α by a single digit d in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, we can simply add α to itself d times (or get 0.0, if d = 0). (Alternatively, one could develop a theory of multiplication of digits and carrying as taught in school.)

Multiplication of α by one of the e_i is performed by shifting α by i places [which direction?]. If we wish to multiply a decimal expansion α by a terminating expansion τ, we can write τ as a finite sum Σ_i d_i e_i, where the d_i are digits. Then put τ × α = Σ_i d_i · (e_i × α). Note this is perfectly defined by the above. One should in principle check now various properties like distributivity, commutativity, and associativity with the appropriate inputs (for instance, if the inputs are all terminating).

Some difficulty comes when we consider multiplying a nonterminating expansion by a nonterminating expansion. Consider, for instance, the multiplication of 2.2222··· with 0.3333···:

    2.22222···
  × 0.33333···
    __________
    0.666666...
  + 0.066666...
  + 0.006666...
  + 0.000666...
    ...

Do you know any of the digits of the result? It looks like the tenths place of the result is going to be a 7, because the 6 + 6 in the hundredths place adds up to 12, and one carries the 1. But what if there's a massive amount of carrying before the hundredths place, and somehow an 8 gets carried, leading to 6 + 6 + 8 = 20?! As you go to lower and lower places in this multiplication, you can see that the carrying increases without bound. At the 10^{−100} place, for example, the sum is at least 600, which means that the carried number is 60!

I hope you can see from this example that developing an algorithm where you input the (infinitely many) digits of two decimal expansions and output the digits of their product would be a chore. You know something like (α × β)_k = α_k · β_k should be true in the limit, even if it's not exactly true as it stands. Addition was simple enough to trudge through with case analysis, but if you want a real number system with clearly defined arithmetic operations one can work with in an intelligible way, decimal expansions are not the way to go.

Remark: By the way, the above example is comprised of repeating decimals for simplicity. The clever reader knows this problem is really 20/9 times 1/3, and so the product is 20/27 = 0.740740740···. But the point was to find a rule that works for all decimal expansions.
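One can at least see the limiting claim numerically: the products of the kth truncations of 20/9 and 1/3 approach 20/27. A quick check with exact rationals (a sketch; `trunc10` is my name for the kth decimal truncation of a rational):

```python
from fractions import Fraction

def trunc10(x, k):
    """k-th decimal truncation of a nonnegative rational."""
    p = 10 ** k
    return Fraction(int(x * p), p)

exact = Fraction(20, 9) * Fraction(1, 3)   # = 20/27 = 0.740740...
for k in (1, 3, 6):
    approx = trunc10(Fraction(20, 9), k) * trunc10(Fraction(1, 3), k)
    # the product of truncations undershoots, but by less and less
    assert 0 < exact - approx < Fraction(1, 10 ** (k - 1))
```

So α_k · β_k is never exactly (α × β)_k here, but the error shrinks with k, which is the sense in which the formula holds "in the limit".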
Here is another problem. How do you subtract 0.999··· from 1.000···? We haven't discussed subtraction yet, but the only reasonable decimal expansion which could be the difference is 0.000···. On the other hand, the rule for addition gives 0.999··· + 0.000··· = 0.999···. At no point in our definition of D did we say that 0.999··· = 1.000···, so they are different elements of D. This is not an isolated occurrence either; any time you have a terminating expansion like 0.340000··· there is a repeating-9s expansion like 0.339999··· lurking in its shadow.

This is not an insurmountable problem. One can define an equivalence relation ∼ on D by saying any terminating decimal is equivalent to its repeating-9s "shadow". For example, 45.60000... ∼ 45.5999.... Thus each equivalence class contains either one or two expansions: one if its member is neither terminating nor repeating 9s, and two otherwise. This is the formal way of saying such expansions should be equal.

Then you can define the nonnegative real numbers R_{≥0} to be the set of equivalence classes. One patiently checks that addition is ∼-invariant, and if anyone gets around to defining multiplication, subtraction, and division, they can check that those are also well-defined. The next step is probably to define the notion of <, and then define negative real numbers and their arithmetic.

At the end of the day you've done a lot of work, but you've kept in touch with your roots as a student of the decimal system.
4. Dedekind Cuts

4.1. Positive Real Numbers

For simplicity we will actually construct the positive real numbers R^+ first. One can then form R just as we formed Z from N.

Definition. Let C be a subset of Q^+. Then an upper bound of C is an element x ∈ Q^+ so that c ≤ x for all c ∈ C. The set C is called bounded above if there is an upper bound of C. If there is an element x ∈ C which is an upper bound of C, then x is called a maximum of C.

It is easy to see that a set has at most one maximum.

Definition. A cut is a nonempty subset C of Q^+ which is bounded above, has no maximum, and satisfies the following property: if a ∈ C and b is a positive rational number with b < a, then b ∈ C.

We occasionally refer to the last condition as C being "left closed". In other words, a cut is an open rational line segment whose left endpoint is 0. Intuitively, the real numbers are the right endpoints of these cuts.

Definition. If q ∈ Q^+, let C_q) = {a ∈ Q^+ | a < q}.

In fact, C_q) = (0, q), with the understanding that (0, q) = {x ∈ Q | 0 < x < q}.
Proposition 8.3. C_q) is a cut.

Proof. Note that C_q) is nonempty since q/2 ∈ C_q). Also, C_q) is bounded above by q. If a ∈ C_q), then the average (a + q)/2 satisfies a < (a + q)/2 < q, so (a + q)/2 ∈ C_q), and a is not the maximum of C_q). Thus C_q) has no maximum. It is easy to see that C_q) satisfies the rest of the definition of cut, by the transitivity of inequality. □
Are there any other cuts of Q^+?

Proposition 8.4. Let C = {a ∈ Q^+ | a^2 < 2}. Then C is a cut, but is not equal to C_q) for any positive rational q.

Proof. C is nonempty since 1 ∈ C. It is bounded above by 2. Suppose 1 ≤ a ∈ C. By Lemma 8.2, there is a positive rational δ so that a + δ ∈ C. Therefore a cannot be a maximum of C. (If a < 1, then a < 1 ∈ C, so a is not a maximum either.) It is easy to see that C satisfies the rest of the definition of cut.

Suppose C = C_q) for some rational number q, and suppose first that q^2 < 2. By the first part of Lemma 8.2 there is a number q′ = q + δ with q′ ∉ C_q) but q′ ∈ C, a contradiction. Now suppose that 2 < q^2. By the second part of Lemma 8.2 there is a number q′ = q − δ ∈ Q^+ with q′ ∈ C_q) but q′ ∉ C, another contradiction. The only remaining possibility is that q^2 = 2, but we know very well this is impossible. □

As you may have guessed, this cut C will correspond to the irrational number √2.

Lemma 8.5. Let C be a cut and ε ∈ Q^+. Then there are numbers p ∈ C and q ∉ C so that 0 < q − p < ε.

Proof. Since C is nonempty we have an element a ∈ C. Write a as a fraction m/n of integers. By multiplying numerator and denominator by a sufficiently large integer, we may assume n > ε^{−1}. Now let

S = {i ∈ N | i/n ∈ C}.

Since m ∈ S, it is nonempty. Let b be an upper bound of C; then nb is an upper bound of S. By Well-Ordering, S has a maximum element M. Then it is easy to see that p = M/n and q = (M + 1)/n satisfy the conditions of the lemma. □
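The proof of Lemma 8.5 is effectively an algorithm. Here is a sketch in Python (names are mine) that, given a membership test for a cut, an upper bound, and ε, finds p ∈ C and q ∉ C with q − p < ε by scanning the grid i/n exactly as in the proof:

```python
from fractions import Fraction

def approximate_cut(in_cut, bound, eps):
    """Find (p, q) with p in the cut, q not in it, and 0 < q - p < eps.
    in_cut: membership predicate; bound: a rational upper bound."""
    n = int(1 / eps) + 1          # choose n > 1/eps
    M = 0
    i = 1
    while Fraction(i, n) <= bound:
        if in_cut(Fraction(i, n)):
            M = i                 # largest grid point found in the cut
        i += 1
    return Fraction(M, n), Fraction(M + 1, n)

# the cut C = { a in Q+ | a^2 < 2 } from Proposition 8.4:
p, q = approximate_cut(lambda a: a * a < 2, Fraction(2), Fraction(1, 1000))
assert p * p < 2 < q * q
assert 0 < q - p < Fraction(1, 1000)
```

The pair (p, q) squeezes the "right endpoint" of the cut, here √2, to within ε on either side.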
Definition. Write R^+ for the set of all cuts.

First we show how R^+ is an ordered set.

Definition. Say that C_1 ≤ C_2 if C_1 is a subset of C_2. Of course, we define C_1 < C_2 if C_1 ≤ C_2 but C_1 ≠ C_2. We also define >, ≥ in the usual way, using inclusion the other way around.
Proposition 8.6 (Trichotomy). If C_1 and C_2 are cuts, then C_1 ≤ C_2 or C_2 < C_1.

Proof. If C_1 is not a subset of C_2, there is an element x_1 ∈ C_1 which is not in C_2. Let x_2 ∈ C_2. If x_2 ≥ x_1, then x_1 would be a member of C_2 by the definition of cut. Since this is impossible, we must have x_2 < x_1. So by the definition of cut, x_2 ∈ C_1. This shows C_2 ⊆ C_1, and the containment is proper since x_1 ∉ C_2. Therefore C_2 < C_1. □
We will soon define addition, multiplication and division for R^+. But first we will define the "sup" operation.

Theorem 8.7. Suppose S ⊂ R^+ is a nonempty set of cuts which is bounded above. Then the union C* = ⋃_{C∈S} C is a cut.

Proof. Write C* for the union. It is obviously nonempty and bounded above. Suppose C* had a maximum m ∈ C*. There must be a cut C ∈ S with m ∈ C, and it is easy to see that m is then a maximum of C, a contradiction, since a cut has no maximum. This shows that C* does not have a maximum. Suppose a ∈ C*, and b < a. Then a ∈ C for some C ∈ S, and therefore b ∈ C ⊆ C*. This shows that C* is left-closed. □

We will give C* a better name: sup S.

Definition. If S ⊂ R^+ is a nonempty set of cuts which is bounded above, then the union of the C ∈ S is called the supremum of S. Write sup S ∈ R^+ for this union.

Note that
• C ≤ sup S for all C ∈ S.
• If D is an upper bound of S, then sup S ≤ D.

Here is the reasoning for the second point: Recall that C ≤ D means C ⊆ D as sets of rational numbers. If C ⊆ D for all C ∈ S, then ⋃_{C∈S} C ⊆ D, which translates to sup S ≤ D.
Definition. If C_1 and C_2 are cuts, then let C_1 + C_2 = {x_1 + x_2 | x_1 ∈ C_1, x_2 ∈ C_2}. Also let C_1 · C_2 = {x_1 · x_2 | x_1 ∈ C_1, x_2 ∈ C_2}.

Proposition 8.8. If C_1 and C_2 are cuts, then C_1 + C_2 is a cut.

Proof. It is easy to see that both sets are nonempty. If b_1 and b_2 are upper bounds of C_1 and C_2, then b_1 + b_2 is an upper bound of C_1 + C_2. Suppose m is a maximum of C_1 + C_2. Since m ∈ C_1 + C_2, it may be written as m = x_1 + x_2 with x_i ∈ C_i (i = 1, 2). Since C_1 is a cut, x_1 is not an upper bound of C_1, and therefore there is a number y_1 > x_1 in C_1. Then y_1 + x_2 ∈ C_1 + C_2, contradicting the maximality of m.

Let x_1 + x_2 ∈ C_1 + C_2 with x_i ∈ C_i, and suppose y < x_1 + x_2. Then by Exercise 7 in Section 2.3, we may write y = y_1 + y_2 with y_1 < x_1 and y_2 < x_2. Since C_1 and C_2 are cuts, y_1 ∈ C_1 and y_2 ∈ C_2, and therefore y ∈ C_1 + C_2. This shows that C_1 + C_2 is left-closed. □
Proposition 8.9 (Cancellation Law of Addition). Suppose C_0, C_1, and C_2 ∈ R^+, and that C_0 + C_1 = C_0 + C_2. Then C_1 = C_2.

Proof. We argue by way of contradiction. Suppose that they are not equal. By Trichotomy, we may assume that C_1 < C_2 (the case C_2 < C_1 is similar). Then there is a number a ∈ C_2 − C_1. Since a is not the maximum of C_2, there is also a number b ∈ C_2 with a < b. Let ε = b − a. By Lemma 8.5, there are numbers x ∈ C_0 and y ∉ C_0 with y < x + ε. Note that x + b ∈ C_0 + C_2. Since C_0 + C_1 = C_0 + C_2, we must be able to write x + b = x_0 + x_1, with x_0 ∈ C_0 and x_1 ∈ C_1. Since a = b − ε is not in C_1, we have x_1 < b − ε. Thus x + b < x_0 + b − ε, or x + ε < x_0. However, x_0 < y < x + ε, a contradiction. □
Let us check that for rational cuts C_q), we recover addition in Q^+.

Proposition 8.10. If q_1, q_2 ∈ Q^+, then C_{q_1)} + C_{q_2)} = C_{q_1+q_2)}.

Proof. First we check that C_{q_1)} + C_{q_2)} ⊆ C_{q_1+q_2)}. If x ∈ C_{q_1)} and y ∈ C_{q_2)}, then x < q_1 and y < q_2. Therefore x + y < q_1 + q_2, which shows that x + y ∈ C_{q_1+q_2)}. To check the other inclusion, suppose that z ∈ C_{q_1+q_2)}. Then z < q_1 + q_2. By Exercise 7 in Section 2.3 again, we may write z = x + y with x < q_1 and y < q_2. This shows that z ∈ C_{q_1)} + C_{q_2)}, as desired. □
Proposition 8.11. If C_1 and C_2 are cuts, then C_1 · C_2 is a cut.

Remark: This proposition would not be true if we allowed negative numbers in our cuts.

Proof. Exercise.

Proposition 8.12. If q_1, q_2 ∈ Q^+, then C_{q_1)} · C_{q_2)} = C_{q_1·q_2)}.

Proof. Exercise.

Proposition 8.13. Suppose that C_1 ≤ C_2, and C_0 ∈ R^+. Then C_1 + C_0 ≤ C_2 + C_0 and C_1 · C_0 ≤ C_2 · C_0.

Proof. Exercise.

Proposition 8.14. If C is a cut, then C · C_1) = C.

Proof. First we check that C · C_1) ⊆ C. If x ∈ C and y ∈ C_1), then xy < x. Since C is left-closed, this implies that xy ∈ C.

Next, suppose that x ∈ C. Since C does not have a maximum element, there is an x_1 ∈ C with x < x_1, and therefore x/x_1 ∈ C_1). Thus x = x_1 · (x/x_1) ∈ C · C_1). It follows that C ⊆ C · C_1). □
Lemma 8.15. Let C be a cut, and write S = {C′ ∈ R^+ | C · C′ ⊆ C_1)}. Then S is nonempty and bounded above.

Proof. Since C is nonempty, there is some x ∈ C. Then S is bounded above by C_{x^{−1})}: if C′ ∈ S and y ∈ C′, then xy ∈ C_1), so xy < 1 and y < x^{−1}. Suppose C is bounded above by b. Then C_{b^{−1})} ∈ S, so S is nonempty. □

Proposition 8.16. Let C be a cut, and C* = sup S, where S is the set in the above lemma. Then C · C* = C_1).

Proof. It is easy to see that C · C* = ⋃_{C′∈S} C · C′ ⊆ C_1); we must show the other inclusion.

Let q ∈ C_1), so that 0 < q < 1, and put ε = 1 − q. Let x ∈ C and y ∉ C be as in the lemma below. Since y ∉ C, the cut C_{y^{−1})} is in S. Therefore xy^{−1} ∈ C · C_{y^{−1})} ⊆ C · C*. But then 1 − xy^{−1} < 1 − q implies that q < xy^{−1}. Since C · C* is a cut, we must have q ∈ C · C* as well. □

Lemma 8.17. Let C be a cut, and ε ∈ Q^+. Then there are elements x ∈ C and y ∉ C so that 0 < 1 − xy^{−1} < ε.

Proof. Since C is nonempty there is a number a ∈ C. By Lemma 8.5, there are numbers x ∈ C and y ∉ C so that 0 < y − x < aε. Since C is left-closed, a < y, and therefore y − x < yε. Dividing this inequality by y gives the lemma. □
Definition. Let C be a cut. Then write C^{−1} for the cut C* from the previous proposition.

Thus C^{−1} is a cut so that C · C^{−1} = C_1).

Proposition 8.18. Commutativity and Associativity of addition and multiplication hold for cuts, as does Distributivity.

4.2. Exercises

(1) Prove the rest of Lemma 8.2 and Propositions 8.11 and 8.12.
(2) Let C_1 and C_2 be cuts with C_1 < C_2. Prove that there is a rational number q so that C_1 < C_q) < C_2.
(3) Prove that if C is a cut, then C = sup_{q∈C} C_q).
(4) If C is a cut, is the set {y ∈ Q^+ | xy < 1 for all x ∈ C} necessarily a cut?
(5) Let q ∈ Q^+. Prove that C_q)^{−1} = C_{q^{−1})}.
(6) Let C be a cut. Prove that there is a cut D so that D^2 = C.
(7) Definition 5 of Book V of Euclid's Elements reads, "Magnitudes are said to be in the same ratio, the first to the second and the third to the fourth, when, if any equimultiples whatever are taken of the first and third, and any equimultiples whatever of the second and fourth, the former equimultiples alike exceed, are alike equal to, or alike fall short of, the latter equimultiples respectively taken in corresponding order." Explain how this is essentially the notion of a cut. (Remark: Online commentary on this topic is confusing and irrelevant to this problem, so don't bother reading it.)
4.3. Additive Identity

In this section we add the element 0 to R^+. At this point you can forget the definition of cuts; all we need is the properties we've been accumulating.

Definition. A set P with operations + and · is called a prefield if it satisfies the following properties:
(1) Associativity, Commutativity, and Distributivity of Addition and Multiplication
(2) the Cancellation Law of Addition
(3) Existence of a Multiplicative Identity 1
(4) Existence of Multiplicative Inverses
(5) Nontriviality (there is an element which is not 1)

For example, Q^+ and R^+ are prefields.

Definition. A prefield P with a total ordering < is called an ordered prefield if the following properties hold for all x, y, z ∈ P: If x < y, then x + z < y + z. If x < y, then xz < yz.

If F is an ordered field, then the positive elements form an ordered prefield.

One of the properties a prefield is missing is an additive identity. Actually the definition precludes such an element:

Lemma 8.19. Let P be a prefield, and x, y ∈ P. Then x + y ≠ y.

Proof. Suppose this were the case. Multiplying by the inverse of x gives 1 + x^{−1}y = x^{−1}y. Since P is nontrivial, there is an element z ≠ 1 in P. Adding z to both sides of the equation gives 1 + x^{−1}y + z = x^{−1}y + z. By the Cancellation Law we obtain

(3) 1 + z = z.

Adding 1 to both sides of this equation and cancelling z's gives 1 + 1 = 1. By Distributivity it follows that z + z = z · 1 = z. Substituting z + z into the right hand side of Equation (3) gives 1 + z = z + z. By Cancellation we obtain 1 = z, a contradiction. □

Definition. Let P be a prefield. Then write P̄ = P ∪ {0} for the set obtained from P by adding a new element 0. Extend the addition and multiplication from P to P̄ as follows:
(1) 0 + x = x + 0 = x for all x ∈ P̄.
(2) 0 · x = x · 0 = 0 for all x ∈ P̄.

Proposition 8.20. If P is a prefield (resp. an ordered prefield), then the new set P̄ has all the properties of a prefield (resp. an ordered prefield), except that 0 does not have a multiplicative inverse.

Proof. Let us check the Cancellation Law of Addition. If x + 0 = y + 0, then clearly x = y. Suppose that 0 + x = y + x. Then by the above lemma, y ∉ P, and therefore y = 0. The other properties the reader should check, with or without a pencil. □

Definition. If P is an ordered prefield, we may extend the ordering on P to P̄ simply by saying 0 < x for all x ∈ P.

It is easy to see that this is an ordering on P̄.
4.4. Adding the Negatives

In this section we complete the construction of R. We are working in analogy with the construction of the integers Z from N; the reader may wish to review that section for motivation and proofs.

Definition. Let P be a prefield (with the element 0 adjoined, as in the last section). Denote by A(P) the set of "arrows" [a, b], where a, b ∈ P. Arrows [x, y] and [a, b] ∈ A(P) are congruent if x + b = a + y.

Proposition 8.21. Congruence is an equivalence relation.

Proof. Exercise.
Your proof should involve the Associative and Commutative Laws for Addition, and the Cancellation Law. For convenience we have again the following proposition.

Proposition 8.22. Every arrow is congruent to an arrow where at least one of the components is 0.

Proof. Exercise.

Every arrow is thus equivalent to [x, 0], [0, x], or [0, 0], for some x ∈ P. We call the arrows in the first case positive and in the second case negative. We have the same rules for adding and multiplying arrows in A(P):

Definition. For a, b, c, d ∈ P, put [a, b] + [c, d] = [a + c, b + d] and [a, b] · [c, d] = [ac + bd, ad + bc].
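These rules are easy to experiment with. Here is a sketch in Python (my own naming), taking P to be the positive rationals with 0 adjoined and representing an arrow [a, b] as a pair, so that [a, b] plays the role of a − b:

```python
from fractions import Fraction as F

def congruent(u, v):
    # [x, y] ~ [a, b]  iff  x + b == a + y
    return u[0] + v[1] == v[0] + u[1]

def add(u, v):
    # [a, b] + [c, d] = [a + c, b + d]
    return (u[0] + v[0], u[1] + v[1])

def mul(u, v):
    # [a, b] . [c, d] = [ac + bd, ad + bc]
    return (u[0] * v[0] + u[1] * v[1], u[0] * v[1] + u[1] * v[0])

u = (F(1), F(3))   # represents 1 - 3 = -2
v = (F(5), F(1))   # represents 5 - 1 = 4
assert congruent(add(u, v), (F(2), F(0)))   # -2 + 4 = 2
assert congruent(mul(u, v), (F(0), F(8)))   # -2 * 4 = -8
```

Note that the arithmetic never uses subtraction; the second component does the bookkeeping, exactly as in the construction of Z from N.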
Proposition 8.23. Addition and multiplication in A(P) are congruence-invariant.

Proof. Omitted for now.

Definition. The set of equivalence classes of A(P) under ∼ is denoted by P[Z]. We give P[Z] the addition and multiplication laws coming from those in A(P).

Theorem 8.24. If P is a prefield then P[Z] is a field.

Proof. Omitted for now.

Definition. For x ∈ P, write x for the equivalence class of [x, 0] and −x for the class of [0, x].

As in the case of Z, P[Z] is the union of the x's, the −x's, and 0. Therefore as a set, P[Z] = P ∪ −P ∪ {0}. This is how we would usually think of it.

Definition. Suppose P is an ordered prefield. We define an ordering on P[Z] as follows: Let x, y ∈ P[Z]. If x, y ∈ P, then use the ordering on P. If x, y ∈ −P, then say x < y if and only if −y < −x. If x ∈ −P and y ∈ P, then x < y.

Theorem 8.25. If P is an ordered prefield then P[Z] is an ordered field. If P is a complete ordered prefield then P[Z] is a complete ordered field.

Proof. Omitted for now.

Definition. Let P = R^+. Then we denote by R the complete ordered field P[Z].

4.5. Exercises

(1) Let X be a nonempty set, and P = {f ∈ F(X, R) | f(x) > 0 for all x ∈ X}. Show that P is a prefield.
CHAPTER 9

Miscellaneous

1. An ODE Proof

If a function is equal to its own derivative, what is it? Certainly f(x) = e^x is such a function, but are there any others? Of course f(x) = 0 will do, and if you think about it, any multiple of e^x is also equal to its own derivative. How about f(x) = e^{x+1}? Is that another example? Can clever people forever produce more examples you haven't thought of? It is a very important and applicable fact that every such function is a multiple of e^x. The simplest differential equations given to us by the real world are all of the form y′ = ky for k a constant, and it is crucial to pin down the solutions to this kind of differential equation. Let us write our statement formally as a proposition:

Proposition 9.1. Let f : R → R be a differentiable function so that f′(x) = f(x) for all real numbers x. Then there is a real number C so that f(x) = Ce^x for all x.

We shall call this two-sentence assertion a "Proposition". The proposition divides into two parts: the first sentence is the hypothesis and the second sentence is the conclusion. The hypothesis describes the assumptions we expect to use to prove the conclusion.

Proof. Consider the function g(x) = f(x)/e^x. Note that this division makes sense because e^x is never zero. By the quotient rule,

g′(x) = (f′(x)e^x − f(x)e^x) / e^{2x}.

Since f′(x) = f(x) for all x, we also have e^x f′(x) − f(x)e^x = 0 for all x. Therefore g′(x) = 0; therefore g is a constant function. Call the constant C; thus f(x)/e^x = C, and it follows that f(x) = Ce^x, as claimed. □
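As a quick sanity check of the proof's mechanism, and an answer to the e^{x+1} question, one can compute g(x) = f(x)/e^x numerically for f(x) = e^{x+1} and watch it come out constant:

```python
import math

def g(x):
    # g(x) = f(x)/e^x with f(x) = e^(x+1); the proof says g is constant
    return math.exp(x + 1) / math.exp(x)

values = [g(x) for x in (-2.0, 0.0, 1.5, 3.0)]
# the constant C is e, so e^(x+1) is indeed a multiple of e^x
assert all(abs(v - math.e) < 1e-9 for v in values)
```

So e^{x+1} is not a new example after all: it is e · e^x, the case C = e.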
This puts the issue to rest. We do not need to worry about any clever people thinking up other solutions, or more importantly whether our real-world phenomenon modeled by y′ = y can be anything other than a multiple of the exponential function. (What about f(x) = e^{x+1}? If you're really stuck, try using the proof to find the C.)

I want to now present a "bad proof" of the proposition above.

"Proof:" Let y = f(x). We have dy/dx = y, so

(4) (1/y) dy = dx.

So ∫ (1/y) dy = ∫ dx. So, for some integration constant C_1, log|y| = x + C_1. Thus |y| = e^{x+C_1}. So y = ±e^x e^{C_1}, and therefore y = Ce^x, where C = ±e^{C_1}.

This "argument" is well-known to many students, although at a key step the argument is nonsense. That step is at Equation (4), where we have multiplied both sides by dx and divided both sides by y. Dividing both sides by y is a mild sin, since y could be zero. But multiplying both sides by dx is the real sham. What is "dx"? Is it a number? Is it a function? There is no answer to this question. Proponents of this proof will tell you something like, "It is an infinitesimal quantity that approaches 0." They are usually confused about the 0/0 indeterminate form of a limit, and are essentially multiplying both sides of the equation by 0. This confusion is passed on to the students.

Is it harmful to treat the dx as a mathematical object that you can multiply or divide by? Definitely; once we learn about partial derivatives, we encounter formulas of the following kind. Suppose that f(x, y) is a real-valued function of two variables, and x(t), y(t) are real-valued functions of one variable. View f = f(x(t), y(t)) as a function of the variable t. Then:

∂f/∂t = (∂f/∂x) · (∂x/∂t) + (∂f/∂y) · (∂y/∂t).

If ∂x and ∂y were independent mathematical quantities, then one could cancel them and be left with the paradoxical and very incorrect

∂f/∂t = 2 ∂f/∂t.

Thus, students who have learned the "wrong" proof have bad intuition and are now completely confused about partial derivatives. It is better not to memorize mumbo-jumbo, and best to learn real proofs.
1.1. Exercises

(1) Suppose that f : R → R is a differentiable function so that f′(x) = −f(x) for all real numbers x. Prove that there is a real number C so that f(x) = Ce^{−x} for all real numbers x.
(2) Suppose that f is a real-valued differentiable function with domain (0, ∞) so that x f′(x) = f(x) for all x > 0. Prove that there is a real number C so that f(x) = Cx for all positive real numbers x.
2. Pythagorean Triples

A Babylonian clay tablet from around 1800 B.C.E. records triples of numbers such as (3, 4, 5), (8, 15, 17), (6, 8, 10), (5, 12, 13). Mathematicians recognize these as integer solutions to the equation a^2 + b^2 = c^2. The triples of integers (a, b, c) solving this equation are called "pythagorean triples". I want to allow negative solutions as well, so notice that any solution (a, b, c) also gives solutions (±a, ±b, ±c) and also (±b, ±a, ±c).

How do you get all of the solutions? We will use a technique called "algebraic geometry" to study this problem. This section will bring in more powerful ideas than you may be comfortable with. Some facts about integers will be proved rigorously much later. Don't get too nervous; just sit back and enjoy the ride. It is good to be occasionally exposed to deeper mathematical thought.

The first observation is that if (a, b, c) is a pythagorean triple with c ≠ 0, then (x, y) = (a/c, b/c) is a rational point on the circle C : x^2 + y^2 = 1. (What happens if c = 0?) By a "rational point" in the plane we mean a point whose coordinates are both rational numbers.

Proposition 9.2.
Let P = (x1 , y1 ) and Q = (x2 , y2 ). The slope of the line connecting
them is
− −
y2 y1 . x2 x1 Since x 1 , x2 , y1 , y2 are all rational numbers, so is m. m =
−
∈
∈
Let P = ( 1, 0) C . If Q = P , and Q C has rational coordinates, then the slope of P Q is rational. The winning geometric idea is to go backwards: Let t Q and consider the line through P with slope t. Then will intersect C in some other point Q. Our calculation will show that Q is also a rational point, and all rational points on C , besides P , are obtained this way. [Read this paragraph a few times if you don’t get it at first.]
∈
Here is the calculation. A line through P with slope t is given by the equation y = t(x + 1). If (x, y) satisfies both this equation and also x2 + y 2 = 1, then x2 + t2 (x + 1)2 = 1. Solving this with the quadratic equation gives
−t2 ± 1 , meaning (x = −1) ∨ x = 1 + t2
Of course x =
1 t2 x = 1 + t2
−
.
−1 gives the point P which we already know. The other value gives 1 − t2 2t Q = , .
1 + t2 1 + t2
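This formula is easy to verify with exact rational arithmetic; here is a sketch in Python (the function name is mine):

```python
from fractions import Fraction

def rational_point(t):
    """The second intersection Q of the slope-t line through P = (-1, 0)
    with the unit circle."""
    d = 1 + t * t
    return ((1 - t * t) / d, 2 * t / d)

x, y = rational_point(Fraction(1, 3))
assert (x, y) == (Fraction(4, 5), Fraction(3, 5))
assert x * x + y * y == 1        # Q really lies on the circle
```

Note that `Fraction` automatically reduces, so t = 1/3 lands directly on (4/5, 3/5); the unreduced coordinates will matter in the discussion below.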
Let us record our work as a proposition.

Proposition 9.3. All rational points on the unit circle C are of the form ( (1 − t^2)/(1 + t^2), 2t/(1 + t^2) ) for t ∈ Q, or (−1, 0).

Any rational number is of the form t = m/n for m, n ∈ Z. More precisely, we may take m, n to be in lowest terms, so that gcd(m, n) = 1. (We will talk more carefully about gcds later, but you should have an idea what this means.) When we substitute this for t in the formula we get

Q = ( (1 − m^2/n^2)/(1 + m^2/n^2), (2m/n)/(1 + m^2/n^2) ) = ( (n^2 − m^2)/(n^2 + m^2), 2mn/(n^2 + m^2) ).

At this point it should be obvious to you how to find some (a, b, c) so that

a/c = (n^2 − m^2)/(n^2 + m^2) and b/c = 2mn/(n^2 + m^2);

surely one takes a = n^2 − m^2, b = 2mn, and c = m^2 + n^2. And indeed you can and should check that (n^2 − m^2)^2 + (2mn)^2 = (m^2 + n^2)^2.

Do we have a proof that all pythagorean triples are of the form (a, b, c) = (n^2 − m^2, 2mn, m^2 + n^2), with m, n integers? If you reflect on this for a moment, you'll notice something is amiss. To start with, we're not getting the solution (4, 3, 5), because 3 is odd and 2mn obviously has to be even. We're also not getting (3, 4, −5), because m^2 + n^2 is obviously positive. Thirdly, notice that this doesn't give the solution (9, 12, 15) either, since you can't write 15 as the sum of two integer squares.
None of these exceptions is insurmountable, but we need to think more carefully about our argument. Our "logic debugging" strategy will be to take these examples and trace through the logic with these numbers. Let's start with (9, 12, 15). If we convert this to a rational point on the circle, we get (9/15, 12/15), which is equal to (3/5, 4/5), which is also the point corresponding to (3, 4, 5). And this forces us to notice that (9, 12, 15) = (3·3, 3·4, 3·5). Now if (a, b, c) is a pythagorean triple, then so is (n·a, n·b, n·c) for any integer n. Let us say that a pythagorean triple (a, b, c) is primitive if the only common (positive) integer divisor of the three numbers a, b, c is 1. Then every pythagorean triple (except (0, 0, 0)) is an integer multiple of a primitive pythagorean triple. So nothing is lost if we find all primitive ones. Similarly, let's now just look for solutions in positive integers, since we can always include all the possible signs later.

We are still bothered by the "counterexample" (4, 3, 5), because it was supposed to be covered by this geometric method. Let's go through the program with this example. It corresponds to the point (4/5, 3/5) in C. The slope of the line connecting this point to P is t = 1/3 (right?). So we were supposed to get this point from m = 1 and n = 3. Here's what happened: plugging these values into the last expression for Q gives

Q = ( (3^2 − 1^2)/(3^2 + 1^2), 2·1·3/(3^2 + 1^2) ) = (8/10, 6/10).

So you see what happened? These fractions both reduce further to (4/5, 3/5), and then we get our primitive pythagorean triple (4, 3, 5). The point is that even though t = m/n was a reduced fraction, the formula for the coordinates of Q did not give reduced fractions. So here's what happens in general. There are two cases. Take t = m/n a reduced fraction.

Case I: If m or n is even, then the fractions

(n^2 − m^2)/(n^2 + m^2) and 2mn/(n^2 + m^2)

are in lowest terms. These correspond to the primitive pythagorean triples (±(n^2 − m^2), ±2mn, ±(m^2 + n^2)).

Case II: If m and n are both odd, then these fractions are in lowest terms after dividing numerator and denominator by 2. In other words,

Q = ( ((n^2 − m^2)/2) / ((n^2 + m^2)/2), mn / ((n^2 + m^2)/2) ).

These correspond to the primitive pythagorean triples (±(n^2 − m^2)/2, ±mn, ±(m^2 + n^2)/2).
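The two cases combine into a short procedure, sketched here in Python (my own naming), that produces a primitive triple from a reduced slope t = m/n with 0 < m < n:

```python
from math import gcd

def primitive_triple(m, n):
    """Primitive pythagorean triple from a reduced fraction t = m/n, 0 < m < n."""
    assert 0 < m < n and gcd(m, n) == 1
    if m % 2 != n % 2:                    # Case I: one of m, n is even
        a, b, c = n*n - m*m, 2*m*n, n*n + m*m
    else:                                  # Case II: both odd, divide by 2
        a, b, c = (n*n - m*m) // 2, m*n, (n*n + m*m) // 2
    return a, b, c

assert primitive_triple(1, 2) == (3, 4, 5)
assert primitive_triple(1, 3) == (4, 3, 5)   # the earlier "counterexample"
for m, n in [(1, 2), (2, 3), (1, 3), (3, 5), (2, 5)]:
    a, b, c = primitive_triple(m, n)
    assert a*a + b*b == c*c and gcd(a, gcd(b, c)) == 1
```

Every output satisfies a^2 + b^2 = c^2 and has no common divisor, and both (3, 4, 5) and (4, 3, 5) now appear.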
Exercise: Show that by substituting m = u − v, n = u + v we get the solutions (2uv, u^2 − v^2, u^2 + v^2).