Integral Calculus

MATH 10A – METHODS OF MATHEMATICS: CALCULUS, STATISTICS AND COMBINATORICS
L. Pachter, B. Sturmfels and L.C. Evans
Department of Mathematics, University of California, Berkeley

October 25, 2015

Overview of Part 2: Integral calculus

The main references for this part are
- Sebastian J. Schreiber, Karl Smith and Wayne Getz, Calculus for Life Sciences 1E for UC Berkeley, Wiley
- J. Stewart, Calculus, 7th edition, Cengage
- C. Neuhauser, Calculus for Biology and Medicine, 3rd edition, Prentice Hall

1. Histograms
2. Integrals and area
3. Approximation methods
4. Applications of integration
5. Antiderivatives, Fundamental Theorem of Calculus
6. Integration techniques

Section 1 Histograms

A. Displaying data

DEFINITION A histogram is a graphical representation providing a visual impression of the distribution of data. It consists of adjacent rectangles, erected over given intervals, with areas equal to the proportion of the observations in each interval.

[Figure: "A Histogram", with density (0.00 to 0.30) on the vertical axis and values from −3 to 4 on the horizontal axis.]

We will sometimes also think of the intervals as bins into which our data points are distributed.

Example 1.1 (Birth weight and smoking)

[Figure: two density histograms of birth weight in ounces (horizontal axis from 60 to 180), one titled "Mothers who did not smoke" and one titled "Mothers who smoked".]


How to draw histograms 



First, choose the consecutive intervals (or bins) $I_1, I_2, \dots, I_m$ into which the data points are distributed. Calculate the number of data points $n_k$ within each interval $I_k$. Then
$$N = n_1 + n_2 + \cdots + n_m$$
is the total number of points.

We want the area of the rectangle $R_k$ above the interval $I_k$ to be $\frac{n_k}{N}$. Since the area of a rectangle equals its height times its width, we take
$$s_k = \text{height of } R_k = \frac{n_k}{N \cdot (\text{length of } I_k)}.$$

Then the total area of the histogram equals
$$\sum_{k=1}^{m} (\text{area of } R_k) = \sum_{k=1}^{m} \frac{n_k}{N} = 1.$$

In short: area (percent) = height × width, so height = percent / (bin width).

Example 1.2 (Calculating percentiles using histograms) What percentage of women who smoked had children with birth weights less than 90 ounces?

[Figure: density histogram titled "Mothers who smoked", with birth weight from 60 to 180 ounces on the horizontal axis.]

We see that 8.68% of mothers who smoked had a child weighing less than 90 ounces (5.63 lbs). The red lines represent the 25th, 50th (median), and 75th percentiles.

B. Partitioning an interval

When we decide upon the intervals/bins into which to sort our data points for a histogram, we are in effect creating a partition of an interval.

DEFINITION If $a = x_0 < x_1 < \cdots < x_{m-1} < x_m = b$, we call $P = \{x_0, x_1, \dots, x_m\}$ a partition of the interval $[a, b]$.

The partition $P$ divides the interval $[a, b]$ into the $m$ closed subintervals $I_1 = [x_0, x_1]$, $I_2 = [x_1, x_2]$, ..., $I_m = [x_{m-1}, x_m]$.

Example 1.3 Let $Y = \{1.2, 1.5, 1.5, 2.2, 2.2, 2.7, 5.5, 5.7\}$ be the data we want to graph. The minimum is 1.2 and the maximum is 5.7. We round 1.2 down to the nearest integer and round 5.7 up to the nearest integer.

We choose our partition of $[1, 6]$ to be $P = \{1, 2, 3, 5, 6\}$.

C. Step functions

To calculate and plot the height of the rectangles, we were actually defining a piecewise constant function
$$s(x) = \begin{cases} s_1 & \text{if } x_0 \le x \le x_1 \\ s_2 & \text{if } x_1 < x \le x_2 \\ \;\vdots \\ s_m & \text{if } x_{m-1} < x \le x_m, \end{cases}$$
where $s_k$ is the height of the rectangle over the $k$th subinterval.

Example 1.4 For our data, the percentages in the intervals (areas of the rectangles) are 37.5, 37.5, 0, and 25. We divide each of these percentages by 100 · (width of the interval); here the widths are 1, 1, 2, and 1.

The function is then defined as
$$s(x) = \begin{cases} 0.375 & \text{if } 1 \le x \le 2 \\ 0.375 & \text{if } 2 < x \le 3 \\ 0 & \text{if } 3 < x \le 5 \\ 0.25 & \text{if } 5 < x \le 6. \end{cases}$$
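The height calculation for the data of Example 1.3 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `histogram_heights` is hypothetical (it does not come from the course), and it assumes the bin convention used above, where the first bin is closed and every later bin is open on the left.

```python
from bisect import bisect_left

def histogram_heights(data, partition):
    """Return the density height s_k of the rectangle over each bin.

    Assumed convention: the first bin is [x_0, x_1] and every later
    bin is (x_{k-1}, x_k].
    """
    counts = [0] * (len(partition) - 1)
    for y in data:
        if not partition[0] <= y <= partition[-1]:
            raise ValueError(f"data point {y} lies outside the partition")
        # bisect_left gives the first index i with partition[i] >= y,
        # so y lies in (x_{i-1}, x_i]; y == x_0 is sent to the first bin.
        k = max(bisect_left(partition, y), 1)
        counts[k - 1] += 1
    total = len(data)
    widths = [right - left for left, right in zip(partition, partition[1:])]
    # height = proportion / width, so that each area equals n_k / N
    return [n / (total * w) for n, w in zip(counts, widths)]

data = [1.2, 1.5, 1.5, 2.2, 2.2, 2.7, 5.5, 5.7]
partition = [1, 2, 3, 5, 6]
heights = histogram_heights(data, partition)
print(heights)  # [0.375, 0.375, 0.0, 0.25]
```

The printed heights agree with the values of the step function in Example 1.4.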

DEFINITION Let $P = \{x_0, x_1, \dots, x_m\}$ be a partition of $[a, b]$. A step function is a function $s : [a, b] \to \mathbb{R}$ that is constant on the open subintervals of $P$. Denote by $s_k$ the constant value that $s$ takes in the $k$th open subinterval $I_k$:
$$s(x) = s_k \quad \text{if } x_{k-1} < x \le x_k \qquad (k = 1, 2, \dots, m).$$

[Figure: graph of the step function of Example 1.4 on the interval from 1 to 6, with heights between 0 and 0.4.]

Concerning the breakpoints we assume $s(x_k) = s_k$ for $k = 1, 2, \dots, m$.

Histograms are step functions

Remember: you can always think of histograms as step functions.

[Figure: two side-by-side panels on the interval from 1 to 6, with heights from 0.0 to 0.4, showing the histogram and the corresponding step function.]

As we collect more data, we might make the partition of [a, b ] finer and finer. What happens then?

Section 2 Integrals and area

A. Integral of a step function

Suppose $s$ and $t$ are step functions on $[a, b]$. Let $P_1$ and $P_2$ be partitions of $[a, b]$ such that $s$ is constant on the open subintervals of $P_1$ and $t$ is constant on the open subintervals of $P_2$. Define the sum $u = s + t$ by the rule
$$u(x) = s(x) + t(x) \quad \text{if } a \le x \le b.$$

[Figure: graphs of $s$ (with breakpoint $x_1$), $t$ (with breakpoint $x_2$), and $s + t$ (with breakpoints $x_2$ and $x_1$) on $[a, b]$.]

To show that $u$ is actually a step function, we must find a partition $P$ such that $u$ is constant on the open subintervals of $P$.

DEFINITION The common refinement of $P_1$ and $P_2$ is the union $P = P_1 \cup P_2$.

DEFINITION The integral of a step function $s$ from $a$ to $b$ is the number
$$\int_a^b s(x)\,dx := \sum_{k=1}^{m} s_k (x_k - x_{k-1}).$$

[Figure: a step function with values $s_1, \dots, s_6$ on the subintervals determined by $a, x_1, x_2, x_3, x_4, x_5, b$.]

If each $s_k \ge 0$, the integral is the area between the graph of the step function and the $x$-axis.
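As a small illustration of this definition, the sum can be evaluated directly. The helper name `step_integral` is hypothetical, and the sample values are those of the histogram step function in Example 1.4; this is a sketch, not part of the original notes.

```python
def step_integral(partition, values):
    """Integral of a step function: the sum of s_k * (x_k - x_{k-1})."""
    if len(values) != len(partition) - 1:
        raise ValueError("need exactly one value per subinterval")
    return sum(s * (right - left)
               for s, left, right in zip(values, partition, partition[1:]))

# Step function of Example 1.4 on the partition {1, 2, 3, 5, 6}
area = step_integral([1, 2, 3, 5, 6], [0.375, 0.375, 0, 0.25])
print(area)  # 1.0
```

The result is 1 because the step function came from a histogram, whose total area is always 1.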

THEOREM (Additive Property)
$$\int_a^b \big(s(x) + t(x)\big)\,dx = \int_a^b s(x)\,dx + \int_a^b t(x)\,dx$$

[Figure: graphs of $s$, $t$, and $s + t$ on $[a, b]$, with breakpoints $x_1$ and $x_2$.]

THEOREM (Homogeneous Property)
$$\int_a^b c \cdot s(x)\,dx = c \cdot \int_a^b s(x)\,dx$$

[Figure: graphs of $s$ and $2s$ on $[a, b]$, with breakpoint $x_1$.]

We can combine the previous two assertions:

THEOREM (Linearity)
$$\int_a^b \big(c_1 s(x) + c_2 t(x)\big)\,dx = c_1 \int_a^b s(x)\,dx + c_2 \int_a^b t(x)\,dx$$

THEOREM (Invariance under translation)
$$\int_a^b s(x)\,dx = \int_{a+c}^{b+c} s(x - c)\,dx \quad \text{for every real number } c$$

[Figure: graph of $s(x)$ on $[a, b]$ with breakpoint $x_1$, and graph of $s(x - c)$ on $[a + c, b + c]$ with breakpoint $x_1 + c$.]

THEOREM (Comparison) If $s(x) \le t(x)$ for every $x \in [a, b]$, then
$$\int_a^b s(x)\,dx \le \int_a^b t(x)\,dx.$$

THEOREM (Expansion or contraction of the interval)
$$\int_{ka}^{kb} s\!\left(\frac{x}{k}\right) dx = k \int_a^b s(x)\,dx \quad \text{for every } k > 0$$

Next, we turn to the problem of computing integrals of more general functions. To do so, we will need to take limits.

B. Riemann integrals

Our next goal is finding the area under a curve:

[Figure: a region of unknown area $A$ under a curve between $a$ and $b$.]

We instead find the area of a collection of rectangles that approximate the desired area. That is, we approximate $f$ by a step function.

[Figure: two rectangles over the partition $x_0, x_1, x_2$.]
$$A \approx (x_1 - x_0) f(x_0) + (x_2 - x_1) f(x_1)$$

Using 10 subintervals makes the approximation even better:

[Figure: ten rectangles over the partition $x_0, x_1, \dots, x_{10}$.]
$$A \approx \sum_{k=1}^{10} f(x_{k-1})(x_k - x_{k-1})$$

Notation Suppose $P$ is a partition, dividing our interval $[a, b]$ into $m$ subintervals $I_1, \dots, I_m$.
(i) Let $\Delta x_k = x_k - x_{k-1}$ denote the length of the $k$-th subinterval $I_k$.
(ii) Let $x_k^*$ be any point in the $k$-th subinterval $I_k$.

We will build a rectangle of height $f(x_k^*)$ above $I_k$. The area of this rectangle is $f(x_k^*)\Delta x_k$; and so the total area is
$$\sum_{k=1}^{m} f(x_k^*)\Delta x_k = \sum_{k=1}^{m} f(x_k^*)(x_k - x_{k-1}).$$
This is an approximation to the area under the curve, called a Riemann sum.

To find the actual area, we want to let $m$ get bigger and bigger and $\Delta x_k$ get smaller and smaller. If we then send $m \to \infty$, we should get the actual area.

DEFINITION The Riemann integral of $f$ from $a$ to $b$ is
$$\int_a^b f(x)\,dx = \lim_{m \to \infty} \sum_{k=1}^{m} f(x_k^*)\Delta x_k,$$
provided this limit exists, irrespective of the choice of the partition or the choice of the points $x_k^*$.

For nonnegative functions $f$, the integral $\int_a^b f(x)\,dx$ gives the area under $f$ between $a$ and $b$.

A useful fact is that the Riemann integral always exists for continuous functions:

THEOREM If $f : [a, b] \to \mathbb{R}$ is continuous, then the limit on the previous slide exists; and thus $\int_a^b f(x)\,dx$ is defined.

Remark: It can also be shown that $\int_a^b f(x)\,dx$ is defined if $f$ is piecewise continuous, meaning that we can subdivide $[a, b]$ into finitely many subintervals $I_1, \dots, I_m$, such that $f$ restricted to each interval $I_k = [x_{k-1}, x_k]$ is continuous (after possibly being redefined at the endpoints).

But how can we actually compute integrals?

Useful formulas for Riemann sums

When calculating Riemann sums, the following rules will be helpful:
$$\sum_{i=1}^{m} i = \frac{m(m+1)}{2}$$
$$\sum_{i=1}^{m} i^2 = \frac{m(m+1)(2m+1)}{6}$$
$$\sum_{i=1}^{m} i^3 = \frac{m^2(m+1)^2}{4}$$
$$\sum_{i=0}^{m} r^i = \frac{r^{m+1} - 1}{r - 1} \qquad (r \ne 1).$$

We will discuss in Math 10B how to use mathematical induction to establish the first three of these formulas.

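Example 2.1 Find $\int_0^2 x^2\,dx$.

SOLUTION: For simplicity, let's choose our $m$ subintervals to all be the same size. Then
$$\Delta x_k = \frac{2 - 0}{m} = \frac{2}{m}.$$
Also for simplicity, let's choose $x_k^*$ to be the left endpoint of our subintervals. Then
$$x_1^* = 0,\quad x_2^* = \frac{2}{m},\quad x_3^* = 2 \cdot \frac{2}{m},\quad \dots,\quad x_k^* = (k-1) \cdot \frac{2}{m}.$$

We must therefore compute
$$\lim_{m \to \infty} \sum_{k=1}^{m} \left(\frac{2(k-1)}{m}\right)^2 \frac{2}{m} = \lim_{m \to \infty} \sum_{i=0}^{m-1} \left(\frac{2i}{m}\right)^2 \frac{2}{m},$$
and for this we will use formulas from the previous slide:
$$\int_0^2 x^2\,dx = \lim_{m \to \infty} \sum_{i=0}^{m-1} \frac{4 i^2}{m^2} \cdot \frac{2}{m} = \lim_{m \to \infty} \frac{8}{m^3} \sum_{i=0}^{m-1} i^2 = \lim_{m \to \infty} \frac{8}{m^3} \cdot \frac{(m-1)\,m\,(2m-1)}{6}$$
$$= \lim_{m \to \infty} \frac{8 m^3 \left(1 - \frac{1}{m}\right)\left(2 - \frac{1}{m}\right)}{6 m^3} = \frac{8 (1 - 0)(2 + 0)}{6} = \frac{16}{6} = \frac{8}{3}.$$

[Figure: the region under $y = x^2$ between 0 and 2, with area $A = \frac{8}{3}$.]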
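The limit in Example 2.1 can be checked numerically. The sketch below uses exact rational arithmetic and compares the left-endpoint Riemann sum with the closed form $\frac{8}{m^3} \cdot \frac{(m-1)m(2m-1)}{6}$ obtained from the sum-of-squares formula; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def left_riemann_sum_x_squared(m):
    """Left-endpoint Riemann sum of x^2 on [0, 2] with m equal subintervals."""
    dx = Fraction(2, m)
    return sum((i * dx) ** 2 * dx for i in range(m))

def closed_form(m):
    """The same sum after applying the formula for the sum of squares."""
    return Fraction(8 * (m - 1) * m * (2 * m - 1), 6 * m ** 3)

for m in (10, 100, 1000):
    s = left_riemann_sum_x_squared(m)
    # Print m, the Riemann sum, and its distance from the limit 8/3
    print(m, float(s), float(Fraction(8, 3) - s))
```

The gap to $\frac{8}{3}$ shrinks roughly in proportion to $\frac{1}{m}$, as expected for a left-endpoint sum.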

Integrals of powers of x

Calculations similar to those in the previous example show that
$$\int_a^b x^j\,dx = \frac{b^{j+1} - a^{j+1}}{j+1}$$
for all $b > a$ and each positive integer $j$. We will later learn simpler ways to derive these formulas.

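Example 2.2 Find $\int_2^{10} e^x\,dx$.

SOLUTION: Let us take $\Delta x = \frac{10 - 2}{m} = \frac{8}{m}$ and $x_k^* = 2 + (k-1)\frac{8}{m}$. Then
$$\int_2^{10} e^x\,dx = \lim_{m \to \infty} \sum_{k=1}^{m} e^{2 + (k-1)8/m} \cdot \frac{8}{m} = \lim_{m \to \infty} \frac{8}{m} \sum_{i=0}^{m-1} e^2 e^{8i/m} = e^2 \lim_{m \to \infty} \frac{8}{m} \sum_{i=0}^{m-1} \left(e^{8/m}\right)^i$$
$$= e^2 \lim_{m \to \infty} \frac{8}{m} \cdot \frac{e^8 - 1}{e^{8/m} - 1} = e^2 (e^8 - 1) \lim_{m \to \infty} \frac{8/m}{e^{8/m} - 1} = (e^{10} - e^2) \cdot 1 = e^{10} - e^2.$$

The foregoing calculation used the fact that
$$\lim_{n \to \infty} \frac{8/n}{e^{8/n} - 1} = 1.$$
To confirm this, observe that
$$\lim_{n \to \infty} \frac{8/n}{e^{8/n} - 1} = \lim_{h \to 0} \frac{h}{e^h - 1} = \frac{1}{\lim_{h \to 0} \frac{e^h - 1}{h}} = \frac{1}{e^0} = 1,$$
since $(e^x)' = e^x$.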

Properties of Riemann integrals We earlier identified various properties for the integrals of step functions. By approximation, the same properties hold for the integrals of any function:

THEOREM (Linearity) If the functions $f, g : [a, b] \to \mathbb{R}$ have integrals and if $c_1, c_2$ are constants, then
$$\int_a^b \big(c_1 f(x) + c_2 g(x)\big)\,dx = c_1 \int_a^b f(x)\,dx + c_2 \int_a^b g(x)\,dx.$$
But in general,
$$\int_a^b f(x) g(x)\,dx \ne \int_a^b f(x)\,dx \cdot \int_a^b g(x)\,dx.$$

THEOREM (Invariance under translation)
$$\int_a^b f(x)\,dx = \int_{a+c}^{b+c} f(x - c)\,dx \quad \text{for every real number } c$$

THEOREM (Comparison) If $f(x) \le g(x)$ for every $x \in [a, b]$, then
$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$

THEOREM (Expansion or contraction of the interval)
$$\int_{ka}^{kb} f\!\left(\frac{x}{k}\right) dx = k \int_a^b f(x)\,dx \quad \text{for every } k > 0$$

THEOREM (Additivity of integrals over different intervals) If $a < b < c$, then
$$\int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx.$$

[Figure: the area under a curve over $[a, c]$, split at $b$.]

Negative area

When $f < 0$, we regard the area above the graph of $f$ and below the $x$-axis as negative.

Example 2.3
$$\int_0^2 -x^2\,dx = -\frac{8}{3}$$

[Figure: the region between the $x$-axis and $y = -x^2$ from 0 to 2, with signed area $A = -\frac{8}{3}$.]

DEFINITION (Exchanging limits of integration) If $a < b$, then we define
$$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx.$$

C. Improper Integrals

If the function $f$ is integrable on $[a, b]$ for each real number $b > a$, then we define
$$\int_a^{+\infty} f(x)\,dx = \lim_{b \to \infty} \int_a^b f(x)\,dx,$$
provided the limit exists. Likewise, if for the real number $b$ the function $f$ is integrable on $[a, b]$ for each real number $a < b$, we then define
$$\int_{-\infty}^{b} f(x)\,dx = \lim_{a \to -\infty} \int_a^b f(x)\,dx,$$
if the limit exists.

Finally, we define
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_{-\infty}^{c} f(x)\,dx + \int_{c}^{+\infty} f(x)\,dx,$$
where $c \in \mathbb{R}$ is arbitrary, assuming each of the integrals on the right hand side is defined.

In other words, we are assuming that the right hand side above is not of the form "$(\infty) + (-\infty)$" or "$(-\infty) + (\infty)$".

Integrals of the type defined on this and the previous slide are called improper integrals.

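Example 2.4 We will learn later that
$$\int_a^b \frac{1}{x^2}\,dx = \frac{1}{a} - \frac{1}{b}$$
for $b > a > 0$. Therefore
$$\int_1^{\infty} \frac{1}{x^2}\,dx = \lim_{b \to \infty} \int_1^b \frac{1}{x^2}\,dx = \lim_{b \to \infty} \left(1 - \frac{1}{b}\right) = 1.$$

Example 2.5 Using the rule
$$\int_a^b x^j\,dx = \frac{b^{j+1} - a^{j+1}}{j+1}$$
for $j = 1, 2, \dots$, we see that
$$\int_0^{\infty} x^3\,dx = \lim_{b \to \infty} \int_0^b x^3\,dx = \lim_{b \to \infty} \frac{b^4}{4} = \infty,$$
$$\int_{-\infty}^{0} x^3\,dx = \lim_{a \to -\infty} \int_a^0 x^3\,dx = \lim_{a \to -\infty} \left(-\frac{a^4}{4}\right) = -\infty.$$
Therefore
$$\int_{-\infty}^{\infty} x^3\,dx = \int_{-\infty}^{0} x^3\,dx + \int_0^{\infty} x^3\,dx$$
is undefined.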

Tolstoy on integration (from War and Peace)

“The movement of humanity, arising as it does from innumerable arbitrary human wills, is continuous. To understand the laws of this continuous movement is the aim of history. But to arrive at these laws, resulting from the sum of all those human wills, man’s mind postulates arbitrary and disconnected units. . . . Only by taking infinitesimally small units for observation (the differential of history, that is, the individual tendencies of men) and attaining to the art of integrating them (that is, finding the sum of these infinitesimals) can we hope to arrive at the laws of history.”

Section 3 Approximation methods

A. Approximating integrals numerically



In order to numerically approximate the value of the integral $\int_a^b f(x)\,dx$, we can compute $\sum_{k=1}^{n} f(x_k^*)\Delta x_k$ with a large value of $n$. To simplify, we use equal sized subintervals, each of width
$$\Delta x = \frac{b - a}{n}.$$
Let
$$x_k^* = a + (k - 1)\frac{b - a}{n}$$
denote the left endpoint of each subinterval.

DEFINITION The left endpoint rule approximates the integral $\int_a^b f(x)\,dx$ by the sum
$$L_n = \frac{b - a}{n} \sum_{k=1}^{n} f\!\left(a + (k - 1)\frac{b - a}{n}\right).$$

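Example 3.1 Fix $n = 5$. The step size is $\Delta x = \frac{3 - 1}{5} = 0.4$. The left endpoints are $x_0 = 1$, $x_1 = 1.4$, $x_2 = 1.8$, $x_3 = 2.2$, $x_4 = 2.6$.

[Figure: left-endpoint rectangles for $e^{-x^2}$ on $[1, 3]$, with breakpoints 1.0, 1.4, 1.8, 2.2, 2.6, 3.0.]
$$\int_1^3 e^{-x^2}\,dx \approx L_5 = \frac{2}{5}\big(f(1) + f(1.4) + f(1.8) + f(2.2) + f(2.6)\big)$$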

There’s nothing particularly special about the left endpoints, so we could just as easily use the right endpoints
$$x_k^* = a + k\,\frac{b - a}{n}.$$

DEFINITION The right endpoint rule approximates the integral $\int_a^b f(x)\,dx$ by the sum
$$R_n = \frac{b - a}{n} \sum_{k=1}^{n} f\!\left(a + k\,\frac{b - a}{n}\right).$$

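Example 3.2 For our example $\int_1^3 e^{-x^2}\,dx$, this now gives

[Figure: right-endpoint rectangles for $e^{-x^2}$ on $[1, 3]$, with breakpoints 1.0, 1.4, 1.8, 2.2, 2.6, 3.0.]
$$\int_1^3 e^{-x^2}\,dx \approx R_5 = \frac{2}{5}\big(f(1.4) + f(1.8) + f(2.2) + f(2.6) + f(3)\big)$$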

Left versus right

If we compare the formulas for $L_n$ and $R_n$, we see they only differ in two terms out of the entire sum:
$$L_n = \frac{b - a}{n}\big(f(x_0) + f(x_1) + f(x_2) + \cdots + f(x_{n-1})\big)$$
$$R_n = \frac{b - a}{n}\big(f(x_1) + f(x_2) + \cdots + f(x_{n-1}) + f(x_n)\big)$$
So the only real difference is whether we include $f(a) = f(x_0)$ or $f(b) = f(x_n)$ in the sum.

Which is better?

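Example 3.3 For the particular example of $\int_1^3 e^{-x^2}\,dx$, we can see graphically that the left endpoint rule gives an overestimate and the right endpoint rule gives an underestimate:

[Figure: two panels on $[1, 3]$ comparing the left-endpoint rectangles (above the curve) with the right-endpoint rectangles (below the curve).]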
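The two rules are easy to try out on the running example $\int_1^3 e^{-x^2}\,dx$. The following is a minimal sketch: the function names are hypothetical, and the reference value relies on the standard identity $\int e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}\,\mathrm{erf}(x) + C$, which is an outside fact not derived in these notes.

```python
import math

def left_endpoint(f, a, b, n):
    """L_n: sample f at the left endpoint of each of n equal subintervals."""
    dx = (b - a) / n
    return dx * sum(f(a + (k - 1) * dx) for k in range(1, n + 1))

def right_endpoint(f, a, b, n):
    """R_n: sample f at the right endpoint of each of n equal subintervals."""
    dx = (b - a) / n
    return dx * sum(f(a + k * dx) for k in range(1, n + 1))

def f(x):
    return math.exp(-x * x)

# Reference value from the error function (assumption stated above)
reference = math.sqrt(math.pi) / 2 * (math.erf(3) - math.erf(1))

L5 = left_endpoint(f, 1, 3, 5)
R5 = right_endpoint(f, 1, 3, 5)
print(f"R5 = {R5:.5f}  reference = {reference:.5f}  L5 = {L5:.5f}")
```

Because $e^{-x^2}$ is decreasing on $[1, 3]$, the output shows $R_5$ below and $L_5$ above the reference value, matching the pictures.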

More accurate methods

We will see now that some surprisingly simple modifications of the formulas above give us much better approximations. One idea is to compromise between the left- and right-endpoints, by choosing instead the midpoint of each subinterval,
$$x_k^* = a + \left(k - \tfrac{1}{2}\right)\frac{b - a}{n}.$$

DEFINITION The midpoint rule approximates the integral $\int_a^b f(x)\,dx$ by
$$M_n = \frac{b - a}{n} \sum_{k=1}^{n} f\!\left(a + \left(k - \tfrac{1}{2}\right) \cdot \frac{b - a}{n}\right).$$

[Figure: midpoint rectangles for the running example on $[1, 3]$, with midpoints 1.2, 1.6, 2.0, 2.4, 2.8.]

Trapezoid rule

Another way to improve the accuracy is not to approximate by a rectangle in each subinterval, but rather to approximate by a trapezoid, obtained by drawing a diagonal line from $(x_{k-1}, f(x_{k-1}))$ to $(x_k, f(x_k))$:

[Figure: trapezoids for the running example on $[1, 3]$, with breakpoints 1.0, 1.4, 1.8, 2.2, 2.6, 3.0.]

In this case, we get on each subinterval $I_k = [x_{k-1}, x_k]$ a small trapezoid, the area of which is
$$\frac{h_1 + h_2}{2}\,\Delta x = \frac{f(x_{k-1}) + f(x_k)}{2}\,\Delta x.$$

[Figure: a single trapezoid with parallel sides $h_1$, $h_2$ and width $\Delta x$.]

DEFINITION The trapezoid rule approximates the integral $\int_a^b f(x)\,dx$ by
$$T_n = \frac{b - a}{2n} \sum_{k=1}^{n} \left[ f\!\left(a + (k - 1) \cdot \frac{b - a}{n}\right) + f\!\left(a + k \cdot \frac{b - a}{n}\right) \right].$$

We can also write
$$T_n = \Delta x\,\frac{f(x_0) + f(x_1)}{2} + \Delta x\,\frac{f(x_1) + f(x_2)}{2} + \cdots + \Delta x\,\frac{f(x_{n-1}) + f(x_n)}{2}$$
$$= \frac{\Delta x}{2}\big(f(x_0) + f(x_1) + f(x_1) + f(x_2) + \cdots + f(x_{n-1}) + f(x_n)\big)$$
$$= \frac{\Delta x}{2}\big(f(x_0) + 2f(x_1) + 2f(x_2) + \cdots + 2f(x_{n-1}) + f(x_n)\big).$$

Notice also
$$T_n = \frac{1}{2}(L_n + R_n).$$
The trapezoid rule is thus the average of the left and right endpoint rules. We will see that this averaging process makes the errors for $T_n$ much smaller than for either $L_n$ or $R_n$!

Example 3.4

[Figure: trapezoids for the running example on $[1, 3]$, with breakpoints 1.0, 1.4, 1.8, 2.2, 2.6, 3.0.]
$$T_5 = \frac{1}{2} \cdot \frac{2}{5} \cdot \big(f(1) + 2f(1.4) + 2f(1.8) + 2f(2.2) + 2f(2.6) + f(3)\big)$$
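To see how the midpoint and trapezoid rules behave on the running example $\int_1^3 e^{-x^2}\,dx$, here is a short sketch. The function names are hypothetical, and the reference value again assumes the standard identity $\int e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}\,\mathrm{erf}(x) + C$.

```python
import math

def midpoint(f, a, b, n):
    """M_n: sample f at the midpoint of each of n equal subintervals."""
    dx = (b - a) / n
    return dx * sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1))

def trapezoid(f, a, b, n):
    """T_n: endpoints get weight 1/2, interior points weight 1."""
    dx = (b - a) / n
    interior = sum(f(a + k * dx) for k in range(1, n))
    return dx * (f(a) / 2 + interior + f(b) / 2)

def f(x):
    return math.exp(-x * x)

# Reference value from the error function (assumption stated above)
reference = math.sqrt(math.pi) / 2 * (math.erf(3) - math.erf(1))

for n in (5, 10, 20):
    m_err = midpoint(f, 1, 3, n) - reference
    t_err = trapezoid(f, 1, 3, n) - reference
    print(f"n = {n:2d}  M_n error = {m_err:+.2e}  T_n error = {t_err:+.2e}")
```

On this interval the graph bends upward, so the midpoint rule comes out slightly low and the trapezoid rule slightly high.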



Error bounds

Relatively straightforward calculus methods, omitted in these notes, let us estimate the accuracy of our approximations:

THEOREM (Error estimates for midpoint and trapezoid rules) Assume the function $f$ is twice differentiable on the interval $[a, b]$, with
$$|f''(x)| \le C \qquad (a \le x \le b)$$
for some constant $C$. Then
$$\left| \int_a^b f\,dx - M_n \right| \le \frac{C (b - a)^3}{24 n^2}$$
and
$$\left| \int_a^b f\,dx - T_n \right| \le \frac{C (b - a)^3}{12 n^2}.$$

Interpretation

We say that the midpoint and trapezoid rules are of order $\frac{1}{n^2}$. Since the step size is $\Delta x = \frac{b - a}{n}$, we can equivalently say that these methods are of order $(\Delta x)^2$. This means, loosely speaking, that if we double the number of points from $n$ to $2n$, the error bound drops to $\frac{1}{4}$ of its previous size.

It turns out that the left- and right-endpoint rules are only of order $\frac{1}{n}$ (equivalently, of order $\Delta x$). Since $\frac{1}{n^2}$ is much, much smaller than $\frac{1}{n}$ for large $n$, the midpoint and trapezoid rules are much more accurate.

More sophisticated approximations are of even higher order:

Simpson’s rule

DEFINITION If $n$ is an even integer, Simpson’s rule approximates the integral $\int_a^b f(x)\,dx$ by
$$S_n = \frac{\Delta x}{3}\big(f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 4f(x_{n-1}) + f(x_n)\big).$$

It turns out that
$$S_n = \frac{4T_n - T_{n/2}}{3},$$
and it can be shown that
$$\left| \int_a^b f\,dx - S_n \right| \le \frac{K (b - a)^5}{180 n^4}$$
provided $|f^{(4)}(x)| \le K$ for all $a \le x \le b$. So Simpson’s rule is of order $\frac{1}{n^4}$.
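The relation between Simpson’s rule and the trapezoid rule can be checked numerically. The sketch below uses hypothetical function names and the integrand $e^{-x^2}$ on $[1, 3]$ from the earlier examples; it is an illustration, not a production integrator.

```python
import math

def trapezoid(f, a, b, n):
    """T_n on n equal subintervals."""
    dx = (b - a) / n
    interior = sum(f(a + k * dx) for k in range(1, n))
    return dx * (f(a) / 2 + interior + f(b) / 2)

def simpson(f, a, b, n):
    """S_n with weights 1, 4, 2, 4, ..., 4, 1; n must be even."""
    if n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    dx = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        # odd-indexed points get weight 4, even-indexed interior points weight 2
        total += (4 if k % 2 == 1 else 2) * f(a + k * dx)
    return dx / 3 * total

def f(x):
    return math.exp(-x * x)

s10 = simpson(f, 1, 3, 10)
combo = (4 * trapezoid(f, 1, 3, 10) - trapezoid(f, 1, 3, 5)) / 3
print(s10, combo)  # the two values agree up to rounding
```

Simpson’s rule is also exact for cubic polynomials, which gives a quick sanity check: with $n = 2$ it returns exactly 4 for $\int_0^2 x^3\,dx$.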

Section 4 Applications of integration

A. Defining new functions Many important functions used in the theoretical and applied sciences are defined via integrals.

Example 4.1 (Logarithms as integrals) We earlier reminded you about the natural logarithm $\ln$, a key formula for which is
$$\ln(xy) = \ln x + \ln y \qquad (x, y > 0).$$
But how do we know that a function with this useful property even exists? A systematic approach is to define the natural logarithm by the formula
$$\ln x = \int_1^x \frac{1}{t}\,dt \qquad (x > 0);$$
and then to prove that the natural log, so defined, really does satisfy $\ln(xy) = \ln x + \ln y$. When we have later developed the relevant calculus skills, we will do this.

The foregoing also provides an interesting geometric interpretation of the number $e$. It is that value of the upper limit of integration for which
$$\int_1^e \frac{1}{t}\,dt = 1.$$
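The integral definition can be explored numerically before the proof is available. This sketch approximates $\int_1^x \frac{1}{t}\,dt$ with the midpoint rule; the name `ln_by_integral` and the default of 10,000 subintervals are choices made only for this illustration.

```python
import math

def ln_by_integral(x, n=10_000):
    """Approximate ln x as the integral of 1/t from 1 to x (midpoint rule)."""
    if x <= 0:
        raise ValueError("ln x is only defined for x > 0")
    dt = (x - 1) / n  # negative when x < 1, which reverses the limits
    return dt * sum(1 / (1 + (k - 0.5) * dt) for k in range(1, n + 1))

# The product rule ln(xy) = ln x + ln y, checked for x = 2, y = 3
print(ln_by_integral(2) + ln_by_integral(3), ln_by_integral(6))

# The area under 1/t from 1 to e should be 1
print(ln_by_integral(math.e))
```

Agreement to several decimal places is numerical evidence only; it does not replace the proof promised above.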

Example 4.2 (The Gamma function) The Gamma function is
$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt.$$
This improper integral exists for all positive real numbers $x$. Later, after we have developed more integration techniques, we will derive some interesting formulas for the Gamma function. In particular, $\Gamma(n) = (n - 1)!$ for all positive integers $n$.

B. Length of curves

If $f : [a, b] \to \mathbb{R}$ is a function given by some explicit formula, then all the geometric properties of the curve determined by the graph of $f$ must somehow be contained within the formula. How can we extract this information?

One important use of calculus is providing ways for us to compute various geometric properties, for instance the length of curves:

Example 4.3 (Length of curves) The length $L$ of the curve determined by the graph of $f$ is given by
$$L = \int_a^b \sqrt{1 + (f')^2}\,dx.$$

C. Approximating functions by polynomials

We discussed earlier the problem of approximating a given function $f$ by a simpler polynomial of the form
$$g(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0.$$

One solution is to use the Taylor polynomial
$$g(x) = T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k,$$
introduced earlier. However, we observed earlier that this approximation requires that we have available very detailed information about the function $f$ at the specific point $x = a$. We need to know $f(a), f'(a), f''(a), \dots, f^{(n)}(a)$, and these would be essentially impossible to find if, say, $f$ were determined by experimental data. We need another, more robust way to approximate by polynomials.

One very useful idea is to use integrals to measure the error of our approximations. For this, let us assume that $f : [a, b] \to \mathbb{R}$ is given, and define then the integral error function
$$E(a_0, a_1, \dots, a_{n-1}, a_n) = \int_a^b \big(f(x) - g(x)\big)^2\,dx = \int_a^b \Big(f(x) - \big(a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0\big)\Big)^2\,dx.$$

The idea now is to select the coefficients $a_0, a_1, \dots, a_{n-1}, a_n$ to minimize this error. This however requires that we know how to minimize the function $E(a_0, a_1, \dots, a_{n-1}, a_n)$ depending on $n + 1$ variables, and this is beyond the scope of Math 10. But in practice computers can quickly compute the answers to high precision.

D. Integrating densities

Example 4.4 (Chemical concentration) Suppose that some chemical (say, an insecticide) is spread unevenly along a thin strip of land. We may for simplicity assume the region to be one-dimensional, lying along the $x$-axis. Let
$$\rho(x) = \text{concentration of the chemical at } x.$$
($\rho$ = rho.) What is the total amount of insecticide spread in the region $a \le x \le b$? The total amount of the chemical between $a$ and $b$ is
$$\int_a^b \rho(x)\,dx.$$

Example 4.5 (Mass density) Suppose that a straight piece of wire is made of a mixture of two metals, the proportion of which changes along the wire. Assume for simplicity the wire is one dimensional and that
$$\rho(x) = \text{mass density of the wire at } x.$$
What is the total mass of the wire for $a \le x \le b$? The total mass is
$$\int_a^b \rho(x)\,dx.$$

These two examples illustrate the point that the total amount of any quantity between the points $a$ and $b$ is the integral of its density over the interval $[a, b]$.

E. Integral test for series convergence

THEOREM Suppose that $f : (0, \infty) \to [0, \infty)$ is a nonnegative, decreasing function. Set
$$a_k = f(k) \qquad (k = 1, 2, \dots).$$
Then $\sum_{k=1}^{\infty} a_k$ converges if and only if
$$\int_1^{\infty} f(x)\,dx < \infty.$$

To see why this is true, look at the pictures on the next slide, which show geometrically that
$$\sum_{k=1}^{\infty} a_k \ge \int_1^{\infty} f(x)\,dx \ge \sum_{k=2}^{\infty} a_k.$$

Example 4.6 Show that
$$\sum_{k=1}^{\infty} \frac{1}{k^p}$$
converges if $p > 1$.

SOLUTION: We will learn later that if $b > a > 0$, then
$$\int_a^b \frac{1}{x^p}\,dx = \frac{a^{1-p} - b^{1-p}}{p - 1}.$$
Therefore
$$\int_1^{\infty} \frac{1}{x^p}\,dx = \lim_{b \to \infty} \int_1^b \frac{1}{x^p}\,dx = \lim_{b \to \infty} \frac{1 - b^{1-p}}{p - 1} = \frac{1}{p - 1}$$
is finite. Note that $\lim_{b \to \infty} b^{1-p} = 0$, since $p > 1$.
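The comparison behind the integral test can be observed on partial sums. The sketch below takes $p = 2$ as an illustrative choice and uses the closed form for $\int_1^b x^{-p}\,dx$ quoted in Example 4.6; the function names are hypothetical.

```python
def partial_sum(p, n):
    """Sum of 1/k^p for k = 1, ..., n."""
    return sum(1 / k ** p for k in range(1, n + 1))

def integral_1_to_b(p, b):
    """Closed form of the integral of x^(-p) from 1 to b, valid for p != 1."""
    return (1 - b ** (1 - p)) / (p - 1)

p = 2
for n in (10, 100, 1000):
    lower = integral_1_to_b(p, n + 1)   # rectangles of height a_k sit above the curve
    upper = 1 + integral_1_to_b(p, n)   # first term, then rectangles below the curve
    print(n, lower, partial_sum(p, n), upper)
```

Every partial sum stays below $1 + \frac{1}{p - 1} = 2$, so the increasing sequence of partial sums is bounded and therefore converges.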

F. Integration and probabilities

In this section we will learn how integration can help us compute the probabilities of certain random events. We provide first some introductory motivation for the idea that areas (and therefore integration) are somehow related to probabilities.

Example 4.7 (Simulating coin tosses)
- Flip a fair coin 200 times.
- Record the number of heads out of the 200 flips.
- Repeat the process $N$ times.

[Figure: density histograms titled "200 Coin Tosses" of the number of heads (horizontal axis from 80 to 120) for $N = 100$, $N = 1{,}000$, and $N = 10{,}000$ repetitions. In a final panel, a smooth blue curve is drawn over the histogram for $N = 10{,}000$.]

This function in blue looks like a smooth version of our step function! What is this function?

DEFINITION A Gaussian function is a function having the formula
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$
Gaussian functions comprise a family of bell-shaped curves, each determined by the parameters $\mu \in \mathbb{R}$ and $\sigma > 0$.

As we see in the picture on the next slide, $\mu$ gives the center of the bell-shaped curve. The parameter $\sigma$ determines the thickness and height of the curve. We call $\mu$ the mean and $\sigma$ the standard deviation, and will later explain the probabilistic meaning of these terms. ($\mu$ = mu, $\sigma$ = sigma)

[Figure: the graph of $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, a bell-shaped curve on a horizontal axis from 70 to 130, with $\mu$ and $\sigma$ marked.]

Examples

[Figure: "Gaussian Functions", built up over four panels on a horizontal axis from −15 to 20, showing the curves for $\mu = 0, \sigma = 0.5$; $\mu = 0, \sigma = 1$; $\mu = 0, \sigma = 7$; and $\mu = 8, \sigma = 3$.]

We will see later that a Gaussian function $f$ corresponds to a normal (or Gaussian) probability distribution. In particular,
- Total area under the curve is always 1.
- The graph of $f$ is symmetric around $\mu$: $f(\mu + x) = f(\mu - x)$.

[Figure: "Normal Distribution", a bell curve on a horizontal axis from −4 to 4, shaded with Area = 1.]

DEFINITION The standard normal distribution has mean $\mu = 0$, standard deviation $\sigma = 1$, and is therefore
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$
The area to the right of 0 equals $\frac{1}{2}$, and the area to the left of 0 equals $\frac{1}{2}$.

[Figure: two panels titled "Standard Normal Distribution", one with the left half shaded and one with the right half shaded, each with Area = 0.5.]

For the standard normal distribution,
- the area between −1 and 1 equals 0.68,
- the area between −2 and 2 equals 0.95,
- the area between −3 and 3 equals 0.997.

[Figure: four panels titled "Standard Normal Distribution" on a horizontal axis from −4 to 4, with shaded regions labeled "Area = 0.68", "Area = 0.95", "Shaded area = 1 − 0.68", and "Shaded area = .5*0.32".]

We can use the standard normal to calculate areas under the curve for any Gaussian distribution.

Example 4.8 Suppose we have a normal distribution with $\mu = 50$ and $\sigma = 5$. What is the area under the curve to the left of 40?

SOLUTION: We first convert 40 to standard units, by subtracting the mean and dividing by the standard deviation:
$$\frac{40 - \mu}{\sigma} = \frac{40 - 50}{5} = -2.$$
We now need to find the area to the left of $-2$ for the standard normal distribution. For this, we can use an online applet (footnote 1) from the UC Berkeley Statistics Department to evaluate numerically areas under the curve of the standard normal (with $\mu = 0$, $\sigma = 1$).

1 http://statistics.berkeley.edu/

stark/Java/Html/NormHiLite.htm

Using the applet, we learn that the area under the curve of the standard normal between −2 and 0 is approximately .477. Since the total area under the curve to the left of 0 is .5, it follows that the area to the left of $-2$ is approximately
$$.5 - .477 = .023.$$

Example 4.9 (Women’s heights) Assume that US women’s heights are normally distributed with mean 63 inches and standard deviation 3 inches. About what percentage of US women are taller than 66 inches?

SOLUTION: Geometrically, we want to calculate the area to the right of 66. For our data, $\mu = 63$ and $\sigma = 3$. We as before convert 66 to standard units:
$$\frac{66 - \mu}{\sigma} = \frac{66 - 63}{3} = 1.$$
Using the online applet we learn that the area under the standard normal curve between 0 and 1 is approximately .341. Hence the area to the right of 1 is about
$$.5 - .341 = .159.$$

So about 16% of women are taller than 66 inches.
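The areas read off the applet in Examples 4.8 and 4.9 can also be computed directly. The sketch below relies on the standard identity $\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}\!\left(z/\sqrt{2}\right)\right)$ for the area to the left of $z$ under the standard normal curve; this identity and the function names are assumptions brought in for illustration, not part of the notes.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_cdf(x, mu, sigma):
    """Area to the left of x for a normal curve: convert to standard units first."""
    return standard_normal_cdf((x - mu) / sigma)

left_of_40 = normal_cdf(40, 50, 5)           # Example 4.8
taller_than_66 = 1 - normal_cdf(66, 63, 3)   # Example 4.9

print(round(left_of_40, 3))       # 0.023
print(round(taller_than_66, 3))   # 0.159
```

Both values agree with the applet-based answers above to three decimal places.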

Introduction to computing probabilities

We have introduced the idea that areas under curves can be interpreted as probabilities, and now provide more mathematical details, which will be further elaborated later. In particular we will learn in Part III of this course about the concepts of a probability space $(\Omega, P)$ and a random variable $X : \Omega \to \mathbb{R}$.

Interpretation More precise definitions will appear later, but for now think of the probability space as some sort of mathematical model for random occurrences, for which “P” means the probability. And think of X as giving the random outcomes of experiments or measurements.

DEFINITION The cumulative distribution function (cdf) of a random variable $X$ is the function
$$F(x) = P(X \le x),$$
defined for $-\infty < x < \infty$. In other words, $F(x)$ is the probability that $X \le x$.

$F$ maps real numbers to a probability value in $[0, 1]$:
$$F : \mathbb{R} \to [0, 1].$$

The cumulative distribution function is increasing and satisfies
$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to \infty} F(x) = 1.$$

DEFINITION The probability density function (pdf) of a random variable $X$ is a nonnegative function $f$ that has the following properties:
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$
- The probability that $X$ falls in the interval $(a, b)$ is the area under the density function between $a$ and $b$:
$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

So when a random variable $X$ has a pdf $f$, we can calculate probabilities by integrating $f$.

In particular, $P(X = c) = \int_c^c f(x)\,dx = 0$. And since $P(X = c) = 0$, we don’t need to worry about endpoints:
$$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b).$$

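Example 4.10 As noted earlier, the normal distribution has as its probability density function the Gaussian function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$

[Figure: a bell curve with heights from 0.0 to 0.4 and the horizontal axis marked at $\mu - 4\sigma, \mu - 3\sigma, \dots, \mu, \dots, \mu + 3\sigma, \mu + 4\sigma$.]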

Example 4.11 The uniform distribution gives probabilities for a continuous random variable that takes values in the interval $(a, b)$ and each value is equally likely. The probability density function is
$$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{if } a < x < b \\ 0 & \text{otherwise.} \end{cases}$$

[Figure: "Uniform Distribution for (−3, 3)", a flat density between −3 and 3 on a horizontal axis from −4 to 4.]

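Using the pdf to find the cdf

If we let $a = -\infty$ and $b = x$, we can use the pdf to find the cdf:
$$F(x) = P(X \le x) = P(-\infty < X \le x) = \int_{-\infty}^{x} f(y)\,dy.$$

[Figure: "Normal Distribution", the graph of $f(x)$ on a horizontal axis from −4 to 4, with the area to the left of 1 shaded and labeled $F(1)$.]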

Using the cdf to find the pdf Now if we focus on area under the curve, we can use the cdf to find the pdf. Namely, f can be recovered from F in the following sense:
\[ F(b) - F(a) = P(X \le b) - P(X \le a) = \int_{-\infty}^{b} f(x)\,dx - \int_{-\infty}^{a} f(x)\,dx = \int_a^b f(x)\,dx = P(a \le X \le b). \]

[Figure: Normal Distribution; the density f(x) on the interval from −4 to 4, with the area F(1) − F(−1) marked.]
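The identity F(b) − F(a) = ∫ f can be checked numerically for the standard normal distribution. The following is only a sketch: the helper names (`normal_pdf`, `normal_cdf`, `midpoint_integral`) are illustrative and not part of the course material, and the closed form of the cdf via the error function `math.erf` is a standard fact that is assumed here rather than derived.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """cdf of the normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def midpoint_integral(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Area under the pdf between -1 and 1, computed two ways.
area = midpoint_integral(normal_pdf, -1.0, 1.0)
difference = normal_cdf(1.0) - normal_cdf(-1.0)
print(area, difference)
```

Both numbers should agree to many decimal places, which is the statement P(−1 ≤ X ≤ 1) = F(1) − F(−1) for this particular distribution.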

Mathematical relationship between pdf and cdf Our discussion thus far shows that for continuous random variables, we have
\[ F(x) = \int_{-\infty}^{x} f(y)\,dy \]
and
\[ \int_a^b f(y)\,dy = F(b) - F(a). \]

There is a very important relationship between the functions F and f that can explain both of these properties: f is the derivative of F, that is, $f = F'$.

The properties above follow from the Fundamental Theorem of Calculus, which we discuss next.

Section 5 Antiderivatives, Fundamental Theorem of Calculus

A. Antiderivatives
• When you learn how to add, you then learn how to undo the addition via subtraction.
• When you learn how to multiply, you then learn how to undo multiplication via division.

So far this semester we have learned how to take the derivative of a function. Now we ask the reverse: can we “undo” a derivative? Yes, using antidifferentiation.

Example 5.1
Input: $f(x) = x^2$. Output: $F(x) = \frac{1}{3}x^3$.

Input: $f(x) = \frac{1}{x} + \sin x$. Output: $F(x) = \ln x - \cos x + 5$.

DEFINITION Given the function f, a function F is called an antiderivative of f on the interval (a, b) if $F'(x) = f(x)$ for all x in (a, b).

Example 5.2 If f is a function which describes how some quantity is changing over time t, an antiderivative F determines the amount of the quantity at any time, up to an additive constant. The location of a car is an antiderivative of its velocity. The velocity of a car is an antiderivative of its acceleration.

THEOREM If F is an antiderivative of f and G is an antiderivative of g , then F + G is an antiderivative of f + g .

Proof. This follows directly from the corresponding property for derivatives, since $(F + G)' = F' + G' = f + g$.

THEOREM If F is an antiderivative of f and c is a constant, then $c \cdot F$ is an antiderivative of $c \cdot f$.

Proof. By the constant multiple rule for differentiation,
\[ (c \cdot F)' = c \cdot F' = c \cdot f. \]

If F and G are antiderivatives of f and g, respectively, it is in general NOT true that $F \cdot G$ is an antiderivative of $f \cdot g$.

Example 5.3 $F(x) = \frac{x^2}{2}$ is an antiderivative of $f(x) = x$, and $G(x) = \frac{x^3}{3}$ is an antiderivative of $g(x) = x^2$, but
\[ F(x)\,G(x) = \frac{x^2}{2} \cdot \frac{x^3}{3} = \frac{x^5}{6} \]
is NOT an antiderivative of $f(x)\,g(x) = x \cdot x^2 = x^3$. Indeed, the derivative of $\frac{x^5}{6}$ is $\frac{5x^4}{6}$, not $x^3$.

Similarly, $\frac{F}{G}$ is generally NOT an antiderivative of $\frac{f}{g}$.

THEOREM (Antiderivatives differ by a constant) Suppose that f is a function whose domain contains the interval (a, b), and assume that F is an antiderivative of f on (a, b). Then another function G is also an antiderivative of f on (a, b) if and only if G = F + C, for some constant C.

Proof. If $F' = f$, then $(F + C)' = F' + (C)' = F' = f$. Consequently, if F is an antiderivative of f, then so is F + C. Conversely, if F is an antiderivative of f on (a, b), then any other antiderivative G must satisfy
\[ (G - F)' = G' - F' = f - f = 0. \]
This means that G − F = C is constant.

Example 5.4 (Difference of antiderivatives) Consider the two functions
\[ F(x) = \frac{x-1}{x+1} \qquad \text{and} \qquad G(x) = \frac{-2}{x+1}. \]

Differentiate:
\[ \left(\frac{x-1}{x+1}\right)' = \frac{(x+1)(x-1)' - (x-1)(x+1)'}{(x+1)^2} = \frac{(x+1) - (x-1)}{(x+1)^2} = \frac{2}{(x+1)^2}, \]
\[ \left(\frac{-2}{x+1}\right)' = \left((-2)(x+1)^{-1}\right)' = \frac{2}{(x+1)^2}. \]

Thus F and G are antiderivatives of the same function. According to the theorem above, they must differ by a constant. Check:
\[ F(x) - G(x) = \frac{x-1}{x+1} - \frac{-2}{x+1} = \frac{x - 1 + 2}{x+1} = 1. \]
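The conclusion of Example 5.4 can also be tested numerically. The sketch below is not part of the original notes: the helper `derivative` is an illustrative central-difference approximation, and the sample points are chosen arbitrarily on the interval x > −1.

```python
def F(x):
    return (x - 1) / (x + 1)

def G(x):
    return -2 / (x + 1)

def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

sample_points = [0.0, 0.5, 2.0, 10.0]

# F - G should equal the constant 1 at every sample point.
differences = [F(x) - G(x) for x in sample_points]

# Both numerical derivatives should be close to 2 / (x + 1)^2.
slopes = [(derivative(F, x), derivative(G, x), 2 / (x + 1) ** 2)
          for x in sample_points]

print(differences)
print(slopes)
```

A numerical check like this does not replace the proof, but it is a quick way to catch algebra mistakes.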

Example 5.5 (From pdf to cdf) Consider the Gaussian function
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \]

This, as we have seen, is the probability density function of the normal distribution. What is the probabilistic meaning of an antiderivative F of f? One antiderivative of the probability density function (pdf) is the cumulative distribution function (cdf):
\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy. \]

Given a function f, an antiderivative F, if it exists, must be unique up to an additive constant. But do antiderivatives actually exist?

THEOREM (Existence of antiderivatives) Suppose a < b and f is continuous on (a, b). Then there exists a function F with domain (a, b) such that $F'(x) = f(x)$ for $x \in (a, b)$.

Antiderivatives always exist for the functions we will encounter in this course, even though it can be difficult (or impossible!) to find simple formulas for them. Here is a particularly tantalizing instance of this:

Example 5.6 The antiderivative F of $f(x) = e^{-x^2}$ has no simple formula.

Notation (Indefinite integral notation) We use the notation
\[ \int f(x)\,dx = F(x) + C \]
to indicate that f is a function whose antiderivatives are all of the form F(x) + C for some function F(x) and an arbitrary constant C. The antiderivative symbol
\[ \int f(x)\,dx \]
is also called the indefinite integral of f.

Remark. Right now, there is no reason to assume that this symbol has any connection to the notation introduced earlier for area under the curve:
\[ \int_a^b f(x)\,dx. \]

However, we shall see later why this makes sense.

Example 5.7 Find all the antiderivatives of $f(x) = \frac{1}{x}$ on the domain $\mathbb{R} \setminus \{0\} = (-\infty, 0) \cup (0, \infty)$. In other words, determine the indefinite integral $\int \frac{1}{x}\,dx$.

SOLUTION: The domain consists of two intervals, which we will analyze separately.

The interval (0, ∞). We need to think of a function F such that
\[ F'(x) = \frac{1}{x}. \]
Remembering our earlier discussion, we recall for the natural logarithm that
\[ (\ln x)' = \frac{1}{x}. \]
Thus we know that the antiderivatives of $f(x) = \frac{1}{x}$ on the interval (0, ∞) are the functions of the form $\ln x + C_1$ on that interval.

The interval (−∞, 0). We want to use the same idea as before, but we can't use the function ln x because we can't take logs of negative numbers. If x is negative, then −x is positive, so consider:
\[ \bigl(\ln(-x)\bigr)' = \frac{1}{-x} \cdot (-x)' = \frac{1}{-x} \cdot (-1) = \frac{1}{x}. \]
So, the antiderivatives of $\frac{1}{x}$ on (−∞, 0) are the functions $\ln(-x) + C_2$.

Conclusion. A function F is an antiderivative of 1/x on the domain $(-\infty, 0) \cup (0, \infty)$ if and only if there are constants $C_1$ and $C_2$ such that
\[ F(x) = \begin{cases} \ln x + C_1 & \text{if } x \text{ is in } (0, \infty), \\ \ln(-x) + C_2 & \text{if } x \text{ is in } (-\infty, 0). \end{cases} \]

In practice, many people (mathematicians included!) only think about the case where $C_1 = C_2$ and they write
\[ \int \frac{1}{x}\,dx = \ln|x| + C. \]

We can convert our rules for differentiation into rules for antidifferentiation.

Example 5.8 (Antiderivatives of powers) Suppose that p is a real number, $p \ne -1$. Then the antiderivatives of
\[ f(x) = x^p \]
on the interval (0, ∞) are exactly the functions of the form
\[ F(x) = \frac{x^{p+1}}{p+1} + C. \]

Check the derivative:
\[ \left(\frac{x^{p+1}}{p+1}\right)' = \frac{1}{p+1}\left(x^{p+1}\right)' = \frac{1}{p+1}\,(p+1)\,x^{(p+1)-1} = x^p. \]

Example 5.9 There exists a unique function F on the interval (−∞, ∞) such that F(1) = 7 and F is an antiderivative of $x^2$. Find F.

SOLUTION: We know that
\[ F(x) = \int x^2\,dx = \frac{x^3}{3} + C. \]
To find C, we plug in x = 1:
\[ F(1) = \frac{1}{3} + C = 7. \]
This implies
\[ C = 7 - \frac{1}{3} = \frac{20}{3}. \]
Hence the solution is the function
\[ F(x) = \frac{x^3}{3} + \frac{20}{3}. \]

Example 5.10 Find all antiderivatives of f(x) = ln x.

SOLUTION: After playing around with this for a while, we make the guess
\[ F(x) = x \ln x - x. \]
Thereafter, we simply check its derivative:
\[ (x \ln x - x)' = (x \ln x)' - (x)' = x\,(\ln x)' + \ln x \cdot (x)' - 1 \quad \text{(product rule)} \]
\[ = x \cdot \frac{1}{x} + \ln x - 1 = \ln x. \]
So on the domain (0, ∞), we have
\[ \int \ln x\,dx = x \ln x - x + C. \]

Important antiderivatives You should learn the following antiderivatives:
\[ \int x^p\,dx = \frac{x^{p+1}}{p+1} + C \quad \text{if } p \ne -1 \]
\[ \int \frac{1}{x}\,dx = \ln|x| + C \]
\[ \int e^x\,dx = e^x + C \]
\[ \int \sin x\,dx = -\cos x + C \]
\[ \int \cos x\,dx = \sin x + C \]

Example 5.11
\[ \int (5x^3 - 3x)\,dx = \frac{5x^4}{4} - \frac{3x^2}{2} + C \]
\[ \int \frac{1}{x^3}\,dx = \int x^{-3}\,dx = \frac{x^{-2}}{-2} + C = -\frac{1}{2x^2} + C \]
\[ \int \sqrt{x}\,dx = \int x^{1/2}\,dx = \frac{x^{3/2}}{3/2} + C. \]
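Any entry in a table of antiderivatives can be sanity-checked by differentiating the claimed answer. The sketch below does this numerically for the formulas above; the dictionary `pairs`, the helper `derivative`, and the choice of test points are illustrative assumptions, not part of the course notes.

```python
import math

def derivative(F, x, h=1e-5):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Each entry maps a label to (integrand f, claimed antiderivative F).
pairs = {
    "x^3":       (lambda x: x ** 3,         lambda x: x ** 4 / 4),
    "1/x":       (lambda x: 1 / x,          lambda x: math.log(abs(x))),
    "e^x":       (math.exp,                 math.exp),
    "sin x":     (math.sin,                 lambda x: -math.cos(x)),
    "cos x":     (math.cos,                 math.sin),
    "5x^3 - 3x": (lambda x: 5 * x ** 3 - 3 * x,
                  lambda x: 5 * x ** 4 / 4 - 3 * x ** 2 / 2),
    "x^(-3)":    (lambda x: x ** -3,        lambda x: -1 / (2 * x ** 2)),
    "sqrt(x)":   (math.sqrt,                lambda x: x ** 1.5 / 1.5),
}

points = [0.5, 1.0, 2.0]  # positive points, so every formula is defined

# Largest gap between F'(x) and f(x) over all pairs and points.
max_error = max(abs(derivative(F, x) - f(x))
                for f, F in pairs.values()
                for x in points)
print(max_error)
```

A small `max_error` is consistent with every F being an antiderivative of its f; the constant C plays no role because it disappears on differentiation.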

Fun on the internet Go to Google and search for

Integral Calculator or Antiderivative Calculator

This will give you several options such as integrals.wolfram.com

If you type in

sqrt(e^x) + sin(x)/cos(x),

then you will learn that
\[ \int \left( \sqrt{e^x} + \frac{\sin x}{\cos x} \right) dx = 2\sqrt{e^x} - \ln(\cos x) + C. \]

Next try

sqrt(e^x) * sin(x)/cos(x),

and also

e^(-x^2)

B. Fundamental Theorem of Calculus We have defined the area under f between a and b to be
\[ \int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{k=1}^{n} f(x_k^*)\,\Delta x_k. \]

Even for very simple functions, calculating these definite integrals using the Riemann sum definition can be very difficult. We now introduce the Fundamental Theorem of Calculus, which ties together integration and differentiation. This will allow us to compute the area under the curve by the formula
\[ \int_a^b f(x)\,dx = F(b) - F(a). \]
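To see the two sides of this formula side by side, one can compare Riemann sums with F(b) − F(a) for a function whose antiderivative is known. The sketch below uses f(x) = x² with F(x) = x³/3; the function name `riemann_sum`, the right-endpoint choice of sample points, and the interval [1, 2] are assumptions made only for illustration.

```python
def riemann_sum(f, a, b, n):
    """Right-endpoint Riemann sum of f over [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + k * dx) for k in range(1, n + 1)) * dx

def f(x):
    return x ** 2

def F(x):
    return x ** 3 / 3  # an antiderivative of f

a, b = 1.0, 2.0
exact = F(b) - F(a)

# Riemann sums for increasingly fine partitions, and their errors.
approximations = {n: riemann_sum(f, a, b, n) for n in (10, 100, 1000, 10000)}
errors = {n: abs(s - exact) for n, s in approximations.items()}

for n in approximations:
    print(n, approximations[n], errors[n])
```

The error shrinks roughly in proportion to 1/n, while the antiderivative gives the exact value in a single step.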

THEOREM (Fundamental Theorem of Calculus)
(i) Suppose that f is a continuous function on [a, b]. If F is any antiderivative of f on (a, b), then
\[ \int_a^b f(x)\,dx = F(b) - F(a). \]
Since $F' = f$, we can rewrite this to read
\[ \int_a^b F'(x)\,dx = F(b) - F(a). \]
(ii) If f is continuous on [a, b], then for a < x < b,
\[ \frac{d}{dx} \int_a^x f(t)\,dt = f(x). \]

Area as a function We can view the area under the curve y = f(x) between 0 and b as a function of the unknown b:

[Figure: the shaded area F(b) under the curve, from 0 to b.]

Let F(b) equal the shaded area under y = f(x) between 0 and b as a function of b, as shown. The formula for that function is
\[ F(b) = \int_0^b f(x)\,dx. \]

Derivative of the area function We compute the derivative $F'$ of the area function. By definition,
\[ F'(b) = \lim_{h \to 0} \frac{F(b+h) - F(b)}{h}. \]

For h > 0, F(b + h) − F(b) is the area under f(x) between b and b + h:
\[ F(b+h) - F(b) = \int_0^{b+h} f(x)\,dx - \int_0^{b} f(x)\,dx = \int_0^{b} f(x)\,dx + \int_b^{b+h} f(x)\,dx - \int_0^{b} f(x)\,dx = \int_b^{b+h} f(x)\,dx. \]

Now, divide this by h and make h smaller and smaller. What do you get?

Example 5.12 Let's consider a concrete example with $f(x) = x^2$.

We can estimate $\int_b^{b+h} f(x)\,dx$ using a single rectangle. The left endpoint rule gives an underestimate and the right endpoint rule gives an overestimate (for b ≥ 0, where f is increasing):
\[ L_1 \le F(b+h) - F(b) \le R_1. \]

[Figure: the region under the curve between b and b + h.]

For $f(x) = x^2$, we get
\[ h \cdot b^2 \le F(b+h) - F(b) \le h \cdot (b+h)^2. \]
Consequently,
\[ b^2 \le \frac{F(b+h) - F(b)}{h} \le (b+h)^2. \]
Now, evaluate the limit for h > 0:
\[ \lim_{h \to 0} \frac{F(b+h) - F(b)}{h} = b^2. \]
A similar calculation for h < 0 yields the same limit.

Conclusion. For every b, we have $F'(b) = b^2 = f(b)$.

Example 5.13 (Area under a parabola) We know that
\[ \int x^2\,dx = \frac{x^3}{3} + C. \]
So there is a constant C such that
\[ F(x) = \frac{x^3}{3} + C. \]
We must have C = 0, because $F(0) = \int_0^0 x^2\,dx = 0$.

Remarkable Conclusions: The area under the curve $y = x^2$ between 0 and b is equal to $F(b) = \frac{b^3}{3}$. For 0 ≤ a < b, the area under the curve $y = x^2$ between a and b equals
\[ F(b) - F(a) = \frac{b^3}{3} - \frac{a^3}{3}. \]

We find the area by simply evaluating an antiderivative at the endpoints.

Example 5.14 (Using cdf to find pdf) Recall that for probability distributions the integral of the pdf is the cdf. That is, the cdf F(x) is an antiderivative of the pdf f(x), and the pdf is the derivative of the cdf.
\[ F(x) = P(X \le x) = P(-\infty < X \le x) = \int_{-\infty}^{x} f(y)\,dy \]

[Figure: two panels on the interval from −4 to 4. Left: pdf of Normal Distribution, showing f(x) with the area F(1) marked. Right: cdf of Normal Distribution, with the values F(0) and F(1) marked.]

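Example 5.15 Compute $\int_1^2 (x^5 - x^3)\,dx$.

SOLUTION: Since $\int x^p\,dx = \frac{x^{p+1}}{p+1} + C$, the function $F(x) = \frac{x^6}{6} - \frac{x^4}{4}$ is an antiderivative of $f(x) = x^5 - x^3$.
\[ \int_1^2 (x^5 - x^3)\,dx = \left[ \frac{x^6}{6} - \frac{x^4}{4} \right]_1^2 = \left( \frac{2^6}{6} - \frac{2^4}{4} \right) - \left( \frac{1^6}{6} - \frac{1^4}{4} \right) = \left( \frac{64}{6} - \frac{16}{4} \right) - \left( \frac{1}{6} - \frac{1}{4} \right) = \frac{27}{4}. \]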

Notation We will often write
\[ F(x) \Big|_a^b = F(b) - F(a). \]

Example 5.16 Compute $\int_0^{\pi} \sin x\,dx$.

SOLUTION: We know that −cos x is an antiderivative of sin x. So we have
\[ \int_0^{\pi} \sin x\,dx = (-\cos x) \Big|_0^{\pi} = (-\cos(\pi)) - (-\cos(0)) = -(-1) - (-1) = 2. \]

[Figure: graph of sin x between 0 and π; vertical axis from −1 to 1.]

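Example 5.17 Compute $\int_{-1}^{1} x^3\,dx$.

SOLUTION: We know that $\frac{x^4}{4}$ is an antiderivative of $x^3$. So we have
\[ \int_{-1}^{1} x^3\,dx = \left[ \frac{x^4}{4} \right]_{-1}^{1} = \frac{1^4}{4} - \frac{(-1)^4}{4} = 0. \]

[Figure: graph of x³ between −1 and 1.]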

Example 5.18 Find $\frac{d}{dx} \int_3^x \sin(t^2)\,dt$.

SOLUTION: By the Fundamental Theorem of Calculus, this is just $\sin(x^2)$.

Example 5.19 Find $\frac{d}{dx} \int_x^3 \sin(t^2)\,dt$.

SOLUTION: We can't apply the Fundamental Theorem directly, but we can do the following.
\[ \frac{d}{dx} \int_x^3 \sin(t^2)\,dt = \frac{d}{dx} \left( -\int_3^x \sin(t^2)\,dt \right) = -\sin(x^2). \]

Example 5.20 Find $\frac{d}{dx} \int_x^{x^2} f(t)\,dt$.

SOLUTION: Since x appears in both the upper and lower bounds of integration, we split up the integral:
\[ \frac{d}{dx} \int_x^{x^2} f(t)\,dt = \frac{d}{dx} \left( \int_x^0 f(t)\,dt + \int_0^{x^2} f(t)\,dt \right) \]
\[ = -\frac{d}{dx} \int_0^x f(t)\,dt + \frac{d}{dx} \int_0^{x^2} f(t)\,dt \]
\[ = -f(x) + f(x^2) \cdot \frac{d}{dx}\,x^2 \quad \text{(Chain Rule)} \]
\[ = -f(x) + f(x^2) \cdot (2x) = 2x\,f(x^2) - f(x). \]
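The formula from Example 5.20 can be tested on a concrete integrand. The sketch below takes f(t) = sin(t²) (the integrand of Examples 5.18 and 5.19), approximates the integral with Simpson's rule, and differentiates numerically. Simpson's rule, the helper names, and the sample points are assumptions introduced for this check only.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals; also works if a > b."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def f(t):
    return math.sin(t * t)

def G(x):
    """G(x) = integral of f(t) dt from t = x to t = x^2."""
    return simpson(f, x, x * x)

def G_prime_numeric(x, h=1e-5):
    """Central-difference approximation of G'(x)."""
    return (G(x + h) - G(x - h)) / (2 * h)

def G_prime_formula(x):
    """The result of Example 5.20: 2x f(x^2) - f(x)."""
    return 2 * x * f(x * x) - f(x)

checks = [(x, G_prime_numeric(x), G_prime_formula(x)) for x in (0.5, 1.2, 1.5)]
for x, numeric, formula in checks:
    print(x, numeric, formula)
```

The two columns should agree closely, even though sin(t²) has no elementary antiderivative.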

Section 6 Integration techniques

The limits of antidifferentiation
• We've now seen that in order to compute $\int_a^b f(x)\,dx$, we need only find an antiderivative of f.
• Recall that every continuous function f has an antiderivative,
\[ F(x) = \int_0^x f(t)\,dt. \]
• Finding antiderivatives explicitly can be extremely challenging, however.
• Next we'll see how to invert the chain rule and the product rule we learned for computing derivatives.
• However, many functions just do not have simple antiderivatives. In particular, there are no elementary formulas for the following:
\[ \int e^{x^2}\,dx, \quad \int \frac{e^x}{x}\,dx, \quad \int \sin(x^2)\,dx, \quad \int \frac{\sin x}{x}\,dx, \quad \int \cos(x^2)\,dx, \quad \int \frac{\cos x}{x}\,dx. \]

A. Substitution, changing variables The Chain Rule states
\[ \bigl(F(g(x))\bigr)' = f(g(x))\,g'(x), \]
whenever $F' = f$, and therefore
\[ \int_a^b f(g(x))\,g'(x)\,dx = \int_a^b \bigl(F(g(x))\bigr)'\,dx = F(g(b)) - F(g(a)) = \int_{g(a)}^{g(b)} f(u)\,du. \]

This gives the substitution formula
\[ \int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du. \]

We can think of the substitution formula as giving us a way to change variables from x to u = g(x), in which case we have the very useful mnemonic:
\[ du = g'(x)\,dx, \]
although strictly speaking the symbols “du” and “dx” are not defined by themselves. We can then write the substitution formula as
\[ \int f(g(x))\,g'(x)\,dx = \int f(u)\,du. \]

Our main purpose in finding antiderivatives is to evaluate definite integrals. When using u-substitution, we can follow two routes:
• Find an antiderivative as usual, and evaluate at the endpoints.
• An alternative (and usually easier) method is to replace the bounds of integration when we change variables.

Example 6.1 Find $\int x\,e^{x^2}\,dx$.

SOLUTION: If we set $u = x^2$, then du = 2x dx. We obtain
\[ \int x\,e^{x^2}\,dx = \frac{1}{2} \int e^{x^2} \cdot 2x\,dx = \frac{1}{2} \int e^u\,du = \frac{1}{2} e^u + C = \frac{1}{2} e^{x^2} + C. \]

We must always check our work:
\[ \frac{d}{dx} \left( \frac{1}{2} e^{x^2} + C \right) = \frac{1}{2} e^{x^2} \cdot \frac{d}{dx}\,x^2 + 0 = x\,e^{x^2}. \]

Example 6.2 Find $\int \frac{\cos(\ln x)}{x}\,dx$.

SOLUTION: This one looks pretty awful, but if we make the substitution u = ln x, then $du = \frac{1}{x}\,dx$ and we have
\[ \int \frac{\cos(\ln x)}{x}\,dx = \int \cos u\,du = \sin u + C = \sin(\ln x) + C. \]

Again, we should check our work by computing the derivative of F(x) = sin(ln x).

Many integrals can be solved in multiple ways. By a previous theorem, we know all antiderivatives will differ from each other by a constant.

Example 6.3 For example, we can find $\int \frac{x}{\sqrt{x^2+1}}\,dx$ in two different ways:

Method 1. Set $u = x^2 + 1$, so du = 2x dx and $x\,dx = \frac{1}{2}\,du$:
\[ \int \frac{x}{\sqrt{x^2+1}}\,dx = \frac{1}{2} \int \frac{du}{\sqrt{u}} = \frac{1}{2} \int u^{-1/2}\,du = \frac{1}{2} \cdot \frac{u^{1/2}}{1/2} + C = \sqrt{x^2+1} + C. \]

Method 2. Set $u = \sqrt{x^2+1}$. Then $du = \frac{2x}{2\sqrt{x^2+1}}\,dx$, and we get
\[ \int \frac{x}{\sqrt{x^2+1}}\,dx = \int du = u + C = \sqrt{x^2+1} + C. \]

Both methods give the same answer.

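Example 6.4 Find $\int x^5 \sqrt{1 + x^2}\,dx$.

SOLUTION: Let's try the substitution $u = 1 + x^2$. Then du = 2x dx, so $x\,dx = \frac{du}{2}$:
\[ \int x^5 \sqrt{1+x^2}\,dx = \int (x^2)^2 \sqrt{1+x^2}\;x\,dx = \frac{1}{2} \int (u-1)^2 \sqrt{u}\,du = \frac{1}{2} \int \left( u^{5/2} - 2u^{3/2} + u^{1/2} \right) du \]
\[ = \frac{1}{2} \left( \frac{u^{7/2}}{7/2} - 2\,\frac{u^{5/2}}{5/2} + \frac{u^{3/2}}{3/2} \right) + C = \frac{(1+x^2)^{7/2}}{7} - \frac{2(1+x^2)^{5/2}}{5} + \frac{(1+x^2)^{3/2}}{3} + C. \]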

Example 6.5 Find $\int_1^e \frac{\ln x}{x}\,dx$.

SOLUTION: We'll use u = ln x, so $du = \frac{dx}{x}$. Note that u(1) = 0 and u(e) = 1. Thus we have
\[ \int_1^e \frac{\ln x}{x}\,dx = \int_0^1 u\,du = \frac{u^2}{2} \Big|_0^1 = \frac{1}{2} - 0 = \frac{1}{2}. \]

Example 6.6 (Normalizing constant for a pdf) We wish to define a continuous probability distribution on the interval Ω = (1, e), by means of a probability density function of the form
\[ f(x) = \frac{1}{Z} \cdot \frac{\ln x}{x}. \]
How should the constant Z be chosen?

SOLUTION: We want
\[ \int_1^e f(x)\,dx = 1. \]
Equivalently,
\[ \frac{1}{Z} \int_1^e \frac{\ln x}{x}\,dx = 1. \]
Therefore the previous example implies
\[ Z = \int_1^e \frac{\ln x}{x}\,dx = \frac{1}{2}. \]
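When the integral behind a normalizing constant has no convenient closed form, the same recipe can be carried out numerically. The sketch below does so for the density of Example 6.6, where the exact answer Z = 1/2 is available for comparison. The use of Simpson's rule and all function names are assumptions for illustration.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def unnormalized(x):
    """The function ln(x)/x on the interval (1, e), before dividing by Z."""
    return math.log(x) / x

# Normalizing constant: the integral of the unnormalized function over (1, e).
Z = simpson(unnormalized, 1.0, math.e)

def pdf(x):
    return unnormalized(x) / Z

# After normalization the density should integrate to 1.
total_probability = simpson(pdf, 1.0, math.e)
print(Z, total_probability)
```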

Example 6.7 Find $\int_3^5 \frac{dx}{(2 - 3x)^2}$.

SOLUTION: We use u = 2 − 3x, so du = −3 dx and thus dx = −du/3. Also u(3) = −7, u(5) = −13, and so we have
\[ \int_3^5 \frac{1}{(2-3x)^2}\,dx = -\frac{1}{3} \int_{-7}^{-13} \frac{1}{u^2}\,du = \frac{1}{3} \cdot \frac{1}{u} \Big|_{-7}^{-13} = \frac{1}{3} \left( \frac{1}{-13} - \frac{1}{-7} \right) = \frac{2}{91}. \]

Example 6.8 (More on logarithms) Recall that we have defined
\[ \ln x = \int_1^x \frac{1}{t}\,dt \qquad (x > 0). \]

Let us now compute for x, y > 0:
\[ \ln(xy) = \int_1^{xy} \frac{1}{t}\,dt = \int_1^{x} \frac{1}{t}\,dt + \int_x^{xy} \frac{1}{t}\,dt = \ln x + \int_1^{y} \frac{1}{u}\,du = \ln x + \ln y, \]
where we substituted $u = \frac{t}{x}$, $du = \frac{dt}{x}$.

Consequently, if we define the natural logarithm by the integral formula
\[ \ln x = \int_1^x \frac{1}{t}\,dt, \]
we can then deduce the standard formula ln(xy) = ln x + ln y. It is an interesting exercise to use the definition to show also that
\[ \ln(x^y) = y \ln x \qquad (x > 0,\; y \in \mathbb{R}). \]
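The integral definition of the logarithm can be explored numerically without ever calling a built-in logarithm. In the sketch below, `ln_by_integral` is a hypothetical helper that approximates the defining integral with the midpoint rule; the values x = 2 and y = 3.5 are arbitrary.

```python
import math

def ln_by_integral(x, n=20_000):
    """Midpoint-rule approximation of the integral of 1/t from t = 1 to t = x (x > 0)."""
    h = (x - 1.0) / n
    return h * sum(1.0 / (1.0 + (k + 0.5) * h) for k in range(n))

x, y = 2.0, 3.5

# The product rule ln(xy) = ln x + ln y, using only the integral definition.
lhs = ln_by_integral(x * y)
rhs = ln_by_integral(x) + ln_by_integral(y)

# Comparison with the library logarithm, as an independent reference.
reference = math.log(x * y)
print(lhs, rhs, reference)
```

For 0 < x < 1 the step h is negative, so the helper returns a negative value, in agreement with ln x < 0 there.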

B. Symmetry: even and odd functions We now turn our attention to the very special case of definite integrals of the form $\int_{-a}^{a} f(x)\,dx$ for functions f that have special symmetries:

DEFINITION
• The function f is called even if f(−x) = f(x).
• The function f is called odd if f(−x) = −f(x).

The terms even and odd come from the power functions: $x^2$, $x^3$, $x^4$, etc.

[Figure: two panels. Even: f(−x) = f(x). Odd: f(−x) = −f(x).]

It looks like we should have $\int_{-a}^{a} f(x)\,dx = 2 \int_0^a f(x)\,dx$ for even functions and $\int_{-a}^{a} f(x)\,dx = 0$ for odd functions. This is true.

THEOREM (Using symmetry)
If f is an odd function, then $\int_{-a}^{a} f(x)\,dx = 0$.
If f is an even function, then $\int_{-a}^{a} f(x)\,dx = 2 \int_0^a f(x)\,dx$.

Proof. Let f be odd. Then
\[ \int_{-a}^{a} f(x)\,dx = \int_{-a}^{0} f(x)\,dx + \int_0^a f(x)\,dx \]
\[ = \int_{a}^{0} f(-u)(-1)\,du + \int_0^a f(x)\,dx \qquad (u = -x,\; du = -dx) \]
\[ = \int_{a}^{0} f(u)\,du + \int_0^a f(x)\,dx \]
\[ = -\int_0^a f(x)\,dx + \int_0^a f(x)\,dx = 0. \]

The proof that $\int_{-a}^{a} f(x)\,dx = 2 \int_0^a f(x)\,dx$ for even functions is similar.

Example 6.9 Calculate $\int_{-2}^{2} \frac{\sin x}{4 + 3x^2 + 2x^4}\,dx$.

SOLUTION: Attempting to find an antiderivative would be a nightmare. Luckily, the integrand is odd:
\[ f(-x) = -f(x). \]
So, without calculating anything at all, we can conclude
\[ \int_{-2}^{2} \frac{\sin x}{4 + 3x^2 + 2x^4}\,dx = 0. \]
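The symmetry theorem can be observed numerically on the integrand of Example 6.9. In the sketch below, the even companion function (cos x over the same denominator) is an extra example chosen for illustration and does not appear in the notes; the midpoint rule and the helper names are likewise assumptions.

```python
import math

def midpoint_integral(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def odd_integrand(x):
    """sin(x) / (4 + 3x^2 + 2x^4): odd numerator over an even denominator."""
    return math.sin(x) / (4 + 3 * x ** 2 + 2 * x ** 4)

def even_integrand(x):
    """cos(x) / (4 + 3x^2 + 2x^4): an even function (illustrative choice)."""
    return math.cos(x) / (4 + 3 * x ** 2 + 2 * x ** 4)

# Odd function over a symmetric interval: contributions cancel in pairs.
odd_total = midpoint_integral(odd_integrand, -2.0, 2.0)

# Even function: the full integral is twice the integral over [0, 2].
even_total = midpoint_integral(even_integrand, -2.0, 2.0)
even_half = midpoint_integral(even_integrand, 0.0, 2.0, n=5_000)

print(odd_total, even_total, 2 * even_half)
```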

C. Integration by parts Recall from the Product Rule that $(fg)' = f'g + fg'$. Now integrate and use the Fundamental Theorem of Calculus, to learn that
\[ \int_a^b f'(x)\,g(x)\,dx + \int_a^b f(x)\,g'(x)\,dx = \int_a^b (fg)'(x)\,dx = f(b)g(b) - f(a)g(a) = (fg) \Big|_a^b. \]

Rearranging gives the formula for integration by parts:
\[ \int_a^b f'(x)\,g(x)\,dx = (fg) \Big|_a^b - \int_a^b f(x)\,g'(x)\,dx. \]

Let us now write u = g(x) and v = f(x). Recalling the useful (but mathematically imprecise) expressions
\[ du = g'\,dx, \qquad dv = f'\,dx, \]
we can rewrite the integration by parts formula as
\[ \int u\,dv = uv - \int v\,du. \]

Whichever form of it we use, the point is that the integration by parts formula gives us a way to move a derivative from one function onto another within an integral. This quite often converts a difficult integral into a simpler one, as we will see in subsequent examples.

Example 6.10 Find $\int x \sin x\,dx$.

SOLUTION: If we use u = x and dv = sin x dx, then du = dx and v = −cos x. So we have:
\[ \int x \sin x\,dx = -x \cos x - \int (-\cos x)\,dx = -x \cos x + \int \cos x\,dx = -x \cos x + \sin x + C. \]

We should check our work:
\[ \frac{d}{dx} \left( -x \cos x + \sin x + C \right) = -\cos x - x(-\sin x) + \cos x = x \sin x. \]

Example 6.11 Find $\int \ln x\,dx$.

SOLUTION: This one isn't obviously a candidate problem for integration by parts. But let us try u = ln x and dv = dx. Then we get $du = \frac{1}{x}\,dx$ and v = x. Consequently,
\[ \int \ln x\,dx = x \ln x - \int x \cdot \frac{1}{x}\,dx = x \ln x - \int 1\,dx = x \ln x - x + C. \]

Check this answer!

Example 6.12 Find $\int x \ln x\,dx$.

SOLUTION: We choose u = ln x and dv = x dx. Then $du = \frac{1}{x}\,dx$, $v = \frac{x^2}{2}$, and therefore
\[ \int x \ln x\,dx = \frac{x^2}{2} \ln x - \int \frac{x^2}{2} \cdot \frac{1}{x}\,dx = \frac{x^2 \ln x}{2} - \int \frac{x}{2}\,dx = \frac{x^2 \ln x}{2} - \frac{x^2}{4} + C. \]

Again, confirm this answer.

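Example 6.13 Find $\int x^2 e^{3x}\,dx$.

SOLUTION: Let's try $u = x^2$, $dv = e^{3x}\,dx$. Then du = 2x dx and $v = \frac{1}{3} e^{3x}$. Thus
\[ \int x^2 e^{3x}\,dx = \frac{1}{3} x^2 e^{3x} - \frac{2}{3} \int x e^{3x}\,dx. \]

We can now integrate by parts again, this time using u = x, $dv = e^{3x}\,dx$ and thus du = dx, $v = \frac{1}{3} e^{3x}$:
\[ \int x^2 e^{3x}\,dx = \frac{1}{3} x^2 e^{3x} - \frac{2}{3} \int x e^{3x}\,dx \]
\[ = \frac{1}{3} x^2 e^{3x} - \frac{2}{3} \left( \frac{1}{3} x e^{3x} - \frac{1}{3} \int e^{3x}\,dx \right) \]
\[ = \frac{1}{3} x^2 e^{3x} - \frac{2}{3} \left( \frac{1}{3} x e^{3x} - \frac{1}{3} \cdot \frac{1}{3} e^{3x} \right) + C \]
\[ = \frac{x^2 e^{3x}}{3} - \frac{2x e^{3x}}{9} + \frac{2 e^{3x}}{27} + C. \]

Question: why not $\frac{2C}{9}$?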

Repeated integration by parts
• To find $\int x^2 e^{3x}\,dx$, we had to integrate by parts twice.
• With a little work, you could find $\int x^3 e^{3x}\,dx$ by integrating by parts 3 times.
• To find $\int x^n e^{3x}\,dx$, we would have to integrate by parts n times.

In statistics, these kinds of integrals are very useful for computing moments of probability distributions.
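Repeating integration by parts n times is mechanical, so it can be automated. One integration by parts with u = xⁿ and dv = e^{ax} dx gives the reduction Iₙ = xⁿ e^{ax}/a − (n/a) Iₙ₋₁, with I₀ = e^{ax}/a. The sketch below applies this reduction with exact fractions; the function name and the coefficient-list representation are illustrative choices, not something taken from the notes.

```python
from fractions import Fraction

def power_times_exp_antiderivative(n, a):
    """Return coefficients c[0], ..., c[n] such that

        integral of x^n e^(a x) dx = e^(a x) * (c[0] + c[1] x + ... + c[n] x^n) + C.

    Uses the reduction I_m = x^m e^(a x)/a - (m/a) I_(m-1), starting from
    I_0 = e^(a x)/a, i.e. one integration by parts per step.
    """
    a = Fraction(a)
    coeffs = [1 / a]  # polynomial factor of I_0
    for m in range(1, n + 1):
        # Scale the previous polynomial by -(m/a), then append the x^m/a term.
        coeffs = [-(m / a) * c for c in coeffs] + [1 / a]
    return coeffs

# Example 6.13: n = 2, a = 3 should give 2/27 - (2/9) x + (1/3) x^2.
coeffs_example = power_times_exp_antiderivative(2, 3)
print(coeffs_example)

# One more step of the same reduction handles x^3 e^(3x).
coeffs_cubic = power_times_exp_antiderivative(3, 3)
print(coeffs_cubic)
```

The first printed list reproduces the coefficients found by hand in Example 6.13.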

Example 6.14 (Another trick) Find $\int e^x \cos x\,dx$.

SOLUTION: Try u = cos x, $dv = e^x\,dx$. Then du = −sin x dx, $v = e^x$, and
\[ \int e^x \cos x\,dx = e^x \cos x + \int e^x \sin x\,dx. \]

We integrate by parts again with u = sin x, $dv = e^x\,dx$, and du = cos x dx and $v = e^x$. We now compute
\[ \int e^x \cos x\,dx = e^x \cos x + \int e^x \sin x\,dx = e^x \cos x + \left( e^x \sin x - \int e^x \cos x\,dx \right). \]
This implies
\[ 2 \int e^x \cos x\,dx = e^x \cos x + e^x \sin x, \]
and so
\[ \int e^x \cos x\,dx = \frac{e^x \cos x + e^x \sin x}{2} + C. \]

Example 6.15 (More on the Gamma Function) Recall that the Gamma function is defined by the integral
\[ \Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt. \]

THEOREM
(i) The Gamma function satisfies
\[ \Gamma(x+1) = x\,\Gamma(x) \qquad \text{for all } x \in (0, \infty). \]
(ii) In particular, the Gamma function is an extension of the factorial function:
\[ \Gamma(n+1) = n! \qquad \text{for nonnegative integers } n. \]

Proof. To prove this, we use integration by parts:
\[ \int_a^b u\,dv = (uv) \Big|_a^b - \int_a^b v\,du. \]
Take $u = t^x$ and $dv = e^{-t}\,dt$, so that $du = x t^{x-1}\,dt$ and $v = -e^{-t}$. Then
\[ \int_a^b t^x e^{-t}\,dt = -e^{-t} t^x \Big|_a^b - \int_a^b (-e^{-t})\,x\,t^{x-1}\,dt = e^{-a} a^x - e^{-b} b^x + x \int_a^b t^{x-1} e^{-t}\,dt. \]

Now take the limit a → 0 and b → ∞. The boundary terms $e^{-a} a^x$ and $e^{-b} b^x$ both tend to 0 (recall x > 0). Then the left hand side converges to Γ(x + 1), and the right hand side becomes x Γ(x). Since
\[ \Gamma(1) = \int_0^{\infty} e^{-t}\,dt = \left[ -e^{-t} \right]_0^{\infty} = 1, \]
the formula stated in (ii) follows from Γ(x + 1) = x Γ(x).

The Gamma function and statistics Certain other values of the Gamma function will turn out to be important in statistics. In particular,
\[ \Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi} = 1.77245\ldots; \]
although this calculation requires tools beyond Math 10. (Take Math 53!) It follows from the rule Γ(x + 1) = x Γ(x) that
\[ \Gamma\!\left(\tfrac{3}{2}\right) = \tfrac{1}{2}\sqrt{\pi}, \qquad \Gamma\!\left(\tfrac{5}{2}\right) = \tfrac{3}{4}\sqrt{\pi}, \qquad \Gamma\!\left(\tfrac{7}{2}\right) = \tfrac{15}{8}\sqrt{\pi}, \ldots \]
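These values are easy to confirm with a library implementation of the Gamma function. The sketch below uses `math.gamma` from the Python standard library; the particular sample points are arbitrary, and the comparisons use `math.isclose` because floating-point results are only approximate.

```python
import math

# Recurrence Gamma(x + 1) = x * Gamma(x) at a few positive sample points.
recurrence_ok = all(
    math.isclose(math.gamma(x + 1), x * math.gamma(x))
    for x in (0.5, 1.0, 2.3, 7.0)
)

# Gamma(n + 1) = n! for small nonnegative integers n.
factorial_ok = all(
    math.isclose(math.gamma(n + 1), math.factorial(n))
    for n in range(10)
)

# Half-integer values expressed through sqrt(pi).
root_pi = math.sqrt(math.pi)
half_integer_values = {
    0.5: root_pi,
    1.5: root_pi / 2,
    2.5: 3 * root_pi / 4,
    3.5: 15 * root_pi / 8,
}
half_integer_ok = all(
    math.isclose(math.gamma(x), expected)
    for x, expected in half_integer_values.items()
)

print(recurrence_ok, factorial_ok, half_integer_ok)
```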
