6/16
Webpage:
http://www.cs.purdue.edu/homes/cs314
Goals:
- you will learn how to use the computer to approximate the solution of nonlinear and linear equations
- can't always get the exact solution
- numerical integration and differentiation
- solution of differential equations
- the potential and limitations of the computer to solve mathematical problems
Textbook:
Numerical Analysis Using Matlab
Mailbox List:
login to CS machine (lore): mailer add me to cs314
Grade Distribution:
- 25% Midterm
- 25% Final
- 50% Projects and Homework
Syllabus:
- floating- and fixed-point representation; computational error
- solutions of non-linear equations of the form f(x) = 0
- solution of linear systems of form Ax = B
- interpolation
- curve fitting
- numerical differentiation
- numerical integration
- numerical optimization
- solution of differential equations
Class Time:
MTWTF 11:25A - 12:25P
Binary Numbers:
- computers use binary numbers
- this is because it was easier to implement circuits with 2 states "on" and "off"
- Example: 537.5_10 (the subscript 10 denotes the base)
- Decimal Base: (5x10^2) + (3x10^1) + (7x10^0) + (5x10^-1)
- Binary Base: (1x2^9) + (0x2^8) + (0x2^7) + (0x2^6) + (0x2^5) + (1x2^4) + (1x2^3) + (0x2^2) + (0x2^1) + (1x2^0) + (1x2^-1) = 1000011001.1_2
Converting Decimal To Binary:
- separate the integer part from the fraction: N.F
- N = b_0 + (b_1 x 2^1) + (b_2 x 2^2) + ... + (b_k x 2^k)
- we wish to find the bits b_i (each 0 or 1)
- therefore, we divide by 2
- N/2 = b_0/2 + (b_1 x 2^0) + ... + (b_k x 2^(k-1))
- the b_0/2 term gives the remainder
- (b_1 x 2^0) + ... + (b_k x 2^(k-1)) is the integer part
- the remainder determines whether b_0 is 0 or 1; repeat with the integer part until it is 0 (see the sketch after the examples below)
- Example: 34_10
- 34/2 = 17 remainder 0
- 17/2 = 8 remainder 1
- 8/2 = 4 remainder 0
- 4/2 = 2 remainder 0
- 2/2 = 1 remainder 0
- 1/2 = 0 remainder 1
- therefore, 34_10 = 100010_2
- 0.1_2 = 1x2^-1 = 1/2
- 0.01_2 = 1x2^-2 = 1/4
- 0.11_2 = 1x2^-1 + 1x2^-2 = 3/4
- Example: What about 1/3?
- 1/3 = (1/2 x 0) + (1/4 x 1) + (1/8 x 0) + (1/16 x 1) + ... = .01010101..._2
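A minimal MATLAB sketch of the conversion (the function name dec2bin_frac and the bit count nfrac are illustrative choices, not from the lecture; the fraction part uses the companion repeated-multiplication-by-2 trick, which mirrors the repeated division used for the integer part):

```matlab
function s = dec2bin_frac(x, nfrac)
% Convert a nonnegative decimal number x to a binary string with nfrac
% fraction bits: repeated division by 2 for the integer part, repeated
% multiplication by 2 for the fraction part.
    N = floor(x);                    % integer part
    F = x - N;                       % fractional part
    ibits = '';
    if N == 0
        ibits = '0';
    end
    while N > 0
        ibits = [num2str(mod(N, 2)) ibits];   % remainder gives b0, b1, ... (low bit first)
        N = floor(N / 2);                     % keep only the integer part of N/2
    end
    fbits = '';
    for k = 1:nfrac
        F = F * 2;                            % the bit that crosses the binary point
        fbits = [fbits num2str(floor(F))];
        F = F - floor(F);
    end
    if nfrac > 0
        s = [ibits '.' fbits];
    else
        s = ibits;
    end
end
```

For example, dec2bin_frac(34, 0) gives '100010', dec2bin_frac(537.5, 1) gives '1000011001.1', and dec2bin_frac(1/3, 8) gives '0.01010101', matching the examples above.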
6/17
send questions to cs314-ta@cs.purdue.edu
Representation of numbers in the machine
- fixed point
- number of bits for integer part and fraction is fixed
- Example: NNNN.FF
- 4 bits integer part
- 2 bits fraction
- Largest number: 1111.11_2 = 1x2^3 + ... + 1x2^-2 = 15.75
- Smallest number: 0
- Usually you give more bits for both integer part and fraction
- Example: 2 bytes integer part and 2 bytes fraction
- Advantages:
- Fast arithmetic (the same integer arithmetic can be used; see the sketch at the end of this section)
- 1.01_2 + 10.10_2 = 1.25_10 + 2.5_10 = 11.11_2 = 3.75_10
- Used in signal processors and signal-processing programs that require real-time operation
- Disadvantages:
- very restricted range
- poor precision: the smallest representable number is limited by the number of bits used for the fraction
- generates a lot of error during operations
- floating point
- the decimal point moves around to keep the desired precision
- using scientific notation (Q x 10^N), where Q is called the mantissa and the precision of the number depends on the number of bits used to represent Q
- the numbers are represented in the computer as x = +/- q x 2^n, where q and n are represented with a fixed number of bits
- to have a unique representation for each number, numbers are adjusted (normalized) so that .5 <= q < 1, except for 0
- the uniqueness is useful for some operations such as checking for equality
- Example: approximating the value 1 by 1.125
- absolute error = | 1 - 1.125 | = .125
- relative error = ( | 1 - 1.125 | / 1 ) x 100 = 12.5%
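Going back to the fixed-point advantage noted above (fast arithmetic using the integer hardware), here is a small MATLAB sketch, not from the lecture, that stores NNNN.FF values as integers scaled by 2^2 so that addition is plain integer addition:

```matlab
% Fixed-point NNNN.FF: 4 integer bits, 2 fraction bits.  Each value is
% stored as an integer count of quarters (scale factor 2^2 = 4).
scale = 2^2;
a = round(1.25 * scale);     % 1.01_2  -> stored as the integer 5
b = round(2.50 * scale);     % 10.10_2 -> stored as the integer 10
c = a + b;                   % ordinary integer addition: 15
fprintf('%g + %g = %g\n', a/scale, b/scale, c/scale);   % prints 1.25 + 2.5 = 3.75
```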
Modern architectures give 2 options to represent real numbers
- single precision (float)
- uses 4 bytes (32 bits)
- 24 bits mantissa (1 bit for sign)
- 8 bits exponent (1 bit for sign)
- range is 2^-128 to 2^127
- accuracy: the smallest difference between 2 numbers with exponent 0 is .10000...01_2 - .10000...00_2 = 2^-23 => about 6 decimal digits of precision
- double precision
- uses 8 bytes (64 bits)
- 53 bits mantissa (1 bit for sign)
- 11 bits exponent (1 bit for sign)
- range is 2^1024
- accuracy is 2^-52 => about 15 digits of accuracy
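A quick MATLAB check of the precision claims above; eps gives the spacing of representable numbers near 1 (note that MATLAB's actual formats follow IEEE 754, which differs in detail from the simplified sign-bit layout in these notes):

```matlab
% Spacing between 1 and the next representable number, single vs. double.
fprintf('single eps: %g\n', eps('single'));   % about 1.2e-07 -> roughly 6-7 decimal digits
fprintf('double eps: %g\n', eps('double'));   % about 2.2e-16 -> roughly 15-16 decimal digits
fprintf('largest single: %g, largest double: %g\n', realmax('single'), realmax);
```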
6/18
Absolute error
|P - P*| where P* is the approximation of P
Relative error
|P - P*|/|P| for P not equal to 0
Chop-off vs. Round-off
- pi = 3.141592654
- using 3 digits for fraction
- chop-off: 3.141
- round-off: 3.142
          | abs. error  | rel. error
chop-off  | .59 x 10^-3 | 1.9 x 10^-4 = .02%
round-off | .41 x 10^-3 | 1.3 x 10^-4 = .01%
Therefore, round-off is better than chop-off
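A small MATLAB check of the numbers in the table (a sketch; fix chops toward zero and round rounds to the nearest value):

```matlab
p    = 3.141592654;               % value of pi used in the notes
chop = fix(p * 1000) / 1000;      % chop-off to 3 fraction digits: 3.141
rnd  = round(p * 1000) / 1000;    % round-off to 3 fraction digits: 3.142
fprintf('chop:  abs %.2e  rel %.2e\n', abs(p - chop), abs(p - chop) / abs(p));
fprintf('round: abs %.2e  rel %.2e\n', abs(p - rnd),  abs(p - rnd)  / abs(p));
```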
Propagation errors
Assume p and q are approximated by p* and q* such that
- p* = p + e_p
- q* = q + e_q
- sum: p* + q* = p + q + (e_p + e_q)
- error = e_p + e_q
- error = sum of the errors
- product: p*q* = (p + e_p)(q + e_q) = pq + p e_q + e_p q + e_p e_q
- error = p e_q + e_p q + e_p e_q
- the error is amplified by the magnitudes of p and q
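A tiny MATLAB illustration of the propagation formulas above (the values of p, q and the errors are made up for illustration):

```matlab
p = 100;   q = 200;            % exact values (hypothetical)
ep = 0.03; eq = -0.02;         % errors in the approximations
ps = p + ep;   qs = q + eq;    % p* and q*
sum_err  = (ps + qs) - (p + q);   % equals ep + eq            = 0.01
prod_err = (ps * qs) - (p * q);   % equals p*eq + ep*q + ep*eq = -2 + 6 - 0.0006
fprintf('sum error = %g, product error = %g\n', sum_err, prod_err);
```

The sum error stays small while the product error is magnified by the sizes of p and q, which is the point made above.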
An algorithm is:
- stable - if small initial errors stay small
- unstable - if small initial errors get large after several iterations
Solution of non-linear equations of form f(x) = 0
- we will use "iteration" to solve these equations
- starting with a value P_0 and an iteration function, P_(k+1) = g(P_k), evaluate P_1, P_2, P_3, ...
- we use a stopping criterion to stop the computation, such as | P_k - P_(k+1) | < E
Definitions:
- fixed point - a fixed point of g(x) is a real number p such that p = g(p)
- geometrically - the fixed points of a function y = g(x) are the points of intersection between y = g(x) and the line y = x
- fixed point iteration - an equation of the form P_(n+1) = g(P_n) for n = 0, 1, 2, ...
- Example: x^2 - 2x + 1 = 0
- get a fixed point iteration equation
- x^2 + 1 = 2x => x = (x^2 + 1)/2 => x_(k+1) = (x_k^2 + 1)/2
- assume x_0 = 0
- x_1 = 1/2, x_2 = 5/8, ...; as k -> infinity, x_k -> 1
- fixed point theorem
- if | g'(x) | <= k < 1 for x element of [a,b] and g(x) element of [a,b], then the iteration P_n = g(P_(n-1)) will converge to a unique fixed point P element of [a,b]. In this case, P is said to be an attractive fixed point
- if | g'(x) | > 1, then the iteration P_n = g(P_(n-1)) will diverge
- 2 cases for convergence
- 0 < g'(P) < 1 => monotone convergence (we expect the slope of g(x) to be below 45 degrees)
- -1 < g'(P) < 0 => oscillating convergence (we expect the slope of g(x) to be below 45 degrees)
- what kind of convergence did we have in previous example?
- x_(k+1) = (x_k^2 + 1)/2
- g(x) = (x^2 + 1)/2
- g'(x) = x
- as long as | g'(x) | < 1 and g(x) maps [a,b] into [a,b], the iteration will converge
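A minimal MATLAB sketch of fixed-point iteration for this example (the tolerance, iteration cap and variable names are illustrative choices):

```matlab
g = @(x) (x.^2 + 1) / 2;       % iteration function from the example
p = 0;                         % starting value x0
tol = 1e-6;                    % stopping tolerance E
for k = 1:100
    pnew = g(p);
    if abs(pnew - p) < tol     % stopping criterion | Pk - Pk+1 | < E
        break
    end
    p = pnew;
end
fprintf('k = %d, p = %.6f\n', k, p);
```

Convergence is slow here because g'(1) = 1, the borderline of the fixed point theorem, so the iterates creep toward 1; starting from p = 2 instead makes them blow up, which is the divergent case picked up on 6/19.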
6/19
Fixed Point Theorem
- If | g'(x) | < 1 for all x element of [a,b] and g(x) element of [a,b], then the iteration converges
- If | g'(x) | > 1, then the iteration diverges
- Example: x^2 - 2x + 1 = 0
- x_(k+1) = ( x_k^2 + 1 )/2
- x_0 = 0
=> x_1 = .5 => ... => x_k -> 1
- g(x) = ( x^2 + 1 )/2
- g'(x) = x
- g'(0) = 0
=> therefore by fixed point theorem, this converges
- Example: now let x_0 = 2
- g'(x) = x => g'(2) = 2
- therefore by fixed point theorem, this diverges
Solution of nonlinear equations ( f(x) = 0 )
- Bisection Method
- similar to binary search
- starting with two values a, b (an interval) such that a < b and either f(a) < 0 and f(b) > 0, or f(a) > 0 and f(b) < 0; for a continuous f this means there is an x such that a <= x <= b and f(x) = 0
- get middle point of [a,b] => c = (a + b)/2
- if f(a) and f(c) have opposite signs then a zero lies in the interval [a,c] and substitute c for b
- if f(b) and f(c) have opposite signs then a zero lies in the interval [c,b] and substitute c for a
- if f(c) = 0, then c is a zero
- Example: sin(x) = 1/2 ( x is in degrees )
- f(x) = sin(x) - 1/2 = 0
- start with a = 0, b = 50
- f(0) = 0 - .5 = -.5 , f(50) = .266
- f(a) < 0 and f(b) > 0 so we are ok
a  | b    | c     | f(c)
0  | 50   | 25    | -.077
25 | 50   | 37.5  | .1087
25 | 37.5 | 31.25 | .018
- c is approaching 30 degrees, the true root (sin 30 degrees = 1/2)
- Analysis of algorithm
- at each step the interval [a,b] is reduced by half; therefore at step n, | a_n - b_n | = | a_0 - b_0 |/2^n
- if we want an approximation error < E, how many steps do we need?
- | a_n - b_n | = | a_0 - b_0 |/2^n < E
- | a_0 - b_0 |/E < 2^n
- log( | a_0 - b_0 |/E ) < log( 2^n )
- log( | a_0 - b_0 |/E ) < n log(2)
- log( | a_0 - b_0 |/E )/log(2) < n
- Disadvantage: a and b must be chosen close to the root, so you must already know roughly where the solution is
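A minimal MATLAB sketch of bisection on the sin(x) = 1/2 example (sind works in degrees; E and the step-count formula follow the analysis above):

```matlab
f = @(x) sind(x) - 0.5;        % f(x) = sin(x) - 1/2, x in degrees
a = 0;  b = 50;                % interval with f(a) < 0 and f(b) > 0
E = 1e-4;                      % desired width of the final interval
nsteps = ceil(log(abs(a - b) / E) / log(2));   % n > log(|a0 - b0|/E)/log(2)
for n = 1:nsteps
    c = (a + b) / 2;           % midpoint
    if f(a) * f(c) < 0         % opposite signs: a zero lies in [a, c]
        b = c;
    else                       % otherwise (or f(c) = 0): keep [c, b]
        a = c;
    end
end
fprintf('after %d steps the root is near %.4f degrees\n', nsteps, (a + b) / 2);
```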
Going back to fixed point theorem
- Two cases for convergence
- monotone convergence: 0 < g'(P) < 1
- oscillating convergence: -1 < g'(P) < 0
- Two cases for divergence
- monotone divergence
- 1 < g'(P)
- divergent oscillation
- g'(P) < -1
6/20
Solving f(x) = 0
- False Position (Regula Falsi)
- similar to bisection method
- needs an interval [a,b] where a zero is found
- converges faster than bisection
- graphically
- a line is subtended between ( a, f(a) ) and ( b, f(b) )
- we use the same rules as in bisection
- if f(a) and f(c) have different signs, set b = c
- if f(b) and f(c) have different signs, set a = c
- how to compute c?
- get slope: m = [f(a) - f(b)]/[a-b] and m = [f(b) - 0]/[b - c]
- [f(a) - f(b)]/[a-b] = [f(b) - 0]/[b - c]
- c = b - [f(b) * { (a - b)/( f(a) - f(b) )}]
- example: sin(x) = 1/2
- f(x) = sin(x) - 1/2 = 0
- a = 0, b = 50
i | a | f(a) | b      | f(b)  | c      | f(c)
0 | 0 | -1/2 | 50     | .266  | 32.637 | .0393
1 | 0 | -1/2 | 32.637 | .0393 | 30.259 | .0039
- c approaches 30 and f(c) approaches 0
- when to stop the iterations?
- when the method converges to the desired precision
- horizontal convergence - instead of requiring f(x) = 0 exactly, require f(x) to lie within some small error bound E, i.e. | f(x) | < E
- for example, if E = .001 in the previous example, then we would stop at iteration i = 2 ( | .000375 | < .001 )
- geometrically, this means that the zero is in some horizontal band
- vertical convergence - stop when x is at a distance G from P (the solution), therefore stop when | x - p | < G
- however, this is not possible to know since this implies that we already know p
- this is approximated by determining when | P_n - P_(n-1) | < G
- use both horizontal and vertical convergence
- stop when | P_n - P_(n-1) | < G and | f(x) | < E
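A minimal MATLAB sketch of false position on the same example, using both stopping tests just described (E, G and the iteration cap are illustrative choices):

```matlab
f = @(x) sind(x) - 0.5;            % f(x) = sin(x) - 1/2, x in degrees
a = 0;  b = 50;                    % interval with f(a) < 0 and f(b) > 0
E = 1e-3;  G = 1e-3;               % horizontal and vertical tolerances
c_old = a;
for i = 0:50
    c = b - f(b) * (a - b) / (f(a) - f(b));   % where the secant line crosses y = 0
    if abs(f(c)) < E && abs(c - c_old) < G    % horizontal and vertical convergence
        break
    end
    if f(a) * f(c) < 0                        % zero in [a, c]: set b = c
        b = c;
    else                                      % zero in [c, b]: set a = c
        a = c;
    end
    c_old = c;
end
fprintf('i = %d, c = %.4f, f(c) = %.6f\n', i, c, f(c));
```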
Troublesome functions
- functions where obtaining the solution of f(x) = 0 is difficult
- if the graph y = f(x) is steep near the root (p,0) then the root finding is "well conditioned", i.e. a solution with good precision is easy to obtain
- if the graph y = f(x) is shallow near the root (p, 0), then the root finding is "ill conditioned", i.e. the computed root may only have a few significant digits