6/16
Webpage:
http://www.cs.purdue.edu/homes/cs314
Goals:
- you will learn how to use the computer to approximate the solution of nonlinear and linear equations
- can't always get the exact solution
- numerical integration and differentiation
- solution of differential equations
- the potential and limitations of the computer to solve mathematical problems
Textbook:
Numerical Analysis Using Matlab
Mailbox List:
login to CS machine (lore): mailer add me to cs314
Grade Distribution:
- 25% Midterm
- 25% Final
- 50% Projects and Homework
Syllabus:
- floating- and fixed-point representation; computational error
- solutions of non-linear equations of the form f(x) = 0
- solution of linear systems of form Ax = B
- interpolation
- curve fitting
- numerical differentiation
- numerical integration
- numerical optimization
- solution of differential equations
Class Time:
MTWTF 11:25A - 12:25P
Binary Numbers:
- computers use binary numbers
- this is because it was easier to implement circuits with 2 states "on" and "off"
- Example: 537.5_10 (the subscript 10 denotes the base)
- Decimal Base: (5x10^2) + (3x10^1) + (7x10^0) + (5x10^-1)
- Binary Base: (1x2^9) + (0x2^8) + (0x2^7) + (0x2^6) + (0x2^5) + (1x2^4) + (1x2^3) + (0x2^2) + (0x2^1) + (1x2^0) + (1x2^-1) = 1000011001.1_2
Converting Decimal To Binary:
- separate the integer part from the fraction: N.F
- N = b_0 + (b_1 x 2^1) + (b_2 x 2^2) + ... + (b_k x 2^k)
- we wish to find the bits b_i (each 0 or 1)
- therefore, we divide by 2
- N/2 = b_0/2 + (b_1 x 2^0) + ... + (b_k x 2^(k-1))
- the b_0/2 term gives the remainder
- (b_1 x 2^0) + ... + (b_k x 2^(k-1)) is the integer part
- the remainder determines whether b_0 is 0 or 1; repeat with the integer part until it is 0 (see the sketch after the examples below)
- Example: 34_10
- 34/2 = 17 remainder 0
- 17/2 = 8 remainder 1
- 8/2 = 4 remainder 0
- 4/2 = 2 remainder 0
- 2/2 = 1 remainder 0
- 1/2 = 0 remainder 1
- therefore, 34_10 = 100010_2
- 0.1_2 = 1x2^-1 = 1/2
- 0.01_2 = 1x2^-2 = 1/4
- 0.11_2 = 1x2^-1 + 1x2^-2 = 3/4
- Example: What about 1/3?
- 1/3 = (1/2 x 0) + (1/4 x 1) + (1/8 x 0) + (1/16 x 1) + ... = .01010101..._2
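A minimal MATLAB sketch of the conversion (the function name dec2bin_frac and the bit count nfrac are illustrative choices, not from the lecture; the fraction part uses the companion repeated-multiplication-by-2 trick, which mirrors the repeated division used for the integer part):

```matlab
function s = dec2bin_frac(x, nfrac)
% Convert a nonnegative decimal number x to a binary string with nfrac
% fraction bits: repeated division by 2 for the integer part, repeated
% multiplication by 2 for the fraction part.
    N = floor(x);                    % integer part
    F = x - N;                       % fractional part
    ibits = '';
    if N == 0
        ibits = '0';
    end
    while N > 0
        ibits = [num2str(mod(N, 2)) ibits];   % remainder gives b0, b1, ... (low bit first)
        N = floor(N / 2);                     % keep only the integer part of N/2
    end
    fbits = '';
    for k = 1:nfrac
        F = F * 2;                            % the bit that crosses the binary point
        fbits = [fbits num2str(floor(F))];
        F = F - floor(F);
    end
    if nfrac > 0
        s = [ibits '.' fbits];
    else
        s = ibits;
    end
end
```

For example, dec2bin_frac(34, 0) gives '100010', dec2bin_frac(537.5, 1) gives '1000011001.1', and dec2bin_frac(1/3, 8) gives '0.01010101', matching the examples above.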
6/17
send questions to cs314-ta@cs.purdue.edu
Representation of numbers in the machine
- fixed point
- number of bits for integer part and fraction is fixed
- Example: NNNN.FF
- 4 bits integer part
- 2 bits fraction
- Largest number: 1111.11_2 = 1x2^3 + ... + 1x2^-2 = 15.75
- Smallest number: 0
- Usually you give more bits for both integer part and fraction
- Example: 2 bytes integer part and 2 bytes fraction
- Advantages:
- Fast arithmetic (the same integer arithmetic can be used; see the sketch at the end of this section)
- 1.01_2 + 10.10_2 = 1.25_10 + 2.5_10 = 11.11_2 = 3.75_10
- Used in signal processors and signal-processing programs that require real-time operation
- Disadvantages:
- very restricted range
- poor precision: the smallest representable number is limited by the number of bits used for the fraction
- generates a lot of error during operations
- floating point
- the decimal point moves around to keep the desired precision
- using scientific notation (Q x 10^N), where Q is called the mantissa and the precision of the number depends on the number of bits used to represent Q
- the numbers are represented in the computer as x = +/- q x 2^n, where q and n are represented with a fixed number of bits
- to have a unique representation for each number, numbers are adjusted (normalized) so that .5 <= q < 1, except for 0
- the uniqueness is useful for some operations such as checking for equality
- Example: approximating the value 1 by 1.125
- absolute error = | 1 - 1.125 | = .125
- relative error = ( | 1 - 1.125 | / 1 ) x 100 = 12.5%
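Going back to the fixed-point advantage noted above (fast arithmetic using the integer hardware), here is a small MATLAB sketch, not from the lecture, that stores NNNN.FF values as integers scaled by 2^2 so that addition is plain integer addition:

```matlab
% Fixed-point NNNN.FF: 4 integer bits, 2 fraction bits.  Each value is
% stored as an integer count of quarters (scale factor 2^2 = 4).
scale = 2^2;
a = round(1.25 * scale);     % 1.01_2  -> stored as the integer 5
b = round(2.50 * scale);     % 10.10_2 -> stored as the integer 10
c = a + b;                   % ordinary integer addition: 15
fprintf('%g + %g = %g\n', a/scale, b/scale, c/scale);   % prints 1.25 + 2.5 = 3.75
```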
Modern architectures give 2 options to represent real numbers
- single precision (float)
- uses 4 bytes (32 bits)
- 24 bits mantissa (1 bit for sign)
- 8 bits exponent (1 bit for sign)
- range is 2^-128 to 2^127
- accuracy: the smallest difference between 2 numbers with exponent 0 is .10000...01_2 - .10000...00_2 = 2^-23 => about 6 decimal digits of precision
- double precision
- uses 8 bytes (64 bits)
- 53 bits mantissa (1 bit for sign)
- 11 bits exponent (1 bit for sign)
- range is 2^1024
- accuracy is 2^-52 => about 15 digits of accuracy
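A quick MATLAB check of the precision claims above; eps gives the spacing of representable numbers near 1 (note that MATLAB's actual formats follow IEEE 754, which differs in detail from the simplified sign-bit layout in these notes):

```matlab
% Spacing between 1 and the next representable number, single vs. double.
fprintf('single eps: %g\n', eps('single'));   % about 1.2e-07 -> roughly 6-7 decimal digits
fprintf('double eps: %g\n', eps('double'));   % about 2.2e-16 -> roughly 15-16 decimal digits
fprintf('largest single: %g, largest double: %g\n', realmax('single'), realmax);
```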
6/18
Absolute error
|P - P*| where P* is the approximation of P
Relative error
|P - P*|/|P| for P not equal to 0
Chop-off vs. Round-off
- pi = 3.141592654
- using 3 digits for fraction
- chop-off: 3.141
- round-off: 3.142
          | abs. error  | rel. error
chop-off  | .59 x 10^-3 | 1.9 x 10^-4 = .02%
round-off | .41 x 10^-3 | 1.3 x 10^-4 = .01%
Therefore, round-off is better than chop-off
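A small MATLAB check of the numbers in the table (a sketch; fix chops toward zero and round rounds to the nearest value):

```matlab
p    = 3.141592654;               % value of pi used in the notes
chop = fix(p * 1000) / 1000;      % chop-off to 3 fraction digits: 3.141
rnd  = round(p * 1000) / 1000;    % round-off to 3 fraction digits: 3.142
fprintf('chop:  abs %.2e  rel %.2e\n', abs(p - chop), abs(p - chop) / abs(p));
fprintf('round: abs %.2e  rel %.2e\n', abs(p - rnd),  abs(p - rnd)  / abs(p));
```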
Propagation errors
Assume p and q are approximated by p* and q* such that
- p* = p + e_p
- q* = q + e_q
- sum: p* + q* = p + q + (e_p + e_q)
- error = e_p + e_q
- error = sum of the errors
- product: p*q* = (p + e_p)(q + e_q) = pq + p e_q + e_p q + e_p e_q
- error = p e_q + e_p q + e_p e_q
- the error is amplified by the magnitudes of p and q
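A tiny MATLAB illustration of the propagation formulas above (the values of p, q and the errors are made up for illustration):

```matlab
p = 100;   q = 200;            % exact values (hypothetical)
ep = 0.03; eq = -0.02;         % errors in the approximations
ps = p + ep;   qs = q + eq;    % p* and q*
sum_err  = (ps + qs) - (p + q);   % equals ep + eq            = 0.01
prod_err = (ps * qs) - (p * q);   % equals p*eq + ep*q + ep*eq = -2 + 6 - 0.0006
fprintf('sum error = %g, product error = %g\n', sum_err, prod_err);
```

The sum error stays small while the product error is magnified by the sizes of p and q, which is the point made above.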
An algorithm is:
- stable - if small initial errors stay small
- unstable - if small initial errors get large after several iterations
Solution of non-linear equations of form f(x) = 0
- we will use "iteration" to solve these equations
- starting with a value P_0 and an iteration function, P_(k+1) = g(P_k), evaluate P_1, P_2, P_3, ...
- we use a stopping criterion to stop the computation, such as | P_k - P_(k+1) | < E
Definitions:
- fixed point - a fixed point of g(x) is a real number p such that p = g(p)
- geometrically - the fixed points of a function y = g(x) are the points of intersection between y = g(x) and the line y = x
- fixed point iteration - an equation of the form P_(n+1) = g(P_n) for n = 0, 1, 2, ...
- Example: x^2 - 2x + 1 = 0
- get a fixed point iteration equation
- x^2 + 1 = 2x => x = (x^2 + 1)/2 => x_(k+1) = (x_k^2 + 1)/2
- assume x_0 = 0
- x_1 = 1/2, x_2 = 5/8, ...; as k -> infinity, x_k -> 1
- fixed point theorem
- if | g'(x) | <= k < 1 for x element of [a,b] and g(x) element of [a,b], then the iteration P_n = g(P_(n-1)) will converge to a unique fixed point P element of [a,b]. In this case, P is said to be an attractive fixed point
- if | g'(x) | > 1, then the iteration P_n = g(P_(n-1)) will diverge
- 2 cases for convergence
- 0 < g'(P) < 1 => monotone convergence (we expect the slope of g(x) to be below 45 degrees)
- -1 < g'(P) < 0 => oscillating convergence (we expect the slope of g(x) to be below 45 degrees)
- what kind of convergence did we have in previous example?
- x_(k+1) = (x_k^2 + 1)/2
- g(x) = (x^2 + 1)/2
- g'(x) = x
- as long as | g'(x) | < 1 and g(x) maps [a,b] into [a,b], the iteration will converge
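A minimal MATLAB sketch of fixed-point iteration for this example (the tolerance, iteration cap and variable names are illustrative choices):

```matlab
g = @(x) (x.^2 + 1) / 2;       % iteration function from the example
p = 0;                         % starting value x0
tol = 1e-6;                    % stopping tolerance E
for k = 1:100
    pnew = g(p);
    if abs(pnew - p) < tol     % stopping criterion | Pk - Pk+1 | < E
        break
    end
    p = pnew;
end
fprintf('k = %d, p = %.6f\n', k, p);
```

Convergence is slow here because g'(1) = 1, the borderline of the fixed point theorem, so the iterates creep toward 1; starting from p = 2 instead makes them blow up, which is the divergent case picked up on 6/19.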
6/19
Fixed Point Theorem
- If | g'(x) | < 1 for all x element of [a,b] and g(x) element of [a,b], then the iteration converges
- If | g'(x) | > 1, then the iteration diverges
- Example: x^2 - 2x + 1 = 0
- x_(k+1) = ( x_k^2 + 1 )/2
- x_0 = 0
=> x_1 = .5 => ... => x_k -> 1
- g(x) = ( x^2 + 1 )/2
- g'(x) = x
- g'(0) = 0
=> therefore by fixed point theorem, this converges
- Example: now let x_0 = 2
- g'(x) = x => g'(2) = 2
- therefore by fixed point theorem, this diverges
Solution of nonlinear equations ( f(x) = 0 )
- Bisection Method
- similar to binary search
- starting with two values a, b (an interval) such that a < b and either f(a) < 0 and f(b) > 0, or f(a) > 0 and f(b) < 0; for a continuous f this means there is an x such that a <= x <= b and f(x) = 0
- get middle point of [a,b] => c = (a + b)/2
- if f(a) and f(c) have opposite signs then a zero lies in the interval [a,c] and substitute c for b
- if f(b) and f(c) have opposite signs then a zero lies in the interval [c,b] and substitute c for a
- if f(c) = 0, then c is a zero
- Example: sin(x) = 1/2 ( x is in degrees )
- f(x) = sin(x) - 1/2 = 0
- start with a = 0, b = 50
- f(0) = 0 - .5 = -.5 , f(50) = .266
- f(a) < 0 and f(b) > 0 so we are ok
a  | b    | c     | f(c)
0  | 50   | 25    | -.077
25 | 50   | 37.5  | .1087
25 | 37.5 | 31.25 | .018
- c is approaching 30 degrees, the true root (sin 30 degrees = 1/2)
- Analysis of algorithm
- at each step the interval [a,b] is reduced by half; therefore at step n, | a_n - b_n | = | a_0 - b_0 |/2^n
- if we want an approximation error < E, how many steps do we need?
- | a_n - b_n | = | a_0 - b_0 |/2^n < E
- | a_0 - b_0 |/E < 2^n
- log( | a_0 - b_0 |/E ) < log( 2^n )
- log( | a_0 - b_0 |/E ) < n log(2)
- log( | a_0 - b_0 |/E )/log(2) < n
- Disadvantage: a and b must be chosen close to the root, so you must already know roughly where the solution is
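A minimal MATLAB sketch of bisection on the sin(x) = 1/2 example (sind works in degrees; E and the step-count formula follow the analysis above):

```matlab
f = @(x) sind(x) - 0.5;        % f(x) = sin(x) - 1/2, x in degrees
a = 0;  b = 50;                % interval with f(a) < 0 and f(b) > 0
E = 1e-4;                      % desired width of the final interval
nsteps = ceil(log(abs(a - b) / E) / log(2));   % n > log(|a0 - b0|/E)/log(2)
for n = 1:nsteps
    c = (a + b) / 2;           % midpoint
    if f(a) * f(c) < 0         % opposite signs: a zero lies in [a, c]
        b = c;
    else                       % otherwise (or f(c) = 0): keep [c, b]
        a = c;
    end
end
fprintf('after %d steps the root is near %.4f degrees\n', nsteps, (a + b) / 2);
```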
Going back to fixed point theorem
- Two cases for convergence
- monotone convergence: 0 < g'(P) < 1
- oscillating convergence: -1 < g'(P) < 0
- Two cases for divergence
- monotone divergence
- 1 < g'(P)
- divergent oscillation
- g'(P) < -1
6/20
Solving f(x) = 0
- False Position (Regula Falsi)
- similar to bisection method
- needs an interval [a,b] where a zero is found
- converges faster than bisection
- graphically
- a line is subtended between ( a, f(a) ) and ( b, f(b) )
- we use the same rules as in bisection
- if f(a) and f(c) have different signs, set b = c
- if f(b) and f(c) have different signs, set a = c
- how to compute c?
- get slope: m = [f(a) - f(b)]/[a-b] and m = [f(b) - 0]/[b - c]
- [f(a) - f(b)]/[a-b] = [f(b) - 0]/[b - c]
- c = b - [f(b) * { (a - b)/( f(a) - f(b) )}]
- example: sin(x) = 1/2
- f(x) = sin(x) - 1/2 = 0
- a = 0, b = 50
i | a | f(a) | b      | f(b)  | c      | f(c)
0 | 0 | -1/2 | 50     | .266  | 32.637 | .0393
1 | 0 | -1/2 | 32.637 | .0393 | 30.259 | .0039
- c approaches 30 and f(c) approaches 0
- when to stop the iterations?
- when the method converges to the desired precision
- horizontal convergence - instead of requiring f(x) = 0 exactly, require f(x) to lie within some small error bound E, i.e. | f(x) | < E
- for example, if E = .001 in the previous example, then we would stop at iteration i = 2 ( | .000375 | < .001 )
- geometrically, this means that the zero is in some horizontal band
- vertical convergence - stop when x is at a distance G from P (the solution), therefore stop when | x - p | < G
- however, this is not possible to know since this implies that we already know p
- this is approximated by determining when | P_n - P_(n-1) | < G
- use both horizontal and vertical convergence
- stop when | P_n - P_(n-1) | < G and | f(x) | < E
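A minimal MATLAB sketch of false position on the same example, using both stopping tests just described (E, G and the iteration cap are illustrative choices):

```matlab
f = @(x) sind(x) - 0.5;            % f(x) = sin(x) - 1/2, x in degrees
a = 0;  b = 50;                    % interval with f(a) < 0 and f(b) > 0
E = 1e-3;  G = 1e-3;               % horizontal and vertical tolerances
c_old = a;
for i = 0:50
    c = b - f(b) * (a - b) / (f(a) - f(b));   % where the secant line crosses y = 0
    if abs(f(c)) < E && abs(c - c_old) < G    % horizontal and vertical convergence
        break
    end
    if f(a) * f(c) < 0                        % zero in [a, c]: set b = c
        b = c;
    else                                      % zero in [c, b]: set a = c
        a = c;
    end
    c_old = c;
end
fprintf('i = %d, c = %.4f, f(c) = %.6f\n', i, c, f(c));
```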
Troublesome functions
- functions where obtaining the solution of f(x) = 0 is difficult
- if the graph y = f(x) is steep near the root (p,0) then the root finding is "well conditioned", i.e. a solution with good precision is easy to obtain
- if the graph y = f(x) is shallow near the root (p, 0), then the root finding is "ill conditioned", i.e. the computed root may only have a few significant digits