ВУЗ: Не указан
Категория: Не указан
Дисциплина: Не указана
Добавлен: 06.04.2021
Просмотров: 867
Скачиваний: 1
CIRCLE 1. FALLING INTO THE FLOATING POINT TRAP
It didn’t take many iterations before I realized that it only ever told me
about the
first
error it came to. Finally on the third day, the output featured
no messages about errors. There was an answer—a
wrong
answer. It was a
simple quadratic equation, and the answer was clearly 2 and 3. The program
said it was 1.999997 and 3.000001. All those hours of misery and it can’t even
get the right answer.
I can write an R function for the quadratic formula somewhat quicker.
>
quadratic.formula
function (a, b, c)
{
rad <- b^2 - 4 * a * c
if(is.complex(rad) || all(rad >= 0))
{
rad <- sqrt(rad)
}
else
{
rad <- sqrt(as.complex(rad))
}
cbind(-b - rad, -b + rad) / (2 * a)
}
>
quadratic.formula(1, -5, 6)
[,1] [,2]
[1,]
2
3
>
quadratic.formula(1, c(-5, 1), 6)
[,1]
[,2]
[1,]
2.0+0.000000i
3.0+0.000000i
[2,] -0.5-2.397916i -0.5+2.397916i
It is more general than that old program, and more to the point it gets the
right answer of 2 and 3. Except that it doesn’t. R merely prints so that most
numerical error is invisible. We can see how wrong it actually is by subtracting
the right answer:
>
quadratic.formula(1, -5, 6) - c(2, 3)
[,1] [,2]
[1,]
0
0
Well okay, it gets the right answer in this case. But there
is
error if we change
the problem a little:
>
quadratic.formula(1/3, -5/3, 6/3)
[,1] [,2]
[1,]
2
3
>
print(quadratic.formula(1/3, -5/3, 6/3), digits=16)
[,1]
[,2]
[1,] 1.999999999999999 3.000000000000001
>
quadratic.formula(1/3, -5/3, 6/3) - c(2, 3)
[,1]
[,2]
[1,] -8.881784e-16 1.332268e-15
10
CIRCLE 1. FALLING INTO THE FLOATING POINT TRAP
That R prints answers nicely is a blessing. And a curse. R is good enough at
hiding numerical error that it is easy to forget that it is there. Don’t forget.
Whenever floating point operations are done—even simple ones, you should
assume that there will be numerical error. If by chance there is no error, regard
that as a happy accident—not your due. You can use the
all.equal
function
instead of
8
==
8
to test equality of floating point numbers.
If you have a case where the numbers are logically integer but they have
been computed, then use
round
to make sure they really are integers.
Do not confuse numerical error with an error. An error is when a com-
putation is wrongly performed. Numerical error is when there is visible noise
resulting from the finite representation of numbers. It is numerical error—not
an error—when one-third is represented as 33%.
We’ve seen another aspect of virtuous pagan beliefs—what is printed is all
that there is.
>
7/13 - 3/31
[1] 0.4416873
R prints—by default—a handy abbreviation, not all that it knows about num-
bers:
>
print(7/13 - 3/31, digits=16)
[1] 0.4416873449131513
Many summary functions are even more restrictive in what they print:
>
summary(7/13 - 3/31)
Min. 1st Qu.
Median
Mean 3rd Qu.
Max.
0.4417
0.4417
0.4417
0.4417
0.4417
0.4417
Numerical error from finite arithmetic can not only fuzz the answer, it can fuzz
the question. In mathematics the rank of a matrix is some specific integer. In
computing, the rank of a matrix is a vague concept. Since eigenvalues need not
be clearly zero or clearly nonzero, the rank need not be a definite number.
We descended to the edge of the first Circle where Minos stands guard,
gnashing his teeth. The number of times he wraps his tail around himself
marks the level of the sinner before him.
11
Circle 2
Growing Objects
We made our way into the second Circle, here live the gluttons.
Let’s look at three ways of doing the same task of creating a sequence of
numbers. Method 1 is to grow the object:
vec <- numeric(0)
for(i in 1:n) vec <- c(vec, i)
Method 2 creates an object of the final length and then changes the values in
the object by subscripting:
vec <- numeric(n)
for(i in 1:n) vec[i] <- i
Method 3 directly creates the final object:
vec <- 1:n
Table
shows the timing in seconds on a particular (old) machine of these
three methods for a selection of values of
n
. The relationships for varying
n
are
all roughly linear on a log-log scale, but the timings are drastically different.
You may wonder why growing objects is so slow. It is the computational
equivalent of suburbanization. When a new size is required, there will not be
Table 2.1: Time in seconds of methods to create a sequence.
n
grow
subscript
colon operator
1000
0.01
0.01
.00006
10,000
0.59
0.09
.0004
100,000
133.68
0.79
.005
1,000,000
18,718
8.10
.097
12
CIRCLE 2. GROWING OBJECTS
enough room where the object is; so it needs to move to a more open space.
Then that space will be too small, and it will need to move again. It takes a lot
of time to move house. Just as in physical suburbanization, growing objects can
spoil all of the available space. You end up with lots of small pieces of available
memory, but no large pieces. This is called fragmenting memory.
A more common—and probably more dangerous—means of being a glutton
is with
rbind
. For example:
my.df <- data.frame(a=character(0), b=numeric(0))
for(i in 1:n)
{
my.df <- rbind(my.df, data.frame(a=sample(letters, 1),
b=runif(1)))
}
Probably the main reason this is more common is because it is more likely that
each iteration will have a different number of observations. That is, the code is
more likely to look like:
my.df <- data.frame(a=character(0), b=numeric(0))
for(i in 1:n)
{
this.N <- rpois(1, 10)
my.df <- rbind(my.df, data.frame(a=sample(letters,
this.N, replace=TRUE), b=runif(this.N)))
}
Often a reasonable upper bound on the size of the final object is known. If so,
then create the object with that size and then remove the extra values at the
end. If the final size is a mystery, then you can still follow the same scheme,
but allow for periodic growth of the object.
current.N <- 10 * n
my.df <- data.frame(a=character(current.N),
b=numeric(current.N))
count <- 0
for(i in 1:n)
{
this.N <- rpois(1, 10)
if(count + this.N > current.N)
{
old.df <- my.df
current.N <- round(1.5 * (current.N + this.N))
my.df <- data.frame(a=character(current.N),
b=numeric(current.N))
my.df[1:count,] <- old.df[1:count, ]
}
my.df[count + 1:this.N,] <- data.frame(a=sample(letters,
this.N, replace=TRUE), b=runif(this.N))
count <- count + this.N
}
my.df <- my.df[1:count,]
13
CIRCLE 2. GROWING OBJECTS
Figure 2.1: The giants by Sandro Botticelli.
Often there is a simpler approach to the whole problem—build a list of pieces
and then scrunch them together in one go.
my.list <- vector(’list’, n)
for(i in 1:n)
{
this.N <- rpois(1, 10)
my.list[[i]] <- data.frame(a=sample(letters, this.N
replace=TRUE), b=runif(this.N))
}
my.df <- do.call(’rbind’, my.list)
There are ways of cleverly hiding that you are growing an object. Here is an
example:
hit <- NA
for(i in 1:one.zillion)
{
if(runif(1) < 0.3) hit[i] <- TRUE
}
Each time the condition is true,
hit
is grown.
Eliminating the growth of objects can be one of the easiest and most dra-
matic ways of speeding up R code.
14