8.3. DEVILS
CIRCLE 8. BELIEVING IT DOES AS INTENDED
8.3.20 sapply simplification
The sapply function “simplifies” the output of lapply. It isn’t always so simple.
That is, the simplification that you get may not be the simplification you expect.
This uncertainty makes sapply not so suitable for use inside functions. The
vapply function is sometimes a safer alternative.
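A minimal sketch of how the shape of the sapply result can change with the data, and how vapply pins it down. The helper squares is hypothetical, invented only for this illustration.

```r
# Hypothetical helper: the first n square numbers, so the result length depends on n.
squares <- function(n) seq_len(n)^2

res_ragged <- sapply(1:3, squares)         # lengths 1, 2, 3: stays a list
res_equal  <- sapply(c(2, 2), squares)     # equal lengths: becomes a 2 by 2 matrix
res_empty  <- sapply(integer(0), squares)  # zero-length input: an empty list

# vapply states the expected type and length of each result up front.
res_checked <- vapply(c(2, 2), squares, FUN.VALUE = numeric(2))

# When a result does not match FUN.VALUE, vapply throws an error
# instead of quietly returning a different kind of object.
res_error <- tryCatch(
  vapply(1:3, squares, FUN.VALUE = numeric(2)),
  error = function(e) conditionMessage(e)
)
```

Inside a function, an error at the point of the mismatch is usually easier to deal with than a list turning up where a matrix was expected.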
8.3.21 one-dimensional arrays
Arrays can be of any positive dimension (modulo memory and vector length
limits). In particular, one-dimensional arrays are possible. Almost always these
look and act like plain vectors. Almost.
Here is an example where they don’t:
> df2 <- data.frame(x=rep(1, 3), y=tapply(1:9,
+     factor(rep(c('A', 'B', 'C'), each=3)), sum))
> df2
  x  y
A 1  6
B 1 15
C 1 24
> tapply(df2$y, df2$x, length)
1
3
> by(df2$y, df2$x, length)
INDICES: 1
[1] 1
> by(as.vector(df2$y), df2$x, length)
INDICES: 1
[1] 3
tapply returns an array; in particular, it can return a one-dimensional array,
which is the case with df2$y. When given a one-dimensional array, the by
function in this case produces the correct answer to a question that we didn’t
think we were asking.
One-dimensional arrays are neither matrices nor (exactly) plain vectors.
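A short sketch of how to tell a one-dimensional array from a plain vector, using the same tapply call as above. The object names y and v are chosen only for this illustration.

```r
y <- tapply(1:9, factor(rep(c("A", "B", "C"), each = 3)), sum)

dim_y    <- dim(y)         # a single dimension of length 3
is_arr   <- is.array(y)    # TRUE: it has a dim attribute
is_mat   <- is.matrix(y)   # FALSE: matrices have exactly two dimensions
is_plain <- is.vector(y)   # FALSE: dim and dimnames are extra attributes

# as.vector strips the attributes and leaves a plain vector.
v <- as.vector(y)
```

Checking dim before passing an object on is a cheap way to find out which of the two you are holding.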
8.3.22 by is for data frames
The by function is essentially just a pretty version of tapply for data frames.
The “for data frames” is an important restriction. If the first argument of your
call to by is not a data frame, you may be in for trouble.
> tapply(array(1:24, c(2,3,4)), 1:24 %% 2, length)
 0  1
12 12
> by(array(1:24, c(2,3,4)), 1:24 %% 2, length)
Error in tapply(1:2L, list(INDICES = c(1, 0, 1, 0, 1, 0, 1,  :
  arguments must have same length
In this example we have the good fortune of an error being triggered; we didn’t
have that for the one-dimensional array problem in the preceding section.
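For contrast, here is a sketch of by used as intended, with a data frame as the first argument. The data frame dfb and its column names are hypothetical, built from the same 24 values as the array above.

```r
# Hypothetical data frame holding the same 24 values plus a grouping column.
dfb <- data.frame(v = 1:24, g = 1:24 %% 2)

# by splits the rows of the data frame by group and applies the function
# to each piece, which is itself a data frame.
rows_per_group <- by(dfb, dfb$g, nrow)

# Strip the "by" class to get at the plain counts.
counts <- as.vector(unclass(rows_per_group))
```

If the data are not in a data frame, calling tapply directly (or converting first) avoids the surprise.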
8.3.23 stray backquote
A stray backquote in a function definition can yield the error message:
symbol print-name too long
The backquote is sometimes (too) close to the tab key and/or the escape key.
It is also close to minimal size and hence easy to overlook.
8.3.24 array dimension calculation
There are times when the creation of matrices fails to be true to the intention:
> mf <- matrix(runif((2 - .1) / .1 * 5), ncol=5)
Warning message: data length [94] is not a sub-multiple or
multiple of the number of rows [19] in matrix
Notice that the matrix is created (that is a warning, not an error); it is
merely created inappropriately. If you ignore the warning, there could be
consequences down the line.
Let’s investigate the ingredients:
> (2 - .1) / .1
[1] 19
> (2 - .1) / .1 - 19
[1] -3.552714e-15
> as.integer((2 - .1) / .1)
[1] 18
When R coerces from a floating point number to an integer it truncates rather
than rounds.
The moral of the story is that round can be a handy function to use. In a
sense this problem really belongs in Circle 1, but the subtlety makes it
difficult to find.
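A minimal sketch of the round remedy applied to the matrix above. The names n_raw and n_rows are introduced here for illustration only.

```r
n_raw <- (2 - .1) / .1          # prints as 19 but is slightly less than 19

n_trunc <- as.integer(n_raw)    # coercion truncates, giving 18
n_rows  <- round(n_raw)         # rounding first gives the intended 19

# With the rounded count, the data length matches the matrix shape.
mf <- matrix(runif(n_rows * 5), ncol = 5)
```

Rounding at the point where the count is computed keeps the tiny floating point error from ever reaching the length calculation.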
8.3.25 replacing pieces of a matrix
We have two matrices:
> m6 <- matrix(1:6, 3)
> m4 <- matrix(101:104, 2)
> m6
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> m4
     [,1] [,2]
[1,]  101  103
[2,]  102  104
Our task is to create a new matrix similar to m6 where some of the rows are
replaced by the first row of m4. Here is the natural thing to do:
> m6new <- m6
> m6new[c(TRUE, FALSE, TRUE), ] <- m4[1,]
> m6new
     [,1] [,2]
[1,]  101  101
[2,]    2    5
[3,]  103  103
We are thinking about rows being the natural way of looking at the problem.
The problem is that that isn’t the R way, despite the context: R fills the
selected elements column by column, recycling the replacement values as needed.
One way of getting what we want is:
> s6 <- c(TRUE, FALSE, TRUE)
> m6new[s6, ] <- rep(m4[1,], each=sum(s6))
> m6new
     [,1] [,2]
[1,]  101  103
[2,]    2    5
[3,]  101  103
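Another way to get the same result, offered as a sketch rather than as the recommended idiom, is to build the replacement as a matrix filled by row so that its shape matches the selected block exactly.

```r
m6 <- matrix(1:6, 3)
m4 <- matrix(101:104, 2)
s6 <- c(TRUE, FALSE, TRUE)

m6new <- m6
# byrow = TRUE lays the first row of m4 across each replacement row,
# so nothing is left to column-wise recycling.
m6new[s6, ] <- matrix(m4[1, ], nrow = sum(s6), ncol = ncol(m6), byrow = TRUE)
```

Here the replacement has the same dimensions as the piece being replaced, which makes the intent visible in the code.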
8.3.26 reserved words
R is a language. Because of this, there are reserved words that you cannot use
as object names. Perhaps you can imagine the consequences if the following
command actually worked:
FALSE <- 4
You can see the complete list of reserved words with:
?Reserved
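A small sketch of what happens in practice. The assignment is wrapped in tryCatch so that the failure is captured rather than stopping the session. The contrast with T is an addition for illustration: T is an ordinary object, not a reserved word.

```r
# Assigning to a reserved word fails, so the attempt is wrapped in tryCatch.
attempt <- tryCatch(
  eval(parse(text = "FALSE <- 4")),
  error = function(e) "error"
)

# T is not reserved, so this assignment is allowed.
T <- 0
t_is_true <- isTRUE(T)   # FALSE while the masking object exists
rm(T)                    # remove it so T means TRUE again
```

This is one reason to spell out TRUE and FALSE in code rather than rely on T and F.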
8.3.27 return is a function
Unlike some other languages, return is a function that takes the object meant
to be returned. The following construct does NOT do what is intended:
return (2/5) * 3:9
It will return 0.4 and ignore the rest.
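A minimal sketch placing the construct inside two functions, so that the difference can be seen. The function names f_wrong and f_right are hypothetical.

```r
# return is called with 2/5 alone; the multiplication is never carried out.
f_wrong <- function() {
  return (2/5) * 3:9
}

# The whole expression goes inside the call to return.
f_right <- function() {
  return((2/5) * 3:9)
}

wrong_value <- f_wrong()   # the single number 0.4
right_value <- f_right()   # a vector of length 7
```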
8.3.28 return is a function (still)
return is a function and not a reserved word.
> # kids, don't try this at home
> return <- function(x) 4 * x
> # notice: no error
> rm(return)
8.3.29 BATCH failure
Friday afternoon you start off a batch job happy in the knowledge that come
Monday morning you will have the results of sixty-some hours of computation
in your hands. Come Monday morning results are nowhere to be found. The
job fell over after an hour because of a stray comma in your file of commands.
Results can’t be guaranteed, but it is possible to at least test for that stray
comma and its mates. Once you’ve written your file of commands, parse the
file:
parse(file='batchjob.in')
If there is a syntax error in the file, then you’ll get an error and a location for
the (first) error. If there are no syntax errors, then you’ll get an expression (a
large expression).
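The check can be wrapped in a small function, sketched below. The name check_syntax is hypothetical, and the two temporary files stand in for a real file of commands.

```r
# Hypothetical helper: TRUE if the file parses, FALSE (with a message) if not.
check_syntax <- function(path) {
  tryCatch({
    parse(file = path)
    TRUE
  }, error = function(e) {
    message(conditionMessage(e))   # reports the location of the first error
    FALSE
  })
}

good_file <- tempfile(fileext = ".R")
writeLines(c("x <- 1", "y <- x + 1"), good_file)

bad_file <- tempfile(fileext = ".R")
writeLines(c("x <- 1", "y <- x + ,1"), bad_file)   # stray comma

good_ok <- check_syntax(good_file)
bad_ok  <- check_syntax(bad_file)
```

Parsing only checks syntax. It does not evaluate anything, so problems that arise at run time (a missing object, a misspelled function name) will still get through.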
8.3.30 corrupted .RData
There are times when R won’t start in a particular location because of a
corrupted .RData file. If what you have in the .RData is important, this is bad
news.
Sometimes this can be caused by a package not being attached in the R
session that the file depends on. Whether or not this is the problem, you can
try starting R in vanilla mode (renaming the .RData file first is probably a good
idea) and then try attaching the file.
In principle it should be possible to see what objects a .RData file holds and
extract a selection of objects from it. However, I don’t know of any tools to do
that.
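If the file can still be read, one approach is to load it into a separate environment rather than the workspace, and then pick out what is wanted. The sketch below uses a freshly saved temporary file in place of a real .RData, and the object names are hypothetical. It will not help with a file that is damaged beyond reading.

```r
# Stand-in for an existing .RData file.
rdata_path <- tempfile(fileext = ".RData")
obj_a <- 1:3
obj_b <- "some text"
save(obj_a, obj_b, file = rdata_path)

# Load into a new environment so the current workspace is untouched.
holder <- new.env()
loaded_names <- load(rdata_path, envir = holder)   # names of the loaded objects

# Extract only the object that is wanted.
recovered <- get("obj_a", envir = holder)
```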
8.3.31 syntax errors
Syntax errors are one of the most common problems, especially for new users.
Unfortunately there is no good way to track down the problem other than
puzzling over it. The most common problems are mismatched parentheses or
square brackets, and missing commas.
Using a text editor that performs syntax highlighting can eliminate a lot of
the problems.
Here is a particularly nasty error:
> lseq <- seq(0, 1, 1ength=10)
Error: unexpected input in "seq(0, 1, 1en"
Hint: the end of the error message is the important location. In fact, the last
letter that it prints is the first point at which it knew something was wrong.
8.3.32 general confusion
If you are getting results that are totally at odds with your expectations, look
where you are stepping:
• Objects may be different than you expect. You can use str to diagnose
  this possibility. (The output of str may not make much sense immediately,
  but a little study will reveal what it is saying.)
• Functions may be different than you expect. Try using conflicts to
  diagnose this.
• Pretty much the only thing left is your expectations.
Calls to browser, cat and debugger can help you eliminate ghosts, chimeras
and devils. But the most powerful tool is your skepticism.
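A brief sketch of the first two checks. The factor and the masking function are hypothetical examples chosen to give str and conflicts something to find.

```r
# An object that is not what it looks like: a factor, not numbers.
x <- factor(c("10", "20", "30"))
str(x)                                 # reveals a factor with 3 levels
codes  <- as.numeric(x)                # the internal codes 1, 2, 3
values <- as.numeric(as.character(x))  # the numbers that were intended

# A function that is not what it looks like: a global object masking base sum.
assign("sum", function(...) 0, envir = globalenv())
masked <- conflicts()                  # names that appear more than once on the search list
rm("sum", envir = globalenv())         # remove the mask again
```

The vector returned by conflicts usually contains a few harmless entries as well, so it is the unexpected names that deserve attention.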