8.2. CHIMERAS
CIRCLE 8. BELIEVING IT DOES AS INTENDED
8.2.29 nonstandard evaluation
There are a few functions that allow names of objects as well as a character
string of the name. For example:
help(subset)
works where
help('subset')
is what really makes sense.
The functions that do this are meant for interactive use. The intention of
allowing names is to be helpful. Such helpfulness seems like a mixed blessing.
It is hard to tell what the net time savings is of removing two keystrokes versus
the periodic confusion that it causes.
If the named object contains a character string of what is really wanted,
some of these functions give you what you want while others (often of necessity)
do not.
foo <- 'subset'
help(foo) # gets help on subset
getAnywhere(foo) # finds foo, not subset
do.call('getAnywhere', list(foo)) # finds subset
A partial list of functions that have non-standard evaluation of arguments is: help, rm, save, attach, require, library, subset, replicate.
The require function has a program-safety mechanism in the form of the character.only argument.
require(foo) # load package foo
require(foo, character.only=FALSE) # load package foo
require(foo, character.only=TRUE) # load package named by foo
The same is true of library.
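As a minimal sketch of why character.only matters inside programs, the following loads a package whose name is held in a variable. The variable name pkgToLoad is hypothetical, and stats is used only because it ships with every R installation; the sketch also assumes that no package called pkgToLoad is installed.

```r
# pkgToLoad is a hypothetical variable holding a package name.
pkgToLoad <- "stats"

# Without character.only, require() looks for a package literally
# called "pkgToLoad" and returns FALSE when there is none.
literal <- suppressWarnings(require(pkgToLoad, quietly = TRUE))

# With character.only = TRUE the value of the variable is used instead.
byValue <- require(pkgToLoad, character.only = TRUE, quietly = TRUE)
```

Code that receives package names as data would normally want the second form.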
8.2.30 help for for
The logical thing to do to get help for for is:
?for
That doesn’t work. Using the help function breaks in a seemingly different way:
help(for)
Instead do (for instance):
?'for'
help('for')
8.2.31 subset
The subset function is meant to provide convenience in interactive use. It often causes inconvenience and confusion when used inside functions. Use usual subscripting, not subset, when writing functions.
Patient: Doc, it hurts when I do this.
Doctor: Don’t do that.
Here is an example of subset in action:
> xdf5 <- data.frame(R=1:2, J=3:4, E=5:6, K=7:8)
> subset(xdf5, select=J:K)
  J E K
1 3 5 7
2 4 6 8
> subset(xdf5, select=-E)
  R J K
1 1 3 7
2 2 4 8
The select argument allows VERY non-standard use of the ':' and '-' operators. This can be a handy shortcut for interactive use. There is a grave danger of users expecting such tricks to work in other contexts. Even in interactive use there is the danger of expecting J:K to pertain to alphabetic order rather than order within the data frame.
Note also that subset returns a data frame even if only one column is selected.
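For use inside functions, the same selections can be written with usual subscripting. The following is only a sketch: columnRange is a hypothetical helper, not part of R, and drop = FALSE is used so that a single column still comes back as a data frame, as it does with subset.

```r
xdf5 <- data.frame(R = 1:2, J = 3:4, E = 5:6, K = 7:8)

# Hypothetical helper: keep the columns from 'first' to 'last',
# by position within the data frame, using ordinary subscripting.
columnRange <- function(df, first, last) {
  idx <- match(c(first, last), names(df))
  if (anyNA(idx)) stop("column not found")
  df[, idx[1]:idx[2], drop = FALSE]   # drop = FALSE keeps a data frame
}

jToK  <- columnRange(xdf5, "J", "K")                       # like select = J:K
noE   <- xdf5[, setdiff(names(xdf5), "E"), drop = FALSE]   # like select = -E
onlyJ <- xdf5[, "J", drop = FALSE]                         # one column, still a data frame
```

Because the column names are ordinary character strings here, they can be passed around as arguments without surprises.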
8.2.32 = vs == in subset
There is a big difference between:
subset(Theoph, Subject = 1)
and
subset(Theoph, Subject == 1)
The latter is what is intended; the former does not do any subsetting at all, because Subject = 1 is taken as a named argument that subset silently ignores.
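A quick check makes the difference visible. This sketch uses the Theoph data set that ships with R; the object names named and tested are made up for the illustration.

```r
# Theoph is a data set that ships with R (package datasets).
named  <- subset(Theoph, Subject = 1)    # 'Subject = 1' is just an unused argument
tested <- subset(Theoph, Subject == 1)   # a logical condition on the rows

allKept  <- nrow(named) == nrow(Theoph)  # every row is still there
someGone <- nrow(tested) < nrow(Theoph)  # only rows for subject 1 remain
```

Comparing row counts before and after is a cheap sanity test whenever a subsetting call looks suspiciously quiet.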
8.2.33 single sample switch
The sample function has a helpful feature that is not always helpful. Its first argument can be either the population of items to sample from, or the number of items in the population. There’s the rub.
> sample(c(4.9, 8.6), 9, replace=TRUE)
[1] 4.9 4.9 8.6 4.9 8.6 4.9 8.6 4.9 8.6
> sample(c(4.9), 9, replace=TRUE)
[1] 2 3 3 2 4 4 3 4 1
If the population is numeric, at least 1 and of size one (due, say, to selection within a function), then it gets interpreted as the size of the population. Note that in the example above the size 4.9 is rounded down to 4, so the values are drawn from 1:4.
There is a kludgy workaround, which is to make the population character:
> as.numeric(sample(as.character(c(4.9)), 9, replace=TRUE))
[1] 4.9 4.9 4.9 4.9 4.9 4.9 4.9 4.9 4.9
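A less kludgy alternative, sketched here under the assumption that the population always arrives as a vector, is to sample positions rather than values. The helper name sampleFrom is hypothetical; sample.int is the base R function that samples from 1 up to a given size.

```r
# Hypothetical helper: always treat x as the population, whatever its length.
sampleFrom <- function(x, size, replace = FALSE) {
  x[sample.int(length(x), size = size, replace = replace)]
}

one <- sampleFrom(c(4.9), 9, replace = TRUE)        # nine copies of 4.9
two <- sampleFrom(c(4.9, 8.6), 9, replace = TRUE)   # a mix of 4.9 and 8.6
```

Sampling the indices means the length-one case needs no special treatment and the values keep their original type.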
8.2.34 changing names of pieces
R does an extraordinary job of ferreting out replacement statements. For example, the following actually does what is intended:
names(mylist$b[[1]]) <- letters[1:10]
It is possible to get it wrong though. Here is an example:
> right <- wrong <- c(a=1, b=2)
> names(wrong[1]) <- 'changed'
> wrong
a b
1 2
> names(right)[1] <- 'changed'
> right
changed       b
      1       2
What goes wrong is that we change names on something that is then thrown
away. So to change the first two names in our ridiculous example, we would do:
names(mylist$b[[1]])[1:2] <- LETTERS[1:2]
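The statements above can be tried on a concrete object. In this sketch mylist is a hypothetical list built only so that mylist$b[[1]] has ten elements to name.

```r
# mylist is a hypothetical object shaped like the one in the text.
mylist <- list(a = 1, b = list(1:10))

names(mylist$b[[1]]) <- letters[1:10]        # name all ten elements
names(mylist$b[[1]])[1:2] <- LETTERS[1:2]    # then rename the first two

# The failing form renames a copy of the extracted piece; the copy is
# assigned back by position and its new name is discarded.
wrong <- c(a = 1, b = 2)
names(wrong[1]) <- "changed"
```

The rule of thumb is that the subscript selecting which names to change goes outside the call to names.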
8.2.35 a puzzle
> class(dfxy)
[1] "data.frame"
> length(dfxy)
[1] 8
> length(as.matrix(dfxy))
[1] 120
What is nrow(dfxy)?
8.2.36 another puzzle
If the following is a valid command:
weirdFun()()()
what does weirdFun return?
Write an example.
8.2.37 data frames vs matrices
A matrix and a data frame look the same when printed. That is good—they are
conceptually very similar. However, they are implemented entirely differently.
Objects that are conceptually similar but implemented differently are a good
source of confusion.
> x %*% y
Error in x %*% y : requires numeric matrix/vector arguments
The problem here is that while x looks like a matrix, it is actually a data frame. A solution is to use as.matrix, or possibly data.matrix.
In theory the actual implementation of data frames should not matter at all
to the user. Theory often has some rough edges.
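One such rough edge can be reproduced in a few lines. This is a sketch with made-up objects x and y; the exact wording of the error message may differ between R versions.

```r
x <- data.frame(a = 1:2, b = 3:4)   # prints like a 2 by 2 matrix
y <- c(1, 1)

# The data frame itself is rejected by %*% ...
failed <- inherits(try(x %*% y, silent = TRUE), "try-error")

# ... but its matrix version is accepted.
prod1 <- as.matrix(x) %*% y
prod2 <- data.matrix(x) %*% y
```

For an all-numeric data frame the two conversions agree; they can differ when some columns are character or factor.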
8.2.38 apply not for data frames
One rough edge is applying a function to a data frame. The apply function often doesn’t do what is desired because it coerces the data frame to a matrix before proceeding.
apply(myDataFrame, 2, class) # not right
Data frames are actually implemented as a list with each component of the list
being a column of the data frame. Thus:
lapply(myDataFrame, class)
does what was attempted above.
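The difference shows up as soon as the columns have different types. In this sketch myDataFrame is a small made-up data frame, and sapply is used as a variant of lapply that simplifies the result to a vector.

```r
# myDataFrame is a hypothetical data frame with columns of different types.
myDataFrame <- data.frame(n = 1:3, s = c("a", "b", "c"),
                          stringsAsFactors = FALSE)

viaApply  <- apply(myDataFrame, 2, class)   # classes after coercion to a matrix
viaSapply <- sapply(myDataFrame, class)     # classes of the actual columns
```

With apply every column reports the class of the coerced character matrix, while sapply sees the integer column as an integer.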
8.2.39 data frames vs matrices (reprise)
Consider the command:
array(sample(x), dim(x))
This permutes the elements of a matrix. If x is a data frame, the command will work but the result will most assuredly not be what you want.
It is possible to get a column of a data frame with a command like:
x$B
If you try this with a matrix you’ll get an error similar to:
Error in x$B : $ operator is invalid for atomic vectors
If your x might be either a data frame or a matrix, it will be better to use:
x[, 'B']
On the other hand, if you want to rule out the possibility of a matrix then '$' might be the better choice.
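A small check of both forms, using made-up objects xdf and xmat, might look like this:

```r
xdf  <- data.frame(A = 1:3, B = 4:6)   # hypothetical data frame
xmat <- as.matrix(xdf)                 # the corresponding matrix

fromDf  <- xdf[, "B"]    # works for a data frame
fromMat <- xmat[, "B"]   # and for a matrix

# '$' succeeds on the data frame but is an error on the matrix.
dollarFails <- inherits(try(xmat$B, silent = TRUE), "try-error")
```

Which form to prefer depends on whether an unexpected matrix should be tolerated or should stop the program.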
Operations with data frames can be slower than the same operation on the
corresponding matrix. In one real-world case, switching from data frames to
matrices resulted in about four times the speed.
Simpler is better.
8.2.40 names of data frames and matrices
The names of a data frame are not the same as the names of the corresponding matrix. The names of a data frame are the column names while the names of a matrix are the names of the individual elements.
Items that are congruent are:
• rownames
• colnames
• dimnames
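The contrast can be seen on a small made-up example. Explicit row names are given here as an assumption of the sketch, so that the row names of the data frame carry over to the matrix.

```r
# Hypothetical data frame with explicit row names.
xdf  <- data.frame(A = 1:2, B = 3:4, row.names = c("r1", "r2"))
xmat <- as.matrix(xdf)

dfNames  <- names(xdf)    # the column names
matNames <- names(xmat)   # NULL: the individual elements have no names

sameRows <- identical(rownames(xdf), rownames(xmat))
sameCols <- identical(colnames(xdf), colnames(xmat))
```

Code that has to cope with both kinds of object is safer asking for colnames than for names.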
8.2.41 conflicting column names
Here is an example where expectations are frustrated:
> one.col.mat <- cbind(matname=letters[1:3])
> one.col.mat
     matname
[1,] "a"
[2,] "b"
[3,] "c"
> data.frame(x=one.col.mat)
  matname
1       a
2       b
3       c
> data.frame(x=cbind(letters[1:3]))
  x
1 a
2 b
3 c
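If the name given in the call is the one that is wanted, two possible ways around the conflict are sketched below. The object names viaVector and renamed are made up for the illustration.

```r
one.col.mat <- cbind(matname = letters[1:3])

# Dropping the matrix structure first lets the name given in the call win.
viaVector <- data.frame(x = as.vector(one.col.mat))

# Alternatively, set the name after the data frame is built.
renamed <- data.frame(x = one.col.mat)
names(renamed) <- "x"
```

Either way the column ends up called x regardless of what the matrix column was called.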