8.1. GHOSTS
CIRCLE 8. BELIEVING IT DOES AS INTENDED
> mat1 <- cbind(1:3, 7:9)
> df1 <- data.frame(1:3, 7:9)
Now, notice:
> mean(mat1)
[1] 5
> mean(df1)
X1.3 X7.9
   2    8
> median(mat1)
[1] 5
> median(df1)
[1] 2 8
> sum(mat1)
[1] 30
> sum(df1)
[1] 30
The example of median with data frames is a troublesome one. As of R version 2.13.0 there is not a data frame method of median. In this particular case it gets the correct answer, but that is an accident. In other cases you get bizarre answers.
Unless and until there is such a method, you can get what I imagine you
expect with:
> sapply(df1, median)
X1.3 X7.9
   2    8
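The difference between column-wise summaries and a single summary of the whole table can be sketched as follows. This is a minimal illustration that reuses mat1 and df1; the result names (col_medians, col_means, overall_median) are made up for the example, and colMeans and as.matrix are ordinary base R functions, not something the text prescribes.

```r
mat1 <- cbind(1:3, 7:9)
df1 <- data.frame(1:3, 7:9)

# Column-wise summaries: apply the function to each column of the data frame.
col_medians <- sapply(df1, median)
col_means <- colMeans(df1)

# One number for the whole table: convert to a matrix first, so that all
# the values are treated as a single vector.
overall_median <- median(as.matrix(df1))
```

Being explicit about which of the two you want avoids depending on whether a data frame method happens to exist.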
8.1.18 first match only
match only matches the first occurrence:
> match(1:2, rep(1:4, 2))
[1] 1 2
If that is not what you want, then change what you do:
> which(rep(1:4, 2) %in% 1:2)
[1] 1 2 5 6
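If you need to know which positions belong to which value, one possible approach is to look up each value separately. The sketch below uses illustrative names (haystack, first_pos, all_pos, pos_by_value) that do not appear in the text.

```r
haystack <- rep(1:4, 2)

# match: one position per value looked up (the first occurrence only).
first_pos <- match(1:2, haystack)

# %in% with which: every position that holds any of the values.
all_pos <- which(haystack %in% 1:2)

# Every position, kept separate for each value looked up.
pos_by_value <- lapply(1:2, function(v) which(haystack == v))
```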
8.1.19 first match only (reprise)
If names are not unique, then subscripting with characters will only give you
the first match:
> x4 <- c(a=1, b=2, a=3)
> x4["a"]
a
1
If this is not the behavior you want, then you probably want to use '%in%':
> x4[names(x4) %in% 'a']
a a
1 3
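A quick way to find out whether you are exposed to this is to test the names for duplicates before subscripting. This is a minimal sketch; first_dup and all_a are illustrative names.

```r
x4 <- c(a = 1, b = 2, a = 3)

# anyDuplicated gives the index of the first duplicated name, or 0 if the
# names are unique.
first_dup <- anyDuplicated(names(x4))

# Select every element with the name, not just the first one.
all_a <- x4[names(x4) %in% 'a']
```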
8.1.20 partial matching can partially confuse
Partial matching happens in function calls and some subscripting.
The two following calls are the same:
> mean(c(1:10, 1000), trim=.25)
[1] 6
> mean(c(1:10, 1000), t=.25)
[1] 6
The trim argument is the only argument to mean.default which starts with “t” so R knows that you meant “trim” when you only said “t”. This is helpful, but some people wonder if it is too helpful by a half.
> l1 <- list(aa=1:3, ab=2:4, b=3:5, bb=4:6, cc=5:7)
> l1$c
[1] 5 6 7
> l1[['c']]
NULL
> l1[['c', exact=FALSE]]
[1] 5 6 7
> l1$a
NULL
> myfun1 <- function(x, trim=0, treat=1) {
+ treat * mean(x, trim=trim)
+ }
> myfun1(1:4, tr=.5)
Error in myfun1(1:4, tr = 0.5) :
  argument 2 matches multiple formal arguments
The '$' operator always allows partial matching. The '[[' operator, which is basically synonymous with '$' on lists, does not allow partial matching by default (in recent versions of R). An ambiguous match results in NULL for lists, but results in an error in function calls. The myfun1 example shows why an error is warranted. For the full details on subscripting, see:
?Extract
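The list behavior described above can be checked programmatically. The sketch below also uses the warnPartialMatchDollar option, which the text does not mention; it is a standard R option that asks for a warning whenever '$' relies on a partial match. The result names are illustrative.

```r
l1 <- list(aa = 1:3, ab = 2:4, b = 3:5, bb = 4:6, cc = 5:7)

exact_b <- l1$b          # an exact match wins over the partial match to 'bb'
partial_c <- l1$c        # unique partial match to 'cc'
strict_c <- l1[['c']]    # NULL: '[[' matches exactly by default

# Ask R to warn whenever '$' falls back on partial matching.
old_opts <- options(warnPartialMatchDollar = TRUE)
warned <- tryCatch({
  l1$c
  FALSE
}, warning = function(w) TRUE)
options(old_opts)        # restore the previous setting
```

Turning the warning on during development is one way to find partial matches that you did not intend.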
Here are the rules for argument matching in function calls, but first some vocabulary: A formal argument is one of the argument names in the definition of the function. The mean.default function has 4 formal arguments (x, trim, na.rm and '...'). A tag is the string used in a call to indicate which formal argument is meant. We used “t” as a tag in a call to mean (and hence to mean.default). There is a partial match if all the characters of the tag match the start of the formal argument.
• If a tag matches a formal argument exactly, then the two are bound.
• Unmatched tags are partially matched to unmatched formal arguments.
• An error occurs if any tag partially matches more than one formal argument not already bound.
• (Positional matching) Unmatched formal arguments are bound to unnamed (no tag) arguments in the call, based on the order in the call and of the formal arguments.
• If '...' is among the formal arguments, any formal arguments after '...' are only matched exactly.
• If '...' is among the formal arguments, any unmatched arguments in the call, tagged or not, are taken up by the '...' formal argument.
• An error occurs if any supplied arguments in the call are unmatched.
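The rules above can be tried out on a small function. The sketch below defines two hypothetical functions, demo and strict, purely for illustration; neither comes from the text.

```r
# 'vertical' comes after '...', so it can only be matched by its full name.
demo <- function(value, verbose = FALSE, ..., vertical = FALSE) {
  list(value = value, verbose = verbose, vertical = vertical,
       dots = list(...))
}

r1 <- demo(10, verb = TRUE)     # partial match: 'verb' binds to 'verbose'
r2 <- demo(verbose = TRUE, 10)  # positional: 10 binds to 'value'
r3 <- demo(10, vert = TRUE)     # no partial matching after '...', so
                                # 'vert' is taken up by '...'

# Without '...', an ambiguous or unknown tag is an error.
strict <- function(x, trim = 0, treat = 1) treat * mean(x, trim = trim)
ambiguous_fails <- tryCatch({
  strict(1:4, tr = 0.5)         # 'tr' matches both 'trim' and 'treat'
  FALSE
}, error = function(e) TRUE)
unknown_fails <- tryCatch({
  strict(1:4, foo = 1)          # 'foo' matches no formal argument
  FALSE
}, error = function(e) TRUE)
```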
The place where partial matching is most likely to bite you is in calls that take a
function as an argument and you pass in additional arguments for the function.
For example:
apply(xmat, 2, mean, trim=.2)
If the apply function had an argument that matched or partially matched “trim”, then apply would get the trim value, not mean.
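To see what such a collision would look like, here is a sketch with a made-up wrapper, col_apply, whose formal argument trimmed is invented solely so that the tag trim partially matches it. The real apply has no such argument.

```r
xmat <- cbind(c(1:10, 1000), c(1:10, 1000))

# The real apply: 'trim' matches none of its arguments, so it reaches mean.
direct <- apply(xmat, 2, mean, trim = 0.25)

# Hypothetical wrapper: the tag 'trim' partially matches 'trimmed', so the
# value is bound here and never reaches mean.
col_apply <- function(mat, fun, trimmed = 0, ...) {
  apply(mat, 2, fun, ...)
}
swallowed <- col_apply(xmat, mean, trim = 0.25)
```

Here direct holds the trimmed means (6 for each column), while swallowed silently holds the untrimmed means (about 95.9), with no error or warning.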
There are two strategies in general use to reduce the possibility of such
collisions:
• The apply family tends to have arguments that are in all capitals, and hence unlikely to collide with arguments of other functions that tend to be in lower case.
• Optimization functions tend to put the '...' (which is meant to be given to the function that is an argument) among the first of its arguments. Thus additional arguments to the optimizer (as opposed to the function being optimized) need to be given by their full names.
Neither scheme is completely satisfactory: you can still get unexpected collisions in various ways. If you do (and you figure out what is happening), then you can include all of the arguments in question, by their full names, in your call.
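The second strategy can be seen in optimize, where '...' comes before lower, upper, maximum and tol. The sketch below uses a hypothetical objective function and an argument named target; both are invented for the example.

```r
# Hypothetical objective: squared distance from a target value.
objective <- function(x, target) (x - target)^2

# 'target' is not an argument of optimize, so it travels through '...'.
fit <- optimize(objective, interval = c(0, 10), target = 3)

# 'maximum' comes after '...' in optimize, so the abbreviation 'max' is not
# matched to it; it is passed on to objective, which rejects it.
abbreviated_fails <- tryCatch({
  optimize(objective, interval = c(0, 10), target = 3, max = TRUE)
  FALSE
}, error = function(e) TRUE)
```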
8.1.21 no partial match assignments
One of the most pernicious effects of partial matching in lists is that it can fool
you when making replacements:
> ll2 <- list(aa=1:3, bb=4:6)
> ll2$b
[1] 4 5 6
> ll2$b <- 7:9
> ll2
$aa
[1] 1 2 3

$bb
[1] 4 5 6

$b
[1] 7 8 9

This applies to data frames as well (data frames are lists, after all).
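One defensive habit, shown here as a sketch and not as the only remedy, is to replace components by their full name and to check that the name already exists. The names ll3 and target are illustrative.

```r
ll2 <- list(aa = 1:3, bb = 4:6)
ll2$b <- 7:9                 # creates a new component 'b'; 'bb' is untouched
names_after <- names(ll2)

# Safer: use the full name and confirm that it exists before replacing.
ll3 <- list(aa = 1:3, bb = 4:6)
target <- 'bb'
stopifnot(target %in% names(ll3))
ll3[[target]] <- 7:9
```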
8.1.22 cat versus print
If you print a vector that does not have names, there is an indication of the index of the first element on each line:
> options(width=20)
> 1:10
 [1]  1  2  3  4  5
 [6]  6  7  8  9 10
Alternatively, cat just prints the contents of the vector:
> cat(1:10)
1 2 3 4 5 6 7 8 9 10>
Notice that there is not a newline after the results of cat; you need to add that yourself:
cat(1:10, '\n')
There is a more fundamental difference between print and cat: cat actually interprets character strings that it gets:
> xc <- 'blah\\blah\tblah\n'
> print(xc)
[1] "blah\\blah\tblah\n"
> cat(xc)
blah\blah       blah
>
Table 8.1: A few of the most important backslashed characters.
character   meaning
\\          backslash
\n          newline
\t          tab
\"          double quote (used when this is the string delimiter)
\'          single quote (used when this is the string delimiter)
Strings are two-faced. One face is what the string actually says (this is what cat gives you). The other face is a representation that allows you to see all of the characters, that is, how the string is actually built (this is what print gives you). Do not confuse the two.
Reread this item—it is important. Important in the sense that if you don’t
understand it, you are going to waste a few orders of magnitude more time
fumbling around than it would take to understand.
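The two faces can be compared directly by capturing what each function writes. This is a minimal sketch; capture.output is a standard R function that the text does not use, and the result names are illustrative.

```r
xc <- 'blah\\blah\tblah\n'

n_chars <- nchar(xc)                     # characters in the actual string
as_printed <- capture.output(print(xc))  # the representation: escapes shown
as_cat <- capture.output(cat(xc))        # the content: escapes interpreted
```

The actual string has 15 characters (three runs of "blah", one backslash, one tab, one newline). The printed representation is longer because each of the three special characters is shown as a two-character escape.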
8.1.23 backslashes
Backslashes are the escape character for R (and for Unix and C).
Since backslash doesn’t mean backslash, there needs to be a way to mean
backslash. Quite logically that way is backslash-backslash:
> cat('\\')
\>
Sometimes the text requires a backslash after the text has been interpreted. In
the interpretation each pair of backslashes becomes one backslash. Backslashes
grow in powers of two.
There are two other very common characters involving backslash: \t means tab and \n means newline. Table 8.1 shows the characters using backslash that you are most likely to encounter. You can see the entire list via:
?Quotes
Note that nchar (by default) gives the number of logical characters, not the number of keystrokes needed to create them:
> nchar('\\')
[1] 1
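A common place where backslashes grow in powers of two is a regular expression, which interprets the text a second time after R has. The sketch below assumes a made-up Windows-style path and uses gsub to replace its backslash; the variable names are illustrative.

```r
path <- 'C:\\temp'       # the string itself contains ONE backslash
n_path <- nchar(path)

# Regular expression: the regex engine must receive two backslashes (an
# escaped backslash), so the R string literal needs four.
slashed_regex <- gsub('\\\\', '/', path)

# Fixed (non-regex) matching: only the one R-level escape is needed.
slashed_fixed <- gsub('\\', '/', path, fixed = TRUE)
```

Both calls give the same result; the fixed = TRUE form avoids the second round of interpretation.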
8.1.24 internationalization
It may surprise some people, but not everyone writes with the same alphabet.
To account for this R allows string encodings to include latin1 and UTF-8.