Added: 06.04.2021
8.2. CHIMERAS
CIRCLE 8. BELIEVING IT DOES AS INTENDED
> c(matrix(1:4, 2))
[1] 1 2 3 4
The operation that c does on factors is consistent with this.
A generally good solution is:
c(as.character(myfac1), as.character(myfac2))
Or maybe more likely factor of the above expression. Another possibility is:
unlist(list(myfac1, myfac2))
For example:
> unlist(list(factor(letters[1:3]), factor(LETTERS[7:8])))
[1] a b c G H
Levels: a b c G H
This last solution does not work for ordered factors.
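For ordered factors, one possible workaround is to go through character vectors and rebuild the result with an explicit level order. The sketch below assumes both factors share a known ordering, held in the hypothetical vector all_levels; the names sizes1, sizes2 and both are also invented for illustration. Note that in recent versions of R (4.1.0 and later, as far as I am aware) c applied to factors returns a factor, so the behavior of c described above applies to older versions.

```r
# Hypothetical shared ordering for two ordered factors
all_levels <- c("small", "medium", "large")
sizes1 <- ordered(c("small", "large"), levels = all_levels)
sizes2 <- ordered(c("medium", "small"), levels = all_levels)

# Combine via character vectors, then restore the ordering explicitly
both <- ordered(c(as.character(sizes1), as.character(sizes2)),
                levels = all_levels)
both
```

Giving levels explicitly keeps the result independent of the locale's sort order.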
8.2.9 ordering in ordered
You need a bit of care when creating ordered factors:
> ordered(c(100, 90, 110, 90, 100, 110))
[1] 100 90 110 90 100 110
Levels: 90 < 100 < 110
> ordered(as.character(c(100, 90, 110, 90, 100, 110)))
[1] 100 90 110 90 100 110
Levels: 100 < 110 < 90
The automatic ordering is done lexically for characters. This makes sense in
general, but not in this case. (Note that the ordering may depend on your
locale.) You can always specify levels to have direct control.
You can have essentially this same problem if you try to sort a factor.
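As a minimal sketch of taking direct control, the levels can be derived from the numeric values before they are turned into characters. The variable names x, xchar and ord are made up for this illustration.

```r
x <- c(100, 90, 110, 90, 100, 110)
xchar <- as.character(x)

# Sort numerically first, then convert the sorted unique values to character
ord <- ordered(xchar, levels = as.character(sort(unique(x))))
levels(ord)

# Sorting a factor follows the level order, so this now sorts numerically
sort(ord)
```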
8.2.10 labels and excluded levels
The number of labels must equal the number of levels. Seems like a good rule.
These can be the same going into the function, but need not be in the end. The
issue is values that are excluded.
> factor(c(1:4,1:3), levels=c(1:4,NA), labels=1:5)
Error in factor(c(1:4, 1:3), levels = c(1:4, NA), ... :
invalid labels; length 5 should be 1 or 4
> factor(c(1:4,1:3), levels=c(1:4,NA), labels=1:4)
[1] 1 2 3 4 1 2 3
Levels: 1 2 3 4
> factor(c(1:4,1:3), levels=c(1:4,NA), labels=1:5,
+     exclude=NULL)
[1] 1 2 3 4 1 2 3
Levels: 1 2 3 4 5
And of course I lied to you. The number of labels can be 1 as well as the number
of levels:
> factor(c(1:4,1:3), levels=c(1:4,NA), labels='Blah')
[1] Blah1 Blah2 Blah3 Blah4 Blah1 Blah2 Blah3
Levels: Blah1 Blah2 Blah3 Blah4
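The interaction between labels and exclude can be seen side by side in the following sketch. The label strings and the names x, f_default and f_keep_na are invented for illustration; the point is only that the number of labels has to match the number of levels that survive exclusion.

```r
x <- c(1:4, NA)

# Default exclude = NA drops the NA level, so four labels are needed
f_default <- factor(x, levels = c(1:4, NA),
                    labels = c("one", "two", "three", "four"))

# exclude = NULL keeps the NA level, so five labels are needed
f_keep_na <- factor(x, levels = c(1:4, NA),
                    labels = c("one", "two", "three", "four", "missing"),
                    exclude = NULL)

nlevels(f_default)
nlevels(f_keep_na)
```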
8.2.11 is missing missing or missing?
Missing values of course make sense in factors. It is entirely possible that we
don’t know the category into which a particular item falls.
> f1 <- factor(c('AA', 'BA', NA, 'NA'))
> f1
[1] AA BA <NA> NA
Levels: AA BA NA
> unclass(f1)
[1]  1  2 NA  3
attr(,"levels")
[1] "AA" "BA" "NA"
As we saw earlier, there is a difference between a missing value and the
string 'NA'. In f1 there is a category that corresponds to the string 'NA'.
Values that are missing are indicated not by the usual NA, but by <NA> (to
distinguish them from 'NA' the string when quotes are not used).
It is also possible to have a category that is missing values. This is achieved
by changing the exclude argument from its default value:
> f2 <- factor(c('AA', 'BA', NA, 'NA'), exclude=NULL)
> f2
[1] AA BA <NA> NA
Levels: AA BA NA NA
> unclass(f2)
[1] 1 2 4 3
attr(,"levels")
[1] "AA" "BA" "NA" NA
Unlike in f1 the core data of f2 has no missing values.
Let’s now really descend into the belly of the beast.
> f3 <- f2
> is.na(f3)[1] <- TRUE
> f3
[1] <NA> BA <NA> NA
Levels: AA BA NA NA
> unclass(f3)
[1] NA 2 4 3
attr(,"levels")
[1] "AA" "BA" "NA" NA
Here we have a level that is missing values, and we also have a missing value
in the core data.
To summarize, there are two ways that missing values can enter a factor:
• Missing means we don’t know what category the item falls into.
• Missing is the category of items that (originally) had missing values.
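The two kinds of missingness can be told apart programmatically. In the sketch below, is.na on the factor looks only at the core data, while is.na on the character version also catches items that fall in the NA level. This is one way to check, not necessarily the only one.

```r
f1 <- factor(c("AA", "BA", NA, "NA"))
f2 <- factor(c("AA", "BA", NA, "NA"), exclude = NULL)
f3 <- f2
is.na(f3)[1] <- TRUE

# Missing in the core data (unknown category)
is.na(f1)
is.na(f2)
is.na(f3)

# Missing in either sense: unknown category, or member of the NA level
is.na(as.character(f2))
is.na(as.character(f3))
```

For f2 the first test finds nothing and the second finds the third element; for f3 the second test finds both the first and third elements.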
8.2.12 data frame to character
> xdf3 <- data.frame(a=3:2, b=c('x', 'y'))
> as.character(xdf3[1,])
[1] "3" "1"
This is a hidden version of coercing a factor to character. One approach to get
the correct behavior is to use as.matrix:
> as.character(as.matrix(xdf3[1,]))
[1] "3" "x"
I’m not sure if it is less upsetting or more upsetting if you try coercing more
than one row of a data frame to character:
> as.character(xdf3)
[1] "c(3, 2)" "c(1, 2)"
If the columns of the data frame include factors or characters, then converting
to a matrix will automatically give you a character matrix:
> as.matrix(xdf3)
     a   b
[1,] "3" "x"
[2,] "2" "y"
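The factor column in xdf3 arises because data.frame converted strings to factors by default at the time; since R 4.0.0 the default is stringsAsFactors=FALSE, so the sketch below requests factors explicitly. It then shows an alternative to as.matrix: converting each column of the row separately, so the factor method of as.character is used. The name row1 is invented for illustration.

```r
# Request factors explicitly (the default changed in R 4.0.0)
xdf3 <- data.frame(a = 3:2, b = c("x", "y"), stringsAsFactors = TRUE)

# Convert column by column; each factor is coerced via its labels
row1 <- vapply(xdf3[1, ], as.character, character(1))
row1
```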
1 The author would be intrigued to hear of an application where this makes sense: an item
for which it is unknown if it is missing or not.
8.2.13 nonexistent value in subscript
When a subscript contains values that are not present in the object, the results
vary depending on the context:
> c(b=1)[c('a', 'b')]
<NA>    b
  NA    1
> list(b=1)[c('a', 'b')]
$<NA>
NULL
$b
[1] 1
> matrix(1:2, 2, 1, dimnames=list(NULL, 'b'))[,c('a', 'b')]
Error: subscript out of bounds
> matrix(1:2, 1, 2, dimnames=list('b', NULL))[c('a', 'b'),]
Error: subscript out of bounds
> data.frame(b=1:2)[, c('a', 'b')]
Error in "[.data.frame"(data.frame(b = 1:2), , c("a", "b")) :
undefined columns selected
> data.frame(V1=1, V2=2, row.names='b')[c('a', 'b'),]
   V1 V2
NA NA NA
b   1  2
Some people wonder why the names of the extraneous items show up as NA and
not as "a". An answer is that then there would be no indication that "a" was
not a name in the object.
The examples here are for character subscripts; similar behavior holds for
numeric and logical subscripts.
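Since the outcome differs by object type, one defensive habit is to check the requested names against the object before subscripting. The following is a minimal sketch for a named vector; the names x, wanted and missing_names are made up for this example.

```r
x <- c(b = 1)
wanted <- c("a", "b")

# Names that were asked for but are not present in the object
missing_names <- setdiff(wanted, names(x))
missing_names

# Subscript only with names that exist
x[intersect(wanted, names(x))]
```

Whether to stop, warn or silently skip when missing_names is non-empty depends on the application.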
8.2.14 missing value in subscript
Here are two vectors that we will use:
> a <- c(rep(1:4, 3), NA, NA)
> b <- rep(1:2, 7)
> b[11:12] <- NA
> a
[1] 1 2 3 4 1 2 3 4 1 2 3 4 NA NA
> b
[1] 1 2 1 2 1 2 1 2 1 2 NA NA 1 2
We now want to create anew so that it is like a except it has 101 in the elements
where a is less than 2 or greater than 3, and b equals 1.
> anew <- a
> anew[(a < 2 | a > 3) & b == 1] <- 101
> anew
[1] 101 2 3 4 101 2 3 4 101 2 3 4 NA NA
There were three values changed in anew; let’s try again but give different values
to those three:
> anew2 <- a
> anew2[(a < 2 | a > 3) & b == 1] <- 101:103
Error: NAs are not allowed in subscripted assignments
Now we get an error. Since the value being assigned into the vector has length
greater than 1, the assignment with missing values in the subscripts is ambiguous.
R wisely refuses to do it (frustrating as it may be). There is a simple
solution, however:
> anew2[which((a < 2 | a > 3) & b == 1)] <- 101:103
> anew2
[1] 101 2 3 4 102 2 3 4 103 2 3 4 NA NA
The which function effectively treats NA as FALSE.
But we still have a problem in both anew and anew2. The 12th element of a
is 4 (and hence greater than 3) while the 12th element of b is NA. So we don’t
know if the 12th element of anew should be changed or not. The 12th element
of anew should be NA:
> anew[is.na(b) & (a < 2 | a > 3)] <- NA
> anew
[1] 101 2 3 4 101 2 3 4 101 2 3 NA NA NA
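It can help to look at the logical condition on its own before using it as a subscript. The sketch below stores it in a variable (cond, a name invented here) and separates the positions that are definitely TRUE from those that are unknown. It relies on the fact that FALSE & NA is FALSE while TRUE & NA is NA.

```r
a <- c(rep(1:4, 3), NA, NA)
b <- rep(1:2, 7)
b[11:12] <- NA

cond <- (a < 2 | a > 3) & b == 1

which(cond)         # positions where the condition is known to hold
which(is.na(cond))  # positions where the condition is unknown

anew <- a
anew[which(cond)] <- 101
anew[is.na(cond)] <- NA   # mark the unknown cases as missing
anew
```

The condition is also NA at position 13, where a itself is missing, so setting that element to NA changes nothing.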
8.2.15 all missing subscripts
> letters[c(2,3)]
[1] "b" "c"
> letters[c(2,NA)]
[1] "b" NA
> letters[c(NA,NA)]
[1] NA NA NA NA NA NA NA NA NA NA NA NA
[13] NA NA NA NA NA NA NA NA NA NA NA NA
[25] NA NA
What is happening here is that by default NA is logical (that is the most specific
mode), so the last command is subscripting with logical values
instead of numbers. Logical subscripts are automatically replicated to be the
length of the object.
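If the subscripts are meant to be positions, one way to avoid the surprise is to make sure they are numeric even when every value is missing. A brief sketch, using the typed constant NA_integer_ and as.integer (the names idx, logical_sub and integer_sub are made up for this example):

```r
typeof(NA)          # "logical"
typeof(c(2, NA))    # "double": NA is coerced alongside the number

idx <- c(NA, NA)    # logical, so it is recycled over the whole vector
logical_sub <- letters[idx]
integer_sub <- letters[as.integer(idx)]

length(logical_sub)
length(integer_sub)
letters[c(NA_integer_, NA_integer_)]
```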