8.2. CHIMERAS
CIRCLE 8. BELIEVING IT DOES AS INTENDED
yy <- rnorm(12, xx)
sub4(sys.nframe())
}
> scope4()
(Intercept)          xx
  0.6303816   1.0930864
Another possibility is to change the environment of the formula, as scope5 does:
> sub5
function (data)
{
    form5 <- eval(substitute(yy ~ xx), envir=data)
    coef(lm(form5))
}
> scope5
function ()
{
    xx <- rnorm(12)
    yy <- rnorm(12, xx)
    sub5(sys.nframe())
}
> scope5()
(Intercept)          xx
  0.1889312   1.4208295
Some caution with solutions is warranted—not all modeling functions follow the
same scoping rules for their arguments.
8.2 Chimeras
“What brings you into such pungent sauce?”
There is no other type of object that creates as much trouble as factors.
Factors are an implementation of the idea of categorical data. (The name ’factor’
might cause trouble in itself—the term arrives to us via designed experiments.)
The core data of a factor is an integer vector. The class attribute is "factor", and there is a levels attribute that is a character vector that provides the identity of each category. You may be able to see trouble coming already: a numeric object that conceptually is not at all numeric.
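The pieces described above can be inspected directly. This is a minimal sketch; the factor f and its labels are made up for illustration.

```r
# A small illustrative factor (the labels are hypothetical).
f <- factor(c("low", "high", "low"))

typeof(f)      # "integer": the core data is an integer vector
class(f)       # "factor"
levels(f)      # "high" "low": by default the sorted unique values
as.integer(f)  # 2 1 2: each code is a position within levels(f)
unclass(f)     # the bare integer codes with the levels attribute attached
```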
But R tries to save you from yourself:
> is.numeric(factor(1:4))
[1] FALSE
Factors can be avoided in some settings by using character data instead. Sometimes this is a reasonable idea.
Figure 8.2: The treacherous to kin and the treacherous to country by Sandro
Botticelli.
8.2.1 numeric to factor to numeric
While factors do not in general refer to numbers, they may. In that case
there is even more room for confusion.
> as.numeric(factor(101:103))
[1] 1 2 3
If you were expecting:
[1] 101 102 103
shame on you.
If your factor represents numbers and you want to recover those numbers
from the factor, then you need a more circuitous route to get there:
as.numeric(as.character(factor(101:103)))
Slightly more efficient, but harder to remember, is:
as.numeric(levels(f))[f]
where f is the factor.
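A short sketch comparing the two routes on a factor whose values are not already in sorted order (the data are made up for illustration):

```r
# Illustrative factor holding numbers in unsorted order.
f <- factor(c(103, 101, 102, 101))

as.numeric(f)                            # 3 1 2 1: the integer codes, not the data

via_char   <- as.numeric(as.character(f))   # convert the labels element by element
via_levels <- as.numeric(levels(f))[f]      # convert each level once, then index by code

via_char                                 # 103 101 102 101
identical(via_char, via_levels)          # TRUE
```

The second form converts only the distinct levels, which is why it does less work when the factor is long and has few levels.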
8.2.2 cat factor
Using cat on any factor will just give the core data:
> cat(factor(letters[1:5]))
1 2 3 4 5>
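If the labels are what you want printed, one option (a sketch, not the only way) is to convert to character before calling cat:

```r
f <- factor(letters[1:5])

# Convert to character first so that cat sees the labels rather than the codes.
cat(as.character(f), "\n")   # prints the labels: a b c d e
```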
8.2.3 numeric to factor accidentally
When using read.table or its friends, it is all too common for a column of data that is meant to be numeric to be read as a factor. This happens if na.strings is not properly set, if there is a bogus entry in the column, and probably many other circumstances.
This is dynamite.
The data are thought to be numeric. They are in fact numeric (at least sort
of), but decidedly not with the numbers that are intended. Hence you can end
up with data that ’works’ but produces complete garbage.
When processing the data, the construct:
as.numeric(as.character(x))
guards you against this occurring. If x is already the correct numbers, then nothing happens except wasting a few microseconds. If x is accidentally a factor, then it becomes the correct numbers (at least mostly: depending on why it became a factor there may be some erroneously missing values).
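The following sketch reproduces the accident on a tiny inline data set. The data, the column names and the "n/a" marker are all hypothetical. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the example sets it to TRUE explicitly to get the behavior described here; with the newer default the column would arrive as character instead.

```r
# Hypothetical data: "n/a" is the bogus entry in an otherwise numeric column.
txt <- "id value
1 101
2 n/a
3 103"

dat <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
is.factor(dat$value)     # TRUE: the whole column became a factor
as.numeric(dat$value)    # integer codes, not the intended numbers

# The guard from the text; the bogus entry becomes NA (with a warning, silenced here).
clean <- suppressWarnings(as.numeric(as.character(dat$value)))
clean                    # 101 NA 103

# Declaring the missing-value marker avoids the problem at the source.
dat2 <- read.table(text = txt, header = TRUE, na.strings = "n/a")
is.numeric(dat2$value)   # TRUE
```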
8.2.4 dropping factor levels
> ff <- factor(c('AA', 'BA', 'CA'))
> ff
[1] AA BA CA
Levels: AA BA CA
> ff[1:2]
[1] AA BA
Levels: AA BA CA
Notice that there are still three levels even though only two appear in the vector.
It is in general a good thing that levels are not automatically dropped—the
factor then has the possible levels it can contain rather than merely the levels
it happens to contain.
There are times when you want levels dropped that do not appear. Here are
two ways of doing that:
> ff[1:2, drop=TRUE]
[1] AA BA
Levels: AA BA
> factor(ff[1:2])
[1] AA BA
Levels: AA BA
If f0 is a factor that already has levels that are not used that you want to drop, then you can just do:
f0 <- f0[drop=TRUE]
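Base R also has a droplevels function, not mentioned above, that does the same job and reads a little more plainly. A minimal sketch using the same ff as before:

```r
ff <- factor(c('AA', 'BA', 'CA'))
f0 <- ff[1:2]             # still carries the unused level 'CA'

levels(f0)                # "AA" "BA" "CA"
f0 <- droplevels(f0)      # remove the levels that do not occur
levels(f0)                # "AA" "BA"
```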
8.2.5 combining levels
Bizarre things have been known to happen from combining levels. A safe approach is to create a new factor object. Here we change from individual letters to a vowel-consonant classification:
> flet <- factor(letters[c(1:5, 1:2)])
> flet
[1] a b c d e a b
Levels: a b c d e
> ftrans <- c(a='vowel', b='consonant', c='consonant',
+     d='consonant', e='vowel')
> fcv <- factor(ftrans[as.character(flet)])
> fcv
[1] vowel     consonant consonant consonant vowel     vowel     consonant
Levels: consonant vowel
Probably more common is to combine some levels, but leave others alone:
> llet <- levels(flet)
> names(llet) <- llet
> llet
  a   b   c   d   e
"a" "b" "c" "d" "e"
> llet[c('a', 'b')] <- 'ab'
> llet
   a    b    c    d    e
"ab" "ab"  "c"  "d"  "e"
> fcom <- factor(llet[as.character(flet)])
> fcom
[1] ab ab c  d  e  ab ab
Levels: ab c d e
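Since recoding is easy to get wrong, it can be worth checking the result. One possible check, sketched here, is to cross-tabulate the old factor against the new one: every old level should land in exactly one new level.

```r
# Rebuild the objects from the example above so the sketch is self-contained.
flet <- factor(letters[c(1:5, 1:2)])
llet <- levels(flet)
names(llet) <- llet
llet[c('a', 'b')] <- 'ab'
fcom <- factor(llet[as.character(flet)])

# Rows are old levels, columns are new levels.
check <- table(old = flet, new = fcom)
check

# Each old level should map to exactly one new level.
all(rowSums(check > 0) == 1)   # TRUE
```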
8.2.6 do not subscript with factors
> x6 <- c(s=4, j=55, f=888)
> x6[c('s', 'f')]
  s   f
  4 888
> x6[factor(c('s', 'f'))]
 j  s
55  4
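What happens here is that the subscript uses the integer codes of the factor, not its labels. The levels are sorted, so 'f' gets code 1 and 's' gets code 2, and the subscript becomes positions 2 and 1. A sketch that makes the steps visible (idx is a name introduced for illustration):

```r
x6  <- c(s = 4, j = 55, f = 888)
idx <- factor(c('s', 'f'))

levels(idx)             # "f" "s": sorted, so 'f' is level 1 and 's' is level 2
as.integer(idx)         # 2 1: these codes are what the subscript actually uses
x6[as.integer(idx)]     # elements j and s, the same as x6[idx]

# Converting to character first subscripts by name, as intended.
x6[as.character(idx)]   # elements s and f
```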
8.2.7 no go for factors in ifelse
> ifelse(c(TRUE, FALSE, TRUE), factor(letters),
+     factor(LETTERS))
[1] 1 2 3
> ifelse(c(TRUE, FALSE, TRUE), factor(letters), LETTERS)
[1] "1" "B" "3"
(Recall that the length of the output of ifelse is always the length of the first argument. If you were expecting the first argument to be replicated, you shouldn't have.)
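One way around the problem, sketched under the assumption that the labels are what you want, is to hand ifelse character vectors and rebuild a factor afterwards if one is needed. The names test, lower and upper are introduced for illustration.

```r
test  <- c(TRUE, FALSE, TRUE)
lower <- factor(letters)
upper <- factor(LETTERS)

# Convert both branches to character so ifelse picks labels, not codes.
res <- ifelse(test, as.character(lower), as.character(upper))
res                 # "a" "B" "c"

# Rebuild a factor from the result if one is needed.
fres <- factor(res)
```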
8.2.8 no c for factors
c(myfac1, myfac2)
just gives you the combined vector of integer codes. Certainly a method for c could be written for factors, but note it is going to be complicated: the levels of the factors need not match. It would be horribly messy for very little gain.
This is a case in which R is not being overly helpful. Better is for you to do the
combination that makes sense for the specific case at hand.
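A sketch of one explicit combination that often makes sense: go through the labels and build a new factor. The two factors are made up for illustration. Note that newer R releases (4.1.0 onward) do ship a factor method for c, so the integer-code result described above applies to earlier versions; the explicit route below behaves the same either way.

```r
# Two illustrative factors whose levels only partly overlap.
myfac1 <- factor(c('a', 'b'))
myfac2 <- factor(c('b', 'c'))

# Combine the labels, then build a fresh factor from them.
combined <- factor(c(as.character(myfac1), as.character(myfac2)))
combined
levels(combined)    # "a" "b" "c"
```

If a particular level order matters, pass it through the levels argument of factor rather than relying on the default sorting.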
Another reason why there is not a c function for factors is that c is used in other contexts to simplify objects: