5.1. ABSTRACTION
CIRCLE 5. NOT WRITING FUNCTIONS
usage.
2. an object with no attributes (except possibly names). This is the definition
implied by is.vector and as.vector.
3. an object that can have an arbitrary length (includes lists).
Clearly definitions 1 and 3 are contradictory, but which meaning is implied
should be clear from the context. When the discussion is of vectors as opposed
to matrices, it is definition 2 that is implied.
The word “list” has a technical meaning in R—this is an object of arbitrary
length that can have components of different types, including lists. Sometimes
the word is used in a non-technical sense, as in “search list” or “argument list”.
Not all functions are created equal. They can be conveniently put into three
types.
There are anonymous functions as in:
apply(x, 2, function(z) mean(z[z > 0]))
The function given as the third argument to apply is so transient that we don't
even give it a name.
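As a minimal sketch of that call in action, the matrix x below is made-up data (it is not from the text); the anonymous function is the one shown above.

```r
# Hypothetical data: a 3 x 2 matrix with some non-positive entries.
x <- matrix(c(1, -2, 3, 4, 0, 6), nrow = 3)

# For each column (MARGIN = 2), average only the strictly positive entries.
pos_means <- apply(x, 2, function(z) mean(z[z > 0]))
pos_means   # 2 for the first column, 5 for the second
```

The function exists only for the duration of the apply call, which is the point of leaving it unnamed.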
There are functions that are useful only for one particular project. These
are your one-off functions.
Finally there are functions that are persistently valuable. Some of these
could well be one-off functions that you have rewritten to be more abstract.
You will most likely want a file or package containing your persistently useful
functions.
In the example of an anonymous function we saw that a function can be an
argument to another function. In R, functions are objects just as vectors or
matrices are objects. You are allowed to think of functions as data.
A whole new level of abstraction is a function that returns a function. The
empirical cumulative distribution function is an example:
> mycumfun <- ecdf(rnorm(10))
> mycumfun(0)
[1] 0.4
Once you write a function that returns a function, you will be forever immune
to this Circle.
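Writing such a function yourself is not hard. The sketch below is an illustration only: make_scaler and standardize are hypothetical names, not functions from R or from the text.

```r
# A function that returns a function. The returned function remembers
# the values of center and scale from the call that created it.
make_scaler <- function(center, scale) {
    function(z) (z - center) / scale
}

# Each call to make_scaler produces a new, specialized function.
standardize <- make_scaler(center = 10, scale = 2)
scaled <- standardize(c(8, 10, 14))
scaled   # -1 0 2
```

The returned function can then be passed around like any other object, for instance as an argument to sapply.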
In Circle 2 we briefly met do.call. Some people are quite confused by do.call.
That is both unnecessary and unfortunate: it is actually quite simple and is
very powerful. Normally a function is called by following the name of the
function with an argument list:
sample(x=10, size=5)
The do.call function allows you to provide the arguments as an actual list:
do.call("sample", list(x=10, size=5))
Simple.
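The power comes from being able to build the list in code before the call is made. Here is a small sketch; the object names pieces, stacked, args and draw are invented for illustration.

```r
# The number of arguments need not be known when the code is written:
# bind together however many pieces the list happens to hold.
pieces <- list(1:3, 4:6, 7:9)
stacked <- do.call(rbind, pieces)   # same as rbind(1:3, 4:6, 7:9)

# Arguments can be assembled or modified programmatically before the call.
args <- list(x = 10, size = 5)
args$replace <- TRUE
draw <- do.call(sample, args)       # same as sample(x=10, size=5, replace=TRUE)
```

The first argument to do.call may be either the function itself or its name as a character string.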
At times it is useful to have an image of what happens when you call a
function. An environment is created by the function call, and an environment
is created for each function that is called by that function. So there is a stack
of environments that grows and shrinks as the computation proceeds.
Let’s define some functions:
ftop <- function(x)
{
    # time 1
    x1 <- f1(x)
    # time 5
    ans.top <- f2(x1)
    # time 9
    ans.top
}

f1 <- function(x)
{
    # time 2
    ans1 <- f1.1(x)
    # time 4
    ans1
}

f2 <- function(x)
{
    # time 6
    ans2 <- f2.1(x)
    # time 8
    ans2
}
And now let’s do a call:
# time 0
ftop(myx)
# time 10
Figure 5.1 shows how the stack of environments for this call changes through
time. Note that there is an x in the environments for ftop, f1 and f2. The x in
ftop is what we call myx (or possibly a copy of it) as is the x in f1. But the
x in f2 is something different: it is x1, the value returned by f1.
When we discuss debugging, we'll be looking at this stack at a specific point
in time. For instance, if an error occurred in f2.1, then we would be looking at
the state of the stack somewhere near time 7.
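The stack can be observed directly. The sketch below is a runnable variant of the functions above under some assumptions: the bodies of f1.1 and f2.1 are invented (the text does not define them), and record and depths are hypothetical helpers that use sys.nframe to note how deep each function sits.

```r
depths <- integer(0)

# Hypothetical helper: store the stack depth of the function that calls it.
# sys.nframe() counts record() itself, hence the subtraction.
record <- function(name) {
    depths[name] <<- sys.nframe() - 1L
}

# Invented bodies for f1.1 and f2.1, so that the example runs.
f1.1 <- function(x) { record("f1.1"); x + 1 }
f2.1 <- function(x) { record("f2.1"); x * 2 }
f1   <- function(x) { record("f1");   f1.1(x) }
f2   <- function(x) { record("f2");   f2.1(x) }
ftop <- function(x) { record("ftop"); x1 <- f1(x); f2(x1) }

myx <- 3
ans <- ftop(myx)

# Depth relative to ftop, so the result does not depend on how
# the script itself is being run.
rel <- depths - depths[["ftop"]] + 1L
rel   # ftop 1, f1 2, f1.1 3, f2 2, f2.1 3
```

The stack never gets deeper than three here because f1 has returned, and its environment has been discarded, before f2 is called.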
Figure 5.1: Stack of environments through time. [Figure not reproduced. Axes:
Time (0 to 10) and Environment (1 to 3); the environments shown are ftop, f1,
f2, f1.1 and f2.1.]
R is a language rich in objects. That is a part of its strength. Some of
those objects are elements of the language itself—calls, expressions and so on.
This allows a very powerful form of abstraction often called computing on the
language. While messing with language elements seems extraordinarily esoteric
to almost all new users, a lot of people moderate that view.
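A small taste may make the idea less esoteric. This is only a sketch of a few base R tools (quote, eval, substitute, deparse); show_expr is a hypothetical name.

```r
# quote() captures a call without evaluating it.
e <- quote(a + b * 2)
class(e)     # "call"

# The call is data: it can be evaluated later with values supplied.
first <- eval(e, list(a = 1, b = 3))     # 1 + 3 * 2 is 7

# It can also be modified like a list. Here the function being
# called is changed from + to -.
e[[1]] <- as.name("-")
second <- eval(e, list(a = 1, b = 3))    # 1 - 3 * 2 is -5

# substitute() lets a function see the expression it was given,
# which is how plot labels and the like are produced.
show_expr <- function(arg) deparse(substitute(arg))
label <- show_expr(x^2 + 1)              # the string "x^2 + 1"
```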
5.2 Simplicity
Make your functions as simple as possible. Simple has many advantages:
• Simple functions are likely to be human efficient: they will be easy to
  understand and to modify.
• Simple functions are likely to be computer efficient.
• Simple functions are less likely to be buggy, and bugs will be easier to fix.
• (Perhaps ironically) simple functions may be more general: thinking about
  the heart of the matter often broadens the application.
If your solution seems overly complex for the task, it probably is. There may
be simple problems for which R does not have a simple solution, but they are
rare.
Here are a few possibilities for simplifying:
• Don't use a list when an atomic vector will do.
• Don't use a data frame when a matrix will do.
• Don't try to use an atomic vector when a list is needed.
• Don't try to use a matrix when a data frame is needed.
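The choices in this list mostly come down to whether all the values share one type. A sketch with made-up data (heights, m, bad and df are illustrative names):

```r
heights <- c(1.62, 1.75, 1.80)      # one type: an atomic vector will do

# All numeric: a matrix will do.
m <- cbind(id = 1:3, height = heights)
is.numeric(m)                       # TRUE

# Mixed types forced into a matrix: everything becomes character,
# so the heights are no longer numbers.
bad <- cbind(name = c("a", "b", "c"), height = heights)
is.character(bad)                   # TRUE

# Mixed types: this is where a data frame is needed.
df <- data.frame(name = c("a", "b", "c"), height = heights)
mean(df$height)
```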
Properly formatting your functions when you write them should be standard
practice. Here “proper” includes indenting based on the logical structure, and
putting spaces between operators. Circle
shows that there is a particularly
good reason to put spaces around logical operators.
A semicolon can be used to mark the separation of two R commands that
are placed on the same line. Some people like to put semicolons at the end of
all lines. This highly annoys many seasoned R users. Such a reaction seems to
be more visceral than logical, but there is some logic to it:
• The superfluous semicolons create some (imperceptible) inefficiency.
• The superfluous semicolons give the false impression that they are doing
  something.
One reason to seek simplicity is speed. The Rprof function is a very convenient
means of exploring which functions are using the most time in your function
calls. (The name Rprof refers to time profiling.)
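A profiling session might look roughly like the following. This is a sketch under assumptions: slow_sum is a deliberately wasteful made-up function, summaryRprof (from the utils package) is used to read the results although the text does not mention it, and the example presumes an R build with profiling support.

```r
# Hypothetical function to profile: an explicit loop doing work
# that sum(sqrt(seq_len(n))) would do directly.
slow_sum <- function(n) {
    total <- 0
    for (i in seq_len(n)) total <- total + sqrt(i)
    total
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)                  # start writing timing samples to the file
for (k in 1:20) slow_sum(1e6)     # the code being examined
Rprof(NULL)                       # stop profiling

# Summarize the samples; by.self shows time spent in each function itself.
prof <- summaryRprof(prof_file)
head(prof$by.self)
```

Profiling works by sampling, so the code needs to run long enough for samples to be collected, and the timings will differ from run to run.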
5.3 Consistency
Consistency is good. Consistency reduces the work that your users need to
expend. Consistency reduces bugs.
One form of consistency is the order and names of function arguments.
Surprising your users is not a good idea, even if the universe of your users is
of size 1.
A rather nice piece of consistency is always giving the correct answer. In
order for that to happen the inputs need to be suitable. To ensure that, the
function needs to check inputs, and possibly intermediate results. The tools for
this job include if, stop and stopifnot.
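Here is a sketch of those tools at work. The function positive_mean is hypothetical, and the particular checks are only one reasonable choice.

```r
# Mean of the positive values in x, with the inputs checked first.
positive_mean <- function(x) {
    # if and stop: a message written for the user
    if (!is.numeric(x)) {
        stop("'x' must be numeric, not ", class(x)[1])
    }
    # stopifnot: terse checks with an automatically generated message
    stopifnot(length(x) > 0, !anyNA(x))

    pos <- x[x > 0]
    # an intermediate result can be checked as well
    if (length(pos) == 0) stop("'x' has no positive values")
    mean(pos)
}

positive_mean(c(-1, 2, 4))   # 3
```

Calling positive_mean("a") stops at once with a message that names the problem, rather than failing obscurely further on.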
Sometimes an occurrence is suspicious but not necessarily wrong. In this
case a warning is appropriate. A warning produces a message but does not
interrupt the computation.
There is a problem with warnings. No one reads them. People have to read
error messages because no food pellet falls into the tray after they push the
button. With a warning the machine merely beeps at them but they still get
their food pellet. Never mind that it might be poison.
The appropriate reaction to a warning message is:
1. Figure out what the warning is saying.
2. Figure out why the warning is triggered.
3. Figure out the effect on the results of the computation (via deduction or
experimentation).
4. Given the result of step 3, decide whether or not the results will be
erroneous.
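R can help with the first three steps. The sketch below is one possible approach, not a prescribed procedure; risky is a made-up function standing in for whatever code produced the warning.

```r
# Hypothetical computation that triggers a warning.
risky <- function() log(c(3, -1))

# Steps 1 and 2: capture the warning as an object and inspect it.
w <- tryCatch(risky(), warning = function(w) w)
conditionMessage(w)   # what the warning says
conditionCall(w)      # the call that triggered it

# Step 3: see the effect on the result by experimentation.
res <- suppressWarnings(risky())
which(is.nan(res))    # 2: only the second element is affected
```

Another option when hunting for the source is options(warn = 2), which turns warnings into errors so that the usual debugging tools apply.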
You want the number of warning messages to be minimal in order to increase
the probability that the messages that are there will be read. If you have a
complex function where a large number of suspicious situations is possible, you
might consider providing the ability to turn off some warning messages. Without
such a system the user may be expecting a number of warning messages and
hence miss messages that are unexpected and important.
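One simple way to provide that ability is an argument that controls a particular warning. This is a sketch; scaled_log and its warn_nonpositive argument are hypothetical.

```r
# Log of x, treating non-positive values as missing.
# The warning about them can be switched off by the caller.
scaled_log <- function(x, warn_nonpositive = TRUE) {
    bad <- x <= 0
    if (warn_nonpositive && any(bad)) {
        warning(sum(bad), " non-positive value(s) set to NA before taking logs")
    }
    x[bad] <- NA
    log(x)
}

# A user who already knows about the non-positive values silences
# that one warning; any other warning would still appear.
quiet <- scaled_log(c(3, -1), warn_nonpositive = FALSE)
```

Unlike wrapping the whole call in suppressWarnings, this silences only the warning that the user has already dealt with.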
The suppressWarnings function allows you to suppress warnings from specific
commands:
> log(c(3, -1))
[1] 1.098612 NaN
Warning message:
In log(c(3, -1)) : NaNs produced
> suppressWarnings(log(c(3, -1)))
[1] 1.098612 NaN
We want our functions to be correct. Not all functions are correct. The results
from specific calls can be put into 4 categories:
1. Correct.
2. An error occurs that is clearly identified.
3. An obscure error occurs.
4. An incorrect value is returned.
We like category 1. Category 2 is the right behavior if the inputs do not make
sense, but not if the inputs are sensible. Category 3 is an unpleasant place for
your users, and possibly for you if the users have access to you. Category 4 is
by far the worst place to be—the user has no reason to believe that anything is
wrong. Steer clear of category 4.
You should consistently write a help file for each of your persistent functions.
If you have a hard time explaining the inputs and/or outputs of the function,
then you should change the function. Writing a good help file is an excellent
way of debugging the function. The prompt function will produce a template
for your help file.
An example is worth a thousand words, so include examples in your help
files. Good examples are gold, but any example is much better than none. Using
data from the datasets package allows your users to run the examples easily.
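As a sketch of both points, the code below creates a help template with prompt and then runs the sort of example that could go into it. The function trimmed_range is made up for illustration, and the template is written to a temporary directory so as not to clutter the working directory.

```r
# Hypothetical persistent function that needs a help file.
trimmed_range <- function(x, trim = 0.05) {
    quantile(x, c(trim, 1 - trim), names = FALSE)
}

# prompt writes a skeleton help file in Rd format, to be filled in by hand.
rd_file <- file.path(tempdir(), "trimmed_range.Rd")
prompt(trimmed_range, filename = rd_file)
head(readLines(rd_file))

# A candidate for the examples section: it uses the faithful data from
# the datasets package, so any user can run it without extra files.
ex <- trimmed_range(datasets::faithful$eruptions)
```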