Five tips to improve your R code, from using seq() to create sequences to ditching which(), and much more!

@drsimonj here with five simple tricks I find myself sharing all the time with fellow R users to improve their code!

1. More fun to sequence from 1

Next time you use the colon operator to create a sequence from 1, like 1:n, try seq().

The colon operator can produce unexpected results that can create all sorts of problems without you noticing! Take a look at what happens when you want to sequence the length of an empty vector:
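Here's a minimal sketch of the empty-vector case in base R. The variable name x is just for illustration, and seq_along() is included as a base R alternative that the tip itself doesn't mention:

```r
x <- c()        # an empty vector (this is NULL)

1:length(x)     # the colon operator counts DOWN from 1 to 0
#> [1] 1 0

seq(x)          # an empty sequence, so a loop over it never runs
#> integer(0)

seq_along(x)    # base R helper that sequences along any object
#> integer(0)
```

A loop written as for (i in 1:length(x)) would run twice on this empty vector, while one written with seq(x) or seq_along(x) would not run at all.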

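You’ll also notice that this saves you from using functions like length(). When applied to a vector, seq() creates a sequence from 1 to the length of the object. One caveat: given a single number n, seq(n) returns the sequence from 1 to n rather than a sequence of length one, so seq_along() is the safer choice if your object might hold just one number.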

2.  vector() what you  c()

Next time you create an empty vector with  c(), try to replace it with  vector("type", length).

Doing this improves memory usage and increases speed! You often know upfront what type of values will go into a vector, and how long the vector will be. Growing a vector with c() makes R work out both of these things as it goes, building a new, longer vector every time. So give it a boost with vector()!

A good example of where this pays off is a for loop. People often write loops by declaring an empty vector and growing it with c() like this:
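A small sketch of the growing pattern, assuming an illustrative loop that stores squares (n and the squaring are my own example, not taken from the post):

```r
n <- 5
x <- c()                  # start with an empty vector
for (i in seq_len(n)) {
  x <- c(x, i^2)          # builds a new, longer vector on every pass
}
x
#> [1]  1  4  9 16 25
```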

Instead, pre-define the type and length with  vector(), and reference positions by index, like this:
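The same illustrative loop, sketched with the type and length declared upfront (again, n and the squaring are assumptions made for the example):

```r
n <- 5
x <- vector("numeric", n) # numeric vector of length n, filled with zeros
for (i in seq_len(n)) {
  x[i] <- i^2             # fill position i of the existing vector
}
x
#> [1]  1  4  9 16 25
```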

Here’s a quick speed comparison:
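One way to run such a comparison yourself is with base R's system.time(). The helper names grow() and prealloc() are hypothetical, and the timings depend on your machine, so no numbers are shown here:

```r
n <- 1e4

# Hypothetical helper: grow the vector with c() on every iteration
grow <- function(n) {
  x <- c()
  for (i in seq_len(n)) x <- c(x, i)
  x
}

# Hypothetical helper: pre-define type and length, then fill by index
prealloc <- function(n) {
  x <- vector("integer", n)
  for (i in seq_len(n)) x[i] <- i
  x
}

system.time(grow(n))      # growing with c()
system.time(prealloc(n))  # pre-allocated with vector()
```

Both helpers return the same vector, so any difference in elapsed time comes from how the vector is built. Try increasing n to see the gap widen.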

That should be convincing enough!

3. Ditch the  which()

Next time you use  which(), try to ditch it! People often use  which() to get indices from some boolean condition, and then select values at those indices. This is not necessary.

Getting vector elements greater than 5:
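A quick sketch with a small illustrative vector (x is an assumption made for the example):

```r
x <- 3:7

x[which(x > 5)]   # with which(): works, but takes an extra step
#> [1] 6 7

x[x > 5]          # without which(): index with the condition directly
#> [1] 6 7
```

One difference worth knowing: if x contains NA, x[x > 5] keeps an NA in the result, while x[which(x > 5)] drops it.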

Or counting the number of values greater than 5:
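Sketched with the same kind of illustrative vector (x is an assumption made for the example):

```r
x <- 3:7

length(which(x > 5))   # with which()
#> [1] 2

sum(x > 5)             # without which(): each TRUE counts as 1
#> [1] 2
```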

Why should you ditch  which()? It’s often unnecessary and boolean vectors are all you need.

For example, R lets you select elements flagged as  TRUE in a boolean vector:
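A minimal sketch, assuming an illustrative vector x and a boolean vector named condition:

```r
x <- 3:7
condition <- x > 5     # one TRUE/FALSE flag per element of x

condition
#> [1] FALSE FALSE FALSE  TRUE  TRUE

x[condition]           # keeps only the elements flagged TRUE
#> [1] 6 7
```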

Also, when combined with  sum() or  mean(), boolean vectors can be used to get the count or proportion of values meeting a condition:
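For instance, with an illustrative condition where two of five values pass (x and condition are assumptions made for the example):

```r
x <- 3:7
condition <- x > 5

sum(condition)    # count of values meeting the condition
#> [1] 2

mean(condition)   # proportion of values meeting the condition
#> [1] 0.4
```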

which() tells you the indices of TRUE values:
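With the same illustrative setup (x and condition are assumptions made for the example):

```r
x <- 3:7
condition <- x > 5

which(condition)  # positions of the TRUE values, not the values themselves
#> [1] 4 5
```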

And while the results are not wrong, it’s just not necessary. For example, I often see people combining  which() and  length() to test whether any or all values are TRUE. Instead, you just need  any() or  all():
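A sketch of both styles side by side, using an illustrative condition (x and condition are assumptions made for the example):

```r
x <- 3:7
condition <- x > 5

# Are any values TRUE?
length(which(condition)) > 0                    # the long way
#> [1] TRUE
any(condition)                                  # the direct way
#> [1] TRUE

# Are all values TRUE?
length(which(condition)) == length(condition)   # the long way
#> [1] FALSE
all(condition)                                  # the direct way
#> [1] FALSE
```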

Oh, and it saves you a little time…
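If you want to check this on your own machine, here's a rough sketch using system.time(). The vector size and the 0.5 threshold are arbitrary choices, and timings vary with hardware, R version and how many values pass the condition, so treat your results as indicative only:

```r
set.seed(1)
x <- runif(1e6)   # one million random values between 0 and 1

# Repeat each approach a few times so the timing is measurable
system.time(for (i in 1:20) x[which(x > 0.5)])  # with which()
system.time(for (i in 1:20) x[x > 0.5])         # without which()
```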

4.  factor that factor!

Ever removed values from a factor and found you’re stuck with old levels that don’t exist anymore? I see all sorts of creative ways to deal with this. The simplest solution is often just to wrap it in  factor() again.

This example creates a factor with four levels ("a", "b", "c" and "d"):
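A minimal sketch of such a factor (the name x is just for illustration):

```r
x <- factor(c("a", "b", "c", "d"))
x
#> [1] a b c d
#> Levels: a b c d
```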


If you drop all cases of one level ( "d"), the level is still recorded in the factor:
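Sketched with an illustrative factor x:

```r
x <- factor(c("a", "b", "c", "d"))

x <- x[x != "d"]   # drop every case of level "d"
x
#> [1] a b c
#> Levels: a b c d
```

The values are gone, but "d" still shows up in the levels, so it will also appear (with a count of zero) in things like table(x).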

A super simple method for removing it is to use  factor() again:
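Continuing the illustrative example (x is again an assumption made for the sketch):

```r
x <- factor(c("a", "b", "c", "d"))
x <- x[x != "d"]   # "d" is dropped but its level lingers

x <- factor(x)     # re-create the factor from the remaining values
x
#> [1] a b c
#> Levels: a b c
```

Base R's droplevels(x) does the same job, if you prefer a function whose name says what it does.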

This is typically a good solution to a problem that gets a lot of people mad. So save yourself a headache and  factor that factor!

5. First you get the  $, then you get the power

Next time you want to extract values from a  data.frame column where the rows meet a condition, specify the column with  $ before the rows with  [.

Say you want the horsepower ( hp) for cars with 4 cylinders ( cyl), using the  mtcars data set. You can write either of these:
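Here are the two forms side by side, using the built-in mtcars data set (the names rows_first and column_first are just for illustration):

```r
# Rows first, column second
rows_first <- mtcars[mtcars$cyl == 4, ]$hp

# Column first, rows second
column_first <- mtcars$hp[mtcars$cyl == 4]

identical(rows_first, column_first)   # both give the same values
#> [1] TRUE
```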

The tip here is to use the second approach: column first, rows second, as in mtcars$hp[mtcars$cyl == 4].

But why is that?

First reason: do away with that pesky comma! When you specify rows before the column, you need to remember the comma:  mtcars[mtcars$cyl == 4, ]$hp. When you specify column first, this means that you’re now referring to a vector, and don’t need the comma!

Second reason: speed! Let’s test it out on a larger data frame:
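A rough sketch of such a test with system.time(). The data frame d, its columns a and b, and the size n are all made up for the example, and timings vary by machine, so no numbers are shown here:

```r
set.seed(1)
n <- 1e6
d <- data.frame(a = seq_len(n), b = runif(n))   # illustrative larger data frame

# Rows first, column second: subsets the whole data frame, then takes a column
system.time(d[d$b > 0.5, ]$a)

# Column first, rows second: subsets a single vector
system.time(d$a[d$b > 0.5])
```

The rows-first form has to build a new data frame with every column before extracting one of them, which is where the extra time goes.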

Worth it, right?

Still, if you want to hone your skills as an R data frame ninja, I suggest learning  dplyr. You can get a good overview on the  dplyr website or really learn the ropes with online courses like DataCamp’s Data Manipulation in R with  dplyr.

Sign off

Thanks for reading and I hope this was useful for you.

For updates of recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.

Source: Five Tips to Improve Your R Code (article) – DataCamp