By Milind Paradkar

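Every programmer inevitably writes a lot of code in day-to-day work. However, not all programmers make a habit of writing clean code that others can easily understand. One reason may be a lack of awareness of best practices for writing a program, which is especially true for novice programmers. In this post, we list R programming best practices that lead to improved code readability, consistency, and repeatability. Read on!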

 

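Best practices of writing in R

1) Describe your code

When you start coding, describe what the R code does in the very first line. For subsequent blocks of code, follow the same method of describing each block. This makes it easier for other people to understand and use the code.

Example:

# This code captures the 52-week high effect in stocks
# Code developed by OOO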

2) Load packages

After describing your code in the first line, use the library function to list and load all the relevant R packages needed to execute your code.

Example:

library(quantmod);  library(zoo); library(xts);
library(PerformanceAnalytics); library(timeSeries); library(lubridate);

3) Use updated packages

While coding, make sure that you are using the latest updated R packages. To check the version of any R package, you can use the packageVersion function.

Example:

packageVersion("TTR")
[1] ‘0.23.1’
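
The value returned by packageVersion can also be compared against a minimum version, which is a convenient check to place at the top of a script. The following is a minimal sketch; the "stats" package is used only because it ships with every R installation, and the "3.0.0" threshold is an arbitrary assumption chosen for illustration.

```r
# "stats" ships with base R, so this check runs on any installation
installed = packageVersion("stats")

# Version objects can be compared directly against a version string.
# The "3.0.0" threshold is a hypothetical minimum, not a real requirement.
if (installed >= "3.0.0") {
   message("stats ", as.character(installed), " is recent enough")
} else {
   warning("stats ", as.character(installed), " is older than 3.0.0")
}

# Versions are compared component by component, not as plain text,
# so 0.10.0 is correctly treated as newer than 0.9.0
is_newer = package_version("0.10.0") > package_version("0.9.0")
```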

4) Organize all source files in the same folder

Save all the files that your code needs in the same folder. You can then access each of them through its relative path.

Example:

# Reading file using relative path
df = read.csv(file = "NIFTY.csv", header = TRUE)

# Reading file using full path
df =  read.csv(file = "C:/Users/Documents/NIFTY.csv", header = TRUE)
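
The difference between the two approaches can be tried out without touching real data. The sketch below is only an illustration: the folder name demo_project and the two price rows are made up, and the folder is created under R's temporary directory. It reads the same file once by relative path and once by a full path built with file.path.

```r
# Create a throwaway project folder holding a small CSV (hypothetical data)
project_dir = file.path(tempdir(), "demo_project")
dir.create(project_dir, showWarnings = FALSE)
sample_data = data.frame(Date = c("2017-01-02", "2017-01-03"),
                         Close = c(8179.5, 8192.25))
write.csv(sample_data, file = file.path(project_dir, "NIFTY.csv"),
          row.names = FALSE)

# Make the project folder the working directory and read by relative path
old_wd = setwd(project_dir)
df = read.csv(file = "NIFTY.csv", header = TRUE)
setwd(old_wd)   # restore the previous working directory

# When a full path is needed, file.path() builds it portably
df_full = read.csv(file = file.path(project_dir, "NIFTY.csv"), header = TRUE)
```

Because the relative path depends only on the working directory, the whole folder can be moved to another machine without editing the script.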

5) Use a consistent style for data structure types – The R programming language allows for different data structures like vectors, factors, data frames, matrices, and lists. Use a consistent naming scheme for each type of data structure. This makes it easy to recognize similar data structures in the code and to spot problems quickly.

Example:
You can name all different data frames used in your code by adding .df as the suffix.

aapl.df = as.data.frame(read.csv(file = "AAPL.csv", header = TRUE))
amzn.df = as.data.frame(read.csv(file = "AMZN.csv", header = TRUE))
csco.df = as.data.frame(read.csv(file = "CSCO.csv", header = TRUE))

6) Indent your code – Indentation makes your code easier to read, especially when there are nested statements such as for-loops and if statements.

Example:

# Computing the Profit & Loss (PL) and the Equity
dt$PL = numeric(nrow(dt))
# The PL for row i is booked on row i + 1, so the loop stops one row early
for (i in seq_len(nrow(dt) - 1)) {
   if (dt$Signal[i] == 1) {
      dt$PL[i + 1] = dt$Close[i + 1] - dt$Close[i]
   }
   if (dt$Signal[i] == -1) {
      dt$PL[i + 1] = dt$Close[i] - dt$Close[i + 1]
   }
}

7) Remove temporary objects – For long scripts that run to thousands of lines, it is good practice to remove temporary objects once they have served their purpose. This helps ensure that R does not run into memory issues.
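
As a minimal sketch of this practice, the base functions rm and gc can be used as follows. The object names and the simulated data are hypothetical and serve only as an illustration.

```r
# Temporary object that is needed only to compute the final result
tmp.returns = rnorm(1e+05)
avg.return = mean(tmp.returns)

# Remove the temporary object once it has served its purpose
rm(tmp.returns)

# Optionally trigger garbage collection so the freed memory is reclaimed now
invisible(gc())

# The temporary object is gone, the result is still available
tmp.exists = exists("tmp.returns")
avg.exists = exists("avg.return")
```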

8) Time the code – You can time your code using the system.time function. You can also use the same function to find out the time taken by different blocks of code. The function returns the time taken, in seconds, to evaluate an expression or a block of code. Timing your code helps you find bottlenecks and speed up the script by making the necessary changes.

To find the time taken by different blocks, wrap each of them in curly braces within the call to the system.time function.

The two important metrics returned by the function are:
i) User time – time charged to the CPU(s) for the code
ii) Elapsed time – the total time that elapsed while the code was executing

Example:

# Generating random numbers
system.time({
   mean_1 = rnorm(1e+06, mean = 0, sd = 0.8)
})

user    system    elapsed
0.40      0.00       0.45
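
To compare blocks, the result of system.time can be stored and its metrics extracted by name. The sketch below assumes two arbitrary blocks (generating and sorting random numbers) chosen only for illustration. Although the printed header reads user, system and elapsed, the underlying element names are user.self, sys.self and elapsed.

```r
# Time two blocks separately and keep the results (block contents are arbitrary)
time_generate = system.time({
   x = rnorm(1e+06, mean = 0, sd = 0.8)
})
time_sort = system.time({
   x_sorted = sort(x)
})

# Each result is a named numeric vector, so single metrics can be pulled out
user_generate = time_generate[["user.self"]]

# Elapsed seconds for both blocks, side by side
elapsed = c(generate = time_generate[["elapsed"]],
            sort = time_sort[["elapsed"]])
print(elapsed)
```

The actual numbers depend on the machine, so they should be read relative to each other rather than as absolute values.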

9) Use vectorization – Vectorized code operates on whole vectors at once instead of looping over individual elements, which results in faster execution, especially with large data sets. Functions such as ifelse, or the with function combined with vectorized arithmetic, can be used for this.

Example:
Consider the NIFTY 1-year price series. Let us find the gap opening for each day using both methods (the for-loop and the with function) and time them using the system.time function. Note the time taken to execute the for-loop versus the time taken by the with function in combination with the lagpad function.

library(quantmod)

# Using FOR Loop
system.time({
   df = read.csv("NIFTY.csv")
   df = df[, c(1, 3:6)]
   df$GapOpen = double(nrow(df))
   for (i in 2:nrow(df)) {
      df$GapOpen[i] = round(Delt(df$CLOSE[i-1], df$OPEN[i]) * 100, 2)
   }
   print(head(df))
})

# Using with function + lagpad, instead of FOR Loop
system.time({
   df = read.csv("NIFTY.csv")
   df = df[, c(1, 3:6)]

   # Shift x down by k positions, padding the start with NA
   lagpad = function(x, k) {
      c(rep(NA, k), x)[1 : length(x)]
   }

   df$PrevClose = lagpad(df$CLOSE, 1)
   # Columns can be referenced directly by name inside with()
   df$GapOpen_ = with(df, round(Delt(PrevClose, OPEN) * 100, 2))
   print(head(df))
})
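
For readers without the NIFTY.csv file, the same comparison can be sketched on simulated data using base R only. The prices below are random numbers invented for illustration, the percentage change is computed by hand rather than with Delt, and the first day is left as NA in both versions because it has no previous close.

```r
set.seed(1)
n = 10000
# Simulated open and close prices (hypothetical data, not NIFTY)
prices = data.frame(OPEN = runif(n, 90, 110), CLOSE = runif(n, 90, 110))

# Shift x down by k positions, padding the start with NA
lagpad = function(x, k) {
   c(rep(NA, k), x)[1 : length(x)]
}

# Loop version: one element at a time
time_loop = system.time({
   gap_loop = rep(NA_real_, n)
   for (i in 2:n) {
      prev = prices$CLOSE[i - 1]
      gap_loop[i] = round((prices$OPEN[i] - prev) / prev * 100, 2)
   }
})

# Vectorized version: whole columns at once
time_vec = system.time({
   prev_close = lagpad(prices$CLOSE, 1)
   gap_vec = with(prices, round((OPEN - prev_close) / prev_close * 100, 2))
})

# ifelse applies a condition to every element without an explicit loop
direction = ifelse(is.na(gap_vec), "none",
                   ifelse(gap_vec > 0, "gap up", "gap down or flat"))

# Both approaches should produce the same numbers
same_result = isTRUE(all.equal(gap_loop, gap_vec))
```

Checking that the two versions agree before comparing their timings guards against a faster but subtly different calculation.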

10) Folding code – Code folding lets the R programmer collapse individual lines or whole sections of code. This allows hiding blocks of code whenever required, and makes it easier to navigate through lengthy scripts. Code folding can be done in two ways:
i) Automatic folding of codes
ii) User-defined folding of codes

Automatic folding of codes: RStudio makes certain regions foldable automatically. When a coder writes a function or a conditional block, RStudio automatically creates a foldable region for it.

User-defined folding of codes:
One can also fold an arbitrary group of lines by selecting them and using Edit -> Folding -> Collapse, or by pressing the Alt+L key.

User-defined folding can also be done via Code Sections:
To insert a new code section you can use the Code -> Insert Section command. Alternatively, any comment line which includes at least four trailing dashes (-), equal signs (=) or pound signs (#) automatically creates a code section.
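
The comment-based form of code sections might look as follows in a script. The section titles and the small calculation are hypothetical; only the trailing dashes, equal signs or pound signs matter for RStudio to treat each comment as a section header.

```r
# Load data ----
prices = c(100, 102, 101)

# Compute returns ====
returns = diff(prices) / head(prices, -1)

# Report results ####
print(round(returns, 4))
```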

11) Review and test your code rigorously – Once your code is ready, test it rigorously on different input parameters. Check that the logic used in statements like the for-loop, if statement, and ifelse statement is correct. It is also a good idea to have a colleague review your code to ensure that the work is of high quality.
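
One lightweight way to test on different inputs is to write down the expected results and let stopifnot halt the script when any of them fails. The sketch below applies this to the lagpad helper from the vectorization example; the chosen test inputs are assumptions made for illustration, not an exhaustive test suite.

```r
# Helper under test: shift x down by k positions, padding the start with NA
lagpad = function(x, k) {
   c(rep(NA, k), x)[1 : length(x)]
}

# Typical case: lag by one position
stopifnot(identical(lagpad(c(1, 2, 3), 1), c(NA, 1, 2)))

# No lag: the input comes back unchanged
stopifnot(identical(lagpad(c(1, 2, 3), 0), c(1, 2, 3)))

# The output always has the same length as the input
stopifnot(length(lagpad(1:10, 3)) == 10)

# A lag longer than the series leaves only missing values
stopifnot(all(is.na(lagpad(c(1, 2), 5))))

message("All lagpad checks passed")
```

If any condition is false, stopifnot raises an error that names the failing expression, which makes the broken case easy to locate.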

12) Don’t save your workspace – When you exit R, it asks whether you want to save your workspace. It is advisable not to save it, and to start your next R session in a clean workspace. Objects left over from previous R sessions can lead to errors that are hard to debug.

These were some of the best practices of writing in R that you can follow to make your code easy to read and debug, and to ensure consistency.



Source: R Best Practices: R you writing the R way! | R-bloggers