
Convert Year, Julian Day, and Time columns to POSIXct quickly and efficiently in base R

I have many large data frames (500,000+ rows each) whose datetime information is split across multiple columns. Instead of a single MM/DD/YYYY value, the year is in one column, the Julian day (day of the year) in the next, and the time of day (as HHMM) in a third. The data is structured like this:

```r
df <- data.frame(YEAR     = sample(2000:2020, 10000, replace = TRUE),
                 JULIEN   = sample(1:365, 10000, replace = TRUE),  # day of the year
                 Time     = sample(0:59, 10000, replace = TRUE),   # HHMM clock time stored as an integer
                 dataVar1 = runif(10000, 1.0, 10.0),
                 dataVar2 = runif(10000, 20.0, 100.0))
```
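To make the encoding concrete, here is how one hypothetical row would map to a timestamp, using the same origin trick as the loop in the question: counting `JULIEN` days from 31 December of the previous year, so day 1 is 1 January. It assumes `Time` is an HHMM value stored as an integer (1330 means 13:30), which is what the `"%H%M"` format in the question implies. The variable names `day`, `hhmm` and `stamp` are illustrative only.

```r
# Hypothetical example row: 1 February 2019 at 13:30
YEAR   <- 2019
JULIEN <- 32    # day of year: 31 days of January + 1
Time   <- 1330  # HHMM stored as a plain integer

# Counting JULIEN days from 31 December of the previous year lands on the right date
day <- as.Date(JULIEN, origin = paste0(YEAR - 1, "-12-31"))
day  # [1] "2019-02-01"

# Zero-pad HHMM to four digits so "%H%M" can split it into hours and minutes
hhmm  <- formatC(Time, width = 4, format = "d", flag = "0")
stamp <- as.POSIXct(paste(day, hhmm), format = "%Y-%m-%d %H%M", tz = "EST")
```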

So far I have been getting by with this:

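```r
timeR <- vector()
for (i in seq_len(nrow(df))) {
  # Day of year counted from 31 December of the previous year, then HHMM zero-padded to 4 digits
  currentTime <- paste(as.Date(df$JULIEN[i], origin = paste(df$YEAR[i] - 1, "-12-31", sep = "")),
                       formatC(df$Time[i], width = 4, format = "d", flag = "0"))
  # Growing the vector with c() copies it on every iteration
  timeR <- c(timeR, currentTime)
}
# Replace YEAR, JULIEN and Time with the combined timestamp column
df <- cbind(timeR, df[, !names(df) %in% c("YEAR", "JULIEN", "Time")])
df$timeR <- as.POSIXct(df$timeR, format = "%Y-%m-%d %H%M", tz = "EST")
rm(timeR, i, currentTime)
```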
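Much of the loop's cost probably comes from calling `as.Date()`, `formatC()` and `paste()` once per row, and from growing `timeR` with `c()`, which copies the whole vector on every iteration (quadratic total work). All of these functions already accept whole vectors, so a minimal sketch of the same conversion without a loop might look like this. It assumes the same columns, the same HHMM encoding of `Time` and the same `"EST"` time zone; `dates` and `hhmm` are illustrative names, and the data is simulated with a fixed seed.

```r
# Simulated data with the same layout as in the question
set.seed(1)
n  <- 10000
df <- data.frame(YEAR     = sample(2000:2020, n, replace = TRUE),
                 JULIEN   = sample(1:365, n, replace = TRUE),
                 Time     = sample(0:59, n, replace = TRUE),
                 dataVar1 = runif(n, 1.0, 10.0),
                 dataVar2 = runif(n, 20.0, 100.0))

# Day of year -> Date for all rows at once: 1 January of YEAR plus (JULIEN - 1) days
dates <- as.Date(paste0(df$YEAR, "-01-01")) + (df$JULIEN - 1L)

# Zero-pad HHMM to four characters, then parse every timestamp in a single call
hhmm  <- formatC(df$Time, width = 4, format = "d", flag = "0")
timeR <- as.POSIXct(paste(dates, hhmm), format = "%Y-%m-%d %H%M", tz = "EST")

# Replace the three source columns with the new timestamp column
df <- data.frame(timeR = timeR, df[, !names(df) %in% c("YEAR", "JULIEN", "Time")])
```

Building the new data frame with `data.frame(timeR = ...)` names the column explicitly instead of relying on how `cbind()` deparses its arguments.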

It works, but it takes a long time on data of this size. Any ideas on how I could make this run faster? Thank you.
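A further option, sketched here under the same assumptions (day of year in `JULIEN`, HHMM in `Time`), avoids parsing a date string for every row: parse midnight on 1 January once per distinct year (at most 21 here), look it up with `match()`, and add the day and time offsets in seconds. Adding a fixed 86,400 seconds per day is only exact because `"EST"` is a fixed UTC-5 zone with no daylight saving; with a zone such as `"America/New_York"` this arithmetic would be off by an hour for part of the year. The names `years`, `jan1` and `offset` are illustrative, and the data is again simulated.

```r
# Simulated data with the same layout as in the question
set.seed(1)
n  <- 10000
df <- data.frame(YEAR     = sample(2000:2020, n, replace = TRUE),
                 JULIEN   = sample(1:365, n, replace = TRUE),
                 Time     = sample(0:59, n, replace = TRUE),
                 dataVar1 = runif(n, 1.0, 10.0),
                 dataVar2 = runif(n, 20.0, 100.0))

tz <- "EST"  # fixed UTC-5 offset with no daylight saving, so every day is 86400 s

# Parse midnight on 1 January once per distinct year, not once per row
years <- sort(unique(df$YEAR))
jan1  <- as.POSIXct(paste0(years, "-01-01"), format = "%Y-%m-%d", tz = tz)

# Offsets in seconds: whole days before JULIEN, plus hours and minutes split out of HHMM
offset <- (df$JULIEN - 1) * 86400 + (df$Time %/% 100) * 3600 + (df$Time %% 100) * 60

# Look up each row's 1 January and shift it; POSIXct + numeric keeps the tz attribute
timeR <- jan1[match(df$YEAR, years)] + offset
df <- data.frame(timeR = timeR, df[, !names(df) %in% c("YEAR", "JULIEN", "Time")])
```

The string parsing now scales with the number of distinct years rather than the number of rows, and everything else is plain vector arithmetic.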
