Combine CSV files in RStudio in 5 Easy Steps

Combine CSV files in RStudio in 5 Easy Steps

Unfortunately, with the latest update to RStudio, RWordPress has not yet been updated. In the meantime, manually posting RStudio code on how to easily combine two files in R in a few easy steps:

  • Step 1:  Read in the first file, update the data types for several columns
  • Change NA’s to 0
  • Delete extra columns
  • Step 2:  Read in the second file
  • Step 3:  Merge the two files
  • Delete extra columns with some reformatting
  • Step 4:  Add a new column to calculate number of years between two dates
  • Step 5:  Export file to csv and/or Excel with the today’s date

Code from RStudio:

title: “R Notebook – combine files”
output: html_document:

“`{r}
pkgs <- c(‘tidyverse’,  ‘dplyr’, ‘caret’, ‘psych’ ,’readxl’, ‘scales’, ‘psych’, ‘dplyr’, ‘plyr’, ‘tidyverse’)
invisible(lapply(pkgs, require, character.only = T))

Step 1:  Read in csv file

df <- read.csv(“D:/expiringnames/promo_analysis/trx_to_process/fashion_2019/fashion_trx_2019.csv”, header=TRUE, stringsAsFactors=FALSE)
view(df)

#df$Transaction <- as.Date(df$Transaction, “%Y/%m/%d %H%M”)
df <- df %>% mutate_at(vars(OrigAmt, NetAmt, Discounts, RevShare, RevShare.1, SubTotal), as.numeric)
df <- df %>% mutate_at(vars(Effective, Expiration, Billable, Transaction, InvoiceDate), as.Date, format=”%Y-%m-%d”)
tidyr::replace_na(df, list(OrigAmt=0, NetAmt=0))

#change NAs to 0
df$OrigAmt[is.na(df$OrigAmt)] <- 0
df$Discounts[is.na(df$Discounts)] <- 0
df$SubTotal[is.na(df$SubTotal)] <- 0
df$RevShare[is.na(df$RevShare)] <- 0
df$RevShare.1[is.na(df$RevShare.1)] <- 0

str(df)
#combine two columns into one
df$RevShare <- df$RevShare + df$RevShare.1

#remove extra columns
df <- subset(df, select = -c(REGREB, RWLREB, RWAREB, CVC, RevShare.1))
tail(df[,1] ) #show tail in 1st column

combine csv files in R

File to combine csv file

Step 2: Read in the file to append

df2 <- read.csv(“D:/names/analysis/trx_2019.csv”, header=TRUE, stringsAsFactors=FALSE)
view(df2)

#remove extra columns
df2 <- subset(df2, select = -c(orig_amt, discounts, NetAmt))

#check type of strings and change types as necessary
str(df2[3:25])

Step 3: Merge the two datasets – horizontally/columns

df3 <- cbind(df, df2)
str(df3)

#eliminate columns not needed
df4 <- subset(df3, select = -c(Invoice,  Billable, InvoiceDate, possible_sp_error, WordsPresent, WordsNotPresent, Status))
view(df4)

#reformat percentage values
library (‘scales’)
scales::percent(df4$Percentage/100)
dim(df4)

–> result [1] 8612 35  (reflecting the addition of the columns as originally started w/ 25)

Step 4: Calculate years

#To determine the length of the service period – subtract two dates & put into years

df4$term <- as.numeric(difftime(df4$Expiration, df4$Effective), unit=”weeks”)/52.25
df4$term

library (lubridate, plyr)
int <- interval(df4$Effective, df4$Expiration)
time_length (int, “year”) #exact
df4$term <- trunc(time_length(int, “year”)) #truncated version to remove decimal places due to small portions i.e. leap years
df4$term

calc years in R

Calc years in R

Step 5: Export file with current date

date <- Sys.Date()

#export as .csv
csvfilename <- paste(“D:/df4_file_”, date,”.csv”,sep=””)
write.csv(df4, file=csvfilename, row.names=FALSE) #row.names being false avoids extra column ref

#export as a excel
library (openxlsx)
xlfilename <- paste(“D:/df4_file_”, date,”.xlsx”)
write.xlsx (df4, file=xlfilename, row.names=FALSE)

 

Want to combine csv files in Python instead?  See 

date <- Sys.Date()

#export as .csv
csvfilename <- paste(“D:/df4_file_”, date,”.csv”,sep=””)
write.csv(df4, file=csvfilename, row.names=FALSE) #row.names being false avoids extra column ref

#export as a excel
library (openxlsx)
xlfilename <- paste(“D:/df4_file_”, date,”.xlsx”)
write.xlsx (df4, file=xlfilename, row.names=FALSE)

 

Want to combine csv files in Python instead?  See 

Leave a Reply

Your email address will not be published. Required fields are marked *