A programming language is like any other kind of language: use it often and you get good at it; stop using it for a while and you forget it. It has been a pain every time I face a new type of data to read for a different purpose, so I am going to collect as many scenarios as I can here and share them with other people. (Well, of course, right now basically no one knows about this website.) This will not be a comprehensive review of every argument of every function. Instead, it will be ready-to-use code for different situations.
Let’s begin!
- Use base R to read everything from a CSV file. Say we have a file called training.csv, and it has no header row.
read.csv("training.csv", header = FALSE) #the default header is TRUE for read.csv
Play with row.names and col.names. If row.names is a single number, that column of the file is used as the row names; if it is a vector, its values become the row names. col.names takes a vector with one name for each column in the file. If you use them together, as in the case below, you still need to give a name for the column that becomes the row names, but that name will not show up in the result.
read.csv("training.csv", header = FALSE, row.names = 1, col.names = c("c1","c2","c3", "c4", "c5", "c6"))
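To see what header, row.names and col.names do without needing the real training.csv, here is a minimal sketch that first writes a small stand-in file. The file contents and the names tmp, raw and named are made up for illustration.

```r
# Write a small header-less CSV to a temporary file (a stand-in for training.csv)
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,1,2", "b,3,4", "c,5,6"), tmp)

# header = FALSE: every line is data, and the columns are named V1, V2, V3
raw <- read.csv(tmp, header = FALSE)
print(raw)

# row.names = 1 turns the first column into row names.
# col.names still needs one name per column in the file (3 here);
# the name given to the row-name column ("id") does not appear in the result.
named <- read.csv(tmp, header = FALSE, row.names = 1,
                  col.names = c("id", "c1", "c2"))
print(named)
```

The second data frame has only the columns c1 and c2, with a, b and c as row names.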
- For Excel files (xls or xlsx), I usually use the readxl package. The main arguments of read_excel are:
read_excel(path, sheet = NULL, range = NULL, col_names = TRUE,
col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf,
guess_max = min(1000, n_max))
library(readxl)
read_excel("trainingxls.xls", col_names = FALSE)
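With col_names = FALSE the column names are generated automatically (for example "...1", "...2" in recent readxl versions; older versions used names starting with "X"). You can also specify the range to read: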
xx = read_excel("trainingxls.xls", col_names = FALSE, range = cell_rows(1:4)) #rows 1 to 4, all columns
xx = read_excel("trainingxls.xls", col_names = FALSE, range = "C1:E7") #rectangle in A1 notation
xx = read_excel("trainingxls.xls", col_names = FALSE, range = "R1C2:R2C5") #same idea in row/column notation, i.e. B1:E2
xx = read_excel("trainingxls.xls", col_names = FALSE, range = cell_cols("B:D")) #columns B to D, all rows
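read_xls and read_xlsx work the same way and take the same arguments. read_excel guesses the format from the file, so use read_xls or read_xlsx directly when you already know which one you have.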
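If you do not have trainingxls.xls at hand, the range options can be tried on the example workbook that ships with readxl. This is only a sketch: it assumes readxl is installed, uses the package's bundled datasets.xlsx (found with readxl_example()) in place of the file above, and the variable names are made up.

```r
library(readxl)

# Path to an example workbook bundled with readxl (stand-in for trainingxls.xls)
path <- readxl_example("datasets.xlsx")

# Rectangle in A1 notation; the first row of the range is used as the header
block <- read_excel(path, range = "A1:C4")

# Only rows 1 to 4, all columns
top <- read_excel(path, range = cell_rows(1:4))

# Only columns B to D, all rows
middle <- read_excel(path, range = cell_cols("B:D"))

# read_xlsx() takes the same arguments but skips the format guessing
same <- read_xlsx(path, range = "A1:C4")

print(dim(block))
print(dim(top))
print(dim(middle))
```

Here col_names is left at its default of TRUE because the example sheet has a header row; for a header-less file like the one above, keep col_names = FALSE.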