Is R too slow at reading large files? Try fread, which can be 2,000 times faster -- 实验盒

R's read.table and read.csv are fairly slow at reading files. With even moderately large data, you can end up waiting a long time.

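When you need to read a large file, especially one with a very large number of columns, try fread (Fast and friendly file finagler) from the data.table package (https://cran.r-project.org/web/packages/data.table). Its arguments are similar to those of read.table, but it reads much faster.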
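To make the "similar arguments" point concrete, here is a minimal sketch on a tiny temporary file. The file contents and the variable names (tmp, df.base, df.fread) are made up for illustration. One difference worth knowing: fread has no row.names argument, so the sketch moves the first column into the row names by hand.

```r
library(data.table)

# Write a tiny space-delimited file to a temporary path (illustrative data only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("id a b", "r1 1 2", "r2 3 4"), tmp)

# Base R: the first column becomes the row names
df.base <- read.table(tmp, sep = " ", header = TRUE, row.names = 1)

# fread: same sep and header arguments; data.table = FALSE returns a data.frame.
# There is no row.names argument, so the first column is an ordinary column.
df.fread <- fread(tmp, sep = " ", header = TRUE, data.table = FALSE)

# Move the first column into the row names to match the read.table result
rownames(df.fread) <- df.fread[[1]]
df.fread <- df.fread[, -1, drop = FALSE]

print(all.equal(df.base, df.fread))
```

If a data.table is what you want downstream, drop data.table = FALSE and keep the first column as a regular key column instead of row names.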

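The 2,000-fold speedup is not clickbait. It is a measured result on a plain-text file of about 1 GB with 489 rows and 1,079,796 columns. The test machine has 2 TB of RAM, 80 cores / 160 threads (four Xeon Gold 6248 CPUs), and SSD storage (RAID 5).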
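As a quick sanity check on the headline figure, the ratio can be computed from the two timings reported further down in this post (20.87034 hours for read.table, 35.71 seconds for fread). The variable names are only for illustration.

```r
# Timings as reported in this post
t.readtable.hours <- 20.87034
t.fread.secs <- 35.71

# Convert hours to seconds, then divide
speedup <- t.readtable.hours * 3600 / t.fread.secs
print(round(speedup))  # [1] 2104
```

So on this file and this machine, "two thousand times" is a fair rounding of the measured ratio.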

Reading the file with read.table:

# Record the start time
time.start <- Sys.time()
# The first column is used as row names
file.readtable <- read.table('test.file', sep = ' ', header = TRUE, row.names = 1)
time.end <- Sys.time()
# Elapsed wall-clock time
time.running <- time.end - time.start
print(time.running)

Reading is extremely slow. It took a whopping 20.87 hours, and I did not bother to investigate why:

Time difference of 20.87034 hours
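The cause was not investigated here, but the read.table help page does list a few options that reduce its time or memory use: colClasses, nrows, and comment.char = "". The sketch below shows how they would be passed. It runs on a small generated file whose size and column types are assumptions; whether these options would meaningfully shorten the 20-hour run above is untested.

```r
# Small illustrative file; the sizes are assumptions, not the post's data
set.seed(1)
n.row <- 20
n.col <- 500
m <- matrix(sample(0:2, n.row * n.col, replace = TRUE), nrow = n.row,
            dimnames = list(paste0("s", seq_len(n.row)),
                            paste0("v", seq_len(n.col))))
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(id = rownames(m), m), tmp,
            sep = " ", quote = FALSE, row.names = FALSE)

# Options described in ?read.table:
#   colClasses         declares column types up front instead of guessing them
#   nrows              tells R how many rows to expect
#   comment.char = ""  turns off scanning for comment characters
df.tuned <- read.table(tmp, sep = " ", header = TRUE, row.names = 1,
                       colClasses = c("character", rep("integer", n.col)),
                       nrows = n.row, comment.char = "")
print(dim(df.tuned))
```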

Reading the file with fread:

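library("data.table")
time.start <- Sys.time()
# fread has no row.names argument; the first column is read as a regular column
file.fread <- fread('test.file', sep = ' ', header = TRUE)
time.end <- Sys.time()
# Elapsed wall-clock time
time.running <- time.end - time.start
print(time.running)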

This takes 35.71 seconds, which is acceptable: