data.table用法解析
R語言中的data.table是一種資料處理時的高效率工具。在瞭解data.table之前先簡單說明一下管道函式的使用方法。
管道函式的使用方法
常用的管道函式有以下三種:
%in%:表示包含於
%>%:表示向右傳遞
%$%:表示向右傳遞列,並可以直接按列操作
data.table使用方法
1. 首先將資料變成data.table格式
data = data%>%
as.data.table()
2. 將資料按照以下5個變數彙總
as.data.table()%$%
.[,.(value = sum(value,na.rm = TRUE)),
by = c("period","index","area","id1","id2")]
3. 按照某類別列的變數值篩選
data = data%$%
.[index%in%c("UV","DB","XS"),]
4.改變類別列中變數名稱
.[index_name == "DB",index_name := "DB_PV"]%$%
.[index_name == "GZ",index_name := "GZ_UV"]%$%
.[index_name == "XSLZ",index_name := "XSLZ_UV"]
5. 取某列全部值的前6個字母
data = data %$%
.[,":="(id1 = substr(id1,1,6),
id2 = substr(id2,1,6))]%$%
6. 篩選列
.[,.(period,id1,id2,area,index,value)]
7. 篩選行列
N <- rank.P[ID2%in%target,.(ID2, Month, prov_ID, Province, Index_name,Value_adj_per)]
8. 變更列名稱
setnames(c("period","car_id1","car_id2","area","value"), c("Month","ID1","ID2","Province","Value"))
9. 按照某列合併表格,並篩選某些列
data = data %>%
merge(data2, by.x="Province",by.y="Province",all.x = TRUE)%$%
.[,.(Month,Province,ID,index_name,ID2,Value)]
9.1 合併後出現列名重複現象
new_data <- data%>%
merge(data2,
by.x = c("ID1","ID2","Month","prov_ID","Province","Index_name"),
by.y = c("ID2","ID1","Month","prov_ID","Province","Index_name"),
suffixes = c("_P","_N"))
10. 定義新列
new_data <- new_data[, ":="(Score = (Value_adj_per_P+Value_adj_per_N)/2)]%$%
.[, Rank := row_number(-Score),
by = c("ID2","Month","prov_ID","Province","Index_name")]%>%
merge(phase, by = "Index_name")%$%
.[, Index_name := NULL]
data.table用法解析
R語言中的data.table是一種資料處理時的高效率工具。在瞭解data.table之前先簡單說明一下管道函式的使用方法。
管道函式的使用方法
常用的管道函式有以下三種:
%in%:表示包含於
%>%:表示向右傳遞
%$%:表示向右傳遞列,並可以直接按列操作
data.table使用方法
1. 首先將資料變成data.table格式
data = data%>%
as.data.table()
2. 將資料按照以下5個變數彙總
data = data%>%
as.data.table()%$%
.[,.(value = sum(value,na.rm = TRUE)),
by = c("period","index","area","id1","id2")]
3. 按照某類別列的變數值篩選
data = data%$%
.[index%in%c("UV","DB","XS"),]
4.改變類別列中變數名稱
data = data%$%
.[index_name == "DB",index_name := "DB_PV"]%$%
.[index_name == "GZ",index_name := "GZ_UV"]%$%
.[index_name == "XSLZ",index_name := "XSLZ_UV"]
5. 取某列全部值的前6個字母
data = data %$%
.[,":="(id1 = substr(id1,1,6),
id2 = substr(id2,1,6))]%$%
.[,.(value = sum(value,na.rm = TRUE)),
by = c("period","index","area","id1","id2")]
6. 篩選列
data = data %$%
.[,.(period,id1,id2,area,index,value)]
7. 篩選行列
N <- rank.P[ID2%in%target,.(ID2, Month, prov_ID, Province, Index_name,Value_adj_per)]
8. 變更列名稱
data = data%>%
setnames(c("period","car_id1","car_id2","area","value"), c("Month","ID1","ID2","Province","Value"))
9. 按照某列合併表格,並篩選某些列
data = data %>%
merge(data2, by.x="Province",by.y="Province",all.x = TRUE)%$%
.[,.(Month,Province,ID,index_name,ID2,Value)]
9.1 合併後出現列名重複現象
new_data <- data%>%
merge(data2,
by.x = c("ID1","ID2","Month","prov_ID","Province","Index_name"),
by.y = c("ID2","ID1","Month","prov_ID","Province","Index_name"),
suffixes = c("_P","_N"))
10. 定義新列
new_data <- new_data[, ":="(Score = (Value_adj_per_P+Value_adj_per_N)/2)]%$%
.[, Rank := row_number(-Score),
by = c("ID2","Month","prov_ID","Province","Index_name")]%>%
merge(phase, by = "Index_name")%$%
.[, Index_name := NULL]