回覆列表
  • 1 # CROWN2021

    data.table用法解析


    R語言中的data.table是一種資料處理時的高效率工具。在瞭解data.table之前先簡單說明一下管道函式的使用方法。


    管道函式的使用方法


    常用的管道函式有以下三種:


    %in%:表示包含於


    %>%:表示向右傳遞


    %$%:表示向右傳遞列,並可以直接按列操作


    data.table使用方法


    1. 首先將資料變成data.table格式


    data = data%>%


    as.data.table()


    2. 將資料按照以下5個變數彙總


    data = data%>%


    as.data.table()%$%


    .[,.(value = sum(value,na.rm = TRUE)),


    by = c("period","index","area","id1","id2")]


    3. 按照某類別列的變數值篩選


    data = data%$%


    .[index%in%c("UV","DB","XS"),]


    4.改變類別列中變數名稱


    data = data%$%


    .[index_name == "DB",index_name := "DB_PV"]%$%


    .[index_name == "GZ",index_name := "GZ_UV"]%$%


    .[index_name == "XSLZ",index_name := "XSLZ_UV"]


    5. 取某列全部值的前6個字母


    data = data %$%


    .[,":="(id1 = substr(id1,1,6),


    id2 = substr(id2,1,6))]%$%


    .[,.(value = sum(value,na.rm = TRUE)),


    by = c("period","index","area","id1","id2")]


    6. 篩選列


    data = data %$%


    .[,.(period,id1,id2,area,index,value)]


    7. 篩選行列


    N <- rank.P[ID2%in%target,.(ID2, Month, prov_ID, Province, Index_name,Value_adj_per)]


    8. 變更列名稱


    data = data%>%


    setnames(c("period","car_id1","car_id2","area","value"), c("Month","ID1","ID2","Province","Value"))


    9. 按照某列合併表格,並篩選某些列


    data = data %>%


    merge(data2, by.x="Province",by.y="Province",all.x = TRUE)%$%


    .[,.(Month,Province,ID,index_name,ID2,Value)]


    9.1 合併後出現列名重複現象


    new_data <- data%>%


    merge(data2,


    by.x = c("ID1","ID2","Month","prov_ID","Province","Index_name"),


    by.y = c("ID2","ID1","Month","prov_ID","Province","Index_name"),


    suffixes = c("_P","_N"))


    10. 定義新列


    new_data <- new_data[, ":="(Score = (Value_adj_per_P+Value_adj_per_N)/2)]%$%


    .[, Rank := row_number(-Score),


    by = c("ID2","Month","prov_ID","Province","Index_name")]%>%


    merge(phase, by = "Index_name")%$%


    .[, Index_name := NULL]

  • 中秋節和大豐收的關聯?
  • 關於冬至的古詩?