Goodbye Excel, Hello Pandas!

2021-01-11 21:40

Writing data:

When writing data, pick the method that matches the file format: to_csv for CSV files and to_excel for Excel files. Also remember to specify an encoding. First, build a table:

from pandas import Series, DataFrame

# Build the table from a dictionary of Series that share one index
index_list = ['001','002','003','004','005','006','007','008','009','010']
name_list = ['李白','王昭君','諸葛亮','狄仁傑','孫尚香','妲己','周瑜','張飛','王昭君','大喬']
age_list = [25,28,27,25,30,29,25,32,28,26]
salary_list = ['10k','12.5k','20k','14k','12k','17k','18k','21k','22k','21.5k']
marital_list = ['NO','NO','YES','YES','NO','NO','NO','YES','NO','YES']

# Column labels: 姓名 = name, 年齡 = age, 薪資 = salary, 婚姻狀況 = marital status
dic = {
    '姓名': Series(data=name_list, index=index_list),
    '年齡': Series(data=age_list, index=index_list),
    '薪資': Series(data=salary_list, index=index_list),
    '婚姻狀況': Series(data=marital_list, index=index_list)
}
df = DataFrame(dic)

# Write to CSV; path_or_buf is the path of the output file
df.to_csv(path_or_buf='./People_Information.csv',
          encoding='utf_8_sig', index=False)
print('end')

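Here to_csv writes the data to the given path. The encoding parameter sets the text encoding; utf_8_sig writes a byte-order mark at the start of the file, so Chinese text is less likely to show up garbled (for example when the file is opened in Excel). index=False drops the row index; otherwise the index labels ('001', '002', and so on in this table) would be written to the file as an extra column.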
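To make the effect of index and encoding concrete, here is a minimal sketch. It assumes a tiny two-row stand-in table and a temporary file named people.csv, both made up for illustration and not part of the original example.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in table (hypothetical data) with string labels as the row index.
df = pd.DataFrame(
    {"姓名": ["李白", "王昭君"], "年齡": [25, 28]},
    index=["001", "002"],
)

# With no path given, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()                # row labels become a first, unnamed column
without_index = df.to_csv(index=False)  # only the real columns are written
print(with_index)
print(without_index)

# Write with utf_8_sig and inspect the raw bytes: the file starts with a BOM.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people.csv")  # hypothetical file name
    df.to_csv(path, encoding="utf_8_sig", index=False)
    with open(path, "rb") as f:
        starts_with_bom = f.read(3) == b"\xef\xbb\xbf"
    # Reading with the same encoding strips the BOM from the first column name.
    round_trip = pd.read_csv(path, encoding="utf_8_sig")

print(starts_with_bom)
print(list(round_trip.columns))
```

Comparing the two printed CSV texts shows the extra leading column that index=False removes.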

Reading data:

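Reading also depends on the file format: use read_csv for CSV files and read_excel for Excel files, each of which takes the path of the file to read. In addition, a single Excel workbook can hold several sheets, each storing different data, and files like this are very common. CSV files, by contrast, have no concept of multiple sheets.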

For example:

import pandas as pd

# sheet_name selects which sheet of the workbook to read (by sheet name)
sheet1 = pd.read_excel('./data/sheet.xlsx', sheet_name='sheet1')
print(sheet1.head())

sheet2 = pd.read_excel('./data/sheet.xlsx', sheet_name='sheet2')
print(sheet2.head())
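For the writing side of a multi-sheet workbook, the following is a rough sketch rather than part of the original article. It assumes an Excel engine such as openpyxl is installed, and the function names, sample rows, and the demo.xlsx file name are invented for illustration. It writes two sheets with pd.ExcelWriter and reads them all back at once by passing sheet_name=None.

```python
import os
import tempfile

import pandas as pd


def roundtrip_two_sheets(path):
    """Write two DataFrames to separate sheets, then read every sheet back."""
    staff = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 28]})
    pay = pd.DataFrame({"name": ["Alice", "Bob"], "salary": ["10k", "12.5k"]})

    # One writer, several to_excel calls: each call targets its own sheet.
    with pd.ExcelWriter(path) as writer:
        staff.to_excel(writer, sheet_name="sheet1", index=False)
        pay.to_excel(writer, sheet_name="sheet2", index=False)

    # sheet_name=None returns a dict mapping each sheet name to a DataFrame.
    return pd.read_excel(path, sheet_name=None)


def demo():
    """Run the round trip in a temporary directory (hypothetical file name)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        return roundtrip_two_sheets(os.path.join(tmp_dir, "demo.xlsx"))
```

Calling demo() should return a dict keyed by 'sheet1' and 'sheet2'. If no Excel engine is installed, pandas raises an ImportError instead.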

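When the first row of a CSV or Excel file is junk data, you can use the header parameter of read_csv() or read_excel() to choose which row serves as the column labels. For example: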

import pandas as pd

# header=1 (rows are counted from 0) takes the second row as the column labels;
# the first row is ignored and the data starts at the third row
people = pd.read_csv('./data/People1.csv', header=1)
print(people.head())

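The header parameter of read_excel() (and of read_csv()) defaults to 0, meaning the first row supplies the column labels. Set header to whichever row holds the labels in your file.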

If no row is suitable, set header to None: the columns then get the default integer labels 0, 1, 2, 3, and so on, which you can rename yourself afterwards.

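When header is set to a row number, that row becomes the column labels and the data is read from the row below it; everything above that row is ignored.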
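The three header settings can be compared side by side. The sketch below assumes a small in-memory CSV whose first line is junk; the text and the column names name and age are made up for illustration. io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV text: line 0 is junk, line 1 holds the real column labels.
raw = "report,generated\nname,age\nAlice,25\nBob,28\n"

# Default header=0: the junk line is taken as the column labels.
default = pd.read_csv(io.StringIO(raw))

# header=1: the second line supplies the labels; the junk line is ignored.
skip_junk = pd.read_csv(io.StringIO(raw), header=1)

# header=None: every line is data and the columns are labelled 0, 1, ...
no_header = pd.read_csv(io.StringIO(raw), header=None)

print(list(default.columns), len(default))
print(list(skip_junk.columns), len(skip_junk))
print(list(no_header.columns), len(no_header))
```

Only header=1 yields both the intended labels and numeric values in the age column. With the other two settings the label row ends up among the data, so that column is read as text.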
