R Text Mining: Sentiment Analysis

Codewar, 2021-02-13 23:50


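I. Case description:

This case uses Python and R to run a simple sentiment analysis on the short reviews of a film on Douban.

Implementation:

(1) Use Python to scrape 500 Douban short reviews of the film.

(Two scraping approaches: 1. scrape with selenium;

2. copy the cookies from a logged-in session and scrape with the requests library.)

(2) Use R to read the text, clean it, segment it into words, score its sentiment, and visualize the results.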

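II. Hands-on process:

The case has two parts:

(1) Data acquisition:

[The cookies are obtained by logging in yourself and copying them from the Network panel of the Google Chrome developer tools.]

1. The scraping code is as follows: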

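(2) Sentiment scoring of the data in R:

Observed during the exercise: traditional Chinese characters affect the sentiment scores, and so do the stop-word list and the quality of the word segmentation.

2. Workflow: read the data, clean the data, import the dictionaries, segment words, score sentiment, draw the word cloud.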

2.1 Data import:

2.2 Data cleaning:
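As a minimal sketch of this cleaning step, the base R snippet below strips all whitespace (spaces, tabs, newlines) from a few made-up comments. The `comments` vector is hypothetical sample data, and `gsub` is used here only to keep the example free of package dependencies.

```r
# Hypothetical sample comments containing stray whitespace
comments <- c(" 劇情 很 精彩\n", "\t節奏太慢 ", "一般般")

# "\\s+" matches one or more whitespace characters (space, tab, newline)
clean <- gsub("\\s+", "", comments)

# Drop comments that are empty after cleaning
clean <- clean[nchar(clean) > 0]
print(clean)
```

Removing every whitespace character is reasonable for Chinese text, where words are not separated by spaces; for space-delimited languages it would merge words together.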

2.3 Dictionary import:
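One way to combine a positive and a negative word list into a single weighted dictionary is sketched below with toy data. The word entries are hypothetical stand-ins for the dictionary files. The sketch also shows why `rbind()` is the safer choice: `c()` applied to two data frames returns a plain list, and `$words` on that list returns only the first `words` element.

```r
# Toy word lists standing in for the dictionary files (hypothetical entries)
pos <- data.frame(words = c("精彩", "好看"), stringsAsFactors = FALSE)
neg <- data.frame(words = c("無聊", "失望"), stringsAsFactors = FALSE)

# Positive words get weight 1, negative words get weight -1
pos$weight <- 1
neg$weight <- -1

# rbind() stacks the rows and keeps the result a data frame
mydict <- rbind(pos, neg)

# c() instead returns a list with two elements named `words`,
# so `$words` picks up only the positive words
as_list <- c(pos, neg)

print(mydict)
print(length(mydict$words))
print(length(as_list$words))
```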

2.4 Word segmentation:

2.5 Sentiment scoring:
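The scoring idea can be sketched in base R as follows: count how many tokens of each comment appear in the positive list, count how many appear in the negative list, and take the difference. The token lists and dictionary words are hypothetical; in the article the tokens come from jiebaR. The "Neutral" label for a total of 0 is an assumption added for illustration, whereas the article's own code labels every non-positive total as 'Neg'.

```r
# Pre-segmented comments (hypothetical sample data)
segwords <- list(
  c("劇情", "精彩", "演員", "好看"),
  c("節奏", "無聊", "結局", "失望"),
  c("一般")
)
pwords <- c("精彩", "好看")
nwords <- c("無聊", "失望")

# Count dictionary hits per comment and take the difference
getscore <- function(x, pwords, nwords) {
  pos.weight <- vapply(x, function(w) sum(w %in% pwords), integer(1))
  neg.weight <- vapply(x, function(w) sum(w %in% nwords), integer(1))
  data.frame(pos.weight, neg.weight, total = pos.weight - neg.weight)
}

score <- getscore(segwords, pwords, nwords)

# Assumption: a total of 0 carries no net signal, so label it "Neutral"
score$emotion <- ifelse(score$total > 0, "Pos",
                        ifelse(score$total < 0, "Neg", "Neutral"))
print(score)
```

Because this is a plain count of dictionary hits, it ignores negation and intensity; the score depends entirely on which tokens survive segmentation and stop-word filtering.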

2.6 Drawing the word cloud:

Wordfreq:
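A word-frequency table of this kind can be built with base R alone, as in the sketch below. The token lists are hypothetical sample data. The resulting data frame has a word column followed by a frequency column, which is the layout that `wordcloud2()` takes as input.

```r
# Pre-segmented comments (hypothetical sample data)
segwords <- list(
  c("劇情", "精彩"),
  c("劇情", "無聊"),
  c("劇情", "精彩", "好看")
)

# Flatten all tokens into one vector and tabulate them
tokens <- unlist(segwords)
wordfreq <- as.data.frame(table(word = tokens), stringsAsFactors = FALSE)

# Sort by descending frequency
wordfreq <- wordfreq[order(-wordfreq$Freq), ]
print(head(wordfreq))
```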

Word cloud:

III. Summary:

2. In practice, the stop-word list and the quality of the word segmentation have a large effect on a sentence's sentiment score.
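A small base R sketch makes this sensitivity concrete. The comment, its two segmentations, the dictionaries, and the stop-word list are all hypothetical, and the scorer is the simple hit count used in this case.

```r
pwords <- c("好看")
nwords <- c("不好看")

# Simple hit-count scorer: positive hits minus negative hits
score <- function(tokens) sum(tokens %in% pwords) - sum(tokens %in% nwords)

# Two hypothetical segmentations of the same comment "這部電影不好看"
coarse <- c("這部", "電影", "不好看")      # negation kept inside the token
fine   <- c("這部", "電影", "不", "好看")  # negation split off

# Hypothetical stop-word list that happens to contain a sentiment word
stopwords <- c("這部", "好看")
fine_filtered <- fine[!fine %in% stopwords]

print(score(coarse))         # -1: matches the negative dictionary
print(score(fine))           #  1: "好看" alone matches the positive dictionary
print(score(fine_filtered))  #  0: the sentiment word was filtered out
```

The same comment scores -1, 1, or 0 depending only on how it is segmented and which words are filtered, which is why adding the dictionary words to the segmenter and reviewing the stop-word list both matter.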

[The stop-word list contains 停止詞 ("stop words")]

3. Result figure:

Appendix: complete code:

#-------------- Load the required R packages
library(pacman)
p_load(readr, jiebaR, jiebaRD, plyr, stringr, stringi, ggplot2, wordcloud2)

#-------------- Step 1: read the data
text <- read.table("D:/a情感分析/text1.csv", dec = ",", sep = ",",
                   stringsAsFactors = FALSE, header = TRUE,
                   blank.lines.skip = TRUE)
str(text)  # inspect the column types

#-------------- Step 2: clean the data
# Only whitespace (spaces, newlines, tabs, etc.) is removed here
text$comment <- str_replace_all(as.character(text$comment), "\\s+", "")

#-------------- Step 3: read the sentiment dictionaries
# Each dictionary holds words plus a weight: -1 for negative, 1 for positive
pos <- read.table("D:/a情感分析/tsinghua.positive.gb.txt", header = FALSE,
                  stringsAsFactors = FALSE, strip.white = TRUE, skip = 1,
                  col.names = "words")
pos1 <- read.table("D:/a情感分析/正面評價詞語（中文）.txt", header = FALSE,
                   stringsAsFactors = FALSE, strip.white = TRUE, skip = 1,
                   col.names = "words")
pos$weight <- 1
pos1$weight <- 1
# Merge the positive sentiment words and positive evaluation words
positive <- rbind(pos, pos1)

neg <- read.table("D:/a情感分析/tsinghua.negative.gb.txt", header = FALSE,
                  stringsAsFactors = FALSE, strip.white = TRUE, skip = 1,
                  col.names = "words")
neg1 <- read.table("D:/a情感分析/負面評價詞語（中文）.txt", header = FALSE,
                   stringsAsFactors = FALSE, strip.white = TRUE, skip = 1,
                   col.names = "words")
neg$weight <- -1
neg1$weight <- -1
# Merge the negative sentiment words and negative evaluation words
negative <- rbind(neg, neg1)

# Merge both dictionaries into one data frame.
# rbind() is required: c() would return a list whose $words holds only
# the positive words.
mydict <- rbind(positive, negative)

#-------------- Step 4: word segmentation
# Set up the segmentation engine with a stop-word list
engine <- worker(stop_word = "D:/a情感分析/chineseStopWords.txt")
# Add the dictionary words to the engine so they are kept as single tokens
new_user_word(engine, mydict$words)
# Segment every comment
segwords <- llply(text$comment, segment, engine)
str(segwords)  # inspect the segmentation

#-------------- Step 5: sentiment scoring
# Helper: which tokens of x appear in the word list y
fun <- function(x, y) x %in% y
getscore <- function(x, pwords, nwords) {
  pos.weight <- sapply(llply(x, fun, pwords), sum)
  neg.weight <- sapply(llply(x, fun, nwords), sum)
  total <- pos.weight - neg.weight
  return(data.frame(pos.weight, neg.weight, total))
}
# Score against the merged positive and negative dictionaries
score1 <- getscore(segwords, positive$words, negative$words)
# Combine the scores with the comments
evalu_score1 <- cbind(text, score1)
# Label by the sign of the total: greater than 0 is 'Pos', otherwise 'Neg'
# (a total of 0 is therefore also labelled 'Neg')
evalu.score1 <- transform(evalu_score1,
                          emotion = ifelse(total > 0, 'Pos', 'Neg'))
# Inspect the result
View(evalu.score1)

#-------------- Step 6: word frequencies and word cloud
wordfreq <- unlist(segwords)
wordfreq <- as.data.frame(table(wordfreq))
wordfreq <- arrange(wordfreq, desc(Freq))  # sort by descending frequency
head(wordfreq)
write.csv(wordfreq, "D:/wordart.csv")
# Draw the word cloud
wordcloud2(wordfreq, size = 1, shape = 'star')


This article is reposted from 學習使我快樂; please support the original author!

