Topic Modeling and Sentiment Analysis of Amazon Alexa Reviews

IT老周, 2021-01-03 00:01


First, import the libraries, load the cleaned reviews, and count the reviews for each product variation:

```python
import pickle

# Load the cleaned review DataFrame
with open('Saved Models/alexa_reviews_clean.pkl', 'rb') as read_file:
    df = pickle.load(read_file)
df['variation'].value_counts()
```

```python
# Drop the Fire TV Stick reviews (not an Echo device) and recount
df = df[df.variation != 'Configuration: Fire TV Stick']
df['variation'].value_counts()
```

Good. Now let's group these variations into the different Echo models: Echo, Echo Dot, Echo Show, Echo Plus, and Echo Spot.

```python
import numpy as np

# ECHO (2nd gen) - Charcoal Fabric, Heather Gray Fabric,
# Sandstone Fabric, Oak Finish, Walnut Finish
df['model'] = np.where(df.variation.str.contains('Charcoal Fabric ') |
                       df.variation.str.contains('Heather Gray Fabric ') |
                       df.variation.str.contains('Sandstone Fabric ') |
                       df.variation.str.contains('Oak Finish ') |
                       df.variation.str.contains('Walnut Finish '),
                       'echo', df['variation'])

# ECHO DOT - Black Dot, White Dot, Black, White
# 'Black'/'White' also match the Show, Plus and Spot variations;
# the steps below relabel those rows, so keep this order.
df['model'] = np.where(df.variation.str.contains('Black  Dot') |
                       df.variation.str.contains('White  Dot') |
                       df.variation.str.contains('Black') |
                       df.variation.str.contains('White'),
                       'echo dot', df['model'])

# ECHO SHOW - Black, White
df['model'] = np.where(df.variation.str.contains('Black  Show') |
                       df.variation.str.contains('White  Show'),
                       'echo show', df['model'])

# ECHO PLUS - Black, White
df['model'] = np.where(df.variation.str.contains('Black  Plus') |
                       df.variation.str.contains('White  Plus'),
                       'echo plus', df['model'])

# ECHO SPOT - Black, White
df['model'] = np.where(df.variation.str.contains('Black  Spot') |
                       df.variation.str.contains('White  Spot'),
                       'echo spot', df['model'])
```
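The chained `np.where` calls give the right labels only because of their order: the bare `'Black'`/`'White'` test also matches the Show, Plus and Spot variations, which later steps relabel. A minimal alternative sketch, assuming the variation strings look like the ones above (a colour, two spaces and a product name, or an Echo fabric/finish name), makes that precedence explicit in one hypothetical helper, `variation_to_model`:

```python
# Hypothetical helper, not from the article: map one 'variation' string
# to its Echo model using exact matches instead of substring tests.
ECHO_FINISHES = ('Charcoal Fabric', 'Heather Gray Fabric', 'Sandstone Fabric',
                 'Oak Finish', 'Walnut Finish')
SUFFIX_MODELS = ('Dot', 'Show', 'Plus', 'Spot')


def variation_to_model(variation: str) -> str:
    v = variation.strip()  # the dataset strings carry trailing spaces
    if v in ECHO_FINISHES:
        return 'echo'
    # Colour plus product suffix, e.g. 'Black  Show' -> 'echo show'
    for suffix in SUFFIX_MODELS:
        if v.endswith(suffix):
            return 'echo ' + suffix.lower()
    # Plain colour names are Echo Dot variations
    if v in ('Black', 'White'):
        return 'echo dot'
    # Unknown variation: keep it unchanged, like the np.where fallback
    return variation
```

Applied with `df['model'] = df['variation'].map(variation_to_model)`, it should produce the same labels for the variations listed above, and its result no longer depends on the order of separate assignment steps.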

Next, we split the original df by model type and pickle each resulting DataFrame, giving five pickled Echo model DataFrames.

```python
# Same for each model; shown here for Echo
echo = df[df['model'] == 'echo']
with open("Saved Models/echo.pkl", "wb") as f:
    pickle.dump(echo, f)
```
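Rather than repeating the filter-and-dump step five times, a loop over `df.groupby('model')` can write every model's DataFrame in one pass. This is a sketch with a hypothetical helper name, `save_per_model`, and an assumed file-naming scheme (spaces replaced by underscores, so `echo dot` becomes `echo_dot.pkl`); it uses pandas' `to_pickle`, whose output `pickle.load` or `pd.read_pickle` can read back.

```python
import os

import pandas as pd


def save_per_model(df: pd.DataFrame, out_dir: str) -> list:
    """Hypothetical helper: pickle one DataFrame per value of df['model'].

    Returns the list of written file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for model, group in df.groupby('model'):
        # Assumed naming scheme: 'echo dot' -> echo_dot.pkl
        path = os.path.join(out_dir, f"{model.replace(' ', '_')}.pkl")
        group.to_pickle(path)
        paths.append(path)
    return paths
```

Usage, under the same assumptions: `save_per_model(df, 'Saved Models')`.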

Now let's visualize how the reviews are distributed across the Echo models, using plotly.

```python
import plotly.graph_objects as go

# Number of reviews per Echo model
values = df['model'].value_counts()
fig = go.Figure(data=[go.Bar(x=values.index, y=values, text=values, textposition='auto')])
fig.update_xaxes(title_text='Echo Models')
fig.update_yaxes(title_text='Number of Reviews')
fig.update_layout(title_text='Distribution of Echo Models')
fig.show()
```

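```python
import plotly.graph_objects as go

# Rating counts per model. The original omits these definitions; they are
# assumed to be the value counts of 'rating' for each model, sorted by rating.
echo_values = df[df['model'] == 'echo']['rating'].value_counts().sort_index()
echospot = df[df['model'] == 'echo spot']['rating'].value_counts().sort_index()
echoshow = df[df['model'] == 'echo show']['rating'].value_counts().sort_index()
echodot = df[df['model'] == 'echo dot']['rating'].value_counts().sort_index()
echoplus = df[df['model'] == 'echo plus']['rating'].value_counts().sort_index()

fig = go.Figure(data=[
    go.Bar(name='echo', x=echo_values.index, y=echo_values, text=echo_values, textposition='auto'),
    go.Bar(name='echo spot', x=echospot.index, y=echospot, text=echospot, textposition='auto'),
    go.Bar(name='echo show', x=echoshow.index, y=echoshow, text=echoshow, textposition='auto'),
    go.Bar(name='echo dot', x=echodot.index, y=echodot, text=echodot, textposition='auto'),
    go.Bar(name='echo plus', x=echoplus.index, y=echoplus, text=echoplus, textposition='auto'),
])
fig.update_xaxes(title_text='Ratings')
fig.update_yaxes(title_text='Number of Ratings')
fig.update_layout(title_text='Distribution of Echo Ratings Across Models')
# Draw each model's bars side by side
fig.update_layout(barmode='group')
fig.show()
```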
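The five per-model rating series can also come from a single cross-tabulation. A minimal sketch, assuming `df` has the `model` and `rating` columns used above (`rating_counts_by_model` is a hypothetical name):

```python
import pandas as pd


def rating_counts_by_model(df: pd.DataFrame) -> pd.DataFrame:
    """Rows: models; columns: star ratings; cells: number of reviews.

    Each row holds the same counts as
    df[df['model'] == m]['rating'].value_counts(), with 0 where a
    model never received that rating.
    """
    return pd.crosstab(df['model'], df['rating'])
```

Each row of the result (for example `table.loc['echo dot']`) could then be passed as the `y` of one `go.Bar`, with `table.columns` as `x`, so that every model is plotted against the same set of ratings.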

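```python
# VADER analyzer, assumed here to come from the vaderSentiment package
# (NLTK also ships one: from nltk.sentiment.vader import SentimentIntensityAnalyzer)
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer


# Computes the sentiment scores for ECHO, ECHO DOT and ECHO SHOW reviews.
def sentimentScore(sentences):
    analyzer = SentimentIntensityAnalyzer()
    results = []
    for sentence in sentences:
        vs = analyzer.polarity_scores(sentence)  # dict with neg, neu, pos, compound
        print(str(vs))
        results.append(vs)
    return results
```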

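Using this function, we compute the sentiment scores of each review, put them into a new DataFrame, and then combine it with the original DataFrame, as shown below.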

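```python
import pickle

import pandas as pd

# ECHO
with open('Saved Models/echo.pkl', 'rb') as read_file:
    echo = pickle.load(read_file)

echo_sent = sentimentScore(echo['new_reviews'])  # one score dict per review
echo_sent_df = pd.DataFrame(echo_sent)           # columns: neg, neu, pos, compound
echo.index = echo_sent_df.index                  # align row labels so concat lines up rows
echo_sent_df['rating_1'] = echo['rating']
echo_vader = pd.concat([echo, echo_sent_df], axis=1)
echo_vader.head()
```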
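The TF-IDF step later filters on a `sentiment` column whose construction the article does not show. One common way to derive it, sketched here as an assumption rather than the author's method, is to threshold VADER's `compound` score at ±0.05, the cut-offs suggested in the VADER documentation; `label_sentiment` is a hypothetical helper:

```python
def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a label (assumed ±0.05 cut-offs)."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'
```

Usage: `echo_vader['sentiment'] = echo_vader['compound'].apply(label_sentiment)`. Scores between the two cut-offs are treated as neutral, so they fall into neither the positive nor the negative subset.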

The same code was run for Echo Dot and Echo Show, and all the resulting DataFrames were then merged into one.
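Merging the scored Echo, Echo Dot and Echo Show frames amounts to a row-wise concatenation. A minimal sketch, assuming each frame came out of the step above (`combine_model_frames` is a hypothetical name):

```python
import pandas as pd


def combine_model_frames(frames: dict) -> pd.DataFrame:
    """Hypothetical helper: stack per-model scored frames into one DataFrame.

    `frames` maps a model name (e.g. 'echo dot') to its scored DataFrame.
    The name is written to a 'model' column so rows stay traceable, and the
    index is rebuilt because each input frame starts again at 0.
    """
    parts = [frame.assign(model=name) for name, frame in frames.items()]
    return pd.concat(parts, ignore_index=True)
```

For example, `combine_model_frames({'echo': echo_vader, 'echo dot': echodot_vader, 'echo show': echoshow_vader})`, where `echodot_vader` and `echoshow_vader` are hypothetical names for the Echo Dot and Echo Show results.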

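Next, we used LDA (Latent Dirichlet Allocation) to build topic models for the top three Echo models. We built the LDA input from the review corpus and trained an LDA model to show the top 3 topics for each of Echo, Echo Dot, and Echo Show.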

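For Echo, the most common topics are: ease of use, loving Echo for playing music, and sound quality.

For Echo Dot, the most common topics are: works great, singer, and music.

For Echo Show, the most common topics are: love the video, love it, and love the screen.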

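Next, using a TF-IDF vectorizer on bigrams together with a chi-squared test against the star rating, we analyzed what users love and hate about the Echo devices and how those words contribute to positive and negative sentiment.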

```python
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2

neg_alexa = echo[echo['sentiment'] == 'negative']
pos_alexa = echo[echo['sentiment'] == 'positive']

# Echo model - negative feedback (swap neg_alexa for pos_alexa to get positive feedback)
tfidf_n = TfidfVectorizer(ngram_range=(2, 2))  # bigrams only
X_tfidf_n = tfidf_n.fit_transform(neg_alexa['new_reviews'])
y_n = neg_alexa['rating']
chi2score_n = chi2(X_tfidf_n, y_n)[0]  # chi-squared statistic per bigram

# Pair each bigram with its score and keep the 10 highest
scores = list(zip(tfidf_n.get_feature_names_out(), chi2score_n))
chi2_n = sorted(scores, key=lambda x: x[1])
topchi2_n = list(zip(*chi2_n[-10:]))
x_n = range(len(topchi2_n[1]))

fig, ax = plt.subplots(figsize=(16, 9))
ax.barh(x_n, topchi2_n[1], align='center', alpha=1, color='salmon')
plt.title('Echo Negative Feedback', fontsize=24, weight='bold')
# x-axis
plt.xlabel("Feature Score", fontsize=22, weight='bold')
plt.xticks(fontsize=18)
# y-axis
labels = topchi2_n[0]
plt.yticks(x_n, labels, fontsize=18)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(True)
ax.spines['left'].set_visible(True)
plt.show()
```

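From these charts we can see that some users think the Echo works very well and gives useful responses, while for other users the Echo barely works and they feel it has too many features.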

Let's look at the words that drive positive and negative sentiment for Echo Dot and Echo Show.

For Echo Dot, some users find it a great device that is easy to use, while for others the Echo Dot does not play music.

Finally, let's look at the results for Echo Show.

Users like being able to make calls and use YouTube, and find the Echo Show very easy to use; other users call the Echo Show "dumb" and advise against buying it.

Analyzing Amazon Alexa devices model by model yields more insight than examining all devices as a single group.
