The algorithms included are as follows:
Model 1 Angle-based Outlier Detector (ABOD)
Model 2 Cluster-based Local Outlier Factor (CBLOF)
Model 3 Feature Bagging
Model 4 Histogram-based Outlier Detection (HBOS)
Model 5 Isolation Forest
Model 6 K Nearest Neighbors (KNN)
Model 7 Average KNN
Model 8 Median KNN
Model 9 Local Outlier Factor (LOF)
Model 10 Minimum Covariance Determinant (MCD)
Model 11 One-class SVM (OCSVM)
Model 12 Principal Component Analysis (PCA)
All of these algorithms perform outlier detection in an essentially unsupervised manner: the detector is fit on unlabeled data and assigns each point an outlier score.
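Before the comparison, a minimal sketch of how a single PyOD detector is used may help. The example below fits one KNN detector on synthetic data; the data, the contamination value, and the variable names are illustrative assumptions, not values taken from the original example.

import numpy as np
from pyod.models.knn import KNN

# Placeholder data: a Gaussian blob of inliers plus a few scattered outliers.
rng = np.random.RandomState(42)
X_train = np.r_[0.3 * rng.randn(200, 2),
                rng.uniform(low=-6, high=6, size=(10, 2))]
X_test = 0.3 * rng.randn(20, 2)

clf = KNN(contamination=0.05)   # rough guess of the outlier fraction (assumed value)
clf.fit(X_train)                # unsupervised: no labels are passed

print(clf.labels_[:10])               # 0 = inlier, 1 = outlier, on the training data
print(clf.decision_scores_[:10])      # raw outlier scores on the training data
print(clf.predict(X_test))            # binary predictions for unseen points
print(clf.decision_function(X_test))  # outlier scores for unseen points

Every other detector in the list follows the same fit / predict / decision_function interface, which is what makes the side-by-side comparison below straightforward.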
A comparison across all of these algorithms is also provided.
The core code is as follows:
import matplotlib
import matplotlib.font_manager
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Assumes classifiers, X, xx, yy, outliers_fraction, ground_truth and
# n_outliers have been prepared beforehand (a setup sketch follows below).
for i, (clf_name, clf) in enumerate(classifiers.items()):
    print()
    print(i + 1, "fitting", clf_name)
    # fit the data and tag outliers
    clf.fit(X)
    scores_pred = clf.decision_function(X) * -1
    y_pred = clf.predict(X)
    threshold = stats.scoreatpercentile(scores_pred,
                                        100 * outliers_fraction)
    n_errors = (y_pred != ground_truth).sum()
    # plot the level lines and the points
    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]) * -1
    Z = Z.reshape(xx.shape)
    subplot = plt.subplot(3, 4, i + 1)
    subplot.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 7),
                     cmap=plt.cm.Blues_r)
    a = subplot.contour(xx, yy, Z, levels=[threshold],
                        linewidths=2, colors="red")
    subplot.contourf(xx, yy, Z, levels=[threshold, Z.max()],
                     colors="orange")
    b = subplot.scatter(X[:-n_outliers, 0], X[:-n_outliers, 1], c="white",
                        s=20, edgecolor="k")
    c = subplot.scatter(X[-n_outliers:, 0], X[-n_outliers:, 1], c="black",
                        s=20, edgecolor="k")
    subplot.axis("tight")
    subplot.legend(
        [a.collections[0], b, c],
        ["learned decision function", "true inliers", "true outliers"],
        prop=matplotlib.font_manager.FontProperties(size=10),
        loc="lower right")
    subplot.set_xlabel("%d. %s (errors: %d)" % (i + 1, clf_name, n_errors))
    subplot.set_xlim((-7, 7))
    subplot.set_ylim((-7, 7))
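The loop above assumes that classifiers, X, xx, yy, outliers_fraction, ground_truth and n_outliers were defined earlier in the script. A minimal setup sketch based on PyOD's public API is shown below; the synthetic data, the mesh resolution, and hyperparameters such as n_neighbors=35 are illustrative assumptions, not values taken from this text.

import numpy as np
from pyod.models.abod import ABOD
from pyod.models.cblof import CBLOF
from pyod.models.feature_bagging import FeatureBagging
from pyod.models.hbos import HBOS
from pyod.models.iforest import IForest
from pyod.models.knn import KNN
from pyod.models.lof import LOF
from pyod.models.mcd import MCD
from pyod.models.ocsvm import OCSVM
from pyod.models.pca import PCA

outliers_fraction = 0.1   # assumed outlier proportion
n_samples = 200
n_outliers = int(outliers_fraction * n_samples)
n_inliers = n_samples - n_outliers

# Two Gaussian clusters of inliers, with uniform outliers appended at the end
# so that X[:-n_outliers] are inliers and X[-n_outliers:] are outliers.
rng = np.random.RandomState(42)
X = np.r_[0.3 * rng.randn(n_inliers // 2, 2) + 2,
          0.3 * rng.randn(n_inliers // 2, 2) - 2,
          rng.uniform(low=-6, high=6, size=(n_outliers, 2))]
ground_truth = np.zeros(n_samples, dtype=int)
ground_truth[-n_outliers:] = 1

# Mesh grid on which the decision surfaces are evaluated and drawn.
xx, yy = np.meshgrid(np.linspace(-7, 7, 100), np.linspace(-7, 7, 100))

# The twelve detectors listed above, keyed by the names printed in each panel.
classifiers = {
    "Angle-based Outlier Detector (ABOD)": ABOD(contamination=outliers_fraction),
    "Cluster-based Local Outlier Factor (CBLOF)": CBLOF(contamination=outliers_fraction, random_state=42),
    "Feature Bagging": FeatureBagging(LOF(n_neighbors=35), contamination=outliers_fraction, random_state=42),
    "Histogram-based Outlier Detection (HBOS)": HBOS(contamination=outliers_fraction),
    "Isolation Forest": IForest(contamination=outliers_fraction, random_state=42),
    "K Nearest Neighbors (KNN)": KNN(contamination=outliers_fraction),
    "Average KNN": KNN(method="mean", contamination=outliers_fraction),
    "Median KNN": KNN(method="median", contamination=outliers_fraction),
    "Local Outlier Factor (LOF)": LOF(n_neighbors=35, contamination=outliers_fraction),
    "Minimum Covariance Determinant (MCD)": MCD(contamination=outliers_fraction),
    "One-class SVM (OCSVM)": OCSVM(contamination=outliers_fraction),
    "Principal Component Analysis (PCA)": PCA(contamination=outliers_fraction, random_state=42),
}

With this setup in place, the comparison loop fits each detector on the same data and draws a 3x4 grid of panels, one per model, with the learned decision boundary drawn in red and the number of mislabeled points shown in each panel's title.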