本文为原创文章,未经本人允许,禁止转载。转载请注明出处。
1.留出法
“留出法”详解请见:链接。
👉引用数据与建立模型:
1
2
3
4
5
6
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
iris = load_iris()
X = iris.data
y = iris.target
👉建立训练与测试数据集:
1
2
3
4
5
from sklearn.model_selection import train_test_split
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.33, random_state=123)
clf = DecisionTreeClassifier()
clf.fit(train_X, train_y)
👉产生准确度:
1
2
3
4
from sklearn.metrics import accuracy_score
predicted = clf.predict(test_X)
accuracy_score(test_y, predicted)
👉建立混淆矩阵:
1
2
3
from sklearn.metrics import confusion_matrix
m = confusion_matrix(test_y, predicted)
2.交叉验证法
“交叉验证法”详解请见:链接。
1
2
3
4
5
6
7
8
9
from sklearn.model_selection import KFold
kf = KFold(n_splits=10)
for train, test in kf.split(X):
train_X, test_X, train_y, test_y = X[train], X[test], y[train], y[test]
clf = DecisionTreeClassifier()
clf.fit(train_X, train_y)
predicted = clf.predict(test_X)
print(accuracy_score(test_y, predicted))
1
2
3
4
5
6
7
8
9
10
1.0
1.0
1.0
0.9333333333333333
0.9333333333333333
0.8666666666666667
1.0
0.8666666666666667
0.8
1.0
另一种方法:
1
2
3
4
5
from sklearn.model_selection import cross_val_score
acc = cross_val_score(clf, X=iris.data, y=iris.target, cv=10)
print(acc)
print(acc.mean())
1
2
3
[1. 0.93333333 1. 0.93333333 0.93333333 0.86666667
0.93333333 0.93333333 1. 1. ]
0.9533333333333334
3.留一法
“留一法”详解请见:链接。
1
2
3
4
5
6
7
8
9
10
11
from sklearn.model_selection import LeaveOneOut
res = []
loo = LeaveOneOut()
for train, test in loo.split(X):
train_X, test_X, train_y, test_y = X[train], X[test], y[train], y[test]
clf = DecisionTreeClassifier()
clf.fit(train_X, train_y)
predicted = clf.predict(test_X)
res.extend((predicted == test_y).tolist())
print(sum(res)) #143