SciKit-Learn

python - Using multiple classifiers: how do I measure the ensemble's performance? [SciKit Learn]

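I have a classification problem (predicting whether a sequence belongs to a class), and I decided to use several classification methods to help filter out false positives. (The problem is in bioinformatics: classifying protein sequences as neuropeptide precursor sequences. Here's the original article, if anyone is interested, and the code used to generate features and to train a single predictor.) The classifiers now have roughly similar performance metrics (83-94% accuracy/precision etc. on the training set with 10-fold CV), so my "naive" approach was to simply use multiple classifiers (Random Forest, ExtraTrees, SVM (linear kernel), SVM (RBF kernel) and GRB) and use a simple majority…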
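One common way to measure a majority-vote ensemble is to wrap the classifiers in scikit-learn's VotingClassifier with voting="hard" and cross-validate the ensemble as if it were a single model, so it gets the same 10-fold metrics as the individual classifiers. A minimal sketch, assuming synthetic data from make_classification in place of the real sequence features, placeholder hyperparameters, and that "GRB" refers to gradient boosting:

```python
# Sketch: evaluate a hard-voting (majority vote) ensemble with 10-fold CV.
# Synthetic data stands in for the real sequence features (assumption).
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# The five model types named in the question; hyperparameters are placeholders.
members = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("svm_lin", SVC(kernel="linear")),
    ("svm_rbf", SVC(kernel="rbf")),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# voting="hard" predicts the class label chosen by the majority of members.
ensemble = VotingClassifier(estimators=members, voting="hard")

# Score the ensemble exactly like a single model: cross-validate the whole thing.
scores = cross_val_score(ensemble, X, y, cv=10, scoring="precision")
print(f"ensemble precision: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Because the whole VotingClassifier is refit inside each fold, the reported score reflects the ensemble's behaviour on held-out data rather than that of any single member.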

python - 有没有更快的运行 GridsearchCV 的方法

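I'm optimizing some parameters for an SVC in sklearn, and the biggest problem is having to wait 30 minutes before I can try any other parameter range. Worse, I'd like to try more C and gamma values within the same range (so I can create a smoother surface plot), but I know it will take longer and longer... When I ran it today I changed cache_size from 200 to 600 (without really knowing what it does) to see whether it made a difference. The time dropped by about a minute. Is there anything I can do about this, or do I just have to live with the long runtimes? clf = svm.SVC(kernel="rbf", probability=True, cache_size=600) gamma_range = [1e-7, 1e-6, 1e-5, 1e-4, 1e-…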

python scikit-learn svc grid-search
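Two settings in the question's SVC are likely worth checking. With probability=True, SVC runs an extra internal 5-fold cross-validation during fit to calibrate its probability estimates, which slows down every grid point; cache_size, by contrast, is only the kernel cache size in MB, which is consistent with the modest gain observed. A minimal sketch, assuming probability estimates are not needed during the search, on synthetic data with illustrative grid ranges:

```python
# Sketch: an SVC grid search configured to run faster.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# probability=False skips the internal cross-validation that probability=True
# performs to calibrate predict_proba, so each fit is cheaper.
clf = SVC(kernel="rbf", probability=False, cache_size=600)

# Log-spaced grids; these ranges are illustrative, not the asker's.
param_grid = {
    "C": np.logspace(-2, 3, 6),
    "gamma": np.logspace(-5, 0, 6),
}

# n_jobs=-1 evaluates grid points in parallel on all available CPU cores.
search = GridSearchCV(clf, param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

If the grid keeps growing, RandomizedSearchCV is an alternative that samples a fixed number of (C, gamma) combinations instead of trying every pair.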

python - How do I use the output of OneHotEncoder in sklearn?

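I have a Pandas DataFrame with 2 categorical variables, an ID variable and a target variable (for classification). I managed to convert the categorical values with OneHotEncoder, which results in a sparse matrix. ohe = OneHotEncoder() # First I remapped the string values in the categorical variables to integers, as OneHotEncoder needs integers as input ...remapping code... ohe.fit(df[['col_a', 'col_b']]) ohe.transform(df[['col_a', 'col_b']]) But I don't know how to use it in Decisi…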

python pandas scikit-learn classification one-hot-encoding
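The encoded sparse matrix can be used directly as the feature matrix X; the main point is to keep the fitted encoder and reuse it for any new data. A minimal sketch, assuming a hypothetical DataFrame with the question's column names plus made-up id and target columns, and a DecisionTreeClassifier as the model the truncated question appears to refer to:

```python
# Sketch: feed OneHotEncoder's sparse output straight into a decision tree.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the question's DataFrame.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "col_a": ["red", "blue", "red", "green", "blue", "green"],
    "col_b": ["s", "m", "l", "s", "m", "l"],
    "target": [0, 1, 0, 1, 1, 0],
})

# Recent OneHotEncoder versions accept string categories directly, so no
# integer remapping is needed; unseen categories at predict time become all-zero.
ohe = OneHotEncoder(handle_unknown="ignore")
X = ohe.fit_transform(df[["col_a", "col_b"]])  # scipy sparse matrix; id is excluded

# DecisionTreeClassifier accepts sparse input, so no .toarray() is required.
tree = DecisionTreeClassifier(random_state=0).fit(X, df["target"])

# New rows must go through the *same* fitted encoder before predicting.
new_rows = pd.DataFrame({"col_a": ["red", "purple"], "col_b": ["m", "s"]})
preds = tree.predict(ohe.transform(new_rows))
print(preds)
```

Wrapping the encoder and the tree in a Pipeline would achieve the same reuse automatically.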

python - How do I pass arguments to the scoring function in scikit-learn's LogisticRegressionCV call?

Problem: I'm trying to use scikit-learn's LogisticRegressionCV with roc_auc_score as the scoring metric. from sklearn.linear_model import LogisticRegression from sklearn.metrics import roc_auc_score clf = LogisticRegressionCV(scoring=roc_auc_score) But when I try to fit the model (clf.fit(X, y)), it throws an error: ValueError: average has to be one of (None, 'micro', 'macro', 'weighted', 's…

python function arguments scikit-learn scoring
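The scoring parameter expects either a scorer name such as "roc_auc" or a scorer callable with the signature scorer(estimator, X, y). Passing the metric function roc_auc_score itself means it gets called with the wrong arguments, which is consistent with the error about average. A minimal sketch on synthetic binary data:

```python
# Sketch: use ROC AUC as the model-selection metric in LogisticRegressionCV.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# scoring takes a scorer name (or a callable scorer(estimator, X, y)),
# not a raw metric function such as roc_auc_score(y_true, y_score).
clf = LogisticRegressionCV(Cs=10, cv=5, scoring="roc_auc", max_iter=1000)
clf.fit(X, y)

print("chosen C:", clf.C_[0])
# LogisticRegressionCV.score also uses the scoring option, so this is an AUC.
print("AUC on training data:", round(clf.score(X, y), 3))
```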

python - How do I compute the confusion matrix for multiclass classification in Scikit?

I have a multiclass classification task. When I run my script, based on the scikit example, as follows: classifier = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=70, max_depth=3, learning_rate=.02)) y_pred = classifier.fit(X_train, y_train).predict(X_test) cnf_matrix = confusion_matrix(y_test, y_pred) I get this error: File "C:\ProgramData\Anaconda2\lib\site-packages\s…

python scikit-learn classification confusion-matrix
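confusion_matrix works for multiclass problems as long as y_test and y_pred are 1-D arrays of class labels. The excerpt's error message is cut off, so the following is only an assumption about the cause: if the labels were binarized into one indicator column per class (common when following the OneVsRestClassifier examples), convert them back to class indices first. A minimal sketch with made-up labels:

```python
# Sketch: multiclass confusion matrix, including the one-hot label case.
import numpy as np
from sklearn.metrics import confusion_matrix

# 1-D class labels: confusion_matrix handles multiclass directly.
y_test = np.array([0, 1, 2, 2, 1, 0, 2])
y_pred = np.array([0, 2, 2, 2, 1, 0, 1])
cm = confusion_matrix(y_test, y_pred)  # rows = true class, columns = predicted
print(cm)

# If labels were binarized (one indicator column per class), collapse each row
# back to a class index. Caveat: an all-zero predicted row maps to class 0,
# so training on 1-D labels is preferable when possible.
Y_test_onehot = np.eye(3, dtype=int)[y_test]
Y_pred_onehot = np.eye(3, dtype=int)[y_pred]
cm_from_onehot = confusion_matrix(
    Y_test_onehot.argmax(axis=1), Y_pred_onehot.argmax(axis=1)
)
print(np.array_equal(cm, cm_from_onehot))  # True
```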

python - Training logistic regression for multiclass classification with scikit-learn

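According to the scikit multiclass classification documentation, logistic regression can be used for multiclass classification by setting multi_class='multinomial' in the constructor. But doing so raises an error. Code: text_clf = Pipeline([('vect', TfidfVectorizer()), ('clf', LogisticRegression(multi_class='multinomial')),]) text_clf = text_clf.fit(X_train, Y_train) Error: ValueError: Solver liblinear does not support a multinomial backend. Can you tell me what's wrong here? Note: keeping multi_class…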

python scikit-learn classification
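The error comes from the solver: liblinear (the default solver in older scikit-learn releases) only supports one-vs-rest, so a multinomial model needs a solver such as lbfgs, newton-cg, sag, or saga. A minimal sketch, assuming a tiny made-up corpus; it sets the solver rather than multi_class, since recent releases deprecate multi_class and lbfgs already fits a multinomial model when there are more than two classes:

```python
# Sketch: multinomial logistic regression on TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny hypothetical corpus with three classes.
X_train = [
    "cheap flights to paris", "book a hotel in rome",
    "stock prices fell today", "markets rally on earnings",
    "team wins the final", "striker scores twice",
]
Y_train = ["travel", "travel", "finance", "finance", "sport", "sport"]

# liblinear cannot fit a multinomial model; lbfgs can, and with more than
# two classes it fits a single multinomial (softmax) model.
text_clf = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", LogisticRegression(solver="lbfgs", max_iter=1000)),
])
text_clf.fit(X_train, Y_train)

print(text_clf.predict(["hotel deals in paris"]))
print(text_clf.predict_proba(["hotel deals in paris"]).round(2))
```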

python - Imputation on certain DataFrame columns in Python

I'm learning how to use Imputer in Python. This is my code: df = pd.DataFrame([["XXL", 8, "black", "class1", 22], ["L", np.nan, "gray", "class2", 20], ["XL", 10, "blue", "class2", 19], ["M", np.nan, "orange", "class1", 17], ["M", 11, "green", "class3", np.nan], ["M", 7, "red", "class1", 22]]) df.columns = ["size", "price", "color", "class", "boh"] from sklearn.p…

python scikit-learn missing-data imputation
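Imputer from sklearn.preprocessing was removed in later scikit-learn releases in favor of sklearn.impute.SimpleImputer. To impute only some columns, fit the imputer on just those columns and write the result back. A sketch using the question's DataFrame, assuming mean imputation of the numeric columns price and boh is what is wanted:

```python
# Sketch: impute only the numeric columns of the question's DataFrame.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame([
    ["XXL", 8, "black", "class1", 22],
    ["L", np.nan, "gray", "class2", 20],
    ["XL", 10, "blue", "class2", 19],
    ["M", np.nan, "orange", "class1", 17],
    ["M", 11, "green", "class3", np.nan],
    ["M", 7, "red", "class1", 22],
])
df.columns = ["size", "price", "color", "class", "boh"]

# Fit the imputer on the numeric columns only; the string columns are untouched.
num_cols = ["price", "boh"]
imp = SimpleImputer(strategy="mean")
df[num_cols] = imp.fit_transform(df[num_cols])

# price NaNs become mean(8, 10, 11, 7) = 9.0
# boh NaN becomes mean(22, 20, 19, 17, 22) = 20.0
print(df)
```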

python - Scikit-learn: preprocessing.scale() vs. preprocessing.StandardScaler()

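I understand that scaling means centering on the mean (mean = 0) and scaling to unit variance (variance = 1). But what is the difference between preprocessing.scale(x) and preprocessing.StandardScaler() in scikit-learn?

Best answer: They do exactly the same thing, but preprocessing.scale(x) is just a function that transforms some data, while preprocessing.StandardScaler() is a class that supports the Transformer API. I would always use the latter, even if I didn't need inverse_transform and co. supported by StandardSc…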

python scikit-learn scale
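The difference described in the answer can be checked directly: on the same data both give identical output, but only the fitted StandardScaler keeps the mean and standard deviation, so it can apply the same transformation to new data and invert it. A minimal sketch with made-up arrays:

```python
# Sketch: scale() and StandardScaler agree on the data they are fitted to;
# only the scaler object remembers the statistics.
import numpy as np
from sklearn import preprocessing

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
X_new = np.array([[4.0, 30.0]])

X_scaled_fn = preprocessing.scale(X_train)

scaler = preprocessing.StandardScaler().fit(X_train)
X_scaled_cls = scaler.transform(X_train)
print(np.allclose(X_scaled_fn, X_scaled_cls))  # True

# The fitted scaler reuses the training mean/std on new data...
print(scaler.transform(X_new))
# ...and can undo the transformation.
print(np.allclose(scaler.inverse_transform(X_scaled_cls), X_train))  # True
```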

python - How to apply LabelEncoder to a specific column in a Pandas DataFrame

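I have a dataset loaded into a DataFrame, and its class labels need to be encoded with LabelEncoder from scikit-learn. The label column is the class label column, with the following classes: ['Standing', 'Walking', 'Running', 'null'] To perform the label encoding I tried the following, but it doesn't work. How can I fix it? from sklearn import preprocessing import pandas as pd df = pd.read_csv('dataset.csv', sep=',') df.apply(preprocessing.LabelEncoder().fit_transform(df['label']))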

python python-3.x machine-learning scikit-learn label-encoding
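The attempt fails because df.apply expects a function, while preprocessing.LabelEncoder().fit_transform(df['label']) has already returned an array. Assigning that array back to the column is enough. A minimal sketch, assuming a hypothetical in-memory DataFrame in place of dataset.csv:

```python
# Sketch: encode just the 'label' column and write the result back.
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for pd.read_csv('dataset.csv'). If the CSV really
# contains the text "null", pass keep_default_na=False to read_csv, because
# pandas otherwise parses "null" as NaN.
df = pd.DataFrame({
    "x": [0.1, 0.5, 0.9, 0.2, 0.4],
    "label": ["Standing", "Walking", "Running", "null", "Walking"],
})

le = preprocessing.LabelEncoder()
# fit_transform returns an array of integer codes; assign it to the column
# instead of passing it to df.apply.
df["label"] = le.fit_transform(df["label"])

print(le.classes_.tolist())   # ['Running', 'Standing', 'Walking', 'null']
print(df["label"].tolist())   # [1, 2, 0, 3, 2]
```

le.inverse_transform can later map the integer codes back to the original class names.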

python - What does this message mean? from: can't read /var/mail/ex48 (Learn Python the Hard Way ex49)

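This question already has an answer here: Getting Python error "from: can't read /var/mail/Bio" (7 answers). Closed 6 months ago. In ex49, we are told to call the lexicon.py file created in ex48 with the following command. When I try to import the lexicon file with >>> from ex48 import lexicon it returns the following: from: can't read /var/mail/ex48 I've tried looking this up. What does this mean? Is the file in the wrong place?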

python
