features

python - PCA memory error in Sklearn: Alternative Dim Reduction?

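I am trying to use PCA in Sklearn to reduce the dimensionality of a very large matrix, but it produces a memory error (the RAM required exceeds 128 GB). I have already set copy=False, and I am using the computationally cheaper randomized PCA. Is there a workaround? If not, which other dimensionality reduction techniques that need less memory could I use? Thanks. Update: the matrix I am running PCA on is a set of feature vectors. It comes from passing a set of training images through a pretrained CNN. The matrix is [300000, 51200]. PCA components tried: 100 to 500. I want to reduce its dimensionality so that I can use these features to train an ML algorithm such as XGBoost. Thanks. Best answer: In the end, I used Truncate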
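
One lower-memory option can be sketched as follows. This is an assumption, not necessarily what the best answer used, since the excerpt above is cut off. scikit-learn's IncrementalPCA fits on mini-batches, so its memory use depends on the batch size and the number of features rather than on the total number of rows. The small random matrix, the batch size, and the component count below are stand-ins chosen only so the sketch runs quickly.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
# Small stand-in for the [300000, 51200] CNN feature matrix from the question.
X = rng.normal(size=(1000, 64)).astype(np.float32)

n_components = 10   # the question tried 100 to 500
batch_size = 200    # each batch must contain at least n_components rows

ipca = IncrementalPCA(n_components=n_components)

# First pass: update the decomposition one batch at a time.
for start in range(0, X.shape[0], batch_size):
    ipca.partial_fit(X[start:start + batch_size])

# Second pass: project each batch onto the learned components.
X_reduced = np.vstack([
    ipca.transform(X[start:start + batch_size])
    for start in range(0, X.shape[0], batch_size)
])
print(X_reduced.shape)  # (1000, 10)
```

With the real data, the batches would be read from disk (for example through np.memmap) instead of being sliced from an in-memory array.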

python - Numpy linear regression with regularization

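I can't find what is wrong with my regularized linear regression code. Without regularization I simply have this, which I am reasonably sure is correct:

    import numpy as np

    def get_model(features, labels):
        return np.linalg.pinv(features).dot(labels)

Here is my code for the regularized solution, and I can't see what is wrong with it:

    def get_model(features, labels, lamb=0.0):
        n_cols = features.shape[1]
        return linalg.inv(features.transpose().dot(features) + lamb * np.identity(n_cols))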

python Numpy features machine-learning linear-regression
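
The excerpt above is cut off right after the inverse, so the poster's actual bug cannot be read from it. For reference, here is a minimal sketch of the standard closed-form ridge solution, which solves (XᵀX + λI) w = Xᵀy. The synthetic data is hypothetical, and np.linalg.solve is used in place of an explicit inverse as a design choice, not something taken from the question.

```python
import numpy as np

def get_model(features, labels, lamb=0.0):
    """Closed-form ridge weights: solve (X^T X + lamb * I) w = X^T y."""
    n_cols = features.shape[1]
    gram = features.T.dot(features) + lamb * np.identity(n_cols)
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(gram, features.T.dot(labels))

# Hypothetical synthetic data for a quick sanity check.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 3))
labels = features.dot(np.array([1.0, -2.0, 0.5]))

w_unreg = np.linalg.pinv(features).dot(labels)    # unregularized version from the question
w_ridge0 = get_model(features, labels, lamb=0.0)  # should match w_unreg
w_ridge = get_model(features, labels, lamb=10.0)  # weights shrink toward zero
```

With lamb=0.0 and a full-rank feature matrix the result agrees with the pinv version, which gives a quick way to check a regularized implementation.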

python - How to index feature_importances_ of a sklearn random forest

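I used RandomForestClassifier in sklearn to determine the important features in my dataset. How can I return the actual feature names (my variables are labeled x1, x2, x3, etc.) rather than their relative names (it tells me the important features are "12", "22", etc.)? Below is the code I currently use to return the important features.

    important_features = []
    for x, i in enumerate(rf.feature_importances_):
        if i > np.average(rf.feature_importances_):
            important_features.append(str(x))
    print important_features

Also, in order to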

feature_importances importances important python scikit-learn random-forest feature-selection
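
A minimal sketch of mapping importances back to names. It relies on the entries of feature_importances_ following the column order of the training data, so they can be paired with a list of column names. The importances array here is a stand-in for rf.feature_importances_, and feature_names is a hypothetical list of the training columns.

```python
import numpy as np

# Stand-in for rf.feature_importances_ (one value per training column).
importances = np.array([0.05, 0.40, 0.10, 0.45])
# Hypothetical column names, in the same order as the training data.
feature_names = ["x1", "x2", "x3", "x4"]

threshold = np.average(importances)

# Pair each name with its importance and keep those above the average.
important_features = [
    name for name, importance in zip(feature_names, importances)
    if importance > threshold
]
print(important_features)  # ['x2', 'x4']
```

If the model was trained on a pandas DataFrame, the list of its column names would play the role of feature_names.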

python - scikit learn: desired amount of Best Features (k) not selected

I am trying to select the best features using chi-square (scikit-learn 0.10). From a total of 80 training documents, I first extract 227 features, and from these 227 features I select the top 10.

    my_vectorizer = CountVectorizer(analyzer=MyAnalyzer())
    X_train = my_vectorizer.fit_transform(train_data)
    X_test = my_vectorizer.transform(test_data)
    Y_train = np.array(train_labels)
    Y_test = np.array(test_labels)
    X_train = np.clip(X_tr

Features selected python machine-learning scikit-learn chi-squared

python - "Server response (401): You must login to access this feature" when registering a package on pypi

I am trying to register a package on PyPI. After creating a .pypirc that looks like

    [distutils]
    # this tells distutils what package indexes you can push to
    index-servers =
        pypi
        pypitest

    [pypi]
    repository: https://pypi.python.org/pypi
    username: "amfarrell"
    password: "Idontpostmypassphrasepublicly"

    [pypitest]
    repository: https://testpypi.python.org/pypi
    username: "amfarr

response pypi python setuptools distutils

python - SkLearn Multinomial NB: Most Informative Features

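Since my classifier yields about 99% accuracy on the test data, I am a bit suspicious and want to dig into the most informative features of my NB classifier to see what kind of features it is learning. The following thread was very useful: How to get most informative features for scikit-learn classifiers? As for my feature input, I am still experimenting; at the moment I am testing a simple unigram model with CountVectorizer:

    vectorizer = CountVectorizer(ngram_range=(1, 1), min_df=2, stop_words='english')

Regarding the thread above, I found the following function:

    def show_most_inform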

Informative Features python machine-learning scikit-learn classification text-classification

python - VotingClassifier: Different Feature Sets

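I have two different feature sets (so, the same number of rows and the same labels), in my case DataFrames:

df1:

    | A | B | C |
    -------------
    | 1 | 4 | 2 |
    | 1 | 4 | 8 |
    | 2 | 1 | 1 |
    | 2 | 3 | 0 |
    | 3 | 2 | 5 |

df2:

    | E | F |
    ---------
    | 6 | 1 |
    | 1 | 3 |
    | 8 | 1 |
    | 2 | 8 |
    | 5 | 2 |

Labels:

    | labels |
    ----------
    | 5 |
    | 5 |
    | 1 |
    | 7 |
    | 3 |

I want to use them to train a VotingClassifier. But the fit step only allows a single feature set to be specified. The goal is to fit clf1 with df1 and clf2 with df2.

    eclf = VotingClassifier(estimators=[('df1-clf', clf1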

Different Feature estimators python machine-learning scikit-learn
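
One possible way to get each classifier to see only its own feature set is sketched below. It is not taken from the truncated excerpt. The idea is to stack both feature sets side by side and wrap each classifier in a Pipeline whose first step is a ColumnTransformer that passes through only that classifier's columns. The helper on_columns and the choice of DecisionTreeClassifier are assumptions made for illustration, and NumPy arrays with integer column indices stand in for the DataFrames.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Values of df1 (columns A, B, C) and df2 (columns E, F) from the question.
df1_values = np.array([[1, 4, 2], [1, 4, 8], [2, 1, 1], [2, 3, 0], [3, 2, 5]])
df2_values = np.array([[6, 1], [1, 3], [8, 1], [2, 8], [5, 2]])
labels = np.array([5, 5, 1, 7, 3])

# One combined matrix: columns 0-2 come from df1, columns 3-4 from df2.
X = np.hstack([df1_values, df2_values])

def on_columns(columns, clf):
    """Hypothetical helper: restrict clf to the given columns of X."""
    selector = ColumnTransformer([("select", "passthrough", columns)])
    return Pipeline([("select", selector), ("clf", clf)])

clf1 = DecisionTreeClassifier(random_state=0)  # classifier choice is an assumption
clf2 = DecisionTreeClassifier(random_state=0)

eclf = VotingClassifier(
    estimators=[
        ("df1-clf", on_columns([0, 1, 2], clf1)),
        ("df2-clf", on_columns([3, 4], clf2)),
    ],
    voting="hard",
)
eclf.fit(X, labels)
predictions = eclf.predict(X)
```

When the combined input is a DataFrame, ColumnTransformer also accepts column names in place of integer indices.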

python - Applying TensorFlow Transform to transform/scale features in production

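Overview: I followed the following guide to write TFRecords, where I used tf.Transform to preprocess my features. Now I want to deploy my model, and for that I need to apply this preprocessing function to live data. My approach: First, suppose I have two features:

    features = ['amount', 'age']

I have the transform_fn from Apache Beam, located at working_dir=gs://path-to-transform-fn/ Then I load the transform function with:

    tf_transform_output = tft.TFTransformOutput(working_dir)

I think the simplest way to serve this in production is to get a numpy array of the processed data, and then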

production TensorFlow features transform python apache-beam tensorflow-serving tensorflow-transform

python - ValueError: Feature not in features dictionary

I am trying to write a simple deep machine learning model with TensorFlow. I am using a toy dataset I made in Excel, just to get the model working and accepting data. My code is as follows:

    import pandas as pd
    import numpy as np
    import tensorflow as tf

    raw_data = np.genfromtxt('ai/mock-data.csv', delimiter=',', dtype=str)
    my_data = np.delete(raw_data, (0), axis=0)  # deletes the first row, axis=0 indicates row, axis=1 indicates column
    my_d

dictionary features column python numpy tensorflow

python - How to efficiently create and iterate over large lists in python?

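I have data like this:

    data = {'x': Counter({'a': 1, 'b': 45}), 'y': Counter({'b': 1, 'c': 212})}

My labels are the keys of data, and the keys of the inner dictionaries are the features:

    all_features = ['a', 'b', 'c']
    all_labels = ['x', 'y']

I need to create a list of lists like this:

    [[data[label][feat] for feat in all_features] for label in all_labels]

[out]:

    [[1, 45, 0], [0, 1, 212]]

My len(all_features) is ~5,000,000 and len(all_labels) is ~100,000. The final goal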

python features all_labels list matrix scipy nested-lists
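
At the sizes mentioned, a dense list of lists would hold roughly 5,000,000 × 100,000 entries, most of them zero. The excerpt is cut off before stating the final goal, so the following is only a sketch under the assumption that a sparse matrix is an acceptable result. It stores just the nonzero counts in a scipy.sparse.csr_matrix, and feature_index is a helper name introduced here.

```python
from collections import Counter
from scipy.sparse import csr_matrix

data = {'x': Counter({'a': 1, 'b': 45}), 'y': Counter({'b': 1, 'c': 212})}
all_features = ['a', 'b', 'c']
all_labels = ['x', 'y']

# Map each feature name to its column index once, up front.
feature_index = {feat: j for j, feat in enumerate(all_features)}

rows, cols, values = [], [], []
for i, label in enumerate(all_labels):
    # Visit only the features that actually occur for this label.
    for feat, count in data[label].items():
        rows.append(i)
        cols.append(feature_index[feat])
        values.append(count)

matrix = csr_matrix(
    (values, (rows, cols)),
    shape=(len(all_labels), len(all_features)),
)
print(matrix.toarray())  # dense view, only sensible for small examples
```

The loop touches each stored count once, so the work scales with the number of nonzero entries rather than with len(all_features) × len(all_labels).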
