pandas_草庐IT

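python - How to get the box plot data from a matplotlib box plot

I need the statistics that Pandas generates when it draws a box plot from a DataFrame: quartile 1, quartile 2, quartile 3, the lower whisker value, the upper whisker value, and the outliers. I tried the following to draw the box plot.

    import numpy as np
    import pandas as pd
    df = pd.DataFrame(np.random.rand(100, 5), columns=['A', 'B', 'C', 'D', 'E'])
    pd.DataFrame.boxplot(df, return_type='both')

Is there a way to get these values instead of computing them manually?

Best answer: One option is to use the y data from the plot, which is probably most useful for the outliers (fliers).

    _, bp = pd.D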
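
The quoted answer is cut off, so the following is only a minimal sketch of one alternative: computing the same statistics directly from the data with the usual 1.5 * IQR whisker rule. The helper name box_stats and the whis parameter are hypothetical, introduced here for illustration. matplotlib also ships matplotlib.cbook.boxplot_stats, which returns comparable values without drawing anything.

```python
import pandas as pd

def box_stats(df, whis=1.5):
    """Hypothetical helper: box plot statistics per column (1.5 * IQR rule)."""
    stats = {}
    for col in df.columns:
        s = df[col].dropna()
        # Quartiles with pandas' default linear interpolation
        q1, med, q3 = s.quantile([0.25, 0.5, 0.75])
        iqr = q3 - q1
        lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
        # Whiskers end at the most extreme points still inside the fences
        inside = s[(s >= lo_fence) & (s <= hi_fence)]
        stats[col] = {
            "q1": q1,
            "median": med,
            "q3": q3,
            "whislo": inside.min(),
            "whishi": inside.max(),
            "fliers": s[(s < lo_fence) | (s > hi_fence)].tolist(),
        }
    return stats

example = box_stats(pd.DataFrame({"A": [1, 2, 3, 4, 5, 100]}))
print(example["A"])
```

Returning a plain dict keeps the sketch simple; the values could be collected into a DataFrame if a tabular summary is preferred.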

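python - pandas - number of occurrences of each unique row in a DataFrame

How can I count the occurrences of each unique row in a DataFrame?

    data = {'x1': ['A', 'B', 'A', 'A', 'B', 'A', 'A', 'A'],
            'x2': [1, 3, 2, 2, 3, 1, 2, 3]}
    df = pd.DataFrame(data)
    df

      x1  x2
    0  A   1
    1  B   3
    2  A   2
    3  A   2
    4  B   3
    5  A   1
    6  A   2
    7  A   3

I would like to get:

      x1  x2  count
    0  A   1      2
    1  A   2      3
    2  A   3      1
    3  B   3      2

Best answer: IIUC, you can pass as_index=False as an argument to groupby:

    In [100]: df.groupby(['x1','x2'],as_index=False).coun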

python pandas
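
Since the quoted answer is truncated, here is a small sketch of one way to produce the desired table. It uses groupby(...).size() rather than the as_index=False variant the answer starts to describe, so treat it as an alternative, not as the original answer.

```python
import pandas as pd

data = {'x1': ['A', 'B', 'A', 'A', 'B', 'A', 'A', 'A'],
        'x2': [1, 3, 2, 2, 3, 1, 2, 3]}
df = pd.DataFrame(data)

# size() counts the rows in each (x1, x2) group;
# reset_index turns the group keys back into columns and names the counts
counts = df.groupby(['x1', 'x2']).size().reset_index(name='count')
print(counts)
```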

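python - How to keep column names when converting from pandas to numpy

According to this post, I should be able to access the names of the columns in an ndarray as a.dtype.names. However, if I convert a pandas DataFrame to an ndarray with df.as_matrix() or df.values, the dtype.names field is None. Also, if I try to assign the column names to the ndarray

    X = pd.DataFrame(dict(age=[40., 50., 60.], sys_blood_pressure=[140., 150., 160.]))
    print(X)
    print(type(X.as_matrix()))  #
    print(type(X.as_matrix()[0]))  #
    m = X.as_matrix()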

column-names python pandas numpy
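
The snippet stops before any answer, so the following is only a sketch of one approach that keeps the names: DataFrame.to_records builds a NumPy record array whose dtype carries the column names. Note that df.as_matrix() was removed in pandas 1.0, so the sketch avoids it.

```python
import pandas as pd

X = pd.DataFrame(dict(age=[40., 50., 60.],
                      sys_blood_pressure=[140., 150., 160.]))

# A plain 2-D float array has no field names
plain = X.to_numpy()
print(plain.dtype.names)

# A record array keeps one named field per column
records = X.to_records(index=False)
print(records.dtype.names)
print(records['age'])
```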

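python - Pandas join on columns with different names

This question already has answers here: Pandas Merging 101 (8 answers). Closed 3 years ago.

I have two different DataFrames on which I want to perform some SQL-style operations. Unfortunately, as is typical of the data I work with, the spellings are often different. See the example below for what I think the syntax would look like, where userid belongs to df1 and username belongs to df2. Can anyone help me?

    # not working - I assume some syntax issue?
    pd.merge(df1, df2, on=[['userid'=='username', 'column1']], how='left')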

python pandas sql merge
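
A minimal sketch of how such a merge is usually written: left_on and right_on name the key columns on each side. The sample frames and the value columns v1 and v2 are made up for illustration; only userid, username, and column1 come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the key column names come from the question
df1 = pd.DataFrame({'userid': ['a', 'b', 'c'],
                    'column1': [1, 2, 3],
                    'v1': [10, 20, 30]})
df2 = pd.DataFrame({'username': ['a', 'b', 'd'],
                    'column1': [1, 2, 3],
                    'v2': [100, 200, 300]})

# Pair userid with username and column1 with column1, keeping all rows of df1
merged = pd.merge(df1, df2,
                  left_on=['userid', 'column1'],
                  right_on=['username', 'column1'],
                  how='left')
print(merged)
```

Rows of df1 without a match in df2 keep their own values and get NaN in the columns that come from df2.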

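python - Convert a string to a date [with year and quarter]

I have a pandas DataFrame in which one column contains year-and-quarter strings in the following format: 2015Q1

My question: how do I convert this into two datetime columns, one for the year and one for the quarter?

Best answer: You can use split, then convert the year column to int and, if necessary, add Q back to the q column:

    df = pd.DataFrame({'date': ['2015Q1', '2015Q2']})
    print(df)

         date
    0  2015Q1
    1  2015Q2

    df[['year', 'q']] = df.date.str.split('Q', expand=True)
    df.year = df.year.astype(int)
    d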

python date pandas
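
As an alternative to the split-based answer quoted above, the strings can also be parsed as quarterly periods. This is a sketch under the assumption that every value follows the 2015Q1 pattern; the column names year, quarter, and quarter_start are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2015Q1', '2015Q2']})

# Parse the strings as quarterly periods
periods = pd.PeriodIndex(df['date'], freq='Q')

df['year'] = periods.year
df['quarter'] = periods.quarter
# First day of each quarter as a real datetime column
df['quarter_start'] = periods.to_timestamp()
print(df)
```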

Python Numpy TypeError: ufunc 'isfinite' not supported for the input types

Here is my code:

    def topK(dataMat, sensitivity):
        meanVals = np.mean(dataMat, axis=0)
        meanRemoved = dataMat - meanVals
        covMat = np.cov(meanRemoved, rowvar=0)
        eigVals, eigVects = np.linalg.eig(np.mat(covMat))

I get the error in the title on the last line above. I suspect it is related to data types, so here is an image of the variables and their data types from the Variable Explorer in Spyder:

I tried changing np.linalg.eig(np.mat(covMat)) to np.linalg.eig(np.array(

python arrays pandas numpy eigenvalue
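
The snippet ends before any answer. One common cause of this error is an array with dtype=object (for example, data taken from a DataFrame with mixed or string-typed columns); whether that is the cause here is an assumption, since the referenced image is not available. The sketch below reproduces the error on a made-up object array and shows that casting to float lets the eigendecomposition run.

```python
import numpy as np

# Made-up data stored with dtype=object, as can happen with mixed DataFrame columns
data = np.array([[1.0, 2.0], [3.0, 4.5], [5.0, 6.0]], dtype=object)

# isfinite has no implementation for object arrays
raised = False
try:
    np.isfinite(data)
except TypeError:
    raised = True

# Casting to a numeric dtype first avoids the problem
data_float = data.astype(float)
mean_removed = data_float - data_float.mean(axis=0)
cov = np.cov(mean_removed, rowvar=False)
eig_vals, eig_vecs = np.linalg.eig(cov)
print(raised, eig_vals)
```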

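python - Return an index value as a string

I am trying to return the index of a value as a string. The other questions I have seen here return the index as a list. The error thrown is: You returned a variable of type , and we expected type

My code:

    String_to_be_returned = (df['Column'].index[df['Column'] == 2])

Example: when I print String_to_be_returned, I get this:

    Index(['美国'], dtype='object', name='国家名称')

Best answer: I think you need to add [0] to select the first value of the index, which is an array:

    String_to_be_returned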

python pandas
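
The idea in the answer, selecting the first element of the filtered index, can be sketched as follows. The sample frame, its labels, and the index name are invented for illustration.

```python
import pandas as pd

# Hypothetical data; the index holds the labels we want back as strings
df = pd.DataFrame({'Column': [1, 2, 3]},
                  index=pd.Index(['Canada', 'United States', 'Mexico'],
                                 name='Country Name'))

# Boolean filtering of the index returns an Index object, not a string
matches = df['Column'].index[df['Column'] == 2]

# Taking element 0 yields the label itself
String_to_be_returned = matches[0]
print(type(matches).__name__, String_to_be_returned)
```

If no row matches, matches is empty and matches[0] raises an IndexError, so a length check may be worth adding in real code.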

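python - Return a tuple of the index and the .max() value?

I am trying to return a tuple of the index (the person's name below) and the maximum of the "%" column below. When I create a DataFrame and try df['%'].max(), Pandas always returns only the value and not the index. However, I want to create a tuple from the key-value pair of the index and the maximum value in the "%" column. I am sure this is a newbie question, thank you for helping me! Here is some sample data:

           Points_Scored  Possible_Points  %      FavoriateFood
    Jan    60             200              0.3    Pudding
    Jane   87             200              0.435  Pizza
    Bob    54             200              0.27   Salad
    Bubba  42             200              0.21   Salsa
    Jack   98             200              0.49   Avacodo
    John   45             200              0.225  Bacon
    Mike   63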

python python-3.x pandas dataframe tuples max
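
No answer survives in the snippet. A short sketch of one way to get the pair is Series.idxmax for the label plus Series.max for the value. Only the complete rows of the sample data are used, and the variable name best is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Points_Scored': [60, 87, 54, 42, 98, 45],
     'Possible_Points': [200] * 6},
    index=['Jan', 'Jane', 'Bob', 'Bubba', 'Jack', 'John'])
df['%'] = df['Points_Scored'] / df['Possible_Points']

# idxmax gives the index label of the largest value, max gives the value itself
best = (df['%'].idxmax(), df['%'].max())
print(best)
```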

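python - Geometric mean applied to rows

I have this DataFrame as an example:

    Col1  Col2  Col3  Col4
    1     2     3     2.2

I want to add a 4th column named "Gmean" that calculates the geometric mean of the first 3 columns of each row. How can this be done? Thanks!

Best answer: One way is to use Scipy's geometric mean function:

    from scipy.stats.mstats import gmean
    df['Gmean'] = gmean(df.iloc[:, :3], axis=1)

Another way is to use the formula of the geometric mean itself:

    df['Gmean'] = np.power(df.iloc[:, :3].prod(axis=1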

python pandas numpy scipy
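
Because the second formula in the answer is cut off, the sketch below does not try to complete it. It shows a third, equivalent route that needs only NumPy: the geometric mean is the exponential of the mean of the logarithms, which assumes all values are strictly positive. The second sample row is added here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1.0, 4.0],
                   'Col2': [2.0, 4.0],
                   'Col3': [3.0, 4.0]})

# Geometric mean per row: exp(mean(log(x))), valid for positive values only
first_three = df.iloc[:, :3]
df['Gmean'] = np.exp(np.log(first_three).mean(axis=1))
print(df)
```

Working in log space also avoids overflow when the row products would become very large.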

python - pandas groupby count, sum, and mean

I have the following DF in Pandas:

    +---------+--------+--------------------+
    | keyword | weight | otherkeywords      |
    +---------+--------+--------------------+
    | dog     | 0.12   | [cat,horse,pig]    |
    | cat     | 0.5    | [dog,pig,camel]    |
    | horse   | 0.07   | [dog,camel,cat]    |
    | dog     | 0.1    | [cat,horse]        |
    | dog     | 0.2    | [cat,horse,pig]    |
    | horse   | 0.3    | [camel]            |
    +---------+--------+-----

python python-3.x pandas groupby
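
The question text is cut off after the table, so the exact target is unknown. Assuming the goal is the count, sum, and mean of weight per keyword (as the title suggests), a minimal sketch could look like this:

```python
import pandas as pd

df = pd.DataFrame({
    'keyword': ['dog', 'cat', 'horse', 'dog', 'dog', 'horse'],
    'weight': [0.12, 0.5, 0.07, 0.1, 0.2, 0.3],
})

# One row per keyword, with count, sum, and mean of the weight column
summary = df.groupby('keyword')['weight'].agg(['count', 'sum', 'mean'])
print(summary)
```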