python_Pandas_草庐IT

Solving pandas.errors.InvalidIndexError: (slice(None, None, None), None)

Traceback (most recent call last):
  File "D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 142, in pandas._libs.index.IndexEngine.get
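The key `(slice(None, None, None), None)` is the tuple Python builds for NumPy-style indexing such as `obj[:, None]`. A DataFrame does not support that kind of multi-axis indexing, so pandas treats the whole tuple as a column label, fails to find it, and raises `InvalidIndexError`. The traceback above is cut off before the calling code, so this is only one plausible cause. The sketch reproduces the error on a small hypothetical frame (`df` and the helper `reproduce_error` are illustrative names). It then shows a common workaround: convert to a NumPy array before adding the extra axis.

```python
import pandas as pd

# Hypothetical example frame; the column names are illustrative only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})


def reproduce_error(frame: pd.DataFrame) -> str:
    """Try NumPy-style indexing on a DataFrame and describe what happens."""
    try:
        frame[:, None]  # the key passed to __getitem__ is (slice(None, None, None), None)
    except pd.errors.InvalidIndexError as exc:
        return f"{type(exc).__name__}: {exc}"
    return "no error"


print(reproduce_error(df))

# Workaround: convert the column to a NumPy array first, then add the new axis.
column_2d = df["a"].to_numpy()[:, None]
print(column_2d.shape)  # (3, 1)
```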

python - Pandas: dropping null values when calling to_json

I have a pandas DataFrame that I want to save in JSON format. The pandas documentation says:
Note: NaN's, NaT's and None will be converted to null and datetime objects will be converted based on the date_format and date_unit parameters.
Then, using the orient option records, I get something like this:
[{"A":1,"B":4,"C":7},{"A":null,"B":5,"C":null},{"A":3,"B":null,"C":null}]
Is it possible to get this instead:
[{"A":1,"B":4,"C":7},{"B
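I am not aware of a `to_json` parameter that omits nulls. One workaround is to let pandas serialise as usual (so NaN, NaT and None all become `null`) and then strip the null entries from each record. Below is a minimal sketch using the values implied by the question; the helper name `to_json_without_nulls` is hypothetical. Columns that contain missing values are stored as floats, so the numbers come out as `1.0` rather than `1`.

```python
import json

import pandas as pd


def to_json_without_nulls(frame: pd.DataFrame) -> str:
    # Serialise with pandas first; every missing value becomes JSON null (Python None).
    records = json.loads(frame.to_json(orient="records"))
    # Drop keys whose value is null, record by record.
    cleaned = [{key: value for key, value in rec.items() if value is not None}
               for rec in records]
    return json.dumps(cleaned)


df = pd.DataFrame({"A": [1, None, 3], "B": [4, 5, None], "C": [7, None, None]})
result = to_json_without_nulls(df)
print(result)
```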

python - Pandas boolean values: .any() and .all()

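I kept getting ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). when doing boolean tests with pandas. I didn't understand what it was saying, so I decided to try to figure it out, but now I am completely confused. Here I created a DataFrame with two variables that share one data point (3):
In [75]: import pandas as pd
df = pd.DataFrame()
df['x'] = [1,2,3]
df['y'] = [3,4,5]
Now I try all(is x less than y), which I translate as "are all the values of x less than y", and I get an answer that doesn't make sense. I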
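The comparison `df['x'] < df['y']` is element-wise. It pairs values by row label and returns a boolean Series, not a single `True`/`False`. An `if` needs one truth value, which is why pandas raises the "truth value ... is ambiguous" error and asks you to reduce the Series explicitly with `.any()` or `.all()`. The sketch below uses the question's frame. The shared value 3 sits in different rows, so a row-wise `==` never matches, while `isin` checks membership regardless of position.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5]})

less = df["x"] < df["y"]                          # element-wise: [True, True, True]
all_less = bool(less.all())                       # True: each x is below the y in its row
equal_any = bool((df["x"] == df["y"]).any())      # False: 1 vs 3, 2 vs 4, 3 vs 5
shared_any = bool(df["x"].isin(df["y"]).any())    # True: 3 appears somewhere in y

try:
    if less:  # a whole Series cannot serve as a single condition
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

print(all_less, equal_any, shared_any, ambiguous)  # True False True True
```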

python - Pandas transform() vs. apply()

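I don't understand why apply and transform return different dtypes when called on the same data frame. The way I used to explain these two functions to myself was, roughly, "apply collapses the data, and transform does exactly the same thing as apply but preserves the original index and doesn't collapse." Consider the following.
df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4], 'cat': [1,1,0,0,1,0,0,0,0,1]})
Let's identify the ids that have a nonzero entry in the cat column.
>>> df.groupby('id')['cat'].apply(lambda x: (x == 1).any())
id
1     True
2     True
3    False
4     Tr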
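A small sketch of the shape difference, using the question's frame. With a function that returns a scalar, `apply` yields one row per group. `transform` broadcasts each group's result back onto that group's rows, so its output has the same index (and length) as `df`. The question reports that the dtypes differ as well. Printing `.dtype` on both results is the simplest way to see what your pandas version does, so the sketch prints it rather than assuming a value. The name `kept` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4],
                   'cat': [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]})
grouped = df.groupby('id')['cat']

# One value per id: the result is indexed by the group keys 1..4.
per_group = grouped.apply(lambda x: (x == 1).any())

# One value per original row: each group's scalar is repeated for its rows.
per_row = grouped.transform(lambda x: (x == 1).any())

print(per_group.dtype, len(per_group))
print(per_row.dtype, len(per_row))

# Typical use of the row-aligned result: keep rows whose id has any cat == 1.
kept = df[per_row.astype(bool)]
```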

python - Pandas: merged (inner join) DataFrame has more rows than the original

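I'm using Python 3.4 in Jupyter Notebook and trying to merge two data frames as follows:
df_A.shape
(204479, 2)
df_B.shape
(178, 3)
new_df = pd.merge(df_A, df_B, how='inner', on='my_icon_number')
new_df.shape
(266788, 4)
I thought the merged new_df should have fewer rows than df_A, because the merge is an inner join. So why does new_df actually have more rows than df_A? This is what I actually want:
My df_A looks like this:
id    my_icon_number
-----------------------------
A1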
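An inner join drops unmatched keys, but it emits one output row for every matching pair of rows. If `my_icon_number` repeats in `df_B`, each matching row of `df_A` is duplicated, and the result can be longer than `df_A`. The toy frames below are hypothetical and only reuse the column names from the question. They show the blow-up and two guards against it: de-duplicating the right-hand keys, or having `merge` check the relationship with `validate="many_to_one"`.

```python
import pandas as pd

# Hypothetical data: key 10 appears twice in df_B.
df_A = pd.DataFrame({"id": ["A1", "A2", "A3"], "my_icon_number": [10, 10, 20]})
df_B = pd.DataFrame({"my_icon_number": [10, 10, 20], "label": ["x", "y", "z"]})

merged = pd.merge(df_A, df_B, how="inner", on="my_icon_number")
print(merged.shape)  # (5, 3): A1 and A2 each match two df_B rows, A3 matches one

# Guard 1: keep a single df_B row per key before merging.
deduped = pd.merge(df_A, df_B.drop_duplicates(subset="my_icon_number"),
                   how="inner", on="my_icon_number")
print(deduped.shape)  # (3, 3)

# Guard 2: make merge raise if df_B's keys are not unique.
try:
    pd.merge(df_A, df_B, how="inner", on="my_icon_number", validate="many_to_one")
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True
print(validation_failed)  # True
```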

python - Using the Pandas DataFrame index as x-axis values in a matplotlib plot

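I have time series in a Pandas DataFrame with a number of columns that I want to plot. Is there a way to make the x-axis always use the index of the DataFrame? When I use the Pandas .plot() method, the x-axis is formatted correctly, but when I pass my dates and the column I want to plot directly to matplotlib, the graph doesn't plot correctly. Thanks in advance.
plt.plot(site2.index.values, site2['Cl'])
plt.show()
FYI: site2.index.values produces this (I cut out the middle part for brevity):
array(['1987-07-25T12:30:00.000000000+0200', '1987-07-25T16: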
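Passing the `DatetimeIndex` itself, rather than `index.values` (a raw NumPy `datetime64` array), usually lets matplotlib place the points on a date axis. Alternatively, `Series.plot(ax=...)` uses the index as the x values automatically. This is a sketch under assumptions: `site2` is a hypothetical stand-in with made-up readings, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for site2: five readings four hours apart.
index = pd.date_range("1987-07-25 12:30", periods=5, freq=pd.Timedelta(hours=4))
site2 = pd.DataFrame({"Cl": [1.0, 2.5, 2.0, 3.2, 2.8]}, index=index)

fig, (ax1, ax2) = plt.subplots(2, 1)

# Option 1: plain matplotlib, passing the DatetimeIndex directly as x.
ax1.plot(site2.index, site2["Cl"])

# Option 2: let pandas plot the column; it takes x from the index.
site2["Cl"].plot(ax=ax2)

fig.autofmt_xdate()  # rotate date labels so they do not overlap
# plt.show()  # uncomment in an interactive session
```

The two axes are deliberately not shared. Pandas' own time-series plotting may use a different internal x scale from plain `ax.plot`.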

python - Evaluating pandas Series values with logical expressions and if statements

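I'm having trouble evaluating values from a dictionary with if statements. Given the following dictionary, which I imported from a data frame (in case it matters):
>>> pnl[company]
29:  Active  Credit        Date   Debit  Strike  Type
0         1       0  2013-01-08  2.3265   21.15   Put
1         0       0  2012-11-26      40      80   Put
2         0       0  2012-11-26     400      80   Put
I tried evaluating the following statement to determine the last value of Active:
if pnl[company].tail(1)['Active'] == 1:
    print 'yay'
However, I got the following error message:
Traceback (most recent call last):
  File "", line 1, in i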
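`tail(1)['Active']` is still a Series, just one with a single row, and `Series == 1` is another Series. That is why `if` raises the "truth value is ambiguous" `ValueError`. Extracting a scalar first, for example with `.iloc[-1]`, avoids this. The frame below re-creates the question's table using the column values as read from the flattened snippet. `pnl_company` and `last_active_is_one` are hypothetical names.

```python
import pandas as pd

# Re-creation of the question's table (values as read from the flattened snippet).
pnl_company = pd.DataFrame({
    "Active": [1, 0, 0],
    "Credit": [0, 0, 0],
    "Date": pd.to_datetime(["2013-01-08", "2012-11-26", "2012-11-26"]),
    "Debit": [2.3265, 40, 400],
    "Strike": [21.15, 80, 80],
    "Type": ["Put", "Put", "Put"],
})


def last_active_is_one(frame: pd.DataFrame) -> bool:
    # .iloc[-1] returns a scalar, so the comparison yields a single bool.
    return bool(frame["Active"].iloc[-1] == 1)


# Comparing the one-row Series still gives a Series, which `if` rejects.
try:
    if pnl_company.tail(1)["Active"] == 1:
        pass
    raised = False
except ValueError:
    raised = True

print(raised)                           # True
print(last_active_is_one(pnl_company))  # False: the last row has Active == 0
```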

python - Multi-index sorting in Pandas

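I have a multi-index DataFrame created by a groupby operation. I'm trying to do a compound sort on several levels of the index, but I can't find a sort function that does what I need. The initial dataset looks something like this (daily sales counts for various products):
   Date        Manufacturer  ProductName  ProductLaunchDate  Sales
0  2013-01-01  Apple         iPod         2001-10-23         12
1  2013-01-01  Apple         iPad         2010-04-03         13
2  2013-01-01  Samsung       Galaxy       2009-04-27         14
3  2013-01-01  Samsung       Galaxy Tab   2010-09-02         15
4  2013-01-02  Appl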
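One way to get a compound ordering on a grouped result is a three-step round trip. First move the index levels back into columns. Then sort on several keys, each with its own direction. Finally restore the MultiIndex. The sketch uses only the four rows visible in the snippet and a hypothetical goal: manufacturers A to Z, then products by total sales, highest first. When you only need to order by the index levels themselves, `sort_index(level=[...], ascending=[...])` does it directly.

```python
import pandas as pd

# The four rows visible in the question.
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2013-01-01"] * 4),
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"],
    "ProductName": ["iPod", "iPad", "Galaxy", "Galaxy Tab"],
    "ProductLaunchDate": pd.to_datetime(
        ["2001-10-23", "2010-04-03", "2009-04-27", "2010-09-02"]),
    "Sales": [12, 13, 14, 15],
})

totals = sales.groupby(["Manufacturer", "ProductName"])["Sales"].sum()

# Manufacturer ascending, then total Sales descending within each manufacturer.
by_sales = (
    totals.reset_index()
    .sort_values(["Manufacturer", "Sales"], ascending=[True, False])
    .set_index(["Manufacturer", "ProductName"])
)
print(by_sales)

# Sorting purely on index levels: Manufacturer ascending, ProductName descending.
by_levels = totals.sort_index(level=["Manufacturer", "ProductName"],
                              ascending=[True, False])
print(by_levels)
```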

python - Pandas Yahoo Finance DataReader

I'm trying to get Adj Close prices from Yahoo Finance into a DataFrame. I have all the stocks I want, but I can't sort them by date.
stocks = ['ORCL', 'TSLA', 'IBM', 'YELP', 'MSFT']
ls_key = 'Adj Close'
start = datetime(2014,1,1)
end = datetime(2014,3,28)
f = web.DataReader(stocks, 'yahoo', start, end)
cleanData = f.ix[ls_key]
dataFrame = pd.DataFrame(cleanData)
print dataFrame[:5]
I got the following result, which is almost
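The rest of this question is cut off, so this sketch only addresses sorting by date. Sorting works reliably once the dates are real datetimes in the index rather than strings. Also note that `.ix` was removed in pandas 1.0; `.loc` is the usual label-based replacement. A live download cannot be reproduced here, so the sketch uses hypothetical long-format data with placeholder tickers (`AAA`, `BBB`) and placeholder prices, not real quotes. It pivots the data to one column per ticker and sorts chronologically.

```python
import pandas as pd

# Hypothetical long-format data: placeholder tickers and prices, dates out of order.
raw = pd.DataFrame({
    "Date": ["2014-01-03", "2014-01-02", "2014-01-03", "2014-01-02"],
    "Ticker": ["AAA", "AAA", "BBB", "BBB"],
    "Adj Close": [10.5, 10.0, 20.5, 20.0],
})

# Parse the strings so the index sorts as dates rather than as text.
raw["Date"] = pd.to_datetime(raw["Date"])

# One row per date, one column per ticker, earliest date first.
adj_close = raw.pivot(index="Date", columns="Ticker", values="Adj Close").sort_index()
print(adj_close.head())

# .loc is the label-based replacement for the removed .ix.
first_day = adj_close.loc[pd.Timestamp("2014-01-02")]
```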

python - Different read_csv index_col = None / 0 / False in Pandas

I used the following read_csv command:
In [20]:
dataframe = pd.read_csv('D:/UserInterest/output/ENFP_0719/Bookmark.csv', index_col=None)
dataframe.head()
Out[20]:
   Unnamed: 0     timestamp                                                url  visits
0           0  1.404028e+09  http://m.blog.naver.com/PostView.nhn?blogId=mi...       2
1           1  1.404028e+09  http://m.facebook.com/l.php?u=http%3A%2F%2Fblo...       1
2           2  1.404
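The three settings behave differently:

- `index_col=None` (the default) keeps every header column as data, so a file written with an unnamed leading index column shows it as `Unnamed: 0`.
- `index_col=0` turns that first column into the index.
- `index_col=False` forces pandas not to use the first column as the index. This matters when data rows have one more field than the header (for example, a trailing delimiter). In that case `None` lets pandas infer the extra leading field as the index, whereas `False` keeps it as data.

The sketch uses small in-memory CSV strings instead of the question's file. The first string is only shaped like it; the second follows the trailing-delimiter example from the pandas I/O documentation.

```python
import io

import pandas as pd

# Shaped like the question's file: an unnamed first column written by to_csv().
bookmark_csv = ",timestamp,visits\n0,1404028000.0,2\n1,1404028000.0,1\n"

df_none = pd.read_csv(io.StringIO(bookmark_csv), index_col=None)
df_zero = pd.read_csv(io.StringIO(bookmark_csv), index_col=0)
print(df_none.columns.tolist())  # ['Unnamed: 0', 'timestamp', 'visits']
print(df_zero.columns.tolist())  # ['timestamp', 'visits']

# Each data row ends with a trailing delimiter, giving one more field than the header.
trailing_csv = "a,b,c\n4,apple,bat,\n8,orange,cow,"

inferred = pd.read_csv(io.StringIO(trailing_csv))                 # first field becomes the index
forced = pd.read_csv(io.StringIO(trailing_csv), index_col=False)  # first field stays in column a
print(inferred.index.tolist(), forced["a"].tolist())  # [4, 8] [4, 8]
```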