panda_link_草庐IT

python - Parsing a Pandas (multi)index as datetime

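I have the following multi-indexed df:

                  x    y
    id  date
    abc 3/1/1994  100  7
        9/1/1994  90   8
        3/1/1995  80   9

The dates are stored as str. I want to parse the date index. The following statement

    df.index.levels[1] = pd.to_datetime(df.index.levels[1])

returns an error: TypeError: 'FrozenList' does not support mutable operations.

Best answer: As mentioned, you have to recreate the index:

    df.index = df.index.set_levels([df.index.levels[0], pd.to_datet...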
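Since the answer excerpt above is cut off, here is a minimal sketch of the "recreate the index" idea. It assumes the dates are month/day/year and rebuilds the sample frame from the question; `parsed` is a name introduced only for illustration.

```python
import pandas as pd

# Sample frame mirroring the question; reading the dates as
# month/day/year is an assumption.
idx = pd.MultiIndex.from_tuples(
    [("abc", "3/1/1994"), ("abc", "9/1/1994"), ("abc", "3/1/1995")],
    names=["id", "date"],
)
df = pd.DataFrame({"x": [100, 90, 80], "y": [7, 8, 9]}, index=idx)

# Index levels are immutable, so build a new index whose second
# level holds the parsed dates and assign it back.
parsed = pd.to_datetime(df.index.levels[1], format="%m/%d/%Y")
df.index = df.index.set_levels(parsed, level=1)
```

Passing `level=1` replaces only the date level, so the first level does not have to be repeated.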

python - Appending a Pandas DataFrame to a Google spreadsheet

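Case: My script returns a DataFrame that has to be appended as new rows of data to an existing Google spreadsheet. As of now, I append the DataFrame as multiple single rows through gspread. My code:

    import gspread
    import pandas as pd
    df = pd.DataFrame()
    # After some processing a non-empty data frame has been created.
    output_conn = gc.open("SheetName").worksheet("xyz")
    # Here 'SheetName' is the Google spreadsheet and 'xyz' is the sheet in the workbook
    for i, row in d...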
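One possible way to avoid one request per row is to convert the whole frame to a list of rows and hand it to gspread's `Worksheet.append_rows` in a single call. The sketch below is an assumption, not the accepted answer; `dataframe_to_rows` and `append_dataframe` are hypothetical helper names, and `worksheet` is expected to be an object such as `output_conn` from the question.

```python
import pandas as pd


def dataframe_to_rows(df):
    """Convert a DataFrame to a list of row lists, replacing NaN/None with ''."""
    cleaned = df.astype(object).where(df.notna(), "")
    return cleaned.values.tolist()


def append_dataframe(worksheet, df):
    """Append every row of df with a single append_rows call.

    `worksheet` is assumed to be a gspread Worksheet (or any object
    exposing a compatible append_rows method). Returns the row count.
    """
    rows = dataframe_to_rows(df)
    if rows:
        worksheet.append_rows(rows)
    return len(rows)
```

With the names from the question, the call would look like `append_dataframe(output_conn, df)`.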


python - Applying a custom function on multiple columns in Pandas

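I'm having trouble "applying" a custom function in Pandas. When I test the function by passing values to it directly, it works and returns the correct response. However, when I try to pass the column values this way

    def feez(rides, plan):
        pmt4 = 200
        inc4 = 50  # number rides included
        min_rate4 = 4
        if plan == "4Plan":
            if rides > inc4:
                fee = ((rides - inc4) * min_rate4) + pmt4
            else:
                fee = pmt4
            return (fee)
        else:
            return 0.1

    df['fee'].apply(feez(df.total_rides, df.plan_name))

I get the error: "Serie...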
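The call in the question passes whole columns into `feez`, so `plan == "4Plan"` is evaluated on a Series rather than on a single value. A common alternative, sketched here with made-up sample data, is to apply the function row by row with `axis=1`:

```python
import pandas as pd


def feez(rides, plan):
    pmt4 = 200      # base payment for the plan
    inc4 = 50       # number of rides included
    min_rate4 = 4   # charge per additional ride
    if plan == "4Plan":
        if rides > inc4:
            return (rides - inc4) * min_rate4 + pmt4
        return pmt4
    return 0.1


# Hypothetical sample data using the column names from the question.
df = pd.DataFrame(
    {"total_rides": [60, 10, 70], "plan_name": ["4Plan", "4Plan", "Other"]}
)

# axis=1 hands each row to the lambda, so feez receives scalars.
df["fee"] = df.apply(
    lambda row: feez(row["total_rides"], row["plan_name"]), axis=1
)
```

Row-wise `apply` is simple but not fast; for large frames a vectorized expression would likely be preferable.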


python - Lemmatization of all Pandas cells

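I have a Pandas DataFrame. There is one column, let's name it 'col'. Each entry of this column is a list of words: ['word1', 'word2', etc.]. How can I efficiently compute the lemma of all of those words using the nltk library?

    import nltk
    nltk.stem.WordNetLemmatizer().lemmatize('word')

I want to be able to find a lemma for all words of all cells in one column of a pandas dataset. My data looks similar to:

    import pandas as pd
    data = [[['walked','am','stressed','Fruit']],[['going','gone','walking','riding','running']]]...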
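The excerpt does not include an answer, so the following is only a sketch of the general pattern: apply a per-word function to every list in the column. To keep it runnable without the WordNet corpus, `str.lower` stands in for the lemmatizer; `lemmatize_cells` is a hypothetical helper, and the DataFrame construction is assumed because the question's code is cut off.

```python
import pandas as pd


def lemmatize_cells(series, lemmatize):
    """Apply `lemmatize` (any callable word -> word) to each word of each cell."""
    return series.apply(lambda words: [lemmatize(word) for word in words])


data = [
    [["walked", "am", "stressed", "Fruit"]],
    [["going", "gone", "walking", "riding", "running"]],
]
df = pd.DataFrame(data, columns=["col"])

# Stand-in callable. With NLTK installed and its WordNet data downloaded,
# nltk.stem.WordNetLemmatizer().lemmatize could be passed instead.
df["lemmas"] = lemmatize_cells(df["col"], str.lower)
```

Creating the lemmatizer once and passing its bound method avoids constructing a new `WordNetLemmatizer` for every word.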


python - Pandas - Expanding a nested JSON array within a column of a DataFrame

I have JSON data (coming from MongoDB) containing thousands of records (so an array/list of JSON objects), where each object has the following structure:

    {"id": 1, "first_name": "Mead", "last_name": "Lantaph", "email": "mlantaph0@opensource.org", "gender": "Male", "ip_address": "231.126.209.31", "nested_array_to_expand": [{"property": "Quaxo", "json_obj": {"prop1": "Chevrolet", "prop2": "MercyStreets"}}...
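No answer is shown in the excerpt. One approach that may fit is `pd.json_normalize` with `record_path`, which yields one row per element of the nested array and repeats the chosen parent fields. The record below reuses a few fields from the question; the second array element is hypothetical.

```python
import pandas as pd

records = [
    {
        "id": 1,
        "first_name": "Mead",
        "last_name": "Lantaph",
        "nested_array_to_expand": [
            {
                "property": "Quaxo",
                "json_obj": {"prop1": "Chevrolet", "prop2": "MercyStreets"},
            },
            # Hypothetical second element, added to show one row per item.
            {"property": "Example", "json_obj": {"prop1": "A", "prop2": "B"}},
        ],
    }
]

# record_path names the array to expand; meta lists the parent
# fields to carry along on every resulting row.
flat = pd.json_normalize(
    records,
    record_path="nested_array_to_expand",
    meta=["id", "first_name", "last_name"],
)
```

Nested objects inside each array element are flattened into dotted column names such as `json_obj.prop1`.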


python - pandas: Normalizing a DataFrame

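I have input data in a flattened file. I want to normalize this data by splitting it into tables. Can I do that neatly with pandas, that is, by reading the flattened data into a DataFrame instance and then applying some functions to obtain the resulting DataFrame instances?

Example: the data is given to me on disk as a CSV file like this:

    ItemId  ClientId  PriceQuoted  ItemDescription
    1       1         10           scroll of Sneak
    1       2         12           scroll of Sneak
    1       3         13           scroll of Sneak
    2       2         2500         scroll of Invisible
    2       4         2200         scroll of Invisible

I want to create two DataFrames: ItemI...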
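The description of the two target tables is cut off, so the split below is an assumption: one table of items (`ItemId`, `ItemDescription`) and one of quotes (`ItemId`, `ClientId`, `PriceQuoted`). The CSV text is rebuilt from the example rows with a comma separator chosen for illustration.

```python
import io

import pandas as pd

csv_text = """ItemId,ClientId,PriceQuoted,ItemDescription
1,1,10,scroll of Sneak
1,2,12,scroll of Sneak
1,3,13,scroll of Sneak
2,2,2500,scroll of Invisible
2,4,2200,scroll of Invisible
"""
flat = pd.read_csv(io.StringIO(csv_text))

# Item attributes repeat on every row, so keep one row per ItemId.
items = (
    flat[["ItemId", "ItemDescription"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# Each quote stays a row of its own, keyed back to items by ItemId.
quotes = flat[["ItemId", "ClientId", "PriceQuoted"]].copy()
```

Merging `quotes` with `items` on `ItemId` reproduces the flattened layout, which is a quick way to check that nothing was lost.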


python - Plotting multiple boxplots in one figure with pandas or matplotlib?

I have two boxplots:

    a1 = a[['kCH4_sync', 'week_days']]
    a1.boxplot(by='week_days', meanline=True, showmeans=True, showcaps=True, showbox=True, showfliers=False)
    a2 = a[['CH4_sync', 'week_days']]
    a2.boxplot(by='week_days', meanline=True, showmeans=True, showcaps=True, showbox=True, showfliers=False)

But I want to put them in one figure to compare them. What would you suggest to solve this...
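One way to get both plots into a single figure, sketched here with random stand-in data, is to create the subplots first and pass each axis to `DataFrame.boxplot` through its `ax` argument. The `Agg` backend and the sample values are assumptions made so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data with the column names from the question.
rng = np.random.default_rng(0)
a = pd.DataFrame(
    {
        "kCH4_sync": rng.normal(size=40),
        "CH4_sync": rng.normal(size=40),
        "week_days": np.tile(["Mon", "Tue", "Wed", "Thu"], 10),
    }
)

# One figure, two axes; each boxplot call draws into its own axis.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, col in zip(axes, ["kCH4_sync", "CH4_sync"]):
    a.boxplot(
        column=col, by="week_days", ax=ax,
        meanline=True, showmeans=True, showcaps=True,
        showbox=True, showfliers=False,
    )
fig.suptitle("")  # clear the automatic "grouped by" title
```

Adding `sharey=True` to `plt.subplots` may help if the two variables should be compared on a common scale.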


python - Pandas scatter_matrix - plotting categorical variables

I'm looking at the famous Titanic dataset from the Kaggle competition: http://www.kaggle.com/c/titanic-gettingStarted/data

I have loaded and processed the data using:

    # import required libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline
    # load the data from the file
    df = pd.read_csv('./data/train.csv')
    # import the scatter_matrix functionality
    from pandas.tools.plottin...
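The rest of the question and any answer are missing from the excerpt, so this is only one possible approach: `scatter_matrix` plots numeric columns, so a categorical column can be converted to integer category codes first. The rows below are made-up values under Titanic-style column names, and the import uses `pandas.plotting`, where current pandas versions expose `scatter_matrix`.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up rows; only the column names follow the Titanic dataset.
df = pd.DataFrame(
    {
        "Sex": ["male", "female", "female", "male", "male", "female"],
        "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
        "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.07],
    }
)

# Replace the string column with integer codes (categories sort
# alphabetically: female -> 0, male -> 1) so it is treated as numeric.
plot_df = df.copy()
plot_df["Sex"] = plot_df["Sex"].astype("category").cat.codes

axes = scatter_matrix(plot_df, figsize=(6, 6), diagonal="hist")
```

The codes carry no real ordering, so panels involving `Sex` are best read as grouped strips rather than as a trend.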


python - pandas read_csv: delimiter inside column data

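I have a CSV file of this type:

    12012;My Name is Mike. What is your's?;3;0
    1522;In my opinion: It's cool; or at least not bad;4;0
    21427;Hello. I like this feature!;5;1

I want to get this data into a pandas.DataFrame. But read_csv(sep=";") throws an exception because of the semicolon in the user-generated message column in line 2 (In my opinion: It's cool; or at least not bad). All the remaining columns always have numeric dtypes. What is the most convenient way to manage this?

Best answer: Dealing with unquoted delimiters is always a nuisance...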
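The answer text is cut off, so the following is a sketch of one workaround rather than the accepted solution. It relies on the question's statement that every column except the message is numeric: split once from the left for the first field and twice from the right for the last two, leaving whatever remains as the message. The column names are hypothetical.

```python
import pandas as pd

raw = """12012;My Name is Mike. What is your's?;3;0
1522;In my opinion: It's cool; or at least not bad;4;0
21427;Hello. I like this feature!;5;1"""

rows = []
for line in raw.splitlines():
    # The first field ends at the first semicolon.
    first, rest = line.split(";", 1)
    # The last two fields start at the last two semicolons; everything
    # in between is the message, semicolons included.
    message, third, fourth = rest.rsplit(";", 2)
    rows.append((int(first), message, int(third), int(fourth)))

# Column names are hypothetical; the question does not give them.
df = pd.DataFrame(rows, columns=["id", "message", "rating", "flag"])
```

For a file on disk, the same loop can iterate over the open file object instead of `raw.splitlines()`, with `line.rstrip("\n")` applied first.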


python - Pandas TimeSeries resampling produces NaN

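I am resampling a Pandas TimeSeries. The time series consists of binary values (it is a categorical variable) with no missing values, but after resampling NaNs appear. How is this possible? I can't post any sample data here because it is sensitive information, but I create and resample the series as follows:

    series = pd.Series(data, ts)
    series_rs = series.resample('60T', how='mean')

Best answer: Upsampling converts to a regular time interval, so if there are no samples you get NaN. You can fill missing values backward with fill_method='bfill', or forward with fill_metho...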
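The effect described in the answer can be reproduced with a tiny made-up series: any 60-minute bin that contains no observation comes out as NaN. Current pandas versions no longer accept the `how=` and `fill_method=` keywords used in the excerpt, so this sketch uses the method-chaining form, with `'60min'` as the frequency string.

```python
import pandas as pd

# Made-up binary observations with a gap between 01:00 and 02:00.
ts = pd.to_datetime(
    ["2015-01-01 00:10", "2015-01-01 00:40", "2015-01-01 02:20"]
)
series = pd.Series([1, 0, 1], index=ts)

# The 01:00 bin has no samples, so its mean is NaN.
hourly = series.resample("60min").mean()

# Forward fill carries the last known value into the empty bin;
# hourly.bfill() would take the next known value instead.
filled = hourly.ffill()
```

Whether to fill forward, fill backward, or leave the gaps depends on what a missing hour is supposed to mean for the variable.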
