pyspark-dataframes

python - Subtracting a Series from a DataFrame while keeping the DataFrame structure intact

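How can I subtract a Series from a DataFrame while keeping the DataFrame structure intact?

    df = pd.DataFrame(np.zeros((5, 3)))
    s = pd.Series(np.ones(5))
    df - s

       0  1  2   3   4
    0 -1 -1 -1 NaN NaN
    1 -1 -1 -1 NaN NaN
    2 -1 -1 -1 NaN NaN
    3 -1 -1 -1 NaN NaN
    4 -1 -1 -1 NaN NaN

What I want is the equivalent of subtracting a scalar from the DataFrame:

    df - 1

       0  1  2
    0 -1 -1 -1
    1 -1 -1 -1
    2 -1 -1 -1
    3 -1 -1 -1
    4 -1 -1 -1

Best answer: Maybe:

    >>> df = pd.DataFrame(np.ze…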
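A minimal sketch of the usual fix, reusing df and s from the question. The - operator aligns the Series' index with the DataFrame's columns, which is why columns 3 and 4 appear filled with NaN. DataFrame.sub with axis=0 aligns the Series with the row index instead, so each row subtracts the matching element of s and the 5x3 shape is preserved.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((5, 3)))
s = pd.Series(np.ones(5))

# axis=0 matches s against the row labels (0..4) instead of the column labels,
# so the result keeps df's 5x3 shape and every cell becomes 0 - 1 = -1.
result = df.sub(s, axis=0)
print(result)
```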

python - Selecting a column on a multi-index pandas DataFrame

Given this DataFrame:

    from pandas import DataFrame
    arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo'],
              ['one', 'two', 'one', 'two', 'one', 'two']]
    tuples = zip(*arrays)
    index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
    df = DataFrame(randn(3, 6), index=[1, 2, 3], columns=index)

how do I plot a chart with 1, 2, 3 on the X axis and three series named bar, baz, and foo?

python matplotlib pandas multi-index
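A sketch of one way to get three series named bar, baz and foo, assuming the asker wants the 'one' sub-column of each group (the question does not say which second-level column to plot). DataFrame.xs with axis=1 and level='second' selects that sub-column across all first-level groups and drops the second level. The seeded numpy default_rng stands in for the question's randn purely so the example is reproducible.

```python
import numpy as np
import pandas as pd

arrays = [["bar", "bar", "baz", "baz", "foo", "foo"],
          ["one", "two", "one", "two", "one", "two"]]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])

rng = np.random.default_rng(0)  # seeded stand-in for randn(3, 6)
df = pd.DataFrame(rng.standard_normal((3, 6)), index=[1, 2, 3], columns=index)

# Take the "one" column from every first-level group; the result has
# plain columns bar, baz, foo and keeps the row index 1, 2, 3.
one = df.xs("one", axis=1, level="second")
print(one)
```

Calling one.plot() then draws one line per column (bar, baz, foo) against the x values 1, 2, 3. If the intended series is instead an average of the 'one' and 'two' sub-columns, df.T.groupby(level='first').mean().T gives that frame.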

python - Extrapolating a Pandas DataFrame

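Interpolating values in a pandas DataFrame is easy with Series.interpolate, but how do I extrapolate? For example, given a DataFrame like the one below, how can we extrapolate it by 14 months to 2014-12-31? Linear extrapolation is fine.

    X1 = range(10)
    X2 = map(lambda x: x**2, X1)
    df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101', periods=10, freq='M'))

I think I first have to create a new DataFrame whose DatetimeIndex starts at 2013-11-31 and extends for another 14 'M' periods. Beyond that, I am…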

python python-2.7 pandas extrapolation
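A minimal sketch of linear extrapolation, under two assumptions not stated in the question: "linear" means a least-squares straight line fitted to each whole column with numpy.polyfit, and the x coordinate is the row position (0, 1, 2, ...) rather than the timestamp, so unequal month lengths are ignored. The helper name extrapolate_linear is hypothetical. The sample frame is rebuilt with a list comprehension (map returns an iterator on Python 3) and pd.offsets.MonthEnd() in place of the 'M' string alias.

```python
import numpy as np
import pandas as pd

month_end = pd.offsets.MonthEnd()
x1 = list(range(10))
df = pd.DataFrame(
    {"x1": x1, "x2": [x ** 2 for x in x1]},
    index=pd.date_range("2013-01-01", periods=10, freq=month_end),
)

# Original 10 month-ends plus 14 more, ending on 2014-12-31.
new_index = pd.date_range(df.index[0], periods=len(df) + 14, freq=month_end)


def extrapolate_linear(frame, index):
    """Hypothetical helper: fit y = slope * position + intercept per column
    and use it to fill the missing rows of the reindexed frame.
    Assumes `index` starts with exactly the rows of `frame`."""
    x_known = np.arange(len(frame))
    x_all = np.arange(len(index))
    out = frame.reindex(index).astype(float)
    for col in frame.columns:
        slope, intercept = np.polyfit(x_known, frame[col].to_numpy(dtype=float), 1)
        missing = out[col].isna().to_numpy()
        out.loc[missing, col] = slope * x_all[missing] + intercept
    return out


result = extrapolate_linear(df, new_index)
print(result.tail())
```

A least-squares fit is only one reading of "linear"; extending the line through the last two points instead would follow the curved x2 column more closely near the end.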

python - Plotting error bars from a pandas DataFrame with matplotlib

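I'm sure this is relatively easy, but I can't seem to get it to work. I want to plot this df with the matplotlib module, with date on the x axis, gas on the y axis, and std as the error bars. I can get it working with the pandas wrapper, but I don't know how to style the error bars.

Using the pandas matplotlib wrapper, I can plot the error bars with:

    trip.plot(yerr='std', ax=ax, marker='D')

but I'm not sure how to access the error bars, as plt.errorbar() does, to style them the way I would in matplotlib.

Using matplotlib:

    fig, ax = plt.subplots()
    ax.bar(trip.index…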

python pandas plot
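A sketch of the plain-matplotlib route, using a small hypothetical trip frame (date index, gas and std columns, made-up values) because the asker's data is not shown. Axes.errorbar returns an ErrorbarContainer whose lines attribute holds the data line, the cap lines and the error-bar line collections, so each part can be restyled after plotting. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the question's `trip` DataFrame.
trip = pd.DataFrame(
    {"gas": [10.0, 12.5, 11.0], "std": [1.0, 0.5, 1.5]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)

fig, ax = plt.subplots()
container = ax.errorbar(
    trip.index, trip["gas"], yerr=trip["std"],
    fmt="D",       # diamond markers, no connecting line
    capsize=4,     # draw caps at the ends of each bar
    ecolor="gray", elinewidth=1,
)

# The container unpacks into the pieces that can be restyled afterwards.
data_line, caplines, barlinecols = container.lines
for cap in caplines:
    cap.set_markeredgewidth(2)
barlinecols[0].set_linestyle("--")
ax.set_xlabel("date")
ax.set_ylabel("gas")
fig.autofmt_xdate()
```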

python - Updating a pandas DataFrame by key

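I have a DataFrame of historical stock trades. The frame has columns such as ['ticker', 'date', 'cusip', 'profit', 'security_type']. Initially:

    trades['cusip'] = np.nan
    trades['security_type'] = np.nan

I have historical profile data that I can load into a frame with columns such as ['ticker', 'cusip', 'date', 'name', 'security_type', 'primary_exchange']. I want to update the trades frame with the cusip and security_type from the profiles, but only where the ticker and date match. I thought I could do something like this:

    pd.merge(trades, …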

python pandas
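A minimal sketch of a key-based update, with small hypothetical trades and profiles frames (the tickers, dates and CUSIP-like strings are placeholders). It left-merges the profile columns onto the (ticker, date) keys, then uses fillna so a trade keeps its existing value wherever no profile matches. validate='many_to_one' makes merge raise if a (ticker, date) pair appears more than once in the profiles, which would otherwise duplicate trade rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's frames.
trades = pd.DataFrame({
    "ticker": ["AAA", "BBB", "AAA"],
    "date": pd.to_datetime(["2020-01-02", "2020-01-02", "2020-01-03"]),
    "profit": [1.5, -0.5, 2.0],
})
trades["cusip"] = np.nan
trades["security_type"] = np.nan

profiles = pd.DataFrame({
    "ticker": ["AAA", "BBB"],
    "cusip": ["CUSIP_A", "CUSIP_B"],
    "date": pd.to_datetime(["2020-01-02", "2020-01-02"]),
    "name": ["Alpha", "Beta"],
    "security_type": ["equity", "etf"],
    "primary_exchange": ["EX1", "EX2"],
})

keys = ["ticker", "date"]
cols = ["cusip", "security_type"]

# Left merge keeps one output row per trade, in the original order.
matched = trades[keys].merge(profiles[keys + cols], on=keys, how="left",
                             validate="many_to_one")
matched.index = trades.index  # realign with trades' own index

for col in cols:
    # Take the profile value where one matched, otherwise keep the trade's value.
    trades[col] = matched[col].fillna(trades[col])

print(trades)
```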

python - PySpark: changing a column's type from date to string

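I have the following DataFrame:

    corr_temp_df
    [('vacationdate', 'date'), ('valueE', 'string'), ('valueD', 'string'), ('valueC', 'string'), ('valueB', 'string'), ('valueA', 'string')]

Now I want to change the data type of the vacationdate column to string, so that the DataFrame also takes on the new type and the data type of every entry is overwritten. For example, after writing:

    corr_temp_df.dtypes

the data type of vacationdate should show as overwritten. I have already tried functions such as cast, StringType, and astype, but without success. Do you know how…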

python apache-spark apache-spark-sql

python - How do I check whether a specific cell in a pandas DataFrame is null?

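I have the following df in pandas:

    0  A  B    C
    1  2  NaN  8

How can I check whether df.iloc[1]['B'] is NaN? I tried df.isnan() and got a table like this:

    0  A      B     C
    1  false  true  false

but I'm not sure how to index into that table, or whether this is an efficient way to do the job.

Best answer: Use pd.isnull, and select the cell with loc or iloc:

    print(df)
       0  A   B  C
    0  1  2 NaN  8

    print(df.loc[0, 'B'])
    nan

    a = pd.isnull(df.loc[0, 'B'])
    print(a)
    True

    print(df['B'].iloc[0])
    nan

    a = pd.…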

python pandas
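A short sketch of scalar access combined with pd.isna (an alias of pd.isnull), using a hypothetical frame with columns A, B and C. DataFrame has no isnan method; the element-wise check is df.isna(), and the boolean frame it returns can be indexed like any other. For a single cell, the dedicated scalar accessors df.at (by label) and df.iat (by position) avoid building the whole boolean table.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column B.
df = pd.DataFrame({"A": [1, 2], "B": [np.nan, 5.0], "C": [8, 9]})

# Check one cell by label (row 0, column "B") or by position (row 0, column 1).
by_label = pd.isna(df.at[0, "B"])
by_position = pd.isna(df.iat[0, 1])

# df.isna() returns a boolean DataFrame; index it the same way.
mask = df.isna()
from_mask = mask.at[0, "B"]

print(by_label, by_position, from_mask)
```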

python - How do I resample and interpolate a DataFrame in Python pandas?

I have a pandas DataFrame, usually in this format:

    1       2           3           4
    0.1100  0.0000E+00  1.0000E+00  5.0000E+00
    0.1323  7.7444E-05  8.7935E-01  1.0452E+00
    0.1545  4.3548E-04  7.7209E-01  4.5432E-01
    0.1768  1.2130E-03  6.7193E-01  2.6896E-01
    0.1990  2.5349E-03  5.7904E-01  1.8439E-01
    0.2213  4.5260E-03  4.9407E-01  1.3771E-01

What I want to do is resample the column 1 (index) values from a list, for example:

    indexList = numpy.linspace(0…

python pandas interpolation reindex
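A sketch of one common approach, rebuilding the sample rows from the question: reindex onto the union of the old and new index values, let interpolate(method='index') fill the gaps using the numeric index as the x coordinate, then keep only the new points. The target grid np.linspace(0.12, 0.22, 6) is hypothetical because the question's own indexList is cut off, and the helper name resample_interpolate is illustrative. New points outside 0.1100 to 0.2213 would not be interpolated meaningfully, so the grid stays inside that range.

```python
import numpy as np
import pandas as pd

# Sample rows from the question: column 1 is the index, columns 2-4 are data.
df = pd.DataFrame(
    {
        2: [0.0, 7.7444e-05, 4.3548e-04, 1.2130e-03, 2.5349e-03, 4.5260e-03],
        3: [1.0, 8.7935e-01, 7.7209e-01, 6.7193e-01, 5.7904e-01, 4.9407e-01],
        4: [5.0, 1.0452, 4.5432e-01, 2.6896e-01, 1.8439e-01, 1.3771e-01],
    },
    index=pd.Index([0.1100, 0.1323, 0.1545, 0.1768, 0.1990, 0.2213], name=1),
)


def resample_interpolate(frame, new_index):
    """Hypothetical helper: linearly interpolate `frame` at the values in `new_index`."""
    new_index = pd.Index(new_index)
    union = frame.index.union(new_index)  # old and new points, sorted
    filled = frame.reindex(union).interpolate(method="index")
    return filled.loc[new_index]


index_list = np.linspace(0.12, 0.22, 6)  # hypothetical target grid
result = resample_interpolate(df, index_list)
print(result)
```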

python - Creating a DataFrame with hierarchical columns

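What is the easiest way to create a DataFrame with hierarchical columns? I am currently creating a DataFrame from a dict of name -> Series using:

    df = pd.DataFrame(data=serieses)

I want to keep the same column names but add an extra level of hierarchy to the columns. For now, I want the additional level to have the same value for every column, say "Estimates". I'm trying the following, but it doesn't seem to work:

    pd.DataFrame(data=serieses, columns=pd.MultiIndex.from_tuples([(x, "Estimates") for x in serieses.keys()]))

All I get is a DataFrame full of NaN. For example, what I'm looking for is roughly…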

python pandas
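A sketch of one way to add the extra level, using a hypothetical two-entry serieses dict. The all-NaN result is most likely because, when data is a dict, the columns argument selects dict keys by label, and tuples such as ('alpha', 'Estimates') are not keys of the dict. Building the frame first and then replacing its columns with a MultiIndex avoids that lookup.

```python
import pandas as pd

# Hypothetical name -> Series mapping standing in for the question's `serieses`.
serieses = {
    "alpha": pd.Series([1.0, 2.0, 3.0]),
    "beta": pd.Series([4.0, 5.0, 6.0]),
}

df = pd.DataFrame(data=serieses)
# Keep the original names as the first level and add "Estimates" as the second.
df.columns = pd.MultiIndex.from_tuples([(name, "Estimates") for name in df.columns])
print(df)
```

If "Estimates" should be the top level instead, pd.concat({"Estimates": pd.DataFrame(serieses)}, axis=1) produces that layout.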

python - The pandas.DataFrame corrwith() method

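I recently started using pandas. Can someone explain the difference in behavior of .corrwith() when it is given a Series versus a DataFrame? Suppose I have a DataFrame:

    frame = pd.DataFrame(data={'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})

I want to compute the correlation between feature 'a' and all the other features. I can do that with:

    frame.drop(labels='a', axis=1).corrwith(frame['a'])

The result will be:

    b   -1.0
    c    0.0

But very similar code:

    frame.drop(labels='a', axis=1).corrwith(fram…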

python pandas
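A sketch of the difference, reusing the question's frame. With a Series, corrwith correlates every column with that one Series. With a DataFrame, it pairs columns that share a label, so if the truncated second call passes a one-column frame such as frame[['a']] (an assumption, since the question is cut off), no labels match and every entry comes back NaN.

```python
import pandas as pd

frame = pd.DataFrame(data={"a": [1, 2, 3], "b": [-1, -2, -3], "c": [10, -10, 10]})
others = frame.drop(labels="a", axis=1)

# Series argument: each column of `others` is correlated with frame["a"].
with_series = others.corrwith(frame["a"])

# DataFrame argument: columns are matched by name; "b" and "c" have no partner
# in frame[["a"]], and "a" has none in `others`, so all results are NaN.
with_frame = others.corrwith(frame[["a"]])

print(with_series)
print(with_frame)
```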