dataframe_草庐IT

Python Pandas : how to add a totally new column to a data frame inside of a groupby/transform operation

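I want to label some quantiles in my data, and for each row of the DataFrame I would like the entry in a new column, called e.g. "xtile", to hold this value. For example, suppose I create a data frame like this:
    import pandas, numpy as np
    dfrm = pandas.DataFrame({'A': np.random.rand(100), 'B': (50 + np.random.randn(100)), 'C': np.random.randint(low=0, high=3, size=(100,))})
And suppose I write my own function to compute the quintile of each element in an array. I have my own function for this, but for example just refer to scipy.stats.mstats.mquan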
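The excerpt is cut off before the asker's own function appears, so the following is only a minimal sketch of the general idea: compute a per-group quintile code and store it in a new "xtile" column. It swaps in pandas.qcut for the asker's function, and the helper name quintile_codes, the choice of column 'C' as the grouping key, and the fixed random seed are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed, only so the sketch is repeatable

dfrm = pd.DataFrame({
    "A": np.random.rand(100),
    "B": 50 + np.random.randn(100),
    "C": np.random.randint(low=0, high=3, size=(100,)),
})


def quintile_codes(values):
    """Hypothetical helper: bin a Series into five equal-count buckets, coded 0..4."""
    return pd.qcut(values, 5, labels=False)


# transform returns a result aligned with the original rows, so it can be
# assigned directly as a new column.
dfrm["xtile"] = dfrm.groupby("C")["A"].transform(quintile_codes)
```

Because transform keeps the original index, no merge back onto dfrm is needed; each row simply receives the quintile code computed within its own group.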

python - Pandas merge gives error "Buffer has wrong number of dimensions (expected 1, got 2)"

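I am trying to do a pandas merge and get the error from the title when I try to run it. I am matching on 3 columns, whereas before, when I did a similar merge on only 2 columns, it worked fine.
    df = pd.merge(df, c, how="left", left_on=["section_term_ps_id", "section_school_id", "state"], right_on=["term_ps_id", "term_school_id", "state"])
The columns of the two data frames, df:
    Index([u'section_ps_id', u'section_school_id', u'section_course_number', u'section_term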

dimensions school section python pandas dataframe data-structures
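One reported cause of this kind of merge failure is a column label that appears more than once in one of the frames. The excerpt is cut off before the column lists finish, so whether that applies to this question is an assumption. The sketch below uses small made-up frames (the data, the term_name column, and the helper duplicated_columns are hypothetical) to show how duplicated labels can be detected and dropped before merging on several keys.

```python
import pandas as pd


def duplicated_columns(frame):
    """Hypothetical helper: return column labels that occur more than once."""
    return frame.columns[frame.columns.duplicated()].tolist()


# Toy left frame that deliberately repeats the label "section_school_id".
left = pd.DataFrame(
    [[1, 10, "NY", 10], [2, 20, "CA", 20]],
    columns=["section_term_ps_id", "section_school_id", "state", "section_school_id"],
)
right = pd.DataFrame({
    "term_ps_id": [1, 2],
    "term_school_id": [10, 20],
    "state": ["NY", "CA"],
    "term_name": ["Fall", "Spring"],
})

print(duplicated_columns(left))

# Keep only the first occurrence of each label, then merge on three keys.
deduped = left.loc[:, ~left.columns.duplicated()]
merged = pd.merge(
    deduped,
    right,
    how="left",
    left_on=["section_term_ps_id", "section_school_id", "state"],
    right_on=["term_ps_id", "term_school_id", "state"],
)
```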

Python Pandas : Boolean indexing on multiple columns

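This question already has answers here: selecting across multiple columns with pandas (3 answers). Closed 9 years ago. Despite there being at least two good tutorials on how to index a DataFrame in Python's pandas library, I still can't work out an elegant way of SELECTing on more than one column.
    >>> d = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [4, 5, 6, 7, 8]})
    >>> d
       x  y
    0  1  4
    1  2  5
    2  3  6
    3  4  7
    4  5  8
    >>> d[d['x'] > 2]  # This works fine
       x  y
    2  3  6
    3  4  7
    4  5  8
    >>> d[d['x'] > 2 & d['y'] > 7]  # I had expected this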

indexing multiple section pandas python dataframe
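In Python, & binds more tightly than >, so the last expression in the excerpt is not parsed as two comparisons joined by "and". A minimal sketch of the usual remedy, assuming the asker wanted rows that satisfy both conditions, is to parenthesize each comparison before combining them. The variable names both and either are illustrative only.

```python
import pandas as pd

d = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [4, 5, 6, 7, 8]})

# Parenthesize each comparison so that & combines two boolean Series.
both = d[(d["x"] > 2) & (d["y"] > 7)]

# The same pattern works for "or" with the | operator.
either = d[(d["x"] > 2) | (d["y"] > 7)]

print(both)
print(either)
```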

python - Editing the bar width using the dataframe.plot() function in matplotlib

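I am making a stacked bar chart using: DataFrame.plot(kind='bar', stacked=True). I want to control the width of the bars so that they touch each other like a histogram. I have looked through the documentation to no avail. Any suggestions? Is it possible to do that? Best answer: For anyone coming across this question: since pandas 0.14, plotting with bars has a 'width' argument: https://github.com/pydata/pandas/pull/6644 The example above can now be solved simply by using df.plot(kind='bar', stacked=True, width=1) See pandas.DataFra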

matplotlib dataframe bar code pandas python histogram bar-chart
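For a self-contained illustration of the width argument quoted in the answer, here is a small sketch. The data frame contents are made up, and the non-interactive "Agg" backend is chosen only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen, no GUI needed

import pandas as pd

# Hypothetical data: two series stacked over three categories.
df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})

# width=1 makes each bar span its whole slot, so neighbours touch.
ax = df.plot(kind="bar", stacked=True, width=1)

# Each drawn rectangle reports the width it was given.
widths = [patch.get_width() for patch in ax.patches]
print(widths)
```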

python - Spark: add a new column to a data frame using the previous row's value

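I would like to know how to achieve the following in Spark (Pyspark). Initial data frame:
    +--+---+
    |id|num|
    +--+---+
    |4 |9.0|
    +--+---+
    |3 |7.0|
    +--+---+
    |2 |3.0|
    +--+---+
    |1 |5.0|
    +--+---+
Resulting data frame:
    +--+---+-------+
    |id|num|new_Col|
    +--+---+-------+
    |4 |9.0|7.0    |
    +--+---+-------+
    |3 |7.0|3.0    |
    +--+---+-------+
    |2 |3.0|5.0    |
    +--+---+-------+
I managed to "append" a new column to the data frame by using something like: df.withColumn("new_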

python Spark code section stackoverflow apache-spark dataframe pyspark apache-spark-sql

python - Indexing Pandas data frames: integer rows, named columns

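Say df is a Pandas data frame. df.loc[] only accepts names. df.iloc[] only accepts integers (actual positions). df.ix[] accepts both names and integers. When referencing rows, df.ix[row_idx, ] only wants to be given names. For example:
    df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'], '1': np.arange(6)})
    df = df.ix[2:6]
    print(df)
       1      a
    2  2  three
    3  3   four
    4  4   five
    5  5    six
df.ix[0, 'a'] throws an error; it does not return 'two'. When referencing columns, iloc prefers integers, not names. For example, df.ix[2, 1] returns 'three', not 2. (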

naming integer code section python pandas dataframe
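df.ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the snippet in the question no longer runs on current versions. As a rough sketch of how integer row positions can be combined with named columns using only loc and iloc, the following rebuilds the same frame; the variable names by_label, first_a and also_first_a are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["one", "two", "three", "four", "five", "six"],
    "1": np.arange(6),
})

# Label-based slice; with .loc both end points are included.
df = df.loc[2:5]

# Row chosen by label, column chosen by name.
by_label = df.loc[2, "a"]

# Row chosen by position, column chosen by name: translate the position
# into a label first, then use .loc.
first_a = df.loc[df.index[0], "a"]

# Alternatively, translate the column name into a position and use .iloc.
also_first_a = df.iloc[0, df.columns.get_loc("a")]

print(by_label, first_a, also_first_a)
```

Both translations keep each indexer purely label-based or purely positional, which avoids the ambiguity that ix had with integer row labels.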