草庐IT

python - Fill null values in a PySpark DataFrame column with the mean of the same column

Given a DataFrame like this:

    rdd_2 = sc.parallelize([(0, 10, 223, "201601"), (0, 10, 83, "2016032"),
                            (1, 20, None, "201602"), (1, 20, 3003, "201601"),
                            (1, 20, None, "201603"), (2, 40, 2321, "201601"),
                            (2, 30, 10, "201602"), (2, 61, None, "201601")])
    df_data = sqlContext.createDataFrame(rdd_2, ["id", "type", "cost", "date"])
    df_data.show()

    +---+----+----+--