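With d3.csv I am reading a CSV file and storing the rows, so that console.log(data[0]) returns Object {username: "mark", y: 0, x: 0, value: 0}. Now I want to keep only the first occurrence of each username in data. In Python pandas, I would have used data.drop_duplicates(subset='username'). Edit: consider the following example: var X = [{username: "a", y: 0, x: 0, value: 0}, {username: "b", y: 0, x: 0, value: 0}, {username: "a", y: 1, x: 0, value: 0}, {username: "c", y: 0, x: 0, value: 0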
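One possible way to get the "first occurrence per username" behaviour in plain JavaScript is to track the usernames already seen in a Set while filtering. This is only a sketch: firstByKey is a hypothetical helper name (it is not part of d3), and the last sample row is completed here as an assumption, because the question's example is cut off.

```javascript
// Keep only the first row seen for each value of `key`.
// `firstByKey` is a hypothetical helper, not a d3 function.
function firstByKey(rows, key) {
  const seen = new Set();
  return rows.filter((row) => {
    if (seen.has(row[key])) return false; // later duplicate: drop it
    seen.add(row[key]);
    return true; // first occurrence: keep it
  });
}

// Sample rows shaped like the question's example.
// The last row is an assumed completion of the truncated original.
const X = [
  { username: "a", y: 0, x: 0, value: 0 },
  { username: "b", y: 0, x: 0, value: 0 },
  { username: "a", y: 1, x: 0, value: 0 },
  { username: "c", y: 0, x: 0, value: 0 },
];

const unique = firstByKey(X, "username");
console.log(unique.map((d) => d.username).join(", ")); // a, b, c
```

Because filter preserves input order, the surviving rows stay in their original order, which mirrors the default keep='first' behaviour of pandas drop_duplicates. The same helper could be applied to the array that d3.csv hands to its callback.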
I need to pivot multiple columns in a PySpark DataFrame. Sample DataFrame:
>>> d = [(100,1,23,10),(100,2,45,11),(100,3,67,12),(100,4,78,13),(101,1,23,10),(101,2,45,13),(101,3,67,14),(101,4,78,15),(102,1,23,10),(102,2,45,11),(102,3,67,16),(102,4,78,18)]
>>> mydf = spark.createDataFrame(d,['id','day','price','units'])
>>> mydf.show()
+---+---+-----+---