input_event

Hadoop Streaming with Python: splitting input files manually

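I am new to Hadoop and am trying to use its streaming feature with a mapper and reducer written in Python. The problem is that my original input file will contain sequences of lines that the mapper has to identify. If I let Hadoop split the input file, it may split in the middle of a sequence, and that sequence will not be detected. So I am considering splitting the file manually. This will also break some sequences, so in addition I would provide an alternative split that creates files overlapping the "first" split. That way I will not lose any sequence. I will run the following command, described in this article: hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar \-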
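The overlapping-split idea in this question can be sketched in Python. This is only an illustration under stated assumptions, not the asker's code: the function name `split_with_overlap` and the parameters `chunk_size` and `overlap` are hypothetical, and it uses a single sliding split with overlap, a variant of the "first split plus alternative split" described above.

```python
from typing import List


def split_with_overlap(lines: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split `lines` into chunks of up to `chunk_size` lines.

    Each chunk repeats the last `overlap` lines of the previous chunk, so any
    run of at most `overlap + 1` consecutive lines appears whole in some chunk.
    (Hypothetical helper, not part of Hadoop.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new chunk advances
    chunks: List[List[str]] = []
    start = 0
    while start < len(lines):
        chunks.append(lines[start:start + chunk_size])
        if start + chunk_size >= len(lines):
            break  # this chunk already reaches the end of the input
        start += step
    return chunks


if __name__ == "__main__":
    sample = [f"line {i}" for i in range(10)]
    for number, chunk in enumerate(split_with_overlap(sample, chunk_size=4, overlap=2)):
        print(number, chunk)
```

A sequence that falls entirely inside an overlap region shows up in two chunks, so the mapper or reducer side would need to deduplicate detections.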

hadoop - Flume: Kafka channel and HDFS sink get "unable to deliver event" error

I want to try the new Flafka flow: using only a Kafka channel to move data to an HDFS sink. I first tried it with a Kafka channel and a logger sink, which is easier to monitor. My configuration file is: # Name the components on this agent a1.sinks = sink1 a1.channels = channel1 a1.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel a1.channels.channel1.brokerList = localhost:9093,localhost:9094 a1.channels.cha

channel deliver MonitoredCounterGroup apache hadoop hdfs apache-kafka flume flume-ng

hadoop - HDFS NFS Gateway mount.nfs: Input/output error?

HDFS NFS Gateway mount.nfs: Input/output error? 1. The error is as follows: [root@xx sbin]# mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync localhost:/ /hdfs_y mount.nfs: Input/output error 2016-03-10 15:12:06,350 WARN org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: Exception 804 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.au

hadoop output section code hdfs nfs

java - Where is the mapreduce.input.keyvaluelinerecordreader.key.value.separator parameter located in a Hadoop installation?

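I am using MapReduce in Java to read a key-value file delimited by ":". I figured out how to parse the file (using getConf().set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ":");). I tried to find out where these parameters are stored but could not. I ran a grep over the Hadoop installation, but no XML file sets these parameters. Following the documentation for the Configuration class, I tried to find the value in mapred-default.xml, without success. Where can I find these values? The same goes for some other parameters. Thanks. Best answer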

keyvaluelinerecordreader mapreduce hadoop section java

java - FAILED: ParseException line 1:94 mismatched input 'hdfs' expecting StringLiteral near 'location' in partition location

Java code: String cmd0 = "hive -e \"use " + hiveuser + "; set hive.exec.compress.output=true; set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec; set mapreduce.job.queuename=" + queue + "; alter table " + "resident_tmp" + " add if not exists partition (weekday='" + "weekday=20170807" + "') location " + location + "\""; C

amp location hive java apache mysql hadoop

hadoop - org.apache.ignite.IgniteException: For input string: "30s" in ignite hadoop execution

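I want to run Hadoop's word count example on Apache Ignite. I am using IGFS in Ignite as a cache for the HDFS configuration, but after submitting the job through Hadoop to run on Ignite, I get the following error. Thanks in advance to anyone who can help! Using configuration: examples/config/filesystem/example-igfs-hdfs.xml [00:47:13]__________________________[00:47:13]/_/___/|//_/___/__/[00:47:13]_///(77//////_/[00:47:13]/___/\___/_/|_/___/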

hadoop ignite apache java

java - hadoop java: how to know that end of reducer input is reached?

My reducer looks like this: public static class Reduce extends MapReduceBase implements Reducer { List allRecords = new ArrayList(); public void reduce(IntWritable key, Iterator values, OutputCollector output, Reporter reporter) throws IOException { allRecords.add(values.next()); Text[] outputValues = new Text[7]; for (int i = 1; i >= 7; i++) { outputV

java reducer code Text IntWritable hadoop mapreduce

hadoop - How to perform an "Order of Events" query in Hadoop Hive?

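I have been learning Hive for the past 2 months, but I cannot figure out how to perform certain sequence-based queries. For example: I have a huge log of user actions. Each user action has a date field, but because there are multiple log files from different machines, they will obviously not necessarily be scanned in that order. Each log can record a variety of different events. For this example I will denote them by letters: A, B, C, D... Question: how do I write a query that asks "on average, how many times did event A occur before event B occurred"? I know how to group by user, keep only the users who have done both A and B, and average the number of A's that occurred, but restricting to the first occurrence of B seems difficult. I think I could actually do this by stringing together 10 or so nasty-looking queries, but I would like to know whether there is a way I am not aware of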

amp section Hive hadoop emr hiveql

scala - java.lang.NumberFormatException: For input string: "|" exception

I have imported a table into HDFS with fields-terminated-by '|': sqoop import \ --connect jdbc:mysql://connection \ --username \ --password \ --table products \ --as-textfile \ --target-dir /user/username/productsdemo \ --fields-terminated-by '|' After that, I tried to read it as an RDD using spark-shell version 1.6.2: var productsRDD = sc.textFile("/user/username/productsdemo") and then

NumberFormatException amp code product section scala apache-spark hadoop sqoop

hadoop - From "reduce input records" to "reduce input groups"

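After running a MapReduce job, we get a summary of the job, for example: ... reduce input records: 10 reduce input groups: 3 ... I understand this is caused by combining duplicate keys. My question is: which method does the reducer use to combine records? key1.equals(key2) or key1.hashCode == key2.hashCode? Thanks. Best answer: Only compareTo, because keys must implement WritableComparable. key.hashCode() is used for partitioning. equals is never used. About ha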
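The answer's point (grouping by comparison, hashing only for partitioning) can be modeled in a few lines of Python. This is a toy simulation and not Hadoop code: `compare_ignore_case`, `partition`, and `group_for_reduce` are hypothetical names, and the case-insensitive comparator is chosen only to make visible that keys which are not equal as strings still land in one group when the comparator returns 0.

```python
from functools import cmp_to_key
from typing import Callable, List, Tuple


def compare_ignore_case(a: str, b: str) -> int:
    """Stand-in for a key's compareTo: negative, zero, or positive."""
    x, y = a.lower(), b.lower()
    return (x > y) - (x < y)


def partition(key_hash: int, num_reducers: int) -> int:
    """Models hash partitioning: the hash picks the reducer, nothing else."""
    return (key_hash & 0x7FFFFFFF) % num_reducers


def group_for_reduce(
    records: List[Tuple[str, int]],
    compare: Callable[[str, str], int],
) -> List[Tuple[str, List[int]]]:
    """Sort records by key, then merge adjacent keys whose comparison is 0."""
    ordered = sorted(records, key=cmp_to_key(lambda r1, r2: compare(r1[0], r2[0])))
    groups: List[Tuple[str, List[int]]] = []
    for key, value in ordered:
        if groups and compare(groups[-1][0], key) == 0:
            groups[-1][1].append(value)  # same group: comparison returned 0
        else:
            groups.append((key, [value]))  # comparison changed: new group
    return groups


if __name__ == "__main__":
    records = [("a", 1), ("B", 2), ("A", 3), ("b", 4), ("c", 5)]
    groups = group_for_reduce(records, compare_ignore_case)
    print("reduce input records:", len(records))
    print("reduce input groups:", len(groups))
    print(groups)
```

In this model "a" and "A" are different strings, yet they form a single reduce group because only the comparator decides group boundaries.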

amp reduce section input hadoop mapreduce
