HFileOutputFormat2

hadoop - Does HFileOutputFormat launch a reducer?

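I am using HFileOutputFormat to bulk load CSV files into an HBase table. My job is map-only: there is no reduce task, and I call job.setNumReduceTasks(0). However, I can see a reducer running in the job. Is this reducer launched because of HFileOutputFormat? Previously I used TableOutputFormat in the same job, and no reducer ever ran. I recently refactored the map task to use HFileOutputFormat, and since that change I can see a reducer running. Second, I am getting the following error in the reducer, which I did not get before when using TableOutputFormat...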

hadoop - Difference between HFileOutputFormat2.configureIncrementalLoad and HFileOutputFormat.configureIncrementalLoad in HBase

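Could you tell me the difference between HFileOutputFormat2.configureIncrementalLoad and HFileOutputFormat.configureIncrementalLoad in HBase, given that both methods work fine? Is there a performance improvement?

Best answer: If you are using an HBase version in which both classes coexist (0.96+), there is absolutely no difference between them. You can look at the code of HFileOutputFormat and see that HFileOutputFormat.configureIncrementalLoad simply calls, from HFileOutputFormat2, the same...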
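The delegation the answer describes can be illustrated with a minimal, self-contained sketch. The classes `LegacyFormat` and `NewFormat`, the `configure` method, and the configuration keys are hypothetical stand-ins invented for this illustration. They are not the HBase classes or their real signatures. The sketch only shows why two entry points behave identically when one merely forwards to the other.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the HBase classes.
class DelegationSketch {

    /** Stand-in for the newer class, which holds the actual setup logic. */
    static final class NewFormat {
        static Map<String, String> configure(String table, int partitions) {
            Map<String, String> conf = new LinkedHashMap<>();
            // Made-up keys; a real job configuration would look different.
            conf.put("sketch.output.table", table);
            conf.put("sketch.partitions", Integer.toString(partitions));
            return conf;
        }
    }

    /** Stand-in for the older class: it only forwards to NewFormat. */
    static final class LegacyFormat {
        static Map<String, String> configure(String table, int partitions) {
            return NewFormat.configure(table, partitions);
        }
    }

    public static void main(String[] args) {
        Map<String, String> viaLegacy = LegacyFormat.configure("demo_table", 4);
        Map<String, String> viaNew = NewFormat.configure("demo_table", 4);
        // Both entry points yield the same configuration.
        System.out.println(viaLegacy.equals(viaNew));
    }
}
```

Because the legacy entry point adds no work of its own beyond the forwarding call, a setup like this would give no performance difference between the two, which matches the answer's claim.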

Hadoop MultipleOutputFormats to HFileOutputFormat and TextOutputFormat

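I am running an ETL job with Hadoop in which I need to output the transformed, valid data to HBase and an external index of that data to MySQL. My initial idea was to use MultipleOutputFormats to export the transformed data with HFileOutputFormat (key is Text, value is ProtobufWritable) and the index with TextOutputFormat (key is Text, value is Text). An average-sized job (I need to be able to run several at once) has roughly 700 million input records. I would like to know A) whether this seems a reasonable approach in terms of efficiency and complexity, and B) if so, how to accomplish it with the API of the CDH3 distribution.

Best answer: ...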

hadoop - Reducer hangs in a job using HFileOutputFormat

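I am using HBase 0.92.1-cdh4.1.2 and Hadoop 2.0.0-cdh4.1.2. I have a MapReduce program that loads data from HDFS into HBase using HFileOutputFormat in cluster mode. In that program I use HFileOutputFormat.configureIncrementalLoad() for the bulk load. A 7.3 GB data set of 800,000 records runs fine, but an 8.3 GB data set of 900,000 records does not. With the 8.3 GB data, my MapReduce program has 133 maps and one reducer, and all maps complete successfully. My reducer status stays...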

java - ClassCastException when using HFileOutputFormat2

I am trying to upload data from a file in HDFS to an HBase table using HFileOutputFormat2 as the OutputFormat, but I get the following exception: java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.hbase.client.Put cannot be cast to org.apache.hadoop.hbase.Cell at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.map...
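The exception message says that a `Put` value reached code that expected a `Cell`. The sketch below reproduces that kind of failure with stand-in types. `Cell`, `KeyValue`, `Put`, and `CellWriter` here are simplified, hypothetical classes written for this illustration, not the HBase ones. The idea that the output format's writer performs the cast is an assumption about the cause, not something the truncated excerpt confirms.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-ins; NOT the org.apache.hadoop.hbase types.
class CastSketch {

    /** Stand-in for a cell-like interface that the writer expects. */
    interface Cell {
        String row();
    }

    /** Stand-in value type that implements Cell. */
    static final class KeyValue implements Cell {
        private final String row;
        KeyValue(String row) { this.row = row; }
        @Override public String row() { return row; }
    }

    /** Stand-in mutation type that does not implement Cell. */
    static final class Put {
        final String row;
        Put(String row) { this.row = row; }
    }

    /** Assumed writer behavior: every incoming value is cast to Cell. */
    static final class CellWriter {
        final List<Cell> written = new ArrayList<>();
        void write(Object value) {
            written.add((Cell) value); // fails for any value that is not a Cell
        }
    }

    public static void main(String[] args) {
        CellWriter writer = new CellWriter();
        writer.write(new KeyValue("row-1")); // accepted: KeyValue is a Cell
        try {
            writer.write(new Put("row-2")); // rejected: Put is not a Cell
        } catch (ClassCastException e) {
            System.out.println("ClassCastException raised for Put");
        }
    }
}
```

If that assumption holds for the job in the question, the values handed to the output format would need to be cell-typed rather than `Put` objects by the time they are written.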
