草庐IT

Hadoop IO 错误 : Type mismatch in key from map : expected org. apache.hadoop.io.Text,收到 RegexMatcher.CustomKey

coder 2024-01-08 原文

我收到以下错误:

java.lang.Exception: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received RegexMatcher.CustomKey
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received RegexMatcher.CustomKey

首先,我在 Map-reduce 中定义了一个名为 CustomKey 的自定义数据类型:

public  class CustomKey implements Writable {

    public Text userId;
    public Text friendId;

    public CustomKey() {

        this.userId = new Text();
        this.friendId = new Text();

    }

    public CustomKey(String userId, String friendId) {

        this.userId = new Text(userId);
        this.friendId = new Text(friendId);

    }

    @Override
    public void write(DataOutput out) throws IOException {
        userId.write(out);
        userId.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        userId.readFields(in);
        friendId.readFields(in);
    }



}

然后我创建一个 Mapper SingleClassv2LogMapper

public static class SingleClassv2LogMapper extends Mapper<Object, Text, CustomKey, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {

        Configuration conf = context.getConfiguration();
        String regex = conf.get("regex");
        String delimeter = conf.get("delimeter");
        String currentLine = value.toString();
        String tag = RingIdLogParser.parseHashTag(value.toString());
        String body = RingIdLogParser.parseBody(value.toString());
        if (tag != null) {
            if (tag.equals(RegularExpressionBundle.updateMultipleMessageStatus)) {
                CustomKey customKey = RingIdLogParser.parseUserFrinedInfo(body);
                int messageNo = RingIdLogParser.getMessageCount(body);
                context.write(customKey, new IntWritable(messageNo));
            }
        }
    }

}

还有一个 reducer

public static class SingleClassv2LogReducer extends Reducer<CustomKey, IntWritable, Text, IntWritable> {

    TextArrayWritable sum = new TextArrayWritable();

    @Override
    protected void reduce(CustomKey key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

        int sum = 0;
        for (IntWritable value : values) {
            sum = sum + value.get();

        }
        String compactUser = key.userId.toString() +" "+ key.friendId.toString();
        context.write(new Text(compactUser), new IntWritable(sum));
    }

}

我现在该怎么办?请任何人帮助我。

Driver相关代码如下所示

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Regex Matcher");
job.setJarByClass(SingleClassLogDriverv2.class);
job.setMapperClass(SingleClassv2LogMapper.class);
job.setCombinerClass(SingleClassv2LogCombiner.class);
job.setReducerClass(SingleClassv2LogReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapOutputKeyClass(CustomKey.class);
job.setMapOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);

最佳答案

我在使用 Eclipse 为 Map-Reduce 和 Comparability 创建 JAR 时遇到了类似的问题,我的问题是在数字前面打印 390k 数字的单词计数,这与传统的遗留 WordCount 程序不同。这是我的 12 个文件中的数字列表,其中还包含一次冗余。

java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable

此后我更正了,我希望在下面的聚合中得到结果 -

716900482    Seventy One Crore Sixty Nine Lac Four Hundred Eighty Two only.

我已经开发了一个 Maven 构建工具,用于用文字打印数字,因此将该 JAR 明确添加到我的项目中。

所以,我们开始使用我的程序,它类似于 WordCount 程序但目的不同 -

package com.whodesire.count;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import com.whodesire.numstats.AmtInWords;

public class CountInWords {


    public static class NumberTokenizerMapper 
                    extends Mapper <Object, Text, LongWritable, Text> {

        private static final Text theOne = new Text("1");
        private LongWritable longWord = new LongWritable();

        public void map(Object key, Text value, Context context) {

            try{
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    longWord.set(Long.parseLong(itr.nextToken()));
                    context.write(longWord, theOne);
                }
            }catch(ClassCastException cce){
                System.out.println("ClassCastException raiseddd...");
                System.exit(0);
            }catch(IOException | InterruptedException ioe){
                ioe.printStackTrace();
                System.out.println("IOException | InterruptedException raiseddd...");
                System.exit(0);
            }
        }
    }

    public static class ModeReducerCumInWordsCounter 
            extends Reducer <LongWritable, Text, LongWritable, Text>{
        private Text result = new Text();

        //This is the user defined reducer function which is invoked for each unique key
        public void reduce(LongWritable key, Iterable<Text> values, 
                Context context) throws IOException, InterruptedException {

            /*** Putting the key, which is a LongWritable value, 
                        putting in AmtInWords constructor as String***/
            AmtInWords aiw = new AmtInWords(key.toString());
            result.set(aiw.getInWords());

            //Finally the word and counting is sent to Hadoop MR and thus to target
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

        /****
         *** all random numbers generated inside input files has been
         *** generated using url https://andrew.hedges.name/experiments/random/
         ****/

        //Load the configuration files and add them to the the conf object
        Configuration conf = new Configuration();       

        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

        Job job = new Job(conf, "CountInWords");

        //Specify the jar which contains the required classes for the job to run.
        job.setJarByClass(CountInWords.class);

        job.setMapperClass(NumberTokenizerMapper.class);
        job.setCombinerClass(ModeReducerCumInWordsCounter.class);
        job.setReducerClass(ModeReducerCumInWordsCounter.class);

        //Set the output key and the value class for the entire job
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);

        //Set the Input (format and location) and similarly for the output also
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        //Setting the Results to Single Target File
        job.setNumReduceTasks(1);

        //Submit the job and wait for it to complete
        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }

}

我知道这是一个为时已晚的回复,但希望这也能帮助其他人找到方法,谢谢。

关于Hadoop IO 错误 : Type mismatch in key from map : expected org. apache.hadoop.io.Text,收到 RegexMatcher.CustomKey,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/36369953/

有关Hadoop IO 错误 : Type mismatch in key from map : expected org. apache.hadoop.io.Text,收到 RegexMatcher.CustomKey的更多相关文章

  1. ruby-on-rails - Rails 常用字符串(用于通知和错误信息等) - 2

    大约一年前,我决定确保每个包含非唯一文本的Flash通知都将从模块中的方法中获取文本。我这样做的最初原因是为了避免一遍又一遍地输入相同的字符串。如果我想更改措辞,我可以在一个地方轻松完成,而且一遍又一遍地重复同一件事而出现拼写错误的可能性也会降低。我最终得到的是这样的:moduleMessagesdefformat_error_messages(errors)errors.map{|attribute,message|"Error:#{attribute.to_s.titleize}#{message}."}enddeferror_message_could_not_find(obje

  2. ruby-on-rails - 使用 Sublime Text 3 突出显示 HTML 背景语法中的 ERB? - 2

    所以我在关注Railscast,我注意到在html.erb文件中,ruby代码有一个微弱的背景高亮效果,以区别于其他代码HTML文档。我知道Ryan使用TextMate。我正在使用SublimeText3。我怎样才能达到同样的效果?谢谢! 最佳答案 为SublimeText安装ERB包。假设您安装了SublimeText包管理器*,只需点击cmd+shift+P即可获得命令菜单,然后键入installpackage并选择PackageControl:InstallPackage获取包管理器菜单。在该菜单中,键入ERB并在看到包时选择

  3. ruby-on-rails - 迷你测试错误 : "NameError: uninitialized constant" - 2

    我遵循MichaelHartl的“RubyonRails教程:学习Web开发”,并创建了检查用户名和电子邮件长度有效性的测试(名称最多50个字符,电子邮件最多255个字符)。test/helpers/application_helper_test.rb的内容是:require'test_helper'classApplicationHelperTest在运行bundleexecraketest时,所有测试都通过了,但我看到以下消息在最后被标记为错误:ERROR["test_full_title_helper",ApplicationHelperTest,1.820016791]test

  4. ruby-on-rails - 如何在 Rails View 上显示错误消息? - 2

    我是rails的新手,想在form字段上应用验证。myviewsnew.html.erb.....模拟.rbclassSimulation{:in=>1..25,:message=>'Therowmustbebetween1and25'}end模拟Controller.rbclassSimulationsController我想检查模型类中row字段的整数范围,如果不在范围内则返回错误信息。我可以检查上面代码的范围,但无法返回错误消息提前致谢 最佳答案 关键是您使用的是模型表单,一种显示ActiveRecord模型实例属性的表单。c

  5. 使用 ACL 调用 upload_file 时出现 Ruby S3 "Access Denied"错误 - 2

    我正在尝试编写一个将文件上传到AWS并公开该文件的Ruby脚本。我做了以下事情:s3=Aws::S3::Resource.new(credentials:Aws::Credentials.new(KEY,SECRET),region:'us-west-2')obj=s3.bucket('stg-db').object('key')obj.upload_file(filename)这似乎工作正常,除了该文件不是公开可用的,而且我无法获得它的公共(public)URL。但是当我登录到S3时,我可以正常查看我的文件。为了使其公开可用,我将最后一行更改为obj.upload_file(file

  6. ruby-on-rails - 错误 : Error installing pg: ERROR: Failed to build gem native extension - 2

    我克隆了一个rails仓库,我现在正尝试捆绑安装背景:OSXElCapitanruby2.2.3p173(2015-08-18修订版51636)[x86_64-darwin15]rails-v在您的Gemfile中列出的或native可用的任何gem源中找不到gem'pg(>=0)ruby​​'。运行bundleinstall以安装缺少的gem。bundleinstallFetchinggemmetadatafromhttps://rubygems.org/............Fetchingversionmetadatafromhttps://rubygems.org/...Fe

  7. ruby - #之间? Cooper 的 *Beginning Ruby* 中的错误或异常 - 2

    在Cooper的书BeginningRuby中,第166页有一个我无法重现的示例。classSongincludeComparableattr_accessor:lengthdef(other)@lengthother.lengthenddefinitialize(song_name,length)@song_name=song_name@length=lengthendenda=Song.new('Rockaroundtheclock',143)b=Song.new('BohemianRhapsody',544)c=Song.new('MinuteWaltz',60)a.betwee

  8. ruby - 如何验证 IO.copy_stream 是否成功 - 2

    这里有一个很好的答案解释了如何在Ruby中下载文件而不将其加载到内存中:https://stackoverflow.com/a/29743394/4852737require'open-uri'download=open('http://example.com/image.png')IO.copy_stream(download,'~/image.png')我如何验证下载文件的IO.copy_stream调用是否真的成功——这意味着下载的文件与我打算下载的文件完全相同,而不是下载一半的损坏文件?documentation说IO.copy_stream返回它复制的字节数,但是当我还没有下

  9. ruby-on-rails - 每次我尝试部署时,我都会得到 - (gcloud.preview.app.deploy) 错误响应 : [4] DEADLINE_EXCEEDED - 2

    我是Google云的新手,我正在尝试对其进行首次部署。我的第一个部署是RubyonRails项目。我基本上是在关注thisguideinthegoogleclouddocumentation.唯一的区别是我使用的是我自己的项目,而不是他们提供的“helloworld”项目。这是我的app.yaml文件runtime:customvm:trueentrypoint:bundleexecrackup-p8080-Eproductionconfig.ruresources:cpu:0.5memory_gb:1.3disk_size_gb:10当我转到我的项目目录并运行gcloudprevie

  10. ruby-on-rails - Rails 5 Active Record 记录无效错误 - 2

    我有两个Rails模型,即Invoice和Invoice_details。一个Invoice_details属于Invoice,一个Invoice有多个Invoice_details。我无法使用accepts_nested_attributes_forinInvoice通过Invoice模型保存Invoice_details。我收到以下错误:(0.2ms)BEGIN(0.2ms)ROLLBACKCompleted422UnprocessableEntityin25ms(ActiveRecord:4.0ms)ActiveRecord::RecordInvalid(Validationfa

随机推荐