
Hadoop warning EBADF: Bad file descriptor

coder 2024-01-05 Original post

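I am new to Hadoop and am trying to write a relational join with it. The algorithm joins three relations in two consecutive rounds, using a recursive approach. The program runs fine, but during execution it prints warnings like this: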

14/12/02 10:41:16 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:263)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:142)
        at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

This is annoying, and I would like to know what causes these warnings and how to get rid of them. My code is as follows:

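import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Recursive {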
    /**
     * Join three relations together using recursive method
     * R JOIN S JOIN T = ((R JOIN S) JOIN T)
     */
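    // NOTE: these static fields are assigned in main(), i.e. in the driver JVM only.
    // Map and reduce tasks that run in separate JVMs (any non-local cluster) will not
    // see those values; passing them through the job Configuration avoids that.
    static String[] relationSequence;           // Keeps sequence of relations in join
    static int round;                           // Round number running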
    /**
     * Mapper
     * Relation name = R
     * Input tuple   = a    b
     * Output pair   = (b, (R,a))
     * We assume that join value is the last attribute for the first relation
     * and the first attribute for the second relation.
     * So using this assumption, this map-reduce algorithm will work for any number of attributes  
     */
    public static class joinMapper extends Mapper<Object, Text, IntWritable, Text>{
        public void map(Object keyIn, Text valueIn, Context context) throws IOException, InterruptedException {
            // Read tuple and put attributes in a string array
            String curValue = valueIn.toString();
            String[] values = curValue.split("\t");
            // Get relation name from input file name
            String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
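            // Get join attribute index number R join S
            int joinIndex;
            String others = "";
            if(fileName.compareTo(relationSequence[round])==0){
                // Second relation: the join attribute is the first column
                joinIndex = 0;
                // Drop the first column and its tab, whatever the column width
                others = curValue.substring(values[0].length() + 1);
            }else{
                // First relation: the join attribute is the last column
                joinIndex = values.length - 1;
                // Drop the last column and the tab before it
                others = curValue.substring(0, curValue.length() - values[joinIndex].length() - 1);
            }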
            IntWritable joinValue = new IntWritable(Integer.parseInt(values[joinIndex]));

            // Create list of attributes which are not join attribute
            Text temp = new Text(fileName + "|" + others);
            context.write(joinValue,temp);
        }
    }

    /**
     * Reducer
     * 
     *  1. Divide the input list in two ArrayLists based on relation name:
     *      a. first relation
     *      b. second relation
     *  2. Test if the second relation is empty. If it is, there is nothing to join
     *     and we stop.
     *  3. For each element of the first array list, join it with all elements in
     *     the second array list
     */
    public static class joinReducer extends Reducer<IntWritable, Text, Text, Text>{
        public void reduce(IntWritable keyIn, Iterable<Text> valueIn, Context context)
                throws IOException, InterruptedException{
            ArrayList<String> firstRelation = new ArrayList<String>();
            ArrayList<String> secondRelation = new ArrayList<String>();
            for (Text value : valueIn) {
                String[] values = value.toString().split("\\|");
                if(values[0].compareTo(relationSequence[round])==0){
                    secondRelation.add(values[1]);
                }else{
                    firstRelation.add(values[1]);
                }
            }
            if(secondRelation.size()>0){
                for (String firstItem : firstRelation) {
                    for (String secondItem : secondRelation) {
                        context.write(new Text(firstItem.toString()), new Text(keyIn.toString() + "\t"
                                                                            + secondItem.toString() 
                                                                            ));
                    }
                }
            }
        }

    }

    /**
     * Partitioner
     * 
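     * To assign pairs to reducer tasks we use a bitwise AND, which is cheaper
     * than the modulo operation. The mask 0x007F yields values from 0 to 127,
     * so the job needs at least 128 reduce tasks (main sets exactly 128).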
     */
    public static class joinPartitioner extends Partitioner<IntWritable, Text> {
        public int getPartition(IntWritable key, Text value, int numReduceTasks) {
                int partitionNumber = key.get()&0x007F;
                return partitionNumber;
            }
     }

     /**
      * Main method
      * 
      * (R join S join T)
      * hadoop jar ~/COMP6521.jar Recursive /input/R /input/S /input2/T /output R,S,T
      * 
      * @param args
      * <br> args[0]: first relation
      * <br> args[1]: second relation
      * <br> args[2]: third relation
      * <br> args[3]: output directory
      * <br> args[4]: relation sequence to join, separated by comma
      */
    public static void main(String[] args) throws IllegalArgumentException, IOException, InterruptedException, ClassNotFoundException {
        long s = System.currentTimeMillis();
        /****** Preparing problem variables *******/
        relationSequence = args[4].split(",");      // Keep sequence of relations
        round = 1;                                  // Variable to keep current round number
        int maxOfReducers = 128;                    // Maximum number of available reducers
        int noReducers;                             // Number of reducers for one particular job
        noReducers = maxOfReducers;

        Path firstRelation  = new Path(args[0]);
        Path secondRelation = new Path(args[1]);
        Path thirdRelation  = new Path(args[2]);
        Path temp           = new Path("/temp");    // Temporary path to keep intermediate result
        Path out            = new Path(args[3]);
        /****** End of variable Preparing   *******/

        Configuration conf = new Configuration();

        /****** Configuring first job *******/
//      General configuration
        Job job = Job.getInstance(conf, "Recursive multi-way join (first round)");
        job.setNumReduceTasks(noReducers);

//      Pass appropriate classes
        job.setJarByClass(Recursive.class);
        job.setMapperClass(joinMapper.class);
        job.setPartitionerClass(joinPartitioner.class);
        job.setReducerClass(joinReducer.class);

//      Specify input and output type of reducers
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);
        FileSystem fs = FileSystem.get(conf);
        if(fs.exists(temp)){ fs.delete(temp, true);}
        if(fs.exists(out)) { fs.delete(out, true); }

//      Specify the input and output paths  
        FileInputFormat.addInputPath(job, firstRelation);
        FileInputFormat.addInputPath(job, secondRelation);
        FileOutputFormat.setOutputPath(job, temp);
        /****** End of first job configuration *******/
        job.submit();
//      Running the first job
        boolean b = job.waitForCompletion(true);
        if(b){
//          try to execute the second job after completion of the first one
            round++;                    // Specify round number
            Configuration conf2 = new Configuration();  // Create new configuration object

            /****** Configuring second job *******/
//          General configuration
            Job job2 = Job.getInstance(conf2, "Reduce multi-way join (second round)");
            job2.setNumReduceTasks(noReducers);

//          Pass appropriate classes
            job2.setJarByClass(Recursive.class);
            job2.setMapperClass(joinMapper.class);
            job2.setPartitionerClass(joinPartitioner.class);
            job2.setReducerClass(joinReducer.class);

//          Specify input and output type of reducers
            job2.setOutputKeyClass(Text.class);
            job2.setOutputValueClass(Text.class);

//          Specify input and output type of mappers
            job2.setMapOutputKeyClass(IntWritable.class);
            job2.setMapOutputValueClass(Text.class);
//          End of 2014-11-25
//          Specify the input and output paths  
            FileInputFormat.addInputPath(job2, temp);
            FileInputFormat.addInputPath(job2, thirdRelation);
            FileOutputFormat.setOutputPath(job2, out);
            /****** End of second job configuration *******/
            job2.submit();
//          Running the second job
            b = job2.waitForCompletion(true);

//          Output time measurement
            long e = System.currentTimeMillis() - s;
            System.out.println("Total: " + e);
            System.exit(b ? 0 : 1);
        }
        System.exit(1);
    }

}
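
A side note on joinPartitioner above: it masks the key with 0x007F, so it always returns a value from 0 to 127 and ignores numReduceTasks. The following is a minimal, Hadoop-free sketch of that arithmetic next to a modulo variant that follows the same formula as Hadoop's HashPartitioner. The class name, method names and the reducer count of 4 are hypothetical and chosen only for illustration.

```java
// Hypothetical, Hadoop-free sketch: compares the question's bit-mask partitioning
// with a modulo-based variant that respects the configured reducer count.
class PartitionSketch {
    // Same arithmetic as joinPartitioner in the question: always 0..127.
    static int maskPartition(int key) {
        return key & 0x007F;
    }

    // Clears the sign bit, then takes the remainder, so the result is always
    // in 0..numReduceTasks-1 (the formula used by Hadoop's HashPartitioner).
    static int moduloPartition(int key, int numReduceTasks) {
        return (key & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // assumed: fewer reducers than the 128 in the question
        for (int key : new int[] {3, 130, 200, -1}) {
            System.out.println(key + " -> mask=" + maskPartition(key)
                    + ", modulo=" + moduloPartition(key, numReduceTasks));
        }
    }
}
```

With four reduce tasks, the masked value for key 200 is 72, which lies outside 0..3, and Hadoop rejects an out-of-range value as an illegal partition. The modulo form stays in range, and with exactly 128 reduce tasks both forms give the same result.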

Best answer

I had a similar error and ended up at your question and at this mailing list thread, "EBADF: Bad file descriptor":

To clarify a little bit, the readahead pool can sometimes spit out this message if you close a file while a readahead request is in flight. It's not an error and just reflects the fact that the file was closed hastily, probably because of some other bug which is the real problem.
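
The race described in the quote can be imitated in miniature without Hadoop. The sketch below is only an analogy and not Hadoop's ReadaheadPool: a hypothetical background task tries to read from a file after its owner has already closed it, and the failure is logged instead of propagated, which mirrors why the message appears as a WARN and not as a job failure. All class and method names are made up for the example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Analogy only: a background read that loses the race against close().
class ClosedWhileInFlightSketch {
    // Returns the warning text produced by the background task,
    // or null if the background read unexpectedly succeeded.
    static String readAfterClose() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        File file = null;
        try {
            file = File.createTempFile("ifile", ".tmp");
            Files.write(file.toPath(), new byte[] {1, 2, 3});
            FileInputStream in = new FileInputStream(file);
            CountDownLatch closed = new CountDownLatch(1);
            // The "in-flight" request: it touches the stream only after close() has happened.
            Future<String> request = pool.submit(() -> {
                closed.await();
                try {
                    in.read();
                    return null;
                } catch (IOException e) {
                    // Logged and swallowed, like a WARN line; the owner is not affected.
                    return "WARN: failed background read: " + e.getMessage();
                }
            });
            in.close();          // the owner closes the file "hastily"
            closed.countDown();  // only now is the background task allowed to proceed
            return request.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            if (file != null) {
                file.delete();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterClose());
    }
}
```

The latch forces the ordering that happens by chance in the real case, so the background read always runs against an already closed stream.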

In my case, I was closing a writer without first flushing it with hflush.

Since you don't seem to be using a writer or reader by hand, I would take a look at how you are submitting the MR jobs.

Regarding "Hadoop warning EBADF: Bad file descriptor", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27253616/
