我正在尝试运行一个非常简单的作业来测试我的 hadoop 设置,所以我尝试了 Word Count Example ,它卡在了 0% ,所以我尝试了一些其他简单的作业,每个都卡住了
52191_0003/
14/07/14 23:55:51 INFO mapreduce.Job: Running job: job_1405376352191_0003
14/07/14 23:55:57 INFO mapreduce.Job: Job job_1405376352191_0003 running in uber mode : false
14/07/14 23:55:57 INFO mapreduce.Job: map 0% reduce 0%
I am using hadoop version- Hadoop 2.3.0-cdh5.0.2
我在 Google 上做了快速研究,发现增加了
yarn.scheduler.minimum-allocation-mb
yarn.nodemanager.resource.memory-mb
我有一个单节点集群,在我的双核 Macbook 和 8 GB Ram 上运行。
我的 yarn-site.xml 文件 -
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>resourcemanager.company.com</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///data/1/yarn/local,file:///data/2/yarn/local,file:///data/3/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///data/1/yarn/logs,file:///data/2/yarn/logs,file:///data/3/yarn/logs</value>
</property>
<property>
</property>
<name>yarn.log.aggregation.enable</name>
<value>true</value>
<property>
<description>Where to aggregate logs</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>shuffle service that needs to be set for Map Reduce to run </description>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>8092</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx768m</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Execution framework.</description>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>4</value>
<description>The number of virtual cores required for each map task.</description>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>8092</value>
<description>Larger resource limit for maps.</description>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of maps.</description>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>jobtracker.alexjf.net:8021</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
<description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8092</value>
<description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>2</value>
<description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>10</value>
<description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
<description>Physical memory, in MB, to be made available to running containers</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
<description>Number of CPU cores that can be allocated for containers.</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>shuffle service that needs to be set for Map Reduce to run </description>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
我的 mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
只有 1 个属性。 尝试了几种排列和组合,但无法消除错误。
作业日志
23:55:55,694 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-07-14 23:55:55,697 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-07-14 23:55:55,699 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
2014-07-14 23:55:55,769 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: maxContainerCapability: 8092
2014-07-14 23:55:55,769 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: root.abhishekchoudhary
2014-07-14 23:55:55,775 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit on the thread pool size is 500
2014-07-14 23:55:55,777 INFO [main] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
2014-07-14 23:55:55,787 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1405376352191_0003Job Transitioned from INITED to SETUP
2014-07-14 23:55:55,789 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2014-07-14 23:55:55,800 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1405376352191_0003Job Transitioned from SETUP to RUNNING
2014-07-14 23:55:55,823 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1405376352191_0003_m_000000 Task Transitioned from NEW to SCHEDULED
2014-07-14 23:55:55,824 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1405376352191_0003_m_000001 Task Transitioned from NEW to SCHEDULED
2014-07-14 23:55:55,824 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1405376352191_0003_m_000002 Task Transitioned from NEW to SCHEDULED
2014-07-14 23:55:55,825 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1405376352191_0003_m_000003 Task Transitioned from NEW to SCHEDULED
2014-07-14 23:55:55,826 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1405376352191_0003_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-14 23:55:55,827 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1405376352191_0003_m_000001_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-14 23:55:55,827 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1405376352191_0003_m_000002_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-14 23:55:55,827 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1405376352191_0003_m_000003_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-14 23:55:55,828 INFO [Thread-49] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceReqt:8092
2014-07-14 23:55:55,858 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1405376352191_0003, File: hdfs://localhost/tmp/hadoop-yarn/staging/abhishekchoudhary/.staging/job_1405376352191_0003/job_1405376352191_0003_1.jhist
2014-07-14 23:55:56,773 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2014-07-14 23:55:56,799 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1405376352191_0003: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=1
最佳答案
基于消息 Connecting to ResourceManager at/0.0.0.0:8030,
您确定您的 ResourceManager 应该位于 0.0.0.0:8030(默认值)吗?
如果不是,您应该将以下内容添加到您的 yarn-site.xml:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>MASTER ADDRESS</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8040</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
用主节点的地址替换MASTER ADDRESS。可以单独更改资源管理器的webapp、admin等地址。
关于Hadoop YARN 作业卡在 map 0% 和 reduce 0%,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/24747427/
是否可以在所有delayed_job任务之前运行一个方法?基本上,我们试图确保每个运行delayed_job的服务器都有我们代码的最新实例,所以我们想运行一个方法来在每个作业运行之前检查它。(我们已经有了“check”方法并在别处使用它。问题只是关于如何从delayed_job中调用它。) 最佳答案 现在有一种官方方法可以通过插件来做到这一点。这篇博文通过示例清楚地描述了如何执行此操作http://www.salsify.com/blog/delayed-jobs-callbacks-and-hooks-in-rails(本文中描述
我需要从json记录中获取一些值并像下面这样提取curr_json_doc['title']['genre'].map{|s|s['name']}.join(',')但对于某些记录,curr_json_doc['title']['genre']可以为空。所以我想对map和join()使用try函数。我试过如下curr_json_doc['title']['genre'].try(:map,{|s|s['name']}).try(:join,(','))但是没用。 最佳答案 你没有正确传递block。block被传递给参数括号外的方法
Enumerable#each和Enumerable#map的区别在于返回的是接收者还是映射后的结果。回到接收者是微不足道的,你通常不需要在each之后继续一个方法链,比如each{...}.another_method(我可能没见过这样的案例。即使你想回到接收者那里,你也可以通过tap来实现)。所以我认为所有或者大部分使用Enumerable#each的情况都可以用Enumerable#map代替。我错了吗?如果我是对的,each的目的是什么?map是否比each慢?编辑:我知道当您对返回值不感兴趣时使用each是一种常见的做法。我对这种做法是否存在不感兴趣,但感兴趣的是,除了从
map遍历数组是否比each更快?两者有速度差异吗?mapresult=arr.map{|a|a+2}每个result=[]arr.eachdo|a|result.push(a+2)end 最佳答案 我认为是的。我试过这个测试require"benchmark"n=10000arr=Array.new(10000,1)Benchmark.bmdo|x|#Mapx.reportdon.timesdoresult=arr.map{|a|a+2}endend#Eachx.reportdon.timesdoresult=[]arr.each
我有一个bash脚本,它运行一个ruby脚本来获取我的Twitter提要。##/home/username/twittercron#!/bin/bashcd/home/username/twitterrubytwitter.rbfriends命令行运行成功/home/username/twittercron但是当我尝试将它作为cronjob运行时,它运行了但无法获取提要。##crontab-e*/15*****/home/username/twittercron脚本已经chmod+x。不知道为什么会这样。有什么想法吗? 最佳答案
我想念Ruby中的Hash方法来仅转换/映射散列值。h={1=>[9,2,3,4],2=>[6],3=>[5,7,1]}h.map_values{|v|v.size}#=>{1=>4,2=>1,3=>3}你如何在Ruby中归档它?更新:我正在寻找map_values()的实现。#moreexamplesh.map_values{|v|v.reduce(0,:+)}#=>{1=>18,2=>6,3=>13}h.map_values(&:min)#=>{1=>2,2=>6,3=>1} 最佳答案 Ruby2.4引入了方法Hash#tran
假设我有一个函数defodd_or_evennifn%2==0return:evenelsereturn:oddendend我有一个简单的可枚举数组simple=[1,2,3,4,5]然后我用我的函数在map中运行它,使用一个do-endblock:simple.mapdo|n|odd_or_even(n)end#=>[:odd,:even,:odd,:even,:odd]如果不首先定义函数,我怎么能做到这一点?例如,#doesnotworksimple.mapdo|n|ifn%2==0return:evenelsereturn:oddendend#Desiredresult:#=>[
我想获取一个数组并将其作为订单列表。目前我正在尝试以这种方式进行:r=["a","b","c"]r.each_with_index{|w,index|puts"#{index+1}.#{w}"}.map.to_a#1.a#2.b#3.c#=>["a","b","c"]输出应该是["1.a","2.b","3.c"]。如何让正确的输出成为r数组的新值? 最佳答案 a.to_enum.with_index(1).map{|element,index|"#{index}.#{element}"}或a.map.with_index(1){|
我正在尝试在RVM环境中运行10.5的旧PPC机器上运行一个简单的ruby脚本。在SO上搜索,我遵循了这个post中选择的答案.这是cron中的结果行:SHELL=/bin/bash00****BASH_ENV=~/.bash_profile&&/bin/bash-c'~/deggy/onlineGW.rb'此命令在用户sam的根目录下的Bash中运行良好。这是我脚本的重要部分:#!/usr/bin/envrubyrequire'open-uri'require'nokogiri'...这是cron的错误输出:X-Cron-Env:X-Cron-Env:X-Cron-Env:X-C
我看到有关未找到文件min.map的错误消息:GETjQuery'sjquery-1.10.2.min.mapistriggeringa404(NotFound)截图这是从哪里来的? 最佳答案 如果ChromeDevTools报告.map文件的404(可能是jquery-1.10.2.min.map、jquery.min.map或jquery-2.0.3.min.map,但任何事情都可能发生)首先要知道的是,这仅在使用DevTools时才会请求。您的用户不会遇到此404。现在您可以修复此问题或禁用sourcemap功能。修复:获取文