java - Hadoop 集群卡住卡在 Reduce > copy >

coder 2024-01-09 原文

到目前为止，对于这个问题，我已经尝试了这里的解决方案，1 ，在这里，2 .然而，虽然这些解决方案确实导致执行 mapreduce 任务，但看起来它们只在名称节点上运行，因为我得到类似于此处的输出，3。 .

基本上，我正在使用我自己设计的 mapreduce 算法运行一个2 节点集群。 mapreduce jar 在单节点集群上完美执行，这让我觉得我的 hadoop 多节点配置有问题。要设置多节点，我遵循了教程 here .

为了报告出了什么问题，当我执行我的程序时(在检查名称节点、任务跟踪器、作业跟踪器和数据节点正在各自的节点上运行之后)，我的程序在终端中的这一行停止:

INFO mapred.JobClient:map 100% reduce 0%

如果我查看任务日志，我看到copy failed: attempt... from slave-node 后跟SocketTimeoutException.

查看我的从节点 (DataNode) 上的日志显示执行在以下行停止:

TaskTracker:尝试...减少 0.0% > 复制 >

正如链接 1 和 2 中的解决方案所建议的那样，从 etc/hosts 文件中删除各种 ip 地址会导致成功执行，但是我最终得到了诸如在我的slave-node (DataNode) log 的链接 4 中，例如:

信息 org.apache.hadoop.mapred.TaskTracker:收到“KillJobAction” 工作:job_201201301055_0381

警告 org.apache.hadoop.mapred.TaskTracker:未知作业 job_201201301055_0381 被删除。

作为一个新的 hadoop 用户，这看起来对我来说很可疑，但看到这个可能是完全正常的。对我来说，这看起来好像有什么东西指向主机文件中不正确的 ip 地址，并且通过删除这个 ip 地址，我只是停止在从节点上执行，并且处理改为在名称节点上继续(这根本没有什么好处)。

总结:

这是预期的输出吗？
有什么方法可以查看执行后在哪个节点上执行了什么？
有人能发现我做错了什么吗？

编辑为每个节点添加主机和配置文件

主人:etc/hosts

127.0.0.1       localhost
127.0.1.1       joseph-Dell-System-XPS-L702X

#The following lines are for hadoop master/slave setup
192.168.1.87    master
192.168.1.74    slave

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

从机:etc/hosts

127.0.0.1       localhost
127.0.1.1       joseph-Home # this line was incorrect, it was set as 7.0.1.1

#the following lines are for hadoop mutli-node cluster setup
192.168.1.87    master
192.168.1.74    slave

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

掌握:core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:54310</value>
        <description>The name of the default file system. A URI whose
        scheme and authority determine the FileSystem implementation. The
        uri’s scheme determines the config property (fs.SCHEME.impl) naming
        the FileSystem implementation class. The uri’s authority is used to
        determine the host, port, etc. for a filesystem.</description>
    </property>
</configuration>

从站:core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hduser/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>

    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:54310</value>
        <description>The name of the default file system. A URI whose
        scheme and authority determine the FileSystem implementation. The
        uri’s scheme determines the config property (fs.SCHEME.impl) naming
        the FileSystem implementation class. The uri’s authority is used to
        determine the host, port, etc. for a filesystem.</description>
    </property>

</configuration>

母版:hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
    </property>
</configuration>

从站:hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
    </property>
</configuration>

掌握:mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:54311</value>
        <description>The host and port that the MapReduce job tracker runs
        at. If “local”, then jobs are run in-process as a single map
        and reduce task.
        </description>
    </property>
</configuration>

从站:mapre-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <property>
        <name>mapred.job.tracker</name>
        <value>master:54311</value>
        <description>The host and port that the MapReduce job tracker runs
        at. If “local”, then jobs are run in-process as a single map
        and reduce task.
        </description>
    </property>

</configuration>

最佳答案

错误在 etc/hosts:

在错误运行期间，slave etc/hosts 文件如下所示:

127.0.0.1       localhost
7.0.1.1       joseph-Home # THIS LINE IS INCORRECT, IT SHOULD BE 127.0.1.1

#the following lines are for hadoop mutli-node cluster setup
192.168.1.87    master
192.168.1.74    slave

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

您可能已经发现，这台计算机“joseph-Home”的 IP 地址配置不正确。当它应该设置为 127.0.1.1 时，它被设置为 7.0.1.1。因此，将 slave etc/hosts 文件的第 2 行更改为 127.0.1.1 joseph-Home 解决了这个问题，并且我的日志在从属节点上正常显示。

新的 etc/hosts 文件:

127.0.0.1       localhost
127.0.1.1       joseph-Home # THIS LINE IS INCORRECT, IT SHOULD BE 127.0.1.1

#the following lines are for hadoop mutli-node cluster setup
192.168.1.87    master
192.168.1.74    slave

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

关于java - Hadoop 集群卡住卡在 Reduce > copy >，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/18634825/

卡住 amp strong gt lt java apache hadoop

有关java - Hadoop 集群卡住卡在 Reduce > copy >的更多相关文章

ruby-on-rails - 如何从 format.xml 中删除 <hash></hash> - 2
我有一个对象has_many应呈现为xml的子对象。这不是问题。我的问题是我创建了一个Hash包含此数据，就像解析器需要它一样。但是rails自动将整个文件包含在.........我需要摆脱type="array"和我该如何处理？我没有在文档中找到任何内容。最佳答案我遇到了同样的问题；这是我的XML:我在用这个:entries.to_xml将散列数据转换为XML，但这会将条目的数据包装到中所以我修改了:entries.to_xml(root:"Contacts")但这仍然将转换后的XML包装在“联系人”中，将我的XML代码修改为
java - 等价于 Java 中的 Ruby Hash - 2
我真的很习惯使用Ruby编写以下代码:my_hash={}my_hash['test']=1Java中对应的数据结构是什么？最佳答案 HashMapmap=newHashMap();map.put("test",1);我假设？关于java-等价于Java中的RubyHash，我们在StackOverflow上找到一个类似的问题： https://stackoverflow.com/questions/22737685/
ruby-on-rails - rspec should have_select ('cars' , :options => ['volvo' , 'saab' ] 不工作 - 2
关闭。这个问题需要detailsorclarity.它目前不接受答案。想改进这个问题吗？通过editingthispost添加细节并澄清问题.关闭8年前。Improvethisquestion在首页我有:汽车:VolvoSaabMercedesAudistatic_pages_spec.rb中的测试代码:it"shouldhavetherightselect"dovisithome_pathit{shouldhave_select('cars',:options=>['volvo','saab','mercedes','audi'])}end响应是rspec./spec/request
ruby - 如何验证 IO.copy_stream 是否成功 - 2
这里有一个很好的答案解释了如何在Ruby中下载文件而不将其加载到内存中:https://stackoverflow.com/a/29743394/4852737require'open-uri'download=open('http://example.com/image.png')IO.copy_stream(download,'~/image.png')我如何验证下载文件的IO.copy_stream调用是否真的成功——这意味着下载的文件与我打算下载的文件完全相同，而不是下载一半的损坏文件？documentation说IO.copy_stream返回它复制的字节数，但是当我还没有下
ruby-on-rails - Nokogiri:使用 XPath 搜索 <div> - 2
我使用Nokogiri(Rubygem)css搜索寻找某些在我的html里面。看起来Nokogiri的css搜索不喜欢正则表达式。我想切换到Nokogiri的xpath搜索，因为这似乎支持搜索字符串中的正则表达式。如何在xpath搜索中实现下面提到的(伪)css搜索？require'rubygems'require'nokogiri'value=Nokogiri::HTML.parse(ABBlaCD3"HTML_END#my_blockisgivenmy_bl="1"#my_eqcorrespondstothisregexmy_eq="\/[0-9]+\/"#FIXMEThefoll
java - 从 JRuby 调用 Java 类的问题 - 2
我正在尝试使用boilerpipe来自JRuby。我看过guide从JRuby调用Java，并成功地将它与另一个Java包一起使用，但无法弄清楚为什么同样的东西不能用于boilerpipe。我正在尝试基本上从JRuby中执行与此Java等效的操作:URLurl=newURL("http://www.example.com/some-location/index.html");Stringtext=ArticleExtractor.INSTANCE.getText(url);在JRuby中试过这个:require'java'url=java.net.URL.new("http://www
java - 我的模型类或其他类中应该有逻辑吗 - 2
我只想对我一直在思考的这个问题有其他意见，例如我有classuser_controller和classuserclassUserattr_accessor:name,:usernameendclassUserController//dosomethingaboutanythingaboutusersend问题是我的User类中是否应该有逻辑user=User.newuser.do_something(user1)oritshouldbeuser_controller=UserController.newuser_controller.do_something(user1,user2)我
java - 什么相当于 ruby 的 rack 或 python 的 Java wsgi？ - 2
什么是ruby的rack或python的Java的wsgi？还有一个路由库。最佳答案来自Python标准PEP333:Bycontrast,althoughJavahasjustasmanywebapplicationframeworksavailable,Java's"servlet"APImakesitpossibleforapplicationswrittenwithanyJavawebapplicationframeworktoruninanywebserverthatsupportstheservletAPI.ht
Observability：从零开始创建 Java 微服务并监控它（二） - 2
这篇文章是继上一篇文章“Observability：从零开始创建Java微服务并监控它（一）”的续篇。在上一篇文章中，我们讲述了如何创建一个Javaweb应用，并使用Filebeat来收集应用所生成的日志。在今天的文章中，我来详述如何收集应用的指标，使用APM来监控应用并监督web服务的在线情况。源码可以在地址 https://github.com/liu-xiao-guo/java_observability 进行下载。摄入指标指标被视为可以随时更改的时间点值。当前请求的数量可以改变任何毫秒。你可能有1000个请求的峰值，然后一切都回到一个请求。这也意味着这些指标可能不准确，你还想提取最小/
【Java 面试合集】HashMap中为什么引入红黑树，而不是AVL树呢 - 2
HashMap中为什么引入红黑树，而不是AVL树呢1.概述开始学习这个知识点之前我们需要知道，在JDK1.8以及之前，针对HashMap有什么不同。JDK1.7的时候，HashMap的底层实现是数组+链表JDK1.8的时候，HashMap的底层实现是数组+链表+红黑树我们要思考一个问题，为什么要从链表转为红黑树呢。首先先让我们了解下链表有什么不好？？？2.链表上述的截图其实就是链表的结构，我们来看下链表的增删改查的时间复杂度增：因为链表不是线性结构，所以每次添加的时候，只需要移动一个节点，所以可以理解为复杂度是N(1)删：算法时间复杂度跟增保持一致查：既然是非线性结构，所以查询某一个节点的时候

java - Hadoop 集群卡住卡在 Reduce > copy >

编辑为每个节点添加主机和配置文件

有关java - Hadoop 集群卡住卡在 Reduce > copy >的更多相关文章

随机推荐