hadoop - Are getCacheFiles() and getLocalCacheFiles() the same?

coder 2024-01-06 (original post)

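Since getLocalCacheFiles() is deprecated, I am trying to find a replacement. getCacheFiles() looks like one, but I doubt the two are the same.

When you call addCacheFile(), the file in HDFS is downloaded to every node. With getLocalCacheFiles() you get the localized file path, which you can read from the local file system. getCacheFiles(), however, returns the file's URI in HDFS. If you read the file through that URI, I suspect you are still reading from HDFS rather than from the local file system.

That is my understanding, and I do not know whether it is correct. If it is, what is the replacement for getLocalCacheFiles()? And why did Hadoop deprecate it in the first place?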

Best answer

It is open source, so you can always git blame the change that introduced @Deprecated: commit 735b50e8bd23f7fbeff3a08cf8f3fff8cbff7449, which is MAPREDUCE-4493. Near the end of that JIRA you will find this discussion:

Omkar Vinit Joshi added a comment - 13/Jul/13 00:18
Robert Joseph Evans if we are deprecating getLocalCacheFiles and getCacheFiles in jobContext() then how the user is going to get local cached files in map task? YARN-916 is the related issue.. Thanks.

Robert Joseph Evans added a comment - 19/Jul/13 15:27
Omkar Vinit Joshi By opening the symbolic link in the current working directory. Prior to YARN the default behavior was to not create symlinks in the current working directory pointing to the items in the distributed cache. If you wanted links you had to specifically turn that option on and provide the name of the symlink you wanted. The only way to get to files without symlinks was to call getLocalCacheFiles and getCacheFiles. In YARN all files will have a symlink created. The name of the file/directory will be the name of the symlink. However, it is possible to have a name collision where I wanted hdfs://foo/bar.zip and hdfs://bar/bar.zip. In 1.0 both of these would have been downloaded and accessible through the deprecated APIs, but in YARN a warning will be output and only one of them will be downloaded. Also because of the way these APIs were written the mapper code may not know that only one of them was downloaded and will not be able to find the missing one and blow up. That is why I deprecated them in favor of nudging people to always use the symlinks so the behavior is always consistent.

Omkar Vinit Joshi added a comment - 19/Jul/13 16:56
Robert Joseph Evans sounds good.. however by this we will be putting limitation based on file name ..but that sounds reasonable considering the fact that this will stop potential bugs in map code and users can definitely version them to avoid it... Thanks...

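So you should simply open the file by its name in the task's current working directory, and it will be there. There is no dedicated API.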
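To make that concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes that the symlink name is the URI fragment (the part after `#`) when one is given and the last path segment otherwise; the JIRA comment only states that the file name becomes the link name, so treat the fragment rule as an assumption to check against your Hadoop version. `CacheLinks`, `linkName`, `collisions` and `firstLine` are hypothetical names made up for illustration, and the temporary directory merely stands in for the task's working directory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CacheLinks {

    // Hypothetical helper: the symlink name expected for a cache URI.
    // Assumption: the fragment wins if present, else the last path segment.
    static String linkName(URI cacheUri) {
        String fragment = cacheUri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = cacheUri.getPath();
        if (path == null) {
            throw new IllegalArgumentException("not a hierarchical URI: " + cacheUri);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Returns every link name claimed by more than one URI, i.e. the
    // collisions described in the JIRA comment (only one file would survive).
    static List<String> collisions(List<URI> cacheUris) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (URI uri : cacheUris) {
            counts.merge(linkName(uri), 1, Integer::sum);
        }
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    // Reads the first line of a localized file through its link name,
    // resolved against the given working directory.
    static String firstLine(Path workingDir, String name) throws IOException {
        try (BufferedReader reader =
                 Files.newBufferedReader(workingDir.resolve(name), StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        List<URI> uris = Arrays.asList(
            URI.create("hdfs://foo/bar.zip"),
            URI.create("hdfs://bar/bar.zip"),
            URI.create("hdfs://nn/data/lookup.txt#lookup"));
        System.out.println("colliding link names: " + collisions(uris));

        // Stand-in for the task's working directory; inside a real task you
        // would resolve the name against the current directory, Paths.get("").
        Path workDir = Files.createTempDirectory("task-cwd");
        Files.write(workDir.resolve("lookup"), Arrays.asList("k1\tv1"), StandardCharsets.UTF_8);
        System.out.println("first line: " + firstLine(workDir, "lookup"));
    }
}
```

Running a check like `collisions` on the driver side before submitting the job is one way to catch the hdfs://foo/bar.zip versus hdfs://bar/bar.zip case early; giving one of the files a distinct name (or versioning it, as suggested in the thread) removes the clash.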

This question, "hadoop - Are getCacheFiles() and getLocalCacheFiles() the same?", corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/26492964/
