java - 并行流是否可以在不同的操作下正常工作？

coder 2023-08-28 原文

我正在阅读有关无状态的内容并在 doc 中遇到了这个:

Stream pipeline results may be nondeterministic or incorrect if the behavioral parameters to the stream operations are stateful. A stateful lambda (or other object implementing the appropriate functional interface) is one whose result depends on any state which might change during the execution of the stream pipeline.

现在，如果我有一个字符串列表(比如 strList)，然后尝试使用以下方式使用并行流从中删除重复的字符串:

List<String> resultOne = strList.parallelStream().distinct().collect(Collectors.toList());

或者如果我们不区分大小写:

List<String> result2 = strList.parallelStream().map(String::toLowerCase)
                       .distinct().collect(Collectors.toList());

这段代码会不会有任何问题，因为并行流会拆分输入并且在一个 block 中不同并不一定意味着在整个输入中不同？

编辑(以下答案的快速总结)

distinct 是有状态操作，在有状态中间操作的情况下，并行流可能需要多次传递或大量缓冲开销。如果元素的排序不相关，distinct 也可以更有效地实现。同样根据 doc :

For ordered streams, the selection of distinct elements is stable (for duplicated elements, the element appearing first in the encounter order is preserved.) For unordered streams, no stability guarantees are made.

但是在并行运行的有序流的情况下，distinct 可能是不稳定的——这意味着它会在重复的情况下保留任意元素，而不一定是 distinct 中预期的第一个元素。

来自link :

Internally, the distinct() operation keeps a Set that contains elements that have been seen previously, but it’s buried inside the operation and we can’t get to it from application code.

因此在并行流的情况下，它可能会消耗整个流或可能会使用 CHM(例如 ConcurrentHashMap.newKeySet())。对于有序的，它很可能会使用 LinkedHashSet 或类似的结构。

最佳答案

大致指出了doc的相关部分(强调，我的):

Intermediate operations are further divided into stateless and stateful operations. Stateless operations, such as filter and map, retain no state from previously seen element when processing a new element -- each element can be processed independently of operations on other elements. Stateful operations, such as distinct and sorted, may incorporate state from previously seen elements when processing new elements

Stateful operations may need to process the entire input before producing a result. For example, one cannot produce any results from sorting a stream until one has seen all elements of the stream. As a result, under parallel computation, some pipelines containing stateful intermediate operations may require multiple passes on the data or may need to buffer significant data. Pipelines containing exclusively stateless intermediate operations can be processed in a single pass, whether sequential or parallel, with minimal data buffering

如果您进一步阅读(关于订购的部分):

Streams may or may not have a defined encounter order. Whether or not a stream has an encounter order depends on the source and the intermediate operations. Certain stream sources (such as List or arrays) are intrinsically ordered, whereas others (such as HashSet) are not. Some intermediate operations, such as sorted(), may impose an encounter order on an otherwise unordered stream, and others may render an ordered stream unordered, such as BaseStream.unordered(). Further, some terminal operations may ignore encounter order, such as forEach().

...

For parallel streams, relaxing the ordering constraint can sometimes enable more efficient execution. Certain aggregate operations, such as filtering duplicates (distinct()) or grouped reductions (Collectors.groupingBy()) can be implemented more efficiently if ordering of elements is not relevant. Similarly, operations that are intrinsically tied to encounter order, such as limit(), may require buffering to ensure proper ordering, undermining the benefit of parallelism. In cases where the stream has an encounter order, but the user does not particularly care about that encounter order, explicitly de-ordering the stream with unordered() may improve parallel performance for some stateful or terminal operations. However, most stream pipelines, such as the "sum of weight of blocks" example above, still parallelize efficiently even under ordering constraints.

总而言之，

distinct 可以很好地处理并行流，但您可能已经知道，它必须在继续之前消耗掉整个流，这可能会占用大量内存。
如果项目的来源是无序集合(例如 hashset)或流是 unordered()，则 distinct 不担心对输出进行排序并且这样会很有效率

如果您不担心顺序并希望看到更多性能，解决方案是将 .unordered() 添加到流管道。

List<String> result2 = strList.parallelStream()
                              .unordered()
                              .map(String::toLowerCase)
                              .distinct()
                              .collect(Collectors.toList());

唉，在 Java 中没有(可用的内置)并发哈希集(除非他们对 ConcurrentHashMap 很聪明)，所以我只能留给你一个不幸的可能性，即 distinct 是使用常规 Java 集以阻塞方式实现的。在这种情况下，我看不到执行并行 distinct 有任何好处。

编辑:我说得太早了。使用具有不同的并行流可能会有一些好处。看起来 distinct 的实现比我最初想象的要聪明。参见 @Eugene's answer .

关于java - 并行流是否可以在不同的操作下正常工作？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/53645037/

java 并行 code distinct operations java-8 java-stream

有关java - 并行流是否可以在不同的操作下正常工作？的更多相关文章

ruby - 为什么我可以在 Ruby 中使用 Object#send 访问私有(private)/ protected 方法？ - 2
类classAprivatedeffooputs:fooendpublicdefbarputs:barendprivatedefzimputs:zimendprotecteddefdibputs:dibendendA的实例a=A.new测试a.foorescueputs:faila.barrescueputs:faila.zimrescueputs:faila.dibrescueputs:faila.gazrescueputs:fail测试输出failbarfailfailfail.发送测试[:foo,:bar,:zim,:dib,:gaz].each{|m|a.send(m)resc
ruby-on-rails - 由于 "wkhtmltopdf"，PDFKIT 显然无法正常工作 - 2
我在从html页面生成PDF时遇到问题。我正在使用PDFkit。在安装它的过程中，我注意到我需要wkhtmltopdf。所以我也安装了它。我做了PDFkit的文档所说的一切......现在我在尝试加载PDF时遇到了这个错误。这里是错误:commandfailed:"/usr/local/bin/wkhtmltopdf""--margin-right""0.75in""--page-size""Letter""--margin-top""0.75in""--margin-bottom""0.75in""--encoding""UTF-8""--margin-left""0.75in""-
ruby-on-rails - 如何验证 update_all 是否实际在 Rails 中更新 - 2
给定这段代码defcreate@upgrades=User.update_all(["role=?","upgraded"],:id=>params[:upgrade])redirect_toadmin_upgrades_path,:notice=>"Successfullyupgradeduser."end我如何在该操作中实际验证它们是否已保存或未重定向到适当的页面和消息？最佳答案在Rails3中，update_all不返回任何有意义的信息，除了已更新的记录数(这可能取决于您的DBMS是否返回该信息)。http://ar.ru
ruby-on-rails - 'compass watch' 是如何工作的/它是如何与 rails 一起使用的 - 2
我在我的项目目录中完成了compasscreate.和compassinitrails。几个问题:我已将我的.sass文件放在public/stylesheets中。这是放置它们的正确位置吗？当我运行compasswatch时，它不会自动编译这些.sass文件。我必须手动指定文件:compasswatchpublic/stylesheets/myfile.sass等。如何让它自动运行？文件ie.css、print.css和screen.css已放在stylesheets/compiled。如何在编译后不让它们重新出现的情况下删除它们？我自己编译的.sass文件编译成compiled/t
ruby - 使用 Vim Rails，您可以创建一个新的迁移文件并一次性打开它吗？ - 2
使用带有Rails插件的vim，您可以创建一个迁移文件，然后一次性打开该文件吗？textmate也可以这样吗？最佳答案你可以使用rails.vim然后做类似的事情::Rgeneratemigratonadd_foo_to_bar插件将打开迁移生成的文件，这正是您想要的。我不能代表textmate。关于ruby-使用VimRails，您可以创建一个新的迁移文件并一次性打开它吗？，我们在StackOverflow上找到一个类似的问题： https://sta
ruby - 我可以使用 Ruby 从 CSV 中删除列吗？ - 2
查看Ruby的CSV库的文档，我非常确定这是可能且简单的。我只需要使用Ruby删除CSV文件的前三列，但我没有成功运行它。最佳答案 csv_table=CSV.read(file_path_in,:headers=>true)csv_table.delete("header_name")csv_table.to_csv#=>ThenewCSVinstringformat检查CSV::Table文档:http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV/Table.html
ruby - 检查数组是否在增加 - 2
这个问题在这里已经有了答案:Checktoseeifanarrayisalreadysorted?(8个答案)关闭9年前。我只是想知道是否有办法检查数组是否在增加？这是我的解决方案，但我正在寻找更漂亮的方法:n=-1@arr.flatten.each{|e|returnfalseife
ruby - 无法让 RSpec 工作—— 'require' : cannot load such file - 2
我花了三天的时间用头撞墙，试图弄清楚为什么简单的“rake”不能通过我的规范文件。如果您遇到这种情况:任何文件夹路径中都不要有空格!。严重地。事实上，从现在开始，您命名的任何内容都没有空格。这是我的控制台输出:(在/Users/*****/Desktop/LearningRuby/learn_ruby)$rake/Users/*******/Desktop/LearningRuby/learn_ruby/00_hello/hello_spec.rb:116:in`require':cannotloadsuchfile--hello(LoadError) 最佳
ruby - 我可以使用 aws-sdk-ruby 在 AWS S3 上使用事务性文件删除/上传吗？ - 2
我发现ActiveRecord::Base.transaction在复杂方法中非常有效。我想知道是否可以在如下事务中从AWSS3上传/删除文件:S3Object.transactiondo#writeintofiles#raiseanexceptionend引发异常后，每个操作都应在S3上回滚。S3Object这可能吗？？最佳答案虽然S3API具有批量删除功能，但它不支持事务，因为每个删除操作都可以独立于其他操作成功/失败。该API不提供任何批量上传功能(通过PUT或POST)，因此每个上传操作都是通过一个独立的API调用完成的
java - 等价于 Java 中的 Ruby Hash - 2
我真的很习惯使用Ruby编写以下代码:my_hash={}my_hash['test']=1Java中对应的数据结构是什么？最佳答案 HashMapmap=newHashMap();map.put("test",1);我假设？关于java-等价于Java中的RubyHash，我们在StackOverflow上找到一个类似的问题： https://stackoverflow.com/questions/22737685/

java - 并行流是否可以在不同的操作下正常工作？

编辑(以下答案的快速总结)

有关java - 并行流是否可以在不同的操作下正常工作？的更多相关文章

随机推荐