python - 条件生成器表达式的意外行为

coder 2023-05-21 原文

我正在运行一段代码，该代码意外地在程序的某个部分出现逻辑错误。在调查该部分时，我创建了一个测试文件来测试正在运行的语句集，并发现了一个看起来很奇怪的异常错误。

我测试了这个简单的代码:

array = [1, 2, 2, 4, 5] # Original array
f = (x for x in array if array.count(x) == 2) # Filters original
array = [5, 6, 1, 2, 9] # Updates original to something else

print(list(f)) # Outputs filtered

输出是:

>>> []

是的，没什么。我希望过滤器理解能够获取数组中计数为 2 的项目并输出它，但我没有得到:

# Expected output
>>> [2, 2]

当我注释掉第三行再次测试时:

array = [1, 2, 2, 4, 5] # Original array
f = (x for x in array if array.count(x) == 2) # Filters original
### array = [5, 6, 1, 2, 9] # Ignore line

print(list(f)) # Outputs filtered

输出是正确的(你可以自己测试一下):

>>> [2, 2]

有一次我输出了变量f的类型:

array = [1, 2, 2, 4, 5] # Original array
f = (x for x in array if array.count(x) == 2) # Filters original
array = [5, 6, 1, 2, 9] # Updates original

print(type(f))
print(list(f)) # Outputs filtered

我得到了:

>>> <class 'generator'>
>>> []

为什么在 Python 中更新列表会改变另一个生成器变量的输出？这对我来说似乎很奇怪。

最佳答案

Python 的生成器表达式是后期绑定(bind)(参见 PEP 289 -- Generator Expressions)(其他答案称为“懒惰”):

Early Binding versus Late Binding

After much discussion, it was decided that the first (outermost) for-expression [of the generator expression] should be evaluated immediately and that the remaining expressions be evaluated when the generator is executed.

[...] Python takes a late binding approach to lambda expressions and has no precedent for automatic, early binding. It was felt that introducing a new paradigm would unnecessarily introduce complexity.

After exploring many possibilities, a consensus emerged that binding issues were hard to understand and that users should be strongly encouraged to use generator expressions inside functions that consume their arguments immediately. For more complex applications, full generator definitions are always superior in terms of being obvious about scope, lifetime, and binding.

这意味着它仅在创建生成器表达式时评估最外层的 for。所以它实际上 绑定(bind) 名称为 array 的值在“子表达式” in array 中(实际上它绑定(bind)了等价于 iter (数组) 此时)。但是，当您遍历生成器时，if array.count 调用实际上是指当前名为 array 的内容。

因为它实际上是一个 list 而不是 array 我在答案的其余部分更改了变量名称以更准确。

在您的第一种情况下，您迭代的 list 和您计算的 list 将是不同的。就好像你用过:

list1 = [1, 2, 2, 4, 5]
list2 = [5, 6, 1, 2, 9]
f = (x for x in list1 if list2.count(x) == 2)

因此，您检查 list1 中的每个元素是否在 list2 中的计数为 2。

您可以通过修改第二个列表轻松验证这一点:

>>> lst = [1, 2, 2]
>>> f = (x for x in lst if lst.count(x) == 2)
>>> lst = [1, 1, 2]
>>> list(f)
[1]

如果它遍历第一个列表并计入第一个列表，它将返回 [2, 2](因为第一个列表包含两个 2)。如果它迭代并计入第二个列表，则输出应为 [1, 1]。但由于它遍历第一个列表(包含一个 1)但检查第二个列表(包含两个 1)，因此输出只是一个 1。

使用生成器函数的解决方案

有几种可能的解决方案，如果不立即迭代，我通常不喜欢使用“生成器表达式”。一个简单的生成器函数就足以让它正常工作:

def keep_only_duplicated_items(lst):
    for item in lst:
        if lst.count(item) == 2:
            yield item

然后像这样使用它:

lst = [1, 2, 2, 4, 5]
f = keep_only_duplicated_items(lst)
lst = [5, 6, 1, 2, 9]

>>> list(f)
[2, 2]

请注意，PEP(请参阅上面的链接)还指出，对于更复杂的事情，最好使用完整的生成器定义。

使用带有计数器的生成器函数的更好解决方案

更好的解决方案(避免二次运行时行为，因为您为数组中的每个元素迭代整个数组)是计算(collections.Counter)元素一次，然后在恒定时间内进行查找(导致线性时间):

from collections import Counter

def keep_only_duplicated_items(lst):
    cnts = Counter(lst)
    for item in lst:
        if cnts[item] == 2:
            yield item

附录:使用子类“可视化”发生了什么以及何时发生

创建一个在调用特定方法时打印的 list 子类非常容易，因此可以验证它是否真的像那样工作。

在这种情况下，我只是重写方法 __iter__ 和 count 因为我对生成器表达式迭代哪个列表以及它在哪个列表中计数感兴趣。方法体实际上只是委托(delegate)给父类(super class)并打印一些东西(因为它使用 super 没有参数和 f-strings 它需要 Python 3.6，但它应该很容易适应其他 Python 版本):

class MyList(list):
    def __iter__(self):
        print(f'__iter__() called on {self!r}')
        return super().__iter__()
        
    def count(self, item):
        cnt = super().count(item)
        print(f'count({item!r}) called on {self!r}, result: {cnt}')
        return cnt

这是一个简单的子类，仅在调用 __iter__ 和 count 方法时打印:

>>> lst = MyList([1, 2, 2, 4, 5])

>>> f = (x for x in lst if lst.count(x) == 2)
__iter__() called on [1, 2, 2, 4, 5]

>>> lst = MyList([5, 6, 1, 2, 9])

>>> print(list(f))
count(1) called on [5, 6, 1, 2, 9], result: 1
count(2) called on [5, 6, 1, 2, 9], result: 1
count(2) called on [5, 6, 1, 2, 9], result: 1
count(4) called on [5, 6, 1, 2, 9], result: 0
count(5) called on [5, 6, 1, 2, 9], result: 1
[]

关于python - 条件生成器表达式的意外行为，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/54245618/

有关python - 条件生成器表达式的意外行为的更多相关文章

ruby - 使用 RubyZip 生成 ZIP 文件时设置压缩级别 - 2
我有一个Ruby程序，它使用rubyzip压缩XML文件的目录树。gem。我的问题是文件开始变得很重，我想提高压缩级别，因为压缩时间不是问题。我在rubyzipdocumentation中找不到一种为创建的ZIP文件指定压缩级别的方法。有人知道如何更改此设置吗？是否有另一个允许指定压缩级别的Ruby库？最佳答案这是我通过查看rubyzip内部创建的代码。level=Zlib::BEST_COMPRESSIONZip::ZipOutputStream.open(zip_file)do|zip|Dir.glob("**/*")d
python - 如何使用 Ruby 或 Python 创建一系列高音调和低音调的蜂鸣声？ - 2
关闭。这个问题是opinion-based.它目前不接受答案。想要改进这个问题？更新问题，以便editingthispost可以用事实和引用来回答它.关闭4年前。Improvethisquestion我想在固定时间创建一系列低音和高音调的哔哔声。例如:在150毫秒时发出高音调的蜂鸣声在151毫秒时发出低音调的蜂鸣声200毫秒时发出低音调的蜂鸣声250毫秒的高音调蜂鸣声有没有办法在Ruby或Python中做到这一点？我真的不在乎输出编码是什么(.wav、.mp3、.ogg等等)，但我确实想创建一个输出文件。
ruby - 检查 "command"的输出应该包含 NilClass 的意外崩溃 - 2
为了将Cucumber用于命令行脚本，我按照提供的说明安装了arubagem。它在我的Gemfile中，我可以验证是否安装了正确的版本并且我已经包含了require'aruba/cucumber'在'features/env.rb'中为了确保它能正常工作，我写了以下场景:@announceScenario:Testingcucumber/arubaGivenablankslateThentheoutputfrom"ls-la"shouldcontain"drw"假设事情应该失败。它确实失败了，但失败的原因是错误的:@announceScenario:Testingcucumber/ar
ruby - 在 jRuby 中使用 'fork' 生成进程的替代方案？ - 2
在MRIRuby中我可以这样做:deftransferinternal_server=self.init_serverpid=forkdointernal_server.runend#Maketheserverprocessrunindependently.Process.detach(pid)internal_client=self.init_client#Dootherstuffwithconnectingtointernal_server...internal_client.post('somedata')ensure#KillserverProcess.kill('KILL',
ruby 正则表达式 - 如何替换字符串中匹配项的第 n 个实例 - 2
在我的应用程序中，我需要能够找到所有数字子字符串，然后扫描每个子字符串，找到第一个匹配范围(例如5到15之间)的子字符串，并将该实例替换为另一个字符串“X”。我的测试字符串s="1foo100bar10gee1"我的初始模式是1个或多个数字的任何字符串，例如，re=Regexp.new(/\d+/)matches=s.scan(re)给出["1","100","10","1"]如果我想用“X”替换第N个匹配项，并且只替换第N个匹配项，我该怎么做？例如，如果我想替换第三个匹配项“10”(匹配项[2])，我不能只说s[matches[2]]="X"因为它做了两次替换“1fooX0barXg
ruby - 如何使用 Ruby aws/s3 Gem 生成安全 URL 以从 s3 下载文件 - 2
我正在编写一个小脚本来定位aws存储桶中的特定文件，并创建一个临时验证的url以发送给同事。(理想情况下，这将创建类似于在控制台上右键单击存储桶中的文件并复制链接地址的结果)。我研究过回形针，它似乎不符合这个标准，但我可能只是不知道它的全部功能。我尝试了以下方法:defauthenticated_url(file_name,bucket)AWS::S3::S3Object.url_for(file_name,bucket,:secure=>true,:expires=>20*60)end产生这种类型的结果:...-1.amazonaws.com/file_path/file.zip.A
ruby - 如何根据特征实现 FactoryGirl 的条件行为 - 2
我有一个用户工厂。我希望默认情况下确认用户。但是鉴于unconfirmed特征，我不希望它们被确认。虽然我有一个基于实现细节而不是抽象的工作实现，但我想知道如何正确地做到这一点。factory:userdoafter(:create)do|user,evaluator|#unwantedimplementationdetailshereunlessFactoryGirl.factories[:user].defined_traits.map(&:name).include?(:unconfirmed)user.confirm!endendtrait:unconfirmeddoenden
ruby - 在 Ruby 中有条件地定义函数 - 2
我有一些代码在几个不同的位置之一运行:作为具有调试输出的命令行工具，作为不接受任何输出的更大程序的一部分，以及在Rails环境中。有时我需要根据代码的位置对代码进行细微的更改，我意识到以下样式似乎可行:print"Testingnestedfunctionsdefined\n"CLI=trueifCLIdeftest_printprint"CommandLineVersion\n"endelsedeftest_printprint"ReleaseVersion\n"endendtest_print()这导致:TestingnestedfunctionsdefinedCommandLin
ruby - 定义方法参数的条件 - 2
我有一个只接受一个参数的方法:defmy_method(number)end如果使用number调用方法，我该如何引发错误？？通常，我如何定义方法参数的条件？比如我想在调用的时候报错:my_method(1) 最佳答案您可以添加guard在函数的开头，如果参数无效则引发异常。例如:defmy_method(number)failArgumentError,"Inputshouldbegreaterthanorequalto2"ifnumbereputse.messageend#=>Inputshouldbegreaterthano
ruby-on-rails - Ruby on Rails - 为文本区域和图片生成列 - 2
我是Rails的新手，所以请原谅简单的问题。我正在为一家公司创建一个网站。那家公司想在网站上展示它的客户。我想让客户自己管理这个。我正在为“客户”生成一个表格，我想要的三列是:公司名称、公司描述和Logo。对于名称，我使用的是name:string但不确定如何在脚本/生成脚手架终端命令中最好地创建描述列(因为我打算将其设置为文本区域)和图片。我怀疑描述(我想成为一个文本区域)应该仍然是描述:字符串，然后以实际形式进行调整。不确定如何处理图片字段。那么……说来话长:我在脚手架命令中输入什么来生成描述和图片列？最佳答案对于“文本”数