php - Notepad++: delete tags that contain specific text

coder, 2024-07-05 (original post)

I have a large XML file containing products, and I am trying to delete every product that is out of stock. The file is over 20 MB.

<product>
  <name>bla1</name>
  <price>50$</price>
  <stock>yes</stock>
  <description>bla</description>
</product>

<product>
  <name>bla2</name>
  <price>60$</price>
  <stock>no</stock>
  <description>bla</description>
</product>

...

Is it possible to delete them with a regular expression in Notepad++, or should I use SimpleXML (PHP) or something similar?

My basic PHP code:

$url = 'input/products.xml';
$xml = new SimpleXMLElement(file_get_contents($url));

foreach ($xml->product->children() as $product) {

    // finding out of stock products and deleting them

}
$xml->asXml('output/products.xml');

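Best answer

Foreword

Pattern matching with a regular expression is not ideal for markup. If you have access to PHP, I recommend using a proper XML/HTML parsing tool instead. That said, here is a solution that can be used in Notepad++.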
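
As a rough illustration of the parser route recommended above, the following is a minimal sketch using PHP's DOMDocument and DOMXPath. It is not part of the original answer. The function name removeOutOfStock and the <products> root element are assumptions made for the example, since the question does not show the root of the real file.

```php
<?php
// Hypothetical helper (not from the original post): removes every <product>
// whose <stock> child contains "no" and returns the resulting XML string.
function removeOutOfStock(string $xmlString): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false; // drop indentation-only text nodes
    $doc->formatOutput = true;        // re-indent the output
    $doc->loadXML($xmlString);

    $xpath = new DOMXPath($doc);
    // Copy the matches into an array before removing anything,
    // so that removal cannot disturb the iteration.
    $query = '//product[normalize-space(stock)="no"]';
    $outOfStock = iterator_to_array($xpath->query($query));
    foreach ($outOfStock as $product) {
        $product->parentNode->removeChild($product);
    }

    return $doc->saveXML();
}

// Assumed sample data: the <products> wrapper is a guess at the real root.
$xml = <<<'XML'
<products>
  <product><name>bla1</name><price>50$</price><stock>yes</stock></product>
  <product><name>bla2</name><price>60$</price><stock>no</stock></product>
</products>
XML;

$result = removeOutOfStock($xml);
echo $result;
```

For the real file, the same idea should work with $doc->load('input/products.xml') and $doc->save('output/products.xml') in place of the string-based calls, using the paths from the question.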

Description

<product\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>(?:(?!</product).)*<stock\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>no</stock>(?:(?!</product).)*<\/product>

Replace with: nothing (an empty string)


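This regular expression will do the following:

Find an entire product section
Require that it has a child tag stock
Require that the stock child tag has the value no
Avoid the edge cases that make pattern matching in HTML difficult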

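From Notepad++

Note that you should use Notepad++ version 6.1 or later, because the regular expression problems in older versions have been fixed there.

Press Ctrl+H to open the Find/Replace dialog
Select the Regular expression option, and enable ". matches newline" so that . can span the line breaks inside a product block
Put the regular expression in the "Find what" field
Leave the "Replace with" field empty
Click Replace All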
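
If Notepad++ is not at hand, the same replacement can be scripted. The sketch below is an illustration rather than part of the original answer: it feeds the pattern from the Description section to PHP's preg_replace, wrapped in ~ delimiters, with the s modifier playing the role of ". matches newline". The variable names are made up for the example.

```php
<?php
// The answer's pattern between ~ delimiters. The trailing "s" modifier
// lets "." match line breaks inside a product block.
$pattern = <<<'REGEX'
~<product\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>(?:(?!</product).)*<stock\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>no</stock>(?:(?!</product).)*<\/product>~s
REGEX;

// Same sample text as in the question.
$input = <<<'XML'
<product>
  <name>bla1</name>
  <price>50$</price>
  <stock>yes</stock>
  <description>bla</description>
</product>

<product>
  <name>bla2</name>
  <price>60$</price>
  <stock>no</stock>
  <description>bla</description>
</product>
XML;

$output = preg_replace($pattern, '', $input);
if ($output === null) {
    // preg_replace returns null on failure, e.g. when a PCRE limit is hit.
    throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
}
echo $output;
```

One caveat: because <product is followed by a group that accepts arbitrary characters, the pattern also matches a tag whose name merely starts with "product", such as a hypothetical <products> wrapper. If the file has a root element like that and the first product is out of stock, the match would start at the root tag, so it is worth checking the root element name before running the replacement on real data.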

Example

Live demo

https://regex101.com/r/cW9nC5/1

Sample text

<product>
  <name>bla1</name>
  <price>50$</price>
  <stock>yes</stock>
  <description>bla</description>
</product>

<product>
  <name>bla2</name>
  <price>60$</price>
  <stock>no</stock>
  <description>bla</description>
</product>

After replacement

<product>
  <name>bla1</name>
  <price>50$</price>
  <stock>yes</stock>
  <description>bla</description>
</product>

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  <product                 '<product'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the least amount possible)):
----------------------------------------------------------------------
    [^>=]                    any character except: '>', '='
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ='                       '=\''
----------------------------------------------------------------------
    [^']*                    any character except: ''' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    '                        '\''
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ="                       '="'
----------------------------------------------------------------------
    [^"]*                    any character except: '"' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    =                        '='
----------------------------------------------------------------------
    [^'"]                    any character except: ''', '"'
----------------------------------------------------------------------
    [^\s>]*                  any character except: whitespace (\n,
                             \r, \t, \f, and " "), '>' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )*?                      end of grouping
----------------------------------------------------------------------
  \s?                      whitespace (\n, \r, \t, \f, and " ")
                           (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  \/?                      '/' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  >                        '>'
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
      </product                '</product'
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
    .                        any character except \n
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
  <stock                   '<stock'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the least amount possible)):
----------------------------------------------------------------------
    [^>=]                    any character except: '>', '='
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ='                       '=\''
----------------------------------------------------------------------
    [^']*                    any character except: ''' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    '                        '\''
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ="                       '="'
----------------------------------------------------------------------
    [^"]*                    any character except: '"' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    =                        '='
----------------------------------------------------------------------
    [^'"]                    any character except: ''', '"'
----------------------------------------------------------------------
    [^\s>]*                  any character except: whitespace (\n,
                             \r, \t, \f, and " "), '>' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )*?                      end of grouping
----------------------------------------------------------------------
  \s?                      whitespace (\n, \r, \t, \f, and " ")
                           (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  \/?                      '/' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  >no</stock>              '>no</stock>'
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
      </product                '</product'
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
    .                        any character except \n
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
  <                        '<'
----------------------------------------------------------------------
  \/                       '/'
----------------------------------------------------------------------
  product>                 'product>'
----------------------------------------------------------------------

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/37528281/
