.net - 为什么在分配大量内存时我的线程化 .Net 应用程序不能线性扩展？

coder 2023-06-04 原文

关于大内存分配对 .Net 运行时可伸缩性的影响，我遇到了一些奇怪的事情。在我的测试应用程序中，我在固定循环数的紧密循环中创建了许多字符串，并输出每秒循环迭代的速率。当我在多个线程中运行这个循环时，奇怪的事情就出现了——看起来速率并没有线性增加。当您创建大字符串时，问题会变得更糟。

让我告诉你结果。我的机器是一个 8gb、8 核的机器，运行 Windows Server 2008 R1，32 位。它有两个 4 核 Intel Xeon 1.83ghz (E5320) 处理器。执行的“工作”是对字符串的一组交替调用 ToUpper() 和 ToLower()。我对一个线程、两个线程等运行测试——最多。下表中的列是:

速率:所有线程的循环次数除以持续时间。
线性比率:理想的比率，如果性能是线性扩展的。它的计算方法是一个线程实现的速率乘以该测试的线程数。
方差:计算为比率低于线性比率的百分比。

示例 1:10,000 个循环，8 个线程，每个字符串 1024 个字符

第一个示例从一个线程开始，然后是两个线程，最终使用八个线程运行测试。每个线程创建 10,000 个字符串，每个字符串包含 1024 个字符:

Creating 10000 strings per thread, 1024 chars each, using up to 8 threads
GCMode = Server

Rate          Linear Rate   % Variance    Threads
--------------------------------------------------------
322.58        322.58        0.00 %        1
689.66        645.16        -6.90 %       2
882.35        967.74        8.82 %        3
1081.08       1290.32       16.22 %       4
1388.89       1612.90       13.89 %       5
1666.67       1935.48       13.89 %       6
2000.00       2258.07       11.43 %       7
2051.28       2580.65       20.51 %       8
Done.

Example 2: 10,000 loops, 8 threads, 32,000 chars per string

In the second example I’ve increased the number of chars for each string to 32,000.

Creating 10000 strings per thread, 32000 chars each, using up to 8 threads
GCMode = Server

Rate          Linear Rate   % Variance    Threads
--------------------------------------------------------
14.10         14.10         0.00 %        1
24.36         28.21         13.64 %       2
33.15         42.31         21.66 %       3
40.98         56.42         27.36 %       4
48.08         70.52         31.83 %       5
61.35         84.63         27.51 %       6
72.61         98.73         26.45 %       7
67.85         112.84        39.86 %       8
Done.

Notice the difference in variance from the linear rate; in the second table the actual rate is 39% less than the linear rate.

My question is: Why does this app not scale linearly?

My Observations

False Sharing

I initially thought that this could be due to False Sharing but, as you’ll see in the source code, I’m not sharing any collections and the strings are quite big. The only overlap that could exist is at the beginning of one string and the end of another.

Server-mode Garbage Collector

I’m using gcServer enabled=true so that each core gets its own heap and garbage collector thread.

Large Object Heap

I don't think that objects I allocate are being sent to the Large Object Heap because they are under 85000 bytes big.

String Interning

I thought that string values may being shared under the hood due to interningMSDN, so I tried compiling interning disabled. This produced worse results than those shown above

Other data types

I tried the same example using small and large integer arrays, in which I loop through each element and change the value. It produces similar results, following the trend of performing worse with larger allocations.

Source Code

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Diagnostics;
using System.Runtime;
using System.Runtime.CompilerServices;

namespace StackOverflowExample
{
  public class Program
  {
    private static int columnWidth = 14;

    static void Main(string[] args)
    {
      int loopCount, maxThreads, stringLength;
      loopCount = maxThreads = stringLength = 0;
      try
      {
        loopCount = args.Length != 0 ? Int32.Parse(args[0]) : 1000;
        maxThreads = args.Length != 0 ? Int32.Parse(args[1]) : 4;
        stringLength = args.Length != 0 ? Int32.Parse(args[2]) : 1024;
      }
      catch
      {
        Console.WriteLine("Usage: StackOverFlowExample.exe [loopCount] [maxThreads] [stringLength]");
        System.Environment.Exit(2);
      }

      float rate;
      float linearRate = 0;
      Stopwatch stopwatch;
      Console.WriteLine("Creating {0} strings per thread, {1} chars each, using up to {2} threads", loopCount, stringLength, maxThreads);
      Console.WriteLine("GCMode = {0}", GCSettings.IsServerGC ? "Server" : "Workstation");
      Console.WriteLine();
      PrintRow("Rate", "Linear Rate", "% Variance", "Threads"); ;
      PrintRow(4, "".PadRight(columnWidth, '-'));

      for (int runCount = 1; runCount <= maxThreads; runCount++)
      {
        // Create the workers
        Worker[] workers = new Worker[runCount];
        workers.Length.Range().ForEach(index => workers[index] = new Worker());

        // Start timing and kick off the threads
        stopwatch = Stopwatch.StartNew();
        workers.ForEach(w => new Thread(
          new ThreadStart(
            () => w.DoWork(loopCount, stringLength)
          )
        ).Start());

        // Wait until all threads are complete
        WaitHandle.WaitAll(
          workers.Select(p => p.Complete).ToArray());
        stopwatch.Stop();

        // Print the results
        rate = (float)loopCount * runCount / stopwatch.ElapsedMilliseconds;
        if (runCount == 1) { linearRate = rate; }

        PrintRow(String.Format("{0:#0.00}", rate),
          String.Format("{0:#0.00}", linearRate * runCount),
          String.Format("{0:#0.00} %", (1 - rate / (linearRate * runCount)) * 100),
          runCount.ToString()); 
      }
      Console.WriteLine("Done.");
    }

    private static void PrintRow(params string[] columns)
    {
      columns.ForEach(c => Console.Write(c.PadRight(columnWidth)));
      Console.WriteLine();
    }

    private static void PrintRow(int repeatCount, string column)
    {
      for (int counter = 0; counter < repeatCount; counter++)
      {
        Console.Write(column.PadRight(columnWidth));
      }
      Console.WriteLine();
    }
  }

  public class Worker
  {
    public ManualResetEvent Complete { get; private set; }

    public Worker()
    {
      Complete = new ManualResetEvent(false);
    }

    public void DoWork(int loopCount, int stringLength)
    {
      // Build the string
      string theString = "".PadRight(stringLength, 'a');
      for (int counter = 0; counter < loopCount; counter++)
      {
        if (counter % 2 == 0) { theString.ToUpper(); }
        else { theString.ToLower(); }
      }
      Complete.Set();
    }
  }

  public static class HandyExtensions
  {
    public static IEnumerable<int> Range(this int max)
    {
      for (int counter = 0; counter < max; counter++)
      {
        yield return counter;
      }
    }

    public static void ForEach<T>(this IEnumerable<T> items, Action<T> action)
    {
      foreach(T item in items)
      {
        action(item);
      }
    }
  }
}

App.Config

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>

运行示例

要在您的机器上运行 StackOverflowExample.exe，请使用以下命令行参数调用它:

StackOverFlowExample.exe [loopCount] [maxThreads] [stringLength]

loopCount:每个线程操作字符串的次数。
maxThreads:要前进到的线程数。
stringLength:填充字符串的字符数。

最佳答案

你可能想看看 this question of mine .

我遇到了类似的问题，这是由于 CLR 在分配内存时执行线程间同步以避免重叠分配。现在，使用服务器 GC，锁定算法可能会有所不同——但这些相同的东西可能会影响您的代码。

关于.net - 为什么在分配大量内存时我的线程化 .Net 应用程序不能线性扩展？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/2072752/

net 34 stringLength loopCount .net multithreading memory concurrency scalability

有关.net - 为什么在分配大量内存时我的线程化 .Net 应用程序不能线性扩展？的更多相关文章

ruby - 为什么我可以在 Ruby 中使用 Object#send 访问私有(private)/ protected 方法？ - 2
类classAprivatedeffooputs:fooendpublicdefbarputs:barendprivatedefzimputs:zimendprotecteddefdibputs:dibendendA的实例a=A.new测试a.foorescueputs:faila.barrescueputs:faila.zimrescueputs:faila.dibrescueputs:faila.gazrescueputs:fail测试输出failbarfailfailfail.发送测试[:foo,:bar,:zim,:dib,:gaz].each{|m|a.send(m)resc
ruby-on-rails - Ruby net/ldap 模块中的内存泄漏 - 2
作为我的Rails应用程序的一部分，我编写了一个小导入程序，它从我们的LDAP系统中吸取数据并将其塞入一个用户表中。不幸的是，与LDAP相关的代码在遍历我们的32K用户时泄漏了大量内存，我一直无法弄清楚如何解决这个问题。这个问题似乎在某种程度上与LDAP库有关，因为当我删除对LDAP内容的调用时，内存使用情况会很好地稳定下来。此外，不断增加的对象是Net::BER::BerIdentifiedString和Net::BER::BerIdentifiedArray，它们都是LDAP库的一部分。当我运行导入时，内存使用量最终达到超过1GB的峰值。如果问题存在，我需要找到一些方法来更正我的代
ruby-on-rails - Rails - 子类化模型的设计模式是什么？ - 2
我有一个模型:classItem项目有一个属性“商店”基于存储的值，我希望Item对象对特定方法具有不同的行为。Rails中是否有针对此的通用设计模式？如果方法中没有大的if-else语句，这是如何干净利落地完成的？最佳答案通常通过Single-TableInheritance. 关于ruby-on-rails-Rails-子类化模型的设计模式是什么？，我们在StackOverflow上找到一个类似的问题： https://stackoverflow.co
ruby - 什么是填充的 Base64 编码字符串以及如何在 ruby 中生成它们？ - 2
我正在使用的第三方API的文档状态:"[O]urAPIonlyacceptspaddedBase64encodedstrings."什么是“填充的Base64编码字符串”以及如何在Ruby中生成它们。下面的代码是我第一次尝试创建转换为Base64的JSON格式数据。xa=Base64.encode64(a.to_json) 最佳答案他们说的padding其实就是Base64本身的一部分。它是末尾的“=”和“==”。Base64将3个字节的数据包编码为4个编码字符。所以如果你的输入数据有长度n和n%3=1=>"=="末尾用于填充n%
ruby - 解析 RDFa、微数据等的最佳方式是什么，使用统一的模式/词汇(例如 schema.org)存储和显示信息 - 2
我主要使用Ruby来执行此操作，但到目前为止我的攻击计划如下:使用gemsrdf、rdf-rdfa和rdf-microdata或mida来解析给定任何URI的数据。我认为最好映射到像schema.org这样的统一模式，例如使用这个yaml文件，它试图描述数据词汇表和opengraph到schema.org之间的转换:#SchemaXtoschema.orgconversion#data-vocabularyDV:name:namestreet-address:streetAddressregion:addressRegionlocality:addressLocalityphoto:i
ruby - 为什么 4.1%2 使用 Ruby 返回 0.0999999999999996？但是 4.2%2==0.2 - 2
为什么4.1%2返回0.0999999999999996？但是4.2%2==0.2。最佳答案参见此处:WhatEveryProgrammerShouldKnowAboutFloating-PointArithmetic实数是无限的。计算机使用的位数有限(今天是32位、64位)。因此计算机进行的浮点运算不能代表所有的实数。0.1是这些数字之一。请注意，这不是与Ruby相关的问题，而是与所有编程语言相关的问题，因为它来自计算机表示实数的方式。关于ruby-为什么4.1%2使用Ruby返
ruby - 如何模拟 Net::HTTP::Post？ - 2
是的，我知道最好使用webmock，但我想知道如何在RSpec中模拟此方法:defmethod_to_testurl=URI.parseurireq=Net::HTTP::Post.newurl.pathres=Net::HTTP.start(url.host,url.port)do|http|http.requestreq,foo:1endresend这是RSpec:let(:uri){'http://example.com'}specify'HTTPcall'dohttp=mock:httpNet::HTTP.stub!(:start).and_yieldhttphttp.shou
Ruby Koans about_array_assignment - 非平行与平行分配歧视 - 2
通过rubykoans.com，我在about_array_assignment.rb中遇到了这两段代码你怎么知道第一个是非并行赋值，第二个是一个变量的并行赋值？在我看来，除了命名差异之外，代码几乎完全相同。4deftest_non_parallel_assignment5names=["John","Smith"]6assert_equal["John","Smith"],names7end45deftest_parallel_assignment_with_one_variable46first_name,=["John","Smith"]47assert_equal'John
ruby - ruby 中的 TOPLEVEL_BINDING 是什么？ - 2
它不等于主线程的binding，这个toplevel作用域是什么？此作用域与主线程中的binding有何不同？>ruby-e'putsTOPLEVEL_BINDING===binding'false 最佳答案事实是，TOPLEVEL_BINDING始终引用Binding的预定义全局实例，而Kernel#binding创建的新实例>Binding每次封装当前执行上下文。在顶层，它们都包含相同的绑定(bind)，但它们不是同一个对象，您无法使用==或===测试它们的绑定(bind)相等性。putsTOPLEVEL_BINDINGput
ruby - Infinity 和 NaN 的类型是什么？ - 2
我可以得到Infinity和NaNn=9.0/0#=>Infinityn.class#=>Floatm=0/0.0#=>NaNm.class#=>Float但是当我想直接访问Infinity或NaN时:Infinity#=>uninitializedconstantInfinity(NameError)NaN#=>uninitializedconstantNaN(NameError)什么是Infinity和NaN？它们是对象、关键字还是其他东西？最佳答案您看到打印为Infinity和NaN的只是Float类的两个特殊实例的字符串