草庐IT

c# - 如何确保数据流 block 仅按需创建线程?

coder 2023-07-13 原文

我使用 TPL 数据流 API 编写了一个小型管道,它从多个线程接收数据并对其执行处理。

设置 1

当我将其配置为使用 MaxDegreeOfParallelism = Environment.ProcessorCount 时(在我的例子中是 8)对于每个 block ,我注意到它填满了多个线程中的缓冲区并且处理第二个 block 直到所有线程都接收到 +- 1700 个元素才开始。你可以在行动中看到这个 here .

设置 2

当我设置 MaxDegreeOfParallelism = 1然后我注意到所有元素都在单个线程上接收,并且在接收到 +- 40 个元素后已经开始处理发送。 Data here .

设置 3

当我设置 MaxDegreeOfParallelism = 1并且我在发送每个输入之前引入了 1000 毫秒的延迟,我注意到元素在收到后立即发送,并且每个收到的元素都放在一个单独的线程中。资料here .


到目前为止设置。我的问题如下:

  1. 当我比较设置 1 和 2 时,我注意到与并行相比,串行完成时处理元素的启动速度要快得多(即使考虑到并行的线程数是原来的 8 倍)。是什么导致了这种差异?

  2. 由于这将在 ASP.NET 环境中运行,我不想生成不必要的线程,因为它们都来自单个线程池。如设置 3 所示,即使只有少量数据,它仍会分布在多个线程上。这也令人惊讶,因为从设置 1 开始,我假设数据按顺序分布在线程上(注意前 50 个元素如何全部进入线程 16)。我能否确保它仅按需创建新线程?

  3. 还有一个概念叫做 BufferBlock<T> .如果TransformBlock<T>已经对输入进行了排队,将我的管道 ( ReceiveElement ) 中的第一步交换为 BufferBlock 的实际区别是什么? ?


class Program
{
    static void Main(string[] args)
    {
        var dataflowProcessor = new DataflowProcessor<string>();
        var amountOfTasks = 5;
        var tasks = new Task[amountOfTasks];

        for (var i = 0; i < amountOfTasks; i++)
        {
            tasks[i] = SpawnThread(dataflowProcessor, $"Task {i + 1}");
        }

        foreach (var task in tasks)
        {
            task.Start();
        }

        Task.WaitAll(tasks);
        Console.WriteLine("Finished feeding threads"); // Needs to use async main
        Console.Read();
    }

    private static Task SpawnThread(DataflowProcessor<string> dataflowProcessor, string taskName)
    {
        return new Task(async () =>
        {
            await FeedData(dataflowProcessor, taskName);
        });
    }

    private static async Task FeedData(DataflowProcessor<string> dataflowProcessor, string threadName)
    {
        foreach (var i in Enumerable.Range(0, short.MaxValue))
        {
            await Task.Delay(1000); // Only used for the delayedSerialProcessing test
            dataflowProcessor.Process($"Thread name: {threadName}\t Thread ID:{Thread.CurrentThread.ManagedThreadId}\t Value:{i}");
        }
    }
}


public class DataflowProcessor<T>
{
    private static readonly ExecutionDataflowBlockOptions ExecutionOptions = new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount
    };

    private static readonly TransformBlock<T, T> ReceiveElement = new TransformBlock<T, T>(element =>
    {
        Console.WriteLine($"Processing received element in thread {Thread.CurrentThread.ManagedThreadId}");
        return element;
    }, ExecutionOptions);

    private static readonly ActionBlock<T> SendElement = new ActionBlock<T>(element =>
    {
        Console.WriteLine($"Processing sent element in thread {Thread.CurrentThread.ManagedThreadId}");
        Console.WriteLine(element);
    }, ExecutionOptions);

    static DataflowProcessor()
    {
        ReceiveElement.LinkTo(SendElement);

        ReceiveElement.Completion.ContinueWith(x =>
        {
            if (x.IsFaulted)
            {
                ((IDataflowBlock) ReceiveElement).Fault(x.Exception);
            }
            else
            {
                ReceiveElement.Complete();
            }
        });
    }


    public void Process(T newElement)
    {      
        ReceiveElement.Post(newElement);
    }
}

最佳答案

在将解决方案部署到 ASP.NET 环境之前,我建议您更改体系结构:IIS 可以在请求处理后暂停 ASP.NET 中的线程供自己使用,这样您的任务就可以未完成。更好的方法是创建一个单独的 Windows 服务守护进程来处理您的数据流。

现在回到 TPL 数据流。

我喜欢 TPL Dataflow 库,但它的文档真是一团糟。
我找到的唯一有用的文件是 Introduction to TPL Dataflow .

其中有一些线索可能会有所帮助,尤其是关于配置设置的线索(我建议您研究使用您自己的 TheadPool 实现您自己的 TaskScheduler实现和 MaxMessagesPerTask 选项)如果需要:

The built-in dataflow blocks are configurable, with a wealth of control provided over how and where blocks perform their work. Here are some key knobs available to the developer, all of which are exposed through the DataflowBlockOptions class and its derived types (ExecutionDataflowBlockOptions and GroupingDataflowBlockOptions), instances of which may be provided to blocks at construction time.

  • TaskScheduler 自定义,如@i3arnon 所述:

    By default, dataflow blocks schedule work to TaskScheduler.Default, which targets the internal workings of the .NET ThreadPool.

  • 最大并行度

    It defaults to 1, meaning only one thing may happen in a block at a time. If set to a value higher than 1, that number of messages may be processed concurrently by the block. If set to DataflowBlockOptions.Unbounded (-1), any number of messages may be processed concurrently, with the maximum automatically managed by the underlying scheduler targeted by the dataflow block. Note that MaxDegreeOfParallelism is a maximum, not a requirement.

  • MaxMessagesPerTask

    TPL Dataflow is focused on both efficiency and control. Where there are necessary trade-offs between the two, the system strives to provide a quality default but also enable the developer to customize behavior according to a particular situation. One such example is the trade-off between performance and fairness. By default, dataflow blocks try to minimize the number of task objects that are necessary to process all of their data. This provides for very efficient execution; as long as a block has data available to be processed, that block’s tasks will remain to process the available data, only retiring when no more data is available (until data is available again, at which point more tasks will be spun up). However, this can lead to problems of fairness. If the system is currently saturated processing data from a given set of blocks, and then data arrives at other blocks, those latter blocks will either need to wait for the first blocks to finish processing before they’re able to begin, or alternatively risk oversubscribing the system. This may or may not be the correct behavior for a given situation. To address this, the MaxMessagesPerTask option exists. It defaults to DataflowBlockOptions.Unbounded (-1), meaning that there is no maximum. However, if set to a positive number, that number will represent the maximum number of messages a given block may use a single task to process. Once that limit is reached, the block must retire the task and replace it with a replica to continue processing. These replicas are treated fairly with regards to all other tasks scheduled to the scheduler, allowing blocks to achieve a modicum of fairness between them. In the extreme, if MaxMessagesPerTask is set to 1, a single task will be used per message, achieving ultimate fairness at the potential expense of more tasks than may otherwise have been necessary.

  • MaxNumberOfGroups

    The grouping blocks are capable of tracking how many groups they’ve produced, and automatically complete themselves (declining further offered messages) after that number of groups has been generated. By default, the number of groups is DataflowBlockOptions.Unbounded (-1), but it may be explicitly set to a value greater than one.

  • 取消 token

    This token is monitored during the dataflow block’s lifetime. If a cancellation request arrives prior to the block’s completion, the block will cease operation as politely and quickly as possible.

  • 贪心

    By default, target blocks are greedy and want all data offered to them.

  • 有限容量

    This is the limit on the number of items the block may be storing and have in flight at any one time.

关于c# - 如何确保数据流 block 仅按需创建线程?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/37833712/

有关c# - 如何确保数据流 block 仅按需创建线程?的更多相关文章

  1. ruby - 如何使用 Nokogiri 的 xpath 和 at_xpath 方法 - 2

    我正在学习如何使用Nokogiri,根据这段代码我遇到了一些问题:require'rubygems'require'mechanize'post_agent=WWW::Mechanize.newpost_page=post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708')puts"\nabsolutepathwithtbodygivesnil"putspost_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div

  2. ruby - 如何从 ruby​​ 中的字符串运行任意对象方法? - 2

    总的来说,我对ruby​​还比较陌生,我正在为我正在创建的对象编写一些rspec测试用例。许多测试用例都非常基础,我只是想确保正确填充和返回值。我想知道是否有办法使用循环结构来执行此操作。不必为我要测试的每个方法都设置一个assertEquals。例如:describeitem,"TestingtheItem"doit"willhaveanullvaluetostart"doitem=Item.new#HereIcoulddotheitem.name.shouldbe_nil#thenIcoulddoitem.category.shouldbe_nilendend但我想要一些方法来使用

  3. ruby - 如何在 Ruby 中顺序创建 PI - 2

    出于纯粹的兴趣,我很好奇如何按顺序创建PI,而不是在过程结果之后生成数字,而是让数字在过程本身生成时显示。如果是这种情况,那么数字可以自行产生,我可以对以前看到的数字实现垃圾收集,从而创建一个无限系列。结果只是在Pi系列之后每秒生成一个数字。这是我通过互联网筛选的结果:这是流行的计算机友好算法,类机器算法:defarccot(x,unity)xpow=unity/xn=1sign=1sum=0loopdoterm=xpow/nbreakifterm==0sum+=sign*(xpow/n)xpow/=x*xn+=2sign=-signendsumenddefcalc_pi(digits

  4. python - 如何使用 Ruby 或 Python 创建一系列高音调和低音调的蜂鸣声? - 2

    关闭。这个问题是opinion-based.它目前不接受答案。想要改进这个问题?更新问题,以便editingthispost可以用事实和引用来回答它.关闭4年前。Improvethisquestion我想在固定时间创建一系列低音和高音调的哔哔声。例如:在150毫秒时发出高音调的蜂鸣声在151毫秒时发出低音调的蜂鸣声200毫秒时发出低音调的蜂鸣声250毫秒的高音调蜂鸣声有没有办法在Ruby或Python中做到这一点?我真的不在乎输出编码是什么(.wav、.mp3、.ogg等等),但我确实想创建一个输出文件。

  5. ruby-on-rails - 如何验证 update_all 是否实际在 Rails 中更新 - 2

    给定这段代码defcreate@upgrades=User.update_all(["role=?","upgraded"],:id=>params[:upgrade])redirect_toadmin_upgrades_path,:notice=>"Successfullyupgradeduser."end我如何在该操作中实际验证它们是否已保存或未重定向到适当的页面和消息? 最佳答案 在Rails3中,update_all不返回任何有意义的信息,除了已更新的记录数(这可能取决于您的DBMS是否返回该信息)。http://ar.ru

  6. ruby-on-rails - 'compass watch' 是如何工作的/它是如何与 rails 一起使用的 - 2

    我在我的项目目录中完成了compasscreate.和compassinitrails。几个问题:我已将我的.sass文件放在public/stylesheets中。这是放置它们的正确位置吗?当我运行compasswatch时,它不会自动编译这些.sass文件。我必须手动指定文件:compasswatchpublic/stylesheets/myfile.sass等。如何让它自动运行?文件ie.css、print.css和screen.css已放在stylesheets/compiled。如何在编译后不让它们重新出现的情况下删除它们?我自己编译的.sass文件编译成compiled/t

  7. ruby - 如何将脚本文件的末尾读取为数据文件(Perl 或任何其他语言) - 2

    我正在寻找执行以下操作的正确语法(在Perl、Shell或Ruby中):#variabletoaccessthedatalinesappendedasafileEND_OF_SCRIPT_MARKERrawdatastartshereanditcontinues. 最佳答案 Perl用__DATA__做这个:#!/usr/bin/perlusestrict;usewarnings;while(){print;}__DATA__Texttoprintgoeshere 关于ruby-如何将脚

  8. ruby - 解析 RDFa、微数据等的最佳方式是什么,使用统一的模式/词汇(例如 schema.org)存储和显示信息 - 2

    我主要使用Ruby来执行此操作,但到目前为止我的攻击计划如下:使用gemsrdf、rdf-rdfa和rdf-microdata或mida来解析给定任何URI的数据。我认为最好映射到像schema.org这样的统一模式,例如使用这个yaml文件,它试图描述数据词汇表和opengraph到schema.org之间的转换:#SchemaXtoschema.orgconversion#data-vocabularyDV:name:namestreet-address:streetAddressregion:addressRegionlocality:addressLocalityphoto:i

  9. ruby - 如何指定 Rack 处理程序 - 2

    Rackup通过Rack的默认处理程序成功运行任何Rack应用程序。例如:classRackAppdefcall(environment)['200',{'Content-Type'=>'text/html'},["Helloworld"]]endendrunRackApp.new但是当最后一行更改为使用Rack的内置CGI处理程序时,rackup给出“NoMethodErrorat/undefinedmethod`call'fornil:NilClass”:Rack::Handler::CGI.runRackApp.newRack的其他内置处理程序也提出了同样的反对意见。例如Rack

  10. ruby - 使用 Vim Rails,您可以创建一个新的迁移文件并一次性打开它吗? - 2

    使用带有Rails插件的vim,您可以创建一个迁移文件,然后一次性打开该文件吗?textmate也可以这样吗? 最佳答案 你可以使用rails.vim然后做类似的事情::Rgeneratemigratonadd_foo_to_bar插件将打开迁移生成的文件,这正是您想要的。我不能代表textmate。 关于ruby-使用VimRails,您可以创建一个新的迁移文件并一次性打开它吗?,我们在StackOverflow上找到一个类似的问题: https://sta

随机推荐