c++ - CUDA, mutex and atomicCAS()

coder 2023-11-17 Original post

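I recently started developing on CUDA and ran into a problem with atomicCAS(). To manipulate memory in device code, I have to create a mutex so that only one thread at a time can work with that memory in the critical section of the code.

The device code below runs on 1 block and several threads.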

__global__ void cudaKernelGenerateRandomGraph(..., int* mutex)
{
    int i = threadIdx.x;
    ...

    do 
    {
        atomicCAS(mutex, 0, 1 + i);
    }
    while (*mutex != i + 1);

    //critical section
    //do some manipulations with objects in device memory

    *mutex = 0;

    ...
}

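When the first thread executes

atomicCAS(mutex, 0, 1 + i);

mutex becomes 1. After that, the first thread changes its state from Active to Inactive, and the line

*mutex = 0;

is never executed. The other threads stay in the loop forever. I have tried many variants of this loop, such as while(){};, do{}while();, using a temp variable = *mutex inside the loop, and even a variant with if(){} and goto. The result is always the same.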

The host part of the code:

...
int verticlesCount = 5;
int *mutex;
cudaMalloc((void **)&mutex, sizeof(int));
cudaMemset(mutex, 0, sizeof(int));
cudaKernelGenerateRandomGraph<<<1, verticlesCount>>>(..., mutex);

I use Visual Studio 2012 with CUDA 5.5.

The device is an NVidia GeForce GT 240 with compute capability 1.2.

Thanks in advance.

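Update: After working on my diploma project for some time this spring, I found a solution for critical sections on CUDA. It is a combination of lock-free and mutex mechanisms. Here is the working code. I used it to implement an atomic, dynamically resizable array.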

// *mutex should be 0 before calling this function
__global__ void kernelFunction(..., unsigned long long* mutex) 
{
    bool isSet = false; 
    do 
    {
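        // atomicCAS returns the old value; 0 means this thread took the lock
        if (isSet = (atomicCAS(mutex, 0, 1) == 0)) 
        {
            // critical section goes here
        }
        if (isSet) 
        {
            // release the lock: write through the pointer, not to the pointer itself
            *mutex = 0;
        }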
    } 
    while (!isSet);
}
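A minimal, self-contained sketch of the same try-lock pattern is shown here, under several assumptions that are not in the original post: the names lockedIncrement and runLockedIncrement are hypothetical, the critical section is reduced to incrementing an int counter, the lock is a plain int, and the release uses atomicExch() after a __threadfence() instead of a plain store. The point it tries to illustrate is that the thread that wins the atomicCAS() does its work and releases the lock inside the same branch, so it never has to wait for the losing threads of its warp before unlocking. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread adds 1 to *counter inside a critical
// section guarded by *lock (0 = free, 1 = taken).
__global__ void lockedIncrement(int* lock, int* counter)
{
    volatile int* value = counter;  // volatile: always go to memory, never a stale copy
    bool done = false;
    do
    {
        // Try to flip the lock from 0 to 1; an old value of 0 means we own it.
        if (atomicCAS(lock, 0, 1) == 0)
        {
            *value = *value + 1;   // critical section
            __threadfence();       // make the update visible before unlocking
            atomicExch(lock, 0);   // release inside the same branch
            done = true;
        }
    }
    while (!done);
}

// Hypothetical host helper: launches the kernel and returns the final counter.
int runLockedIncrement(int blocks, int threads)
{
    int* lock = nullptr;
    int* counter = nullptr;
    cudaMalloc(&lock, sizeof(int));
    cudaMalloc(&counter, sizeof(int));
    cudaMemset(lock, 0, sizeof(int));     // the lock must start out free
    cudaMemset(counter, 0, sizeof(int));

    lockedIncrement<<<blocks, threads>>>(lock, counter);

    int result = -1;
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&result, counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(lock);
    cudaFree(counter);
    return result;
}

int main()
{
    // 1 block of 5 threads, mirroring the launch in the question.
    int total = runLockedIncrement(1, 5);
    std::printf("counter = %d\n", total);
    return 0;
}
```

If the lock behaves as intended, the printed counter should equal the number of launched threads. Whether a spin lock like this makes progress inside a single warp still depends on the hardware generation and on how the compiler lays out the branch, so it is worth testing on the target device.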

Best answer

The loop in question

do 
{
    atomicCAS(mutex, 0, 1 + i);
}
while (*mutex != i + 1);

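would work fine if it were running on the host (CPU) side; once thread 0 sets *mutex to 1, the other threads would wait until thread 0 sets *mutex back to 0.

However, GPU threads are not as independent as their CPU counterparts. GPU threads are grouped into groups of 32, commonly referred to as warps. Threads in the same warp execute instructions in complete lock-step. If a control statement such as if or while causes some of the 32 threads to diverge from the rest, the remaining threads will wait (i.e. sleep) for the divergent threads to finish. [1]

Going back to the loop in question, thread 0 becomes inactive because threads 1, 2, ..., 31 are still stuck in the while loop. So thread 0 never reaches the line *mutex = 0, and the other 31 threads loop forever.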

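A potential solution is to make a local copy of the shared resource in question, let the 32 threads modify the copy, and then pick one thread to "push" the change back to the shared resource. A __shared__ variable is ideal in this situation: it is shared by the threads belonging to the same block but not by other blocks. We can use __syncthreads() to finely control how the member threads access this variable.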
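One way this idea could look in code is sketched below. It is an illustration under assumptions rather than the answerer's actual implementation: the shared resource is assumed to be a single int total, the names stagedSum, runStagedSum and kBlockSize are hypothetical, and the final "push" uses atomicAdd() so that the sketch stays correct if more than one block is launched. Each thread writes only its own slot of a __shared__ staging array, __syncthreads() separates the staging phase from the push, and a single thread per block touches global memory. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kBlockSize = 32;  // assumed block size for this sketch

// Hypothetical kernel: threads stage their contributions in shared memory,
// then thread 0 of each block pushes the block's result to the global total.
__global__ void stagedSum(const int* input, int n, int* total)
{
    __shared__ int staged[kBlockSize];

    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    // Each thread writes only its own slot, so no lock is needed here.
    staged[tid] = (idx < n) ? input[idx] : 0;
    __syncthreads();  // every slot is written before anyone reads the array

    if (tid == 0)
    {
        int sum = 0;
        for (int k = 0; k < blockDim.x; ++k)
        {
            sum += staged[k];
        }
        atomicAdd(total, sum);  // one update of the shared resource per block
    }
}

// Hypothetical host helper: copies the input over, runs the kernel and
// returns the accumulated total.
int runStagedSum(const int* hostInput, int n)
{
    int* input = nullptr;
    int* total = nullptr;
    cudaMalloc(&input, n * sizeof(int));
    cudaMalloc(&total, sizeof(int));
    cudaMemcpy(input, hostInput, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(total, 0, sizeof(int));

    int blocks = (n + kBlockSize - 1) / kBlockSize;
    stagedSum<<<blocks, kBlockSize>>>(input, n, total);

    int result = -1;
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&result, total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(input);
    cudaFree(total);
    return result;
}

int main()
{
    const int values[5] = {1, 2, 3, 4, 5};
    std::printf("total = %d\n", runStagedSum(values, 5));
    return 0;
}
```

Because no thread ever spins waiting for another thread of the same warp, this structure avoids the divergence deadlock described above, at the cost of restructuring the work into a staging phase and a push phase.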

[1] CUDA Best Practices Guide - Branching and Divergence

Avoid different execution paths within the same warp.

Any flow control instruction (if, switch, do, for, while) can significantly affect the instruction throughput by causing threads of the same warp to diverge; that is, to follow different execution paths. If this happens, the different execution paths must be serialized, since all of the threads of a warp share a program counter; this increases the total number of instructions executed for this warp. When all the different execution paths have completed, the threads converge back to the same execution path.

Regarding c++ - CUDA, mutex and atomicCAS(), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21341495/
