go - Shared library undefined reference to CUDA kernel wrapper

coder 2023-06-30 (original post)

So I'm trying to use the CUDA Runtime API with Go's cgo on Windows. I've been at it for a few days now, but I'm stuck: I get an undefined reference to my kernel wrapper.

I have separated out my kernel and wrapped it as follows:

File: cGo.cuh

typedef unsigned long int ktype;
typedef unsigned char glob;

/*
function Prototypes
*/

extern "C" void kernel_kValid(int, int, ktype *, glob *);

__global__ void kValid(ktype *, glob *);

File: cGo.cu

#include "cGo.cuh"
#include "device_launch_parameters.h"
#include "cuda.h"
#include "cuda_runtime.h"

//function Definitions

/*
kernel_kValid is a wrapper function for the CUDA Kernel to be called from Go
*/
extern "C" void kernel_kValid(int blocks, int threads, ktype *kInfo, glob *values) {
    kValid<<<blocks, threads>>>(kInfo, values);//execute the kernel
}


/*
kValid is the CUDA Kernel which is to be executed
*/
__global__ void kValid(ktype *kInfo, glob *values) {
    //lots of code
}

I compiled my CUDA source into a shared library:

nvcc -shared -o myLib.so cGo.cu

Then I created a header file to include in my cgo code:

File: cGo.h

typedef unsigned long int ktype;
typedef unsigned char glob;

/*
function Declarations
*/

void kernel_kValid(int , int , ktype *, glob *);

Then, from the Go package, I use cgo to call my kernel wrapper:

package cuda

/*
// CUDA runtime library
#cgo LDFLAGS: -LC:/Storage/Cuda/lib/x64 -lcudart
// my shared library
#cgo LDFLAGS: -L${SRCDIR}/lib -lmyLib
// CUDA headers
#cgo CPPFLAGS: -IC:/Storage/Cuda/include
// directory containing cGo.h
#cgo CPPFLAGS: -I${SRCDIR}/include

#include <cuda_runtime.h>
#include <stdlib.h>
#include "cGo.h"
*/
import "C"

import "unsafe"

func useKernel() {
	//other code
	// cgo needs the exact C pointer types, so convert via unsafe.Pointer
	C.kernel_kValid(C.int(B), C.int(T), (*C.ktype)(unsafe.Pointer(storageDevice)), (*C.glob)(unsafe.Pointer(globDevice)))
	cudaErr, err = C.cudaDeviceSynchronize()
	//rest of the code
}

None of the calls to the CUDA runtime API cause errors; the problem is only my kernel wrapper. This is the output when I build the cuda package with go:

C:\Users\user\Documents\Repos\go\cuda_wrapper>go build cuda_wrapper\cuda
# cuda_wrapper/cuda
In file included from C:/Storage/Cuda/include/host_defines.h:50:0,
                 from C:/Storage/Cuda/include/device_types.h:53,
                 from C:/Storage/Cuda/include/builtin_types.h:56,
                 from C:/Storage/Cuda/include/cuda_runtime.h:86,
                 from C:\Go\workspace\src\cuda_wrapper\cuda\cuda.go:12:
C:/Storage/Cuda/include/crt/host_defines.h:84:0: warning: "__cdecl" redefined
 #define __cdecl

<built-in>: note: this is the location of the previous definition
# cuda_wrapper/cuda
C:\Users\user\AppData\Local\Temp\go-build038297194\cuda_wrapper\cuda\_obj\cuda.cgo2.o: In function `_cgo_440ebb0a3e25_Cfunc_kernel_kValid':
/tmp/go-build\cuda_wrapper\cuda\_obj/cgo-gcc-prolog:306: undefined reference to `kernel_kValid'
collect2.exe: error: ld returned 1 exit status

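This is where I'm not sure what's going wrong. I've been looking through questions about undefined references with cgo, but none of the ones I found solve my problem. I've also been looking into the fact that the CUDA runtime API is written in C++ and whether that affects how cgo compiles it, but again I found nothing conclusive. At this point I think I'm more confused than anything else, so I'm hoping someone more knowledgeable can point me in the right direction.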

Best answer

This is a name mangling problem.

This is the solution we use in gorgonia:

#include <math.h>

#ifdef __cplusplus
extern "C" {
#endif


__global__ void sigmoid32(float* A, int size)
{
    int blockId = blockIdx.x + blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
    int idx = blockId * (blockDim.x * blockDim.y * blockDim.z) + (threadIdx.z * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x) + threadIdx.x;
    if (idx >= size) {
        return;
    }
    A[idx] = 1 / (1 + powf((float)(M_E), (-1 * A[idx])));
}

#ifdef __cplusplus
}
#endif

So... just wrap the kernel wrapper function in extern "C".
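One way to apply this to the question's files is a single header that both compilers read. Below is a sketch, assuming the prototype from cGo.cuh and the contents of cGo.h are merged into one file (the include-guard name CGO_H is arbitrary). nvcc compiles the host side of a .cu file as C++, so without extern "C" the wrapper would be emitted under a mangled symbol name, while the C code that cgo generates asks the linker for the plain name kernel_kValid. The #ifdef __cplusplus guard lets the same file also compile as plain C inside the cgo preamble.

```c
/* cGo.h (sketch): shared by the nvcc-compiled .cu file and the cgo preamble. */
#ifndef CGO_H
#define CGO_H

typedef unsigned long int ktype;
typedef unsigned char glob;

#ifdef __cplusplus
extern "C" {   /* seen only by C++ compilers (including nvcc host code): no name mangling */
#endif

/* Host-side launcher only. The __global__ kernel is kept out of this header
   because a plain C compiler, as used by cgo, does not understand __global__. */
void kernel_kValid(int blocks, int threads, ktype *kInfo, glob *values);

#ifdef __cplusplus
}
#endif

#endif /* CGO_H */
```

On the CUDA side, cGo.cu would #include "cGo.h" and keep the __global__ void kValid(ktype *, glob *); declaration in its own .cuh or .cu file. Because the declaration it has already seen has C linkage, the later definition of kernel_kValid keeps that linkage (repeating extern "C" on it is harmless). Since the cgo preamble includes the same file, the two sides can no longer disagree about the typedefs or the prototype.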

Source question on Stack Overflow: https://stackoverflow.com/questions/49042518/
