c++ - Why is L1 write access worse with two threads in the same core (Hyper-Threading) than with two cores?

coder 2024-02-09 (original post)

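I wrote a C/C++ program (a mix of printf and std::) to learn about the performance of the different caches. I want to parallelize a process that computes over a large block of memory. I have to perform several calculations on the same memory locations, so I write the results in place, overwriting the source data. When the first calculation is done, I do another one using the previous results.

I guessed that with two threads, one doing the first calculation and the other doing the second, I would improve performance because each thread does half of the work, making the process twice as fast. I have read how caches work, so I know that if this is not done well it can even be worse, so I wrote a small program to measure everything.

(See below for the machine topology, the CPU type and flags, and the source code.)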

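I have seen some strange results. Apparently, there is no difference in taking data from L1, L2, L3 or RAM in order to do the calculations. It doesn't matter whether I work in the same buffer or in two different buffers (with some distance in memory between them), unless the threads are in the same core. I mean: the worst results are when the two threads are in the same core (Hyper-Threading). I set them with CPU affinity.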
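
As a side note on the affinity setup: instead of hard-coding which logical CPUs share a core, the sibling list can be read from sysfs on Linux. The following is a minimal sketch, assuming the file /sys/devices/system/cpu/cpuN/topology/thread_siblings_list is available and uses the usual list format (for example `0,6` or `0-1`). The helper names parse_cpu_list and siblings_of are made up for this illustration and are not part of the program in the question.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Parse a Linux CPU list such as "0,6" or "0-3,8" into individual CPU numbers.
// Hypothetical helper; it assumes well-formed input.
std::vector<int> parse_cpu_list(const std::string& text) {
    std::vector<int> cpus;
    std::stringstream ss(text);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.empty()) continue;
        const std::size_t dash = item.find('-');
        if (dash == std::string::npos) {
            cpus.push_back(std::stoi(item));
        } else {
            const int first = std::stoi(item.substr(0, dash));
            const int last = std::stoi(item.substr(dash + 1));
            assert(first <= last);  // ranges are written low-high
            for (int cpu = first; cpu <= last; ++cpu) cpus.push_back(cpu);
        }
    }
    return cpus;
}

// Return the logical CPUs that share a physical core with `cpu`
// (empty if the sysfs file cannot be read).
std::vector<int> siblings_of(int cpu) {
    std::ifstream in("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                     "/topology/thread_siblings_list");
    std::string line;
    if (!std::getline(in, line)) return {};
    return parse_cpu_list(line);
}
```

On the machine described in this question, siblings_of(0) would presumably report CPUs 0 and 6, which is the pair used for the same-core runs; that has not been verified here.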

My program has some options, but they are self-explanatory.

These are the commands and the results:

./main --loops 200 --same-buffer --flush

200000 loops.
Flushing caches.
Cache size: 32768
Using same buffer.
Running in cores 0 and 1.
Waiting 2 seconds just for threads to be ready.
Post threads to begin work 200000 iterations.
Thread two created, pausing.
Go ahead and calculate in 2...
Buffer address: 0x7f087c156010.
Waiting for thread semaphores.
Thread one created, pausing.
Go ahead and calculate in 1...
Buffer address: 0x7f087c156010.
Time 1 18.436685
Time 2 18.620263
We don't wait anymore.
Joining threads.
Dumping data.
Exiting from main thread.

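We can see that it runs on CPUs 0 and 1, which are different cores according to my topology. The buffer address is the same: 0x7f087c156010.

Time: 18 seconds.

Now in the same core: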

./main --loops 200 --same-buffer --same-core --flush

200000 loops.
Flushing caches.
Cache size: 32768
Using same buffer.
Using same core. (HyperThreading)
Thread one created, pausing.
Thread two created, pausing.
Running in cores 0 and 6.
Waiting 2 seconds just for threads to be ready.
Post threads to begin work 200000 iterations.
Waiting for thread semaphores.
Go ahead and calculate in 1...
Buffer address: 0x7f0a6bbe1010.
Go ahead and calculate in 2...
Buffer address: 0x7f0a6bbe1010.
Time 1 26.572419
Time 2 26.951195
We don't wait anymore.
Joining threads.
Dumping data.
Exiting from main thread.

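We can see that it runs on CPUs 0 and 6, which according to my topology are the same core, two hyperthreads. Same buffer.

Time: 26 seconds.

So 10 seconds slower (more precisely, about 8 seconds, or roughly 44%).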

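How is that possible? As I understand it, if the cache line isn't dirty, it won't be fetched from memory (L1, L2, L3 or RAM). I have made the program write alternating 64-byte arrays, the same size as one cache line. If one thread writes cache line 0 and the other writes cache line 1, there should be no cache line clash.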
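
Whether "one 64-byte array per cache line" actually holds depends on the buffer starting at a multiple of 64. The sketch below computes which cache lines a 64-byte element touches for a given base address; it assumes elements are exactly as large as a cache line, and the helper names lines_touched and share_a_line are hypothetical. The buffer address printed in the runs above ends in 0x010, which is not a multiple of 64, so under this model neighbouring elements would share a line. How much that contributes to the timings is not something this sketch can tell.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// First and last cache-line index touched by element `index` of an array that
// starts at address `base`, where each element is `line_size` bytes long.
std::pair<std::uintptr_t, std::uintptr_t> lines_touched(std::uintptr_t base,
                                                        std::uintptr_t index,
                                                        std::uintptr_t line_size = 64) {
    assert(line_size > 0);  // guards the divisions below
    const std::uintptr_t first_byte = base + index * line_size;
    const std::uintptr_t last_byte = first_byte + line_size - 1;
    return {first_byte / line_size, last_byte / line_size};
}

// True if elements `a` and `b` have at least one cache line in common.
bool share_a_line(std::uintptr_t base, std::uintptr_t a, std::uintptr_t b,
                  std::uintptr_t line_size = 64) {
    const auto la = lines_touched(base, a, line_size);
    const auto lb = lines_touched(base, b, line_size);
    return la.first <= lb.second && lb.first <= la.second;
}
```

With a base that is a multiple of 64, elements 0 and 1 map to two distinct lines; with a base offset of 16 bytes, each element straddles two lines and element 0 overlaps the line where element 1 begins.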

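Does this mean that two hyperthreads, even though they share the L1 cache, cannot write to it at the same time?

Apparently, working in two distinct cores does better than working in one alone.

-- Edit --

As suggested by the commenters and by Max Langhof, I have included code to align the buffers. I have also added an option to misalign the buffers, just to see the difference.

I'm not sure about the align and misalign code, but I copied it from here.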
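
For reference, the --align option in the source below rounds the pointer to 16 bytes, while the CPU reports a cache_alignment of 64. A minimal sketch of rounding up to an arbitrary power-of-two boundary follows; align_up and align_pointer are hypothetical helper names, not functions from the program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round `addr` up to the next multiple of `alignment` (which must be a power of two).
std::uintptr_t align_up(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return (addr + mask) & ~mask;
}

// Return a pointer at or after `raw` that is aligned to `alignment` bytes.
// The caller must have allocated at least `alignment - 1` spare bytes.
char* align_pointer(char* raw, std::size_t alignment) {
    return reinterpret_cast<char*>(
        align_up(reinterpret_cast<std::uintptr_t>(raw), alignment));
}
```

Calling align_pointer(ch, 64) on an over-allocated buffer would give a start address on a cache-line boundary, so that each 64-byte element occupies exactly one line.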

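As they told me, measuring unoptimized code is a waste of time.

For optimized code the results are quite interesting. I find it surprising that it takes the same time even with misaligned data and with two cores, but I guess that is because of the small amount of work in the inner loop. (And I guess this shows how well designed today's processors are.)

Numbers (obtained with perf stat -d -d -d):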

*** Same core

No optimization
---------------
No alignment
    39.866.074.445      L1-dcache-loads           # 1485,716 M/sec                    (21,75%)
        10.746.914      L1-dcache-load-misses     #    0,03% of all L1-dcache hits    (20,84%)
Alignment
    39.685.928.674      L1-dcache-loads           # 1470,627 M/sec                    (22,77%)
        11.003.261      L1-dcache-load-misses     #    0,03% of all L1-dcache hits    (27,37%)
Misalignment
    39.702.205.508      L1-dcache-loads           # 1474,958 M/sec                    (24,08%)
        10.740.380      L1-dcache-load-misses     #    0,03% of all L1-dcache hits    (29,05%)


Optimization
------------
No alignment
    39.702.205.508      L1-dcache-loads           # 1474,958 M/sec                    (24,08%)
        10.740.380      L1-dcache-load-misses     #    0,03% of all L1-dcache hits    (29,05%)
       2,390298203 seconds time elapsed
Alignment
        19.450.626      L1-dcache-loads           #   25,108 M/sec                    (23,21%)
         1.758.012      L1-dcache-load-misses     #    9,04% of all L1-dcache hits    (22,95%)
       2,400644369 seconds time elapsed
Misalignment
         2.687.025      L1-dcache-loads           #    2,876 M/sec                    (24,64%)
           968.413      L1-dcache-load-misses     #   36,04% of all L1-dcache hits    (12,98%)
       2,483825841 seconds time elapsed

*** Two cores

No optimization
---------------
No alignment
    39.714.584.586      L1-dcache-loads           # 2156,408 M/sec                    (31,17%)
       206.030.164      L1-dcache-load-misses     #    0,52% of all L1-dcache hits    (12,55%)
Alignment
    39.698.566.036      L1-dcache-loads           # 2129,672 M/sec                    (31,10%)
       209.659.618      L1-dcache-load-misses     #    0,53% of all L1-dcache hits    (12,54%)
Misalignment
         2.687.025      L1-dcache-loads           #    2,876 M/sec                    (24,64%)
           968.413      L1-dcache-load-misses     #   36,04% of all L1-dcache hits    (12,98%)


Optimization
------------
No alignment
        16.711.148      L1-dcache-loads           #    9,431 M/sec                    (31,08%)
       202.059.646      L1-dcache-load-misses     # 1209,13% of all L1-dcache hits    (12,87%)
       2,898511757 seconds time elapsed
Alignment
        18.476.510      L1-dcache-loads           #   10,484 M/sec                    (30,99%)
       202.180.021      L1-dcache-load-misses     # 1094,25% of all L1-dcache hits    (12,83%)
       2,894591875 seconds time elapsed
Misalignment
        18.663.711      L1-dcache-loads           #   11,041 M/sec                    (31,28%)
       190.887.434      L1-dcache-load-misses     # 1022,77% of all L1-dcache hits    (13,22%)
       2,861316941 seconds time elapsed

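-- End of edit --

The program creates log files with buffer dumps, so I have verified that it works as expected (you can see it below).

I also have the ASM, where we can see that the loop is doing something.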

 269:main.cc       ****             for (int x = 0; x < 64; ++x)
 1152                   .loc 1 269 0 is_stmt 1
 1153 0c0c C745F000         movl    $0, -16(%rbp)   #, x
 1153      000000
 1154               .L56:
 1155                   .loc 1 269 0 is_stmt 0 discriminator 3
 1156 0c13 837DF03F         cmpl    $63, -16(%rbp)  #, x
 1157 0c17 7F26             jg  .L55    #,
 270:main.cc       ****                 th->cache->cache[i].data[x] = '2';
 1158                   .loc 1 270 0 is_stmt 1 discriminator 2
 1159 0c19 488B45E8         movq    -24(%rbp), %rax # th, tmp104
 1160 0c1d 488B4830         movq    48(%rax), %rcx  # th_9->cache, _25
 1161 0c21 8B45F0           movl    -16(%rbp), %eax # x, tmp106
 1162 0c24 4863D0           movslq  %eax, %rdx  # tmp106, tmp105
 1163 0c27 8B45F4           movl    -12(%rbp), %eax # i, tmp108
 1164 0c2a 4898             cltq
 1165 0c2c 48C1E006         salq    $6, %rax    #, tmp109
 1166 0c30 4801C8           addq    %rcx, %rax  # _25, tmp109
 1167 0c33 4801D0           addq    %rdx, %rax  # tmp105, tmp110
 1168 0c36 C60032           movb    $50, (%rax) #, *_25
 269:main.cc       ****             for (int x = 0; x < 64; ++x)

This is part of the dump:

== buffer ==============================================================================================================
00000001 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 
00000002 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 
00000003 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 
00000004 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 
00000005 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 
00000006 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 
00000007 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 
00000008 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32

My machine topology:

These are the CPU type and flags.

processor   : 11
vendor_id   : GenuineIntel
cpu family  : 6
model       : 45
model name  : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
stepping    : 7
microcode   : 0x70b
cpu MHz     : 1504.364
cache size  : 15360 KB
physical id : 0
siblings    : 12
core id     : 5
cpu cores   : 6
apicid      : 11
initial apicid  : 11
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb kaiser tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
bugs        : cpu_meltdown spectre_v1 spectre_v2
bogomips    : 4987.77
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

This is the full source code:

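// Cache benchmark: two pinned threads write alternating 64-byte lines.
#include <emmintrin.h>
#include <x86intrin.h>
#include <stdint.h>     // uintptr_t, uint64_t
#include <stdio.h>
#include <stdlib.h>     // strtol
#include <time.h>
#include <ctime>
#include <sched.h>      // cpu_set_t, CPU_ZERO, CPU_SET (GNU extension)
#include <semaphore.h>
#include <pthread.h>
#include <string.h>
#include <string>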


struct cache_line {
    char data[64];
};

//
// 32768 = 32 Kb = 512 64B cache lines
struct cache_l1 {
    struct cache_line cache[512];
};

size_t TOTAL = 100000;

void * thread_one (void * data);
void * thread_two (void * data);

void dump (FILE * file, char * buffer, size_t size);

class thread {
public:
    sem_t sem;
    sem_t * glob;
    pthread_t thr;
    struct cache_l1 * cache;
};

bool flush = false;

int main (int argc, char ** argv)
{
    bool same_core = false;
    bool same_buffer = false;
    bool align = false;
    bool misalign = false;
    size_t reserve_mem = 32768; // 15MB 15.728.640
    std::string file_name ("pseudobench_");
    std::string core_option ("diffcore");
    std::string buffer_option ("diffbuff");
    std::string cache_option ("l1");

    for (int i = 1; i < argc; ++i) {
        if (::strcmp("--same-core", argv[i]) == 0) {

            same_core = true;
            core_option = "samecore";

        } else if (::strcmp("--same-buffer", argv[i]) == 0) {

            same_buffer = true;
            buffer_option = "samebuffer";

        } else if (::strcmp("--l1", argv[i]) == 0) {

            // nothing already L1 cache size

        } else if (::strcmp("--l2", argv[i]) == 0) {

            reserve_mem *= 8; // 256KB, L2 cache size
            cache_option = "l2";

        } else if (::strcmp("--l3", argv[i]) == 0) {

            reserve_mem *= 480; // 15MB, L3 cache size
            cache_option = "l3";

        } else if (::strcmp("--ram", argv[i]) == 0) {

            reserve_mem *= 480; // 15MB, plus two times L1 cache size
            reserve_mem += sizeof(struct cache_l1) * 2;
            cache_option = "ram";

        } else if (::strcmp("--loops", argv[i]) == 0) {

            if (i + 1 < argc) // guard against a missing value after --loops
                TOTAL = ::strtol(argv[++i], nullptr, 10) * 1000;
            printf ("%zu loops.\n", TOTAL);

        } else if (::strcmp("--align", argv[i]) == 0) {

            align = true;
            printf ("Align memory to 16 bytes.\n");

        } else if (::strcmp("--misalign", argv[i]) == 0) {

            misalign = true;
            printf ("Misalign memory.\n");

        } else if (::strcmp("--flush", argv[i]) == 0) {

            flush = true;
            printf ("Flushing caches.\n");

        } else if (::strcmp("-h", argv[i]) == 0) {

            printf ("There is no help here. Please put loops in units, "
                    "they will be multiplicated by thousands. (Default 100.000 EU separator)\n");
        } else {
            printf ("Unknown option: '%s', ignoring it.\n", argv[i]);
        }
    }

    char * ch = new char[(reserve_mem * 2) + (sizeof(struct cache_l1) * 2) + 16];
    struct cache_l1 * cache4 = nullptr;
    struct cache_l1 * cache5 = nullptr;

    if (align) {
        // Align memory (void *)(((uintptr_t)ch+15) & ~ (uintptr_t)0x0F);
        cache4 = (struct cache_l1 *) (((uintptr_t)ch + 15) & ~(uintptr_t)0x0F);
        // Offset in bytes (not in cache_l1 elements), matching the unaligned branch below.
        cache5 = (struct cache_l1 *) ((char *)cache4 + reserve_mem - sizeof(struct cache_l1));
        cache5 = (struct cache_l1 *)(((uintptr_t)cache5) & ~(uintptr_t)0x0F);
    } else {
        cache4 = (struct cache_l1 *) ch;
        cache5 = (struct cache_l1 *) &ch[reserve_mem - sizeof(struct cache_l1)];
    }

    if (misalign) {
        cache4 = (struct cache_l1 *) ((char *)cache4 + 5);
        cache5 = (struct cache_l1 *) ((char *)cache5 + 5);
    }

    (void)cache4;
    (void)cache5;

    printf ("Cache size: %zu\n", sizeof(struct cache_l1));

    if (cache_option == "l1") {
        // L1 doesn't allow two buffers, so same buffer
        buffer_option = "samebuffer";
    }

    sem_t globsem;

    thread th1;
    thread th2;

    if (same_buffer) {
        printf ("Using same buffer.\n");
        th1.cache = cache5;
    } else {
        th1.cache = cache4;
    }
    th2.cache = cache5;

    sem_init (&globsem, 0, 0);

    if (sem_init(&th1.sem, 0, 0) < 0) {
        printf ("There is an error with the 1 semaphore.\n");
    }
    if (sem_init(&th2.sem, 0, 0) < 0) {
        printf ("There is an error with the 2 semaphore.\n");
    }

    th1.glob = &globsem;
    th2.glob = &globsem;

    cpu_set_t cpuset;
    int rc = 0;

    pthread_create (&th1.thr, nullptr, thread_one, &th1);
    CPU_ZERO (&cpuset);
    CPU_SET (0, &cpuset);
    rc = pthread_setaffinity_np(th1.thr,
                                sizeof(cpu_set_t),
                                &cpuset);
    if (rc != 0) {
        printf ("Can't change affinity of thread one!\n");
    }

    pthread_create (&th2.thr, nullptr, thread_two, &th2);
    CPU_ZERO (&cpuset);
    int cpu = 1;

    if (same_core) {
        printf ("Using same core. (HyperThreading)\n");
        cpu = 6; // Depends on CPU topoglogy (see that with lstopo)
    }

    CPU_SET (cpu, &cpuset);
    rc = pthread_setaffinity_np(th2.thr,
                                sizeof(cpu_set_t),
                                &cpuset);
    if (rc != 0) {
        printf ("Can't change affinity of thread two!\n");
    }

    printf ("Running in cores 0 and %d.\n", cpu);

    fprintf (stderr, "Waiting 2 seconds just for threads to be ready.\n");
    struct timespec time;
    time.tv_sec = 2;
    time.tv_nsec = 0; // must be initialized, otherwise nanosleep may fail with EINVAL
    nanosleep (&time, nullptr);

    fprintf (stderr, "Post threads to begin work %zu iterations.\n", TOTAL);

    sem_post (&globsem);
    sem_post (&globsem);

    printf ("Waiting for thread semaphores.\n");

    sem_wait (&th1.sem);
    sem_wait (&th2.sem);

    printf ("We don't wait anymore.\n");

    printf ("Joining threads.\n");
    pthread_join (th1.thr, nullptr);
    pthread_join (th2.thr, nullptr);

    printf ("Dumping data.\n");
    file_name += core_option;
    file_name += "_";
    file_name += buffer_option;
    file_name += "_";
    file_name += cache_option;
    file_name += ".log";
    FILE * file = ::fopen(file_name.c_str(), "w");
    if (file == nullptr) {
        printf ("Can't open '%s' for writing.\n", file_name.c_str());
        return 1;
    }
    if (same_buffer)
        dump (file, (char *)cache5, sizeof(struct cache_l1));
    else {
        dump (file, (char *)cache4, sizeof(struct cache_l1));
        dump (file, (char *)cache5, sizeof(struct cache_l1));
    }
    ::fclose (file);
    printf ("Exiting from main thread.\n");
    return 0;
}

void * thread_one (void * data)
{
    thread * th = (thread *) data;
    printf ("Thread one created, pausing.\n");
    if (flush)
        _mm_clflush (th->cache);
    sem_wait (th->glob);

    printf ("Go ahead and calculate in 1...\n");
    printf ("Buffer address: %p.\n", th->cache);
    clock_t begin, end;
    double time_spent;
    uint64_t counter = 0; // 'register' is no longer a storage class since C++17
    begin = clock();
    for (size_t z = 0; z < TOTAL; ++z ) {
        ++counter;
        for (int i = 0; i < 512; i += 2) {
            ++counter;
            for (int x = 0; x < 64; ++x) {
                ++counter;
                th->cache->cache[i].data[x] = '1';
            }
        }
    }
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;    
    printf ("Time 1 %f %llu\n", time_spent, (unsigned long long)counter);

    sem_post (&th->sem);

    return nullptr;
}

void * thread_two (void * data)
{
    thread * th = (thread *) data;
    printf ("Thread two created, pausing.\n");
    if (flush)
        _mm_clflush (th->cache);
    sem_wait (th->glob);

    printf ("Go ahead and calculate in 2...\n");
    printf ("Buffer address: %p.\n", th->cache);
    clock_t begin, end;
    double time_spent;
    uint64_t counter = 0; // 'register' is no longer a storage class since C++17
    begin = clock();
    for (size_t z = 0; z < TOTAL; ++z ) {
        ++counter;
        for (int i = 1; i < 512; i += 2) {
            ++counter;
            for (int x = 0; x < 64; ++x) {
                ++counter;
                th->cache->cache[i].data[x] = '2';
            }
        }
    }
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;    
    printf ("Time 2 %f  %llu\n", time_spent, (unsigned long long)counter);

    sem_post (&th->sem);

    return nullptr;
}

void dump (FILE * file, char * buffer, size_t size)
{
    size_t lines = 0;
    fprintf (file, "\n");
    fprintf (file, "== buffer =================================================="
             "============================================================\n");

    for (size_t i = 0; i < size; i += 16) {
        fprintf (file, "%08zu %p ", ++lines, (void *)&buffer[i]);
        for (size_t x = i; x < (i+16); ++x) {
            if (buffer[x] >= 32 && buffer[x] < 127)
                fprintf (file, "%c ", buffer[x]);
            else
                fprintf (file, ". ");
        }
        for (size_t x = i; x < (i+16); ++x) {
            fprintf (file, "0x%02x ", (unsigned char)buffer[x]);
        }
        fprintf (file, "\n");
    }
    fprintf (file, "== buffer =================================================="
             "============================================================\n");
}
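
A note on the timing calls used above: clock() reports processor time consumed by the program, and on Linux with glibc that covers all threads of the process. With two busy threads, the printed "Time" values are therefore likely closer to the sum of both threads' CPU time than to elapsed time, although the comparison between runs should still be meaningful. A minimal sketch of measuring elapsed wall-clock time with std::chrono::steady_clock follows; elapsed_seconds and timed_sleep_demo are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Run `work` and return the elapsed wall-clock time in seconds.
template <typename Work>
double elapsed_seconds(Work&& work) {
    const auto begin = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    assert(end >= begin);  // steady_clock never goes backwards
    return std::chrono::duration<double>(end - begin).count();
}

// Example use: time a 50 ms sleep; the result should be about 0.05 or slightly more.
double timed_sleep_demo() {
    return elapsed_seconds([] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    });
}
```

In the thread functions, the triple loop could be passed as the `work` callable in place of the clock() calls around it.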

Best answer

"Apparently, there is no difference in taking data from L1, L2, L3 or RAM in order to do the calculations."

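You are completely traversing each cache line of each level (and each page) before requesting the next one. Memory accesses are slow, but not so slow that you could traverse an entire page before the next one arrives. If you were accessing a different L3 cache line or a different RAM page on every access, you would certainly notice a difference. But the way you are doing it, your CPU works through a large number of instructions between every L2, L3 or RAM request, which completely hides any kind of cache-miss latency.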
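
To make the "many accesses per cache line" point concrete, the sketch below counts, for a walk over a buffer, how many accesses are made and how often the walk moves on to a new cache line. It is only a counting model under the assumption of 64-byte lines; it does not measure latency or model the prefetcher, and the names WalkStats and walk are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct WalkStats {
    std::size_t accesses;      // number of memory accesses performed
    std::size_t line_changes;  // how often an access lands on a different line than the previous one
};

// Walk `bytes` bytes with a step of `stride` bytes and count cache-line transitions.
WalkStats walk(std::size_t bytes, std::size_t stride, std::size_t line_size = 64) {
    assert(stride > 0 && line_size > 0);
    WalkStats stats{0, 0};
    std::size_t previous_line = static_cast<std::size_t>(-1);  // no line visited yet
    for (std::size_t offset = 0; offset < bytes; offset += stride) {
        ++stats.accesses;
        const std::size_t line = offset / line_size;
        if (line != previous_line) {
            ++stats.line_changes;
            previous_line = line;
        }
    }
    return stats;
}
```

For a 32768-byte buffer, a byte-by-byte walk performs 64 accesses per new line, whereas a walk with a 64-byte stride needs a new line on every single access. The benchmark in the question behaves like the first case, which is the pattern that hides miss latency.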

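So you are not memory bound in the slightest. You basically have the most benign usage pattern there is: all your data is in the cache almost all the time. Occasionally you get a cache miss, but the time to fetch the data is negligible compared with the time you spend working on cached data. On top of that, your CPU will probably predict your (very predictable) access pattern and prefetch the memory before you even touch it.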

"So 10 seconds slower.
How's that possible? I've understood if the cache line isn't dirty, it wouldn't be fetched from memory (either, L1, 2, 3 or RAM)."

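As shown above, you are not memory bound. You are bound by how fast your CPU can process instructions (edit: disabling optimization makes this worse, because it inflates the instruction count), and it should not be surprising that two hyperthreads are not as good at that as two threads on separate physical cores.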

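What matters in particular for this observation is that not all resources are duplicated for each pair of hyperthreads. For example, the execution ports (adders, dividers, floating-point units and so on) are not duplicated: they are shared. This is a diagram of the Skylake scheduler to demonstrate the concept: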

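With hyperthreading, both threads have to compete for these resources (even a single-threaded program is affected by this design, because of out-of-order execution). There are four simple integer ALUs in this design, but only one Store Data port. So two threads on the same core (in this Skylake design) cannot store data at the same time, although they can compute several integer operations simultaneously (note: there is no guarantee that port 4 is actually the source of contention here; some Intel tools might be able to figure that out for you). This limitation does not exist when the load is split between two different physical cores.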
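
The store-port argument can be put into a back-of-the-envelope bound. The sketch below assumes a simplified model in which each store-data port accepts one store per cycle and nothing else limits throughput; it is not a simulator, and min_store_cycles is a hypothetical name. With --loops 200, each thread in the question performs 200000 x 256 x 64 = 3,276,800,000 byte stores.

```cpp
#include <cassert>
#include <cstdint>

// Lower bound on the cycles needed to retire `stores_per_thread` stores on each of
// `threads` threads when they all share `store_ports` store-data ports, each of
// which accepts one store per cycle. Simplified model, not a simulator.
std::uint64_t min_store_cycles(std::uint64_t stores_per_thread,
                               unsigned threads,
                               unsigned store_ports) {
    assert(store_ports > 0);
    const std::uint64_t total = stores_per_thread * threads;
    return (total + store_ports - 1) / store_ports;  // ceiling division
}
```

Under this model, two hyperthreads sharing one port need at least twice as many cycles as one thread with a port to itself. The slowdown measured in the question is well below 2x, so the bound is at most part of the picture, which fits the caveat above that port 4 is not guaranteed to be the bottleneck.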

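Synchronizing L2 cache lines between different physical cores may add some overhead (since the L2 cache is apparently not shared between all the cores of your CPU), but that is hard to gauge from here.

I found the picture above on this page, which explains the above (and more) in greater depth: https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)

Regarding "c++ - Why is L1 write access worse with two threads in the same core (Hyper-Threading) than with two cores?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51965481/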
