草庐IT

mongodb - 在单个结果中组合聚合操作

coder 2023-11-03 原文

我想合并两个聚合操作。第一个操作返回,例如:

{ "_id" : "Colors", "count" : 12 }
{ "_id" : "Animals", "count" : 6 }

第二个操作返回,例如:

{ "_id" : "Red", "count" : 10 }
{ "_id" : "Blue", "count" : 9 }
{ "_id" : "Green", "count" : 9 }
{ "_id" : "White", "count" : 7 }
{ "_id" : "Yellow", "count" : 7 }
{ "_id" : "Orange", "count" : 7 }
{ "_id" : "Black", "count" : 5 }
{ "_id" : "Goose", "count" : 4 }
{ "_id" : "Chicken", "count" : 3 }
{ "_id" : "Grey", "count" : 3 }
{ "_id" : "Cat", "count" : 3 }
{ "_id" : "Rabbit", "count" : 3 }
{ "_id" : "Duck", "count" : 3 }
{ "_id" : "Turkey", "count" : 2 }
{ "_id" : "Elephant", "count" : 2 }
{ "_id" : "Shark", "count" : 2 }
{ "_id" : "Fish", "count" : 2 }
{ "_id" : "Tiger", "count" : 2 }
{ "_id" : "Purple", "count" : 1 }
{ "_id" : "Pink", "count" : 1 }

如何结合这 2 个操作来实现以下目标?

{ "_id" : "Colors", "count" : 12, "items" :
    [
        { "_id" : "Red", "count" : 10 },
        { "_id" : "Blue", "count" : 9 },
        { "_id" : "Green", "count" : 9 },
        { "_id" : "White", "count" : 7 },
        { "_id" : "Yellow", "count" : 7 },
        { "_id" : "Orange", "count" : 7 },
        { "_id" : "Black", "count" : 5 },
        { "_id" : "Grey", "count" : 3 },
        { "_id" : "Purple", "count" : 1 },
        { "_id" : "Pink", "count" : 1 }
    ]
},
{ "_id" : "Animals", "count" : 6, "items" :
    [
        { "_id" : "Goose", "count" : 4 },
        { "_id" : "Chicken", "count" : 3 },
        { "_id" : "Cat", "count" : 3 },
        { "_id" : "Rabbit", "count" : 3 },
        { "_id" : "Duck", "count" : 3 },
        { "_id" : "Turkey", "count" : 2 },
        { "_id" : "Elephant", "count" : 2 },
        { "_id" : "Shark", "count" : 2 },
        { "_id" : "Fish", "count" : 2 },
        { "_id" : "Tiger", "count" : 2 }
    ]
}

架构

var ListSchema = new Schema({
    created: {
        type: Date,
        default: Date.now
    },
    title: {
        type: String,
        default: '',
        trim: true,
        required: 'Title cannot be blank'
    },
    items: {
        type: Array,
        default: [String],
        trim: true
    },
    creator: {
        type: Schema.ObjectId,
        ref: 'User'
    }
});

操作一

db.lists.aggregate(
      [
        { $group: { _id: "$title", count: { $sum: 1 } } },
        { $sort: { count: -1 } }
      ]
    )

操作2

db.lists.aggregate(
      [
        { $unwind: "$items" },
        { $group: { _id: "$items", count: { $sum: 1 } } },
        { $sort: { count: -1 } }
      ]
    )

最佳答案

这实际上取决于您在响应中所追求的结果类型。您询问的内容似乎表明您正在寻找结果中的“方面计数”,但稍后我会谈到这一点。

作为基本结果,这种方法没有任何问题:

    Thing.aggregate(
      [
        { "$group": {
          "_id": {
            "type": "$type", "name": "$name"
          },
          "count": { "$sum": 1 }
        }},
        { "$group": {
          "_id": "$_id.type",
          "count": { "$sum": "$count" },
          "names": {
            "$push": { "name": "$_id.name", "count": "$count" }
          }
        }}
      ],
      function(err,results) {
        console.log(JSON.stringify(results, undefined, 2));
        callback(err);
      }
    )

这应该会给你这样的结果:

[
  {
    "_id": "colours",
    "count": 50102,
    "names": [
      { "name": "Green",  "count": 9906  },
      { "name": "Yellow", "count": 10093 },
      { "name": "Red",    "count": 10083 },
      { "name": "Orange", "count": 9997  },
      { "name": "Blue",   "count": 10023 }
    ]
  },
  {
    "_id": "animals",
    "count": 49898,
    "names": [
      { "name": "Tiger",    "count": 9710  },
      { "name": "Lion",     "count": 10058 },
      { "name": "Elephant", "count": 10069 },
      { "name": "Monkey",   "count": 9963  },
      { "name": "Bear",     "count": 10098 }
    ]
  }
]

这里最基本的方法是简单地 $group分两个阶段,第一阶段将键组合聚合到最低(最细粒度)分组级别,然后再次处理 $group 以基本上“加起来”最高级别的总数(最小粒度)分组级别,因此也将较低的结果添加到项目数组中。

但这并不像在“方面计数”中那样“分离”,因此这样做变得有点复杂,也有点疯狂。但首先是示例:

    Thing.aggregate(
      [
        { "$group": {
          "_id": {
            "type": "$type",
            "name": "$name"
          },
          "count": { "$sum": 1 }
        }},
        { "$group": {
          "_id": "$_id.type",
          "count": { "$sum": "$count" },
          "names": {
            "$push": { "name": "$_id.name", "count": "$count" }
          }
        }},
        { "$group": {
          "_id": null,
          "types": {
            "$push": {
              "type": "$_id", "count": "$count"
            }
          },
          "names": { "$push": "$names" }
        }},
        { "$unwind": "$names" },
        { "$unwind": "$names" },
        { "$group": {
          "_id": "$types",
          "names": { "$push": "$names" }
        }},
        { "$project": {
          "_id": 0,
          "facets": {
            "types": "$_id",
            "names": "$names",
          },
          "data": { "$literal": [] }
        }}
      ],
      function(err,results) {
        console.log(JSON.stringify(results[0], undefined, 2));
        callback(err);
      }
    );

这将产生这样的输出:

{
  "facets": {
    "types": [
      { "type": "colours", "count": 50102 },
      { "type": "animals", "count": 49898 }
    ],
    "names": [
      { "name": "Green",    "count": 9906  },
      { "name": "Yellow",   "count": 10093 },
      { "name": "Red",      "count": 10083 },
      { "name": "Orange",   "count": 9997  },
      { "name": "Blue",     "count": 10023 },
      { "name": "Tiger",    "count": 9710  },
      { "name": "Lion",     "count": 10058 },
      { "name": "Elephant", "count": 10069 },
      { "name": "Monkey",   "count": 9963  },
      { "name": "Bear",     "count": 10098 }
    ]
  },
  "data": []
}

但显而易见的是,虽然“可能”,但在管道中进行的这种“杂耍”以产生这种输出格式并不是真正有效的。与第一个示例相比,这里有很多开销只是简单地将结果拆分为它们自己的数组响应并且独立于分组键。随着要生成的“方面”越来越多,这明显变得更加复杂。

此外,正如此处输出中所暗示的,人们通常对“方面计数”的要求是,除了聚合方面之外,结果“数据”也包含在响应(可能已分页)中。因此,进一步的并发症应该在这里显而易见:

        { "$group": {
          "_id": null,
          (...)

此类操作的要求基本上是将每条数据“填充”到单个对象中。在大多数情况下,当然在您想要结果中的实际数据(在此示例中使用 100,000)的情况下,遵循这种方法变得完全不切实际,并且几乎肯定会超过 16MB 的 BSON 文档大小限制。

在这种情况下,如果您想在响应中生成结果和该数据的“方面”,那么这里最好的方法是将每个聚合和输出页面作为单独的查询操作运行,并“流式传输”输出JSON(或其他格式)返回给接收客户端。

作为一个独立的例子:

var async = require('async'),
    mongoose = require('mongoose'),
    Schema = mongoose.Schema;


mongoose.connect('mongodb://localhost/things');

var data = {
      "colours": [
        "Red","Blue","Green","Yellow","Orange"
      ],
      "animals": [
        "Lion","Tiger","Bear","Elephant","Monkey"
      ]
    },
    dataKeys = Object.keys(data);

var thingSchema = new Schema({
  "name": String,
  "type": String
});

var Thing = mongoose.model( 'Thing', thingSchema );

var writer = process.stdout;

mongoose.connection.on("open",function(err) {
  if (err) throw err;
  async.series(
    [
      function(callback) {
        process.stderr.write("removing\n");
        Thing.remove({},callback);
      },
      function(callback) {
        process.stderr.write("inserting\n");
        var bulk = Thing.collection.initializeUnorderedBulkOp(),
            count = 0;

        async.whilst(
          function() { return count < 100000; },
          function(callback) {
            var keyLen    = dataKeys.length,
                keyIndex  = Math.floor(Math.random(keyLen)*keyLen),
                type      = dataKeys[keyIndex],
                types     = data[type],
                typeLen   = types.length,
                nameIndex = Math.floor(Math.random(typeLen)*typeLen),
                name      = types[nameIndex];

            var obj = { "type": type, "name": name };
            bulk.insert(obj);
            count++;

            if ( count % 1000 == 0 ) {
              process.stderr.write('insert count: ' + count + "\n");
              bulk.execute(function(err,resp) {
                bulk = Thing.collection.initializeUnorderedBulkOp();
                callback(err);
              });
            } else {
              callback();
            }

          },
          callback
        );
      },

      function(callback) {
        writer.write("{ \n  \"page\": 1,\n  \"pageSize\": 25,\n")
        writer.write("  \"facets\":  {\n");      // open object response

        var stream = Thing.collection.aggregate(
          [
            { "$group": {
              "_id": "$name",
              "count": { "$sum": 1 }
            }}
          ],
          {
            "cursor": {
              "batchSize": 1000
            }
          }
        );

        var counter = 0;

        stream.on("data",function(data) {
          stream.pause();

          if ( counter == 0 ) {
            writer.write("    \"names\": [\n");
          } else {
            writer.write(",\n");
          }

          data = { "name": data._id, "count": data.count };

          writer.write("      " + JSON.stringify(data));

          counter++;
          stream.resume();
        });

        stream.on("end",function() {
          writer.write("\n    ],\n");

          var stream = Thing.collection.aggregate(
            [
              { "$group": {
                "_id": "$type",
                "count": { "$sum": 1 }
              }}
            ],
            {
              "cursor": {
                "batchSize": 1000
              }
            }
          );

          var counter = 0;
          stream.on("data",function(data) {
            stream.pause();

            if ( counter == 0 ) {
              writer.write("    \"types\": [\n");
            } else {
              writer.write(",\n");
            }

            data = { "name": data._id, "count": data.count };

            writer.write("      " + JSON.stringify(data));

            counter++;
            stream.resume();
          });

          stream.on("end",function() {
            writer.write("\n    ]\n  },\n");

            var stream = Thing.find({}).limit(25).stream();
            var counter = 0;

            stream.on("data",function(data) {
              stream.pause();
              if ( counter == 0 ) {
                writer.write("  \"data\": [\n");
              } else {
                writer.write(",\n");
              }

              writer.write("    " + JSON.stringify(data));

              counter++;
              stream.resume();

            });

            stream.on("end",function() {
                writer.write("\n  ]\n}\n");
                callback();
            });

          });

        });
      }
    ],
    function(err) {
      if (err) throw err;
      process.exit();
    }
  );
});

输出如下:

{
  "page": 1,
  "pageSize": 25,
  "facets":  {
    "names": [
      {"name":"Red","count":10007},
      {"name":"Tiger","count":10012},
      {"name":"Yellow","count":10119},
      {"name":"Monkey","count":9970},
      {"name":"Elephant","count":10046},
      {"name":"Bear","count":10082},
      {"name":"Orange","count":9982},
      {"name":"Green","count":10005},
      {"name":"Blue","count":9884},
      {"name":"Lion","count":9893}
    ],
    "types": [
      {"name":"colours","count":49997},
      {"name":"animals","count":50003}
    ]
  },
  "data": [
    {"_id":"55bf141f3edc150b6abdcc02","type":"animals","name":"Lion"},
    {"_id":"55bf141f3edc150b6abdc81b","type":"colours","name":"Blue"},
    {"_id":"55bf141f3edc150b6abdc81c","type":"colours","name":"Orange"},
    {"_id":"55bf141f3edc150b6abdc81d","type":"animals","name":"Bear"},
    {"_id":"55bf141f3edc150b6abdc81e","type":"animals","name":"Elephant"},
    {"_id":"55bf141f3edc150b6abdc81f","type":"colours","name":"Orange"},
    {"_id":"55bf141f3edc150b6abdc820","type":"colours","name":"Green"},
    {"_id":"55bf141f3edc150b6abdc821","type":"animals","name":"Lion"},
    {"_id":"55bf141f3edc150b6abdc822","type":"animals","name":"Monkey"},
    {"_id":"55bf141f3edc150b6abdc823","type":"colours","name":"Yellow"},
    {"_id":"55bf141f3edc150b6abdc824","type":"colours","name":"Yellow"},
    {"_id":"55bf141f3edc150b6abdc825","type":"colours","name":"Orange"},
    {"_id":"55bf141f3edc150b6abdc826","type":"animals","name":"Monkey"},
    {"_id":"55bf141f3edc150b6abdc827","type":"colours","name":"Blue"},
    {"_id":"55bf141f3edc150b6abdc828","type":"animals","name":"Tiger"},
    {"_id":"55bf141f3edc150b6abdc829","type":"colours","name":"Red"},
    {"_id":"55bf141f3edc150b6abdc82a","type":"animals","name":"Monkey"},
    {"_id":"55bf141f3edc150b6abdc82b","type":"animals","name":"Elephant"},
    {"_id":"55bf141f3edc150b6abdc82c","type":"animals","name":"Tiger"},
    {"_id":"55bf141f3edc150b6abdc82d","type":"animals","name":"Bear"},
    {"_id":"55bf141f3edc150b6abdc82e","type":"colours","name":"Yellow"},
    {"_id":"55bf141f3edc150b6abdc82f","type":"animals","name":"Lion"},
    {"_id":"55bf141f3edc150b6abdc830","type":"animals","name":"Elephant"},
    {"_id":"55bf141f3edc150b6abdc831","type":"colours","name":"Orange"},
    {"_id":"55bf141f3edc150b6abdc832","type":"animals","name":"Elephant"}
  ]
}

这里有一些注意事项,特别是 Mongoose .aggregate()并不真正直接支持标准节点流接口(interface)。 .cursor() 提供了一个 .each() 方法在聚合方法上,但是 core API method 隐含的“流”这里提供了更多的控制权,所以 .collection 方法在这里获取底层 driver object是可取的。希望 future 的 Mongoose 版本会考虑这一点。

因此,如果您的最终目标是这样一个“方面计数”以及此处演示的结果,那么每个聚合和结果都最适合以演示的方式“流式传输”。如果没有它,聚合就会变得过于复杂,并且很可能会超过 BSON 限制,就像在这种情况下这样做一样。

关于mongodb - 在单个结果中组合聚合操作,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31777590/

有关mongodb - 在单个结果中组合聚合操作的更多相关文章

  1. 报告回顾丨模型进化狂飙,DetectGPT能否识别最新模型生成结果? - 2

    导读语言模型给我们的生产生活带来了极大便利,但同时不少人也利用他们从事作弊工作。如何规避这些难辨真伪的文字所产生的负面影响也成为一大难题。在3月9日智源Live第33期活动「DetectGPT:判断文本是否为机器生成的工具」中,主讲人Eric为我们讲解了DetectGPT工作背后的思路——一种基于概率曲率检测的用于检测模型生成文本的工具,它可以帮助我们更好地分辨文章的来源和可信度,对保护信息真实、防止欺诈等方面具有重要意义。本次报告主要围绕其功能,实现和效果等展开。(文末点击“阅读原文”,查看活动回放。)Ericmitchell斯坦福大学计算机系四年级博士生,由ChelseaFinn和Chri

  2. ruby - 如何使用 Selenium Webdriver 根据 div 的内容执行操作? - 2

    我有一个使用SeleniumWebdriver和Nokogiri的Ruby应用程序。我想选择一个类,然后对于那个类对应的每个div,我想根据div的内容执行一个Action。例如,我正在解析以下页面:https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=puppies这是一个搜索结果页面,我正在寻找描述中包含“Adoption”一词的第一个结果。因此机器人应该寻找带有className:"result"的div,对于每个检查它的.descriptiondiv是否包含单词“adoption

  3. ruby-on-rails - 如何处理 Grape 中特定操作的过滤器之前? - 2

    我正在我的Rails项目中安装Grape以构建RESTfulAPI。现在一些端点的操作需要身份验证,而另一些则不需要身份验证。例如,我有users端点,看起来像这样:moduleBackendmoduleV1classUsers现在如您所见,除了password/forget之外的所有操作都需要用户登录/验证。创建一个新的端点也没有意义,比如passwords并且只是删除password/forget从逻辑上讲,这个端点应该与用户资源。问题是Grapebefore过滤器没有像except,only这样的选项,我可以在其中说对某些操作应用过滤器。您通常如何干净利落地处理这种情况?

  4. ruby-on-rails - 在 Ruby on Rails 中发送响应之前如何等待多个异步操作完成? - 2

    在我做的一些网络开发中,我有多个操作开始,比如对外部API的GET请求,我希望它们同时开始,因为一个不依赖另一个的结果。我希望事情能够在后台运行。我找到了concurrent-rubylibrary这似乎运作良好。通过将其混合到您创建的类中,该类的方法具有在后台线程上运行的异步版本。这导致我编写如下代码,其中FirstAsyncWorker和SecondAsyncWorker是我编写的类,我在其中混合了Concurrent::Async模块,并编写了一个名为“work”的方法来发送HTTP请求:defindexop1_result=FirstAsyncWorker.new.async.

  5. ruby - 在 Ruby 中是否有一种惯用的方法来操作 2 个数组? - 2

    a=[3,4,7,8,3]b=[5,3,6,8,3]假设数组长度相同,是否有办法使用each或其他一些惯用方法从两个数组的每个元素中获取结果?不使用计数器?例如获取每个元素的乘积:[15,12,42,64,9](0..a.count-1).eachdo|i|太丑了...ruby1.9.3 最佳答案 使用Array.zip怎么样?:>>a=[3,4,7,8,3]=>[3,4,7,8,3]>>b=[5,3,6,8,3]=>[5,3,6,8,3]>>c=[]=>[]>>a.zip(b)do|i,j|c[[3,5],[4,3],[7,6],

  6. ruby - 最多 n 的组合 - 2

    给定一个数组a,什么是实现其组合直到第n的最佳方法?例如:a=%i[abc]n=2#Expected=>[[],[:a],[:b],[:c],[:a,b],[:b,:c],[:c,:a]] 最佳答案 做如下:a=%w[abc]n=30.upto(n).flat_map{|i|a.combination(i).to_a}#=>[[],["a"],["b"],["c"],["a","b"],#["a","c"],["b","c"],["a","b","c"]] 关于ruby-最多n的组合,我

  7. ruby-on-rails - 如何让 Rails View 返回其关联的操作名称? - 2

    我有一个非常简单的Controller来管理我的Rails应用程序中的静态页面:classPagesController我怎样才能让View模板返回它自己的名字,这样我就可以做这样的事情:#pricing.html.erb#-->"Pricing"感谢您的帮助。 最佳答案 4.3RoutingParametersTheparamshashwillalwayscontainthe:controllerand:actionkeys,butyoushouldusethemethodscontroller_nameandaction_nam

  8. ruby-on-rails - 将保存回调添加到单个 ActiveRecord 实例,可以吗? - 2

    是否可以为单个ActiveRecord实例添加回调?作为进一步的限制,这是继续使用库,所以我无法控制该类(除了对其进行猴子修补)。这或多或少是我想做的:defdo_something_creazymessage=Message.newmessage.on_save_call:do_even_more_crazy_stuffenddefdo_even_more_crazy_stuff(message)puts"Message#{message}hasbeensaved!Hallelujah!"end 最佳答案 你可以通过在创建对象后立

  9. ruby - Rails 组合多个 activerecord 关系 - 2

    我想合并多个事件记录关系例如,apple_companies=Company.where("namelike?","%apple%")banana_companies=Company.where("namelike?","%banana%")我想结合这两个关系。不是合并,合并是apple_companies.merge(banana_companies)=>Company.where("namelike?andnamelike?","%apple%","%banana%")我要Company.where("名字像?还是名字像?","%apple%","%banana%")之后,我会写代

  10. ruby - Rails Elasticsearch 聚合 - 2

    不知何故,我似乎无法获得包含我的聚合的响应...使用curl它按预期工作:HBZUMB01$curl-XPOST"http://localhost:9200/contents/_search"-d'{"size":0,"aggs":{"sport_count":{"value_count":{"field":"dwid"}}}}'我收到回复:{"took":4,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":90,"max_score":0.0,"hits":[]},"a

随机推荐