草庐IT

Elasticsearch: querying multiple indices

无风听海 2024-01-07

1. The problem

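In Elasticsearch we usually specify the indices to search directly in the URL. If a query has to cover many indices whose names follow no particular pattern, this leads to an awkward situation: the URL can exceed the length limit.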
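To see when this becomes a problem, the following Python sketch builds the HTTP request line for a search over many index names and compares its length with a limit. It assumes the relevant limit is Elasticsearch's `http.max_initial_line_length` setting, which is documented with a default of 4kb. Proxies or HTTP clients in front of the cluster may enforce their own, lower limits. The helper functions and the index names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import quote

# Assumed default of Elasticsearch's http.max_initial_line_length (4kb); verify for your version.
DEFAULT_MAX_INITIAL_LINE = 4096


def search_request_line(indices: list[str], method: str = "POST") -> str:
    """Build the HTTP request line, e.g. 'POST /test1,test2/_search HTTP/1.1'."""
    # Keep ',' and '*' unescaped so the index expression stays as written.
    path = "/" + quote(",".join(indices), safe=",*-") + "/_search"
    return f"{method} {path} HTTP/1.1"


def exceeds_limit(indices: list[str], limit: int = DEFAULT_MAX_INITIAL_LINE) -> bool:
    """True if the request line would be longer than `limit` bytes."""
    return len(search_request_line(indices).encode("utf-8")) > limit


if __name__ == "__main__":
    # Hypothetical index names with no shared pattern worth a wildcard.
    names = [f"customer-{i:05d}-orders" for i in range(300)]
    print(len(search_request_line(names)), exceeds_limit(names))
```

With 300 names of this length, the request line is well over 4096 bytes. The same three names used in this article stay far below it.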

2. Test environment

elasticsearch 6.8.12

Test data

Create three test indices, each containing one document:

PUT test1/_doc/1
{
  "id":1,
  "name":"test1-1"
}


# {
#   "_index" : "test1",
#   "_type" : "_doc",
#   "_id" : "1",
#   "_version" : 1,
#   "result" : "created",
#   "_shards" : {
#     "total" : 2,
#     "successful" : 1,
#     "failed" : 0
#   },
#   "_seq_no" : 0,
#   "_primary_term" : 1
# }

PUT test2/_doc/1
{
  "id":1,
  "name":"test2-1"
}


# {
#   "_index" : "test2",
#   "_type" : "_doc",
#   "_id" : "1",
#   "_version" : 1,
#   "result" : "created",
#   "_shards" : {
#     "total" : 2,
#     "successful" : 1,
#     "failed" : 0
#   },
#   "_seq_no" : 0,
#   "_primary_term" : 1
# }

PUT test3/_doc/1
{
  "id":1,
  "name":"test3-1"
}

# {
#   "_index" : "test3",
#   "_type" : "_doc",
#   "_id" : "1",
#   "_version" : 1,
#   "result" : "created",
#   "_shards" : {
#     "total" : 2,
#     "successful" : 1,
#     "failed" : 0
#   },
#   "_seq_no" : 0,
#   "_primary_term" : 1
# }

3. Specifying multiple indices in the URL

Search one specific index by naming it directly in the URL:

POST test1/_search 
{
    "query": {
        "match_all": {}
    }
}


# {
#   "took" : 0,
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 5,
#     "successful" : 5,
#     "skipped" : 0,
#     "failed" : 0
#   },
#   "hits" : {
#     "total" : 1,
#     "max_score" : 1.0,
#     "hits" : [
#       {
#         "_index" : "test1",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test1-1"
#         }
#       }
#     ]
#   }
# }

Separate index names with commas to search several indices at once:

POST test1,test2/_search
{
    "query": {
        "match_all": {}
    }
}

# {
#   "took" : 1,
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 10,
#     "successful" : 10,
#     "skipped" : 0,
#     "failed" : 0
#   },
#   "hits" : {
#     "total" : 2,
#     "max_score" : 1.0,
#     "hits" : [
#       {
#         "_index" : "test1",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test1-1"
#         }
#       },
#       {
#         "_index" : "test2",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test2-1"
#         }
#       }
#     ]
#   }
# }

Use the keyword _all to search all indices:

POST _all/_search 
{
    "query": {
        "match_all": {}
    }
}

# {
#   "took" : 0,
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 15,
#     "successful" : 15,
#     "skipped" : 0,
#     "failed" : 0
#   },
#   "hits" : {
#     "total" : 3,
#     "max_score" : 1.0,
#     "hits" : [
#       {
#         "_index" : "test1",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test1-1"
#         }
#       },
#       {
#         "_index" : "test2",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test2-1"
#         }
#       },
#       {
#         "_index" : "test3",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test3-1"
#         }
#       }
#     ]
#   }
# }

The wildcard * matches indices whose names share a common pattern:

POST test*/_search
{
    "query": {
        "match_all": {}
    }
}

# {
#   "took" : 1,
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 15,
#     "successful" : 15,
#     "skipped" : 0,
#     "failed" : 0
#   },
#   "hits" : {
#     "total" : 3,
#     "max_score" : 1.0,
#     "hits" : [
#       {
#         "_index" : "test1",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test1-1"
#         }
#       },
#       {
#         "_index" : "test2",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test2-1"
#         }
#       },
#       {
#         "_index" : "test3",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test3-1"
#         }
#       }
#     ]
#   }
# }

Prefix a name with - to exclude that index:

POST test*,-test2/_search
{
    "query": {
        "match_all": {}
    }
}

# {
#   "took" : 0,
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 10,
#     "successful" : 10,
#     "skipped" : 0,
#     "failed" : 0
#   },
#   "hits" : {
#     "total" : 2,
#     "max_score" : 1.0,
#     "hits" : [
#       {
#         "_index" : "test1",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test1-1"
#         }
#       },
#       {
#         "_index" : "test3",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test3-1"
#         }
#       }
#     ]
#   }
# }
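The resolution rules shown in this section can be summarized as a small client-side model. The Python sketch below is a simplification under stated assumptions and is not Elasticsearch's own resolver. It handles only comma-separated names, `*` wildcards, a leading `-` for exclusion (applied to the names selected by earlier parts), and `_all`. It ignores aliases, closed indices, and the URL options discussed next. The function names are hypothetical.

```python
from __future__ import annotations

import re


def matches(pattern: str, name: str) -> bool:
    """Simplified wildcard match: '*' stands for any run of characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def resolve(expression: str, indices: list[str]) -> list[str]:
    """Resolve a comma-separated index expression left to right (simplified model)."""
    if expression == "_all":
        return list(indices)
    selected: list[str] = []
    for part in expression.split(","):
        if part.startswith("-"):
            # Exclusion only removes names selected by earlier parts.
            selected = [name for name in selected if not matches(part[1:], name)]
        elif "*" in part:
            selected += [n for n in indices if matches(part, n) and n not in selected]
        elif part not in selected:
            # Explicit names are kept as-is; whether they exist is not checked here.
            selected.append(part)
    return selected


if __name__ == "__main__":
    existing = ["test1", "test2", "test3"]
    for expr in ["test1", "test1,test2", "_all", "test*", "test*,-test2"]:
        print(expr, "->", resolve(expr, existing))
```

With the three test indices, the model selects the same indices whose documents appear in the responses above.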

4. Options controlling how multiple indices in the URL are resolved

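Explicitly searching an index that does not exist, or one that is closed, returns an error. The first request targets the non-existent index test4. The next ones close test3 and then search it: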

POST test4/_search
{
    "query": {
        "match_all": {}
    }
}


# {
#   "error" : {
#     "root_cause" : [
#       {
#         "type" : "index_not_found_exception",
#         "reason" : "no such index",
#         "resource.type" : "index_or_alias",
#         "resource.id" : "test4",
#         "index_uuid" : "_na_",
#         "index" : "test4"
#       }
#     ],
#     "type" : "index_not_found_exception",
#     "reason" : "no such index",
#     "resource.type" : "index_or_alias",
#     "resource.id" : "test4",
#     "index_uuid" : "_na_",
#     "index" : "test4"
#   },
#   "status" : 404
# }

POST test3/_close
# {
#   "acknowledged" : true
# }

POST test3/_search
{
    "query": {
        "match_all": {}
    }
}


# {
#   "error": {
#     "root_cause": [
#       {
#         "type": "index_closed_exception",
#         "reason": "closed",
#         "index_uuid": "KI7Iv4eGRIOk6MsycXokNQ",
#         "index": "test3"
#       }
#     ],
#     "type": "index_closed_exception",
#     "reason": "closed",
#     "index_uuid": "KI7Iv4eGRIOk6MsycXokNQ",
#     "index": "test3"
#   },
#   "status": 400
# }

The ignore_unavailable option skips indices that do not exist or are closed:


POST test4/_search?ignore_unavailable=true
{
    "query": {
        "match_all": {}
    }
}

# {
#   "took" : 0,
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 0,
#     "successful" : 0,
#     "skipped" : 0,
#     "failed" : 0
#   },
#   "hits" : {
#     "total" : 0,
#     "max_score" : 0.0,
#     "hits" : [ ]
#   }
# }


POST test3/_search?ignore_unavailable=true
{
    "query": {
        "match_all": {}
    }
}


# {
#   "took" : 0,
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 0,
#     "successful" : 0,
#     "skipped" : 0,
#     "failed" : 0
#   },
#   "hits" : {
#     "total" : 0,
#     "max_score" : 0.0,
#     "hits" : [ ]
#   }
# }
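When these requests are sent from code instead of the Kibana console, the options simply go into the query string. Below is a minimal Python sketch. The helper `search_url` and the base address `http://localhost:9200` are hypothetical, and the function does not validate option names or values.

```python
from __future__ import annotations

from urllib.parse import urlencode


def search_url(base: str, indices: list[str], **options: bool | str | None) -> str:
    """Build a _search URL with multi-index options in the query string (hypothetical helper)."""
    params = {
        # The examples here use lowercase booleans; Python's str(True) would give "True".
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in options.items()
        if value is not None
    }
    url = f"{base.rstrip('/')}/{','.join(indices)}/_search"
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    base = "http://localhost:9200"  # assumed local cluster address
    print(search_url(base, ["test4"], ignore_unavailable=True))
    print(search_url(base, ["noexist*"], allow_no_indices=False))
    print(search_url(base, ["test*"], expand_wildcards="all", ignore_unavailable=True))
```

Booleans are converted explicitly so that the query string carries `true`/`false` exactly as in the console examples.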

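When indices are selected implicitly, through a wildcard or _all, a pattern that matches nothing does not cause an error by default. Setting allow_no_indices=false makes Elasticsearch return an error instead: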

POST noexist*/_search
{
    "query": {
        "match_all": {}
    }
}

# {
#   "took" : 0,
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 0,
#     "successful" : 0,
#     "skipped" : 0,
#     "failed" : 0
#   },
#   "hits" : {
#     "total" : 0,
#     "max_score" : 0.0,
#     "hits" : [ ]
#   }
# }


POST noexist*/_search?allow_no_indices=false
{
    "query": {
        "match_all": {}
    }
}

# {
#   "error" : {
#     "root_cause" : [
#       {
#         "type" : "index_not_found_exception",
#         "reason" : "no such index",
#         "resource.type" : "index_or_alias",
#         "resource.id" : "noexist*",
#         "index_uuid" : "_na_",
#         "index" : "noexist*"
#       }
#     ],
#     "type" : "index_not_found_exception",
#     "reason" : "no such index",
#     "resource.type" : "index_or_alias",
#     "resource.id" : "noexist*",
#     "index_uuid" : "_na_",
#     "index" : "noexist*"
#   },
#   "status" : 404
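# }

test3 is now closed, and by default wildcards expand only to open indices. As a result, test3* also matches nothing:
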
POST test3*/_search
{
    "query": {
        "match_all": {}
    }
}

# {
#   "took" : 0,
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 0,
#     "successful" : 0,
#     "skipped" : 0,
#     "failed" : 0
#   },
#   "hits" : {
#     "total" : 0,
#     "max_score" : 0.0,
#     "hits" : [ ]
#   }
# }

POST test3*/_search?allow_no_indices=false
{
    "query": {
        "match_all": {}
    }
}

# {
#   "error" : {
#     "root_cause" : [
#       {
#         "type" : "index_not_found_exception",
#         "reason" : "no such index",
#         "resource.type" : "index_or_alias",
#         "resource.id" : "test3*"
#       }
#     ],
#     "type" : "index_not_found_exception",
#     "reason" : "no such index",
#     "resource.type" : "index_or_alias",
#     "resource.id" : "test3*"
#   },
#   "status" : 404
# }


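The expand_wildcards option controls which indices a wildcard expands to. Its possible values are open, closed, none and all.

By default only open indices are expanded. With test3 closed, test* therefore matches only test1 and test2: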

POST test*/_search
{
    "query": {
        "match_all": {}
    }
}

# {
#   "took" : 0,
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 10,
#     "successful" : 10,
#     "skipped" : 0,
#     "failed" : 0
#   },
#   "hits" : {
#     "total" : 2,
#     "max_score" : 1.0,
#     "hits" : [
#       {
#         "_index" : "test1",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test1-1"
#         }
#       },
#       {
#         "_index" : "test2",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test2-1"
#         }
#       }
#     ]
#   }
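# }

With expand_wildcards=all, the closed index test3 is included in the expansion and the search fails:
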
POST test*/_search?expand_wildcards=all
{
    "query": {
        "match_all": {}
    }
}

# {
#   "error": {
#     "root_cause": [
#       {
#         "type": "index_closed_exception",
#         "reason": "closed",
#         "index_uuid": "KI7Iv4eGRIOk6MsycXokNQ",
#         "index": "test3"
#       }
#     ],
#     "type": "index_closed_exception",
#     "reason": "closed",
#     "index_uuid": "KI7Iv4eGRIOk6MsycXokNQ",
#     "index": "test3"
#   },
#   "status": 400
# }

Adding ignore_unavailable=true skips the closed index, so only test1 and test2 are searched:

POST test*/_search?expand_wildcards=all&ignore_unavailable=true
{
    "query": {
        "match_all": {}
    }
}

# {
#   "took" : 0,
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 10,
#     "successful" : 10,
#     "skipped" : 0,
#     "failed" : 0
#   },
#   "hits" : {
#     "total" : 2,
#     "max_score" : 1.0,
#     "hits" : [
#       {
#         "_index" : "test1",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test1-1"
#         }
#       },
#       {
#         "_index" : "test2",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test2-1"
#         }
#       }
#     ]
#   }
# }
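The results in this section can be condensed into a small decision model. The Python sketch below reproduces only the outcomes observed above on 6.8.12 and does not reflect how Elasticsearch is implemented. Its assumptions are:
- every existing index is either `open` or `closed`;
- only the `open`, `closed` and `all` values of `expand_wildcards` are modelled;
- wildcards are matched with `fnmatch`, which, unlike Elasticsearch, also treats `?` and `[...]` specially.

```python
from __future__ import annotations

import fnmatch

# Index states a wildcard may expand to, for the expand_wildcards values modelled here.
EXPANDS_TO = {"open": {"open"}, "closed": {"closed"}, "all": {"open", "closed"}}


def simulate_search(
    expression: str,
    state: dict[str, str],
    ignore_unavailable: bool = False,
    allow_no_indices: bool = True,
    expand_wildcards: str = "open",
) -> tuple[int, list[str]]:
    """Return (HTTP status, indices actually searched) under a simplified model.

    `state` maps every existing index name to "open" or "closed".
    """
    candidates: list[str] = []
    for part in expression.split(","):
        if "*" in part:
            matched = [
                name
                for name, status in state.items()
                if fnmatch.fnmatchcase(name, part) and status in EXPANDS_TO[expand_wildcards]
            ]
            if not matched and not allow_no_indices:
                return 404, []  # index_not_found_exception
            candidates += matched
        else:
            candidates.append(part)  # explicit name, checked below

    searched: list[str] = []
    for name in candidates:
        if name not in state:
            if not ignore_unavailable:
                return 404, []  # index_not_found_exception
        elif state[name] == "closed":
            if not ignore_unavailable:
                return 400, []  # index_closed_exception
        else:
            searched.append(name)
    return 200, searched


if __name__ == "__main__":
    cluster = {"test1": "open", "test2": "open", "test3": "closed"}
    print(simulate_search("test4", cluster))  # (404, [])
    print(simulate_search("test3", cluster))  # (400, [])
    print(simulate_search("test3", cluster, ignore_unavailable=True))  # (200, [])
    print(simulate_search("test3*", cluster, allow_no_indices=False))  # (404, [])
    print(simulate_search("test*", cluster, expand_wildcards="all"))  # (400, [])
    print(simulate_search("test*", cluster, expand_wildcards="all", ignore_unavailable=True))  # (200, ['test1', 'test2'])
```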

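5. Wrapping physical indices with index aliases

An alias is an alternative name for physical indices. When an API request uses an alias, Elasticsearch automatically resolves it to the corresponding physical index names.

An alias can point to a single index or to several indices.

An alias can also apply a filter, so that searches through it see only part of an index's data. Here the alias all_test_indices is added for the pattern test*: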

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "test*", "alias" : "all_test_indices" } }
    ]
}

# {
#   "acknowledged" : true
# }

POST all_test_indices/_search
{
    "query": {
        "match_all": {}
    }
}

# {
#   "took" : 0,
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 10,
#     "successful" : 10,
#     "skipped" : 0,
#     "failed" : 0
#   },
#   "hits" : {
#     "total" : 2,
#     "max_score" : 1.0,
#     "hits" : [
#       {
#         "_index" : "test1",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test1-1"
#         }
#       },
#       {
#         "_index" : "test2",
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1,
#           "name" : "test2-1"
#         }
#       }
#     ]
#   }
# }
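The filter option mentioned at the start of this section is not demonstrated in the console example. The Python sketch below builds an `_aliases` request body whose `add` actions carry a `filter` query.
- The alias name `only_test1_1` and the helper `alias_actions` are hypothetical.
- The filter assumes default dynamic mapping, under which the text field `name` gets a `name.keyword` sub-field that a `term` query can match exactly.

```python
from __future__ import annotations

import json


def alias_actions(alias: str, indices: list[str], filter_query: dict | None = None) -> dict:
    """Build an _aliases request body that adds `alias` to each index (hypothetical helper)."""
    actions = []
    for index in indices:
        add: dict = {"index": index, "alias": alias}
        if filter_query is not None:
            # A filtered alias only exposes documents matching this query.
            add["filter"] = filter_query
        actions.append({"add": add})
    return {"actions": actions}


if __name__ == "__main__":
    # Assumes default dynamic mapping, which gives `name` a `name.keyword` sub-field.
    body = alias_actions("only_test1_1", ["test1", "test2"], {"term": {"name.keyword": "test1-1"}})
    print(json.dumps(body, indent=2))  # send as the body of POST /_aliases
```

Under those mapping assumptions, a search on `only_test1_1` should return only the `test1-1` document, even though the alias covers both test1 and test2.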

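6. Multi search: specifying indices in the request body

The main purpose of the Multi Search API is to run several search requests in a single API call. Its body uses the format below: each header line specifies the indices, and the body line that follows holds the query: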

header\n
body\n
header\n
body\n
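Each header and each body must fit on a single line of JSON, so it is easiest to generate the payload in code. Here is a minimal Python sketch. `msearch_body` is a hypothetical helper name, and the `(header, body)` pairs are plain dictionaries.

```python
from __future__ import annotations

import json


def msearch_body(searches: list[tuple[dict, dict]]) -> str:
    """Serialize (header, body) pairs into the newline-delimited _msearch format."""
    lines = []
    for header, body in searches:
        # Compact separators keep every JSON object on exactly one line.
        lines.append(json.dumps(header, separators=(",", ":")))
        lines.append(json.dumps(body, separators=(",", ":")))
    # The payload must end with a newline after the last body line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(msearch_body([({"index": "test*"}, {"query": {"match_all": {}}})]), end="")
```

The result is sent as the body of `GET` or `POST _msearch`, typically with the header `Content-Type: application/x-ndjson`. Note the trailing newline after the final line, which the format requires.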

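The Multi Search API supports the same ways of naming indices as the previous two approaches (index lists in the URL and aliases). Its main advantage is that index names travel in the request body, which easily gets around the URL length limit.

The Multi Search API also copes with large numbers of index names that follow no particular pattern, for example time-series index names.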
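As an illustration of the time-series case, the sketch below generates daily index names and packs them into a single `_msearch` header line, so the list is sent in the request body instead of the URL. Assumptions:
- The `logs-YYYY.MM.DD` naming scheme and both helper names are hypothetical.
- The header uses one comma-separated `index` string, the same syntax as in the URL.

```python
from __future__ import annotations

import json
from datetime import date, timedelta


def daily_indices(prefix: str, start: date, end: date) -> list[str]:
    """Hypothetical daily index names such as 'logs-2024.01.01', both ends inclusive."""
    days = (end - start).days
    return [f"{prefix}{(start + timedelta(d)).strftime('%Y.%m.%d')}" for d in range(days + 1)]


def msearch_header(indices: list[str]) -> str:
    """One _msearch header line carrying the index list in the body, not the URL."""
    return json.dumps({"index": ",".join(indices)}, separators=(",", ":"))


if __name__ == "__main__":
    names = daily_indices("logs-", date(2024, 1, 1), date(2024, 3, 31))
    print(len(names), names[0], names[-1])
    print(msearch_header(names)[:60] + "...")
```

Any list of names can be passed to `msearch_header`, including an irregular set of days that no single wildcard could describe.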

GET _msearch
{"index":"test*"}
{"query" : {"match_all" : {}}}

# {
#   "responses" : [
#     {
#       "took" : 0,
#       "timed_out" : false,
#       "_shards" : {
#         "total" : 10,
#         "successful" : 10,
#         "skipped" : 0,
#         "failed" : 0
#       },
#       "hits" : {
#         "total" : 2,
#         "max_score" : 1.0,
#         "hits" : [
#           {
#             "_index" : "test1",
#             "_type" : "_doc",
#             "_id" : "1",
#             "_score" : 1.0,
#             "_source" : {
#               "id" : 1,
#               "name" : "test1-1"
#             }
#           },
#           {
#             "_index" : "test2",
#             "_type" : "_doc",
#             "_id" : "1",
#             "_score" : 1.0,
#             "_source" : {
#               "id" : 1,
#               "name" : "test2-1"
#             }
#           }
#         ]
#       },
#       "status" : 200
#     }
#   ]
# }
