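Elasticsearch Essentials: The Basics

Jcloud 2023-03-28

Business Development and Functional Technology Department, Experience Assurance R&D Group: Kang Rui, Yao Zaiyi, Li Zhen, Liu Bin, Wang Beiyong

Note: everything below is based on Elasticsearch 8.1.

I. Index definition

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/indices.html

The big picture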

Elasticsearch	MySQL
Index	Table
Type (deprecated)	Table (analogy no longer used)
Document	Row
Field	Column
Mapping	Schema
Everything is indexed	Index
Query DSL	SQL
GET http://...	SELECT * FROM ...
POST http://...	UPDATE table SET ...
Aggregations	GROUP BY, SUM, and similar aggregate functions
cardinality	deduplication (DISTINCT)
reindex	data migration

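What an index is

Definition:
An index is a collection of documents that share the same document structure (mapping).
It is identified by a unique index name.
A cluster can hold many indices.
Different indices hold data for different business types.
Notes:
Index names must be lowercase.
Index names can be at most 255 bytes long.
Field names may contain uppercase letters, but keeping everything lowercase is recommended.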

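Creating an index

index-settings parameters explained

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-modules.html

Note: static settings cannot be changed once the index has been created; dynamic settings can.
Questions to think about:
1. Why can the number of primary shards not be changed after creation?
A document is routed to a particular shard in an index using the following formula: shard_num = hash(_routing) % num_primary_shards. The default value used for _routing is the document's _id.
When Elasticsearch writes a document it uses this formula to decide which shard stores it, and later reads use the same formula. If the number of shards changed, existing documents could no longer be found. Put simply: hash the ID, divide by the number of primary shards, and take the remainder. When the divisor changes, the result changes too.
2. What if the business genuinely needs more primary shards?
Use reindex to migrate the data into another index: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/docs-reindex.html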
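The routing formula is easy to try out in a few lines of Python. This is only a sketch: Elasticsearch hashes the routing value with Murmur3, while the example below substitutes CRC32 from the standard library, and the function name shard_for is made up for illustration. The point is simply that the same document ID lands on a different shard number once num_primary_shards changes.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Apply shard_num = hash(_routing) % num_primary_shards."""
    # Assumption: CRC32 stands in for the Murmur3 hash that Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(1, 11)]

# Where each document is stored when the index has 5 primary shards...
placement_5 = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}
# ...and where a lookup would go if the index suddenly had 6.
placement_6 = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

moved = [d for d in doc_ids if placement_5[d] != placement_6[d]]
print(f"{len(moved)} of {len(doc_ids)} documents would be looked up on the wrong shard")
```

Every document in the moved list would be searched for on a shard that never stored it, which is why the setting is static and reindex is the way out.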

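Basic index operations

II. Mapping parameter: dynamic

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic.html

Core functionality

Elasticsearch detects the type of a new field automatically and adds the field to the mapping. In other words, even if you never defined the field in the mapping, Elasticsearch infers its type for you.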

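A first look at dynamic

// Delete the test01 index so that we start from a clean state
DELETE test01

// Index one document without defining a mapping first
POST test01/_doc/1
{
  "name":"kangrui10"
}

// Now look at the mapping of test01 to see which type the name field was given.
// name is mapped as text with a keyword sub-field, which is usually not what we want:
// in our business queries name is not searched as analyzed text, it is matched exactly (and name = xxx).
// With the mapping below, an exact match has to target name.keyword = xxx, which is more cumbersome.
GET test01/_mapping
// Response: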
{
  "test01" : {
    "mappings" : {
      "properties" : {
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

Allowed values for dynamic

Value	Description	Explanation
true	New fields are added to the mapping (default).	If dynamic is not set when the mapping is created, it defaults to true: any field without an explicitly specified type gets a type inferred by Elasticsearch.
false	New fields are ignored. These fields will not be indexed or searchable, but will still appear in the _source field of returned hits. These fields will not be added to the mapping, and new fields must be added explicitly.	A field that is missing from the mapping can still be written, but it cannot be queried and it does not appear in the mapping. The written field is simply not indexed.
strict	If new fields are detected, an exception is thrown and the document is rejected. New fields must be explicitly added to the mapping.	Writing a document with a field that is missing from the mapping fails outright. Recommended for production because it is stricter: new fields have to be added to the mapping by hand.
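The three settings can be modelled with a small Python function. This is a toy model, not how Elasticsearch is implemented: the names guess_type and index_document are invented for the example, and the type detection is reduced to a handful of JSON types.

```python
def guess_type(value) -> str:
    """Very rough stand-in for dynamic type detection."""
    if isinstance(value, bool):  # bool first: bool is a subclass of int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text with keyword sub-field"


def index_document(mapping: dict, doc: dict, dynamic: str = "true") -> dict:
    """Return the mapping as it looks after indexing doc."""
    updated = dict(mapping)
    for field, value in doc.items():
        if field in updated:
            continue  # already mapped, nothing to decide
        if dynamic == "strict":
            raise ValueError(f"unknown field [{field}] rejected by strict mapping")
        if dynamic == "true":
            updated[field] = guess_type(value)
        # dynamic == "false": the field stays in _source only; the mapping is unchanged
    return updated


mapping = {"name": "keyword"}
doc = {"name": "kangrui10", "age": 30}

print(index_document(mapping, doc, "true"))   # age is added to the mapping
print(index_document(mapping, doc, "false"))  # mapping unchanged
```

With "strict" the same call raises an error instead of returning a mapping, which mirrors the rejected document in Elasticsearch.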

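Drawbacks of dynamic mapping

The inferred type is reasonably accurate but not necessarily what the user wants
For example, a text field is given the default standard analyzer, while we usually need the IK Chinese analyzer
Extra storage is consumed
Strings are mapped as both text and keyword, which takes up more storage
Mapping explosion
If a query is sent by mistake with PUT instead of GET, its body is indexed as a document and many unintended fields are created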

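III. Mapping parameter: doc_values

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/doc-values.html

Core functionality

Doc values are an additional, sorted forward index that Lucene builds alongside the inverted index at index time (a document => field value mapping). In essence, doc values are a serialized columnar store, a structure well suited to aggregations, sorting, and script access to field values. The layout also compresses well, especially for numeric types, which reduces disk usage and speeds up access. Almost all field types support doc values; the exceptions are text and annotated_text fields.

What is a forward index

A forward index is much like a database table: it associates each document ID with its data, so you look up a document ID to get the corresponding values.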
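The difference between the two structures can be shown with plain Python dictionaries. This is a conceptual sketch with made-up sample documents, not Lucene's on-disk format: the inverted index answers "which documents contain this term", while the forward (doc values) column answers "what is this document's value", which is what sorting and aggregations need.

```python
from collections import defaultdict

# Hypothetical sample documents keyed by document ID.
docs = {
    1: {"speaker": "hamlet", "speech_number": 3},
    2: {"speaker": "ophelia", "speech_number": 1},
    3: {"speaker": "hamlet", "speech_number": 2},
}

# Inverted index: term -> list of document IDs containing it.
inverted = defaultdict(list)
for doc_id in sorted(docs):
    inverted[docs[doc_id]["speaker"]].append(doc_id)

# Forward index (doc values style column): document ID -> field value.
speech_number_column = {doc_id: d["speech_number"] for doc_id, d in docs.items()}

# Search uses the inverted index...
hits = inverted["hamlet"]
# ...and sorting the hits uses the forward column.
sorted_hits = sorted(hits, key=speech_number_column.__getitem__)

print(hits)         # document IDs that match the term
print(sorted_hits)  # the same IDs ordered by speech_number
```

Setting doc_values to false on a field removes the second structure for that field, which is why sorting and aggregating on it stop working.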

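Allowed values for doc_values

true: the default; doc values are enabled
false: must be set explicitly; sorting, aggregations, and script access to the field stop working, but disk space is saved

Practice question

// Create an index named test03 whose fields meet the following requirements
//     1. speaker: keyword
//     2. line_id: keyword and not aggregatable
//     3. speech_number: integer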
PUT test03
{
  "mappings": {
    "properties": {
      "speaker": {
        "type": "keyword"
      },
      "line_id":{
        "type": "keyword",
        "doc_values": false
      },
      "speech_number":{
        "type": "integer"
      }
    }
  }
}

IV. Analyzers

Installing the IK Chinese analyzer

https://github.com/medcl/elasticsearch-analysis-ik

What is an inverted index

How data gets indexed

Types of analyzers

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-analyzers.html

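V. Custom analyzers

The three stages of a custom analyzer

1. Character filters

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-charfilters.html
Zero or more character filters can be configured.

HTML Strip Character Filter: strips HTML elements and decodes HTML entities such as &amp;

Mapping Character Filter: replaces specified characters

Pattern Replace Character Filter: replaces characters that match a regular expression

2. Tokenizer: splitting text into tokens

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-tokenizers.html#_word_oriented_tokenizers
Exactly one tokenizer is configured; it splits the text into tokens.

3. Token filters: post-processing tokens

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-tokenfilters.html
Zero or more token filters can be configured. They post-process the tokens, for example by lowercasing them, removing stop words, or adding synonyms.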
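The three stages chain together like a small pipeline. The Python sketch below imitates that flow under simplifying assumptions: the regular expression is only a rough stand-in for a real tokenizer, and the function names and the stop word list are invented for the example.

```python
import re


def char_filter(text: str, mappings: dict) -> str:
    """Stage 1: rewrite characters before tokenization (like a mapping char filter)."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def tokenizer(text: str) -> list:
    """Stage 2: split the text into tokens (exactly one tokenizer per analyzer)."""
    return re.findall(r"\w+", text)


def token_filters(tokens: list, stop_words: set) -> list:
    """Stage 3: post-process tokens, here lowercasing and stop word removal."""
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in stop_words]


def analyze(text: str) -> list:
    filtered = char_filter(text, {"&": "and"})
    tokens = tokenizer(filtered)
    return token_filters(tokens, stop_words={"the"})


print(analyze("The Doc & Cat"))
print(analyze("doc and cat"))
```

Both inputs end up as the same token list, which is the property the practice question in this section depends on.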

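Practice question

A document has content like doc & cat. Index it so that a match_phrase query for either doc & cat or doc and cat finds it.
Analysis:
1. What is a match_phrase query? match_phrase analyzes the search text into tokens. All of those tokens must appear in the analyzed field, in the same order, and by default they must be adjacent.
2. For & and and to give equivalent results, a custom analyzer is needed.
3. How to define a custom analyzer: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-custom-analyzer.html
4. Solution 1 relies on the Mapping Character Filter.
5. Solution 2 relies on the synonym token filter: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-synonym-tokenfilter.html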

Solution 1

# Create the index (if test01 still exists from the dynamic example, run DELETE test01 first)
PUT /test01
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "char_filter": [
            "my_mappings_char_filter"
          ],
          "tokenizer": "standard"
        }
      },
      "char_filter": {
        "my_mappings_char_filter": {
          "type": "mapping",
          "mappings": [
            "& => and"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
// Notes
// Stage 1, character filters: char_filter replaces & with and
// Stage 2, tokenizer: the standard tokenizer
// Stage 3, token filters: none configured
// The content field uses the custom analyzer my_analyzer

# Load test data
PUT test01/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}

# Run the test: searching for doc & cat or doc and cat returns both documents
POST test01/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "content": "doc & cat"
          }
        }
      ]
    }
  }
}

Solution 2

# Approach: treat & and and as synonyms, using a token filter
# Create the index
PUT /test02
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_synonym_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "my_synonym"
          ]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "lenient": true,
          "synonyms": [
            "& => and"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_synonym_analyzer"
      }
    }
  }
}
// Notes
// Stage 1, character filters: none configured
// Stage 2, tokenizer: the whitespace tokenizer. Why not the default tokenizer? Because the default
// tokenizer drops & while tokenizing, so no token filter could act on it afterwards
// Stage 3, token filters: the synonym token filter maps & to and
// The content field uses the custom analyzer my_synonym_analyzer

# Load test data
PUT test02/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}

# Run the test
POST test02/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "content": "doc & cat"
          }
        }
      ]
    }
  }
}
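Why Solution 2 needs the whitespace tokenizer can be seen in a small simulation. The sketch below is an approximation written for this article: standard_like imitates the default tokenizer with a regular expression that discards punctuation, and phrase_match is a simplified model of match_phrase that only checks for adjacent tokens in order.

```python
import re


def standard_like(text: str) -> list:
    """Rough imitation of the default tokenizer: punctuation such as & is dropped."""
    return re.findall(r"\w+", text.lower())


def whitespace(text: str) -> list:
    """Whitespace tokenizer: split on spaces only, so & survives as a token."""
    return text.split()


def apply_synonyms(tokens: list, synonyms: dict) -> list:
    """Synonym token filter reduced to a one-to-one replacement."""
    return [synonyms.get(token, token) for token in tokens]


def phrase_match(doc_tokens: list, query_tokens: list) -> bool:
    """True if the query tokens occur in the document adjacent and in order."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


synonyms = {"&": "and"}

# With the whitespace tokenizer the synonym filter still sees "&".
doc = apply_synonyms(whitespace("doc and cat"), synonyms)
query = apply_synonyms(whitespace("doc & cat"), synonyms)
print(phrase_match(doc, query))

# With the default-like tokenizer "&" is gone before any token filter runs.
print(phrase_match(standard_like("doc and cat"), standard_like("doc & cat")))
```

The first check succeeds because both sides become doc, and, cat. The second fails because the query shrinks to doc, cat, which is no longer an adjacent phrase in the document.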

VI. multi-fields

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/multi-fields.html

// One field, several types: for example, a field that should be analyzed by two different analyzers
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "fieldText": {
            "type": "text",
            "analyzer": "ik_smart"
          }
        }
      }
    }
  }
}

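VII. Runtime fields

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime.html

Background

Suppose the business needs to sort by the difference between two numeric fields, that is, by a field that does not exist. What then? You could backfill the data and add a field that stores the difference. But what if backfilling a new field is not allowed?

Solution

Use cases

Adding fields to existing documents without reindexing

Working with data whose structure you do not know in advance

Overriding, at query time, the value returned from an indexed field

Defining fields for a specific purpose without modifying the underlying schema

Characteristics

Lucene is completely unaware of them: they are not indexed and have no doc_values

No relevance scoring, because there is no inverted index

They break with the traditional define-before-use approach

They can prevent mapping explosion

They make the API more flexible

Note that they make searches slower

Usage

Defined at search time, in the search request (even if the mapping does not contain the field, it can still be queried)

Defined in a dynamic or static mapping (a runtime field is added to the mapping itself)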

Practice question 1

# Assume the following index and data
# (if test03 still exists from the doc_values example, delete it first)
PUT test03
{
  "mappings": {
    "properties": {
      "emotion": {
        "type": "integer"
      }
    }
  }
}
POST test03/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}
# Requirement: emotion > 5, return emotion_falg = '1'
# Requirement: emotion < 5, return emotion_falg = '-1'
# Requirement: emotion = 5, return emotion_falg = '0'

Solution 1

Define the runtime field in the search request: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-search-request.html
The field does not really exist in the stored documents, so the request has to ask for it with fields *.

GET test03/_search
{
  "fields": [
    "*"
  ],
  "runtime_mappings": {
    "emotion_falg": {
      "type": "keyword",
      "script": {
        "source": """
          if(doc['emotion'].value>5)emit('1');
          if(doc['emotion'].value<5)emit('-1');
          if(doc['emotion'].value==5)emit('0');
          """
      }
    }
  }
}

Solution 2

Define the runtime field when creating the index: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-mapping-fields.html
A runtime field defined this way can also be used in queries.

# Create the index with a runtime field
PUT test03_01
{
  "mappings": {
    "runtime": {
      "emotion_falg": {
        "type": "keyword",
        "script": {
          "source": """
          if(doc['emotion'].value>5)emit('1');
          if(doc['emotion'].value<5)emit('-1');
          if(doc['emotion'].value==5)emit('0');
          """
        }
      }
    },
    "properties": {
      "emotion": {
        "type": "integer"
      }
    }
  }
}
# Load test data
POST test03_01/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}
# Query to test
GET test03_01/_search
{
  "fields": [
    "*"
  ]
}
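To check what the runtime script should emit for the four test documents, the same branching can be replayed in Python. This only mirrors the logic of the Painless script; the function name emotion_flag is made up for the illustration.

```python
def emotion_flag(emotion: int) -> str:
    """Mirror of the runtime script: '1' above 5, '-1' below 5, '0' at exactly 5."""
    if emotion > 5:
        return "1"
    if emotion < 5:
        return "-1"
    return "0"


# The four test documents from the question: _id -> emotion
emotions = {1: 2, 2: 5, 3: 10, 4: 3}
flags = {doc_id: emotion_flag(value) for doc_id, value in emotions.items()}
print(flags)
```

Documents 1 and 4 should come back with '-1', document 2 with '0', and document 3 with '1'.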

Practice question 2

# Given the following index and data
PUT task04
{
  "mappings": {
    "properties": {
      "A": {
        "type": "long"
      },
      "B": {
        "type": "long"
      }
    }
  }
}
PUT task04/_bulk
{"index":{"_id":1}}
{"A":100,"B":2}
{"index":{"_id":2}}
{"A":120,"B":2}
{"index":{"_id":3}}
{"A":120,"B":25}
{"index":{"_id":4}}
{"A":21,"B":25}
# Requirement: in the task04 index, create a runtime field named A_B whose value is A-B;
# build a range aggregation with three tiers (below 0, 0 to 100, 100 and above) and return the document counts
// What is needed:
// 1. A runtime field defined in the search request: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-search-request.html
// 2. The range aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-range-aggregation.html

Solution

# Test the result
GET task04/_search
{
  "fields": [
    "*"
  ],
  "size": 0,
  "runtime_mappings": {
    "A_B": {
      "type": "long",
      "script": {
        "source": """
          emit(doc['A'].value - doc['B'].value);
          """
      }
    }
  },
  "aggs": {
    "price_ranges_A_B": {
      "range": {
        "field": "A_B",
        "ranges": [
          { "to": 0 },
          { "from": 0, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}
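The expected bucket counts can be worked out by hand, or with the short Python sketch below. It recomputes A-B for the four sample documents and applies the range aggregation's rule that from is inclusive and to is exclusive; the helper name in_range is made up for the example.

```python
# The four sample documents from the question.
docs = [
    {"A": 100, "B": 2},
    {"A": 120, "B": 2},
    {"A": 120, "B": 25},
    {"A": 21, "B": 25},
]

# Same ranges as in the aggregation: "from" is inclusive, "to" is exclusive.
ranges = [{"to": 0}, {"from": 0, "to": 100}, {"from": 100}]


def in_range(value: int, bucket: dict) -> bool:
    lower_ok = "from" not in bucket or value >= bucket["from"]
    upper_ok = "to" not in bucket or value < bucket["to"]
    return lower_ok and upper_ok


# Runtime field A_B = A - B for every document.
a_b = [doc["A"] - doc["B"] for doc in docs]

# doc_count per bucket, in the order the ranges are listed.
doc_counts = [sum(in_range(value, bucket) for value in a_b) for bucket in ranges]

print(a_b)
print(doc_counts)
```

For this data the differences are 98, 118, 95 and -4, so the three buckets should hold 1, 2 and 1 documents.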

VIII. Search: highlighting

A first look at highlight syntax

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/highlighting.html

IX. Search: sorting

A first look at sort syntax

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/sort-search-results.html

// Note: text fields cannot be sorted or aggregated on by default; if you really need to, enable fielddata
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "customer_last_name": "wood"
    }
  },
  "highlight": {
    "number_of_fragments": 3,
    "fragment_size": 150,
    "fields": {
      "customer_last_name": {
        "pre_tags": [
          "<em>"
        ],
        "post_tags": [
          "</em>"
        ]
      }
    }
  },
  "sort": [
    {
      "currency": {
        "order": "desc"
      }
    },
    {
      "_score": {
        "order": "asc"
      }
    }
  ]
}

X. Search: pagination

A first look at pagination syntax

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/paginate-search-results.html

# Note: from starts at 0, not 1
GET kibana_sample_data_ecommerce/_search
{
  "from": 5,
  "size": 20,
  "query": {
    "match": {
      "customer_last_name": "wood"
    }
  }
}
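Because from is a zero-based offset rather than a page number, client code usually converts between the two. A minimal sketch, with helper names invented for the example:

```python
def page_to_from(page: int, size: int) -> int:
    """Convert a 1-based page number into the zero-based from offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be at least 1")
    return (page - 1) * size


def paginate(hits: list, from_: int, size: int) -> list:
    """What from/size select out of an ordered list of hits."""
    return hits[from_:from_ + size]


all_hits = list(range(100))           # stand-in for 100 sorted hits
print(page_to_from(1, 20))            # first page starts at offset 0
print(page_to_from(3, 20))            # third page starts at offset 40
print(paginate(all_hits, 5, 20)[:3])  # from=5 skips the first five hits
```

The request above, with from 5 and size 20, therefore returns hits number 6 through 25 when counted from 1.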

Practice question 1

# Question
# In the spoken lines of the play, highlight the word Hamlet (in the text_entry field),
# starting the highlight with "#aaa#" and ending it with "#bbb#".
# Return all of the speech_number field lines in reverse order,
# 20 speech lines per page, starting from line 40.

# highlight: process the text_entry field and highlight the keyword Hamlet
# pagination: from: 40; size: 20
# speech_number: descending order
POST test09/_search
{
  "from": 40,
  "size": 20,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "text_entry": "Hamlet"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "text_entry": {
        "pre_tags": [
          "#aaa#"
        ],
        "post_tags": [
          "#bbb#"
        ]
      }
    }
  },
  "sort": [
    {
      "speech_number.keyword": {
        "order": "desc"
      }
    }
  ]
}

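XI. Search: async search

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/async-search.html

Released in

7.7.0

When to use it

Async search lets users retrieve results while the search is still running, so they no longer have to wait for the query to finish before receiving a response.

Common commands

Submit an async search

POST /sales*/_async_search?size=0

Get the results of an async search

GET /_async_search/<id>

Get the status of an async search

GET /_async_search/status/<id>

Delete or cancel an async search

DELETE /_async_search/<id>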

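Understanding the async search response

Field	Meaning
id	Unique identifier returned for the async search
is_partial	Once the query is no longer running, indicates whether the search failed or completed successfully on all shards. While the query is executing, is_partial is always true
is_running	Whether the search is still executing
total	Number of shards the search will run on
successful	Number of shards that have completed the search successfully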
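The combination of is_running and is_partial is what a client actually has to interpret. The helper below is a hypothetical sketch of that decision, working on a plain dictionary that holds only the fields from the table above; it does not call Elasticsearch, and the flat layout is a simplification of the real response body.

```python
def describe_async_search(response: dict) -> str:
    """Summarize an async search response from the fields in the table above."""
    progress = f"{response['successful']}/{response['total']} shards"
    if response["is_running"]:
        # is_partial is always true while the query is executing.
        return f"running ({progress})"
    if response["is_partial"]:
        return f"finished, but failed or incomplete ({progress})"
    return f"completed on all shards ({progress})"


# Hypothetical responses for illustration.
running = {"id": "abc", "is_running": True, "is_partial": True, "total": 10, "successful": 4}
done = {"id": "abc", "is_running": False, "is_partial": False, "total": 10, "successful": 10}

print(describe_async_search(running))
print(describe_async_search(done))
```

A polling client would keep calling the status endpoint until is_running turns false and only then decide, based on is_partial, whether the results can be trusted as complete.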

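XII. Index aliases

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/aliases.html

What aliases are for

In Elasticsearch, an index alias works like a shortcut or symbolic link that can point to one or more indices. Aliases give us a great deal of flexibility and make the following possible:

Switching seamlessly from one index to another in a running cluster, without downtime

Grouping several indices; for example, with one index per month, an alias can represent the indices of the last 3 months

Exposing part of the data in an index, similar to a database view

Searching multiple indices without aliases

Option 1: POST index_01,index_02,index_03/_search
Option 2: POST index*/_search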

Three ways to create an alias

Specify the alias when creating the index

# Give test05 the alias test05_aliases
PUT test05
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      }
    }
  },
  "aliases": {
    "test05_aliases": {}
  }
}

Specify the alias through an index template

PUT _index_template/template_1
{
  "index_patterns": ["te*", "bar*"],
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "_source": {
        "enabled": true
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    },
    "aliases": {
      "mydata": { }
    }
  },
  "priority": 500,
  "composed_of": ["component_template1", "runtime_component_template"],
  "version": 3,
  "_meta": {
    "description": "my custom"
  }
}

Add an alias to an existing index

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-nginx.access-prod",
        "alias": "logs"
      }
    }
  ]
}

Removing an alias

POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "logs-nginx.access-prod",
        "alias": "logs"
      }
    }
  ]
}

Practice question 1

# Define an index alias for 'accounts-row' called 'accounts-male':
# apply a filter to only show the male account owners
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "accounts-row",
        "alias": "accounts-male",
        "filter": {
          "bool": {
            "filter": [
              {
                "term": {
                  "gender.keyword": "male"
                }
              }
            ]
          }
        }
      }
    }
  ]
}

XIII. Search templates

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-template.html

Features

Templates accept parameters at run time. Search templates are stored on the server, so they can be changed without touching client code.

A first look at search templates

# Create a search template
PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "{{query_key}}": "{{query_value}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    }
  }
}
# Search with the template
GET my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "query_key": "your field",
    "query_value": "your field value",
    "from": 0,
    "size": 10
  }
}
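What the server does with the params can be approximated in a few lines of Python. This is a deliberately simplified sketch: real search templates are rendered by a Mustache engine with many more features, while the render function here (a made-up helper) only substitutes plain {{name}} placeholders, and the parameter values are hypothetical.

```python
import json
import re


def render(template: str, params: dict) -> str:
    """Replace each {{name}} placeholder with the matching parameter value."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(params[m.group(1)]), template)


# The template source from the example above, written as a JSON string.
template = (
    '{"query": {"match": {"{{query_key}}": "{{query_value}}"}}, '
    '"from": "{{from}}", "size": "{{size}}"}'
)

# Hypothetical parameter values.
params = {"query_key": "title", "query_value": "hamlet", "from": 0, "size": 10}

rendered = json.loads(render(template, params))
print(json.dumps(rendered, indent=2))
```

Because the placeholders for from and size sit inside quotes in the template, they render as the strings "0" and "10" in this sketch.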

Working with search templates

Create a search template

PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "message": "{{query_string}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    },
    "params": {
      "query_string": "My query string"
    }
  }
}

Validate (render) a search template

POST _render/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 20,
    "size": 10
  }
}

Run a search template

GET my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 0,
    "size": 10
  }
}

List all search templates

GET _cluster/state/metadata?pretty&filter_path=metadata.stored_scripts

Delete a search template

DELETE _scripts/my-search-template

XIV. Search DSL: simple queries

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl.html

Choosing a query type

Query categories

Custom scoring

How to customize scoring

1. indices_boost: adjusting relevance at the index level

// A batch of data carries different labels but shares one structure. Each label is stored in its own
// index (A, B, C), and results must be displayed strictly grouped by label. Which query fits best?
// Requirement: show class A first, then class B, then class C
# Test data
PUT /index_a_123/_doc/1
{
  "title":"this is index_a..."
}
PUT /index_b_123/_doc/1
{
  "title":"this is index_b..."
}
PUT /index_c_123/_doc/1
{
  "title":"this is index_c..."
}
# A plain query without any boost: the three hits come back with the same score
POST index_*_123/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "this"
          }
        }
      ]
    }
  }
}
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-search.html
indices_boost
# Raise the weight at the index level
POST index_*_123/_search
{
  "indices_boost": [
    {
      "index_a_123": 10
    },
    {
      "index_b_123": 5
    },
    {
      "index_c_123": 1
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "this"
          }
        }
      ]
    }
  }
}

2. boosting: adjusting relevance for documents

An index index_a has several fields. Implement the following query:
1) the title field matches 'ssas' or 'sasa';
2) if the tags field (an array field) contains 'pingpang', raise the score.
Task: write the DSL.
# Test data
PUT index_a/_bulk
{"index":{"_id":1}}
{"title":"ssas","tags":"basketball"}
{"index":{"_id":2}}
{"title":"sasa","tags":"pingpang; football"}
# Solution 1
POST index_a/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "title": "ssas"
                }
              },
              {
                "match": {
                  "title": "sasa"
                }
              }
            ]
          }
        }
      ],
      "should": [
        {
          "match": {
            "tags": {
              "query": "pingpang",
              "boost": 1
            }
          }
        }
      ]
    }
  }
}
# Solution 2
// https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
POST index_a/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "function_score": {
            "query": {
              "match": {
                "tags": {
                  "query": "pingpang"
                }
              }
            },
            "boost": 1
          }
        }
      ],
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "title": "ssas"
                }
              },
              {
                "match": {
                  "title": "sasa"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

3. negative_boost: lowering relevance

If some results are unwanted but you do not want to exclude them with must_not, consider the negative_boost of the boosting query, which lowers their score instead.
negative_boost (Required, float) Floating point number between 0 and 1.0 used to decrease the relevance scores of documents matching the negative query.
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-boosting-query.html
POST index_a/_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "tags": "football"
        }
      },
      "negative": {
        "term": {
          "tags": "pingpang"
        }
      },
      "negative_boost": 0.5
    }
  }
}

4. function_score: custom scoring

How can relevance be raised by sales and views at the same time?
Problem: for products, we want a relevance calculation that takes both sales and views into account, for example oldScore * (sales + views).

Product	Sales	Views
A	10	10
B	20	20
C	30	30

# Sample data
PUT goods_index/_bulk
{"index":{"_id":1}}
{"name":"A","sales_count":10,"view_count":10}
{"index":{"_id":2}}
{"name":"B","sales_count":20,"view_count":20}
{"index":{"_id":3}}
{"name":"C","sales_count":30,"view_count":30}
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
Key feature: script_score
POST goods_index/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "script_score": {
        "script": {
          "source": "_score * (doc['sales_count'].value+doc['view_count'].value)"
        }
      }
    }
  }
}
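The resulting order can be checked with a small calculation. The sketch below assumes two things: match_all gives every document a query score of 1.0, and function_score combines the query score with the script result using its default boost_mode of multiply. With a query score of 1.0 the extra multiplication changes nothing, but with any other query the _score factor would effectively be applied twice. The helper names are made up for the example.

```python
goods = [
    {"name": "A", "sales_count": 10, "view_count": 10},
    {"name": "B", "sales_count": 20, "view_count": 20},
    {"name": "C", "sales_count": 30, "view_count": 30},
]

QUERY_SCORE = 1.0  # assumption: the score match_all assigns to every document


def script_score(query_score: float, doc: dict) -> float:
    """Mirror of the script: _score * (sales_count + view_count)."""
    return query_score * (doc["sales_count"] + doc["view_count"])


def final_score(query_score: float, doc: dict) -> float:
    """Assumed default boost_mode 'multiply': query score times function score."""
    return query_score * script_score(query_score, doc)


ranking = sorted(goods, key=lambda doc: final_score(QUERY_SCORE, doc), reverse=True)
names = [doc["name"] for doc in ranking]
print([(doc["name"], final_score(QUERY_SCORE, doc)) for doc in ranking])
```

Product C should rank first, followed by B and A.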

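XV. Search DSL: complex queries with bool

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-bool-query.html

Basic syntax

Practice question

Write a query that requires a keyword to appear in at least two of a document's four fields.
What is needed: a bool query with should and minimum_should_match
1. The bool query
2. The detail to watch: minimum_should_match
Note: if the bool query has at least one should clause and no must or filter clauses, minimum_should_match defaults to 1; otherwise it defaults to 0.
POST test_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "field1": "kr"
          }
        },
        {
          "match": {
            "field2": "kr"
          }
        },
        {
          "match": {
            "field3": "kr"
          }
        },
        {
          "match": {
            "field4": "kr"
          }
        }
      ],
      "minimum_should_match": 2
    }
  }
}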
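The effect of minimum_should_match can be imitated by counting how many should clauses a document satisfies. The sketch below is a simplification: each match clause is reduced to "the keyword is one of the whitespace-separated words in the field", and the field names and sample documents are hypothetical.

```python
def count_matching_fields(doc: dict, keyword: str, fields: list) -> int:
    """Number of should clauses (one per field) that the document satisfies."""
    keyword = keyword.lower()
    return sum(1 for field in fields
               if keyword in str(doc.get(field, "")).lower().split())


def bool_should(doc: dict, keyword: str, fields: list, minimum_should_match: int) -> bool:
    """A document is a hit when enough should clauses match."""
    return count_matching_fields(doc, keyword, fields) >= minimum_should_match


fields = ["field1", "field2", "field3", "field4"]

# Hypothetical documents: "a" mentions kr in two fields, "b" in only one.
docs = {
    "a": {"field1": "kr rocks", "field2": "hello kr", "field3": "x", "field4": "y"},
    "b": {"field1": "kr", "field2": "x", "field3": "y", "field4": "z"},
}

matched = [doc_id for doc_id, doc in docs.items() if bool_should(doc, "kr", fields, 2)]
print(matched)
```

Only document a satisfies two clauses, so it is the only hit; lowering minimum_should_match to 1 would let document b through as well.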

XVI. Search: aggregations

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations.html

Aggregation categories

Bucket aggregations

terms

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-terms-aggregation.html
# Count documents per author
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_user": {
      "terms": {
        "field": "user",
        "size": 1
      }
    }
  }
}

date_histogram

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-datehistogram-aggregation.html
# Count documents per month of up_time
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_up_time": {
      "date_histogram": {
        "field": "up_time",
        "calendar_interval": "month"
      }
    }
  }
}

Metrics aggregations

Max

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-max-aggregation.html
# Get the largest up_time
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_max_up_time": {
      "max": {
        "field": "up_time"
      }
    }
  }
}

Top_hits

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-top-hits-aggregation.html
# Aggregate by user, keep only one bucket, and return the top 3 matching documents in it, sorted by the given field
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "terms_agg_user": {
      "terms": {
        "field": "user",
        "size": 1
      },
      "aggs": {
        "top_user_hits": {
          "top_hits": {
            "_source": {
              "includes": [ "video_time", "title", "see", "user", "up_time" ]
            },
            "sort": [
              { "see": { "order": "desc" } }
            ],
            "size": 3
          }
        }
      }
    }
  }
}
// Response
{
  "took" : 91,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1000, "relation" : "eq" },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "terms_agg_user" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 975,
      "buckets" : [
        {
          "key" : "Elastic搜索",
          "doc_count" : 25,
          "top_user_hits" : {
            "hits" : {
              "total" : { "value" : 25, "relation" : "eq" },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "5ccCVoQBUyqsIDX6wIcm",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "03:45",
                    "see" : "92",
                    "up_time" : "2021-03-19",
                    "title" : "Elastic 社区大会2021: 用加 Gatling 进行Elasticsearch的负载测试，寓教于乐。",
                    "user" : "Elastic搜索"
                  },
                  "sort" : [ "92" ]
                },
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "8scCVoQBUyqsIDX6wIgn",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "10:18",
                    "see" : "79",
                    "up_time" : "2020-10-20",
                    "title" : "为Elasticsearch启动htpps访问",
                    "user" : "Elastic搜索"
                  },
                  "sort" : [ "79" ]
                },
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "7scCVoQBUyqsIDX6wIcm",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "04:41",
                    "see" : "71",
                    "up_time" : "2021-03-19",
                    "title" : "Elastic 社区大会2021: Elasticsearch作为一个地理空间的数据库",
                    "user" : "Elastic搜索"
                  },
                  "sort" : [ "71" ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}

Pipeline aggregations

Pipeline aggregations work on the output of other aggregations.
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline.html

bucket_selector

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline-bucket-selector-aggregation.html

# Group by order_date per month and keep only the months whose total is greater than 1000
POST kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "date_his_aggs": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sum_aggs": {
          "sum": {
            "field": "total_unique_products"
          }
        },
        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "totalSales": "sum_aggs"
            },
            "script": "params.totalSales > 1000"
          }
        }
      }
    }
  }
}

Practice question

The earthquakes index holds earthquake data for the past 30 months. With a single query, return the following:
the average mag for each of the past 30 months
the month with the highest average mag in the past 30 months, together with that average
the search must not return any documents
max_bucket official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline-max-bucket-aggregation.html
POST earthquakes/_search
{
  "size": 0,
  "query": {
    "range": {
      "time": {
        "gte": "now-30M/d",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "agg_time_his": {
      "date_histogram": {
        "field": "time",
        "calendar_interval": "month"
      },
      "aggs": {
        "avg_aggs": {
          "avg": {
            "field": "mag"
          }
        }
      }
    },
    "max_mag_sales": {
      "max_bucket": {
        "buckets_path": "agg_time_his>avg_aggs"
      }
    }
  }
}
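The two-step shape of this query (bucket per month, average per bucket, then pick the best bucket) is the same as the following Python sketch. The earthquake records are invented for the illustration; only the structure of the computation mirrors the date_histogram, avg and max_bucket aggregations.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical earthquake records: (time, mag)
quakes = [
    ("2023-01-03", 4.0),
    ("2023-01-20", 5.0),
    ("2023-02-11", 6.1),
    ("2023-03-05", 3.2),
    ("2023-03-18", 4.2),
]

# date_histogram with calendar_interval month: group by the yyyy-MM prefix.
by_month = defaultdict(list)
for time, mag in quakes:
    by_month[time[:7]].append(mag)

# avg sub-aggregation: one average per monthly bucket.
monthly_avg = {month: mean(mags) for month, mags in by_month.items()}

# max_bucket pipeline aggregation: the bucket with the highest average.
best_month = max(monthly_avg, key=monthly_avg.get)

print(monthly_avg)
print(best_month, monthly_avg[best_month])
```

The pipeline step never looks at the documents themselves, only at the per-month averages produced by the first aggregation, which is what "an aggregation over aggregations" means.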
