A Roundup of Elasticsearch Notes

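When working with Elasticsearch I inevitably end up digging through the official documentation. I have not read all of it yet, so finding an API I have never used can be a struggle, and sometimes I cannot find it at all. This page therefore collects the parts I have already used, so that I can look them up directly the next time I need them.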

Index

Aliases

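An alias is a secondary name for an index. It is very useful in some situations, for example when switching from one index to another without downtime.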
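As a rough sketch of the seamless switch mentioned above, the following Python builds a request body for `POST /_aliases` that removes the alias from the old index and adds it to the new one in a single request. The alias and index names (`articles`, `articles_v1`, `articles_v2`) and the helper name are made up for illustration.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, for illustration only.
body = build_alias_swap("articles", "articles_v1", "articles_v2")
print(json.dumps(body, indent=2))
```

Clients that only ever query the alias do not need to change when the underlying index is replaced.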

Mappings

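An index's mapping definition is very important: it determines how our data is stored in the index, which fields the stored documents have, and what data type each field uses.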
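To make this concrete, here is a minimal sketch of a body for creating an index with an explicit mapping (`PUT /<index>`), in the typeless format used by 7.x. The field names and types are assumptions chosen for illustration, not taken from a real index.

```python
import json


def build_index_body(properties):
    """Body for PUT /<index> with an explicit, typeless (7.x) mapping."""
    return {"mappings": {"properties": properties}}


# Hypothetical fields: analysed text, an exact-match keyword, a date, a number.
body = build_index_body({
    "title": {"type": "text"},
    "author": {"type": "keyword"},
    "published_at": {"type": "date"},
    "views": {"type": "integer"},
})
print(json.dumps(body, indent=2))
```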

Setting

Query

Full text queries

Queries for full-text search, mainly: match query, match_bool_prefix query, match_phrase query, match_phrase_prefix query, multi_match query, query_string query.
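As a small illustration of two of these, the sketch below builds search bodies for a `match` query on one field and a `multi_match` query across several fields. The field names `title` and `content` and the helper names are hypothetical.

```python
import json


def match_query(field, text):
    """Search body for a full-text match on a single field."""
    return {"query": {"match": {field: {"query": text}}}}


def multi_match_query(text, fields):
    """Search body for the same text matched against several fields."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


single = match_query("title", "elasticsearch alias")
multi = multi_match_query("elasticsearch alias", ["title", "content"])
print(json.dumps(single))
print(json.dumps(multi))
```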

Compound queries

Compound queries combine other queries. They include: bool query, boosting query, constant_score query, dis_max query, function_score query.
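The `bool` query is the one I reach for most often. A minimal sketch, assuming hypothetical fields `title`, `status`, and `deleted`, that assembles only the clauses actually supplied:

```python
import json


def bool_query(must=None, should=None, must_not=None, filter_=None):
    """Search body for a bool query; empty clauses are left out."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filter_,
    }
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


body = bool_query(
    must=[{"match": {"title": "elasticsearch"}}],      # scored clause
    filter_=[{"term": {"status": "published"}}],       # not scored
    must_not=[{"term": {"deleted": True}}],            # excluded
)
print(json.dumps(body, indent=2))
```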

Function score query

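You can define one or more queries of your own to raise the score weight of the documents they match.

You can also use script_score to re-score every document with a script.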
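A sketch of both ideas in one body: a filter with a `weight` that boosts matching documents, plus a `script_score` function. The fields `featured` and `views`, the weight, and the script itself are assumptions chosen only to show the shape of the request.

```python
import json


def function_score_query(base_query, functions, score_mode="sum", boost_mode="multiply"):
    """Search body wrapping base_query in a function_score query."""
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "functions": functions,
                "score_mode": score_mode,   # how the function scores are combined
                "boost_mode": boost_mode,   # how that result is combined with the query score
            }
        }
    }


body = function_score_query(
    {"match": {"title": "elasticsearch"}},
    [
        # Documents matching this filter get an extra weight of 2.
        {"filter": {"term": {"featured": True}}, "weight": 2},
        # Every document also gets a script-computed score.
        {"script_score": {"script": {"source": "Math.log(2 + doc['views'].value)"}}},
    ],
)
print(json.dumps(body, indent=2))
```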

Highlight

Prefix query

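A prefix query returns documents in which the given field contains a term that starts with the specified prefix. It is mostly used for search-as-you-type suggestions.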
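For reference, a minimal sketch of the request body, assuming a hypothetical keyword field named `username`:

```python
import json


def prefix_query(field, prefix):
    """Search body returning documents whose `field` has a term starting with `prefix`."""
    return {"query": {"prefix": {field: {"value": prefix}}}}


body = prefix_query("username", "ki")
print(json.dumps(body))
```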

Match phrase prefix query

When you need prefix matching on a phrase or a group of words, use match_phrase_prefix.
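A sketch of the body, again with a hypothetical `title` field. The last word of the text is treated as a prefix, and `max_expansions` caps how many terms that prefix may expand to; the value 10 here is an arbitrary choice.

```python
import json


def match_phrase_prefix_query(field, text, max_expansions=10):
    """Search body matching `text` as a phrase whose last word is a prefix."""
    return {
        "query": {
            "match_phrase_prefix": {
                field: {"query": text, "max_expansions": max_expansions}
            }
        }
    }


body = match_phrase_prefix_query("title", "quick brown f")
print(json.dumps(body))
```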

Named query

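With the _name parameter you can tell, when querying several fields, which sub-query matched a document. The names of the matching sub-queries are returned in the matched_queries field of each hit in the response.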
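A sketch of how this might look: two named `match` clauses inside a `bool` query, and a small helper that reads `matched_queries` from a hit. The field names, query names, and the sample hit are all invented for illustration; the sample hit is not real Elasticsearch output.

```python
import json


def named_match(field, text, name):
    """A match clause tagged with _name so hits report whether it matched."""
    return {"match": {field: {"query": text, "_name": name}}}


body = {
    "query": {
        "bool": {
            "should": [
                named_match("title", "elasticsearch", "title_match"),
                named_match("content", "elasticsearch", "content_match"),
            ]
        }
    }
}


def matched_names(hit):
    """Names of the sub-queries that matched this hit (empty if none reported)."""
    return hit.get("matched_queries", [])


# Hand-written stand-in for one entry of hits.hits in a search response.
sample_hit = {
    "_id": "1",
    "_source": {"title": "Elasticsearch notes"},
    "matched_queries": ["title_match"],
}
print(json.dumps(body, indent=2))
print(matched_names(sample_hit))
```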

Nested query

Exists query

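In some cases not every field has an actual value. The exists query selects documents that contain a value for a given field; combined with must_not, it selects all documents in which that field is missing.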
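Both variants can be sketched as follows, assuming a hypothetical field named `email`:

```python
import json


def has_field(field):
    """Search body for documents that have a value for `field`."""
    return {"query": {"exists": {"field": field}}}


def missing_field(field):
    """Search body for documents that have no value for `field`."""
    return {"query": {"bool": {"must_not": [{"exists": {"field": field}}]}}}


with_email = has_field("email")
without_email = missing_field("email")
print(json.dumps(with_email))
print(json.dumps(without_email))
```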

Scripts

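Elasticsearch's scripting language is Painless. Its syntax is similar to Java, so scripts can largely be written using Java syntax. For details, see the Painless Shared API reference.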

Only the scripts I have actually used are recorded here, so that I can copy and paste them the next time I need them.

Remove array elements that match a condition

Use removeIf, for example to remove the element whose ID is 10:

{
  "script": {
    "source": "ctx._source.members.removeIf(list_item -> list_item.id == params.member_id)",
    "lang": "painless",
    "params": {
      "member_id": 10
    }
  }
}

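Check whether an array contains a given element

Use contains, for example to test whether members contains the name 张三. Note that contains compares whole elements, so this form fits a members array of plain values such as names: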

{
  "script": {
    "source": "ctx._source.members.contains(params.name)",
    "lang": "painless",
    "params": {
      "name": "张三"
    }
  }
}

Boost certain documents by date

Use the date methods toInstant and toEpochMilli to convert the date into a weight factor based on epoch milliseconds:

{
    "script": {
        "lang": "painless", 
        "source": "double dateScore; try {dateScore = Math.abs(doc['enforcementDate'].value.toInstant().toEpochMilli()/1e12);} catch (Exception e) {dateScore=0;} return dateScore;"
    }
}
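To get a feel for the size of this factor, here is the same arithmetic reproduced in Python (absolute epoch milliseconds divided by 1e12). The two dates are arbitrary examples, and the Python function is only a stand-in for the Painless expression, not something Elasticsearch runs.

```python
from datetime import datetime, timezone


def date_score(dt):
    """Same arithmetic as the script: |epoch milliseconds| / 1e12."""
    epoch_millis = int(dt.timestamp() * 1000)
    return abs(epoch_millis / 1e12)


older = date_score(datetime(2010, 1, 1, tzinfo=timezone.utc))
newer = date_score(datetime(2020, 1, 1, tzinfo=timezone.utc))
print(older, newer)  # roughly 1.26 and 1.58
```

More recent dates give a slightly larger factor, so the effect on ranking is gentle: ten years of difference changes the factor by about 0.3.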

Run Elasticsearch with Docker

1. Download the Elasticsearch Docker image, taking version 7.6.0 as an example:

    docker pull elasticsearch:7.6.0

2. Create a container and run it.

If your command is:

   docker run -d --name es -p 9200:9200 -p 9300:9300 elasticsearch:7.6.0

The container may exit shortly after starting. To find out why, view its logs:

   docker logs es

An error message such as "max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]" may appear. If so, run the following commands on the Docker host:

   # change the variable
   sysctl -w vm.max_map_count=262144
   # check the variable value
   sysctl -n vm.max_map_count

If the error message is "the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured", the container is misconfigured. Fix it either by setting one of the listed settings through environment variables, or by switching to standalone development mode with discovery.type=single-node.

In short, the correct command is:

   docker run -d --name es -p 9200:9200 -p 9300:9300 -e discovery.type=single-node elasticsearch:7.6.0

Use the IK Chinese word segmentation plugin

1. Download the plugin.

The plugin version must match the Elasticsearch version. The download link for version 7.6.0 is https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.0/elasticsearch-analysis-ik-7.6.0.zip

2. Unzip it into an empty directory, referred to below as *$IK*.

3. Copy $IK into the container's plugins directory.

   docker cp $IK es:/usr/share/elasticsearch/plugins/ik

Elasticsearch loads plugins at startup, so restart the container afterwards:

   docker restart es

The plugin provides analyzers and tokenizers named ik_smart and ik_max_word: ik_smart splits text at the coarsest granularity, while ik_max_word exhaustively produces every possible split.
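One way to compare the two is to send the same text to `POST /_analyze` with each analyzer. The sketch below only builds the request bodies; the sample text and helper name are my own choices, and the tokens returned depend on the installed plugin and its dictionary.

```python
import json


def analyze_body(analyzer, text):
    """Body for POST /_analyze that runs `text` through the named analyzer."""
    return {"analyzer": analyzer, "text": text}


sample_text = "中华人民共和国国歌"
coarse = analyze_body("ik_smart", sample_text)
fine = analyze_body("ik_max_word", sample_text)
print(json.dumps(coarse, ensure_ascii=False))
print(json.dumps(fine, ensure_ascii=False))
```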

Find more info at IK.