Distributed Search Engine: ElasticSearch (Part 1)


Reader submission 738 2025-04-01


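Note: this article uses elasticsearch 7.6.1.

What is ElasticSearch

ElasticSearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java and was released as open source under the Apache license; it is a popular enterprise search engine. It is designed for cloud computing and offers real-time search that is stable, reliable, fast, and easy to install and use.

Suppose we are building a website or application and want to add search, but building search from scratch is very hard. We want a search solution that is fast, needs zero configuration, and is completely free. We want to index data simply by sending JSON over HTTP. We want the search server to be always available, and we want to start on one machine and scale to hundreds. We want real-time search, simple multi-tenancy, and a solution built for the cloud. Elasticsearch addresses all of these problems and more that may come up. (Excerpted from Baidu Baike.)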

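ElasticSearch concepts

elasticsearch is a real-time distributed full-text search engine built on top of Lucene. Instead of the usual forward index (similar to a MySQL index), it uses an inverted index, which makes fuzzy full-text search very fast.

All elasticsearch operations are sent as requests in JSON format.

The underlying index of ElasticSearch

MySQL's LIKE can do fuzzy matching, but it is slow: a pattern with a leading wildcard such as LIKE '%keyword%' cannot use the index, because the index is a forward index that is searched by the complete value. elasticsearch's inverted index, by contrast, lets us search with only part of the content. It uses an analyzer (tokenizer) to split every document into terms, builds one inverted index per field (except the document id), and matches the resulting terms against the documents that contain them.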

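For example, an index named hello contains three documents:

documentid    age    name
1             18     张三
2             20     李四
3             18     李四

The inverted indexes are then built as follows.

First inverted index (field age):

term    documentids
18      1, 3
20      2

Second inverted index (field name):

term    documentids
张三    1
李四    2, 3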
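The two tables above can be reproduced with a few lines of Python. This is only a minimal sketch of the idea, not how Lucene stores its index: each field value is treated as a single term (no analyzer is applied), and the names `documents` and `build_inverted_indexes` are made up for this illustration.

```python
from collections import defaultdict

# The three example documents from the table above, keyed by document id.
documents = {
    1: {"age": 18, "name": "张三"},
    2: {"age": 20, "name": "李四"},
    3: {"age": 18, "name": "李四"},
}


def build_inverted_indexes(docs):
    """Build one inverted index per field: field -> term -> list of document ids."""
    indexes = defaultdict(lambda: defaultdict(list))
    for doc_id in sorted(docs):
        for field, value in docs[doc_id].items():
            # Each whole value is treated as a single term (no analyzer here).
            indexes[field][value].append(doc_id)
    return indexes


inverted = build_inverted_indexes(documents)
print(dict(inverted["age"]))   # {18: [1, 3], 20: [2]}
print(dict(inverted["name"]))  # {'张三': [1], '李四': [2, 3]}
```

Looking up a term is then a single dictionary access, which is why searching by term is fast regardless of how many documents exist.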

elasticsearch and relational databases (MySQL)

For now, we can roughly compare es and MySQL as follows:

MySQL              elasticsearch
database           index
table              type (deprecated in 7.x and removed in later versions)
row (record)       document
column (field)     field

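Some notes on elasticsearch

Open elasticsearch.yml, the configuration file in elasticsearch's config directory, and append the following two settings at the end to allow cross-origin (CORS) requests:

http.cors.enabled: true
http.cors.allow-origin: "*"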

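elasticsearch is very resource-hungry. Its JVM configuration file shows that, by default, elasticsearch asks for 1 GB of JVM heap at startup. We can change this.

Open elasticsearch's JVM configuration file, jvm.options, and find:

-Xms1g
-Xmx1g

Here -Xms is the minimum (initial) heap size and -Xmx is the maximum heap size. Change them to:

-Xms256m
-Xmx512m

If an error is reported at startup, or something else goes wrong, check that the es and kibana versions match: if es is 7.6, kibana must be 7.6 as well.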

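ik analyzer

ik is a Chinese analyzer (tokenizer). Some words, such as personal names, are not recognized as single tokens, so we can extend its dictionary.

To use the ik analyzer, download the ik plugin and place it in elasticsearch's plugins directory, in a subdirectory named ik.

The ik analyzer has two modes: ik_smart and ik_max_word.

ik_smart: coarsest segmentation (splits the text into as few tokens as possible)

ik_max_word: finest segmentation (produces as many tokens as possible)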

=============================

ik_smart :

# _analyze is a fixed endpoint name
GET _analyze
{
  "text": ["分布式搜索"],
  "analyzer": "ik_smart"
}

ik_max_word :

GET _analyze
{
  "text": ["分布式搜索"],
  "analyzer": "ik_max_word"
}

GET _analyze
{
  "text": ["我是张三，very nice"],
  "analyzer": "ik_max_word"
}

The personal name is not tokenized correctly. We can create a new dictionary file and add the words we want to be kept as tokens.

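1. In the ik plugin directory, find the file IKAnalyzer.cfg.xml. Its header comment reads "IK Analyzer 扩展配置" (IK Analyzer extension configuration). If you have created your own .dic dictionary, register it in the ext_dict entry, replacing xxx.dic with your file name:

<entry key="ext_dict">xxx.dic</entry>

2. Create my.dic and add the words you want recognized as tokens, one word per line.

For example, to add "张三" as a word, enter it in my.dic.

3. Restart all services.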

GET _analyze
{
  "text": ["我是张三，very nice"],
  "analyzer": "ik_max_word"
}

{ "tokens" : [ { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 }, { "token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 }, { "token" : "张三", "start_offset" : 2, "end_offset" : 5, "type" : "CN_WORD", "position" : 2 }, { "token" : "very", "start_offset" : 6, "end_offset" : 10, "type" : "ENGLISH", "position" : 3 }, { "token" : "nice", "start_offset" : 11, "end_offset" : 15, "type" : "ENGLISH", "position" : 4 } ] }
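Why adding a word to the dictionary changes the tokens can be illustrated with a toy dictionary-based segmenter. The sketch below uses plain forward maximum matching, which is a simplification chosen for this example and not the actual ik algorithm; `segment`, `base_dictionary`, and `extended_dictionary` are hypothetical names.

```python
def segment(text, dictionary):
    """Forward maximum matching: at each position take the longest dictionary word."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, pos = [], 0
    while pos < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                pos += size
                break
    return tokens


base_dictionary = {"我", "是"}                     # hypothetical built-in dictionary
extended_dictionary = base_dictionary | {"张三"}   # after adding the word to my.dic

print(segment("我是张三", base_dictionary))      # ['我', '是', '张', '三']
print(segment("我是张三", extended_dictionary))  # ['我', '是', '张三']
```

Without the dictionary entry the name falls apart into single characters; with it, the name survives as one token, which matches the CN_WORD token in the response above.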

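elasticsearch operations (REST style)

The operations below use Kibana as the console for talking to es; Postman works as well.

method    URL                                              description
PUT       localhost:9200/index/type/document_id            create a document (with the given id)
POST      localhost:9200/index/type                        create a document (random id)
POST      localhost:9200/index/type/document_id/_update    update a document
DELETE    localhost:9200/index/type/document_id            delete a document
GET       localhost:9200/index/type/document_id            get a document by its id
POST      localhost:9200/index/type/_search                search all documents

9200 is elasticsearch's default HTTP port. In 7.x the type is always _doc, and the update requests later in this article use the typeless form POST index/_update/document_id.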
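Outside Kibana or Postman, the same requests can be issued from any HTTP client. The following sketch only builds the requests with Python's standard library and does not send them; the base URL (a local node on the default port 9200) and the helper name `build_request` are assumptions made for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}", data=data, headers=headers, method=method
    )


create = build_request("PUT", "hello03/_doc/1", {"name": "yzj", "age": 18})
fetch = build_request("GET", "hello03/_doc/1")

print(create.get_method(), create.full_url)
print(fetch.get_method(), fetch.full_url)

# To actually send a request against a running node:
# with urllib.request.urlopen(create) as response:
#     print(json.load(response))
```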

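As you can see, elasticsearch's use of PUT and POST differs a little from the usual RESTful convention. By that convention PUT modifies data and POST adds data. In elasticsearch, PUT with an explicit id creates the document (or replaces it if it already exists), while POST creates a document with a generated id or, through _update, modifies an existing one.

The difference between PUT and POST:

PUT is idempotent and POST is not. No matter how many times the same PUT is submitted, the stored document ends up the same. POST can be thought of as generating a new id (like a UUID) on every call, so each call creates a different document; that is why POST is not idempotent.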
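The idempotency difference can be demonstrated with an in-memory stand-in for an index. This is a sketch of the behavior only, with no elasticsearch involved; `store`, `put_document`, and `post_document` are hypothetical names, and the UUID is just one possible id generator.

```python
import uuid

store = {}  # stands in for one index: document id -> document body


def put_document(doc_id, body):
    """PUT index/_doc/<id>: create or fully replace the document with this id."""
    store[doc_id] = dict(body)
    return doc_id


def post_document(body):
    """POST index/_doc: create a document under a freshly generated id."""
    doc_id = uuid.uuid4().hex  # assumption: any unique id generator would do
    store[doc_id] = dict(body)
    return doc_id


for _ in range(3):
    put_document("1", {"name": "yzj", "age": 18})
print(len(store))  # 1

for _ in range(3):
    post_document({"name": "yzj", "age": 18})
print(len(store))  # 4
```

Three identical PUT calls leave one document behind; three identical POST calls add three more.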

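Template: PUT /index_name

Example 1:

Create a document in the index hello01 with type _doc and document id 001. PUT must always specify a document id. With POST the id can be omitted, because POST assigns a random document id, which is also why POST is not idempotent.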

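# Create the index hello03; the request body is empty, so no data or mapping is supplied
PUT /hello03
{
}

Response:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "hello03"
}

# Delete the index hello01
DELETE hello01

# Create document 1 in hello03
PUT /hello03/_doc/1
{
  "name": "yzj",
  "age": 18
}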

Result:

{ "_index" : "hello03", "_type" : "_doc", "_id" : "1", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 0, "_primary_term" : 1 }

Next, let's look at the index information for hello03:

{
  "state": "open",
  "settings": {
    "index": {
      "creation_date": "1618408917052",
      "number_of_shards": "1",
      "number_of_replicas": "1",
      "uuid": "OEVNL7cCQgG74KMPG5LjLA",
      "version": { "created": "7060199" },
      "provided_name": "hello03"
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            // by default name also gets a keyword sub-field (name.keyword), which is not analyzed
            "keyword": { "ignore_above": 256, "type": "keyword" }
          }
        },
        // age was mapped as long
        "age": { "type": "long" }
      }
    }
  },
  "aliases": [ ],
  "primary_terms": { "0": 1 },
  "in_sync_allocations": { "0": [ "17d4jyS9RgGEVid4rIANQA" ] }
}

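As we can see, if we do not specify field types, es falls back to its defaults.

For example, name above was mapped as text, analyzed with the default analyzer, plus a keyword sub-field (name.keyword) that is not analyzed.

So it is well worth specifying the field types when the index is created.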
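The default behavior seen above (a string becomes text with a keyword sub-field, an integer becomes long) can be summarized in a small sketch. It covers only simple JSON values and deliberately ignores date detection, nested objects, and arrays; `infer_mapping` is a hypothetical helper, not an elasticsearch API.

```python
def infer_mapping(document):
    """Guess field mappings the way default dynamic mapping does for simple values."""
    properties = {}
    for field, value in document.items():
        if isinstance(value, bool):      # check bool first: bool is a subclass of int
            properties[field] = {"type": "boolean"}
        elif isinstance(value, int):
            properties[field] = {"type": "long"}
        elif isinstance(value, float):
            properties[field] = {"type": "float"}
        elif isinstance(value, str):
            # Strings become analyzed text plus a non-analyzed keyword sub-field.
            properties[field] = {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
    return {"properties": properties}


mapping = infer_mapping({"name": "yzj", "age": 18})
print(mapping["properties"]["age"])           # {'type': 'long'}
print(mapping["properties"]["name"]["type"])  # text
```

Note that a number sent as a string, such as "3998", would be mapped as text under these rules, not as a number.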

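# Delete document 004 from hello01
DELETE hello01/_doc/004

# Partially update document 001 in hello02: set the field d2
POST hello02/_update/001
{
  "doc": {
    "d2": "Java"
  }
}

# Delete document 001 from hello02
DELETE hello02/_doc/001

# Create hello05 with an explicit mapping: both fields are text analyzed with ik_max_word
PUT /hello05
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "say": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}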

Check the index information for hello05:

{
  "state": "open",
  "settings": {
    "index": {
      "creation_date": "1618410744334",
      "number_of_shards": "1",
      "number_of_replicas": "1",
      "uuid": "isCuH2wTQ8S3Yw2MSspvGA",
      "version": { "created": "7060199" },
      "provided_name": "hello05"
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        // the analyzer shows that the explicit field mapping was applied
        "name": { "analyzer": "ik_max_word", "type": "text" },
        "say": { "analyzer": "ik_max_word", "type": "text" }
      }
    }
  },
  "aliases": [ ],
  "primary_terms": { "0": 1 },
  "in_sync_allocations": { "0": [ "lh6O9N8KQNKtLqD3PSU-Fg" ] }
}

Now let's try to send a mapping to the hello05 index again, this time with an extra age field:

PUT /hello05
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "say": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "age": {
        "type": "integer"
      }
    }
  }
}

This time the request fails:

{ "error" : { "root_cause" : [ { "type" : "resource_already_exists_exception", "reason" : "index [hello05/isCuH2wTQ8S3Yw2MSspvGA] already exists", "index_uuid" : "isCuH2wTQ8S3Yw2MSspvGA", "index" : "hello05" } ], "type" : "resource_already_exists_exception", "reason" : "index [hello05/isCuH2wTQ8S3Yw2MSspvGA] already exists", "index_uuid" : "isCuH2wTQ8S3Yw2MSspvGA", "index" : "hello05" }, "status" : 400 }

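Important:

Once the mapping of an index has been created, es builds the inverted indexes for those fields, so the mapping of an existing field cannot be modified afterwards. (The request above fails even earlier: PUT /hello05 tries to create the index again, and the error says that the index already exists.) What we can do is add new fields, or create a new index and use reindex to copy the data from the old index into it.

So think carefully when defining the mapping of an index.

Otherwise, any field that was not specified can only use the defaults that es provides.

As said above, mapped fields cannot be modified, but nothing prevents adding new ones; the request for adding is a little different.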

# Add a new keyword field named ls to the existing index hello05
PUT hello05/_mapping
{
  "properties": {
    "ls": {
      "type": "keyword"
    }
  }
}
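The rule "new fields may be added, existing fields may not be changed" can be modelled in a few lines. This is a simplified sketch under that assumption (elasticsearch does allow a handful of mapping parameters to be updated, which is ignored here); `add_fields` and `MappingConflict` are hypothetical names.

```python
class MappingConflict(Exception):
    """Raised when an update tries to change an existing field's mapping."""


def add_fields(current, new_fields):
    """Return a merged copy of `current`; adding fields is allowed, changing is not."""
    merged = dict(current)
    for name, definition in new_fields.items():
        if name in merged and merged[name] != definition:
            raise MappingConflict(f"cannot change mapping of existing field [{name}]")
        merged[name] = definition
    return merged


hello05 = {
    "name": {"type": "text", "analyzer": "ik_max_word"},
    "say": {"type": "text", "analyzer": "ik_max_word"},
}

hello05 = add_fields(hello05, {"ls": {"type": "keyword"}})  # new field: accepted
print(sorted(hello05))  # ['ls', 'name', 'say']

try:
    add_fields(hello05, {"name": {"type": "keyword"}})      # existing field: rejected
except MappingConflict as error:
    print(error)  # cannot change mapping of existing field [name]
```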

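Use case: after the mapping has been set up, we find that a few fields need to be "modified". In that case we can create a new index, define the fields the way we want, and then import all the data from the old index into the new one.

# Copy all documents from hello05 into hello06
# (specifying "type" in the source is what triggers the deprecation warning in the response)
POST _reindex
{
  "source": {
    "index": "hello05",
    "type": "_doc"
  },
  "dest": {
    "index": "hello06"
  }
}

#! Deprecation: [types removal] Specifying types in reindex requests is deprecated.
{ "took" : 36, "timed_out" : false, "total" : 5, "updated" : 0, "created" : 5, "deleted" : 0, "batches" : 1, "version_conflicts" : 0, "noops" : 0, "retries" : { "bulk" : 0, "search" : 0 }, "throttled_millis" : 0, "requests_per_second" : -1.0, "throttled_until_millis" : 0, "failures" : [ ] }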

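# Get the index information
GET hello05

# Search all documents in the index
GET hello05/_search
{
  "query": {
    "match_all": {}
  }
}

# Get a single document by id
GET hello05/_doc/1

# A search with an empty body
GET hello05/_search
{
}

This is the same as the request above:

GET hello05/_search
{
  "query": {
    "match_all": {}
  }
}

A match query runs the query text through the analyzer before searching.

# "李" is the query text for the name field
GET hello05/_search
{
  "query": {
    "match": {
      "name": "李"
    }
  }
}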

{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 0.9395274, "hits" : [ { "_index" : "hello05", "_type" : "_doc", "_id" : "2", "_score" : 0.9395274, "_source" : { "name" : "李四", "age" : 3 } }, { "_index" : "hello05", "_type" : "_doc", "_id" : "4", "_score" : 0.79423964, "_source" : { "name" : "李小龙", "age" : 45 } } ] } }

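# Two fields inside one match clause
GET hello05/_search
{
  "query": {
    "match": {
      "name": "李",
      "age": 45
    }
  }
}

This request fails, because match accepts only one field per query. Multiple conditions can be combined with a bool query and its must clause.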

{ "error" : { "root_cause" : [ { "type" : "parsing_exception", "reason" : "[match] query doesn't support multiple fields, found [name] and [age]", "line" : 6, "col" : 18 } ], "type" : "parsing_exception", "reason" : "[match] query doesn't support multiple fields, found [name] and [age]", "line" : 6, "col" : 18 }, "status" : 400 }
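One way to express the two conditions from the failing request is to put one match clause per field under bool and must. The sketch below only assembles and prints the request body; `must_all` is a hypothetical helper, and the result has not been run against the hello05 index from this article.

```python
import json


def must_all(*clauses):
    """Wrap query clauses in bool/must so that every clause has to match."""
    return {"query": {"bool": {"must": list(clauses)}}}


body = must_all(
    {"match": {"name": "李"}},  # full-text condition on name
    {"match": {"age": 45}},     # second condition on age
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted as the body of GET hello05/_search in the Kibana console.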

match:

GET hello05/_search
{
  "query": {
    "match": {
      "name": "李龙"
    }
  }
}

{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 2.0519087, "hits" : [ { "_index" : "hello05", "_type" : "_doc", "_id" : "4", "_score" : 2.0519087, "_source" : { "name" : "李小龙", "age" : 45 } }, { "_index" : "hello05", "_type" : "_doc", "_id" : "2", "_score" : 0.9395274, "_source" : { "name" : "李四", "age" : 3 } } ] } }

==================

term :

GET hello05/_search
{
  "query": {
    "term": {
      "name": "李龙"
    }
  }
}

{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 0, "relation" : "eq" }, "max_score" : null, "hits" : [ ] } }

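The differences:

1. The match query text is first split by the analyzer, and the resulting terms are then compared with the inverted index (slower than term).

2. The term query text is not analyzed; it is compared with the inverted index as is, which is faster. That is why term finds nothing for "李龙" above: the index contains no term "李龙".

3. Like match, term supports only one field per query.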
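The difference between the two results can be replayed on a toy inverted index. In this sketch the analyzer simply emits one token per character, which is an assumption made to keep the example short and is not what ik_max_word does; the two names come from the search results above, and `match_query` and `term_query` are hypothetical helpers.

```python
def analyze(text):
    """Toy analyzer: one token per character (a stand-in for a real analyzer)."""
    return [char for char in text if not char.isspace()]


names = {2: "李四", 4: "李小龙"}  # document id -> name, as in the results above

# Index time: every token of every name points back to its document ids.
inverted_index = {}
for doc_id, name in names.items():
    for token in analyze(name):
        inverted_index.setdefault(token, set()).add(doc_id)


def match_query(text):
    """Analyze the query text, then collect documents containing any of its tokens."""
    hits = set()
    for token in analyze(text):
        hits |= inverted_index.get(token, set())
    return hits


def term_query(text):
    """Look the query text up as one exact term, without analysis."""
    return set(inverted_index.get(text, set()))


print(sorted(match_query("李龙")))  # [2, 4]
print(sorted(term_query("李龙")))   # []
print(sorted(term_query("李")))     # [2, 4]
```

The match query splits "李龙" into "李" and "龙" and finds both documents, while the term query looks for the literal term "李龙", which was never indexed.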

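The difference between match and multi_match is that match searches the given text in a single field only, while multi_match can search it in several fields.

For example, to take the input 李小龙 and search for it in both the title field and the content field, we need multi_match; a plain match cannot do this.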

Simulating a JD.com product search


# Create the goods index; title and content are text fields using the standard analyzer
PUT /goods
{
  "mappings": {
    "properties": {
      "title": {
        "analyzer": "standard",
        "type": "text"
      },
      "content": {
        "analyzer": "standard",
        "type": "text"
      }
    }
  }
}

# The query text 华为 is analyzed and then searched in both title and content
GET goods/_search
{
  "query": {
    "multi_match": {
      "query": "华为",
      "fields": ["title", "content"]
    }
  }
}

{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 1.1568705, "hits" : [ { "_index" : "goods", "_type" : "_doc", "_id" : "2", "_score" : 1.1568705, "_source" : { "title" : "华为Mate30", "content" : "华为Mate30 8+128G，麒麟990Soc", "price" : "3998" } }, { "_index" : "goods", "_type" : "_doc", "_id" : "1", "_score" : 1.0173018, "_source" : { "title" : "华为P40", "content" : "华为P40 8+256G，麒麟990Soc，贼牛逼", "price" : "4999" } } ] } }

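GET goods/_search
{
  "query": {
    "match_phrase": {
      "content": "华为P40手机"
    }
  }
}

This returns no data. The reason is that match_phrase is a phrase search, that is, an exact search: all terms of the phrase must appear in the field, next to each other and in the same order, and no document's content contains 手机.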
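Phrase matching can be sketched as a search for a consecutive run of tokens. The tokenizer below is a rough stand-in for the standard analyzer (lowercasing, one token per Chinese character, one token per run of ASCII letters and digits) and is an assumption of this example; `tokenize` and `phrase_matches` are hypothetical names.

```python
import re


def tokenize(text):
    """Toy stand-in for the standard analyzer: lowercase, one token per CJK
    character, and one token per run of ASCII letters/digits."""
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


def phrase_matches(field_text, phrase):
    """True if all phrase tokens occur in the field consecutively and in order."""
    field_tokens, phrase_tokens = tokenize(field_text), tokenize(phrase)
    width = len(phrase_tokens)
    return width > 0 and any(
        field_tokens[start:start + width] == phrase_tokens
        for start in range(len(field_tokens) - width + 1)
    )


content = "华为P40 8+256G，麒麟990Soc，贼牛逼"
print(phrase_matches(content, "华为P40手机"))  # False
print(phrase_matches(content, "华为P40"))      # True
```

The first phrase fails because the tokens for 手机 never follow "p40" in the content; the shorter phrase matches at the start of the field.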

By default elasticsearch returns all fields, similar to MySQL's select * from xxx. We can restrict the returned fields, similar to select id,name from xxx.

# _source lists the fields to return: only title and content
GET goods/_search
{
  "query": {
    "multi_match": {
      "query": "华为",
      "fields": ["title", "content"]
    }
  },
  "_source": ["title", "content"]
}

{ "took" : 2, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 1.1568705, "hits" : [ { "_index" : "goods", "_type" : "_doc", "_id" : "2", "_score" : 1.1568705, "_source" : { "title" : "华为Mate30", "content" : "华为Mate30 8+128G，麒麟990Soc" } }, { "_index" : "goods", "_type" : "_doc", "_id" : "1", "_score" : 1.0173018, "_source" : { "title" : "华为P40", "content" : "华为P40 8+256G，麒麟990Soc，贼牛逼" } } ] } }

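Because of a mistake in the index mapping earlier, price was never defined and ended up as a text field, which cannot be used for sorting or for a filter range. So we add another field, od.

# Add the field od to document 1
POST goods/_update/1
{
  "doc": {
    "od": 1
  }
}

(The same update for documents 2, 3 and 4 is omitted.)

# Sort the hits by od; "asc" is ascending, "desc" is descending
GET goods/_search
{
  "query": {
    "multi_match": {
      "query": "华为",
      "fields": ["title", "content"]
    }
  },
  "sort": [
    {
      "od": {
        "order": "desc"
      }
    }
  ]
}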
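The effect of sorting on od, and one reason why prices stored as strings are a poor sort key, can be shown on plain Python data. This is an illustration only: the od values for documents other than 1 and the third item are hypothetical, and the string ordering shown is ordinary Python string comparison, not elasticsearch output.

```python
hits = [
    {"title": "华为P40", "price": "4999", "od": 1},
    {"title": "华为Mate30", "price": "3998", "od": 2},          # od value is hypothetical
    {"title": "hypothetical item", "price": "10999", "od": 3},  # made-up third hit
]

# Equivalent of "sort": [{"od": {"order": "desc"}}]
by_od_desc = sorted(hits, key=lambda hit: hit["od"], reverse=True)
print([hit["od"] for hit in by_od_desc])  # [3, 2, 1]

# Strings compare character by character, numbers compare by value.
as_text = sorted(hit["price"] for hit in hits)
as_number = sorted(int(hit["price"]) for hit in hits)
print(as_text)    # ['10999', '3998', '4999']
print(as_number)  # [3998, 4999, 10999]
```

Defining price with a numeric type in the mapping from the start avoids the need for a separate sort field such as od.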
