SearchAPI
ES支持两种基本方式检索;
- 通过REST request uri 发送搜索参数 (uri +检索参数);
- 通过REST request body 来发送它们(uri+请求体);
检索信息
检索从_search开始
例如:
shGET bank/_search?q=*&sort=account_number:asc
等价于
shGET bank/_search { "query":{ "match_all":{ } }, "sort":[ { "account_number":{ "order":"desc" } } ] }
响应结果解释: took - Elasticsearch 执行搜索的时间(毫秒) time_out - 告诉我们搜索是否超时 _shards - 告诉我们多少个分片被搜索了,以及统计了成功/失败的搜索分片 hits - 搜索结果 hits.total - 搜索结果 hits.hits - 实际的搜索结果数组(默认为前 10 的文档) sort - 结果的排序 key(键)(没有则按 score 排序) score 和 max_score –相关性得分和最高得分(全文检索用)
查询数据Query DSL
指定字段查询 match
查询 name 中含有 坤
或 小
的
GET /stars/_search
{
"query": {
"match": {
"name": "坤 小"
}
}
}
查询短语匹配 match_phrase
如果我们希望查询的条件是某字段中包含 "坤 小"
也就是将需要匹配的值当成一个整体单词(不分词)进行检索,使用 match_phrase
GET /stars/_search
{
"query": {
"match_phrase": {
"name": "坤 小"
}
}
}
多字段匹配 multi_match
查询 state 或者 address 包含 mill
GET bank/_search
{
"query":{
"multi_match":{
"query":"mill",
"fields":[
"state",
"address"
]
}
}
}
查询结果显示指定字段
例如,查询结果只显示 age 和 desc
GET /stars/_search
{
"query": {
"match": {
"name": "凡"
}
},
"_source":["age", "desc"]
}
排序和分页
- asc 升序、desc 降序
GET /stars/_search
{
"query": { "match_all": {} },
"sort": [
{
"age.keyword": {
"order": "desc"
}
}
]
}
- 分页查询
本质上就是 from(开始位置)和 size(返回数据数目)两个字段
GET /stars/_search
{
"query": { "match_all": {} },
"from": 0,
"size": 2
}
多条件查询
- bool
must:相当于关系型数据库 and
GET stars/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "吴亦凡"
}
},
{
"match": {
"age": "29"
}
}
]
}
}
}
should:相当于关系型数据库 or
GET stars/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"name": "吴亦凡"
}
},
{
"match": {
"age": "19"
}
}
]
}
}
}
must_not:相当于关系型数据库 not
结果过滤查询 filter
例如,查询 10岁<=age=<30岁
GET stars/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"age": {
"gte": 10,
"lte": 30
}
}
}
]
}
}
}
匹配多个条件查询
查询 tags 有 唱跳 的
GET stars/_search
{
"query": {
"match": {
"tags": "唱 跳"
}
}
}
多个条件使用空格隔开
精确查询term
关于分词
- term:直接通过倒排索引指定的词条进行精确查询
GET stars/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"name": "吴亦凡"
}
}
]
}
}
}
返回
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
- match:先分析文档,再通过分析的文档进行查询
吴小凡 也会被查出来
两个字段类型
- text:会被分词器解析
- keyword:不会被分词器解析
创建索引,用mappings
指明类型,并插入数据
PUT testdb
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"desc": {
"type": "keyword"
}
}
}
}
PUT testdb/_doc/1
{
"name": "赵深宸 name",
"desc": "致远 desc"
}
PUT testdb/_doc/2
{
"name": "赵深宸 name2",
"desc": "致远 desc2"
}
上述中 testdb 索引中,字段name在被查询时会被分析器进行分析后匹配查询。而属于keyword类型不会被分析器处理。
我们来验证一下:
- keyword
GET _analyze
{
"analyzer": "keyword",
"text": "赵深宸 name"
}
返回
{
"tokens" : [
{
"token" : "赵深宸 name",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 0
}
]
}
- standard
GET _analyze
{
"analyzer": "standard",
"text": "赵深宸 name"
}
返回
{
"tokens" : [
{
"token" : "赵",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "深",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "宸",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
},
{
"token" : "name",
"start_offset" : 4,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 3
}
]
}
查询测试下:
GET testdb/_search
{
"query": {
"match": {
"desc": "致远 desc"
}
}
}
结果:
只返回 一条数据,查询结果没有 “致远 desc2”
- 指定字段类型进行查询
GET stars/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name.keyword": "吴亦凡"
}
}
]
}
}
}
查找多个精确值
https://www.elastic.co/guide/cn/elasticsearch/guide/current/_finding_multiple_exact_values.html
高亮查询
- 默认高亮标签
GET stars/_search
{
"query": {
"match": {
"name": "吴亦凡"
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
返回
{
"took" : 160,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 2.9424872,
"hits" : [
{
"_index" : "stars",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.9424872,
"_source" : {
"name" : "吴亦凡",
"age" : "29",
"desc" : "大碗宽面",
"tags" : [
"加拿大",
"电鳗",
"说唱",
"嘻哈"
]
},
"highlight" : {
"name" : [
"<em>吴</em><em>亦</em><em>凡</em>"
]
}
}
]
}
}
- 自定义高亮标签
GET stars/_search
{
"query": {
"match": {
"name": "吴亦凡"
}
},
"highlight": {
"pre_tags": "<p class='key' style='color:red'>",
"post_tags": "</p>",
"fields": {
"name": {}
}
}
}
结果:
"highlight" : {
"name" : [
"<p class='key' style='color:red'>吴</p><p class='key' style='color:red'>亦</p><p class='key' style='color:red'>凡</p>"
]
}
"highlight" : {
"name" : [
"<p class='key' style='color:red'>吴</p>小<p class='key' style='color:red'>凡</p>"
]
}
聚合查询
我们知道SQL中有group by,在ES中它叫Aggregation,即聚合运算。
用之前导入索引的数据进行测试
简单聚合
比如我们希望计算出每个州的统计数量, 使用aggs
关键字对state
字段聚合,被聚合的字段无需对分词统计,所以使用state.keyword
对整个字段统计
group_by_state 为取的名字,根据自己需求取名
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
}
}
}
}
返回:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_state" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 743,
"buckets" : [
{
"key" : "TX",
"doc_count" : 30
},
{
"key" : "MD",
"doc_count" : 28
},
{
"key" : "ID",
"doc_count" : 27
},
{
"key" : "AL",
"doc_count" : 25
},
{
"key" : "ME",
"doc_count" : 25
},
{
"key" : "TN",
"doc_count" : 25
},
{
"key" : "WY",
"doc_count" : 25
},
{
"key" : "DC",
"doc_count" : 24
},
{
"key" : "MA",
"doc_count" : 24
},
{
"key" : "ND",
"doc_count" : 24
}
]
}
}
}
因为无需返回条件的具体数据, 所以设置size=0,返回hits为空。
doc_count
表示bucket中每个州的数据条数。
嵌套聚合
比如承接上个例子, 计算每个州的平均结余。涉及到的就是在对state分组的基础上,嵌套计算avg(balance):
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
返回:
{
"took" : 49,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_state" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 743,
"buckets" : [
{
"key" : "TX",
"doc_count" : 30,
"average_balance" : {
"value" : 26073.3
}
},
{
"key" : "MD",
"doc_count" : 28,
"average_balance" : {
"value" : 26161.535714285714
}
},
{
"key" : "ID",
"doc_count" : 27,
"average_balance" : {
"value" : 24368.777777777777
}
},
{
"key" : "AL",
"doc_count" : 25,
"average_balance" : {
"value" : 25739.56
}
},
{
"key" : "ME",
"doc_count" : 25,
"average_balance" : {
"value" : 21663.0
}
},
{
"key" : "TN",
"doc_count" : 25,
"average_balance" : {
"value" : 28365.4
}
},
{
"key" : "WY",
"doc_count" : 25,
"average_balance" : {
"value" : 21731.52
}
},
{
"key" : "DC",
"doc_count" : 24,
"average_balance" : {
"value" : 23180.583333333332
}
},
{
"key" : "MA",
"doc_count" : 24,
"average_balance" : {
"value" : 29600.333333333332
}
},
{
"key" : "ND",
"doc_count" : 24,
"average_balance" : {
"value" : 26577.333333333332
}
}
]
}
}
}
对聚合结果排序
可以通过在aggs中对嵌套聚合的结果进行排序
比如承接上个例子, 对嵌套计算出的avg(balance),这里是average_balance,进行排序
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword",
"order": {
"average_balance": "desc"
}
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}