Elasticsearch-AdvanceSearch
Bulk-importing data
Load the data in accounts.json into Elasticsearch.
The data looks like this (two documents):
    {"index":{"_id":"1"}}
    {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    {"index":{"_id":"6"}}
    {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
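The two-lines-per-document layout above (an action line followed by a source line) can be generated from ordinary records. The following is a minimal sketch, not the way accounts.json was actually produced: the helper name `to_bulk_ndjson` is hypothetical, and using `account_number` as the `_id` is an assumption based on the sample. The bulk API does require the body to end with a newline, which the helper appends.

```python
import json

def to_bulk_ndjson(docs, id_field="account_number"):
    """Render documents as a _bulk body: one action line, then one source line, per document."""
    lines = []
    for doc in docs:
        # Action line: index the document under an explicit _id (assumed to be the account number).
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}, separators=(",", ":")))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc, separators=(",", ":")))
    # The final line of a bulk body must end with a newline character.
    return "\n".join(lines) + "\n"

accounts = [
    {"account_number": 1, "balance": 39225, "firstname": "Amber", "lastname": "Duke"},
    {"account_number": 6, "balance": 5686, "firstname": "Hattie", "lastname": "Bond"},
]
body = to_bulk_ndjson(accounts)
print(body, end="")
```

Writing `body` to a file and passing it with `--data-binary` keeps the newlines intact, which is why the import command uses that option rather than `-d`.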
    curl -XPOST "localhost:9200/bank/_bulk?pretty&refresh" -H "Content-Type: application/json" --data-binary "@accounts.json"
    curl -XGET "localhost:9200/_cat/indices?v"

    health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    yellow open   bank     qerxmeePRsmIeXoZ58eNsQ   1   1       1000            0    396.8kb        396.8kb
    yellow open   customer LbQLl0UQTSOPnn54wQwA2g   1   1          1            1      4.3kb          4.3kb
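Search: _search
Note: once the search results are returned, Elasticsearch is finished with the request. It keeps no server-side resources open and no cursor over the results, unlike many other platforms such as SQL databases.
Passing parameters in the URI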
    GET /bank/_search?q=*&sort=account_number:asc&pretty

    curl -XGET "localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty"
Passing parameters in the request body
    GET /bank/_search
    {
      "query": { "match_all": {} },
      "sort": [
        { "account_number": "asc" }
      ]
    }

    curl -XGET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
    {
      "query": {"match_all": {}},
      "sort": [
        {"account_number": "asc"}
      ]
    }
    '
Response
    {
      "took" : 22,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1000,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "0",
            "_score" : null,
            "_source" : {
              "account_number" : 0,
              "balance" : 16623,
              "firstname" : "Bradshaw",
              "lastname" : "Mckenzie",
              "age" : 29,
              "gender" : "F",
              "address" : "244 Columbus Place",
              "employer" : "Euron",
              "email" : "bradshawmckenzie@euron.com",
              "city" : "Hobucken",
              "state" : "CO"
            },
            "sort" : [ 0 ]
          },
          ...
        ]
      }
    }
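To make the shape of the response easier to read, here is a small sketch that pulls the commonly used parts out of it. The `response` dict is an abbreviated copy of the response shown above, and `summarize` is a hypothetical helper, not part of any client library. A `relation` of `eq` means the total is exact; `gte` would mean it is only a lower bound.

```python
# Abbreviated copy of the search response shown above, as a Python dict.
response = {
    "took": 22,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 1000, "relation": "eq"},
        "max_score": None,
        "hits": [
            {
                "_index": "bank",
                "_id": "0",
                "_score": None,
                "_source": {"account_number": 0, "balance": 16623, "firstname": "Bradshaw"},
                "sort": [0],
            },
        ],
    },
}

def summarize(resp):
    """Extract the total hit count, its exactness, and the returned source documents."""
    hits = resp["hits"]
    return {
        "total": hits["total"]["value"],
        "exact": hits["total"]["relation"] == "eq",
        # Treat the result as complete only if nothing timed out and no shard failed.
        "complete": not resp["timed_out"] and resp["_shards"]["failed"] == 0,
        "documents": [hit["_source"] for hit in hits["hits"]],
    }

summary = summarize(response)
print(summary["total"], summary["exact"], summary["documents"][0]["firstname"])
```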
Query DSL (query domain-specific language)
Elasticsearch provides a JSON-style query language.
sort, from, size: sorting and pagination
    curl -XGET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
    {
      "query": {"match_all": {}},
      "sort": [
        {"account_number": "asc"}
      ],
      "from": 5,
      "size": 5
    }
    '
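query: the definition of this query
match_all: the query type; it matches every document
sort: the sort order
from (zero-based, default 0) sets the offset of the first document to return, and size (default 10) sets how many documents to return; together they are what you use to page through search results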
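The relationship between a page number and from/size can be sketched as follows. This is an illustration only: `page_body` and its one-based `page` argument are hypothetical conventions, not something Elasticsearch defines.

```python
def page_body(page, page_size=10):
    """Build a search body for a one-based page number (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {"match_all": {}},
        "sort": [{"account_number": "asc"}],
        # from is zero-based, so page 1 starts at offset 0.
        "from": (page - 1) * page_size,
        "size": page_size,
    }

second_page = page_body(2, page_size=5)
print(second_page["from"], second_page["size"])
```

With a page size of 5, page 2 yields from 5 and size 5, the same values used in the request above.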
_source: choosing the returned fields
By default, each hit includes the full JSON document (the _source field). If you do not want the whole source document back, list the fields to return.
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "query": {"match_all": {}},
      "_source": ["account_number", "balance"]
    }
    '
match_all: match every document
    curl -XGET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
    {
      "query": {"match_all": {}}
    }
    '
match: match query
Return all accounts whose account_number is 20
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "query": {
        "match": {"account_number": 20}
      }
    }
    '
Return all accounts whose address contains mill
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "query": {
        "match": {"address": "mill"}
      }
    }
    '
Return all accounts whose address contains mill or lane
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "query": {
        "match": {"address": "mill lane"}
      }
    }
    '
match_phrase: match a whole phrase
Return all accounts whose address contains the phrase "mill lane"
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "query": {
        "match_phrase": {"address": "mill lane"}
      }
    }
    '
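The difference between match and match_phrase on the same query string can be modeled in memory. This is a rough sketch under stated assumptions: `analyze` is a crude stand-in for the real analyzer (lowercase, split on non-alphanumerics), the two "Mill" addresses are illustrative values, and scoring is ignored entirely.

```python
import re

def analyze(text):
    """Crude stand-in for the analyzer: lowercase and split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

def match(field_value, query):
    """match with the default OR operator: at least one query term occurs in the field."""
    terms = set(analyze(field_value))
    return any(term in terms for term in analyze(query))

def match_phrase(field_value, query):
    """match_phrase without slop: the query terms occur consecutively and in order."""
    tokens, phrase = analyze(field_value), analyze(query)
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

# The first two addresses are illustrative; the last two come from the sample data.
addresses = ["198 Mill Lane", "990 Mill Road", "880 Holmes Lane", "671 Bristol Street"]

match_hits = [a for a in addresses if match(a, "mill lane")]
phrase_hits = [a for a in addresses if match_phrase(a, "mill lane")]
print(match_hits)
print(phrase_hits)
```

In this model, match keeps every address containing either word, while match_phrase keeps only the address where the two words are adjacent.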
bool: boolean query
A bool query uses boolean logic to combine smaller queries into a larger one.
bool must with two match clauses: logical AND
Return accounts whose address contains both mill and lane
A document matches only if every clause under must is true
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "query": {
        "bool": {
          "must": [
            {"match": {"address": "mill"}},
            {"match": {"address": "lane"}}
          ]
        }
      }
    }
    '
bool should with two match clauses: logical OR
Return accounts whose address contains mill or lane
A document matches if any one clause under should is true
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "query": {
        "bool": {
          "should": [
            {"match": {"address": "mill"}},
            {"match": {"address": "lane"}}
          ]
        }
      }
    }
    '
bool must_not with two match clauses: logical NOR
Return accounts whose address contains neither "mill" nor "lane"
A document matches only if every clause under must_not is false
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "query": {
        "bool": {
          "must_not": [
            {"match": {"address": "mill"}},
            {"match": {"address": "lane"}}
          ]
        }
      }
    }
    '
Combining must, should, and must_not in one bool query (fixing should that seems to have no effect)
When a bool query contains must or filter alongside should, the should clauses no longer mean "or". They become optional: a document can match without satisfying any of them, and they only influence its score. There are two ways to make them required again: set minimum_should_match to 1, or move the should clauses into their own bool query nested under must.
    # Option 1: require at least one should clause with minimum_should_match.
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "age": "40" } }
          ],
          "must_not": [
            { "match": { "state": "ID" } }
          ],
          "should": [
            {"match": {"address": "Jackson"}},
            {"match": {"address": "Suydam"}}
          ],
          "minimum_should_match": 1
        }
      }
    }
    '

    # Option 2: nest two bool queries under must.
    # The first bool holds the preconditions (age and state).
    # The second bool holds only should clauses, so at least one of them has to match.
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "query": {
        "bool": {
          "must": [
            {
              "bool": {
                "must": [
                  { "match": { "age": "40" } }
                ],
                "must_not": [
                  { "match": { "state": "ID" } }
                ]
              }
            },
            {
              "bool": {
                "should": [
                  {"match": {"address": "Jackson"}},
                  {"match": {"address": "Suydam"}}
                ]
              }
            }
          ]
        }
      }
    }
    '
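The matching rules behind the two requests above can be modeled in a few lines. This is a simplified sketch that ignores scoring: `term` and `bool_matches` are hypothetical helpers, the three documents are made up for illustration, and the default for minimum_should_match follows the documented rule (1 when a bool query has should clauses but no must or filter, 0 otherwise).

```python
def term(field, value):
    """Predicate standing in for a match clause: the field contains the given token."""
    token = str(value).lower()
    return lambda doc: token in str(doc[field]).lower().split()

def bool_matches(doc, must=(), should=(), must_not=(), filters=(), minimum_should_match=None):
    """Decide whether doc matches a bool query (matching only, no scoring)."""
    if minimum_should_match is None:
        # should is required only when it is the sole positive clause type.
        minimum_should_match = 1 if should and not (must or filters) else 0
    if not all(clause(doc) for clause in list(must) + list(filters)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match

# Made-up documents for illustration.
doc_a = {"age": 40, "state": "TX", "address": "12 Jackson Street"}
doc_b = {"age": 40, "state": "TX", "address": "5 Oak Avenue"}
doc_c = {"age": 40, "state": "ID", "address": "9 Suydam Place"}

pre = dict(must=[term("age", 40)], must_not=[term("state", "ID")])
either = [term("address", "Jackson"), term("address", "Suydam")]

# should next to must is optional, so doc_b matches without Jackson or Suydam.
optional = [bool_matches(d, should=either, **pre) for d in (doc_a, doc_b, doc_c)]
# Option 1: minimum_should_match = 1 makes one should clause mandatory.
option1 = [bool_matches(d, should=either, minimum_should_match=1, **pre) for d in (doc_a, doc_b, doc_c)]
# Option 2: two nested bool queries under must.
nested = [
    lambda d: bool_matches(d, **pre),
    lambda d: bool_matches(d, should=either),
]
option2 = [bool_matches(d, must=nested) for d in (doc_a, doc_b, doc_c)]
print(optional, option1, option2)
```

Both options reject doc_b, which the plain combination of must and should lets through.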
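filter: filtering
The _score field measures how well a document matches the search criteria. The higher the score, the more relevant the document; the lower the score, the less relevant. Scores are not always needed, and Elasticsearch detects this automatically so that it does not compute scores nobody will use.
The bool query also supports a filter clause for filtering conditions. Filter conditions do not affect a document's relevance score.
Return accounts whose balance is between 20000 and 30000, inclusive
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "query": {
        "bool": {
          "must": { "match_all": {} },
          "filter": {
            "range": {
              "balance": {
                "gte": 20000,
                "lte": 30000
              }
            }
          }
        }
      }
    }
    '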
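The bounds of a range filter can be illustrated with a small in-memory check. This is only a sketch: `in_range` is a hypothetical helper that mirrors the gte/gt/lte/lt parameter names, and the balances come from the sample documents plus two boundary values.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Yes/no check mirroring the range parameters; like a filter, it produces no score."""
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True

balances = [39225, 5686, 16623, 20000, 30000]
kept = [b for b in balances if in_range(b, gte=20000, lte=30000)]
print(kept)
```

Because gte and lte are inclusive, both 20000 and 30000 pass the filter.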
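aggs: aggregations
Aggregations let you group data and compute statistics over it, much like GROUP BY and the aggregate functions in SQL. Elasticsearch can return search hits and aggregation results in the same response, which is both powerful and efficient.
terms: group accounts by state, count the accounts in each group, and return the top 10 buckets
Similar to this SQL: SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10;
Note: size=0 suppresses the search hits, because only the aggregation results are of interest
    # group_by_state is not a keyword, only the name under which the result is returned.
    # terms is the keyword that selects the aggregation type.
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "size": 0,
      "aggs": {
        "group_by_state": {
          "terms": {
            "field": "state.keyword"
          }
        }
      }
    }
    '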
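What the terms aggregation computes can be approximated in memory as follows. This is a sketch, not how Elasticsearch implements it: `terms_agg` is a hypothetical helper, the four documents are trimmed to the state field, and breaking ties by key is an assumption of the sketch.

```python
from collections import Counter

def terms_agg(docs, field, size=10):
    """Count documents per distinct field value and return the largest buckets first."""
    counts = Counter(doc[field] for doc in docs)
    # Order by descending count; ties are broken by key (an assumption of this sketch).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]

docs = [{"state": "IL"}, {"state": "TN"}, {"state": "IL"}, {"state": "CO"}]
buckets = terms_agg(docs, "state")
print(buckets)
```

The `size` argument plays the role of the aggregation's own size parameter, which defaults to 10 and is separate from the top-level size that hides the search hits.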
terms, avg: compute the average account balance per state and return the top 10 buckets, sorted by average balance in descending order
    # The outer aggregation groups by state; the nested one computes each state's average balance.
    # order refers to the nested aggregation by name.
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "size": 0,
      "aggs": {
        "group_by_state": {
          "terms": {
            "field": "state.keyword",
            "order": {
              "average_balance": "desc"
            }
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
    '
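The nested terms and avg aggregation, ordered by the sub-aggregation, corresponds roughly to the following in-memory computation. It is an illustrative sketch: `avg_balance_by_state` is a hypothetical helper, and the fourth document is made up so that one state has more than one account.

```python
from collections import defaultdict

def avg_balance_by_state(docs, size=10):
    """Group by state, average the balances, and sort buckets by that average, descending."""
    balances = defaultdict(list)
    for doc in docs:
        balances[doc["state"]].append(doc["balance"])
    buckets = [
        {"key": state, "doc_count": len(values), "average_balance": sum(values) / len(values)}
        for state, values in balances.items()
    ]
    buckets.sort(key=lambda bucket: bucket["average_balance"], reverse=True)
    return buckets[:size]

docs = [
    {"state": "IL", "balance": 39225},
    {"state": "TN", "balance": 5686},
    {"state": "CO", "balance": 16623},
    {"state": "IL", "balance": 1000},  # made-up account for illustration
]
result = avg_balance_by_state(docs)
print([(b["key"], b["average_balance"]) for b in result])
```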
range, terms, avg: group by age bracket first, then by gender, and compute the average account balance for each age bracket and gender
    curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "size": 0,
      "aggs": {
        "group_by_age": {
          "range": {
            "field": "age",
            "ranges": [
              { "from": 20, "to": 30 },
              { "from": 30, "to": 40 },
              { "from": 40, "to": 50 }
            ]
          },
          "aggs": {
            "group_by_gender": {
              "terms": {
                "field": "gender.keyword"
              },
              "aggs": {
                "average_balance": {
                  "avg": {
                    "field": "balance"
                  }
                }
              }
            }
          }
        }
      }
    }
    '
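The three-level aggregation above can be mirrored in memory to show how documents fall into buckets. This is a sketch under stated assumptions: `balance_by_age_and_gender` is a hypothetical helper, the bucket labels are simplified, and the last document is made up to show the boundary behavior. In a range aggregation each bucket includes its from value and excludes its to value.

```python
def balance_by_age_and_gender(docs, ranges=((20, 30), (30, 40), (40, 50))):
    """Bucket by age range, then by gender, and average the balance in each leaf bucket."""
    result = {}
    for low, high in ranges:
        by_gender = {}
        for doc in docs:
            # from is inclusive, to is exclusive.
            if low <= doc["age"] < high:
                by_gender.setdefault(doc["gender"], []).append(doc["balance"])
        result[f"{low}-{high}"] = {
            gender: sum(values) / len(values) for gender, values in by_gender.items()
        }
    return result

docs = [
    {"age": 32, "gender": "M", "balance": 39225},
    {"age": 36, "gender": "M", "balance": 5686},
    {"age": 29, "gender": "F", "balance": 16623},
    {"age": 30, "gender": "F", "balance": 1000},  # made-up account on a bucket boundary
]
report = balance_by_age_and_gender(docs)
print(report)
```

The account aged exactly 30 lands in the 30-40 bucket, not in 20-30.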