2023-08-08  阅读(289)
原文作者:Ressmix 原文地址:https://www.tpvlog.com/article/146

我们在上一章讲到了Elasticsearch在聚合分析中,可以使用“term”对指定field按照数量分组。本章,我们将介绍另一种区间分组的方式——histogram。

一、简介

1.1 histogram

histogram,它会接收一个field,然后按照这个field值的各个范围区间,进行bucket分组操作:

    GET /tvs/_search
    {
       "size" : 0,
       "aggs":{
          "price":{
             "histogram":{ 
                "field": "price",
                "interval": 2000
             }
          }
       }
    }

上述请求中,我们对“price”字段进行区间分组,区间间隔为2000,返回结果如下:

    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 8,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "price" : {
          "buckets" : [
            {
              "key" : 0.0,
              "doc_count" : 3
            },
            {
              "key" : 2000.0,
              "doc_count" : 4
            },
            {
              "key" : 4000.0,
              "doc_count" : 0
            },
            {
              "key" : 6000.0,
              "doc_count" : 0
            },
            {
              "key" : 8000.0,
              "doc_count" : 1
            }
          ]
        }
      }
    }

按照区间分组之后,我们就可以对各个bucket执行metric操作了,比如计算总和:

    GET /tvs/_search
    {
       "size" : 0,
       "aggs":{
          "price":{
             "histogram":{ 
                "field": "price",
                "interval": 2000
             },
             "aggs":{
                "revenue": {
                   "sum": { 
                     "field" : "price"
                   }
                 }
             }
          }
       }
    }

1.2 date_histogram

如果我们希望的按区间分组的字段是date类型的,那么需要用到date_histogram关键字。比如:

    GET /tvs/_search
    {
       "size" : 0,
       "aggs": {
          "sales": {
             "date_histogram": {
                "field": "sold_date",
                "interval": "month", 
                "format": "yyyy-MM-dd",
                "min_doc_count" : 0, 
                "extended_bounds" : { 
                    "min" : "2016-01-01",
                    "max" : "2017-12-31"
                }
             }
          }
       }
    }

解释下上述几个关键参数:

  • min_doc_count:某个日期区间内的doc数量至少要等于这个参数,这个区间才会返回;
  • extended_bounds:划分bucket的时候,会限定在这个起始日期和截止日期内。

二、实战

2.1 date_histogram

假设我们现在的需求是:统计每季度每个品牌的电视销售额,那么可以这样构造请求:

    GET /tvs/_search 
    {
      "size": 0,
      "aggs": {
        "group_by_sold_date": {
          "date_histogram": {
            "field": "sold_date",
            "interval": "quarter",
            "format": "yyyy-MM-dd",
            "min_doc_count": 0,
            "extended_bounds": {
              "min": "2016-01-01",
              "max": "2017-12-31"
            }
          },
          "aggs": {
            "total_sum_price": {
              "sum": {
                "field": "price"
              }
            },  
            "group_by_brand": {
              "terms": {
                "field": "brand"
              },
              "aggs": {
                "sum_price": {
                  "sum": {
                    "field": "price"
                  }
                }
              }
            }
          }
        }
      }
    }

上述请求其实就是先按日期进行分组,然后下钻到组内再按照品牌分组,最后对每个子组执行求和metric操作,结果如下:

    {
      "took" : 97,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 8,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "group_by_sold_date" : {
          "buckets" : [
            {
              "key_as_string" : "2016-01-01",
              "key" : 1451606400000,
              "doc_count" : 0,
              "total_sum_price" : {
                "value" : 0.0
              },
              "group_by_brand" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [ ]
              }
            },
            {
              "key_as_string" : "2016-04-01",
              "key" : 1459468800000,
              "doc_count" : 1,
              "total_sum_price" : {
                "value" : 3000.0
              },
              "group_by_brand" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  {
                    "key" : "小米",
                    "doc_count" : 1,
                    "sum_price" : {
                      "value" : 3000.0
                    }
                  }
                ]
              }
            },
            {
              "key_as_string" : "2016-07-01",
              "key" : 1467331200000,
              "doc_count" : 2,
              "total_sum_price" : {
                "value" : 2700.0
              },
              "group_by_brand" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  {
                    "key" : "TCL",
                    "doc_count" : 2,
                    "sum_price" : {
                      "value" : 2700.0
                    }
                  }
                ]
              }
            },
            {
              "key_as_string" : "2016-10-01",
              "key" : 1475280000000,
              "doc_count" : 3,
              "total_sum_price" : {
                "value" : 5000.0
              },
              "group_by_brand" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  {
                    "key" : "长虹",
                    "doc_count" : 3,
                    "sum_price" : {
                      "value" : 5000.0
                    }
                  }
                ]
              }
            },
            {
              "key_as_string" : "2017-01-01",
              "key" : 1483228800000,
              "doc_count" : 2,
              "total_sum_price" : {
                "value" : 10500.0
              },
              "group_by_brand" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  {
                    "key" : "三星",
                    "doc_count" : 1,
                    "sum_price" : {
                      "value" : 8000.0
                    }
                  },
                  {
                    "key" : "小米",
                    "doc_count" : 1,
                    "sum_price" : {
                      "value" : 2500.0
                    }
                  }
                ]
              }
            },
            {
              "key_as_string" : "2017-04-01",
              "key" : 1491004800000,
              "doc_count" : 0,
              "total_sum_price" : {
                "value" : 0.0
              },
              "group_by_brand" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [ ]
              }
            },
            {
              "key_as_string" : "2017-07-01",
              "key" : 1498867200000,
              "doc_count" : 0,
              "total_sum_price" : {
                "value" : 0.0
              },
              "group_by_brand" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [ ]
              }
            },
            {
              "key_as_string" : "2017-10-01",
              "key" : 1506816000000,
              "doc_count" : 0,
              "total_sum_price" : {
                "value" : 0.0
              },
              "group_by_brand" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [ ]
              }
            }
          ]
        }
      }
    }

三、总结

本章,我介绍了聚合分析中的区间分组,Elasticsearch采用histogram关键字来完成对指定字段值的区间分组,如果我们想要分组的字段类型为日期,则需要使用date_histogram关键字。

阅读全文
  • 点赞