hive按照某个字段分组,然后获取每个分组中最新的n条数据

2018年7月26日 6点热度 0条评论 来源: to.to
hive -e "use db; select t.advertId,t.exposureNum from (select advertId,exposureNum,ROW_NUMBER() OVER(PARTITION BY advertId ORDER BY addTime desc) AS rn FROM tb_advert_flow_money where ftype = 2) t where t.rn=1;" > exposureInfo.txt;

解释说明:
ROW_NUMBER() OVER(PARTITION BY advertId ORDER BY addTime desc) AS rn
表示通过 advertId 进行分区,然后在每个分区中按照时间降序排列结果。
ROW_NUMBER() 生成的时候通OVER(通过)(PARTITION BY advertId ORDER BY addTime desc) 这里面的结果生成行号

其中:

select advertId,exposureNum,ROW_NUMBER() OVER(PARTITION BY advertId ORDER BY addTime desc) AS rn FROM tb_advert_flow_money where ftype = 2

通过上面SQL生成的结果类似:

通过:

hive -e "use db; select t.advertId,t.exposureNum from (select advertId,exposureNum,ROW_NUMBER() OVER(PARTITION BY advertId ORDER BY addTime desc) AS rn FROM tb_advert_flow_money where ftype = 2) t where t.rn=1;" > exposureInfo.txt;

生成的数据为:

    原文作者:to.to
    原文地址: https://blog.csdn.net/toto1297488504/article/details/85100542
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系管理员进行删除。