略微加速

略速 - 互联网笔记

Clickhouse groupArray/groupUniqArray/arrayJoin/splitByChar

2022-06-28 leiting (4777阅读)

标签 数据库

行列转置一般由由行转为列,或者由列转为行。

CREATE TABLE datasets.t_city

(

    `province` String, 

    `city` String, 

    `createtime` DateTime, 

    `city_level` Int8

)

ENGINE = MergeTree()

ORDER BY province

SETTINGS index_granularity = 8192;


insert into t_city values('Hubei','Wuhan',now(),2),('Hubei','Xiangyang',now(),3),('Shanghai','Shanghai',now(),1),('Guangdong','Guangzhou',now(),1),('Guangdong','Shenzhen',now(),1),('Guangdong','DOngguan',now(),2),('Guangdong','Zhuhai',now(),3);


Clickhouse> select * from t_city;

SELECT *

FROM t_city

┌─province──┬─city──────┬──────────createtime─┬─city_level─┐

│ Guangdong │ Guangzhou │ 2020-07-07 14:02:53 │          1 │

│ Guangdong │ Shenzhen  │ 2020-07-07 14:02:53 │          1 │

│ Guangdong │ DOngguan  │ 2020-07-07 14:02:53 │          2 │

│ Guangdong │ Zhuhai    │ 2020-07-07 14:02:53 │          3 │

│ Hubei     │ Wuhan     │ 2020-07-07 14:02:53 │          2 │

│ Hubei     │ Xiangyang │ 2020-07-07 14:02:53 │          3 │

│ Shanghai  │ Shanghai  │ 2020-07-07 14:02:53 │          1 │

└───────────┴───────────┴─────────────────────┴────────────┘

7 rows in set. Elapsed: 0.002 sec. 

列转行:

查询每个省份用户的城市:groupArray


Clickhouse> select province, groupArray(city) from t_city group by province;

SELECT 

    province, 

    groupArray(city)

FROM t_city

GROUP BY province

┌─province──┬─groupArray(city)─────────────────────────────┐

│ Shanghai  │ ['Shanghai']                                 │

│ Hubei     │ ['Wuhan','Xiangyang']                        │

│ Guangdong │ ['Guangzhou','Shenzhen','Dongguan','Zhuhai'] │

└───────────┴──────────────────────────────────────────────┘


插入一条重复的记录:

insert into t_city values('Hubei','Wuhan',now(),2);

 

可以看到Hubei有一个重复的wuhan

 

Clickhouse> select province, groupArray(city) from t_city group by province;

SELECT 

    province, 

    groupArray(city)

FROM t_city

GROUP BY province

┌─province──┬─groupArray(city)─────────────────────────────┐

│ Shanghai  │ ['Shanghai']                                 │

│ Hubei     │ ['Wuhan','Xiangyang','Wuhan']                │

│ Guangdong │ ['Guangzhou','Shenzhen','Dongguan','Zhuhai'] │

└───────────┴──────────────────────────────────────────────┘

3 rows in set. Elapsed: 0.002 sec. 

 

可以使用函数groupUniqArray进行去重:

 

Clickhouse> select province, groupUniqArray(city) from t_city group by province;


SELECT 

    province, 

    groupUniqArray(city)

FROM t_city

GROUP BY province

┌─province──┬─groupUniqArray(city)─────────────────────────┐

│ Shanghai  │ ['Shanghai']                                 │

│ Hubei     │ ['Wuhan','Xiangyang']                        │

│ Guangdong │ ['Zhuhai','Dongguan','Guangzhou','Shenzhen'] │

└───────────┴──────────────────────────────────────────────┘

3 rows in set. Elapsed: 0.003 sec. 


字符串拆分:splitByChar

 

Clickhouse> select splitByChar('#','Hubei_Wuhan#Hubei_Xiangyang#Guangdong_Shenzhen') col;

 

SELECT splitByChar('#', 'Hubei_Wuhan#Hubei_Xiangyang#Guangdong_Shenzhen') AS col

┌─col────────────────────────────────────────────────────┐

│ ['Hubei_Wuhan','Hubei_Xiangyang','Guangdong_Shenzhen'] │

└────────────────────────────────────────────────────────┘

1 rows in set. Elapsed: 0.002 sec. 


可以使用ArrayJoin 函数将一行转为列:


Clickhouse> select arrayJoin(splitByChar('#', 'Hubei_Wuhan#Hubei_Xiangyang#Guangdong_Shenzhen')) col;

 

SELECT arrayJoin(splitByChar('#', 'Hubei_Wuhan#Hubei_Xiangyang#Guangdong_Shenzhen')) AS col

┌─col────────────────┐

│ Hubei_Wuhan        │

│ Hubei_Xiangyang    │

│ Guangdong_Shenzhen │

└────────────────────┘

3 rows in set. Elapsed: 0.002 sec. 


实际示例:

 

SELECT 

    province, 

    groupArray(city)

FROM t_city

GROUP BY province

┌─province──┬─groupArray(city)─────────────────────────────┐

│ Shanghai  │ ['Shanghai']                                 │

│ Hubei     │ ['Wuhan','Xiangyang','Wuhan']                │

│ Guangdong │ ['Guangzhou','Shenzhen','Dongguan','Zhuhai'] │

└───────────┴──────────────────────────────────────────────┘

3 rows in set. Elapsed: 0.003 sec. 

 

Clickhouse> SELECT     province,     arrayJoin(groupArray(city)) new_city FROM t_city GROUP BY province;

 

SELECT 

    province, 

    arrayJoin(groupArray(city)) AS new_city

FROM t_city

GROUP BY province

┌─province──┬─new_city──┐

│ Shanghai  │ Shanghai  │

│ Hubei     │ Wuhan     │

│ Hubei     │ Xiangyang │

│ Hubei     │ Wuhan     │

│ Guangdong │ Guangzhou │

│ Guangdong │ Shenzhen  │

│ Guangdong │ Dongguan  │

│ Guangdong │ Zhuhai    │

└───────────┴───────────┘

8 rows in set. Elapsed: 0.003 sec. 


结论:在clickhouse中主要使用两个函数groupArray 和ArrayJoin。


原文链接:https://blog.csdn.net/u010002184/article/details/124782457


北京半月雨文化科技有限公司.版权所有 京ICP备12026184号-3