谈谈clickhouse优化

ClickHouse版本

23.10.1.396

问题描述

情况如上图,项目运行一段时间,随着数据的累计,服务器内存占用率已经接近100%,通过top命令查看内存占用情况,发现clickhouse进程占用了80%的内存,导致服务器内存占用率接近100%

参考资料

比较匆忙的看了资料,配置文件中有这样一个描述 ,于是将其改成了8G看看效果

1
2
3
4
5
6
7
8
9
10
11
12
<!-- Maximum memory usage (resident set size) for server process.
Zero value or unset means default. Default is "max_server_memory_usage_to_ram_ratio" of available physical RAM.
If the value is larger than "max_server_memory_usage_to_ram_ratio" of available physical RAM, it will be cut down.

The constraint is checked on query execution time.
If a query tries to allocate memory and the current memory usage plus allocation is greater
than specified threshold, exception will be thrown.

It is not practical to set this constraint to small values like just a few gigabytes,
because memory allocator will keep this amount of memory in caches and the server will deny service of queries.
-->
<max_server_memory_usage>8G</max_server_memory_usage>

二轮优化

在运行了大概2个月之后,情况没有改善,内存仍然继续增长,但是本次的提示如上图超过的数值跟我文件中配置的数值有偏差,当然中间也尝试过配置
使用类似 SET max_memory_usage = 3 * 1024 * 1024 * 1024; 这样的语句去动态配置,但是使用如下语句查询返回还是没变化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
select  *  from `system`.settings  where name in(
'max_memory_usage_for_user',
'max_memory_usage',
'max_bytes_before_external_group_by',
'max_bytes_before_external_sort',
'max_memory_usage_for_all_queries',
'max_concurrent_queries_for_user',
'max_concurrent_queries_for_all_users',
'max_concurrent_queries',
'max_server_memory_usage',
'max_server_memory_usage_to_ram_ratio',
'max_thread_pool_size',
'distributed_aggregation_memory_efficient'
);

后来增加了配置项

config.xml

  • max_server_memory_usage_to_ram_ratio

    这里按照总内存的百分比计算

    1
    2
    3
    4
    <clickhouse>
    <!-- On memory constrained environments you may have to set this to value larger than 1.-->
    <max_server_memory_usage_to_ram_ratio>0.2</max_server_memory_usage_to_ram_ratio>
    </clickhouse>

users.xml

  • max_bytes_before_external_group_by

  • max_bytes_before_external_sort

    这里设置分组查询和排序时内存的最大使用阈值1G,超过使用硬盘

    1
    2
    3
    4
    5
    6
    7
    8
    <clickhouse>
    <profiles>
    <default>
    <max_bytes_before_external_group_by>1073741824</max_bytes_before_external_group_by>
    <max_bytes_before_external_sort>1073741824</max_bytes_before_external_sort>
    </default>
    <profiles>
    </clickhouse>

    2024-10-14 15:08:09 后面跟进看看优化效果

三轮优化

控制内存有点治标不治本,这次优化方面如下

控制集群节点的数据存储权重

config.xml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
<clickhouse>
    <remote_servers>
        <cluster_3S_1R>
             <shard>
                <internal_replication>true</internal_replication>
#权重
            <weight>10</weight>
        <replica>
             <host>...</host>
            <port>...</port>
            <user>...</user>
            <password>...</password>
                </replica>
             </shard>

<shard>
                <internal_replication>true</internal_replication>
#权重
            <weight>45</weight>
        <replica>
             <host>...</host>
            <port>...</port>
            <user>...</user>
            <password>...</password>
                </replica>
             </shard>

<shard>
                <internal_replication>true</internal_replication>
#权重
            <weight>45</weight>
        <replica>
             <host>...</host>
            <port>...</port>
            <user>...</user>
            <password>...</password>
                </replica>
             </shard>
        </cluster_3S_1R>
    </remote_servers>
</clickhouse>

控制数据的保存时间

1
2
alter table  table_data on   cluster cluster_3S_1R 
modify TTL create_time + toIntervalDay(45) ;

2024-11-02 09:00:09 后面跟进看看优化效果


完结