首页 > Postgresql > PostgreSQL CPU占用100% 问题分析

PostgreSQL CPU占用100% 问题分析

1 使用nmon查看系统资源,或如下ps命令

ps auxw --sort=%cpu|tail -10


使用如下sql确定相应sql

SELECT
	procpid,
	START,
	now( ) - START AS lap,
	current_query 
FROM
	(
	SELECT
		backendid,
		pg_stat_get_backend_pid ( S.backendid ) AS procpid,
		pg_stat_get_backend_activity_start ( S.backendid ) AS START,
		pg_stat_get_backend_activity ( S.backendid ) AS current_query 
	FROM
		( SELECT pg_stat_get_backend_idset ( ) AS backendid ) AS S 
	) AS S 
WHERE
	current_query <> '<IDLE>' 
	AND procpid = 121231 --相应操作系统PID
ORDER BY
	lap DESC;

2 追踪慢SQL
突然间的慢,基本都与特定的SQL的执行计划发生变化有关,如何确定慢SQL或者说问题SQL,有如下几种方法.
1)第一种方法是,pg_stat_statements插件
使用前,先复位下这个插件的统计信息

select pg_stat_reset();
select pg_stat_statements_reset();

等待一段时间(例如1分钟),使计数器积累足够的信息,然后查询最耗时的SQL(一般就是导致问题的直接原因)。

select userid::regrole, dbid, total_time,query from pg_stat_statements order by total_time desc limit 20; 

查询读取Buffer次数最多的SQL,这些SQL可能由于所查询的数据没有索引,而导致了过多的Buffer读,也同时大量消耗了CPU。

SELECT
	userid :: regrole,
	dbid,
	( shared_blks_hit + shared_blks_read ) AS shared_buffer_read,
	query 
FROM
	pg_stat_statements 
ORDER BY
	shared_blks_hit + shared_blks_read DESC 
	LIMIT 5;

2)第二种方法是,直接通过pg_stat_activity视图
利用下面的查询,查看当前长时间执行,一直不结束的SQL。这些SQL对应造成CPU满,也有直接嫌疑。

SELECT
	'select pg_terminate_backend(' || pid || ');' AS KILL,
backend_xid,
datname,
usename,
client_addr,
application_name,
STATE,
backend_start,
xact_start,
xact_stay,
query_start,
query_stay,
REPLACE ( QUERY, chr ( 10 ), ' ' ) AS QUERY 
FROM
	(
	SELECT
		pgsa.pid,
		pgsa.datname AS datname,
		pgsa.usename AS usename,
		pgsa.client_addr client_addr,
		pgsa.application_name AS application_name,
		pgsa.STATE AS STATE,
		pgsa.backend_xid,
		pgsa.backend_start AS backend_start,
		pgsa.xact_start AS xact_start,
		EXTRACT ( epoch FROM ( now( ) - pgsa.xact_start ) ) AS xact_stay,
		pgsa.query_start AS query_start,
		EXTRACT ( epoch FROM ( now( ) - pgsa.query_start ) ) AS query_stay,
		pgsa.QUERY AS QUERY 
	FROM
		pg_stat_activity AS pgsa 
	WHERE
		pgsa.STATE != 'idle' 
		AND pgsa.STATE != 'idle in transaction' 
		AND pgsa.STATE != 'idle in transaction (aborted)' 
	) idleconnections 
WHERE
	backend_xid IS NULL -- and query like '%V1.organization_no%'
	
ORDER BY
	query_stay DESC 
	LIMIT 50;

3)第3种方法,是从数据表上表扫描(Table Scan)的信息开始查起
查找缺失索引的表。数据表如果缺失索引,数据库只能使用表扫描,并需要处理已在内存中的大量的无关记录,而耗费大量CPU。
通过下面的查询,查出使用表扫描最多的表:

select * from pg_stat_user_tables where n_live_tup > 100000 and seq_scan > 0 order by seq_tup_read desc limit 10;

查询当前正在运行的访问到上述表的慢查询:

select * from pg_stat_activity where query ilike '%<table name>%' and query_start - now() > interval '10 seconds';

也可以通过pg_stat_statements插件定位涉及到这些表的查询:

select * from pg_stat_statements where query ilike '%<table>%'order by shared_blks_hit+shared_blks_read desc limit 10;

3 处理慢SQL
对于上面的方法查出来的慢SQL,首先需要做的可能是Cancel或Kill掉他们,使业务先恢复:

select pg_cancel_backend(pid) from pg_stat_activity where  query like '%<query text>%' and pid != pg_backend_pid();
select pg_terminate_backend(pid) from pg_stat_activity where  query like '%<query text>%' and pid != pg_backend_pid();

4 优化慢SQL
如果这些SQL确实是业务上必需的,则需要对他们做优化,这方面有“三板斧”:
1)对查询涉及的表,执行ANALYZE table_name或VACUUM ANZLYZE table_name,更新表的统计信息,使查询计划更准确。注意,为避免对业务影响,最好在业务低峰执行。
2)执行explain (query text)或explain (buffers true, analyze true, verbose true) (query text)命令,查看SQL的执行计划(注意,前者不会实际执行SQL,后者会实际执行而且能得到详细的执行信息),对其中的Table Scan涉及的表,建立索引。
3)重新编写SQL,去除掉不必要的子查询、改写UNION ALL、使用JOIN CLAUSE固定连接顺序等到,都是进一步深度优化SQL的手段。

5 其它常用SQL
1)那么如何知道有哪些表、哪些进程被锁住了?
可以用如下SQL查出来:

SELECT
	* 
FROM
	pg_locks
	A JOIN pg_class b ON A.relation = b.oid
	JOIN pg_stat_activity C ON A.pid = C.pid 
WHERE
	A.MODE LIKE'%ExclusiveLock%';

杀掉指定表指定锁的进程

SELECT
	pg_cancel_backend ( A.pid ) 
FROM
	pg_locks
	A JOIN pg_class b ON A.relation = b.oid
	JOIN pg_stat_activity C ON A.pid = C.pid 
WHERE
	b.relname ILIKE'表名' 
	AND A.MODE LIKE'%ExclusiveLock%';

--或者使用更加霸道的pg_terminate_backend():
SELECT
	pg_terminate_backend ( A.pid ) 
FROM
	pg_locks
	A JOIN pg_class b ON A.relation = b.oid
	JOIN pg_stat_activity C ON A.pid = C.pid 
WHERE
	b.relname ILIKE'表名' 
	AND A.MODE LIKE'%ExclusiveLock%';

2)查看enqueue锁

SELECT
	pid,
	datname,
	usename,
	wait_event_type,
	wait_event,
	pg_blocking_pids ( pid ) AS blocked_by,
	query 
FROM
	pg_stat_activity 
	WHERE--wait_event IS NOT NULL
	STATE != 'idle' 
	AND STATE != 'idle in transaction' 
	AND STATE != 'idle in transaction (aborted)';
分类: Postgresql 标签: ,
  1. 本文目前尚无任何评论.
  1. 本文目前尚无任何 trackbacks 和 pingbacks.