Postgresql分析慢sql

i查拉图斯特拉如是说 2023-03-28 原文

现象

突然发现测试环境一条慢sql,就想着分析一下，写写总结。说到优化其实我个人认为是不到不得已还是没有必要的，毕竟除非特别重大的问题，影响了基本操作和体验，平时还是基本的配置也够了，就像《重构：改善代码的既有设计》当你闻到了代码的坏味道才需要重构，毕竟如果一个项目用户量小，并发不高，其实优化跟不优化差距差不了多少，而且有可能改着改着，新的bug就有产生了。但大的项目就不一样了，一点点小小的优化就是指数级别的收益，就像你跑步一样，跑个5公里，10公里，什么都可以不在乎，但是你跑个50公里，衣服鞋子肺活量身体素质，这些都是要考虑的范围。突然让我想起最近边跑马拉松边抽烟那哥们，简直违反生物学，牛顿看了都会直呼惊呆了?。

select fo_id, fo.fund_account, fo.order_type, fo.trade_id, fo.order_id, fo.symbol, fo.trade_type, fo.side,
fo.quantity, fo.price, fo.commission, fo.mac_address, fo.trade_time, bo.price book_price, sp.symbol_name, sp.exchange
from ccc  fo   JOIN aaa sp ON fo.symbol = sp.symbol
JOIN bbb bo ON bo.order_status = '1' and bo.order_id = fo.order_id and bo.cl_order_id = fo.cl_order_id
where 1 = 1 and fo.trade_time >= 1609430400000  and fo.trade_time <= 1702655999999
and fo.broker_account =  any('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::varchar[]
)
ORDER BY fo.trade_time DESC, fo.fund_account, fo.symbol   LIMIT 30 OFFSET 0

言归正传，当我看到这条sql的时候,我查了一下发现sql执行用了12s,顿时有点惊呆了，一般的sql大概超过2s就应该优化了，好了我们来分析一下吧。

分析

拿到sql我就想看看表数据量多少


select count(*) from aaa;
6945

select count(*) from bbb;
558729

select count(*) from ccc;
798024

从表数据上面来看，50多万的数据还好，不至于这么差，接下来好好分析一下。第一个点，但从sql上面我就发现一个点不合理，我之前也喜欢用 where 1=1觉得后面就是一个条件true，直到后来经过跟别人讨论，有一种可能SQL解析会认为1是一个属性名，完了去表里面找这样就跟写SQL背到而驰了，我们理解可能是认为他就是TRUE，但是回到SQL解析上面又差别不大，去掉1=1之后发现运行速度快了3秒，从某种程度来说还是会影响SQL的执行效率，而且从多表拼接的SQL上面确实发现啊了200毫秒，单表可能看不出什么差距。

explain analyze select * from ccc where 1= 1 limit 10;
Limit  (cost=0.00..0.35 rows=10 width=289) (actual time=0.017..0.020 rows=10 loops=1)
  ->  Seq Scan on ccc  (cost=0.00..27920.00 rows=798300 width=289) (actual time=0.016..0.018 rows=10 loops=1)
Planning time: 0.150 ms
Execution time: 0.057 ms

explain analyze select * from ccc limit 10;
Limit  (cost=0.00..0.35 rows=10 width=289) (actual time=0.020..0.023 rows=10 loops=1)
  ->  Seq Scan on ccc  (cost=0.00..27920.00 rows=798300 width=289) (actual time=0.018..0.019 rows=10 loops=1)
Planning time: 0.173 ms
Execution time: 0.062 ms

这是当天的SQL的执行计划和索引，我发现几个神奇的现象：

PG数据的ANY居然比常用的in还要好，原因竟然是走ANY不需要使用哈希表，我是看到大佬的文章得出的结论。
如果没有匹配到索引ORDER BY的运行效率会变得非常慢，如果匹配到了索引那么速度就会非常快。

create unique index idx_aaa_symbol
    on aaa (symbol);
-----
create index idx_bbb_place_time
    on bbb (place_time);

create index idx_bbb_broker_account
    on bbb (broker_account);
-----
create index idx_ccc_trade_time
    on ccc (trade_time);

create index idx_ccc_broker_account
    on ccc (broker_account);
------

explain (ANALYZE,BUFFERS)
    select fo.fo_id, fo.fund_account, fo.order_type, fo.trade_id, fo.order_id,
       fo.symbol, fo.trade_type, fo.side,
       fo.quantity, fo.price, fo.commission, fo.mac_address, fo.trade_time
       ,bo.price book_price
--        ,sp.symbol_name, sp.exchange
from ccc  fo
--          JOIN aaa sp ON fo.symbol = sp.symbol
         JOIN bbb bo ON bo.order_status = '1'
--                                        and bo.order_id = fo.order_id
                                       and bo.cl_order_id = fo.cl_order_id
where
    fo.trade_time < 1702655999999
       and fo.trade_time > 1609430400000
--     fo.trade_time = 1669704603470
--     and fo.fo_id = 1007811
--   and fo.broker_account in ('0014510913T0','0014510911As','0014510911Th','0014510911Ts','0014510913Alpha','100000001Ts','100000001As','100000001Th','0014510910As','0014510910Ts','0014510910Th','0014510921Th','0014510921Ts','0014510219Th','0014510219As',
--                             '0014510921Ah','0014510219Ts','0014510921As','16654T0','16654Alpha','0321080000016656Th','0321080000016656Ts','0321080000016656As','0014510920Th','0014510920Ts','0014510920As','0014510910Ah','100000001Ah','0014510911Ah','0311020001234597Ah','0014510219Ah','0321080000016656Ah','0014510920Ah','0014510909T0','0014510909Alpha')
    and fo.broker_account =  any('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::varchar[]
        )
  ORDER BY fo.trade_time DESC
  , fo.fund_account ASC
  , fo.symbol
LIMIT 30 OFFSET 0;

进行了字段排序sort，但是ANY却不匹配索引了，所以进行了全表检索，执行效率非常的慢。

Limit  (cost=1483726.37..1483726.45 rows=30 width=89) (actual time=3398.207..3398.213 rows=30 loops=1)
"  Buffers: shared hit=2123 read=42246 dirtied=5, temp read=11657 written=11687"
  ->  Sort  (cost=1483726.37..1554119.82 rows=28157379 width=89) (actual time=3398.205..3398.208 rows=30 loops=1)
"        Sort Key: fo.trade_time DESC, fo.fund_account, fo.symbol"
        Sort Method: top-N heapsort  Memory: 37kB
"        Buffers: shared hit=2123 read=42246 dirtied=5, temp read=11657 written=11687"
        ->  Merge Join  (cost=226264.61..652113.59 rows=28157379 width=89) (actual time=2051.812..2788.036 rows=1345806 loops=1)
              Merge Cond: (fo.cl_order_id = bo.cl_order_id)
"              Buffers: shared hit=2123 read=42246 dirtied=5, temp read=11657 written=11687"
              ->  Sort  (cost=134482.56..135779.65 rows=518838 width=89) (actual time=1436.513..1584.291 rows=512740 loops=1)
                    Sort Key: fo.cl_order_id
                    Sort Method: external merge  Disk: 56464kB
"                    Buffers: shared hit=1113 read=16319 dirtied=2, temp read=10028 written=10050"
                    ->  Seq Scan on ccc fo  (cost=0.00..58628.56 rows=518838 width=89) (actual time=0.069..882.202 rows=512740 loops=1)
"                          Filter: ((trade_time < '1702655999999'::numeric) AND (trade_time > '1609430400000'::numeric) AND ((broker_account)::text = ANY ('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::text[])))"
                          Rows Removed by Filter: 188246
                          Buffers: shared hit=1113 read=16319 dirtied=2
              ->  Materialize  (cost=91406.68..93963.41 rows=511346 width=16) (actual time=613.979..813.987 rows=1521084 loops=1)
"                    Buffers: shared hit=1010 read=25927 dirtied=3, temp read=1629 written=1637"
                    ->  Sort  (cost=91406.68..92685.04 rows=511346 width=16) (actual time=613.975..695.387 rows=510968 loops=1)
                          Sort Key: bo.cl_order_id
                          Sort Method: external merge  Disk: 13032kB
"                          Buffers: shared hit=1010 read=25927 dirtied=3, temp read=1629 written=1637"
                          ->  Seq Scan on bbb bo  (cost=0.00..34181.50 rows=511346 width=16) (actual time=0.013..361.825 rows=510968 loops=1)
                                Filter: (order_status = '1'::bpchar)
                                Rows Removed by Filter: 67455
                                Buffers: shared hit=1010 read=25927 dirtied=3
Planning time: 0.879 ms
Execution time: 2828.213 ms

没有进行字段排序no sort，但是却走了索引，这可能是ANY的一大特点，，所以从执行结果来看效果非常明显。所以我一直陷入一个怪圈，想着把ANY的多字符匹配给去掉，换成一种直接匹配的方式会好很多，可是多字符匹配哪里那么容易改变，然后就查资料查文档。突然我想到阿里的葵花宝典还是在哪里记得，我看过这么一句话，外键要加索引，但是看了文档却没有发现就不了了之了，之后又被其他的事情耽误了。

Limit  (cost=0.42..2866.61 rows=30 width=89) (actual time=27.068..400.315 rows=30 loops=1)
  Buffers: shared hit=36576
  ->  Nested Loop  (cost=0.42..2690142553.60 rows=28157379 width=89) (actual time=27.066..400.294 rows=30 loops=1)
        Buffers: shared hit=36576
        ->  Seq Scan on bbb bo  (cost=0.00..34181.50 rows=511346 width=16) (actual time=0.012..0.063 rows=13 loops=1)
              Filter: (order_status = '1'::bpchar)
              Buffers: shared hit=1
        ->  Index Scan using ccc_order_id_cl_order_id_index on ccc fo  (cost=0.42..5260.72 rows=12 width=89) (actual time=25.727..30.779 rows=2 loops=13)
              Index Cond: (cl_order_id = bo.cl_order_id)
"              Filter: ((trade_time < '1702655999999'::numeric) AND (trade_time > '1609430400000'::numeric) AND ((broker_account)::text = ANY ('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::text[])))"
              Buffers: shared hit=36575
Planning time: 0.761 ms
Execution time: 400.389 ms

过几天我再来看的时候发现执行效率恢复了正常，细看发现，只是改动了几个索引，果然验证了那个猜想，外键加索引瞬间快到飞起。这是修改索引之后的执行结果


--- ADD
create index bbb_cl_order_id_order_id_order_status_index
    on bbb (cl_order_id, order_id, order_status);
--- UDATE
create index idx_ccc_trade_time_fund_account_symbol
    on ccc (trade_time desc, fund_account asc, symbol asc);
--- ADD
create index idx_ccc_cl_order_id_order_id
    on ccc (cl_order_id, order_id);
----


explain  ANALYZE select fo.fo_id, fo.fund_account, fo.order_type, fo.trade_id, fo.order_id, fo.symbol, fo.trade_type, fo.side,
       fo.quantity, fo.price, fo.commission, fo.mac_address, fo.trade_time, bo.price book_price, sp.symbol_name, sp.exchange
from ccc  fo   JOIN aaa sp ON fo.symbol = sp.symbol
                            JOIN bbb bo ON bo.order_id = fo.order_id and bo.cl_order_id = fo.cl_order_id

where 1=1 AND bo.order_status = '1' and  fo.trade_time >= 1609430400000  and fo.trade_time <= 1702655999999
  and fo.broker_account =  any('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::varchar[]
    )
ORDER BY fo.trade_time DESC
       , fo.fund_account, fo.symbol
LIMIT 30 OFFSET 0;


Limit  (cost=1001.16..5150.43 rows=30 width=106) (actual time=95.097..119.294 rows=30 loops=1)
  ->  Gather Merge  (cost=1001.16..258255.92 rows=1860 width=106) (actual time=95.094..119.286 rows=30 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Nested Loop  (cost=1.13..257041.20 rows=775 width=106) (actual time=4.812..21.327 rows=15 loops=3)
              ->  Nested Loop  (cost=0.85..256808.08 rows=775 width=90) (actual time=4.756..21.130 rows=15 loops=3)
                    ->  Parallel Index Scan using idx_ccc_trade_time_fund_account_symbol on ccc fo  (cost=0.42..95331.87 rows=230396 width=90) (actual time=4.706..20.921 rows=9 loops=3)
                          Index Cond: ((trade_time >= '1609430400000'::numeric) AND (trade_time <= '1702655999999'::numeric))
"                          Filter: ((broker_account)::text = ANY ('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::text[]))"
                          Rows Removed by Filter: 11298
                    ->  Index Scan using bbb_cl_order_id_order_id_order_status_index on bbb bo  (cost=0.42..0.69 rows=1 width=21) (actual time=0.018..0.020 rows=2 loops=26)
                          Index Cond: ((cl_order_id = fo.cl_order_id) AND ((order_id)::text = (fo.order_id)::text) AND (order_status = '1'::bpchar))
              ->  Index Scan using idx_aaa_symbol on aaa sp  (cost=0.28..0.30 rows=1 width=23) (actual time=0.011..0.011 rows=1 loops=46)
                    Index Cond: ((symbol)::text = (fo.symbol)::text)
Planning time: 1.990 ms
Execution time: 119.455 ms

总结

越来越觉得PG数据库，有点像半自动，很多东西需要自己去配置，不像mysql那么智能，放到上面基本没有什么SQL上面的问题。总的来说还是蛮有收获的，外键加索引，查看执行计划看有没有走索引，索引其实也是一把双刃剑，加快了查询的速度，却增加了新增修改的速度，就像达摩克之剑需要小心谨慎使用。

参考

PostgreSQL in 与 = any 的SQL语法异同与性能优化

Postgresql sql-explain

慢sql查询

有关Postgresql分析慢sql的更多相关文章

Hive SQL 五大经典面试题 - 2
目录第1题连续问题分析：解法：第2题分组问题分析：解法：第3题间隔连续问题分析：解法：第4题打折日期交叉问题分析：解法：第5题同时在线问题分析：解法：第1题连续问题如下数据为蚂蚁森林中用户领取的减少碳排放量iddtlowcarbon10012021-12-1212310022021-12-124510012021-12-134310012021-12-134510012021-12-132310022021-12-144510012021-12-1423010022021-12-154510012021-12-1523.......找出连续3天及以上减少碳排放量在100以上的用户分析：遇到这类
sql - 查询忽略时间戳日期的时间范围 - 2
我正在尝试查询我的Rails数据库(Postgres)中的购买表，我想查询时间范围。例如，我想知道在所有日期的下午2点到3点之间进行了多少次购买。此表中有一个created_at列，但我不知道如何在不搜索特定日期的情况下完成此操作。我试过:Purchases.where("created_atBETWEEN?and?",Time.now-1.hour,Time.now)但这最终只会搜索今天与那些时间的日期。最佳答案您需要使用PostgreSQL'sdate_part/extractfunction从created_at中提取小时
ruby-on-rails - 如何让 datamapper 与 postgresql 数据库一起工作？ - 2
我已经找到了几个使用datamapper的示例，并且能够让它们正常工作。不过，所有这些示例都是针对sqlite数据库的。我正在尝试将数据映射器与postgresql一起使用。我将datamapper中的调用从sqlite3更改为postgres，并且我已经安装了dm-postgres-adapter。但它仍然不起作用。我还需要做什么？最佳答案与SQLite不同，PostgreSQL不将数据库存储在单个文件中。在你拥有createdyourdatabase之后，尝试这样的事情:DataMapper.setup:default,{:
sql - 在 Rails Console for PostgreSQL 的表中显示数据 - 2
我找到了这样的东西:Rails:Howtolistdatabasetables/objectsusingtheRailsconsole?这一行没问题:ActiveRecord::Base.connection.tables并返回所有表但是ActiveRecord::Base.connection.table_structure("users")产生错误:ActiveRecord::Base.connection.table_structure("projects")我认为table_structure不是Postgres方法。如何列出Postgres数据库的Rails控制台中表中的所有
ruby - 防止SQL注入(inject)/好的Ruby方法 - 2
Ruby中防止SQL注入(inject)的好方法是什么？最佳答案直接使用ruby？使用准备好的语句:require'mysql'db=Mysql.new('localhost','user','password','database')statement=db.prepare"SELECT*FROMtableWHEREfield=?"statement.execute'value'statement.fetchstatement.close 关于ruby-防止SQL注入(inject
ruby-on-rails - Rails 迁移中的 PostgreSQL 点类型 - 2
我想使用PostgreSQL中的point类型。我已经完成了:railsgmodelTestpoint:point最终的迁移是:classCreateTests当我运行时:rakedb:migrate结果是:==CreateTests:migrating====================================================--create_table(:tests)rakeaborted!Anerrorhasoccurred,thisandalllatermigrationscanceled:undefinedmethod`point'for#/hom
ruby-on-rails - Ruby on Rails 单表继承(STI)和单元测试问题(使用 PostgreSQL) - 2
我正在使用带有单个“帐户”表的STI模型来保存用户和技术人员的信息(即用户...8)错误:test_the_truth(用户测试):ActiveRecord::StatementInvalid:PGError:ERROR:关系“技术人员”不存在:从“技术人员”中删除...从本质上讲，标准框架不承认Technicians和Users表(或PostgreSQL称它们为“关系”)不存在，事实上，应该别名为Accounts。有什么想法吗？我对RoR比较陌生，不知道如何解决这个问题而又不完全删除STI。最佳答案原来问题是由于存在:./te
ruby - rails 3.2.2(或 3.2.1)+ Postgresql 9.1.3 + Ubuntu 11.10 连接错误 - 2
我正在使用PostgreSQL9.1.3(x86_64-pc-linux-gnu上的PostgreSQL9.1.3，由gcc-4.6.real(Ubuntu/Linaro4.6.1-9ubuntu3)4.6.1，64位编译)和在ubuntu11.10上运行3.2.2或3.2.1。现在，我可以使用以下命令连接PostgreSQLsupostgres输入密码我可以看到postgres=#我将以下详细信息放在我的config/database.yml中并执行“railsdb”，它工作正常。开发:adapter:postgresqlencoding:utf8reconnect:falsedat
ruby-on-rails - 如何在 Rails 中的不同数据库上执行直接 SQL 代码 - 2
我正在编写一个Rails应用程序，它将监视某些特定数据库的数据质量。为了做到这一点，我需要能够对这些数据库执行直接SQL查询——这当然与用于驱动Rails应用程序模型的数据库不同。简而言之，这意味着我无法使用通过ActiveRecord基础连接的技巧。我需要连接的数据库在设计时是未知的(即:我不能将它们的详细信息放在database.yaml中)。相反，我有一个模型“database_details”，用户将使用它来输入应用程序将在运行时执行查询的数据库的详细信息。因此与这些数据库的连接实际上是动态的，细节仅在运行时解析。最佳答案
建模分析 | 平面2R机器人(二连杆)运动学与动力学建模(附Matlab仿真) - 2
目录0专栏介绍1平面2R机器人概述2运动学建模2.1正运动学模型2.2逆运动学模型2.3机器人运动学仿真3动力学建模3.1计算动能3.2势能计算与动力学方程3.3动力学仿真0专栏介绍?附C++/Python/Matlab全套代码?课程设计、毕业设计、创新竞赛必备！详细介绍全局规划(图搜索、采样法、智能算法等)；局部规划(DWA、APF等)；曲线优化(贝塞尔曲线、B样条曲线等)。?详情：图解自动驾驶中的运动规划(MotionPlanning)，附几十种规划算法1平面2R机器人概述如图1所示为本文的研究本体——平面2R机器人。对参数进行如下定义：机器人广义坐标

Postgresql分析慢sql

现象

分析

总结

参考

有关Postgresql分析慢sql的更多相关文章

随机推荐