MySQL索引原理及慢查询优化

作者：网络转载发布时间：[ 2015/8/11 11:02:30 ] 推荐标签：数据库测试开发技术

　　rows返回的都非常少，看不到有什么异常情况。我们在看一下语句，发现后面有order by + limit组合，会不会是排序量太大搞的？于是我们简化SQL，去掉后面的order by 和 limit，看看到底用了多少记录来排序
　　select
　　count(*)
　　from
　　contact c
　　inner join
　　contact_branch cb
　　on c.id = cb.contact_id
　　inner join
　　branch_user bu
　　on cb.branch_id = bu.branch_id
　　and bu.status in (
　　1，
　　2)
　　inner join
　　org_emp_info oei
　　on oei.data_id = bu.user_id
　　and oei.node_left >= 2875
　　and oei.node_right <= 10802
　　and oei.org_category = - 1
　　+----------+
　　| count(*) |
　　+----------+
　　|   778878 |
　　+----------+
　　1 row in set (5.19 sec)
　　发现排序之前居然锁定了778878条记录，如果针对70万的结果集排序，将是灾难性的，怪不得这么慢，那我们能不能换个思路，先根据contact的created_time排序，再来join会不会比较快呢？
　　于是改造成下面的语句，也可以用straight_join来优化
　　select
　　c.id，
　　c.name，
　　c.position，
　　c.sex，
　　c.phone，
　　c.office_phone，
　　c.feature_info，
　　c.birthday，
　　c.creator_id，
　　c.is_keyperson，
　　c.giveup_reason，
　　c.status，
　　c.data_source，
　　from_unixtime(c.created_time) as created_time，
　　from_unixtime(c.last_modified) as last_modified，
　　c.last_modified_user_id
　　from
　　contact c
　　where
　　exists (
　　select
　　1
　　from
　　contact_branch cb
　　inner join
　　branch_user bu
　　on cb.branch_id = bu.branch_id
　　and bu.status in (
　　1，
　　2)
　　inner join
　　org_emp_info oei
　　on oei.data_id = bu.user_id
　　and oei.node_left >= 2875
　　and oei.node_right <= 10802
　　and oei.org_category = - 1
　　where
　　c.id = cb.contact_id
　　)
　　order by
　　c.created_time desc limit 0 ，
　　10;
　　验证一下效果预计在1ms内，提升了13000多倍！
　　10 rows in set (0.00 sec)
　　本以为至此大工告成，但我们在前面的分析中漏了一个细节，先排序再join和先join再排序理论上开销是一样的，为何提升这么多是因为有一个limit！大致执行过程是：mysql先按索引排序得到前10条记录，然后再去join过滤，当发现不够10条的时候，再次去10条，再次join，这显然在内层join过滤的数据非常多的时候，将是灾难的，极端情况，内层一条数据都找不到，mysql还傻乎乎的每次取10条，几乎遍历了这个数据表！
　　用不同参数的SQL试验下
　　select
　　sql_no_cache   c.id，
　　c.name，
　　c.position，
　　c.sex，
　　c.phone，
　　c.office_phone，
　　c.feature_info，
　　c.birthday，
　　c.creator_id，
　　c.is_keyperson，
　　c.giveup_reason，
　　c.status，
　　c.data_source，
　　from_unixtime(c.created_time) as created_time，
　　from_unixtime(c.last_modified) as last_modified，
　　c.last_modified_user_id
　　from
　　contact c
　　where
　　exists (
　　select
　　1
　　from
　　contact_branch cb
　　inner join
　　branch_user bu
　　on cb.branch_id = bu.branch_id
　　and bu.status in (
　　1，
　　2)
　　inner join
　　org_emp_info oei
　　on oei.data_id = bu.user_id
　　and oei.node_left >= 2875
　　and oei.node_right <= 2875
　　and oei.org_category = - 1
　　where
　　c.id = cb.contact_id
　　)
　　order by
　　c.created_time desc limit 0 ，
　　10;
　　Empty set (2 min 18.99 sec)
　　2 min 18.99 sec！比之前的情况还糟糕很多。由于mysql的nested loop机制，遇到这种情况，基本是无法优化的。这条语句终也只能交给应用系统去优化自己的逻辑了。
　　通过这个例子我们可以看到，并不是所有语句都能优化，而往往我们优化时，由于SQL用例回归时落掉一些极端情况，会造成比原来还严重的后果。所以，第一：不要指望所有语句都能通过SQL优化，第二：不要过于自信，只针对具体case来优化，而忽略了更复杂的情况。
　　慢查询的案例分析到这儿，以上只是一些比较典型的案例。我们在优化过程中遇到过超过1000行，涉及到16个表join的“垃圾SQL”，也遇到过线上线下数据库差异导致应用直接被慢查询拖死，也遇到过varchar等值比较没有写单引号，还遇到过笛卡尔积查询直接把从库搞死。再多的案例其实也只是一些经验的积累，如果我们熟悉查询优化器、索引的内部原理，那么分析这些案例变得特别简单了。
　　写在后面的话
　　本文以一个慢查询案例引入了MySQL索引原理、优化慢查询的一些方法论;并针对遇到的典型案例做了详细的分析。其实做了这么长时间的语句优化后才发现，任何数据库层面的优化都抵不上应用系统的优化，同样是MySQL，可以用来支撑Google/FaceBook/Taobao应用，但可能连你的个人网站都撑不住。套用近比较流行的话：“查询容易，优化不易，且写且珍惜！”
　　参考
　　参考文献如下：
　　1.《高性能MySQL》
　　2.《数据结构与算法分析》

本文内容不用于商业目的，如涉及知识产权问题，请权利人联系SPASVO小编(021-61079698-8054)，我们将立即处理，马上删除。