MySQL索引原理及慢查询优化

作者：网络转载发布时间：[ 2015/8/11 11:02:30 ] 推荐标签：数据库测试开发技术

　　优化过的语句如下
　　select
　　emp.id
　　from
　　cm_log cl
　　inner join
　　employee emp
　　on cl.ref_table = 'Employee'
　　and cl.ref_oid = emp.id
　　where
　　cl.last_upd_date >='2013-11-07 15:03:00'
　　and cl.last_upd_date<='2013-11-08 16:00:00'
　　and emp.is_deleted = 0
　　union
　　select
　　emp.id
　　from
　　cm_log cl
　　inner join
　　emp_certificate ec
　　on cl.ref_table = 'EmpCertificate'
　　and cl.ref_oid = ec.id
　　inner join
　　employee emp
　　on emp.id = ec.emp_id
　　where
　　cl.last_upd_date >='2013-11-07 15:03:00'
　　and cl.last_upd_date<='2013-11-08 16:00:00'
　　and emp.is_deleted = 0
　　4.不需要了解业务场景，只需要改造的语句和改造之前的语句保持结果一致
　　5.现有索引可以满足，不需要建索引
　　6.用改造后的语句实验一下，只需要10ms 降低了近200倍！
　　+----+--------------+------------+--------+---------------------------------+-------------------+---------+-----------------------+------+-------------+
　　| id | select_type | table      | type   | possible_keys                   | key               | key_len | ref                   | rows | Extra       |
　　+----+--------------+------------+--------+---------------------------------+-------------------+---------+-----------------------+------+-------------+
　　| 1 | PRIMARY      | cl         | range | cm_log_cls_id，idx_last_upd_date | idx_last_upd_date | 8       | NULL                  | 379 | Using where |
　　| 1 | PRIMARY      | emp        | eq_ref | PRIMARY                         | PRIMARY           | 4       | meituanorg.cl.ref_oid |    1 | Using where |
　　| 2 | UNION        | cl         | range | cm_log_cls_id，idx_last_upd_date | idx_last_upd_date | 8       | NULL                  | 379 | Using where |
　　| 2 | UNION        | ec         | eq_ref | PRIMARY，emp_certificate_empid   | PRIMARY           | 4       | meituanorg.cl.ref_oid |    1 |             |
　　| 2 | UNION        | emp        | eq_ref | PRIMARY                         | PRIMARY           | 4       | meituanorg.ec.emp_id |    1 | Using where |
　　| NULL | UNION RESULT | <union1，2> | ALL    | NULL                            | NULL              | NULL    | NULL                  | NULL |             |
　　+----+--------------+------------+--------+---------------------------------+-------------------+---------+-----------------------+------+-------------+
　　明确应用场景
　　举这个例子的目的在于颠覆我们对列的区分度的认知，一般上我们认为区分度越高的列，越容易锁定更少的记录，但在一些特殊的情况下，这种理论是有局限性的
　　select
　　*
　　from
　　stage_poi sp
　　where
　　sp.accurate_result=1
　　and (
　　sp.sync_status=0
　　or sp.sync_status=2
　　or sp.sync_status=4
　　);
　　0.先看看运行多长时间，951条数据6.22秒，真的很慢
　　951 rows in set (6.22 sec)
　　1.先explain，rows达到了361万，type = ALL表明是全表扫描
　　+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
　　| id | select_type | table | type | possible_keys | key | key_len | ref | rows    | Extra       |
　　+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
　　| 1 | SIMPLE      | sp    | ALL | NULL          | NULL | NULL    | NULL | 3613155 | Using where |
　　+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
　　2.所有字段都应用查询返回记录数，因为是单表查询 0已经做过了951条
　　3.让explain的rows 尽量逼近951
　　看一下accurate_result = 1的记录数
　　select count(*)，accurate_result from stage_poi group by accurate_result;
　　+----------+-----------------+
　　| count(*) | accurate_result |
　　+----------+-----------------+
　　|     1023 |              -1 |
　　| 2114655 |               0 |
　　|   972815 |               1 |
　　+----------+-----------------+
　　我们看到accurate_result这个字段的区分度非常低，整个表只有-1，0，1三个值，加上索引也无法锁定特别少量的数据
　　再看一下sync_status字段的情况
　　select count(*)，sync_status from stage_poi group by sync_status;
　　+----------+-------------+
　　| count(*) | sync_status |
　　+----------+-------------+
　　|     3080 |           0 |
　　| 3085413 |           3 |
　　+----------+-------------+
　　同样的区分度也很低，根据理论，也不适合建立索引
　　问题分析到这，好像得出了这个表无法优化的结论，两个列的区分度都很低，即便加上索引也只能适应这种情况，很难做普遍性的优化，比如当sync_status 0、3分布的很平均，那么锁定记录也是百万级别的
　　4.找业务方去沟通，看看使用场景。业务方是这么来使用这个SQL语句的，每隔五分钟会扫描符合条件的数据，处理完成后把sync_status这个字段变成1，五分钟符合条件的记录数并不会太多，1000个左右。了解了业务方的使用场景后，优化这个SQL变得简单了，因为业务方保证了数据的不平衡，如果加上索引可以过滤掉绝大部分不需要的数据
　　5.根据建立索引规则，使用如下语句建立索引
　　alter table stage_poi add index idx_acc_status(accurate_result，sync_status);
　　6.观察预期结果，发现只需要200ms，快了30多倍。
　　952 rows in set (0.20 sec)
　　我们再来回顾一下分析问题的过程，单表查询相对来说比较好优化，大部分时候只需要把where条件里面的字段依照规则加上索引好，如果只是这种“无脑”优化的话，显然一些区分度非常低的列，不应该加索引的列也会被加上索引，这样会对插入、更新性能造成严重的影响，同时也有可能影响其它的查询语句。所以我们第4步调差SQL的使用场景非常关键，我们只有知道这个业务场景，才能更好地辅助我们更好的分析和优化查询语句。
　　无法优化的语句
　　select
　　c.id，
　　c.name，
　　c.position，
　　c.sex，
　　c.phone，
　　c.office_phone，
　　c.feature_info，
　　c.birthday，
　　c.creator_id，
　　c.is_keyperson，
　　c.giveup_reason，
　　c.status，
　　c.data_source，
　　from_unixtime(c.created_time) as created_time，
　　from_unixtime(c.last_modified) as last_modified，
　　c.last_modified_user_id
　　from
　　contact c
　　inner join
　　contact_branch cb
　　on c.id = cb.contact_id
　　inner join
　　branch_user bu
　　on cb.branch_id = bu.branch_id
　　and bu.status in (
　　1，
　　2)
　　inner join
　　org_emp_info oei
　　on oei.data_id = bu.user_id
　　and oei.node_left >= 2875
　　and oei.node_right <= 10802
　　and oei.org_category = - 1
　　order by
　　c.created_time desc limit 0 ，
　　10;
　　还是几个步骤
　　0.先看语句运行多长时间，10条记录用了13秒，已经不可忍受
　　10 rows in set (13.06 sec)
　　1.explain
　　+----+-------------+-------+--------+-------------------------------------+-------------------------+---------+--------------------------+------+----------------------------------------------+
　　| id | select_type | table | type   | possible_keys                       | key                     | key_len | ref                      | rows | Extra                                        |
　　+----+-------------+-------+--------+-------------------------------------+-------------------------+---------+--------------------------+------+----------------------------------------------+
　　| 1 | SIMPLE      | oei   | ref    | idx_category_left_right，idx_data_id | idx_category_left_right | 5       | const                    | 8849 | Using where; Using temporary; Using filesort |
　　| 1 | SIMPLE      | bu    | ref    | PRIMARY，idx_userid_status           | idx_userid_status       | 4       | meituancrm.oei.data_id   |   76 | Using where; Using index                     |
　　| 1 | SIMPLE      | cb    | ref    | idx_branch_id，idx_contact_branch_id | idx_branch_id           | 4       | meituancrm.bu.branch_id |    1 |                                              |
　　| 1 | SIMPLE      | c     | eq_ref | PRIMARY                             | PRIMARY                 | 108     | meituancrm.cb.contact_id |    1 |                                              |
　　+----+-------------+-------+--------+-------------------------------------+-------------------------+---------+--------------------------+------+----------------------------------------------+
　　从执行计划上看，mysql先查org_emp_info表扫描8849记录，再用索引idx_userid_status关联branch_user表，再用索引idx_branch_id关联contact_branch表，后主键关联contact表。

本文内容不用于商业目的，如涉及知识产权问题，请权利人联系SPASVO小编(021-61079698-8054)，我们将立即处理，马上删除。