简述

最近在一个群里看到join和in的争论，到底什么时候用in，什么时候用join，我做了实验来验证

预备知识

join type介绍

join type并非指join类型，而是查询类型 explain sql后的type字段正是join type Mysql中有以下join type，性能从高到底为： * system : 系统常量表 * const : 常量表 * eq_ref : 主键或者唯一索引 * ref : 非空索引 * ref_or_null : 可空索引 * range :索引列使用 BETWEEN, IN, >=, LIKE等范围查询 * index : 循序扫描索引列 * ALL : 顺序扫描整张表

详见：https://dev.mysql.com/doc/internals/en/optimizer-determining-join-type.html

in和or等效

在Mysql中，这两个语句是等效的，in查询会被转换为or

column1 IN (1,2,3)
column1 = 1 OR column1 = 2 OR column1 = 3

详见：https://dev.mysql.com/doc/internals/en/optimizer-range-join-type.html

Mysql的Join算法（NLJ）

Mysql的join使用的NLJ算法（内循环join算法）假设三个表之间的连接 t1，t2以及 t3是使用以下类型的连接来执行：

Table   Join Type
t1      range
t2      ref
t3      ALL

如果使用简单的NLJ算法，则会按如下方式处理连接：

for each row in t1 matching range {
  for each row in t2 matching reference key {
    for each row in t3 {
      if row satisfies join conditions, send to client
    }
  }
}

Mysql使用Block Nested-Loop (BNL)进行处理，也就是join_buffer缓存联合数据，join_buffer_size 控制此项

for each row in t1 matching range {
  for each row in t2 matching reference key {
    store used columns from t1, t2 in join buffer
    if buffer is full {
      for each row in t3 {
        for each t1, t2 combination in join buffer {
          if row satisfies join conditions, send to client
        }
      }
      empty join buffer
    }
  }
}

if buffer is not empty {
  for each row in t3 {
    for each t1, t2 combination in join buffer {
      if row satisfies join conditions, send to client
    }
  }
}

join_buffer_size 详见：https://dev.mysql.com/doc/refman/8.0/en/nested-loop-joins.html

join优化： https://dev.mysql.com/doc/refman/8.0/en/nested-join-optimization.html

实测数据

使用Mysql5.7测试

tag_count:3000,post_count:300000,tag_post_count:300000

多对多，使用where in ，子查询
sql:select * from posts where id in (select post_id from tag_post where tag_id in (select id from tags where type = "3" ) )
total:57653
cost:414.41ms

多对多，拆分成3个查询，分别执行
total tag_id:573
total post_id:57653
total:57653
cost:446.82ms

多对多，使用join
sql:select * from posts left join tag_post on posts.id = tag_post.post_id left join tags on tags.id = tag_post.tag_id where tags.type = "3"
total:57653
cost:295.14ms

一对多，使用子查询
sql:select * from posts where tag_id in (select id from tags where type=3)
total:57100
cost:260.81ms

一对多，使用in，并且拆分成两个查询
total tag_id:573
total:57100
cost:219.88ms

一对多，使用join
sql:select * from posts left join tags on tags.id = posts.tag_id where tags.type = "3"
total:57100
cost:223.49ms

结论

in不慢，即使是上万的id。in会被mysql转换为or。
in子查询比join慢，子查询有时无法正常使用索引
表记录越多，join越慢，如果是特别大的表，用in会快很多
拆分查询很有帮助，不过要注意IO固定开销

GargantuaX

简述

预备知识

join type介绍

in和or等效

Mysql的Join算法（NLJ）

实测数据

结论