前言

Github:https://github.com/HealerJean

博客:http://blog.healerjean.com

1、mapper

mapper configuration是用 foreach循环做的,差不多是这样

1.1、代码

<insert id="batchInsert" parameterType="java.util.List">
    insert into USER (id, name) values
    <foreach collection="list" item="model" index="index" separator=","> 
        (#{model.id}, #{model.name})
    </foreach>
</insert>

1.2、解释

这个方法提升批量插入速度的原理是,将传统的:

INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");

转化为:

INSERT INTO `table1` (`field1`, `field2`) 
VALUES ("data1", "data2"),
("data1", "data2"),
("data1", "data2"),
("data1", "data2"),
("data1", "data2");

2、存在的问题

乍看上去这个foreach 没有问题,但是经过项目实践发现,当表的列数较多(20+),以及一次性插入的行数较多(5000+)时,整个插入的耗时十分漫长,达到了14分钟,这是不能忍的。在资料中也提到了一句话:

Of course don't combine ALL of them, if the amount is HUGE. Say you have 1000 rows you need to insert, then don't do it one at a time. You shouldn't equally try to have all 1000 rows in a single query. Instead break it into smaller sizes.

它强调,当插入数量很多时,不能一次性全放在一条语句里。可是为什么不能放在同一条语句里呢?这条语句为什么会耗时这么久呢?我查阅了资料发现:


Insert inside Mybatis foreach is not batch, this is a single (could become giant) SQL statement and that brings drawbacks:

Mybatis foreach内部的Insert不是批处理,这是一个单一的(可能会成为巨大的)SQL语句,这会带来缺点:


some database such as Oracle here does not support.
in relevant cases: there will be a large number of records to insert and the database configured limit (by default around 2000 parameters per statement) will be hit, and eventually possibly DB stack error if the statement itself become too large.

有些数据库(如Oracle)不支持。在相关情况下:将有大量记录要插入,并且将达到数据库配置的限制(默认情况下,每个语句大约有2000个参数),如果语句本身变得太大,最终可能出现数据库堆栈错误。

Iteration over the collection must not be done in the mybatis XML. Just execute a simple Insertstatement in a Java Foreach loop. The most important thing is the session Executor type.

不得在mybatis XML中对集合进行迭代。只需在Javaforeach循环中执行一个简单的Insertstatement。最重要的是会话执行器类型。



Unlike default ExecutorType.SIMPLE, the statement will be prepared once and executed for each record to insert.

与默认类型不同。简单地说,该语句将准备一次,并为每个要插入的记录执行一次。

从资料中可知,默认执行器类型为·Simple·,会为每个语句创建一个新的预处理语句,也就是创建一个·PreparedStatement对象。在我们的项目中,会不停地使用批量插入这个方法,而因为 MyBatis对于含有 <foreach>的语句,无法采用缓存,那么在每次调用方法时,都会重新解析 sql 语句,如果是SQL很大的话,肯定不合理。

image-20211217145259544

所以,如果非要使用 foreach 的方式来进行批量插入的话,可以考虑减少一条 insert 语句中 values 的个数,最好能达到上面曲线的最底部的值,使速度最快。一般按经验来说,一次性插20~50行数量是比较合适的,时间消耗也能接受

2.1、推荐的方案(MI)

重点来了。上面讲的是,如果非要用<foreach>的方式来插入,可以提升性能的方式。而实际上,MyBatis文档中写批量插入的时候,是推荐使用另外一种方法。(可以看 http://www.mybatis.org/mybatis-dynamic-sql/docs/insert.html 中 Batch Insert Support 标题里的内容)

SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH);
try {
    SimpleTableMapper mapper = session.getMapper(SimpleTableMapper.class);
    List<SimpleTableRecord> records = getRecordsToInsert(); // not shown
 
    BatchInsert<SimpleTableRecord> batchInsert = insert(records)
            .into(simpleTable)
            .map(id).toProperty("id")
            .map(firstName).toProperty("firstName")
            .map(lastName).toProperty("lastName")
            .map(birthDate).toProperty("birthDate")
            .map(employed).toProperty("employed")
            .map(occupation).toProperty("occupation")
            .build()
            .render(RenderingStrategy.MYBATIS3);
 
    batchInsert.insertStatements().stream().forEach(mapper::insert);
 
    session.commit();
} finally {
    session.close();
}

即基本思想是将 MyBatis sessionexecutor type 设为 Batch ,然后多次执行插入语句。就类似于JDBC的下面语句一样。

Connection connection = DriverManager.getConnection("jdbc:mysql://127.0.0.1:3306/mydb?useUnicode=true&characterEncoding=UTF-8&useServerPrepStmts=false&rewriteBatchedStatements=true","root","root");
connection.setAutoCommit(false);
PreparedStatement ps = connection.prepareStatement(
        "insert into tb_user (name) values(?)");
for (int i = 0; i < stuNum; i++) {
    ps.setString(1,name);
    ps.addBatch();
}
ps.executeBatch();
connection.commit();
connection.close();

经过试验,5000+使用了 ExecutorType.BATCH 的插入方式,性能显著提升,不到 2s 便能全部插入完成。

总结一下,如果MyBatis需要进行批量插入,推荐使用 ExecutorType.BATCH 的插入方式,如果非要使用 <foreach>的插入的话,需要将每次插入的记录控制在 20~50 左右。

3、批量插入对比

1、使用 SqlSessionFactory,每一批数据执行一次提交

2、使用 mybatis-plus框架的 insert 方法,for 循环,每次执行一次插入

3、使用 ibatis,纯 sql插入

sql 插入的效率最高,sqlsession 次之,mybatis 框架 foreach 插入效率最低。

image-20211217145721330

4、批量插入工具类

Mybatis 内置的 ExecutorType3 种,

⬤ 默认的是 simple,该模式下它为每个语句的执行创建一个新的[预处理]语句,单条提交sql

batch模式重复使用已经预处理的语句,并且批量执行所有更新语句,显然batch性能将更优

dbBatchUtil.batchInsertOrUpdate(
  venderSignOutWhiteConfigs, 
  VenderSignOutWhiteConfigMapper.class, 
  VenderSignOutWhiteConfigMapper::insertOrUpdate);

@Slf4j
@Service
public class DbBatchUtil {

  /**
     * SqlSessionFactory 工厂类
     */
  @Resource
  private SqlSessionFactory batchSqlSessionFactory;

  /**
     * 默认批量大小 1000条
     */
  private final static int BATCH_DEAL_NUM = 1000;


  /**
     * 批量插入记录
     */
  public <M, T> int batchInsert(List<T> list, Class<M> clazz, BiConsumer<M, T> biConsumer) {
    SqlSession batchSession = batchSqlSessionFactory.openSession(ExecutorType.BATCH, false);
    M mapper = batchSession.getMapper(clazz);
    int i = 0;
    try {
      for (int cnt = list.size(); i < cnt; i++) {
        biConsumer.accept(mapper, list.get(i));
        if ((i + 1) % BATCH_DEAL_NUM == 0) {
          batchSession.commit();
        }
      }
      batchSession.commit();
      batchSession.clearCache();
    } catch (Throwable ex) {
      batchSession.rollback();
      throw new RuntimeException(ex);
    } finally {
      batchSession.close();
    }
    return i;
  }


  /**
     * 批量更新记录
     */
  public <M, T> int batchUpdate(List<T> list, Class<M> clazz, BiConsumer<M, T> biConsumer) {
    SqlSession batchSession = batchSqlSessionFactory.openSession(ExecutorType.BATCH, false);
    M mapper = batchSession.getMapper(clazz);
    int i = 0;
    try {
      for (int cnt = list.size(); i < cnt; i++) {
        biConsumer.accept(mapper, list.get(i));
        if ((i + 1) % BATCH_DEAL_NUM == 0) {
          batchSession.commit();
        }
      }
      batchSession.commit();
      batchSession.clearCache();
    } catch (Throwable ex) {
      batchSession.rollback();
      throw new RuntimeException(ex);
    } finally {
      batchSession.close();
    }
    return i;
  }


  /**
     * 批量插入或更新记录
     */
  public <M, T> int batchInsertOrUpdate(List<T> list, Class<M> clazz, BiConsumer<M, T> biConsumerInsertOrUpdate) {
    SqlSession batchSession = batchSqlSessionFactory.openSession(ExecutorType.BATCH, false);
    M mapper = batchSession.getMapper(clazz);
    int i = 0;
    try {
      for (int cnt = list.size(); i < cnt; i++) {
        biConsumerInsertOrUpdate.accept(mapper, list.get(i));
        if ((i + 1) % BATCH_DEAL_NUM == 0) {
          batchSession.commit();
        }
      }
      batchSession.commit();
      batchSession.clearCache();
    } catch (Throwable ex) {
      batchSession.rollback();
      throw new RuntimeException(ex);
    } finally {
      batchSession.close();
    }
    return i;
  }


}

ContactAuthor