Background
  After Spark finishes computing over the target data, results in RDD form are usually stored in HDFS, Hive, or HBase; non-RDD data, on the other hand, often goes into a traditional RDBMS such as MySQL.
  The write path, however, trips people up constantly; Stack Overflow has plenty of threads on it:
  - Error writing spark dataframe to mysql table
  - JDBC batch insert performance
  Some other relevant posts:
  - Using Apache Spark and MySQL for Data Analysis
  - Spark 1.3.0: writing DataFrame data into Hive partitioned tables
  - Four ways for Spark to read from a database (MySQL), explained in detail
  - Complete JDBC connection code and steps in Java development
  - Spark pitfalls: databases (HBase + MySQL)
  RDD
  Spark SQL reads and writes MySQL data over a JDBC connection
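  For reference, a minimal sketch of this route with Spark 2.x's SparkSession (the host, port, database, table, and credentials reuse the hypothetical setup from the non-RDD example below):
  import java.util.Properties
  import org.apache.spark.sql.{SaveMode, SparkSession}

  val spark = SparkSession.builder().appName("mysql-jdbc-demo").getOrCreate()
  import spark.implicits._

  val url = "jdbc:mysql://10.10.10.10:8888/tempDB"
  val props = new Properties()
  props.put("user", "admin")
  props.put("password", "admin")
  props.put("driver", "com.mysql.jdbc.Driver")

  // read the whole table back as a DataFrame ...
  val fromDb = spark.read.jdbc(url, "tempTableName", props)

  // ... or append rows to it
  Seq(("Tom", 1024)).toDF("key", "value")
    .write.mode(SaveMode.Append)
    .jdbc(url, "tempTableName", props)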
  Non-RDD
  import java.sql.{Connection, DriverManager, PreparedStatement}

  /*
   Assumed setup:
   tableName = "tempTableName"
   columns   = [key : String, value : Int]
   DBIP      = 10.10.10.10
   DBPort    = 8888
   DB        = tempDB
  */
  def connection2Mysql(): Unit = {
    var conn: Connection = null
    var ps: PreparedStatement = null
    val userName = "admin"
    val passwd = "admin"
    val key = "Tom"
    val value = 1024
    val sql = "INSERT INTO tempTableName(key,value) values (?,?)"
    try {
      // register the JDBC driver, then open a connection and run the insert
      Class.forName("com.mysql.jdbc.Driver").newInstance
      conn = DriverManager.getConnection("jdbc:mysql://10.10.10.10:8888/tempDB", userName, passwd)
      ps = conn.prepareStatement(sql)
      ps.setString(1, key) // key is a String, so setString (not setDate)
      ps.setInt(2, value)  // value is an Int, so setInt
      ps.executeUpdate()
    } catch {
      case e: Exception => println("----> Exception! : " + e + " <-----")
    } finally {
      if (ps != null) ps.close()
      if (conn != null) conn.close()
    }
  }
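  A related note on the code above: JDBC connections are not serializable, so when writing the contents of an RDD they must be opened on the executors. A common pattern (a minimal sketch, assuming an RDD[(String, Int)] and the same hypothetical table) is one connection and one batch per partition:
  import java.sql.DriverManager
  import org.apache.spark.rdd.RDD

  def saveToMysql(records: RDD[(String, Int)]): Unit = {
    records.foreachPartition { iter =>
      // one connection per partition, opened on the executor side
      val conn = DriverManager.getConnection(
        "jdbc:mysql://10.10.10.10:8888/tempDB", "admin", "admin")
      val ps = conn.prepareStatement("INSERT INTO tempTableName(key,value) values (?,?)")
      try {
        iter.foreach { case (k, v) =>
          ps.setString(1, k)
          ps.setInt(2, v)
          ps.addBatch() // batching cuts per-row network round trips
        }
        ps.executeBatch()
      } finally {
        ps.close()
        conn.close()
      }
    }
  }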
  The problem you will hit here: running a local client for a functional check works fine, but once the job is submitted to YARN via spark-submit it fails with
  java.sql.SQLException: No suitable driver found for jdbc:mysql://10.10.10.10:8888/tempDB
  or
  Error:java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
  This happens because the driver class cannot be found at runtime, i.e. the line Class.forName("com.mysql.jdbc.Driver").newInstance is what fails. Its purpose is to register the driver with DriverManager, as the Driver source shows:
  /**
   * The Java SQL framework allows for multiple database drivers. Each driver should supply a class that implements the Driver interface.
   *
   * The DriverManager will try to load as many drivers as it can find and then for any given connection request, it will ask each driver in turn to try to
   * connect to the target URL.
   *
   * It is strongly recommended that each Driver class should be small and standalone so that the Driver class can be loaded and queried without bringing in vast
   * quantities of supporting code.
   *
   * When a Driver class is loaded, it should create an instance of itself and register it with the DriverManager. This means that a user can load and register a
   * driver by doing Class.forName("foo.bah.Driver")
   */
  public class Driver extends NonRegisteringDriver implements java.sql.Driver {
      // Register ourselves with the DriverManager
      static {
          try {
              java.sql.DriverManager.registerDriver(new Driver());
          } catch (SQLException E) {
              throw new RuntimeException("Can't register driver!");
          }
      }

      /**
       * Construct a new driver and register it with DriverManager
       * @throws SQLException if a database error occurs.
       */
      public Driver() throws SQLException {
          // Required for Class.forName().newInstance()
      }
  }
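  Worth noting (an aside, not from the quoted source): JDBC 4.0+ drivers, including mysql-connector-java 5.1.x, also ship a META-INF/services/java.sql.Driver entry, so when the jar really is on the classpath, DriverManager can discover the driver without the explicit Class.forName call:
  // With a JDBC 4.0+ driver jar on the classpath, this alone is enough;
  // DriverManager locates com.mysql.jdbc.Driver through ServiceLoader.
  val conn = java.sql.DriverManager.getConnection(
    "jdbc:mysql://10.10.10.10:8888/tempDB", "admin", "admin")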
  This can happen for more than one reason:
  1. Merely adding the MySQL jar via the --jars option does not help; the driver's classpath has to be set with --driver-class-path (temp.Main below stands in for your own entry point):
  $  bin/spark-submit --master local[2] --driver-class-path lib/mysql-connector-java-5.1.38.jar --class temp.Main temp.jar
  The reason is that the two options ship jars to different nodes (link):
  --driver-class-path  jars the driver depends on; separate multiple jars with colons (:)
  --jars               jars needed by both the driver and the executors; separate multiple jars with commas (,)
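  For completeness, --driver-class-path maps to the spark.driver.extraClassPath property and its executor-side counterpart is spark.executor.extraClassPath, so an equivalent invocation (reusing the hypothetical temp.Main entry point; paths are as seen on each node) looks like:
  $  bin/spark-submit --master local[2] \
       --conf spark.driver.extraClassPath=lib/mysql-connector-java-5.1.38.jar \
       --conf spark.executor.extraClassPath=mysql-connector-java-5.1.38.jar \
       --class temp.Main temp.jar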
  2. The corresponding jar was not bundled when the application was packaged with its dependencies.
  When depending on the connector, its jar must be packed into the final (fat) jar; only then can the class name be resolved and the driver registered successfully.
  For example, after adding the dependency in build.sbt
  // https://mvnrepository.com/artifact/mysql/mysql-connector-java
  libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.38"
  you also need to add the corresponding jar under Artifacts (IntelliJ IDEA's packaging configuration) so it actually lands in the built jar.
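  If you build from sbt rather than the IDE, one common route (a minimal sketch; the sbt-assembly plugin and the Spark coordinates shown are illustrative assumptions, not part of the original setup) is to produce a fat jar that already contains the driver:
  // project/plugins.sbt -- pull in the assembly plugin (version is an example)
  addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

  // build.sbt -- Spark is provided by the cluster, so keep it out of the fat jar
  libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.38"
  libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"
  Running sbt assembly then yields a jar that bundles com.mysql.jdbc.Driver, so the class can be found at runtime.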