
Spark Handbook - debug

Author: 小牛君 | Published: 2017-06-16


1. debug

1.1. toDebugString

toDebugString prints an RDD's lineage (its chain of dependencies) as an indented tree, which is useful for debugging:

val a = sc.parallelize(1 to 9, 3)
val b = sc.parallelize(1 to 3, 3)
val c = a.subtract(b)
c.toDebugString

res59: String =
  (3) MapPartitionsRDD[119] at subtract at <console>:28 []
   |  SubtractedRDD[118] at subtract at <console>:28 []
   +-(3) MapPartitionsRDD[116] at subtract at <console>:28 []
   |  |  ParallelCollectionRDD[114] at parallelize at <console>:24 []
   +-(3) MapPartitionsRDD[117] at subtract at <console>:28 []
      |  ParallelCollectionRDD[115] at parallelize at <console>:24 []
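Reading the output: the number in parentheses is the partition count, and each new `+-` branch at a deeper indentation level marks a shuffle boundary (i.e. a new stage). A minimal sketch, assuming a running spark-shell so that `sc` is in scope; the variable names are illustrative:

```scala
// Sketch: lineage of a small word-count pipeline (assumes spark-shell, sc in scope).
val words  = sc.parallelize(Seq("a", "b", "a", "c"), 2)
val counts = words.map((_, 1)).reduceByKey(_ + _)

println(counts.toDebugString)
// The top line is the ShuffledRDD produced by reduceByKey; the map and
// parallelize steps appear indented under a +-(2) branch, i.e. on the
// other side of the shuffle boundary.
```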

 

1.2. dependencies

dependencies returns an RDD's Dependency objects. An RDD created directly from a collection has no parent, so the list is empty:

val b = sc.parallelize(List(1,2,3,4,5,6,7,8,2,4,2,1,1,1,1,1))
b.dependencies

res8: Seq[org.apache.spark.Dependency[_]] = List()

A map adds a narrow OneToOneDependency on its parent:

b.map(a => a).dependencies

res9: Seq[org.apache.spark.Dependency[_]] = List(org.apache.spark.OneToOneDependency@226ea84)

cartesian yields one dependency per parent (a is the RDD defined in section 1.1):

b.cartesian(a).dependencies

res10: Seq[org.apache.spark.Dependency[_]] = List(org.apache.spark.rdd.CartesianRDD$$anon$1@15bf089, org.apache.spark.rdd.CartesianRDD$$anon$2@32f1f5da)

 

scala> val rdd = sc.textFile("/hdfs/wordcount/in/words.txt").flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)

scala> rdd.dependencies

res11: Seq[org.apache.spark.Dependency[_]] = List(org.apache.spark.ShuffleDependency@3cd79d17)
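The concrete class tells you whether a dependency is narrow (pipelined inside one stage) or a shuffle (a stage boundary). A small sketch, assuming a running spark-shell; `describe` is a hypothetical helper, not a Spark API:

```scala
import org.apache.spark.{Dependency, NarrowDependency, ShuffleDependency}

// Hypothetical helper: classify each dependency of an RDD.
def describe(deps: Seq[Dependency[_]]): Seq[String] = deps.map {
  case _: ShuffleDependency[_, _, _] => "shuffle (stage boundary)"
  case _: NarrowDependency[_]        => "narrow (pipelined)"
  case d                             => d.getClass.getSimpleName
}

val mapped = sc.parallelize(1 to 10).map((_, 1))
describe(mapped.dependencies)                    // narrow: OneToOneDependency
describe(mapped.reduceByKey(_ + _).dependencies) // shuffle: ShuffleDependency
```

OneToOneDependency extends NarrowDependency, so both map and filter results fall into the narrow case, while reduceByKey on an unpartitioned parent introduces a ShuffleDependency.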

 

 



