
Spark Manual - debug

Author: 小牛君 | Published: 2017-06-16



1. debug

1.1. toDebugString

Prints a debug string describing an RDD's lineage, i.e. its chain of dependencies.

val a = sc.parallelize(1 to 9, 3)   // 9 elements in 3 partitions
val b = sc.parallelize(1 to 3, 3)
val c = a.subtract(b)               // subtract shuffles both parents
c.toDebugString

res59: String =
(3) MapPartitionsRDD[119] at subtract at <console>:28 []
 |  SubtractedRDD[118] at subtract at <console>:28 []
 +-(3) MapPartitionsRDD[116] at subtract at <console>:28 []
 |  |  ParallelCollectionRDD[114] at parallelize at <console>:24 []
 +-(3) MapPartitionsRDD[117] at subtract at <console>:28 []
    |  ParallelCollectionRDD[115] at parallelize at <console>:24 []
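How to read the output: the number in parentheses is the partition count, the indented "|" lines walk up the lineage, and each "+-" branch is a parent reached through a shuffle, i.e. a stage boundary. For contrast, a lineage built from narrow transformations only prints as a single flat chain. A minimal sketch, assuming the same spark-shell session (RDD ids and console line numbers will differ):

// Narrow transformations only: no shuffle, so toDebugString shows one chain.
val d = sc.parallelize(1 to 9, 3).map(_ * 2).filter(_ > 4)
println(d.toDebugString)
// (3) MapPartitionsRDD[...] at filter at <console>:... []
//  |  MapPartitionsRDD[...] at map at <console>:... []
//  |  ParallelCollectionRDD[...] at parallelize at <console>:... []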

 

1.2. dependencies

dependencies returns an RDD's direct parent dependencies. An RDD built straight from a collection has no parent, so the list is empty:

val b = sc.parallelize(List(1,2,3,4,5,6,7,8,2,4,2,1,1,1,1,1))
b.dependencies

res8: Seq[org.apache.spark.Dependency[_]] = List()

A narrow transformation such as map adds a OneToOneDependency on its parent:

b.map(a => a).dependencies

res9: Seq[org.apache.spark.Dependency[_]] = List(org.apache.spark.OneToOneDependency@226ea84)

cartesian has two parents (a is the RDD from section 1.1), so it carries one dependency per parent:

b.cartesian(a).dependencies

res10: Seq[org.apache.spark.Dependency[_]] = List(org.apache.spark.rdd.CartesianRDD$$anon$1@15bf089, org.apache.spark.rdd.CartesianRDD$$anon$2@32f1f5da)
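The two anonymous classes are CartesianRDD's dependencies on each of its parents. Both are narrow: every output partition reads exactly one partition of each parent, so no shuffle is involved. A quick check, as a sketch assuming the same shell session:

import org.apache.spark.NarrowDependency

// Both of cartesian's dependencies are subclasses of NarrowDependency.
b.cartesian(a).dependencies.forall(_.isInstanceOf[NarrowDependency[_]])
// res: Boolean = true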

 

A wide transformation such as reduceByKey introduces a ShuffleDependency:

scala> val rdd = sc.textFile("/hdfs/wordcount/in/words.txt").flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)

scala> rdd.dependencies

res11: Seq[org.apache.spark.Dependency[_]] = List(org.apache.spark.ShuffleDependency@3cd79d17)
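To tell the two kinds apart programmatically, you can pattern-match on the dependency types. A minimal sketch, assuming the rdd from the session above:

import org.apache.spark.{NarrowDependency, ShuffleDependency}

// Classify each direct dependency of an RDD as shuffle or narrow.
rdd.dependencies.foreach {
  case s: ShuffleDependency[_, _, _] => println(s"shuffle: $s")
  case n: NarrowDependency[_]        => println(s"narrow:  $n")
  case other                         => println(s"other:   $other")
}
// For rdd above this prints a single line: shuffle: org.apache.spark.ShuffleDependency@...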

 

 


