
Spark Handbook - Writing a WordCount Program in IDEA (1)

Created by 小牛君 on 2017-06-16


1.1.   Writing a WordCount Program in IDEA

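The Spark shell is used mostly for testing and validating programs. In production, you typically write the program in an IDE, package it as a jar, and submit the jar to the cluster. The most common approach is to create a Maven project and let Maven manage the jar dependencies.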
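The package-and-submit workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the jar name follows Maven's default `artifactId-version.jar` naming for the pom.xml in this article, while the main class `com.edu360.WordCount` and the master URL `spark://master-host:7077` are hypothetical placeholders. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the package-and-submit workflow.
# Values marked "hypothetical" are placeholders, not taken from the article.

ARTIFACT_ID="spark-demo"                     # artifactId from this article's pom.xml
VERSION="1.0-SNAPSHOT"                       # version from this article's pom.xml
JAR="target/${ARTIFACT_ID}-${VERSION}.jar"   # assumed default Maven output path

MAIN_CLASS="com.edu360.WordCount"            # hypothetical main class
MASTER_URL="spark://master-host:7077"        # hypothetical standalone master URL

# By default only print each command; set DRY_RUN=0 to actually run it.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build the jar, then submit it to the cluster.
run mvn clean package
run spark-submit --class "$MAIN_CLASS" --master "$MASTER_URL" "$JAR"
```

Running the script with `DRY_RUN=0` requires Maven and a Spark installation on the PATH; adjust the class name and master URL to match your own project and cluster.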

1.1.1.   Creating the Project

1. Create a new project.

2. Select Maven as the project type, then click Next.

3. Fill in the Maven GAV coordinates (groupId, artifactId, version), then click Next.

4. Enter the project name, then click Finish.

5. Once the Maven project has been created, click Enable Auto-Import.

6. Configure the Maven pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.edu360</groupId>
    <artifactId>spark-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <scala.version>2.11.8</scala.version>
        <scala.compat.version>2.11</scala.compat.version>
        <spark.version>2.1.0</spark.version>
        <hadoop.version>2.6.0</hadoop.version>
        <encoding>UTF-8</encoding>
        <akka.version>2.4.16</akka.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>

        <dependency>
            <groupId>com.typesafe.akka</groupId>
            <artifactId>akka-actor_2.11</artifactId>
            <version>${akka.version}</version>
        </dependency>

        <dependency>
            <groupId>com.typesafe.akka</groupId>
            <artifactId>akka-remote_2.11</artifactId>
            <version>${akka.version}</version>
        </dependency>
    </dependencies>

    <build>
        <pluginManagement>
            <plugins>
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.2.2</version>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.5.1</version>
                </plugin>
            </plugins>
        </pluginManagement>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <executions>
                    <execution>
                        <id>scala-compile-first</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>compile</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

7. Create the Scala source directory.

Wait a moment.