Install & Run Flink on Multi-node Cluster

1. Objective

In this blog we will learn how to install Apache Flink in cluster mode on Ubuntu 14.04. Setup of Flink on multiple nodes is also called Flink in Distributed mode. This blog provides a step-by-step tutorial to deploy Apache Flink on a multi-node cluster. Apache Flink is a lightning-fast cluster computing tool, also known as the 4G of Big Data; to learn more about Apache Flink, follow this Introduction Guide.

To learn how to install Apache Flink on a single node, you can refer to this installation guide.


2. Platform
2.1. Platform Requirements
Operating system: Ubuntu 14.04 or later; we can also use other Linux flavors like CentOS, Red Hat, etc.
Java 7.x or higher
2.2. Configure & Setup Platform

If you are using Windows / Mac OS, you can create a virtual machine and install Ubuntu using VMware Player, or alternatively create a virtual machine and install Ubuntu using Oracle VirtualBox.

3. Prerequisites
3.1 Install Java 7

NOTE: Install Java on all the nodes of the cluster

3.1.1 Install python-software properties

Apache Flink requires Java to be installed as it runs on the JVM. First we need to install python-software-properties in order to add Java repositories. To download and install python-software-properties, use the following command:

dataflair@ubuntu:~$ sudo apt-get install python-software-properties
3.1.2 Add Repository

To add the repository, run the below command in the terminal:

dataflair@ubuntu:~$ sudo add-apt-repository ppa:webupd8team/java
3.1.3 Update the source list
dataflair@ubuntu:~$ sudo apt-get update
3.1.4 Install Java

Now we will download and install Java. To do so, run the below command in the terminal:

dataflair@ubuntu:~$ sudo apt-get install oracle-java7-installer

On executing this command, Java is automatically downloaded and installed.

To check whether the installation completed successfully and a working Java is installed, use the below command:

dataflair@ubuntu:~$ java -version
3.2. Configure SSH

SSH (Secure Shell) is used for remote login; we can log in to a remote machine using SSH. Now we need to configure passwordless SSH, which means we can log in to a remote machine without entering a password. Passwordless SSH setup is required for remote script invocation, so that the master can automatically start the daemons on the slaves.

3.2.1. Install Open SSH Server-Client
$ sudo apt-get install openssh-server openssh-client
3.2.2. Generate Key Pairs
$ ssh-keygen -t rsa -P ""

It will ask “Enter the name of file in which to save the key (/home/dataflair/.ssh/id_rsa):”. Leave it as the default; don’t specify any path, just press “Enter”. The keys will now be available in the default path, i.e. “.ssh”. To check the default path, use the command “$ ls .ssh/” and you will see that two files have been created: “id_rsa”, which is the private key, and “id_rsa.pub”, which is the public key.

3.2.3. Configure password-less SSH

Copy the contents of the master’s “id_rsa.pub” into the “authorized_keys” file of all the slaves and of the master itself:

$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
3.2.4. Check by SSH to localhost
$ ssh localhost
$ ssh <SLAVE-IP>

It should not ask for any password, and you should be logged in to the remote machine easily, since we have configured passwordless SSH.
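If a slave still prompts for a password, the master’s public key probably never reached that slave. One way to script the copy described in step 3.2.3 is with the standard ssh-copy-id utility; the sketch below is only an illustration, and the slave IPs and the dataflair user are the example values used throughout this guide:

```shell
# Copy the master's public key to every slave.
# Assumes the key pair was generated in step 3.2.2; you will be
# prompted for each slave's password one last time.
for slave in 192.168.1.4 192.168.1.5; do
    echo "Copying key to $slave"
    ssh-copy-id "dataflair@$slave"
done
```

After this, `ssh 192.168.1.4` from the master should log in without a password.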

4. Install Apache Flink in Cluster Mode
4.1 Install Flink on Master
4.1.1 Download the Flink Setup

Download the Flink Setup from its official website http://flink.apache.org/downloads.html

4.1.2 Untar the file

In order to extract all the contents of the compressed Apache Flink package, use the below command:

dataflair@ubuntu:~$ tar xzf flink-0.8-incubating-SNAPSHOT-bin-hadoop2.tgz
4.1.3 Rename the directory
dataflair@ubuntu:~$ mv flink-0.8-incubating-SNAPSHOT/ flink
4.1.4 Setup Configuration
4.1.4.1. Go to Flink conf directory
dataflair@ubuntu:~$ cd flink
dataflair@ubuntu:~/flink$ cd conf
4.1.4.2. Add the entry of Master

Choose a master node (JobManager) and set the jobmanager.rpc.address in conf/flink-conf.yaml to its IP or hostname. Make sure that all nodes in your cluster have the same jobmanager.rpc.address configured.
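For reference, after this step the relevant lines of conf/flink-conf.yaml might look like the fragment below. The IP matches the example master used in this guide; the RPC port, heap, and parallelism values are illustrative defaults, not something this guide prescribes, and exact key names vary slightly across Flink versions:

```yaml
# Address and RPC port of the JobManager (master).
# All nodes in the cluster must agree on these values.
jobmanager.rpc.address: 192.168.1.3
jobmanager.rpc.port: 6123

# Illustrative resource settings; tune them for your machines.
taskmanager.heap.mb: 512
parallelism.default: 2
```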

dataflair@ubuntu:~/flink/conf$ nano flink-conf.yaml
Add this line: jobmanager.rpc.address: 192.168.1.3
4.1.4.3. Add the entry of all the Slaves

Add the IPs or hostnames (one per line) of all worker nodes (TaskManagers) to the slaves file in conf/slaves. To edit the file, use the following command:

dataflair@ubuntu:~/flink/conf$ nano slaves

Enter the IP addresses like this:

192.168.1.4

192.168.1.5

4.2 Install Flink on Slaves
4.2.1 Copy configured setup from master to all the slaves

We will create a tarball of the configured Flink setup and copy it to all the slaves.

4.2.1.1 Create tar-ball of configured setup
$ tar czf flink.tar.gz flink

NOTE: Run this command on Master

4.2.1.2 Copy the configured tar-ball on all the slaves
$ scp flink.tar.gz 192.168.1.4:~
$ scp flink.tar.gz 192.168.1.5:~

NOTE: Run this command on Master

4.2.1.3 Un-tar configured Flink setup on all the slaves
$ tar xzf flink.tar.gz

NOTE: Run this command on all the slaves

5. Start the Flink Cluster
5.1 Start the cluster

To start the cluster, run the below script; it will start all the daemons on the master as well as on the slaves. Once the cluster is up, the JobManager’s web frontend is typically available on port 8081 of the master.

dataflair@ubuntu:~/flink/$ bin/start-cluster.sh

NOTE: Run this command on Master

5.2 Check whether services have been started
5.2.1 Check daemons on Master
$ jps
JobManager
5.2.2 Check daemons on Slaves
$ jps
TaskManager
5.3 Stop the cluster

To stop the cluster, run the below script; it will stop all the daemons running on the master as well as on the slaves.

dataflair@ubuntu:~/flink/$ bin/stop-cluster.sh

To learn whether Spark or Flink will be the successor of Hadoop MapReduce, refer to the Spark vs Flink comparison Guide.

