Install & Run Flink on Multi-node Cluster

1. Objective

In this blog we will learn how to install Apache Flink in cluster mode on Ubuntu 14.04. Setup of Flink on multiple nodes is also called Flink in Distributed mode. This blog provides a step-by-step tutorial to deploy Apache Flink on a multi-node cluster. Apache Flink is a lightning-fast cluster computing tool, also known as the 4G of Big Data; to learn more about Apache Flink, follow this Introduction Guide.

To learn how to install Apache Flink on a single node, you can refer to this installation guide.


2. Platform
2.1. Platform Requirements
Operating system: Ubuntu 14.04 or later; we can also use other Linux flavors like CentOS, Red Hat, etc.
Java 7.x or higher
2.2. Configure & Setup Platform

If you are using Windows / Mac OS, you can create a virtual machine and install Ubuntu using VMware Player, or alternatively create a virtual machine and install Ubuntu using Oracle VirtualBox.

3. Prerequisites
3.1 Install Java 7

NOTE: Install Java on all the nodes of the cluster

3.1.1 Install python-software properties

Apache Flink requires Java to be installed as it runs on the JVM. First we need to install python-software-properties in order to add Java repositories. To download and install python-software-properties, use the following command:

dataflair@ubuntu:~$ sudo apt-get install python-software-properties
3.1.2 Add Repository

To add the repository, run the below command in the terminal:

dataflair@ubuntu:~$ sudo add-apt-repository ppa:webupd8team/java
3.1.3 Update the source list
dataflair@ubuntu:~$ sudo apt-get update
3.1.4 Install Java

Now we will download and install Java. To do so, run the below command in the terminal:

dataflair@ubuntu:~$ sudo apt-get install oracle-java7-installer

On executing this command, Java is automatically downloaded and installed.

To check whether the installation completed successfully and a working Java is installed, use the below command:

dataflair@ubuntu:~$ java -version
3.2. Configure SSH

SSH (Secure Shell) is used for remote login; we can log in to a remote machine using SSH. Now we need to configure passwordless SSH, which means we can log in to a remote machine without entering a password. Passwordless SSH setup is required for remote script invocation, so that the master can automatically start the daemons on the slaves.

3.2.1. Install Open SSH Server-Client
$ sudo apt-get install openssh-server openssh-client
3.2.2. Generate Key Pairs
$ ssh-keygen -t rsa -P ""

It will ask “Enter the name of file in which to save the key (/home/dataflair/.ssh/id_rsa):”. Leave it as the default; don’t specify any path, just press “Enter”. The keys will now be available in the default path, i.e. “.ssh”. To check the default path, use the command “$ ls .ssh/” and you will see that two files have been created: “id_rsa”, which is the private key, and “id_rsa.pub”, which is the public key.

3.2.3. Configure password-less SSH

Copy the contents of the master’s “id_rsa.pub” into the “authorized_keys” file of all the slaves and of the master itself:

$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
3.2.4. Check by SSH to localhost
$ ssh localhost
$ ssh <SLAVE-IP>

It should not ask for any password, and you should be logged in to the remote machine easily, since we have configured passwordless SSH.
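If a slave still prompts for a password, the master’s public key probably never reached that slave. One way to script the copy described in step 3.2.3 is with the standard ssh-copy-id utility; the sketch below is only an illustration, and the slave IPs and the dataflair user are the example values used throughout this guide:

```shell
# Copy the master's public key to every slave.
# Assumes the key pair was generated in step 3.2.2; you will be
# prompted for each slave's password one last time.
for slave in 192.168.1.4 192.168.1.5; do
    echo "Copying key to $slave"
    ssh-copy-id "dataflair@$slave"
done
```

After this, `ssh 192.168.1.4` from the master should log in without a password.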

4. Install Apache Flink in Cluster Mode
4.1 Install Flink on Master
4.1.1 Download the Flink Setup

Download the Flink Setup from its official website http://flink.apache.org/downloads.html

4.1.2 Untar the file

In order to extract all the contents of the compressed Apache Flink package, use the below command:

dataflair@ubuntu:~$ tar xzf flink-0.8-incubating-SNAPSHOT-bin-hadoop2.tgz
4.1.3 Rename the directory
dataflair@ubuntu:~$ mv flink-0.8-incubating-SNAPSHOT/ flink
4.1.4 Setup Configuration
4.1.4.1. Go to Flink conf directory
dataflair@ubuntu:~$ cd flink
dataflair@ubuntu:~/flink$ cd conf
4.1.4.2. Add the entry of Master

Choose a master node (JobManager) and set the jobmanager.rpc.address in conf/flink-conf.yaml to its IP or hostname. Make sure that all nodes in your cluster have the same jobmanager.rpc.address configured.
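For reference, after this step the relevant lines of conf/flink-conf.yaml might look like the fragment below. The IP matches the example master used in this guide; the RPC port, heap, and parallelism values are illustrative defaults, not something this guide prescribes, and exact key names vary slightly across Flink versions:

```yaml
# Address and RPC port of the JobManager (master).
# All nodes in the cluster must agree on these values.
jobmanager.rpc.address: 192.168.1.3
jobmanager.rpc.port: 6123

# Illustrative resource settings; tune them for your machines.
taskmanager.heap.mb: 512
parallelism.default: 2
```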

dataflair@ubuntu:~/flink/conf$ nano flink-conf.yaml
Add this line: jobmanager.rpc.address: 192.168.1.3
4.1.4.3. Add the entry of all the Slaves

Add the IPs or hostnames (one per line) of all worker nodes (TaskManagers) to the slaves file in conf/slaves. To edit the file, use the following command:

dataflair@ubuntu:~/flink/conf$ nano slaves

Enter the IP addresses like this:

192.168.1.4

192.168.1.5

4.2 Install Flink on Slaves
4.2.1 Copy configured setup from master to all the slaves

We will create a tarball of the configured Flink setup and copy it to all the slaves.

4.2.1.1 Create tar-ball of configured setup
$ tar czf flink.tar.gz flink

NOTE: Run this command on Master

4.2.1.2 Copy the configured tar-ball on all the slaves
$ scp flink.tar.gz 192.168.1.4:~
$ scp flink.tar.gz 192.168.1.5:~

NOTE: Run this command on Master

4.2.1.3 Un-tar configured Flink setup on all the slaves
$ tar xzf flink.tar.gz

NOTE: Run this command on all the slaves

5. Start the Flink Cluster
5.1 Start the cluster

To start the cluster, run the below script; it will start all the daemons on the master as well as on the slaves. Once the cluster is up, the JobManager’s web frontend is typically available on port 8081 of the master.

dataflair@ubuntu:~/flink/$ bin/start-cluster.sh

NOTE: Run this command on Master

5.2 Check whether services have been started
5.2.1 Check daemons on Master
$ jps
JobManager
5.2.2 Check daemons on Slaves
$ jps
TaskManager
5.3 Stop the cluster

To stop the cluster, run the below script; it will stop all the daemons running on the master as well as on the slaves.

dataflair@ubuntu:~/flink/$ bin/stop-cluster.sh

To learn whether Spark or Flink will be the successor of Hadoop MapReduce, refer to the Spark vs Flink comparison Guide.

