
Create Your First FLUME Program - Beginner's Tutorial

Prerequisites:
This tutorial was developed on the Ubuntu Linux operating system.
Hadoop (version 2.2.0 is used in this tutorial) should already be installed and running on the system.
Java (version 1.8.0 is used in this tutorial) should already be installed on the system.
JAVA_HOME should be set accordingly.
Before starting, switch to the user 'hduser' (the user that runs Hadoop).
su - hduser
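Before continuing, the prerequisites above can be checked quickly from the shell. This is a minimal sketch, assuming HADOOP_HOME points to the Hadoop installation (the variable is used later in this tutorial) and that JAVA_HOME contains bin/java. It only checks that the variables and paths exist. It does not check the versions.

```shell
#!/usr/bin/env bash
# Minimal prerequisite check (a sketch): verifies that JAVA_HOME and
# HADOOP_HOME are set and point to existing locations. Versions are not checked.
check_prereqs() {
  if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME is not set or has no bin/java" >&2
    return 1
  fi
  if [ -z "${HADOOP_HOME:-}" ] || [ ! -d "$HADOOP_HOME" ]; then
    echo "HADOOP_HOME is not set or is not a directory" >&2
    return 1
  fi
  echo "Prerequisites look OK"
}
```

Run check_prereqs as hduser, after su - hduser, so that hduser's environment is the one being checked.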

Steps:
Flume, library and source code setup
1. Create a new directory named 'FlumeTutorial'
sudo mkdir FlumeTutorial
a. Give read, write and execute permissions:
sudo chmod -R 777 FlumeTutorial
b. Copy the files MyTwitterSource.java and MyTwitterSourceForFlume.java into this directory.
Download Input Files From Here
Check the file permissions of all these files and, if 'read' permission is missing, grant it.
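The directory setup above can also be scripted. This is a sketch that assumes the directory lives at ~/FlumeTutorial (the path used when CLASSPATH is set later), so sudo is not needed. Instead of chmod 777, it only adds read permission for all users on the files in the directory.

```shell
#!/usr/bin/env bash
# Sketch: create the tutorial directory (default ~/FlumeTutorial) and grant
# read permission to all users on every regular file inside it.
setup_tutorial_dir() {
  local dir="${1:-$HOME/FlumeTutorial}"
  mkdir -p "$dir" || return 1
  # Run this again after copying MyTwitterSource.java and
  # MyTwitterSourceForFlume.java into the directory.
  find "$dir" -type f -exec chmod a+r {} +
}
```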

2. Download 'Apache Flume' from https://flume.apache.org/download.html

Apache Flume 1.4.0 is used in this tutorial.

3. Copy the downloaded tarball into a directory of your choice and extract its contents using the following command:
sudo tar -xvf apache-flume-1.4.0-bin.tar.gz

This command creates a new directory named apache-flume-1.4.0-bin and extracts the files into it. This directory will be referred to as <Flume Installation Directory> in the rest of the article.
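The extract step can be wrapped in a small function that also prints the resulting installation directory. This is a sketch that assumes the tarball has already been downloaded and, as stated above, unpacks into a directory named after the file without .tar.gz. The default destination /usr/local matches the path used later in this tutorial and usually requires sudo.

```shell
#!/usr/bin/env bash
# Sketch: extract the Flume tarball into DEST and print the resulting
# <Flume Installation Directory>.
extract_flume() {
  local tarball="$1" dest="${2:-/usr/local}" name
  # apache-flume-1.4.0-bin.tar.gz -> apache-flume-1.4.0-bin
  name="$(basename "$tarball" .tar.gz)"
  tar -xzf "$tarball" -C "$dest" || return 1
  printf '%s\n' "$dest/$name"
}
```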
4. Flume library setup
Copy twitter4j-core-4.0.1.jar, flume-ng-configuration-1.4.0.jar, flume-ng-core-1.4.0.jar and flume-ng-sdk-1.4.0.jar to <Flume Installation Directory>/lib/
Some or all of the copied JARs may have execute permission, which can cause problems when compiling the code. If so, revoke execute permission on those JARs.
In my case, twitter4j-core-4.0.1.jar had execute permission. I revoked it as below:
sudo chmod -x twitter4j-core-4.0.1.jar

After this command, give 'read' permission on twitter4j-core-4.0.1.jar to all users:

sudo chmod a+r /usr/local/apache-flume-1.4.0-bin/lib/twitter4j-core-4.0.1.jar
Please note that I downloaded:
- twitter4j-core-4.0.1.jar from http://mvnrepository.com/artifact/org.twitter4j/twitter4j-core
- all Flume JARs, i.e., flume-ng-*-1.4.0.jar, from http://mvnrepository.com/artifact/org.apache.flume
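The two chmod steps above can be applied to every JAR in Flume's lib directory in one pass. This is a sketch that assumes you want no JAR in that directory to be executable. Run it with sudo-level permissions if lib is owned by root.

```shell
#!/usr/bin/env bash
# Sketch: remove the execute bit from every JAR in the given lib directory
# and make each JAR readable by all users.
fix_jar_permissions() {
  local lib="$1" jar
  for jar in "$lib"/*.jar; do
    [ -e "$jar" ] || continue          # glob matched nothing
    chmod a-x,a+r "$jar" || return 1
  done
}
```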
Load data from Twitter using Flume
1. Go to the directory containing the source code files.
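2. Set CLASSPATH to contain <Flume Installation Directory>/lib/* and ~/FlumeTutorial/flume/mytwittersource/*
export CLASSPATH="/usr/local/apache-flume-1.4.0-bin/lib/*:$HOME/FlumeTutorial/flume/mytwittersource/*"
Note that $HOME is used instead of ~, because the shell does not expand a ~ inside double quotes.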
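The CLASSPATH value can also be built by a helper that uses $HOME, because a ~ inside double quotes is passed to Java literally. This is a sketch in which the Flume directory is passed as an argument. The /* entries are quoted so the shell does not glob them; Java itself expands a classpath entry ending in /* to the JAR files in that directory.

```shell
#!/usr/bin/env bash
# Sketch: print a CLASSPATH value for compiling the Twitter source.
# "$HOME" replaces "~", which would stay literal inside double quotes.
flume_classpath() {
  local flume_home="$1"
  printf '%s\n' "$flume_home/lib/*:$HOME/FlumeTutorial/flume/mytwittersource/*"
}
```

Usage: export CLASSPATH="$(flume_classpath /usr/local/apache-flume-1.4.0-bin)"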

3. Compile the source code using the command:
javac -d . MyTwitterSourceForFlume.java MyTwitterSource.java

4. Create the JAR
First, create a Manifest.txt file using a text editor of your choice and add the line below to it:
Main-Class: flume.mytwittersource.MyTwitterSourceForFlume
Here, flume.mytwittersource.MyTwitterSourceForFlume is the name of the main class. Please note that you have to press the Enter key at the end of this line.

Now, create the JAR 'MyTwitterSourceForFlume.jar' as follows:
jar cfm MyTwitterSourceForFlume.jar Manifest.txt flume/mytwittersource/*.class
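Writing the manifest from the shell guarantees the terminating newline mentioned above. The jar tool does not parse a last manifest line that lacks one. This is a sketch in which the default class name comes from the step above.

```shell
#!/usr/bin/env bash
# Sketch: write Manifest.txt with a Main-Class entry. printf adds the final
# newline explicitly, so the jar tool parses the last line.
write_manifest() {
  local main_class="${1:-flume.mytwittersource.MyTwitterSourceForFlume}"
  local out="${2:-Manifest.txt}"
  printf 'Main-Class: %s\n' "$main_class" > "$out"
}
```

Usage: write_manifest && jar cfm MyTwitterSourceForFlume.jar Manifest.txt flume/mytwittersource/*.class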

5. Copy this jar to <Flume Installation Directory>/lib/


sudo cp MyTwitterSourceForFlume.jar <Flume Installation Directory>/lib/

6. Go to the configuration directory of Flume, <Flume Installation Directory>/conf


If flume.conf does not exist, then copy flume-conf.properties.template and rename it to flume.conf
sudo cp flume-conf.properties.template flume.conf

If flume-env.sh does not exist, then copy flume-env.sh.template and rename it to flume-env.sh
sudo cp flume-env.sh.template flume-env.sh
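The two "does not exist" checks above can be combined into one function that never overwrites an existing, already edited file. This is a sketch that assumes the conf directory is passed as an argument and is writable by the current user. Otherwise, run the cp commands with sudo as shown.

```shell
#!/usr/bin/env bash
# Sketch: create flume.conf and flume-env.sh from their templates, but only
# when the target file is missing.
init_flume_conf() {
  local conf_dir="$1"
  [ -f "$conf_dir/flume.conf" ] ||
    cp "$conf_dir/flume-conf.properties.template" "$conf_dir/flume.conf" || return 1
  [ -f "$conf_dir/flume-env.sh" ] ||
    cp "$conf_dir/flume-env.sh.template" "$conf_dir/flume-env.sh" || return 1
}
```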

7. Create a Twitter application by signing in to https://dev.twitter.com/user/login?destination=home

a. Go to 'My applications' (this option appears in the drop-down menu when the 'Egg' button at the top right corner is clicked).

b. Create a new application by clicking 'Create New App'


c. Fill in the application details by specifying the name of the application, a description and a website. You may refer to the notes given underneath each input box.

d. Scroll down the page, accept the terms by marking 'Yes, I agree' and click the button 'Create your Twitter application'.

e. In the window of the newly created application, go to the 'API Keys' tab, scroll down the page and click the button 'Create my access token'.

f. Refresh the page.


g. Click on 'Test OAuth'. This will display the OAuth settings of the application.

h. Modify 'flume.conf' (created in Step 6) using these OAuth settings. Steps to modify 'flume.conf' are
given in step 8 below.

You need to copy the Consumer key, Consumer secret, Access token and Access token secret to update 'flume.conf'.
Note: These values belong to the user and are confidential, so they should not be shared.
8. Open 'flume.conf' in write mode and set values for the parameters below.
[A]
sudo gedit flume.conf

Copy the contents below:
MyTwitAgent.sources = Twitter
MyTwitAgent.channels = MemChannel
MyTwitAgent.sinks = HDFS
MyTwitAgent.sources.Twitter.type = flume.mytwittersource.MyTwitterSourceForFlume
MyTwitAgent.sources.Twitter.channels = MemChannel
MyTwitAgent.sources.Twitter.consumerKey = <Copy consumer key value from Twitter App>
MyTwitAgent.sources.Twitter.consumerSecret = <Copy consumer secret value from Twitter App>
MyTwitAgent.sources.Twitter.accessToken = <Copy access token value from Twitter App>
MyTwitAgent.sources.Twitter.accessTokenSecret = <Copy access token secret value from Twitter App>
MyTwitAgent.sources.Twitter.keywords = guru99
MyTwitAgent.sinks.HDFS.channel = MemChannel
MyTwitAgent.sinks.HDFS.type = hdfs
MyTwitAgent.sinks.HDFS.hdfs.path = hdfs://localhost:54310/user/hduser/flume/tweets/
MyTwitAgent.sinks.HDFS.hdfs.fileType = DataStream
MyTwitAgent.sinks.HDFS.hdfs.writeFormat = Text
MyTwitAgent.sinks.HDFS.hdfs.batchSize = 1000
MyTwitAgent.sinks.HDFS.hdfs.rollSize = 0
MyTwitAgent.sinks.HDFS.hdfs.rollCount = 10000
MyTwitAgent.channels.MemChannel.type = memory
MyTwitAgent.channels.MemChannel.capacity = 10000
MyTwitAgent.channels.MemChannel.transactionCapacity = 1000
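Instead of pasting the four OAuth values by hand, you can fill them in from environment variables. This sketch shows only one way to do it. The variable names TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET, TWITTER_ACCESS_TOKEN and TWITTER_ACCESS_TOKEN_SECRET are hypothetical. The values still end up in flume.conf, so that file must be kept private.

```shell
#!/usr/bin/env bash
# Sketch: replace the consumerKey, consumerSecret, accessToken and
# accessTokenSecret values in flume.conf with values taken from hypothetical
# environment variables. Lines whose variable is unset are left unchanged.
fill_twitter_keys() {
  local conf="$1" tmp rc
  tmp="$(mktemp)" || return 1
  awk '
    BEGIN {
      # flume.conf property name -> hypothetical environment variable name
      var["consumerKey"]       = "TWITTER_CONSUMER_KEY"
      var["consumerSecret"]    = "TWITTER_CONSUMER_SECRET"
      var["accessToken"]       = "TWITTER_ACCESS_TOKEN"
      var["accessTokenSecret"] = "TWITTER_ACCESS_TOKEN_SECRET"
    }
    index($0, "=") > 0 {
      key = $0
      sub(/[ \t]*=.*/, "", key)      # property name left of the "=" sign
      n = split(key, part, ".")      # part[n] is the last component, e.g. consumerKey
      if (key ~ /\.sources\./ && (part[n] in var) && ENVIRON[var[part[n]]] != "") {
        # ENVIRON is used so the value is printed without escape processing
        print key " = " ENVIRON[var[part[n]]]
        next
      }
    }
    { print }                        # copy every other line unchanged
  ' "$conf" > "$tmp" && cat "$tmp" > "$conf"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Usage: export the four variables in your shell, then run fill_twitter_keys flume.conf as a user that can write the file.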

[B]
Also, set MyTwitAgent.sinks.HDFS.hdfs.path as below:
MyTwitAgent.sinks.HDFS.hdfs.path = hdfs://<Host Name>:<Port Number>/<HDFS Home Directory>/flume/tweets/

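To find <Host Name> and <Port Number>, see the value of the parameter 'fs.defaultFS' set in $HADOOP_HOME/etc/hadoop/core-site.xml. <HDFS Home Directory> is not part of that value; HDFS home directories are normally /user/<user name>, i.e. /user/hduser in this tutorial.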
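The path in [B] can be derived from core-site.xml. This sketch assumes fs.defaultFS is written as a <name>/<value> pair, as in a typical core-site.xml. It is a line-based text match, not a real XML parser. The home directory (default /user/$USER) is passed in rather than read from Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: print hdfs://<Host Name>:<Port Number>/<HDFS Home Directory>/flume/tweets/
# using the fs.defaultFS value found in core-site.xml.
hdfs_tweets_path() {
  local core_site="${1:-$HADOOP_HOME/etc/hadoop/core-site.xml}"
  local home_dir="${2:-/user/$USER}"
  local default_fs
  # Take the first <value> at or after the line naming fs.defaultFS.
  default_fs="$(awk '
    /<name>[ \t]*fs\.defaultFS[ \t]*<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>[ \t]*/, "")
      sub(/[ \t]*<\/value>.*/, "")
      print
      exit
    }' "$core_site")"
  [ -n "$default_fs" ] || return 1
  printf '%s%s/flume/tweets/\n' "${default_fs%/}" "$home_dir"
}
```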

[C]
In order to flush the data to HDFS as and when it arrives, delete the entry below if it exists:
MyTwitAgent.sinks.HDFS.hdfs.rollInterval = 600
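You can remove the entry in [C] with sed. This sketch assumes GNU sed (for -i). The pattern matches the property for any agent and sink name, so it works whether the line starts with MyTwitAgent or another agent name.

```shell
#!/usr/bin/env bash
# Sketch: delete any "<agent>.sinks.<sink>.hdfs.rollInterval = ..." line
# from the given Flume configuration file (GNU sed in-place edit).
drop_roll_interval() {
  local conf="$1"
  sed -i '/\.sinks\.[^.]*\.hdfs\.rollInterval[[:space:]]*=/d' "$conf"
}
```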
9. Open 'flume-env.sh' in write mode and set values for the parameters below:
JAVA_HOME=<Installation directory of Java>
FLUME_CLASSPATH="<Flume Installation Directory>/lib/MyTwitterSourceForFlume.jar"
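You can also script step 9 by replacing any existing assignment of each variable. In this sketch, set_env_var is a hypothetical helper name. Commented-out lines from the template are left untouched. The JDK path in the usage example is only an illustration.

```shell
#!/usr/bin/env bash
# Sketch: set KEY="VALUE" in a shell env file such as flume-env.sh.
# Any active KEY=... or export KEY=... line is removed first, then the new
# assignment is appended; commented lines are kept.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  [ -f "$file" ] || return 1
  tmp="$(mktemp)" || return 1
  grep -v -E "^[[:space:]]*(export[[:space:]]+)?${key}=" "$file" > "$tmp"
  printf '%s="%s"\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage (example path): set_env_var flume-env.sh JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64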

10. Start Hadoop


$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
11. Two of the JAR files from the Flume tarball are not compatible with Hadoop 2.2.0, so follow the steps below to make Flume compatible with Hadoop 2.2.0.
a. Move protobuf-java-2.4.1.jar out of '<Flume Installation Directory>/lib'.
Go to '<Flume Installation Directory>/lib'
cd <Flume Installation Directory>/lib
sudo mv protobuf-java-2.4.1.jar ~/

b. Find the 'guava' JAR file as below:


find . -name "guava*"

Move guava-10.0.1.jar out of '<Flume Installation Directory>/lib'.


sudo mv guava-10.0.1.jar ~/

c. Download guava-17.0.jar from http://mvnrepository.com/artifact/com.google.guava/guava/17.0

Now, copy this downloaded jar file to '<Flume Installation Directory>/lib'
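You can do all of step 11 with one function. This sketch assumes guava-17.0.jar has already been downloaded. It moves the two old JARs into a backup directory instead of deleting them, so the change can be undone. The steps above use the home directory for this.

```shell
#!/usr/bin/env bash
# Sketch: move the protobuf and guava JARs that clash with Hadoop 2.2.0 out of
# Flume's lib directory and copy the newer Guava JAR in their place.
swap_incompatible_jars() {
  local lib="$1" backup="$2" new_guava="$3" jar
  mkdir -p "$backup" || return 1
  for jar in "$lib/protobuf-java-2.4.1.jar" "$lib/guava-10.0.1.jar"; do
    if [ -e "$jar" ]; then
      mv "$jar" "$backup/" || return 1
    fi
  done
  cp "$new_guava" "$lib/"
}
```

Usage: swap_incompatible_jars <Flume Installation Directory>/lib ~/flume-old-jars ./guava-17.0.jar (run with sudo-level permissions if lib is owned by root).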


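12. Go to '<Flume Installation Directory>/bin' and start Flume as follows. The -c option must point to Flume's conf directory, which is ../conf when starting from bin, so that flume-env.sh is loaded:
./flume-ng agent -n MyTwitAgent -c ../conf -f <Flume Installation Directory>/conf/flume.conf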

Command prompt window where flume is fetching Tweets-

The messages in the command window show that the output is written to the /user/hduser/flume/tweets/ directory.
Now, open this directory using a web browser.
13. To see the result of the data load, open http://localhost:50070/ in a browser and browse the file system. Then go to the directory where the data has been loaded, that is, <HDFS Home Directory>/flume/tweets/

Copyright - Guru99 2016
