Tuesday, July 23, 2013

Flume NG - Installation and Configuration

Geoinsyssoft Hadoop Training: Flume installation



Download the Flume binary release

http://archive.apache.org/dist/flume/stable/

Copy the archive to your home directory

Extract it

tar -xvf apache-flume-1.4.0-bin.tar.gz

Go to the extracted apache-flume-1.4.0-bin directory and copy the configuration templates

sudo cp conf/flume-conf.properties.template conf/flume.conf
sudo cp conf/flume-env.sh.template conf/flume-env.sh
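
The extract-and-copy steps above can be sketched as one shell function. This is only an illustration: install_flume is a hypothetical helper name, and it assumes the archive unpacks into a directory named after the archive file without its .tar.gz suffix.

```shell
# Hypothetical helper (not part of Flume): unpack a Flume binary archive
# and create the two configuration files from their templates.
#   $1 = path to the downloaded .tar.gz archive
#   $2 = directory to unpack into (defaults to $HOME)
install_flume() {
    tarball="$1"
    dest="${2:-$HOME}"

    # Unpack the archive into the destination directory.
    tar -xzf "$tarball" -C "$dest" || return 1

    # Assumption: the top-level directory equals the archive name minus ".tar.gz".
    dir="$dest/$(basename "$tarball" .tar.gz)"

    # Create working copies of the templates shipped in conf/.
    cp "$dir/conf/flume-conf.properties.template" "$dir/conf/flume.conf" || return 1
    cp "$dir/conf/flume-env.sh.template" "$dir/conf/flume-env.sh" || return 1

    # Print the installation directory so the caller can cd into it.
    echo "$dir"
}
```

Called with the path of the downloaded archive, the function prints the directory that then holds conf/flume.conf and conf/flume-env.sh. No sudo is used here because the files live under the home directory.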

Copying a local file to HDFS through Flume:

# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory

# Define an exec source called exec1 that feeds channel ch1.
agent1.sources.exec1.channels = ch1
agent1.sources.exec1.type = exec
# tail -F needs the path of a text file, not a directory;
# point this at the source text file under /home/geoinsys/test/.
agent1.sources.exec1.command = tail -F /home/geoinsys/test/

# Define an HDFS sink called HDFS that writes every event it receives
# to HDFS, and connect it to the other end of the same channel.
agent1.sinks.HDFS.channel = ch1
agent1.sinks.HDFS.type = hdfs
agent1.sinks.HDFS.hdfs.path = hdfs://localhost:54310/anand
agent1.sinks.HDFS.hdfs.fileType = DataStream

# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
# The source name can be anything (here: exec1).
agent1.sources = exec1
# The sink name can be anything (here: HDFS).
agent1.sinks = HDFS
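
A common mistake in a file like this is a source or sink bound to a channel that the agent never declares. The following shell sketch checks that wiring before the agent is started. Both function names (flume_prop, check_flume_conf) are hypothetical helpers written for this post, not Flume tools, and the parser only understands simple "key = value" lines.

```shell
# Hypothetical helper: print the value of property $2 in file $1.
# Comment lines are skipped; whitespace around key and value is trimmed.
flume_prop() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }
        {
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) print v
        }' "$1"
}

# Hypothetical helper: return 0 if every active source and sink of
# agent $2 in file $1 is bound to a channel listed in <agent>.channels.
check_flume_conf() {
    conf="$1"
    agent="$2"
    status=0
    declared=" $(flume_prop "$conf" "$agent.channels") "

    # Sources use the plural key "channels" and may list several.
    for src in $(flume_prop "$conf" "$agent.sources"); do
        bound=$(flume_prop "$conf" "$agent.sources.$src.channels")
        if [ -z "$bound" ]; then
            echo "source $src has no channels"
            status=1
        fi
        for ch in $bound; do
            case "$declared" in
                *" $ch "*) ;;
                *) echo "source $src uses undeclared channel $ch"; status=1 ;;
            esac
        done
    done

    # Sinks use the singular key "channel" and read from exactly one.
    for sink in $(flume_prop "$conf" "$agent.sinks"); do
        ch=$(flume_prop "$conf" "$agent.sinks.$sink.channel")
        if [ -z "$ch" ]; then
            echo "sink $sink has no channel"
            status=1
        else
            case "$declared" in
                *" $ch "*) ;;
                *) echo "sink $sink uses undeclared channel $ch"; status=1 ;;
            esac
        fi
    done
    return $status
}
```

For the configuration above, running check_flume_conf conf/flume.conf agent1 prints nothing and returns 0, because exec1 and HDFS are both bound to the declared channel ch1.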


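Run the agent from the Flume directory in a terminal (the older "node" command is deprecated in favour of "agent"):

bin/flume-ng agent -n agent1 -c conf -f conf/flume.conf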
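
Once the agent is running, one way to confirm that events arrive is to list the sink's target directory. The sketch below assumes the hadoop client is on the PATH and that the NameNode address from the configuration is reachable; show_flume_output is a hypothetical helper name.

```shell
# Hypothetical helper: list the files the HDFS sink has written so far.
#   $1 = HDFS directory (defaults to the hdfs.path used in the configuration)
show_flume_output() {
    dir="${1:-hdfs://localhost:54310/anand}"
    # "hadoop fs -ls" prints one line per file in the directory.
    hadoop fs -ls "$dir"
}
```

Unless hdfs.filePrefix is set, the sink names its files with the prefix FlumeData, so entries starting with FlumeData in the listing indicate that the flow is working.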


A flow in Flume NG describes the whole transport of events from a source, through a channel, to a sink. A sink can also feed the source of another agent, which makes it possible to collect different streams into one sink. The process that Flume starts is called an agent. A setup could look like this example:
source -             -> source => channel => sink
        \           /       
source - => channel => sink 
        /           \
source -             -> channel => source => channel => sink
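
The "sink feeds a new source" hop in the diagram is usually built with an Avro sink on one agent and an Avro source on the next. The shell function below writes a minimal two-agent sketch of that idea to a file. Everything specific in it is an assumption chosen for illustration: the agent names (agent1, collector), the component names, port 4545, and the sequence-generator source used as a stand-in event producer.

```shell
# Hypothetical helper: write a two-agent example configuration to file $1.
write_two_tier_conf() {
    cat > "$1" <<'EOF'
# Tier 1: agent1 generates test events and forwards them through an Avro sink.
agent1.channels = ch1
agent1.sources = seq1
agent1.sinks = avroOut
agent1.channels.ch1.type = memory
agent1.sources.seq1.type = seq
agent1.sources.seq1.channels = ch1
agent1.sinks.avroOut.type = avro
agent1.sinks.avroOut.channel = ch1
agent1.sinks.avroOut.hostname = localhost
agent1.sinks.avroOut.port = 4545

# Tier 2: collector receives those events on an Avro source and logs them.
collector.channels = ch1
collector.sources = avroIn
collector.sinks = log1
collector.channels.ch1.type = memory
collector.sources.avroIn.type = avro
collector.sources.avroIn.channels = ch1
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4545
collector.sinks.log1.type = logger
collector.sinks.log1.channel = ch1
EOF
}
```

Because every property is prefixed with its agent name, both agents can read the same file: start the collector first with -n collector, then the sender with -n agent1, each in its own terminal.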