Friday, May 29, 2015

Integrate Pentaho With Hadoop (Cloudera)

STEPS TO INTEGRATE PENTAHO WITH HADOOP

1. Before changing any configuration file, note your Pentaho installation path. Under it, go to the directory "\design-tools\data-integration\plugins".

2. This folder contains a directory named "pentaho-big-data-plugin".

3. Navigate to the "pentaho-big-data-plugin" directory and look at the configuration files located there. This folder contains a file named "plugin.properties".

4. Open the file. Near the very beginning there is a field named "active.hadoop.configuration", which is set to hadoop-20 by default, so the line reads: active.hadoop.configuration=hadoop-20



5. The "pentaho-big-data-plugin" directory contains another folder named "hadoop-configurations". Navigate into it and look at the folder structure.

6. Now go back to the "plugin.properties" file and look at the value of the field "active.hadoop.configuration". By default it is set to hadoop-20. To switch to Cloudera, use one of the other folders in "hadoop-configurations" whose names start with "cdh" ("cdh*"): in the "plugin.properties" file, change the value to the name of the cdh folder you want to use.

7. After the change, the field holds the name of the chosen cdh folder instead of hadoop-20. Save the file and restart the server so that the change takes effect.


8. You can now connect to the Hadoop ecosystem from Pentaho Data Integration.
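
For anyone who prefers to script steps 4 to 7 instead of editing the file by hand, here is a minimal Python sketch. It is not part of Pentaho: the function names are made up for illustration, and the folder name "cdh-example" used in the usage note is a placeholder, so substitute one of the real "cdh*" folder names found in your own "hadoop-configurations" directory. It assumes "plugin.properties" uses the usual key=value layout.

```python
import re
from pathlib import Path

# Field name taken from plugin.properties, as described in step 4.
KEY = "active.hadoop.configuration"


def list_configurations(plugin_dir):
    """Return the folder names found under hadoop-configurations, sorted."""
    root = Path(plugin_dir) / "hadoop-configurations"
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def set_active_configuration(text, name):
    """Return the properties text with the active configuration set to name.

    Only uncommented lines that start with the key are rewritten.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(KEY) + r"[ \t]*[=:][ \t]*).*$",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + name, text)
    if count == 0:
        raise ValueError(KEY + " not found in properties text")
    return new_text


def switch_configuration(plugin_dir, name):
    """Point plugin.properties at an existing hadoop-configurations folder."""
    available = list_configurations(plugin_dir)
    if name not in available:
        raise ValueError(
            "unknown configuration %r, available: %s" % (name, available)
        )
    props = Path(plugin_dir) / "plugin.properties"
    props.write_text(set_active_configuration(props.read_text(), name))
```

A call such as switch_configuration(plugin_dir, "cdh-example"), where plugin_dir is the full path to your "pentaho-big-data-plugin" directory, would then replace hadoop-20 with the chosen folder name. The check against list_configurations is there so that a mistyped folder name is rejected before the file is touched. The server still has to be restarted afterwards, as in step 7.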