Pig Introduction:
Apache Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
At present, Pig's infrastructure layer consists of a compiler that produces sequences of MapReduce programs, for which large-scale parallel implementations (such as Hadoop) already exist.
Pig's language layer currently consists of a textual language called Pig Latin, which has the following key properties:
Ease of programming:
It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. Complex tasks composed of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand and maintain.
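To make the data-flow idea concrete, here is a plain-Python analogy of a classic Pig Latin word count. The Pig Latin statements in the leading comment use real built-ins (TOKENIZE, FLATTEN, GROUP, COUNT). The input file name and the Python function names are hypothetical. `str.split()` is a simplification of TOKENIZE, which also treats some punctuation as a separator. Each Python function stands in for one relation, so the pipeline reads top to bottom the same way the script does.

```python
from collections import defaultdict

# Pig Latin data flow being mirrored (for reference only):
#   lines  = LOAD 'input.txt' AS (line:chararray);
#   words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
#   grpd   = GROUP words BY word;
#   counts = FOREACH grpd GENERATE group AS word, COUNT(words) AS n;


def load(lines):
    """LOAD: one single-field record (a 1-tuple) per input line."""
    return [(line,) for line in lines]


def tokenize_and_flatten(records):
    """FOREACH ... FLATTEN(TOKENIZE(line)): one record per word (whitespace split only)."""
    return [(word,) for (line,) in records for word in line.split()]


def group_by_word(records):
    """GROUP ... BY word: map each key to the bag (list) of records sharing it."""
    groups = defaultdict(list)
    for record in records:
        groups[record[0]].append(record)
    return dict(groups)


def count_groups(groups):
    """FOREACH grpd GENERATE group, COUNT(words): (key, count) pairs, sorted by key."""
    return sorted((word, len(bag)) for word, bag in groups.items())


def word_count(lines):
    # Each step consumes the previous step's output, like a chain of Pig relations.
    return count_groups(group_by_word(tokenize_and_flatten(load(lines))))
```

For example, `word_count(["pig latin", "pig"])` returns `[("latin", 1), ("pig", 2)]`.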
Optimization Opportunities:
The way in which tasks are encoded permits the system to optimize execution automatically, allowing the user to focus on semantics rather than efficiency.
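One example of such an optimization is a rule in Pig's logical optimizer called PushUpFilter. It moves a FILTER ahead of operations such as a JOIN when doing so cannot change the result. The plain-Python sketch below uses made-up `users` and `clicks` relations to show why this is safe. Filtering before the join produces the same rows as filtering after it, but fewer rows go into the join. It illustrates the idea only and is not Pig's optimizer code.

```python
# Hypothetical relations: users(id, country) and clicks(user_id, url).
users = [(1, "US"), (2, "IN"), (3, "US")]
clicks = [(1, "/a"), (2, "/b"), (3, "/c"), (1, "/d")]


def join(left, right):
    """Inner equi-join on left[0] == right[0], like JOIN users BY id, clicks BY user_id."""
    return [l + r for l in left for r in right if l[0] == r[0]]


def is_us(row):
    # Works on both user rows and joined rows, since country is field 1 in each.
    return row[1] == "US"


# Plan as written: join everything first, then filter the joined rows.
joined = join(users, clicks)
as_written = [row for row in joined if is_us(row)]

# Plan after pushing the filter above the join: filter users first, then join.
filtered_users = [u for u in users if is_us(u)]
pushed = join(filtered_users, clicks)
```

Both plans yield the same three rows, but the pushed plan feeds two user rows into the join instead of three.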
Extensibility:
Users can write their own user-defined functions (UDFs) to perform special-purpose processing of data.
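Pig accepts UDFs written in several languages, including Java and Python. The sketch below is a minimal Python UDF. It assumes a Pig release that supports `streaming_python` (CPython) UDFs, where the `outputSchema` decorator is imported from `pig_util`. The function name `email_domain` and the file name used below are hypothetical. The `ImportError` fallback exists only so the file can also be run and tested outside Pig.

```python
# Under Pig's streaming_python mode the decorator comes from pig_util; the
# fallback below only lets this file run under plain Python for testing.
try:
    from pig_util import outputSchema
except ImportError:
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema  # record the declared Pig schema on the function
            return func
        return decorator


@outputSchema("domain:chararray")
def email_domain(email):
    """Return the lower-cased part after the last '@', or None (Pig null) if absent."""
    if email is None or "@" not in email:
        return None
    return email.rsplit("@", 1)[1].lower()
```

If saved as `udfs.py`, it could be registered with `REGISTER 'udfs.py' USING streaming_python AS myudfs;` and called as `myudfs.email_domain(email)` inside a `FOREACH ... GENERATE`.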
Syntax to open the Pig terminal (the Grunt shell) in local mode:
$ pig -x local
After running this command you will be at the grunt> prompt, where you can execute Pig commands.
Differences between Pig and Hive when executing statements:
                            HIVE                         PIG
In the interactive shell:   hive> statement;             grunt> statement;
From the Linux/Unix shell:
  single statement          $ hive -e 'statement'        $ pig -e 'statement'
  script file               $ hive -f script.hql         $ pig -f test.pig
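The `-e` and `-f` options work the same way for both tools, so a small Python helper can build either command line and pass it to `subprocess.run`. The helper name `build_command` is hypothetical. Actually running the result assumes `hive` or `pig` is installed and on your PATH.

```python
import shlex


def build_command(tool, statement=None, script=None):
    """Build argv for running Hive or Pig non-interactively.

    Exactly one of `statement` (passed with -e) or `script` (passed with -f) is required.
    """
    if tool not in ("hive", "pig"):
        raise ValueError(f"unsupported tool: {tool!r}")
    if (statement is None) == (script is None):
        raise ValueError("pass exactly one of statement or script")
    if statement is not None:
        return [tool, "-e", statement]
    return [tool, "-f", script]


if __name__ == "__main__":
    # shlex.join quotes arguments so the printed line can be pasted into a shell.
    print(shlex.join(build_command("pig", script="test.pig")))  # pig -f test.pig
```

Building an argument list instead of a single string avoids shell-quoting problems when the statement contains spaces or quotes.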
Key points about Pig:
- Originally developed at Yahoo and now maintained by the Apache Software Foundation.
- Written in Java.
- Processes data using Pig Latin, a procedural data-flow language in which a job is written step by step, one transformation per statement.
- Can handle structured, semi-structured and unstructured data, so it can process almost any kind of data.
- Works directly on data stored in HDFS.
- The current version is 0.14 (at the time of writing).
- Runs in local mode or MapReduce mode; version 0.14 also adds Tez mode.
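Pig does not require data to match a schema before it is stored. A schema declared in LOAD is applied as records are read. The plain-Python sketch below imitates that schema-on-read behavior for a hypothetical tab-separated file. Fields a line does not contain, empty fields, and values that fail to convert all come back as `None`, which is roughly how Pig produces null for them. The schema, file name, and function are illustrative assumptions, not PigStorage's actual implementation.

```python
# Hypothetical schema, in the spirit of:
#   people = LOAD 'people.tsv' USING PigStorage('\t') AS (name:chararray, age:int, city:chararray);
SCHEMA = [("name", str), ("age", int), ("city", str)]


def parse_record(line, schema=SCHEMA, delimiter="\t"):
    """Apply `schema` to one raw line; missing, empty, or unconvertible fields become None."""
    raw = line.rstrip("\n").split(delimiter)
    record = {}
    for i, (field, cast) in enumerate(schema):
        value = raw[i] if i < len(raw) else None
        if not value:  # field absent from this line, or empty between delimiters
            record[field] = None
            continue
        try:
            record[field] = cast(value)
        except ValueError:
            record[field] = None  # e.g. "abc" in an int column
    return record
```

Extra fields beyond the declared schema are ignored in this sketch.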
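Different modes in Pig:
- Local (pig -x local): runs on a single machine using the local file system; useful for testing on small data.
- MapReduce (pig -x mapreduce): the default mode; requires access to a Hadoop cluster and HDFS.
- Tez (pig -x tez): runs jobs on Apache Tez, also on a Hadoop cluster; added in Pig 0.14.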
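The mode is chosen with the `-x` option. The small Python wrapper below validates the mode before launching a script. It assumes the `pig` launcher is installed and on PATH, and the helper names are hypothetical. The accepted set is limited to the three modes listed above; newer Pig releases may accept others.

```python
import shutil
import subprocess

# Values accepted by `pig -x` for the three modes listed above.
EXEC_MODES = {"local", "mapreduce", "tez"}


def pig_argv(script, mode="mapreduce"):
    """Build argv for `pig -x MODE -f SCRIPT`; MapReduce is Pig's default mode."""
    if mode not in EXEC_MODES:
        raise ValueError(f"unknown Pig exec mode {mode!r}; expected one of {sorted(EXEC_MODES)}")
    return ["pig", "-x", mode, "-f", script]


def run_pig(script, mode="local"):
    """Run a Pig script if the `pig` launcher is on PATH; raise if it exits non-zero."""
    if shutil.which("pig") is None:
        raise FileNotFoundError("pig launcher not found on PATH")
    return subprocess.run(pig_argv(script, mode), check=True)
```

Defaulting `run_pig` to local mode keeps accidental runs off the cluster while a script is still being developed.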