With increasing use of the internet, data volumes are also growing exponentially year on year. Handling such enormous amounts of data required a better platform for processing it, so a programming model was introduced called Map R
Partitioning techniques with respect to performance tuning:

KEY BASED TECHNIQUES:
1. Hash
2. Modulus
3. Range
4. DB2

KEYLESS TECHNIQUES:
1. Same
2. Entire
3. Round Robin
4. Random

All key-based stages are associated with hash as the default key-based technique.

Hash technique:
Principle of partitioning: records with the same key column value are sent to the same node. Suppose there are 3 nodes: N1, N2 and N3.

HASH TECHNIQUE
INPUT DATA: 10, 20, 10, 30, 20, 10, 30, 20, 10, 20, 30
RECORDS FETCHED BY N1: 10, 10, 10, 10
RECORDS FETCHED BY N2: 20, 20, 20, 20
RECORDS FETCHED BY N3: 30, 30, 30
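The principle above (same key value, same node) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `hash_partition` is a hypothetical helper, and `zlib.crc32` stands in for whatever hash function the engine really uses, which the text does not specify. The node that a given key lands on may therefore differ from the table, but every occurrence of a key still lands on one node.

```python
import zlib


def hash_partition(records, num_nodes, key=lambda record: record):
    """Send each record to the node chosen by a stable hash of its key."""
    partitions = [[] for _ in range(num_nodes)]
    for record in records:
        # crc32 is an assumed stand-in for the engine's real hash function.
        digest = zlib.crc32(str(key(record)).encode("utf-8"))
        partitions[digest % num_nodes].append(record)
    return partitions


hash_input = [10, 20, 10, 30, 20, 10, 30, 20, 10, 20, 30]
hash_parts = hash_partition(hash_input, 3)
for node, rows in enumerate(hash_parts, start=1):
    print(f"N{node}: {rows}")
```

Because the node depends only on the key, two distinct keys may share a node, so hash partitions are not guaranteed to be equal in size.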
The hash partitioning technique can be selected in 2 cases:
1. Number of key columns > 1
2. Number of key columns = 1, where the key is of any type other than integer

Round Robin: The first record goes to the first processing node, the second goes to the second processing node, and so on. When the last node is reached, distribution starts again from the first node. In general, this method of partitioning creates approximately equal-sized partitions.

ROUND ROBIN
INPUT DATA: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
RECORDS FETCHED BY N1: 1, 4, 7, 10, 13, 16, 19
RECORDS FETCHED BY N2: 2, 5, 8, 11, 14, 17, 20
RECORDS FETCHED BY N3: 3, 6, 9, 12, 15, 18
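The round robin rule can be sketched as follows, using 20 input records and 3 nodes. `round_robin_partition` is a hypothetical helper written for this illustration, not part of any product API.

```python
def round_robin_partition(records, num_nodes):
    """Deal records to nodes in turn, wrapping back to the first node."""
    partitions = [[] for _ in range(num_nodes)]
    for index, record in enumerate(records):
        # Record positions 0, 1, 2, 3, ... map to nodes 0, 1, 2, 0, ...
        partitions[index % num_nodes].append(record)
    return partitions


rr_parts = round_robin_partition(list(range(1, 21)), 3)
for node, rows in enumerate(rr_parts, start=1):
    print(f"N{node}: {rows}")
# N1: [1, 4, 7, 10, 13, 16, 19]
# N2: [2, 5, 8, 11, 14, 17, 20]
# N3: [3, 6, 9, 12, 15, 18]
```

The partition sizes differ by at most one record, which is why this method yields approximately equal-sized partitions regardless of the data values.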
Random: Records are randomly distributed across all processing nodes. Random partitioning also creates approximately equal-sized partitions, but the assignment of records to nodes follows no fixed pattern.
RANDOM

INPUT DATA: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
RECORDS FETCHED BY N1: 1, 9, 8, 7, 14, 18, 16
RECORDS FETCHED BY N2: 2, 12, 11, 5, 15, 19, 13
RECORDS FETCHED BY N3: 10, 4, 6, 3, 20, 17
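A minimal sketch of random partitioning is shown here, assuming each record independently picks a node with equal probability. `random_partition` and its `seed` parameter are hypothetical names introduced for this illustration; a real engine may use a different random source.

```python
import random


def random_partition(records, num_nodes, seed=None):
    """Send each record to a node picked uniformly at random."""
    rng = random.Random(seed)  # a fixed seed makes a run repeatable
    partitions = [[] for _ in range(num_nodes)]
    for record in records:
        partitions[rng.randrange(num_nodes)].append(record)
    return partitions


random_parts = random_partition(list(range(1, 21)), 3, seed=42)
for node, rows in enumerate(random_parts, start=1):
    print(f"N{node}: {rows}")
```

Under this assumption the partition sizes are only equal on average, so a small input such as 20 records can end up noticeably uneven, while large inputs even out.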
Modulus: Modulus generally performs better than hash. Principle of partitioning: it distributes the data by calculating the MOD value of the key. MOD value = key value mod number of partitions (nodes), that is, the remainder left after dividing the key value by the number of partitions. Modulus is selected when there is only 1 key column and it is an integer.
MODULUS

INPUT DATA: 0, 3, 2, 1, 0, 2, 3, 2, 1, 1
RECORDS FETCHED BY N1: 0, 3, 0, 3
RECORDS FETCHED BY N2: 1, 1, 1
RECORDS FETCHED BY N3: 2, 2, 2
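The modulus rule can be checked with a short sketch, assuming remainder 0 maps to N1, remainder 1 to N2 and remainder 2 to N3. `modulus_partition` is a hypothetical helper for illustration only.

```python
def modulus_partition(records, num_nodes, key=lambda record: record):
    """Send each record to the node numbered (integer key % num_nodes)."""
    partitions = [[] for _ in range(num_nodes)]
    for record in records:
        # The remainder, not the quotient, selects the node.
        partitions[key(record) % num_nodes].append(record)
    return partitions


mod_input = [0, 3, 2, 1, 0, 2, 3, 2, 1, 1]
mod_parts = modulus_partition(mod_input, 3)
for node, rows in enumerate(mod_parts, start=1):
    print(f"N{node}: {rows}")
# N1: [0, 3, 0, 3]
# N2: [1, 1, 1]
# N3: [2, 2, 2]
```

Keys 0 and 3 share N1 because both leave remainder 0 when divided by 3.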
The modulus technique is used when the key column is of numeric (integer) type; this is the only difference between the hash and modulus techniques.
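The selection rules stated in this section (modulus for a single integer key column, hash otherwise) can be summarised in a small sketch. `choose_key_based_technique` is a hypothetical function, and using Python types to describe key columns is an assumption made purely for illustration.

```python
def choose_key_based_technique(key_column_types):
    """Pick 'modulus' or 'hash' from the types of the key columns."""
    if not key_column_types:
        raise ValueError("at least one key column is required")
    # Modulus applies only to exactly one key column of integer type.
    if len(key_column_types) == 1 and key_column_types[0] is int:
        return "modulus"
    # More than one key column, or a single non-integer key: use hash.
    return "hash"


print(choose_key_based_technique([int]))       # modulus
print(choose_key_based_technique([str]))       # hash
print(choose_key_based_technique([int, int]))  # hash
```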