
Partition Components


The term partition has two meanings: 1. A partition is a file that is a portion of a multifile. 2. A partition is a segment of a parallel computation. To partition data is to divide it into segments so that the segments can be processed in parallel. Some components partition data. There are a number of partition components, such as the following.

Partition by Key

Partition by Key reads records from the in port and distributes them to its output flow partitions according to key values.
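As a rough illustration of key-based distribution, the Python sketch below routes each record by a hash of its key value. It is a model of the idea only, not the component itself: the function name `partition_by_key`, the `cust_id` field, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib
from collections import defaultdict


def partition_by_key(records, key, num_partitions):
    """Route each record to a partition chosen from a hash of its key value."""
    partitions = defaultdict(list)
    for record in records:
        # CRC32 is used here only because it is stable between runs;
        # the hash used by the real component is not stated in the text.
        digest = zlib.crc32(str(record[key]).encode("utf-8"))
        partitions[digest % num_partitions].append(record)
    return partitions


# Hypothetical sample data: three of the four records share the key "A".
records = [
    {"cust_id": "A", "amount": 10},
    {"cust_id": "B", "amount": 20},
    {"cust_id": "A", "amount": 30},
    {"cust_id": "A", "amount": 40},
]
parts = partition_by_key(records, "cust_id", 4)
for index, recs in sorted(parts.items()):
    print(index, recs)
```

Records with the same key value always land in the same partition, which is what makes key partitioning suitable ahead of key-based processing. It also means that a frequent key value can make one partition larger than the others.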

The key must be specified in the component's key parameter. A Partition by Key component is generally followed by a Sort component (discussed later). See the example below.

[In the above example, the sort parameter of the Join component is used because its input must be sorted.]

Partition by Round Robin

Partition by Round Robin distributes blocks of data records evenly to each output flow in round-robin fashion. A partitioning key is not required. The difference between Partition by Key and Partition by Round Robin is that the former may not distribute data uniformly across all partitions of a multifile system, whereas the latter does.

Partition by Expression

Partition by Expression distributes data records to its output flow partitions according to a specified expression.
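Both round-robin distribution and expression-based routing can be viewed as a function that maps each record to an output flow number. The Python sketch below models that view under stated assumptions: `partition_by_expression`, `round_robin`, `by_sub_type`, and the `block_size` argument are hypothetical names, and the `record_sub_typ` routing simply follows this document's own example with the remaining records sent to flow 2.

```python
from itertools import count


def partition_by_expression(records, expression, num_partitions):
    """Send each record to the flow number returned by expression(record)."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        index = expression(record)
        if not 0 <= index < num_partitions:
            raise ValueError(f"expression returned invalid flow {index}")
        partitions[index].append(record)
    return partitions


def round_robin(num_partitions, block_size=1):
    """Build an expression that ignores the record and cycles through flows."""
    counter = count()

    def expression(_record):
        # Integer division groups consecutive records into blocks.
        return (next(counter) // block_size) % num_partitions

    return expression


def by_sub_type(record):
    """Route on record_sub_typ: cg1 to flow 0, cg2 to flow 1, the rest to flow 2."""
    if record["record_sub_typ"] == "cg1":
        return 0
    if record["record_sub_typ"] == "cg2":
        return 1
    return 2


records = [{"record_sub_typ": t} for t in ["cg1", "cg2", "cg3", "cg1", "cg9", "cg1"]]
rr = partition_by_expression(records, round_robin(3), 3)
typed = partition_by_expression(records, by_sub_type, 3)
print([len(p) for p in rr])
print([len(p) for p in typed])
```

In this sample the round-robin flows come out equal in size regardless of content, while the value-based routing is as uneven as the data itself.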

The required expression is specified in the function parameter.

For example, the expression ((next_in_sequence()*number_of_partition() + this_partition())/number_of_partition)/1000 distributes the records in blocks of 1000 in round-robin fashion across all partitions.

As another example, the expression if (record_sub_typ=="cg1") 0 else if (record_sub_typ=="cg2") 1 else 2 sends all records whose record_sub_typ is cg1 to flow 0, all records whose record_sub_typ is cg2 to flow 1, and all remaining records to flow 2.

Partition by Range

Partition by Range distributes data records to its output flow partitions according to the ranges of key values specified for each partition. This component is not frequently used.

When Partition by Range is used with Find Splitters, use the same key specifier for both components. Make the number of partitions on the flow connected to the out port of Partition by Range the same as the value (n) in the num_partitions parameter of Find Splitters.

This component:

1. Reads splitter records from the split port, and assumes that these records are sorted according to the key parameter.
2. Determines whether the number of flows connected to the out port is equal to n (where n-1 is the number of splitter records). If not, Partition by Range writes an error message and stops the execution of the graph.
3. Reads data records from the flows connected to the in port in arbitrary order.
4. Distributes the data records to the flows connected to the out port according to the values of the key field(s), as follows:

a) Assigns records with key values less than or equal to the first splitter record to the first output flow.
b) Assigns records with key values greater than the first splitter record, but less than or equal to the second splitter record, to the second output flow, and so on.
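The splitter rule above can be sketched in Python with a binary search over the sorted splitter values. This is an illustrative model under stated assumptions, not the component's implementation: `partition_by_range`, the `id` field, and the splitter values 10 and 20 are made up for the example.

```python
from bisect import bisect_left


def partition_by_range(records, key, splitters):
    """Distribute records over len(splitters) + 1 flows using sorted splitters."""
    if list(splitters) != sorted(splitters):
        raise ValueError("splitter values must be sorted by key")
    flows = [[] for _ in range(len(splitters) + 1)]
    for record in records:
        # bisect_left counts the splitters strictly less than the key value,
        # so a key equal to a splitter stays in that splitter's own flow.
        flows[bisect_left(splitters, record[key])].append(record)
    return flows


# Two splitters give three output flows (n - 1 splitters for n flows).
records = [{"id": v} for v in [5, 10, 11, 20, 21, 99]]
flows = partition_by_range(records, "id", splitters=[10, 20])
for number, flow in enumerate(flows):
    print(number, [r["id"] for r in flow])
```

With these sample values, keys up to and including 10 go to the first flow, keys from 11 through 20 go to the second, and everything larger goes to the last flow.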

Partition with Load Balance

Partition with Load Balance distributes data records to its output flow partitions, writing more records to the flow partitions that consume records faster. This component is not frequently used.
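One way to picture load-balanced distribution is a small simulation in which each record goes to whichever flow becomes free first. The Python sketch below is only such a simulation: the function name `partition_with_load_balance`, the `seconds_per_record` argument, and the fixed per-record costs are assumptions, and the text does not describe how the actual component measures consumption speed.

```python
import heapq


def partition_with_load_balance(records, seconds_per_record):
    """Give each record to the flow that becomes free earliest."""
    flows = [[] for _ in seconds_per_record]
    # Heap entries are (time at which the flow is free again, flow number).
    ready = [(0.0, number) for number in range(len(seconds_per_record))]
    heapq.heapify(ready)
    for record in records:
        free_at, number = heapq.heappop(ready)
        flows[number].append(record)
        heapq.heappush(ready, (free_at + seconds_per_record[number], number))
    return flows


# Hypothetical consumers: flow 0 is the fastest, flow 2 the slowest.
flows = partition_with_load_balance(list(range(12)), seconds_per_record=[1.0, 2.0, 4.0])
print([len(flow) for flow in flows])
```

In this model the faster a flow consumes records, the more records it receives, which matches the behaviour described above.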
