4.2.1. Grading function parameters
The user can choose the values of three parameters: alpha (α), beta (β) and gamma (γ). Varying them affects the final summary, because the grading function for a sentence depends on these coefficients. The BioSumm grading function takes into account the presence of the domain-specific words contained in the dictionary G. Let K be the set of words of the document collection identified by the preprocessing block.
For each sentence j in a document i, the grading function is given by [FORMULA], where wk is the number of occurrences of term k of set K, and fik is the keyword factor of the Edmundson statistic-based summarization method, used to give higher scores to statistically significant words in the document.
The factor δj is a weighting factor that accounts for the occurrences in sentence j of entries of the gene/protein dictionary G, and is defined by:
[FORMULA],
where ω'g counts the distinct occurrences in sentence j of term g belonging to G, so it equals 1 when g appears in the sentence at least once. Instead, (ωg − 1), where ωg is the total number of occurrences of g in sentence j, counts the occurrences of g, duplicates included, starting from the second one. This means that in a sentence containing the same dictionary entry five times, ω'g is equal to 1 and (ωg − 1) is equal to 4. We used (ωg − 1) so that no occurrence of a dictionary entry is counted twice.
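The two counts can be illustrated with a minimal sketch. It assumes whitespace tokenization, exact string matching and single-token dictionary entries, none of which is stated in the text; the function name `dictionary_counts` and the sample gene names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def dictionary_counts(sentence_tokens, dictionary):
    """Return (distinct, repeats) for the dictionary terms in one sentence.

    distinct = sum over g of omega'_g  (1 for each dictionary entry present)
    repeats  = sum over g of (omega_g - 1)  (occurrences after the first)

    Assumption: tokens are matched against the dictionary by exact equality.
    """
    counts = Counter(t for t in sentence_tokens if t in dictionary)
    distinct = len(counts)
    repeats = sum(c - 1 for c in counts.values())
    return distinct, repeats


# A sentence in which the same dictionary entry appears five times.
tokens = "BRCA1 binds BRCA1 and BRCA1 while BRCA1 and BRCA1 act".split()
print(dictionary_counts(tokens, {"BRCA1", "TP53"}))  # (1, 4)
```

The printed pair reproduces the example in the text: one distinct entry and four repetitions.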
The parameters α, β and γ are three constant factors. The coefficient α belongs to the range [1, +∞) and its role is to favour the sentences that contain terms in G, regardless of how many such terms they contain. The coefficient β is instead in the range [0, 1] and weights the occurrences of distinct words of G. Finally, the coefficient γ is in the range [0, β] and weights the “repetitions” of words of G.
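The parameter ranges and their roles can be sketched in code. The exact definition of δj is not reproduced in this section, so the additive combination below, and the neutral value 1 for sentences without dictionary terms, are assumptions made for illustration only and should not be read as the BioSumm formula. The names `check_parameters` and `delta` are hypothetical.

```python
def check_parameters(alpha, beta, gamma):
    """Validate the ranges stated in the text:
    alpha in [1, +inf), beta in [0, 1], gamma in [0, beta]."""
    if alpha < 1:
        raise ValueError("alpha must be >= 1")
    if not 0 <= beta <= 1:
        raise ValueError("beta must be in [0, 1]")
    if not 0 <= gamma <= beta:
        raise ValueError("gamma must be in [0, beta]")


def delta(distinct, repeats, alpha, beta, gamma):
    """Weighting factor for one sentence (ASSUMED additive form).

    distinct: sum of omega'_g, the number of distinct dictionary terms
    repeats:  sum of (omega_g - 1), the repeated occurrences
    """
    check_parameters(alpha, beta, gamma)
    if distinct == 0:
        # Assumption: sentences without dictionary terms are left unweighted.
        return 1.0
    return alpha + beta * distinct + gamma * repeats


# One dictionary entry appearing five times: distinct = 1, repeats = 4.
print(delta(1, 4, alpha=2.0, beta=0.5, gamma=0.25))  # 3.5
```

Under this assumed form, the constraint γ ≤ β means that a repeated occurrence never contributes more than the first occurrence of a new dictionary term.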