{"id":384,"date":"2010-12-29T10:13:29","date_gmt":"2010-12-29T09:13:29","guid":{"rendered":"http:\/\/dbdmg.polito.it\/wordpress\/?page_id=384"},"modified":"2011-10-06T21:19:28","modified_gmt":"2011-10-06T20:19:28","slug":"biosumm","status":"publish","type":"page","link":"https:\/\/dbdmg.polito.it\/wordpress\/research\/biosumm\/","title":{"rendered":"BioSumm"},"content":{"rendered":"<h3>Goals of the project<\/h3>\n<p style=\"text-align: justify;\">The BioSumm project tackles the problem of managing and exploiting the huge mass of information contained in ever-growing text repositories such as PubMed. The project aims to become a powerful automatic instrument that supports both knowledge inference from scientific papers and biological validation of gene\/protein interactions obtained in different ways (e.g., with other data mining techniques).<\/p>\n<p style=\"text-align: justify;\">Researchers who discover gene correlations by means of analysis tools (e.g., data mining tools) may exploit this framework to effectively support the biological validation of their results.<\/p>\n<h3>Framework description<\/h3>\n<p style=\"text-align: justify;\">BioSumm is a flexible and modular framework that analyzes large collections of unclassified biomedical texts and produces ad hoc summaries oriented to biological information. 
Its modular architecture is composed of two blocks:<\/p>\n<p style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-400\" title=\"BioSumm_framework\" src=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/BioSumm_framework.jpg\" alt=\"\" width=\"468\" height=\"108\" srcset=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/BioSumm_framework.jpg 468w, https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/BioSumm_framework-300x69.jpg 300w\" sizes=\"auto, (max-width: 468px) 100vw, 468px\" \/><\/p>\n<ul>\n<li style=\"text-align: justify;\"><em><strong>Preprocessing and Categorization<\/strong><\/em>. It extracts relevant parts of the original document, produces a matrix representation of the sources and divides unclassified and rather diverse texts into homogeneous clusters.<\/li>\n<li style=\"text-align: justify;\"><strong><em>Summarization.<\/em><\/strong> For each cluster it produces a summary oriented to biological information.<\/li>\n<\/ul>\n<p style=\"text-align: justify;\">The first block is a general purpose block with the goal of preparing the input documents for the summarization part. The preprocessing part is performed using the RapidMiner text plug-in, whereas the categorization part exploits the CLUTO software package for clustering. The summarization block is the core of the framework. The summary generation is driven by a novel grading function, which biases sentence selection by means of an appropriate domain dictionary. In the current version of BioSumm, in order to focus on a biological target, the dictionary contains gene and protein names and aliases.<\/p>\n<h3>Experimental results<\/h3>\n<p style=\"text-align: justify;\">BioSumm is neither a traditional summarizer nor an extractor of dictionary terms. It is designed to be a summarizer oriented to the biological domain. 
Thus, its summaries have both the expressive power of the traditional summaries and the domain specificity of documents produced by a dictionary entry extractor.<\/p>\n<p style=\"text-align: justify;\">The difference from a traditional summarizer may be appreciated in the next table. It reports the six highest-graded sentences in BioSumm and in a traditional summary. The table was produced by the experiments carried out on the scientific journals freely available in PubMed Central. Specifically, it contains sentences from a cluster of documents belonging to the Breast Cancer journal. The keywords of the cluster (the words describing its major topics) are proband, Ashkenazi, Jewish.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-401\" title=\"BioSumm_results\" src=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/BioSumm_results.jpg\" alt=\"\" width=\"1013\" height=\"507\" srcset=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/BioSumm_results.jpg 1013w, https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/BioSumm_results-300x150.jpg 300w\" sizes=\"auto, (max-width: 1013px) 100vw, 1013px\" \/><\/p>\n<p style=\"text-align: justify;\">The comparison shows that BioSumm, although oriented to biology, is still able to cover all the major topics covered by a traditional summarizer. 
Moreover, its sentences are less generic and contain many genes and proteins, which are described in detail and not only listed.<\/p>\n<p style=\"text-align: justify;\">The results suggest that researchers who discover gene correlations by means of analysis tools (e.g., data mining tools) may exploit this framework to effectively support the biological validation of their results.<\/p>\n<p style=\"text-align: justify;\">Some preliminary experimental results obtained by means of ROUGE are presented below.<\/p>\n<table border=\"1\">\n<thead>\n<tr style=\"text-align: center; background-color: #f9b742;\" align=\"center\">\n<td><strong>Datasets<\/strong><\/td>\n<td colspan=\"3\"><strong>BioSumm<\/strong><\/td>\n<td colspan=\"3\"><strong>OTS<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>ROUGE-2<\/td>\n<td>Precision<\/td>\n<td>Recall<\/td>\n<td>F-measure<\/td>\n<td>Precision<\/td>\n<td>Recall<\/td>\n<td>F-measure<\/td>\n<\/tr>\n<tr>\n<td>Breast Cancer<\/td>\n<td><strong><em>0.08246<\/em><\/strong><\/td>\n<td><strong><em>0.22553<\/em><\/strong><\/td>\n<td><strong><em>0.11456<\/em><\/strong><\/td>\n<td>0.08026<\/td>\n<td>0.21860<\/td>\n<td>0.11141<\/td>\n<\/tr>\n<tr>\n<td>Arthritis Res<\/td>\n<td><strong><em>0.09089<\/em><\/strong><\/td>\n<td><strong><em>0.25362<\/em><\/strong><\/td>\n<td><strong><em>0.12596<\/em><\/strong><\/td>\n<td>0.08844<\/td>\n<td>0.24406<\/td>\n<td>0.12197<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<table border=\"1\">\n<thead>\n<tr style=\"text-align: center; background-color: #f9b742;\">\n<td><strong>Datasets <\/strong><\/td>\n<td colspan=\"3\"><strong>BioSumm <\/strong><\/td>\n<td colspan=\"3\"><strong>OTS <\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>ROUGE-SU4<\/td>\n<td>Precision<\/td>\n<td>Recall<\/td>\n<td>F-measure<\/td>\n<td>Precision<\/td>\n<td>Recall<\/td>\n<td>F-measure<\/td>\n<\/tr>\n<tr>\n<td>Breast 
Cancer<\/td>\n<td><strong><em>0.10038<\/em><\/strong><\/td>\n<td><strong><em>0.28175<\/em><\/strong><\/td>\n<td><strong><em>0.14053<\/em><\/strong><\/td>\n<td>0.09872<\/td>\n<td>0.27599<\/td>\n<td>0.13811<\/td>\n<\/tr>\n<tr>\n<td>Arthritis Res<\/td>\n<td><strong><em>0.11095<\/em><\/strong><\/td>\n<td><strong><em>0.31777<\/em><\/strong><\/td>\n<td><strong><em>0.15498<\/em><\/strong><\/td>\n<td>0.10905<\/td>\n<td>0.30888<\/td>\n<td>0.15169<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3>GUI Interface<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-403\" title=\"screenshot\" src=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/screenshot-1024x864.jpg\" alt=\"\" width=\"640\" height=\"540\" srcset=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/screenshot-1024x864.jpg 1024w, https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/screenshot-300x253.jpg 300w, https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/screenshot.jpg 1601w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/p>\n<table border=\"0\">\n<tbody>\n<tr>\n<td><strong><img loading=\"lazy\" decoding=\"async\" title=\"button1\" src=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/button1.jpg\" alt=\"\" width=\"32\" height=\"22\" \/><\/strong><\/td>\n<td><strong>Document search.<\/strong>\u00a0The user can set the parameters to retrieve the documents from supported digital libraries.<\/td>\n<\/tr>\n<tr>\n<td><strong><img loading=\"lazy\" decoding=\"async\" title=\"button2\" src=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/button2.jpg\" alt=\"\" width=\"32\" height=\"22\" \/><\/strong><\/td>\n<td><strong>Document browsing.<\/strong>\u00a0Management of the retrieved documents, to select the most relevant ones for the summarization task.<\/td>\n<\/tr>\n<tr>\n<td><strong><img loading=\"lazy\" decoding=\"async\" title=\"button3\" 
src=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/button3.jpg\" alt=\"\" width=\"32\" height=\"22\" \/><\/strong><\/td>\n<td><strong>Documents of cluster.<\/strong>\u00a0List of the documents belonging to a cluster identified by the clustering block.<\/td>\n<\/tr>\n<tr>\n<td><strong><img loading=\"lazy\" decoding=\"async\" title=\"button4\" src=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/button4.jpg\" alt=\"\" width=\"32\" height=\"22\" \/><\/strong><\/td>\n<td><strong>Cluster summary<\/strong>.\u00a0The most relevant features (stems) which identify the topic of the cluster.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3>Downloads<\/h3>\n<p style=\"text-align: justify;\">A new version of BioSumm will be available.<\/p>\n<p style=\"text-align: justify;\">GUI application (32-bit): <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/BioSumm-1.1.tar.gz\">BioSumm-1.1.tar.gz<\/a><\/p>\n<p style=\"text-align: justify;\">GUI application (64-bit): <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/BioSumm-1.1_64bit.tar.gz\">BioSumm-1.1_64bit.tar.gz<\/a><\/p>\n<p style=\"text-align: justify;\"><a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2010\/12\/MPI_presentation.pdf\">Presentation at Max Planck Institute (Saarbrucken)<\/a><\/p>\n<h3>Publications<\/h3>\n<p>For further details please refer to the scientific papers.<\/p>\n<div class=\"teachpress_pub_list\"><form name=\"tppublistform\" method=\"get\"><a name=\"tppubs\" id=\"tppubs\"><\/a><\/form><div class=\"teachpress_message_error\"><p>Sorry, no publications matched your criteria.<\/p><\/div><\/div>\n<h3>Contacts<\/h3>\n<p>Further details on the project may be obtained by contacting:<\/p>\n<script type=\"text\/javascript\"> \/\/ <!-- 
\neval(unescape('%64%6f%63%75%6d%65%6e%74%2e%77%72%69%74%65%28%27%3c%61%20%68%72%65%66%3d%22%6d%61%69%6c%74%6f%3a%61%6c%65%73%73%61%6e%64%72%6f%2e%66%69%6f%72%69%40%70%6f%6c%69%74%6f%2e%69%74%22%3e%61%6c%65%73%73%61%6e%64%72%6f%20%64%6f%74%20%66%69%6f%72%69%20%61%74%20%70%6f%6c%69%74%6f%20%64%6f%74%20%69%74%3c%2f%61%3e%27%29'))\n\/\/ --> <\/script>\n<div><strong><br \/>\n<\/strong><\/div>\n<div><strong><br \/>\n<\/strong><\/div>\n<div><strong><br \/>\n<\/strong><\/div>\n<div><strong><br \/>\n<\/strong><\/div>\n<br class=\"fixfloat\" \/>","protected":false},"excerpt":{"rendered":"<p>Goals of the project The BioSumm project tackles the problem of managing and exploiting the huge mass of information contained in increasingly wider text repositories such as PubMed. The project aims at becoming a powerful automatic instrument to support for both knowledge inference from scientific papers and biological validation of gene\/proteins interactions obtained in different<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/research\/biosumm\/\">[&#8230;]<\/a><\/p>\n","protected":false},"author":2,"featured_media":397,"parent":98,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-384","page","type-page","status-publish","has-post-thumbnail","hentry"],"_links":{"self":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/384","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/comments?post=384"}],"version-history":[{"count":23,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/384\/revisions"}],"predecessor-version":[{"id":2040,"h
ref":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/384\/revisions\/2040"}],"up":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/98"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/media\/397"}],"wp:attachment":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/media?parent=384"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}