Semantic data mining


Giulia Bruno, Elena Baralis


Data collected in specific domains are often sparse and have high cardinality. Classical data mining algorithms treat such data simply as values, with no semantic relationships among them, and this lack of semantic information seriously limits their effectiveness. Incorporating semantic models (i.e., ontologies) into data mining algorithms is expected to improve the knowledge discovery process by supplying a knowledge base of domain concepts.

New semantic data mining algorithms that incorporate domain ontologies will be implemented, in particular in the following categories:

  • Pattern extraction
  • Anomaly detection
  • Similarity computation
  • Classification

The developed algorithms will be applied in two domains:


  • Healthcare data analysis
    • An urgent problem in healthcare is to find a way to exploit the vast amount of existing (but currently unanalyzed) patient-related big data.
    • The developed algorithms will support medical experts in understanding how patients are managed across a healthcare system, checking whether they follow the medical guidelines prescribed for their pathology, in terms of periodic examinations and drug intake, or whether anomalous events occur.
    • Furthermore, they will allow the discovery of groups of homogeneous patients who could benefit from the same treatments, and the prediction of future patient behaviour, in order to plan an efficient use of resources and to improve decision making.
  • Industrial data analysis
    • Retrieving manufacturing knowledge within companies is critical, particularly identifying similarities between new and past products, which relies almost exclusively on the memory and experience of employees.
    • The developed algorithms will allow the automatic identification of similar past products, so that they can be reused to speed up the design and manufacturing of the new product.
    • Similarity will be computed using a semantic model in the form of an ontology, which constitutes the reference hierarchy of concepts. A new similarity index will be defined, based on the portion of the concept subgraph shared by the two products.
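
The idea of a similarity index based on overlapping concept subgraphs can be illustrated with a minimal sketch. It is not the index to be defined in the thesis. It assumes that each product is annotated with a set of ontology concepts, that the ontology is a simple is-a hierarchy, and that the overlap is measured with a Jaccard ratio. The toy ontology `PARENTS` and the functions `concept_subgraph` and `overlap_similarity` are hypothetical names introduced only for this example.

```python
# Hypothetical toy ontology: each concept maps to its parent concepts (is-a links).
PARENTS = {
    "steel_bracket": ["bracket", "steel_part"],
    "aluminium_bracket": ["bracket", "aluminium_part"],
    "bracket": ["fastening_component"],
    "steel_part": ["metal_part"],
    "aluminium_part": ["metal_part"],
    "fastening_component": ["component"],
    "metal_part": ["component"],
    "component": [],
}


def concept_subgraph(concepts, parents):
    """Return the given concepts together with all of their ancestors."""
    seen = set()
    stack = list(concepts)
    while stack:
        concept = stack.pop()
        if concept in seen:
            continue
        seen.add(concept)
        # Walk upwards through the is-a links.
        stack.extend(parents.get(concept, []))
    return seen


def overlap_similarity(concepts_a, concepts_b, parents):
    """Assumed index: shared concepts divided by all concepts (Jaccard ratio)."""
    sub_a = concept_subgraph(concepts_a, parents)
    sub_b = concept_subgraph(concepts_b, parents)
    union = sub_a | sub_b
    if not union:
        return 0.0  # Two products without annotations share nothing.
    return len(sub_a & sub_b) / len(union)


if __name__ == "__main__":
    new_product = ["steel_bracket"]
    past_product = ["aluminium_bracket"]
    print(overlap_similarity(new_product, past_product, PARENTS))
```

In this toy case the two products have no concept in common at the leaf level, so a purely value-based comparison would treat them as unrelated, while the subgraph overlap still credits their shared ancestors (`bracket`, `fastening_component`, `metal_part`, `component`).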