Current analytics tools and data mining techniques are mostly restricted to isolated datasets and therefore remain largely standalone and specialized. Such datasets often limit the ability to answer complex, interdisciplinary questions that require establishing associations among similar entities and concepts. To establish such associations, the corresponding information has to be represented in a format that is accessible and unambiguous in both its structure and its semantics.
This is where linked data principles come in. Adopting them offers a number of advantages for the production of advanced business analytics. Combining data from multiple, often distributed, sources can help businesses manage and process their data in ways that were not previously possible. Moreover, interlinked versions of datasets can be maintained and regularly updated, giving access to the latest versions of the available datasets. Linked data also provide mechanisms for assessing data quality, which is necessary because the combined data have differing quality characteristics.
Beyond the business value gained by exploiting linked data to produce business analytics, interlinking the analysis results with the input data used in the analysis brings further advantages. The value of the produced data increases, since analysts are given detailed information about the exact datasets used in the analysis as well as about the process followed during it. Moreover, with linked data analytics, the produced data become discoverable for further private or public use.
Within LinDA, taking into account current business trends and the challenges of producing advanced business analytics that yield insights and help businesses make analytics-driven decisions, an approach has been developed for exploiting linked data to produce added-value business analytics. A library of basic, robust data analytics functionality is provided through a set of supported algorithms, enabling enterprises to apply and share analytic methods on linked data in order to discover and communicate meaningful new patterns that were unattainable or hidden in the previously isolated data structures. The business analytics and data mining component consists of the following sub-components: the Query selection component, the Algorithm selection and configuration component, the Algorithm execution component and the Linked data analytics management component.
The Query selection component is responsible for processing the output of simple or complex queries once they have been executed, and for loading that output as input to an analysis process. For this purpose, interfaces to the tools that support the design and execution of SPARQL queries have been designed and implemented.
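As a minimal sketch of the loading step, assuming the SPARQL endpoint returns results in the W3C SPARQL 1.1 Query Results JSON Format, the following Python code flattens a result set into column names and rows that an analysis algorithm could consume. The function names `sparql_json_to_table` and `numeric_column` are hypothetical and do not describe LinDA's actual interfaces.

```python
import json
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def sparql_json_to_table(payload: str) -> Tuple[List[str], List[Row]]:
    """Flatten a SPARQL 1.1 JSON result set into column names and rows.

    Variables that are unbound in a solution become None, so every row
    has the same columns, as most analysis algorithms expect.
    """
    doc = json.loads(payload)
    columns = doc["head"]["vars"]
    rows: List[Row] = []
    for binding in doc["results"]["bindings"]:
        # Each bound variable maps to {"type": ..., "value": ...}; keep the lexical value.
        rows.append({var: binding[var]["value"] if var in binding else None
                     for var in columns})
    return columns, rows


def numeric_column(rows: List[Row], name: str) -> List[float]:
    """Convert one column to floats for numeric algorithms, skipping unbound cells."""
    return [float(row[name]) for row in rows if row[name] is not None]


if __name__ == "__main__":
    # Hypothetical result of a query such as SELECT ?region ?sales WHERE { ... }
    example = json.dumps({
        "head": {"vars": ["region", "sales"]},
        "results": {"bindings": [
            {"region": {"type": "uri", "value": "http://example.org/regionA"},
             "sales": {"type": "literal", "value": "1200"}},
            {"region": {"type": "uri", "value": "http://example.org/regionB"}},
        ]},
    })
    cols, rows = sparql_json_to_table(example)
    print(cols)                           # ['region', 'sales']
    print(numeric_column(rows, "sales"))  # [1200.0]
```

Unbound variables are kept as `None` rather than dropped, so every row has the same columns; how missing values are then treated is left to the chosen algorithm.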
The Algorithm selection and configuration component coordinates the process of suggesting, selecting and configuring an analytics extraction process in a user-friendly and self-explanatory way. Selection is based on a categorization of the supported algorithms into classification, association, regression/forecasting, clustering and geospatial analysis algorithms.
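One way to picture category-based selection and configuration is a small catalog of algorithm descriptors, each carrying its category and default parameters. The sketch below is an assumption, not LinDA's data model: the `Category` values mirror the five categories named above, but the catalog entries, parameter names and defaults are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class Category(Enum):
    CLASSIFICATION = "classification"
    ASSOCIATION = "association"
    REGRESSION_FORECASTING = "regression/forecasting"
    CLUSTERING = "clustering"
    GEOSPATIAL = "geospatial analysis"


@dataclass
class AlgorithmSpec:
    name: str
    category: Category
    description: str  # short explanation shown to the user during selection
    defaults: Dict[str, Any] = field(default_factory=dict)


# Hypothetical catalog entries; names and defaults are illustrative only.
CATALOG: List[AlgorithmSpec] = [
    AlgorithmSpec("kmeans", Category.CLUSTERING,
                  "Partition the rows into k groups", {"k": 3, "max_iter": 100}),
    AlgorithmSpec("linear_regression", Category.REGRESSION_FORECASTING,
                  "Fit a straight line by least squares"),
    AlgorithmSpec("apriori", Category.ASSOCIATION,
                  "Mine frequent itemsets and association rules",
                  {"min_support": 0.1, "min_confidence": 0.9}),
]


def suggest(category: Category) -> List[AlgorithmSpec]:
    """Return the catalog entries in the requested category."""
    return [spec for spec in CATALOG if spec.category is category]


def configure(name: str, **overrides: Any) -> Dict[str, Any]:
    """Merge user overrides into an algorithm's defaults, rejecting unknown parameters."""
    spec = next((s for s in CATALOG if s.name == name), None)
    if spec is None:
        raise KeyError(f"unknown algorithm: {name}")
    unknown = set(overrides) - set(spec.defaults)
    if unknown:
        raise ValueError(f"{name} has no parameter(s): {sorted(unknown)}")
    params = dict(spec.defaults)  # copy so the catalog defaults stay untouched
    params.update(overrides)
    return {"algorithm": name, "category": spec.category.value, "params": params}


if __name__ == "__main__":
    print([s.name for s in suggest(Category.CLUSTERING)])  # ['kmeans']
    print(configure("kmeans", k=5))
```

Rejecting unknown parameter names at configuration time means a mistyped option is reported to the user before any execution starts.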
Once the algorithm has been configured, the next step is handled by the Algorithm execution component, which executes the analytics process. At the current phase, algorithms from the open-source Weka tool and from the R open-source project for statistical computing have been integrated.
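Since the actual backends are Weka and R, which cannot be reproduced in a few lines, the sketch below uses a hypothetical registry of pluggable backends, with a pure-Python ordinary least squares regression standing in for a Weka or R call. The registry, the `execute` function and the shape of the configuration dictionary are assumptions for illustration only.

```python
from typing import Any, Callable, Dict, List

# Hypothetical backend signature: (parameters, column data) -> result dictionary.
Backend = Callable[[Dict[str, Any], Dict[str, List[float]]], Dict[str, Any]]
BACKENDS: Dict[str, Backend] = {}


def register(name: str) -> Callable[[Backend], Backend]:
    """Decorator that registers a backend under an algorithm name."""
    def wrap(fn: Backend) -> Backend:
        BACKENDS[name] = fn
        return fn
    return wrap


@register("linear_regression")
def ordinary_least_squares(params: Dict[str, Any],
                           data: Dict[str, List[float]]) -> Dict[str, Any]:
    """Stand-in for an external Weka/R call: fit y = intercept + slope * x."""
    x, y = data["x"], data["y"]
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired (x, y) observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {"intercept": mean_y - slope * mean_x, "slope": slope}


def execute(config: Dict[str, Any], data: Dict[str, List[float]]) -> Dict[str, Any]:
    """Dispatch a configured analysis to its backend and keep the configuration with the result."""
    backend = BACKENDS.get(config["algorithm"])
    if backend is None:
        raise NotImplementedError(f"no backend registered for {config['algorithm']!r}")
    return {"config": config, "result": backend(config.get("params", {}), data)}


if __name__ == "__main__":
    run = execute({"algorithm": "linear_regression", "params": {}},
                  {"x": [1.0, 2.0, 3.0, 4.0], "y": [3.0, 5.0, 7.0, 9.0]})
    print(run["result"])  # {'intercept': 1.0, 'slope': 2.0}
```

Keeping the configuration next to the result makes it straightforward to record, later on, which algorithm and parameters produced a given output.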
The produced business analytics are then handled by the Linked data analytics management component. This component is responsible for providing the analysis output in a format that is meaningful to the end user and easily exploitable for further use. It also interlinks the input and output datasets, based on the defined interlinking policy.
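The text does not specify the interlinking policy or vocabulary. Assuming, purely for illustration, that provenance is recorded with the W3C PROV-O vocabulary, an analysis run could be linked to its input datasets and its output as N-Triples as sketched below; the function names and the `example.org` IRIs are hypothetical.

```python
from typing import List, Sequence, Tuple

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value: str) -> str:
    """Serialize an IRI term for N-Triples (assumes the IRI is already valid)."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Serialize a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def provenance_ntriples(run: str, output: str, inputs: Sequence[str], label: str) -> str:
    """Link an analysis output to the run that produced it and to its input datasets."""
    triples: List[Tuple[str, str, str]] = [
        (iri(run), iri(RDF_TYPE), iri(PROV + "Activity")),
        (iri(run), iri(RDFS_LABEL), literal(label)),
        (iri(output), iri(RDF_TYPE), iri(PROV + "Entity")),
        (iri(output), iri(PROV + "wasGeneratedBy"), iri(run)),
    ]
    for dataset in inputs:
        triples.append((iri(run), iri(PROV + "used"), iri(dataset)))
        triples.append((iri(output), iri(PROV + "wasDerivedFrom"), iri(dataset)))
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples) + "\n"


if __name__ == "__main__":
    print(provenance_ntriples(
        "http://example.org/run/1",
        "http://example.org/output/1",
        ["http://example.org/dataset/a", "http://example.org/dataset/b"],
        "linear regression run",
    ))
```

Because the output dataset points back to every input via `prov:wasDerivedFrom`, an analyst who discovers the result can follow those links to the exact datasets it was computed from.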
Author: Anastasios Zafeiropoulos.