A more technical perspective on the LinDA toolset…
Due to the proliferation of available public sector data sources and initiatives, interlinking and combining such datasets has become a topic of major interest among SME information managers. While more agile options for data integration are in demand, conventional data integration methods are not feasible given the massive volume of available data. Moreover, much of this data is unstructured, which either makes it inaccessible to SMEs or makes the cost of using it prohibitive. This calls for tools that support users in re-using such data while hiding the underlying complexity and allowing existing software applications to be re-used.
The LinDA Publication and Consumption Framework aims to assist SMEs, data publishers and data consumers in analyzing and interlinking public sector information with enterprise data. The framework's main approach to encouraging data re-use is converting RDF into a number of other formats, including but not limited to CSV, XLS, XML, JSON and RDB. This conversion will allow users to import open RDF data into the format used by their existing systems. The data can then be interlinked with the organization's own data, potentially revealing patterns and even supporting predictions.
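To make the RDF-to-CSV direction concrete, the sketch below flattens the result of a SPARQL SELECT query, serialised in the W3C SPARQL 1.1 Query Results JSON Format, into CSV using only the Python standard library. It is a minimal illustration that assumes the data arrives as SPARQL JSON results; it is not LinDA's actual converter, and the function name `sparql_json_to_csv` and the sample values are hypothetical.

```python
import csv
import io
import json


def sparql_json_to_csv(results_json: str) -> str:
    """Flatten a SPARQL 1.1 JSON results document (SELECT query) into CSV text.

    Hypothetical helper for illustration; not part of the LinDA codebase.
    """
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]  # projected variable names, in order
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(variables)  # header row: one column per SPARQL variable
    for binding in doc["results"]["bindings"]:
        # Unbound variables are simply absent from a binding; emit an empty cell.
        # Only the lexical "value" is kept; datatype/xml:lang information is dropped.
        writer.writerow([binding.get(v, {}).get("value", "") for v in variables])
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder values, not real NSO statistics.
    sample = json.dumps({
        "head": {"vars": ["nationality", "visitors"]},
        "results": {"bindings": [
            {"nationality": {"type": "literal", "value": "Spain"},
             "visitors": {"type": "literal", "value": "1200"}},
            {"nationality": {"type": "literal", "value": "Italy"}},
        ]},
    })
    print(sparql_json_to_csv(sample), end="")
```

Each SPARQL variable becomes a column and each solution a row; variables left unbound (for example by an `OPTIONAL` pattern) become empty cells. Datatypes and language tags are dropped here, which a real converter would probably need to preserve or map to column types.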
Through a user interface, users of the framework can select which data to access from the available open datasets. They can do so either through a SPARQL endpoint or, if they are not familiar with SPARQL, through a Query Builder, whose drag-and-drop and auto-complete features let them easily build the desired query. The user can then send RESTful calls to an API server, passing the SPARQL query that selects the data to be converted together with the target conversion format. The generated results can then be downloaded to the user's desktop.
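Building a query and handing it to the API server together with a target format could look roughly like the following Python sketch. The API itself is not documented here, so the base URL `API_BASE`, the `query` and `format` parameter names and both helper functions are hypothetical placeholders; `build_select` only mimics the kind of simple query a Query Builder might emit once a user has picked a class and some of its properties. The request is prepared but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# HYPOTHETICAL: endpoint path and parameter names are placeholders,
# not the documented LinDA API.
API_BASE = "http://localhost:8000/api/convert"

# Only the target formats named in the text; the framework may support more.
SUPPORTED_FORMATS = {"csv", "xls", "xml", "json", "rdb"}


def build_select(cls: str, properties: dict[str, str], limit: int = 100) -> str:
    """Assemble a simple SPARQL SELECT, roughly what a query builder might emit
    after the user picks a class and some of its properties (name -> property URI)."""
    variables = " ".join(f"?{name}" for name in properties)
    patterns = " ".join(f"?s <{uri}> ?{name} ." for name, uri in properties.items())
    return f"SELECT {variables} WHERE {{ ?s a <{cls}> . {patterns} }} LIMIT {limit}"


def build_conversion_request(query: str, fmt: str) -> Request:
    """Prepare (but do not send) a GET request asking the API to convert the query results."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # urlencode percent-escapes the SPARQL text (braces, angle brackets, '?').
    url = f"{API_BASE}?{urlencode({'query': query, 'format': fmt})}"
    return Request(url, method="GET")


if __name__ == "__main__":
    q = build_select("http://example.org/Visit",
                     {"nationality": "http://example.org/nationality"})
    req = build_conversion_request(q, "CSV")
    print(req.get_method(), req.full_url)
    # Sending it would be urllib.request.urlopen(req), against a real server.
```

A GET request keeps the example simple; for long SPARQL queries a POST body would likely be more robust, since URL length limits vary between servers.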
Let us take German Tours as a use case. German Tours is an SME that offers various tours for tourists visiting Germany, in several languages including English, Italian and German. Alice, the manager, thinks the company's tours need to be updated to reflect current tourist trends. She believes this requires relevant data beyond what the SME already holds, which includes the tour type, the language used and the number of tourist bookings.
She therefore starts looking for relevant information on the web. She discovers the statistical data published by the German NSO (national statistical office), which contains tourism statistics such as the most common nationalities of tourists visiting Germany and the time of year in which they visited. This data is ideal for her purpose, as it would enable her to compare the SME's tour bookings with the tourists who actually visited Germany. However, the available data is in RDF, while her data is in RDB format. Alice therefore uses the LinDA Publication and Consumption Framework to import the required data into the SME's system. She can easily access the German NSO data through LinDA's easy-to-use, intuitive interface, which allows her to create a SPARQL query for the desired data. Alice then converts the required data into RDB format and downloads it into the SME's system. By linking the SME's data with the German NSO open data, Alice realises that a large number of Spanish tourists visited Germany. Since German Tours offers no tours in Spanish, she has discovered a niche in the market that the company does not yet cater for. Through the LinDA Publication and Consumption Framework, Alice thus has the opportunity to enhance the SME's services to better reflect current tourist demand.
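Once the converted NSO data and the SME's own bookings live in the same relational database, Alice's final linking step is essentially a join. The sketch below models the RDB side with Python's built-in `sqlite3`, assuming hypothetical tables `tours`, `nso_visitors` and `nationality_language`. The last of these is an assumed lookup that maps each nationality to the tour language it would most likely want, which neither dataset provides by itself. All numbers are placeholders, not real statistics. The query returns the nationalities whose language has no tour on offer, which is how a gap such as the missing Spanish tours would surface.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with HYPOTHETICAL tables and placeholder data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tours (tour_type TEXT, language TEXT, bookings INTEGER);
        CREATE TABLE nso_visitors (nationality TEXT, visitors INTEGER);
        CREATE TABLE nationality_language (nationality TEXT, language TEXT);
    """)
    # The SME's own data: tour type, language used, number of bookings.
    conn.executemany("INSERT INTO tours VALUES (?, ?, ?)", [
        ("city", "English", 40), ("castle", "Italian", 15), ("city", "German", 30)])
    # Stand-in for the NSO data after conversion to RDB (placeholder numbers).
    conn.executemany("INSERT INTO nso_visitors VALUES (?, ?)", [
        ("Spain", 900), ("Italy", 700), ("United Kingdom", 800)])
    # Assumed mapping from nationality to preferred tour language.
    conn.executemany("INSERT INTO nationality_language VALUES (?, ?)", [
        ("Spain", "Spanish"), ("Italy", "Italian"), ("United Kingdom", "English")])
    return conn


def find_uncovered_markets(conn: sqlite3.Connection, min_visitors: int) -> list[tuple[str, int]]:
    """Return (nationality, visitors) for groups whose language has no tour on offer."""
    sql = """
        SELECT v.nationality, v.visitors
        FROM nso_visitors AS v
        JOIN nationality_language AS nl ON nl.nationality = v.nationality
        -- LEFT JOIN keeps nationalities with no matching tour; t.language is then NULL.
        LEFT JOIN tours AS t ON t.language = nl.language
        WHERE t.language IS NULL AND v.visitors >= ?
        ORDER BY v.visitors DESC
    """
    return conn.execute(sql, (min_visitors,)).fetchall()


if __name__ == "__main__":
    print(find_uncovered_markets(demo_connection(), min_visitors=100))
```

The `min_visitors` threshold is an illustrative knob for ignoring small groups; in practice the comparison might also take the time of year of visits into account, which the NSO data is described as containing.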