As public sector information sources and initiatives have proliferated in recent years, linking and combining datasets has become one of the major topics for information managers of SMEs. Given the massive growth of available data, conventional methods of data integration are bound to fail, while the complexity of processes within organisations calls for more agile options to link and mash up data in a qualified way. The availability and matching of diverse data sources are becoming more crucial, and the need for standards-based tools for information management in SMEs is therefore growing.
In most cases, public sector information is not published in a machine-processable format that would allow data re-users from the private sector to automate the combination of public data with proprietary data sources. To make matters worse, many public organisations provide open data as unstructured documents or reports, which makes the effort and cost of linking and using this information prohibitive for SMEs. The vast majority of public data sources do not provide their datasets in a standard format, such as RDF, that would support true semantic enrichment and interlinking of data. The very few public data initiatives that do follow the Linked Data paradigm mostly focus only on the metadata for the discovery layer of the datasets, leaving the significant value of analysing and linking the actual information contained in the data itself largely unexploited.
In recent years, significant research activities have emerged that focus on industrially relevant scenarios, such as the LATC and LOD2 projects, which aim to contribute high-quality interlinked versions of public semantic web datasets and to promote their use in new cross-domain applications by developers across the globe. Among these efforts and emerging tools, there is considerable support for Linked Data in areas such as storage (Virtuoso, MonetDB), linkage (Silk Framework), discovery and publishing (SPARQL standard) and even visualisation of RDF graphs (Gephi, Cytoscape), but there are very limited options for renovating existing data into Linked Data. Currently available solutions either support specific structured data formats, such as spreadsheets (XLWrap) and relational databases (D2R, Triplify), or provide RDF representations of data for specific sources (DBpedia). Lastly, most existing work on exploring and visualising RDF is limited to specific domains and datatypes and is aimed mainly at academic researchers who are familiar with semantic web technologies.
In this context, a unified solution for transforming and renovating existing data sources, regardless of the original data format, would greatly enhance the ability of public organisations to provide usable, machine-processable Linked Data, while offering SMEs the opportunity to combine and link existing public sector information with privately owned data in the most resourceful and cost-effective manner. To this end, however, there is also a strong need to support consumers unfamiliar with the Linked Data paradigm through interfaces that hide the underlying complexity and allow the re-use of existing software applications and database management systems.
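To make the notion of renovating existing data into Linked Data more concrete, the following is a minimal sketch of one possible transformation step: turning rows of a tabular source into RDF triples serialised as N-Triples. It is an illustration under stated assumptions, not the solution discussed above. The `http://example.org/` namespace, the function names `csv_to_ntriples` and `escape_literal`, and the sample records are all hypothetical, and every value is emitted as a plain string literal without datatypes or links to external vocabularies.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace used only for this illustration.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str, key_column: str, base: str = BASE) -> list:
    """Map each CSV row to one subject and each non-empty cell to one triple.

    Assumption: the value in `key_column` uniquely identifies a row.
    """
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        subject = f"<{base}resource/{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            # Skip the key itself, surplus unnamed fields and empty cells.
            if column is None or column == key_column or not value:
                continue
            predicate = f"<{base}property/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    # Invented sample data standing in for a public sector dataset.
    sample = "id,name,city\n1,Acme Ltd,Athens\n2,Beta GmbH,\n"
    for line in csv_to_ntriples(sample, key_column="id"):
        print(line)
```

Each row becomes a resource with its own URI, which is what allows a record from a public dataset to be referenced from, and linked to, privately owned data. A realistic pipeline would additionally map columns to shared vocabularies and handle non-tabular sources, neither of which this sketch attempts.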