LinDA Publication and Consumption Framework

Github Page

http://github.com/LinDA-tools/RDF2Any
https://github.com/LinDA-tools/QueryBuilder

Scope

The RDF2Any converter and the QueryBuilder aim to enable users to explore open RDF data and reuse it easily. The RDF2Any converter provides REST APIs that will eventually allow any developer to convert/serialize RDF datasets to the format they desire. The conversion still requires users to know SPARQL in order to retrieve the dataset. For users who do not know SPARQL, a QueryBuilder has been built to assist them in constructing a query. The QueryBuilder provides a robust GUI in which users can explore the classes in a dataset, get an idea of the data structure, and form a query using various filter options. The result can be downloaded in the chosen output format. The QueryBuilder also offers a storage facility that serialises transformations as ontology instances. These instances are queryable, so they can easily be searched and reused in the future.

Technical Components

RDF2Any

  • Java - This is the main language used for all the programming
  • Glassfish - This provides the API server
  • Jena - This is a Java library which helps in connecting to RDF datasets. This also helps in iterating through the retrieved RDF Result Sets.
  • Lucene - a Java-based library which provides the indexing solution for retrieving properties and statistics of a class.


QueryBuilder

  • Ruby on Rails - This is the framework for the GUI tool. It uses Ruby as the main programming language.
  • JavaScript - Client (browser) side programming language.
  • CoffeeScript - A language with Ruby-like syntax that compiles to JavaScript, used for easier coding in JavaScript.
  • jQuery - JavaScript library.
  • Bootstrap - CSS library.
  • SQLite - Database to store query templates.
  • WEBrick - Server used for the GUI tool.


Features

The project can be divided into two main parts, namely the RDF2Any conversion and the QueryBuilder.


RDF2Any Conversion

The RDF2Any conversion module converts RDF to various serialization outputs: RDB (relational database script), JSON, CSV and a generic, user-configured conversion. All conversions are performed through standard RESTful APIs. More details on the API definitions are available in the complete API documentation.

  • RDB Conversion:

In this conversion, the queried RDF dataset is converted to an RDB upload script. The script is written in SQL and targets PostgreSQL. The relations of the resultset are mapped and appropriate tables are created so that the output data is normalized. This RDB conversion is more powerful than existing approaches because it also normalizes properties that occur multiple times for the same object. The RDB conversion also supports the language of language-tagged literals.
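The normalization idea can be illustrated with a minimal Python sketch. It is not the RDF2Any implementation: the function names `to_sql` and `sql_literal`, the input layout, and the table layout (one main table per class, one extra table per multi-valued property) are assumptions made for illustration only.

```python
def sql_literal(value):
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def to_sql(table, objects, multi_valued):
    """Build a PostgreSQL-style script (hypothetical layout).

    objects: list of dicts {"uri": str, property_name: [values]}
    multi_valued: property names that are moved to their own table.
    Table and column names are used as-is; a real converter would sanitize them.
    """
    single = sorted({p for o in objects for p in o if p != "uri"} - set(multi_valued))
    lines = [
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, uri TEXT"
        + "".join(f", {p} TEXT" for p in single) + ");"
    ]
    # One child table per multi-valued property, linked by a foreign key.
    for prop in multi_valued:
        lines.append(
            f"CREATE TABLE {table}_{prop} (id SERIAL PRIMARY KEY, "
            f"{table}_id INTEGER REFERENCES {table}(id), value TEXT);"
        )
    for obj_id, obj in enumerate(objects, start=1):
        cells = [str(obj_id), sql_literal(obj["uri"])]
        for p in single:
            values = obj.get(p)
            cells.append(sql_literal(values[0]) if values else "NULL")
        lines.append(f"INSERT INTO {table} VALUES ({', '.join(cells)});")
        for prop in multi_valued:
            for value in obj.get(prop, []):
                lines.append(
                    f"INSERT INTO {table}_{prop} ({table}_id, value) "
                    f"VALUES ({obj_id}, {sql_literal(value)});"
                )
    return "\n".join(lines)


people = [{"uri": "http://example.org/p/1", "name": ["Ada"],
           "email": ["a@example.org", "b@example.org"]}]
script = to_sql("person", people, ["email"])
print(script)
```

In this sketch the two `email` values end up as two rows in a separate `person_email` table instead of being squeezed into a single column of `person`.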

  • CSV Conversion:

The queried RDF resultset is converted to a comma-separated output file. The properties of the class are taken as columns.
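As a rough illustration of "properties as columns", the following Python sketch writes one CSV row per object. The function name `resultset_to_csv`, the input layout, and the choice to join multiple values with "; " are assumptions, not the behaviour of the RDF2Any API.

```python
import csv
import io


def resultset_to_csv(rows, properties):
    """rows: list of dicts {"uri": str, property_name: [values]}.

    Each property becomes one column; multiple values of the same
    property are joined with "; " (an assumption of this sketch).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["uri"] + properties)  # header row
    for row in rows:
        cells = [row["uri"]]
        for prop in properties:
            cells.append("; ".join(row.get(prop, [])))
        writer.writerow(cells)
    return buffer.getvalue()


cities = [
    {"uri": "http://example.org/city/1", "name": ["Bonn"], "label": ["B"]},
    {"uri": "http://example.org/city/2", "name": ["Koeln", "Cologne"]},
]
csv_text = resultset_to_csv(cities, ["name", "label"])
print(csv_text)
```

A missing property simply yields an empty cell, so every row has the same number of columns.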

  • JSON Conversion:

The queried RDF resultset is converted to JSON. Two JSON formats are supported:

      • Conversion suggested by Virtuoso. This output is the recommended version, as it is more reusable and reliable. Our QueryBuilder uses this conversion for viewing data and for other APIs.
      • Conversion suggested by Sesame. Due to the implementation of SOAP technologies, which are less supported than REST, this output is less reliable than the previous one.

  • Configured conversion:

The queried RDF resultset can be converted to an output format specified by the user. Three programmable items are provided to achieve this, namely:

  • print : This simply prints the value of a variable. The variable names URI and NAME are reserved for the URI and the name of the object.

$[=variable_name]

  • if condition : This checks whether the object has a value for a particular property.

$[if property_variable_name] some body here $[end]

  • for each loop : This loops over the values of a particular property.

$[for property : property_variable_name] some body here $[end]


Output formats like XML, YAML and even JSON can be achieved using this.
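To make the three items above concrete, here is a small Python sketch of how such a template could be evaluated. It is an illustration under assumptions, not the RDF2Any implementation: the functions `parse` and `render`, the input dictionary layout, and the handling of missing or multi-valued variables in a print item are all hypothetical.

```python
import re

TAG = re.compile(r"\$\[(.*?)\]")  # matches $[ ... ] items


def parse(template):
    """Turn the template into a nested list of nodes."""
    tokens = TAG.split(template)  # even index: plain text, odd index: tag body
    pos = 0

    def block():
        nonlocal pos
        nodes = []
        while pos < len(tokens):
            token, is_tag = tokens[pos], pos % 2 == 1
            pos += 1
            if not is_tag:
                nodes.append(("text", token))
            elif token.strip() == "end":
                return nodes  # closes the innermost if/for
            elif token.startswith("="):
                nodes.append(("print", token[1:].strip()))
            elif token.startswith("if "):
                nodes.append(("if", token[3:].strip(), block()))
            elif token.startswith("for "):
                loop_var, prop = (p.strip() for p in token[4:].split(":", 1))
                nodes.append(("for", loop_var, prop, block()))
            else:
                raise ValueError(f"unknown item: $[{token}]")
        return nodes

    return block()


def render(nodes, scope):
    """Evaluate parsed nodes against a dict of variables."""
    out = []
    for node in nodes:
        if node[0] == "text":
            out.append(node[1])
        elif node[0] == "print":
            value = scope.get(node[1], "")  # assumption: missing prints nothing
            out.append(", ".join(value) if isinstance(value, list) else str(value))
        elif node[0] == "if":
            if scope.get(node[1]):  # true when the property has a value
                out.append(render(node[2], scope))
        elif node[0] == "for":
            _, loop_var, prop, body = node
            for value in scope.get(prop, []):
                out.append(render(body, {**scope, loop_var: value}))
    return "".join(out)


person = {"URI": "http://example.org/p/1", "NAME": "Ada",
          "email": ["a@example.org", "b@example.org"]}
template = ('<person uri="$[=URI]"><name>$[=NAME]</name>'
            '$[if email]$[for e : email]<email>$[=e]</email>$[end]$[end]'
            '$[if phone]<phone/>$[end]</person>')
xml = render(parse(template), person)
print(xml)
```

With this input the sketch emits one `<email>` element per value and omits the `<phone/>` element, because the object has no `phone` property.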


QueryBuilder

The QueryBuilder consists of two parts: the GUI tool, and the APIs that the GUI tool consumes. Any developer can build their own tool using these APIs.


QueryBuilder GUI tool

The QueryBuilder GUI tool helps users construct a SPARQL query without requiring expertise in SPARQL. This is a three-step process:

  • In the first step the dataset is selected from a dropdown list.
  • In the second step the class is selected from a search box.
  • Once the class is selected, the property histogram is displayed. The property histogram lists the properties of the class with their counts, range classes, etc. In this step the user can add filters to the properties (both object and data type properties) and can check which properties they want in the final output.

Using this tool, the user gets an idea of the data structure of the class and a preview of the objects of the class. The GUI tool also allows the user to save all the parameters used to download the result set as an ontology instance, so that the transformation can later be reused and/or edited.
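The outcome of the three steps is a SPARQL query over one class, its selected properties and their filters. The Python sketch below shows what assembling such a query could look like. The function `build_query`, its parameters, and the example URIs are hypothetical and are not taken from the QueryBuilder code.

```python
def build_query(class_uri, properties, filters=None, limit=100):
    """Assemble a SPARQL SELECT query for one class.

    properties: {variable_name: property_uri} for the checked properties
    filters: {variable_name: comparison}, e.g. {"population": "> 1000000"}
    """
    filters = filters or {}
    variables = " ".join(f"?{name}" for name in properties)
    lines = [
        f"SELECT ?object {variables}".rstrip(),
        "WHERE {",
        f"  ?object a <{class_uri}> .",  # restrict to the selected class
    ]
    for name, uri in properties.items():
        lines.append(f"  ?object <{uri}> ?{name} .")
    for name, comparison in filters.items():
        lines.append(f"  FILTER(?{name} {comparison})")
    lines += ["}", f"LIMIT {limit}"]
    return "\n".join(lines)


query = build_query(
    "http://example.org/ontology/City",
    {"population": "http://example.org/ontology/population"},
    {"population": "> 1000000"},
    limit=10,
)
print(query)
```

The filter value is passed through as typed, which mirrors a free-text filter box: the user still has to write a valid comparison.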

QueryBuilder API

The QueryBuilder API exposes QueryBuilder functionality such as class search, object search and class properties. More detailed definitions of these APIs can be found in the API documentation.


QueryBuilder Functionality

The following list of activities and functionalities describes the current features of the LinDA QueryBuilder and the RDF2Any conversion:

  1. Dataset selection
  2. Class selection with autocomplete function
  3. More information shown about selected class: sample instances and number of class instances
  4. Properties available for selected class
  5. Selection of specific properties and generated query
  6. Execute SPARQL query to get sample results
  7. Formats for downloading result set
  8. Option to upload a conversion template


Limitations

  1. At present, only one class can be queried using the QueryBuilder, although multiple subclasses can be queried in the same query. More research is needed on how multiple classes can be queried and shown in the same view while forming the query, since in most cases the classes will be unrelated and will present a lot of cluttered and unimportant information to the user.
  2. Multiple datasets cannot be queried. Since everything is linked open data, a class URI can be a node in two different datasets. Efficient algorithms for querying such cases require more research, so this is currently not tackled.
  3. The configured conversion still lacks programmable items. If items such as user-defined variables, basic arithmetic operations, etc. were introduced, almost all serialization techniques (apart from RDB) could be achieved with it.


Future work

  1. Provide a faceted browser for exploring the transformation instances already held in the data store.
  2. Add more programmable items to the configured conversion.
  3. The GUI for data type property filters in the QueryBuilder tool is a simple textbox, which requires the user to know the filter syntax. To aid users further, the textbox will be replaced with simple UI elements that let the user select the required inequality and just type in the required value.