LinDA Publication and Consumption Framework

From LinDA Wiki

The RDF2Any converter and the QueryBuilder aim to enable users to explore open RDF data and reuse it easily. The RDF2Any converter provides REST APIs which will eventually allow any developer to convert (serialize) RDF datasets into the format they desire. The conversion still requires users to know SPARQL in order to retrieve the dataset. For users without SPARQL knowledge, the QueryBuilder has been built to assist them in constructing a query. The QueryBuilder provides a robust GUI in which users can explore the classes in a dataset, get an idea of its data structure, and form a query using various filter options. The results can then be downloaded in the chosen output format. The QueryBuilder also offers a facility for storing serialization transformations as ontology instances. These instances are queryable, so they can easily be searched and reused in the future.

GitHub Pages

  • http://github.com/LinDA-tools/RDF2Any
  • https://github.com/LinDA-tools/QueryBuilder

Features RDF2Any

  • Transform the result of a SPARQL query into a relational database (SQL) script
  • Transform the result of a SPARQL query into CSV
  • Transform the result of a SPARQL query into JSON
  • Transform the result of a SPARQL query into PDF
  • Transform the result of a SPARQL query into a customizable output by providing a template

Features QueryBuilder

  • Wizard-style creation of a SPARQL query
  • Users can view the classes and subclasses within a single dataset, along with example instances of those classes/subclasses
  • Extended class search that also matches URIs
  • For DBpedia classes, links to DBpedia for more information
  • Filter the results for a selected class based on Object Properties and/or Data Type Properties
  • EQUALS and NOT EQUAL operands for filtering Object Properties
  • <, >, = operands for filtering Data Type Properties
  • Preview available data/resources when creating filters
  • Preview and modify the equivalent SPARQL query
  • Show the checked properties in the equivalent query and preview
  • Save / Load Query
  • Download Results of Query
  • Print Queries
  • Display query execution time
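As an illustration of the two operand families listed above, the following Python sketch shows how a single filter could be turned into a SPARQL `FILTER` clause. The function and dictionary names are hypothetical; object-property values are assumed to be resource URIs and data-type values to be numbers. The query text the QueryBuilder actually generates may differ.

```python
# Hypothetical sketch: turn one QueryBuilder-style filter into a SPARQL FILTER clause.
OBJECT_OPERANDS = {"EQUALS": "=", "NOT EQUAL": "!="}  # object properties: compare URIs
DATATYPE_OPERANDS = {"<", ">", "="}                    # data type properties: compare literals


def filter_clause(variable: str, operand: str, value: str, is_object: bool) -> str:
    """Return a FILTER clause constraining ?variable.

    Object-property values are assumed to be resource URIs; data-type
    values are assumed to be numbers in this sketch.
    """
    if is_object:
        if operand not in OBJECT_OPERANDS:
            raise ValueError(f"unsupported object-property operand: {operand}")
        return f"FILTER(?{variable} {OBJECT_OPERANDS[operand]} <{value}>)"
    if operand not in DATATYPE_OPERANDS:
        raise ValueError(f"unsupported data-type operand: {operand}")
    float(value)  # raises ValueError for non-numeric input
    return f"FILTER(?{variable} {operand} {value})"
```

For example, `filter_clause("populationDensity", ">", "500", False)` returns `FILTER(?populationDensity > 500)`.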

Example Tutorial

Example 1: Exploring DBpedia

  1. From the LinDA main page click "Queries" and then "Query Builder".
  2. In Step 1 in the Query Builder, select the DBpedia dataset.
  3. In Step 2, click the “Show all Classes” button to view all classes in DBpedia. Since DBpedia is quite big, it might take some time to load.
  4. Click on Artist (http://dbpedia.org/ontology/Artist).
  5. Click on an instance of Artist, e.g. “Aaron Lines”, to view more details. Close the popup window.
  6. Click on “More details on Artist” button to view more details on the Artist class. Close the popup window.
  7. Click on the “Plus” button by the Artist instances to view the subclasses of Artist.
  8. Click on the “World” button by the Actor class.
  9. In Step 3 you can view the properties for the class Actor.
  10. Click the check box beside Object Properties to deselect all properties. Repeat for the Data Type Properties. Tick the check boxes beside the “Birth Place” and “Nationality” properties in the Object Properties box. These properties will be included in the results.
  11. Click on the Nationality property. Type Italy and select it from the drop down suggestion. Click Done. This will filter the results to only include Actors whose nationality is Italian.
  12. Click on the “Yes” button beside Equivalent SPARQL Query. This will show the SPARQL query corresponding to the property selection.
  13. Press the green “Play” button to preview a sample of the results.
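The query produced by this walkthrough can be approximated in code. The following Python sketch assumes DBpedia ontology URIs for `Actor`, `birthPlace` and `nationality` and the resource `http://dbpedia.org/resource/Italy`; the function `build_query` is hypothetical, and the SPARQL shown by the QueryBuilder may be structured differently.

```python
# Hypothetical sketch of the "equivalent SPARQL query" built in Example 1.
# The DBpedia URIs below are assumptions; the QueryBuilder's own output may differ.
DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"


def build_query(class_uri, selected, filters=None, limit=None):
    """Select instances of class_uri plus the values of the selected properties.

    selected: DBpedia ontology property local names shown in the results.
    filters:  {property local name: resource URI} equality filters.
    """
    filters = filters or {}
    head = "SELECT DISTINCT ?s" + "".join(f" ?{p}" for p in selected)
    body = [f"  ?s a <{class_uri}> ."]
    for prop in selected:
        if prop in filters:
            # A filtered property is required and must equal the chosen resource.
            body.append(f"  ?s <{DBO}{prop}> ?{prop} . FILTER(?{prop} = <{filters[prop]}>)")
        else:
            # An unfiltered property is OPTIONAL, so instances lacking it are kept.
            body.append(f"  OPTIONAL {{ ?s <{DBO}{prop}> ?{prop} . }}")
    for prop, uri in filters.items():
        if prop not in selected:
            # Filters on properties that are not displayed become fixed triple patterns.
            body.append(f"  ?s <{DBO}{prop}> <{uri}> .")
    query = head + " WHERE {\n" + "\n".join(body) + "\n}"
    if limit is not None:
        query += f"\nLIMIT {limit}"
    return query


print(build_query(DBO + "Actor", ["birthPlace", "nationality"],
                  {"nationality": DBR + "Italy"}, limit=10))
```

Wrapping unfiltered properties in `OPTIONAL` keeps actors that have no recorded birth place; whether the QueryBuilder makes the same choice is not stated on this page.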

Example 2: Creating a Query and Downloading in CSV

  1. From the LinDA main page click "Queries" and then "Query Builder".
  2. In Step 1 in the Query Builder, select the DBpedia dataset.
  3. In Step 2, type “populated”. The class Populated Place will be suggested in a drop down list. Select Populated Place.
  4. In Step 3 you can view the properties for the class Populated Place. Click the check box beside Object Properties to deselect all properties. Repeat for the Data Type Properties. Tick the check box beside “Official Language” in the Object Properties box, and the check boxes beside “Area Total” and “Population Density” in the Data Type Properties box. These properties will be included in the results.
  5. Click on the Official Language property. Type Spanish and select “Spanish Language” from the drop down suggestion. Click Done. This will filter the results to only include Populated Places whose Official Language is Spanish.
  6. Click on the “Yes” button beside Equivalent SPARQL Query. This will show the SPARQL query corresponding to the property selection.
  7. Press the green “Play” button to preview a sample of the results.
  8. Click on the red “Download” button. Select CSV. Save the results on your computer.
  9. Open the saved CSV document with Microsoft Excel.
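As an alternative to Excel, the downloaded file can also be processed programmatically. A minimal sketch using Python's standard `csv` module; the file name `populated_places.csv` and the column name `populationDensity` used below are hypothetical, so check the header row of your actual download. UTF-8 encoding is also an assumption.

```python
import csv


def load_rows(path):
    """Read a downloaded CSV file into a list of dicts keyed by the header row."""
    # newline="" is recommended by the csv module documentation when opening files.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def mean_of_column(rows, column):
    """Average the numeric cells of one column, skipping empty or non-numeric cells."""
    values = []
    for row in rows:
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            continue
    return sum(values) / len(values) if values else None
```

Usage: `mean_of_column(load_rows("populated_places.csv"), "populationDensity")`.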

Example 3: Creating a Query and Downloading in JSON

  1. From the LinDA main page click "Queries" and then "Query Builder".
  2. In Step 1 in the Query Builder, select the DBpedia dataset.
  3. In Step 2, type “Country”. Select the class Country from the drop down list.
  4. In Step 3 you can view the properties for the class “Country”.
  5. Click on the “Population Density” property. Type “>500”. Click Done. This will filter the results to only include countries whose population density is greater than 500 inhabitants per square km.
  6. Click on the “Yes” button beside Equivalent SPARQL Query. This will show the SPARQL query corresponding to the property selection.
  7. Press the green “Play” button to preview a sample of the results.
  8. Click on the red “Download” button. Select JSON. Click on “Standard W3C Recommended”. The results will load in a new tab in your browser.
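Assuming the “Standard W3C Recommended” option produces the W3C SPARQL 1.1 Query Results JSON Format (an object with `head.vars` and `results.bindings`), the downloaded file can be flattened into simple rows with Python's standard `json` module. The function name `bindings_to_rows` is hypothetical.

```python
import json


def bindings_to_rows(document: str):
    """Flatten a SPARQL 1.1 Query Results JSON document into plain dicts.

    Returns (variables, rows); a variable left unbound in a result becomes None.
    """
    data = json.loads(document)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Each bound variable maps to an object such as {"type": "literal", "value": "..."}.
        rows.append({v: binding[v]["value"] if v in binding else None for v in variables})
    return variables, rows
```

All values arrive as strings in this format; converting typed literals such as population density into numbers is left to the caller.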


Example 4: Adding a new SPARQL endpoint

  1. From the LinDA main page click "Queries" and then "Query Builder".
  2. In Step 1 in the Query Builder, select “Add your own SPARQL endpoint” from the drop down list.
  3. Enter “http://linkedgeodata.org/sparql” and click “Add”.
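Because an endpoint added this way is a SPARQL endpoint, it can also be queried outside the QueryBuilder. A minimal sketch of a SPARQL 1.1 Protocol GET request using only Python's standard library; the helper name `sparql_request` is hypothetical, the endpoint is assumed to accept GET requests and to return JSON results, and its availability is not guaranteed.

```python
import urllib.parse
import urllib.request


def sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request for the given endpoint.

    Assumes the endpoint URL has no query string of its own.
    """
    # The protocol passes the query text in the "query" URL parameter.
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the W3C SPARQL JSON results format.
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
```

To execute it, pass the request to `urllib.request.urlopen(req)` and decode the response body.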

Technical Components

RDF2Any

  • Java - This is the main language used for all the programming
  • GlassFish - This is the application server that hosts the APIs.
  • Jena - This is a Java library which helps in connecting to RDF datasets. This also helps in iterating through the retrieved RDF Result Sets.
  • Lucene - a Java-based library which provides the indexing solution for retrieving properties and statistics of a class.


QueryBuilder

  • Ruby on Rails - This is the framework for the GUI tool. It uses Ruby as the main programming language.
  • JavaScript - Client-side (browser) programming language.
  • CoffeeScript - A language with Ruby-inspired syntax that compiles to JavaScript, for easier JavaScript coding.
  • jQuery - JavaScript library.
  • Bootstrap - CSS library.
  • SQLite - Database used to store query templates.
  • WEBrick - Web server used for the GUI tool.
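Since SQLite stores the query templates, the Save / Load Query feature can be pictured as a small table keyed by query name. The Python sketch below uses the standard `sqlite3` module; the table `saved_queries` and its columns are hypothetical and do not reflect the actual Rails schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store for saved queries; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saved_queries ("
        " name TEXT PRIMARY KEY, dataset TEXT NOT NULL, sparql TEXT NOT NULL)"
    )
    return conn


def save_query(conn, name, dataset, sparql):
    # "with conn" commits on success; INSERT OR REPLACE overwrites a query with the same name.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO saved_queries (name, dataset, sparql) VALUES (?, ?, ?)",
            (name, dataset, sparql),
        )


def load_query(conn, name):
    """Return (dataset, sparql) for a saved query, or None if it does not exist."""
    return conn.execute(
        "SELECT dataset, sparql FROM saved_queries WHERE name = ?", (name,)
    ).fetchone()
```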


Features

The project is divided into two main parts: the RDF2Any conversion and the QueryBuilder.


RDF2Any Conversion

The RDF2Any conversion module converts RDF into various serialization outputs: RDB (relational database script), JSON, CSV, and a generic, template-based (configured) conversion. All conversions are exposed through standard RESTful APIs. More details on the API definitions are available in the complete API documentation.

  • RDB Conversion:

Here, the queried RDF dataset is converted into an RDB upload script. The scripts are written in SQL and target PostgreSQL. The relations in the result set are mapped and appropriate tables are created so that the output data is normalized. This RDB conversion goes beyond existing approaches in that it also normalizes multi-valued properties, i.e. cases where the same object has several values for the same property. Our RDB conversion also handles the language tags of language-tagged literals.
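The normalization of multi-valued properties can be illustrated with a small generator of SQL statements. In this Python sketch (all names hypothetical), properties with at most one value per object become columns of the main table, each paired with a `_lang` column for the language tag, while properties with several values for the same object get their own child table. Property names are assumed to be valid SQL identifiers; this is not RDF2Any's actual algorithm.

```python
from collections import defaultdict


def quote(value):
    """Quote a value as a SQL string literal (single quotes doubled); None -> NULL."""
    return "NULL" if value is None else "'" + str(value).replace("'", "''") + "'"


def rdb_script(table, triples):
    """triples: (subject, property, value, lang) tuples for one class; lang may be None."""
    values = defaultdict(lambda: defaultdict(list))  # subject -> property -> [(value, lang)]
    for s, p, v, lang in triples:
        values[s][p].append((v, lang))
    props = sorted({p for per_s in values.values() for p in per_s})
    # A property is multi-valued if any subject has more than one value for it.
    multi = {p for p in props if any(len(per_s.get(p, [])) > 1 for per_s in values.values())}
    single = [p for p in props if p not in multi]

    cols = "".join(f", {p} TEXT, {p}_lang TEXT" for p in single)
    sql = [f"CREATE TABLE {table} (uri TEXT PRIMARY KEY{cols});"]
    for p in sorted(multi):
        sql.append(f"CREATE TABLE {table}_{p} (uri TEXT REFERENCES {table}(uri), value TEXT, lang TEXT);")
    for s in sorted(values):
        row = [quote(s)]
        for p in single:
            v, lang = values[s][p][0] if p in values[s] else (None, None)
            row += [quote(v), quote(lang)]
        sql.append(f"INSERT INTO {table} VALUES ({', '.join(row)});")
        # Each value of a multi-valued property becomes one row in its child table.
        for p in sorted(multi):
            for v, lang in values[s].get(p, []):
                sql.append(f"INSERT INTO {table}_{p} VALUES ({quote(s)}, {quote(v)}, {quote(lang)});")
    return "\n".join(sql)
```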

  • CSV Conversion:

The queried RDF result set is converted into a comma-separated output file. The properties of the class are used as columns.
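A minimal Python sketch of the CSV layout described above, pivoting (subject, property, value) triples into one row per object and one column per property. Joining multiple values with `|` is an assumption made for this sketch; the page does not say how RDF2Any represents multi-valued properties in CSV.

```python
import csv
import io
from collections import defaultdict


def triples_to_csv(triples):
    """Pivot (subject, property, value) triples into CSV text.

    One row per subject, one column per property; multiple values are joined with '|'.
    """
    table = defaultdict(lambda: defaultdict(list))  # subject -> property -> [values]
    for s, p, v in triples:
        table[s][p].append(v)
    columns = sorted({p for props in table.values() for p in props})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # quotes fields containing commas, etc.
    writer.writerow(["uri"] + columns)
    for s in sorted(table):
        writer.writerow([s] + ["|".join(table[s].get(p, [])) for p in columns])
    return out.getvalue()
```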

  • JSON Conversion:

The queried RDF result set is converted to JSON. Two JSON formats are supported:

  • Conversion suggested by Virtuoso. This is the recommended output, as it is more reusable and reliable. Our QueryBuilder uses this conversion for viewing data and in other APIs.
  • Conversion suggested by Sesame. Because it relies on SOAP technologies, which are less widely supported than REST, this output is less reliable than the previous one.

  • Configured Conversion:

The queried RDF result set can be converted into a specific output format defined by the user through a template. Three programmable items are provided for this:

  • print: prints the value of a variable. The variable names URI and NAME are reserved for the URI and the name of the object.

$[=variable_name]

  • if condition: checks whether the object has a particular property.

$[if property_variable_name] some body here $[end]

  • for each loop: loops over the values of a particular property.

$[for property : property_variable_name] some body here $[end]


Using these items, output formats such as XML, YAML, and even JSON can be produced.
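To make the semantics of the three items concrete, here is a small Python interpreter for the template syntax shown above. It is a sketch under stated assumptions, not RDF2Any's implementation: the object is passed as a dictionary mapping variable names (including the reserved `URI` and `NAME`) to a value or a list of values, `$[if x]` is taken to be true when `x` is present and non-empty, and printing a list joins its values with commas. The functions `parse` and `render` are hypothetical.

```python
import re

TAG = re.compile(r"\$\[(.*?)\]")  # matches $[...] items


def parse(template):
    """Parse a template into a tree of ("text"|"print"|"if"|"for", ...) nodes."""
    root = []
    stack = [root]  # innermost open block is stack[-1]
    pos = 0
    for m in TAG.finditer(template):
        stack[-1].append(("text", template[pos:m.start()]))
        pos = m.end()
        tag = m.group(1).strip()
        if tag.startswith("="):
            stack[-1].append(("print", tag[1:].strip()))
        elif tag.startswith("if "):
            node = ("if", tag[3:].strip(), [])
            stack[-1].append(node)
            stack.append(node[2])
        elif tag.startswith("for "):
            var, sep, source = tag[4:].partition(":")
            if not sep or not var.strip() or not source.strip():
                raise ValueError(f"malformed loop: $[{tag}]")
            node = ("for", (var.strip(), source.strip()), [])
            stack[-1].append(node)
            stack.append(node[2])
        elif tag == "end":
            if len(stack) == 1:
                raise ValueError("$[end] without matching $[if] or $[for]")
            stack.pop()
        else:
            raise ValueError(f"unknown item: $[{tag}]")
    if len(stack) != 1:
        raise ValueError("missing $[end]")
    root.append(("text", template[pos:]))
    return root


def render(nodes, obj):
    """Render parsed nodes for one object (dict of variable name -> value or list)."""
    out = []
    for node in nodes:
        kind = node[0]
        if kind == "text":
            out.append(node[1])
        elif kind == "print":
            value = obj.get(node[1], "")
            out.append(", ".join(map(str, value)) if isinstance(value, list) else str(value))
        elif kind == "if":
            if obj.get(node[1]):  # property present and non-empty
                out.append(render(node[2], obj))
        elif kind == "for":
            var, source = node[1]
            values = obj.get(source, [])
            if not isinstance(values, list):
                values = [values]
            for value in values:
                # The loop variable shadows any existing variable of the same name.
                out.append(render(node[2], {**obj, var: value}))
    return "".join(out)
```

For instance, an XML fragment could be produced with a template such as `<actor uri="$[=URI]">$[for n : nationality]<nationality>$[=n]</nationality>$[end]</actor>`, where `nationality` is a hypothetical property variable.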


QueryBuilder

The QueryBuilder consists of two parts: the GUI tool, and the APIs that the GUI tool consumes. Any developer can build their own tool using these APIs.


QueryBuilder GUI tool

The QueryBuilder GUI tool helps users construct a SPARQL query without requiring SPARQL expertise. It is a three-step process:

  • In the first step, the dataset is selected from a drop-down list.
  • In the second step, the class is selected from a search box.
  • In the third step, once the class is selected, the property histogram is displayed. The property histogram lists the properties of the class together with their counts, range classes, etc. In this step, the user can add filters on the properties (both Object and Data Type Properties) and check which properties should appear in the final output.
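The property histogram of the third step can be approximated in memory. The following Python sketch counts, for a given class, how many of its instances use each property, which corresponds roughly to the SPARQL aggregate `SELECT ?p (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s a <class> ; ?p ?o } GROUP BY ?p`. The function name is hypothetical, range classes are omitted, and RDF2Any itself is described as using a Lucene index for these statistics rather than this approach.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def property_histogram(triples, class_uri):
    """Return a Counter: property -> number of class instances using it at least once."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == class_uri}
    # Count each (instance, property) pair once, like COUNT(DISTINCT ?s) per ?p.
    used = {(s, p) for s, p, o in triples if s in instances and p != RDF_TYPE}
    return Counter(p for _, p in used)
```

Calling `.most_common()` on the result lists the properties from most to least frequent.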

Using this tool, the user gets an idea of the data structure of the class and a preview of some of its objects. The GUI tool also allows the user to save the complete set of parameters used to download the result set as an ontology instance, so that the transformation can later be reused and/or edited.
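The page does not describe the ontology used for saved transformations, so the following Python sketch uses a placeholder namespace (`http://example.org/transformation#`) and hypothetical property names. It only illustrates the idea of serializing the parameters of a download as an RDF instance, here in N-Triples.

```python
# Hypothetical vocabulary; the actual LinDA transformation ontology is not described here.
EX = "http://example.org/transformation#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def ntriples_literal(text):
    """Escape a string as an N-Triples literal (backslash, quote, CR and LF escaped)."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def transformation_to_ntriples(instance_uri, dataset_uri, query, output_format):
    """Describe one saved transformation as N-Triples statements."""
    s = f"<{instance_uri}>"
    triples = [
        (s, f"<{RDF_TYPE}>", f"<{EX}Transformation>"),
        (s, f"<{EX}dataset>", f"<{dataset_uri}>"),
        (s, f"<{EX}query>", ntriples_literal(query)),
        (s, f"<{EX}outputFormat>", ntriples_literal(output_format)),
    ]
    return "".join(f"{subj} {pred} {obj} .\n" for subj, pred, obj in triples)
```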

QueryBuilder API

This part provides APIs for the QueryBuilder functionalities, such as class search, object search, and class properties. More detailed definitions of these APIs can be found in the API documentation.


Query Builder Functionality

The following list describes the current features of the LinDA QueryBuilder and the RDF2Any conversion:

  1. Dataset selection
  2. Class selection with autocomplete function
  3. Additional information about the selected class: sample instances and number of class instances
  4. Properties available for the selected class
  5. Selection of specific properties and generated query
  6. Execution of the SPARQL query to get sample results
  7. Formats for downloading the result set
  8. Option to upload a conversion template
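Item 2, class selection with autocomplete, can be illustrated with a simple prefix index. This Python sketch uses a sorted list and binary search (`bisect`) as a stand-in for the Lucene index mentioned under Technical Components; the class name `ClassIndex` is hypothetical, and a label matches when its full text or any of its words starts with the typed text.

```python
import bisect


class ClassIndex:
    """Case-insensitive prefix autocomplete over (label, uri) class entries."""

    def __init__(self, classes):
        # Index each class under its full label and under every word of the label.
        self._keys = sorted(
            (key, label, uri)
            for label, uri in classes
            for key in {label.lower(), *label.lower().split()}
        )

    def suggest(self, text, limit=10):
        prefix = text.lower()
        # Keys sharing a prefix are contiguous in sorted order, starting here.
        start = bisect.bisect_left(self._keys, (prefix,))
        results = []
        for key, label, uri in self._keys[start:]:
            if not key.startswith(prefix):
                break
            if (label, uri) not in results:
                results.append((label, uri))
                if len(results) == limit:
                    break
        return results
```

For example, `ClassIndex([("Populated Place", "http://dbpedia.org/ontology/PopulatedPlace")]).suggest("populated")` returns `[("Populated Place", "http://dbpedia.org/ontology/PopulatedPlace")]`.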

Screenshots: p1.png, p2.png, p3.png, p4.png, p5.png, p6.png, p7.png, p8.png

Limitations

  1. At present, only one class can be queried with the QueryBuilder, although multiple subclasses can be queried in the same query. Further research is needed on how multiple classes could be queried and shown in the same view while forming the query, since in most cases the classes will be unrelated and would present a lot of cluttered, unimportant information to the user.
  2. Multiple datasets cannot be queried together. Since everything is linked open data, the same class URI can appear as a node in two different datasets. Querying such cases requires further research into efficient algorithms, so it is currently not addressed.
  3. The configured conversion still lacks some programmable items. If items such as user-defined variables and basic arithmetic operations were introduced, almost all serialization formats (apart from RDB) could be produced with it.


Future work

  1. Provide a faceted browser for exploring the data store of existing transformation instances.
  2. Add more programmable items to the configured conversion.
  3. The GUI for Data Type Property filters in the QueryBuilder tool is currently a simple textbox, which requires the user to know a specific syntax (e.g. “>500”). To help users further, the textbox will be replaced with simple UI controls that let the user select the required (in)equality operator and just type in the value.
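The textbox syntax mentioned above can be pinned down with a small parser. This Python sketch assumes the input is an optional operator (`<`, `>` or `=`) followed by a number, as in “>500” from Example 3, and that `=` applies when no operator is given; the function name is hypothetical. The resulting (operator, value) pair is the kind of input that an operator drop-down plus a plain value field would provide directly.

```python
import re

# Assumed syntax: optional operator (<, >, =) followed by an integer or decimal number.
FILTER_SYNTAX = re.compile(r"^\s*([<>=]?)\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_filter(text):
    """Split textbox input such as '>500' into (operator, number)."""
    m = FILTER_SYNTAX.match(text)
    if m is None:
        raise ValueError(f"cannot parse filter: {text!r}")
    operator = m.group(1) or "="  # assume equality when no operator is typed
    return operator, float(m.group(2))
```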