Query Designer and SPARQL Editor

From LinDA Wiki
Jump to: navigation, search

- Back to LinDA Tools

Query Designer and SparQL Editor

QueryDesigner.jpg

LinDA Query Designer can be used to create simple or complex linked data queries in a drag-n-drop manner, similar to SQL Query Designers of relational database management systems.With LinDA Query Designer you can create complex queries, join multiple data sources and apply advanced filters with a few clicks. The user selects one (or more) sparql endpoints / stored rdf and the LINDA Query Designer auto-detects the available classes classes and object properties. The items are presented with pagination, and they can be filtered via the “Search terms” input box. The user selects the classes that he desires and drags them to the Query Designer Canvas. The system auto-detects the available properties of the classes and the user selects the properties that he/she wishes to include in the query. The Query Designer prompts the number of instances of each property / class as an indication for the user for the popularity of the class. For each property the user is able to add filters and order by clauses. The user can then click the run button and get the results of the query. No prior knowledge of the SPARQL language is need, although the user can see in real-time the SPARQL query that is being constructed. Moreover the user can link any number of classes together in order to create more complicated queries. The classes can reside in different SPARQL endpoints through the use of the Federated Query of SPARQL 1.1.

Github

https://github.com/LinDA-tools/LindaWorkbench/tree/master/linda/query_designer

Features - LinDA Query Designer

  • Ability to load datasets live from an endpoint or a stored in a triple store
  • With simple drag n drop functionality users can perform SPARQL queries
  • On-the-fly and ultra-fast loading of all available classes and properties found in a dataset
  • On-the-fly loading of classes linked through a predicate (e.g. drag-n-drop of dbpedia author will bring Work and Person classes linked with the Author property)
  • Display the short name of a class but also the full URI to avoid conflicts
  • Ability to see all available owl:classes and subclasses in hierarchy (right panel)
  • Ability to see the number of instances of available owl classes and subclasses. Order classes based on number of available instances
  • Display number of instances of a selected class
  • Autocomplete search on available classes,properties
  • Scalable performance and working on the background to handle loading of thousands of properties and classes
  • Pagination of classes / properties search results
  • Ability to drag-n-drop multiple classes on the designer canvas
  • Browse, find and select Properties contained within selected classes
  • Display number of instances of properties
  • Ability to show / hide a particular property
  • Ability to set a particular property as optional
  • Sort results by selecting "order by" clause of a property (ASC / DESC in case of number / date)
  • Create multiple filters per property (String, Number, Date, URL)
  • Conditional operands on multiple filters ( "AND" or "OR" operands). Ability to create a custom boolean expression
  • Autocomplete search on available URI resources when selecting a URI filter
  • Link properties belonging to different classes (add a connection between two or more classes similar to INNER JOIN of database tables)
  • Auto-generate the equivalent SPARQL Query
  • Display and modify if required the equivalent SPARQL Query
  • Run the SPARQL Query and display the results
  • Enable pagination for the results
  • Download the results using the RDF2Any (JSON, CSV, PDF, SQL script, custom output)
  • Ability to save / load a query created with the Query Designer
  • List the queries created with the Query Designer. Right-click to load them in order to run or modify them
  • Perform federated queries between classes of different endpoints (endpoints need to support SPARQL 1.1)
  • Perform UNION and MINUS queries between classes
  • Ability to set a custom pattern formula (UNION, MINUS) between classes
  • Set DISTINCT, LIMIT and OFFSET Clauses for triple patterns


Features - LinDA SPARQL Editor

  • Provides code style formatting for SPARQL queries
  • Provide intelligent code completion for creating SPARQL queries
  • Provide a list of all available SPARQL commands at the proper SPARQL query position
  • Provide a complete list of Linked Data Vocabularies after the PREFIX clause
  • At Object level, provide a complete list of available classes based on the selected PREFIX at the proper SPARQL query position
  • At Predicate level, provide a complete list of available properties based on selected Subject
  • Provide documentation for each SPARQL query that is being suggested
  • Provide vocabulary link and documentation for each suggested property or class
  • Ability to automatically suggest properties and classes discovered dynamically from both Vocabulary Repository and selected SPARQL endpoint
  • Autocomplete locally defined variables
  • Ability to save / load queries
  • Download the results using the RDF2Any (JSON, CSV, PDF, SQL script, custom output)

Example Tutorial - LinDA Query Designer

First, simple example

  1. In the LinDA Workbench, on the left-side main menu of the website click the “Queries” link.
  2. On the top right of the page click the “Query Designer” green button.
  3. In the gray toolbar on top, a select list contains the datasources connected with LinDA. Choose the IMDB data source.
  4. Next to the datasource select, a list of the item types (Classes) appears. You can use the search box to filter them.
  5. Drag and drop the "Film" button in the white canvas area bellow (workspace) to find films in IMDB.
  6. You can see a new instance named “Film” has been added to the workspace. By default it only contains the film's URI, which works like an ID. From the select in the lower part of the Film instance choose the “label” and “initial release date”. You can also type in the input to find properties faster.
  7. After adding the properties, click the “Run” button in the toolbar (the one with the green arrow). A list of IMDB films with their URIs, labels and initial release dates will appear.
  8. Uncheck the “show” checkbox of the URI property of the film and run the query again. You can also press “F9” to run the query.
  9. By default you see the first 100 results, click on the “Fetch results 101 to 200” on the top right of the results to see the next page.
  10. From the toolbar, drag and drop the “Actor” class to the workspace.
  11. In the Film instance, add the “Actor” property. Then, move the arrow over the actor property row of the Film, click the “add” link in the foreign key column, and move the mouse over the new “Actor” instance.
  12. As you move the mouse over the Actor instance, an arrow appears connecting the two instances. Click the Actor instance to make this arrow permanent and create a connection between the two instances.
  13. The arrow that was created specifies a relationship like a foreign key would in a relational database: The actor property of the Film is now a “foreign key” to the Actor instance.
  14. Add the label property of the Actor, uncheck the “hide” checkbox next to its URI and run the query again.
  15. Each line of the result now contains a film, the film’s release date and an actor that played in this film. To order the results, in the Film instance, in the order by column of the “initial release date” property, select the “DESC” option and run the query again. In the results now, the films are ordered from the most recent to the older.

Filters and Aggregates Follow this example to obtain learn how to add various filters and aggregate functions in your queries.


  1. In the LinDA Workbench, on the left-side main menu of the website click the “Queries” link.
  2. On the top right of the page click the “Query Designer” green button.
  3. In the gray toolbar on top, a select list contains the datasources connected with LinDA. Choose the Eurostat data source.
  4. Drag and drop the Countries button in the workspace. In the new instance, add the “Level of internet access” and “Geocode” properties.
  5. Click the run button or press “F9” to see the results. The first two results (http://wifo5-04.informatik.uni-mannheim.de/eurostat/resource/countries/European_Union_Eu-27_) and (http://wifo5-04.informatik.uni-mannheim.de/eurostat/resource/countries/European_Union_Eu-25_) do not represent actual countries, but different subdivisions of the European Union. In order to filter them out do the following:
    1. On the Filter column, click the edit link of the Geocode property.
    2. In the filter dialog you can specify constraints to allow or exclude results. Select the String filter type for the Geocode property to filter countries based on their code.
    3. In order to exclude all countries with a geocode starting with “eu”, follow these steps:
      1. From the list after Values, choose the option “starts with”.
      2. Type “eu” in the input on the right, and click the +Add filter button.
      3. Type “ea” in the input on the right, and click the +Add filter button.
      4. You must specify how these filters will be linked together. In the top of the Filters dialog, choose the “All filters must be false” option.
      5. Click “Save & Close” to save the new filters.
  6. After adding the necessary filters, run the query again. Only actual countries should now be contained to the result set. Our next target is to calculate the average level of internet access for these countries.
    1. First of all we have to hide all properties that are not common for the average that we'll calculate. Uncheck the “Show” checkbox of the URI and the geocode. Notice that while we're hiding the geocodes from the results they're not actually removed, so all the filters we set still apply.
    2. Right click on the “Level of internet access” property and click “Options”.
    3. From the aggregate functions list select the “Average” function and click OK.
  7. Run the query. The result is the average level of internet access for the countries defined in the Eurostat datasource.

Example Tutorial - LinDA SPARQL Editor

LinDA SPARQL Editor is and advanced, intelli-sense editor for SPARQL queries. Oriented to technical users who know at least the basics of SPARQL syntax, it helps them create syntactically correct queries faster than ever.

Example 1: Simple query construction

  1. In the LinDA workbench, click the SPARQL editor link on the left, in the lower part of the main menu.
  2. From the “Choose a data source” select, choose the “DBPEDIA” option. This step allows users to choose against which datasource their query will be executes.
  3. Click the text area below, type “pre” and press Ctrl + Space. The “PREFIX” keyword show up as an option, and you can read a short documentation of the keyword. Press “enter” or click the keyword to fill it in the editor.
  4. Press space, the type “fo” and press Ctrl + Space to look for vocabularies. The first option should be the foaf (“Friend of a friend”) vocabulary. Choose it to add the namespace prefix of the foaf vocabulary.
  5. Type “prefix dbpedia” and press Ctrl + Space to trigger autocompletion. The dbpedia ontology should be automatically added.
  6. Press enter, type “SELECT ?name ?birthdate WHERE {” and press enter again. We want to fetch people and their dates of birth from DBPedia.
  7. The Friend of a Friend vocabulary defines a “Person” class that describes people. Different properties like their name, birthday and birth place among many others can be available. Type “?person a foaf:” and press Ctrl + Space again. In the autocomplete list the class “Person” should appear. The docs show the different properties of the class, and you can click the read more link to find out more about this class in the vocabulary repository. Choose this, type “.” and press enter again.
  8. We want to find out the person’s name. Type “?person foaf:na” and trigger autocomplete. The property name should show up, choose it and type “?name”. This way you specify that the ?name variable should be connected to the ?person from the previous line with the foaf:name property.
  9. Press enter, type “?person dbpedia:birth” and trigger autocompletion. The birth date property from the DBPedia ontology should appear, choose it and type “?birthdate”.
  10. Finally, type “}” and run the query. The result should contain people found in the DBPedia linked data encyclopedia with their birthdates.

Remember: If all this seems complicated, this is because some programming knowledge is necessary to create SPARQL queries. If you’re already used to creating SQL queries, this should not be unfamiliar. However, don’t try to use the SPARQL editor if you’re not a user with some SPARQL or SQL knowledge: LinDA Query Designer & the Query Builder will guide you in exploring linked data sources.

Example 2: Datasource live exploration from the SPARQL editor

  1. In the LinDA workbench, click the SPARQL editor link on the main menu.
  2. From the “Choose a data source” select, choose the “IMDB” option.
  3. Type “select ?film ?character ?actor where {?performance a performance” and press Ctrl + space to trigger the autocomplete dialog. Since you did not specify a prefix like “foaf:” or “dbpedia:” in the previous example, the editor will try to find classes in the datasource, not the vocabulary repository.Warning:While this technique will work very well with small datasets or endpoints with a small to medium (up to a few thousand) number of classes, it is not guaranteed to work with datasets with more than 10,000 classes or datasets without support for SPARQL 1.1.
  4. From the autocomplete select with the arrows (but don’t click or press enter yet) the Performance class. As you can see there are properties like performance file, film character & performance actor. Click to add the class, type “.” and press enter.
  5. Type “?performace performance_film”, trigger autocomplete, choose the property and type “ ?film .” to connect the performance with its film name.
  6. Change line, type “?performace film_character”, trigger autocomplete and choose the property film character. Note that this will return the URL of the character, not the character’s name (the return type is the Film Character class, not a literal). Type “?film_character .” to connect the property to a variable and press enter.
  7. Now it’s time to get the actual name of the character in the film. Type “?film_character film” and trigger autocomplete. The film character name property should show up, choose it, type “ ?character .” and press enter.
  8. To find the actor of this performance, type “?performance performance_actor”, trigger autocomplete, select the performance actor property and type “ ?actor. }”.
  9. Run the query (press the green arrow saying “Run” or press F9). Each row contains a character, the film it was portrayed in and the name of the actor who performed as this character.
  10. Press Ctrl + S and type a name to save the query.

Innovations

  • By adopting the approach of SQL wizards of popular database management systems, the Query Designer lowers the learning curve of linked data querying. With simple drag n drop functionality, a user is able to perform a simple query without any previous knowledge of either linked data or SPARQL.
  • In contrast to current linked data visual, graph-based exploration tools (e.g LodLive, Facete, OntoWiki browser, etc.) the Query Designer is not limited to the unfolding of one specific class but has the ability for matching triple patterns and conjunctions between any numbers of classes.
  • The Query Designer takes a different approach from existing visual query builders for the construction of the query. Existing visual query builders prompt users to put nodes and links for the construction of a query (e.g Virtuoso iSPARQL). In this context, visual query builders are considered more as a visual aid for constructing the query for experts rather than a query tool for non-linked-data-savvy users. The basic usage workflow of the Query Designer on the other hand is directly targeted to non-expert users.
  • Even for users who do know SPARQL Queries really well, the Query Designer eliminates the need to browse / reference the available classes and properties of an endpoint or triple store. Usually you have to know the exact structure of the stored RDF in an SPARQL endpoint before you can create a query. With LinDA Query Designer you don’t have to know anything about the SPARQL endpoint(s). You just reference the data source and the system automatically prompts you of what is inside. This is an invaluable functionality, for both experts and novice users.
  • The Query Designer can significantly lower the required speed and manual input of a SPARQL query. For instance a simple drag and drop of two classes, a selection of 3-4 properties and a quick set of order by at the desired property can easily substitute 10-15 lines of SPARQL code. For more complicated queries, this benefit becomes even more apparent. Larger queries are much easier managed through the Query Designer.
  • The Query Designer allows out-of-the-box, the ability to drag and drop classes even if they belong to different SPARQL endpoints. This functionality promotes the interlinking power of linked data and reveals a substantial advantage in comparison to working with other data models in isolated data silos. This feature is completely transparent to the user, while under the hood, the query designer takes advantage of the Federated Query syntax and auto-generates all the necessary SPARQL code.
  • Real-time, dynamic exploration of public SPARQL endpoints in the web relies on the ability of underlying tools to efficiently interact with and extract useful information from datasets containing hundreds of millions of RDF triples. A straightforward approach of constructing greedy queries, executing them and waiting for results that will be presented to the user is not realistic in the case of a web based tool for linked data queries. Internally, the Query Designer addresses a number of major performance challenges in order to provide its services to the user with speed and reliability. Identifying all categories of objects and object-to-object relationships in real world data sources can’t be entirely based on data source metadata (like the ontology been added alongside with the actual data), as such metadata are often out of date or even non-existent. As a result, the queries used to identify classes and properties must be based on the actual data while still returning all or at least the most probably relevant entities in tolerable waiting times. Moreover, filtering and (Ajax-based) pagination techniques are utilized in the front-end, as even modern browsers’ performance drops dramatically for documents containing more than a few thousands DOM elements. While entities are loading, the tool remains usable and allows the exploitation of the information already fetched from the endpoint


Description

A major difficulty in linked data adoption from enterprises comes from the fact that a different programming language, named SPARQL, is used in order to query data instead of traditional SQL queries. Although SPARQL syntax resembles SQL, transition between relational and linked data queries is not obvious due to differences between schema-less linked data and relational databases based on predefined, static schemas.

Enterprise users (data analysts, data engineers and others) depend on advanced tools that allow them to construct complex queries using only or almost only graphical methods. The construction of complex queries though a GUI builder, allows non-technical users to quickly construct queries with fewer errors. Even users familiar with SQL may use GUI applications to construct the base for a complex query and then elaborate through a text-based editor. Advanced, technical users in today’s enterprises are most typically at least somewhat familiar with the SQL language and SQL terminology (databases, tables, properties, primary and foreign keys). Graphical query builders always make sure to respect such concepts, with UI elements that resemble them in order to tighten the required learning curve.

Although advanced, GUI-based tools are nowadays important for businesses, data engineers and data experts typically prefer the use of intelligent SQL editors with features like keyword/variable/table/property suggestion, auto-completion and syntax highlighting over GUI query builders. For such users, the use of a quality text-based editor on a database they’re familiar with is far more efficient than any graphical interface.

LinDA Query Builder facilitates the need of a GUI to create queries over a specific class (table) of a data source (database), incorporating filters and sub-classes. With the Query Builder, one can quickly explore the contents and schema of a particular data source. For users who need to create more complex queries interlinking various classes and data sources, we introduce the LinDA Query Designer tools. Along with the LinDA SparQL intelli-sense Editor, LinDA tools cover SMEs needs in exploring linked datasets and constructing linked data queries.


Requirements and limitations

The Query Designer and SparQL Editor were designed and implemented using existing components and modules for specific parts of the architecture, in order to focus development effort in core functionality. This way, specific packages are required for its installation and successful deployment. These packages are all well-established, thoroughly documented, supported by strong communities and actively maintained.

Django framework Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. It is a free and open source web application framework, written in Python, which follows the model–view–controller architectural pattern. Django's primary goal is to ease the creation of complex, database-driven websites. Django emphasizes reusability and "pluggability" of components, rapid development, and the principle of “don't repeat yourself”.

JQuery jQuery is a fast, small, and feature-rich JavaScript library. It makes things like HTML document traversal and manipulation, event handling, animation, and Ajax simpler with an easy-to-use API that works across a multitude of browsers. LinDA RDF2Any Server Via requests to the RDF2Any Server, the results of the written or auto-generated quires can be downloaded, previewed and transformed to various formats (RDF scripts, CSV, json). Relationship to the LinDA Workbench

The Query Designer is a separate app inside the LinDA workbench ecosystem. It is developed in a modular way, allowing it to be easily edited to work with other SparQL result previewing apps. However, it is recommended to install the Query Designer as part of the LinDA Workbench for non-developers. The SpaRQL intelli-sense Editor can be installed as a standalone, and it can be configured to communicate with any Vocabulary and Metadata Repository that implements the LinDA Vocabulary and Metadata Repository API.

Limitations The Query Designer offers graphical interfaces for a sub-set of SparQL 1.1 features. Major SparQL features not implemented in the Designer are:

  • Grouping (GROUP BY) and aggregate functions (SUM, COUNT, AVG, MIN, MAX) (intending to implement in next major tool release)
  • HAVING (intending to implement in next major tool release)
  • Specify output variable (AS) (intending to implement in next major tool release)
  • UNIONS (consider implementing)
  • MINUS (consider implementing)
  • NOT EXISTS – NOT IN (consider implementing – however not useful on real-world data sources due to added complexity)
  • IF (consider implementing)
  • CONSTRUCT (outside of tool scope)
  • DESCRIBE (outside of tool scope)


Usage – Query Designer

In the following image, the different regions in the Query Designer can be identified:

Image v8.png

Regions above contain:

  • [1] Data source selection: Select to which data source to connect to. When changed the contents of [2] and [3] are updated. Data source can change multiple times during the creation of a query, as instances from different data sources my co-exist in a single query.
  • [2] Data sources Items – Terms: Contains all classes and object properties (connections between two different instances). The items are presented paginated (with prev/next buttons), and they can be filtered via the “Search terms” input box. Items can be dragged into the designer workbench to create new instances and connections.
  • [3] Class tree: Previews all classes inside the selected data source. Initially shows only top-level classes (classes that are not subclasses – subcategories of others). When a user tries to expand a class (arrow on the left of the class name), all its subclasses are shown below, if any. Subclasses can have their own subclasses, forming a tree view of the data source classes. They can also be dragged into the designer workbench to create new instances.
  • [4] Designer Workbench: It is the part of the window where instances reside. It automatically expands when an instance is moved towards its lowest part.
  • [5] Instances: Instances are objects of a class type that will be returned by the query. For example, an instance of the type foaf:Person will return all people in the data source from which it was added. Instances “remember” the datasource from which they were added, so changing datasource won’t lose connections to previous data sources. An instance can be seen as a table in relational database query builders. Multiple instances of a single class can co-exist for various reasons like interlinking.
  • [6] Foreign keys: With foreign keys, different instances can be interconnected, like in relational databases. The foreign key can connect any property of a source instance (including itself through its URI) to another instance or a property of another instance. If class instances are added in the same query without foreign keys connecting them, a cross product (OUTER JOIN) operation will be performed.
  • [7] Property: Inside instances various properties can exist. In the bottom of each instance all of its available properties are shown, but only properties that will be added as property rows of the instance will be fetched from the query. Inside each property various operations on it can be triggered.
  • [8] SparQL Editor. As the design changes, the corresponding query is updated on real time. This query is presented in the SparQL Editor as is. It can be edited, but in that case the builder won’t be able to import back changes from raw SparQL to the design, and it will be able to either save it as raw SparQL or drop changes. Changing a query through the Editor is meaningful when users want to see result after small changes, or to add features not supported by the Designer.
  • [9] Run query: When running the query, the data source to which the query must be sent is auto-detected. After a while, query results are shown in [10]
  • [10] Query results: After the query is run, results are shown. They are automatically paginated using the LIMIT and OFFSET SparQL keywords. Additionally, (all) results can be downloaded in various formats, using the RDF2Any server.