LinDA Visualization and Exploration

From LinDA Wiki

LinDA Visualization and Analytic engines can help enterprise users gain insight from the linked data that the company generates. In simple terms, they visualize data in Linked Data format, taking advantage of its semantics.

Features

  • A wizard-based workflow for visualizing data in linked data format
  • Data selection step: the user starts by exploring datasets and selecting the data he/she intends to visualize.
  • Visualization selection: based on the content and format of the data and the semantic descriptions of the available visualizations, a ranking of possible visualizations is computed and presented to the user
  • Support for Map, Bubble, Area, Line, Pie and Scatter charts
  • Drag and drop available properties onto the Horizontal Axis, Vertical Axis and Series
  • Save Visualization Configuration
  • List of saved visualization configurations
  • Change the layout of the visualization for instance by adding labels for axes or increasing the number of grid lines
  • Maximize visualization window
  • Export visualization to SVG and PNG
  • Automatically configure the visualization chart based on the data type (e.g. xsd:decimal) of the selected dataset columns/properties
  • Automatically recommend a visualization chart type based on the selected RDF properties (e.g. detect long and lat in RDF and suggest the Map Chart)
  • Automatically configure the visualization chart based on RDF properties (e.g. detect longitude and latitude and automatically set them as coordinates in a map chart)
  • Ability to embed visualization on a website
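
The three "automatically recommend/configure" features can be pictured as a small rule set over the selected properties. The sketch below is not the LinDA implementation: the function names, the local-name heuristic for latitude/longitude and the chart rankings are assumptions made for illustration; only the XML Schema datatype IRIs are standard.

```typescript
// Hypothetical rule-based recommender; rules, names and rankings are illustrative only.
const XSD = "http://www.w3.org/2001/XMLSchema#";

type ChartType = "Map" | "Bubble" | "Scatter" | "Line" | "Area" | "Pie";

interface SelectedProperty {
  iri: string;       // property IRI
  datatype?: string; // datatype IRI of the property's literal values, if known
}

const NUMERIC_TYPES = new Set(["decimal", "integer", "double", "float"].map((t) => XSD + t));
const LAT_NAMES = new Set(["lat", "latitude"]);
const LONG_NAMES = new Set(["long", "lon", "lng", "longitude"]);

// Lower-cased local name: the part of the IRI after the last '#' or '/'.
function localName(iri: string): string {
  return iri.slice(Math.max(iri.lastIndexOf("#"), iri.lastIndexOf("/")) + 1).toLowerCase();
}

function isNumeric(p: SelectedProperty): boolean {
  return p.datatype !== undefined && NUMERIC_TYPES.has(p.datatype);
}

// Assumed heuristic: detect latitude/longitude by local name (covers WGS84 geo:lat / geo:long).
function mapCoordinates(props: SelectedProperty[]): { latitude: string; longitude: string } | null {
  const lat = props.find((p) => LAT_NAMES.has(localName(p.iri)));
  const long = props.find((p) => LONG_NAMES.has(localName(p.iri)));
  return lat && long ? { latitude: lat.iri, longitude: long.iri } : null;
}

// Returns chart types ranked best-first for the selected properties.
function suggestCharts(props: SelectedProperty[]): ChartType[] {
  if (mapCoordinates(props) !== null) return ["Map", "Scatter"]; // coordinates found: map first
  const numeric = props.filter(isNumeric).length;
  if (numeric >= 3) return ["Bubble", "Scatter", "Line"]; // x, y and bubble size
  if (numeric === 2) return ["Scatter", "Line", "Area"];
  if (numeric === 1 && props.length === 2) return ["Pie", "Line", "Area"]; // label + value
  return [];
}
```

A real recommender would more likely score each widget against its semantic description, as the workflow below describes; hard-coded rules simply keep the idea visible.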

Example Tutorial

Example 1. Visualizing RDF Data

  1. From the LinDA main page click “Data Sources”
  2. Find the “Hospitals Review” data source
  3. Right-click on “Hospitals Review” and select “Visualize”
  4. Press the Menu button in the top-left corner to hide the menu bar
  5. Press “+” next to “Hospitals”, then select “Infection_Rate”, “Patiens_Per_Room” and “Nr_of_Beds”.
  6. Press “Visualize” button
  7. Check the automatically suggested visualization (Bubble Chart).
  8. Use the “Select Visualization” slider to scroll to the right through the suggested charts and select “Line Chart”
  9. Press the X button next to each axis to clear it
  10. Drag & Drop “Nr_of_Beds” to “Horizontal Axis” and “Infection_Rate” to “Vertical Axis”
  11. Scroll further through the visualizations to “Bar Chart”
  12. Drag & Drop “Nr_of_Beds” to “Horizontal Axis” and “Patiens_Per_Room” to “Vertical Axis”
  13. Under “Configured Visualization” find “Gridlines” and change the default value (10) to 5
  14. Set the “Horizontal Label” field to the text “Horizontal Label” and the “Vertical Label” field to “Vertical Label”, then clear the text boxes; the default values will return

Example 2. Visualizing RDF Data

  1. From the LinDA main page click “Data Sources”
  2. Search for the “Airline Passengers Per Month” data source
  3. Right-click on “Airline Passengers Per Month” and select “Visualize”
  4. Press the Menu button in the top-left corner to hide the menu bar
  5. Press “+” next to “Data Entry” and select “date” and “passengers_number”
  6. Press “Visualize”
  7. Under “Configured Visualization” find “Gridlines” and change the default value (10) to 5
  8. Set the “Vertical Label” field to the text “Vertical Label”, then clear the text box; the default value will return

Example 3. Manual assignment of visualization options

  1. From the LinDA main page click “Data Sources”
  2. Find the “GDP Italy” data source on the 8th page
  3. Right-click on “GDP Italy” and select “Visualize”
  4. Press the Menu button in the top-left corner to hide the menu bar
  5. Press “+” next to “Node” and select “GDP” and “year”
  6. Press “Visualize”
  7. Select “Area Chart” in the “Select visualization” slider
  8. Press X next to each axis to clear the settings
  9. Drag & Drop “year” to “Horizontal Axis” and “GDP” to “Vertical Axis”
  10. Feel free to change visualization options such as Labels and Gridlines

Example 4. Visualizing SPARQL Queries

  1. From the LinDA main page click “Queries”
  2. Find query #62 “env_air_emis_eurostat”
  3. Right-click on it and select “Visualize”
  4. Press the Menu button in the top-left corner to hide the menu bar
  5. Press “+” next to “Columns” and select “PM2_5” and “Observation_timePeriod”
  6. Press “Visualize”
  7. Switch between Bar Chart and Pie Chart and notice how the dimensions are matched automatically
  8. Switch back to Bar Chart
  9. Change the Vertical Label to “Air Pollution”
  10. Enter a number between 5 and 20 in the Gridlines box and notice how the scale changes


Description

The goal of the LinDA project is to make the benefits of Linked Open Data accessible to SMEs and data providers by providing an ecosystem of Linked Data consumption applications for visualizing, exploring, analyzing and publishing Linked Data as well as a data renovation framework for converting data into RDF format.

The role of LinDA Visualization is to provide a largely automatic visualization workflow that enables SMEs to visualize data in different formats and modalities. To achieve this, a generic web application is being developed, based on state-of-the-art Linked Data approaches, that can visualize different categories of data (e.g. statistical, geographical, temporal and arbitrary data) and offers a largely automatic workflow for matching and binding data to visualizations.

The steps of the visualization workflow, as well as the UI components for configuring and consuming visualizations, are introduced in the Visualization Workflow section. The Architecture section gives an overview of the underlying architecture and the technologies used to build the visualization tool, and the page concludes with a brief outlook on future work.

Visualization Workflow

In order to support the user in selecting and configuring visualizations, a workflow consisting of the following steps has been developed:

  1. Data selection: The user starts by exploring datasets and selecting the data she intends to visualize.
  2. Visualization selection: Based on the content and format of the data and the semantic descriptions of the available visualizations, a ranking of possible visualizations is computed and presented to the user (Figure 2 a).
  3. Visualization configuration: After choosing a visualization from the list of recommendations, the user proceeds to the configuration step. Here, she provides the input needed to map the data to the chosen visualization (Figure 2 b). This is done by browsing the dataset and dragging and dropping properties onto the corresponding areas for configuring the visualization's dimensions, for instance the horizontal and vertical axes or groups.
  4. Visualization consumption: Finally, after completing the configuration step, the user can consume the visualization in several ways:
    • Change the layout of the visualization, for instance by adding axis labels or increasing the number of grid lines (Figure 2 c).
    • Export the visualization in different formats (Figure 2 e).
    • Publish the visualization on a website (Figure 2 g).
    • Save the visualization configuration for later reuse (Figure 2 f).
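
The configuration step (step 3) amounts to binding properties to a chart's dimensions until every required dimension is filled. The following is a minimal sketch under stated assumptions: `ChartSpec`, `bind`, `clear` and `isComplete` are hypothetical names, `bind` stands in for a drag-and-drop and `clear` for the X button; the default of 10 gridlines is taken from the tutorial above.

```typescript
// Hypothetical model of the configuration step; names are illustrative only.
type Dimension = "horizontalAxis" | "verticalAxis" | "series";

interface ChartSpec {
  name: string;
  accepts: Dimension[];  // dimensions the chart offers as drop targets
  requires: Dimension[]; // dimensions that must be filled before rendering
}

type Bindings = Partial<Record<Dimension, string[]>>;

interface VisualizationConfig {
  chart: string;
  bindings: Bindings; // dimension -> names of the dropped properties
  layout: { horizontalLabel?: string; verticalLabel?: string; gridlines: number };
}

function newConfig(spec: ChartSpec): VisualizationConfig {
  return { chart: spec.name, bindings: {}, layout: { gridlines: 10 } };
}

// Counterpart of dragging `property` onto `dim`; returns a new configuration.
function bind(spec: ChartSpec, cfg: VisualizationConfig, dim: Dimension, property: string): VisualizationConfig {
  if (!spec.accepts.includes(dim)) {
    throw new Error(`${spec.name} has no ${dim} drop target`);
  }
  const bindings: Bindings = { ...cfg.bindings };
  bindings[dim] = [...(bindings[dim] ?? []), property];
  return { ...cfg, bindings };
}

// Counterpart of pressing the X button next to a dimension.
function clear(cfg: VisualizationConfig, dim: Dimension): VisualizationConfig {
  const bindings: Bindings = { ...cfg.bindings };
  delete bindings[dim];
  return { ...cfg, bindings };
}

// True once every required dimension has at least one property.
function isComplete(spec: ChartSpec, cfg: VisualizationConfig): boolean {
  return spec.requires.every((d) => (cfg.bindings[d]?.length ?? 0) > 0);
}

// Example 3 of the tutorial: an Area Chart with "year" horizontally and "GDP" vertically.
const areaChart: ChartSpec = {
  name: "Area Chart",
  accepts: ["horizontalAxis", "verticalAxis", "series"],
  requires: ["horizontalAxis", "verticalAxis"],
};
let config = newConfig(areaChart);
config = bind(areaChart, config, "horizontalAxis", "year");
config = bind(areaChart, config, "verticalAxis", "GDP");
```

Returning new objects instead of mutating keeps each intermediate configuration intact, which makes saving a snapshot for later reuse straightforward.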



Figure: LinDA Visualization Workflow UI consisting of the following steps: (a) Select visualization; (b) Configure visualization; (c) Customize visualization; (d) Visualize data; (e) Export visualization; (f) Save visualization configuration; (g) Embed visualization into a website

Architecture

LinDA Visualization is a part of the LinDA Ecosystem of Linked Data consumption applications, as illustrated in Figure 3, and consists of the following components:

  • A configuration component for coordinating the steps necessary for configuring and consuming a visualization of a given dataset in RDF or tabular format.
  • A recommendation component (under development) for determining the compatibility between the selected dataset and the available visualizations and suggesting a list of visualizations accordingly.
  • A library of visualization widgets that contains:
    • Configurable visualizations: Charts, maps, timelines
    • Arbitrary-data visualizations: Graphs, tables
  • A triplestore for storing metadata about:
    • Visualization widgets: Configuration options, input data format and thumbnails
    • Data sources: Location and format
    • Visualization configurations (for later reuse)
  • An interface to the LinDA Linked Data API for slicing, searching and filtering RDF data sources (not developed yet).
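
Saved visualization configurations end up as metadata in the triplestore. As a hedged sketch, a configuration could be serialized to N-Triples as shown below; the `http://example.org/linda-vis#` vocabulary and the `SavedConfiguration` shape are placeholders, since the LinDA ontology terms are not given on this page.

```typescript
// Placeholder vocabulary: the actual LinDA ontology IRIs are not listed on this page.
const VIS = "http://example.org/linda-vis#";
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

interface SavedConfiguration {
  id: string;                       // IRI identifying this configuration
  widget: string;                   // IRI of the visualization widget
  dataSource: string;               // IRI of the data source
  bindings: Record<string, string>; // dimension name -> property IRI
}

const iri = (value: string): string => `<${value}>`;

// N-Triples string literal with backslash, quote and line-break escapes.
function literal(value: string): string {
  const escaped = value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
  return `"${escaped}"`;
}

function toNTriples(cfg: SavedConfiguration): string {
  const s = iri(cfg.id);
  const lines = [
    `${s} ${iri(RDF_TYPE)} ${iri(VIS + "VisualizationConfiguration")} .`,
    `${s} ${iri(VIS + "widget")} ${iri(cfg.widget)} .`,
    `${s} ${iri(VIS + "dataSource")} ${iri(cfg.dataSource)} .`,
  ];
  // One blank node per dimension binding.
  Object.entries(cfg.bindings).forEach(([dimension, property], i) => {
    const b = `_:binding${i}`;
    lines.push(`${s} ${iri(VIS + "binding")} ${b} .`);
    lines.push(`${b} ${iri(VIS + "dimension")} ${literal(dimension)} .`);
    lines.push(`${b} ${iri(VIS + "property")} ${iri(property)} .`);
  });
  return lines.join("\n") + "\n";
}
```

The resulting text could, for example, be wrapped in a SPARQL 1.1 Update `INSERT DATA { ... }` request to the triplestore.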



Figure: LinDA Visualization in the context of the LinDA Publication and Consumption Ecosystem


The visualization workflow UI is realized as a JavaScript client-side web-application based on the web application framework Ember.js. For the implementation of the JavaScript-based visualization widgets the following visualization libraries were used:

  • Charts: C3.js/D3.js and Google Charts API
  • Maps: Leaflet/OpenStreetMap
  • Tables: DataTables
  • Graphs: vis.js (not integrated yet)
  • Timeline: vis.js (not integrated yet)
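
To illustrate how a finished configuration might be handed to one of these libraries, the sketch below turns row-oriented data into a C3.js options object, which would then be passed to `c3.generate` in the browser. `data.x`, `data.columns`, `data.type`, `axis.x.label`, `axis.y.label` and `axis.y.tick.count` are C3 options; `toC3Options` is a hypothetical helper, and mapping the Gridlines setting to the y-axis tick count is our assumption.

```typescript
// Local typing of the few C3.js options used here (not the full C3 API).
type C3ChartType = "line" | "area" | "bar" | "scatter";

interface C3Options {
  bindto: string;
  data: { x: string; columns: (string | number)[][]; type: C3ChartType };
  axis: {
    x: { label?: string };
    y: { label?: string; tick?: { count: number } };
  };
}

interface Layout {
  horizontalLabel?: string;
  verticalLabel?: string;
  gridlines: number;
}

type Row = Record<string, number>;

// Hypothetical helper: converts row-oriented data into C3's column-oriented options.
function toC3Options(
  rows: Row[],
  xKey: string,
  yKeys: string[],
  type: C3ChartType,
  layout: Layout,
  bindto = "#chart",
): C3Options {
  // A C3 column is an array whose first element is the series name.
  const column = (key: string): (string | number)[] => [key, ...rows.map((r) => r[key])];
  return {
    bindto,
    data: { x: xKey, columns: [column(xKey), ...yKeys.map(column)], type },
    axis: {
      x: { label: layout.horizontalLabel },
      // Assumption: the Gridlines setting maps to the number of y-axis ticks.
      y: { label: layout.verticalLabel, tick: { count: layout.gridlines } },
    },
  };
}
```

For an Area Chart with "year" on the horizontal axis and "GDP" on the vertical axis, the call would be `toC3Options(rows, "year", ["GDP"], "area", { gridlines: 10 })`.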

The backend for the visualization workflow is realized in Node.js, a JavaScript runtime for developing server-side applications. It consists of the recommendation component, data management and a proxy for communicating with SPARQL services such as the Virtuoso Open Source triplestore.
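
The proxy's job can be sketched in two steps: build a SPARQL 1.1 Protocol GET request and flatten the SPARQL Query Results JSON into rows that chart widgets can consume. This is a sketch under assumptions, not LinDA's code; in particular, converting numeric XSD datatypes to JavaScript numbers is a hypothetical design choice.

```typescript
// Subset of the SPARQL Query Results JSON format used here.
interface SparqlTerm {
  // "typed-literal" is an older form that some endpoints still emit.
  type: "uri" | "literal" | "typed-literal" | "bnode";
  value: string;
  datatype?: string;
}

interface SparqlResults {
  head: { vars: string[] };
  results: { bindings: Record<string, SparqlTerm>[] };
}

type Cell = string | number | null;

const XSD = "http://www.w3.org/2001/XMLSchema#";
// Assumption: literals with these datatypes become JavaScript numbers for charting.
const NUMERIC = new Set(["decimal", "integer", "double", "float", "int", "long"].map((t) => XSD + t));

// SPARQL 1.1 Protocol: a query can be sent via GET in the `query` parameter.
function queryUrl(endpoint: string, query: string): string {
  const sep = endpoint.includes("?") ? "&" : "?";
  return `${endpoint}${sep}query=${encodeURIComponent(query)}`;
}

// Flattens result bindings into one row per solution.
function toRows(res: SparqlResults): Record<string, Cell>[] {
  return res.results.bindings.map((binding) => {
    const row: Record<string, Cell> = {};
    for (const v of res.head.vars) {
      const term = binding[v];
      if (term === undefined) {
        row[v] = null; // variable left unbound, e.g. by OPTIONAL
      } else if (
        (term.type === "literal" || term.type === "typed-literal") &&
        term.datatype !== undefined &&
        NUMERIC.has(term.datatype)
      ) {
        row[v] = Number(term.value);
      } else {
        row[v] = term.value;
      }
    }
    return row;
  });
}
```

In a Node.js proxy, the URL from `queryUrl` would be requested with an `Accept: application/sparql-results+json` header and the parsed response body passed to `toRows`.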

The visualization workflow can be invoked from any other component within the LinDA workbench by specifying the name, location and format of the selected dataset in the visualization route (URI fragment).
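
The exact route pattern is not documented here, so the following sketch assumes a hypothetical `#/visualization/<name>/<location>/<format>` fragment. Each segment is percent-encoded so that a URL (which contains "/") fits into a single segment; `visualizationRoute` and `parseVisualizationRoute` are illustrative names.

```typescript
// Hypothetical route shape: the actual LinDA route pattern is not given on this page.
interface DatasetRef {
  name: string;     // dataset name shown to the user
  location: string; // e.g. a SPARQL endpoint or file URL
  format: string;   // e.g. "rdf" or "csv" (illustrative values)
}

const ROUTE = /^#\/visualization\/([^/]+)\/([^/]+)\/([^/]+)$/;

// Builds the fragment a calling component would link to.
function visualizationRoute(ds: DatasetRef): string {
  return "#/visualization/" + [ds.name, ds.location, ds.format].map(encodeURIComponent).join("/");
}

// Recovers the dataset reference on the visualization side; null if the fragment does not match.
function parseVisualizationRoute(fragment: string): DatasetRef | null {
  const m = ROUTE.exec(fragment);
  if (m === null) return null;
  return {
    name: decodeURIComponent(m[1]),
    location: decodeURIComponent(m[2]),
    format: decodeURIComponent(m[3]),
  };
}
```

Keeping all three values in the fragment means the calling component needs no shared state with the visualization app; Ember.js can be configured to route on the URL hash, which fits this approach.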

Future Work

In the future we plan to:

  1. improve the approach for automatically recommending visualizations and binding data of different formats and vocabularies, with the goal of providing SMEs with an intuitive way of configuring and visualizing Linked Data;
  2. integrate the LinDA Linked Data API in order to provide user-friendly previews and overviews, in the form of summaries of the selected RDF datasets, and to reduce the amount of data by slicing, searching and filtering relevant data;
  3. fully integrate the developed ontologies for describing data sources and visualizations into the visualization workflow;
  4. improve the user experience by optimizing the configuration templates for selecting and exploring RDF data; and
  5. conduct an extensive evaluation through a user study.