In this post, we examine whether SMEs need to involve experienced data scientists in order to produce analytics that lead to insights and efficient decision making. The post was inspired by an article by Prof. Zettelmeyer, “A Leader’s Guide to Data Analytics” [1], and by the first round of evaluation results from the use of the LinDA Workbench. These results prompted many fruitful conversations about the need for an extra role within SMEs, one responsible for specifying and carrying out the analysis and for producing meaningful analytic insights.

Under the business workflow followed until now (the Business as Usual, or BAU, scenario), the analysis performed for the SMEs involved in the LinDA project was mainly outsourced to third-party data scientists. However, in the course of the LinDA project, the SMEs realized that they have access to vast amounts of data, both publicly and privately owned, that they can easily explore through numerous SPARQL endpoints, transform and interlink, and then analyse on their own using visualization and analytics tools. In fact, the SMEs realized that they are now able to measure every aspect of the available data sources in granular detail. However, the current end users, overwhelmed by this constant blizzard of metrics, remain hesitant to get involved in what they see as a technical process. A shift in mentality is needed within the SMEs, involving managers as well as data management employees, that highlights the value of producing analytics that are meaningful with regard to their business objectives.

According to Prof. Zettelmeyer [1], a professor of marketing and faculty director of the program on data analytics at the Kellogg School, end users in managerial roles should not view analytics as something that falls beyond their purview. He argues that “the most important skills in analytics are not technical skills, they’re thinking skills”. Managing well with analytics does not require being a math genius or a master of computer science; instead, it requires what Zettelmeyer calls “a working knowledge” of data science [1]. This means being able to separate good data from bad data, and knowing precisely where analytics can add value. A working knowledge of data science can help leaders turn analytics into genuine insight, and it can also save them from making decisions based on faulty assumptions.

Furthermore, the article notes that, too often, managers collect data without knowing how to use it. As stated in [1], “you have to think about the generation of data as a strategic imperative”. In other words, analytics should not be treated as a separate business practice; it has to be integrated into the business plan itself. Whatever a company chooses to measure, the results will only be useful if the data collection is done with purpose.

Like all scientific inquiries, analytics within LinDA needs to start with a question or problem in mind. After exploring the data sources and finding a possible “scoop” within them, it is very important to think first about what kind of insight the user is looking for. Is the data at hand enough, or should it be enriched with extra information that is available elsewhere? Regardless of the issue under examination (e.g. whether the SME wants to see how air pollution affects life quality indexes, or whether liberalizing over-the-counter medicines in a specific country is a good decision for an interested customer), data collection has to match the specific business problem at hand. As stated in [1], “you can’t just hope that the data that gets incidentally created in the course of business is the kind of data that’s going to lead to breakthroughs”. While it is obvious that some kinds of data should be collected (for example, air pollution metrics or health revenues), data collection has to be designed with analytics in mind to ensure that users have the metrics they need.

Another issue in data analysis is the interpretation of the analytic results. As stated by Prof. Zettelmeyer [1], “there is a view out there that because analytics is based on data science, it somehow represents disembodied truth. Regrettably that is just wrong”. So how can the end user learn to distinguish between good and bad analytics? According to the article, “It all starts with understanding the data-generation process. You cannot judge the quality of the analytics if you don’t have a very clear idea of where the data came from” [1].

When an SME creates its own data, it normally has considerable control over the quality of that data. But what happens when data come from public SPARQL endpoints or from third-party organizations that offer or sell datasets? In this case it is much harder to anticipate the real quality of the data. Some extra analytics and visualizations can reveal poor data quality and the presence of outliers, which lead to large standard errors and low accuracy in the analytic results. Furthermore, ratings of the analyzed datasets can give valuable hints to other LinDA Workbench users, helping them select data sources that have already been tested and are likely to lead to interesting results.
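As an illustration of such a quality check, the sketch below flags outliers with Tukey's interquartile-range rule and compares the standard error of the mean before and after removing them. It is a minimal example under stated assumptions, not the LinDA Workbench implementation: the function names, the threshold `k=1.5`, and the air pollution readings are all hypothetical.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical air pollution readings from an external dataset;
# 410.0 is a made-up value that looks like a data-entry error.
readings = [41.0, 39.5, 42.3, 40.1, 38.7, 43.0, 410.0, 40.8]

outliers = iqr_outliers(readings)
cleaned = [v for v in readings if v not in outliers]

print("outliers:", outliers)
print("standard error before:", round(standard_error(readings), 2))
print("standard error after:", round(standard_error(cleaned), 2))
```

A flagged value is only a candidate for closer inspection. Whether it is an error or a genuine extreme reading is a question for someone who knows how the data was generated.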

Image Credit: O'Reilly (http://strata.oreilly.com/2013/04/why-why-why.html)

Prof. Zettelmeyer claims that most managers share a common behavioral bias: when results are presented as having been achieved through complicated data analytics, they tend to defer to the experts [1]. He also states that “There is a real danger in managers assuming that the analysis was done in a reasonable way. I think this makes it incredibly important for managers to have a sixth sense for what they can actually learn from data.” [1]. No one knows the internal structure of an SME better than the people who are part of it, so an external data scientist may well make hypotheses that, instead of interpreting the data appropriately, introduce a set of deviations. In addition to making sure that data is generated with analytics in mind, SMEs should use their knowledge of the business to account for strange results. Prof. Zettelmeyer recommends asking the question: “Knowing what you know about your business, is there a plausible explanation for that result?” Analytics, after all, is not simply a matter of crunching numbers in a vacuum. Data scientists do not have all the domain expertise that the SMEs have, and analytics is no substitute for understanding the business.

Finally, as noted by Prof. Zettelmeyer, decision making in the business world is being revolutionized in the same way that healthcare is with the widespread adoption of “evidence-based medicine.” As big data and analytics bring about this revolution, managers with a working knowledge of data science will have an edge. Beyond being the gatekeepers of their own analytics, SMEs should ensure that this knowledge is shared across their organization: a disciplined, data-literate company is one that is likely to learn fast and add more value across the board. As stated in [1], “If we want big data and analytics to succeed, everyone needs to feel that they have a right to question established wisdom. There has to be a culture where you can’t get away with ‘thinking’ as opposed to ‘knowing.’”

Summarising the above points of view, we can offer a set of recommendations that help LinDA end users carry out meaningful analysis without necessarily being data scientists. So how does it really work?

  1. Have a point of interest;
  2. Ask a good and meaningful question;
  3. Have in mind what the ideal outcome of the question would look like;
  4. Understand the correlation and meaning of the metrics you’re dealing with;
  5. Have concrete expected answers to the questions (which the analysis results may then confirm or reject);
  6. Be aware of the basics of the supported algorithms so that you can properly manipulate the data sources you can access. What kind of data can each algorithm support? What questions does each algorithm usually answer? (Use the LinDA Workbench hints to get that basic training.)
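To make step 4 more concrete, here is a minimal sketch of how the correlation between two metrics could be measured with the Pearson coefficient. The `pearson` helper and the figures for the two indexes are hypothetical and serve only as illustration. A strong correlation does not by itself show that one metric causes the other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five regions (not real measurements).
air_pollution_index = [12.0, 18.5, 25.0, 31.0, 40.5]
life_quality_index = [82.0, 79.0, 71.5, 68.0, 60.0]

r = pearson(air_pollution_index, life_quality_index)
print("correlation:", round(r, 3))
```

A value of `r` close to -1 or 1 indicates a strong linear relationship, while a value near 0 indicates little or none. Interpreting what that relationship means for the business remains the job of the end user.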

 

All of this may sound simple, but it is not. Even though familiarisation carries an extra cost for the end user, it is far better than having a really good hypothesis but being afraid to experiment with the analytic tools, or handing the analysis tasks to people who do not understand the metrics and what they really mean.

Taking the aforementioned perspectives into account, it can be claimed that the LinDA Workbench, and the overall philosophy behind producing linked data analytics, contributes toward merging the worlds of “experienced data analysts” and “business decision makers”. By exploiting linked data principles, making data easy to manipulate and maintain, and carrying out the analysis with the LinDA tools, SMEs can produce advanced insights with regard to their business objectives. Thus, a suggestion for SMEs is to invest in their human capital and involve their people in producing and interpreting data analytics. This increases the chance of producing results that match the interests defined before the analysis phase. Where the analysis is considered too complex, involving a data scientist for guidance should also be considered.

References:

[1] http://insight.kellogg.northwestern.edu/article/a-leaders-guide-to-data-analytics

Credits: