JWE Abstracts 

Vol.16 No.5&6 September 1, 2017

Engineering the Web in the Big Data Era

Editorial (pp361-362)
Philipp Cimiano, Flavius Frasincar, and Daniel Schwabe

Relaxation of Keyword Pattern Graphs on RDF Data
Ananya Dass, Cem Aksoy, Aggeliki Dimitriou, and Dimitri Theodoratos
One of the facets of the data explosion in recent years is the growth of RDF data repositories on the Web. Keyword search is a popular technique for querying repositories of RDF graph data. Recently, a number of approaches have leveraged a structural summary of the graph data to address two typical problems of keyword search: (a) identifying relevant results among a multitude of candidates, and (b) performance scalability. These approaches compute queries (pattern graphs) corresponding to alternative interpretations of the keyword query, and the user selects the one that matches her intention to be evaluated against the data. Though promising, these approaches suffer from a drawback: because summaries are approximate representations of the data, they might return empty answers or miss results which are relevant to the user intent. In this paper, we present a novel approach which combines the use of the structural summary and user feedback with a relaxation technique for pattern graphs. We leverage pattern graph homomorphisms to define relaxed pattern graphs that are able to extract more results potentially of interest to the user. We introduce an operation on pattern graphs and we prove that it is complete, that is, it can produce all relaxed pattern graphs. To guarantee that the resulting pattern graphs are as close to the initial pattern graph as possible, we devise different metrics to measure the degree of relaxation of a pattern graph. We design an algorithm that computes relaxed pattern graphs with non-empty answers in relaxation order. To improve the successive computation of relaxed pattern graphs, we suggest subquery caching and multiquery optimization techniques adapted to the context of this computation. Finally, we run experiments on different real datasets which demonstrate the effectiveness of our ranking of relaxed pattern graphs, and the efficiency of our system and optimization techniques in computing relaxed pattern graphs and their answers.

Getting the Query Right for Crisis Informatics: Design Issues for Web-Based Analysis Environments (pp399-432)
Mario Barrenechea, Sahar Jambi, Ahmet A. Aydin, Mazin Hakeem, and Ken M. Anderson
Web-based data analysis environments are powerful platforms for exploring large data sets. To ensure that these environments meet the needs of analysts, a human-centered design perspective is needed. Interfaces to these platforms should provide flexible search, support user-generated content, and enable collaboration. We report on our efforts to design and develop a web interface for a custom analytics platform, EPIC Analyze, which provides interactive search over large Twitter data sets collected during crisis events. We performed seven think-aloud sessions with researchers who regularly analyze crisis data sets and compiled their feedback. They identified a need for a "big picture" view of an event, flexible querying capabilities, and user-defined coding schemes. Adding these features allowed EPIC Analyze to meet the needs of these analysts and enable exploratory research on crisis data. In performing this work, we identified an opportunity to migrate the software architecture of EPIC Analyze to one based on microservices. We report on the lessons learned in performing this migration and the impact it had on EPIC Analyze's capabilities. We also reflect on the benefits a microservices approach can have on the design of data-intensive software systems like EPIC Analyze.

Architecting Liquid Software (pp433-470)
Andrea Gallidabino, Cesare Pautasso, Tommi Mikkonen, Kari Systa, Jari-Pekka Voutilainen, and Antero Taivalsaari
The Liquid Software metaphor refers to software that can operate seamlessly across multiple devices owned by one or multiple users. Liquid software applications can take advantage of the computing, storage and communication resources available on all the devices owned by the user. Liquid software applications can also dynamically migrate from one device to another, following the user's attention and usage context. The key design goal in Liquid Software development is to minimize the additional effort arising from multiple device ownership (e.g., installation, synchronization and general maintenance of personal computers, smartphones, tablets, home and car displays, and wearable devices), while keeping the users in full control of their devices, applications and data. In this paper we present the design space for Liquid Software, categorizing and discussing the most important architectural dimensions and technical choices. We also provide an introduction to and comparison of two frameworks implementing Liquid Software capabilities in the context of the World Wide Web.

A Semantic Framework for Sequential Decision Making (pp471-504)
Patrick Philipp, Maria Maleshkova, Achim Rettinger, and Darko Katic
Current developments in the medical domain, not unlike many other sectors, are marked by the growing digitalization of data, including patient records, study results, clinical guidelines or imagery. This trend creates the opportunity for the development of innovative decision support systems to assist physicians in making a diagnosis or preparing a treatment plan. Similar conditions hold for the Web, where massive amounts of raw text are to be processed and interpreted automatically, e.g. to eventually add new information to a knowledge base. To this end, complex tasks need to be solved, requiring one or more interpretation algorithms (e.g. image or natural language processors) to be chosen and executed based on heterogeneous data. We, therefore, propose the first approach to a semantic framework for sequential decision making and develop the foundations of a Linked agent that executes interpretation algorithms available as Linked APIs on a data-driven, declarative basis by integrating structured knowledge formalized in RDF and OWL, and having access to meta components for planning and learning from experience. We evaluate our framework based on automatically processing brain images, the ad-hoc combination of surgical phase recognition algorithms and experiential learning to optimally pipeline entity linking approaches.

Other Research Articles

Identifying the Influential Bloggers: A Modular Approach Based on Sentiment Analysis (pp505-523)
Umar Ishfaq, Hikmat Ullah Khan, and Khalid Iqbal
The social web provides an easy and quick medium for public communication and online social interactions. In a web log, or blog for short, bloggers share their views by creating and commenting on blog posts. The bloggers who influence other users in a blogging community are known as the influential bloggers. Identification of such influential bloggers has vast applications in advertising, online marketing and e-commerce. This paper investigates the problem of identifying influential bloggers and presents a model which consists of two modules: Activity and Recognition. The activity module takes into account a blogger’s activity, and the recognition module measures a blogger’s influence in his/her social community. The integration of the activity and recognition modules identifies the active as well as influential bloggers. The proposed model, MIBSA (Model to find Influential Bloggers using Sentiment Analysis), takes into account existing and novel features of the sentiment expressed in content generated by a blogger. The model is evaluated against the existing standard models using real-world blogging data. The results confirm that sentiment expressed in blog content plays an important role in measuring a blogger’s influence and should be considered as a feature for finding the top influential bloggers in the blogosphere.

Web Access Mining through Dynamic Decision Trees with Markovian Features
Arpad Gellert
In this work we propose a hybrid web access prediction method consisting of a dynamic decision tree and Markov predictors of different orders as components. The predictions generated by the Markov chain components are used as features within the dynamic decision tree. Our goal is to use this hybrid technique to anticipate and prefetch the web pages and files accessed by users through browsers, thus reducing load times. We use a decision tree to select the most predictive features from a considered feature set, and based on those selected features we generate predictions. In our application, the feature set includes the current link, the type of the current link, and the predictions of Markov chains of different orders. The optimal configuration of the proposed hybrid technique provides an average web page prediction accuracy of 72.57%.
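
As a rough illustration of the idea described in this abstract, the sketch below shows how predictions from Markov chains of different orders could be assembled into a feature row alongside the current link and its type. It is a minimal sketch under stated assumptions, not the author's implementation: the class `MarkovPredictor`, the function `build_features`, the page names, and the link type `"html"` are all hypothetical, and the decision tree that would consume these features is left out.

```python
from collections import Counter, defaultdict


class MarkovPredictor:
    """Order-k Markov predictor over a sequence of visited links (hypothetical helper)."""

    def __init__(self, order):
        self.order = order
        # Maps a context (tuple of the last `order` links) to next-link counts.
        self.counts = defaultdict(Counter)

    def train(self, history):
        # Count which link followed each context of length `order`.
        for i in range(len(history) - self.order):
            context = tuple(history[i:i + self.order])
            self.counts[context][history[i + self.order]] += 1

    def predict(self, recent):
        """Return the most frequent successor of the last `order` links, or None."""
        if len(recent) < self.order:
            return None
        context = tuple(recent[-self.order:])
        successors = self.counts.get(context)
        if not successors:
            return None
        return successors.most_common(1)[0][0]


def build_features(recent, link_type, predictors):
    """Feature row: current link, its type, and one prediction per Markov order."""
    return [recent[-1], link_type] + [p.predict(recent) for p in predictors]


# Made-up browsing history, used only for illustration.
history = ["home", "news", "sports", "home", "news", "weather",
           "home", "news", "sports"]
predictors = [MarkovPredictor(1), MarkovPredictor(2)]
for p in predictors:
    p.train(history)

features = build_features(["home", "news"], "html", predictors)
print(features)
```

A row like this could then be passed to any decision tree learner, which would choose among the current link, its type, and the Markov predictions when forecasting the next page to prefetch.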
