JWE Abstracts 

Vol.14 No.5&6 November 1, 2015

Engineering the Web for Users, Developers and the Crowd

Editorial (pp361-362)
Sven Casteleyn, Gustavo Rossi, and Marco Winckler
Patterns in Eyetracking Scanpaths and the Affecting Factors
Sukru Eraslan and Yeliz Yesilada
Web pages are typically decorated with different kinds of visual elements that help sighted people complete their tasks. Unfortunately, people accessing web pages in constrained environments, such as visually disabled users and small-screen device users, cannot benefit from them. In our previous work, we showed that tracking the eye movements of sighted users provides a good understanding of how people use these visual elements. We also showed that reengineering web pages based on these visual elements can improve people's experience in constrained environments. However, in order to reengineer web pages based on eyetracking, we first need to aggregate, analyse and understand how a group of people's eyetracking data can be combined into a common scanpath (namely, an eye movement sequence) in terms of visual elements. This paper presents an algorithm that aims to achieve this. The algorithm was developed iteratively and evaluated experimentally with an eyetracking study. The study shows that the proposed algorithm is able to identify patterns in eyetracking scanpaths and that it works well with different numbers of participants. We then extended our experiments to investigate the effects of task, gender and familiarity on common scanpaths. The results suggest that these factors can cause some differences in common scanpaths. The study also suggests that the algorithm can be improved by considering different techniques for pre-processing the data, by addressing the drawbacks of using the hierarchical structure, and by taking into account the underlying cognitive processes.
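The abstract does not reproduce the aggregation algorithm itself. As a rough illustration of the general idea of deriving a common scanpath from several participants (not the authors' hierarchical algorithm), one classic approach folds a longest-common-subsequence over the participants' scanpaths, each encoded as a string of area-of-interest labels:

```python
from functools import reduce

def lcs(a, b):
    """Longest common subsequence of two scanpaths (strings of AoI labels)."""
    m, n = len(a), len(b)
    dp = [[''] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + a[i]
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j], key=len)
    return dp[m][n]

def common_scanpath(scanpaths):
    """Pairwise-fold LCS over all participants' scanpaths."""
    return reduce(lcs, scanpaths)
```

For example, `common_scanpath(["ABCD", "ABD", "ACD"])` yields the subsequence shared by all three participants. Note that such pairwise folding is order-sensitive and can be over-reductive, which is one motivation for more sophisticated aggregation algorithms like the one proposed in the paper.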

From TMR to Turtle: Predicting Result Relevance from Mouse Cursor Interactions in Web Search (pp386-413)
Maximilian Speicher, Sebastian Nuck, Lars Wesemann, Andreas Both, and Martin Gaedke 
The prime aspect of quality for search-driven web applications is to provide users with the best possible results for a given query. Thus, it is necessary to predict the relevance of results a priori. Current solutions mostly rely on clicks on results for the respective predictions, but research has shown that it is highly beneficial to also consider additional features of user interaction. Nowadays, such interactions are produced in steadily growing amounts by Internet users. Processing these amounts calls for streaming-based approaches and incrementally updatable relevance models. We present StreamMyRelevance! --- a novel streaming-based system for ensuring quality of ranking in search engines. Our approach provides a complete pipeline, from collecting interactions in real time to processing them incrementally on the server side. We conducted a large-scale evaluation with real-world data from the hotel search domain. Results show that our system yields predictions as good as those of competing state-of-the-art systems, but, by design of the underlying framework, at higher efficiency, robustness, and scalability. Additionally, our system has been transferred into a real-world industry context: a modified solution called Turtle has been integrated into a new search engine for general web search. To obtain high-quality judgments for learning relevance models, it has been augmented with a novel crowdsourcing tool.
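The abstract does not specify the relevance model. As a minimal sketch of what an incrementally updatable model over cursor-interaction features could look like (an assumption for illustration, not StreamMyRelevance!'s actual model; the feature names are hypothetical), consider one online logistic-regression step per observed interaction:

```python
import math

def sgd_update(w, features, label, lr=0.1):
    """One online logistic-regression step. `features` maps a cursor-derived
    feature (e.g. hover time, cursor speed, click) to its value; `label` is
    1 for a relevant result, 0 otherwise. Mutates and returns the weights."""
    z = sum(w.get(f, 0.0) * v for f, v in features.items())
    p = 1.0 / (1.0 + math.exp(-z))          # current relevance estimate
    for f, v in features.items():
        w[f] = w.get(f, 0.0) + lr * (label - p) * v  # gradient step
    return w
```

Because each update touches only the incoming interaction, such a model can be maintained on a stream without re-training from scratch, which matches the incremental-processing requirement described in the abstract.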

Identifying Web Performance Degradations through Synthetic and Real-User Monitoring (pp414-442)
Jurgen Cito, Devan Gotowka, Philipp Leitner, Ryan Pelette, Dritan Suljoti, and Schahram Dustdar
The large scale of the Internet has offered unique economic opportunities, which in turn introduce overwhelming challenges for development and operations in providing services reliable and fast enough to meet the high performance demands on online services. In this paper, we investigate how performance engineers can identify three different classes of externally visible performance problems (global delays, partial delays, periodic delays) from concrete traces. We develop a simulation model based on a taxonomy of root causes of server performance degradation. Within an experimental setup, we obtain results through synthetic monitoring of a target Web service, and observe changes in Web performance over time through exploratory visual analysis and changepoint detection. We then extend our analysis and apply our methods to real-user monitoring (RUM) data. In a use case study, we discuss how the underlying model can be applied to real performance data gathered from a multinational, high-traffic website in the financial sector. Finally, we interpret our findings and discuss various challenges and pitfalls.
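The abstract mentions changepoint detection over response-time series but not a specific detector. As a hedged illustration of the general technique (not necessarily the authors' method), a one-sided CUSUM detector flags upward shifts in response time once the cumulative deviation from a baseline mean exceeds a threshold:

```python
def cusum(series, mean, threshold, drift=0.0):
    """One-sided CUSUM detector for upward shifts (e.g. in response time).
    Accumulates deviations above `mean` (minus an optional `drift` slack)
    and reports the indices where the sum crosses `threshold`."""
    s, alarms = 0.0, []
    for i, x in enumerate(series):
        s = max(0.0, s + (x - mean - drift))
        if s > threshold:
            alarms.append(i)
            s = 0.0  # reset after raising an alarm
    return alarms
```

Here a global delay would show up as a sustained run of alarms across all monitored endpoints, while a partial delay would trigger alarms only on a subset of them.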

Designing Complex Crowdsourcing Applications Covering Multiple Platforms and Tasks (pp443-473)
Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Andrea Mauri, and Riccardo Volonterio

A number of emerging crowd-based applications cover very different scenarios, including opinion mining, multimedia data annotation, localised information gathering, marketing campaigns, expert response gathering, and so on. In most of these scenarios, applications can be decomposed into tasks that collectively produce their results; task interactions give rise to arbitrarily complex workflows. In this paper we propose methods and tools for designing crowd-based workflows as interacting tasks. We describe the modelling concepts that are useful in this framework, including typical workflow patterns, whose function is to decompose a cognitively complex task into simple interacting tasks for cooperative solving. We then discuss how workflows and patterns are managed by CrowdSearcher, a system for designing, deploying and monitoring applications on top of crowd-based systems, including social networks and crowdsourcing platforms. Tasks performed by humans consist of simple operations which apply to homogeneous objects; the complexity of aggregating and interpreting task results is embodied within the framework. We show our approach at work on a validation scenario and we report quantitative findings, which highlight the effect of workflow design on the final results.

Other Research Articles

Web Browsing Automation for Applications Quality Control (pp474-502)
Boni Garcia and Juan Carlos Duenas
Quality control comprises the set of activities aimed at evaluating whether software meets its specification and delivers the functionality expected by consumers. These activities are often omitted from the development process and, as a result, the final software product usually lacks quality.
We propose a set of techniques to automate the quality control for web applications from the client-side, guiding the process by functional and non-functional requirements (performance, security, compatibility, usability and accessibility).
The first step to achieve automation is to define the structure of the web navigation. Existing software artifacts from the analysis and design phases are reused. Then, the independent paths of navigation are found, and each path is traversed automatically using real browsers while different kinds of assessments are carried out.
The processes and methods proposed in this paper have been implemented by means of a reference architecture and open source tools. A laboratory experiment and an industrial case study have been performed in order to validate the proposal.
The definition of navigation paths is a rich approach to model web applications. Grey-box (combined black-box and white-box) methods have proved very valuable for web assessment. The Chinese Postman Problem (CPP) provides an optimal way to find the independent paths in a web navigation modelled as a directed graph.
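In general, the CPP requires duplicating some edges so that every node's in-degree equals its out-degree; in the already-balanced special case, the postman tour reduces to an Euler circuit, which Hierholzer's algorithm finds directly. A minimal sketch of that special case over a navigation graph (the page names are hypothetical; this is an illustration, not the paper's implementation):

```python
from collections import defaultdict

def euler_circuit(edges, start):
    """Hierholzer's algorithm: a tour traversing every edge exactly once.
    Assumes a connected directed graph whose in- and out-degrees already
    balance (the degenerate case of the Chinese Postman Problem)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    stack, tour = [start], []
    while stack:
        u = stack[-1]
        if adj[u]:
            stack.append(adj[u].pop())  # follow an unused out-edge
        else:
            tour.append(stack.pop())    # dead end: emit node to the tour
    return tour[::-1]
```

Each transition of such a tour can then be driven through a real browser while the functional and non-functional assessments described above are applied.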

Modified PageRank for Concept Based Search (pp503-524)
G. Pavai, E. Umamaheswari, and T.V. Geetha
The traditional PageRank algorithm computes a weight for each hyperlinked document, indicating the importance of a page, based on its in-links and out-links. This is an off-line, query-independent process that suits a keyword-based search strategy. However, owing to problems such as polysemy and synonymy in keyword-based search, new search methodologies, such as concept-based search and semantic-web-based search, have been developed. Concept-based search engines generally adopt content-based ranking by imparting semantics to web pages. While this approach is better than keyword-based ranking strategies, it does not consider the physical link structure between documents, which is the basis of the successful PageRank algorithm. Hence, we attempt to combine the power of link structures with content information to suit concept-based search engines. Our main contribution is two modifications to the traditional PageRank algorithm, both specifically catering to concept-based search engines. Inspired by the topic-sensitive PageRank algorithm, we maintain multiple PageRanks for a document, rather than just one per document as in the traditional implementation. We have compared our methodologies with the ranking methodology of an existing concept-based search engine, and found that our modifications considerably improve the ranking of conceptual search results. Furthermore, we performed a statistical significance test and found that our Version-2 modification of the PageRank algorithm is statistically significant in its P@5 performance compared to the baseline.
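For reference, the traditional algorithm that the paper modifies can be sketched as power iteration over the out-link map (this is the classic formulation only; the paper's concept-based modifications, such as per-concept PageRank vectors, are not reproduced here):

```python
def pagerank(links, d=0.85, iters=50):
    """Classic PageRank by power iteration. `links` maps each page to the
    list of pages it links to; `d` is the damping factor."""
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - d) / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                share = d * rank[u] / len(outs)
                for v in outs:          # each out-link receives an equal share
                    new[v] += share
            else:                       # dangling page: spread mass evenly
                for v in nodes:
                    new[v] += d * rank[u] / n
        rank = new
    return rank
```

A topic-sensitive variant, which the authors cite as inspiration, would run one such computation per concept with the teleportation term `(1 - d) / n` biased toward pages belonging to that concept, yielding multiple PageRank values per document.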

Bayesian Based Type Discrimination of Web Events (pp525-544)
Qichen Ma, Xiangfeng Luo, Junyu Xuan, and Huimin Liu
A large number of web events emerge on the web and attract people's attention every day, and it is of great interest and significance to distinguish the different types of these web events in practice. For example, emergent web events should receive more attention from government departments, to save lives and reduce damage, or from news websites seeking to increase their hit rates with limited resources. However, how to efficiently distinguish the types of web events remains a challenging issue, as little effort has been devoted to it in the community. In this paper, we consider this problem thoroughly and propose an innovative Bayesian-based model to distinguish the different types of web events. To be specific, all web events are first assumed to fall into three types, whose formal definitions are given by considering their properties. To sufficiently describe and distinguish the three types of web events, a set of specially designed features is then extracted from the volume and the content of web events. Finally, a Bayesian-based model is built on the designed features. The experimental results demonstrate the capability of the proposed model to distinguish types of web events, and comparisons with other state-of-the-art classifiers also show its efficiency.
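The abstract does not detail the model. As a rough sketch of the Bayesian idea only (not the authors' model; the feature values and type labels below are hypothetical), a Laplace-smoothed naive Bayes classifier over discrete event features might look like:

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (features, label), where features is a tuple of
    discrete values (e.g. volume shape, sentiment). Returns count tables."""
    labels = Counter(lab for _, lab in samples)
    counts = defaultdict(Counter)          # (label, position) -> value counts
    for feats, lab in samples:
        for i, v in enumerate(feats):
            counts[(lab, i)][v] += 1
    return labels, counts

def classify(model, feats):
    """Pick the label maximising log-prior + smoothed log-likelihoods."""
    labels, counts = model
    total = sum(labels.values())
    def score(lab):
        s = math.log(labels[lab] / total)
        for i, v in enumerate(feats):
            c = counts[(lab, i)]
            s += math.log((c[v] + 1) / (sum(c.values()) + len(c) + 1))  # Laplace
        return s
    return max(labels, key=score)
```

With features derived from both the volume curve and the content of an event, as the paper describes, the same argmax-posterior scheme would assign each event to one of the three defined types.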
