JWE Abstracts 

Vol.16 No.7&8 December 1, 2017

 A Taxonomy of Web Effort Predictors (pp541-570)
Ricardo Britto, Muhammad Usman and Emilia Mendes
Web engineering as a field has emerged to address challenges associated with developing Web applications. It is known that the development of Web applications differs from the development of non-Web applications, especially regarding aspects such as Web size metrics. The classification of existing Web engineering knowledge would be beneficial for both practitioners and researchers in many different ways, such as finding research gaps and supporting decision making. In the context of Web effort estimation, a taxonomy was proposed to classify the existing size metrics, and more recently a systematic literature review was conducted to identify aspects related to Web resource/effort estimation. However, there is no study that classifies Web effort predictors (both size metrics and cost drivers). The main objective of this study is to organize the body of knowledge on Web effort predictors by designing and using a taxonomy, with the aim of supporting both research and practice in Web effort estimation. To design our taxonomy, we used a recently proposed taxonomy design method. As input, we used the results of a previously conducted systematic literature review (updated in this study), an existing taxonomy of Web size metrics, and expert knowledge. We identified 165 unique Web effort predictors from a final set of 98 primary studies; they were used as one of the bases for designing our hierarchical taxonomy. The taxonomy has three levels, organized into 13 categories. We demonstrated the utility of the taxonomy and body of knowledge by using examples. The proposed taxonomy can be beneficial in the following ways: i) it can help to identify research gaps and literature of interest, and ii) it can support the selection of predictors for Web effort estimation. We also intend to extend the taxonomy to include effort estimation techniques and accuracy metrics.

An SMIL-Timesheets based temporal behavior model for the visual development of Web user interfaces (pp571-594)
M. Linaje, J.C. Preciado, and R. Rodriguez-Echeverria
Temporal behaviors are being incorporated into the user interfaces of Web applications, making them look more and more like multimedia applications: the so-called Rich Internet Application (RIA) user interfaces. Due to the complexity of RIAs, some research communities have proposed models to ease their development. However, there is a gap between formal temporal relationships and the current state of the art in RIA model-driven development techniques. The purpose of this paper is to specify a temporal behavioral model for data-intensive RIA user interfaces with three main objectives. First, the model must be usable by non-experts in engineering specifications (e.g., Web designers). Second, the model must be suitable for implementation in a CASE tool that integrates temporal behaviors into the RIA model-driven development workflow. Third, the specified temporal behaviors must run in current Web browsers. The approach presented here is based on SMIL Timesheets, a standard that can be used as a foundation to extend model-driven proposals for RIA user interfaces.

Service Recommendation Based on Separated Time-aware Collaborative Poisson Factorization (pp595-618)
Shuhui Chen, Yushun Fan, Wei Tan, Jia Zhang, Bing Bai, and Zhenfeng Gao
With the boom in web service ecosystems, finding suitable services and composing them have become principal challenges for inexperienced developers. Recommending services based on service composition queries is therefore a promising solution. Many recent studies apply Latent Dirichlet Allocation (LDA) to model the queries and the services' descriptions. However, limited by the restrictive Dirichlet-Multinomial distribution assumption, LDA cannot generate high-quality latent representations, so the accuracy of recommendation is not quite satisfactory. Building on our previous work, we propose in this paper a Separated Time-aware Collaborative Poisson Factorization (STCPF) model to tackle this problem. STCPF takes Poisson Factorization as the foundation to model mashup queries and service descriptions separately, and combines them with historical usage data through collective matrix factorization. Experiments on real-world data show that our model outperforms state-of-the-art methods (e.g., Time-aware collaborative domain regression) in terms of mean average precision, and takes much less time on the sparse but massive data from web service ecosystems.
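
For readers unfamiliar with the evaluation measure named in this abstract, mean average precision can be sketched as follows. This is a generic illustration, not the authors' evaluation code: the function names and the toy service identifiers are hypothetical, and the sketch assumes the common variant that normalizes by the number of relevant items.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item appears.
            total += hits / rank
    # Assumption: normalize by the number of relevant items.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, relevants):
    """Mean of the per-query average precision values."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical recommendations for two mashup queries.
rankings = [["maps", "geo", "photos"], ["sms", "email"]]
relevants = [{"maps", "photos"}, {"email"}]
map_score = mean_average_precision(rankings, relevants)
```

Other variants truncate the list at a cutoff k or normalize differently, so reported values are only comparable when the same definition is used.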

A Metric Based Automatic Selection of Ontology Matchers Using Bootstrapped Patterns (pp619-652)
B. Sathiya, Geetha T V, and Vijayan Sugumaran
The ontology matching process has become a vital part of the (semantic) web, enabling interoperability among heterogeneous data. To enable interoperability, similar entity pairs across heterogeneous data are discovered using a static set of matchers consisting of linguistic, structural and/or instance matchers. Numerous sets of matchers exist in the literature; however, none of them is capable of achieving good results across all data. In addition, it is both tedious and painstaking for domain experts to select the best set of matchers for the given data to be matched. In this paper, we propose two bootstrapping-based approaches, Bottom-up and Top-down, to automatically select the best set of matchers for the given ontologies to be matched. The selection is based on the characteristics of the ontologies, which are quantified by a set of quality metrics. Two new structural quality metrics, the Concept External Structural Richness (CESR) and the Concept Internal Structural Richness (CISR), are also proposed to better quantify the structural characteristics of the ontology. The best set of matchers is chosen using the sets of patterns learned through the proposed Bottom-up and Top-down bootstrapping approaches. The proposed metrics and the patterns constructed using these approaches are evaluated using the COMA matching tool with existing benchmark ontologies (Benchmark, Conference and Benchmark2 tracks of the OAEI 2011). The proposed Bottom-up patterns, along with the two proposed quality metrics, achieved better effectiveness (F-measure) in selecting the best set of matchers in comparison with the static set of matchers, supervised ML algorithms and the existing automatic matching. Specifically, the proposed Bottom-up patterns achieve a 14.6% Average Gain/Task and a significant improvement of 129% in comparison with the existing KNN model’s Average Gain/Task.

Discover Semantic Topics in Patents within a Specific Domain (pp653-675)
Wen Ma, Xiangfeng Luo, Junyu Xuan, and Ruirong Xue
Patent topic discovery is critical for innovation-oriented enterprises to hedge patent application risks and raise the success rate of patent applications. Topic models are commonly recognized as an efficient tool for this task by researchers from both academia and industry. However, many existing well-known topic models, e.g., Latent Dirichlet Allocation (LDA), which are designed for documents represented by word-vectors, exhibit low accuracy and poor interpretability on the patent topic discovery task. The reasons are that 1) the semantics of documents are still under-explored in a specific domain and 2) the domain background knowledge is not successfully utilized to guide the process of topic discovery. In order to improve accuracy and interpretability, we propose a new patent representation and organization with additional inter-word relationships mined from the title, abstract, and claims of patents. The representation can endow each patent with more semantics than a word-vector. Meanwhile, we build a Backbone Association Link Network (Backbone ALN) to incorporate domain background semantics and further enhance the semantics of patents. With the new semantic-rich patent representations, we propose a Semantic LDA model to discover semantic topics from patents within a specific domain. It can discover semantic topics with association relations between words rather than a single word-vector. Finally, the accuracy and interpretability of the proposed model are verified on real-world patent datasets from the United States Patent and Trademark Office. The experimental results show that the Semantic LDA model yields better performance than other conventional models (e.g., LDA). Furthermore, our proposed model can be easily generalized to other related text mining corpora.

A Hybrid Approach for Automatic Mashup Tag Recommendation (pp676-692)
Min Shi, Jianxun Liu, and Dong Zhou
Tags have been extensively utilized to annotate Web services, which is beneficial to the management, classification and retrieval of Web service data. In the past, plenty of work has been done on tag recommendation for Web services and their compositions (e.g., mashups). Most of these approaches mainly exploit the tag-service matrix and the textual content of Web services. In the real world, multiple relationships can be mined from tagging systems, such as composition relationships between mashups and Application Programming Interfaces (APIs), and co-occurrence relationships between APIs. This auxiliary information can be utilized to enhance current tag recommendation approaches, especially when the tag-service matrix is sparse and the textual content of Web services is absent. In this paper, we propose a hybrid approach for mashup tag recommendation. Our hybrid approach consists of two successive processes: API selection and tag ranking. We first select the most important APIs of a new mashup based on a probabilistic topic model and a weighted PageRank algorithm. The topic model simultaneously incorporates the composition relationships between mashups and APIs as well as the annotation relationships between APIs and tags to elicit the latent topic information. Then, tags of the chosen important APIs are recommended to this mashup. In this process, a tag filtering algorithm is employed to further select the most relevant and prevalent tags. The experimental results on a real-world dataset show that our approach outperforms several state-of-the-art methods.
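
As a rough illustration of the weighted PageRank step mentioned in this abstract, the sketch below ranks nodes of a weighted graph by power iteration. It is a generic textbook formulation under stated assumptions, not the authors' algorithm: the graph encoding, the damping value of 0.85, the iteration count, and the toy API co-occurrence weights are all hypothetical.

```python
def weighted_pagerank(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank where each edge carries a non-negative weight.

    weights maps a source node to a dict of {target node: edge weight}.
    """
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_sum = {v: sum(weights.get(v, {}).values()) for v in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_sum[v] == 0)
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in weights.items():
            if out_sum[src] == 0:
                continue
            for dst, w in targets.items():
                # Each node passes rank to neighbors in proportion to edge weight.
                new_rank[dst] += damping * rank[src] * w / out_sum[src]
        rank = new_rank
    return rank


# Hypothetical co-occurrence counts between three APIs.
cooccurrence = {
    "maps": {"geo": 3, "photos": 1},
    "geo": {"maps": 3},
    "photos": {"maps": 1},
}
ranks = weighted_pagerank(cooccurrence)
top_apis = sorted(ranks, key=ranks.get, reverse=True)
```

In a pipeline like the one described, the tags of the highest-ranked APIs would then be candidates for recommendation; how the topic model and the filtering step interact with this ranking is not specified in the abstract.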
