JWE Abstracts 

Vol.17 No.1&2 March 1, 2018

Research Articles

A Framework for Product Description Classification in E-commerce (pp001-027)
Damir Vandic, Flavius Frasincar, and Uzay Kaymak
We propose the Hierarchical Product Classification (HPC) framework for classifying products using a hierarchical product taxonomy. The framework uses a classification system with multiple classification nodes, each residing on a different level of the taxonomy. The innovative part of the framework stems from the definition of classification recipes that can be used to construct high-quality classifier nodes, using the product descriptions in an optimal way. These classifier recipes are specifically tailored for the e-commerce domain. The use of these classifier recipes enables flexible classifiers that adjust to the depth-specific characteristics of product taxonomies. Furthermore, in order to gain insight into which components are required to perform high-quality product classification, we evaluate several feature selection methods and classification techniques in the context of our framework. Based on 3000 product descriptions obtained from Amazon.com, HPC achieves an overall accuracy of 76.80% for product classification. Using 110 categories from CircuitCity.com and Amazon.com, we obtain a precision of 93.61% for mapping the categories to the taxonomy of shopping.com.
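
The idea of classifier nodes placed on different taxonomy levels can be illustrated with a minimal sketch. It assumes a top-down traversal and uses keyword overlap as a stand-in for a trained classifier; the class `ClassifierNode`, the function `classify`, and the toy taxonomy are hypothetical and are not the recipes defined in the paper.

```python
class ClassifierNode:
    """One node of a toy taxonomy; keywords stand in for a trained classifier."""

    def __init__(self, name, keywords=(), children=()):
        self.name = name
        self.keywords = set(keywords)
        self.children = list(children)

    def score(self, tokens):
        # Fraction of description tokens that match this node's keywords.
        return len(self.keywords & tokens) / max(len(tokens), 1)


def classify(root, description):
    """Descend from the root, picking the best-scoring child at each level."""
    tokens = set(description.lower().split())
    path, node = [root.name], root
    while node.children:
        best = max(node.children, key=lambda child: child.score(tokens))
        if best.score(tokens) == 0:
            break  # no child matches; stop at the current level
        path.append(best.name)
        node = best
    return path


# Hypothetical three-level taxonomy used only for illustration.
taxonomy = ClassifierNode("Products", children=[
    ClassifierNode("Electronics", ["camera", "laptop", "lens", "tv"], [
        ClassifierNode("Cameras", ["camera", "lens", "zoom"]),
        ClassifierNode("Laptops", ["laptop", "ssd", "keyboard"]),
    ]),
    ClassifierNode("Books", ["novel", "paperback", "author"]),
])
```

Because each level has its own scorer, a real system could swap in a different feature set or model per depth, which is the kind of flexibility the abstract attributes to the recipes.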

Text-Mining and Pattern-Matching based Prediction Models for Detecting Vulnerable Files in Web Applications (pp028-044)
Mukesh Kumar Gupta, Mahesh Chandra Govil, and Girdhari Singh
The proliferation of technology has empowered web applications. At the same time, the presence of Cross-Site Scripting (XSS) vulnerabilities in web applications has become a major concern for all. Despite the many current detection and prevention approaches, attackers continue to exploit XSS vulnerabilities and cause significant harm to web users. In this paper, we formulate the detection of XSS vulnerabilities as a classification problem solved with prediction models. A novel approach based on text-mining and pattern-matching techniques is proposed to extract a set of features from source code files. The extracted features are used to build prediction models, which can discriminate vulnerable code files from benign ones. The efficiency of the developed models is evaluated on a publicly available dataset that contains 9408 labeled (i.e., safe or unsafe) PHP source code files. The experimental results demonstrate the superiority of the proposed approach over existing ones.
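
As a rough illustration of pattern-matching feature extraction, the sketch below counts three groups of tokens in a PHP file: user-controlled inputs, output statements, and common escaping functions. The pattern groups and the names `PATTERNS` and `extract_features` are assumptions made for this example; the abstract does not list the features the authors actually use.

```python
import re

# Hypothetical pattern groups; the paper's real feature set is not given here.
PATTERNS = {
    "user_input": re.compile(r"\$_(?:GET|POST|REQUEST|COOKIE)\b"),
    "output_sink": re.compile(r"\b(?:echo|print)\b"),
    "sanitizer": re.compile(r"\b(?:htmlspecialchars|htmlentities|strip_tags)\s*\("),
}


def extract_features(php_source):
    """Count how often each pattern group occurs in one PHP source file."""
    return {name: len(pattern.findall(php_source))
            for name, pattern in PATTERNS.items()}


unsafe_file = '<?php echo $_GET["name"]; ?>'
safe_file = '<?php echo htmlspecialchars($_POST["q"]); print "ok"; ?>'
```

Feature dictionaries like these could then be turned into vectors and passed, together with the safe/unsafe labels, to any standard classifier.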

A Quantitative Analysis of the Use of Microdata for Semantic Annotations on Educational Resources (pp045-072)
Rosa Del Carmen Navarrete Rueda and Sergio Lujan
A current trend in the semantic web is the use of embedded markup formats that aim to semantically enrich web content by making it more understandable to search engines and other applications. The deployment of Microdata as a markup format has increased thanks to the widespread adoption of the controlled vocabulary provided by Schema.org. Recently, a set of properties from the Learning Resource Metadata Initiative (LRMI) specification, which describes educational resources, was adopted by Schema.org. These properties, in addition to those related to accessibility and the license of resources included in Schema.org, would enable search engines to provide more relevant results when searching for educational resources for all users, including users with disabilities. In order to obtain a reliable evaluation of the use of Microdata properties related to the LRMI specification, accessibility, and the license of resources, this research conducted a quantitative analysis of the deployment of these properties in large-scale web corpora covering two consecutive years. The corpora contain hundreds of millions of web pages. The results further our understanding of this deployment and highlight the pending issues and challenges concerning the use of such properties.
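
To make the kind of measurement concrete, the following sketch counts Microdata `itemprop` attributes that name LRMI-derived Schema.org properties in a single HTML page. It is only an illustration: the property subset in `LRMI_PROPERTIES`, the class `ItempropCounter`, and the function `count_lrmi_properties` are chosen for this example, and the study's own tooling and property list may differ.

```python
from collections import Counter
from html.parser import HTMLParser

# A few Schema.org properties that originated in LRMI (illustrative subset).
LRMI_PROPERTIES = {"educationalUse", "learningResourceType",
                   "typicalAgeRange", "timeRequired"}


class ItempropCounter(HTMLParser):
    """Collect counts of LRMI-related itemprop names while parsing HTML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "itemprop" and value:
                # One itemprop attribute may hold several space-separated names.
                self.counts.update(v for v in value.split()
                                   if v in LRMI_PROPERTIES)


def count_lrmi_properties(html):
    parser = ItempropCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


page = ('<div itemscope itemtype="http://schema.org/CreativeWork">'
        '<span itemprop="name">Fractions</span>'
        '<span itemprop="learningResourceType">lesson plan</span>'
        '<meta itemprop="typicalAgeRange" content="7-9"/></div>')
```

Summing such counters over every page of a crawl, once per year, would give the per-property deployment figures that a quantitative comparison needs.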

Semantic Emotion-Topic Model Based Social Emotion Mining (pp073-092)
Ruirong Xue, Xiangfeng Luo, Qichen Ma, and Shengwei Gu
With the rapid growth in the number of social media users, more and more short texts with emotion labels are appearing, and they contain users' rich emotions and opinions about social events or enterprise products. Social emotion mining on social media corpora can help governments and enterprises make decisions. Emotion mining models involve statistics-based and graph-based approaches. Among them, the former are more popular, e.g., the Latent Dirichlet Allocation (LDA)-based Emotion Topic Model. However, they suffer from low retrieval performance, such as poor accuracy and poor interpretability, because they consider only the bag of words or the emotion labels in the social media corpus. In this paper, we propose an LDA-based Semantic Emotion-Topic Model (SETM) that combines emotion labels and inter-word relations to enhance the retrieval performance of social emotion mining. The influence of four factors on the performance of SETM is considered, i.e., association relations, computing time, topic number, and semantic interpretability. Experimental results show that the accuracy of our proposed model is 0.750, compared with 0.606, 0.663 and 0.680 for the Emotion Topic Model (ETM), the Multi-label Supervised Topic Model (MSTM) and the Sentiment Latent Topic Model (SLTM), respectively. In addition, the computing time of our model is reduced by 87.81% by limiting word frequency, and its accuracy is 0.703, compared with 0.501, 0.648 and 0.642 for the above baseline methods. Thus, the proposed model has broad prospects in the area of social emotion mining.

Unsupervised Keyword Extraction from Microblog Posts via Hashtags (pp093-120)
Lin Li, Jinghang Liu, Yueqing Sun, Guangdong Xu, Jingling Yuan and Luo Zhong
Nowadays, huge amounts of text are being generated for social networking purposes on the Web. Keyword extraction from such texts, for example microblog posts, benefits many applications such as advertising, search, and content filtering. Unlike traditional web pages, a microblog post usually has special social features such as hashtags, which are topical in nature and generated by users. Extracting keywords related to hashtags can reflect the intents of users and thus provide a better understanding of post content. In this paper, we propose a novel unsupervised keyword extraction approach for microblog posts that treats hashtags as topical indicators. Our approach consists of two hashtag-enhanced algorithms. One is a topic model algorithm that infers topic distributions biased toward hashtags on a collection of microblog posts. The words are ranked by their average topic probabilities. Our topic model algorithm can not only find the topics of a collection, but also extract hashtag-related keywords. The other is a random-walk-based algorithm. It first builds a weighted word-post graph by taking the posts themselves into account. Then, a hashtag-biased random walk is applied on this graph, which guides the algorithm to extract keywords according to hashtag topics. Finally, the ranking score of a word is determined by the stationary probability after a number of iterations. We evaluate our proposed approach on a collection of real Chinese microblog posts. Experiments show that our approach is more effective in terms of precision than traditional approaches that do not consider hashtags. The combination of the two algorithms performs even better than either algorithm alone.
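
The random-walk idea can be sketched as a random walk with restart on a bipartite word-post graph, where the restart distribution is concentrated on hashtag nodes. This is a simplified reading of the abstract, not the authors' algorithm: the edge weights (term counts), the damping value, and the function name `hashtag_biased_scores` are assumptions made for the example.

```python
from collections import Counter


def hashtag_biased_scores(posts, damping=0.85, iterations=50):
    """Score words by a random walk with restart biased toward hashtags.

    posts: list of (tokens, hashtags) pairs, each a list of strings.
    Returns a dict mapping every word (hashtags included) to its score.
    """
    # Weighted bipartite graph: post nodes <-> word nodes, weight = term count.
    graph = {}
    for i, (tokens, hashtags) in enumerate(posts):
        post = ("post", i)
        for word, count in Counter(tokens + hashtags).items():
            graph.setdefault(post, {})[("word", word)] = count
            graph.setdefault(("word", word), {})[post] = count

    # Restart distribution: uniform over hashtag nodes, or over all nodes
    # when the collection has no hashtags at all.
    tags = {("word", h) for _, hashtags in posts for h in hashtags}
    targets = tags if tags else set(graph)
    restart = {n: (1.0 / len(targets) if n in targets else 0.0) for n in graph}

    # Power iteration toward the stationary distribution.
    score = {n: 1.0 / len(graph) for n in graph}
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in graph}
        for node, neighbours in graph.items():
            total = sum(neighbours.values())
            for neighbour, weight in neighbours.items():
                nxt[neighbour] += damping * score[node] * weight / total
        score = nxt

    return {n[1]: s for n, s in score.items() if n[0] == "word"}


posts = [
    (["storm", "flood"], ["weather"]),
    (["storm", "rain"], ["weather"]),
    (["pizza", "lunch"], []),  # no hashtag: unreachable from any restart node
]
scores = hashtag_biased_scores(posts)
```

In this toy collection, words that co-occur with the hashtag keep a noticeable score, while words in the untagged post fade toward zero because no restart mass ever reaches them.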

A Graph Based Technique of Process Partitioning (pp121-140)
Gang Xue, Jing Liu, Liwen Wu, and Shaowen Yao
Web services are an important technology for constructing distributed applications. In order to provide more complex functionality, services can be reused by applying service composition. A service composition can be designed and implemented through a centralized or decentralized strategy. When observing decentralized service composition, several researchers have found that this kind of composition has its own advantages. These findings promote the development of approaches for designing, implementing and applying decentralized service compositions. Process partitioning concerns dividing a process into a collection of smaller parts. The technique is applicable to partitioning a process in a centralized service composition, and the result can support the construction of a decentralized service composition. This paper presents a technique for process partitioning. The technique can be used for constructing decentralized service compositions, and it provides a graph-transformation-based approach to reorganizing a process that is represented as a process structure graph. Compared to existing approaches, the technique can partition both well-structured and unstructured processes. Some issues concerning decentralized service compositions and performance tests of service compositions are also discussed in this paper. Experimental results show that, compared with the centralized service composition, the decentralized service composition can have a lower average response time and higher throughput in a runtime environment.
