A Review of Data Cleaning Methods for Web Information System
Web information systems (WIS) are widely used and indispensable in daily social life, providing information services in many scenarios such as electronic commerce, communities, and edutainment. Data cleaning plays an essential role in improving the quality of data services across these WIS scenarios. In this paper, we present a survey of state-of-the-art methods for data cleaning in WIS. Based on the characteristics of data cleaning, we extract the key elements of WIS, such as interactive object, application scenario, and core technology, and use them to classify existing work. Then, after elaborating on and analyzing each category, we summarize the descriptions and challenges of data cleaning methods in terms of key sub-elements such as data and user interaction, data quality rules, models, crowdsourcing, and privacy preservation. Finally, we analyze various types of problems and provide suggestions for future research on data cleaning in WIS from the technological and interactive perspectives.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Articles published by TSP are under an Open Access license, which means that all articles published by TSP are accessible online to everyone free of charge and free of technical and legal barriers. Published materials can be re-used if properly acknowledged and cited. Open Access publication is supported by the authors' institutes or research funding agencies through payment of a comparatively low Article Processing Charge (APC) for accepted articles.