A Review of Data Cleaning Methods for Web Information System

  • Jinlin Wang Harbin Institute of Technology
  • Xing Wang Harbin Institute of Technology
  • Yuchen Yang Harbin Institute of Technology
  • Hongli Zhang Harbin Institute of Technology
  • Binxing Fang Harbin Institute of Technology
Keywords: data cleaning, web information system, data quality rule, crowdsourcing, privacy preservation


Web information systems (WIS) are widely used and indispensable in daily life, providing information services in scenarios such as electronic commerce, communities, and edutainment. Data cleaning plays an essential role in improving the quality of data services across these WIS scenarios. In this paper, we present a survey of state-of-the-art methods for data cleaning in WIS. Based on the characteristics of data cleaning, we extract key elements of WIS, such as interactive object, application scenario, and core technology, and use them to classify existing work. After elaborating on and analyzing each category, we summarize the descriptions and challenges of data cleaning methods in terms of key sub-elements such as data and user interaction, data quality rules, models, crowdsourcing, and privacy preservation. Finally, we analyze various types of problems and provide suggestions for future research on data cleaning in WIS from the technology and interaction perspectives.

Articles on Computers