Data Mining: Concepts and Techniques, Chapter 2 (2nd Edition, Han and Kamber)

[Note: Materials of this presentation are from Chapter 2, 2nd Edition of the textbook, unless mentioned otherwise.]

Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber, All rights reserved

February 19, 2008 Data Mining: Concepts and Techniques

Chapter 2: Data Preprocessing
◼ Why preprocess the data?
◼ Descriptive data summarization (Ch. 2.1, 3rd Edition, textbook)
◼ Data cleaning
◼ Data integration and transformation
◼ Data reduction
◼ Discretization and concept hierarchy generation
◼ Summary

Why Data Preprocessing?
◼ Data in the real world is dirty
  ◼ incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
    ◼ e.g., occupation=“ ”
  ◼ noisy: containing errors or outliers
    ◼ e.g., Salary=“-10”
  ◼ inconsistent: containing discrepancies in codes or names
    ◼ e.g., Age=“42” but Birthday=“03/07/1997”
    ◼ e.g., ratings were once coded “1, 2, 3” and are now coded “A, B, C”
    ◼ e.g., discrepancies between duplicate records
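The three kinds of dirty data above can be made concrete with a small per-record check. The sketch below is illustrative only: the record layout, the field names (`occupation`, `salary`, `age`, `birthday`), the MM/DD/YYYY birthday format, and the reference date (taken here to be the slide date) are all assumptions. The helper names `age_on` and `find_problems` are hypothetical and do not come from the textbook. Salary is treated as a number rather than the string shown on the slide.

```python
from datetime import date, datetime

# Assumed reference date for age checks (hypothetical: the slide date).
REFERENCE_DATE = date(2008, 2, 19)


def age_on(birthday: date, today: date) -> int:
    """Whole years elapsed between birthday and today."""
    years = today.year - birthday.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def find_problems(record: dict) -> list:
    """Return labels for the kinds of dirty data found in one record."""
    problems = []

    # Incomplete: an attribute is present but blank (e.g., occupation=" ").
    for field, value in record.items():
        if isinstance(value, str) and not value.strip():
            problems.append(f"incomplete: {field} is blank")

    # Noisy: a value that cannot be right, such as a negative salary.
    salary = record.get("salary")
    if salary is not None and salary < 0:
        problems.append(f"noisy: salary={salary}")

    # Inconsistent: stated age disagrees with the age implied by the birthday.
    # Assumes birthdays are written as MM/DD/YYYY.
    if "age" in record and "birthday" in record:
        born = datetime.strptime(record["birthday"], "%m/%d/%Y").date()
        implied = age_on(born, REFERENCE_DATE)
        if record["age"] != implied:
            problems.append(
                f"inconsistent: age={record['age']} but birthday implies {implied}"
            )

    return problems


records = [
    {"name": "r1", "occupation": " ", "salary": 50000},
    {"name": "r2", "occupation": "clerk", "salary": -10},
    {"name": "r3", "occupation": "engineer", "age": 42, "birthday": "03/07/1997"},
]
for r in records:
    print(r["name"], find_problems(r))
# r1 ['incomplete: occupation is blank']
# r2 ['noisy: salary=-10']
# r3 ['inconsistent: age=42 but birthday implies 10']
```

Real cleaning would also need rules for the other inconsistency examples (changed rating codes, conflicting duplicate records), which usually require comparing records against each other rather than checking one record at a time.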