Source: people.cmix.louisiana.edu

Data Mining: Concepts and Techniques
Chapter 2
2nd Edition, Han and Kamber
(Note: Materials of this presentation are from Chapter 2, 2nd Edition of the textbook, unless mentioned otherwise)
Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber, All rights reserved
   February 19, 2008         Data Mining: Concepts and Techniques             1

Chapter 2: Data Preprocessing
◼ Why preprocess the data?
◼ Descriptive data summarization (Ch. 2.1, 3rd Edition, textbook)
◼ Data cleaning
◼ Data integration and transformation
◼ Data reduction
◼ Discretization and concept hierarchy generation
◼ Summary

Why Data Preprocessing?
◼ Data in the real world is dirty
  ◼ incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
    ◼ e.g., occupation=“ ”
  ◼ noisy: containing errors or outliers
    ◼ e.g., Salary=“-10”
  ◼ inconsistent: containing discrepancies in codes or names
    ◼ e.g., Age=“42” Birthday=“03/07/1997”
    ◼ e.g., Was rating “1,2,3”, now rating “A, B, C”
    ◼ e.g., discrepancy between duplicate records
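
The three kinds of dirty data listed above can be checked mechanically. What follows is a minimal Python sketch, not taken from the slides. The function names `age_on` and `find_quality_issues`, the record field names, the MM/DD/YYYY reading of the birthday string, and the reference date (borrowed from the slide date, February 19, 2008) are all assumptions made for illustration.

```python
from datetime import date, datetime


def age_on(birthday: date, as_of: date) -> int:
    """Whole years elapsed between birthday and as_of."""
    years = as_of.year - birthday.year
    # Subtract one if this year's birthday has not happened yet.
    if (as_of.month, as_of.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def find_quality_issues(record: dict, as_of: date) -> list[str]:
    """Return one message per data-quality problem found (hypothetical helper)."""
    issues = []

    # Incomplete: an attribute of interest is missing or blank.
    occupation = record.get("occupation")
    if occupation is None or not occupation.strip():
        issues.append("incomplete: occupation is missing")

    # Noisy: a value falls outside its plausible range.
    salary = record.get("salary")
    if salary is not None and salary < 0:
        issues.append("noisy: salary is negative")

    # Inconsistent: two attributes contradict each other.
    age = record.get("age")
    birthday_text = record.get("birthday")
    if age is not None and birthday_text:
        # Assumes MM/DD/YYYY; the slide does not state the date format.
        birthday = datetime.strptime(birthday_text, "%m/%d/%Y").date()
        if age_on(birthday, as_of) != age:
            issues.append("inconsistent: age does not match birthday")

    return issues


# The slide's own examples combined into a single record.
record = {"occupation": " ", "salary": -10, "age": 42, "birthday": "03/07/1997"}
issues = find_quality_issues(record, as_of=date(2008, 2, 19))
for issue in issues:
    print(issue)
```

Each check only flags a problem. Deciding how to repair it (filling in a missing value, smoothing noise, resolving the contradiction) belongs to the data cleaning step named in the chapter outline.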