File: Data Preparation For Machine Learning (PDF)
Source: www.cvs.edu.in | Posted on 30 Jan 2023

                    Data Preparation = Data Cleansing + Feature Engineering 
                    Data preparation is the heart of data science. It includes data cleansing and feature
                    engineering, and domain knowledge is also very important for achieving good results.
                    Data preparation cannot be fully automated, at least not in the beginning, and it often
                    takes 60 to 80 percent of the effort in the whole analytical pipeline. However, it is a
                    mandatory task if you want the best accuracy from machine learning algorithms on
                    your datasets.
                    Data Cleansing puts data into the right shape and quality for analysis. It includes many 
                    functions, for example the following: 
                  • Basics (select, filter, removal of duplicates, ...)
                  • Sampling (balanced, stratified, ...)
                  • Data Partitioning (create training + validation + test data set, ...)
                  • Transformations (normalisation, standardisation, scaling, pivoting, ...)
                  • Binning (count-based, handling of missing values as its own group, ...)
                  • Data Replacement (cutting, splitting, merging, ...)
                  • Weighting and Selection (attribute weighting, automatic optimization, ...)
                  • Attribute Generation (ID generation, ...)
                  • Imputation (replacement of missing observations by using statistical algorithms)
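                    A few of these functions can be sketched in plain Python. This is a minimal
                    illustration, not a reference implementation: the record layout, the column name
                    "age", the 60/20/20 split and the helper names are all assumptions made for the
                    example, and mean imputation stands in for the more general "statistical algorithms"
                    mentioned above.

```python
import random
from statistics import mean, pstdev


def drop_duplicates(rows):
    """Remove exact duplicate records while keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_mean(rows, column):
    """Replace missing (None) values in a numeric column with the column mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [{**row, column: fill if row[column] is None else row[column]}
            for row in rows]


def standardise(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def partition(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle rows and split them into training, validation and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Hypothetical sample data: one duplicate record and one missing age.
records = [
    {"id": 1, "age": 20},
    {"id": 2, "age": None},
    {"id": 2, "age": None},   # exact duplicate of the previous record
    {"id": 3, "age": 40},
    {"id": 4, "age": 30},
    {"id": 5, "age": 30},
]

clean = impute_mean(drop_duplicates(records), "age")
scaled = standardise([row["age"] for row in clean])
train_set, validation_set, test_set = partition(clean)
```

                    In a real project the imputation and scaling statistics would normally be computed on
                    the training set only and then reused for the validation and test sets, so that no
                    information leaks between the partitions.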
                    Feature Engineering selects the right attributes to analyze. You use domain knowledge 
                    of the data to select or create attributes that make machine learning algorithms work. 
                    The feature engineering process includes:
                  • Brainstorming or testing of features
                  • Feature selection
                  • Validation of how the features work with your model
                  • Improvement of features if needed
                  • Return to brainstorming / creation of more features until the work is done
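                    One round of this loop might look like the sketch below. It is only an illustration
                    under simplifying assumptions: the absolute Pearson correlation with the target is
                    used as a cheap stand-in for validating features against a real model, and the
                    candidate names, the data and the 0.5 threshold are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_features(candidates, target, threshold=0.5):
    """Score each candidate and keep those at or above the threshold."""
    scores = {name: abs(pearson(values, target))
              for name, values in candidates.items()}
    selected = [name for name, score in scores.items() if score >= threshold]
    return selected, scores


# Brainstormed candidates (hypothetical data).
target = [10.0, 20.0, 30.0, 40.0, 50.0]
candidates = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],       # linear in the target
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],   # never varies
    "noise": [3.0, 1.0, 2.0, 1.0, 3.0],      # uncorrelated with the target
}

# Select and validate; rejected candidates go back to brainstorming.
selected, scores = select_features(candidates, target)
print(selected)  # ['size']
```

                    A correlation filter only detects linear relationships, so in practice the validation
                    step would usually retrain the model and compare its accuracy on held-out data.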
                    Note that feature engineering is already part of the modelling step of building an
                    analytic model, but it also relies on data preparation functions (such as extracting
                    parts of a string).
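                    Extracting parts of a string to create new attributes can be sketched as follows. The
                    field names "email" and "ordered_at", the timestamp format and the derived
                    attributes are hypothetical and chosen only to illustrate the idea.

```python
from datetime import datetime


def extract_features(record):
    """Derive new attributes from the raw string fields of one record."""
    timestamp = datetime.strptime(record["ordered_at"], "%Y-%m-%d %H:%M")
    _, _, domain = record["email"].partition("@")
    return {
        "order_hour": timestamp.hour,
        "order_weekday": timestamp.weekday(),    # Monday is 0, Sunday is 6
        "is_weekend": timestamp.weekday() >= 5,
        "email_domain": domain.lower(),
    }


raw = {"email": "Jane.Doe@Example.com", "ordered_at": "2023-01-28 14:05"}
features = extract_features(raw)
print(features["email_domain"])  # example.com
```

                    Whether attributes such as the hour of an order or the e-mail domain are useful is
                    exactly the kind of question that needs domain knowledge.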
                    Both data cleansing and feature engineering are part of data preparation and 
                    fundamental to the application of machine learning and deep learning. Both are also 
                    difficult and time-consuming. 
                    Data preparation occurs in different phases of an analytics project: 
                  • Data Preprocessing: Preparation of data directly after accessing it from a data source.
                    Typically carried out by a developer or data scientist for initial transformations,
                    aggregations and data cleansing. This step is done before the interactive analysis of
                    data begins. It is executed once.
                  • Data Wrangling: Preparation of data during the interactive data analysis and model
                    building. Typically done by a data scientist or business analyst to change views on a
                    dataset and for feature engineering. This step iteratively changes the shape of a
                    dataset until it works well for finding insights or building a good analytic model.
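                    The difference between the two phases can be illustrated with a small sketch: a
                    preprocessing function that runs once after data access, followed by a pivot helper
                    that is called repeatedly to look at the same data from different angles. The
                    sales records and the function names are assumptions made for this example.

```python
from collections import defaultdict


def preprocess(raw_rows):
    """One-off preprocessing: drop incomplete rows and normalise types."""
    return [
        {"region": r["region"].strip().title(),
         "month": r["month"],
         "sales": float(r["sales"])}
        for r in raw_rows
        if r.get("sales") not in (None, "")
    ]


def pivot(rows, index, column, value):
    """Wrangling step: reshape long rows into a wide table, summing values."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[index]][row[column]] += row[value]
    return {key: dict(cells) for key, cells in table.items()}


# Hypothetical raw data as it might arrive from a data source.
raw_rows = [
    {"region": " north ", "month": "Jan", "sales": "100"},
    {"region": "north", "month": "Feb", "sales": "150"},
    {"region": "south", "month": "Jan", "sales": ""},       # missing, dropped
    {"region": "south", "month": "Feb", "sales": "80.5"},
    {"region": "North", "month": "Jan", "sales": "20"},
]

prepared = preprocess(raw_rows)                            # executed once
by_region = pivot(prepared, "region", "month", "sales")    # first view
by_month = pivot(prepared, "month", "region", "sales")     # second view
print(by_region["North"])  # {'Jan': 120.0, 'Feb': 150.0}
```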
                    The Need for Data Preprocessing and Data Wrangling 
                    Let’s take a look at the typical analytical pipeline when you build an analytic model: 
               1.  Data Access 
               2.  Data Preprocessing 
               3.  Exploratory Data Analysis (EDA) 
               4.  Model Building 
               5.  Model Validation 
               6.  Model Execution 
               7.  Deployment 
                    Step 2 focuses on data preprocessing before you build an analytic model, while data
                    wrangling is used in steps 3 and 4 to adjust data sets interactively while you analyze
                    data and build a model. Note that these three steps (2, 3 and 4) can include both
                    data cleansing and feature engineering.
                     