{"id":15816,"date":"2022-04-23T20:59:38","date_gmt":"2022-04-23T19:59:38","guid":{"rendered":"https:\/\/complex-systems-ai.com\/?page_id=15816"},"modified":"2022-04-23T21:26:01","modified_gmt":"2022-04-23T20:26:01","slug":"normaliser-standardiser-redimensionner-vos-donnees","status":"publish","type":"page","link":"https:\/\/complex-systems-ai.com\/en\/data-analysis\/normalize-standardize-resize-your-data\/","title":{"rendered":"Normalize Standardize Resize your Data"},"content":{"rendered":"<div data-elementor-type=\"wp-page\" data-elementor-id=\"15816\" class=\"elementor elementor-15816\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-ce3ccf6 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"ce3ccf6\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-c4fdaf2\" data-id=\"c4fdaf2\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-21ca9fe elementor-align-justify elementor-widget elementor-widget-button\" data-id=\"21ca9fe\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Data 
analysis<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-4516e69\" data-id=\"4516e69\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-c473152 elementor-align-justify elementor-widget elementor-widget-button\" data-id=\"c473152\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/complex-systems-ai.com\/en\/\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Home page<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-2804615\" data-id=\"2804615\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-bee4801 elementor-align-justify elementor-widget elementor-widget-button\" data-id=\"bee4801\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/en.wikipedia.org\/wiki\/Data_analysis\" target=\"_blank\" rel=\"noopener\">\n\t\t\t\t\t\t<span 
class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Wiki<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-4abebeb elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"4abebeb\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-8e7316d\" data-id=\"8e7316d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5ccc0d3 elementor-widget elementor-widget-text-editor\" data-id=\"5ccc0d3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In order to be able to analyze your data and carry out any preprocessing or reduction processing, it is very important to properly normalize, standardize and resize your data. 
Here are the tutorials.<\/p><p><img decoding=\"async\" class=\"aligncenter wp-image-11096 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2020\/09\/cropped-Capture.png\" alt=\"normalize standardize resize your data\" width=\"97\" height=\"97\" title=\"\"><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-1b65b9f elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"1b65b9f\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-05d23e7\" data-id=\"05d23e7\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-324d4e5 elementor-widget elementor-widget-heading\" data-id=\"324d4e5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" 
height=\"20px\" viewbox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewbox=\"0 0 24 24\" version=\"1.2\" baseprofile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/normalize-standardize-resize-your-data\/#Tutoriel-sur-normaliser-standardiser-et-redimensionner-vos-donnees\" >Tutorial on normalizing, standardizing and resizing your data<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/normalize-standardize-resize-your-data\/#Pourquoi-le-faire\" >Why do it?<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/normalize-standardize-resize-your-data\/#Standardisation\" >Standardization:<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/normalize-standardize-resize-your-data\/#Normalisation\" >Normalization:<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" 
href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/normalize-standardize-resize-your-data\/#Quand-le-faire\" >When to do it?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/normalize-standardize-resize-your-data\/#Standardisation-2\" >Standardization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/normalize-standardize-resize-your-data\/#Normalisation-2\" >Normalization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/normalize-standardize-resize-your-data\/#Mise-a-lechelle-robuste\" >Robust scaling<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Tutoriel-sur-normaliser-standardiser-et-redimensionner-vos-donnees\"><\/span>Tutorial on normalizing, standardizing and resizing your data<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-31a25eb elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"31a25eb\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-418e8b5\" data-id=\"418e8b5\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-c9bcf37 elementor-widget 
elementor-widget-text-editor\" data-id=\"c9bcf37\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Before we dive into this topic, let&#039;s start with some definitions.<\/p><p>\u201cRescaling\u201d a vector means adding or subtracting a constant, then multiplying or dividing by a constant, just as you would when changing the units of measurement, for example to convert a temperature from Celsius to Fahrenheit.<\/p><p>\u201cNormalizing\u201d a vector most often means dividing by a norm of the vector. It also often refers to rescaling by the minimum and range of the vector so that all elements fall between 0 and 1, which brings all the numeric columns of the dataset to a common scale.<\/p><p>\u201cStandardizing\u201d a vector most often means subtracting a measure of location and dividing by a measure of scale. For example, if the vector contains random values with a Gaussian distribution, you can subtract the mean and divide by the standard deviation, thus obtaining a &quot;standard normal&quot; random variable with a mean of 0 and a standard deviation of 1.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-1b9bc2e elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"1b9bc2e\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-084cd89\" data-id=\"084cd89\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element 
elementor-element-68d224f elementor-widget elementor-widget-heading\" data-id=\"68d224f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Pourquoi-le-faire\"><\/span>Why do it?<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-a5af20b elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"a5af20b\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-109a285\" data-id=\"109a285\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-bb45941 elementor-widget elementor-widget-text-editor\" data-id=\"bb45941\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<h3><span class=\"ez-toc-section\" id=\"Standardisation\"><\/span>Standardization:<span class=\"ez-toc-section-end\"><\/span><\/h3><p>Standardizing features so that they are centered at 0 with a standard deviation of 1 is important when comparing measurements that have different units. Variables measured on different scales do not contribute equally to the analysis and can end up creating a bias.<\/p><p>For example, a variable between 0 and 1000 will outweigh a variable between 0 and 1. 
Using these variables without standardization effectively gives the variable with the wider range a weight about 1000 times larger in the analysis. Transforming the data to comparable scales avoids this problem. Typical data normalization procedures equalize the range and\/or variability of the data.<\/p><h3><span class=\"ez-toc-section\" id=\"Normalisation\"><\/span>Normalization:<span class=\"ez-toc-section-end\"><\/span><\/h3><p>Similarly, the purpose of normalization is to bring the values of the numeric columns of the dataset to a common scale without distorting the differences in the ranges of values. In machine learning, not every dataset requires normalization; it is needed only when features have different ranges.<\/p><p>For example, consider a dataset containing two features, age (x1) and income (x2), where age ranges from 0 to 100 while income ranges from 0 to 100,000 and above. Income is about 1,000 times larger than age, so the two features are in very different ranges. In a further analysis such as multivariate linear <a href=\"https:\/\/complex-systems-ai.com\/en\/correlation-and-regressions\/data-transformation-and-regression\/\">regression<\/a>, income will inherently influence the outcome more because of its larger values. But that doesn&#039;t necessarily mean it&#039;s more important as a predictor. 
So we normalize the data to bring all the variables into the same range.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-551407a elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"551407a\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-9c5dda4\" data-id=\"9c5dda4\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5006b93 elementor-widget elementor-widget-heading\" data-id=\"5006b93\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Quand-le-faire\"><\/span>When to do it?<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-ce3a55d elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"ce3a55d\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3473571\" data-id=\"3473571\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap 
elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d8490c4 elementor-widget elementor-widget-text-editor\" data-id=\"d8490c4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Normalization is a good technique to use when you don&#039;t know the distribution of your data or when you know the distribution is not Gaussian (a bell curve). Normalization is useful when your data has varying scales and the algorithm you are using does not make assumptions about the distribution of your data, such as k-nearest neighbors and artificial <a href=\"https:\/\/complex-systems-ai.com\/en\/neural-algorithms-2\/perceptron-en\/\">neural networks<\/a>.<\/p><p>Standardization assumes that your data has a Gaussian distribution (bell curve). This doesn&#039;t have to be strictly true, but the technique is more effective if your attribute distribution is Gaussian. 
Standardization is useful when your data has varying scales and the algorithm you are using makes assumptions about your data having a Gaussian distribution, such as linear regression, logistic regression, and linear discriminant analysis.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-8e53005 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"8e53005\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-0d1c71f\" data-id=\"0d1c71f\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-160c2b2 elementor-widget elementor-widget-heading\" data-id=\"160c2b2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Standardisation-2\"><\/span>Standardization<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-4474026 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"4474026\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column 
elementor-element elementor-element-c445d3c\" data-id=\"c445d3c\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-3d5ed97 elementor-widget elementor-widget-text-editor\" data-id=\"3d5ed97\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>As we saw earlier, standardization (or Z-score normalization) means centering the variable at zero and scaling its variance to 1. The procedure involves subtracting the mean from each observation and then dividing by the standard deviation.<\/p><p>The result of standardization is that the features are rescaled so that they have the mean and standard deviation of a standard normal distribution, with<\/p><pre>\u03bc=0 and \u03c3=1<\/pre><p>where \u03bc is the mean and \u03c3 is the standard deviation.<\/p><p>StandardScaler from scikit-learn removes the mean and scales the data to unit variance. 
We can import the StandardScaler class from scikit-learn and apply it to our dataset.<\/p><pre>from sklearn.preprocessing import StandardScaler\n\nscaler = StandardScaler()\n# Fit on the data, then center each column and scale it to unit variance\ndata_scaled = scaler.fit_transform(data)<\/pre><p>Now let&#039;s check the mean and standard deviation values.<\/p><pre>print(data_scaled.mean(axis=0))\nprint(data_scaled.std(axis=0))<\/pre><figure><img decoding=\"async\" class=\"aligncenter wp-image-15822 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_BghCKrhOG5LI4dvpv8LPBw.png\" alt=\"\" width=\"403\" height=\"45\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_BghCKrhOG5LI4dvpv8LPBw.png 403w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_BghCKrhOG5LI4dvpv8LPBw-300x33.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_BghCKrhOG5LI4dvpv8LPBw-18x2.png 18w\" sizes=\"(max-width: 403px) 100vw, 403px\" \/><\/figure><p>As expected, the mean of each variable is now approximately zero and the standard deviation is 1. 
So all the variables are now on a comparable scale.<\/p><pre>print(&#039;Min values (Loan Amount, Int rate and Installment): &#039;, data_scaled.min(axis=0))\nprint(&#039;Max values (Loan Amount, Int rate and Installment): &#039;, data_scaled.max(axis=0))<\/pre><figure><img decoding=\"async\" class=\"aligncenter wp-image-15823 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_CsnAGwpCvV74nAM1hoTyLw.png\" alt=\"\" width=\"714\" height=\"41\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_CsnAGwpCvV74nAM1hoTyLw.png 714w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_CsnAGwpCvV74nAM1hoTyLw-300x17.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_CsnAGwpCvV74nAM1hoTyLw-18x1.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_CsnAGwpCvV74nAM1hoTyLw-700x41.png 700w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_CsnAGwpCvV74nAM1hoTyLw-600x34.png 600w\" sizes=\"(max-width: 714px) 100vw, 714px\" \/><\/figure><p>However, the minimum and maximum values still differ from one variable to another: they depend on the initial spread of each variable and are strongly influenced by the presence of outliers.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-91ff185 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"91ff185\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-976275a\" data-id=\"976275a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div 
class=\"elementor-element elementor-element-e486c3c elementor-widget elementor-widget-heading\" data-id=\"e486c3c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Normalisation-2\"><\/span>Normalization<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-676d42f elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"676d42f\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a0a81e7\" data-id=\"a0a81e7\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ee13b05 elementor-widget elementor-widget-text-editor\" data-id=\"ee13b05\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In this approach, the data is scaled to a fixed range, typically 0 to 1.<\/p><p>Unlike standardization, the cost of having this bounded range is that we will end up with smaller standard deviations, which can suppress the effect of outliers. 
Thus, MinMaxScaler is sensitive to outliers: a single extreme value determines the minimum or maximum and compresses all the other values into a narrow part of the range.<\/p><p>Min-Max scaling is usually done via the following equation:<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-15824\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Dl3P3Rrzto258X0Ales9Xw.png\" alt=\"\" width=\"253\" height=\"119\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Dl3P3Rrzto258X0Ales9Xw.png 253w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Dl3P3Rrzto258X0Ales9Xw-18x8.png 18w\" sizes=\"(max-width: 253px) 100vw, 253px\" \/><\/p><p>Let&#039;s import MinMaxScaler from scikit-learn and apply it to our dataset.<\/p><pre>from sklearn.preprocessing import MinMaxScaler\n\nscaler = MinMaxScaler()\n# Fit on the data, then rescale each column to the default [0, 1] range\ndata_scaled = scaler.fit_transform(data)<\/pre><p>Now let&#039;s check the mean and standard deviation values.<\/p><pre>print(&#039;means (Loan Amount, Int rate and Installment): &#039;, data_scaled.mean(axis=0))\nprint(&#039;std (Loan Amount, Int rate and Installment): &#039;, data_scaled.std(axis=0))<\/pre><figure><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15825 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__mpW5DJn4PV-jQF20TmvzQ.png\" alt=\"\" width=\"658\" height=\"47\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__mpW5DJn4PV-jQF20TmvzQ.png 658w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__mpW5DJn4PV-jQF20TmvzQ-300x21.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__mpW5DJn4PV-jQF20TmvzQ-18x1.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__mpW5DJn4PV-jQF20TmvzQ-600x43.png 600w\" sizes=\"(max-width: 658px) 100vw, 658px\" \/><\/figure><p>After min-max scaling, the distributions are not centered on zero and the standard deviation is not 1.<\/p><pre>print(&#039;Min (Loan Amount, Int rate and Installment): &#039;, 
data_scaled.min(axis=0))\nprint(&#039;Max (Loan Amount, Int rate and Installment): &#039;, data_scaled.max(axis=0))<\/pre><figure><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15826 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Tl79_PxbpPvE-w1FXZX86g.png\" alt=\"\" width=\"518\" height=\"48\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Tl79_PxbpPvE-w1FXZX86g.png 518w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Tl79_PxbpPvE-w1FXZX86g-300x28.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Tl79_PxbpPvE-w1FXZX86g-18x2.png 18w\" sizes=\"(max-width: 518px) 100vw, 518px\" \/><\/figure><p>But the minimum and maximum values are now the same for every variable (0 and 1), unlike what happens with standardization.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-7363e3c elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"7363e3c\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d16a83e\" data-id=\"d16a83e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-0796c14 elementor-widget elementor-widget-heading\" data-id=\"0796c14\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" 
id=\"Mise-a-lechelle-robuste\"><\/span>Robust scaling<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-97285c6 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"97285c6\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-5dc402d\" data-id=\"5dc402d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5378053 elementor-widget elementor-widget-text-editor\" data-id=\"5378053\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Scaling using the median and quantiles consists of subtracting the median from all observations and then dividing by the interquartile range (IQR). 
It scales features using statistics that are resistant to outliers.<\/p><p>The interquartile difference is the difference between the 75th and the 25th quantile:<\/p><pre>IQR = 75th quantile \u2014 25th quantile<\/pre><p>The equation to calculate the scaled values:<\/p><pre>X_scaled = (X \u2014 X.median) \/ IQR<\/pre><p>First, import RobustScalar from Scikit learn.<\/p><pre>from sklearn.preprocessing import RobustScaler scaler = RobustScaler() data_scaled = scaler.fit_transform(data)<\/pre><p>Now check the mean and standard deviation values.<\/p><pre>print(&#039;means (Loan Amount, Int rate and Installment): &#039;, data_scaled.mean(axis=0)) print(&#039;std (Loan Amount, Int rate and Installment): &#039;, data_scaled.std(axis=0))<\/pre><figure><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15827 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_z11apSE2Nns7OX0Cp2Cq_A.png\" alt=\"\" width=\"668\" height=\"47\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_z11apSE2Nns7OX0Cp2Cq_A.png 668w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_z11apSE2Nns7OX0Cp2Cq_A-300x21.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_z11apSE2Nns7OX0Cp2Cq_A-18x1.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_z11apSE2Nns7OX0Cp2Cq_A-600x42.png 600w\" sizes=\"(max-width: 668px) 100vw, 668px\" \/><\/figure><p>As you can see, the distributions are not centered on zero and the standard deviation is not 1.<\/p><pre>print(&#039;Min (Loan Amount, Int rate and Installment): &#039;, data_scaled.min(axis=0)) print(&#039;Max (Loan Amount, Int rate and Installment): &#039;, data_scaled.max(axis=0))<\/pre><figure><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15828 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_YqdzBL92Ww8foaaSTry1Ow.png\" alt=\"\" width=\"677\" 
height=\"40\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_YqdzBL92Ww8foaaSTry1Ow.png 677w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_YqdzBL92Ww8foaaSTry1Ow-300x18.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_YqdzBL92Ww8foaaSTry1Ow-18x1.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_YqdzBL92Ww8foaaSTry1Ow-600x35.png 600w\" sizes=\"(max-width: 677px) 100vw, 677px\" \/><\/figure><p>The minimum and maximum values are likewise not confined to fixed lower and upper bounds, as they are with MinMaxScaler.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>","protected":false},"excerpt":{"rendered":"<p>Data analysis Wiki home page In order to be able to analyze your data and perform any pre-processing or reduction processing, it is very \u2026 <\/p>","protected":false},"author":1,"featured_media":0,"parent":15503,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-15816","page","type-page","status-publish","hentry"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages\/15816","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/comments?post=15816"}],"version-history":[{"count":3,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages\/15816\/revisions"}],"predecessor-version":[{"id":15831,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages\/15816\/revisions\/15831"}],"up":[{"embeddable":true,"href":"h
ttps:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages\/15503"}],"wp:attachment":[{"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/media?parent=15816"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}