{"id":20712,"date":"2024-02-16T05:29:43","date_gmt":"2024-02-16T04:29:43","guid":{"rendered":"https:\/\/complex-systems-ai.com\/?page_id=20712"},"modified":"2024-02-16T06:04:30","modified_gmt":"2024-02-16T05:04:30","slug":"gini-entropie-et-erreur","status":"publish","type":"page","link":"https:\/\/complex-systems-ai.com\/en\/data-analysis\/gini-entropy-and-error\/","title":{"rendered":"3 Measurements: Gini impurity, entropy and classification error"},"content":{"rendered":"<div data-elementor-type=\"wp-page\" data-elementor-id=\"20712\" class=\"elementor elementor-20712\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-19fdc32 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"19fdc32\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-b2d49fc\" data-id=\"b2d49fc\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-1ae617f elementor-align-justify elementor-widget elementor-widget-button\" data-id=\"1ae617f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Data analysis<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div 
class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-f96476a\" data-id=\"f96476a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-7d1af2c elementor-align-justify elementor-widget elementor-widget-button\" data-id=\"7d1af2c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/complex-systems-ai.com\/en\/\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Home page<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-6ec35d4\" data-id=\"6ec35d4\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-fa64005 elementor-align-justify elementor-widget elementor-widget-button\" data-id=\"fa64005\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/www.ibm.com\/docs\/fr\/cognos-analytics\/11.1.0?topic=terms-gini-impurity-measure\" target=\"_blank\" rel=\"noopener\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span 
class=\"elementor-button-text\">Wiki<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-df700ef elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"df700ef\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a986a05\" data-id=\"a986a05\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-4ad66dc elementor-widget elementor-widget-text-editor\" data-id=\"4ad66dc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Impurity measures are used to build binary decision trees. The three impurity measures, or splitting criteria, commonly used in binary decision trees are Gini impurity (IG), entropy (IH) and classification error (IE).<\/p><p><img decoding=\"async\" class=\"aligncenter wp-image-11096 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2020\/09\/cropped-Capture.png\" alt=\"gini\" width=\"97\" height=\"97\" title=\"\"><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-091eb7d elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"091eb7d\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div 
class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-cc8fab7\" data-id=\"cc8fab7\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-86f82fb elementor-widget elementor-widget-heading\" data-id=\"86f82fb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Contents<\/p>\n<\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/gini-entropy-and-error\/#Impurete-de-Gini\" >Gini impurity<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/gini-entropy-and-error\/#Impurete-de-Gini-en-donnees-quantitatives\" >Gini impurity in quantitative data<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/gini-entropy-and-error\/#Impurete-de-Gini-en-donnees-qualitative\" >Gini impurity in qualitative data<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/gini-entropy-and-error\/#Entropie\" >Entropy<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/gini-entropy-and-error\/#Erreur-de-classification-Misclassification-impurity\" >Classification error \/ Misclassification impurity<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Impurete-de-Gini\"><\/span>Gini impurity<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-8755ab9 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"8755ab9\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div 
class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-b89255e\" data-id=\"b89255e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a99afd8 elementor-widget elementor-widget-text-editor\" data-id=\"a99afd8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Used by the CART algorithm (classification and <a href=\"https:\/\/complex-systems-ai.com\/en\/correlation-and-regressions\/\">regression<\/a> <a href=\"https:\/\/complex-systems-ai.com\/en\/graph-theory-2\/trees-and-trees\/\">trees<\/a>) to build classification trees, the Gini impurity measures how often a randomly chosen item of the set would be mislabeled if it were labeled at random according to the distribution of labels in the subset.<\/p><p>Mathematically, we can write the Gini impurity as follows:<\/p><p><img decoding=\"async\" class=\"alignnone size-medium wp-image-20717\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre6-300x70.webp\" alt=\"Gini impurity\" width=\"300\" height=\"70\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre6-300x70.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre6-18x4.webp 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre6.webp 354w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/p><p>where j ranges over the classes present in the node and p is the class distribution in the node (the proportion of items of each class).<\/p><p>As a simple example, consider a heart disease dataset consisting of 303 rows and 13 attributes. 
The target takes the value 0 in 138 rows and the value 1 in 165 rows.<\/p><p><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone wp-image-20718 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre7.webp\" alt=\"gini impurity\" width=\"505\" height=\"117\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre7.webp 505w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre7-300x70.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre7-18x4.webp 18w\" sizes=\"(max-width: 505px) 100vw, 505px\" \/><\/p><p>To build a decision tree from the dataset and determine which split is best, we need a way to measure and compare the impurity of each attribute. The attribute with the lowest impurity in the first iteration becomes the root node. We can rewrite the Gini impurity in the form:<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20720 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre8.webp\" alt=\"gini impurity\" width=\"720\" height=\"55\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre8.webp 720w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre8-300x23.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre8-18x1.webp 18w\" sizes=\"(max-width: 720px) 100vw, 720px\" \/><\/p><p>In this example, we use only the sex, fbs (fasting blood sugar), exang (exercise-induced angina) and target attributes.<\/p><p>Measuring the impurity of the sex attribute:<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20721 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre9.webp\" alt=\"gini impurity\" width=\"523\" height=\"301\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre9.webp 523w, 
https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre9-300x173.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre9-18x10.webp 18w\" sizes=\"(max-width: 523px) 100vw, 523px\" \/><\/p><ul><li>Left node = 0.29<\/li><li>Right node = 0.49<\/li><\/ul><p>Now that we have measured the impurity of both leaf nodes, we can calculate the total impurity as their weighted average, each node being weighted by its share of the patients. The left node represents 138 patients and the right node 165 patients.<\/p><p>Total Gini impurity of the leaf nodes:<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20722 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre10.webp\" alt=\"gini impurity\" width=\"496\" height=\"252\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre10.webp 496w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre10-300x152.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre10-18x9.webp 18w\" sizes=\"(max-width: 496px) 100vw, 496px\" \/><\/p><p>We proceed in the same way with the other attributes:<\/p><ul><li>I_fbs_left = 0.268; I_fbs_right = 0.234; I_fbs = 0.249<\/li><li>I_exang_left = 0.596; I_exang_right = 0.234; I_exang = 0.399<\/li><\/ul><p>fbs (fasting blood sugar) has the lowest Gini impurity, so it is used at the root node.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-d420e28 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"d420e28\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-c970856\" 
data-id=\"c970856\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-06f5ecb elementor-widget elementor-widget-heading\" data-id=\"06f5ecb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Impurete-de-Gini-en-donnees-quantitatives\"><\/span>Gini impurity in quantitative data<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-5585556 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"5585556\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-ed01d91\" data-id=\"ed01d91\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-c071cd4 elementor-widget elementor-widget-text-editor\" data-id=\"c071cd4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Consider a quantitative attribute such as weight, one of the attributes used to determine heart disease:<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-20723\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre11-300x291.webp\" 
alt=\"Quantitative Gini\" width=\"300\" height=\"291\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre11-300x291.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre11-12x12.webp 12w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre11.webp 321w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/p><p>After sorting the values in ascending order, compute the average of each pair of consecutive values and calculate the impurity obtained when each average is used as a threshold.<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20724 size-large\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre12-1024x292.webp\" alt=\"quantitative gini\" width=\"1024\" height=\"292\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre12-1024x292.webp 1024w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre12-300x86.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre12-768x219.webp 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre12-18x5.webp 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre12.webp 1100w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/p><p>The lowest Gini impurity is obtained for weight &lt; 205: this is the threshold retained, and its impurity is the value used when comparing weight with other attributes.<\/p><p>To reduce the computational cost, the candidate thresholds can also be taken at quantiles.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-a5ebb91 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"a5ebb91\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column 
elementor-col-100 elementor-top-column elementor-element elementor-element-4e377d6\" data-id=\"4e377d6\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-f83c587 elementor-widget elementor-widget-heading\" data-id=\"f83c587\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Impurete-de-Gini-en-donnees-qualitative\"><\/span>Gini impurity in qualitative data<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-cd4c493 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"cd4c493\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3d3c0eb\" data-id=\"3d3c0eb\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-8ffcdbd elementor-widget elementor-widget-text-editor\" data-id=\"8ffcdbd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>We have a preferred color attribute for determining a person&#039;s gender:<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-20725\" 
src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre13-289x300.webp\" alt=\"qualitative gini\" width=\"289\" height=\"300\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre13-289x300.webp 289w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre13-12x12.webp 12w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre13.webp 300w\" sizes=\"(max-width: 289px) 100vw, 289px\" \/><\/p><p>To measure the impurity of this attribute, calculate an impurity score for each value (treated as a Boolean test) as well as for each possible combination of values.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-7592831 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"7592831\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-b681c38\" data-id=\"b681c38\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-957768c elementor-widget elementor-widget-heading\" data-id=\"957768c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Entropie\"><\/span>Entropy<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section 
elementor-element elementor-element-7eb9e4e elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"7eb9e4e\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-0d130dd\" data-id=\"0d130dd\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-fc49e52 elementor-widget elementor-widget-text-editor\" data-id=\"fc49e52\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Entropy is used by the ID3, C4.5 and C5.0 tree generation algorithms. Information gain is based on the notion of entropy, which is defined as:<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-20726\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre14-300x70.webp\" alt=\"entropy\" width=\"300\" height=\"70\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre14-300x70.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre14-18x4.webp 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre14.webp 315w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/p><p>where j ranges over the classes present in the node and p is the class distribution in the node (the proportion of items of each class).<\/p><p>Using the same dataset, we need a way to measure and compare the entropy of each attribute. 
The attribute with the highest entropy value in the first iteration will be the root node.<\/p><p>First, we calculate the entropy of the target attribute, which is 0.994.<\/p><p>We use the same split on sex:<\/p><ul><li>sex = 0 has entropy 0.666<\/li><li>sex = 1 has entropy 0.988<\/li><\/ul><p>Now that we have measured the entropy of the two leaf nodes, we take their weighted average to calculate the total entropy value.<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20727 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre15.webp\" alt=\"entropy decision tree\" width=\"720\" height=\"282\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre15.webp 720w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre15-300x118.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre15-18x7.webp 18w\" sizes=\"(max-width: 720px) 100vw, 720px\" \/><\/p><p>Entropy for column fbs = 0.389<\/p><p>Entropy for column exang = 0.224<\/p><p>fbs (fasting blood sugar) has the highest entropy, so we use it at the root node, which is exactly the result obtained with the Gini impurity.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-b8a58b8 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"b8a58b8\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d67829e\" data-id=\"d67829e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div 
class=\"elementor-element elementor-element-435e5a8 elementor-widget elementor-widget-heading\" data-id=\"435e5a8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Erreur-de-classification-Misclassification-impurity\"><\/span>Classification error \/ Misclassification impurity<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-6b89df7 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6b89df7\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4836bd6\" data-id=\"4836bd6\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5bafb57 elementor-widget elementor-widget-text-editor\" data-id=\"5bafb57\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Another measure of impurity is misclassification impurity or misclassification error. 
Mathematically, we can write the misclassification impurity as follows:<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-20728\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre16-300x65.webp\" alt=\"classification impurity\" width=\"300\" height=\"65\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre16-300x65.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre16-18x4.webp 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre16.webp 348w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/p><p>In terms of split quality, this index is not the best choice because it is not particularly sensitive to differences between class probability distributions, which is why the selection of a split is usually driven by Gini impurity or entropy instead.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>","protected":false},"excerpt":{"rendered":"<p>The (Gini) impurity measure implements binary decision trees and the three impurity measures or criteria of\u2026 
<\/p>","protected":false},"author":1,"featured_media":0,"parent":15503,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-20712","page","type-page","status-publish","hentry"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages\/20712","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/comments?post=20712"}],"version-history":[{"count":4,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages\/20712\/revisions"}],"predecessor-version":[{"id":20731,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages\/20712\/revisions\/20731"}],"up":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages\/15503"}],"wp:attachment":[{"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/media?parent=20712"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}