{"id":20701,"date":"2024-02-16T05:09:08","date_gmt":"2024-02-16T04:09:08","guid":{"rendered":"https:\/\/complex-systems-ai.com\/?page_id=20701"},"modified":"2024-02-16T05:28:55","modified_gmt":"2024-02-16T04:28:55","slug":"arbre-de-decision","status":"publish","type":"page","link":"https:\/\/complex-systems-ai.com\/en\/data-analysis\/decision-tree\/","title":{"rendered":"Decision Tree Tutorial"},"content":{"rendered":"<div data-elementor-type=\"wp-page\" data-elementor-id=\"20701\" class=\"elementor elementor-20701\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-5f00bf2 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"5f00bf2\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-cb16b87\" data-id=\"cb16b87\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-9b376d2 elementor-align-justify elementor-widget elementor-widget-button\" data-id=\"9b376d2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Data analysis<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-33 
elementor-top-column elementor-element elementor-element-9598e7a\" data-id=\"9598e7a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-749c66b elementor-align-justify elementor-widget elementor-widget-button\" data-id=\"749c66b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/complex-systems-ai.com\/en\/\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Home page<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-6f01196\" data-id=\"6f01196\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5a73d50 elementor-align-justify elementor-widget elementor-widget-button\" data-id=\"5a73d50\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/fr.wikipedia.org\/wiki\/Arbre_de_d%C3%A9cision\" target=\"_blank\" rel=\"noopener\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span 
class=\"elementor-button-text\">Wiki<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-c977b54 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"c977b54\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-6121993\" data-id=\"6121993\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-360ae26 elementor-widget elementor-widget-heading\" data-id=\"360ae26\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewbox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" 
fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewbox=\"0 0 24 24\" version=\"1.2\" baseprofile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/decision-tree\/#Arbre-de-decision-pour-lanalyse-des-donnees\" >Decision tree for data analysis<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/decision-tree\/#Algoritmes-usuels\" >Common algorithms<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/decision-tree\/#Terminologie\" >Terminology<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/decision-tree\/#Fonctionnement-general\" >General operation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/decision-tree\/#Fractionnement-splitting\" >Fractionation \/ splitting<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Arbre-de-decision-pour-lanalyse-des-donnees\"><\/span>Decision tree for data analysis<span 
class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-13766f1 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"13766f1\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-9d787e4\" data-id=\"9d787e4\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-3c1222f elementor-widget elementor-widget-text-editor\" data-id=\"3c1222f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>A <a href=\"https:\/\/complex-systems-ai.com\/en\/graph-theory-2\/trees-and-trees\/\">decision tree<\/a> is a non-parametric supervised learning approach that can be applied to both regression and classification problems. Following the tree analogy, decision trees implement a sequential decision process: starting from the root node, a feature is evaluated and one of the branches is selected.\u00a0<\/p><p>Each node in the tree is essentially a decision rule. This procedure is repeated until a final leaf is reached, which normally represents the target. 
Decision trees are also attractive models if one cares about interpretability.<\/p><p><img decoding=\"async\" class=\"aligncenter wp-image-11096 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2020\/09\/cropped-Capture.png\" alt=\"decision tree\" width=\"97\" height=\"97\" title=\"\"><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-9eac338 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"9eac338\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-57ff36d\" data-id=\"57ff36d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-3a72527 elementor-widget elementor-widget-heading\" data-id=\"3a72527\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Algoritmes-usuels\"><\/span>Common algorithms<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-a3fd587 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"a3fd587\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div 
class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-9255132\" data-id=\"9255132\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-f72862c elementor-widget elementor-widget-text-editor\" data-id=\"f72862c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Several algorithms exist for building decision trees:<\/p><ul><li>ID3 (Iterative Dichotomiser 3) was developed in 1986 by Ross Quinlan. The algorithm creates a multiway tree, finding for each node (i.e., greedily) the categorical feature that will produce the greatest information gain for the categorical targets. Trees grow to their maximum size, then a pruning step is usually applied to improve the tree&#039;s ability to generalize to unseen data.<\/li><li>C4.5, developed in 1993 by Ross Quinlan, is the successor to ID3 and removed the restriction that features must be categorical by dynamically defining a discrete attribute (based on numeric variables) that partitions the continuous attribute values into a discrete set of intervals. C4.5 converts the trained trees (i.e., the output of the ID3 algorithm) into sets of if-then rules. The accuracy of each rule is then evaluated to determine the order in which the rules should be applied. Pruning is performed by removing a rule&#039;s precondition if the rule&#039;s accuracy improves without it.<\/li><li>C5.0 is Quinlan&#039;s latest version, released under a proprietary license. It uses less memory and creates smaller rule sets than C4.5 while being more accurate.<\/li><li>CART (classification and regression trees) is very similar to C4.5, but it differs in that it supports numeric target variables (regression) and does not calculate rule sets. 
CART builds binary trees using the feature and threshold that generate the greatest information gain at each node.<\/li><\/ul><p>scikit-learn uses an optimized version of the CART algorithm.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-64c14ce elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"64c14ce\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-896b121\" data-id=\"896b121\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d48962d elementor-widget elementor-widget-heading\" data-id=\"d48962d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Terminologie\"><\/span>Terminology<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-6fbd430 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6fbd430\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d157590\" data-id=\"d157590\" 
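The greedy CART-style split search described above can be sketched in pure Python. This is a minimal illustration under the assumption of a list-of-lists feature matrix, not scikit-learn's optimized implementation; the function names gini and best_split are my own:

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 - sum over classes of p_k**2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    # Greedy CART-style search: try every feature and every midpoint
    # between consecutive observed values, and keep the (feature, threshold)
    # pair whose binary split gives the lowest weighted child impurity
    # (equivalently, the highest information gain).
    n = len(X)
    best = None  # (weighted_impurity, feature_index, threshold)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] < t]
            right = [y[i] for i in range(n) if X[i][f] >= t]
            w = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or w < best[0]:
                best = (w, f, t)
    return best
```

Real implementations repeat this search recursively on each child node until a stopping criterion (depth, minimum samples, purity) is met.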
data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-7a767aa elementor-widget elementor-widget-text-editor\" data-id=\"7a767aa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In keeping with the analogy, much of the terminology is borrowed from that of trees.<\/p><p><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone wp-image-20703 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre1.webp\" alt=\"decision tree\" width=\"720\" height=\"373\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre1.webp 720w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre1-300x155.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre1-18x9.webp 18w\" sizes=\"(max-width: 720px) 100vw, 720px\" \/><\/p><ul><li>Root node: the first node of the decision tree.<\/li><li>Splitting: the process of dividing a node into two or more sub-nodes, starting from the root node.<\/li><li>Node: an internal node produced by splitting that is itself split into further sub-nodes.<\/li><li>Leaf or terminal node: a node that cannot be split any further.<\/li><li>Pruning: a technique for reducing the size of the decision tree by removing sub-nodes. 
The goal is to reduce complexity in order to improve predictive accuracy and avoid overfitting.<\/li><li>Branch\/Sub-Tree: a sub-section of the entire tree is called a branch or sub-tree.<\/li><li>Parent and Child Node: a node divided into sub-nodes is called the parent node of those sub-nodes, while the sub-nodes are the children of the parent node.<\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-0dd341f elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"0dd341f\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-36227bc\" data-id=\"36227bc\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-670c0d2 elementor-widget elementor-widget-heading\" data-id=\"670c0d2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Fonctionnement-general\"><\/span>General operation<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-3fe4fd6 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"3fe4fd6\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container 
elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-85bc54a\" data-id=\"85bc54a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-02f835f elementor-widget elementor-widget-text-editor\" data-id=\"02f835f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Consider the following example where a decision tree predicts a baseball player&#039;s salary:<\/p><p><img decoding=\"async\" class=\"alignnone wp-image-20705 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre2.webp\" alt=\"decision tree\" width=\"320\" height=\"331\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre2.webp 320w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre2-290x300.webp 290w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre2-12x12.webp 12w\" sizes=\"(max-width: 320px) 100vw, 320px\" \/><\/p><p>We use the Hitters dataset to predict a baseball player&#039;s (log) salary based on Years (the number of years he has played in the major leagues) and Hits (the number of hits he made the previous year).<\/p><p>Based on the features, the decision tree model learns a series of splitting rules, starting from the top of the tree (root node).<\/p><ul><li>The root node is split by the rule Years &lt; 4.5: players with Years &lt; 4.5 follow the left branch. Their mean log salary is 5.107, so we predict a salary of e<sup>5.107<\/sup> thousand dollars, that is $165,174, for these players.<\/li><li>Players with Years &gt;= 4.5 are assigned to the right branch; this group is then subdivided 
by Hits &lt; 117.5, giving a mean log salary of 6.00.<\/li><li>Players with Years &gt;= 4.5 and Hits &gt;= 117.5 have a mean log salary of 6.74.<\/li><\/ul><p>In this case, we can see that the decision tree partitions the predictor space into three regions, each of which determines a predicted salary; the boundaries between these regions are the decision boundaries.<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20706 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre3.webp\" alt=\"decision tree\" width=\"720\" height=\"474\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre3.webp 720w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre3-300x198.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre3-18x12.webp 18w\" sizes=\"(max-width: 720px) 100vw, 720px\" \/><\/p><p>These three regions can be written as<\/p><ul><li>R1 ={X | Years&lt;4.5 }<\/li><li>R2 ={X | Years&gt;=4.5, Hits&lt;117.5 }<\/li><li>R3 ={X | Years&gt;=4.5, Hits&gt;=117.5}.<\/li><\/ul><p>This intuition shows how the splitting process lets a decision tree form regions capable of predicting the salary of baseball players.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-5923524 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"5923524\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-eb6fe21\" data-id=\"eb6fe21\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div 
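The three-region decision rule for the Hitters example can be written as a short function. This is an illustrative sketch that hard-codes the region means quoted in the text (5.107, 6.00 and 6.74 on the log scale); it is not the fitted model itself:

```python
import math

def predict_log_salary(years, hits):
    # R1 = {X | Years < 4.5}: mean log salary 5.107
    if years < 4.5:
        return 5.107
    # R2 = {X | Years >= 4.5, Hits < 117.5}: mean log salary 6.00
    if hits < 117.5:
        return 6.00
    # R3 = {X | Years >= 4.5, Hits >= 117.5}: mean log salary 6.74
    return 6.74

def predict_salary_thousands(years, hits):
    # Salaries are modelled on a log scale, so exponentiate to get
    # thousands of dollars (exp(5.107) is about 165.2, i.e. $165,174).
    return math.exp(predict_log_salary(years, hits))
```

Each leaf simply returns the mean response of the training observations that fall into its region, which is exactly how regression trees make predictions.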
class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ab03d9d elementor-widget elementor-widget-heading\" data-id=\"ab03d9d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Fractionnement-splitting\"><\/span>Fractionation \/ splitting<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-7433281 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"7433281\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-6e0f4d9\" data-id=\"6e0f4d9\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d475926 elementor-widget elementor-widget-text-editor\" data-id=\"d475926\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In order to split the nodes on the most informative features, the decision tree algorithm starts at the root of the tree and splits the data on the feature that results in the greatest information gain (IG). 
Here, the objective function is to maximize the information gain (IG) at each split, which we define as follows:<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20707 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre4.webp\" alt=\"splitting decision tree\" width=\"501\" height=\"96\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre4.webp 501w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre4-300x57.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre4-18x3.webp 18w\" sizes=\"(max-width: 501px) 100vw, 501px\" \/><\/p><p>Here, f is the feature on which the split is performed, Dp and Dj are the datasets of the parent node and the j-th child node, I is our measure of <a href=\"https:\/\/complex-systems-ai.com\/en\/data-analysis\/gini-entropy-and-error\/\">impurity<\/a>, Np is the total number of samples at the parent node and Nj is the number of samples in the j-th child node.<\/p><p>As we can see, the information gain is simply the difference between the impurity of the parent node and the weighted sum of the impurities of the child nodes: the lower the impurity of the child nodes, the greater the information gain. However, for simplicity and to reduce the combinatorial search space, most libraries (including scikit-learn) implement binary decision trees. 
This means that each parent node is divided into two child nodes, D-left and D-right.<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20708 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre5.webp\" alt=\"splitting decision tree\" width=\"652\" height=\"86\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre5.webp 652w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre5-300x40.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/arbre5-18x2.webp 18w\" sizes=\"(max-width: 652px) 100vw, 652px\" \/><\/p><p>The three impurity measures, or splitting criteria, commonly used in binary decision trees are the Gini impurity (IG), entropy (IH) and the classification error (IE).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>","protected":false},"excerpt":{"rendered":"<p>Data Analysis Wiki Homepage Decision Tree for Data Analysis A decision tree is a non-parametric supervised learning approach and... 
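The information gain formula and the three impurity criteria just named can be sketched in a few lines. This is a minimal stdlib-only illustration for a binary split; the function names are my own, not a library API:

```python
from collections import Counter
import math

def gini(labels):
    # IG(t) = 1 - sum_k p_k**2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    # IH(t) = -sum_k p_k * log2(p_k)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def classification_error(labels):
    # IE(t) = 1 - max_k p_k
    n = len(labels)
    return 1.0 - max(Counter(labels).values()) / n

def information_gain(parent, left, right, impurity=gini):
    # IG = I(Dp) - Nleft/Np * I(Dleft) - Nright/Np * I(Dright),
    # i.e. the formula above specialized to two children.
    n = len(parent)
    return (impurity(parent)
            - len(left) / n * impurity(left)
            - len(right) / n * impurity(right))
```

A perfectly separating split of a balanced two-class parent yields the maximum gain: 0.5 under Gini and 1 bit under entropy.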
<\/p>","protected":false},"author":1,"featured_media":0,"parent":15503,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-20701","page","type-page","status-publish","hentry"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages\/20701","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/comments?post=20701"}],"version-history":[{"count":4,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages\/20701\/revisions"}],"predecessor-version":[{"id":20711,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages\/20701\/revisions\/20711"}],"up":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/pages\/15503"}],"wp:attachment":[{"href":"https:\/\/complex-systems-ai.com\/en\/wp-json\/wp\/v2\/media?parent=20701"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}