{"id":15958,"date":"2022-04-24T16:01:24","date_gmt":"2022-04-24T15:01:24","guid":{"rendered":"https:\/\/complex-systems-ai.com\/?page_id=15958"},"modified":"2022-11-27T21:16:16","modified_gmt":"2022-11-27T20:16:16","slug":"nettoyage-des-donnees","status":"publish","type":"page","link":"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/limpieza-de-datos\/","title":{"rendered":"Limpieza de datos"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-page\" data-elementor-id=\"15958\" class=\"elementor elementor-15958\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-0e4b3f6 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"0e4b3f6\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-7282921\" data-id=\"7282921\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-1a0804c elementor-widget elementor-widget-text-editor\" data-id=\"1a0804c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Feature selection, the process of finding and selecting the most useful features in a dataset, is a crucial step in the machine learning pipeline. Unnecessary features slow down training, reduce model interpretability and, most importantly, degrade generalization performance on the test set.
The goal is therefore to clean the data.<\/p><p><img decoding=\"async\" class=\"aligncenter wp-image-11096 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2020\/09\/cropped-Capture.png\" alt=\"data cleaning\" width=\"97\" height=\"97\" title=\"\"><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-553da3b elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"553da3b\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-6560f7a\" data-id=\"6560f7a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a2e7592 elementor-widget elementor-widget-heading\" data-id=\"a2e7592\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle table of contents\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"
class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/limpieza-de-datos\/#Pipeline-pour-le-nettoyage-des-donnees\" >A pipeline for data cleaning<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/limpieza-de-datos\/#Valeurs-manquantes\" >Missing values<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/limpieza-de-datos\/#Colonnes-colineaires\" >Collinear columns<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/limpieza-de-datos\/#Colonnes-a-importance-zero\" >Zero-importance columns<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\"
href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/limpieza-de-datos\/#Colonnes-a-peu-dimportance\" >Low-importance columns<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/limpieza-de-datos\/#Colonnes-a-valeur-unique\" >Columns with a single unique value<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/limpieza-de-datos\/#Retirer-les-colonnes\" >Removing columns<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/limpieza-de-datos\/#Pipeline-du-nettoyage-des-donnees\" >The data cleaning pipeline<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Pipeline-pour-le-nettoyage-des-donnees\"><\/span>A pipeline for data cleaning<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-f312f99 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"f312f99\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-ef4a630\" data-id=\"ef4a630\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-953ce91 elementor-widget elementor-widget-text-editor\" data-id=\"953ce91\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The FeatureSelector implements some of the most common feature selection methods:<\/p><ul><li>Features with a high percentage of missing values<\/li><li>Collinear (highly correlated) features<\/li><li>Features with zero importance in a tree-based model<\/li><li>Features with low importance<\/li><li>Features with a single unique value<\/li><\/ul><p>In this article, we walk through using the FeatureSelector on an example machine learning dataset. We will see how it lets us implement these methods quickly, making for a more efficient workflow.<\/p><p>The FeatureSelector has five methods for finding features to remove. We can access any of the identified features and remove them from the data manually, or use the removal function built into the FeatureSelector.<\/p><p>Here we go through each of the identification methods and also show how all five can be run at once.
The FeatureSelector also has several plotting capabilities, because visually inspecting data is a crucial part of machine learning.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-789de81 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"789de81\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-02b2553\" data-id=\"02b2553\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-50450e5 elementor-widget elementor-widget-heading\" data-id=\"50450e5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Valeurs-manquantes\"><\/span>Missing values<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-f74c4dc elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"f74c4dc\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element
elementor-element-4eeb19f\" data-id=\"4eeb19f\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-f4a8509 elementor-widget elementor-widget-text-editor\" data-id=\"f4a8509\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"6135\" class=\"pw-post-body-paragraph la lb jl lc b ld nr km lf lg ns kp li lj nt ll lm ln nu lp lq lr nv lt lu lv it gc\" data-selectable-paragraph=\"\">The first method for finding features to remove is straightforward: find features whose fraction of missing values is above a specified threshold. The call below identifies features with more than 60% missing values (output is shown in bold).<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"14c9\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\">fs.identify_missing(missing_threshold = 0.6)<\/span><span id=\"77e3\" class=\"gc mw mx jl ma b do oj ok ol om on oh l oi\" data-selectable-paragraph=\"\"><strong class=\"ma jm\">17 features with greater than 0.60 missing values.<\/strong><\/span><\/pre><p id=\"c659\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">We can see the fraction of missing values in every column in a dataframe:<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"5026\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\">fs.missing_stats.head()<\/span><\/pre><p id=\"f3e6\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">To see the features
identified for removal, we access the ops attribute of the FeatureSelector, a Python dict whose values are lists of features.<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"cdef\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\">missing_features = fs.ops['missing']<br \/>missing_features[:5]<\/span><span id=\"c942\" class=\"gc mw mx jl ma b do oj ok ol om on oh l oi\" data-selectable-paragraph=\"\"><strong class=\"ma jm\">['OWN_CAR_AGE',<br \/> 'YEARS_BUILD_AVG',<br \/> 'COMMONAREA_AVG',<br \/> 'FLOORSMIN_AVG',<br \/> 'LIVINGAPARTMENTS_AVG']<\/strong><\/span><\/pre><p id=\"60ca\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">Finally, we have a plot of the distribution of missing values across all features:<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"4f98\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\">fs.plot_missing()<\/span><\/pre><figure class=\"nx ny nz oa gz jc gn go paragraph-image\"><div class=\"gn go pa\"><img fetchpriority=\"high\" decoding=\"async\" class=\"aligncenter wp-image-15962 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_0WBIKN83twXyWfyx9LG7Qg.png\" alt=\"\" width=\"606\" height=\"462\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_0WBIKN83twXyWfyx9LG7Qg.png 606w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_0WBIKN83twXyWfyx9LG7Qg-300x229.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_0WBIKN83twXyWfyx9LG7Qg-16x12.png 16w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_0WBIKN83twXyWfyx9LG7Qg-600x457.png 600w\" sizes=\"(max-width: 606px) 100vw, 606px\"
\/><\/div><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-f8ac31b elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"f8ac31b\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-8e12f9a\" data-id=\"8e12f9a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-1b82e6d elementor-widget elementor-widget-heading\" data-id=\"1b82e6d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Colonnes-colineaires\"><\/span>Collinear columns<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-dd6d86b elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"dd6d86b\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-fdaef17\" data-id=\"fdaef17\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div
class=\"elementor-element elementor-element-ca49e9d elementor-widget elementor-widget-text-editor\" data-id=\"ca49e9d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"2fd5\" class=\"pw-post-body-paragraph la lb jl lc b ld nr km lf lg ns kp li lj nt ll lm ln nu lp lq lr nv lt lu lv it gc\" data-selectable-paragraph=\"\">Collinear features are features that are highly correlated with one another. In machine learning, they reduce generalization performance on the test set because of high variance, and they make the model less interpretable.<\/p><p id=\"1fda\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">The identify_collinear method finds collinear features based on a specified <a href=\"https:\/\/complex-systems-ai.com\/es\/correlacion-y-regresiones\/\">correlation<\/a> coefficient value.
For each pair of correlated features, it identifies one of the two for removal (since we only need to remove one):<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"1161\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\">fs.identify_collinear(correlation_threshold = 0.98)<\/span><span id=\"dbbe\" class=\"gc mw mx jl ma b do oj ok ol om on oh l oi\" data-selectable-paragraph=\"\"><strong class=\"ma jm\">21 features with a correlation magnitude greater than 0.98.<\/strong><\/span><\/pre><p id=\"9be5\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">A neat visualization we can make with correlations is a heatmap. It shows all the features that have at least one correlation above the threshold:<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"8df4\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\">fs.plot_collinear()<\/span><\/pre><figure class=\"nx ny nz oa gz jc gn go paragraph-image\"><div class=\"jd je dq jf cf jg\" tabindex=\"0\" role=\"button\"><div class=\"gn go pb\"><img decoding=\"async\" class=\"aligncenter wp-image-15963 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__gK6g3YWylcgfL5Bz8JMUg.png\" alt=\"\" width=\"908\" height=\"843\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__gK6g3YWylcgfL5Bz8JMUg.png 908w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__gK6g3YWylcgfL5Bz8JMUg-300x279.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__gK6g3YWylcgfL5Bz8JMUg-768x713.png 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__gK6g3YWylcgfL5Bz8JMUg-13x12.png 13w,
https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__gK6g3YWylcgfL5Bz8JMUg-600x557.png 600w\" sizes=\"(max-width: 908px) 100vw, 908px\" \/><\/div><\/div><\/figure><p id=\"1249\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">As before, we can access the full list of correlated features that will be removed, or see the highly correlated pairs of features in a dataframe.<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"9726\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\"># list of collinear features to remove<br \/>collinear_features = fs.ops['collinear']<\/span><span id=\"a8ee\" class=\"gc mw mx jl ma b do oj ok ol om on oh l oi\" data-selectable-paragraph=\"\"># dataframe of collinear features<br \/>fs.record_collinear.head()<\/span><\/pre><figure class=\"nx ny nz oa gz jc gn go paragraph-image\"><div class=\"gn go pc\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15964 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_unCzyN2BgucGodbioUz-Kw.png\" alt=\"\" width=\"401\" height=\"176\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_unCzyN2BgucGodbioUz-Kw.png 401w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_unCzyN2BgucGodbioUz-Kw-300x132.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_unCzyN2BgucGodbioUz-Kw-18x8.png 18w\" sizes=\"(max-width: 401px) 100vw, 401px\" \/><\/div><\/figure><p id=\"621c\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">If we want to investigate our dataset, we can also plot all of the
correlations in the data by passing plot_all = True to the plot_collinear call:<\/p><figure class=\"nx ny nz oa gz jc gn go paragraph-image\"><div class=\"jd je dq jf cf jg\" tabindex=\"0\" role=\"button\"><div class=\"gn go pd\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15965 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_fcLsRYskgzWxVoxj4npfvg.png\" alt=\"\" width=\"758\" height=\"676\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_fcLsRYskgzWxVoxj4npfvg.png 758w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_fcLsRYskgzWxVoxj4npfvg-300x268.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_fcLsRYskgzWxVoxj4npfvg-13x12.png 13w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_fcLsRYskgzWxVoxj4npfvg-600x535.png 600w\" sizes=\"(max-width: 758px) 100vw, 758px\" \/><\/div><\/div><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-c0ed418 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"c0ed418\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-f6ea7a2\" data-id=\"f6ea7a2\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-85b10b6 elementor-widget elementor-widget-heading\" data-id=\"85b10b6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div
class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Colonnes-a-importance-zero\"><\/span>Zero-importance columns<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-2b286bb elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"2b286bb\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-2d71a7a\" data-id=\"2d71a7a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-828d830 elementor-widget elementor-widget-text-editor\" data-id=\"828d830\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"aa73\" class=\"pw-post-body-paragraph la lb jl lc b ld nr km lf lg ns kp li lj nt ll lm ln nu lp lq lr nv lt lu lv it gc\" data-selectable-paragraph=\"\">The previous two methods can be applied to any structured dataset and are deterministic: the results will be the same every time for a given threshold. The next method is designed only for supervised machine learning problems, where we have labels for training a model, and it is non-deterministic.
The identify_zero_importance function finds features that have zero importance according to a gradient boosting machine (GBM) learning model.<\/p><p id=\"06e6\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">With tree-based machine learning models, such as a boosting ensemble, we can find feature importances. The absolute value of an importance matters less than the relative values, which we can use to determine the most relevant features for a task. We can also use feature importances for feature selection by removing zero-importance features. In a tree-based model, features with zero importance are not used to split any nodes, so we can remove them without affecting model performance.<\/p><p id=\"e121\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">The FeatureSelector finds feature importances using the gradient boosting machine from the LightGBM library. The feature importances are averaged over 10 training runs of the GBM in order to reduce variance.
En outre, le mod\u00e8le est entra\u00een\u00e9 \u00e0 l&rsquo;aide d&rsquo;un arr\u00eat pr\u00e9coce avec un ensemble de validation (il existe une option pour d\u00e9sactiver cette option) pour \u00e9viter le surajustement des donn\u00e9es d&rsquo;entra\u00eenement.<\/p><p id=\"f780\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">Le code ci-dessous appelle la m\u00e9thode et extrait les caract\u00e9ristiques d&rsquo;importance nulle\u00a0:<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"0149\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\"># Pass in the appropriate parameters<br \/>fs.identify_zero_importance(task = 'classification', <br \/>                            eval_metric = 'auc', <br \/>                            n_iterations = 10, <br \/>                             early_stopping = True)<\/span><span id=\"d609\" class=\"gc mw mx jl ma b do oj ok ol om on oh l oi\" data-selectable-paragraph=\"\"># list of zero importance features<br \/>zero_importance_features = fs.ops['zero_importance']<\/span><span id=\"7dd2\" class=\"gc mw mx jl ma b do oj ok ol om on oh l oi\" data-selectable-paragraph=\"\"><strong class=\"ma jm\">63 features with zero importance after one-hot encoding.<\/strong><\/span><\/pre><p id=\"45dc\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">Les param\u00e8tres que nous passons sont les suivants :<\/p><ul class=\"\"><li id=\"4094\" class=\"mb mc jl lc b ld le lg lh lj md ln me lr mf lv pe mh mi mj gc\" data-selectable-paragraph=\"\"><code class=\"fr lx ly lz ma b\">task<\/code> : soit \u00ab\u00a0classification\u00a0\u00bb soit \u00ab\u00a0<a href=\"https:\/\/complex-systems-ai.com\/es\/correlacion-y-regresiones\/transformacion-de-datos-y-regresion\/\">r\u00e9gression<\/a>\u00a0\u00bb correspondant \u00e0 
our problem<\/li><li id=\"dc9d\" class=\"mb mc jl lc b ld mk lg ml lj mm ln mn lr mo lv pe mh mi mj gc\" data-selectable-paragraph=\"\"><code class=\"fr lx ly lz ma b\">eval_metric<\/code>: metric to use for early stopping (not needed if early stopping is disabled)<\/li><li id=\"d19a\" class=\"mb mc jl lc b ld mk lg ml lj mm ln mn lr mo lv pe mh mi mj gc\" data-selectable-paragraph=\"\"><code class=\"fr lx ly lz ma b\">n_iterations<\/code>: number of training runs over which to average the feature importances<\/li><li id=\"4e99\" class=\"mb mc jl lc b ld mk lg ml lj mm ln mn lr mo lv pe mh mi mj gc\" data-selectable-paragraph=\"\"><code class=\"fr lx ly lz ma b\">early_stopping<\/code>: whether or not to use early stopping when training the model<\/li><\/ul><p id=\"3d73\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">This time we get two plots with\u00a0plot_feature_importances:<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"1037\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\"># plot the feature importances<br \/>fs.plot_feature_importances(threshold = 0.99, plot_n = 12)<\/span><span id=\"8913\" class=\"gc mw mx jl ma b do oj ok ol om on oh l oi\" data-selectable-paragraph=\"\"><strong class=\"ma jm\">124 features required for 0.99 of cumulative importance<\/strong><\/span><\/pre><div class=\"nx ny nz oa gz o he\"><figure class=\"pf jc pg ph pi pj pk paragraph-image\"><div tabindex=\"0\" role=\"button\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15967 size-large\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_hWCOAEWkH4z5BKKqkFAd1g-1024x525.png\" alt=\"\" width=\"1024\" height=\"525\" title=\"\"
srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_hWCOAEWkH4z5BKKqkFAd1g-1024x525.png 1024w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_hWCOAEWkH4z5BKKqkFAd1g-300x154.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_hWCOAEWkH4z5BKKqkFAd1g-768x394.png 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_hWCOAEWkH4z5BKKqkFAd1g-18x9.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_hWCOAEWkH4z5BKKqkFAd1g-600x308.png 600w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_hWCOAEWkH4z5BKKqkFAd1g.png 1076w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/div><div class=\"jd je dq jf cf jg\" tabindex=\"0\" role=\"button\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15966 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_HJk89EkbcmriiWbxpV6Uew.png\" alt=\"\" width=\"552\" height=\"396\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_HJk89EkbcmriiWbxpV6Uew.png 552w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_HJk89EkbcmriiWbxpV6Uew-300x215.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_HJk89EkbcmriiWbxpV6Uew-18x12.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_HJk89EkbcmriiWbxpV6Uew-120x85.png 120w\" sizes=\"(max-width: 552px) 100vw, 552px\" \/><\/div><\/figure><\/div><p id=\"74ac\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">At the top, we have the plot_n most important features (plotted in terms of normalized importance, where the total sums to 1).<\/p><p class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\"
data-selectable-paragraph=\"\">At the bottom, we have the cumulative importance versus the number of features. The vertical line is drawn at the cumulative importance threshold, in this case 99%.<\/p><p id=\"d885\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">Two points are worth keeping in mind for the importance-based methods:<\/p><ul class=\"\"><li id=\"f09a\" class=\"mb mc jl lc b ld le lg lh lj md ln me lr mf lv pe mh mi mj gc\" data-selectable-paragraph=\"\">Training the gradient boosting machine is stochastic, meaning the feature importances will change every time the model is run<\/li><\/ul><p id=\"4a4a\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">This should not have a major impact (the most important features will not suddenly become the least important), but it will change the ordering of some features. It can also affect the number of zero importance features identified. Do not be surprised if the feature importances change every time!<\/p><ul class=\"\"><li id=\"f9ef\" class=\"mb mc jl lc b ld le lg lh lj md ln me lr mf lv pe mh mi mj gc\" data-selectable-paragraph=\"\">To train the machine learning model, the features are first one-hot encoded.
This means that some of the features identified as having 0 importance may be one-hot encoded features added during modeling.<\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-a1d0012 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"a1d0012\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-988df22\" data-id=\"988df22\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-0ab8bc6 elementor-widget elementor-widget-heading\" data-id=\"0ab8bc6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Colonnes-a-peu-dimportance\"><\/span>Low importance columns<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-71f6265 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"71f6265\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column
elementor-element elementor-element-1f3afe7\" data-id=\"1f3afe7\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a13f9d9 elementor-widget elementor-widget-text-editor\" data-id=\"a13f9d9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"af97\" class=\"pw-post-body-paragraph la lb jl lc b ld nr km lf lg ns kp li lj nt ll lm ln nu lp lq lr nv lt lu lv it gc\" data-selectable-paragraph=\"\">The next method builds on the zero importance function, using the feature importances from the model for further selection. The\u00a0identify_low_importance\u00a0function finds the lowest importance features that do not contribute to a specified total importance.<\/p><p id=\"ade4\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">For example, the call below finds the least important features that are not required to reach 99% of the total importance:<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"a322\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\">fs.identify_low_importance(cumulative_importance = 0.99)<\/span><span id=\"c78d\" class=\"gc mw mx jl ma b do oj ok ol om on oh l oi\" data-selectable-paragraph=\"\"><strong class=\"ma jm\">123 features required for cumulative importance of 0.99 after one hot encoding.<br \/>116 features do not contribute to cumulative importance of 0.99.<\/strong><\/span><\/pre><p id=\"5fe6\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv
it gc\" data-selectable-paragraph=\"\">Based on the plot of cumulative importance and this information, the gradient boosting machine considers many of the features to be irrelevant for learning. Again, the results of this method will change on each training run.<\/p><p id=\"0d2e\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">To view all the feature importances in a dataframe:<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"bcba\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\">fs.feature_importances.head(10)<\/span><\/pre><figure class=\"nx ny nz oa gz jc gn go paragraph-image\"><div class=\"gn go pn\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15968 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_d1uRrw212LAmpjlszj7CFg.png\" alt=\"\" width=\"628\" height=\"306\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_d1uRrw212LAmpjlszj7CFg.png 628w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_d1uRrw212LAmpjlszj7CFg-300x146.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_d1uRrw212LAmpjlszj7CFg-18x9.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_d1uRrw212LAmpjlszj7CFg-600x292.png 600w\" sizes=\"(max-width: 628px) 100vw, 628px\" \/><\/div><\/figure><p id=\"c599\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">The\u00a0low importance method borrows from one way of using principal component analysis (PCA),\u00a0where it is common to keep only the principal components
needed to retain a certain percentage of the variance (such as 95%). The percentage of total importance accounted for is based on the same idea.<\/p><p id=\"cfb1\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">The feature importance-based methods are really only applicable if we are going to use a <a href=\"https:\/\/complex-systems-ai.com\/es\/teoria-de-grafos\/arboles-y-arboles\/\">tree<\/a>-based model to make predictions. Besides being stochastic, the importance-based methods are a black-box approach, in that we do not really know why the model considers the features to be irrelevant. If you use these methods, run them several times to see how the results change, and perhaps create multiple datasets with different parameters to test!<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-dd6a2e5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"dd6a2e5\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d9439c8\" data-id=\"d9439c8\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-9192b12 elementor-widget
elementor-widget-heading\" data-id=\"9192b12\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Colonnes-a-valeur-unique\"><\/span>Single unique value columns<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-5afe309 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"5afe309\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-c9dc802\" data-id=\"c9dc802\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-2ab8b97 elementor-widget elementor-widget-text-editor\" data-id=\"2ab8b97\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"cbec\" class=\"pw-post-body-paragraph la lb jl lc b ld nr km lf lg ns kp li lj nt ll lm ln nu lp lq lr nv lt lu lv it gc\" data-selectable-paragraph=\"\">The final method is fairly basic:\u00a0find any columns that have a single unique value. A feature with only one unique value cannot be useful for machine learning, because this feature has zero variance.
For example, a tree-based model can never make a split on a feature with only one value (since there are no groups into which to divide the observations).<\/p><p id=\"591a\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">There are no parameters to select here, unlike the other methods:<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"63ff\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\">fs.identify_single_unique()<\/span><span id=\"cd7b\" class=\"gc mw mx jl ma b do oj ok ol om on oh l oi\" data-selectable-paragraph=\"\"><strong class=\"ma jm\">4 features with a single unique value.<\/strong><\/span><\/pre><p id=\"cc26\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">We can plot a histogram of the number of unique values per feature:<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"798a\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\">fs.plot_unique()<\/span><\/pre><figure class=\"nx ny nz oa gz jc gn go paragraph-image\"><div class=\"gn go po\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15969 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_F3BV5mUWG-GLP8gnS62Z6w.png\" alt=\"\" width=\"614\" height=\"462\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_F3BV5mUWG-GLP8gnS62Z6w.png 614w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_F3BV5mUWG-GLP8gnS62Z6w-300x226.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_F3BV5mUWG-GLP8gnS62Z6w-16x12.png 16w,
https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_F3BV5mUWG-GLP8gnS62Z6w-600x451.png 600w\" sizes=\"(max-width: 614px) 100vw, 614px\" \/><\/div><\/figure><p id=\"1711\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">One point to remember is that NaNs are dropped before calculating unique values in Pandas by default.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-505a7de elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"505a7de\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-b6b9327\" data-id=\"b6b9327\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-9a461ef elementor-widget elementor-widget-heading\" data-id=\"9a461ef\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Retirer-les-colonnes\"><\/span>Removing columns<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-5797e23 elementor-section-boxed elementor-section-height-default
elementor-section-height-default\" data-id=\"5797e23\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-05d7f10\" data-id=\"05d7f10\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-1c0aca9 elementor-widget elementor-widget-text-editor\" data-id=\"1c0aca9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"fe68\" class=\"pw-post-body-paragraph la lb jl lc b ld nr km lf lg ns kp li lj nt ll lm ln nu lp lq lr nv lt lu lv it gc\" data-selectable-paragraph=\"\">Once we have identified the features to discard, we have two options for removing them. All of the features to remove are stored in the\u00a0ops\u00a0dict of the\u00a0FeatureSelector,\u00a0and we can use the lists to remove features manually. Another option is to use the built-in\u00a0remove\u00a0function.<\/p><p id=\"26f0\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">For this method, we pass in the\u00a0methods\u00a0to use to remove features.
If we want to use all of the implemented methods, we simply pass in\u00a0methods = &lsquo;all&rsquo;.<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"676e\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\"># Remove the features from all methods (returns a df)<br \/>train_removed = fs.remove(methods = 'all')<\/span><span id=\"3b36\" class=\"gc mw mx jl ma b do oj ok ol om on oh l oi\" data-selectable-paragraph=\"\"><strong class=\"ma jm\">['missing', 'single_unique', 'collinear', 'zero_importance', 'low_importance'] methods have been run<br \/><br \/>Removed 140 features.<\/strong><\/span><\/pre><p id=\"8ecf\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">This method returns a dataframe with the features removed. To also remove the one-hot encoded features that are created during machine learning:<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"d6fc\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\">train_removed_all = fs.remove(methods = 'all', keep_one_hot = False)<\/span><span id=\"2c07\" class=\"gc mw mx jl ma b do oj ok ol om on oh l oi\" data-selectable-paragraph=\"\"><strong class=\"ma jm\">Removed 187 features including one-hot features.<\/strong><\/span><\/pre><p id=\"049c\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">It may be a good idea to check the features that will be removed before going ahead with the operation!
The original dataset is stored in the data attribute of the\u00a0FeatureSelector\u00a0as a backup!<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-55f1e57 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"55f1e57\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3d7b213\" data-id=\"3d7b213\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-dfaf996 elementor-widget elementor-widget-heading\" data-id=\"dfaf996\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Pipeline-du-nettoyage-des-donnees\"><\/span>Data cleaning pipeline<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-0b0566e elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"0b0566e\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-2ba8790\"
data-id=\"2ba8790\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-eced679 elementor-widget elementor-widget-text-editor\" data-id=\"eced679\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"c6d8\" class=\"pw-post-body-paragraph la lb jl lc b ld nr km lf lg ns kp li lj nt ll lm ln nu lp lq lr nv lt lu lv it gc\" data-selectable-paragraph=\"\">Rather than using the methods individually, we can run all of them with\u00a0identify_all. This takes a dictionary of the parameters for each method:<\/p><pre class=\"nx ny nz oa gz oe bt of\"><span id=\"3a71\" class=\"gc mw mx jl ma b do og oh l oi\" data-selectable-paragraph=\"\">fs.identify_all(selection_params = {'missing_threshold': 0.6,<br \/>                                    'correlation_threshold': 0.98,<br \/>                                    'task': 'classification',<br \/>                                    'eval_metric': 'auc',<br \/>                                    'cumulative_importance': 0.99})<\/span><span id=\"8a74\" class=\"gc mw mx jl ma b do oj ok ol om on oh l oi\" data-selectable-paragraph=\"\"><strong class=\"ma jm\">151 total features out of 255 identified for removal after one-hot encoding.<\/strong><\/span><\/pre><p id=\"3960\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">Note that the total number of features will change because we re-ran the model.
The\u00a0remove\u00a0function can then be called to discard these features.<\/p><p id=\"a73a\" class=\"pw-post-body-paragraph la lb jl lc b ld nr km lf lg ns kp li lj nt ll lm ln nu lp lq lr nv lt lu lv it gc\" data-selectable-paragraph=\"\">The FeatureSelector class implements several common operations for removing features before training a machine learning model. It offers functions for identifying features to remove, as well as visualizations. The methods can be run individually or all at once for efficient workflows.<\/p><p id=\"45c8\" class=\"pw-post-body-paragraph la lb jl lc b ld le km lf lg lh kp li lj lk ll lm ln lo lp lq lr ls lt lu lv it gc\" data-selectable-paragraph=\"\">The\u00a0missing,\u00a0collinear, and\u00a0single_unique\u00a0methods are deterministic, while the feature importance-based methods change with each run. Feature selection, much like the field of machine learning as a whole, is largely empirical\u00a0and requires testing multiple combinations to find the optimal answer. It is good practice to try several configurations in a pipeline, and the FeatureSelector offers a way to quickly evaluate feature selection parameters.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>An\u00e1lisis de datos Wiki p\u00e1gina de inicio Selecci\u00f3n de caracter\u00edsticas, el proceso de encontrar y seleccionar las caracter\u00edsticas m\u00e1s \u00fatiles de un conjunto...
<\/p>","protected":false},"author":1,"featured_media":0,"parent":15503,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-15958","page","type-page","status-publish","hentry"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/15958","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/comments?post=15958"}],"version-history":[{"count":4,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/15958\/revisions"}],"predecessor-version":[{"id":17889,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/15958\/revisions\/17889"}],"up":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/15503"}],"wp:attachment":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/media?parent=15958"}],"curies":[{"name":"gracias","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}