{"id":15859,"date":"2022-04-24T08:25:53","date_gmt":"2022-04-24T07:25:53","guid":{"rendered":"https:\/\/complex-systems-ai.com\/?page_id=15859"},"modified":"2022-04-24T09:28:15","modified_gmt":"2022-04-24T08:28:15","slug":"comment-gerer-les-donnees-manquantes","status":"publish","type":"page","link":"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/como-manejar-los-datos-faltantes\/","title":{"rendered":"How to handle missing data"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-page\" data-elementor-id=\"15859\" class=\"elementor elementor-15859\">\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-4eefe33 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"4eefe33\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-0ca390b\" data-id=\"0ca390b\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-05ce4c8 elementor-widget elementor-widget-text-editor\" data-id=\"05ce4c8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>One of the most common problems I have faced in <a href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/limpieza-de-datos\/\">data cleaning<\/a> and exploratory analysis is how to handle missing values. 
First, understand that there is NO universally good way to deal with missing data.<\/p><p><img decoding=\"async\" class=\"aligncenter wp-image-11096 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2020\/09\/cropped-Capture.png\" alt=\"how to handle missing data\" width=\"97\" height=\"97\" title=\"\"><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-eb2b56e elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"eb2b56e\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a44d07d\" data-id=\"a44d07d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-3ac6f0e elementor-widget elementor-widget-heading\" data-id=\"3ac6f0e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle table of contents\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg 
style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/como-manejar-los-datos-faltantes\/#Comment-gerer-les-donnees-manquantes-la-methodologie\" >How to handle missing data: the methodology<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/como-manejar-los-datos-faltantes\/#Effacement-deletion\" >Deletion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/como-manejar-los-datos-faltantes\/#Methodes-sur-les-series-temporelles\" >Time-series methods<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/como-manejar-los-datos-faltantes\/#Imputation-methodes-classiques\" >Imputation (classical methods)<\/a><\/li><li 
class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/como-manejar-los-datos-faltantes\/#Moyenne-mediane-et-mode\" >Mean, median and mode<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/como-manejar-los-datos-faltantes\/#Regression-lineaire\" >Linear regression<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/como-manejar-los-datos-faltantes\/#Imputation-multiple\" >Multiple imputation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/como-manejar-los-datos-faltantes\/#Imputation-de-donnees-categoriques\" >Imputation of categorical data<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/como-manejar-los-datos-faltantes\/#Avec-le-machine-learning-knn-voir-les-autres-cours-pour-des-methodes-plus-elaborees\" >With machine learning (KNN) - see the other courses for more elaborate methods<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Comment-gerer-les-donnees-manquantes-la-methodologie\"><\/span>How to handle missing data: the methodology<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-3ac64a9 elementor-section-boxed elementor-section-height-default 
elementor-section-height-default\" data-id=\"3ac64a9\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-b31c8b8\" data-id=\"b31c8b8\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-25ec5b5 elementor-widget elementor-widget-text-editor\" data-id=\"25ec5b5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Before moving on to data imputation methods, we need to understand why the data are missing.<\/p><p>Missing at Random (MAR): the propensity for a data point to be missing is not related to the missing value itself, but it is related to some of the observed data.<\/p><p>Missing Completely at Random (MCAR): the fact that a value is missing has nothing to do with its hypothetical value or with the values of the other variables.<\/p><p>Missing Not at Random (MNAR): two possible reasons are that the missingness depends on the hypothetical value itself (for example, people with high salaries generally do not want to reveal their income in surveys) or that it depends on the value of another variable (for example, suppose that women generally do not want to reveal their age. 
Here, missingness in the age variable is affected by the sex variable; strictly speaking, when sex is itself observed, this second situation matches the MAR definition above).<\/p><p>In the first two cases, it is generally safe to remove the data with missing values, depending on how often they occur, whereas in the third case removing observations with missing values can bias the model. We must therefore be very careful before removing observations. Note that imputation does not necessarily give better results.<\/p><p><img fetchpriority=\"high\" decoding=\"async\" class=\"aligncenter wp-image-15863 size-large\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__RA3mCS30Pr0vUxbp25Yxw-1024x905.png\" alt=\"\" width=\"1024\" height=\"905\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__RA3mCS30Pr0vUxbp25Yxw-1024x905.png 1024w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__RA3mCS30Pr0vUxbp25Yxw-300x265.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__RA3mCS30Pr0vUxbp25Yxw-768x679.png 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__RA3mCS30Pr0vUxbp25Yxw-14x12.png 14w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__RA3mCS30Pr0vUxbp25Yxw-600x530.png 600w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1__RA3mCS30Pr0vUxbp25Yxw.png 1222w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-72ed579 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"72ed579\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container 
elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-8899d7a\" data-id=\"8899d7a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d9e1f24 elementor-widget elementor-widget-heading\" data-id=\"d9e1f24\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Effacement-deletion\"><\/span>Deletion<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-aa8132d elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"aa8132d\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-cf2ab59\" data-id=\"cf2ab59\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-6c14ac1 elementor-widget elementor-widget-text-editor\" data-id=\"6c14ac1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul class=\"\"><li id=\"7d11\" class=\"mt mu jl kk b kl mo kp mp kt ni kx nj lb nk lf nl mz na nb gc\" data-selectable-paragraph=\"\"><strong class=\"kk 
jm\">Listwise<\/strong><\/li><\/ul><p>Listwise deletion (complete-case analysis) removes all data for an observation that has one or more missing values. If the missing data are limited to a small number of observations, you may simply choose to drop those cases from the analysis. In most cases, however, listwise deletion is disadvantageous, because the MCAR (Missing Completely at Random) assumption rarely holds. When it does not hold, listwise deletion produces biased parameters and estimates.<\/p><pre class=\"li lj lk ll gz nm bt nn\"><span id=\"8e7c\" class=\"gc lt lu jl no b do np nq l nr\" data-selectable-paragraph=\"\"># In R<br \/>newdata &lt;- na.omit(mydata)<\/span><span id=\"f846\" class=\"gc lt lu jl no b do ns nt nu nv nw nq l nr\" data-selectable-paragraph=\"\"># In Python (pandas)<br \/>mydata.dropna(inplace=True)<\/span><\/pre><ul class=\"\"><li id=\"9405\" class=\"mt mu jl kk b kl km kp kq kt mv kx mw lb mx lf nl mz na nb gc\" data-selectable-paragraph=\"\"><strong class=\"kk jm\">Pairwise<\/strong><\/li><\/ul><p>Pairwise deletion analyses all cases in which the variables of interest are present, and thus makes maximal use of the available data on an analysis-by-analysis basis. A strength of this technique is that it increases the power of your analysis, but it has many disadvantages. It assumes that the missing data are MCAR. 
If you delete pairwise, you end up with different numbers of observations contributing to different parts of your model, which can make interpretation difficult.<\/p><figure class=\"li lj lk ll gz lm gn go paragraph-image\"><div class=\"ln lo dq lp cf lq\" tabindex=\"0\" role=\"button\"><div class=\"gn go nx\"><img decoding=\"async\" class=\"aligncenter wp-image-15864 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_3Pcgo66zuwLY2HkUhVr2mg.png\" alt=\"\" width=\"872\" height=\"230\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_3Pcgo66zuwLY2HkUhVr2mg.png 872w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_3Pcgo66zuwLY2HkUhVr2mg-300x79.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_3Pcgo66zuwLY2HkUhVr2mg-768x203.png 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_3Pcgo66zuwLY2HkUhVr2mg-18x5.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_3Pcgo66zuwLY2HkUhVr2mg-600x158.png 600w\" sizes=\"(max-width: 872px) 100vw, 872px\" \/><\/div><\/div><\/figure><pre class=\"li lj lk ll gz nm bt nn\"><span id=\"4417\" class=\"gc lt lu jl no b do np nq l nr\" data-selectable-paragraph=\"\"># Pairwise deletion (R)<br \/>ncovMatrix &lt;- cov(mydata, use=\"pairwise.complete.obs\")<\/span><span id=\"8572\" class=\"gc lt lu jl no b do ns nt nu nv nw nq l nr\" data-selectable-paragraph=\"\"># Listwise deletion (R)<br \/>ncovMatrix &lt;- cov(mydata, use=\"complete.obs\")<\/span><\/pre><ul class=\"\"><li id=\"5130\" class=\"mt mu jl kk b kl km kp kq kt mv kx mw lb mx lf nl mz na nb gc\" data-selectable-paragraph=\"\"><strong class=\"kk jm\">Dropping Variables<\/strong><\/li><\/ul><p>It is generally better to keep data than to discard it. 
Sometimes you can drop a variable if its data are missing for more than 60% of the observations, but only if that variable is insignificant. That said, imputation is usually preferable to dropping variables.<\/p><pre class=\"li lj lk ll gz nm bt nn\"><span id=\"99be\" class=\"gc lt lu jl no b do np nq l nr\" data-selectable-paragraph=\"\"># In R<br \/>df &lt;- subset(mydata, select = -c(x,z) )<br \/>df &lt;- mydata[ -c(1,3:4) ]<\/span><span id=\"d59e\" class=\"gc lt lu jl no b do ns nt nu nv nw nq l nr\" data-selectable-paragraph=\"\"># In Python (pandas)<br \/>del mydata['column_name']<br \/># or, as an alternative<br \/>mydata.drop('column_name', axis=1, inplace=True)<\/span><\/pre>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-687f959 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"687f959\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-40308f9\" data-id=\"40308f9\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-f09075b elementor-widget elementor-widget-heading\" data-id=\"f09075b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Methodes-sur-les-series-temporelles\"><\/span>Time-series methods<span 
class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-ecc099f elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"ecc099f\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-59a668a\" data-id=\"59a668a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-b311c80 elementor-widget elementor-widget-text-editor\" data-id=\"b311c80\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul class=\"\"><li id=\"754c\" class=\"mt mu jl kk b kl mo kp mp kt ni kx nj lb nk lf nl mz na nb gc\" data-selectable-paragraph=\"\"><strong class=\"kk jm\">Last Observation Carried Forward (LOCF) and Next Observation Carried Backward (NOCB)<\/strong><\/li><\/ul><p>This is a common statistical approach in the <a href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/\">data analysis<\/a> of longitudinal repeated-measures studies, where some follow-up observations may be missing. Longitudinal data track the same sample at different points in time. 
Both methods can introduce bias into the analysis and perform poorly when the data have a visible trend.<\/p><ul class=\"\"><li id=\"1cec\" class=\"mt mu jl kk b kl nc kp nd kt ne kx nf lb ng lf nl mz na nb gc\" data-selectable-paragraph=\"\"><strong class=\"kk jm\">Linear interpolation<\/strong><\/li><\/ul><p>This method works well for a time series with some trend, but it is not suitable for seasonal data.<\/p><ul class=\"\"><li id=\"4962\" class=\"mt mu jl kk b kl nc kp nd kt ne kx nf lb ng lf nl mz na nb gc\" data-selectable-paragraph=\"\"><strong class=\"kk jm\">Seasonal adjustment + interpolation<\/strong><\/li><\/ul><p>This method works well for data with both trend and seasonality.<\/p><pre class=\"li lj lk ll gz nm bt nn\"><span id=\"3c4f\" class=\"gc lt lu jl no b do np nq l nr\" data-selectable-paragraph=\"\">library(imputeTS)  # imputeTS 3.0+ uses na_* names (formerly na.*)<\/span><span id=\"cfb3\" class=\"gc lt lu jl no b do ns nt nu nv nw nq l nr\" data-selectable-paragraph=\"\">na_random(mydata)                  # Random Imputation<br \/>na_locf(mydata, option = \"locf\")   # Last Obs. Carried Forward<br \/>na_locf(mydata, option = \"nocb\")   # Next Obs. 
Carried Backward<br \/>na_interpolation(mydata)           # Linear Interpolation<br \/>na_seadec(mydata, algorithm = \"interpolation\") # Seasonal Adjustment then Linear Interpolation<\/span><\/pre>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-b94e8e1 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"b94e8e1\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-28e40b4\" data-id=\"28e40b4\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e83efba elementor-widget elementor-widget-heading\" data-id=\"e83efba\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Imputation-methodes-classiques\"><\/span>Imputation (classical methods)<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-25aa30a elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"25aa30a\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element 
elementor-element-116b335\" data-id=\"116b335\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-0de4347 elementor-widget elementor-widget-text-editor\" data-id=\"0de4347\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<h2 id=\"31d6\" class=\"lt lu jl bn lv lw lx ly lz ma mb mc md kt me mf mg kx mh mi mj lb mk ml mm mn gc\" data-selectable-paragraph=\"\"><span class=\"ez-toc-section\" id=\"Moyenne-mediane-et-mode\"><\/span>Mean, median and mode<span class=\"ez-toc-section-end\"><\/span><\/h2><p id=\"4a96\" class=\"pw-post-body-paragraph ki kj jl kk b kl mo kn ko kp mp kr ks kt mq kv kw kx mr kz la lb ms ld le lf je gc\" data-selectable-paragraph=\"\">Computing the overall mean, median or mode is a very basic imputation method; it is the only function tested here that takes no advantage of the time-series characteristics or of the relationships between variables. It is very fast, but it has clear disadvantages. 
One disadvantage is that mean imputation reduces the variance in the dataset.<\/p><pre class=\"li lj lk ll gz nm bt nn\"><span id=\"d3ae\" class=\"gc lt lu jl no b do np nq l nr\" data-selectable-paragraph=\"\">library(imputeTS)  # imputeTS 3.0+ uses na_mean (formerly na.mean)<\/span><span id=\"eec1\" class=\"gc lt lu jl no b do ns nt nu nv nw nq l nr\" data-selectable-paragraph=\"\">na_mean(mydata, option = \"mean\")   # Mean Imputation<br \/>na_mean(mydata, option = \"median\") # Median Imputation<br \/>na_mean(mydata, option = \"mode\")   # Mode Imputation<\/span><span id=\"589d\" class=\"gc lt lu jl no b do ns nt nu nv nw nq l nr\" data-selectable-paragraph=\"\"># In Python (scikit-learn 0.20+; the old preprocessing.Imputer was removed)<br \/>import numpy as np<br \/>from sklearn.impute import SimpleImputer<br \/>values = mydata.values<br \/>imputer = SimpleImputer(missing_values=np.nan, strategy='mean')<br \/>transformed_values = imputer.fit_transform(values)<\/span><span id=\"ec17\" class=\"gc lt lu jl no b do ns nt nu nv nw nq l nr\" data-selectable-paragraph=\"\"># strategy can be changed to 'median' or 'most_frequent'<\/span><\/pre><h2 id=\"ddb8\" class=\"lt lu jl bn lv lw lx ly lz ma mb mc md kt me mf mg kx mh mi mj lb mk ml mm mn gc\" data-selectable-paragraph=\"\"><span class=\"ez-toc-section\" id=\"Regression-lineaire\"><\/span><strong class=\"ba\">Linear regression<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2><p>To begin, several predictors of the variable with missing values are identified using a <a href=\"https:\/\/complex-systems-ai.com\/es\/correlacion-y-regresiones\/\">correlation<\/a> matrix. The best predictors are selected and used as independent variables in a <a href=\"https:\/\/complex-systems-ai.com\/es\/correlacion-y-regresiones\/transformacion-de-datos-y-regresion\/\">regression<\/a> equation. 
The variable with missing data is used as the dependent variable. Cases with complete data for the predictor variables are used to fit the regression equation; the equation is then used to predict the missing values for the incomplete cases.<\/p><p>In an iterative process, the predicted values of the missing variable are inserted, and then all cases are used to predict the dependent variable again. These steps are repeated until the predicted values change little from one step to the next, that is, until they converge.<\/p><p>This \u201ctheoretically\u201d provides good estimates for the missing values. However, this model has several disadvantages that tend to outweigh its advantages. First, because the replaced values were predicted from other variables, they tend to fit together \u201ctoo well\u201d, so the standard error is deflated. 
One must also assume a linear relationship between the variables used in the regression equation, when there may not be one.<\/p><h2 id=\"9f65\" class=\"lt lu jl bn lv lw lx ly lz ma mb mc md kt me mf mg kx mh mi mj lb mk ml mm mn gc\" data-selectable-paragraph=\"\"><span class=\"ez-toc-section\" id=\"Imputation-multiple\"><\/span>Multiple imputation<span class=\"ez-toc-section-end\"><\/span><\/h2><ol class=\"\"><li id=\"ba85\" class=\"mt mu jl kk b kl mo kp mp kt ni kx nj lb nk lf my mz na nb gc\" data-selectable-paragraph=\"\">Imputation: impute the missing entries of the incomplete dataset m times (m = 3 in the figure). Note that the imputed values are drawn from a distribution. Simulating random draws alone does not account for the uncertainty in the model parameters. A better approach is to use Markov Chain Monte Carlo (MCMC) simulation. 
This step results in m complete datasets.<\/li><li id=\"5770\" class=\"mt mu jl kk b kl nc kp nd kt ne kx nf lb ng lf my mz na nb gc\" data-selectable-paragraph=\"\">Analysis: analyze each of the m completed datasets.<\/li><li id=\"b4b3\" class=\"mt mu jl kk b kl nc kp nd kt ne kx nf lb ng lf my mz na nb gc\" data-selectable-paragraph=\"\">Pooling: combine the m analysis results into a single final result.<\/li><\/ol><figure class=\"li lj lk ll gz lm gn go paragraph-image\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15865 size-large\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Cw4F1pzPug0BT5XNdF_P3Q-1024x484.png\" alt=\"\" width=\"1024\" height=\"484\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Cw4F1pzPug0BT5XNdF_P3Q-1024x484.png 1024w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Cw4F1pzPug0BT5XNdF_P3Q-300x142.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Cw4F1pzPug0BT5XNdF_P3Q-768x363.png 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Cw4F1pzPug0BT5XNdF_P3Q-18x9.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Cw4F1pzPug0BT5XNdF_P3Q-600x284.png 600w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_Cw4F1pzPug0BT5XNdF_P3Q.png 1324w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure><pre class=\"li lj lk ll gz nm bt nn\"><span class=\"gc lt lu jl no b do np nq l nr\" data-selectable-paragraph=\"\"># We will use the mice library in R<br \/>library(mice)<br \/># Deterministic regression imputation via mice<br \/>imp &lt;- mice(mydata, method = \"norm.predict\", m = 1)<br \/># Store data<br \/>data_imp &lt;- complete(imp)<\/span><\/pre><pre class=\"li lj lk ll gz nm bt nn\"><span id=\"8683\" class=\"gc lt lu jl no b do np nq l 
nr\" data-selectable-paragraph=\"\"><\/span><span id=\"22d2\" class=\"gc lt lu jl no b do ns nt nu nv nw nq l nr\" data-selectable-paragraph=\"\"># Multiple imputation<br \/>imp &lt;- mice(mydata, m = 5)<\/span><span id=\"c479\" class=\"gc lt lu jl no b do ns nt nu nv nw nq l nr\" data-selectable-paragraph=\"\"># Build the predictive model on each imputed dataset<br \/>fit &lt;- with(data = imp, lm(y ~ x + z))<\/span><span id=\"f2c1\" class=\"gc lt lu jl no b do ns nt nu nv nw nq l nr\" data-selectable-paragraph=\"\"># Combine the results of all 5 models<br \/>combine &lt;- pool(fit)<\/span><\/pre><p id=\"6c4f\" class=\"pw-post-body-paragraph ki kj jl kk b kl km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf je gc\" data-selectable-paragraph=\"\">This is by far the preferred imputation method, for the following reasons:<\/p><ul><li class=\"pw-post-body-paragraph ki kj jl kk b kl km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf je gc\">Easy to use<\/li><li class=\"pw-post-body-paragraph ki kj jl kk b kl km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf je gc\">No bias (if the imputation model is correct)<\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-6e40bc5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6e40bc5\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-fd80859\" data-id=\"fd80859\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-77b25fc 
elementor-widget elementor-widget-heading\" data-id=\"77b25fc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Imputation-de-donnees-categoriques\"><\/span>Imputation of categorical data<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-a142ce1 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"a142ce1\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-7ce87a0\" data-id=\"7ce87a0\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-dec82c0 elementor-widget elementor-widget-text-editor\" data-id=\"dec82c0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol><li>Mode imputation is one method, but it will certainly introduce bias.<\/li><li>Missing values can be treated as a separate category in their own right. We can create an additional category for the missing values and use it as a distinct level. 
This is the simplest method.<\/li><li>Prediction models: here, we build a predictive model to estimate the values that will replace the missing data. In this case, we split the dataset into two sets: one with no missing values for the variable (training) and one with missing values (test). We can use methods such as logistic regression and ANOVA for prediction.<\/li><li>Multiple imputation.<\/li><\/ol>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-6282f9a elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6282f9a\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-8695664\" data-id=\"8695664\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-7611e12 elementor-widget elementor-widget-heading\" data-id=\"7611e12\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Avec-le-machine-learning-knn-voir-les-autres-cours-pour-des-methodes-plus-elaborees\"><\/span>With machine learning (KNN) - see the other courses for more elaborate methods<span 
class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-9945cb6 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"9945cb6\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-16b0f7a\" data-id=\"16b0f7a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-8d7a2d9 elementor-widget elementor-widget-text-editor\" data-id=\"8d7a2d9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Other machine learning techniques, such as XGBoost and Random Forest, can be used for data imputation, but we discuss KNN here because it is widely used. In this method, k neighbors are chosen according to a distance measure, and their average is used as the imputed estimate.<\/p><p>The method requires choosing the number of nearest neighbors and a distance metric. 
KNN can predict both discrete attributes (the most frequent value among the k nearest neighbors) and continuous attributes (the mean among the k nearest neighbors).<br \/>The distance metric depends on the data type:<\/p><ol><li>Continuous data: the distance metrics commonly used for continuous data are Euclidean, Manhattan, and cosine.<\/li><li>Categorical data: the Hamming distance is generally used in this case. It takes all the categorical attributes and, for each one, counts one if the value differs between the two points. The Hamming distance is then the number of attributes for which the values differ.<\/li><\/ol><p>One of the most attractive features of the KNN algorithm is that it is simple to understand and easy to implement. 
The non-parametric nature of KNN gives it an advantage in certain settings where the data may be highly \"unusual\".<\/p><p>One obvious drawback of the KNN algorithm is that it becomes time-consuming on large datasets, because it searches the entire dataset for similar instances.<\/p><p>In addition, the accuracy of KNN can degrade severely with high-dimensional data, because there is little difference between the nearest and the farthest neighbor.<\/p><pre class=\"li lj lk ll gz nm bt nn\"><span id=\"acaf\" class=\"gc lt lu jl no b do np nq l nr\" data-selectable-paragraph=\"\"><strong class=\"no jm\">library<\/strong>(DMwR2)  # successor to DMwR, which was archived on CRAN<br \/>knnOutput &lt;- <strong class=\"no jm\">knnImputation<\/strong>(mydata)<\/span><span id=\"a95d\" class=\"gc lt lu jl no b do ns nt nu nv nw nq l nr\" data-selectable-paragraph=\"\"># In Python<br \/>from fancyimpute import KNN    <br \/><br \/># Use 5 nearest rows which have a feature to fill in each row's missing features<br \/>knnOutput = KNN(k=5).fit_transform(mydata)<\/span><\/pre><p>Of all the methods described above, multiple imputation and KNN are widely used, and multiple imputation, being simpler, is generally preferred.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>P\u00e1gina de inicio de Wiki de an\u00e1lisis de datos Uno de los problemas m\u00e1s comunes que he enfrentado en la limpieza de datos\/an\u00e1lisis exploratorio es la gesti\u00f3n... 
<\/p>","protected":false},"author":1,"featured_media":0,"parent":15503,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-15859","page","type-page","status-publish","hentry"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/15859","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/comments?post=15859"}],"version-history":[{"count":3,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/15859\/revisions"}],"predecessor-version":[{"id":15868,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/15859\/revisions\/15868"}],"up":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/15503"}],"wp:attachment":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/media?parent=15859"}],"curies":[{"name":"gracias","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}