{"id":20824,"date":"2024-02-20T12:33:21","date_gmt":"2024-02-20T11:33:21","guid":{"rendered":"https:\/\/complex-systems-ai.com\/?page_id=20824"},"modified":"2024-02-20T16:34:47","modified_gmt":"2024-02-20T15:34:47","slug":"detection-des-anomalies","status":"publish","type":"page","link":"https:\/\/complex-systems-ai.com\/es\/pronostico-de-prediccion\/deteccion-de-anomalias\/","title":{"rendered":"Anomaly detection in time series"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-page\" data-elementor-id=\"20824\" class=\"elementor elementor-20824\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-f572d00 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"f572d00\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-8e9bde9\" data-id=\"8e9bde9\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-b6fce6d elementor-widget elementor-widget-heading\" data-id=\"b6fce6d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Contents<\/p>\n<\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' >
<li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/complex-systems-ai.com\/es\/pronostico-de-prediccion\/deteccion-de-anomalias\/#Detection-des-anomalies-dans-les-series-temporelles\" >Anomaly detection in time series<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/complex-systems-ai.com\/es\/pronostico-de-prediccion\/deteccion-de-anomalias\/#Contexte\" >Context<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/complex-systems-ai.com\/es\/pronostico-de-prediccion\/deteccion-de-anomalias\/#Point-Outlier\" >Point outliers<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/complex-systems-ai.com\/es\/pronostico-de-prediccion\/deteccion-de-anomalias\/#Sous-sequence-outlier\" >Subsequence outliers<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/complex-systems-ai.com\/es\/pronostico-de-prediccion\/deteccion-de-anomalias\/#Detection-par-decomposition\" >Decomposition-based detection<\/a><\/li><li class='ez-toc-page-1 
ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/complex-systems-ai.com\/es\/pronostico-de-prediccion\/deteccion-de-anomalias\/#Detection-par-arbres-de-regression\" >Tree-based detection (Isolation Forest)<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/complex-systems-ai.com\/es\/pronostico-de-prediccion\/deteccion-de-anomalias\/#Detection-par-prediction\" >Prediction-based detection<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/complex-systems-ai.com\/es\/pronostico-de-prediccion\/deteccion-de-anomalias\/#Detection-par-clustering\" >Clustering-based detection<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/complex-systems-ai.com\/es\/pronostico-de-prediccion\/deteccion-de-anomalias\/#Detection-par-autoencodeur\" >Autoencoder-based detection<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/complex-systems-ai.com\/es\/pronostico-de-prediccion\/deteccion-de-anomalias\/#Gerer-les-anomalies-avec-du-lissage\" >Handling anomalies with smoothing<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/complex-systems-ai.com\/es\/pronostico-de-prediccion\/deteccion-de-anomalias\/#Effacer-les-anomalies\" >Removing anomalies<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Detection-des-anomalies-dans-les-series-temporelles\"><\/span>Anomaly detection in time series<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section 
elementor-top-section elementor-element elementor-element-eeeacb5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"eeeacb5\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-740792d\" data-id=\"740792d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-8f6a93e elementor-widget elementor-widget-text-editor\" data-id=\"8f6a93e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>This tutorial addresses anomaly detection as a preprocessing step for time series forecasting.<\/p><p><img decoding=\"async\" class=\"aligncenter wp-image-11096 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2020\/09\/cropped-Capture.png\" alt=\"anomaly detection\" width=\"97\" height=\"97\" title=\"\"><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-3c0180c elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"3c0180c\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-56cac8b\" data-id=\"56cac8b\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div 
class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-b785c2b elementor-widget elementor-widget-heading\" data-id=\"b785c2b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Contexte\"><\/span>Context<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-bff0922 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"bff0922\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d1d958e\" data-id=\"d1d958e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-99d095f elementor-widget elementor-widget-text-editor\" data-id=\"99d095f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>When performing <a href=\"https:\/\/complex-systems-ai.com\/es\/analisis-de-datos\/\">data analysis<\/a> on time series, we need to check for outliers, just as we do with static data. If you have worked with data in any capacity, you know how much trouble outliers cause an analyst. In time series jargon, these outliers are called \u201canomalies\u201d.<\/p><p>The code by Aayush Bajaj on which this tutorial is based is available here: <a href=\"https:\/\/app.neptune.ai\/theaayushbajaj\/Anomaly-Detection\/n\/49ba1752-fc3a-4abb-b35f-0e2ea4fd4afa\/48dc19d8-3c75-4989-a2c0-67839393a093\" target=\"_blank\" rel=\"noopener\">PIPELINE<\/a><\/p><p>In the traditional sense, an outlier (or anomaly) is:<\/p><p>\u201cAn observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism.\u201d<\/p><p>In other words, outliers are observations that do not follow the expected behavior.<\/p>
<p><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone wp-image-20830 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast15.webp\" alt=\"outliers time series\" width=\"1000\" height=\"345\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast15.webp 1000w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast15-300x104.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast15-768x265.webp 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast15-18x6.webp 18w\" sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><\/p><p>As the figure above shows, outliers in time series can have two different meanings. The semantic distinction between them depends mainly on your interest as an analyst or on the particular scenario.<\/p><p>In the first case, outliers are associated with noise or with erroneous or unwanted data, which are of no interest to the analyst in themselves. These outliers should be removed or corrected to improve data quality and produce a cleaner dataset that other data mining algorithms can use. For example, sensor transmission errors are removed to obtain more accurate predictions, because the main goal is forecasting.<\/p><p>In the second case, the outlier itself is what the analyst is looking for. In recent years, particularly for time series data, many researchers have worked on detecting and analyzing unusual but interesting phenomena. Fraud detection is a good example: the main goal is to detect and analyze the outlier itself. Such observations are often called anomalies.<\/p><p>The anomaly detection problem for time series is usually formulated as identifying data points that are outlying with respect to a norm or to the usual signal. Here are a few types of outliers:<\/p>
<p><img decoding=\"async\" class=\"alignnone size-medium wp-image-20831\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast16-300x193.webp\" alt=\"outliers time series\" width=\"300\" height=\"193\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast16-300x193.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast16-18x12.webp 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast16.webp 587w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-dac16af elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"dac16af\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-76c4f83\" data-id=\"76c4f83\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-7ed0ce0 elementor-widget elementor-widget-heading\" data-id=\"7ed0ce0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Point-Outlier\"><\/span>Point outliers<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section 
elementor-top-section elementor-element elementor-element-7778b00 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"7778b00\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3166729\" data-id=\"3166729\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-44ad6ce elementor-widget elementor-widget-text-editor\" data-id=\"44ad6ce\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>A point outlier is a data point that behaves unusually at a specific time instant compared with the other values of the time series (global outlier) or with its neighboring points (local outlier).<\/p><p>Point outliers can be univariate or multivariate, depending on whether they affect one or several time-dependent variables.<\/p><p>In the figure below, the univariate time series contains two univariate point outliers, O1 and O2, while the multivariate time series, composed of three variables, contains both univariate (O3) and multivariate (O1 and O2) point outliers.<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20832 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast17.webp\" alt=\"outlier forecasting\" width=\"750\" height=\"301\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast17.webp 
750w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast17-300x120.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast17-18x7.webp 18w\" sizes=\"(max-width: 750px) 100vw, 750px\" \/><\/p><p>We take a closer look at univariate outliers in the anomaly detection sections that follow.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-f45dd49 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"f45dd49\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a7991b7\" data-id=\"a7991b7\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-441acba elementor-widget elementor-widget-heading\" data-id=\"441acba\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Sous-sequence-outlier\"><\/span>Subsequence outliers<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-105468d elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"105468d\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div 
class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-7dbb92a\" data-id=\"7dbb92a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-606641f elementor-widget elementor-widget-text-editor\" data-id=\"606641f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>A subsequence outlier is a run of consecutive points in time whose joint behavior is unusual, even though each observation taken individually is not necessarily a point outlier. Subsequence outliers can also be global or local, and can affect one time-dependent variable (univariate subsequence outlier) or several (multivariate subsequence outlier).<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20833 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast18.webp\" alt=\"outlier forecasting\" width=\"1000\" height=\"394\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast18.webp 1000w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast18-300x118.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast18-768x303.webp 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast18-18x7.webp 18w\" sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section 
elementor-element elementor-element-89131f8 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"89131f8\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4ae7c1d\" data-id=\"4ae7c1d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-c662c87 elementor-widget elementor-widget-heading\" data-id=\"c662c87\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Detection-par-decomposition\"><\/span>Decomposition-based detection<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-5810443 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"5810443\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-58e2e90\" data-id=\"58e2e90\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-9530c70 elementor-widget elementor-widget-text-editor\" data-id=\"9530c70\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>STL stands for \u201cSeasonal-Trend decomposition procedure based on LOESS\u201d. This technique splits a time series signal into three components: seasonal, trend and residual.<\/p><p>It works on seasonal time series, which are also the most common kind of time series data. To produce the decomposition plot, we let the statsmodels library do the heavy lifting. Note that <code>seasonal_decompose<\/code>, used below, performs a classical decomposition based on moving averages rather than LOESS; statsmodels provides the <code>STL<\/code> class for a true STL decomposition.<\/p><pre class=\"hljs\"><span class=\"hljs-keyword\">import<\/span> matplotlib.pyplot <span class=\"hljs-keyword\">as<\/span> plt\n<span class=\"hljs-keyword\">from<\/span> statsmodels.tsa.seasonal <span class=\"hljs-keyword\">import<\/span> seasonal_decompose\n<span class=\"hljs-comment\"># Assumed: lim_catfish_sales is a pandas Series with a regular DatetimeIndex, built earlier in the original pipeline<\/span>\nplt.rc(<span class=\"hljs-string\">'figure'<\/span>, figsize=(<span class=\"hljs-number\">12<\/span>, <span class=\"hljs-number\">8<\/span>))\nplt.rc(<span class=\"hljs-string\">'font'<\/span>, size=<span class=\"hljs-number\">15<\/span>)\nresult = seasonal_decompose(lim_catfish_sales, model=<span class=\"hljs-string\">'additive'<\/span>)  <span class=\"hljs-comment\"># series = trend + seasonal + residual<\/span>\nfig = result.plot()<\/pre><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20834 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast19.webp\" alt=\"outlier forecasting\" width=\"845\" height=\"558\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast19.webp 845w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast19-300x198.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast19-768x507.webp 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast19-18x12.webp 18w\" sizes=\"(max-width: 845px) 100vw, 845px\" \/><\/p>
<p>If we analyze the deviation of the residuals and set a threshold on it, we obtain an anomaly detection <a href=\"https:\/\/complex-systems-ai.com\/es\/algoritmico\/\">algorithm<\/a>. To implement it, we only need the residual component of the decomposition. In the code below, the anomaly is annotated by hand; the last lines show one way to apply a threshold, using an illustrative rule (an assumption, not the author's setting) that flags residuals lying more than three standard deviations from their mean.<\/p><pre class=\"hljs\"><span class=\"hljs-keyword\">import<\/span> matplotlib.pyplot <span class=\"hljs-keyword\">as<\/span> plt\n<span class=\"hljs-keyword\">import<\/span> matplotlib.dates <span class=\"hljs-keyword\">as<\/span> mdates\nplt.rc(<span class=\"hljs-string\">'figure'<\/span>, figsize=(<span class=\"hljs-number\">12<\/span>, <span class=\"hljs-number\">6<\/span>))\nplt.rc(<span class=\"hljs-string\">'font'<\/span>, size=<span class=\"hljs-number\">15<\/span>)\nfig, ax = plt.subplots()\n<span class=\"hljs-comment\"># Residual component of the decomposition (seasonal_decompose leaves NaN at both ends)<\/span>\nx = result.resid.index\ny = result.resid.values\n<span class=\"hljs-comment\"># ax.plot replaces Axes.plot_date, which is deprecated in recent Matplotlib releases<\/span>\nax.plot(x, y, color=<span class=\"hljs-string\">'black'<\/span>, linestyle=<span class=\"hljs-string\">'--'<\/span>, marker=<span class=\"hljs-string\">'o'<\/span>)\n<span class=\"hljs-comment\"># Position 35 is the anomaly singled out by hand in this series<\/span>\nax.annotate(<span class=\"hljs-string\">'Anomaly'<\/span>, (mdates.date2num(x[<span class=\"hljs-number\">35<\/span>]), y[<span class=\"hljs-number\">35<\/span>]), xytext=(<span class=\"hljs-number\">30<\/span>, <span class=\"hljs-number\">20<\/span>),\n            textcoords=<span class=\"hljs-string\">'offset points'<\/span>, color=<span class=\"hljs-string\">'red'<\/span>, arrowprops=dict(facecolor=<span class=\"hljs-string\">'red'<\/span>, arrowstyle=<span class=\"hljs-string\">'fancy'<\/span>))\nfig.autofmt_xdate()\nplt.show()\n<span class=\"hljs-comment\"># Illustrative threshold (assumption): flag residuals more than 3 standard deviations from their mean<\/span>\nresid = result.resid.dropna()\nprint(resid[(resid - resid.mean()).abs() &gt; <span class=\"hljs-number\">3<\/span> * resid.std()])<\/pre><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20835 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast20.webp\" alt=\"outlier forecasting\" width=\"724\" height=\"346\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast20.webp 724w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast20-300x143.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast20-18x9.webp 18w\" sizes=\"(max-width: 724px) 100vw, 724px\" \/><\/p>
<p>This approach is simple and robust, handles many different situations, and every anomaly it flags can be interpreted intuitively.<\/p><p>Its biggest drawback is its rigid tuning options: apart from the threshold and perhaps the confidence interval, there is little you can adjust. For example, suppose you track the users of a website that was closed to the public and then suddenly opened. In that case, you have to track anomalies before and after the launch separately, since the launch changes the behavior of the whole series.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-281d372 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"281d372\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4ad94d3\" data-id=\"4ad94d3\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-f87dab7 elementor-widget elementor-widget-heading\" data-id=\"f87dab7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Detection-par-arbres-de-regression\"><\/span>Tree-based detection (Isolation Forest)<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-64ecff8 
elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"64ecff8\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-2210523\" data-id=\"2210523\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ab99969 elementor-widget elementor-widget-text-editor\" data-id=\"ab99969\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>We can use the power and robustness of decision trees to identify outliers\/anomalies in time-series data.<\/p><p>The main idea, which sets it apart from other popular outlier detection methods, is that Isolation Forest explicitly isolates anomalies instead of profiling normal data points. Like any tree-ensemble method, Isolation Forest is built on decision trees.<\/p><p>In other words, Isolation Forest detects anomalies solely because anomalies are few and different. As a result, random splits isolate them after only a few partitions. Isolating anomalies does not require any distance or density measure.<\/p><p>When fitting an IsolationForest model, we set contamination = outliers_fraction. This tells the model what proportion of outliers is present in the data, and the value is usually found by trial and error. 
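<\/p><p>To make the isolation idea concrete, here is a toy re-implementation in NumPy, written for intuition only. It is not scikit-learn's code, and the function names are hypothetical. For each point, it follows random splits until the point is alone or a depth limit is reached. Each split picks a random feature and a split value drawn uniformly between the current minimum and maximum. The score s = 2^(-E[h(x)] \/ c(n)) follows the original Isolation Forest paper by Liu, Ting and Zhou (2008). Here h(x) is the path length and c(n) is the average path length for n points. The default subsample size of 256 also comes from that paper. This sketch draws one random path per point and per tree instead of building full shared trees; for a given point, that gives the same path-length distribution.<\/p><pre class=\"hljs\">import numpy as np\n\ndef c(n):\n    # average path length of an unsuccessful search in a binary search tree of n points\n    if n &lt;= 1:\n        return 0.0\n    if n == 2:\n        return 1.0\n    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) \/ n\n\ndef path_length(x, X, rng, depth, max_depth):\n    # follow x down one random isolation tree grown on the sample X\n    n = len(X)\n    if n &lt;= 1 or depth &gt;= max_depth:\n        return depth + c(n)  # c(n) estimates the depth of the unbuilt subtree\n    q = rng.integers(X.shape[1])  # random feature\n    lo, hi = X[:, q].min(), X[:, q].max()\n    if lo == hi:\n        return depth + c(n)\n    split = rng.uniform(lo, hi)  # random split value\n    left = X[:, q] &lt; split\n    subset = X[left] if x[q] &lt; split else X[~left]\n    return path_length(x, subset, rng, depth + 1, max_depth)\n\ndef isolation_scores(X, n_trees=100, psi=256, seed=0):\n    # s = 2 ** (-mean path length \/ c(psi)): close to 1 means easy to isolate\n    rng = np.random.default_rng(seed)\n    psi = min(psi, len(X))\n    max_depth = int(np.ceil(np.log2(psi)))\n    scores = np.empty(len(X))\n    for i, x in enumerate(X):\n        lengths = [path_length(x, X[rng.choice(len(X), psi, replace=False)], rng, 0, max_depth)\n                   for _ in range(n_trees)]\n        scores[i] = 2.0 ** (-np.mean(lengths) \/ c(psi))\n    return scores\n\nrng = np.random.default_rng(42)\nX = rng.normal(0.0, 1.0, size=(200, 1))  # mostly normal points\nX[0] = 8.0  # one planted anomaly\nscores = isolation_scores(X)\nprint(scores.argmax(), round(scores[0], 2))<\/pre><p>Scores close to 1 indicate anomalies, while scores well below 0.5 indicate normal points. scikit-learn's IsolationForest turns such scores into the 1 and -1 labels using the contamination fraction.<\/p><p>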
Calling fit_predict(data), or fit followed by predict as below, runs outlier detection on the data. It returns 1 for normal points and -1 for anomalies. Finally, we plot the anomalies on the time series.<\/p><p>Let's do it step by step. First, plot the time series:<\/p><pre class=\"hljs\">plt.rc(<span class=\"hljs-string\">'figure'<\/span>,figsize=(<span class=\"hljs-number\">12<\/span>,<span class=\"hljs-number\">6<\/span>))\nplt.rc(<span class=\"hljs-string\">'font'<\/span>,size=<span class=\"hljs-number\">15<\/span>)\ncatfish_sales.plot()<\/pre><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20836 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast21.webp\" alt=\"outlier forecasting\" width=\"724\" height=\"374\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast21.webp 724w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast21-300x155.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast21-18x9.webp 18w\" sizes=\"(max-width: 724px) 100vw, 724px\" \/><\/p><p>Next, we set a few parameters, such as the outlier fraction, and train our IsolationForest model. 
We can use the very handy scikit-learn library to implement the Isolation Forest algorithm (see the Git repository linked at the top of the page).<\/p><pre class=\"hljs\"><span class=\"hljs-keyword\">import<\/span> pandas <span class=\"hljs-keyword\">as<\/span> pd\n<span class=\"hljs-keyword\">from<\/span> sklearn.preprocessing <span class=\"hljs-keyword\">import<\/span> StandardScaler\n<span class=\"hljs-keyword\">from<\/span> sklearn.ensemble <span class=\"hljs-keyword\">import<\/span> IsolationForest\n\noutliers_fraction = <span class=\"hljs-number\">0.01<\/span>\nscaler = StandardScaler()\n<span class=\"hljs-comment\"># scale the 'Total' column (a 2-D selection, as scikit-learn expects)<\/span>\nnp_scaled = scaler.fit_transform(catfish_sales[[<span class=\"hljs-string\">'Total'<\/span>]])\ndata = pd.DataFrame(np_scaled, index=catfish_sales.index)\n<span class=\"hljs-comment\"># train isolation forest<\/span>\nmodel = IsolationForest(contamination=outliers_fraction, random_state=<span class=\"hljs-number\">42<\/span>)\nmodel.fit(data)<\/pre><pre class=\"hljs\">catfish_sales[<span class=\"hljs-string\">'anomaly'<\/span>] = model.predict(data)\n<span class=\"hljs-comment\"># visualization<\/span>\nfig, ax = plt.subplots(figsize=(<span class=\"hljs-number\">10<\/span>,<span class=\"hljs-number\">6<\/span>))\na = catfish_sales.loc[catfish_sales[<span class=\"hljs-string\">'anomaly'<\/span>] == <span class=\"hljs-number\">-1<\/span>, [<span class=\"hljs-string\">'Total'<\/span>]] <span class=\"hljs-comment\">#anomaly<\/span>\nax.plot(catfish_sales.index, catfish_sales[<span class=\"hljs-string\">'Total'<\/span>], color=<span class=\"hljs-string\">'black'<\/span>, label = <span class=\"hljs-string\">'Normal'<\/span>)\nax.scatter(a.index,a[<span class=\"hljs-string\">'Total'<\/span>], color=<span class=\"hljs-string\">'red'<\/span>, label = <span class=\"hljs-string\">'Anomaly'<\/span>)\nplt.legend()\nplt.show();<\/pre><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20837 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast22.webp\" alt=\"outlier forecasting\" width=\"612\" height=\"360\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast22.webp 612w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast22-300x176.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast22-18x12.webp 
18w\" sizes=\"(max-width: 612px) 100vw, 612px\" \/><\/p><p>As you can see, the algorithm did a very good job of identifying our planted anomalies. However, it also labeled a few points at the beginning as outliers, for two reasons:<\/p><ul><li>The model only sees the scaled sales values, not when they occurred. Values near the edges of the overall distribution, such as these early points, are therefore isolated almost as easily as genuine anomalies.<\/li><li>If you see many false positives, your contamination parameter is set too high. Conversely, if red points are missing where they should be, it is set too low.<\/li><\/ul><p>The biggest advantage of this technique is that you can add as many random variables or features as you like to build more sophisticated models.<\/p><p>The weakness is that a growing number of features can quickly start to hurt computational performance. In that case, you need to select features carefully.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-ff24d45 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"ff24d45\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-455672d\" data-id=\"455672d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-4b7453a elementor-widget elementor-widget-heading\" data-id=\"4b7453a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Detection-par-prediction\"><\/span>Detection with forecasting<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-e4202fd elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"e4202fd\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4775b7c\" data-id=\"4775b7c\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div
class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-0ef4dba elementor-widget elementor-widget-text-editor\" data-id=\"0ef4dba\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Anomaly detection with forecasting relies on the idea that several past points generate a forecast of the next point. A random term, usually white noise, is added to that forecast.<\/p><p>As you can imagine, forecast points in turn generate new points, and so on. The obvious effect over the forecast horizon is that the signal becomes smoother.<\/p><p>The hard part of this method is choosing the model orders. You have to set the number of differences, the number of autoregressive terms and the number of forecast-error (moving-average) terms, that is, the d, p and q orders of an ARIMA model. Every time you work with a new signal, you have to build a new forecasting model.<\/p><p>Another obstacle is that your signal must be stationary after differencing. Put simply, its statistical properties, such as its mean and variance, must not depend on time. This is a significant constraint.<\/p><p>We can use different forecasting methods, such as moving averages, the autoregressive approach, and ARIMA with its variants. 
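<\/p><p>The snippet below is a minimal sketch of forecast-based detection using the simplest of these methods. It takes a trailing moving average as the one-step forecast. It then flags the points whose forecast error exceeds k standard deviations of all forecast errors. The helper name, the window of 5 points and the threshold k=3 are illustrative assumptions, not values from this page. In practice, an ARIMA or Prophet forecast would take the place of the moving average.<\/p><pre class=\"hljs\">import numpy as np\n\ndef moving_average_anomalies(y, window=5, k=3.0):\n    # hypothetical helper: one-step forecast = mean of the previous window values\n    y = np.asarray(y, dtype=float)\n    forecast = np.full(len(y), np.nan)\n    for t in range(window, len(y)):\n        forecast[t] = y[t - window:t].mean()\n    residual = y - forecast  # forecast error, NaN where no forecast exists\n    valid = ~np.isnan(residual)\n    sigma = residual[valid].std()\n    flags = np.zeros(len(y), dtype=bool)\n    # flag points whose absolute forecast error exceeds k standard deviations\n    flags[valid] = np.abs(residual[valid]) &gt; k * sigma\n    return forecast, flags\n\nrng = np.random.default_rng(0)\ny = 10.0 + rng.normal(0.0, 1.0, size=100)  # noise around a constant level\ny[60] += 15.0  # planted spike\nforecast, flags = moving_average_anomalies(y)\nprint(np.flatnonzero(flags))<\/pre><p>The spread of the errors is estimated with the anomalies included, which inflates it. A robust estimate such as the median absolute deviation is a common alternative.<\/p><p>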
The procedure for detecting anomalies with ARIMA is as follows:<\/p><ul><li>Predict the new point from past data and compute the magnitude of its difference from the observed value in the training data.<\/li><li>Choose a threshold, and flag as anomalies the points whose difference exceeds it.<\/li><\/ul><p>To test this technique, we will use a popular time-series library called Prophet. It was installed as fbprophet before version 1.0 and as prophet since then. It is designed to handle trend and seasonality, and it can be tuned with a few hyperparameters.<\/p><pre class=\"hljs\"><span class=\"hljs-comment\"># prophet &gt;= 1.0; with older releases use: from fbprophet import Prophet<\/span>\n<span class=\"hljs-keyword\">from<\/span> prophet <span class=\"hljs-keyword\">import<\/span> Prophet<\/pre><pre class=\"hljs\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">fit_predict_model<\/span><span class=\"hljs-params\">(dataframe, interval_width = <span class=\"hljs-number\">0.99<\/span>, changepoint_range = <span class=\"hljs-number\">0.8<\/span>)<\/span>:<\/span>\n   <span class=\"hljs-comment\"># dataframe: columns 'ds' (dates) and 'y' (values), the input format Prophet expects<\/span>\n   m = Prophet(daily_seasonality = <span class=\"hljs-keyword\">False<\/span>, yearly_seasonality = <span class=\"hljs-keyword\">False<\/span>, weekly_seasonality = <span class=\"hljs-keyword\">False<\/span>,\n               seasonality_mode = <span class=\"hljs-string\">'additive'<\/span>,\n               interval_width = interval_width,\n               changepoint_range = changepoint_range)\n   m = m.fit(dataframe)\n   forecast = m.predict(dataframe)\n   forecast[<span class=\"hljs-string\">'fact'<\/span>] = dataframe[<span class=\"hljs-string\">'y'<\/span>].reset_index(drop = <span class=\"hljs-keyword\">True<\/span>)\n   <span class=\"hljs-keyword\">return<\/span> forecast\n\n<span class=\"hljs-comment\"># t is assumed to hold the series in that ds\/y format<\/span>\npred = fit_predict_model(t)<\/pre><p>The function above fits the Prophet model and returns its forecast for the same dates. Note that Prophet adds extra columns to its output that help identify anomalies. These include the predicted value of the series (yhat), the lower and upper bounds of its uncertainty interval (yhat_lower, yhat_upper) and the trend component (trend).<\/p><p>We now pass the pred variable to another function. It flags as anomalies the points that fall outside the interval between these lower and upper bounds.<\/p><pre class=\"hljs\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">detect_anomalies<\/span><span class=\"hljs-params\">(forecast)<\/span>:<\/span>\n   forecasted = forecast[[<span class=\"hljs-string\">'ds'<\/span>,<span class=\"hljs-string\">'trend'<\/span>, <span class=\"hljs-string\">'yhat'<\/span>, <span class=\"hljs-string\">'yhat_lower'<\/span>, <span class=\"hljs-string\">'yhat_upper'<\/span>, <span class=\"hljs-string\">'fact'<\/span>]].copy()\n   forecasted[<span class=\"hljs-string\">'anomaly'<\/span>] = <span class=\"hljs-number\">0<\/span>\n   <span class=\"hljs-comment\"># 1: above the upper bound, -1: below the lower bound<\/span>\n   forecasted.loc[forecasted[<span class=\"hljs-string\">'fact'<\/span>] &gt; forecasted[<span class=\"hljs-string\">'yhat_upper'<\/span>], <span class=\"hljs-string\">'anomaly'<\/span>] = <span class=\"hljs-number\">1<\/span>\n   forecasted.loc[forecasted[<span class=\"hljs-string\">'fact'<\/span>] &lt; forecasted[<span class=\"hljs-string\">'yhat_lower'<\/span>], <span class=\"hljs-string\">'anomaly'<\/span>] = <span class=\"hljs-number\">-1<\/span>\n   <span class=\"hljs-comment\"># anomaly importance: distance to the violated bound, relative to the observed value<\/span>\n   forecasted[<span class=\"hljs-string\">'importance'<\/span>] = <span class=\"hljs-number\">0.0<\/span>\n   forecasted.loc[forecasted[<span class=\"hljs-string\">'anomaly'<\/span>] == <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-string\">'importance'<\/span>] = (\n       (forecasted[<span class=\"hljs-string\">'fact'<\/span>] - forecasted[<span class=\"hljs-string\">'yhat_upper'<\/span>]) \/ forecasted[<span class=\"hljs-string\">'fact'<\/span>])\n   forecasted.loc[forecasted[<span class=\"hljs-string\">'anomaly'<\/span>] == <span class=\"hljs-number\">-1<\/span>, <span class=\"hljs-string\">'importance'<\/span>] = (\n       (forecasted[<span class=\"hljs-string\">'yhat_lower'<\/span>] - forecasted[<span class=\"hljs-string\">'fact'<\/span>]) \/ forecasted[<span class=\"hljs-string\">'fact'<\/span>])\n\n   <span class=\"hljs-keyword\">return<\/span> forecasted\n\npred = detect_anomalies(pred)<\/pre><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20838 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast23.webp\" alt=\"outlier forecasting\" width=\"933\" height=\"516\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast23.webp 933w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast23-300x166.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast23-768x425.webp 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast23-18x10.webp 18w\" sizes=\"(max-width: 933px) 100vw, 933px\" \/><\/p><p>This algorithm handles different seasonality settings well, such as monthly or yearly ones, and works natively with any time-series metric.<\/p><p>Because this technique is based on forecasting, it struggles when data is limited. With less data, forecast quality drops, and so does anomaly detection accuracy.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-f9927a6 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"f9927a6\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-968e7a2\" data-id=\"968e7a2\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-7e0f525 elementor-widget elementor-widget-heading\" data-id=\"7e0f525\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Detection-par-clustering\"><\/span>Detection with clustering<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-aef4d47 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"aef4d47\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-aaeb51d\" data-id=\"aaeb51d\" 
data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-7f424ed elementor-widget elementor-widget-text-editor\" data-id=\"7f424ed\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The approach is quite simple. Data instances that do not belong to the defined clusters can potentially be flagged as anomalies.<\/p><p>For visualization purposes, we will use a different dataset: a multivariate time series with one or more time-dependent variables. The dataset is a subset of the one found here (the columns, or features, are the same).<\/p><p>Dataset description: the data contains shopping and purchase information, as well as information on price competitiveness.<\/p><p>To run k-means, we first need to know the <a href=\"https:\/\/complex-systems-ai.com\/es\/particionamiento-de-datos\/calidad-sobre-numero-de-clusteres\/\">number of clusters<\/a> to use. The elbow method works quite well for this.<\/p><p>The elbow method plots the number of clusters against the explained variance (or the objective, or the score).<\/p><p>To implement it, we use scikit-learn's K-means implementation. KMeans.score returns the opposite of the k-means objective, that is, the negative within-cluster sum of squares. The curve therefore rises toward 0 as clusters are added.<\/p><pre class=\"hljs\"><span class=\"hljs-keyword\">from<\/span> sklearn.cluster <span class=\"hljs-keyword\">import<\/span> KMeans\n\ndata = df[[<span class=\"hljs-string\">'price_usd'<\/span>, <span class=\"hljs-string\">'srch_booking_window'<\/span>, <span class=\"hljs-string\">'srch_saturday_night_bool'<\/span>]]\nn_cluster = range(<span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">20<\/span>)\n<span class=\"hljs-comment\"># one model per candidate number of clusters; n_init is set explicitly since its default changed across scikit-learn versions<\/span>\nkmeans = [KMeans(n_clusters=i, n_init=<span class=\"hljs-number\">10<\/span>).fit(data) <span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> n_cluster]\nscores = [km.score(data) <span class=\"hljs-keyword\">for<\/span> km <span class=\"hljs-keyword\">in<\/span> kmeans]\nfig, ax = plt.subplots(figsize=(<span class=\"hljs-number\">10<\/span>,<span class=\"hljs-number\">6<\/span>))\nax.plot(n_cluster, scores)\nplt.xlabel(<span class=\"hljs-string\">'Number of Clusters'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Score'<\/span>)\nplt.title(<span class=\"hljs-string\">'Elbow Curve'<\/span>)\nplt.show();<\/pre><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20839 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast24.webp\" alt=\"outlier forecasting\" width=\"639\" height=\"388\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast24.webp 639w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast24-300x182.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast24-18x12.webp 18w\" sizes=\"(max-width: 639px) 100vw, 639px\" \/><\/p><p>From the elbow curve above, we see that the graph levels off after 10 clusters. This implies that adding more clusters does not explain much more of the variance of our variable of interest, in this case price_usd.<\/p><p>We set n_clusters = 10 and use the k-means output to plot the clusters in 3D.<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20840 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast25.webp\" alt=\"outlier forecasting\" width=\"605\" height=\"483\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast25.webp 605w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast25-300x240.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast25-15x12.webp 15w\" sizes=\"(max-width: 605px) 100vw, 605px\" \/><\/p><p>We now need to decide how many components (features) to keep.<\/p><pre class=\"hljs\"><span class=\"hljs-keyword\">import<\/span> numpy <span class=\"hljs-keyword\">as<\/span> np\n<span class=\"hljs-keyword\">from<\/span> sklearn.preprocessing <span class=\"hljs-keyword\">import<\/span> StandardScaler\n\ndata = df[[<span class=\"hljs-string\">'price_usd'<\/span>, <span class=\"hljs-string\">'srch_booking_window'<\/span>, <span class=\"hljs-string\">'srch_saturday_night_bool'<\/span>]]\nX = data.values\nX_std = StandardScaler().fit_transform(X)\n<span class=\"hljs-comment\"># Eigenvalues and eigenvectors of the covariance matrix (eigh is meant for symmetric matrices)<\/span>\ncov_mat = np.cov(X_std.T)\neig_vals, eig_vecs = np.linalg.eigh(cov_mat)\n<span class=\"hljs-comment\"># Create a list of (eigenvalue, eigenvector) tuples, largest eigenvalue first<\/span>\neig_pairs = [ (np.abs(eig_vals[i]),eig_vecs[:,i]) <span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> range(len(eig_vals))]\neig_pairs.sort(key = <span class=\"hljs-keyword\">lambda<\/span> x: x[<span class=\"hljs-number\">0<\/span>], reverse= <span class=\"hljs-keyword\">True<\/span>)\n<span class=\"hljs-comment\"># Calculation of Explained Variance from the eigenvalues<\/span>\ntot = sum(eig_vals)\nvar_exp = [(i\/tot)*<span class=\"hljs-number\">100<\/span> <span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> sorted(eig_vals, reverse=<span class=\"hljs-keyword\">True<\/span>)] <span class=\"hljs-comment\"># Individual explained variance (%)<\/span>\ncum_var_exp = np.cumsum(var_exp) <span class=\"hljs-comment\"># Cumulative explained variance (%)<\/span>\nplt.figure(figsize=(<span class=\"hljs-number\">10<\/span>, <span class=\"hljs-number\">5<\/span>))\nplt.bar(range(len(var_exp)), var_exp, alpha=<span class=\"hljs-number\">0.3<\/span>, align=<span class=\"hljs-string\">'center'<\/span>, label=<span class=\"hljs-string\">'individual explained variance'<\/span>, color = <span class=\"hljs-string\">'y'<\/span>)\nplt.step(range(len(cum_var_exp)), cum_var_exp, where=<span class=\"hljs-string\">'mid'<\/span>,label=<span class=\"hljs-string\">'cumulative explained variance'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Explained variance (%)'<\/span>)\nplt.xlabel(<span class=\"hljs-string\">'Principal components'<\/span>)\nplt.legend(loc=<span class=\"hljs-string\">'best'<\/span>)\nplt.show();<\/pre><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20841 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast26.webp\" alt=\"outlier detection\" width=\"614\" height=\"319\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast26.webp 614w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast26-300x156.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast26-18x9.webp 18w\" sizes=\"(max-width: 614px) 100vw, 614px\" \/><\/p><p>We can see that the first component explains nearly 50% of the variance, and the second more than 30%. Note, however, that none of the components is truly negligible. 
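<\/p><p>Reading the number of components off the cumulative curve can also be automated. The sketch below is a hypothetical helper that assumes an 80% target. It standardizes the features and takes the eigenvalues of their covariance matrix with numpy.linalg.eigh. That function is meant for symmetric matrices such as this one, and it returns eigenvalues in ascending order. The helper then keeps the smallest number of components whose cumulative explained variance reaches the target. The synthetic data (two strongly correlated columns plus one independent column) is also an assumption for illustration.<\/p><pre class=\"hljs\">import numpy as np\n\ndef n_components_for(X, target=0.80):\n    # hypothetical helper: standardize, then eigen-decompose the covariance matrix\n    X_std = (X - X.mean(axis=0)) \/ X.std(axis=0)\n    eig_vals = np.linalg.eigh(np.cov(X_std.T))[0][::-1]  # largest first\n    cum_ratio = np.cumsum(eig_vals) \/ eig_vals.sum()  # cumulative explained variance ratio\n    # smallest k whose first k components reach the target\n    k = min(int(np.searchsorted(cum_ratio, target)) + 1, len(eig_vals))\n    return k, cum_ratio\n\nrng = np.random.default_rng(0)\nz = rng.normal(size=(500, 1))\nX = np.hstack([z + 0.1 * rng.normal(size=(500, 1)),\n               z + 0.1 * rng.normal(size=(500, 1)),\n               rng.normal(size=(500, 1))])\nk, cum_ratio = n_components_for(X)\nprint(k, np.round(cum_ratio, 2))<\/pre><p>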
The first two components contain more than 80% of the information, so we set n_components=2.<\/p><p>The assumption underlying <a href=\"https:\/\/complex-systems-ai.com\/es\/particionamiento-de-datos\/\">clustering<\/a>-based anomaly detection is that, once the data is clustered, normal data will belong to clusters. Anomalies, in contrast, will belong to no cluster or to small clusters.<\/p><p>We use the following steps to find and visualize the anomalies:<\/p><ul><li>Compute the distance between each point and its nearest centroid. The largest distances are considered anomalies.<\/li><li>Use outliers_fraction to tell the algorithm what proportion of outliers is present in the dataset, just as with IsolationForest. It is largely a hyperparameter that needs trial and error or a grid search to be set properly. As a starting point, let's take outliers_fraction=0.1.<\/li><li>Compute number_of_outliers from outliers_fraction.<\/li><li>Set the threshold to the smallest distance among these outliers.<\/li><li>Store the result of this clustering method in the anomaly1 column (0: normal, 1: anomaly).<\/li><li>Visualize the anomalies in the cluster view.<\/li><li>Visualize the anomalies in the time-series view.<\/li><\/ul><pre class=\"hljs\"><span class=\"hljs-keyword\">import<\/span> numpy <span class=\"hljs-keyword\">as<\/span> np\n<span class=\"hljs-keyword\">import<\/span> pandas <span class=\"hljs-keyword\">as<\/span> pd\n\n<span class=\"hljs-comment\"># return a Series with the distance between each point and the centroid of its cluster<\/span>\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">getDistanceByPoint<\/span><span class=\"hljs-params\">(data, model)<\/span>:<\/span>\n   <span class=\"hljs-comment\"># labels_[i] is the row of cluster_centers_ holding point i's centroid<\/span>\n   centroids = model.cluster_centers_[model.labels_]\n   <span class=\"hljs-keyword\">return<\/span> pd.Series(np.linalg.norm(data.values - centroids, axis=<span class=\"hljs-number\">1<\/span>), index=data.index)\n\noutliers_fraction = <span class=\"hljs-number\">0.1<\/span>\n<span class=\"hljs-comment\"># get the distance between each point and its nearest centroid; the biggest distances are considered anomalies<\/span>\ndistance = getDistanceByPoint(data, kmeans[<span class=\"hljs-number\">9<\/span>])\nnumber_of_outliers = int(outliers_fraction*len(distance))\nthreshold = distance.nlargest(number_of_outliers).min()\n<span class=\"hljs-comment\"># anomaly1 contains the anomaly result of the cluster method above (0: normal, 1: anomaly)<\/span>\ndf[<span class=\"hljs-string\">'anomaly1'<\/span>] = (distance &gt;= threshold).astype(int)\nfig, ax = plt.subplots(figsize=(<span class=\"hljs-number\">10<\/span>,<span class=\"hljs-number\">6<\/span>))\ncolors = {<span class=\"hljs-number\">0<\/span>:<span class=\"hljs-string\">'blue'<\/span>, <span class=\"hljs-number\">1<\/span>:<span class=\"hljs-string\">'red'<\/span>}\nax.scatter(df[<span class=\"hljs-string\">'principal_feature1'<\/span>], df[<span class=\"hljs-string\">'principal_feature2'<\/span>], c=df[<span class=\"hljs-string\">\"anomaly1\"<\/span>].apply(<span class=\"hljs-keyword\">lambda<\/span> x: colors[x]))\nplt.xlabel(<span class=\"hljs-string\">'principal feature1'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'principal feature2'<\/span>)\nplt.show();<\/pre><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20842 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast27.webp\" alt=\"outlier forecasting\" width=\"617\" height=\"374\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast27.webp 617w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast27-300x182.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast27-18x12.webp 18w\" sizes=\"(max-width: 617px) 100vw, 617px\" \/><\/p><p>Now, to see the anomalies against the original features, we go back to the dataframe created in the previous step.<\/p><pre
class=\"hljs\">df = df.sort_values(<span class=\"hljs-string\">'date_time'<\/span>)\nfig, ax = plt.subplots(figsize=(<span class=\"hljs-number\">10<\/span>,<span class=\"hljs-number\">6<\/span>))\na = df.loc[df[<span class=\"hljs-string\">'anomaly1'<\/span>] == <span class=\"hljs-number\">1<\/span>, [<span class=\"hljs-string\">'date_time'<\/span>, <span class=\"hljs-string\">'price_usd'<\/span>]] <span class=\"hljs-comment\">#anomaly<\/span>\nax.plot(pd.to_datetime(df[<span class=\"hljs-string\">'date_time'<\/span>]), df[<span class=\"hljs-string\">'price_usd'<\/span>], color=<span class=\"hljs-string\">'k'<\/span>,label=<span class=\"hljs-string\">'Normal'<\/span>)\nax.scatter(pd.to_datetime(a[<span class=\"hljs-string\">'date_time'<\/span>]),a[<span class=\"hljs-string\">'price_usd'<\/span>], color=<span class=\"hljs-string\">'red'<\/span>, label=<span class=\"hljs-string\">'Anomaly'<\/span>)\nax.xaxis_date()\nplt.xlabel(<span class=\"hljs-string\">'Date Time'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'price in USD'<\/span>)\nplt.legend()\nfig.autofmt_xdate()\nplt.show()<\/pre><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20843 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast28.webp\" alt=\"outlier detection\" width=\"614\" height=\"360\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast28.webp 614w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast28-300x176.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast28-18x12.webp 18w\" sizes=\"(max-width: 614px) 100vw, 614px\" \/><\/p><p>This method captures the spikes fairly well, with a few misses, of course. 
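<\/p><p>The flagging rule used above can be sketched with NumPy alone. It measures the distance from each point to its nearest centroid and derives the threshold from outliers_fraction. Because the sketch is cheap to run, it is easy to re-run for several values of outliers_fraction. The helper name, the two fixed centroids and the synthetic points are illustrative assumptions. In the code above, the centroids come from the fitted KMeans model instead.<\/p><pre class=\"hljs\">import numpy as np\n\ndef flag_by_centroid_distance(X, centers, outliers_fraction=0.1):\n    # hypothetical helper: distance from each point to every centroid, shape (n_points, n_centers)\n    d_all = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)\n    dist = d_all.min(axis=1)  # distance to the nearest (assigned) centroid\n    n_out = max(1, int(outliers_fraction * len(X)))\n    threshold = np.sort(dist)[-n_out]  # smallest of the n_out largest distances\n    return (dist &gt;= threshold).astype(int), threshold\n\nrng = np.random.default_rng(1)\ncenters = np.array([[0.0, 0.0], [10.0, 10.0]])\nlabels = rng.integers(0, 2, size=200)\nX = centers[labels] + rng.normal(0.0, 1.0, size=(200, 2))\nX[5] = [-6.0, -6.0]  # one point far from both centroids\nanomaly, threshold = flag_by_centroid_distance(X, centers, outliers_fraction=0.02)\nprint(anomaly.sum(), anomaly[5])<\/pre><p>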
Part of the problem may be that we did not try many values of outliers_fraction.<\/p><p>As with other unsupervised techniques, the biggest advantage of this one is that you can add as many random variables or features as you like to build more sophisticated models.<\/p><p>The weakness is that a growing number of features can quickly start to hurt computational performance. On top of that, there are more hyperparameters to tune, so there is always a risk of high variance in model performance.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-96a304c elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"96a304c\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-1b597b4\" data-id=\"1b597b4\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-f2d54f2 elementor-widget elementor-widget-heading\" data-id=\"f2d54f2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Detection-par-autoencodeur\"><\/span>Detection with autoencoders<span 
class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-edbc442 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"edbc442\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-2eaf4ee\" data-id=\"2eaf4ee\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-65066a3 elementor-widget elementor-widget-text-editor\" data-id=\"65066a3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Autoencoders are an unsupervised technique that learns to reconstruct the input data while extracting its features through layers of different dimensions. In other words, using the latent representation that an autoencoder learns amounts to dimensionality reduction. Many distance-based techniques (for example KNN) suffer from the curse of dimensionality because they compute distances between data points in the full feature space, so high dimensionality needs to be reduced.<\/p><p>There are many useful tools for detecting outliers, such as principal component analysis (PCA). So why do we need autoencoders? 
The reason is that PCA relies on linear <a href=\"https:\/\/complex-systems-ai.com\/es\/logica-matematica-27\/\">algebra<\/a>, so it can only apply linear transformations. Autoencoders, in contrast, can perform non-linear transformations thanks to their non-linear activation functions and multiple layers.<\/p><p>It can be more efficient to train several layers with an autoencoder than to learn one huge transformation with PCA. Autoencoders therefore show their merits when the data problem is complex and non-linear.<\/p><p>Autoencoders can be implemented with popular frameworks such as TensorFlow or PyTorch, but for simplicity we will use a Python module called PyOD, which builds autoencoders internally from a few user inputs.<\/p><p>For the data, let&rsquo;s use PyOD&rsquo;s utility function generate_data() to generate 25 variables, 500 training and 500 test observations, and ten percent outliers.<\/p><pre class=\"hljs\"><span class=\"hljs-keyword\">import<\/span> numpy <span class=\"hljs-keyword\">as<\/span> np\n<span class=\"hljs-keyword\">import<\/span> pandas <span class=\"hljs-keyword\">as<\/span> pd\n<span class=\"hljs-keyword\">from<\/span> pyod.models.auto_encoder <span class=\"hljs-keyword\">import<\/span> AutoEncoder\n<span class=\"hljs-keyword\">from<\/span> pyod.utils.data <span class=\"hljs-keyword\">import<\/span> generate_data\ncontamination = <span class=\"hljs-number\">0.1<\/span>  <span class=\"hljs-comment\"># fraction of outliers<\/span>\nn_train = <span class=\"hljs-number\">500<\/span>  <span class=\"hljs-comment\"># number of training points<\/span>\nn_test = <span class=\"hljs-number\">500<\/span>  <span class=\"hljs-comment\"># 
number of testing points<\/span>\nn_features = <span class=\"hljs-number\">25<\/span> <span class=\"hljs-comment\"># Number of features<\/span>\n<span class=\"hljs-comment\"># Recent PyOD versions return the arrays in the order X_train, X_test, y_train, y_test<\/span>\nX_train, X_test, y_train, y_test = generate_data(\n   n_train=n_train, n_test=n_test,\n   n_features=n_features,\n   contamination=contamination, random_state=<span class=\"hljs-number\">1234<\/span>)\nX_train = pd.DataFrame(X_train)\nX_test = pd.DataFrame(X_test)<\/pre><pre class=\"hljs\"><span class=\"hljs-keyword\">from<\/span> sklearn.preprocessing <span class=\"hljs-keyword\">import<\/span> StandardScaler\n<span class=\"hljs-comment\"># Fit the scaler on the training data only, then apply the same statistics to the test data<\/span>\nscaler = StandardScaler().fit(X_train)\nX_train = pd.DataFrame(scaler.transform(X_train))\nX_test = pd.DataFrame(scaler.transform(X_test))<\/pre><pre class=\"hljs\"><span class=\"hljs-keyword\">from<\/span> sklearn.decomposition <span class=\"hljs-keyword\">import<\/span> PCA\npca = PCA(<span class=\"hljs-number\">2<\/span>)\nx_pca = pca.fit_transform(X_train)\nx_pca = pd.DataFrame(x_pca)\nx_pca.columns=[<span class=\"hljs-string\">'PC1'<\/span>,<span class=\"hljs-string\">'PC2'<\/span>]\n<span class=\"hljs-comment\"># Plot the first two principal components, coloured by the true label (0 = inlier, 1 = outlier)<\/span>\n<span class=\"hljs-keyword\">import<\/span> matplotlib.pyplot <span class=\"hljs-keyword\">as<\/span> plt\nplt.scatter(x_pca[<span class=\"hljs-string\">'PC1'<\/span>], x_pca[<span class=\"hljs-string\">'PC2'<\/span>], c=y_train, alpha=<span class=\"hljs-number\">1<\/span>)\nplt.title(<span class=\"hljs-string\">'Scatter plot'<\/span>)\nplt.xlabel(<span class=\"hljs-string\">'PC1'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'PC2'<\/span>)\nplt.show()<\/pre><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20844 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast29.webp\" alt=\"outlier detection\" width=\"385\" height=\"279\" title=\"\" 
srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast29.webp 385w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast29-300x217.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast29-18x12.webp 18w\" sizes=\"(max-width: 385px) 100vw, 385px\" \/><\/p><p>Now let&rsquo;s train an autoencoder with hidden_neurons=[25, 2, 2, 25]. In the Keras-based PyOD AutoEncoder, this list describes the hidden layers only: the input and output layers are sized automatically to the 25 features, giving the structure [25, 25, 2, 2, 25, 25], in which the two layers of two <a href=\"https:\/\/complex-systems-ai.com\/es\/algoritmos-neuronales\/perceptron-es\/\">neurons<\/a> form the bottleneck.<\/p><pre class=\"hljs\">clf = AutoEncoder(hidden_neurons=[<span class=\"hljs-number\">25<\/span>, <span class=\"hljs-number\">2<\/span>, <span class=\"hljs-number\">2<\/span>, <span class=\"hljs-number\">25<\/span>])  <span class=\"hljs-comment\"># Keras-based PyOD API; newer releases may name this parameter differently<\/span>\nclf.fit(X_train)<\/pre><p>Let&rsquo;s apply the trained model clf to predict an anomaly score for each observation in the test data. How do we define an outlier? An outlier is a point that lies far from the other points, so the outlier score is defined by a distance. For an autoencoder, that distance is the reconstruction error: how far each observation lies from the network&rsquo;s reconstruction of it. 
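<\/p><p>To make reconstruction-error scoring concrete without PyOD, here is a minimal NumPy sketch under stated assumptions. It swaps in PCA as a linear autoencoder: each row is encoded onto the top k principal directions of a reference set, decoded back, and scored by the Euclidean distance between the row and its reconstruction. The helper reconstruction_scores and the synthetic data (inliers near a 2-D plane in 25 dimensions, plus five off-plane points) are hypothetical illustrations, not part of the tutorial.<\/p>

```python
import numpy as np

# Hypothetical helper: PCA used as a linear autoencoder (not PyOD's network).
# Encode rows of X_score onto the top-k principal directions of X_fit,
# decode them back, and return each row's Euclidean reconstruction error.
def reconstruction_scores(X_fit, X_score, k=2):
    mean = X_fit.mean(axis=0)
    # Rows of vt are the principal directions of the centred reference data
    _, _, vt = np.linalg.svd(X_fit - mean, full_matrices=False)
    components = vt[:k]                      # shape (k, n_features)
    codes = (X_score - mean) @ components.T  # encoder: n_features -> k
    X_rec = codes @ components + mean        # decoder: k -> n_features
    return np.linalg.norm(X_score - X_rec, axis=1)

rng = np.random.default_rng(0)
# Synthetic inliers lying close to a 2-D plane inside a 25-D space
mixing = rng.normal(size=(2, 25))
X_ref = rng.normal(size=(500, 2)) @ mixing + 0.1 * rng.normal(size=(500, 25))
# New data: 45 inliers from the same process, then 5 outliers off the plane
X_new = np.vstack([
    rng.normal(size=(45, 2)) @ mixing + 0.1 * rng.normal(size=(45, 25)),
    rng.normal(scale=3.0, size=(5, 25)),
])
scores = reconstruction_scores(X_ref, X_new, k=2)
print(sorted(np.argsort(scores)[-5:].tolist()))  # expected: [45, 46, 47, 48, 49]
```

<p>PyOD&rsquo;s AutoEncoder follows the same scoring idea with a trained non-linear network instead of a fixed linear projection; check the documentation of your installed version for the exact distance it uses.<\/p><p>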
PyOD&rsquo;s .decision_function() computes this distance, that is, the anomaly score, for each data point.<\/p><pre class=\"hljs\"><span class=\"hljs-comment\"># Get the outlier scores for the train data<\/span>\ny_train_scores = clf.decision_scores_\n<span class=\"hljs-comment\"># Predict the anomaly scores<\/span>\ny_test_scores = clf.decision_function(X_test)  <span class=\"hljs-comment\"># outlier scores<\/span>\ny_test_scores = pd.Series(y_test_scores)\n<span class=\"hljs-comment\"># Plot it!<\/span>\n<span class=\"hljs-keyword\">import<\/span> matplotlib.pyplot <span class=\"hljs-keyword\">as<\/span> plt\nplt.hist(y_test_scores, bins=<span class=\"hljs-string\">'auto'<\/span>)\nplt.title(<span class=\"hljs-string\">\"Histogram of test anomaly scores\"<\/span>)\nplt.show()<\/pre><p>A histogram of the anomaly scores shows that high scores occur with low frequency, which is evidence of outliers. Based on the histogram, we choose 4.0 as the cut-off and treat observations with a score of 4.0 or more as outliers.<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20845 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast30.webp\" alt=\"outlier forecasting\" width=\"594\" height=\"265\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast30.webp 594w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast30-300x134.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast30-18x8.webp 18w\" sizes=\"(max-width: 594px) 100vw, 594px\" \/><\/p><p>Let&rsquo;s assign observations with an anomaly score below 4.0 to cluster 0 and those with a score of 4.0 or more to cluster 1, then compute summary statistics per cluster with .groupby(). 
This model identified 50 outliers (output not shown).<\/p><pre class=\"hljs\">df_test = X_test.copy()\ndf_test[<span class=\"hljs-string\">'score'<\/span>] = y_test_scores\n<span class=\"hljs-comment\"># cluster 0: score below 4 (normal), cluster 1: score of 4 or more (outlier)<\/span>\ndf_test[<span class=\"hljs-string\">'cluster'<\/span>] = np.where(df_test[<span class=\"hljs-string\">'score'<\/span>]&lt;<span class=\"hljs-number\">4<\/span>, <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>)\nprint(df_test[<span class=\"hljs-string\">'cluster'<\/span>].value_counts())  <span class=\"hljs-comment\"># observations per cluster<\/span>\nprint(df_test.groupby(<span class=\"hljs-string\">'cluster'<\/span>).mean())  <span class=\"hljs-comment\"># mean of each variable per cluster<\/span><\/pre><p>The result below shows the mean value of each variable in each cluster. The values in cluster 1 (the anomalous cluster) are very different from those in cluster 0 (the normal cluster). The score values give the average anomaly score of the observations in each cluster: a high score means the observations lie far from the norm.<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20846 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast31.webp\" alt=\"outlier detection\" width=\"238\" height=\"838\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast31.webp 238w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast31-85x300.webp 85w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast31-3x12.webp 3w\" sizes=\"(max-width: 238px) 100vw, 238px\" \/><\/p><p>In this way, we can separate and label typical observations and anomalies almost perfectly on this synthetic dataset.<\/p><p>Autoencoders handle high-dimensional data easily. 
Thanks to their non-linearity, they can find complex patterns in high-dimensional datasets.<\/p><p>Because this is a deep-learning approach, it struggles when little data is available. Computational costs also rise sharply as the network gets deeper and when dealing with big data.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-45754bf elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"45754bf\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-b3f56b8\" data-id=\"b3f56b8\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-02a7b0a elementor-widget elementor-widget-heading\" data-id=\"02a7b0a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Gerer-les-anomalies-avec-du-lissage\"><\/span>Handling anomalies with smoothing<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-58a5e5e elementor-section-boxed elementor-section-height-default 
elementor-section-height-default\" data-id=\"58a5e5e\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3279678\" data-id=\"3279678\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-73699d6 elementor-widget elementor-widget-text-editor\" data-id=\"73699d6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Statistical methods let you adjust the value of an outlier so that it fits the original distribution. Let&rsquo;s look at one of the methods used to smooth anomalies.<\/p><p>The idea is to dampen the anomaly using data from earlier timestamps. For example, to offset a sudden spike in electricity consumption caused by an event in your home, you can replace it with the average consumption for the same month in previous years.<\/p><p>Let&rsquo;s implement this to get a clear picture. We will use the same catfish sales data as earlier. 
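<\/p><p>The catfish variables come from an earlier part of the tutorial, so here first is a self-contained sketch of the same-month smoothing idea on a hypothetical synthetic monthly series (five years with a yearly seasonal pattern and one injected December spike). The helper smooth_with_same_month is an illustrative name, not part of the tutorial; it replaces the anomalous value with the mean of the same calendar month in earlier years.<\/p>

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series (five years) with a yearly seasonal pattern
idx = pd.date_range('2000-01-01', periods=60, freq='MS')  # month starts
months = idx.month.to_numpy()
sales = pd.Series(100 + 10 * np.sin(2 * np.pi * (months - 1) / 12), index=idx)

# Inject one anomalous spike in the last December
anomaly = pd.Timestamp('2004-12-01')
sales.loc[anomaly] += 80

def smooth_with_same_month(series, ts):
    # Mean of the same calendar month in earlier years (ts itself excluded)
    mask = (series.index.month == ts.month) & (series.index < ts)
    adjusted = series.copy()
    adjusted.loc[ts] = series[mask].mean()
    return adjusted

adjusted = smooth_with_same_month(sales, anomaly)
print(round(sales[anomaly], 1), round(adjusted[anomaly], 1))  # expected: 175.0 95.0
```

<p>Averaging only earlier values mirrors the tutorial script, which restricts december_data to dates before test_data.index[0], so no information from the forecast period leaks into the adjustment.<\/p><p>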
We can adjust the anomaly with the mean using the script below.<\/p><pre class=\"hljs\"><span class=\"hljs-comment\"># lim_catfish_sales, december_data, curr_anomaly and test_data come from the earlier catfish example<\/span>\nadjusted_data = lim_catfish_sales.copy()\n<span class=\"hljs-comment\"># Replace the anomaly by the mean of the other december_data values observed before the test period<\/span>\nadjusted_data.loc[curr_anomaly] = december_data[(december_data.index != curr_anomaly) &amp; (december_data.index &lt; test_data.index[<span class=\"hljs-number\">0<\/span>])].mean()<\/pre><pre class=\"hljs\"><span class=\"hljs-comment\"># Original series (faded red) versus the adjusted series<\/span>\nplt.figure(figsize=(<span class=\"hljs-number\">10<\/span>,<span class=\"hljs-number\">4<\/span>))\nplt.plot(lim_catfish_sales, color=<span class=\"hljs-string\">'firebrick'<\/span>, alpha=<span class=\"hljs-number\">0.4<\/span>)\nplt.plot(adjusted_data)\nplt.title(<span class=\"hljs-string\">'Catfish Sales in 1000s of Pounds'<\/span>, fontsize=<span class=\"hljs-number\">20<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Sales'<\/span>, fontsize=<span class=\"hljs-number\">16<\/span>)\n<span class=\"hljs-comment\"># Dashed lines mark the start of each year, the solid line marks the anomaly<\/span>\n<span class=\"hljs-keyword\">for<\/span> year <span class=\"hljs-keyword\">in<\/span> range(start_date.year,end_date.year):\n   plt.axvline(pd.to_datetime(str(year)+<span class=\"hljs-string\">'-01-01'<\/span>), color=<span class=\"hljs-string\">'k'<\/span>, linestyle=<span class=\"hljs-string\">'--'<\/span>, alpha=<span class=\"hljs-number\">0.2<\/span>)\nplt.axvline(curr_anomaly, color=<span class=\"hljs-string\">'k'<\/span>, alpha=<span class=\"hljs-number\">0.7<\/span>)\nplt.show()<\/pre><p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-20850 size-full\" src=\"http:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast32.webp\" alt=\"outlier detection\" width=\"631\" height=\"272\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast32.webp 631w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast32-300x129.webp 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2024\/02\/forecast32-18x8.webp 18w\" sizes=\"(max-width: 631px) 100vw, 631px\" \/><\/p><p>In this way, you can go on to forecasting or analysis without worrying too much about skewed results.<\/p>
<p>Many methods exist for handling anomalies in non-temporal data, but unfortunately they cannot be applied directly to time series because the underlying structure is different: many of them rely on the distribution of the values and cannot simply be carried over to time-ordered data.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-8067809 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"8067809\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-e2a22f9\" data-id=\"e2a22f9\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-574d5f5 elementor-widget elementor-widget-heading\" data-id=\"574d5f5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Effacer-les-anomalies\"><\/span>Removing anomalies<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-e1d3b7f 
elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"e1d3b7f\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-18d3aa9\" data-id=\"18d3aa9\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-790b2b7 elementor-widget elementor-widget-text-editor\" data-id=\"790b2b7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>If neither of the two approaches above suits your use case, the last option is to remove the anomalies. This is not recommended, since you are essentially throwing away potentially valuable information, unless it is absolutely necessary and does not harm later analysis.<\/p><p>Once the anomalies have been identified, you can remove them with the pandas .drop() method, which does the heavy lifting for you.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Anomaly detection in time series. This tutorial addresses the problem of anomaly detection in preprocessing for... 
<\/p>","protected":false},"author":1,"featured_media":0,"parent":20753,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-20824","page","type-page","status-publish","hentry"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/20824","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/comments?post=20824"}],"version-history":[{"count":4,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/20824\/revisions"}],"predecessor-version":[{"id":20853,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/20824\/revisions\/20853"}],"up":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/20753"}],"wp:attachment":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/media?parent=20824"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}