{"id":15832,"date":"2022-04-23T21:28:05","date_gmt":"2022-04-23T20:28:05","guid":{"rendered":"https:\/\/complex-systems-ai.com\/?page_id=15832"},"modified":"2022-04-23T22:30:52","modified_gmt":"2022-04-23T21:30:52","slug":"analyse-exploratoire-de-textes","status":"publish","type":"page","link":"https:\/\/complex-systems-ai.com\/es\/analisis-descriptivo\/analisis-exploratorio-de-texto\/","title":{"rendered":"An\u00e1lisis exploratorio de textos"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-page\" data-elementor-id=\"15832\" class=\"elementor elementor-15832\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-9f1c923 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"9f1c923\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-4869e20\" data-id=\"4869e20\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-4e92625 elementor-align-justify elementor-widget elementor-widget-button\" data-id=\"4e92625\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/complex-systems-ai.com\/analyse-descriptive\/\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Analyse 
descriptive<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-008eb9a\" data-id=\"008eb9a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-b17f48e elementor-align-justify elementor-widget elementor-widget-button\" data-id=\"b17f48e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/complex-systems-ai.com\/\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Page d'accueil<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-b88a03e\" data-id=\"b88a03e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a496c85 elementor-align-justify elementor-widget elementor-widget-button\" data-id=\"a496c85\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/en.wikipedia.org\/wiki\/Descriptive_statistics\" target=\"_blank\" 
rel=\"noopener\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Wiki<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-ff76643 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"ff76643\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-35618d1\" data-id=\"35618d1\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ee3cdc4 elementor-widget elementor-widget-text-editor\" data-id=\"ee3cdc4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Visually representing the content of a text document is one of the most important tasks in text mining (also called exploratory text analysis). 
As data scientists or NLP specialists, we not only explore the content of documents from different angles and at different levels of detail, we also summarize individual documents, show words and topics, detect events, and build storylines.<\/p><p>However, there are gaps between <a href=\"https:\/\/complex-systems-ai.com\/es\/visualizacion-de-datos\/\">data visualization<\/a> for unstructured data (text) and for structured data. For example, many text visualizations do not represent the text directly; they represent an output of a language model (word counts, character lengths, word sequences, etc.).<\/p><p>In this article, we use the Womens Clothing E-Commerce Reviews dataset and explore and visualize as much of it as possible, using Plotly&rsquo;s Python graphing library and the Bokeh visualization library. 
We will not only explore the text data but also visualize the numeric and categorical features.<\/p><p><img decoding=\"async\" class=\"aligncenter wp-image-11096 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2020\/09\/cropped-Capture.png\" alt=\"exploratory text analysis\" width=\"97\" height=\"97\" title=\"\"><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-8b85aaa elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"8b85aaa\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d1c8118\" data-id=\"d1c8118\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e94aaf6 elementor-widget elementor-widget-heading\" data-id=\"e94aaf6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle table of contents\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span 
class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-descriptivo\/analisis-exploratorio-de-texto\/#Analyse-exploratoire-de-textes-les-donnees\" >Exploratory text analysis: the data<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-descriptivo\/analisis-exploratorio-de-texto\/#Visualisation-univariee\" >Univariate visualization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-descriptivo\/analisis-exploratorio-de-texto\/#Les-n-grammes\" >N-grams<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-descriptivo\/analisis-exploratorio-de-texto\/#Part-of-Speech\" >Part-of-Speech<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link 
ez-toc-heading-5\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-descriptivo\/analisis-exploratorio-de-texto\/#Analyse-par-classe\" >Analysis by class<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-descriptivo\/analisis-exploratorio-de-texto\/#Analyse-bivariee\" >Bivariate analysis<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/complex-systems-ai.com\/es\/analisis-descriptivo\/analisis-exploratorio-de-texto\/#Modelisation-du-contenu\" >Topic modeling<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Analyse-exploratoire-de-textes-les-donnees\"><\/span>Exploratory text analysis: the data<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-0763a0a elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"0763a0a\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-13992b5\" data-id=\"13992b5\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-678934e elementor-widget elementor-widget-text-editor\" data-id=\"678934e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>After a brief 
inspection of the data, we found that a series of preprocessing steps is needed.<\/p><p><img fetchpriority=\"high\" decoding=\"async\" class=\"aligncenter wp-image-15841 size-large\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_1E-zIJXMas05676qvuWSzw-1024x276.png\" alt=\"\" width=\"1024\" height=\"276\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_1E-zIJXMas05676qvuWSzw-1024x276.png 1024w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_1E-zIJXMas05676qvuWSzw-300x81.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_1E-zIJXMas05676qvuWSzw-768x207.png 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_1E-zIJXMas05676qvuWSzw-18x5.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_1E-zIJXMas05676qvuWSzw-600x162.png 600w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_1E-zIJXMas05676qvuWSzw.png 1400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/p><p>Drop the &ldquo;Title&rdquo; feature.<br \/>Drop the rows where &ldquo;Review Text&rdquo; is missing.<br \/>Clean the &ldquo;Review Text&rdquo; column.<br \/>Use TextBlob to compute the sentiment polarity, which lies in the range [-1, 1], where 1 means positive sentiment and -1 means negative sentiment.<br \/>Create a new feature for the length of the review.<br \/>Create a new feature for the word count of the review.<\/p><p id=\"7b28\" class=\"pw-post-body-paragraph ld le jo lf b lg lh kp li lj lk ks ll lm ln lo lp lq lr ls lt lu lv lw lx ly it gc\" data-selectable-paragraph=\"\">To check whether the sentiment polarity score works, we randomly select 5 reviews with the 
highest sentiment polarity score (1):<\/p><pre class=\"ms mt mu mv gz mw bt mx\"><span id=\"1c11\" class=\"gc my mb jo mz b do na nb l nc\" data-selectable-paragraph=\"\">print('5 random reviews with the highest positive sentiment polarity: \\n')<br \/>cl = df.loc[df.polarity == 1, ['Review Text']].sample(5).values<br \/>for c in cl:<br \/>    print(c[0])<\/span><\/pre><figure class=\"ms mt mu mv gz jc gn go paragraph-image\"><div class=\"jd je dq jf cf jg\" tabindex=\"0\" role=\"button\"><div class=\"gn go nu\"><img decoding=\"async\" class=\"aligncenter wp-image-15844 size-large\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_TK40esWITCAcFu6QzXFbnQ-1024x222.png\" alt=\"\" width=\"1024\" height=\"222\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_TK40esWITCAcFu6QzXFbnQ-1024x222.png 1024w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_TK40esWITCAcFu6QzXFbnQ-300x65.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_TK40esWITCAcFu6QzXFbnQ-768x167.png 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_TK40esWITCAcFu6QzXFbnQ-18x4.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_TK40esWITCAcFu6QzXFbnQ-600x130.png 600w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_TK40esWITCAcFu6QzXFbnQ.png 1253w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/div><\/div><\/figure><p id=\"6f60\" class=\"pw-post-body-paragraph ld le jo lf b lg lh kp li lj lk ks ll lm ln lo lp lq lr ls lt lu lv lw lx ly it gc\" data-selectable-paragraph=\"\">Then we randomly select 5 reviews with the most neutral sentiment polarity score (zero):<\/p><pre class=\"ms mt mu mv gz mw bt mx\"><span id=\"f487\" class=\"gc my mb jo mz b do na nb l nc\" data-selectable-paragraph=\"\">print('5 random reviews with the most neutral sentiment(zero) polarity: 
\\n')<br \/>cl = df.loc[df.polarity == 0, ['Review Text']].sample(5).values<br \/>for c in cl:<br \/>    print(c[0])<\/span><\/pre><figure class=\"ms mt mu mv gz jc gn go paragraph-image\"><div class=\"jd je dq jf cf jg\" tabindex=\"0\" role=\"button\"><div class=\"gn go nv\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15843 size-large\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_tvAJ4q8qldsPARCsPCedhg-1024x201.png\" alt=\"\" width=\"1024\" height=\"201\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_tvAJ4q8qldsPARCsPCedhg-1024x201.png 1024w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_tvAJ4q8qldsPARCsPCedhg-300x59.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_tvAJ4q8qldsPARCsPCedhg-768x151.png 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_tvAJ4q8qldsPARCsPCedhg-18x4.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_tvAJ4q8qldsPARCsPCedhg-600x118.png 600w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_tvAJ4q8qldsPARCsPCedhg.png 1400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/div><\/div><\/figure><p id=\"84b9\" class=\"pw-post-body-paragraph ld le jo lf b lg lh kp li lj lk ks ll lm ln lo lp lq lr ls lt lu lv lw lx ly it gc\" data-selectable-paragraph=\"\">Only 2 reviews had the most negative sentiment polarity score:<\/p><pre class=\"ms mt mu mv gz mw bt mx\"><span id=\"fb67\" class=\"gc my mb jo mz b do na nb l nc\" data-selectable-paragraph=\"\">print('2 reviews with the most negative polarity: \\n')<br \/>cl = df.loc[df.polarity == -0.97500000000000009, ['Review Text']].sample(2).values<br \/>for c in cl:<br \/>    print(c[0])<\/span><\/pre><figure class=\"ms mt mu mv gz jc gn go paragraph-image\"><div class=\"jd je dq jf cf jg\" tabindex=\"0\" role=\"button\"><div class=\"gn go nw\"><img 
loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15842 size-large\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_-UYbEjFfnpzyeBgAmugT6Q-1024x134.png\" alt=\"\" width=\"1024\" height=\"134\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_-UYbEjFfnpzyeBgAmugT6Q-1024x134.png 1024w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_-UYbEjFfnpzyeBgAmugT6Q-300x39.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_-UYbEjFfnpzyeBgAmugT6Q-768x100.png 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_-UYbEjFfnpzyeBgAmugT6Q-18x2.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_-UYbEjFfnpzyeBgAmugT6Q-600x78.png 600w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_-UYbEjFfnpzyeBgAmugT6Q.png 1400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/div><\/div><\/figure><p id=\"1f36\" class=\"pw-post-body-paragraph ld le jo lf b lg lh kp li lj lk ks ll lm ln lo lp lq lr ls lt lu lv lw lx ly it gc\" data-selectable-paragraph=\"\">The polarity score appears to work.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-cb10e52 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"cb10e52\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4f5d2a9\" data-id=\"4f5d2a9\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-9d9caa3 elementor-widget 
elementor-widget-heading\" data-id=\"9d9caa3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Visualisation-univariee\"><\/span>Univariate visualization<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-1e122e7 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"1e122e7\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4735d3c\" data-id=\"4735d3c\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-1a019e8 elementor-widget elementor-widget-text-editor\" data-id=\"1a019e8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Single-variable, or univariate, visualization is the simplest type of visualization: it consists of observations on a single feature or attribute. 
Univariate visualizations include histograms, bar charts, and line charts.<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15845 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224122.png\" alt=\"\" width=\"699\" height=\"692\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224122.png 699w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224122-300x297.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224122-12x12.png 12w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224122-600x594.png 600w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224122-100x100.png 100w\" sizes=\"(max-width: 699px) 100vw, 699px\" \/><\/p><p>The vast majority of sentiment polarity scores are greater than zero, which means that most of the reviews are fairly positive.<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15846 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224215.png\" alt=\"\" width=\"677\" height=\"666\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224215.png 677w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224215-300x295.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224215-12x12.png 12w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224215-600x590.png 600w\" sizes=\"(max-width: 677px) 100vw, 677px\" \/><\/p><p>The ratings are consistent with the polarity scores: most ratings are fairly high, at 4 or 5.<\/p><p>The same can be done 
for reviewer age, the number of characters per review, and the number of words per review, but that is not the focus of this tutorial.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-ff3ab8a elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"ff3ab8a\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-484c28d\" data-id=\"484c28d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e7d7e4a elementor-widget elementor-widget-heading\" data-id=\"e7d7e4a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Les-n-grammes\"><\/span>N-grams<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-a1260da elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"a1260da\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-36337e0\" data-id=\"36337e0\" 
data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-15f1270 elementor-widget elementor-widget-text-editor\" data-id=\"15f1270\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>We now come to the feature that interests us most. Before exploring it, we need to extract N-gram features. N-grams describe the number of words used as an observation point: a unigram is a single word, a bigram is a 2-word phrase, and a trigram is a 3-word phrase. To extract them, we use the CountVectorizer class from scikit-learn.<\/p><p>For unigram analysis, it is very important to remove stop words from the text.<\/p><div class=\"gist-data\"><div class=\"js-gist-file-update-container js-task-list-container file-box\"><div id=\"file-top_unigram_no_stopwords-py\" class=\"file my-2\"><div class=\"Box-body p-0 blob-wrapper data type-python \"><div class=\"js-check-bidi js-blob-code-container blob-code-content\"><table class=\"highlight tab-size js-file-line-container js-code-nav-container js-tagsearch-file\" data-tab-size=\"8\" data-paste-markdown-skip=\"\" data-tagsearch-lang=\"Python\" data-tagsearch-path=\"top_unigram_no_stopwords.py\"><tbody><tr><td id=\"file-top_unigram_no_stopwords-py-L1\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"1\">\u00a0<\/td><td id=\"file-top_unigram_no_stopwords-py-LC1\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">def<\/span> <span class=\"pl-en\">get_top_n_words<\/span>(<span class=\"pl-s1\">corpus<\/span>, <span 
class=\"pl-s1\">n<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">None<\/span>):<\/td><\/tr><tr><td id=\"file-top_unigram_no_stopwords-py-L2\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"2\">\u00a0<\/td><td id=\"file-top_unigram_no_stopwords-py-LC2\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">vec<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-v\">CountVectorizer<\/span>(<span class=\"pl-s1\">stop_words<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s\">'english'<\/span>).<span class=\"pl-en\">fit<\/span>(<span class=\"pl-s1\">corpus<\/span>)<\/td><\/tr><tr><td id=\"file-top_unigram_no_stopwords-py-L3\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"3\">\u00a0<\/td><td id=\"file-top_unigram_no_stopwords-py-LC3\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">bag_of_words<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">vec<\/span>.<span class=\"pl-en\">transform<\/span>(<span class=\"pl-s1\">corpus<\/span>)<\/td><\/tr><tr><td id=\"file-top_unigram_no_stopwords-py-L4\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"4\">\u00a0<\/td><td id=\"file-top_unigram_no_stopwords-py-LC4\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">sum_words<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">bag_of_words<\/span>.<span class=\"pl-en\">sum<\/span>(<span class=\"pl-s1\">axis<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">0<\/span>)<\/td><\/tr><tr><td id=\"file-top_unigram_no_stopwords-py-L5\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"5\">\u00a0<\/td><td id=\"file-top_unigram_no_stopwords-py-LC5\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">words_freq<\/span> <span class=\"pl-c1\">=<\/span> [(<span 
class=\"pl-s1\">word<\/span>, <span class=\"pl-s1\">sum_words<\/span>[<span class=\"pl-c1\">0<\/span>, <span class=\"pl-s1\">idx<\/span>]) <span class=\"pl-k\">for<\/span> <span class=\"pl-s1\">word<\/span>, <span class=\"pl-s1\">idx<\/span> <span class=\"pl-c1\">in<\/span> <span class=\"pl-s1\">vec<\/span>.<span class=\"pl-s1\">vocabulary_<\/span>.<span class=\"pl-en\">items<\/span>()]<\/td><\/tr><tr><td id=\"file-top_unigram_no_stopwords-py-L6\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"6\">\u00a0<\/td><td id=\"file-top_unigram_no_stopwords-py-LC6\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">words_freq<\/span> <span class=\"pl-c1\">=<\/span><span class=\"pl-en\">sorted<\/span>(<span class=\"pl-s1\">words_freq<\/span>, <span class=\"pl-s1\">key<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-k\">lambda<\/span> <span class=\"pl-s1\">x<\/span>: <span class=\"pl-s1\">x<\/span>[<span class=\"pl-c1\">1<\/span>], <span class=\"pl-s1\">reverse<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">True<\/span>)<\/td><\/tr><tr><td id=\"file-top_unigram_no_stopwords-py-L7\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"7\">\u00a0<\/td><td id=\"file-top_unigram_no_stopwords-py-LC7\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-k\">return<\/span> <span class=\"pl-s1\">words_freq<\/span>[:<span class=\"pl-s1\">n<\/span>]<\/td><\/tr><tr><td id=\"file-top_unigram_no_stopwords-py-L8\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"8\">\u00a0<\/td><td id=\"file-top_unigram_no_stopwords-py-LC8\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">common_words<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-en\">get_top_n_words<\/span>(<span class=\"pl-s1\">df<\/span>[<span class=\"pl-s\">'Review Text'<\/span>], <span 
class=\"pl-c1\">20<\/span>)<\/td><\/tr><tr><td id=\"file-top_unigram_no_stopwords-py-L9\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"9\">\u00a0<\/td><td id=\"file-top_unigram_no_stopwords-py-LC9\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">for<\/span> <span class=\"pl-s1\">word<\/span>, <span class=\"pl-s1\">freq<\/span> <span class=\"pl-c1\">in<\/span> <span class=\"pl-s1\">common_words<\/span>:<\/td><\/tr><tr><td id=\"file-top_unigram_no_stopwords-py-L10\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"10\">\u00a0<\/td><td id=\"file-top_unigram_no_stopwords-py-LC10\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-en\">print<\/span>(<span class=\"pl-s1\">word<\/span>, <span class=\"pl-s1\">freq<\/span>)<\/td><\/tr><tr><td id=\"file-top_unigram_no_stopwords-py-L11\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"11\">\u00a0<\/td><td id=\"file-top_unigram_no_stopwords-py-LC11\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">df2<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">pd<\/span>.<span class=\"pl-v\">DataFrame<\/span>(<span class=\"pl-s1\">common_words<\/span>, <span class=\"pl-s1\">columns<\/span> <span class=\"pl-c1\">=<\/span> [<span class=\"pl-s\">'ReviewText'<\/span> , <span class=\"pl-s\">'count'<\/span>])<\/td><\/tr><tr><td id=\"file-top_unigram_no_stopwords-py-L12\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"12\">\u00a0<\/td><td id=\"file-top_unigram_no_stopwords-py-LC12\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">df2<\/span>.<span class=\"pl-en\">groupby<\/span>(<span class=\"pl-s\">'ReviewText'<\/span>).<span class=\"pl-en\">sum<\/span>()[<span class=\"pl-s\">'count'<\/span>].<span 
class=\"pl-en\">sort_values<\/span>(<span class=\"pl-s1\">ascending<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">False<\/span>).<span class=\"pl-en\">iplot<\/span>(<\/td><\/tr><tr><td id=\"file-top_unigram_no_stopwords-py-L13\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"13\">\u00a0<\/td><td id=\"file-top_unigram_no_stopwords-py-LC13\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">kind<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'bar'<\/span>, <span class=\"pl-s1\">yTitle<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'Count'<\/span>, <span class=\"pl-s1\">linecolor<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'black'<\/span>, <span class=\"pl-s1\">title<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'Top 20 words in review after removing stop words'<\/span>)<\/td><\/tr><\/tbody><\/table><\/div><\/div><\/div><\/div><\/div><div class=\"gist-meta\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15847 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224741.png\" alt=\"\" width=\"660\" height=\"455\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224741.png 660w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224741-300x207.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224741-18x12.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-224741-600x414.png 600w\" sizes=\"(max-width: 660px) 100vw, 660px\" \/><\/div><div>Now apply the same approach to bigrams.<\/div><div><div class=\"gist-data\"><div class=\"js-gist-file-update-container js-task-list-container file-box\"><div id=\"file-top_bigram_no_stopwords-py\" class=\"file my-2\"><div class=\"Box-body p-0 
blob-wrapper data type-python  \"><div class=\"js-check-bidi js-blob-code-container blob-code-content\"><table class=\"highlight tab-size js-file-line-container js-code-nav-container js-tagsearch-file\" data-tab-size=\"8\" data-paste-markdown-skip=\"\" data-tagsearch-lang=\"Python\" data-tagsearch-path=\"top_bigram_no_stopwords.py\"><tbody><tr><td id=\"file-top_bigram_no_stopwords-py-L1\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"1\">\u00a0<\/td><td id=\"file-top_bigram_no_stopwords-py-LC1\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">def<\/span> <span class=\"pl-en\">get_top_n_bigram<\/span>(<span class=\"pl-s1\">corpus<\/span>, <span class=\"pl-s1\">n<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">None<\/span>):<\/td><\/tr><tr><td id=\"file-top_bigram_no_stopwords-py-L2\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"2\">\u00a0<\/td><td id=\"file-top_bigram_no_stopwords-py-LC2\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">vec<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-v\">CountVectorizer<\/span>(<span class=\"pl-s1\">ngram_range<\/span><span class=\"pl-c1\">=<\/span>(<span class=\"pl-c1\">2<\/span>, <span class=\"pl-c1\">2<\/span>), <span class=\"pl-s1\">stop_words<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'english'<\/span>).<span class=\"pl-en\">fit<\/span>(<span class=\"pl-s1\">corpus<\/span>)<\/td><\/tr><tr><td id=\"file-top_bigram_no_stopwords-py-L3\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"3\">\u00a0<\/td><td id=\"file-top_bigram_no_stopwords-py-LC3\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">bag_of_words<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">vec<\/span>.<span class=\"pl-en\">transform<\/span>(<span class=\"pl-s1\">corpus<\/span>)<\/td><\/tr><tr><td 
id=\"file-top_bigram_no_stopwords-py-L4\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"4\">\u00a0<\/td><td id=\"file-top_bigram_no_stopwords-py-LC4\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">sum_words<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">bag_of_words<\/span>.<span class=\"pl-en\">sum<\/span>(<span class=\"pl-s1\">axis<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">0<\/span>)<\/td><\/tr><tr><td id=\"file-top_bigram_no_stopwords-py-L5\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"5\">\u00a0<\/td><td id=\"file-top_bigram_no_stopwords-py-LC5\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">words_freq<\/span> <span class=\"pl-c1\">=<\/span> [(<span class=\"pl-s1\">word<\/span>, <span class=\"pl-s1\">sum_words<\/span>[<span class=\"pl-c1\">0<\/span>, <span class=\"pl-s1\">idx<\/span>]) <span class=\"pl-k\">for<\/span> <span class=\"pl-s1\">word<\/span>, <span class=\"pl-s1\">idx<\/span> <span class=\"pl-c1\">in<\/span> <span class=\"pl-s1\">vec<\/span>.<span class=\"pl-s1\">vocabulary_<\/span>.<span class=\"pl-en\">items<\/span>()]<\/td><\/tr><tr><td id=\"file-top_bigram_no_stopwords-py-L6\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"6\">\u00a0<\/td><td id=\"file-top_bigram_no_stopwords-py-LC6\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">words_freq<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-en\">sorted<\/span>(<span class=\"pl-s1\">words_freq<\/span>, <span class=\"pl-s1\">key<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-k\">lambda<\/span> <span class=\"pl-s1\">x<\/span>: <span class=\"pl-s1\">x<\/span>[<span class=\"pl-c1\">1<\/span>], <span class=\"pl-s1\">reverse<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">True<\/span>)<\/td><\/tr><tr><td 
id=\"file-top_bigram_no_stopwords-py-L7\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"7\">\u00a0<\/td><td id=\"file-top_bigram_no_stopwords-py-LC7\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-k\">return<\/span> <span class=\"pl-s1\">words_freq<\/span>[:<span class=\"pl-s1\">n<\/span>]<\/td><\/tr><tr><td id=\"file-top_bigram_no_stopwords-py-L8\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"8\">\u00a0<\/td><td id=\"file-top_bigram_no_stopwords-py-LC8\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">common_words<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-en\">get_top_n_bigram<\/span>(<span class=\"pl-s1\">df<\/span>[<span class=\"pl-s\">'Review Text'<\/span>], <span class=\"pl-c1\">20<\/span>)<\/td><\/tr><tr><td id=\"file-top_bigram_no_stopwords-py-L9\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"9\">\u00a0<\/td><td id=\"file-top_bigram_no_stopwords-py-LC9\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">for<\/span> <span class=\"pl-s1\">word<\/span>, <span class=\"pl-s1\">freq<\/span> <span class=\"pl-c1\">in<\/span> <span class=\"pl-s1\">common_words<\/span>:<\/td><\/tr><tr><td id=\"file-top_bigram_no_stopwords-py-L10\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"10\">\u00a0<\/td><td id=\"file-top_bigram_no_stopwords-py-LC10\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-en\">print<\/span>(<span class=\"pl-s1\">word<\/span>, <span class=\"pl-s1\">freq<\/span>)<\/td><\/tr><tr><td id=\"file-top_bigram_no_stopwords-py-L11\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"11\">\u00a0<\/td><td id=\"file-top_bigram_no_stopwords-py-LC11\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">df4<\/span> <span 
class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">pd<\/span>.<span class=\"pl-v\">DataFrame<\/span>(<span class=\"pl-s1\">common_words<\/span>, <span class=\"pl-s1\">columns<\/span> <span class=\"pl-c1\">=<\/span> [<span class=\"pl-s\">'ReviewText'<\/span>, <span class=\"pl-s\">'count'<\/span>])<\/td><\/tr><tr><td id=\"file-top_bigram_no_stopwords-py-L12\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"12\">\u00a0<\/td><td id=\"file-top_bigram_no_stopwords-py-LC12\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">df4<\/span>.<span class=\"pl-en\">groupby<\/span>(<span class=\"pl-s\">'ReviewText'<\/span>).<span class=\"pl-en\">sum<\/span>()[<span class=\"pl-s\">'count'<\/span>].<span class=\"pl-en\">sort_values<\/span>(<span class=\"pl-s1\">ascending<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">False<\/span>).<span class=\"pl-en\">iplot<\/span>(<\/td><\/tr><tr><td id=\"file-top_bigram_no_stopwords-py-L13\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"13\">\u00a0<\/td><td id=\"file-top_bigram_no_stopwords-py-LC13\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">kind<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'bar'<\/span>, <span class=\"pl-s1\">yTitle<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'Count'<\/span>, <span class=\"pl-s1\">linecolor<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'black'<\/span>, <span class=\"pl-s1\">title<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'Top 20 bigrams in review after removing stop words'<\/span>)<\/td><\/tr><\/tbody><\/table><\/div><\/div><\/div><\/div><\/div><div class=\"gist-meta\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15848 size-full\" 
src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225019.png\" alt=\"\" width=\"659\" height=\"452\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225019.png 659w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225019-300x206.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225019-18x12.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225019-600x412.png 600w\" sizes=\"(max-width: 659px) 100vw, 659px\" \/><\/div><div>The same approach extends to trigrams and longer n-grams: only the <code>ngram_range<\/code> argument changes.<\/div><\/div><div>\u00a0<\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-f33bef0 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"f33bef0\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-9ea75a0\" data-id=\"9ea75a0\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-2031349 elementor-widget elementor-widget-heading\" data-id=\"2031349\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Part-of-Speech\"><\/span>Part-of-Speech<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section 
class=\"elementor-section elementor-top-section elementor-element elementor-element-980a90d elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"980a90d\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-959943a\" data-id=\"959943a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-1626283 elementor-widget elementor-widget-text-editor\" data-id=\"1626283\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Part-of-speech (POS) tagging is the process of assigning a part of speech, such as noun, verb, or adjective, to each word.<\/p><div class=\"gist-data\"><div class=\"js-gist-file-update-container js-task-list-container file-box\"><div id=\"file-pos-py\" class=\"file my-2\"><div class=\"Box-body p-0 blob-wrapper data type-python \"><div class=\"js-check-bidi js-blob-code-container blob-code-content\"><table class=\"highlight tab-size js-file-line-container js-code-nav-container js-tagsearch-file\" data-tab-size=\"8\" data-paste-markdown-skip=\"\" data-tagsearch-lang=\"Python\" data-tagsearch-path=\"POS.py\"><tbody><tr><td id=\"file-pos-py-LC1\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">blob<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-v\">TextBlob<\/span>(<span class=\"pl-en\">str<\/span>(<span class=\"pl-s1\">df<\/span>[<span class=\"pl-s\">'Review Text'<\/span>]))<\/td><\/tr><tr><td id=\"file-pos-py-L2\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" 
data-line-number=\"2\">\u00a0<\/td><td id=\"file-pos-py-LC2\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">pos_df<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">pd<\/span>.<span class=\"pl-v\">DataFrame<\/span>(<span class=\"pl-s1\">blob<\/span>.<span class=\"pl-s1\">tags<\/span>, <span class=\"pl-s1\">columns<\/span> <span class=\"pl-c1\">=<\/span> [<span class=\"pl-s\">'word'<\/span>, <span class=\"pl-s\">'pos'<\/span>])<\/td><\/tr><tr><td id=\"file-pos-py-L3\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"3\">\u00a0<\/td><td id=\"file-pos-py-LC3\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">pos_df<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">pos_df<\/span>.<span class=\"pl-s1\">pos<\/span>.<span class=\"pl-en\">value_counts<\/span>()[:<span class=\"pl-c1\">20<\/span>]<\/td><\/tr><tr><td id=\"file-pos-py-L4\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"4\">\u00a0<\/td><td id=\"file-pos-py-LC4\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">pos_df<\/span>.<span class=\"pl-en\">iplot<\/span>(<\/td><\/tr><tr><td id=\"file-pos-py-L5\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"5\">\u00a0<\/td><td id=\"file-pos-py-LC5\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">kind<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'bar'<\/span>,<\/td><\/tr><tr><td id=\"file-pos-py-L6\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"6\">\u00a0<\/td><td id=\"file-pos-py-LC6\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">xTitle<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'POS'<\/span>,<\/td><\/tr><tr><td id=\"file-pos-py-L7\" class=\"blob-num js-line-number 
js-code-nav-line-number js-blob-rnum\" data-line-number=\"7\">\u00a0<\/td><td id=\"file-pos-py-LC7\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">yTitle<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'count'<\/span>,<\/td><\/tr><tr><td id=\"file-pos-py-L8\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"8\">\u00a0<\/td><td id=\"file-pos-py-LC8\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">title<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'Top 20 Part-of-speech tagging for review corpus'<\/span>)<\/td><\/tr><\/tbody><\/table><\/div><\/div><\/div><\/div><\/div><div class=\"gist-meta\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15849 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225236.png\" alt=\"\" width=\"659\" height=\"446\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225236.png 659w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225236-300x203.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225236-18x12.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225236-600x406.png 600w\" sizes=\"(max-width: 659px) 100vw, 659px\" \/><\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-37fee28 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"37fee28\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element 
elementor-element-01b9c53\" data-id=\"01b9c53\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5389e6c elementor-widget elementor-widget-heading\" data-id=\"5389e6c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Analyse-par-classe\"><\/span>Analysis by class<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-6ec07cd elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6ec07cd\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-6eaf38a\" data-id=\"6eaf38a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-0823361 elementor-widget elementor-widget-text-editor\" data-id=\"0823361\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Box plots are used to compare the sentiment polarity score, the rating, and the review text length across the departments or divisions of the e-commerce store.<\/p><p>The highest sentiment polarity score 
was reached by all of the six departments except the Trend department, and the lowest sentiment polarity score was recorded in the Tops department. The Trend department has the lowest median polarity score. Recall that the Trend department also has the fewest reviews, which explains why its score distribution is narrower than those of the other departments.<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15850 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225412.png\" alt=\"\" width=\"637\" height=\"437\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225412.png 637w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225412-300x206.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225412-18x12.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225412-600x412.png 600w\" sizes=\"(max-width: 637px) 100vw, 637px\" \/><\/p><p>Except for the Trend department, the median rating of every department was 5. 
Overall, ratings are high and sentiment is positive in this review data set.<\/p><p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15851 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225513.png\" alt=\"\" width=\"625\" height=\"433\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225513.png 625w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225513-300x208.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225513-18x12.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225513-600x416.png 600w\" sizes=\"(max-width: 625px) 100vw, 625px\" \/><\/p><p>And so on for the remaining variables, such as review text length.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-d8b9747 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"d8b9747\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-0442fa4\" data-id=\"0442fa4\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d6ced41 elementor-widget elementor-widget-heading\" data-id=\"d6ced41\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" 
id=\"Analyse-bivariee\"><\/span>Bivariate analysis<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-58a14e1 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"58a14e1\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-2b8041e\" data-id=\"2b8041e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-38bab49 elementor-widget elementor-widget-text-editor\" data-id=\"38bab49\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Bivariate visualization is a type of visualization that involves two features at a time. 
It describes the association or relationship between the two features.<\/p><p>Consider first how sentiment polarity is distributed according to whether or not the reviewer recommends the product.<\/p><div class=\"gist-data\"><div class=\"js-gist-file-update-container js-task-list-container file-box\"><div id=\"file-polarity_recommendation-py\" class=\"file my-2\"><div class=\"Box-body p-0 blob-wrapper data type-python \"><div class=\"js-check-bidi js-blob-code-container blob-code-content\"><table class=\"highlight tab-size js-file-line-container js-code-nav-container js-tagsearch-file\" data-tab-size=\"8\" data-paste-markdown-skip=\"\" data-tagsearch-lang=\"Python\" data-tagsearch-path=\"polarity_recommendation.py\"><tbody><tr><td id=\"file-polarity_recommendation-py-LC1\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">x1<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">df<\/span>.<span class=\"pl-s1\">loc<\/span>[<span class=\"pl-s1\">df<\/span>[<span class=\"pl-s\">'Recommended IND'<\/span>] <span class=\"pl-c1\">==<\/span> <span class=\"pl-c1\">1<\/span>, <span class=\"pl-s\">'polarity'<\/span>]<\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L2\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"2\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC2\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">x0<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">df<\/span>.<span class=\"pl-s1\">loc<\/span>[<span class=\"pl-s1\">df<\/span>[<span class=\"pl-s\">'Recommended IND'<\/span>] <span class=\"pl-c1\">==<\/span> <span class=\"pl-c1\">0<\/span>, <span class=\"pl-s\">'polarity'<\/span>]<\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L3\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"3\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC3\" 
class=\"blob-code blob-code-inner js-file-line\">\u00a0<\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L4\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"4\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC4\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">trace1<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">go<\/span>.<span class=\"pl-v\">Histogram<\/span>(<\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L5\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"5\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC5\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">x<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s1\">x0<\/span>, <span class=\"pl-s1\">name<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'Not recommended'<\/span>,<\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L6\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"6\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC6\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">opacity<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">0.75<\/span><\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L7\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"7\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC7\" class=\"blob-code blob-code-inner js-file-line\">)<\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L8\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"8\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC8\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">trace2<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">go<\/span>.<span class=\"pl-v\">Histogram<\/span>(<\/td><\/tr><tr><td 
id=\"file-polarity_recommendation-py-L9\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"9\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC9\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">x<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s1\">x1<\/span>, <span class=\"pl-s1\">name<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s\">'Recommended'<\/span>,<\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L10\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"10\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC10\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">opacity<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">0.75<\/span><\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L11\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"11\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC11\" class=\"blob-code blob-code-inner js-file-line\">)<\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L12\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"12\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC12\" class=\"blob-code blob-code-inner js-file-line\">\u00a0<\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L13\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"13\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC13\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">data<\/span> <span class=\"pl-c1\">=<\/span> [<span class=\"pl-s1\">trace1<\/span>, <span class=\"pl-s1\">trace2<\/span>]<\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L14\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"14\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC14\" 
class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">layout<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">go<\/span>.<span class=\"pl-v\">Layout<\/span>(<span class=\"pl-s1\">barmode<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'overlay'<\/span>, <span class=\"pl-s1\">title<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'Distribution of Sentiment polarity of reviews based on Recommendation'<\/span>)<\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L15\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"15\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC15\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">fig<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">go<\/span>.<span class=\"pl-v\">Figure<\/span>(<span class=\"pl-s1\">data<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s1\">data<\/span>, <span class=\"pl-s1\">layout<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s1\">layout<\/span>)<\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L16\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"16\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC16\" class=\"blob-code blob-code-inner js-file-line\">\u00a0<\/td><\/tr><tr><td id=\"file-polarity_recommendation-py-L17\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"17\">\u00a0<\/td><td id=\"file-polarity_recommendation-py-LC17\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-en\">iplot<\/span>(<span class=\"pl-s1\">fig<\/span>, <span class=\"pl-s1\">filename<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">'overlaid histogram'<\/span>)<\/td><\/tr><\/tbody><\/table><\/div><\/div><\/div><\/div><\/div><div class=\"gist-meta\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter 
wp-image-15852 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225739.png\" alt=\"\" width=\"633\" height=\"435\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225739.png 633w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225739-300x206.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225739-18x12.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-225739-600x412.png 600w\" sizes=\"(max-width: 633px) 100vw, 633px\" \/><\/div><div>Sentiment polarity and rating can also be plotted together on a 2D density plot (this is not informative for binary variables).<\/div><div><div class=\"gist-data\"><div class=\"js-gist-file-update-container js-task-list-container file-box\"><div id=\"file-sentiment_polarity_rating-py\" class=\"file my-2\"><div class=\"Box-body p-0 blob-wrapper data type-python  \"><div class=\"js-check-bidi js-blob-code-container blob-code-content\"><table class=\"highlight tab-size js-file-line-container js-code-nav-container js-tagsearch-file\" data-tab-size=\"8\" data-paste-markdown-skip=\"\" data-tagsearch-lang=\"Python\" data-tagsearch-path=\"sentiment_polarity_rating.py\"><tbody><tr><td id=\"file-sentiment_polarity_rating-py-LC1\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">trace1<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">go<\/span>.<span class=\"pl-v\">Scatter<\/span>(<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L2\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"2\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC2\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">x<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s1\">df<\/span>[<span 
class=\"pl-s\">&#39;polarity&#39;<\/span>], <span class=\"pl-s1\">y<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s1\">df<\/span>[<span class=\"pl-s\">&#39;Rating&#39;<\/span>], <span class=\"pl-s1\">mode<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;markers&#39;<\/span>, <span class=\"pl-s1\">name<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;points&#39;<\/span>,<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L3\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"3\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC3\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">marker<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-en\">dict<\/span>(<span class=\"pl-s1\">color<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;rgb(102,0,0)&#39;<\/span>, <span class=\"pl-s1\">size<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">2<\/span>, <span class=\"pl-s1\">opacity<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">0.4<\/span>)<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L4\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"4\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC4\" class=\"blob-code blob-code-inner js-file-line\">)<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L5\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"5\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC5\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">trace2<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">go<\/span>.<span class=\"pl-v\">Histogram2dContour<\/span>(<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L6\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"6\">\u00a0<\/td><td 
id=\"file-sentiment_polarity_rating-py-LC6\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">x<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s1\">df<\/span>[<span class=\"pl-s\">&#39;polarity&#39;<\/span>], <span class=\"pl-s1\">y<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s1\">df<\/span>[<span class=\"pl-s\">&#39;Rating&#39;<\/span>], <span class=\"pl-s1\">name<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;density&#39;<\/span>, <span class=\"pl-s1\">ncontours<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">20<\/span>,<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L7\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"7\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC7\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">colorscale<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;Hot&#39;<\/span>, <span class=\"pl-s1\">reversescale<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">True<\/span>, <span class=\"pl-s1\">showscale<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">False<\/span><\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L8\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"8\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC8\" class=\"blob-code blob-code-inner js-file-line\">)<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L9\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"9\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC9\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">trace3<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">go<\/span>.<span class=\"pl-v\">Histogram<\/span>(<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L10\" class=\"blob-num js-line-number 
js-code-nav-line-number js-blob-rnum\" data-line-number=\"10\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC10\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">x<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s1\">df<\/span>[<span class=\"pl-s\">&#39;polarity&#39;<\/span>], <span class=\"pl-s1\">name<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;Sentiment polarity density&#39;<\/span>,<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L11\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"11\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC11\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">marker<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-en\">dict<\/span>(<span class=\"pl-s1\">color<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;rgb(102,0,0)&#39;<\/span>),<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L12\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"12\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC12\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">yaxis<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;y2&#39;<\/span><\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L13\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"13\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC13\" class=\"blob-code blob-code-inner js-file-line\">)<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L14\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"14\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC14\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">trace4<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">go<\/span>.<span 
class=\"pl-v\">Histogram<\/span>(<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L15\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"15\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC15\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">y<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s1\">df<\/span>[<span class=\"pl-s\">&#39;Rating&#39;<\/span>], <span class=\"pl-s1\">name<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;Rating density&#39;<\/span>, <span class=\"pl-s1\">marker<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-en\">dict<\/span>(<span class=\"pl-s1\">color<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;rgb(102,0,0)&#39;<\/span>),<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L16\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"16\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC16\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">xaxis<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;x2&#39;<\/span><\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L17\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"17\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC17\" class=\"blob-code blob-code-inner js-file-line\">)<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L18\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"18\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC18\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">data<\/span> <span class=\"pl-c1\">=<\/span> [<span class=\"pl-s1\">trace1<\/span>, <span class=\"pl-s1\">trace2<\/span>, <span class=\"pl-s1\">trace3<\/span>, <span class=\"pl-s1\">trace4<\/span>]<\/td><\/tr><tr><td 
id=\"file-sentiment_polarity_rating-py-L19\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"19\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC19\" class=\"blob-code blob-code-inner js-file-line\">\u00a0<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L20\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"20\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC20\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">layout<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">go<\/span>.<span class=\"pl-v\">Layout<\/span>(<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L21\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"21\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC21\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">showlegend<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">False<\/span>,<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L22\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"22\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC22\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">autosize<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">False<\/span>,<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L23\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"23\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC23\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">width<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">600<\/span>,<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L24\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"24\">\u00a0<\/td><td 
id=\"file-sentiment_polarity_rating-py-LC24\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">height<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">550<\/span>,<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L25\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"25\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC25\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">xaxis<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-en\">dict<\/span>(<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L26\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"26\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC26\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">domain<\/span><span class=\"pl-c1\">=<\/span>[<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">0.85<\/span>],<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L27\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"27\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC27\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">showgrid<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">False<\/span>,<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L28\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"28\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC28\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">zeroline<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">False<\/span><\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L29\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"29\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC29\" class=\"blob-code blob-code-inner 
js-file-line\">),<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L30\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"30\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC30\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">yaxis<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-en\">dict<\/span>(<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L31\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"31\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC31\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">domain<\/span><span class=\"pl-c1\">=<\/span>[<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">0.85<\/span>],<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L32\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"32\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC32\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">showgrid<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">False<\/span>,<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L33\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"33\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC33\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">zeroline<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">False<\/span><\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L34\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"34\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC34\" class=\"blob-code blob-code-inner js-file-line\">),<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L35\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"35\">\u00a0<\/td><td 
id=\"file-sentiment_polarity_rating-py-LC35\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">margin<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-en\">dict<\/span>(<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L36\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"36\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC36\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">t<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">50<\/span><\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L37\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"37\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC37\" class=\"blob-code blob-code-inner js-file-line\">),<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L38\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"38\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC38\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">hovermode<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;closest&#39;<\/span>,<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L39\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"39\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC39\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">bargap<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">0<\/span>,<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L40\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"40\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC40\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">xaxis2<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-en\">dict<\/span>(<\/td><\/tr><tr><td 
id=\"file-sentiment_polarity_rating-py-L41\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"41\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC41\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">domain<\/span><span class=\"pl-c1\">=<\/span>[<span class=\"pl-c1\">0.85<\/span>, <span class=\"pl-c1\">1<\/span>],<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L42\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"42\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC42\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">showgrid<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">False<\/span>,<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L43\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"43\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC43\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">zeroline<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">False<\/span><\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L44\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"44\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC44\" class=\"blob-code blob-code-inner js-file-line\">),<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L45\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"45\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC45\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">yaxis2<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-en\">dict<\/span>(<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L46\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"46\">\u00a0<\/td><td 
id=\"file-sentiment_polarity_rating-py-LC46\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">domain<\/span><span class=\"pl-c1\">=<\/span>[<span class=\"pl-c1\">0.85<\/span>, <span class=\"pl-c1\">1<\/span>],<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L47\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"47\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC47\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">showgrid<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">False<\/span>,<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L48\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"48\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC48\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">zeroline<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">False<\/span><\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L49\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"49\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC49\" class=\"blob-code blob-code-inner js-file-line\">)<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L50\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"50\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC50\" class=\"blob-code blob-code-inner js-file-line\">)<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L51\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"51\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC51\" class=\"blob-code blob-code-inner js-file-line\">\u00a0<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L52\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"52\">\u00a0<\/td><td 
id=\"file-sentiment_polarity_rating-py-LC52\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">fig<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">go<\/span>.<span class=\"pl-v\">Figure<\/span>(<span class=\"pl-s1\">data<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s1\">data<\/span>, <span class=\"pl-s1\">layout<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s1\">layout<\/span>)<\/td><\/tr><tr><td id=\"file-sentiment_polarity_rating-py-L53\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"53\">\u00a0<\/td><td id=\"file-sentiment_polarity_rating-py-LC53\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-en\">iplot<\/span>(<span class=\"pl-s1\">fig<\/span>, <span class=\"pl-s1\">filename<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;2dhistogram-2d-density-plot-subplots&#39;<\/span>)<\/td><\/tr><\/tbody><\/table><\/div><\/div><\/div><\/div><\/div><div class=\"gist-meta\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15853 size-full\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-230116.png\" alt=\"\" width=\"602\" height=\"422\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-230116.png 602w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-230116-300x210.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-230116-18x12.png 18w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-230116-120x85.png 120w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/2022-04-23-230116-600x421.png 600w\" sizes=\"(max-width: 602px) 100vw, 602px\" \/><\/div><\/div><div>\u00a0<\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section 
elementor-top-section elementor-element elementor-element-8f43a80 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"8f43a80\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-f9d5c49\" data-id=\"f9d5c49\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-fcbcef1 elementor-widget elementor-widget-heading\" data-id=\"fcbcef1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><span class=\"ez-toc-section\" id=\"Modelisation-du-contenu\"><\/span>Topic modeling<span class=\"ez-toc-section-end\"><\/span><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-18775ea elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"18775ea\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-aee576a\" data-id=\"aee576a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-81f1d46 elementor-widget elementor-widget-text-editor\" data-id=\"81f1d46\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Finally, we explore a topic modeling algorithm on this dataset to see whether it brings any benefit and fits with what we are doing for the review text feature.<\/p><p>We will experiment with latent semantic analysis (LSA), a topic modeling technique.<\/p><ul><li>Generate the document-term matrix from the review text as a matrix of TF-IDF features.<\/li><li>The LSA model replaces the raw counts in the document-term matrix with TF-IDF scores.<\/li><li>Perform dimensionality reduction on the document-term matrix using truncated SVD.<\/li><li>Because there are 6 departments, we set n_topics=6.<\/li><li>Taking the argmax of each review text in this topic matrix gives the predicted topic of each review text in the data. 
We can then count how many reviews fall under each topic.<\/li><li>To better understand each topic, we find the three most frequent words in each topic.<\/li><\/ul><div class=\"gist-data\"><div class=\"js-gist-file-update-container js-task-list-container file-box\"><div id=\"file-topic_model_lsa-py\" class=\"file my-2\"><div class=\"Box-body p-0 blob-wrapper data type-python  \"><div class=\"js-check-bidi js-blob-code-container blob-code-content\"><table class=\"highlight tab-size js-file-line-container js-code-nav-container js-tagsearch-file\" data-tab-size=\"8\" data-paste-markdown-skip=\"\" data-tagsearch-lang=\"Python\" data-tagsearch-path=\"topic_model_LSA.py\"><tbody><tr><td id=\"file-topic_model_lsa-py-LC1\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">reindexed_data<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">df<\/span>[<span class=\"pl-s\">&#39;Review Text&#39;<\/span>]<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L2\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"2\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC2\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">tfidf_vectorizer<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-v\">TfidfVectorizer<\/span>(<span class=\"pl-s1\">stop_words<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s\">&#39;english&#39;<\/span>, <span class=\"pl-s1\">use_idf<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">True<\/span>, <span class=\"pl-s1\">smooth_idf<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">True<\/span>)<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L3\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"3\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC3\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">reindexed_data<\/span> <span class=\"pl-c1\">=<\/span> 
<span class=\"pl-s1\">reindexed_data<\/span>.<span class=\"pl-s1\">values<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L4\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"4\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC4\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">document_term_matrix<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">tfidf_vectorizer<\/span>.<span class=\"pl-en\">fit_transform<\/span>(<span class=\"pl-s1\">reindexed_data<\/span>)<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L5\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"5\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC5\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">n_topics<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-c1\">6<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L6\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"6\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC6\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">lsa_model<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-v\">TruncatedSVD<\/span>(<span class=\"pl-s1\">n_components<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-s1\">n_topics<\/span>)<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L7\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"7\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC7\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">lsa_topic_matrix<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">lsa_model<\/span>.<span class=\"pl-en\">fit_transform<\/span>(<span class=\"pl-s1\">document_term_matrix<\/span>)<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L8\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"8\">\u00a0<\/td><td 
id=\"file-topic_model_lsa-py-LC8\" class=\"blob-code blob-code-inner js-file-line\">\u00a0<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L9\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"9\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC9\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">def<\/span> <span class=\"pl-en\">get_keys<\/span>(<span class=\"pl-s1\">topic_matrix<\/span>):<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L10\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"10\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC10\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s\">    &#39;&#39;&#39;<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L11\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"11\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC11\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s\">    returns an integer list of predicted topic <\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L12\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"12\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC12\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s\">    categories for a given topic matrix<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L13\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"13\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC13\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s\">    &#39;&#39;&#39;<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L14\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"14\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC14\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">keys<\/span> <span 
class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">topic_matrix<\/span>.<span class=\"pl-en\">argmax<\/span>(<span class=\"pl-s1\">axis<\/span><span class=\"pl-c1\">=<\/span><span class=\"pl-c1\">1<\/span>).<span class=\"pl-en\">tolist<\/span>()<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L15\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"15\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC15\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-k\">return<\/span> <span class=\"pl-s1\">keys<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L16\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"16\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC16\" class=\"blob-code blob-code-inner js-file-line\">\u00a0<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L17\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"17\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC17\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">def<\/span> <span class=\"pl-en\">keys_to_counts<\/span>(<span class=\"pl-s1\">keys<\/span>):<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L18\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"18\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC18\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s\">    &#39;&#39;&#39;<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L19\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"19\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC19\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s\">    returns a tuple of topic categories and their <\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L20\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"20\">\u00a0<\/td><td 
id=\"file-topic_model_lsa-py-LC20\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s\">    accompanying magnitudes for a given list of keys<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L21\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"21\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC21\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s\">    &#39;&#39;&#39;<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L22\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"22\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC22\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">count_pairs<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-v\">Counter<\/span>(<span class=\"pl-s1\">keys<\/span>).<span class=\"pl-en\">items<\/span>()<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L23\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"23\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC23\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">categories<\/span> <span class=\"pl-c1\">=<\/span> [<span class=\"pl-s1\">pair<\/span>[<span class=\"pl-c1\">0<\/span>] <span class=\"pl-k\">for<\/span> <span class=\"pl-s1\">pair<\/span> <span class=\"pl-c1\">in<\/span> <span class=\"pl-s1\">count_pairs<\/span>]<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L24\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"24\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC24\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">counts<\/span> <span class=\"pl-c1\">=<\/span> [<span class=\"pl-s1\">pair<\/span>[<span class=\"pl-c1\">1<\/span>] <span class=\"pl-k\">for<\/span> <span class=\"pl-s1\">pair<\/span> <span class=\"pl-c1\">in<\/span> <span class=\"pl-s1\">count_pairs<\/span>]<\/td><\/tr><tr><td 
id=\"file-topic_model_lsa-py-L25\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"25\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC25\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-k\">return<\/span> (<span class=\"pl-s1\">categories<\/span>, <span class=\"pl-s1\">counts<\/span>)<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L26\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"26\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC26\" class=\"blob-code blob-code-inner js-file-line\">\u00a0<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L27\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"27\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC27\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">lsa_keys<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-en\">get_keys<\/span>(<span class=\"pl-s1\">lsa_topic_matrix<\/span>)<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L28\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"28\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC28\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">lsa_categories<\/span>, <span class=\"pl-s1\">lsa_counts<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-en\">keys_to_counts<\/span>(<span class=\"pl-s1\">lsa_keys<\/span>)<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L29\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"29\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC29\" class=\"blob-code blob-code-inner js-file-line\">\u00a0<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L30\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"30\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC30\" class=\"blob-code blob-code-inner js-file-line\"><span 
class=\"pl-k\">def<\/span> <span class=\"pl-en\">get_top_n_words<\/span>(<span class=\"pl-s1\">n<\/span>, <span class=\"pl-s1\">keys<\/span>, <span class=\"pl-s1\">document_term_matrix<\/span>, <span class=\"pl-s1\">tfidf_vectorizer<\/span>):<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L31\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"31\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC31\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s\">    &#39;&#39;&#39;<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L32\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"32\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC32\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s\">    returns a list of n_topic strings, where each string contains the n most common <\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L33\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"33\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC33\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s\">    words in a predicted category, in order<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L34\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"34\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC34\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s\">    &#39;&#39;&#39;<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L35\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"35\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC35\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">top_word_indices<\/span> <span class=\"pl-c1\">=<\/span> []<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L36\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" 
data-line-number=\"36\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC36\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-k\">for<\/span> <span class=\"pl-s1\">topic<\/span> <span class=\"pl-c1\">in<\/span> <span class=\"pl-en\">range<\/span>(<span class=\"pl-s1\">n_topics<\/span>):<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L37\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"37\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC37\" class=\"blob-code blob-code-inner js-file-line\">        <span class=\"pl-s1\">temp_vector_sum<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-c1\">0<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L38\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"38\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC38\" class=\"blob-code blob-code-inner js-file-line\">        <span class=\"pl-k\">for<\/span> <span class=\"pl-s1\">i<\/span> <span class=\"pl-c1\">in<\/span> <span class=\"pl-en\">range<\/span>(<span class=\"pl-en\">len<\/span>(<span class=\"pl-s1\">keys<\/span>)):<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L39\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"39\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC39\" class=\"blob-code blob-code-inner js-file-line\">            <span class=\"pl-k\">if<\/span> <span class=\"pl-s1\">keys<\/span>[<span class=\"pl-s1\">i<\/span>] <span class=\"pl-c1\">==<\/span> <span class=\"pl-s1\">topic<\/span>:<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L40\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"40\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC40\" class=\"blob-code blob-code-inner js-file-line\">                <span class=\"pl-s1\">temp_vector_sum<\/span> <span class=\"pl-c1\">+=<\/span> <span class=\"pl-s1\">document_term_matrix<\/span>[<span class=\"pl-s1\">i<\/span>]<\/td><\/tr><tr><td 
id=\"file-topic_model_lsa-py-L41\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"41\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC41\" class=\"blob-code blob-code-inner js-file-line\">        <span class=\"pl-s1\">temp_vector_sum<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">temp_vector_sum<\/span>.<span class=\"pl-en\">toarray<\/span>()<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L42\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"42\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC42\" class=\"blob-code blob-code-inner js-file-line\">        <span class=\"pl-s1\">top_n_word_indices<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">np<\/span>.<span class=\"pl-en\">flip<\/span>(<span class=\"pl-s1\">np<\/span>.<span class=\"pl-en\">argsort<\/span>(<span class=\"pl-s1\">temp_vector_sum<\/span>)[<span class=\"pl-c1\">0<\/span>][<span class=\"pl-c1\">-<\/span><span class=\"pl-s1\">n<\/span>:],<span class=\"pl-c1\">0<\/span>)<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L43\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"43\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC43\" class=\"blob-code blob-code-inner js-file-line\">        <span class=\"pl-s1\">top_word_indices<\/span>.<span class=\"pl-en\">append<\/span>(<span class=\"pl-s1\">top_n_word_indices<\/span>)<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L44\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"44\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC44\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">top_words<\/span> <span class=\"pl-c1\">=<\/span> []<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L45\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"45\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC45\" class=\"blob-code blob-code-inner 
js-file-line\">    <span class=\"pl-k\">for<\/span> <span class=\"pl-s1\">topic<\/span> <span class=\"pl-c1\">in<\/span> <span class=\"pl-s1\">top_word_indices<\/span>:<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L46\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"46\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC46\" class=\"blob-code blob-code-inner js-file-line\">        <span class=\"pl-s1\">topic_words<\/span> <span class=\"pl-c1\">=<\/span> []<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L47\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"47\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC47\" class=\"blob-code blob-code-inner js-file-line\">        <span class=\"pl-k\">for<\/span> <span class=\"pl-s1\">index<\/span> <span class=\"pl-c1\">in<\/span> <span class=\"pl-s1\">topic<\/span>:<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L48\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"48\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC48\" class=\"blob-code blob-code-inner js-file-line\">            <span class=\"pl-s1\">temp_word_vector<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">np<\/span>.<span class=\"pl-en\">zeros<\/span>((<span class=\"pl-c1\">1<\/span>,<span class=\"pl-s1\">document_term_matrix<\/span>.<span class=\"pl-s1\">shape<\/span>[<span class=\"pl-c1\">1<\/span>]))<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L49\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"49\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC49\" class=\"blob-code blob-code-inner js-file-line\">            <span class=\"pl-s1\">temp_word_vector<\/span>[:,<span class=\"pl-s1\">index<\/span>] <span class=\"pl-c1\">=<\/span> <span class=\"pl-c1\">1<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L50\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"50\">\u00a0<\/td><td 
id=\"file-topic_model_lsa-py-LC50\" class=\"blob-code blob-code-inner js-file-line\">            <span class=\"pl-s1\">the_word<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s1\">tfidf_vectorizer<\/span>.<span class=\"pl-en\">inverse_transform<\/span>(<span class=\"pl-s1\">temp_word_vector<\/span>)[<span class=\"pl-c1\">0<\/span>][<span class=\"pl-c1\">0<\/span>]<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L51\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"51\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC51\" class=\"blob-code blob-code-inner js-file-line\">            <span class=\"pl-s1\">topic_words<\/span>.<span class=\"pl-en\">append<\/span>(<span class=\"pl-s1\">the_word<\/span>.<span class=\"pl-en\">encode<\/span>(<span class=\"pl-s\">&#39;ascii&#39;<\/span>).<span class=\"pl-en\">decode<\/span>(<span class=\"pl-s\">&#39;utf-8&#39;<\/span>))<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L52\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"52\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC52\" class=\"blob-code blob-code-inner js-file-line\">        <span class=\"pl-s1\">top_words<\/span>.<span class=\"pl-en\">append<\/span>(<span class=\"pl-s\">&quot; &quot;<\/span>.<span class=\"pl-en\">join<\/span>(<span class=\"pl-s1\">topic_words<\/span>))<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L53\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"53\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC53\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-k\">return<\/span> <span class=\"pl-s1\">top_words<\/span><\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L54\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"54\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC54\" class=\"blob-code blob-code-inner js-file-line\">\u00a0<\/td><\/tr><tr><td 
id=\"file-topic_model_lsa-py-L55\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"55\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC55\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">top_n_words_lsa<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-en\">get_top_n_words<\/span>(<span class=\"pl-c1\">3<\/span>, <span class=\"pl-s1\">lsa_keys<\/span>, <span class=\"pl-s1\">document_term_matrix<\/span>, <span class=\"pl-s1\">tfidf_vectorizer<\/span>)<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L56\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"56\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC56\" class=\"blob-code blob-code-inner js-file-line\">\u00a0<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L57\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"57\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC57\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">for<\/span> <span class=\"pl-s1\">i<\/span> <span class=\"pl-c1\">in<\/span> <span class=\"pl-en\">range<\/span>(<span class=\"pl-en\">len<\/span>(<span class=\"pl-s1\">top_n_words_lsa<\/span>)):<\/td><\/tr><tr><td id=\"file-topic_model_lsa-py-L58\" class=\"blob-num js-line-number js-code-nav-line-number js-blob-rnum\" data-line-number=\"58\">\u00a0<\/td><td id=\"file-topic_model_lsa-py-LC58\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-en\">print<\/span>(<span class=\"pl-s\">&quot;Topic {}: &quot;<\/span>.<span class=\"pl-en\">format<\/span>(<span class=\"pl-s1\">i<\/span><span class=\"pl-c1\">+<\/span><span class=\"pl-c1\">1<\/span>), <span class=\"pl-s1\">top_n_words_lsa<\/span>[<span class=\"pl-s1\">i<\/span>])<\/td><\/tr><\/tbody><\/table><\/div><\/div><\/div><\/div><\/div><div class=\"gist-meta\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15854 size-full\" 
src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_fJyNhWujbPjMluWyY6bnlw.png\" alt=\"\" width=\"559\" height=\"182\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_fJyNhWujbPjMluWyY6bnlw.png 559w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_fJyNhWujbPjMluWyY6bnlw-300x98.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_fJyNhWujbPjMluWyY6bnlw-18x6.png 18w\" sizes=\"(max-width: 559px) 100vw, 559px\" \/><\/div><div><pre class=\"ms mt mu mv gz mw bt mx\"><span id=\"6a0a\" class=\"gc my mb jo mz b do na nb l nc\" data-selectable-paragraph=\"\">top_3_words = get_top_n_words(3, lsa_keys, document_term_matrix, tfidf_vectorizer)<br \/>labels = ['Topic {}: \\n'.format(i) + top_3_words[i] for i in lsa_categories]<\/span><span id=\"e62f\" class=\"gc my mb jo mz b do pc pd pe pf pg nb l nc\" data-selectable-paragraph=\"\">fig, ax = plt.subplots(figsize=(16,8))<br \/>ax.bar(lsa_categories, lsa_counts);<br \/>ax.set_xticks(lsa_categories);<br \/>ax.set_xticklabels(labels);<br \/>ax.set_ylabel('Number of review text');<br \/>ax.set_title('LSA topic counts');<br \/>plt.show();<\/span><\/pre><\/div><div><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-15855 size-large\" src=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_CccrIoMfxRb74oYr8E6LhA-1024x521.png\" alt=\"\" width=\"1024\" height=\"521\" title=\"\" srcset=\"https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_CccrIoMfxRb74oYr8E6LhA-1024x521.png 1024w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_CccrIoMfxRb74oYr8E6LhA-300x153.png 300w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_CccrIoMfxRb74oYr8E6LhA-768x391.png 768w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_CccrIoMfxRb74oYr8E6LhA-18x9.png 18w, 
https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_CccrIoMfxRb74oYr8E6LhA-600x305.png 600w, https:\/\/complex-systems-ai.com\/wp-content\/uploads\/2022\/04\/1_CccrIoMfxRb74oYr8E6LhA.png 1220w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/div><div>\u00a0<\/div><div><p>Looking at the most frequent words in each topic, it appears that the topic categories show little or no separation. In other words, topic modeling techniques could not separate the review texts by department.<\/p><p>Topic modeling techniques have a number of important limitations. To begin with, the term &ldquo;topic&rdquo; is somewhat ambiguous, and it is perhaps clear by now that topic models will not produce a highly nuanced classification of the texts in our data.<\/p><\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>An\u00e1lisis descriptivo P\u00e1gina de inicio de wiki Representar visualmente el contenido de un documento de texto es una de las tareas m\u00e1s importantes en el campo de la miner\u00eda de wiki. 
<\/p>","protected":false},"author":1,"featured_media":0,"parent":15506,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-15832","page","type-page","status-publish","hentry"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/15832","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/comments?post=15832"}],"version-history":[{"count":3,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/15832\/revisions"}],"predecessor-version":[{"id":15858,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/15832\/revisions\/15858"}],"up":[{"embeddable":true,"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/pages\/15506"}],"wp:attachment":[{"href":"https:\/\/complex-systems-ai.com\/es\/wp-json\/wp\/v2\/media?parent=15832"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}