{"id":439,"date":"2020-09-23T11:57:11","date_gmt":"2020-09-23T15:57:11","guid":{"rendered":"https:\/\/www.macloo.com\/ai\/?p=439"},"modified":"2020-09-23T11:57:11","modified_gmt":"2020-09-23T15:57:11","slug":"how-does-machine-learning-understand-sentiment","status":"publish","type":"post","link":"https:\/\/www.macloo.com\/ai\/2020\/09\/23\/how-does-machine-learning-understand-sentiment\/","title":{"rendered":"How does machine learning understand sentiment?"},"content":{"rendered":"\n<p>Sometimes I come across a video on YouTube that&#8217;s almost too simple \u2014 and that&#8217;s exactly what makes it great. Andy Kim, a junior at the elite prep school Deerfield Academy in Massachusetts, gave a local TED Talk about <strong>sentiment analysis,<\/strong> and I think it&#8217;s really perfect for anyone who&#8217;s spent a little time on understanding image recognition, but who has not yet studied much about natural language processing.<\/p>\n\n\n\n<figure class=\"wp-block-embed-youtube aligncenter wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<div class=\"jetpack-video-wrapper\"><iframe loading=\"lazy\" title=\"Sentiment Analysis: extracting emotion through machine learning | Andy Kim | TEDxDeerfield\" width=\"739\" height=\"416\" src=\"https:\/\/www.youtube.com\/embed\/n4L5hHFcGVk?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/div>\n<\/div><\/figure>\n\n\n\n<p>Your first thought might be that detecting the sentiment of a tweet, a movie review, or a response to customer service is just a matter of word definitions. 
<em>Love<\/em> is a positive word; <em>hate<\/em> is a negative word.<\/p>\n\n\n\n<p>But as Melanie Mitchell wrote in <a rel=\"noreferrer noopener\" href=\"https:\/\/us.macmillan.com\/books\/9780374257835\" target=\"_blank\">Artificial Intelligence: A Guide for Thinking Humans<\/a> (2019): &#8220;Looking at single words or short sequences in isolation is generally <em>not sufficient<\/em> to glean the overall sentiment; it&#8217;s necessary to capture the semantics of words in the <strong>context<\/strong> of the whole sentence&#8221; (p. 183; my emphasis).<\/p>\n\n\n\n<p>Kim, in his TED Talk, does a good job of explaining how words are represented as <strong>vectors,<\/strong> and how this enables complex associations with similar or related terms. He doesn&#8217;t use a diagram of three-dimensional space (which I find helpful for conceptualizing this in my own mind); instead he refers to &#8220;an <em>n<\/em> dimensional space,&#8221; which I think my journalism students might not instantly visualize.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>&#8220;These word vectors can span from 25 up to a thousand components. Now, conveniently, as these vectors are still simply a list of numbers, they can be plotted on an <em>n<\/em> dimensional space &#8230;&#8221;<\/p><cite>\u2014Andy Kim<\/cite><\/blockquote>\n\n\n\n<p>In computer programming, a <strong>vector<\/strong> is a list of values, which you can think of as points or coordinates. In a two-dimensional space, you might have <em>x<\/em> and <em>y, <\/em>with the value of <em>x<\/em> representing the point&#8217;s position on a horizontal line, and the value of <em>y<\/em> representing the point&#8217;s position on a vertical line. Add a third dimension, and you have a third coordinate, <em>z<\/em>.<\/p>\n\n\n\n<p>To simulate more dimensions, we add even more values to the list. 
A single word will have a list of many values, and those values signify its relations to other words in the collection of all words in the system.<\/p>\n\n\n\n<p>At about the middle of his talk, Kim makes it perfectly clear why so many dimensions are needed to represent relationships among terms that have multiple meanings.<\/p>\n\n\n\n<p>Kim goes on to talk about the <strong>labeled data<\/strong> for training a system to detect, or recognize, sentiment in text. He used a freely available dataset from Kaggle, probably the <a rel=\"noreferrer noopener\" href=\"https:\/\/www.kaggle.com\/kazanova\/sentiment140\" target=\"_blank\">Sentiment140 dataset with 1.6 million tweets<\/a>. (Another widely used dataset for sentiment analysis training is the <a rel=\"noreferrer noopener\" href=\"https:\/\/www.kaggle.com\/lakshmi25npathi\/imdb-dataset-of-50k-movie-reviews\" target=\"_blank\">IMDB Dataset of 50K Movie Reviews<\/a>.) Kim also demonstrates <strong>cleaning<\/strong> the Twitter data so that usernames, hashtags and <a rel=\"noreferrer noopener\" href=\"https:\/\/en.wikipedia.org\/wiki\/Stop_word\" target=\"_blank\">stop words<\/a> are eliminated.<\/p>\n\n\n\n<p>Kim used the <a rel=\"noreferrer noopener\" href=\"https:\/\/nlp.stanford.edu\/projects\/glove\/\" target=\"_blank\">GloVe algorithm<\/a> to construct vectors for the words in his dataset, but he skips over the details of the training and just tells us that he wasn&#8217;t very successful; his model only reached a 60 percent accuracy level. 
He closes by summarizing some of the uses of sentiment analysis.<\/p>\n\n\n\n<p><a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\"><img decoding=\"async\" alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https:\/\/i.creativecommons.org\/l\/by-nc-nd\/4.0\/88x31.png\"><\/a><br>\n<small><span xmlns:dct=\"http:\/\/purl.org\/dc\/terms\/\" property=\"dct:title\"><strong>AI in Media and Society<\/strong><\/span> by <span xmlns:cc=\"http:\/\/creativecommons.org\/ns#\" property=\"cc:attributionName\">Mindy McAdams<\/span> is licensed under a <a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License<\/a>.<br>\nInclude the author&#8217;s name (Mindy McAdams) and a link to the original post in any reuse of this content.<\/small><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Sometimes I come across a video on YouTube that&#8217;s almost too simple \u2014 and that&#8217;s exactly what makes it great. 
Andy Kim, a junior at the elite prep school Deerfield Academy in Massachusetts, gave a local TED Talk about sentiment analysis, and I think it&#8217;s really perfect for anyone who&#8217;s spent a little time on&hellip; <a class=\"more-link\" href=\"https:\/\/www.macloo.com\/ai\/2020\/09\/23\/how-does-machine-learning-understand-sentiment\/\">Continue reading <span class=\"screen-reader-text\">How does machine learning understand sentiment?<\/span> <span class=\"meta-nav\" aria-hidden=\"true\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[2],"tags":[90,18],"class_list":["post-439","post","type-post","status-publish","format-standard","hentry","category-nlp","tag-sentiment_analysis","tag-training"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/439","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/comments?post=439"}],"version-history":[{"count":6,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/439\/revisions"}],"predecessor-version":[{"id":445,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/439\/revisions\/445"}],"wp:attachment":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/media?parent=439"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href
":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/categories?post=439"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/tags?post=439"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}