{"id":485,"date":"2020-10-01T12:20:01","date_gmt":"2020-10-01T16:20:01","guid":{"rendered":"https:\/\/www.macloo.com\/ai\/?p=485"},"modified":"2020-10-01T12:35:04","modified_gmt":"2020-10-01T16:35:04","slug":"how-recurrent-neural-networks-read-sequences","status":"publish","type":"post","link":"https:\/\/www.macloo.com\/ai\/2020\/10\/01\/how-recurrent-neural-networks-read-sequences\/","title":{"rendered":"How recurrent neural networks \u2018read\u2019 sequences"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">When I first read a description of how <strong>recurrent neural networks<\/strong> differ from other neural networks, I was all like, <em>yeah, that&#8217;s cool.<\/em> I looked at a diagram that had little loops drawn around the units in the hidden layer, and I thought I understood it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As I thought more about it, though, I realized I didn&#8217;t understand how it could possibly do what the author said it did.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In many cases, the input to a recurrent neural net (RNN) is text (more accurately: a numeric representation of text). It might be a sentence, or a tweet, or an entire review of a restaurant or a movie. The output might tell us whether that text is positive or negative, hostile or benign, racist or not \u2014 depending on the application. So the system needs to &#8220;consider&#8221; the text <em>as a whole<\/em>. Processing the words one at a time, each in isolation, will not work. The meanings of words depend on the context in which we find them.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">And yet, the text has to come in, as input, word by word. The recurrent action (the loops in the diagram) is the way the system &#8220;holds in memory&#8221; the words that have already come in. 
I thought I understood that \u2014 but then I didn&#8217;t.<\/p>\n\n\n\n<figure class=\"wp-block-embed-youtube aligncenter wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<div class=\"jetpack-video-wrapper\"><iframe loading=\"lazy\" title=\"Illustrated Guide to Recurrent Neural Networks: Understanding the Intuition\" width=\"739\" height=\"416\" src=\"https:\/\/www.youtube.com\/embed\/LHXXI4-IEns?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/div>\n<\/div><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Michael Nguyen&#8217;s excellent video (under 10 minutes!), above, was just what I needed. It is a beautiful explanation \u2014 and what&#8217;s more, he made a text version too: <a rel=\"noreferrer noopener\" href=\"https:\/\/towardsdatascience.com\/illustrated-guide-to-recurrent-neural-networks-79e5eb8049c9\" target=\"_blank\">Illustrated Guide to Recurrent Neural Networks<\/a>. It includes embedded animations, like the ones in the video.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In the video, Nguyen begins with a short list of the ways we are using the output from RNNs in our everyday lives. Like many of the videos I post here, this one doesn&#8217;t get into the math but instead focuses on the concepts.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If you can remember the idea of <strong>time steps<\/strong>, you will be able to remember how RNNs differ from other types of neural nets. Each time step supplies one input, and together those inputs make up a larger whole. For a sentence or longer text, each time step is a word. The order matters. Nguyen shows an animated example of movement to make the idea clear: we don&#8217;t know the <em>direction<\/em> of a moving dot unless we know where it&#8217;s been. 
One freeze-frame doesn&#8217;t tell us the whole story.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">RNNs are helpful for &#8220;reading&#8221; any kind of data that comes in a sequence. The hidden layer reads word 1, produces an output (often called the <em>hidden state<\/em>), and passes it forward to the next time step. When word 2 comes in, it is combined with that prior output. The output from word 2 loops back in the same way and is combined with word 3. This continues until the last word of the input sequence has been read (some systems mark that end with a special stop symbol).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"500\" height=\"270\" src=\"https:\/\/www.macloo.com\/ai\/wp-content\/uploads\/2020\/10\/michael_nguyen_RNN.gif\" alt=\"\" class=\"wp-image-497\"\/><figcaption><em>Animation by <a rel=\"noreferrer noopener\" href=\"https:\/\/towardsdatascience.com\/illustrated-guide-to-recurrent-neural-networks-79e5eb8049c9\" target=\"_blank\">Michael Nguyen<\/a>, a.k.a. Michael Phi<\/em><\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">There&#8217;s a bit of a problem in that <em>the longer the sequence,<\/em> the less influence the earliest steps have on the current one. This led me down a long rabbit hole of learning about <strong>long short-term memory networks<\/strong> and <strong>gradient descent<\/strong>. I used <a rel=\"noreferrer noopener\" href=\"https:\/\/builtin.com\/data-science\/recurrent-neural-networks-and-lstm\" target=\"_blank\">this article<\/a> and <a rel=\"noreferrer noopener\" href=\"https:\/\/www.youtube.com\/watch?v=IHZwWFHWa-w\" target=\"_blank\">this video<\/a> to help me with those.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">At 6:23, Nguyen begins to explain the effects of <strong>back propagation<\/strong> on a deep feed-forward neural network (not an RNN). This was very helpful! 
He defines the <strong>gradient<\/strong> as &#8220;a value used to adjust the network&#8217;s internal weights, allowing the network to learn.&#8221;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">At 8:35, he explains long short-term memory networks (LSTMs) and <a rel=\"noreferrer noopener\" href=\"https:\/\/en.wikipedia.org\/wiki\/Gated_recurrent_unit\" target=\"_blank\">gated recurrent units<\/a> (GRUs). To grossly simplify, these address the problem noted above by essentially learning what is important to keep and what can be thrown away. For example, in the animation above, the words <em>what<\/em> and <em>time<\/em> are the most important; <em>is<\/em> and <em>it<\/em> can be thrown away.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So a plain RNN may be enough for shorter sequences, while LSTMs or GRUs (which are themselves kinds of RNN) are generally the better choice for longer ones. Any of these will <em>loop back<\/em> within the hidden layer, one time step after another, to build up a value <em>for the complete sequence<\/em> before outputting a prediction, which is itself a value (for example, a score for positive versus negative).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\"><img decoding=\"async\" alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https:\/\/i.creativecommons.org\/l\/by-nc-nd\/4.0\/88x31.png\"><\/a><br>\n<small><span xmlns:dct=\"http:\/\/purl.org\/dc\/terms\/\" property=\"dct:title\"><strong>AI in Media and Society<\/strong><\/span> by <span xmlns:cc=\"http:\/\/creativecommons.org\/ns#\" property=\"cc:attributionName\">Mindy McAdams<\/span> is licensed under a <a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License<\/a>.<br>\nInclude the author&#8217;s name (Mindy McAdams) and a link to the original post in any reuse of this content.<\/small><\/p>\n","protected":false},"excerpt":{"rendered":"<p>When I first read a description of how recurrent 
neural networks differ from other neural networks, I was all like, yeah, that&#8217;s cool. I looked at a diagram that had little loops drawn around the units in the hidden layer, and I thought I understood it. As I thought more about it, though, I realized&hellip; <a class=\"more-link\" href=\"https:\/\/www.macloo.com\/ai\/2020\/10\/01\/how-recurrent-neural-networks-read-sequences\/\">Continue reading <span class=\"screen-reader-text\">How recurrent neural networks \u2018read\u2019 sequences<\/span> <span class=\"meta-nav\" aria-hidden=\"true\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[2],"tags":[97,15,100,99],"class_list":["post-485","post","type-post","status-publish","format-standard","hentry","category-nlp","tag-language","tag-neural_network","tag-recurrent","tag-rnns"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/485","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/comments?post=485"}],"version-history":[{"count":10,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/485\/revisions"}],"predecessor-version":[{"id":507,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/485\/revisions\/507"}],"wp:attachment":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/w
p\/v2\/media?parent=485"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/categories?post=485"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/tags?post=485"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}