{"id":39,"date":"2020-08-11T10:03:39","date_gmt":"2020-08-11T14:03:39","guid":{"rendered":"https:\/\/www.macloo.com\/ai\/?p=39"},"modified":"2020-08-24T12:07:45","modified_gmt":"2020-08-24T16:07:45","slug":"how-machines-see","status":"publish","type":"post","link":"https:\/\/www.macloo.com\/ai\/2020\/08\/11\/how-machines-see\/","title":{"rendered":"How machines \u2018see\u2019"},"content":{"rendered":"\n<p>I am fascinated by image recognition. I read about how ImageNet changed the whole universe of machine &#8220;vision&#8221; in 2009 in the excellent book <a rel=\"noreferrer noopener\" href=\"https:\/\/us.macmillan.com\/books\/9780374715236\" target=\"_blank\">Artificial Intelligence: A Guide for Thinking Humans<\/a>, but I&#8217;m not going to discuss ImageNet in this post. (I will get to it eventually.)<\/p>\n\n\n\n<p>To think about how a machine <em>sees<\/em> requires us first to think about human eyes vs. cameras. The machine doesn&#8217;t have a biological eyeball and an optic nerve and a brain. The machine <em>might <\/em>have one or more cameras to allow it to take in visual information.<\/p>\n\n\n\n<p>Whether the machine has cameras or not, the images it receives are the same: digital images, made up entirely of pixels. This is true even if the visual inputs are video. The machine will need to sample that video, taking discrete frames from it to process and analyze.<\/p>\n\n\n\n<p>So the first thing to absorb, as you begin to understand how a machine sees, is that it receives a grid of pixels. If it&#8217;s video, then there are a lot of separate grids. If it&#8217;s one still image, there is one grid. And how does the machine process that grid? 
It analyzes the <em>differences<\/em> between <em>groups<\/em> of pixels.<\/p>\n\n\n\n<p>This 4-minute video, from an artist and programmer named <a href=\"https:\/\/genekogan.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Gene Kogan<\/a>, helped me a lot.<\/p>\n\n\n\n<figure class=\"wp-block-embed-youtube aligncenter wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-4-3 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<div class=\"jetpack-video-wrapper\"><iframe loading=\"lazy\" title=\"What convolutional neural networks see\" width=\"739\" height=\"554\" src=\"https:\/\/www.youtube.com\/embed\/Gu0MkmynWkw?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/div>\n<\/div><\/figure>\n\n\n\n<p>Most people have an idea (possibly vague) of how the human brain works, with neurons kind of &#8220;wired together&#8221; in a network. When we imagine a computer <em>neural network, <\/em>most of us probably factor in that mental image of a brain full of neurons. This is both semi-accurate and wildly inaccurate.<\/p>\n\n\n\n<p>In his video, Kogan points out that an image-recognition system uses a <em>convolutional<\/em> neural network, and this network has many, many <em>layers<\/em>.<\/p>\n\n\n\n<p>When he&#8217;s clicking down the list in his video, Kogan is showing us what the different layers are &#8220;paying attention to&#8221; as the video is continuously chopped into one-frame segments. 
The mind-blowing thing (to me) is that each layer feeds its output forward to the next layer (information flows backward through the layers only during training), ultimately producing the result he shows near the end, when he can hold a water bottle in front of his webcam, and the software says it <em>sees<\/em> a water bottle.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"669\" src=\"https:\/\/www.macloo.com\/ai\/wp-content\/uploads\/2020\/08\/water_bottle.png\" alt=\"Screenshot of man holding water bottle and neural net evaluation of video image\" class=\"wp-image-43\" srcset=\"https:\/\/www.macloo.com\/ai\/wp-content\/uploads\/2020\/08\/water_bottle.png 1024w, https:\/\/www.macloo.com\/ai\/wp-content\/uploads\/2020\/08\/water_bottle-300x196.png 300w, https:\/\/www.macloo.com\/ai\/wp-content\/uploads\/2020\/08\/water_bottle-768x502.png 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption><em>Above: Screenshot from 3:10 in the video<\/em><\/figcaption><\/figure><\/div>\n\n\n\n<p>Notice, too, that &#8220;water bottle&#8221; is the machine&#8217;s top guess at that moment. Its number 2 guess is &#8220;bow tie.&#8221; Its confidence in &#8220;water bottle&#8221; is not very high, as shown by the red bar to the left of the label. 
However, the machine&#8217;s confidence in &#8220;water bottle&#8221; <em>is<\/em> much higher than all the other things it determines it <em>might be<\/em> seeing in that frame.<\/p>\n\n\n\n<p>After watching this video, I understood why super-fast graphics-processing hardware is <em>so important<\/em> to image recognition and machine vision.<\/p>\n\n\n\n<p>In tomorrow&#8217;s post, I&#8217;m going to say a bit more about these ideas and share a completely different video that also helped me <em>a lot<\/em> in my attempt to understand how machines see.<\/p>\n\n\n\n<p><a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\"><img decoding=\"async\" alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https:\/\/i.creativecommons.org\/l\/by-nc-nd\/4.0\/88x31.png\"><\/a><br>\n<small><span xmlns:dct=\"http:\/\/purl.org\/dc\/terms\/\" property=\"dct:title\"><strong>AI in Media and Society<\/strong><\/span> by <span xmlns:cc=\"http:\/\/creativecommons.org\/ns#\" property=\"cc:attributionName\">Mindy McAdams<\/span> is licensed under a <a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License<\/a>.<br>\nInclude the author&#8217;s name (Mindy McAdams) and a link to the original post in any reuse of this content.<\/small><\/p>\n\n\n\n<p>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I am fascinated by image recognition. I read about how ImageNet changed the whole universe of machine &#8220;vision&#8221; in 2009 in the excellent book Artificial Intelligence: A Guide for Thinking Humans, but I&#8217;m not going to discuss ImageNet in this post. (I will get to it eventually.) 
To think about how a machine sees requires&hellip; <a class=\"more-link\" href=\"https:\/\/www.macloo.com\/ai\/2020\/08\/11\/how-machines-see\/\">Continue reading <span class=\"screen-reader-text\">How machines \u2018see\u2019<\/span> <span class=\"meta-nav\" aria-hidden=\"true\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[3],"tags":[14,15],"class_list":["post-39","post","type-post","status-publish","format-standard","hentry","category-image-recognition","tag-machine_vision","tag-neural_network"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/39","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/comments?post=39"}],"version-history":[{"count":10,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/39\/revisions"}],"predecessor-version":[{"id":195,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/39\/revisions\/195"}],"wp:attachment":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/media?parent=39"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/categories?post=39"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/tags?post=39"}],"curies":[{"name":"wp","
href":"https:\/\/api.w.org\/{rel}","templated":true}]}}