{"id":136,"date":"2020-08-20T11:11:34","date_gmt":"2020-08-20T15:11:34","guid":{"rendered":"https:\/\/www.macloo.com\/ai\/?p=136"},"modified":"2020-08-24T12:02:20","modified_gmt":"2020-08-24T16:02:20","slug":"who-labels-the-data-for-ai","status":"publish","type":"post","link":"https:\/\/www.macloo.com\/ai\/2020\/08\/20\/who-labels-the-data-for-ai\/","title":{"rendered":"Who labels the data for AI?"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">In <a href=\"https:\/\/www.macloo.com\/ai\/2020\/08\/19\/imagenet-and-labels-for-data\/\">yesterday&#8217;s post<\/a>, I referred to the labels that are required for supervised machine learning. To train a model \u2014 which enables an AI system to correctly identify or sort images or documents or iris flowers (and so much more) \u2014 <em>each data record<\/em> must include one or more labels. For an image of a dog, for example, the labels might be <em>dog<\/em> and <em>Great Dane<\/em>. For an iris flower, the label is the name of the exact species of that individual flower.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Nowadays there are people <em>all around the world<\/em> sitting at computers and labeling data.<\/p>\n\n\n\n<figure class=\"wp-block-embed-youtube aligncenter wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<div class=\"jetpack-video-wrapper\"><iframe loading=\"lazy\" title=\"How do you teach artificial intelligence? 
- BBC Click\" width=\"739\" height=\"416\" src=\"https:\/\/www.youtube.com\/embed\/1jK0ZVJ0gIo?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/div>\n<\/div><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">In the 6-minute video above, BBC journalist Dave Lee travels to Kenya, where about 2,000 people work in a Nairobi office for <a rel=\"noreferrer noopener\" href=\"https:\/\/www.samasource.com\/\" target=\"_blank\">Samasource<\/a>, which produces training data for use in machine learning.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You&#8217;ll see exactly how <em>every single item<\/em> in one video frame is marked and tagged \u2014 this is what a vision system for a self-driving car needs if it is to avoid crashing into mailboxes or people.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In the Nairobi office, 52 percent of the workers are women. The pay is terribly low by Silicon Valley standards, but high for Kenya. Lee doesn&#8217;t gloss over this aspect of the story \u2014 in fact, it&#8217;s central to the telling.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Financial Times<\/em> journalist Madhumita Murgia <a rel=\"noreferrer noopener\" href=\"https:\/\/www.ft.com\/content\/56dde36c-aa40-11e9-984c-fac8325aaa04\" target=\"_blank\">wrote about Samasource<\/a> in July 2019. Her story also covers <a rel=\"noreferrer noopener\" href=\"https:\/\/imerit.net\/\" target=\"_blank\">iMerit<\/a>, a similar company with offices in Kolkata, India, as well as California and Louisiana. <\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>&#8220;An hour of video takes eight hours to annotate. 
In fact, a McKinsey report from 2018 listed data labeling as the biggest obstacle to AI adoption in industry.&#8221;<\/p><cite>\u2014Financial Times<\/cite><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">Some very large and widely used datasets such as <a rel=\"noreferrer noopener\" href=\"http:\/\/image-net.org\/\" target=\"_blank\">ImageNet<\/a> were labeled by self-employed workers for extremely low rates of pay \u2014 often through the Amazon-owned Mechanical Turk crowdsourcing website (which also offers up <a rel=\"noreferrer noopener\" href=\"https:\/\/www.gizmodo.co.uk\/2020\/01\/horror-stories-from-inside-amazons-mechanical-turk\/\" target=\"_blank\">far worse tasks<\/a> for similarly low compensation). In contrast, Samasource&#8217;s CEO Leila Janah told Murgia that the company&#8217;s pay rate is &#8220;almost quadruple&#8221; the previous income of their workers in developing countries.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Janah also pointed out that these workers are not just labeling cats and dogs. They have been trained, for example, to label diseased cells in photos of cross-sections of plants for one particular project. They are providing real human intelligence that is specialized to very particular problem sets.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Fortune<\/em> journalist Jeremy Kahn <a rel=\"noreferrer noopener\" href=\"https:\/\/fortune.com\/2020\/02\/04\/artificial-intelligence-data-labeling-labelbox\/\" target=\"_blank\">wrote about other companies<\/a> that also provide data-labeling services for top multinational firms. <a rel=\"noreferrer noopener\" href=\"https:\/\/labelbox.com\/\" target=\"_blank\">Labelbox<\/a> and <a rel=\"noreferrer noopener\" href=\"https:\/\/scale.com\/\" target=\"_blank\">Scale AI<\/a> have received heaps of funding from venture capitalists, but I couldn&#8217;t find <em>any<\/em> information about their workers who label the data. Is this something we should be concerned about? Probably so. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Both Samasource and iMerit are upfront about who their workers are and where they do the work (this might have changed since the spread of COVID-19 in early 2020). Are the dozens of other companies supplying labeled data to corporations <em>and universities<\/em> in the wealthy countries paying their workers a living wage?<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>&#8220;Often companies have a need for both general and more expert labeling and employ a combination of outsourcing firms, freelancers, and in-house experts to affix these annotations.&#8221;<\/p><cite>\u2014Fortune<\/cite><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">Labelbox, in fact, doesn\u2019t employ people who do the labeling work, according to <em>Fortune<\/em>. It provides &#8220;a tool for managing labeling projects and data across different contract labelers, who often work for large outsourcing firms.&#8221; <\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\"><img decoding=\"async\" alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https:\/\/i.creativecommons.org\/l\/by-nc-nd\/4.0\/88x31.png\"><\/a><br>\n<small><span xmlns:dct=\"http:\/\/purl.org\/dc\/terms\/\" property=\"dct:title\"><strong>AI in Media and Society<\/strong><\/span> by <span xmlns:cc=\"http:\/\/creativecommons.org\/ns#\" property=\"cc:attributionName\">Mindy McAdams<\/span> is licensed under a <a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License<\/a>.<br>\nInclude the author&#8217;s name (Mindy McAdams) and a link to the original post in any reuse of this content.<\/small><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In yesterday&#8217;s post, I referred to the 
labels that are required for supervised machine learning. To train a model \u2014 which enables an AI system to correctly identify or sort images or documents or iris flowers (and so much more) \u2014 each data record must include one or more labels. For an image of a&hellip; <a class=\"more-link\" href=\"https:\/\/www.macloo.com\/ai\/2020\/08\/20\/who-labels-the-data-for-ai\/\">Continue reading <span class=\"screen-reader-text\">Who labels the data for AI?<\/span> <span class=\"meta-nav\" aria-hidden=\"true\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[5],"tags":[34,37,38],"class_list":["post-136","post","type-post","status-publish","format-standard","hentry","category-machine-learning","tag-labels","tag-people","tag-workers"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/136","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/comments?post=136"}],"version-history":[{"count":10,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/136\/revisions"}],"predecessor-version":[{"id":188,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/136\/revisions\/188"}],"wp:attachment":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/media?parent=136"}],"wp:term":[{"taxonomy":
"category","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/categories?post=136"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/tags?post=136"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}