{"id":641,"date":"2021-05-04T13:36:53","date_gmt":"2021-05-04T17:36:53","guid":{"rendered":"https:\/\/www.macloo.com\/ai\/?p=641"},"modified":"2021-05-04T13:36:53","modified_gmt":"2021-05-04T17:36:53","slug":"ground-truth-and-labeled-data","status":"publish","type":"post","link":"https:\/\/www.macloo.com\/ai\/2021\/05\/04\/ground-truth-and-labeled-data\/","title":{"rendered":"\u2018Ground truth\u2019 and labeled data"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Cassie Kozyrkov, who wrote <a rel=\"noreferrer noopener\" href=\"https:\/\/towardsdatascience.com\/in-ai-the-objective-is-subjective-4614795d179b\" target=\"_blank\">this article<\/a>, is head of decision intelligence at Google. The article starts out with what looks like a standard explanation of an image-recognition system \u2014 which she deprecatingly refers to as &#8220;the cat\/not-cat task.&#8221; But don&#8217;t be fooled \u2014 Kozyrkov communicates with clear, sharp precision, and very quickly she asks us to consider circumstances in which we would want a <em>tiger<\/em> to be considered <em>a cat<\/em> and those in which we would want it to be <em>not-cat<\/em>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This leads to a discussion of <strong>ground truth.<\/strong> This is &#8220;an ideal expected result&#8221; \u2014 but for whom? Well, for the people who originally built the system. Kozyrkov notes that ground truth is NOT an objective, perfect truth like something studied in a philosophy class (Truth with a capital <em>T<\/em>). It&#8217;s whether a tiger <em>is a cat<\/em> in your reality or <em>not-cat<\/em> in mine.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I am reminded of one of my favorite lines in the rock opera <em>Jesus Christ Superstar<\/em>: &#8220;But what is truth? Is truth unchanging law? We both have truths. 
Are mine the same as yours?&#8221;<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>&#8220;When such a dataset is used to train ML\/AI systems, <em>systems based on it<\/em> will inherit and amplify the implicit values of the people who decided what the ideal system behavior looked like to them.&#8221;<\/p><cite>\u2014 Cassie Kozyrkov<\/cite><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">It also brings to mind the practice of testing for <a rel=\"noreferrer noopener\" href=\"https:\/\/methods.sagepub.com\/reference\/encyclopedia-of-survey-research-methods\/n228.xml\" target=\"_blank\">intercoder reliability<\/a> \u2014 standard practice in research that relies on qualitative data. (<a rel=\"noreferrer noopener\" href=\"https:\/\/delvetool.com\/blog\/intercoder\" target=\"_blank\">More about that here.<\/a>)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Say you are using <strong>an existing labeled dataset <\/strong>\u2014 not one you yourself have created \u2014 which is often the case. The labels attached to the data items are the ground truth for that dataset. If it&#8217;s a dataset of images, and some labels applied to photos of people are racist, then that&#8217;s the <em>ground truth<\/em> in that dataset. 
If it&#8217;s a dataset for sentiment analysis, and a lot of toxic comments are labeled &#8220;not toxic,&#8221; then that&#8217;s the <em>ground truth<\/em> you&#8217;re adopting.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It&#8217;s essential for developers to <strong>test systems extensively<\/strong> to uncover these flaws in ground truth.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>&#8220;You wouldn\u2019t want to fall victim to a myopic fraud detection system with sloppy definitions of what financial fraud looks like, especially if such a system is allowed to falsely accuse people without giving them an easy way to prove their innocence.&#8221;<\/p><cite>\u2014 Cassie Kozyrkov<\/cite><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\"><a rel=\"noreferrer noopener\" href=\"https:\/\/www.youtube.com\/watch?v=EjBXZrQ7fTs\" target=\"_blank\">In a video<\/a> embedded in the same article, Kozyrkov pithily proclaims: &#8220;There are only actually two real lines there. Here&#8217;s what they are: This objective. That data set.&#8221; (At 9:16.) Of course there&#8217;s a ton more code than that (she&#8217;s talking about the programming of the system that creates the model), but in terms of <strong>what you want the system to be able to do,<\/strong> that&#8217;s it in a nutshell: How have you framed your objective? And what&#8217;s in your dataset? 
More important, in many cases, is what&#8217;s NOT in your dataset.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">She says this is where <strong>the core danger in AI<\/strong> lies, because in traditional programming &#8220;it might take 10,000 lines of code, a hundred thousand lines of code maybe, and some human being has to worry about every single one of those lines, agonize over it.&#8221; With supervised machine learning, you&#8217;ve only got the objective and the (gigantic) dataset, and the question is, Have <em>enough people with expertise<\/em> really agonized over each of those things?<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">My other favorite bits from the video:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>&#8220;A system that is built and designed for one purpose may not work for a different purpose.&#8221; (6:17)<\/li><li>&#8220;Remember that the objective is <em>subjective<\/em>.&#8221; (6:31)<\/li><li>&#8220;And if you take those two parts really seriously, that is how you are going to build a safe and effective and kind AI system.&#8221; (20:16)<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\"><img decoding=\"async\" alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https:\/\/i.creativecommons.org\/l\/by-nc-nd\/4.0\/88x31.png\"><\/a><br>\n<small><span xmlns:dct=\"http:\/\/purl.org\/dc\/terms\/\" property=\"dct:title\"><strong>AI in Media and Society<\/strong><\/span> by <span xmlns:cc=\"http:\/\/creativecommons.org\/ns#\" property=\"cc:attributionName\">Mindy McAdams<\/span> is licensed under a <a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License<\/a>.<br>\nInclude the author&#8217;s name (Mindy McAdams) and a link to the original post in any reuse of this content.<\/small><\/p>\n\n\n\n<p 
class=\"wp-block-paragraph\">.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Cassie Kozyrkov, who wrote this article, is head of decision intelligence at Google. The article starts out with what looks like a standard explanation of an image-recognition system \u2014 which she deprecatingly refers to as &#8220;the cat\/not-cat task.&#8221; But don&#8217;t be fooled \u2014 Kozyrkov communicates with clear, sharp precision, and very quickly she asks us&hellip; <a class=\"more-link\" href=\"https:\/\/www.macloo.com\/ai\/2021\/05\/04\/ground-truth-and-labeled-data\/\">Continue reading <span class=\"screen-reader-text\">\u2018Ground truth\u2019 and labeled data<\/span> <span class=\"meta-nav\" aria-hidden=\"true\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[5],"tags":[126,34,19,125],"class_list":["post-641","post","type-post","status-publish","format-standard","hentry","category-machine-learning","tag-datasets","tag-labels","tag-supervised_learning","tag-truth"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/641","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/comments?post=641"}],"version-history":[{"count":10,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/641\/revisions"}]
,"predecessor-version":[{"id":651,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/641\/revisions\/651"}],"wp:attachment":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/media?parent=641"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/categories?post=641"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/tags?post=641"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}