{"id":14855,"date":"2015-02-01T21:32:50","date_gmt":"2015-02-02T03:32:50","guid":{"rendered":"http:\/\/bucktownbell.com\/?p=14855"},"modified":"2015-02-01T21:33:11","modified_gmt":"2015-02-02T03:33:11","slug":"test-shows-big-data-text-analysis-inconsistent-inaccurate","status":"publish","type":"post","link":"http:\/\/bucktownbell.com\/?p=14855","title":{"rendered":"Test shows big data text analysis inconsistent, inaccurate"},"content":{"rendered":"<blockquote><p>Accuracy of 90 percent with 80 percent consistency sounds good, but the scores are &#8220;actually very poor, since they are for an exceedingly easy case,&#8221; Amaral said in <a href=\"http:\/\/www.eurekalert.org\/pub_releases\/2015-01\/nu-btb012915.php\">an announcement from Northwestern about the study.<\/a><\/p>\n<p>Applied to messy, inconsistently scrubbed data from many sources in many formats \u2013 the base of data for which big data is often praised for its ability to manage \u2013 the results would be far less accurate and far less reproducible, according to the paper.<\/p><\/blockquote>\n<p>via <a href=\"http:\/\/www.computerworld.com\/article\/2878080\/test-shows-big-data-text-analysis-inconsistent-inaccurate.html\">Test shows big data text analysis inconsistent, inaccurate | Computerworld<\/a>.<\/p>\n<p>Here&#8217;s an interesting explanation of how LDA (Latent Dirichlet Allocation) works.\u00a0 From: <a href=\"http:\/\/www.quora.com\/What-is-a-good-explanation-of-Latent-Dirichlet-Allocation\">What is a good explanation of Latent Dirichlet Allocation?<\/a><\/p>\n<p>From a 3,000-foot level, as I understand the explanation, LDA seems to be a mechanism for scoring words by topic in order to categorize sets of words, such as paragraphs or entire papers.\u00a0 It&#8217;s an interesting exercise, but a human must do the data modeling first.\u00a0 Any time a program has to estimate or guess like this, there will be error; the only question is how much error can be tolerated while still using the results this kind of analysis produces.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Accuracy of 90 percent with 80 percent consistency sounds good, but the scores are &#8220;actually very poor, since they are for an exceedingly easy case,&#8221; Amaral said in an announcement from Northwestern about the study. Applied to messy, inconsistently scrubbed &hellip; <a href=\"http:\/\/bucktownbell.com\/?p=14855\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[737],"tags":[532,475,170],"class_list":["post-14855","post","type-post","status-publish","format-standard","hentry","category-stem","tag-analytics","tag-big-data","tag-data-modeling"],"_links":{"self":[{"href":"http:\/\/bucktownbell.com\/index.php?rest_route=\/wp\/v2\/posts\/14855","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/bucktownbell.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/bucktownbell.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/bucktownbell.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/bucktownbell.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=14855"}],"version-history":[{"count":3,"href":"http:\/\/bucktownbell.com\/index.php?rest_route=\/wp\/v2\/posts\/14855\/revisions"}],"predecessor-version":[{"id":14859,"href":"http:\/\/bucktownbell.com\/index.php?rest_route=\/wp\/v2\/posts\/14855\/revisions\/14859"}],"wp:attachment":[{"href":"http:\/\/bucktownbell.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=14855"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/bucktownbell.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=14855"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/bucktownbell.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=14855"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}