{"id":1074,"date":"2014-02-22T08:09:14","date_gmt":"2014-02-22T16:09:14","guid":{"rendered":"http:\/\/www.ticoneva.com\/journal\/?p=1074"},"modified":"2014-02-22T08:11:51","modified_gmt":"2014-02-22T16:11:51","slug":"preserving-constants-in-a-stata-collapse-operation","status":"publish","type":"post","link":"https:\/\/www.ticoneva.com\/journal\/?p=1074","title":{"rendered":"Preserving Constants in a Stata Collapse Operation"},"content":{"rendered":"<p>Let&#8217;s say you have a variable that you know is constant within each group. What is the best way to preserve it during a collapse operation in Stata? You might think taking the first value (firstnm) must be the fastest, since it theoretically requires only one step per group. If so, you are in for a surprise: Stata is actually faster at calculating the mean.<\/p>\n<p>Here are the simulation results for 100 groups of 1000 randomly generated observations, averaged over 30 runs:<\/p>\n<p>collapse mean 0.0443<br \/>\ncollapse median 0.1062<br \/>\ncollapse min 0.0844<br \/>\ncollapse max 0.0657<br \/>\ncollapse count 0.0456<br \/>\ncollapse firstnm 0.0473<br \/>\ncollapse lastnm 0.0464<\/p>\n<p>The measurements are reported in seconds. The relative speeds are quite stable across variations in the number of groups and observations. Based on my analysis of the underlying algorithms collapse uses, the reason firstnm is slower than mean is that an order-preserving sort has to be performed on the data, and order-preserving sorts are slow relative to non-preserving ones. To confirm this, I ran the test with just one group of 100k observations:<\/p>\n<p>collapse mean 0.0614<br \/>\ncollapse firstnm 0.0508<\/p>\n<p>And as expected, firstnm is now faster. 
The calculation of mean also slows down more than that of firstnm as the number of groups decreases.<\/p>\n<p>Based on my simulations, calculating the mean is faster with as few as 3 groups, so mean is the way to go in most cases.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Let&#8217;s say you have a variable that you know is constant within each group. What is the best way to preserve it during a collapse operation in Stata? You might think taking the first value (firstnm) must be the fastest, since it theoretically only requires 1 step per group. If that is the case, you [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[2,7],"tags":[],"class_list":["post-1074","post","type-post","status-publish","format-standard","hentry","category-political-economy","category-tech-zone"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.ticoneva.com\/journal\/index.php?rest_route=\/wp\/v2\/posts\/1074","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ticoneva.com\/journal\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ticoneva.com\/journal\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ticoneva.com\/journal\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ticoneva.com\/journal\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1074"}],"version-history":[{"count":2,"href":"https:\/\/www.ticoneva.com\/journal\/index.php?rest_route=\/wp\/v2\/posts\/1074\/revisions"}],"predecessor-version":[{"id":1076,"href":"https:\/\/www.ticoneva.com\/journal\/index.php?rest_route=\/wp\/v2\/posts\/1074\/revisions\/1076"}],"wp:attachment":[{"href":"https:\/\/www.ticoneva.com\/journal\/inde
x.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1074"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ticoneva.com\/journal\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1074"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ticoneva.com\/journal\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1074"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}