As promised, I've completed further analysis of the impact of various factors on median household income at the metropolitan area level; the results are displayed and discussed below. Rather than focus on median household income in general, I've gone a step further with this analysis by looking at variation in median household income by income tier: the lower, middle, and upper income tiers. The results are interesting (at least I think so...), and they arguably have significance for public policy. Also, if you don't care for all the glorious (ok...boring) details in the following analysis, just scroll down to the conclusion.
In my previous post I alluded to what I view as a misguided hyper-focus on security, crime, and undocumented immigration in this presidential election season (most notably because of the rhetoric of the Republican nominee), which has distracted from other important social and economic issues. Now, I do think the former issues are important, and we certainly shouldn't ignore them, but if asked to name the issues that I see as most pressing for our nation at this time, I would name the latter. As such, I think the analysis that follows is both timely and prescient.
As always, don't mistake the statistical significance (or insignificance) of my results for causation (or lack thereof). My results may support or fail to support a number of plausible hypotheses about the relationships between the covariates in my models and the estimated responses observed for the dependent variables. But statistical analysis can only tell us whether two variables are correlated -- in the case of regression in particular, whether an independent variable has a statistically significant effect on the dependent variable while controlling for the effect of the other independent variables in the regression model. Statistical analysis cannot tell us whether causation is, or is not, taking place.
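To make the correlation-versus-causation point concrete, here is a minimal Python simulation. It is not part of my analysis, and every variable in it is made up. A hypothetical confounder `z` drives both `x` and `y`, while `x` has no causal effect on `y` at all. A simple regression of `y` on `x` still finds a clear slope; once `z` is controlled for, the slope on `x` collapses toward zero. If `z` were never measured, no regression could reveal that.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, z):
    """Residuals of y after regressing it on z (with intercept)."""
    b = ols_slope(z, y)
    mz, my = sum(z) / len(z), sum(y) / len(y)
    return [yi - (my + b * (zi - mz)) for yi, zi in zip(y, z)]

def simulate(n=5000, seed=42):
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]       # hypothetical confounder
    x = [zi + rng.gauss(0, 1) for zi in z]        # x depends on z only
    y = [2 * zi + rng.gauss(0, 1) for zi in z]    # y depends on z only, NOT on x
    naive = ols_slope(x, y)
    # Frisch-Waugh-Lovell: the coefficient on x in y ~ x + z equals the slope
    # between the parts of x and y that z does not explain.
    controlled = ols_slope(residuals(x, z), residuals(y, z))
    return naive, controlled

naive, controlled = simulate()
print(f"slope of y on x, ignoring z:    {naive:.3f}")
print(f"slope of y on x, controlling z: {controlled:.3f}")
```

The naive slope lands near 1 even though `x` does nothing to `y`, which is exactly why a significant coefficient on its own isn't evidence of causation.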
Now, with that little bit of epistemology out of the way, let's dig in!
Median Household Income Across Income Tiers
Variation in Median Household Income in 2014
Figures 1-3 display median household income in 2014 by income tier per metro area. It's plain to see that median household income in each income tier varied substantially across the metros included in my analysis (n = 229).

For the lower income tier, median household income varied from as low as $18,110 to as high as $30,940. Meanwhile, mean median household income came in at $24,090 per metro.
As for the middle income tier, median household income ranged from $64,550 to $81,280, with mean median household income coming in at $72,920 per metro.
Finally, for the upper income tier, median household income was as low as $149,600 (boy, that doesn't seem low to me!) and as high as $205,500. Meanwhile, the mean median household income for the upper income tier came in at $168,900 per metro.
Net Change in Median Household Income from 1999 to 2014
Not only did median household income per income tier vary across metros in 2014, but its net change from 1999 to 2014 also varied substantially. However, as Figures 4-6 show, median household income, regardless of income tier, displayed a fairly consistent pattern of decline from 1999 to 2014.

In the lower income tier, median household income declined by $2,882 on average, with the largest decline observed in the Springfield, IL, metro area (-$8,603). Among the few places where median household income increased in the lower income tier, the Burlington-South Burlington, VT, metro area experienced the greatest increase (+$3,268).
As for the middle income tier, median household income declined by an average of $5,050 per metro. The most substantial decline (-$13,560) was observed in the Sheboygan, WI, metro area. Meanwhile, among metros that experienced an increase in median household income in the middle income tier, Grand Junction, CO, saw the greatest increase (+$3,970).
Finally, for the upper income tier, median household income declined by an average of $13,850 per metro. (A note for Bernie fans: the upper income tier is not the one tenth of one percent of Americans who qualify as hyper-rich, so these results do not suggest that the super wealthy have lost out on income like the rest of us!) Median household income in this tier declined the most (-$55,380) in the Naples-Immokalee-Marco Island, FL, metro area and grew the most (+$12,270) in the Muskegon, MI, metro area.
Results from Multiple Regression Analysis
Table 1 displays the results from four ordinary least squares (OLS) multiple regression models. Model 1 is the same one displayed and discussed in my previous post (see here); I'm displaying it again for comparison purposes. Also, you may note that I made a much more visually appealing table this time around (I admittedly got a little lazy with my last one).

I won't go too much into the model performance statistics, save to say that none of these models performed poorly. If you compare each model's adjusted R-squared (the share of the variance in the dependent variable explained by the model, adjusted for the number of covariates), model 1 did the best while model 3 came in second.

It bears reiterating that the results from these models don't tell us anything definitive about causation, only about correlation.
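As a sanity check on the model performance statistics in Table 1, the adjusted R² and F-statistic can be recomputed from each model's multiple R², the number of metros (n = 229), and the number of covariates (k = 33, which leaves 229 - 33 - 1 = 195 residual degrees of freedom, as the table reports). This is a minimal Python sketch using the standard textbook formulas, not the software I used to fit the models.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2: penalizes R^2 for the number of covariates k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def overall_f(r2, n, k):
    """F-statistic for the null hypothesis that all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

N_METROS, N_COVARIATES = 229, 33
# Multiple R^2 values copied from Table 1
multiple_r2 = {"Model 1": 0.6933, "Model 2": 0.4642, "Model 3": 0.5827, "Model 4": 0.4412}

for model, r2 in multiple_r2.items():
    adj = adjusted_r2(r2, N_METROS, N_COVARIATES)
    f = overall_f(r2, N_METROS, N_COVARIATES)
    print(f"{model}: adjusted R2 = {adj:.4f}, F = {f:.3f}")
```

The recomputed values agree with Table 1 to within rounding (model 1 comes out at about 0.6414 and F of about 13.36), and they confirm the ranking: model 1 first, model 3 second.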
Table 1. Summary of OLS model results.

Method: Ordinary Least Squares. Cells show Estimate (Std. Error) followed by a significance code.

| Model | Dependent variable (2014, in 2014 $) |
|---|---|
| Model 1 | Median household income |
| Model 2 | Median household income in the lower-income tier |
| Model 3 | Median household income in the middle-income tier |
| Model 4 | Median household income in the upper-income tier |

| Covariates | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| (Intercept) | 64965.41 (9819.81) *** | 23541.5636 (2721.0173) *** | 76103.8019 (4153.8707) *** | 198926.17 (12936.59) *** |
| Demographics, 2014 | | | | |
| Share Minority (not white alone) | 66.61 (80.91) | -9.5771 (22.4189) | 30.1266 (34.2244) | -154.18 (106.59) |
| Share African American | -99.07 (100.61) | -36.4692 (27.8794) | 2.1800 (42.5604) | 224.05 (132.55) . |
| Share Hispanic | -193.48 (82.32) * | -1.1952 (22.8096) | -54.4592 (34.8209) | 215.65 (108.44) * |
| Unemployment, 2014 | -526.62 (256.09) * | -17.3160 (70.9613) | -285.3227 (108.3286) | 17.33 (337.37) |
| GDP, 2014 | | | | |
| Share Manufacturing | -20.43 (52.53) | 6.5498 (14.5557) | 4.4298 (22.2205) | -170.99 (69.20) * |
| Share Government Services | -134.98 (73.61) . | -76.6245 (20.3971) *** | 12.0148 (31.1380) | -200.24 (96.97) * |
| Share Private Services | -37.08 (24.80) | -10.4525 (6.8732) | -0.5259 (10.4925) | -32.05 (32.68) |
| Highest Level of Education Achieved (adults 25+), 2014 | | | | |
| log(No High School) | 2024.03 (1920.08) | 279.8522 (532.0444) | -195.0302 (812.2122) | 4233.69 (2529.51) . |
| log(Some High School) | -6999.71 (3384.98) * | -3167.2643 (937.9610) *** | 671.2748 (1431.8794) | -7092.51 (4459.37) |
| log(High School Degree (or equivalent)) | -2917.65 (3353.82) | 25.3028 (929.3247) | -1497.4034 (1418.6954) | -4737.23 (4418.31) |
| log(Some College) | 8352.51 (4375.64) . | 3551.0292 (1212.4674) ** | 1774.2215 (1850.9374) | 9339.33 (5764.46) |
| log(Associate's Degree) | -10048.37 (2801.95) *** | 597.1776 (776.4056) | -3953.9051 (1185.2509) ** | -7509.96 (3691.28) * |
| log(Bachelor's Degree) | 4755.65 (4220.96) | -909.4645 (1169.6039) | 2133.1558 (1785.5025) | -8780.88 (5560.67) |
| log(Graduate Degree) | 4181.08 (2959.00) | -665.8564 (819.9227) | 485.3239 (1251.6837) | 11155.10 (3898.18) ** |
| Field of First Degree (adults 25+), 2014 | | | | |
| log(Computers, Mathematics, and Statistics) | 421.26 (1211.88) | -119.9034 (335.8046) | -199.9798 (512.6351) | 1248.55 (1596.52) |
| log(Biological, Agricultural, and Environmental Sciences) | -1756.63 (1531.86) | -612.4453 (424.4690) | -690.4857 (647.9890) | -2858.50 (2018.06) |
| log(Physical and Related Sciences) | 1045.99 (1200.38) | -638.4420 (332.6192) . | 857.8139 (507.7723) . | 1199.98 (1581.38) |
| log(Psychology) | 243.19 (1873.81) | -931.8314 (519.2233) . | 994.2070 (792.6397) | -5257.56 (2468.55) * |
| log(Social Sciences) | 4605.91 (1745.78) ** | 329.9595 (483.7464) | 1902.0308 (738.4811) * | -914.75 (2299.89) |
| log(Engineering) | 2457.10 (1085.62) * | 111.3277 (300.8191) | 1234.3531 (459.2267) ** | -510.82 (1430.19) |
| log(Business) | 5356.07 (2377.33) * | 927.3407 (658.7448) | 945.8070 (1005.6316) | 4549.24 (3131.89) |
| log(Education) | -4756.32 (1957.85) * | 647.8504 (542.5104) | -2158.8565 (828.1895) ** | -3104.16 (2579.27) |
| log(Literature and Languages) | -3463.93 (1537.64) * | 180.5665 (426.0729) | -1028.0055 (650.4375) | -602.91 (2025.69) |
| log(Liberal Arts and History) | 535.70 (1466.30) | 488.2000 (406.3039) | -871.4962 (620.2584) | 3029.51 (1931.70) |
| log(Visual and Performing Arts) | -3124.12 (1203.46) * | -0.8267 (333.4731) | -930.8792 (509.0758) . | 3327.06 (1585.44) * |
| log(Communication) | -1551.33 (1199.32) | -173.3479 (332.3240) | 149.1861 (507.3216) | 1967.96 (1579.98) |
| US Census Bureau Statistical Regions | | | | |
| Pacific | -1213.86 (2361.42) | -1465.0817 (654.3372) * | 667.3050 (998.9029) | -2055.17 (3110.93) |
| Mountain | -2504.93 (2221.79) | -1468.7108 (615.6455) * | -1.1665 (939.8367) | -6755.78 (2926.98) * |
| West North Central | 1361.57 (2354.76) | -1672.1755 (652.4905) * | 2093.4226 (996.0837) * | -6257.69 (3102.15) * |
| West South Central | -2141.37 (2114.9) | -1329.3158 (586.0282) * | 320.3698 (894.6232) | -4105.61 (2786.17) |
| East North Central | 2262.86 (1547.20) | -1441.7583 (428.7214) *** | 1823.5789 (654.4808) ** | -2999.10 (2038.28) |
| East South Central | -2060.53 (1707.01) | -716.2667 (473.0046) | 256.9544 (722.0829) | -2746.79 (2248.82) |
| Middle Atlantic and New England | 6159.42 (1817.83) *** | -6.0282 (503.7115) | 3453.5441 (768.9597) *** | -2022.78 (2394.81) |
| South Atlantic | NA | NA | NA | NA |

| Model Performance | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| Residual Standard Error | 5192 on 195 DF | 1439 on 195 DF | 2196 on 195 DF | 6840 on 195 DF |
| Multiple R² | 0.6933 | 0.4642 | 0.5827 | 0.4412 |
| Adjusted R² | 0.6414 | 0.3736 | 0.5121 | 0.3466 |
| F-statistic | 13.36 on 33 and 195 DF | 5.12 on 33 and 195 DF | 8.251 on 33 and 195 DF | 4.666 on 33 and 195 DF |
| p-value | < 2.2e-16 | 1.41e-13 | < 2.2e-16 | 3.71e-12 |

Significance codes: p < 0.1 '.'; p < 0.05 '*'; p < 0.01 '**'; p < 0.001 '***'
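For readers less used to regression tables, the significance codes come from the ratio of each estimate to its standard error (the t-statistic). Here's a minimal Python sketch that recomputes a few t-statistics from Table 1 and assigns codes using the same cutoffs. One assumption to flag: to stay within the standard library, it approximates the two-sided p-value with the standard normal distribution rather than the t distribution with 195 degrees of freedom. The two are very close at this many degrees of freedom, but a value sitting right at a cutoff could be classified differently.

```python
import math

def two_sided_p_normal(t):
    """Two-sided p-value under a standard normal approximation to t(195)."""
    return math.erfc(abs(t) / math.sqrt(2))

def significance_code(p):
    """Map a p-value to the significance codes used in Table 1."""
    for cutoff, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p < cutoff:
            return code
    return ""

# (covariate, model, estimate, standard error) copied from Table 1
rows = [
    ("log(Some High School)", "Model 2", -3167.2643, 937.9610),
    ("Share Government Services", "Model 2", -76.6245, 20.3971),
    ("log(Graduate Degree)", "Model 4", 11155.10, 3898.18),
    ("log(Physical and Related Sciences)", "Model 3", 857.8139, 507.7723),
    ("log(Business)", "Model 1", 5356.07, 2377.33),
]

for name, model, est, se in rows:
    t = est / se
    p = two_sided_p_normal(t)
    print(f"{model} {name}: t = {t:.2f}, p ~ {p:.4f} {significance_code(p)}")
```

For these five rows the approximation reproduces the codes shown in the table ('***', '***', '**', '.', and '*', respectively).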
Effect of Demographic Factors
The model estimates for each of the demographic variables failed to reach statistical significance in models 2 and 3. However, in model 4 a larger share of the population identifying as African American was associated with a higher median household income in the upper income tier per metro, although only at a moderate level of significance (p < 0.1). A larger share of the population identifying as Hispanic was also significantly associated with a higher median household income in the upper income tier per metro.
Effect of Unemployment
As discussed in my previous post, unemployment had a significant and negative effect on median household income in general. Yet, when you break down median household income into income tiers, unemployment doesn't have a statistically significant effect.
Effect of the Share of GDP from Manufacturing, Government Services, and Private Services
The share of GDP from each of these sectors failed to have a statistically significant effect on median household income in the middle income tier. However, a larger share of GDP from government services had a significant, negative association with median household income in the lower income tier. According to model 2, holding all other variables constant, a one-unit increase in the share of GDP from government services should result in a decrease of about $77 in median household income in the lower income tier: hardly a substantial decline, but a decline nonetheless.
Furthermore, both the share of GDP from manufacturing and the share from government services had a negative association with median household income in the upper income tier. According to model 4, a one-unit increase in the share of GDP from manufacturing should result in a decrease of about $171 in median household income in the upper income tier, all else being equal, and a one-unit increase in the share of GDP from government services should result in a decrease of about $200, all else being equal.
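Because the GDP-share variables enter the models untransformed, their coefficients scale linearly: the predicted difference in median household income is just the coefficient times the difference in the share. A minimal sketch using the model 4 estimates from Table 1. The 5-unit gap between two metros is purely hypothetical, and I'm assuming the shares are expressed in percentage points (so one unit is one percentage point), which the table itself doesn't state.

```python
# Model 4 (upper income tier) coefficients from Table 1, in 2014 dollars per unit of share
MODEL4_SHARE_COEFS = {
    "Share Manufacturing": -170.99,
    "Share Government Services": -200.24,
}

def predicted_difference(coef, share_gap):
    """Predicted change in the outcome for a share_gap-unit difference, all else equal."""
    return coef * share_gap

# Hypothetical comparison: two otherwise identical metros whose shares differ by 5 units
for name, coef in MODEL4_SHARE_COEFS.items():
    print(f"{name}: {predicted_difference(coef, 5):+,.2f} dollars")
```

The same arithmetic applies to model 2's government services estimate (-76.6245), so a 5-unit difference there works out to only about $383.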
Effect of Educational Attainment
Before I get into the results for these variables, I should take a second to point out that they have been modified via a natural logarithmic transformation. This is technically a form of data manipulation, but it's a justifiable one that is often employed by statisticians and other researchers. Let me explain.

Unlike the demographic and GDP variables, the values for educational attainment, and also for field of study, equal the total number of adults aged 25+ per metro area who attained a given level of education or majored in a given field for their first degree. This presents an obvious challenge if we want to estimate a linear regression model (which is what OLS does): the absolute number of adults with these educational characteristics varies enormously from metro to metro simply because the metros' total populations differ so much in size. Taking the natural log of the number of adults with these educational characteristics (for example, the number of adults with only some high school education) is a useful way to "smooth" things out a bit, so that we can take an otherwise uninterpretable relationship (see Figure 7) and turn it into a more meaningful one (see Figure 8).
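To illustrate what the log transformation does, here is a minimal Python sketch with made-up counts for four hypothetical metros of very different sizes (these are not values from my data). On the raw scale the largest count is hundreds of times the smallest; after taking the natural log, the values sit within a few units of each other. Note that a one-unit step on the log scale corresponds to multiplying the underlying count by e (about 2.72).

```python
import math

# Hypothetical counts of adults 25+ with only some high school, by metro size (illustrative only)
counts = {"small metro": 4_000, "mid-size metro": 25_000,
          "large metro": 150_000, "very large metro": 1_200_000}

logged = {metro: math.log(c) for metro, c in counts.items()}

raw_ratio = max(counts.values()) / min(counts.values())
log_spread = max(logged.values()) - min(logged.values())

for metro in counts:
    print(f"{metro:>16}: count = {counts[metro]:>9,}   ln(count) = {logged[metro]:.2f}")
print(f"largest/smallest raw count: {raw_ratio:.0f}x")
print(f"spread on the log scale:    {log_spread:.2f} units")
```

A 300-fold spread in the raw counts shrinks to a spread of about 5.7 on the log scale, which is the "smoothing" visible when comparing Figures 7 and 8.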
Now, on with the analysis...
For model 2, among the variables that capture people's highest level of educational attainment (measured for adults aged 25+), only the number of adults with some high school and the number with some college had a statistically significant effect on median household income in the lower income tier. According to model 2, a one-unit increase in the natural log of adults aged 25+ with only some high school should result in about a $3,167 decrease in median household income in the lower income tier, holding everything else constant. Furthermore, according to this model, a one-unit increase in the natural log of adults aged 25+ with only some college should result in about a $3,551 increase in median household income in the lower income tier, holding all else equal.
For model 3, only the number of people aged 25+ with an associate's degree had a statistically significant effect on median household income in the middle income tier. According to the model, a one-unit increase in the natural log of the number of adults aged 25+ per metro with only an associate's degree should result in a $3,954 decline in median household income in the middle income tier, all else being equal.
Finally, for model 4, having more people without a high school degree was (interestingly) associated with a higher median household income in the upper income tier, although this effect was only moderately significant. Furthermore, having more people with an associate's degree was associated with a lower median household income in the upper income tier, and having more people with a graduate degree was associated with a higher one. The results from model 4 suggest that a one-unit increase in the natural log of the adult population aged 25+ with an associate's degree for a given metro should result in a $7,510 decline in median household income in the upper income tier, all else being equal. Also, according to model 4, a one-unit increase in the natural log of the adult population aged 25+ with a graduate degree should result in an $11,155 increase in median household income in the upper income tier.
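Because these covariates are logged, "a one-unit increase in the natural log" means multiplying the underlying count by e (about 2.72), which is a big jump. A more intuitive reading of a coefficient b on a logged covariate (with an unlogged outcome) is that a p% increase in the count is associated with a change of b × ln(1 + p/100) dollars, all else equal. Here is a minimal Python sketch applying that to the educational attainment estimates discussed above; the 10% and doubling scenarios are illustrative, not part of my analysis.

```python
import math

def effect_of_pct_increase(coef, pct):
    """Dollar change in the outcome for a pct% increase in a logged covariate (level-log model)."""
    return coef * math.log(1 + pct / 100)

# Educational attainment estimates from Table 1 that are significant at p < 0.05
education_coefs = [
    ("Model 2", "log(Some High School)", -3167.2643),
    ("Model 2", "log(Some College)", 3551.0292),
    ("Model 3", "log(Associate's Degree)", -3953.9051),
    ("Model 4", "log(Associate's Degree)", -7509.96),
    ("Model 4", "log(Graduate Degree)", 11155.10),
]

for model, name, coef in education_coefs:
    ten_pct = effect_of_pct_increase(coef, 10)   # count 10% larger
    doubled = effect_of_pct_increase(coef, 100)  # count twice as large
    print(f"{model} {name}: +10% -> {ten_pct:+,.0f} dollars; doubling -> {doubled:+,.0f} dollars")
```

For example, a 10% larger graduate-degree population is associated with roughly $1,063 more in upper-tier median household income, and a doubling with roughly $7,732, all else equal.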
Effect of Field of Study
Among the field-of-study variables, none had a statistically significant effect on median household income in the lower income tier in model 2. However, two had a moderately significant effect: 1) the number of people whose first degree was earned in a physical or related science and 2) the number of people whose first degree was earned in psychology. The natural logs of both had a moderately significant, negative effect on median household income in the lower income tier.

As for model 3, having more people who earned their first degree in one of the social sciences and more people who earned their first degree in engineering was associated with a higher median household income in the middle income tier. Everything else being equal, a one-unit increase in the natural log of the former, according to the model, should result in a $1,902 increase in median household income in the middle income tier, and a one-unit increase in the natural log of the latter should result in a $1,234 increase.
In addition, having more people who earned their first degree in education was associated with a lower median household income in the middle income tier. The results from model 3 suggest that a one-unit increase in the natural log of the adult population aged 25+ whose first degree was in education should result in a $2,159 decrease in median household income in the middle income tier, all else being equal.
Furthermore, the natural log of the number of people per metro who earned their first degree in a physical or related science and the natural log of the number who earned their first degree in the visual and performing arts both had an effect on median household income in the middle income tier; however, their effects were only moderately significant. The former was associated with a larger median household income in the middle income tier, while the latter was associated with a smaller one.
Finally, for model 4, the natural logs of the number of people who earned their first degree in psychology and of the number who earned it in the visual and performing arts had statistically significant associations with median household income in the upper income tier. According to the model, a one-unit increase in the former should result in a $5,258 decline in median household income in the upper income tier, all else being equal, and a one-unit increase in the latter should result in a $3,327 increase.
Regional Effects
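The regional estimates in Table 1 pretty much speak for themselves. Just keep in mind that each one is relative to the South Atlantic region, which is the omitted reference category (hence its NA row).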
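South Atlantic is the omitted baseline in these models: with 229 metros, an intercept, and 33 covariates, there are exactly 195 residual degrees of freedom, and the seven other regions account for all of the regional covariates. So each regional estimate is a difference relative to the South Atlantic, all else equal, and comparing two other regions means subtracting their coefficients. A minimal Python sketch using the model 1 estimates:

```python
# Model 1 regional estimates from Table 1 (2014 dollars), relative to South Atlantic
MODEL1_REGIONS = {
    "South Atlantic": 0.0,  # reference category (NA in Table 1)
    "Pacific": -1213.86,
    "Mountain": -2504.93,
    "West North Central": 1361.57,
    "West South Central": -2141.37,
    "East North Central": 2262.86,
    "East South Central": -2060.53,
    "Middle Atlantic and New England": 6159.42,
}

def regional_gap(region_a, region_b, coefs=MODEL1_REGIONS):
    """Predicted difference (a minus b) in median household income, all else equal."""
    return coefs[region_a] - coefs[region_b]

print(f"{regional_gap('Middle Atlantic and New England', 'Mountain'):,.2f}")
print(f"{regional_gap('Pacific', 'South Atlantic'):,.2f}")
```

The significance codes in Table 1 only test each region against the South Atlantic; whether the gap between two other regions is statistically significant would also require the covariance between their estimates, which the table doesn't report.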
Summary and Conclusion
What should we make of these results? First off, while regression analysis typically assumes that model estimates reflect a certain direction of causation (i.e., independent variable x has effect b on dependent variable y), this assumption doesn't mean that the results from a given regression model tell us something "true" about causality in reality. Regression is one of many forms of analysis that allow researchers to support theoretical expectations with empirical evidence (or rather, to attempt to disconfirm them). Thus, it is best practice to inform one's interpretation of results obtained via statistical analysis with the theoretical and empirical work of other researchers. Doing so allows one to think critically, and in an informed way, about how plausible it is that variable x has an effect on variable y, and about the possible and likely explanations for that effect.

Such an in-depth review of the work of other researchers, however, is beyond the scope of this post (and I'm sure I've bored you enough with my findings thus far!). Instead, I'll focus on the following main points:
- Once again, the connection between educational attainment and income is worth paying attention to. In short, my results suggest that having more people with graduate degrees has the most pronounced positive association with median household income in the upper income tier. We often hear people say that the graduate degree is the new bachelor's degree, and plenty of research shows better employment outcomes for people with graduate degrees, so this finding makes sense.
- Having more people who earned their first degree in a social science or in engineering was associated with higher median household income in the middle income tier. This finding doesn't necessarily mean that people who earned degrees in these fields earn more money, though that is certainly possible.
- Also, the finding that having more people who earned their first degree in the visual and performing arts was associated with higher median household income in the upper income tier is interesting, but I'm not convinced it means that being a visual or performing artist is the ticket to the good life. It might instead indicate that greater affluence among the affluent is associated with greater demand for the visual and performing arts and other cultural activities. It should also be pointed out that having more people with visual and performing arts degrees was associated with a lower median household income in general.
- The negative relationship between the share of GDP from government services and median household income in the lower and upper income tiers is interesting and may reflect the lower wages earned by many government employees. But my analysis doesn't allow me to say whether this is the case.
- If you're looking to make more money, don't use the results from the regional variables as a guide for deciding where to live. One limitation of these results is that I didn't control for cost of living. So, even though regions like the Middle Atlantic and New England are associated with higher median household income, consider whether the cost of living there makes the higher income worth it.
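One way to address that limitation would be to deflate each metro's median household income by a regional price index before modeling it (the U.S. Bureau of Economic Analysis publishes Regional Price Parities for metro areas, for instance). I haven't done this here; the sketch below only shows the arithmetic, using hypothetical incomes and price levels where 100 is the national average.

```python
def real_income(nominal_income, price_index):
    """Income expressed in national-average dollars, given a price index where 100 = US average."""
    return nominal_income / (price_index / 100)

# Hypothetical metros: one with higher nominal income but also a higher price level
metro_a = real_income(80_000, 115.0)   # expensive metro
metro_b = real_income(70_000, 92.0)    # cheaper metro

print(f"Metro A: {metro_a:,.0f}  Metro B: {metro_b:,.0f}")
```

In this made-up example, the metro with the lower nominal income comes out ahead once prices are taken into account.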
That's it for this time! If you have a suggestion for what I should analyze next, let me know.