TexasPrimary2016: Economic Data - Part 1

Economic data behind the 2016 Texas presidential primary.

4 minute read Published: 21 Feb, 2017

Here is a look at some of my exploratory graphs from my analysis of the 2016 Texas Presidential Primary economic data. The economic data comes from FRED (Federal Reserve Economic Data) and is actually from 2014.

In this post, I’ll focus on plotting income data against population and unemployment data. In future posts I’ll explore the longer timeline of economic data - stretching back to the 1990’s.

Examining the Data

If you want to know more about the dataset used in this post, you can jump down to the About section at the bottom fo the page. But here is how the data looks:

FRED

## # A tibble: 254 × 5
##    CountyName   UnRate ResPop  PCPI  EMHI
##         <chr>    <dbl>  <dbl> <dbl> <dbl>
## 1    anderson 4.641667 57.742 31815 42471
## 2     andrews 2.983333 17.457 54928 66878
## 3    angelina 5.216667 87.875 37132 42128
## 4     aransas 5.541667 24.933 43292 44551
## 5      archer 4.508333  8.808 52139 58867
## 6   armstrong 3.158333  1.948 43290 55177
## 7    atascosa 4.925000 47.812 37172 52785
## 8      austin 4.750000 28.994 46741 58792
## 9      bailey 5.025000  6.960 44960 36821
## 10    bandera 4.766667 20.916 40868 50107
## # ... with 244 more rows

For this dataset, each column is a variable from 2014.

CountyName = Texas county (lower-case name)
UnRate = Unemployment Rate (%)
ResPop = Residential Population (1,000 persons)
PCPI = Per Capita Personal Income ($)
EMHI = Estimated Median Household Income ($)

Plotting the Data

The code used to generate the plots below comes from the ~/Code/FRED_plotData.R file.

Boxplots

Basic

An unfaceted look at each variable.

NOTE: I added a log scale to the y-axis in the population boxplot below. This is because many of Texas’ counties have small populations (min = 87 in Loving County) while only a handful or so have very large populations (max = 4,447,577 in Harris County).

Take-aways:

We can see that household incomes (EMHI) are higher than per capita incomes (PCPI).
PCPI has a larger outlier than EMHI, perhaps because PCPI is a per capita (average) variable while EMHI is median variable.

With Cuts

Cutting and grouping the population and unemployment data into nearly equal intervals (~32 counties each interval).

Take-aways:

First plot (population vs income)
- Very large counties have higher median EMHI (household income) than small to large counties.
- Small to medium size counties tend to have larger median PCPI (personal income) than any other size county. For instance, Hansford County has a PCPI of $75,035 but a small population of 5,537 people.
- The above two observations could be due to lower employment:population ratios in rural areas (making PCPI higher for rural areas) and a higher a number of employed persons per household in urban areas (making EMHI higher for urban areas).
Second plot (unemployment vs income)
- There is an (obvious) inverse relationship between unemployment and income:
- The higher the unemployment rate, the lower the income.
- The lower the unemployment rate, the higher the income.

Another view

In the two sets of plots below, I’ve changed the groupings into facets - so each boxplot gets its own area.

And flipping the facets in facet_grid() produces an interesting view, as well.

Density

Basic

I’m going to leave out histograms for now and focus on histograms. An unfaceted look at each variable:

Take-aways:

Both income variables are positively skewed (lean toward the left).

With Cuts

I’ll take the cutting procedure from the boxplots and apply it to the density plots. This time though I’ll add two lines:

median (solid)
mean (dashed)

Take-away:

In the largest 32 counties, EMHI values are spread wide (kurtically flat) and they are weakly distributed more toward lower values (positively skewed), for the most part.

Other views

Stacked

Take-away:

You can see a shift in the peaks of the various ranges. This is kind of like seeing a shift in the median of the ranges.

Filled

Take-away:

The tail-end distributions stand out more when using the filled density plot.

About the Data

The data used in this post is a merger of two data sets:

FRED_2012 = contains all income, population, unemployment, and education data
FRED_2014 = contains all income, population, and unemployment data; does NOT contain education data

I ended up merging the 2012 education data with the 2014 data.=, but did not use it for this post.

Getting the Data

Raw Data

Source files can be found in:

~/Code/Fred_getdata.R
~/Code/Fred_getdata2.R
~/Code/Fred_getdata3.R
~/Code/Fred_mergedata.R

You can download the .RData files from here and here.

Filtered Data

Here is how to merge the two data sets.

# Filter Data -------------------------------------------------------------

.FRED <- FRED_2014 %>%
  select(CountyName,
         UnRate,
         ResPop,
         PCPI,
         BachDegree,
         HSDegree,
         EMHI)
.FRED$BachDegree <- FRED_2012$BachDegree  # replace 2014 education data w/ 2012
.FRED$HSDegree   <- FRED_2012$HSDegree    # replace 2014 education data w/ 2012

Working Data Frame

And to keep things simple, for this post, I removed the education data.

# Filter Out Education Data -----------------------------------------------

FRED <- .FRED %>% select(-BachDegree, -HSDegree)

Published by Luke Smith 21 Feb, 2017 and tagged election analysis, political analysis, projects and r using 817 words.