Week 2: Histograms

Computational Homework

Every year in POS 5737, we have a histogram contest. I’ll give you a chance to improve your submissions after feedback. Then we have a department-wide vote.

Clark and Golder (2006) write the following:

These strategic considerations indicate that electoral institutions modify the relationship between socioeconomic cleavages and the number of parties. In particular, this framework indicates that there are two reasons why a country might have a small number of parties. First, it could be the case that the demand for parties is low because there are few social cleavages. In this situation, there would be few parties whether the electoral institutions were permissive or not. Second, it could be the case that the electoral system is not permissive. In this situation, there would be a small number of parties even if the demand for political parties were high. Only a polity characterized by both a high degree of social heterogeneity and a highly permissive electoral system is expected to produce a large number of parties. This line of reasoning generates the following hypothesis: Social heterogeneity increases the number of parties, but only when electoral institutions are sufficiently permissive.

We have the following data (from the authors) to evaluate this claim:

# load packages
library(tidyverse)

# load data
cg <- read_rds("data/parties.rds")

# quick look
glimpse(cg)

## Rows: 555
## Columns: 10
## $ country              <chr> "Albania", "Albania", "Albania", "Argentina", "Ar…
## $ year                 <dbl> 1992, 1996, 1997, 1946, 1951, 1954, 1958, 1960, 1…
## $ average_magnitude    <dbl> 1.00, 1.00, 1.00, 10.53, 10.53, 4.56, 8.13, 4.17,…
## $ eneg                 <dbl> 1.106929, 1.106929, 1.106929, 1.342102, 1.342102,…
## $ enep                 <dbl> 2.190, 2.785, 2.870, 5.750, 1.970, 1.930, 2.885, …
## $ upper_tier           <dbl> 28.57, 17.86, 25.80, 0.00, 0.00, 0.00, 0.00, 0.00…
## $ en_pres              <dbl> 0.00, 0.00, 0.00, 2.09, 1.96, 1.96, 2.65, 2.65, 3…
## $ proximity            <dbl> 0.00, 0.00, 0.00, 1.00, 1.00, 0.20, 1.00, 0.20, 1…
## $ social_heterogeneity <fct> Bottom 3rd of ENEG, Bottom 3rd of ENEG, Bottom 3r…
## $ electoral_system     <fct> Single-Member District, Single-Member District, S…

Do the following:

Prepare.
1. Open RStudio and start a new RStudio Project called hw02/ and initialize git.
2. Open GitHub Desktop, add the local repo hw02, and publish the initial files to GitHub as part of the pos5737 organization. Name the repo hw02-first-last (not HW02-First-Last or HW02_First_Last; precision matters).
3. Create an R/ subdirectory of hw02/.
4. Open a new R script and save it as R/histogram-exercises.R.
Read the chapter “Histograms in R” from the notes (currently ch. 4) and do the review exercises throughout. Answer any conceptual questions using a well-formatted comment. Use comments and whitespace throughout to make the code readable. You can find a popular style guide here.
Read enough of Clark and Golder (2006) to understand (1) their theoretical model explaining ENEP and (2) their conclusions based on the data. Do not worry about all the details of the regression model. Develop a strategy to evaluate their (i.e., Duverger’s) theoretical model explaining ENEP using several histograms.
Prepare.
1. Create a data/ subdirectory of hw02/.
2. Go to the data page and download the data set parties.rds. Move (or save) parties.rds to hw02/data/. The data page has a link to the codebook.
Write. Open a new R script and save it as R/duverger.R. In this script, do the following:
1. Load the data. I find it helpful to use glimpse() to quickly create an overview of the data.
2. Use ggplot2 to create histogram(s) (or variants, such as density plots or beeswarm plots) to evaluate Clark and Golder’s argument. You can see several geoms in this cheatsheet. If you’re not sure if it’s a variant of a histogram, just ask. Try to do this with a single figure (probably with facets). Experiment to find an approach that clearly shows the patterns. Experiment with color, fill, alpha, and facets. Choose your theme carefully. Label the plot nicely.
3. Save the plot in a publication-quality format (.pdf preferred, otherwise .tiff). Eventually, I’ll print this off and tape it to my door for everyone to see. (I like to save my plots to a figs/ subdirectory.)
4. In a document, briefly summarize Clark and Golder’s argument and causal claim (i.e., less than 1/2 a page) and briefly evaluate the evidence your histogram offers for that claim (i.e., less than 1/2 a page). You may use Word, LaTeX, R Markdown, or something else, but make sure to include a PDF of the document.
Finalize. Add a helpful README to your project that renders nicely on GitHub. Your audience is me, the TA, and your classmates, so you can assume your reader is familiar with the assignment. (If you want to include your histogram in the README, that’s awesome, just create a separate .png version to use here.)
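
As a starting point for the Write steps above, here is a minimal sketch of the mechanics only: one histogram of enep per combination of social_heterogeneity and electoral_system, saved as a PDF. The helper name plot_enep_histograms(), the bin width, the theme, and the output file name figs/enep-histograms.pdf are all illustrative assumptions, not part of the assignment. A good submission will likely look quite different, so treat this as something to experiment from rather than a recommended design.

```r
# load packages
library(tidyverse)

# Hypothetical helper (not part of the assignment): draws one histogram of
# enep for each combination of social heterogeneity and electoral system.
# The bin width of 0.5 is an arbitrary starting value; try several.
plot_enep_histograms <- function(data, binwidth = 0.5) {
  ggplot(data, aes(x = enep)) +
    geom_histogram(binwidth = binwidth, boundary = 0,
                   fill = "grey40", color = "white") +
    facet_grid(rows = vars(social_heterogeneity),
               cols = vars(electoral_system)) +
    labs(x = "ENEP", y = "Count") +
    theme_bw()
}

# Usage, assuming the data file sits where the assignment says it should.
if (file.exists("data/parties.rds")) {
  cg <- read_rds("data/parties.rds")
  gg <- plot_enep_histograms(cg)

  # create figs/ if needed, then save a publication-quality PDF
  dir.create("figs", showWarnings = FALSE)
  ggsave("figs/enep-histograms.pdf", plot = gg, width = 8, height = 6)
}
```

Putting social heterogeneity on the rows and the electoral system on the columns lets you compare across both conditions of the hypothesis at once. Swapping facet_grid() for facet_wrap(), or mapping one of the two variables to fill with some alpha, are alternatives worth trying.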
