There are two versions of this file: an html and a pdf. If you need to print the document, then the pdf works a little better.
If you find a problem with this assignment, then please let me know, especially if the instructions don’t work or you find them confusing. This is important. This assignment can be particularly frustrating, so if the instructions are unclear, inaccurate, or incomplete, let me know so that I can update them and save others some frustration.
In this homework assignment, you will install a bunch of software, create a couple of accounts, and make sure things work. You don’t need to understand everything that’s happening. That’s expected and okay. You’ll learn more about these tools as we move through the semester.
For the conceptual homework, you need to read several things and complete a few exercises. For this homework, we’re trying to work through many simple ideas, so keep your work concise–only dwell on ideas that you find unfamiliar or challenging. Keep your answers brief, especially for ideas that you find easy or those that are already familiar.
The computational assignment is tedious, but not difficult. This is an especially tedious assignment, so please start it early. This will give you several opportunities to get stuck and seek help.
The readings this week give us enough background in research design to proceed with data analysis (while you continue to learn research design in 5736).
Read the following:
Exercises
This chapter gives us some basic terms to help us think about data (e.g., “cross-sectional,” “panel,” “observational”).
Exercises
This is an important article for us. It describes how to organize your data, which helps us in two ways. First, if you collect your own data, it gives you good advice for structuring your spreadsheet. Second, if we’re “just” cleaning up raw data into a usable form, this gives us a goal. Given that we want to share the data we use, we want our data in this form. Throughout this class, I’m going to insist that we follow their advice–so become familiar with it.
Exercise Broman and Woo give several rules. List the three that were most unexpected and briefly summarize why they might be important.
Exercises Do the assigned exercises FPP (only those from chs. 1-2). Answer briefly–these questions don’t require an in-depth discussion.
Is Simple Description Useful?
Exercises
Exercise For computational research, we have manually-edited spreadsheet cells and point-and-click analysis on one extreme. On the other extreme, we have computer scripts that manipulate the raw data into the tables and figures. What is the advantage of working near the latter extreme? How do git and GitHub facilitate this work?
Most social scientists are computational scientists in the sense that (1) they use computers to perform relevant computations and (2) share their code and data with others. Here are the tools I use (with macOS, see the italics for Windows alternatives where needed):
I feel pretty confident that these are the best tools for macOS, but you might want to consult with more senior graduate students for advice about Windows.
Note: In Part 1, you install a bunch of software and start your first GitHub repository. Keep using this same repository in Parts 2 and 3.
At some point in this assignment, find something that you don’t completely understand. Come to my office and talk to me about it!
You can consider this part of your computational homework for the week.
Register a GitHub account. Read the advice from Jenny Bryan about choosing a username and then use your .edu e-mail address to sign up for an account on github.com. You can associate multiple e-mail accounts with GitHub, so feel free to add in the primary account that you have permanent control over (e.g., your gmail account).
Apply for a Student Developer Pack. This gives you private repos for free.
Follow me on GitHub. I’m carlislerainey. Follow harleyroe and qwang1015, our TAs, as well. Once you follow me, I’ll add you to our POS 5737 private GitHub Organization.
Browse some repos on GitHub. See an ongoing project of mine, a finished project of mine, and Josh Alley’s cool dissertation project (that I made a small contribution to). Remember that most GitHub repos contain software, not research projects. For example, see the Linux OS, my website, and the ggplot2 R package.
To make things a little easier throughout the process, we want to do four things:
For reasons I don’t understand, neither macOS nor Windows shows file extensions by default. (That’s the `.docx` bit of the Word file `Essay for English 1101.docx`.)
On Windows, see the instructions here.
On macOS, open Finder. Click Finder > Preferences… Select Advanced tab and check the box to Show all filename extensions.
On Windows 7/8/10, you don’t need to do anything. When you need a command line at a folder, simply hold down the shift key and right-click a folder. The context menu contains an entry, Open command window here. (Later, once you’ve installed Git for Windows, you’ll have the option to open a Git Bash terminal as well.)
On macOS, make it easy to open a terminal in a specific directory by enabling New Terminal at Folder. Open Finder. Click Finder > Services > Services Preferences…. In the right box, navigate past Pictures, Messaging, and Development to Files and Folders. Check the box for New Terminal at Folder. Now right-click on a folder and you should see the option (near the bottom) to open a new terminal at that folder.
This is for macOS only.
The command prompt on macOS doesn’t look nice by default. You probably don’t have the file `~/.bash_profile`, but you need to check. Open a terminal and run the command `open -a TextEdit ~/.bash_profile`. If you don’t have it, you’ll get a message saying so. If you already have it, just add the line `export PS1="\W > "` to the file on a new, separate line.
If you don’t have it, then you need to create it. Run `echo 'export PS1="\W > "' > ~/.bash_profile`. Open a new terminal. The prompt should look nicer.
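The two cases above (the file exists, or it doesn’t) can be folded into one step. The sketch below is only an illustration, assuming your terminal runs bash as the instructions above do; `add_prompt_line` is a hypothetical helper name, not something macOS provides. It appends the prompt line only when it is missing, so running it twice is harmless, and it never overwrites an existing file.

```shell
# add_prompt_line FILE: append the prompt setting to FILE unless it is already there.
# (add_prompt_line is a hypothetical helper, written for illustration only.)
add_prompt_line() {
  profile="$1"
  line='export PS1="\W > "'
  touch "$profile"                               # create the file if it does not exist
  if ! grep -qxF -- "$line" "$profile"; then     # -x whole line, -F literal text, -q silent
    # If the file ends without a newline, add one so the new line stands alone.
    if [ -n "$(tail -c1 "$profile")" ]; then
      echo >> "$profile"
    fi
    printf '%s\n' "$line" >> "$profile"          # >> appends; a single > would overwrite
  fi
}
```

To apply it, run `add_prompt_line ~/.bash_profile` and then open a new terminal.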
Install (or Update) R. Choose the appropriate OS from CRAN and follow the instructions. On Windows, you get two versions of R: i386 and x64. This is normal. These are 32- and 64-bit versions. We only use R indirectly through RStudio, so we never choose between the two. (But both work!)
Install (or Update) RStudio. You may choose either the preview version or the latest stable version. I use the preview version. The lab computers have the latest stable version. [edit: The lab is currently under construction.] The preview version has new features; the latest stable version is (I suppose) more robust. Choose a version, select your OS, and follow the instructions.
If you already have R installed, update your packages. Just open R or RStudio and run `update.packages(ask = FALSE, checkBuilt = TRUE)`.
Adjust One (Bad) Default. Click Tools > Global Options…. Select General. Under Workspace, set Save workspace to .RData on exit: to Never. Uncheck the box for Restore .RData into workspace at startup.
Install the tidyverse package. In the lower-right pane, click the Packages tab to show the Packages window. You see a list of available packages. Click the Install button at the top of the Packages window, type “tidyverse” in the middle box, and click Install. Make sure that tidyverse installs successfully by entering the command `library(tidyverse)` in the console in the lower-left pane.
Comment: Other instructors might suggest the equivalent approach of entering the command `install.packages("tidyverse")` in the console, but I recommend the point-and-click method. It’s easier and packages only need to be installed once per computer.
Comment: R packages allow software developers to distribute additional functionality to users. For example, you’ll use the ggplot2 package to create graphs. Hadley Wickham wrote ggplot2 as part of his dissertation in the statistics department at Iowa State. That was version 0.0.7 and we’re now on version 3.0.0. I’m excited for the next release because of this annoying bug.
Comment: When you run `library(tidyverse)`, you get the output below, which is both expected and desirable.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.5 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Run an R script. To start a new R script, open RStudio and click File > New File > R Script. This opens a blank R script in the upper-left pane. Copy-and-paste the code below into the R script. Use your mouse to highlight all eight lines (or press command + a) and then click Run (or press command + enter) at the upper-right of the script window. Make sure a scatterplot appears in the Plots tab of the lower-right pane. If so, you’ve got R and RStudio working.
# simple example script
# load packages
library(tidyverse)
# create two vectors of data
x <- c(2, 3, 5, 4, 1) # x "gets" a "collection" of 5 numbers
y <- c(4, 2, 1, 5, 6) # y "gets" a "collection" of 5 numbers
# plot x and y
qplot(x, y)
If you already have a favorite, that’s fine. Use it.
As RStudio develops, I find that I need another text editor rarely, if ever. Just in case, here are the ones I recommend.
For Windows, I recommend downloading Sublime Text 3. Follow the instructions.
For macOS, I recommend downloading Atom. Follow the instructions.
Install git. For macOS, open a terminal and run `xcode-select --install`. This installs Xcode command line tools, which include git. (Note that you can also install all of Xcode, but all of Xcode is a much larger installation. You only need Xcode command line tools.) For Windows, download Git for Windows and follow the instructions.
Install GitHub Desktop. Download GitHub Desktop and follow the instructions.
Comment: There are three ways you can interact with git: (1) command line, (2) RStudio, and (3) a client like GitHub Desktop. RStudio gives you an easy, but limited, way to interact with git. The command line is complete, but complex. GitHub Desktop is the best of both worlds. It’s complete and easy-to-use.
Introduce yourself to git. (Only do this on your personal computer.) At some point, you’ll want to use git from the terminal/Git Bash. Go ahead and let git know who you are. In a terminal (macOS) or Git Bash (Windows), run `git config --global user.name 'Jane Doe'` and `git config --global user.email 'jdoe@fsu.edu'`–be sure to use the e-mail associated with your GitHub account. Run `git config --global --list` to verify the settings.
Associate your text editor with git. For macOS, run `git config --global core.editor "atom --wait"` in a terminal. For Windows, run `git config --global core.editor "'c:/program files/sublime text 3/subl.exe' -w"` in Git Bash.
Comment: When using git from the command line, it sometimes pops open a text editor. By default, git uses vim. Vim is a strange, scary place to wind up. If you find yourself in vim, you might never escape. Avoid that possibility entirely by associating a friendly text editor with git.
There are three similarly-named things here:
Open GitHub Desktop. Click File, New Repository…. Name the repo `test-github` and set the local path to your computer’s desktop. Initialize the repo with a README. Click the blue Create repository button.
This first repo is just for practice–you’ll delete it later.
Click the button Publish repository at the top. Supply your GitHub username and password. Again, click the button Publish repository. (Note that the local directory and the GitHub repo do not necessarily share their name, but I recommend you keep them the same.) Click the blue Publish repository button.
Comment: In my work, I associate one repo with a single research project (i.e., usually one manuscript). See my repos. Until later in the semester, I’ll instruct you to create a separate repo for each homework assignment (i.e., a new repo each week). You’ll create another repo for the research project.
When students first start to develop projects with git and GitHub, I recommend a change-review-commit-push model. Choose a manageable change you’d like to make to the project (like “Rewrite the introduction”), then make, review, commit, and push the change. Git and GitHub are extremely complicated, but powerful tools. Don’t jump all in on git and GitHub immediately (and maybe not ever). Keep it simple for a while with change-review-commit-push.
First, edit the file `README.md` locally. You should find a new folder `test-github` on the Desktop. Open the file `README.md` with Atom (macOS), Sublime Text 3 (Windows), or RStudio (either OS). Make some change to the file and save it.
Comment: GitHub treats the file `README.md` specially–it displays it on the repo’s main page. I have a detailed `README.md` for my latent-dissent project, for example. Because `README.md` is a Markdown file, you can use Markdown syntax to style the document.
Second, review your change. Go back to GitHub Desktop, which shows you a list of changed files. This list should include `README.md` because you changed it. This list should only include `README.md` because it’s the only file you changed (it’s also the only file in the repo). If you click the entry for `README.md`, you can see the changes you made to that file.
Third, commit your change. (Remember, a commit is like a snapshot of your files.) Once you’re happy with the changes, you want to commit those changes. Ideally, a commit is a three step process:
Comment: Committing is like taking a snapshot of the directory. You already have one snapshot from when you initially created the repo. When you committed the change to `README.md`, you took another snapshot. This allows you to return to these exact files later, as needed. For an example, look at the commits for my recent paper in Political Analysis.
Fourth, push these changes up to GitHub. Click Push origin in the upper-right corner. This simply passes the update along to GitHub, which safely stores it for backup, browsing, and sharing.
Comment: You’ve got three things going on now–keep them separate in your mind. You have the local files that you see on your computer. You have a local history of the project (each commit) stored in the hidden folder `.git`. You have a copy of that history hosted on GitHub for safe storage and for others to access.
Comment: If multiple users are pushing commits to the same repo on GitHub, then you need a workflow that pulls changes down from GitHub in addition to pushing them up. There are lots of options, but you need a strategy. It should involve pulling down and pushing up changes often.
Lastly, confirm the local change propagated to the GitHub remote. In a web browser, navigate to your repo on GitHub. See that the changes you made to the README are there.
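The four steps above can also be run from a terminal (macOS) or Git Bash (Windows). The sketch below is only an illustration, not part of the assignment: it builds a throwaway sandbox in a temporary folder and uses a local bare repository as a stand-in for GitHub, so the paths, the commit messages, and the stand-in remote are all assumptions.

```shell
# Sandbox setup (illustrative): a local bare repo stands in for GitHub.
sandbox="$(mktemp -d)"
git init --quiet --bare "$sandbox/remote.git"
git init --quiet "$sandbox/test-github"
cd "$sandbox/test-github"
git config user.name "Jane Doe"        # identity for this sandbox repo only
git config user.email "jdoe@fsu.edu"
git remote add origin "$sandbox/remote.git"
echo "# test-github" > README.md
git add README.md
git commit --quiet -m "Initialize the repo with a README"

# 1. Change: edit README.md and save it.
echo "This repo is just for practice." >> README.md

# 2. Review: list the changed files, then look at the change itself.
git status --short
git --no-pager diff

# 3. Commit: stage the file, then take the snapshot with a message.
git add README.md
git commit --quiet -m "Describe the repo in the README"

# 4. Push: pass the new commits along to the remote.
git push --quiet --set-upstream origin HEAD
```

GitHub Desktop performs the same review, commit, and push steps behind its buttons; seeing the commands once can make the buttons less mysterious.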
Make another change. Make a local change and save it, then stage it, commit it, and push it. Maybe try adding a file.
Delete. When you are done, delete your local directory and the GitHub repo. You can do this from GitHub Desktop by clicking Repository > Remove. Check the box Also move to trash and click Remove. This stops GitHub Desktop from monitoring the local directory and moves the local directory to the trash.
If you used a community computer, log out of GitHub Desktop when you are done. Be sure to both sign out of your GitHub.com account and remove your Name and Email from the Git tab. This keeps others from making changes as you (bad) and pushing to GitHub as you (worse).
(Only do this on your personal computer!) Follow these instructions.
For macOS, I recommend MacTeX. Download and follow the instructions. It includes the following:
For Windows, I recommend MiKTeX, TeXstudio, and JabRef. Download and follow the instructions. Hopefully, the installer asks you if you would like MiKTeX to install missing packages on-the-fly. Choose Yes rather than Ask Me. (See here for the details). In this setup, you have the following:
Open RStudio. Click File > New File > Text File. Save as `latex-test.tex`. Create a new folder on the Desktop and save it there. Copy-and-paste the LaTeX code below into `latex-test.tex`. Click Compile PDF. Check that this creates the pdf you expect. Delete the pdf file.
% a minimal latex document
\documentclass{article}
\begin{document}
First document. If this compiles into a pdf, then LaTeX seems to work.
\end{document}
Open `latex-test.tex` with your preferred editor. Right-click on `latex-test.tex`, select Open with, and choose your editor: TeXShop (macOS) or TeXstudio (Windows). In TeXShop, click the Typeset button in the upper-left corner to compile into a pdf. In TeXstudio, click the green play button. Check that this creates the pdf you expect. Delete the pdf file.
At times, you might find it more convenient to develop `.tex` files from a dedicated editor like TeXShop for its features. Other times, you might choose to develop `.tex` files in RStudio to use a single interface.
Start by fixing one bad default. Open RStudio. Click Tools > Global Options…. Select the R Markdown tab on the left. Near the middle, change Evaluate chunks in directory: to Project. (This makes R Markdown document code chunks behave the same way as R code run in the terminal.)
Now try out R Markdown. Open RStudio. Click File > New File > R Markdown. Install any suggested packages. Fill in the Title and Author boxes however you like. Make sure the HTML bubble is selected (the default). This initiates an R Markdown document that serves as a helpful template.
Notice that the R Markdown file contains a mix of R code and Markdown syntax. Don’t sweat the details.
Save the file as `rmarkdown-test.Rmd` to the Desktop.
Click the Knit button to knit the `.Rmd` into an `.html` file, which you can view with a web browser.
Check that you can also knit to a pdf (and maybe Word) document by clicking the tiny triangle next to the Knit button and then clicking Knit to PDF or Knit to Word. (If you don’t have Word, this won’t work. Don’t worry about it. You should be able to knit to pdf, because you have LaTeX working.)
Sometimes, on Windows, R Markdown and MiKTeX will not work together properly. If R Markdown fails to generate the pdf, then let MiKTeX install missing LaTeX packages automatically. This might fix the problem.
When you are sure that R Markdown works for you, delete `rmarkdown-test.Rmd` from the Desktop and all the associated files that knitr generated.
The steps below make up your graded homework for the week and allow you to see how we use R, LaTeX, GNU Make, and git to form a coherent, reliable, reproducible workflow.
Open RStudio. Click File > New Project. Then click New Directory and New Project. Check the boxes for Create git repository and Open in new session. Name the directory `hw01`. Create the project as a subdirectory of wherever you like on your computer (perhaps as a subdirectory of `pos5737/homework/`, for example).
You might find it helpful to review this summary of R projects to understand their role.
Open `hw01` in GitHub Desktop by clicking File > Add local repository and selecting `hw01`. You’ve just initialized everything, so type “Initialize project” in the summary. Stage both `.gitignore` and `hw01.Rproj` by checking the boxes next to those files. Commit these changes by clicking the blue Commit to master button near the bottom. That’s your first commit. Yay!
Click the Publish to GitHub button in the upper-right corner. In the Name box, type `hw01-jane-doe`, replacing `jane` with your first name and `doe` with your last name. Check the box to Keep this code private. For organization, choose pos5737. (Note that if I haven’t added you to the pos5737 organization on GitHub, you won’t have this option yet. If that’s so, then please wait until I add you–perhaps send me a reminder on Slack.) Click the blue Publish Repository button.
You’ve just created a repo on GitHub and pushed your local files there. You should be able to find your two files on GitHub now. Yay!
In RStudio, with the homework project open, click File > New File > R Script. Copy-and-paste the script below into the file and click the save button. Save the file as `create-plot.R` in `hw01`. File names must be exactly right. Run the code (in pieces if you want to see how it works).
# create-plot.R: minimal R code to make a plot
# load packages
library(tidyverse)
# create variables x and y
x <- c(1, 2, 4, 6, 3)
y <- c(6, 2, 3, 1, 4)
# plot x and y
qplot(x, y)
# save plot in png format
ggsave("plot.png", height = 3, width = 4)
Remember that our workflow is change-review-commit-push. You just made a nice change, so open GitHub Desktop back up and review-commit-push. Include a nice commit message.
In RStudio, with the homework project open, click File > New File > Text File. Copy-and-paste the LaTeX document below into the file and click the save button. Save the file as `doc.tex` to `hw01`.
% doc.tex: a code to include a plot
\documentclass{article}
\usepackage{graphicx} % useful for including graphics
\begin{document}
You should see the figure below.
% add the plot
\includegraphics[width = 4in]{plot.png}
\end{document}
You can press the Compile PDF button if you like. If you ran the file `create-plot.R` in the previous step, it should compile into a pdf with a scatterplot.
Remember that our workflow is change-review-commit-push. You just made a nice change, so open GitHub Desktop back up and review-commit-push. Include a nice commit message.
Do something interesting with the R script. Use Google to find something cool. Find an interesting R package and use a simple example from the help file. It doesn’t need to be fancy, just try something. Feel free to borrow heavily from someone else’s work you find online, just explain where you borrowed from and explain what’s happening using comments in the R code and prose in the LaTeX document (as best you understand it).
I should be clear that I assume you know nothing about R code at this point. You should just search for some examples on Google, try it, and hope it works. You might not understand what’s happening. I just want you to experiment a bit.
Hint: Look through some of the geoms on the ggplot2 webpage (e.g., `geom_point()`). Try to run some of the example code there.
Or try this site.
If you break something beyond repair, you can (permanently) go back to your most recent commit. Close any open files in the project, open GitHub Desktop, right-click a file you’ve changed, and click Discard All Changes… (If it all goes to heck, then hop on Slack and I can help you out.)
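From the command line, the same “go back to the most recent commit” step can be sketched as follows. This is an illustration in a throwaway sandbox, not the assignment’s own method: the folder name and the file contents are made up, and `git checkout -- <file>` is used here because it works in older and newer versions of git alike.

```shell
# Sandbox setup (illustrative): a small repo with one committed file.
sandbox="$(mktemp -d)"
git init --quiet "$sandbox/hw01-sandbox"
cd "$sandbox/hw01-sandbox"
git config user.name "Jane Doe"        # identity for this sandbox repo only
git config user.email "jdoe@fsu.edu"
echo 'qplot(x, y)' > create-plot.R
git add create-plot.R
git commit --quiet -m "Add plotting script"

# Break something: overwrite the script with a bad edit.
echo 'this edit broke the script' > create-plot.R
git status --short                     # lists create-plot.R as modified

# Discard the uncommitted edit (permanently) and restore the committed version.
git checkout -- create-plot.R
cat create-plot.R                      # prints the committed line again
```

As with Discard All Changes…, the discarded edit cannot be recovered, so only do this when you are sure the uncommitted work is not worth keeping.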
Once you’ve done something interesting, clean and rebuild a final time to make sure it works properly. That’s the change. Now just review-commit-push.
Find your repo on GitHub and check that your files appear as you expect. Click the commits tab. Browse the previous versions of the project.
Make sure you read Broman and Woo (2018) before starting this portion. You must follow the advice of Broman and Woo (2018). Make sure you create both the data set and the data dictionary (perhaps as separate sheets in the same file).
Create a simple data set using a spreadsheet program (like Microsoft Excel or Google Sheets). You may name the file what you like, but short, descriptive names work best.
Meet as many of the following criteria as you can.
`country_name` (character), `number_of_electoral_districts` (numeric), `legislature_size` (numeric), `elected_executive` (Yes/No), and `date_regime_established` (date). If I were doing this exercise, I would use Wikipedia as the source. You could also do US states with `state_name` (character), `percent_uninsured` (numeric), `expanded_medicaid` (Yes/No), and `date_expanded_medicaid` (date).
For this exercise, I want you to learn (by doing) what a well-formatted data set looks like. Keep that goal in mind.
If you use Excel, then save the `.xlsx` file to `hw01`. Remember to change-review-commit-push as you work. Unfortunately, GitHub doesn’t render `.xlsx` files well, so it’s not easy to explore the history, but at least you have it.
If you use Google Sheets, then click the Share button on the top-right and then click Get shareable link. Copy that link to the Google Sheet and put it somewhere easily findable in your project (`README.md` is the best choice, IMO). You can’t use git and GitHub for version control with Google Sheets, but you can click File > Version history > Name current version to take snapshots of your sheet at particular points. You can go back to any named version later.
When you’ve completed the data set and data dictionary, export a `.csv` file of the data set and data dictionary and save them to `hw01`. Again, name the files well. This is an important change, so definitely review-commit-push at this point.
Find a raw data set that is common in your field.
The data sets above are only examples. However, it’s helpful to choose a data set that (1) is common in the field, (2) you will use in the future, and (3) other people in the class are using. I’ll ask you to apply some of the tools we learn later this semester to this data set.
You might find it helpful to review this chapter from the notes before/as you work.
`hw01/data/raw/`). Add a thorough note to the `README.md` describing exactly how and when you downloaded the data set(s).
You may adjust the default arguments in the function that reads the data set, but do not alter the data set (yet) after loading it into R. Do not manually “fix” the raw data. It might read into R as total garbage–that’s okay. If you want, inspect the resulting data frames using `summary()`, `dplyr::glimpse()`, and/or `skimr::skim()`.
Save this script (use good filenames!) and review-commit-push.
You have three things due by Sunday night.
I’ve restructured the course a bit, so feel free to share ways to improve the submission process.