Analyzing the 2015 California Health Interview Survey in R

A few years ago, I wrote about how to analyze the 2012 California Health Interview Survey in R. In 2012, plans for Covered California (Obamacare in California) were just beginning to take shape. Today, Covered California is a relatively mature program, and it is arguably the most successful implementation of the Affordable Care Act in the United States. This month, the UCLA Center for Health Policy Research released the 2015 California Health Interview Survey (CHIS for short). With this fantastic new data set, we can measure the impact of Covered California in its second year. In this brief post, I’ll review the basics of working with CHIS data in R by way of a simple example. My hope is to inspire other R users to dive into this unique data set.

The CHIS Quickstart guide for R

Though CHIS is a complex survey, it’s simple to work with CHIS data in R. Here’s how to get started:

  • Head over to the CHIS site and create an account.
  • Once you’ve created a free account and accepted a bunch of terms of use agreements, you’ll be able to download the CHIS public use data files. You’ll want to download the Stata .dta versions, as these are the easiest to work with using R’s foreign package.
  • CHIS data is divided into three groups: child, adolescent, and adult. We’ll work with the adult data below.
  • You’ll also want to download the appropriate data dictionary for your data set. The dictionary provides excellent documentation about the hundreds of variables covered by CHIS. If it’s your first time working with CHIS, I recommend a quick skim of the entire dictionary to get a sense of the kinds of things covered by the survey.

Once you’ve downloaded the data, you can bring it into R with the foreign package:

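# Read CHIS file with the foreign package
library(foreign)  # read.dta() reads Stata version 5-12 .dta files
file <- "~/projects/CHIS/chis15_adult_stata/Data/ADULT.dta"  # your file
CHIS <- read.dta(file, convert.factors = TRUE)  # value labels become factors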

The most important thing to understand about CHIS data is how to use the weights RAKEDW0-RAKEDW80. I covered the use of replicate weights in detail in this post. The important points about the CHIS weights are:

  • Use RAKEDW0 for estimating means and counts in CHIS. RAKEDW0 is designed so that its sum across all rows in the CHIS data equals the total non-institutionalized adult population of California.
  • Use the replicate weights RAKEDW1-RAKEDW80 for estimating variances as described here.
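Here is a minimal base R sketch of how the two kinds of weight fit together. It assumes the jackknife-style variance formula commonly used with CHIS replicate weights, Var = Σ (θ_r − θ_0)² summed over r = 1 to 80, where θ_0 is the estimate computed with RAKEDW0 and θ_r is the same estimate computed with RAKEDWr; double-check the scale factor in the CHIS methodology documentation for your release. The helper `chis_total_se` is hypothetical, the lowercase column names are an assumption that matches the tabulation output below, and the data are random toy values rather than CHIS records.

```r
# Weighted total plus a replicate-weight standard error (hypothetical helper).
# Assumption: Var = sum over r of (estimate_r - estimate_0)^2, i.e. a
# jackknife-style formula with scale factor 1; confirm against CHIS docs.
chis_total_se <- function(data, indicator, n_rep = 80) {
  # Full-sample estimate uses rakedw0
  est0 <- sum(data$rakedw0 * indicator)
  # Replicate estimates use rakedw1 ... rakedw<n_rep>
  est_r <- vapply(seq_len(n_rep), function(r) {
    sum(data[[paste0("rakedw", r)]] * indicator)
  }, numeric(1))
  c(estimate = est0, se = sqrt(sum((est_r - est0)^2)))
}

# Toy data standing in for CHIS: 5 respondents, 80 replicate weights
set.seed(1)
toy <- data.frame(rakedw0 = c(100, 200, 150, 50, 300),
                  uninsured = c(TRUE, FALSE, TRUE, FALSE, FALSE))
for (r in 1:80) {
  # Randomly perturb the full-sample weight to mimic replicates (illustration only)
  toy[[paste0("rakedw", r)]] <- toy$rakedw0 * runif(5, 0.9, 1.1)
}
chis_total_se(toy, indicator = toy$uninsured)
```

With the real file, something like `chis_total_se(CHIS, CHIS$instype == "UNINSURED")` would give the weighted count of uninsured adults with its standard error, provided the column has no missing values. For anything beyond simple totals, the survey package's `svrepdesign()` is a more complete alternative.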

As an example, let's start by getting weighted counts of health insurance coverage by type. CHIS has two insurance type variables for this: INSTYPE and the new INS9TP, which gives a more detailed breakdown of insurance types.

# Tabulate weighted counts: sum of rakedw0 within each level of instype
print(data.frame(xtabs(rakedw0 ~ instype, CHIS, drop.unused.levels = TRUE)))
#              instype       Freq
# 1           UNINSURED  2910380.5
# 2 MEDICARE & MEDICAID  1561496.7
# 3   MEDICARE & OTHERS  1646743.6
# 4       MEDICARE ONLY  2129841.7
# 5            MEDICAID  6239539.9
# 6    EMPLOYMENT-BASED 12193686.7
# 8        OTHER PUBLIC   415154.8
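Because RAKEDW0 sums to the adult population, dividing each weighted count by the total turns the table into population shares. A minimal sketch with base R's `xtabs()` and `prop.table()`, run on a small made-up data frame (the column names `instype` and `rakedw0` mirror the output above; the rows are invented, not CHIS data):

```r
# Toy stand-in for the CHIS adult file (values are invented)
toy <- data.frame(
  instype = factor(c("UNINSURED", "MEDICAID", "EMPLOYMENT-BASED",
                     "EMPLOYMENT-BASED", "MEDICAID")),
  rakedw0 = c(100, 250, 300, 200, 150)
)
# Weighted counts: sum of rakedw0 within each insurance type
counts <- xtabs(rakedw0 ~ instype, toy, drop.unused.levels = TRUE)
# Weighted shares of the population in each insurance type
shares <- prop.table(counts)
print(round(100 * shares, 1))
```

On the real file the same two calls apply to `CHIS`. The shares inherit the sampling error of the counts, so their standard errors still come from RAKEDW1-RAKEDW80.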

One interesting health behavior that CHIS tracks is fast food consumption. To create the variable AC31, CHIS asked respondents about the number of times they ate fast food in the past week. This simple script explores how fast food consumption behavior interacts with health insurance coverage type:
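A minimal sketch of this kind of comparison (not necessarily the script behind the observations below) is to take the RAKEDW0-weighted mean of AC31 within each insurance type. It assumes AC31 was read as a numeric count of fast food meals in the past week; if `convert.factors` turned it into a factor, or it carries negative codes for inapplicable or missing answers, recode it first. The helper `weighted_by_group` and the lowercase column names are assumptions, and the data are invented toy values.

```r
# Weighted mean of fast food meals per week (ac31) by insurance type.
# Toy data below are invented; column names mirror the CHIS file (assumed).
toy <- data.frame(
  instype = factor(c("UNINSURED", "UNINSURED", "MEDICAID",
                     "EMPLOYMENT-BASED", "EMPLOYMENT-BASED")),
  ac31    = c(2, 4, 3, 1, 0),
  rakedw0 = c(100, 300, 200, 250, 250)
)

weighted_by_group <- function(data) {
  # Split rows by insurance type and take a rakedw0-weighted mean of ac31
  groups <- split(data, data$instype, drop = TRUE)
  vapply(groups, function(g) weighted.mean(g$ac31, g$rakedw0), numeric(1))
}

print(round(weighted_by_group(toy), 2))
```

Weighting matters here: an unweighted mean would describe the survey sample rather than the adult population of California.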

Already with this superficial analysis we can see some interesting things. First, we notice that the uninsured eat fast food more often than the insured who are not on Medicaid. The uninsured's fast food behavior looks quite similar to that of the Medicaid population, while the fast food behavior of the employment-based insured resembles that of the private purchase group. And most importantly, everyone is eating too much fast food.


I hope this simple example inspires you to investigate CHIS data on your own. I think it would be especially interesting to see some further analysis of the nearly 3 million Californians who remain uninsured despite the relative success of Covered California. Some interesting background research on this topic can be found here and here. Feel free to get in touch if you are working with CHIS data to improve public health in California.