How Couples Meet and Stay Together

Abstract: 

How Couples Meet and Stay Together (HCMST) is a study of how Americans meet their spouses and romantic partners.

  • The study is a nationally representative study of American adults.
  • 4,002 adults responded to the survey, of whom 3,009 had a spouse or main romantic partner.
  • The study oversamples self-identified gay, lesbian, and bisexual adults.
  • Follow-up surveys were implemented one and two years after the main survey, to study couple dissolution rates. Version 3.0 of the dataset includes two follow-up surveys, waves 2 and 3.
  • Wave 4 is provided as a separate data file.

The study will provide answers to the following research questions:

  1. Do traditional couples and nontraditional couples meet in the same way? What kinds of couples are more likely to have met online?
  2. Have the most recent marriage cohorts (especially the traditional heterosexual same-race married couples) met in the same way their parents and grandparents did?
  3. How do the couple dissolution rates of nontraditional couples compare to the couple dissolution rates of more traditional same-race heterosexual couples?
  4. How does the availability of civil union, domestic partnership or same-sex marriage rights affect couple stability for same-sex couples? This study will provide the first nationally representative data on the couple dissolution rates of same-sex couples.
Principal Investigator: 
Rosenfeld, Michael J.
Funding Agency: 
Core funding from the US National Science Foundation, award SES-0751977
Supplementary funding from Stanford's Institute for Research in the Social Sciences
Supplementary funding from the UPS endowment at Stanford University
Waves 4 and 5 will be funded by the US National Science Foundation, award SES-1153867
How to Cite this Dataset: 

Rosenfeld, Michael J., Reuben J. Thomas, and Maja Falcon. 2011 and 2014. How Couples Meet and Stay Together, Waves 1, 2, and 3: Public version 3.04, plus wave 4 supplement version 1.02 [Computer files]. Stanford, CA: Stanford University Libraries.

Contact Email: 
Introduction: 

The HCMST data are freely available to users who register with SSDS/ Stanford Libraries.

* Note [12/06/2011] data.stanford.edu has a new web server, so if you find that your old login and password do not work, please clear your browser cache and cookies and try again. Thanks, and sorry for any difficulties.

Acknowledgements: 

I acknowledge core funding support from the U.S. National Science Foundation, and supplementary funding from Stanford's Institute for Research in the Social Sciences, and the UPS endowment at Stanford University. For research assistance, I thank Reuben Jasper Thomas, Elizabeth McClintock, Esra Burak, Kate Weisshaar, and Maja Falcon. For Web design and assistance, I am grateful to Ron Nakao and the Stanford Library. The following consultants contributed to the development of the survey instrument and the research design: Gary Gates, Jon Krosnick, Brian Powell, Daniel Lichter, Matthijs Kalmijn, Timothy Biblarz, and the staff of Knowledge Networks/GfK.

Methodology/Sampling
Universe: 
The universe for the HCMST survey is English-literate adults in the U.S.
Unit of Analysis: 
Individual
Type of data collection: 
Survey Data
Time of data collection: 
Wave 1, the main survey, was fielded between February 21 and April 2, 2009. Wave 2 was fielded March 12, 2010 to June 8, 2010. Wave 3 was fielded March 22, 2011 to August 29, 2011. Wave 4 was fielded between March and November of 2013. Dates for the background demographic surveys are described in the User's Guide, under documentation below.
Geographic coverage: 
United States of America
Smallest geographic unit: 
US region
Sample description: 

The survey was carried out by survey firm Knowledge Networks (now called GfK). The survey respondents were recruited from an ongoing panel. Panelists are recruited via random digit dial phone survey. Survey questions were mostly answered online; some follow-up surveys were conducted by phone. Panelists who did not have internet access at home were given an internet access device (WebTV). For further information about how the Knowledge Networks hybrid phone-internet survey compares to other survey methodology, see attached documentation.

The dataset contains variables derived from several sources: variables from the Main Survey Instrument, variables created by the investigators after the Main Survey, and demographic background variables from Knowledge Networks which pre-date the Main Survey. Dates for the main survey and for the prior background surveys are included in the dataset for each respondent. The source for each variable is identified in the codebook, and in notes appended within the dataset itself (notes may only be available for the Stata version of the dataset).

Respondents who had no spouse or main romantic partner were dropped from the Main Survey. Unpartnered respondents remain in the dataset, and demographic background variables are available for them.

Sample response rate: 
Response to the main survey in 2009 from subjects, all of whom were already in the Knowledge Networks panel, was 71%. If we include the prior initial Random Digit Dialing phone contact and agreement to join the Knowledge Networks panel (participation rate 32.6%), and the respondents' completion of the initial demographic survey (56.8% completion), the composite overall response rate is a much lower 0.326 * 0.568 * 0.71 ≈ 13%. For further information on the calculation of response rates, and relevant citations, see the Note on Response Rates in the documentation. Response rates for the subsequent waves of the HCMST survey are simpler, using the denominator of people who completed wave 1 and who were eligible for follow-up. Response rate to wave 2 was 84.5%. Response rate to wave 3 was 72.9%. Response rate to wave 4 was 60.0%.
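
The composite response rate for the main survey is the product of the three stage-level rates quoted above. A minimal sketch of that arithmetic follows; the variable names are illustrative and are not part of the dataset or its documentation.

```python
# Stage-level rates as quoted in the response-rate description.
# Variable names are illustrative only.
rdd_participation = 0.326     # agreed to join the panel after RDD phone contact
profile_completion = 0.568    # completed the initial demographic survey
main_survey_response = 0.71   # panelists who responded to the 2009 main survey

# The composite rate multiplies the rates of each successive stage.
composite = rdd_participation * profile_completion * main_survey_response
print(f"Composite response rate: {composite:.1%}")
```

The follow-up waves do not need this product, because their rates use the wave 1 completers who were eligible for follow-up as the denominator.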
Weights: 

See "Notes on the Weights" in the Documentation section.

Documentation
Data Download Link(s)
Click on a data file link to begin download of the compressed (zip) file of the original formatted data file (e.g., Stata:dta, SPSS portable:por, SAS export:xpt).
Data file link(s) will appear after you log in: 

Create an account (link under "User Login" box above) to download data on the web site. Log in to view the data file download link(s).

Errata: 

Note for SPSS and SAS users: we have replaced the portable versions of the SPSS and SAS files with the .sav and .sas7bdat versions, respectively, to accommodate the long variable names in the dataset. The Stanford research team does all their work in Stata, so if you find discrepancies between the SAS or SPSS versions of the dataset and the documentation, please let us know. Thus far we have found that SPSS truncates value labels to 32 characters.
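
Users comparing the SPSS file against the codebook may want to see which value labels would be affected by the 32-character truncation. The following is a rough sketch under stated assumptions: the function name, the dictionary layout, and the example labels are all hypothetical, and nothing here is copied from the HCMST codebook.

```python
SPSS_LABEL_LIMIT = 32  # truncation length reported in the errata note

def truncated_labels(value_labels, limit=SPSS_LABEL_LIMIT):
    """Return {code: (full_label, truncated_label)} for labels longer than limit."""
    return {
        code: (label, label[:limit])
        for code, label in value_labels.items()
        if len(label) > limit
    }

# Hypothetical value labels for illustration only; not taken from the codebook.
example_labels = {
    1: "yes",
    2: "no",
    3: "a hypothetical answer category with a long label",
}

affected = truncated_labels(example_labels)
for code, (full, short) in affected.items():
    print(code, repr(short), "<-", repr(full))
```

Only labels longer than the limit are reported, so short labels such as "yes" and "no" are skipped.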

Data Notes: 

Current Data Version 3.04 plus wave 4 supplement, version 1.02

Schedule of Future Additions to the HCMST dataset

  1. We have funding from the NSF for a 5th wave of HCMST to be fielded in the fall of 2014

Forthcoming and restricted HCMST data

Restricted data:

  • Disclosure of redacted full-text answers to q24 ("how couples meet") and q35 ("explain relationship quality"). Because of demand from users of HCMST, we have redacted the q24 and q35 text answers, and obtained IRB approval to share the redacted answers on a restricted basis. As of February 2013, ICPSR is making the edited versions of full-text q24 and q35 available to researchers who get their own IRB approval to host the data. Contact ICPSR for access.
  • We are planning a wave 5 HCMST survey in late 2014.
  • We are planning (at a future date) to redact the text variables from wave 4 and append them to the restricted data hosted by ICPSR.
  • Geographic codes for ZIP code, as well as a variety of state-based variables, which have been suppressed from the public dataset in order to preserve respondent confidentiality, are available from ICPSR for users who obtain IRB approval.

Frequently Asked Questions:

Q) There are a variety of different kinds of questions in the dataset about sexual identity, whether the respondent is part of a same-sex couple, and what gender of person the respondent is sexually attracted to. The answers to these questions sometimes seem to provide contradictory information. Why?
A) There is some inherent ambiguity in the realm of sexual identity and in identifying same-sex couples. There is also the possibility that a small number of respondents did not understand the questions. PI Rosenfeld created several new variables for the dataset: same_sex_couple, potential_partner_gender_recodes, and alt_partner_gender. These new variables represent the researcher's best guess as to the gender of the partner, and as to whether the couple is a same-sex couple. In creating these variables PI Rosenfeld relied mostly on the variables in the public data, and to a small extent on the text answers that are not part of the public data.

Q) What is the variable that identifies the partnered respondents?
A) qflag

Q) Why do the variables for children in the household (such as ppt01, ppt25, etc.) not yield exactly the same information about household members as the household roster variables (such as pphhcomp11_member2_relationship)?
A) There are several reasons for the discrepancies. First, the ppt01 and ppt25 variables were derived from answers provided by the head of household, while the household roster variables were derived from the survey respondent. Not all survey respondents are the head of household as far as Knowledge Networks is concerned (see variable pphhhead). Second, the household survey that was the source of the ppt01 and ppt** variables may not have taken place at the same time as the Core Adult Profile, which was the source of the household roster variables such as pphhcomp11_member2_relationship. Lastly, the ppt01 and ppt** variables are incremented over time, as the children in the household are presumed to age over time. So the ppt** variables are accurate for the time of wave 1 of the HCMST survey, whereas the household roster variables are accurate reports from the time of the Core Adult Profile, which took place earlier.

Changes, additions, and improvements to the dataset

Changes for version 2.0

  • Version 2.0 of the dataset includes new variables from wave 2 of the survey, the one-year follow-up, along with the previously available variables from wave 1, the main survey. See the new User's Guide under documentation for more information about variable layout.
  • Version 2.0 also includes a second round of background demographic data for most respondents in the dataset, see User's Guide for variable layout.
  • The Stanford research team has added a new variable, how_met_online, which categorizes the prior social connections (if any) between respondent and partner for respondents who met their partners online, based largely on an exhaustive re-analysis of the respondents' open-text answers to q24 (the open-text answers are not yet available in the public dataset for respondent confidentiality reasons). See also the new variable either_internet_adjusted.
  • Version 2.0 includes a new couple weight, weight_couples_coresident, see the updated documentation on weights for more details.
  • For version 2.0 the Stanford research team has added two new date variables in YYYYMMDD format, HCM_main_interview_fin_date and w2_HCM_interview_fin_date. These variables are easier to read but less useful for analysis than the other date-time format variables already in the dataset.
  • The variable partner_deceased has been updated to reflect the discovery of a few more cases of respondents whose partner was already deceased at the time of the main survey.
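
The YYYYMMDD date variables described above (HCM_main_interview_fin_date and w2_HCM_interview_fin_date) can still be used for simple date arithmetic once converted. The sketch below assumes the values arrive as integers, floats, or strings of the form YYYYMMDD; the storage type in any given file format is an assumption, the helper name is hypothetical, and the sample values are made up rather than taken from the dataset.

```python
from datetime import datetime

def parse_yyyymmdd(value):
    """Convert a YYYYMMDD value (int, float, or str) to a datetime.date."""
    # Numeric values are cast to int first so that 20090301.0 becomes "20090301".
    text = value.strip() if isinstance(value, str) else str(int(value))
    return datetime.strptime(text, "%Y%m%d").date()

# Hypothetical values for one respondent; not real HCMST data.
main_fin = parse_yyyymmdd(20090301)    # e.g. HCM_main_interview_fin_date
w2_fin = parse_yyyymmdd("20100401")    # e.g. w2_HCM_interview_fin_date

# Elapsed days between the two interviews.
days_between = (w2_fin - main_fin).days
print(main_fin.isoformat(), w2_fin.isoformat(), days_between)
```

For analysis at scale, the other date-time format variables already in the dataset are likely the better starting point, as the note above suggests.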

Changes from version 2.0 to version 3.02

  • Version 3 includes wave 3 of the survey (the second follow-up survey), with variables generally starting with w3_*, along with the third round of core adult profile data, with variables generally starting with pp3_*.
  • New variables describing the particular family members who played the intermediary role in respondent meeting partner, variables coded q24_fam*.
  • The documentation for earlier versions offered a not-quite-correct explanation for the variable ppnet. ppnet actually codes whether the subject had their own internet access at home at the time of the profile survey. This can change with each wave of the KN profile survey, so each profile survey carries a new version of ppnet (see pp2_ppnet and pp3_ppnet).
  • The Stanford research team decided to include newer versions of profile data for subjects' race and education with each new wave of profile data; see for instance pp2_ppeduc, pp3_ppeduc.
  • Variable q18a_3 label and description were clarified.
  • Variable w2_xss label and description were clarified.
  • Labels for how_long_ago_first_met, how_long_ago_first_cohab, etc. were clarified to make clear that the unit is years.
  • Recoded w2_broke_up and w3_brokeup_actual to distinguish between break-up and partner deceased.
  • Added variables for interstate moves by subjects between pp1, pp2, and pp3; see for instance interstate_mover_pp1_pp2.
  • Many clarifications to documentation and variable and value labels.
News Coverage: 

* USA Today, Feb 11, 2010 story by Sharon Jayson on friends, the Web, and How Couples Meet

* Stanford Report, Feb 11, 2010 a feature story on How Couples Meet, with video

* San Jose Mercury News, Feb 14, 2010 Growing Number of Singles Find Their Valentines Online (link currently unavailable).

* NPR, "Computers are Becoming Cupid's Best Weapon," story by Jennifer Ludden August 16, 2010

* Reuters Newswire, "Being Online can Boost Your Chances of Being In Love", August 16, 2010

* Radio Nacional de Colombia story, August 16, 2010

* The Economist story "Love at First Byte", December 29, 2010.

* The Discovery Channel story "Does Online Dating Work?", February 11, 2011

* The ABC News version of the Discovery Channel story, from February 12, 2011

* A New York Times article on online dating, "Love, Lies and What They Learned", from November 12 (online) and November 13 (print), 2011.