okCupid scraper

okCupid profiles contain self-descriptions, photos and long questionnaires, which are really useful for anyone interested in psychometrics. This project shows how to download the data of thousands of users.

Requisites

Log in with your okCupid account and export your cookies with the Get cookies.txt browser extension. Place the resulting okcupid.com_cookies.txt file in the scraper's root folder. Replace the chromedriver file with the one that matches your OS and Chrome version.

Then install the required Python packages:

  • python -m pip install -r requirements.txt
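Before running the scrapers, it can be worth checking that the exported cookie file is readable. The sketch below is not part of the repository: it uses only Python's standard library, the function name load_okcupid_cookies is hypothetical, and it assumes the file is in the Netscape cookies.txt format with the usual header line.

```python
import http.cookiejar


def load_okcupid_cookies(path="okcupid.com_cookies.txt"):
    """Load a Netscape-format cookies.txt file and return its okcupid.com cookies.

    Hypothetical helper for sanity-checking the export; the scraper itself
    may read the file differently.
    """
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session cookies and expired ones too, so the check reports
    # everything that is actually in the file.
    jar.load(ignore_discard=True, ignore_expires=True)
    return [cookie for cookie in jar if "okcupid.com" in cookie.domain]
```

Calling load_okcupid_cookies() from the scraper's root folder returns a list of cookie objects. An empty list, or a load error, suggests the export should be repeated while logged in.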

Usage

This scraper has two scripts. The first one downloads the profile data (except the questions) of all the users it can find by swiping in the okCupid web app. The second one goes through the scraped users and downloads their answered questions.

Find users and download their data

By running this script and changing your own profile details, such as gender, sexual orientation and location, you can scrape pretty much all okCupid users in a given location.

You can run it like this; the users' data will be downloaded into the users folder:

  • python users_by_discover.py

You can also try the users_by_question.py script, which searches for users who answered specific questions. questions.csv contains pretty much all okCupid questions, so I just ended up searching for every possible question. In practice, users_by_discover.py was more effective at downloading large numbers of users.

Download users' questions

You can run it like this; the users' answers will be downloaded into the answers folder:

  • python users_by_question.py
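Because the answers are downloaded for users that were already scraped, it can help to see which users still have no answers. The following is only a sketch under a loud assumption: that each user is stored as one file named after the user id in both folders (for example users/<id>.html and answers/<id>.json). The README does not state the file naming, and the function users_without_answers is hypothetical.

```python
from pathlib import Path


def users_without_answers(users_dir="users", answers_dir="answers"):
    """Return the ids of scraped users that have no answers file yet.

    Assumption (not confirmed by the project): every file in both folders
    is named after the user id, so file stems can be compared directly.
    """
    scraped = {p.stem for p in Path(users_dir).iterdir() if p.is_file()}
    answers_path = Path(answers_dir)
    if answers_path.is_dir():
        done = {p.stem for p in answers_path.iterdir() if p.is_file()}
    else:
        # No answers folder yet means nothing has been downloaded.
        done = set()
    return sorted(scraped - done)
```

If the real layout differs, only the two set comprehensions need to change.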

Parsing data

The testing.ipynb notebook contains some examples of how to process the data. The users' data is downloaded as HTML, so I use BeautifulSoup to parse it and extract the relevant information. The users' answers are in JSON format, so they are easier to process.
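As a rough illustration of the JSON side, the sketch below loads every answers file into a dictionary. It is not taken from testing.ipynb: the function name load_answers is hypothetical, it assumes the files in the answers folder carry a .json extension, and it makes no assumption about the structure inside each file.

```python
import json
from pathlib import Path


def load_answers(answers_dir="answers"):
    """Map each file stem in answers_dir to its parsed JSON content.

    Assumption: answers are stored as individual *.json files; the schema
    of each file is left untouched.
    """
    answers = {}
    for path in sorted(Path(answers_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            answers[path.stem] = json.load(f)
    return answers
```

From there, the parsed objects can be inspected in a notebook to see which fields are present before building any analysis on top of them.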

How to cite this?

This source code was developed by Mathias Gatti (@mathigatti). If you publish something that uses it, remember to mention this project. For scientific publications, you can cite it like this in APA style:

Gatti, M. (2022). mathigatti/okCupidScraper: v1.0.0 (Version v1.0.0) [Computer software]. https://doi.org/10.5281/zenodo.5889263

Applications

So far I have only used it to scrape self-descriptions and train an AI to generate new ones. You can check more about it here.

Related datasets