Notes on Texts and Features Used in Computational Stylistics Gender Studies

Shlomo Argamon
Jonathan Fine
Moshe Koppel

This page describes the features used in our corpus-based study of gender and writing, as discussed in our paper:

Shlomo Argamon, Moshe Koppel, Jonathan Fine and Anat R. Shimoni.
Gender, Genre, and Writing Style in Formal Written Texts.

A list of the texts used in the study may be found here.

A list of the texts used in our other study, reported on in:
Moshe Koppel, Shlomo Argamon, and Anat R. Shimoni.
Automatically Categorizing Written Texts by Author Gender
may be found here.

The lexical features used in both studies were function words together with other words that are not associated with a specific topic. Many of them are words commonly treated as "stopwords" in information retrieval. The full list of 467 such features may be found here.
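As a rough illustration of how a word list like this can be turned into a feature vector, the sketch below computes the relative frequency of each feature word in a text. It is only a minimal sketch under stated assumptions: the short FEATURE_WORDS list is a hypothetical stand-in for the 467-word list, the regular-expression tokenizer is a simplification, and normalizing by text length is an assumption rather than a description of the procedure used in the papers.

```python
import re
from collections import Counter

# Hypothetical stand-in for the 467-word feature list (not the real list).
FEATURE_WORDS = ["the", "of", "and", "a", "with", "she", "for", "not"]

def lexical_features(text, feature_words=FEATURE_WORDS):
    """Return the relative frequency of each feature word in `text`."""
    # Simplified tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # An empty text has no tokens, so every frequency is zero.
        return [0.0] * len(feature_words)
    # Assumed normalization: occurrences divided by total token count.
    return [counts[w] / total for w in feature_words]

sample = "The cat sat with the dog, and she did not move."
vector = lexical_features(sample)
```

Each position in the resulting vector corresponds to one word in the list, so texts of different lengths can be compared on the same scale.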

The POS n-grams used in both studies are listed here, and a description of the POS tags may be found here.
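For readers unfamiliar with POS n-grams, the sketch below shows one way to count them from a text that has already been tagged. It is a minimal sketch, not the extraction code used in the studies: the tagged sentence and the tag names (DET, NOUN, VERB) are hypothetical and do not correspond to the tag set described on the linked page, and the function name pos_ngrams is introduced here for illustration only.

```python
from collections import Counter

def pos_ngrams(tags, n):
    """Count every contiguous sequence of n tags in `tags`."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

# Hypothetical tagged sentence; the tag names are illustrative only.
tagged = [
    ("the", "DET"),
    ("cat", "NOUN"),
    ("saw", "VERB"),
    ("the", "DET"),
    ("dog", "NOUN"),
]

# Discard the words and keep only the sequence of tags.
tags = [tag for _, tag in tagged]

bigrams = pos_ngrams(tags, 2)   # counts of tag pairs
trigrams = pos_ngrams(tags, 3)  # counts of tag triples
```

Because only the tag sequence is counted, features of this kind reflect syntactic patterns rather than the particular words or topic of a text.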

If you have any questions, do not hesitate to contact us.