In this post I discuss work from my dissertation on using machine learning and the text of parliamentary questions to measure the constituency service of politicians. The code for this analysis can be found here.
In many countries, constituency service (politicians’ efforts to advance the interests of their constituency or individual constituents) is a key activity for politicians. Research suggests that constituency service helps politicians cultivate a personal vote, which can help them retain office at elections. Despite this, most research on how politicians represent constituents focuses on their voting records in legislatures, which provide limited information on constituency service. This is especially true for countries like Ireland and the United Kingdom, where political parties are disciplined and voting against the party can result in expulsion from it.
An alternative approach is to use parliamentary questions to study the constituency service behavior of politicians. Parliamentary questions are a feature of most national legislatures and enable politicians to question government ministers on a wide variety of issues.1 This makes questions a useful resource for understanding the constituency activities of politicians. To illustrate the content of such questions, I provide two examples below, both asked by TDs in Dáil Éireann in 2016.2
asked the Tánaiste and Minister for Social Protection Information the position regarding an application for a hearing aid by a person (details supplied) in County Kerry; and if she will make a statement on the matter (asked by Deputy Michael Healy-Rae)
asked the Tánaiste and Minister for Justice and Equality Information his views on the implications for border security in the event of Britain and Northern Ireland voting to exit the European Union; and if she will make a statement on the matter (asked by Deputy Micheál Martin)
The first question, submitted by Michael Healy-Rae from the Kerry constituency, is a good example of a constituency service question. Deputy Healy-Rae is inquiring about a Kerry constituent’s hearing aid application. Meanwhile, the second example question, submitted by Micheál Martin from the Cork South–Central constituency, pertains to the Brexit referendum and does not mention his constituency or an individual constituent.
To measure the constituency service activities of legislators using questions, my advisor and I first built an original data set consisting of all questions submitted between 1973 and 2007.3 I plot the total number of questions submitted by TDs in each Dáil Éireann in this period below.
Given the large number of questions submitted, it is not feasible to manually classify every question as constituency service or not. Therefore, I hand coded a random sample of texts, following rules established in other studies.4 A question is coded as a constituency service query if it mentions the constituency of the questioner, a location in that constituency, or an individual who we can reasonably assume is a constituent. I then employed a machine learning model to classify the remaining questions.
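To make the coding rule concrete, here is a minimal sketch of how its three conditions could be written down as a function. This is only an illustration of the rule, not the procedure used in the project, where the sample was coded by hand. The place-name lookup `CONSTITUENCY_PLACES`, the function name, and the use of the phrase "person (details supplied)" as a stand-in for "an individual who is plausibly a constituent" are all assumptions made for the example.

```python
import re

# Hypothetical lookup: constituency -> single-word place names inside it.
# A real gazetteer would be far larger and would handle multi-word names.
CONSTITUENCY_PLACES = {
    "Kerry": {"kerry", "tralee", "killarney"},
    "Cork South-Central": {"cork", "douglas"},
}

# Assumed proxy for an individual case, taken from the example question above.
INDIVIDUAL_CASE = re.compile(r"\bperson \(details supplied\)", re.IGNORECASE)


def is_constituency_service(question: str, constituency: str) -> bool:
    """Return True if the question mentions the questioner's constituency,
    a place inside it, or an individual case."""
    tokens = set(re.findall(r"[a-záéíóú]+", question.lower()))
    places = CONSTITUENCY_PLACES.get(constituency, set())
    mentions_place = bool(tokens & places)
    mentions_individual = bool(INDIVIDUAL_CASE.search(question))
    return mentions_place or mentions_individual


hearing_aid = ("the position regarding an application for a hearing aid by a "
               "person (details supplied) in County Kerry")
brexit = ("his views on the implications for border security in the event of "
          "Britain and Northern Ireland voting to exit the European Union")

print(is_constituency_service(hearing_aid, "Kerry"))
print(is_constituency_service(brexit, "Cork South-Central"))
```

A keyword rule like this misses many cases a human coder would catch, which is one reason to hand code a sample and train a classifier on it instead.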
I employed three machine learning algorithms commonly used for text classification: logistic regression, support vector machines, and the naive Bayes model.5 I used cross validation to tune the hyperparameters of each model and compared their performance on the held-out test set. I present these results in the table below.
|Lasso Logistic Regression||0.91||0.91||0.91||0.91|
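For readers unfamiliar with how held-out performance is scored, the following is a minimal sketch of precision, recall, and the F-1 measure for a binary classifier. The post names only the F-1 measure; precision and recall appear here because F-1 is their harmonic mean. The labels are toy values, not the project's data, and the function name is my own.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy test-set labels: 1 = constituency service, 0 = not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Comparing models on a score like this, computed on questions that played no part in training or tuning, is what allows the classifiers to be ranked fairly.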
As the table shows, each model performs well, and none is drastically better or worse than the others. Because the F-1 measure is highest for the logistic regression classifier, I used this model to classify the remaining unlabeled texts. One criticism often leveled at machine learning models is that these algorithms are black boxes and we don’t know what drives their performance. To provide some clarity about how the logistic regression model classifies questions, I present the terms with the largest coefficients below.
These coefficients can be understood as representing how each occurrence of a given term in a question influences the likelihood of the question being classified as constituency service. Unsurprisingly, names of counties and cities and the term ‘person’, generally used when a legislator is discussing the case of a specific individual, are strong predictors of constituency service questions. Conversely, the top predictors of non-constituency service questions include the words ‘ireland’ and ‘irish’, which suggests these questions focus on national issues.
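The additive reading of the coefficients can be made concrete with a small sketch. It assumes the features are raw term counts, which the post does not state, so that each occurrence of a term adds its coefficient to the log-odds of the constituency service class. The coefficient values and the intercept below are invented for illustration and are not the fitted values from the model.

```python
import math

# Hypothetical coefficients and intercept; signs follow the pattern described
# above, but the magnitudes are made up.
COEFS = {"person": 1.8, "kerry": 1.5, "ireland": -1.2, "irish": -0.9}
INTERCEPT = -0.5


def predict_proba(tokens, coefs=COEFS, intercept=INTERCEPT):
    """Probability of 'constituency service' under a bag-of-words logistic
    regression: sigmoid(intercept + sum of coefficients of observed terms)."""
    score = intercept + sum(coefs.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-score))


local_question = ["hearing", "aid", "person", "kerry"]
national_question = ["border", "security", "ireland", "irish"]

print(round(predict_proba(local_question), 3))
print(round(predict_proba(national_question), 3))
```

Terms missing from the coefficient table contribute nothing. That mirrors a lasso penalty, which shrinks many coefficients to exactly zero.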
Once questions have been classified as constituency service or not, we can then create an aggregate measure for each legislator and explore the relationship between constituency service focus and other political factors, providing new evidence on how elected politicians represent constituents.
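As one possible version of such an aggregate measure, the sketch below computes, for each legislator, the share of their questions that were classified as constituency service. The post does not specify the exact measure used, so the choice of a simple proportion, the function name, and the toy records are all assumptions.

```python
from collections import defaultdict


def constituency_share(classified):
    """Map each legislator to the share of their questions labelled
    constituency service. `classified` holds (legislator, is_service) pairs."""
    totals = defaultdict(int)
    service = defaultdict(int)
    for legislator, is_service in classified:
        totals[legislator] += 1
        service[legislator] += int(is_service)
    return {name: service[name] / totals[name] for name in totals}


# Made-up classifier output for two placeholder legislators.
classified = [
    ("TD A", True), ("TD A", True), ("TD A", False),
    ("TD B", False), ("TD B", True),
]

shares = constituency_share(classified)
print(shares)
```

A proportion rather than a raw count keeps legislators who ask many questions comparable with those who ask few.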
The approach presented here can also be used to produce new insights into the representational style of politicians in other countries. More broadly, this project highlights how text classification models can generate accurate new information from a large collection of documents at relatively low cost in a variety of domains.
Notably, parliamentary questions are not a feature of the United States’ Congress. ↩
TDs are legislators who serve in Dáil Éireann, the lower house of the Irish parliament. ↩
I utilized a stratified random sampling scheme where each Dáil Éireann served as a stratum. ↩