Research Projects

Tracing the History of Words. A Portrait of a Discipline Through Analyses of Keyword Counts in Large Corpora of Scientific Literature.

Disciplinary area: Sociology

Funding type: Institutional

Funding body: ATENEO - Progetti di Ricerca di Ateneo (University Research Projects)

Start date: 16 February 2015

End date: 28 February 2017


The growing availability of large chronological corpora of scientific literature and recent developments in the statistical analysis of textual data open up a new strand of research in many fields.
The main aim of this research project is to explore whether the temporal evolution of concepts, methods and applications, i.e., the history of a specific discipline, can be read through the temporal evolution of relevant keywords in papers published by mainstream scientific journals. Given the frequency with which keywords appear in large amounts of scientific text over time, methods for assessing the "shape" of the history of those keywords would help to better understand how the main concepts evolved and to identify which were the main research topics of a discipline in the past and which are the main ones today.
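As a rough illustration of what a keyword's frequency over time can look like in practice, the following minimal Python sketch computes, for each year, how often a keyword occurs per 1,000 tokens. It is not the project's method: the function name `keyword_trajectories`, the `(year, text)` input format, the naive lowercase tokenisation and the restriction to single-word keywords are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict


def keyword_trajectories(docs, keywords):
    """Return {keyword: {year: occurrences per 1,000 tokens}}.

    docs     -- iterable of (year, text) pairs (hypothetical input format)
    keywords -- single-word keywords; multi-word terms are not handled here
    """
    wanted = {k.lower() for k in keywords}
    counts = defaultdict(Counter)   # keyword -> year -> raw count
    totals = Counter()              # year -> total number of tokens

    for year, text in docs:
        # Naive tokenisation: runs of ASCII letters, lowercased (an assumption).
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(tokens)
        for tok in tokens:
            if tok in wanted:
                counts[tok][year] += 1

    # Normalise by yearly corpus size so that years with more text
    # do not automatically produce higher values.
    return {
        kw: {
            year: 1000 * counts[kw][year] / totals[year]
            for year in sorted(totals)
            if totals[year] > 0
        }
        for kw in sorted(wanted)
    }


if __name__ == "__main__":
    toy_corpus = [
        (1990, "Bayesian inference and Bayesian models"),
        (2000, "Regression models"),
    ]
    for kw, series in keyword_trajectories(toy_corpus, ["bayesian", "models"]).items():
        print(kw, series)
```

Normalising by the number of tokens per year is one simple way to keep periods with more published text from dominating the picture; other normalisations (for example, per paper) are equally plausible.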
The project aims at:
1) selecting a small but significant set of early scientific journals that cover the main topics and represent the temporal evolution of the disciplines involved in the project;
2) collecting large chronological corpora of scientific literature by downloading papers published by outstanding and early scientific journals (titles, abstracts and full texts);
3) comparing and contrasting the available methods for retrieving the most relevant keywords in large corpora;
4) identifying new methods to assess the chronological development of selected sets of keywords;
5) achieving an overview of the relationship between time and keywords to check the existence of latent temporal patterns;
6) identifying keywords showing prototypical temporal patterns (e.g. neologisms, keywords that have disappeared, keywords whose frequency is growing, rarely used keywords, etc.) and clustering keywords portraying similar temporal patterns;
7) fostering a discussion on the concept of the “quality of life” of keywords within the scientific communities.
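The idea in point 6 of grouping keywords by prototypical temporal pattern can be illustrated with a deliberately crude Python sketch. It is only a heuristic written for this example, not a method proposed by the project: the function names, the pattern labels, the comparison of the first and second halves of a series, and the `ratio` threshold of 1.5 are all hypothetical choices.

```python
from collections import defaultdict


def classify_trajectory(freqs, ratio=1.5):
    """Label a chronologically ordered series of non-negative frequencies.

    Compares the mean of the first half with the mean of the second half.
    The labels and the `ratio` threshold are illustrative assumptions.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two time points")

    half = len(freqs) // 2
    early, late = freqs[:half], freqs[half:]
    early_mean = sum(early) / len(early)
    late_mean = sum(late) / len(late)

    if early_mean == 0 and late_mean == 0:
        return "absent"
    if early_mean == 0:
        return "emerging"       # e.g. a neologism
    if late_mean == 0:
        return "disappeared"
    if late_mean >= ratio * early_mean:
        return "growing"
    if late_mean * ratio <= early_mean:
        return "declining"
    return "stable"


def group_by_pattern(trajectories, ratio=1.5):
    """Group keywords by label; trajectories maps keyword -> list of frequencies."""
    groups = defaultdict(list)
    for keyword, freqs in trajectories.items():
        groups[classify_trajectory(freqs, ratio)].append(keyword)
    return dict(groups)


if __name__ == "__main__":
    toy = {
        "keyword_a": [0, 0, 1, 2],
        "keyword_b": [3, 2, 0, 0],
        "keyword_c": [2, 2, 2, 2],
    }
    print(group_by_pattern(toy))
```

A rule of this kind discards most of the shape of a trajectory. Clustering whole curves, as the project envisages, would instead compare trajectories at every time point.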
Following a preliminary study on the History of Statistics (Trevisani and Tuzzi 2014), the research project focuses on corpora of scientific literature from two disciplines in the research domain of Humanities and Social Sciences: Philosophy and Sociology. The corpora will be derived from online archives (Thomson Reuters/ISI Web of Knowledge, JSTOR, etc.), with particular reference to scientific journals that should be able to portray the main research topics.