I am currently pursuing two main research areas.
Modeling cognitive state. When we humans interact, we bring our own cognitive state to the conversation. Cognitive states (which have also been called “private states”) include beliefs, desires, intentions, plans, and emotions. Beliefs can be complex, such as beliefs about entire narratives. But in conversation, we must also model our interlocutors’ cognitive states: for example, we don’t want to tell them something they already know, and if we want to convince them of something, we need to find arguments that they will find convincing. This ability to model others’ cognitive states has been called “theory of mind”. Conversation cannot proceed if there is no common ground, a set of beliefs which all interlocutors believe are mutually believed by all interlocutors.
The research of my group focuses on how to model cognitive states, including theory of mind and common ground, in conversational agents. We draw on insights from cognitive science and linguistics, with the goal of determining how dialog agents based on large language models can be made to show human-like behavior.
Arabic and its dialects. Arabic is characterized by the co-existence of many dialects, largely distinguished on the basis of geography, but also by socio-cultural factors. These dialects are the spoken language of everyday usage. While some dialects have received much attention, such as Cairene Egyptian, most others have not. If we want to build natural language processing systems for Arabic dialects, we can treat them as distinct languages, but this approach will not account for the continuous variation we observe. Instead, we need to be able to model the language at any specific geographic/socio-cultural point.
The research group at Stony Brook currently focuses on morphophonological rules that transform underlying representations (identical or similar across dialects) into surface forms (which can differ greatly between dialects). We attempt to learn these rules from data, and consider each dialect a low-resource variant. We leverage knowledge from similar, adjacent dialects. This work currently focuses on spoken Arabic, but we are also interested in how the spoken form affects choices made by native speakers writing in their dialect (dialects do not have an orthographic standard).
More generally, I am interested in morphology, syntax (mainly), and semantics. I am interested in linguistic analyses, in formalisms that can be used to describe them, and in processing models. Much of my work (including my thesis on German syntax) is presented in the framework of Tree Adjoining Grammar (TAG), or at least is heavily influenced by TAG. TAG was developed by Aravind Joshi starting in the 1980s. One of the many interesting properties of TAG is that it bridges phrase structure and dependency representations of syntax.
I have worked on both natural language generation and natural language understanding.
I am also interested in how language is used in context, for example in email conversations or on Twitter. I have worked on how discourse participants signal beliefs and sentiments.
I have worked on many different languages, including Arabic, English, German, and Hindi.
I have linked some representative papers to keywords on this page; click here for my full list of publications.