Ingesting Knowledge from Diverse Sources to Open Domain Social Conversations

Dilek Hakkani-Tür

Abstract: Following recent advances in language modeling and the availability of large natural language datasets, conversational AI research has flourished over the last decade. This progress has also highlighted the importance of reasoning over a diverse set of external knowledge and task completion resources in order to form relevant, informative, and accurate responses, to discuss options with users when the available solutions or information are insufficient, and to make proactive suggestions.

To ingest knowledge into conversations, recent work has mainly grounded conversational responses on knowledge snippets from Wikipedia and web documents, with the goals of preventing hallucination and providing users with diverse and accurate responses. However, much of the world's knowledge is dynamic and spread across diverse resources. Some of these are already structured, such as knowledge graphs, but the majority are not, for example news articles and books. Some also include subjective information, such as customer reviews. In this talk, I will discuss our recent work on integrating knowledge from such a diverse set of resources into conversational responses, the associated challenges, and the progress we have made so far.

Bio: Dilek Hakkani-Tür is a senior principal scientist at Amazon Alexa AI focusing on enabling natural dialogues with machines. Prior to joining Amazon, she was a researcher at Google Research, Microsoft Research, the International Computer Science Institute at the University of California, Berkeley, and AT&T Labs - Research. She received her BSc degree from Middle East Technical University, and her MSc and PhD degrees from Bilkent University, Department of Computer Engineering. Her research interests include conversational AI, natural language and speech processing, spoken dialogue systems, and machine learning for language processing. She holds over 80 granted patents and has co-authored more than 300 papers in natural language and speech processing. She has received several best paper awards for publications she co-authored on conversational systems, from the IEEE Signal Processing Society, ISCA, EURASIP and others. She served as an associate editor for IEEE Transactions on Audio, Speech and Language Processing (2005-2008), a member of the IEEE Speech and Language Technical Committee (2009-2014), an area editor for speech and language processing for Elsevier's Digital Signal Processing Journal and IEEE Signal Processing Letters (2011-2013), the Editor-in-Chief of the IEEE/ACM Transactions on Audio, Speech and Language Processing (2018-2021), and an IEEE Distinguished Industry Speaker (2021). She also served on the ISCA Advisory Council (2015-2019) and the IEEE Signal Processing Society Fellows Committee (2019-2022). She was elected a Fellow of the IEEE (2014) and ISCA (2014).

Insights on the relationship between usage frequency, user proficiency, and interaction quality for a virtual assistant

Jason Williams

Abstract: For a virtual assistant, it seems clear that users who have a higher-quality experience would tend to use the assistant more. But causality is less obvious — for example, does higher usage frequency result from higher-quality interactions, or is higher usage frequency a reflection of higher user proficiency? How does user proficiency change over time? In this talk I'll cover a quantitative investigation into the relationships between usage frequency, user proficiency, and interaction quality for a real-world virtual assistant. The insights from this study may help inform reward or loss functions for virtual assistants optimized with reinforcement or semi-supervised learning. This is joint work with colleagues Zidi Xiu, Kai-Chen Cheng, David Q. Sun, Jiannan Lu, Hadas Kotek, Paul McCarthy, Yuhan Zhang, Christopher Klein, and Stephen Pulman.

Bio: Jason D. Williams leads a team that builds language understanding for Siri at Apple, where he has been since 2018. Prior to Apple, he was a Research Manager at Microsoft Research, leading research groups on conversational systems and reinforcement learning. Jason has published over 60 peer-reviewed papers on dialog systems and related areas, with over 8,000 citations and five best paper/presentation awards. Jason initiated the Dialog State Tracking Challenge series in 2012; shipped components of the first release of Microsoft Cortana in 2014; and launched Microsoft's Language Understanding Service in 2015. Jason has previously served as an elected member of the IEEE Speech and Language Technical Committee (SLTC) in the area of spoken dialogue systems for 3 terms, President of SIGDIAL, senior area chair at ACL and EMNLP, and general chair and technical chair of IEEE ASRU.


Responsible & Empathetic Human Robot Interactions

Pascale Fung

Abstract: Conversational AI (ConvAI) systems have applications ranging from personal assistance and health assistance to customer service. They have been in place since the first call centre agent went live in the late 1990s. More recently, smart speakers and smartphones have been powered by conversational AI with an architecture similar to that of the systems from the 1990s. On the other hand, research on ConvAI systems has advanced by leaps and bounds in recent years with sequence-to-sequence, generation-based models. Thanks to the advent of large-scale pre-trained language models, state-of-the-art ConvAI systems can generate surprisingly human-like responses to user queries in open-domain conversations, known as chit-chat. However, these generation-based ConvAI systems are difficult to control and can produce inappropriate, biased and sometimes even toxic responses. In addition, unlike previous modular conversational AI systems, it is also challenging to incorporate external knowledge into these models for task-oriented dialog scenarios such as personal assistance and customer service, and to maintain consistency. In this talk, I will introduce state-of-the-art generation-based conversational AI approaches, and will point out remaining challenges of conversational AI and possible directions for future research, including how to mitigate inappropriate responses. I will also present some ethical guidelines that conversational AI systems can follow.

Bio: Pascale Fung is a Professor at the Department of Electronic & Computer Engineering and Department of Computer Science & Engineering at The Hong Kong University of Science & Technology (HKUST). Prof. Fung received her PhD in Computer Science from Columbia University in 1997. She worked and studied at AT&T Bell Labs (1993~1997), BBN Systems & Technologies (1992), LIMSI, CNRS, France (1991), Department of Information Science, Kyoto University, Japan (1989~1991), and at Ecole Centrale Paris, France (1988). She is an elected Fellow of the Association for Computational Linguistics (ACL) for her "significant contributions towards statistical NLP, comparable corpora, and building intelligent systems that can understand and empathize with humans". She is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) for her "contributions to human-machine interactions" and an elected Fellow of the International Speech Communication Association for "fundamental contributions to the interdisciplinary area of spoken language human-machine interactions". She served as Editor and Associate Editor for Computer Speech and Language, IEEE/ACM Transactions on Audio, Speech and Language Processing, Transactions of the ACL, and IEEE Signal Processing Letters. She served as a Committee Member of the IEEE Signal Processing Society Speech and Language Technology Committee (SLTC) for six years. She is a past president and a Board Member of the ACL Special Interest Group on Linguistics Data and Corpus Based Approaches in NLP (SIGDAT).