SSNLP 2020

The 2020 Singapore Symposium on Natural Language Processing





Welcome!

The 3rd annual edition of the Singapore Symposium on Natural Language Processing (SSNLP) will take place online on December 11, 2020.


Latest news

December 11 SSNLP 2020 is now live! Join us here

November 21 SSNLP 2020 registration is now live! It's free, go register now!

November 20 Join our SSNLP 2020 Slack Workspace now!

November 1 We have confirmed three world-class academic speakers so far, with more on the way!

October 21 We just launched the website! Stay tuned for more details on registration and program schedule!

Programme

December 11, 2020 (SGT)
08:10 - 08:25 Welcome and Opening Remarks
08:25 - 09:00 Knowledge-Robust and Multimodally-Grounded NLP
speaker:   Mohit Bansal   ::   chaired by:   Soujanya Poria
09:00 - 09:35 Low resourced but long tailed spoken dialogue system building
speaker:   Eric Fosler-Lussier   ::   chaired by:   Li Haizhou
09:35 - 10:10 Advances in Question Answering Research for Personal Assistants
speaker:   Alessandro Moschitti   ::   chaired by:   Gao Wei
10:10 - 10:45 A Typology of Ethical Risks in Language Technology with an Eye Towards Where Transparent Documentation Can Help
speaker:   Emily M. Bender   ::   chaired by:   Kokil Jaidka
10:45 - 11:20 Do pretraining language models really understand language?
speaker:   Minlie Huang   ::   chaired by:   Lei Wenqiang
11:20 - 11:55 Low Resource Machine Translation
speaker:   Pushpak Bhattacharyya   ::   chaired by:   Soujanya Poria
11:55 - 15:00 Lunch break
15:00 - 16:00 Panel discussion: On Low-Resource NLP
  chaired by:   Nancy Chen
16:00 - 16:35 Understanding Product Reviews: Topic-Specific Word Embedding Learning and Question-Answering
speaker:   Yulan He   ::   chaired by:   Jing Jiang
16:35 - 16:50 Closing Remarks

Keynote Speakers

The following speakers have agreed to give keynotes at SSNLP 2020. Detailed information for each speaker is given below.

Title: Low Resource Machine Translation

Speaker: Pushpak Bhattacharyya

Abstract: AI, now and in the future, will have to grapple continuously with the problem of low resources. AI will increasingly be ML-intensive, but ML needs data, often with annotation, and annotation is costly. Over the years, through work on multiple problems, we have developed insight into how to do language processing in low-resource settings. The following six methods, individually and in combination, seem to be the way forward:

  • Artificially augmenting resources (e.g., subwords)
  • Cooperative NLP (e.g., pivoting in MT)
  • Linguistic embellishment (e.g., factor-based MT, source reordering)
  • Joint modeling (e.g., coreference and NER, sentiment and emotion: each task helping the other to either boost accuracy or reduce resource requirements)
  • Multimodality (e.g., eye-tracking-based NLP, as well as picture+text+speech-based sentiment analysis)
  • Cross-lingual embeddings (e.g., embeddings from multiple languages helping MT; closely related to cooperative NLP above)

The present talk will focus on low-resource machine translation. We describe the use of techniques from the above list and bring home the seriousness and the methodology of doing machine translation in low-resource settings.

Bio: Prof. Pushpak Bhattacharyya is a Professor in the Department of Computer Science and Engineering at IIT Bombay. His research areas are Natural Language Processing, Machine Learning and AI (NLP-ML-AI), and he has published more than 350 research papers in various areas of NLP. In his textbook 'Machine Translation', Prof. Bhattacharyya sheds light on all paradigms of machine translation with abundant examples from Indian languages. Two recent monographs co-authored by him, 'Investigations in Computational Sarcasm' and 'Cognitively Inspired Natural Language Processing: An Investigation Based on Eye Tracking', describe cutting-edge research in NLP and ML. Prof. Bhattacharyya is a Fellow of the Indian National Academy of Engineering (FNAE) and an Abdul Kalam National Fellow. For sustained contributions to technology, he has received the Manthan Award of the Ministry of IT, the P.K. Patwardhan Award of IIT Bombay, and the VNMM Award of IIT Roorkee. He is also a Distinguished Alumnus of IIT Kharagpur.

Title: Knowledge-Robust and Multimodally-Grounded NLP

Speaker: Mohit Bansal

Abstract: In this talk, I will present our group's recent work on NLP models that are knowledge-robust and multimodally-grounded. First, we will describe multi-task and reinforcement learning methods to incorporate novel auxiliary-skill tasks such as saliency, entailment, and back-translation validity (including bandit-based methods for automatic auxiliary task selection+mixing and multi-reward mixing). Next, we will discuss developing adversarial robustness against reasoning shortcuts and cross-domain/lingual generalization in QA and dialogue models (including auto-adversary generation). Lastly, we will discuss multimodal, grounded models which condition and reason on dynamic spatio-temporal information in images and videos, and action-based robotic navigation and assembling tasks (including commonsense reasoning for ambiguous robotic instructions).

Bio: Dr. Mohit Bansal is the Parker Associate Professor in the Computer Science department at UNC Chapel Hill. Prior to this, he was a research assistant professor at TTI-Chicago. He received his PhD from UC Berkeley and his BTech from IIT Kanpur. His research expertise is in statistical natural language processing and machine learning, with a particular focus on multimodal, grounded, and embodied semantics (including RoboNLP), human-like language generation and Q&A/dialogue, and interpretable and generalizable deep learning. He is a recipient of the 2020 IJCAI Early Career Spotlight, 2019 DARPA Director's Fellowship, 2019 Google Focused Research Award, 2019 Microsoft Investigator Fellowship, and 2019 NSF CAREER Award. His service includes Program Co-Chair for CoNLL 2019, Senior Area Chair for several ACL, EMNLP, AAAI conferences, and Associate Editor for CL, IEEE/ACM TASLP, and CSL journals.
Webpages: cs.unc.edu/~mbansal, murgelab.cs.unc.edu, https://nlp.cs.unc.edu/

Title: A Typology of Ethical Risks in Language Technology with an Eye Towards Where Transparent Documentation Can Help

Speaker: Emily M. Bender

Abstract: People are impacted by language technology in various ways: as direct users of the technology (by choice or otherwise); indirectly, when others use the technology; and in its creation, as annotators or contributors to training data sets (knowingly or not). In these roles, risks are borne differentially by different speaker populations, depending on how well the technology works for their language varieties and the extent to which they are subjected to marginalization. This talk explores strategies for mitigating these risks based on transparent documentation of training data.

Bio: Emily M. Bender is a Professor of Linguistics at the University of Washington, where she is the faculty director of the professional MS program in computational linguistics. Her research interests include the interaction of linguistics and NLP, the societal impact of language technology, and how transparent documentation can help mitigate the effects of bias and the potential for trained systems to perpetuate systems of oppression. She is also actively working on how best to incorporate training on ethics and societal impact into NLP curricula.

Title: Understanding Product Reviews: Topic-Specific Word Embedding Learning and Question-Answering

Speaker: Yulan He

Abstract: In this talk, I will present our recent work on analysing product reviews. I will start with a novel generative model for jointly learning topics and topic-specific word embeddings from product reviews. In word embedding learning, traditional methods learn a single vector representation for each word, while deep contextualised approaches learn a separate vector representation for each occurrence of a word. Our proposed method sits in the middle, in that it learns different vectors for a word depending on which topic the word is associated with. The proposed model can easily be integrated with pre-trained contextualised word embeddings to capture domain-specific semantics better than directly fine-tuning the pre-trained language models on the target domain, leading to improved sentiment classification performance. I will next present a cross-passage hierarchical memory network for generative question-answering on product reviews. It extends XLNet with an auxiliary memory module consisting of two components: a context memory that collects cross-passage evidence, and an answer memory that works as a buffer, continually refining the generated answers. The proposed architecture outperforms state-of-the-art baselines, producing more syntactically well-formed answers and addressing questions based on Amazon reviews with higher precision.

Bio: Yulan He is a Professor and the Director of Research in the Department of Computer Science at the University of Warwick, UK. Her research interests lie in the integration of machine learning and natural language processing for text analytics. She has published over 170 papers on topics including sentiment analysis, topic/event extraction, clinical text mining, recommender systems, and spoken dialogue systems. She was a Program Co-Chair for EMNLP 2020 and currently holds a Turing AI Fellowship. Yulan obtained her PhD degree in spoken language understanding from the University of Cambridge, and her MEng and BASc degrees in Computer Engineering from Nanyang Technological University, Singapore.

Title: Low resourced but long tailed spoken dialogue system building

Speaker: Eric Fosler-Lussier

Abstract: In this talk, I discuss lessons learned from our partnership with the Ohio State School of Medicine in developing a Virtual Patient dialog system to train medical students in taking patient histories. The OSU Virtual Patient's unusual development history as a question-answering system provides some interesting insights into co-development strategies for dialog systems. I also highlight our work in “speechifying” the patient chatbot and handling semantically subtle questions when speech data is non-existent and language exemplars for questions are few.

Bio: Eric Fosler-Lussier is a Professor of Computer Science and Engineering, with courtesy appointments in Linguistics and Biomedical Informatics, at The Ohio State University. He is also co-Program Director for the Foundations of Artificial Intelligence Community of Practice at OSU's Translational Data Analytics Institute. After receiving a B.A.S. (Computer and Cognitive Science) and B.A. (Linguistics) from the University of Pennsylvania in 1993, he received his Ph.D. in 1999 from the University of California, Berkeley. He has also been a Member of Technical Staff at Bell Labs, Lucent Technologies, and has held visiting positions at Columbia University and the University of Pennsylvania. He currently serves as the IEEE Speech and Language Technical Committee Chair and was co-General Chair of ASRU 2019 in Singapore. Eric's research has ranged over topics in speech recognition, dialog systems, and clinical natural language processing, and has been recognized with best paper awards from the IEEE Signal Processing Society and the International Medical Informatics Association.

Title: Advances in Question Answering Research for Personal Assistants

Speaker: Alessandro Moschitti

Abstract: Automated Question Answering (QA) has traditionally been an interesting topic for NLP researchers, as its solutions involve the use of several language components, e.g., syntactic parsers, coreference and entity resolution, semantic similarity modules, knowledge sources, inference, and so on. In recent years, there has been renewed interest in QA, thanks in part to the introduction of personal assistants and chatbots, for which QA can play an essential technological role. In this talk, we will describe how current NLP breakthroughs, i.e., neural architectures, pre-training, and new datasets, can be used to build QA systems of impressive accuracy in answering standard information-intent questions. In particular, we will (i) describe the components needed to design a state-of-the-art QA system, (ii) provide an interpretation of why Transformer models are so effective for QA, (iii) illustrate our transfer-and-adapt (TANDA) approach to improving Transformer models for QA, and (iv) provide effective solutions, e.g., our Cascade Transformer, to make such technology efficient.

Bio: Alessandro Moschitti is a Principal Applied Research Scientist at Amazon Alexa, leading research on retrieval-based QA systems (since 2018), and a professor in the CS Department of the University of Trento, Italy (since 2007). He obtained his Ph.D. in CS from the University of Rome in 2003. He was a Principal Scientist at the Qatar Computing Research Institute (QCRI) for 5 years (2013-2018), and worked as a research fellow at The University of Texas at Dallas for 2 years (2002-2004). He was (i) a visiting professor at Columbia University, the University of Colorado, Johns Hopkins University, and MIT (CSAIL), and (ii) a visiting researcher at the IBM Watson Research Center (participating in the Jeopardy! Challenge, 2009-2011). His expertise concerns theoretical and applied machine learning in the areas of NLP, IR, and Data Mining. He has devised innovative structural kernels and neural networks for advanced syntactic/semantic processing and inference over text, documented in about 300 scientific articles. He has received four IBM Faculty Awards, one Google Faculty Award, and five best paper awards. He has led about 25 projects, e.g., MIT CSAIL and QCRI joint projects, and European projects. He was the General Chair of EMNLP 2014 and a PC co-chair of CoNLL 2015, and has held chair roles in more than 50 conferences and workshops. He has been an action editor of TACL; currently he is an action/associate editor of ACM Computing Surveys and JAIR, and serves on the editorial boards of MLJ and JNLE.

Title: Do pretraining language models really understand language?

Speaker: Minlie Huang

Abstract: Today, pretraining language models are dominant in various natural language understanding and generation tasks. In this talk, the speaker will try to answer the question: do pretraining language models really understand language? First, the notions of meaning, understanding, and knowledge will be discussed, followed by what existing pretraining models have and have not learned. Finally, the speaker will discuss how NLU and NLG tasks can be done better with knowledge, covering solutions such as knowledge injection, domain-specific pretraining tasks, and explicit control of knowledge use.

Bio: Dr. Minlie Huang is an associate professor at Tsinghua University. His research interests include natural language processing, particularly dialog systems and language generation. He authored the Chinese book “Modern Natural Language Generation” and has published more than 80 papers in premier conferences. He won the Wuwenjun AI Award in 2019, the Alibaba Innovative Research Award in 2019, and the Hanvon Youth Innovation Award in 2018, as well as the SIGDIAL 2020 Best Paper Award, the NLPCC 2020 Best Student Paper Award, and the IJCAI-ECAI 2018 Distinguished Paper Award, and was a nominee for the ACL 2019 Best Demo Paper Award. He has served as ACL 2021 Diversity & Inclusion Co-Chair, EMNLP 2021 Workshop Co-Chair, area chair for ACL 2020/2016, EMNLP 2020/2019/2014/2011, and AACL 2020, Senior PC member for IJCAI 2017-2020 (Distinguished SPC for IJCAI 2018) and AAAI 2017-2021, associate editor for TNNLS, and action editor for TACL. His work has been supported by several NSFC projects, including one key NSFC project.
His homepage is at: http://coai.cs.tsinghua.edu.cn/hml/.

Panel Discussions

The following speakers have agreed to serve as panelists for the panel discussion at SSNLP 2020. Detailed information for each panelist is given below.

Speaker: Monojit Choudhury

Bio: Dr. Monojit Choudhury is a Principal Researcher at Microsoft Research Lab India, where he has worked since 2007. His research spans many areas of artificial intelligence, cognitive science, and linguistics. In particular, Dr. Choudhury has been working on technologies for low-resource languages, code-switching (mixing of multiple languages in a single conversation), computational sociolinguistics, and conversational AI. Dr. Choudhury is an adjunct faculty member at the International Institute of Information Technology Hyderabad and Ashoka University. He also organizes the Panini Linguistics Olympiad for high school children in India and is the founding co-chair of the Asia-Pacific Linguistics Olympiad. Dr. Choudhury holds a B.Tech and a PhD degree in Computer Science and Engineering from the Indian Institute of Technology Kharagpur.

Speaker: Lidong Bing

Bio: Lidong Bing leads the NLP team at the R&D Center Singapore, Machine Intelligence Technology, Alibaba DAMO Academy. The team works on a variety of NLP research and development projects that are tightly aligned with the globalization of Alibaba in the Southeast Asia region. Prior to joining Alibaba, he was a Senior Researcher at Tencent AI Lab. He received a PhD degree from The Chinese University of Hong Kong and was a Postdoctoral Research Fellow in the Machine Learning Department at Carnegie Mellon University. His research interests include low-resource NLP, sentiment analysis, text generation and summarization, information extraction, and knowledge bases.

Speaker: Bill Jun Lang

Bio: Dr. Jun Lang is a Senior Expert for Search R&D at Taobao, Alibaba. He obtained his Ph.D. degree from the Harbin Institute of Technology (HIT) in January 2010. From February 2010 to February 2014, he was a Research Scientist in the Human Language Technology Department (HLT) of the Institute for Infocomm Research (I2R), Singapore, working on statistical machine translation R&D. His major research interests include natural language processing, information extraction, machine translation, and machine learning. Currently, he leads the E-commerce Knowledge Graph Group of Taobao at Alibaba.com.

Speaker: Dat Quoc Nguyen

Bio: Dat Quoc Nguyen is a senior research scientist at VinAI Research, Vietnam. He is also an honorary fellow in the School of Computing and Information Systems at the University of Melbourne, Australia, where previously he was a research fellow. Before that, he received his PhD from the Department of Computing at Macquarie University, Australia. Dat Quoc Nguyen has been working on applications of machine learning to natural language processing. He has served as a PC member for top-tier NLP/AI conferences and authored over 30 highly-cited scientific papers.

Speaker: Attapol Rutherford

Bio: Attapol Te Rutherford is an Assistant Professor of Linguistics at Chulalongkorn University, Bangkok. He received his PhD in Computer Science from Brandeis University, USA, and was previously a data scientist at LinkedIn. He is interested in NLP infrastructure for the Thai language and in NLP applications in computational legal studies and education.

Organizers

PC Chairs:

Gangeshwar Krishnamurthy, Institute of High Performance Computing

Wenqiang Lei, National University of Singapore

General Chair: Jing Jiang, Singapore Management University
Co-organizers:

Min-Yen Kan, National University of Singapore

Kokil Jaidka, Nanyang Technological University

Soujanya Poria, Singapore University of Technology and Design

Ai Ti Aw, Institute for Infocomm Research

Francis Bond, Nanyang Technological University

Nancy Chen, Institute for Infocomm Research

Shafiq Joty, Nanyang Technological University

Haizhou Li, National University of Singapore

Wei Lu, Singapore University of Technology and Design

Hwee Tou Ng, National University of Singapore

Jian Su, Institute for Infocomm Research

Gao Wei, Singapore Management University

Luu Anh Tuan, Massachusetts Institute of Technology

Partners


Registration

Registration is open for SSNLP 2020. Click here

Location

SSNLP 2020 will be going virtual! Register now to receive the link to the event.

Contact us

If you have any enquiries, please drop an email to Gangeshwar Krishnamurthy or Wenqiang Lei.