Open Data Science Siberia (ODS): Meet-Up for Machine Learning and Data Analysis Specialists

NSU hosted the ODS meetup “Natural Language Understanding (NLU) in Russian: ELMo vs. BERT”. The lecturer was Ivan Bondarenko, Assistant in the Computing Systems Section at the Department of Mechanics and Mathematics and teacher of the “Neural Networks for Natural Languages Processing” course in the English-language Big Data Analytics and Artificial Intelligence MA Program. He shared his experience of applying the latest developments in computational linguistics with the audience.

During the lecture, Bondarenko explained:

Natural language processing has reached a new level. New models take into account word meanings, context, and homonymy. This greatly simplifies the task of compiling a dictionary, which is especially important for languages with a large number of word forms (cases, diminutives, etc.), including the inflected Slavic languages. The “transfer learning” approach is considered a revolution in computational linguistics.

One application of machine learning to natural language processing is the design of chatbots that automate technical support for users of complex technical equipment. The algorithm must correctly understand the user and return a response that matches the request. To achieve this, a model has to be trained on a large volume of texts annotated by domain specialists. If the subject area is highly specialized (for example, medicine or the oil and gas industry), preparing these texts requires qualified experts and is costly and time-consuming.

The lecturer devoted much of his talk to transfer learning, a relatively new machine learning approach that emerged with the development of deep neural networks. The idea is that a neural network is first trained on one task for which a very large training set exists, and is then reused to solve a related task in the same domain for which only a very small training set is available. This saves specialists' time and requires significantly less data to train the model. The approach has proven itself in image analysis and computer vision, and in 2017–2018 it came into wider use in computational linguistics.

A video of the talk is available on the NSU Department of Mechanics and Mathematics Stream Data Analytics and Machine Learning channel.