Aimesoft to present the research “Automatic Transcript Generation from Presentation Slides” at PACLIC 2023

Date: Dec 12, 2023
The 37th Pacific Asia Conference on Language, Information and Computation (PACLIC 2023) will be held at The Hong Kong Polytechnic University in Hong Kong from December 2 to 5, 2023. It will be a hybrid event, held in person with accommodations for virtual presentations. PACLIC 37 emphasizes the synergy between theoretical frameworks and natural language processing, providing a forum for researchers from different fields to share and discuss progress in the scientific study, development, and application of topics related to language.

Dr. Pham Quang Nhat Minh, Head of the Aimesoft Research Department for Multimodal Artificial Intelligence, reported on behalf of the research team on the study "Automatic Transcript Generation from Presentation Slides", which addresses the automatic generation of speaker notes from slide content, and on the demonstration of the AimeTalk Virtual Presenter software developed by Aimesoft.

AimeTalk is a multimodal, AI-based virtual slide presentation software, or "virtual presenter", that automates slide presentation tasks. It can deliver slide presentations on your behalf at conferences, meetings, or seminars, which can save labor costs and reduce errors in slide shows. You can also create any avatar you like to serve as your virtual face in online meetings. AimeTalk brings Artificial Intelligence (AI) and Robotic Process Automation (RPA) technologies to slide presentations. It creates a face animation from a single avatar face photo, and the resulting virtual avatar reads the speaker notes aloud, transitions through the slides, and plays videos, automating the slide presentation process. Speaker notes are read out by Text-to-Speech technology, face animation is synthesized by advanced image processing and computer vision technologies, and slide pages are turned automatically by RPA technology.
However, before they can present slides with AimeTalk, users must manually enter the text for each slide page to create the speaker notes. This can be difficult and time-consuming, especially for lengthy presentations that can contain up to a dozen pages of duplicate content. Aimesoft's research team has proposed a method for automatically generating these speaker notes from the content of the slides based on Large Language Models (LLMs), saving time on this manual work and increasing the level of automation in AimeTalk. In the field of Natural Language Processing, the team's research is the first on the topic of generating speaker notes from slide content, and it has opened up the possibility of fully automating slide presentations. The language model generates full sentences based on the context of the current slide page, even when the page contains only brief content or titles/headings.
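As a rough illustration of the idea, the sketch below turns the text of each slide page into a prompt and passes it to a language model to obtain speaker notes. It is not Aimesoft's implementation: the `Slide` structure, the prompt wording, and the `llm` callable (any function that maps a prompt string to generated text) are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Slide:
    """Text extracted from one slide page (hypothetical structure)."""
    title: str
    bullets: List[str] = field(default_factory=list)


def build_prompt(slide: Slide) -> str:
    """Assemble a prompt from the slide's title and, if present, its bullets."""
    lines = [
        "Write speaker notes in full sentences for the slide below.",
        f"Slide title: {slide.title}",
    ]
    if slide.bullets:
        lines.append("Slide content:")
        lines.extend(f"- {bullet}" for bullet in slide.bullets)
    else:
        # A page with only a title still gets a prompt, so notes can be generated.
        lines.append("Slide content: (title only)")
    return "\n".join(lines)


def generate_notes(slides: List[Slide], llm: Callable[[str], str]) -> List[str]:
    """Return one speaker-note string per slide, using the supplied model."""
    return [llm(build_prompt(slide)).strip() for slide in slides]
```

Here `llm` stands in for whichever model is used; in a test it can be replaced by a simple stub function, which keeps the prompt-building logic independent of any particular model or API.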