AimeSpeech: Speech Recognition (STT), Speech Synthesis (TTS), Speaker Identification

AimeSpeech is a core speech processing framework in the Aimenicorn software ecosystem. It includes a speech recognition (speech-to-text, STT) engine, a speech synthesis (text-to-speech, TTS) engine, a speaker identification library, and other advanced speech processing libraries. AimeSpeech supports English, Japanese, and Vietnamese, and can run both on-premises and in the cloud. It is used in several multimodal AI products from Aimesoft, such as AimeHotel, AimeReception, and Aime AIShop.




  • Speech recognition (Speech-to-Text, STT)
  • Emotion analysis
  • Speech synthesis (Text-to-Speech, TTS)
  • Speaker identification
  • Sentiment analysis from speech features
  • Supports English, Japanese, and Vietnamese


Technologies Aimesoft uses to build AimeSpeech

  • Speech Recognition Technologies
    DNN (Deep Neural Networks), LSTM, GMM, HMM, acoustic modelling, language model
  • Natural Language Processing Technologies
    Tokenization, POS tagging, keyword extraction, word normalization
  • Speech Synthesis Technologies
    HMM, Festival, Kaldi, Tacotron2, DNN, vocoder
  • Speaker Identification Technologies
    GMM, VGG network


Various applications of AimeSpeech