Innovative Technologies for Creating Multilingual Audio Content in the Publishing Industry
This work is licensed under a Creative Commons Attribution 4.0 International License.
References
Abualigah, L., Bashabsheh, M. Q., Alabool, H., & Shehab, M. (2020). Text Summarization: A Brief Review. In Studies in Computational Intelligence (pp. 1–15). Springer International Publishing.
Akhtar, Z. (2023). Deepfakes generation and detection: A short survey. Journal of Imaging, 9(1), 18. https://doi.org/10.3390/jimaging9010018
AL-Bakhrani, A. A., Amran, G. A., Al-Hejri, A. M., Chavan, S. R., Manza, R., & Nimbhore, S. (2023). Development of multilingual speech recognition and translation technologies for communication and interaction. In Advances in Intelligent Systems Research (pp. 711–723). Atlantis Press International BV. https://doi.org/10.2991/978-94-6463-196-8_54
Almutairi, Z., & Elgibreen, H. (2022). A review of modern audio Deepfake detection methods: Challenges and future directions. Algorithms, 15(5), 155. https://doi.org/10.3390/a15050155
Aparna, M., Srivatsa, S., Sai Madhavan, G., Dinesh, T. B., & Srinivasa, S. (2024). AI-Based Assistance for Management of Oral Community Knowledge in Low-Resource and Colloquial Kannada Language. In Big Data Analytics in Astronomy, Science, and Engineering (pp. 3–16). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-58502-9_1
Baevski, A., Schneider, S., & Auli, M. (2019). Vq-wav2vec: Self-supervised learning of discrete speech representations. In arXiv [cs.CL]. https://doi.org/10.48550/ARXIV.1910.05453
Bahja, M. (2020). Natural Language Processing Applications in Business. In E-Business. IntechOpen. https://doi.org/10.5772/intechopen.92203
Ballesteros, D. M., Rodriguez, Y., & Renza, D. (2020). A dataset of histograms of original and fake voice recordings (H-Voice). Data in Brief, 29, 105331. https://doi.org/10.1016/j.dib.2020.105331
Ballesteros, D. M., Rodriguez-Ortega, Y., Renza, D., & Arce, G. (2021). Deep4SNet: deep learning for fake speech classification. Expert Systems with Applications, 184, 115465. https://doi.org/10.1016/j.eswa.2021.115465
Beseghi, M. (2023). Subtitling for the deaf and hard of hearing, audio description and audio subtitling in multilingual TV shows. Languages, 8(2), 109. https://doi.org/10.3390/languages8020109
Bigioi, D., & Corcoran, P. (2023). Multilingual video dubbing—a technology review and current challenges. Frontiers in Signal Processing, 3. https://doi.org/10.3389/frsip.2023.1230755
Borsos, Z., Marinier, R., Vincent, D., Kharitonov, E., Pietquin, O., Sharifi, M., Roblek, D., Teboul, O., Grangier, D., Tagliasacchi, M., & Zeghidour, N. (2023). AudioLM: A language modeling approach to audio generation. ACM Transactions on Audio, Speech, and Language Processing, 31, 2523–2533. https://doi.org/10.1109/taslp.2023.3288409
Bugliarello, E., & Okazaki, N. (2020). Enhancing machine translation with dependency-aware self-attention. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.147
Deepak, G., Surya, D., Trivedi, I., Kumar, A., Lingampalli, A., & Vijayan, S. (2022). An artificially intelligent approach for automatic speech processing based on triune ontology and adaptive tribonacci deep neural networks. Computers & Electrical Engineering: An International Journal, 98, 107736. https://doi.org/10.1016/j.compeleceng.2022.107736
Deshmukh, S., Elizalde, B., Singh, R., & Wang, H. (2023). Pengi: An Audio Language Model for audio tasks. In arXiv [eess.AS]. https://proceedings.neurips.cc/paper_files/paper/2023/file/3a2e5889b4bbef997ddb13b55d5acf77-Paper-Conference.pdf
Dixit, A., Kaur, N., & Kingra, S. (2023). Review of audio deepfake detection techniques: Issues and prospects. Expert Systems, 40(8). https://doi.org/10.1111/exsy.13322
Dobre, R. A., Preda, R. O., Badea, R. A., Stanciu, M., & Brumaru, A. (2020). Blockchain-Based Image Copyright Protection System using JPEG Resistant Digital Signature. In 2020 IEEE 26th International Symposium for Design and Technology in Electronic Packaging (SIITME). IEEE. https://doi.org/10.1109/siitme50350.2020.9292296
Elislah, N., & Irwansyah, I. (2022). Audiobook industry: Reading by using ear in the digital age. Jurnal Komunikasi Indonesia, 11(2), Article 2. https://doi.org/10.7454/jkmi.v11i2.1028
Giovannotti, P. (2023). Evaluating machine translation quality with conformal predictive distributions. In arXiv [cs.CL]. https://doi.org/10.48550/ARXIV.2306.01549
Have, I., & Pedersen, B. S. (2021). Reading Audiobooks. In Beyond Media Borders, Volume 1 (pp. 197–216). Springer International Publishing. https://doi.org/10.1007/978-3-030-49679-1_6
Huang, W.-C., Hayashi, T., Watanabe, S., & Toda, T. (2020). The sequence-to-sequence baseline for the voice conversion challenge 2020: Cascading ASR and TTS. In arXiv [eess.AS]. https://doi.org/10.48550/ARXIV.2010.02434
Iturregui-Gallardo, G. (2020). Rendering multilingualism through audio subtitles: shaping a categorisation for aural strategies. International Journal of Multilingualism, 17(4), 485–498. https://doi.org/10.1080/14790718.2018.1523173
Jafari, Z. (2023). The Role of AI in Supporting Indigenous Languages. AI and Tech in Behavioral and Social Sciences, 1(2), 4–11.
Jani, M. M., Panchal, S. R., Patel, H. H., & Raiyani, A. (2024). Multilingual speech recognition: An in-depth review of applications, challenges, and future directions. In Communication and Intelligent Systems (pp. 1–13). Springer Nature Singapore.
Karanasios, S., Nardi, B., Spinuzzi, C., & Malaurent, J. (2021). Moving forward with activity theory in a digital world. Mind Culture and Activity, 28(3), 234–253. https://doi.org/10.1080/10749039.2021.1914662
Kotsakis, R., Matsiola, M., Kalliris, G., & Dimoulas, C. (2020). Investigation of spoken-language detection and classification in broadcasted audio content. Information (Basel), 11(4), 211. https://doi.org/10.3390/info11040211
Kritikos, Y., Giariskanis, F., Protopapadaki, E., Papanastasiou, A., Papadopoulou, E., & Mania, K. (2023). Audio augmented reality outdoors. Proceedings of the 2023 ACM International Conference on Interactive Media Experiences, 199–204. https://doi.org/10.1145/3573381.3597028
Kumar, Y., Koul, A., & Singh, C. (2022). A deep learning approaches in text-to-speech system: a systematic review and recent research perspective. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-022-13943-4
Lakhotia, K., Kharitonov, E., Hsu, W.-N., Adi, Y., Polyak, A., Bolte, B., Nguyen, T.-A., Copet, J., Baevski, A., Mohamed, A., & Dupoux, E. (2021). On generative spoken language modeling from raw audio. Transactions of the Association for Computational Linguistics, 9, 1336–1354. https://aclanthology.org/2021.tacl-1.79.pdf
Latif, S., Shoukat, M., Shamshad, F., Usama, M., Ren, Y., Cuayáhuitl, H., Wang, W., Zhang, X., Togneri, R., Cambria, E., & Schuller, B. W. (2023). Sparks of Large Audio Models: A survey and outlook. In arXiv [cs.SD]. https://doi.org/10.48550/ARXIV.2308.12792
Lee, S.-M. (2023). The effectiveness of machine translation in foreign language education: a systematic review and meta-analysis. Computer Assisted Language Learning, 36(1–2), 103–125. https://doi.org/10.1080/09588221.2021.1901745
Liu, X., Zhu, Z., Liu, H., Yuan, Y., Cui, M., Huang, Q., Liang, J., Cao, Y., Kong, Q., Plumbley, M. D., & Wang, W. (2023). WavJourney: Compositional audio creation with Large Language Models. In arXiv [cs.SD]. http://arxiv.org/abs/2307.14335
Liu, Y., Zhang, J., Xiong, H., Zhou, L., He, Z., Wu, H., Wang, H., & Zong, C. (2020). Synchronous speech recognition and speech-to-text translation with interactive decoding. Proceedings of the ... AAAI Conference on Artificial Intelligence, 34(05), 8417–8424. https://doi.org/10.1609/aaai.v34i05.6360
Llanes-Ortiz, G. (2023). Digital initiatives for indigenous languages. UNESCO Publishing. https://unesdoc.unesco.org/ark:/48223/pf0000387186
Lopez-de-Ipina, K., Barroso, N., Calvo, P. M., Hernandez, C., Ezeiza, A., Susperregi, U., & Fernández, E. (2020). Multilingual audio information management system based on semantic knowledge in complex environments. Neural Computing & Applications, 32(24), 17869–17886. https://doi.org/10.1007/s00521-019-04618-7
Mahum, R., Irtaza, A., & Javed, A. (2023). Text to speech synthesis using deep learning. In Intelligent Multimedia Signal Processing for Smart Ecosystems (pp. 289–305). Springer International Publishing. https://doi.org/10.1007/978-3-031-34873-0_12
Mao, L., Zhang, X., Ma, J., & Jia, Y. (2023). A comparative study on the audio-visual evaluation of the grand Song of the Dong soundscape. Heritage Science, 11(1). https://doi.org/10.1186/s40494-023-00876-w
Masood, M., Nawaz, M., Malik, K. M., Javed, A., Irtaza, A., & Malik, H. (2023). Deepfakes generation and detection: state-of-the-art, open challenges, countermeasures, and way forward. Applied Intelligence, 53(4), 3974–4026. https://doi.org/10.1007/s10489-022-03766-z
Morita, T., & Koda, H. (2020). Exploring TTS without T using biologically/psychologically motivated neural network modules (ZeroSpeech 2020). In Proceedings of Interspeech 2020 (pp. 4856–4860). https://doi.org/10.21437/Interspeech.2020-3127
Ni, J., Wang, L., Gao, H., Qian, K., Zhang, Y., Chang, S., & Hasegawa-Johnson, M. (2022). Unsupervised text-to-speech synthesis by unsupervised automatic speech recognition. https://doi.org/10.13140/RG.2.2.19818.18884
Pandita, K., Thakur, P. K. S., & Annamalai, S. (2023). Contextual transcription and Summarization of audio using AI. Proceedings of the 5th International Conference on Information Management & Machine Intelligence. https://doi.org/10.1145/3647444.3647871
Patkar, U. C., Patil, S. H., & Peddi, P. (2020). Machine Translation of English to Ahirani Language: A Review.
Pluszyńska, A. (2020). Copyright management by contemporary art exhibition institutions in Poland: Case study of the Zachęta National Gallery of Art. Sustainability. https://doi.org/10.3390/su12114498
Polyak, A., Wolf, L., Adi, Y., Kabeli, O., & Taigman, Y. (2021). High fidelity speech regeneration with application to speech enhancement. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/ICASSP39728.2021.9414853
Raut, N. B., Pranesh, A. S., Nagulan, B., Pranesh, S., & Vasantharajan, R. (2023). An extensive survey on audio-to-text and text summarization for video content. In 2023 3rd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA) (pp. 1251–1257). IEEE. https://doi.org/10.1109/ICIMIA60377.2023.10426376
Rusmanayanti, A. (2021). The Use of Audiobooks as Part of Digital Literacies in Indonesian Students’ Perception. Paper presented at the 2nd International Conference on Education, Language, Literature, and Arts (ICELLA 2021). https://www.atlantis-press.com/article/125961964.pdf
Saini, M., Arora, V., Singh, M., Singh, J., & Adebayo, S. O. (2023). Artificial intelligence inspired multilanguage framework for note-taking and qualitative content-based analysis of lectures. Education and Information Technologies. https://link.springer.com/article/10.1007/s10639-022-11229-8
Smith, M. K. (2016). Issues in cultural tourism studies (3rd ed.). Routledge.
Son, J.-B., Ružić, N. K., & Philpott, A. (2023). Artificial intelligence technologies and applications for language learning and teaching. Journal of China Computer-Assisted Language Learning. https://doi.org/10.1515/jccall-2023-0015
Song, H.-K., Woo, S. H., Lee, J., Yang, S., Cho, H., Lee, Y., Choi, D., & Kim, K.-W. (2022). Talking face generation with multilingual TTS. In arXiv [cs.CV]. https://doi.org/10.48550/ARXIV.2205.06421
Spiteri Miggiani, G. (2021). English-language dubbing: challenges and quality standards of an emerging localisation trend. https://www.um.edu.mt/library/oar/handle/123456789/97095
Stadlmann, C., & Zehetner, A. (2021). Human Intelligence Versus Artificial Intelligence: A Comparison of Traditional and AI-Based Methods for Prospect Generation. In Marketing and Smart Technologies: Proceedings of ICMarkTech 2020 (pp. 11–22). Springer. https://doi.org/10.1007/978-981-33-4183-8_2
Stahlberg, F. (2020). Neural Machine Translation: A Review. The Journal of Artificial Intelligence Research, 69, 343–418. https://doi.org/10.1613/jair.1.12007
Tan, X. (2023). Neural text-to-speech synthesis. Springer.
Tan, X., Qin, T., Soong, F., & Liu, T.-Y. (2021). A Survey on Neural Speech Synthesis. In arXiv [eess.AS]. https://doi.org/10.48550/ARXIV.2106.15561
Tan, Z., Wang, S., Yang, Z., Chen, G., Huang, X., Sun, M., & Liu, Y. (2020). Neural machine translation: A review of methods, resources, and tools. AI Open, 1, 5–21. https://doi.org/10.1016/j.aiopen.2020.11.001
Valizada, A., Jafarova, S., Sultanov, E., & Rustamov, S. (2021). Development and Evaluation of Speech Synthesis System Based on Deep Learning Models. Symmetry, 13(5), 819. https://doi.org/10.3390/sym13050819
Vayadande, K., Nemade, M., Parbhanikar, S., Rathod, S., Raut, A., & Thorat, R. (2023). Efficient Content Exploration on YouTube: Automatic Speech Recognition-Based Video Summarization. In 2023 7th International Conference on Electronics, Communication and Aerospace Technology (ICECA). IEEE. https://doi.org/10.1109/iceca58529.2023.10395257
Yang, D., Tian, J., Tan, X., Huang, R., Liu, S., Chang, X., Shi, J., Zhao, S., Bian, J., Zhao, Z., Wu, X., & Meng, H. (2023). UniAudio: An audio foundation model toward universal audio generation. In arXiv [cs.SD]. http://arxiv.org/abs/2310.00704