Dynamic Speech Models: Theory, Algorithms, and Applications 2022

Author	Li Deng

Publisher: Springer Nature

Category: Electronics, Signals and Signal Processing, Acoustical Engineering, Electrical and Telecommunications Engineering, Engineering and Technology

Publication year	2022
Language	English
Pages	105
File type	pdf
Size	4.0MB

Table of contents:

1. Cover

2. Copyright Page

3. Title Page

4. Table of Contents

5. Acknowledgments

6. Introduction

7. A General Modeling and Computational Framework

8. Modeling: From Acoustic Dynamics to Hidden Dynamics

9. Models with Discrete-Valued Hidden Speech Dynamics

10. Models with Continuous-Valued Hidden Speech Trajectories

11. Bibliography

12. About the Author

Description

Speech dynamics refer to the temporal characteristics in all stages of the human speech communication process. This speech "chain" starts with the formation of a linguistic message in a speaker's brain and ends with the arrival of the message in a listener's brain. Given the intricacy of the dynamic speech process and its fundamental importance in human communication, this monograph is intended to provide comprehensive material on mathematical models of speech dynamics and to address the following issues: How do we make sense of the complex speech process in terms of its functional role in speech communication? How do we quantify the special role of speech timing? How do the dynamics relate to the variability of speech that has often been said to seriously hamper automatic speech recognition? How do we put the dynamic process of speech into a quantitative form to enable detailed analyses? And finally, how can we incorporate the knowledge of speech dynamics into computerized speech analysis and recognition algorithms? The answers to all these questions require building and applying computational models for the dynamic speech process.

What are the compelling reasons for carrying out dynamic speech modeling? We provide the answer in two related aspects. First, scientific inquiry into the human speech code has been relentlessly pursued for several decades. As an essential carrier of human intelligence and knowledge, speech is the most natural form of human communication. Embedded in the speech code are linguistic (as well as para-linguistic) messages, which are conveyed through four levels of the speech chain. Underlying the robust encoding and transmission of the linguistic messages are the speech dynamics at all four levels. Mathematical modeling of speech dynamics provides an effective tool in the scientific methods of studying the speech chain. Such scientific studies help us understand why humans speak as they do and how humans exploit redundancy and variability by way of multitiered dynamic processes to enhance the efficiency and effectiveness of human speech communication. Second, advancement of human language technology, especially in automatic recognition of natural-style human speech, is also expected to benefit from comprehensive computational modeling of speech dynamics. The limitations of current speech recognition technology are serious and well known. A commonly acknowledged and frequently discussed weakness of the statistical model underlying current speech recognition technology is the lack of adequate dynamic modeling schemes to provide correlation structure across the temporal speech observation sequence. Unfortunately, for a variety of reasons, the majority of current research activities in this area favor only incremental modifications and improvements to the existing HMM-based state of the art. For example, while dynamic and correlation modeling is known to be an important topic, most systems nevertheless employ only an ultra-weak form of speech dynamics, e.g., differential or delta parameters. Strong-form dynamic speech modeling, which is the focus of this monograph, may serve as an ultimate solution to this problem.

After the introduction chapter, the main body of this monograph consists of four chapters. They cover various aspects of theory, algorithms, and applications of dynamic speech models, and provide a comprehensive survey of the research work in this area spanning the past 20 years. This monograph is intended as advanced material in speech and signal processing for graduate-level teaching, for professionals and engineering practitioners, as well as for seasoned researchers and engineers specialized in speech processing.