Audio signal processing: Difference between revisions

@@ Line 8: / Line 8: @@
 The motivation for audio signal processing began at the beginning of the 20th century with inventions like the [[telephone]], [[phonograph]], and [[radio]] that allowed for the transmission and storage of audio signals. Audio processing was necessary for early [[radio broadcasting]], as there were many problems with [[studio-to-transmitter link]]s.<ref>{{cite book|last=Atti|first=Andreas Spanias, Ted Painter, Venkatraman|title=Audio signal processing and coding|year=2006|publisher=John Wiley & Sons|location=Hoboken, NJ|isbn=0-471-79147-4|pages=464|url=https://books.google.com/books?id=Z_z-OQbadPIC|edition=[Online-Ausg.]}}</ref> The theory of signal processing and its application to audio was largely developed at [[Bell Labs]] in the mid 20th century. [[Claude Shannon]] and [[Harry Nyquist]]'s early work on [[communication theory]], [[Nyquist–Shannon sampling theorem|sampling theory]] and [[pulse-code modulation]] (PCM) laid the foundations for the field. In 1957, [[Max Mathews]] became the first person to [[Synthesizer|synthesize audio]] from a [[computer]], giving birth to [[computer music]].
-Major developments in [[Digital audio|digital]] [[audio coding]] and [[audio data compression]] include [[differential pulse-code modulation]] (DPCM) by [[C. Chapin Cutler]] at Bell Labs in 1950,<ref name="DPCM">{{US patent reference|inventor=C. Chapin Cutler|title=Differential Quantization of Communication Signals|number=2605361|A-Datum=1950-06-29|issue-date=1952-07-29}}</ref> [[linear predictive coding]] (LPC) by [[Fumitada Itakura]] ([[Nagoya University]]) and Shuzo Saito ([[Nippon Telegraph and Telephone]]) in 1966,<ref>{{cite journal |last1=Gray |first1=Robert M. |title=A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol |journal=Found. Trends Signal Process. |date=2010 |volume=3 |issue=4 |pages=203–303 |doi=10.1561/2000000036 |url=https://ee.stanford.edu/~gray/lpcip.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://ee.stanford.edu/~gray/lpcip.pdf |archive-date=2022-10-09 |url-status=live |issn=1932-8346|doi-access=free }}</ref> [[adaptive DPCM]] (ADPCM) by P. Cummiskey, [[Nikil Jayant|Nikil S. Jayant]] and [[James L. Flanagan]] at Bell Labs in 1973,<ref>P. Cummiskey, Nikil S. Jayant, and J. L. Flanagan, "Adaptive quantization in differential PCM coding of speech", ''Bell Syst. Tech. J.'', vol. 52, pp. 1105—1118, Sept. 1973</ref><ref>{{cite journal |last1=Cummiskey |first1=P. |last2=Jayant |first2=Nikil S. |last3=Flanagan |first3=J. L. |title=Adaptive quantization in differential PCM coding of speech |journal=The Bell System Technical Journal |date=1973 |volume=52 |issue=7 |pages=1105–1118 |doi=10.1002/j.1538-7305.1973.tb02007.x |issn=0005-8580}}</ref> [[discrete cosine transform]] (DCT) coding by [[Nasir Ahmed (engineer)|Nasir Ahmed]], T. Natarajan and [[K. R. Rao]] in 1974,<ref name="DCT">{{cite journal |author1=Nasir Ahmed |author1-link=N. Ahmed |author2=T. Natarajan |author3=Kamisetty Ramamohan Rao |journal=IEEE Transactions on Computers|title=Discrete Cosine Transform|volume=C-23|issue=1|pages=90–93|date=January 1974 |doi=10.1109/T-C.1974.223784 |s2cid=149806273 |url=https://www.ic.tu-berlin.de/fileadmin/fg121/Source-Coding_WS12/selected-readings/Ahmed_et_al.__1974.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://www.ic.tu-berlin.de/fileadmin/fg121/Source-Coding_WS12/selected-readings/Ahmed_et_al.__1974.pdf |archive-date=2022-10-09 |url-status=live}}</ref> and [[modified discrete cosine transform]] (MDCT) coding by J. P. Princen, A. W. Johnson and A. B. Bradley at the [[University of Surrey]] in 1987.<ref>J. P. Princen, A. W. Johnson und A. B. Bradley: ''Subband/transform coding using filter bank designs based on time domain aliasing cancellation'', IEEE Proc. Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2161–2164, 1987.</ref> LPC is the basis for [[perceptual coding]] and is widely used in [[speech coding]],<ref name="Schroeder2014">{{cite book |last1=Schroeder |first1=Manfred R. |title=Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder |date=2014 |publisher=Springer |isbn=9783319056609 |chapter=Bell Laboratories |page=388 |chapter-url=https://books.google.com/books?id=d9IkBAAAQBAJ&pg=PA388}}</ref> while MDCT coding is widely used in modern [[audio coding formats]] such as [[MP3]]<ref name="Guckert">{{cite web |last1=Guckert |first1=John |title=The Use of FFT and MDCT in MP3 Audio Compression |url=http://www.math.utah.edu/~gustafso/s2012/2270/web-projects/Guckert-audio-compression-svd-mdct-MP3.pdf |archive-url=https://ghostarchive.org/archive/20221009/http://www.math.utah.edu/~gustafso/s2012/2270/web-projects/Guckert-audio-compression-svd-mdct-MP3.pdf |archive-date=2022-10-09 |url-status=live |website=[[University of Utah]] |date=Spring 2012 |access-date=14 July 2019}}</ref> and [[Advanced Audio Coding]] (AAC).<ref name=brandenburg>{{cite web|url=http://graphics.ethz.ch/teaching/mmcom12/slides/mp3_and_aac_brandenburg.pdf|title=MP3 and AAC Explained|last=Brandenburg|first=Karlheinz|year=1999|url-status=live|archive-url=https://web.archive.org/web/20170213191747/https://graphics.ethz.ch/teaching/mmcom12/slides/mp3_and_aac_brandenburg.pdf|archive-date=2017-02-13}}</ref>
+Major developments in [[Digital audio|digital]] [[audio coding]] and [[audio data compression]] include [[differential pulse-code modulation]] (DPCM) by [[C. Chapin Cutler]] at Bell Labs in 1950,<ref name="DPCM">{{US patent reference|inventor=C. Chapin Cutler|title=Differential Quantization of Communication Signals|number=2605361|A-Datum=1950-06-29|issue-date=1952-07-29}}</ref> [[linear predictive coding]] (LPC) by [[Fumitada Itakura]] ([[Nagoya University]]) and Shuzo Saito ([[Nippon Telegraph and Telephone]]) in 1966,<ref>{{cite journal |last1=Gray |first1=Robert M. |title=A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol |journal=Found. Trends Signal Process. |date=2010 |volume=3 |issue=4 |pages=203–303 |doi=10.1561/2000000036 |url=https://ee.stanford.edu/~gray/lpcip.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://ee.stanford.edu/~gray/lpcip.pdf |archive-date=2022-10-09 |url-status=live |issn=1932-8346|doi-access=free }}</ref> [[adaptive DPCM]] (ADPCM) by P. Cummiskey, [[Nikil Jayant|Nikil S. Jayant]] and [[James L. Flanagan]] at Bell Labs in 1973,<ref>P. Cummiskey, Nikil S. Jayant, and J. L. Flanagan, "Adaptive quantization in differential PCM coding of speech", ''Bell Syst. Tech. J.'', vol. 52, pp. 1105—1118, Sept. 1973</ref><ref>{{cite journal |last1=Cummiskey |first1=P. |last2=Jayant |first2=Nikil S. |last3=Flanagan |first3=J. L. |title=Adaptive quantization in differential PCM coding of speech |journal=The Bell System Technical Journal |date=1973 |volume=52 |issue=7 |pages=1105–1118 |doi=10.1002/j.1538-7305.1973.tb02007.x |bibcode=1973BSTJ...52.1105C |issn=0005-8580}}</ref> [[discrete cosine transform]] (DCT) coding by [[Nasir Ahmed (engineer)|Nasir Ahmed]], T. Natarajan and [[K. R. Rao]] in 1974,<ref name="DCT">{{cite journal |author1=Nasir Ahmed |author1-link=N. Ahmed |author2=T. Natarajan |author3=Kamisetty Ramamohan Rao |journal=IEEE Transactions on Computers|title=Discrete Cosine Transform|volume=C-23|issue=1|pages=90–93|date=January 1974 |doi=10.1109/T-C.1974.223784 |bibcode=1974ITCmp.100...90A |s2cid=149806273 |url=https://www.ic.tu-berlin.de/fileadmin/fg121/Source-Coding_WS12/selected-readings/Ahmed_et_al.__1974.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://www.ic.tu-berlin.de/fileadmin/fg121/Source-Coding_WS12/selected-readings/Ahmed_et_al.__1974.pdf |archive-date=2022-10-09 |url-status=live}}</ref> and [[modified discrete cosine transform]] (MDCT) coding by J. P. Princen, A. W. Johnson and A. B. Bradley at the [[University of Surrey]] in 1987.<ref>J. P. Princen, A. W. Johnson und A. B. Bradley: ''Subband/transform coding using filter bank designs based on time domain aliasing cancellation'', IEEE Proc. Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2161–2164, 1987.</ref> LPC is the basis for [[perceptual audio coding]] and is widely used in [[speech coding]],<ref name="Schroeder2014">{{cite book |last1=Schroeder |first1=Manfred R. |title=Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder |date=2014 |publisher=Springer |isbn=9783319056609 |chapter=Bell Laboratories |page=388 |chapter-url=https://books.google.com/books?id=d9IkBAAAQBAJ&pg=PA388}}</ref> while MDCT coding is widely used in modern [[audio coding formats]] such as [[MP3]]<ref name="Guckert">{{cite web |last1=Guckert |first1=John |title=The Use of FFT and MDCT in MP3 Audio Compression |url=http://www.math.utah.edu/~gustafso/s2012/2270/web-projects/Guckert-audio-compression-svd-mdct-MP3.pdf |archive-url=https://ghostarchive.org/archive/20221009/http://www.math.utah.edu/~gustafso/s2012/2270/web-projects/Guckert-audio-compression-svd-mdct-MP3.pdf |archive-date=2022-10-09 |url-status=live |website=[[University of Utah]] |date=Spring 2012 |access-date=14 July 2019}}</ref> and [[Advanced Audio Coding]] (AAC).<ref name=brandenburg>{{cite web|url=http://graphics.ethz.ch/teaching/mmcom12/slides/mp3_and_aac_brandenburg.pdf|title=MP3 and AAC Explained|last=Brandenburg|first=Karlheinz|year=1999|url-status=live|archive-url=https://web.archive.org/web/20170213191747/https://graphics.ethz.ch/teaching/mmcom12/slides/mp3_and_aac_brandenburg.pdf|archive-date=2017-02-13}}</ref>
 == Types ==
@@ Line 45: / Line 45: @@
 Audio effects alter the sound of a [[musical instrument]] or other audio source. Common effects include [[Distortion (music)|distortion]], often used with electric guitar in [[electric blues]] and [[rock music]]; [[Dynamics (music)|dynamic]] effects such as [[volume pedal]]s and [[Audio compressor|compressors]], which affect loudness; [[Linear filter|filters]] such as [[wah-wah pedal]]s and [[graphic equalizer]]s, which modify frequency ranges; [[modulation]] effects, such as [[Chorus effect|chorus]], [[flanger]]s and [[Phaser (effect)|phasers]]; [[Pitch (music)|pitch]] effects such as [[Pitch shifter (audio processor)|pitch shifters]]; and time effects, such as [[reverb]] and [[Delay (audio effect)|delay]], which create echoing sounds and emulate the sound of different spaces.
-Musicians, [[audio engineer]]s and record producers use effects units during live performances or in the studio, typically with electric guitar, bass guitar, [[electronic keyboard]] or [[electric piano]]. While effects are most frequently used with [[Electric instrument|electric]] or [[Electronic musical instrument|electronic]] instruments, they can be used with any audio source, such as [[Acoustic music|acoustic]] instruments, drums, and vocals.<ref>{{Cite book|last1=Horne|first1=Greg|url=https://books.google.com/books?id=cHALQ_CO5P0C|title=Complete Acoustic Guitar Method: Mastering Acoustic Guitar c|publisher=Alfred Music|year=2000|isbn=9781457415043|page=92}}</ref><ref>{{Cite book|last1=Yakabuski|first1=Jim|url=https://books.google.com/books?id=QwcLdjCCXHkC|title=Professional Sound Reinforcement Techniques: Tips and Tricks of a Concert Sound Engineer|publisher=Hal Leonard|year=2001|isbn=9781931140065|page=139}}</ref>
+Musicians, [[audio engineer]]s and record producers use effects units during live performances or in the [[recording studio]], typically with electric guitar, bass guitar, [[electronic keyboard]] or [[electric piano]]. While effects are most frequently used with [[Electric instrument|electric]] or [[Electronic musical instrument|electronic]] instruments, they can be used with any audio source, such as [[Acoustic music|acoustic]] instruments, drums, and vocals.<ref>{{Cite book|last1=Horne|first1=Greg|url=https://books.google.com/books?id=cHALQ_CO5P0C|title=Complete Acoustic Guitar Method: Mastering Acoustic Guitar c|publisher=Alfred Music|year=2000|isbn=9781457415043|page=92}}</ref><ref>{{Cite book|last1=Yakabuski|first1=Jim|url=https://books.google.com/books?id=QwcLdjCCXHkC|title=Professional Sound Reinforcement Techniques: Tips and Tricks of a Concert Sound Engineer|publisher=Hal Leonard|year=2001|isbn=9781931140065|page=139}}</ref>
 ===Computer audition===
-Computer audition (CA) or machine listening is the general field of study of [[Algorithm|algorithms]] and systems for audio interpretation by machines.<ref>{{cite book |url=http://www.igi-global.com/book/machine-audition-principles-algorithms-systems/40288 |title=Machine Audition: Principles, Algorithms and Systems |publisher=IGI Global |year=2011 |isbn=9781615209194}}</ref><ref>{{cite web |title=Machine Audition: Principles, Algorithms and Systems |url=http://epubs.surrey.ac.uk/596085/1/Wang_Preface_MA_2010.pdf}}</ref> Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind. The engineer [[Paris Smaragdis]], interviewed in ''[[MIT Technology Review|Technology Review]]'', talks about these systems {{--}} "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents."<ref>[http://www.technologyreview.com/blog/VideoPosts.aspx?id=17438 Paris Smaragdis taught computers how to play more life-like music]</ref>
+Computer audition (CA) or machine listening is the general field of study of [[algorithm]]s and systems for audio interpretation by machines.<ref>{{cite book |url=http://www.igi-global.com/book/machine-audition-principles-algorithms-systems/40288 |title=Machine Audition: Principles, Algorithms and Systems |publisher=IGI Global |year=2011 |isbn=9781615209194}}</ref><ref>{{cite web |title=Machine Audition: Principles, Algorithms and Systems |url=http://epubs.surrey.ac.uk/596085/1/Wang_Preface_MA_2010.pdf}}</ref> Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind. The engineer [[Paris Smaragdis]], interviewed in ''[[Technology Review]]'', talks about these systems {{--}} "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents."<ref>[http://www.technologyreview.com/blog/VideoPosts.aspx?id=17438 Paris Smaragdis taught computers how to play more life-like music]</ref>
-Inspired by models of [[Hearing (sense)|human audition]], CA deals with questions of representation, [[Transduction (machine learning)|transduction]], grouping, use of musical knowledge and general sound [[semantics]] for the purpose of performing intelligent operations on audio and music signals by the computer. Technically this requires a combination of methods from the fields of [[signal processing]], [[auditory modelling]], music perception and [[cognition]], [[pattern recognition]], and [[machine learning]], as well as more traditional methods of [[artificial intelligence]] for musical knowledge representation.<ref name="Tanguiane1993">{{Cite book |last=Tanguiane (Tangian) |first=Andranick |title=Artificial Perception and Music Recognition |date=1993 |publisher=Springer |isbn=978-3-540-57394-4 |series=Lecture Notes in Artificial Intelligence |volume=746 |location=Berlin-Heidelberg}}</ref><ref name="Tangian1994">{{Cite journal |last=Tanguiane (Tanguiane) |first=Andranick |year=1994 |title=A principle of correlativity of perception and its application to music recognition |journal=Music Perception |volume=11 |issue=4 |pages=465–502 |doi=10.2307/40285634 |jstor=40285634}}</ref>
+Inspired by models of [[Hearing (sense)|human audition]], CA deals with questions of representation, [[Transduction (machine learning)|transduction]], grouping, use of musical knowledge and general sound [[semantics]] for the purpose of performing intelligent operations on audio and music signals by the computer. Technically, this requires a combination of methods from the fields of [[signal processing]], [[auditory modelling]], music perception and [[cognition]], [[pattern recognition]], and [[machine learning]], as well as more traditional methods of [[artificial intelligence]] for musical knowledge representation.<ref name="Tanguiane1993">{{Cite book |last=Tanguiane (Tangian) |first=Andranick |title=Artificial Perception and Music Recognition |date=1993 |publisher=Springer |isbn=978-3-540-57394-4 |series=Lecture Notes in Artificial Intelligence |volume=746 |location=Berlin-Heidelberg}}</ref><ref name="Tangian1994">{{Cite journal |last=Tanguiane (Tanguiane) |first=Andranick |year=1994 |title=A principle of correlativity of perception and its application to music recognition |journal=Music Perception |volume=11 |issue=4 |pages=465–502 |doi=10.2307/40285634 |jstor=40285634}}</ref>
 == See also ==