Conversion of Sound to Electrical Signal
The process begins by converting the mechanical sound wave into an electrical signal. A microphone does just that. There are many different types of microphones, but a very commonly used one is the dynamic microphone. Inside, a movable coil attached to a diaphragm rests in a magnetic field. As sound waves enter the microphone, the diaphragm oscillates, causing the attached coil to oscillate within the magnetic field, which induces a varying voltage in the coil. Some microphones use more than one diaphragm so that each can respond precisely to a certain band of frequencies. The louder the sound, the higher the amplitude of the sound wave. The higher the pitch, the higher the frequency of the sound wave. The diaphragm and attached coil convert these characteristics of the mechanical sound wave into an electrical signal.
[Figure: Simple diagram of a microphone.]
Sampling
Now that we have an electrical signal with varying voltage, the sampling process can begin. The sound wave represented by the electrical signal coming from the microphone is continuous, but it must be converted into binary form to be stored digitally on a computer. Binary is discrete, not continuous. The wave must be represented by a fixed number of ones and zeros, with no in-between values. To do this, an analog-to-digital converter (ADC) measures the voltage level of the continuous electrical signal (which corresponds to the amplitude of the sound wave) at a moment in time and represents that voltage as a fixed number of ones and zeros. This number of ones and zeros is the bit depth. Take a bit depth of 16, for example, which is the standard bit depth of CDs and most consumer digital audio. Each measurement of the voltage, that is, of the amplitude of the sound wave, is recorded as 16 digits, each either 1 or 0. With 2 possible values for each of the 16 digits, there are 2^16 = 65,536 possible combinations of 1's and 0's that can represent the amplitude of the sound wave at the moment it is measured. The ADC continues to measure the voltage at regular time intervals, storing the value each time as 16 binary digits. The frequency of this measurement is called the sampling rate. If the voltage is measured 44,100 times per second, the sampling rate is 44.1 kHz (1 kHz is 1,000 cycles per second), which is also the standard sampling rate of CDs and most consumer audio. In sum, the ADC measures the voltage that represents the sound wave's amplitude 44,100 times per second and records each measurement as a 16-digit binary number in a digital audio file. This sampling strategy is known as pulse code modulation.
[Figure: Visual representation of digital audio at CD quality.]
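To make the sampling step a little more concrete, here is a minimal Python sketch of what an ADC does, under some loud assumptions: the "voltage" is just a pure sine tone scaled between -1 and 1, and the function names (`sample_tone`, `to_binary`) are made up for illustration. A real ADC is a hardware circuit, so treat this as a toy model rather than how any actual converter is built.

```python
import math

SAMPLE_RATE = 44_100  # measurements per second (CD standard, from the text)
BIT_DEPTH = 16        # binary digits per measurement (CD standard, from the text)


def sample_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE, bit_depth=BIT_DEPTH):
    """Measure a sine-wave 'voltage' at regular intervals and round each
    reading to the nearest whole level (hypothetical helper, not a real ADC API)."""
    max_level = 2 ** (bit_depth - 1) - 1  # largest positive level: 32767 for 16 bits
    n_samples = int(round(duration_s * sample_rate))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                            # time of this measurement
        voltage = math.sin(2 * math.pi * freq_hz * t)  # assumed input, in [-1, 1]
        samples.append(round(voltage * max_level))     # quantize to a discrete level
    return samples


def to_binary(sample, bit_depth=BIT_DEPTH):
    """Show one sample as a string of ones and zeros (two's complement)."""
    mask = (1 << bit_depth) - 1
    return format(sample & mask, "0{}b".format(bit_depth))


samples = sample_tone(440, 0.01)  # 10 milliseconds of a 440 Hz tone
print(len(samples))               # number of measurements taken
print(2 ** BIT_DEPTH)             # 65536 possible values per measurement
print(to_binary(samples[1]))      # one measurement as 16 binary digits
```

Each entry in `samples` is one measurement, and the whole list is the pulse code modulation data described above.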
There you have it! A digital audio file is made up of millions of binary numbers at a certain bit depth and sampling rate. You might have seen that digital audio files are classified by their bit rate, the number of kilobits per second (kbps). This number is easily calculated by multiplying the bit depth of the file by the sampling rate and by the number of channels, which is most often 2 for stereo audio. A CD, for example, would be 16 * 44,100 * 2 = 1,411,200 bits per second, or 1,411.2 kbps. If you ripped CD audio to a computer without compression, you would get this bit rate. The entire size of a CD audio file can be calculated by multiplying 1,411,200 bits per second by the number of seconds in the file. So a 3 minute track would be 1,411,200 * 180 = 254,016,000 bits, which divided by 8 bits per byte and 1,000,000 bytes per megabyte is 31.752 megabytes. A 16 gigabyte iPod would only be able to hold about 500 of these files. This may seem like a very large file for one audio track, which is why audio compression was created.
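The arithmetic above can be sketched in a few lines of Python. The function names here are my own, and I'm assuming decimal units (1 megabyte = 1,000,000 bytes, 1 gigabyte = 1,000 megabytes), so this is just a calculator for the numbers in this post.

```python
def pcm_bit_rate(bit_depth, sample_rate, channels):
    """Bits per second of uncompressed audio."""
    return bit_depth * sample_rate * channels


def file_size_megabytes(bit_rate, seconds):
    """File size, assuming 8 bits per byte and 1,000,000 bytes per megabyte."""
    return bit_rate * seconds / 8 / 1_000_000


cd_rate = pcm_bit_rate(16, 44_100, 2)              # CD audio: 1,411,200 bits per second
track_size = file_size_megabytes(cd_rate, 180)     # a 3 minute track
tracks_on_16_gb = int(16 * 1_000 / track_size)     # whole tracks that fit in 16 GB

print(cd_rate / 1000, "kbps")
print(track_size, "megabytes per track")
print(tracks_on_16_gb, "tracks")
```

Change the arguments to `pcm_bit_rate` to see how quickly the size grows, for example 24 bits at 192,000 samples per second.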
Compression
Compressed audio formats such as MP3 or AAC use algorithms that determine which data in the uncompressed audio file are unnecessary because they correspond to sound that the human ear cannot discern. The algorithms remove enough data to meet a target bit rate such as 128, 192, 256, or 320 kbps. When compressing to a bit rate below 128 kbps, important data must be removed and the effect is discernible as distortion, similar to a pixelated image that becomes more blurry as the pixels become larger. Some people with sensitive ears may hear distortion at 128 kbps or even 192 kbps and prefer a higher bit rate. You may have also heard of VBR compression, or variable bit rate. VBR refers to a compression algorithm that adheres to an average bit rate but uses differing bit rates throughout the file depending on the complexity of the audio. Moments of silence would have an extremely low bit rate, while active and complex moments would have a very high bit rate. The size of VBR compressed audio files therefore varies. The benefit of VBR is that VBR encoded files have higher resolution when it matters, so a lower bit rate VBR file would have comparable sound quality to a higher bit rate CBR (constant bit rate) file, while being smaller in size. iTunes music is encoded in 256 kbps VBR AAC format. A 3 minute track would be about 256,000 * 180 / 8 / 1,000,000 = 5.76 megabytes. In practice the size is usually somewhat more than 5.76 megabytes, because a track has more active parts than quiet or silent parts. My 64 GB iPod can therefore hold about 10,000 of these files. Here's a link to another blog discussing compressed audio sound quality.
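Here is a small Python sketch of how the size of a VBR file follows from its changing bit rate. The track below is completely made up (a quiet intro, a normal middle, a busy ending), and the function names are hypothetical. Real encoders pick a bit rate frame by frame, so this only illustrates the averaging idea.

```python
def total_bits(segments):
    """segments is a list of (duration_seconds, kbps) pairs."""
    return sum(seconds * kbps * 1000 for seconds, kbps in segments)


def size_megabytes(segments):
    """File size, assuming 8 bits per byte and 1,000,000 bytes per megabyte."""
    return total_bits(segments) / 8 / 1_000_000


def average_kbps(segments):
    """Average bit rate over the whole track."""
    total_seconds = sum(seconds for seconds, _ in segments)
    return total_bits(segments) / total_seconds / 1000


# Hypothetical 3 minute track: 20 s quiet, 100 s normal, 60 s complex.
vbr_track = [(20, 64), (100, 256), (60, 320)]
# The same track at a constant 256 kbps for comparison.
cbr_track = [(180, 256)]

print(average_kbps(vbr_track), "kbps on average")
print(size_megabytes(vbr_track), "megabytes (VBR)")
print(size_megabytes(cbr_track), "megabytes (CBR)")
```

With these invented numbers the VBR track happens to average out to the same size as the constant 256 kbps file, but it spends fewer bits on the quiet intro and more on the complex ending.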
Conversion from Digital back to Sound
All of that is great, but how can you hear a digital audio file when it is just a bunch of 1's and 0's? The answer is a DAC: a digital-to-analog converter. A DAC does essentially the opposite of an ADC. It converts the binary digits of each sample in the digital audio file back into a voltage level. However, because the samples in a digital audio file are discrete points, converting them one by one would not by itself produce a continuous voltage. The DAC fills in the spaces between the discrete points using a reconstruction filter that interpolates between them. In other words, the DAC uses information from the discrete points to fill in the blanks between them with a continuous voltage. All of our computers, cell phones, and iPods contain DACs. The continuous electrical signal from the DAC is then turned into sound by a speaker, which works like a dynamic microphone in reverse. The electrical signal is applied to a coil resting in a magnetic field, and the coil oscillates back and forth based on the voltage. The coil is attached to a membrane, in this case the speaker cone, and the cone oscillates, displacing the air around it into a sound wave. Speakers often have multiple cones so they can reproduce the sound wave precisely for certain bands of frequency. These are often called subwoofers if they handle very low bass frequencies or tweeters if they handle the high treble frequencies.
[Figure: Circuit board from a high quality amplifier which powers speakers.]
[Figure: The DAC from the circuit board, which can convert up to a 24 bit 192 kHz digital signal into the electrical signal needed for the speakers.]
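The "fill in the blanks" step can be sketched in Python using sinc interpolation, which is the textbook idealized reconstruction filter. To be clear about assumptions: real DACs do this with hardware filters rather than a loop like this one, the function names are mine, and the 8,000 samples per second rate and 100 Hz tone are arbitrary values chosen to keep the example small.

```python
import math


def sinc(x):
    """Normalized sinc: 1 at x = 0, and (nearly) 0 at every other whole number."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples, sample_rate, t):
    """Estimate the continuous signal at time t (in seconds) by adding up
    one sinc curve per sample, each scaled by that sample's value."""
    position = t * sample_rate  # time measured in units of samples
    return sum(value * sinc(position - n) for n, value in enumerate(samples))


# Assumed test signal: a 100 Hz sine tone sampled 8,000 times per second.
rate = 8_000
tone = [math.sin(2 * math.pi * 100 * n / rate) for n in range(400)]

# Halfway between sample 200 and sample 201 there is no stored value,
# so the reconstruction has to fill it in.
t_between = 200.5 / rate
estimate = reconstruct(tone, rate, t_between)
actual = math.sin(2 * math.pi * 100 * t_between)
print(estimate, actual)
```

The estimate lands very close to the true value of the wave even though no sample was ever taken at that instant. The small leftover difference comes from only having a finite number of samples to work with.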
Analog Vs. Digital Audio
As you can see, digital audio is the most convenient, as it can be easily stored, accessed, and converted into various compressed forms. However, there is much debate as to whether digital audio is "better" than analog audio. Analog audio is stored in a physical format in which the continuous voltage stream can be directly read by a device and played by a speaker. No ADCs, DACs, or discrete quantization are involved. The two main types of analog audio are cassette tapes, in which the voltage stream is stored on magnetic tape, and vinyl records, in which the voltage stream is stored in a modulating spiral groove on polyvinyl chloride discs. The argument against digital audio is that there is inherent alteration of the sound wave representation when it is quantized into discrete values. However, with 24 bit 192 kHz professional audio recording standards, there are 16,777,216 possible discrete values for the amplitude measurement, measured 192,000 times per second. To the human ear this alteration is completely indistinguishable. Another argument against digital audio is that ADCs and DACs introduce some of their own noise when they measure and convert the voltage or binary values. This introduced noise is measured by the signal to noise ratio. However, high quality ADCs and DACs introduce such a small amount of noise that the alteration is completely indistinguishable. In principle analog may still seem better, but the drawback to analog audio is that the storage medium decays over time, unlike digital audio. A perfectly encoded analog medium that did not wear out would produce the closest possible representation of the original sound wave. However, that is not the case, as the grooves in vinyl get worn and magnetic tape becomes demagnetized. Sometimes it just comes down to personal preference. Vinyl records often produce a warm, bassy distortion when played, which some people prefer over the "cold" accuracy of digital audio.
[Figure: Visual representation of noise introduced due to quantization. The actual amount of noise is much, much smaller, but this is a simple representation.]
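To put rough numbers on how small quantization noise is, here is a Python sketch that rounds a sine tone to a given bit depth and measures the signal to noise ratio. It also compares the result with a common rule of thumb for a full-scale sine wave, about 6.02 dB per bit plus 1.76 dB, which is not from this post but is a standard approximation. The function names and the 440 Hz test tone are my own assumptions, and this only models rounding error, not the extra noise a real ADC or DAC adds.

```python
import math


def levels(bit_depth):
    """Number of discrete amplitude values available at this bit depth."""
    return 2 ** bit_depth


def rule_of_thumb_snr_db(bit_depth):
    """Standard approximation for a full-scale sine wave (an assumption here)."""
    return 6.02 * bit_depth + 1.76


def measured_snr_db(bit_depth, freq_hz=440, sample_rate=44_100, n_samples=44_100):
    """Quantize one second of a sine tone and compare signal power to error power."""
    max_level = 2 ** (bit_depth - 1) - 1
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        exact = math.sin(2 * math.pi * freq_hz * n / sample_rate)
        quantized = round(exact * max_level) / max_level  # nearest discrete level
        signal_power += exact ** 2
        noise_power += (exact - quantized) ** 2           # rounding error
    return 10 * math.log10(signal_power / noise_power)


for bits in (8, 16, 24):
    print(bits, "bits:", levels(bits), "levels,",
          round(measured_snr_db(bits), 1), "dB measured,",
          round(rule_of_thumb_snr_db(bits), 1), "dB rule of thumb")
```

Every extra bit doubles the number of levels and adds roughly 6 dB, which is why the jump from 16 to 24 bits pushes the rounding error so far below the signal.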
In my opinion, digital is the way to go. It's just so convenient. I have 6,500 songs in my pocket right now. Try that with records.
Thanks for reading, I hope you found the topic as interesting as I did.
Feel free to comment with any questions you may have.