The Crazy Riot Of Linux Sound Technologies
Getting a computer running Linux to make the sounds you want it to make can be a real challenge. The problem seems inherent in the solution - open source software (and the Unix philosophy) tends to fragment into very specific functional components. So a low level hardware driver related function is cleanly separated from a user level function. With Linux sound there are actually many layers and some projects encompass several layers. This creates some choice, flexibility, developer modularity, and, well, confusion.
"OpenAL is a cross-platform 3D audio API appropriate for use with gaming applications and many other types of audio applications." This library is focused on creating correct sound levels to model multichannel sound of events in 3D space (an explosion on your left comes more out of the speaker on your left). This is probably most useful for virtual reality applications and games. Output: ALSA (default), native, SDL, aRts (unstable), ESD, OpenAL enabled hardware devices (like some Audigy and X-Fi cards)
This is the original Open Sound System (OSS). There are ports of it to BSD. The naming conventions that were established with this in the dark ages persist today even though other facilities actually manage the traditional /dev/ locations. There is also OSSv4 which is not, apparently, completely obsolete. Generally it’s best to use one of the newer sound models though. I think the general criticism was that OSS applications had sound device locking problems.
Advanced Linux Sound Architecture (ALSA). This is the main standard for low level Linux sound processing. It can output directly to kernel hardware drivers (which start with snd_). It can also output to legacy OSS devices. Inputs: PulseAudio, JACK, GStreamer, Xine, SDL, ESD Outputs: Hardware drivers, OSS
PulseAudio is a "sound server". This correctly implies that it can serve audio over a network connection, though in practice a realistic example of that seems hard to set up. It also mixes, sets volumes (per application and per sound device) and prevents bad applications (e.g. Flash) from monopolizing the sound output resources. It is also useful for sending audio output to multiple sound devices and can capture audio from many sources. Inputs: GStreamer, Xine, ALSA Outputs: ALSA, JACK, ESD, OSS
Enlightenment Sound Daemon (ESD). This is similar to PulseAudio in attempted scope. It has good mixing but limited support and is being replaced by PulseAudio.
"PortAudio is a free, cross-platform, open-source, audio I/O library. It lets you write simple audio programs in C or C++ that will compile and run on many platforms including Windows, Macintosh OS X, and Unix (OSS/ALSA). It is intended to promote the exchange of audio software between developers on different platforms." Outputs: Windows (MME, DirectSound, and ASIO), ALSA, OSS, JACK, Mac OSX Core Audio Input: Audacity
JACK is supposed to be low-latency. It needs applications that are JACK aware. It doesn’t go directly to sound device output. Basically it just seems to route between JACK aware applications (which is no small thing). Inputs: GStreamer, PulseAudio, ALSA Outputs: OSS, FFADO, ALSA
Simple DirectMedia Layer. "Simple DirectMedia Layer (SDL) is a cross-platform, free and open source multimedia library written in C that presents a simple interface to various platforms' graphics, sound, and input devices." There is a joystick subsystem (Hmm…).
This is the back end library for the Xine media player. This library supports a very wide range of different codecs and is a good choice for projects needing broad support of multifarious formats. It can, in theory, output to JACK, but sometimes that support is not compiled in. Inputs: Phonon Outputs: PulseAudio, ALSA, ESD, (JACK)
GStreamer is an advanced encoder/decoder stack. Inputs: Phonon Outputs: ALSA, PulseAudio, JACK, ESD
Phonon is a framework that was used by Qt to make "cross-platform" capable stuff where the details of sound production would be transparent to the programmer. The details were generally passed on to GStreamer for actual sound production. It does for Qt what SDL does for games. Apparently this interface wasn’t too popular overall and may be dropped. Inputs: Qt and KDE apps Outputs: GStreamer, Xine
Free Firewire Audio Drivers (FFADO). This basically links high end studio stuff (fancy physical audio gear) to JACK. Apparently in high end stuff FireWire is (was?) considered the good way to go. I think this one can be safely ignored by most people. Inputs: JACK Outputs: Fancy Audio Hardware
Analog Real Time Synthesizer (aRts). This was an audio framework for old versions of KDE. Mercifully it has conceded defeat. It was replaced with Phonon. The sound daemon was called artsd. If you come across this, the documentation is old. It can be safely ignored.
"VLC is a free and open source cross-platform multimedia player and framework that plays most multimedia files as well as DVD, Audio CD, VCD, and various streaming protocols."
"Linux Audio Developer’s Simple Plugin API is a standard that allows software audio processors and effects to be plugged into a wide range of audio synthesis and recording packages. For instance, it allows a developer to write a reverb program and bundle it into a LADSPA plugin library. Ordinary users can then use this reverb within any LADSPA-friendly audio application. Most major audio applications on Linux support LADSPA."
Might stand for "Disposable Soft Synth Interface". "DSSI (pronounced dizzy) is an API for audio processing plugins, particularly useful for software synthesis plugins with user interfaces. DSSI is an open and well-documented specification developed for use in Linux audio applications, although portable to other platforms. It may be thought of as LADSPA-for-instruments, or something comparable to VSTi."
Open Sound Control (OSC). "…a protocol for communication among computers, sound synthesizers, and other multimedia devices that is optimized for modern networking technology…" Seems like this could be obsolete. Couldn’t quite figure its place out.
Virtual Studio Technology (VST). From Wikipedia: "an interface for integrating software audio synthesizer and effect plugins with audio editors and hard-disk recording systems. VST and similar technologies use digital signal processing to simulate traditional recording studio hardware with software. Thousands of plugins exist, both commercial and freeware, and VST is supported by a large number of audio applications." Might be some proprietary nonsense with this one. Perhaps the proprietary inspiration for LADSPA. Also VSTi is the "i"nstrument plugin. Just being complete with the weird acronyms one encounters.
Using A Computer As A Remote Listening Device
Imagine that you have two computers that can ping each other on some network (could be the wild internet, could be a home LAN). There is the computer you’re sitting at which we’ll call "home" and the one that’s not near you which we’ll call "away". Now imagine that you want to listen to what’s going on around "away" while sitting at "home".
Start by telling the "home" computer that you want it to wait for a network connection (which presumably will be from "away") and, when the connection is made, to pipe the incoming data to something that will send it to the speakers for you to hear. You need to choose a port; here I’ll use 7777. You need sudo to use ports lower than 1024.
[home]$ nc -l 7777 | aplay -
That should appear to do pretty much nothing, but it’s actually waiting for a network connection from somewhere.
Next we log in to "away" and fire up the microphone and send it to home (substitute the name of your host in the home position).
[away]$ rec -t raw -b8 -c1 -r8k - | nc home 7777
The 7777 is the port number and can be whatever you like that is not being used or blocked by a firewall. The -b8 is for 8-bit samples. The -c1 is for a single (mono) channel. The -r8k is for an 8 kHz sampling rate (8000 samples per second). All of these settings are pretty modest and designed to keep things simple for efficient transfer of basic noises, especially talking. Moving a recording of a violin concerto over a network is a different problem.
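To get a feel for just how modest these settings are, here is a rough sketch of the arithmetic. Uncompressed audio needs bits per sample times channels times samples per second, divided by 8 to get bytes. The function name raw_bytes_per_second is made up for illustration.

```shell
# Bytes per second of an uncompressed raw audio stream:
#   bits per sample * channels * samples per second / 8
# Arguments: $1 = bits per sample, $2 = channels, $3 = sample rate in Hz
raw_bytes_per_second() {
    echo $(( $1 * $2 * $3 / 8 ))
}

raw_bytes_per_second 8 1 8000      # the settings used above
raw_bytes_per_second 16 2 44100    # 16-bit stereo at 44.1 kHz, for comparison
```

The first call prints 8000 (about 64 kbit/s on the wire) and the second prints 176400, which is why the violin concerto is a different problem.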
I found that the sound quality was kind of scratchy. I made that better by recording some very quiet time on "away" and using that to generate a noise reduction profile that could then be used to filter the recording process.
[away]$ rec -t wav - > noise-profile-silent-sample.wav
[away]$ sox noise-profile-silent-sample.wav -n trim 0 1.5 noiseprof away.noise-profile
[away]$ rec -t raw -b8 -c1 -r8k - noisered away.noise-profile | nc home 7777
Ubuntu’s Problem With Multiple Sound Users
It can happen that everything works fine in controlled testing but nothing works when deploying in a live situation. It used to be that if some user was logged in and playing sound (or had a Flash thing going on in a browser), other logged in users couldn’t access the sound device. Apparently this was cured and remained a problem only for people who had legacy accounts which were members of the audio group. You can cure it by making sure no one is in the audio group:
$ sudo gpasswd -d xed audio
Counterintuitive, but it seems to work. Fixing this will allow this remote listening strategy to work even if someone else is logged onto "away" and busy watching videos or something.
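Before removing anyone, it can help to see who is actually in the audio group. This is a minimal sketch, assuming the accounts are local and listed in /etc/group. The function name audio_group_members is made up here. On systems with network accounts, getent group audio is the more complete check.

```shell
# Print the members of the "audio" group, one per line.
# Reads a file in /etc/group format: name:password:gid:member,member,...
# Argument: $1 = group file to read (assumed default: /etc/group)
audio_group_members() {
    awk -F: '$1 == "audio" && $4 != "" {
        n = split($4, members, ",")
        for (i = 1; i <= n; i++) print members[i]
    }' "${1:-/etc/group}"
}

# Example use (left as a comment because it changes group membership):
#   for user in $(audio_group_members); do sudo gpasswd -d "$user" audio; done
```

An empty result means there is nobody left to remove.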
What’s With Those Damn .m4a Files?
Ok, these audio files are not cool. They often don’t work with mp3 infrastructure. Here’s how to convert them to mp3s:
ffmpeg -y -i ./BWV0666.m4a -ab 192000 -ac 2 ./BWV0666.mp3
I don’t really know if there’s a serious performance hit, but the resulting file sizes from the set I did were smaller. Probably just saved some space without any noticeable loss of quality.
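If there is a whole directory of these files, a loop saves some typing. This is only a sketch built around the same ffmpeg options as above. The function names mp3_name and convert_m4a_dir are made up, and unlike the -y in the one-off command, it skips any file that already has an .mp3 next to it rather than overwriting.

```shell
# Derive the .mp3 name for a given .m4a file (the directory part is kept).
mp3_name() {
    printf '%s\n' "${1%.m4a}.mp3"
}

# Convert every .m4a file in a directory (default: current directory).
convert_m4a_dir() {
    for src in "${1:-.}"/*.m4a; do
        # If nothing matched, the glob stays literal and names no real file.
        [ -e "$src" ] || continue
        dst=$(mp3_name "$src")
        if [ -e "$dst" ]; then
            echo "skipping $src: $dst already exists"
            continue
        fi
        ffmpeg -i "$src" -ab 192000 -ac 2 "$dst"
    done
}

# Example use:
#   convert_m4a_dir ~/music/bach
```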
Here is a nice description of all the overly complex details of ALSA. Things like what does hw:0,1 mean? (Answer: "card 0, device 1", which is the second device on the first sound card, since both are numbered from 0.)
Also to get useful information, try aplay -l or aplay -L or cat /proc/asound/cards
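The hw:CARD,DEVICE naming is easy to pick apart by hand. Here is a tiny illustration using only shell parameter expansion. describe_hw is a made-up helper, and it assumes both numbers are present in the name.

```shell
# Split an ALSA device name of the form hw:CARD,DEVICE into its parts.
describe_hw() {
    spec=${1#hw:}          # drop the "hw:" prefix, e.g. "0,1"
    card=${spec%%,*}       # text before the comma, e.g. "0"
    device=${spec#*,}      # text after the comma,  e.g. "1"
    echo "card $card, device $device"
}

describe_hw hw:0,1
```

The card and device numbers are the same ones shown by aplay -l, and the whole string can be handed to aplay with its -D option.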
I tend to have problems getting mpg123 to work since that’s the player I tend to use. The trick is to do something like this:
mpg123 -oalsa -C Jimi_Hendrix-Red_House.mp3
Which option argument should you use after the -o? Try the --list-modules option to see. In the same style look at --test-cpu but pretty much everyone should have fancy decoding tricks in their CPUs by now.
Interesting Sound/Music Applications
ReZound This is a very awesome sound editor. You can do all kinds of crazy things with this and I can only understand 1% of it. I have used it to crop sound files that had undesirable stuff at the ends, and to isolate certain sounds in a longer sound file. I also managed to get it to slow down a song by 50% while bringing up the pitch by an octave so that I could try to figure out some guitar part. Unfortunately that produced a seg fault on my system when I tried to do too much at once. Probably my machine doesn’t have enough memory for this kind of thing. Still a great piece of software. Note that the right mouse button is used to define the right selection boundary. This can be a bit non-standard.
playitslowly This is a simple but quite functional program that basically plays songs at different speeds while adjusting the pitch to remain constant. I have used this and it works great (in contrast to my experience with ReZound). It is ideal for figuring out lyrics, guitar riffs, drum parts, etc.
SooperLooper "SooperLooper is a live looping sampler capable of immediate loop recording, overdubbing, multiplying, reversing and more. It allows for multiple simultaneous multi-channel loops limited only by your computer’s available memory." Uses JACK.
Hydrogen Drum "Hydrogen is an advanced drum machine for GNU/Linux. It’s main goal is to bring professional yet simple and intuitive pattern-based drum programming."
Freewheeling "Freewheeling allows us to build repetitive grooves by sampling and directing loops…"
kluppe "kluppe is a loop-player and recorder, designed for live use."
qtractor "Qtractor is an Audio/MIDI multi-track sequencer application written in C++ with the Qt4 framework. Target platform is Linux, where the Jack Audio Connection Kit (JACK) for audio, and the Advanced Linux Sound Architecture (ALSA) for MIDI, are the main infrastructures to evolve as a fairly-featured Linux desktop audio workstation GUI, specially dedicated to the personal home-studio."
Ardour "Digital Audio Workstation. record mix edit collaborate."
BEAST Stands for "BEtter Audio SysTem" and it comes with BSE, the "Better Sound Engine", but it’s really an audio synthesizer and multi-track editor.
Rosegarden "Rosegarden is a well-rounded audio and MIDI sequencer, score editor, and general-purpose music composition and editing environment. Rosegarden is an easy-to-learn, attractive application that runs on Linux, ideal for composers, musicians, music students, and small studio or home recording environments."