This is an article of the category "Let's talk about Next-Gen. Future ideas for Linux Audio" where I elaborate(!) on realistic near future improvements or unrealistic science fiction scenarios.
"Does this program import MIDI files?" is a very common question. In fact it might be a FAQ for all sequencers, trackers and even wave editors (oh yes... you know that happens).
Despite all its shortcomings (which this article will describe in some detail) the MIDI format is devilishly hard to kill. It is still used as a primitive but widespread exchange format between programs, or as the final format in the antique General MIDI standard. .mid in all its variants and purposes is still hip!
It would be wonderful to have a software library, open source licensed of course, which is capable of reading MIDI data and generating the missing information, ending up with a good universal file format that holds rich information and is a perfect starting point for importing the music data into an actual program, be it a standard piano-roll sequencer or something more sophisticated, like a notation editor or a harmony/song generator that works from an existing melody.
This missing, but important, information is essential for the music itself and therefore for programs dealing with music. Take, for example, the question whether a certain note is a G-sharp (gis) or an A-flat (aes). On the piano both share the same key; in the MIDI protocol both share the same integer value. But in musical reality they are two different tones. This is true not only for non-equal tuning (like just intonation), where the G-sharp has a higher pitch (Hz value) than the A-flat, but also on a perfectly tuned synthetic instrument with equal temperament. If a human player on such an instrument sees a G-sharp she or he will play it differently than an A-flat. Maybe louder, maybe with a slight tempo modification to emphasize it…
Point is: there is a meaningful difference in musical parameters that the midi protocol (and therefore software following it strictly) treats as the same thing.
"Wait a minute", you'll probably say, "there are many programs that do MIDI import. I have never encountered an A-flat where I expected a G-sharp". That would mean two things: you don't know many programs, and the ones you know were aware that this information can be retrieved by interpreting the context the note is in. If you see the melody "A e f# ? a" you can bet that the ? is a G-sharp, the leading note to the final A.
If a human does this process by ear, listening to music and then writing it down as notation of any kind, this is called transcription. And this is what we want to simulate.
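The leading-note reasoning above can be sketched in a few lines. Everything here (the function name, the tables, the fallback rule) is a hypothetical illustration of what such a heuristic could look like, not code from any existing library:

```python
# Map an ambiguous MIDI pitch class to its two candidate spellings.
AMBIGUOUS = {1: ("C#", "Db"), 3: ("D#", "Eb"), 6: ("F#", "Gb"),
             8: ("G#", "Ab"), 10: ("A#", "Bb")}
NATURALS = {0: "C", 2: "D", 4: "E", 5: "F", 7: "G", 9: "A", 11: "B"}

def spell(midi_pitches):
    """Spell each ambiguous note as sharp if it resolves a semitone upward
    (leading-note rule) and flat if it resolves a semitone downward."""
    names = []
    for i, pitch in enumerate(midi_pitches):
        pc = pitch % 12
        if pc in NATURALS:
            names.append(NATURALS[pc])
            continue
        sharp, flat = AMBIGUOUS[pc]
        nxt = midi_pitches[i + 1] if i + 1 < len(midi_pitches) else None
        if nxt is not None and nxt - pitch == 1:    # resolves up: leading note
            names.append(sharp)
        elif nxt is not None and pitch - nxt == 1:  # resolves down
            names.append(flat)
        else:
            names.append(sharp)  # arbitrary fallback; a real lib would consult the key
    return names

# The melody from the text, "A e f# ? a", as MIDI pitches; the ? is 68:
print(spell([69, 64, 66, 68, 69]))  # ['A', 'E', 'F#', 'G#', 'A']
```

A real transcriber would of course also look at the key signature and harmony; this toy version only looks one note ahead, which is exactly the kind of context rule a human transcriber applies first.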
libTranscribe - Why do we want it?
A lib that is able to infer musical data indirectly is, luckily, a project with clear boundaries and a simple programming interface: data goes in, more data comes out.
Even without using it in another program, just as a standalone command-line program like ImageMagick's convert, this is useful: transcribe in.mid out.musicXML .
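Since libTranscribe does not exist yet, that command-line front end can only be sketched. Here is a minimal argparse skeleton; the program name, the `--style` flag and everything else are hypothetical:

```python
# Hypothetical front end: transcribe in.mid out.musicxml
# Only illustrates the intended "data goes in, more data comes out" interface.
import argparse
import pathlib

def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        prog="transcribe",
        description="Convert low-information MIDI to a richer notation format.")
    parser.add_argument("infile", type=pathlib.Path)
    parser.add_argument("outfile", type=pathlib.Path)
    parser.add_argument("--style", default=None,
        help="optional precondition, e.g. 'vienna-18th-century' (hypothetical)")
    return parser.parse_args(argv)

args = parse_args(["in.mid", "out.musicxml"])
print(f"would transcribe {args.infile} -> {args.outfile}")
```

The `--style` option anticipates the preconditions discussed later in this article: if the user already knows the era and genre, the converter should be told.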
You can bet that even programs that already have both midi import and musicXML import will benefit from a dedicated converter.
To what extent audio-to-MIDI conversion will be implemented in libTranscribe will be discovered during development and implementation. Right now it is clear that some information is already lost in the audio-to-MIDI step. If you convert audio to MIDI outside of libTranscribe, for example through aubio, you may end up with a sub-optimal starting point. But the question is of what kind these losses are. Maybe exactly this data turns out to be the easiest and most accurate to generate anyway, and we never had a real problem in the first place. So for now, we just assume that we already have a MIDI file, no matter where it came from.
A bonus effect: even if a MIDI file never existed and you did everything by hand, this lib can be used as "spell checking", or even as a test whether you really wrote a piece that sounds like a Mozart minuet, as you intended.
Vast Encoded Musical Knowledge - Crowdsourcing FTW!
Such a project obviously needs musical knowledge and experience of many kinds. 80s New Wave is different from Mozart. 80s New Wave is even different from the German „Neue Deutsche Welle“ from the same time! And young Mozart is different from ‚old‘ Mozart (not very old though).
So it makes sense to create a modular libTranscribe system where many people and AIs can work on their special field without getting in each other's way. Obviously not all musical styles have to be on the same level of implementation, be it because a style is very hard to describe (Wagner's harmony and pitches may be really hard to do; Palestrina will be a piece of cake) or just because nobody has done the work yet.
Background information: Why MIDI import is mostly bad and why you end up doing it by hand and ear in the end anyway.
So… why do we need a shared library for that task? After all, each program does its MIDI export without one, and that works just fine.
MIDI import and export are not symmetric tasks.
If you want to generate a MIDI file from relatively complete data, like MusicXML (which sucks, big time, itself! see the explanation below), all you have to do is "forget" certain data and get inaccurate. You don't need time signatures, key signatures and other musical meta-data. You don't need to know if a note is a staccato quarter note or an eighth note followed by a rest, because now you are not dealing with rhythm anymore but with duration in absolute seconds.
Now imagine converting the same file back to MusicXML. What is the time signature? 3/4 or 6/8? You have to make an educated guess now.
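That educated guess can be made concrete with a toy scorer. Both 3/4 and 6/8 contain six eighths per bar, but the accents fall differently (3/4: eighths 0, 2, 4; 6/8: eighths 0 and 3). This is a deliberately naive sketch with made-up names; a real transcriber would also weigh harmony and melodic grouping:

```python
def guess_meter(onsets_in_eighths, weights):
    """onsets_in_eighths: note start times, counted in eighth notes.
    weights: one emphasis value per note (e.g. duration or velocity)."""
    # accent positions within one six-eighth bar:
    accents = {"3/4": {0, 2, 4}, "6/8": {0, 3}}
    scores = {}
    for meter, positions in accents.items():
        hit = sum(w for t, w in zip(onsets_in_eighths, weights)
                  if t % 6 in positions)
        # normalise by the number of accent slots, otherwise 3/4 always wins
        scores[meter] = hit / len(positions)
    return max(scores, key=scores.get)

# strong-weak-weak-strong-weak-weak emphasis suggests two groups of three:
print(guess_meter([0, 1, 2, 3, 4, 5], [3, 1, 1, 3, 1, 1]))  # 6/8
# strong on every quarter (eighths 0, 2, 4) suggests three groups of two:
print(guess_meter([0, 1, 2, 3, 4, 5], [3, 1, 2, 1, 2, 1]))  # 3/4
```

Note that this is exactly the "wait for the right moment" problem described below: with too few notes, or evenly weighted ones, both meters score the same and the guess stays a guess.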
Is this a general problem or specific to a computer?
It is exactly the same for a human. Writing logically correct music from insufficient data takes training and can never be 100% certain. It is what we do as humans when listening to music. It may take a while to find out if a piece is in 2/4 or 4/4. You must wait for the right moment, when rhythm and melody leave no doubt, so that you can reverse engineer the meter.
This is not about "ear training". Identifying pitches and rhythm by ear is not a useful thing in itself, even if countless people and programs try to sell that to you. If you reached the master level in an ear training software or university course you have just learned the basics for the real tasks. The craft for the art. Only combined with knowledge about music theory and music history can you really use this skill.
Because now you need to add semantic and logical information, based on those basic parameters like pitch, rhythm and dynamics. And the context switches for every piece. Even within a piece of music.
State of the Art
There are many questions and problems that need to be solved. The more knowledge goes into such a library, the better the results of converting a low-information-density format like MIDI into a high-density format (which is not MusicXML; again, see the end of this article).
A music-listening AI with expert knowledge in all musical styles and genres may be a few decades or centuries away, but in the meantime we can work step by step to improve the situation, keeping it simple in the beginning. Every step should already make MIDI import easier today.
Of course: don't forget the human part. We don't need the all-knowing program, at least not yet. If you already know whether a piece of music is from 16th-century Italy, 18th-century Vienna or 20th-century Hollywood: very good, just input this as a precondition to the transcription.
Seen in relation to where we are now (to my knowledge), even something as simple as deciding whether a chord is C-Eb-G or C-D#-G is an improvement. (The answer is not 100% C-Eb-G. This cannot be solved with a look-up table. D# is the leading note to E, which is the major third of a C major chord, and leading notes into thirds are not uncommon.)
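To see why a look-up table fails: the same three MIDI pitches (60, 63, 67) must be spelled C-Eb-G or C-D#-G depending on what the middle voice does next. A purely illustrative context rule, with an invented function name:

```python
def spell_third(chord, next_pitch_of_middle_voice):
    """chord: MIDI pitches (60, 63, 67), i.e. C, Eb-or-D#, G.
    The 63 is Eb by default (minor third of C), but D# if it behaves
    as a leading note, resolving up a semitone to E (64)."""
    root, third, fifth = chord
    if next_pitch_of_middle_voice == third + 1:  # resolves upward by a semitone
        return "D#"
    return "Eb"

print(spell_third((60, 63, 67), 64))  # D#: leading note into the major third E
print(spell_third((60, 63, 67), 62))  # Eb: steps down, an ordinary minor third
```

The table of pitches is identical in both calls; only the context, an argument here, changes the answer. That is the whole point.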
Now you know...
By now it should be a bit easier to understand why many people give the advice "just write it down by hand, don't convert MIDI data" if you want something like notation in the end. You simply can't trust the current MIDI conversion methods.
So, even more basic than the example above: a library that would give you a list of the places where it was uncertain would be a step in the right direction.
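Such an uncertainty report could be as simple as a second return value. Again a purely hypothetical interface sketch, with a fixed placeholder confidence value:

```python
def transcribe_with_report(midi_pitches):
    """Return note names plus a report of every place where the
    conversion had to guess (hypothetical interface)."""
    AMBIGUOUS = {1: ("C#", "Db"), 3: ("D#", "Eb"), 6: ("F#", "Gb"),
                 8: ("G#", "Ab"), 10: ("A#", "Bb")}
    NATURALS = {0: "C", 2: "D", 4: "E", 5: "F", 7: "G", 9: "A", 11: "B"}
    names, uncertain = [], []
    for index, pitch in enumerate(midi_pitches):
        pc = pitch % 12
        if pc in NATURALS:
            names.append(NATURALS[pc])
        else:
            sharp, flat = AMBIGUOUS[pc]
            names.append(sharp)  # some default choice
            uncertain.append({"position": index, "chosen": sharp,
                              "alternatives": [flat],
                              "confidence": 0.5})  # placeholder value
    return names, uncertain

names, report = transcribe_with_report([60, 63, 67])
print(names)   # the chord from above with a default spelling
print(report)  # one entry: position 1 was a guess, 'Eb' is the alternative
```

Instead of silently picking a spelling, the converter hands the user exactly the spots to double-check by hand, which is already far better than today's all-or-nothing import.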
Current MIDI importers also suck because they are too much work and not viable for the developers. You don't need MIDI import for the really serious work, like composing new pieces or creating high-quality notation editions on PDF or paper. Data entry is the smallest problem in both of these examples, and it really doesn't matter if it takes a few days more to input the data by hand instead of importing MIDI.
An open source approach can help here by dividing the work and multiplying the usefulness if used by more than one program.
Let‘s do it!
MusicXML is not a good format. And I mean that aside from the fact that it is XML. Many people would already disregard a file format as bad simply because they consider XML itself bad.
Many aspects (playback fine control) that MIDI CAN do are lost in MusicXML. I chose MusicXML as an example throughout the article because it is widely known through the propaganda of it being a universal notation exchange format. So we really don't want to convert MIDI to MusicXML; that would be putting out a fire with gasoline.
We want a format that has both MIDI and MusicXML as subsets.
Last week I wrote about a near-future scenario where the idiom of Linux Audio modularity is taken to the next level. The biggest problem was to hide certain connections from the user.
After a week of research and talking to experienced JACK developers this is my current knowledge about the topic:
- Netjack and similar tools are not the right solution. That some of them do not forward MIDI and Transport may be considered just a missing feature. But the real issue is that they are meant to run one server on one computer, and we need one server for each bundle. This shows that this approach would have ended in a hack with virtual IP addresses, temporary Linux user accounts and other abominations. There are doors we don't use :)
- I began to write a client that, without any networking, connected to two JACK servers and then copied data from one server to the other. This, however, turned out not to be a very solid and future-proof solution either: it introduces latency, etc.
- From that train of thought it was suggested by a knowledgeable person in #lad@freenode (I did not ask him for a quote so no name yet, sorry) that a client could be a normal JACK client on one side (the 'bundle' side's way to receive the finalized audio stream, for example) and a JACK driver (like dummy, net or alsa) on the main JACK server's side. This sounds like a clean solution, but needs at least a new JACK release. Maybe not that unrealistic, but somebody needs to do it. Feel free to contact me if you want to be part of a glorious future (by writing this driver/client)!
- There is JACK metadata, which is able to hide connections. This implementation requires no extra JACK server (which is good) but needs more work on the client side. JACK connection managers like QJackCtl and Catia need to honour the metadata. Since we would inject the "hidden" property not from within a JACK client like ZynAddSubFX but from our own bundle-manager, we could switch this hidden property on and off so that updates and edits can be made. Since we already know which clients belong to which bundle, there is no need to add a group tag to the metadata. So at least the JACK control programs don't have to know anything about that.
The last solution might have some (unknown) problems of its own, and it is not the one that comes closest to a "virtual machine" idiom. It may be a bit flaky: connections may show up in the wrong places, or a tool that stores JACK connections may grab the hidden connections as well and mess up your session.
But it is also the most realistic. Not perfect, but good enough. And this is all we need.
Building Meta Applications and „Out-Of-The-Box“ bundles through extended Linux Audio Session Management and a virtual Jack machine
This is an article of the category "Let's talk about Next-Gen. Future ideas for Linux Audio" where I elaborate(!) on realistic near future improvements or unrealistic science fiction scenarios.
Many users frequently wish for more "Out-Of-The-Box"™ experience within the Linux Audio program stack (and many users wish for the opposite). I can understand both positions, as a musician, user and programmer who needs both aspects while keeping the administration effort down.
A real world comparison: A hardware based recording- or producing studio.
Technically it doesn't hurt if cables are all over the place, connecting dozens of outboard- and standalone gear. The sound and the result will be good, and that is the point that matters most.
But hiding the cables in an organised manner behind wall panels and in conduits under the floor has a certain appeal as well! It looks tidy, there is no latent fear of tripping over the cables and destroying equipment, and there is the (subjective) impression of a compact, functional and distraction-free creative environment.
The latter is an extremely important psychological basis for the musician or composer, one that is dangerous to underestimate. It leads to a relaxed attitude, to the impression of an open, inviting room that only waits to be filled with music.
Furthermore there are practical reasons: If you have all your hardware in a trunk with wheels or in a flight case, connected internally with short wires, then you get an extended field of application. You can move it around, pack it in a car, place it on the stage etc.
Speaking of the stage: many musicians have a fixed setup for their instrument on stage (or even behind the stage). That is equipment interconnected always in the same order, only with slightly varying settings. Once you have all that in a neat case you can close the lid and from then on rely on the equipment as one big block "in-the-box". All you need on the outside is one plug for your instrument, one output (or two for stereo) for the combined signal after your FX chain, and maybe a control port (MIDI in) for switching presets.
Again: technically there is no difference from carrying around all pieces separately and connecting them every time before a show or recording. The sound will be just the same. It is "just" a matter of convenience and psychological relaxation.
The point is: nobody will come to you and say how much better it would be to have all the cables and every piece of hardware in front of you. The benefits are too convincing: robustness, compactness, low proneness to failure and, maybe most important, trust in your system. That is worth far more than the unlikely possibility that you need to quickly change a small thing, especially if you set up all your gear beforehand with the knowledge and experience of past recordings, productions, rehearsals and live shows.
And the best part: Everything is still modular! This is actually a case _against_ multi-effect hardware where you get the complete package or nothing. If there is a new hardware release for your reverb part, just exchange it. No problem.
And any time, when you are not doing creative tasks but take your time for the administration phase, you can switch, reset, tune etc. to your heart‘s content.
Just imagine additional hardware of the same type would cost nothing! You could just buy a second set and recombine it differently. And just imagine all those hardware trunks would take up no physical space so that you can have as many as you want...
The reader will have gathered by now that we are almost talking about software: many instances, little additional cost, little physical space. Naturally, the next question is:
But why everything outboard? Just use a small computer instead!
We live in a time and age where computer parts are very small, powerful and cheap. Instead of a big shelf full of hardware just do everything in software and use a (small) computer. For the musician not much changes: Of course the physical dimensions get smaller, but we still have the same inputs and outputs from above.
Imagine a small ARM-box like a Raspberry Pi (maybe a bit more powerful) where a synthesizer is running, let‘s say ZynAddSubFX, and additionally a midi-arpeggiator, both connected internally with jack midi. Does the musician need to know that there are two separate programs? No. From the outside (of the box) it just looks like an arpeggiating synth.
Let's transition even further! If we do not mix and edit in the analogue realm (like on a live stage) but in a composer's studio, we could replace the audio out with an Ethernet cable and route directly through netjack. This is a huge improvement since we get sample-accurate timing and no degradation through the digital>analogue>digital conversion.
Still, the main argument of this article holds: once the setup is complete your small effect-computer (or sampler, or synth etc.) becomes a trustworthy device, immune to software accidents ("Oh, I clicked disconnect all in QJackCtl"), crashing sequencers or the need to work with Flash and ALSA audio so you can view YouTube videos. It is just a separate system. We have it for the convenience, not for the additional CPU power.
And again: if you want, you can hook up a screen and keyboard and change the whole system (or log in via SSH). This is not about keeping the user out and preventing him or her from modifying things.
And now do the same in software, on one computer! But keep the convenience!
The problem is not solved yet. First, even today, additional computers cost additional money and you need a bit of space, and again, those dreadful cables. The main desktop PC you already have for recording and producing is most likely powerful enough to satisfy all your audio and midi requirements.
And indeed: Nearly anything that was said above can be done on a single system and you can keep the benefits, including the psychological. But we need a bit more work.
Here are the requirements, followed by what we have and what we still need.
- Different programs, doing their jobs well, principally able to run stand-alone.
- The possibility to connect all those programs directly.
- Save and load the current state: what programs are running, what are their settings, save connections.
- Load and save presets for all these programs at once (through a CLI, GUI, OSC, MIDI etc.)
- Don‘t expose the graphical part of the program (GUI, terminal window etc.) to the user
- The whole package, including presets, can be duplicated, moved and used on different computers and architectures (x86, 64bit, ARM)
- The whole package can be treated as a single program that itself satisfies all of these requirements (recursively)
- provide only selected inputs and outputs for connections with other programs (like your mixer, sequencer or DAW)
This is not a small list. But thanks to the work of many individuals over the last years most of these points are already implemented. In fact only the last two are missing. Here is why:
- This just means any program that you start on the OS-Level, like Carla, Ardour, Laborejo or ZynAddSubFx.
- JACK Audio and JACK MIDI are the widely accepted standard. This is a no-brainer.
- This is the work of a session manager. "Non Session Manager" (NSM) is the best and a working solution here.
- Pure terminal programs are not the problem here. They can run in the background just fine; for additional convenience or functionality you can even automatically create a tmux session for them. X applications can run in a background or embedded X server as well. This may be resource intensive, but who knows: computer resources may one day become so plentiful that even this becomes irrelevant. Imagine the whole package running in Xephyr, including an open tmux session for CLI programs in a terminal in that X session. NSM already has an option and a button for an optional GUI. So if you insist you can simply click that and see what is going on.
- Solved by NSM already. Properly working clients can load and switch their states without closing the whole program. For our scenario this means that each preset is its own NSM session. Switching presets is then fast and stable. (And even if a client did not implement this functionality, the downside is only that switching presets takes longer, because this one client needs to be closed and reopened.)
- Non sessions are self-sufficient by design. They save everything except the binary programs (Carla, Ardour, Laborejo etc.) in their own session directory. Every preset and asset (e.g. sample libraries) is in the same place and can be moved, copied, compressed and decompressed by the user with his or her file manager of choice.
- NSM can run in multiple instances, even across the network. What we need is a client program that starts this local, encapsulated server, loads all the settings and restores the internal connections. This client program can be seen by the outside world and is what you start manually (or as part of a global non session). There are no technical problems here; this just needs one or two weeks of work.
- Now the real problem: we want all the internal clients to connect to each other through JACK, but we don't want those ports and connections ("cables") visible in our main, user-accessible connection manager like QJackCtl, jack_lsp or Catia, except for those mentioned "outside-world" inputs and outputs! According to Paul Davis ("las"), the maintainer of JACK, this is not impossible but difficult, and certainly not trivial.
Here are some ideas for more or less „clean“ solutions.
Run an extra JACK server with the dummy backend (instead of ALSA or FireWire) for the internal clients. The external ports (in, out and control) then connect through a yet unknown functionality, or through some special client that is able to connect to two JACK servers. Maybe a proof of concept can be hacked together by using netjack through localhost on the same computer. I really don't know.
What do we gain, what is the perspective?
For daily audio production this already has some benefits. As already mentioned, those are mostly convenience-based and psychological, which is just what many people need to feel in control of their work (fewer "cables", fewer desktop windows, fewer options, fewer choices).
Beyond that it is very useful that users can truly build effect chains, synchronised sound generators and other "meta-programs" without any programming knowledge (since the NSM GUI already offers all the editor you need).
Those meta programs become the building blocks of future sessions and productions and can easily be copied, shared and backed up because everything results in a single file (tar or tar.gz if you want). They are portable across distributions, systems and architectures because no executable program is saved, just the data.
Easier ways to create new software
For me personally, from a developer's point of view, the greatest benefit is the ability to develop completely new meta-applications rapidly and to create "Out-Of-The-Box" bundles.
In Linux it is customary and best practice that a single program does not try to fulfil all needs and reinvent the wheel each time, but instead relies on the work of others. This can be done through libraries or through the command-line pipe "|" that combines small tools like ls and grep into big applications in a shell script. In this sense JACK is the equivalent of this pipe.
A MIDI or audio developer that has a good idea for a new synthesizer does not need to implement audio effects like reverb, because that can be done in LV2 hosts; he does not need to offer a MIDI filter, because that can be done with mididings; and he does not need to create a built-in arpeggiator, because we have arpage, flarps, qmidiarp etc.
The problem so far was that this "good idea" strictly requires these programs to work together in a specific way. The developer today has two choices: either just concentrate on writing the synthesizer and release it together with a tutorial and readme that explain how to use it, or implement all the functionality above again.
The first method is preferable, but it lacks the out-of-the-box experience many developers and users want and need, so a percentage indeed goes all the way and reinvents the wheel time after time.
With the proposed system he could just deliver a single file that can be loaded by NSM (or maybe a special version of NSM, backwards-compatible but enhanced with this article's content) and that, for normal users, appears to be the complete program.
This could be a simple extra client that does nothing but administration or the special functionality could be built into the synthesizer directly. I prefer the first version though.
You get to eat the cake and keep it!
And nothing is lost! I can't stress this enough: the synthesizer from the example above does not itself impose any particular workflow on the advanced user. The extra layer, the complete package, is just that: extra. Nothing prevents you from starting the synth standalone and doing whatever you want with it, without the LV2 reverb, without the arpeggiator.
Extended functionality, bundles, even commercially sold presets or assets do not get in the way. You don‘t need to delete all the „convenience-crap“ first if you like a clean slate.
And these are not even different versions of the same programs. Not forks. Not one „pro version“ or a „student version“. Not one commercial notation program with a built-in sampler library already connected and ready to go and one „open source“ version that just gives you jack midi ports.
There is no right and wrong here; that is the point. One day you will need a convenient out-of-the-box experience, and the next day you want bare-bones modular software that lets you tweak every detail. My point is: those can be the same software, the same binary program, if we work a bit on the administrative environment.
Let‘s do it!
I have decided to skip Laborejo 1.0, stop working on it and instead commit the "single worst strategic mistake that any software company can make: They decided to rewrite the code from scratch."
Work on Laborejo 2.0 has started. The vision stays the same: A notation based sequencer with strong Lilypond connections.
But I have learned many things about using, writing and designing software in the last two years, by doing Laborejo, LisaloQt and other stuff. And I can't stand working with the huge mistakes I've made in the Laborejo-Core anymore. It makes the program slow and overly complex.
The new core, which I more or less just finished, is a different thing. Half the lines of code, orders of magnitude more speed, and that is without any optimisation. I still have a palette of tricks and tweaks learned from Laborejo 1 that can be applied to the new code.
That said, of course I do not consider starting a new codebase a mistake. Once I get to the high-level stuff again I can copy and paste many functions.
What's in it for you, and how do you get it?
Right now the repository is private and unusable outside of my system (hardcoded paths etc.).
What you get is a faster and more robust program, which is also prettier, that runs with fewer dependencies and still doesn't get in your way.
The same as before, just better.