Demo page of the paper: Bujard Balthazar, Nika Jérôme, Obin Nicolas, Bevilacqua Frédéric, "Learning Relationships between Separate Audio Tracks for Creative Applications", Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025), 2025.

This web page presents the main results from the 2025 AIMC paper "Learning Relationships between Separate Audio Tracks for Creative Applications" (Bujard et al., 2025), along with some audio examples. The related code can be found here.

Model summary


Figure 1: Model summary. The audio input ('guide') is encoded by the Perception module; the Decision module learns symbolic relationships from a dataset of paired tracks and predicts a symbolic specification from the encoded input; the Action module uses this symbolic specification and a 'memory' audio file to generate the musical response with corpus-based concatenative synthesis.
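To make the data flow of Figure 1 concrete, here is a minimal Python sketch of the three stages on toy signals. It is not the authors' implementation: the window-based features (RMS energy and zero-crossing rate), the nearest-centroid quantization into an alphabet of symbols, the co-occurrence table standing in for the learned Decision model, and the random segment selection standing in for Dicy2's concatenative synthesis are all simplifying assumptions, and every function and class name below is hypothetical.

```python
import numpy as np

# Hypothetical toy sketch of the Perception -> Decision -> Action pipeline (Figure 1).
# None of these names come from the authors' code; the learned Decision model and
# Dicy2 are replaced by deliberately simple stand-ins.

def frame(audio, sr, window_size):
    """Split audio into non-overlapping windows (window_size assumed in seconds)."""
    hop = int(window_size * sr)
    n = len(audio) // hop
    return audio[: n * hop].reshape(n, hop)

def perception(frames):
    """Toy 2-D descriptor per window: RMS energy and zero-crossing rate."""
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    zcr = np.mean(np.diff(np.sign(frames), axis=1) != 0, axis=1)
    return np.stack([rms, zcr], axis=1)

def quantize(features, centroids):
    """Label each window with the index of its nearest centroid (its symbol)."""
    dist = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

class Decision:
    """Stand-in for the learned model: for each guide symbol, predict the response
    symbol it co-occurred with most often in the paired training data."""

    def fit(self, guide_syms, response_syms, alphabet_size):
        counts = np.zeros((alphabet_size, alphabet_size), dtype=int)
        np.add.at(counts, (guide_syms, response_syms), 1)
        self.table = counts.argmax(axis=1)
        return self

    def predict(self, guide_syms):
        return self.table[guide_syms]

def action(spec, memory_frames, memory_syms, rng):
    """Concatenative step: for each predicted symbol, copy a memory window carrying
    that symbol, or insert silence if the memory has none."""
    out = []
    for s in spec:
        candidates = np.flatnonzero(memory_syms == s)
        if candidates.size:
            out.append(memory_frames[rng.choice(candidates)])
        else:
            out.append(np.zeros(memory_frames.shape[1]))
    return np.concatenate(out)

# Toy run: random noise stands in for a real pair of tracks and a memory file.
rng = np.random.default_rng(0)
sr, window_size, alphabet_size = 8000, 0.5, 4
guide = rng.standard_normal(sr * 4)
response = rng.standard_normal(sr * 4)
memory = rng.standard_normal(sr * 6)

g_feat = perception(frame(guide, sr, window_size))
r_feat = perception(frame(response, sr, window_size))
pool = np.concatenate([g_feat, r_feat])
centroids = pool[rng.choice(len(pool), alphabet_size, replace=False)]

g_syms = quantize(g_feat, centroids)
decision = Decision().fit(g_syms, quantize(r_feat, centroids), alphabet_size)

mem_frames = frame(memory, sr, window_size)
mem_syms = quantize(perception(mem_frames), centroids)

spec = decision.predict(g_syms)            # symbolic specification
output = action(spec, mem_frames, mem_syms, rng)
mix = guide[: len(output)] + output        # 'guide' + model output
```

The sketch only shows how a symbolic specification links windows of the guide to segments of the memory; in the actual system, the Decision module is learned from a dataset of paired tracks and the Action module is handled by Dicy2.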

Each audio example below consists of three tracks: 'guide', 'memory' and output, corresponding respectively to the input of the model, the memory used by Dicy2, and the mix of the 'guide' with the model output. The 'Selected examples' are carefully selected outputs that exhibit good musical properties (harmonicity, tonality, variation). The 'Examples illustrating future work directions' then present all 9 configurations (window size × alphabet size) for a few examples that support the discussion of the paper's main results and future work. Each example comes with a comment summarizing the results of the highlighted examples.
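The 9 configurations form a grid over the two hyperparameters. Assuming three values for each (9 = 3 × 3), the grid can be enumerated as below; the labels w1 to w3 and A1 to A3 are hypothetical placeholders, since the actual values are not given on this page.

```python
from itertools import product

# Hypothetical placeholder labels: the actual window sizes and alphabet sizes
# used in the paper are not listed on this page.
window_sizes = ["w1", "w2", "w3"]
alphabet_sizes = ["A1", "A2", "A3"]

# Every (window size, alphabet size) pair is one configuration.
configurations = list(product(window_sizes, alphabet_sizes))
for w, a in configurations:
    print(f"window_size={w}, alphabet_size={a}")
print(len(configurations))  # 9
```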

Readers and listeners are encouraged to read the paper, in particular Sections 6 (Results) and 7 (Discussion), before listening to the examples below.

The main results and conclusions are summarized as follows:

- The alphabet size was found to be the primary factor influencing the model's performance.

- A larger alphabet size is associated with greater diversity...

- ...but it also makes the relationships between tracks harder to model.

- The window size did not show a clear pattern, but MoisesDB (Western pop music, i.e. pulsed music) favoured window sizes close to its pulse (in bpm).
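Comparing a window size with the tempo of a pulsed track amounts to comparing it with the beat period, 60 / bpm seconds (0.5 s at 120 bpm). The sketch below assumes window sizes are expressed in seconds; the function names and candidate values are hypothetical and are not those of the paper.

```python
def beat_period(bpm):
    """Duration of one beat in seconds."""
    return 60.0 / bpm

def closest_window(bpm, window_sizes):
    """Return the candidate window size (assumed in seconds) nearest to the beat period."""
    period = beat_period(bpm)
    return min(window_sizes, key=lambda w: abs(w - period))

candidates = [0.25, 0.5, 1.0]  # hypothetical window sizes in seconds

print(beat_period(120))                 # 0.5
print(round(beat_period(90), 3))        # 0.667
print(closest_window(90, candidates))   # 0.5
```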